#1
  1. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    memory management


    Hi all,

    I have written a program in C to simulate a shop floor.
    I am using the VC++ compiler. The program calls malloc(size) and free(ptr) dynamically; how often depends on the number of jobs I am simulating.

    The simulation time (the execution time of the program) increases drastically as the number of jobs grows: 1000 jobs take 1.5 seconds, 10000 jobs take 8 seconds, and 100000 jobs take 15 minutes.
    I suspect there is a problem with the memory allocation or freeing,
    since I am using large blocks of memory. Could anyone tell me about any routines, or any other method, to find out how much memory the program has used but not freed?

    Thank you,
  2. #2
  3. *bounce*
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2002
    Location
    Delft, The Netherlands
    Posts
    514
    Rep Power
    42
    I don't know of any built-in accounting functionality, but I suppose you could write wrapper functions for malloc() and free() that keep score. You'd have to keep a list of pointer-and-size pairs, and a global variable that contains the sum of all the sizes. Some pseudo-code:

    Code:
    size_t used_heap;
    
    void *
    my_malloc (size_t size)
    {
        void *p;
    
        if ( (p = malloc(size)) != NULL) {
            add_to_list(p, size);
            used_heap += size;
        }
    
        return p;
    }
    
    void
    my_free (void *p)
    {
        size_t size;
    
        if ( (size = rem_from_list(p)) > 0) {
            used_heap -= size;
            free(p); /* the real free() */
        }
    }
    Something like that :)

    Also, bear in mind that allocating memory is a rather expensive operation. If you can replace ten allocations of, say, 1 kilobyte, with a single allocation of 10 kilobytes, you'll definitely save time, especially if this happens in a loop.
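
    For example (a sketch; the sizes are just illustrative):

    Code:
    #include <stdlib.h>
    
    void example (void)
    {
        char *part[10];
        char *block;
        int i;
    
        /* instead of ten separate trips to the allocator... */
        for (i = 0; i < 10; i++)
            part[i] = malloc(1024);
    
        /* ...make one trip and slice the block up yourself
           (remember: one free(block) instead of ten free()s) */
        block = malloc(10 * 1024);
        for (i = 0; i < 10; i++)
            part[i] = block + i * 1024;
    }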

    Another way is to reuse memory. If you're using a lot of equally-sized chunks, then instead of free()ing and later malloc()ing them again, don't free them at all: add them to a list and hand them out again on the next allocation, as sketched below. This too can help reduce the number of those expensive memory-allocation system calls.
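
    A minimal sketch of that idea, assuming all chunks are the same size and at least as big as a pointer (the names get_chunk/put_chunk are mine, nothing standard):

    Code:
    #include <stdlib.h>
    
    struct chunk_node {
        struct chunk_node *next;
    };
    
    static struct chunk_node *free_list; /* chunks we can hand out again */
    
    void *get_chunk (size_t size)
    {
        if (free_list != 0) {            /* reuse an old chunk if we have one */
            struct chunk_node *n = free_list;
            free_list = n->next;
            return n;
        }
        return malloc(size);             /* otherwise, fall back to malloc() */
    }
    
    void put_chunk (void *p)
    {
        struct chunk_node *n = p;        /* instead of free()ing the chunk... */
        n->next = free_list;             /* ...push it on the list */
        free_list = n;
    }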

    Good luck :)
    "A poor programmer is he who blames his tools."
    http://analyser.oli.tudelft.nl/
  4. #3
  5. *bounce*
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2002
    Location
    Delft, The Netherlands
    Posts
    514
    Rep Power
    42
    ramakrishna said:

    > thank you for ur response.
    Quite welcome. Though I don't understand why you're replying by email rather than through the forums; there, others might read it and get some ideas from it. Also, if I make a mistake, someone might correct it. So please, don't reply to this email, but use the forums :)

    > you advised me to wrap up malloc functions
    > i couldnt get what that means exactly,
    > and how can i use it in my program?

    Well, suppose you have the following code:

    Code:
    void *p;
    
    if ( (p = malloc(size)) == 0) {
        /* error */
    }
    
    /* do some stuff with the memory */
    
    free(p);
    Now, if you want to know how much allocated memory your program uses, you'll need to keep track. One way to do that is to keep a list of all the chunks of memory you allocated.

    So, you design a simple struct to do that:

    Code:
    struct chunk {
        void *p;     /* points to the chunk */
        size_t size; /* holds the size of the chunk */
    };
    You'll also need a dynamically allocated list to hold all the chunk structs, an integer that keeps track of the size of chunkv, and an integer that holds the sum of all the chunks' sizes:

    Code:
    struct chunk *chunkv;
    int chunksz;
    int totalsz;
    Then you have to initialise it:

    Code:
    totalsz = 0;
    chunksz = 32;
    if ( (chunkv = calloc(chunksz, sizeof(struct chunk))) == 0) {
        /* error */
    }
    I'm using calloc() because it zeroes out the allocated memory. This is necessary, since I'm using null pointers to mark unused elements in the array.

    So now, instead of doing this:

    Code:
    void *p;
    
    if ( (p = malloc(size)) == 0) {
        /* error */
    }
    
    /* do some stuff with the memory */
    
    free(p);
    you do this instead:

    Code:
    void *p;
    int i;
    
    for (i = 0; i < chunksz; i++) {
        if (chunkv[i].p == 0) {
            break; /* found unused slot in array */
        }
    }
    
    if (i == chunksz) {
        /* array is full */
        struct chunk *newv;
    
        chunksz *= 2; /* double the size of the array */
    
        if ( (newv = realloc(chunkv, chunksz * sizeof(struct chunk))) == 0) {
            /* error: add some panic code! */
        }
        chunkv = newv;
    
        /* initialise the new half of the array */
        memset(&chunkv[i], 0, i * sizeof(struct chunk));
    }
    
    if ( (p = chunkv[i].p = malloc(size)) == 0) {
        /* error */
    }
    
    chunkv[i].size = size;
    totalsz += size;
    
    /* do some stuff with the memory */
    
    free(p);
    
    for (i = 0; i < chunksz; i++) {
        if (p == chunkv[i].p) {
            totalsz -= chunkv[i].size;
            chunkv[i].p = 0;
            chunkv[i].size = 0;
        }
    }
    And you have to do that for every piece of memory you want to allocate in your program.

    Seems like a lot of extra code, huh? That's why people invented WRAPPER functions. You stuff all this code into two new functions, say, my_malloc() and my_free(), which do all the extra stuff:

    Code:
    void *my_malloc (size_t size)
    {
        void *p;
        int i;
    
        for (i = 0; i < chunksz; i++) {
            if (chunkv[i].p == 0) {
                break; /* found unused slot in array */
            }
        }
    
        if (i == chunksz) {
            /* array is full; expand it */
            struct chunk *newv;
    
            chunksz *= 2; /* double the size of the array */
    
            if ( (newv = realloc(chunkv, chunksz * sizeof(struct chunk))) == 0) {
                /* error-handling code goes here */
                return 0;
            }
            chunkv = newv;
    
            /* initialise the new half of the array */
            memset(&chunkv[i], 0, i * sizeof(struct chunk));
        }
    
        if ( (p = chunkv[i].p = malloc(size)) == 0) {
            /* error-handling code goes here */
            return 0;
        }
    
        chunkv[i].size = size;
        totalsz += size;
    
        return p;
    }
    
    void my_free (void *p)
    {
        int i;
    
        for (i = 0; i < chunksz; i++) {
            if (p == chunkv[i].p) {
                totalsz -= chunkv[i].size;
                chunkv[i].p = 0;
                chunkv[i].size = 0;
                free(p); /* the real free() */
                break;
            }
        }
    }
    Now you can use almost the same code as you did before:

    Code:
    void *p;
    
    if ( (p = my_malloc(size)) == 0) {
        /* error */
    }
    
    /* do some stuff with the memory */
    
    my_free(p);
    > my program execution time is greatly affected by the number of jobs in
    > the system which in turn calls malloc that many times. could u explain
    > how to improve the performance of the code. thank you

    Well, I can't say anything about your code, obviously. I can give you some general hints though.

    Like I said, memory allocation is expensive, like -any other- system call. The kernel has to switch contexts and so on, and that takes time.

    So minimize the number of system calls you make, either implicitly or explicitly. For example, instead of using fgetc() to fetch one character at a time, fetch a whole block and chop it up yourself. (OK, so fgetc() is a C library call. It in turn uses system calls, though. So you're making -implicit- system calls.)
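
    For example (a sketch; the buffer size is arbitrary):

    Code:
    #include <stdio.h>
    
    /* one fgetc() call per character */
    void per_char (FILE *fp)
    {
        int c;
        while ( (c = fgetc(fp)) != EOF) {
            /* process c */
        }
    }
    
    /* one fread() per 8K block; chop it up yourself */
    void per_block (FILE *fp)
    {
        char buf[8192];
        size_t n, i;
        while ( (n = fread(buf, 1, sizeof buf, fp)) > 0) {
            for (i = 0; i < n; i++) {
                /* process buf[i] */
            }
        }
    }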

    I/O is slow, too. As is creating a new process. Batch your I/O operations, and consider using threads instead of processes. This helps in two ways: both creating and switching are faster with threads.

    Really though, I can't say much more. I'd have to see the code, and I'd have to study it to make any more suggestions. And I'm not exactly keen on spending a lot of time optimizing someone else's code.

    Last but not least, I'll be posting this email on the forum as well. If you have any more questions, ask there :)

    Kind regards,
    Analyser.

    PS: Disclaimer: I did not test any of the above code. Use at your own risk ;)
    Last edited by Analyser; July 29th, 2003 at 07:29 AM.
    "A poor programmer is he who blames his tools."
    http://analyser.oli.tudelft.nl/
  6. #4
  7. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244

    The need for speed


    In the past I wrote a shop floor simulation (part of an evolutionary-algorithm-based assembly line optimizer) and did extensive work with pointers to large blocks of memory. Since the programs could easily run all night, I became obsessed with speed. I found that even using simple C++ objects in arrays is measurably more expensive than simply allocating a large block and keeping track of your own location within it.

    At program startup I allocated blocks of memory based on user input parameters (with appropriate sanitation), then, using pointers and enums for ease of program maintenance, I indexed around based on where my simulation was in the shop.

    You will likely wind up thrashing the cache unless your memory blocks are accessed sequentially, but you should be able to control that if you understand your access patterns. What I do is something approximately like this (no warranties expressed or implied!):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    #define BLOCKSIZE 5 //the size of your 'data structure'
    
    enum {
        enuIndex1 = 0, //these are indexes into your BLOCKSIZE data structure
        enuIndex2 = 1,
        enuIndex3 = 2,
        enuIndex4 = 3,
        enuIndex5 = 4
    };
    
    int main(){
        int i, size1;
        int *intPtr, *intTmpPtr;
        void *ptrTmp;
    
        printf("Enter size of your memory: ");
        scanf("%d", &size1);
        if (size1 < 0 || size1 > 1000000){ //'sanitation' of the input
            fprintf(stderr, "value entered is out of range!\n");
            exit(1);
        }
        ptrTmp = malloc(size1 * BLOCKSIZE * sizeof(int));
        if (ptrTmp == NULL){
            fprintf(stderr, "unable to allocate memory!\n");
            exit(1);
        }
        intPtr = (int *) ptrTmp;
    
        //here is where the rest of your program goes; you can either pass the
        //memory around to each subroutine or make the pointer global
        for (i = 0; i < size1; i++){
            intTmpPtr = &intPtr[i * BLOCKSIZE]; //locate pointer at the appropriate block
            intTmpPtr[enuIndex1] = 10;          //access/manipulate data
            intTmpPtr[enuIndex2] = 10;
            intTmpPtr[enuIndex3] = 10;
            intTmpPtr[enuIndex4] = 10;
            intTmpPtr[enuIndex5] = 10;
        }
    
        free(intPtr);
    
        printf("program completed successfully\npress return to exit");
        //scanf() left the newline in the input buffer, so the first
        //getchar() eats that and the second one waits for the user
        getchar();
        getchar();
    
        return 0;
    }
    For all practical purposes you are creating 'objects' of your own on the heap. It is much less clean than using C++ objects, but I find that even my most optimized C++ objects still cause a slowdown of anywhere from 10-25% compared to plain C pointers. Lately I have worked out a compromise: I use C++ objects (map, vector, custom, etc.) to get the input data and preprocess it, then in a single part of an object I allocate all the memory I need and play my indexing tricks.

    Another thing to look at, if your run times do not increase linearly with the number of 'objects', is whether you are accessing your memory in a random way (as far as the cache is concerned). You can lose orders of magnitude in performance through a poor cache hit ratio. If you don't know where the memory you are accessing is in relation to your last memory access, this may be what is killing you. Also be sure you have all your compiler optimizations turned on!
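
    To see why, compare these two loops (a toy sketch): they perform exactly the same additions, but the second one jumps 16 KB between reads and will miss the cache on almost every access.

    Code:
    #include <stddef.h>
    
    #define N (1 << 24) //16M ints, far bigger than any cache
    
    //sequential access: each cache line is used fully before moving on
    long sum_sequential(const int *a){
        long sum = 0;
        size_t i;
        for (i = 0; i < N; i++)
            sum += a[i];
        return sum;
    }
    
    //strided access: same work, but nearly every read misses the cache
    long sum_strided(const int *a){
        long sum = 0;
        size_t i, j;
        for (j = 0; j < 4096; j++)
            for (i = j; i < N; i += 4096)
                sum += a[i];
        return sum;
    }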

    Good luck!

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  8. #5
  9. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    execution time of the program


    hi,
    I have now run my C program on hp-linux
    and used the gprof command to profile it.
    It gives the number of calls to each function in the program that was
    executed, and also to some library routines.
    One such routine, "_mcount", is taking around 38% of the execution time;
    could anybody tell me about this function?
    One of my functions, "find_job", is taking 25% of the execution time; this function basically
    searches a linked list (it is a recursive function).
    It uses the strcmp function, which is taking 35% of the execution time;
    strcmp is used in several other places as well.

    thanking you,
  10. #6
  11. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244

    Take a step back


    First: google "_mcount" and see if the info helps you (it didn't help me much; it may be a routine called from inside a standard library routine). I don't think it is terribly applicable and suspect that it is some wrapper. Check whether its number of calls is exactly the same as that of another function (such as strcmp).

    Second: I strongly suggest you consider learning about C++ STL 'map's. You can build an 'index' into your linked list and use that to locate nodes. Another thing you can consider if you want to stick with C and don't want to write your own hash routines is to somehow convert your strings into (ideally unique) int values and search through that list of ints. strcmp is a rather expensive routine, particularly if you have long strings.
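
    For example, you could store a precomputed hash in each node and only call strcmp() when the hashes match. A sketch, using the well-known djb2 hash (the job struct here is just an illustration, not your actual layout):

    Code:
    #include <string.h>
    
    unsigned long hash_str (const char *s) /* djb2 */
    {
        unsigned long h = 5381;
        while (*s)
            h = h * 33 + (unsigned char) *s++;
        return h;
    }
    
    struct job {
        unsigned long hash;  /* precomputed when the node is created */
        char name[32];
        struct job *next;
    };
    
    struct job *find_job (struct job *head, const char *name)
    {
        unsigned long h = hash_str(name);
        for (; head != 0; head = head->next)
            if (head->hash == h && strcmp(head->name, name) == 0)
                return head; /* cheap integer compare rejects most nodes */
        return 0;
    }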

    Third: If you are using linked lists with the memory for each node allocated 'randomly' throughout your memory space, you are surely missing the cache as you scan through the list, and as your memory usage increases your program will spend most of its time waiting for memory to be loaded from RAM into cache. Please consider doing your own memory management as I suggested before.

    Fourth: Recursive functions are fine for learning programming and are often the choice for maintainable, non-performance-oriented sections of code, but each recursion requires a new stack frame, and if that happens in a critical section of your code (as it sounds to me), then you are taking a beating as you search through your list.
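
    For example (a sketch; the node layout is illustrative), the two functions below find the same node, but the second needs no extra stack frames:

    Code:
    struct node {
        int key;
        struct node *next;
    };
    
    /* recursive: one new stack frame per node visited */
    struct node *find_rec (struct node *n, int key)
    {
        if (n == 0 || n->key == key)
            return n;
        return find_rec(n->next, key);
    }
    
    /* iterative: same result, constant stack usage */
    struct node *find_iter (struct node *n, int key)
    {
        while (n != 0 && n->key != key)
            n = n->next;
        return n;
    }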

    Mitakeet

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  12. #7
  13. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    Smile execution time


    hi,
    I have changed my recursive function (for searching a linked list)
    into a while loop, and the execution time is reduced by more than half
    (earlier it was 15 mins for simulating 100000 jobs; now it is around 6.5 mins).

    I still need to optimize my code.
    I am searching the linked list in a linear way,
    and the strcmp function I am using is affecting the execution time.

    Could anybody tell me about other searching methods available, and what could replace the strcmp function?

    thanking you,
  14. #8
  15. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244

    Can you use C++?


    Using a C++ map (see the snippet below), you add the string value to the map along with a pointer to your object. That way you can rely on highly optimized search and retrieval code that you don't have to write. Keep in mind that map retrieval is designed for large numbers of objects (I am sure that 100K fits the bill) and is actually a performance drag for small datasets.

    I have given you advice on memory management that can get you significant performance boosts, but it takes time to master. Performance tuning is a subspecialty of programming, and not something one becomes an expert in within a couple of hours.

    Also: are you running your program in debug mode? If so, you may see as much as a five-fold improvement if you run it fully optimized.
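    With VC++, that means building the Release configuration rather than Debug; with gcc, compile with something like -O2.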

    Code:
    #include <map>
    #include <string>
    
    std::map<std::string, OBJECT *> myMap; // OBJECT * is a pointer to an object in your list
    
    myMap["string value you are strcmping on"] = &object;
    
    //to retrieve the value:
    
    OBJECT *ptr2obj = myMap["string value you are strcmping on"];
    
    ptr2obj->whatevervalue;
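
    Note that operator[] will insert a default-constructed entry if the key is not already in the map; for pure lookups, myMap.find() is the safer choice, since it returns myMap.end() on a miss instead of growing the map.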

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
