#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2003
    Location
    China ex Australia
    Posts
    32
    Rep Power
    11

    creating dynamic 3D array of type float


    Hi,

    I am trying to dynamically create a 3D array and allocate to a contiguous memory block in C++. I am getting compiletime failure and errors of "assignment to `float **' from `int' lacks a cast" type of thing, I think because the array is to be filled with floats but the pointers are integers... but I dont really know. I have commented the errors into the source at the two lines that generate them.

    Do I need to cast something (and how would I do that), or is there another problem?
    Thanks greatly for your thoughts,
    finlaylabs.

    PS trying to use the PHP thing, not sure if Ive got the right method...

    PHP Code:
    #include <iostream>

    using namespace std;

    int main()
    {
        
    // contiguous-memory 3D array filled with floats
        
    float ***c33;
        
        
    // dimensions of 3D array
        
    int nxnynz;

        
    // real version requires user input for nx, ny, nz
        
    nx=5;
        
    ny=4;
        
    nz=5;
        
        
    // allocate pointers to x
        
    **c33=new float[nx];
        
        
    // allocate pointers to y
        
    *c33[0]=new float[nx*ny];
        
        
    // allocate storage for entire array
        
    c33[0][0]=new float[nx*ny*nz];
        
        
    // index the array for x,y,z, using indices 0,1,2,3.....n-1.
        // such that c33[i][j][k]points to c33[(i*ny*nz)+(j*nz)+k];
        
    for (int i=0i<nxi++) {
            
    c33[i]=(i*ny*nz); // error: assignment to `float **' from `int' lacks a cast
            
    }

        for (
    int i=0;i<nx;i++) {
            for (
    int j=0;j<ny;j++) {
                
    c33[i][j]=((i*ny*nz)+(j*nz)); // error: assignment to `float *' from `int' lacks a cast
            
    }
        }
        
        for (
    int i=0;i<nx;i++) {
            for (
    int j=0;j<ny;j++) {
                for (
    int k=0;k<nz;k++) {
                    
    c33[i][j][k]=(i*ny*nz)+(j*nz)+k;
                }
            }
        }
            
        
    c33[2][3][3]=2.3;
        
    c33[1][2][2]=44.5;
        
        
    cout<<c33[2][3][3]<<" ";
        
    cout<<endl;
        
    cout<<c33[0][0][0]<<" ";
        
    cout<<endl;

        
    delete[] c33[0][0]; // delete array storage
        
    delete[] c33[0]; // delete pointers to y
        
    delete[] c33// delete pointers to x

        
    return(0);

  2. #2
  3. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    If I understand what you are trying to accomplish in the long run is to be able to control where your memory is and how you access it. I have done this trick many times emulating 2d arrays, but my efforts to emulate for you the 3d arrays came to naught. I have two examples of what you appear to be trying to accomplish that is much simpler, you may want to consider it. The first uses the contiguous memory and a function to emulate the 3d access. The second is an actual 3d array.

    BTW: The compiler error you are getting is because you need to be storing a pointer to a float rather than an integer. My advice is to abandon your previous approach, or just rely on new to create the correct structures. You are not likely to create code that is easy to maintain (keep in mind that often as much or more than 80% of the time spent on code is for maintenance).


    Code:
    #include <iostream>
    
    using namespace std;
    
    //global storage for array sizes:
    int nx, ny, nz;
    
    int getIndex(int x, int y, int z){
        //this function is responsible for emulating the 3d array
        return (x*nx*ny) + (y*ny) + y + z;
    }
    
    
    /***********************************
        oneArray allocates the memory in one single chunk
        then using the getIndex function (above) it emulates
        a 3d array.
    ***********************************/
    
    void oneArray(){
        float *fltData;//where your contiguous data is stored
        int i, j, k;
    
        // allocate contiguous storage for entire array
        try{
            fltData = new float[nx*ny*nz];
        }catch(...){//you can catch specific exceptions here
            cerr << "Exception in allocating array in oneArray()!\n";
            exit(1);
        }
    
        //throw some data in the array using the index
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                for (k=0;k<nz;k++) {
                    fltData[getIndex(i,j,k)] = (float) (i*100) + (j*10) + k;
                }
            }
        }
    
        //dump it out to see what we got
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                for (k=0;k<nz;k++) {
                    cout << fltData[getIndex(i,j,k)] << "\t";
                }
                cout << endl;
            }
            cout << endl;
        }
    
        // delete data
        delete fltData;
    
        return;
    }
    
    /***********************************
        threeDArray allocates the memory in different chunks
        you cannot be guarenteed that the memory is contiguous,
        though that is a high probability.
    ***********************************/
    
    void threeDArray(){
        float ***fltData;//where your contiguous data is stored
        int i, j, k;
    
        // allocate memory for array one dimension at a time
        try{
            fltData = new float**[nx];
            for (i=0;i<nx;i++) {
                fltData[i] = new float*[ny];
                for (j=0;j<ny;j++) {
                    fltData[i][j] = new float[nz];
                }
            }
        }catch(...){//you can catch specific exceptions here
            cerr << "Exception in allocating array in threeDArray()!\n";
            exit(1);
        }
    
        //throw some data in the array using the index
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                for (k=0;k<nz;k++) {
                    fltData[i][j][k] = (float) (i*100) + (j*10) + k;
                }
            }
        }
    
        //dump it out to see what we got
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                for (k=0;k<nz;k++) {
                    cout << fltData[i][j][k] << "\t";
                }
                cout << endl;
            }
            cout << endl;
        }
    
        // delete data in reverse order
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                delete [] fltData[i][j];
            }
            delete [] fltData[i];
        }
        delete fltData;
    
        return;
    }
    
    int main(){
        // real version requires user input for nx, ny, nz
        nx=5;
        ny=4;
        nz=5;
        
        printf("The data emulating the 3d array (oneArray()):\n\n");
        oneArray();
    
        printf("The same thing from threeDArray():\n\n");
        threeDArray();
    
        return 0;
    }
    Here is the output:

    Code:
    The data emulating the 3d array (oneArray()):
    
    0        1        2        3        4
    10      11      12      13      14
    20      21      22      23      24
    30      31      32      33      34
    
    100     101     102     103     104
    110     111     112     113     114
    120     121     122     123     124
    130     131     132     133     134
    
    200     201     202     203     204
    210     211     212     213     214
    220     221     222     223     224
    230     231     232     233     234
    
    300     301     302     303     304
    310     311     312     313     314
    320     321     322     323     324
    330     331     332     333     334
    
    400     401     402     403     404
    410     411     412     413     414
    420     421     422     423     424
    430     431     432     433     434
    
    The same thing from threeDArray():
    
    0        1        2        3        4
    10      11      12      13      14
    20      21      22      23      24
    30      31      32      33      34
    
    100     101     102     103     104
    110     111     112     113     114
    120     121     122     123     124
    130     131     132     133     134
    
    200     201     202     203     204
    210     211     212     213     214
    220     221     222     223     224
    230     231     232     233     234
    
    300     301     302     303     304
    310     311     312     313     314
    320     321     322     323     324
    330     331     332     333     334
    
    400     401     402     403     404
    410     411     412     413     414
    420     421     422     423     424
    430     431     432     433     434

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2003
    Location
    China ex Australia
    Posts
    32
    Rep Power
    11
    Wow thanks mitakeet for a reply beyond the call of duty. It looks like what I was after. but some work on it will have to wait till tomorrow (late here, other side of the world to u).

    I need to build several such arrays of very large size in a geophysical modeling program, so since i need to repeditively calculate math functions requiring retrieving several array elements from several different arrays all in the one function i think retrieving values from the second of your examples may be cleaner.

    My original hope for contiguous memory was to maximise the use of heap by not having many fractured allocations and wasted space inbetween, Im not sure if that was a valid concern. Are the separate arrays that emulate a 3d array of example two going to cost more in terms of time when accessing many elements from the different arrays in the one function over and over?

    Thanks Again,
    finlaylabs.
  6. #4
  7. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    To your private post, Your Welcome.

    Performance optimization is still quite a bit of an art, though there are those that try to make it more a science.

    First, be sure to turn all your compiler optimizations on before you do any time tests.

    Second, inside your loops you want to access memory in consecutive chunks. You can get away with having your chunks scattered around in RAM as long as you do a lot of manipulations on each chunk before you move on. In other words, if your data cache is 32K and your chunk is 32K or less, you should not have too many problems with cache misses as long as you don’t need other chunks of memory to complete the transformation. If you are going to be jumping all over your RAM, you will probably be getting a series of cache misses and may want to seriously take a look at how your algorithm works. Changing the algorithm often can get you massive amounts of performance improvement, far beyond what the best optimizing compiler can accomplish.

    Third (presuming you do the extra work to allocate consecutive memory), if you are going to be accessing consecutive locations in memory, sometimes the compiler gets cornfused by all the array de-referencing and it is sometimes better to get a pointer to the start of a block outside a loop, then simply increment the pointer inside the loop. Doing so often give the compiler optimization enough info to unroll your loops and pipeline the instructions.

    Fourth, take a close look at what transformations you do inside your loops. Often you can rearrange your loops to get a better cache hit ratio, make variables constant within the loop(s), make things clearer to the compiler what parts can be done in parallel (meaning there are no side effects on other array elements). You can also take a close look at what actual assembler output is produced by the compiler and may be able to either inline assembly code or change your C code. Compilers are conservative and won't make any changes to your code unless it can be certain there are no side effects. You know the global goal of your program and so have better info than the compiler.

    Fifth, do you need to use floats? If there is any way you can do your work with integers you are likely to see significant improvement. Most modern chips do a pretty good job of floating point math, but (I believe) in all cases there are fewer machine instructions for any given int manipulation vs. the same float manipulation.

    I suggest that you worry about getting the program to work first, then spend time on optimization later. Certainly the allocation of the 3d array is much more meaningful to some dewd coming behind you to maintain your code and it will make debugging a lot more strait forward. Most optimizations make the code more difficult to understand and debug.

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2003
    Location
    China ex Australia
    Posts
    32
    Rep Power
    11

    progress...


    I have modified mitakeets threeDArray() and have managed from his model to actually get an *arr[][][] expression to return values from a single contiguous-block-of-memory dynamically created 3D array. Lots of pointers though.

    Will the many (nx*ny) pointers-to-float arrays (of size ny and nz) that I require be a major waste of memory??

    Finally, I am also generating the correct output under win and linux with this modified code, but a segmentation fault is reported before exiting under linux. Any ideas why?

    finlaylabs.

    PS lengthy post, first the code and then the output:

    PHP Code:
    #include <iostream>
    #include <stdio.h>
    //had to include <stdio.h> to compile on my windows system
    // due to the call to printf. Didnt need it on linux.

    using namespace std;

    //global storage for array sizes:
    int nxnynz;


    /***********************************
    threeDArray now creates a single 3D array of floats and allocates the memory
    as a single contiguous chunk. Float values are input into the array via
    *fltptr[i][j][k] = f(x)
     and accessed in math expressions by
    g = f(*fltptr[i][j][k])
    This requires creating nx*ny pointers-to-float arrays of size ny and nz,
    this may be a waste of memory if memory is critical???
    It is giving correct output in both windows and linux, but reporting a
    segmentation fault under linux. Cant see where its overrunning allocated
    memory however.
    ***********************************/

    void threeDArray(){

        
    float ****fltptr, *fltdata;
        
    //fltptr is a set of arrays of pointers to floats, fltdata is the
        // contiguous array of floats where data is stored
        
    int ijk;

        
    // allocate memory for array and create x,y,z-dimension pointers
        
    try {
            
    // allocate a single contiguous memory block for nx*ny*nz floats
             
    fltdata = new float[nx*ny*nz];
             
    // create and index the pointer arrays
             
    fltptr = new float***[nx];
             for (
    i=0;i<nx;i++) {
                
    fltptr[i] = new float**[ny];
                for (
    j=0;j<ny;j++) {
                    
    fltptr[i][j] = new float*[nz];
                    for (
    k=0;k<nz;k++) {
                        
    fltptr[i][j][k] = &fltdata [i*ny*nz+j*nz+k];
                        
    // cause fltptr[i][j][k] to point to fltdata [i*ny*nz+j*nz+nz]
                    
    }
                }
            }
        }catch(...) { 
    //you can catch specific exceptions here
            
    cerr << "Exception in allocating array in threeDArray()!\n";
            exit(
    1);
        }

        
    //throw some data in the array using the index - this works,
        // passes data on so it ends up in the right place in fltdata.
        
    for (i=0;i<nx;i++) {
            for (
    j=0;j<ny;j++) {
                for (
    k=0;k<nz;k++) {
                    *
    fltptr[i][j][k] = (float) (i*100) + (j*10) + k;
                    
    // ie, the object pointed to by fltptr[i][j][k] is assigned the value...
                    // neater than saying fltdata [i*ny*nz+j*nz+k] = ...
                
    }
            }
        }

        
    //dump it out to see what we got
        
    for (i=0;i<nx;i++) {
            for (
    j=0;j<ny;j++) {
                for (
    k=0;k<nz;k++) {
                    
    cout << *fltptr[i][j][k] << "\t";
                    
    // this dumps out floats held in the array, an equivalent
                    // but prettier way of saying cout << fltdata [i*ny*nz+j*nz+k]
                
    }
                
    cout << endl;
            }
            
    cout << endl;
        }
        
        
    printf("Held in memory addresses():\n\n");
            
    //dump it out to see what we got
        
    for (i=0;i<nx;i++) {
            for (
    j=0;j<ny;j++) {
                for (
    k=0;k<nz;k++) {
                    
    cout << fltptr[i][j][k] << "\t";
                    
    // this dumps out the addresses of the floats held in fltdata
                
    }    
                
    cout << endl;
            }
            
    cout << endl;
        }
        
        
    // delete data in reverse order
        
    for (i=0;i<nx;i++) {
            for (
    j=0;j<ny;j++) {
                for (
    k=0;k<nz;k++) {
                    
    delete [] fltptr[i][j][k];
                }
                
    delete [] fltptr[i][j];
            }
            
    delete [] fltptr[i];
        }
        
    delete fltdata;

        return;
    }

    int main(){
        
        
    // real version requires user input for nx, ny, nz
        
    nx=5;
        
    ny=4;
        
    nz=5;
        
        
    printf("Contents of threeDArray():\n\n");
        
    threeDArray();

        return 
    0;

    output is

    Code:
    [root@localhost test]# /home/finlaylabs/invert/test/superarray01.exe
    Contents of threeDArray():
    
    0       1       2       3       4
    10      11      12      13      14
    20      21      22      23      24
    30      31      32      33      34
    
    100     101     102     103     104
    110     111     112     113     114
    120     121     122     123     124
    130     131     132     133     134
    
    200     201     202     203     204
    210     211     212     213     214
    220     221     222     223     224
    230     231     232     233     234
    
    300     301     302     303     304
    310     311     312     313     314
    320     321     322     323     324
    330     331     332     333     334
    
    400     401     402     403     404
    410     411     412     413     414
    420     421     422     423     424
    430     431     432     433     434
    
    Held in memory addresses():
    
    0x804a450       0x804a454       0x804a458       0x804a45c       0x804a460
    0x804a464       0x804a468       0x804a46c       0x804a470       0x804a474
    0x804a478       0x804a47c       0x804a480       0x804a484       0x804a488
    0x804a48c       0x804a490       0x804a494       0x804a498       0x804a49c
    
    0x804a4a0       0x804a4a4       0x804a4a8       0x804a4ac       0x804a4b0
    0x804a4b4       0x804a4b8       0x804a4bc       0x804a4c0       0x804a4c4
    0x804a4c8       0x804a4cc       0x804a4d0       0x804a4d4       0x804a4d8
    0x804a4dc       0x804a4e0       0x804a4e4       0x804a4e8       0x804a4ec
    
    0x804a4f0       0x804a4f4       0x804a4f8       0x804a4fc       0x804a500
    0x804a504       0x804a508       0x804a50c       0x804a510       0x804a514
    0x804a518       0x804a51c       0x804a520       0x804a524       0x804a528
    0x804a52c       0x804a530       0x804a534       0x804a538       0x804a53c
    
    0x804a540       0x804a544       0x804a548       0x804a54c       0x804a550
    0x804a554       0x804a558       0x804a55c       0x804a560       0x804a564
    0x804a568       0x804a56c       0x804a570       0x804a574       0x804a578
    0x804a57c       0x804a580       0x804a584       0x804a588       0x804a58c
    
    0x804a590       0x804a594       0x804a598       0x804a59c       0x804a5a0
    0x804a5a4       0x804a5a8       0x804a5ac       0x804a5b0       0x804a5b4
    0x804a5b8       0x804a5bc       0x804a5c0       0x804a5c4       0x804a5c8
    0x804a5cc       0x804a5d0       0x804a5d4       0x804a5d8       0x804a5dc
    
    Segmentation fault
    [root@localhost test]#
  10. #6
  11. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    You have too many deletes. I changed the delete block to this:

    Code:
        // delete data in reverse order
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                delete [] fltptr[i][j];
            }
            delete [] fltptr[i];
        }
        delete fltptr;
        delete fltdata;
    Your 'delete's must match your 'new's.

    When I started to write the example I was trying to do exactly what you did, but missed a couple of stars (*). It has been a long time since I did that sort of stuff extensively and I am a bit sheepish to say I completely forgot how to write the code (the compiler and runtime errors are obvious in retrospect now). As for a waste in memory, that depends on your attitude (of course, if your arrays are so big that they are approaching the limits of your RAM, then it is another matter, but RAM is cheap nowadays). Ease of maintenance is a non-trivial thing and anything that makes things clearer is better.

    BTW: What I meant about incrementing pointers is like this:

    Code:
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                float *fltPtrTmp = fltptr[i][j][0];
                for (k=0;k<nz;k++) {
                    *fltPtrTmp = (float) (i*100) + (j*10) + k;
                    fltPtrTmp++;
                }
            }
        }
    In this particular case you can simply set a pointer to the first block in memory and increment all the way through like this:

    Code:
        float *fltPtrTmp = fltptr[0][0][0];
        for (i=0;i<nx;i++) {
            for (j=0;j<ny;j++) {
                for (k=0;k<nz;k++) {
                    *fltPtrTmp = (float) (i*100) + (j*10) + k;
                    fltPtrTmp++;
                }
            }
        }
    Depending on the compiler, this can be much faster because you are eliminiating a bunch of array dereferencing (some compilers figure out this is what you want to do and do it behind the scenes).

    If you need any help on optimization please don't hessitate to drop me a line, it is a hobby of mine. I have had some success in the past, I got one process (in C on an unused Alpha box) to run in 3 hours vs. an estimated 15 days (on a busy IBM in SAS) and another (in Perl if you can believe that) to run in 12 minutes vs. 15 hours. I am certainly not big on geophysics math (though am interested in almost all science), but if you have working code to test against, I might be able to squeeze out some CPU cycles and speed it up for you.

    Just as another FYI: Most (if not all) actual float operations are done on doubles and there is a cost associated with switching betwixed the twain. You may notice better performance doing everything in doubles instead (presuming you can afford the memory).

    Good Luck!

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2003
    Location
    China ex Australia
    Posts
    32
    Rep Power
    11
    Wow a busy day for me.

    Losing one delete seems to do the job. Thanks.

    I think my arrays will be VERY big, when we need to do an earth model on say a 3 metre grid, for a cube of several km each dimension, with an array each for a few different rock parameters.... most 3D modeling ends up on supercomputers. I dont know how big a model I will fit on my laptop (384 MB RAM), but I can at least develop the software on it.

    Since the pointer arrays will waste a lot of (precious) space I have relooked at your oneArray() function, and with what you have shown me over the last day or two I have rewritten it as below. It works and looks neat, do u agree?

    I have two options for the indexing function, one inputs and outputs very cleanly via calls to *c33(), fn reproduced below:

    PHP Code:
    floatc33(int xint yint z){
        
    ptr = &c33data[(x*nx*ny) + (y*ny) + z];
        return 
    ptr;

    The other I have commented out but include FYI and comment. My thoughts are if heavily iterated math functions of form f = g[ multiple-calls-to*c33(x,y,z) ] are slowed up at all due to many returns via pointers, then I could use separate array input and output functions. The input fn stays as above, the output function is changed to avoid pointers and directly return the sought-after value, becoming:

    PHP Code:
    float c33_out(int xint yint z){
        return 
    c33data[(x*nx*ny) + (y*ny) + z];

    Is this what you mean by "this can be much faster because you are eliminiating a bunch of array dereferencing "? Would using c33_out() help since I need to do 100's of 1000's of such calculations each requiring several (about 8) elements from several arrays? (again, I will need to port this to a supercomputer for real earth models, but I can have it running on a mini-model till I get the software working). What do u think? Should i use c33_out() as well for better speed, or is c33() just as fast?

    I will test it with doubles when its functioning a bit, and see whether I find speed or space the most restrictive. Thanks for the optimization invitation, it would be extremely helpful, as you can tell optimization will have a major impact on what I need to do!

    Code below again. Gives same output as all our other efforts, but may be easier to follow, and wastes less memory. Thanks again!

    finlaylabs.

    PHP Code:
    #include <iostream>
    #include <stdio.h>

    using namespace std;

    //global storage for array sizes:
    int nxnynz;
    //global array storage for rock properties (will be several of these arrays)
    float *c33data;
    //global pointer used in emulating input to the arrays
    float *ptr;

    //this function is responsible for emulating input and output from/to the 3d array
    //use as many of these functions as you have rock property arrays, one for each
    floatc33(int xint yint z){
        
    ptr = &c33data[(x*nx*ny) + (y*ny) + z];
        return 
    ptr;
    }

    //if math functions of form f = g[ multiple-calls-to*c33(x,y,z) ] are slow due to
    //many returns via pointers, then use different array input and output functions.
    //The input fn stays as above, the output function is changed to avoid pointers
    //and directly return the sought-after value, becoming:

    //float c33_out(int x, int y, int z){
    //    return c33data[(x*nx*ny) + (y*ny) + y + z];
    //}



    /***********************************
        oneArray allocates the memory in one single chunk
        to c33data then using the c33 function (above) it emulates
        a 3d array.
    ***********************************/

    void oneArray(){
        
        
    int ijk;

        
    // allocate contiguous storage for entire array
        
    try{
            
    c33data = new float[nx*ny*nz];
        }catch(...){
    //you can catch specific exceptions here
            
    cerr << "Exception in allocating array in oneArray()!\n";
            exit(
    1);
        }

        
    //throw some data in the array using the index
        
    for (i=0;i<nx;i++) {
            for (
    j=0;j<ny;j++) {
                for (
    k=0;k<nz;k++) {
                    *
    c33(i,j,k) = (float) (i*100) + (j*10) + k;
                }
            }
        }

        
    //dump it out to see what we got
        
    for (i=0;i<nx;i++) {
            for (
    j=0;j<ny;j++) {
                for (
    k=0;k<nz;k++) {
                    
    cout << *c33(i,j,k) << "\t";
                }
                
    cout << endl;
            }
            
    cout << endl;
        }

        
    // delete data
        
    delete c33data;

        return;
    }


    int main(){
        
    // real version requires user input for nx, ny, nz
        
    nx=5;
        
    ny=4;
        
    nz=5;

        
    printf("The data emulating the 3d array (oneArray()):\n\n");
        
    oneArray();

        return 
    0;

  14. #8
  15. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    First I want to point out that the optimizations for a super computer (SC) are completely different than optimizations for a workstation (even if it has multiple processors). Sadly, I lack any experience on SCs (though would be happy to learn on yours!), so I can only offer you 'generic' advice on multi-processor workstations. Some of it will be applicable to a SC, but some of it (and I can't say which parts) will kill you if used on the SC. The memory access patterns for a SC are totally hardware dependent, though there are libraries that can conceal a lot of the complexity. I imagine some are even open source. The idea is to parallelize your code. If, for instance, each element (each float in this case) is acted on by all others, you could set up two parallel arrays (presuming, naturally, you have the memory). The first (and they alternate being first between each run) is the 'before' picture, the second is the 'after' picture. In thread/process 1 you take element 1 from the before array (stored in a register or some other thread local memory, the original location doesn’t change) and then transform it against all the other elements based on some sort of algorithm (that depends on its proximity, soil conditions, whatever). You then write the results to the second array. In this case you could presumably have all the elements executing in parallel. HOWEVER, if transforming element 2 depends on what the results of element 1's transformation, you will never get any speedup with parallelization (in fact, it will slow down with all the multi-processing overhead). I have studied molecular modeling of large proteins and DNAs and have similar constraints/challenges, so I have given it much thought.

    Second, the single block of memory is only more efficient if you access the memory in a linear fashion. If you are jumping around in memory all the time, you are very likely to invalidate your cache and the RAM I/O becomes a bottleneck (and can result in more than an order of magnitude decrease in otherwise optimal performance!). Either of your examples will work fine for accessing the memory and are likely to have almost the exact same performance. My point of incrementing pointers (no pun intended) is that you eliminate the array de-referencing (which can be significant, presuming you have optimized the surrounding code). For instance, if you know in your code that you are going to access the next 10 consecutive blocks (floats, in this case) in memory, then create a temp pointer to the first block, and then increment the pointer for the next 9 accesses. Some compiler optimization flags will attempt to do exactly that, but if you are using variables to index into your arrays, the compiler may get confused (and therefore conservative) and actually do the de-referencing. Look again at my examples and see if you get my meaning. Array de-referencing is not terribly expensive, but if you are doing it billions of times (as I expect you will) it can add up.

    Third, if you find you need more RAM than you have, you can do tricks with storage on the hard drive (after all, that is exactly what databases do). It has the added benefit of being able to save the state of your program so that if it crashes on day 24 of your 25-day run, you can restart it exactly where it left off. You have to pay even more attention to your memory access (you can memory map the file, so there does not need to be much extra code), because instead of the cache to RAM latency you will have the RAM to disk latency (which is typically three orders of magnitude slower!). The OS can manage the pages of your file on disk, but you will have to periodically tell it to flush changes to disk, as it will buffer them as a performance optimization. Another advantage of memory mapping your ‘array’ is you can easily (and transparently) move from a machine with low RAM to one with high RAM, just more of your file stays in memory (and, generally, your performance goes up (but if you are slick with your memory access, the benefit may not be that great)). Debugging can also be easier because the actual state is in a file instead of memory locations in RAM somewhere.

    Another note: if you know that the memory you are _reading_ from is not going to (or supposed to, at least) change while you are looping through it, pass it as a const to the routine. That will allow the compiler to do some optimization tricks since it knows the values won’t change.

    BTW: Look closely at your SC, if it is more than a couple of years old (or is in constant use) you can probably get results in less time by getting your own dedicated workstation. Of course you may not have a budget for it, but there are used and refurbished systems (Dell offers refurbished systems all the time) with dual to quad processors with multiple GB of memory available for relatively cheap. I purchased (about 1.5 years ago, now it is looking a bit pedestrian) a dual 1.5 GHz Xenon system with 1.5 GB RAM from Dell for $3,000. A quick look at Dell refurbished right now shows a dual 2.8 GHz 512K cache Xenon with 4 GB RAM for $5,250 and (for more than a bit more) a quad 2 GHz 2 MB cache Xenon with 16 GB RAM for ‘only’ $21,000. Shared resources, if shared too much, wind up being much less useful than relatively slow resources that are not constantly being interrupted. For instance, my 3 hour vs. 15 day ‘speedup’ I mentioned earlier took advantage of the fact that no one else was using the machine and I could use all the RAM and did not compete with anyone else for I/O. It can make a massive difference! The Xenon chips, by the way, have hyperthreading CPUs so you can get more than twice the number of ‘virtual’ processors as you actually pay for, which, depending on how your code runs, can actually wind up with doubling your performance (and you don’t even have to change your code, just add more threads!).

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  16. #9
  17. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    1
    Rep Power
    0

    Is getIndex() correct?


    I appreciated your discussion about how best to create a 3D array in contiguous memory. When I tried using your code on 3D arrays of various sizes, it appeared that the function you used to convert 3D array indices in order to retrieve the correct value from a 1D arrray didn't work properly in general. It seems to me that your getIndex() function should be:


    PHP Code:
    int getIndex(int xint yint z){
        
    //this function is responsible for emulating the 3d array
      
    return (z*x_size*y_size) + (y*x_size) + x;

    I'm just wondering if I'm missing something.

    Chad
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2003
    Location
    China ex Australia
    Posts
    32
    Rep Power
    11
    Hi Chad,

    well done, you didnt 'miss anything', except the slow changing of this code over the last weeks to correct this and other errors, which I havent posted. Sorry. Heres another part of the array stuff, the intoArray() function, that uses the indexing exactly as you figured it must do...
    PHP Code:
    //******************************* intoArray ***********************************
    // returns a pointer to an array element for the purpose of assigning a new
    // value to that element.
    // eg: *intoArray(array_name,i,j,k) = new_value;
    doubleintoArray(double array[],int iint jint k){
        
    double *ptr;
        
    ptr = &array[(j*nx*nz) + (k*ny) + i];
        return 
    ptr;

    ...except that for my own reasons I have now it set up as y-major (yours is z-major) so that I can 'visualise' it in my head as pages of equal y, and read it in and out directly to/from ascii files in this format, the format being like so...
    PHP Code:
    //*****************************************************************************
    //                       File Input/Output Formatting
    //*****************************************************************************

        //  have ascii data files of the 3D model arranged as panels of inlines:
        //
        //  0xxxxxxxxx>
        //  z
        //  z
        //  z   y=0
        //  z
        //  v
        //
        //  0xxxxxxxxx>
        //  z
        //  z
        //  z   y=1
        //  z
        //  v
        //
        //  etc...
        //
        //  such that basic models can be intuitively written as ascii files,
        //  and can also be output in readable form.
        //  This is accomplished with indexing such that
        //  array(i,j,k) = array[(j*nx*nz) + (k*ny) + i]
        //  and for-loops of structure
        //  for (int j=0;j<ny;j++) {
        //    for (int k=0;k<nz;k++) {
        //      for (int i=0;i<nx;i++) { 
    I use this as its a neat format for viewing earth-related data as if it were vertical cross-sections thru the earth, like seismic lines.

    If anything else would be of use, let me know.

    finlaylabs.

IMN logo majestic logo threadwatch logo seochat tools logo