Thread: Binning Data

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Location
    Bristol
    Posts
    13
    Rep Power
    0

    Binning Data


    PART A
    Write a program to calculate the distribution of numbers produced by the random number function
    rand().

    1) Calculate a random number between 0 and 1. To do this use rand() and divide by
    RAND_MAX. Remember to use a CAST for RAND_MAX.

    2) Next write some code to ‘bin’ the data. Create an array where each element represents a bin
    (use, say, 10 bins to start but make this variable). Increment (increase by one) the array element
    where the number falls. You need to calculate which bin the random number will fall into. If x is
    the random number between 0 and 1 and there are 10 bins, then the bin number will be the integer
    part of x*10 (assuming your array starts at 0).

    Repeat this for a large number of times (use a for(){} loop).


    Okay, so I think I have everything down but one thing.

    This is my code.

    Code:
    #include <stdio.h> 
    #include <stdlib.h> 
    
    int main() 
    
    { 
    float x; 
    int i,j; 
    
    for(i=0;i<9;i++) 
    
    { 
    x = rand()/(double) RAND_MAX; printf("%f\n",x); 
    } 
    
    if ((0 <= x) && (x < 0.1)) {/*statements*/ } 
    else if ((0.1 <= x) && (x < 0.2)) { /* statements */ } 
    else if ((0.2 <= x) && (x < 0.3)) { /* statements */ } 
    else if ((0.3 <= x) && (x < 0.4)) { /* statements */ } 
    else if ((0.4 <= x) && (x < 0.5)) { /* statements */ } 
    else if ((0.5 <= x) && (x < 0.6)) { /* statements */ } 
    else if ((0.6 <= x) && (x < 0.7)) { /* statements */ } 
    else if ((0.7 <= x) && (x < 0.8)) { /* statements */ } 
    else if ((0.8 <= x) && (x < 0.9)) { /* statements */ } 
    else ((0.9 <= x) && (x < 1.0)) { /* statements */ } 
    
    return 0; 
    }
    So all I need to do is declare the bins and increment the right one where the statements go.

    I had a look around online and found this thread http://cboard.cprogramming.com/c-programming/145610-reading-data-file-binning.html

    But I'm not sure how to use it in mine if I can at all. As far as I can tell this looks like a good idea as it doesn't actually matter what the numbers are that fall into the bins, it's the distribution that counts.

    Any thoughts?
  2. #2
  3. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244
    I think you need to look at the logic in your if statements; I doubt any would execute. Also, your instructions talk about running this in a for loop, and while you got one, you ain't runnin it in the loop.

    It looks to me like you are not taking the time to understand what you are trying to do and are just randomly coding. Take a step back and write out the logic in high-level pseudo code and model that code by hand. Once you have something that works on paper, _then_ convert it into code and start again.

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Location
    Bristol
    Posts
    13
    Rep Power
    0
    Originally Posted by mitakeet
    I think you need to look at the logic in your if statements; I doubt any would execute. Also, your instructions talk about running this in a for loop, and while you got one, you ain't runnin it in the loop.

    It looks to me like you are not taking the time to understand what you are trying to do and are just randomly coding. Take a step back and write out the logic in high-level pseudo code and model that code by hand. Once you have something that works on paper, _then_ convert it into code and start again.
    Okay, I had another go and came up with this lot. It isn't quite doing what I expected: the number of elements reported for each bin is far greater than expected. :S

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #define N_SAMPLES 1000
    #define N_BINS 10
    
    int main()
    {
     int i,bins[N_BINS],j; //declares integer variables i and j, and integer array arr
     float x,nums[N_SAMPLES],y;//declares floating point variable x
    
     for(i=0; i<N_SAMPLES; i++)
     {
         nums[i]=0;
     }
    
     for(i=0; i<N_BINS; i++)
     {
         bins[i]=0;
     }//initializes both arrays
    
    
     for(i=0;i<N_SAMPLES; i++)
     {
     x = rand()/(double) RAND_MAX;//x is a random number between 0 and 1
    
         nums[i]=x;//assigns a random number between 0 and 1 to each element in array 'nums'
     }
     for(i=0;i<N_SAMPLES;i++)//for every random number generated
     {
         for(j=0;j<N_BINS;j++)//for each bin
         {
             y=nums[i];
    
             if((y>=j/10) && (y<(j+1)/10))
             {
                 bins[j] = (bins[j] + 1);
             }
             else
             {
                 bins[j] = (bins[j]);
             }
         }
     }
     for(j=0;j<N_BINS;j++)
     {
         printf("Bin Number: %d\n",j);
         printf("Number of Elements: %d\n", &
    
         bins[j]);
     }
    
        return 0;
    }
  6. #4
  7. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244
    That is certainly a _lot_ better. Why, though, do you feel the need to create an array and store all the numbers between zero and one? Why not just create them one at a time and test them?

    Why did you switch from the stack of if/elseif statements? I bet it would be a lot easier to debug if you stuck with what you had before!

  8. #5
  9. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,841
    Rep Power
    480
    Code:
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>   /* needed for time() */
    
    #define N_SAMPLES 1000
    #define N_BINS 10
    
    #define SYMBOL '*'
    #define MAX(A,B) ((A) < (B) ? (B) : (A))
    
    int main() {
      char bar[50];
      int i,j,bins[N_BINS],max_counts;
      /*Lambert Electronics, LLC.  USA, NY*/
      double x,nums[N_SAMPLES];
    
      srand((unsigned)time(NULL));  /* seed the prng (pseudo random number generator) */
      puts("/* initialized bar for histogram */");
      memset(bar,SYMBOL,sizeof bar);
    
      puts("/* removed initialization memset(nums,0,sizeof(nums)); */");
    
      for(i=0; i<N_BINS; i++)
        bins[i]=0;
    
      puts("/* initialize nums with original data *****************/");
      for(i=0;i<N_SAMPLES; i++) {
        for (x = rand(); RAND_MAX == x; x = rand())
          puts("/* AVOIDED RAND_MAX *****************/");
        nums[i] = x/(double)RAND_MAX;
      }
    
      puts("/* Directly compute the bins index *****************/");
      for(i=0;i<N_SAMPLES;i++)
        ++bins[(int)(N_BINS*nums[i])]; /* note buffer overrun if we had retained RAND_MAX */
      puts("Lambert Electronics, LLC.  USA, NY");
    
      puts("/* Find the maximum counts for histogram */");
      for (i = max_counts = 0; i < N_BINS; ++i)
        max_counts = MAX(max_counts,bins[i]);
    
      for(i=0;i<N_BINS;++i) {
        printf("%-3d",i);
        j = 40*(bins[i]/(double)max_counts);
        bar[j] = 0;
        puts(bar);
        bar[j] = SYMBOL;
      }
      return 0;
    }
    Note: Some learn by example. Now that Sophie has struggled with and considered this problem a bit, I hope some lights flash inside her head, and that she gains insight and becomes a better programmer.

    How exactly does "directly computing the bin index" work?

    Why was initializing nums to 0 useless?

    What does bar[j] = 0; do?

    * We save the nums because Sophie might want to use them for other purposes: statistical tests, min/max, different bin sizes, checking that the grand sum of counts matches N_SAMPLES, et cetera.
    Last edited by b49P23TIvg; September 12th, 2012 at 09:37 AM.
    [code]Code tags[/code] are essential for python code and Makefiles!
