#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Location
    Wilmington, DE
    Posts
    4
    Rep Power
    0

    Question on binary searching of arrays.


    Hi all.

    My beginner programming class has reached a section on sorting and searching arrays. On sorting, we've gone over the bubble sort and the selection sort. I'm trying to figure out the heap sort and quick sort on my own (we don't cover those in this particular class), but am having trouble with that. But that's a completely different matter all together.

    On searching, we went over sequential and binary searching. I think I've got things down pretty well. No formal assignment has been given on sorting or searching, but I was noodling around in my compiler yesterday and I wrote something that seems to work.

    Here's the binary search function I wrote. The array numbers[] is 30 randomly generated integers from 0-99. It is already sorted in ascending order by a selection sort prior to being passed to this function. Search is working, sort of. If a number repeats itself, it'll only inform you of the first occurrence of that number. I'm thinking that through, and maybe the solution to that will change the answer to my next question. But for now, let's assume that I don't care that it's always returning the first occurrence of a given number.

    My question is more about code efficiency. The while loop that searches the array has this condition:

    Code:
    while( (!found) && (bottom <= top) )
    But I've been wondering, is it better to use a joint condition like that (I dunno if there's a more technical term), or to just do away with the first condition (and the boolean variable found)? I'm thinking that if, right after the line where found is set to true, I add a line saying bottom = top + 1 , I can eliminate the first condition of the while loop and the logic remains the same. The search key is found, so set up a condition where the loop condition will evaluate as false, and the program stops searching the array. My thought here is that if you use TWO conditions for the while loop, the program needs to make TWO decisions at every iteration, effectively doubling the work.

    The reason I bring this up is because when he was introducing the binary search, my instructor set up a code sample which did use a joint condition like the one I've used below. This professor can be a bit of a stickler for doing things the way he envisions them on the basis that code should be more efficient. I'm thinking that, actually, my code with one condition for the while loop is more efficient than code with two conditions. I know that in a small program like this there's virtually no difference, but in general, is my reasoning correct? And is it bad practice to tinker with your variables and intentionally set them to certain values for the express purpose of forcing a conditional statement to evaluate as false?

    Code:
    void binarysearch(int numbers[],int size)
    {
    	int key;
    	int bottom;
    	int middle;
    	int top;
    	bool found;
    	bool searchagain = true;
    	char choice[10];
    
    	while(searchagain)
    	{
    		bottom = 0;
    		top = size;
    		found = false;
    		cout <<"\n\nWhat number to search for?   ";
    		cin >> key;
    		while( (!found) && (bottom <= top) )
    		{
    			middle = (bottom + top) / 2;
    			if(numbers[middle] == key)
    			{
    				cout <<"\nI found " <<key <<" at position " <<middle <<".";	
    				found = true;
    			}
    			if(key < numbers[middle])			//It's in the bottom half of the array.
    					top = middle - 1;			//New top is one before the midpoint.
    			if(key > numbers[middle])			//It's in the top half of the array.
    				bottom = middle + 1;			//New bottom is one after the midpoint.
    		}
    		if(!found)
    			cout <<"\nSorry, I couldn't find " <<key <<" anywhere.";
    		cout <<"\nSearch again?   ";
    		cin >> choice;
    		switch(choice[0])
    		{
    			case 'Y':	searchagain = true;
    						break;
    			case 'N':	searchagain = false;
    						break;
    			default:	cout <<"\nI don't understand that, so we're not going to search again.";
    						searchagain = false;
    						break;
    		}
    	}
    cin.get();
    }
  2. #2
  3. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,392
    Rep Power
    1871
    There are all sorts of ways of doing it.

    First and foremost, I would go for code clarity above minuscule "performance" advantages - many of which tend to disappear by the time an optimising compiler has had a go at the code.

    Messing around with existing variables to "overload" their meaning, just to contrive an exit condition is a point where bugs can get introduced.

    Here's another alternative.
    Code:
    		while( (bottom <= top) )
    		{
    			middle = (bottom + top) / 2;
    			if(numbers[middle] == key)
    			{
    				cout <<"\nI found " <<key <<" at position " <<middle <<".";	
    				found = true;
    				break;  //!! watch out for the 'no goto' brigade
    			}
    			if(key < numbers[middle])			//It's in the bottom half of the array.
    					top = middle - 1;			//New top is one before the midpoint.
    			if(key > numbers[middle])			//It's in the top half of the array.
    				bottom = middle + 1;			//New bottom is one after the midpoint.
    		}
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,886
    Rep Power
    481
    Concern yourself with choice of algorithm.

    Bubble sort or selection sort runs in order n squared time. qsort usually runs in order n log(n) time.

    If you're searching for the string
    a="abc" in b="abXabXabc"
    you might compare
    a[0] with b[0]. They match, so compare
    a[1] with b[1]. Etceteras.

    Or you might precompute a table and first compare
    a[2] with b[2]. They don't match, and 'X' isn't in a, so your search continues with b[2+strlen(a)] (the distance is found in the precomputed look up table)
    compare
    b[5] with a[2].
    After two comparisons we've advanced to next checking b[8] whereas in the naive algorithm the third comparison is at b[2].

    Study and use fast algorithms. Now that you've written a binary search and survived the torments of checking whether it works with target in or not in the list, and for odd and even length lists, and for empty and length 1 lists, and having resolved all the off-by-1 errors and infinite loops with incorrect interval updates or tests,

    learn to use qsort and bsearch from the standard c library. stdlib.h is the header file that declares them.
    [code]Code tags[/code] are essential for python code and Makefiles!
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Location
    Wilmington, DE
    Posts
    4
    Rep Power
    0
    Originally Posted by salem
    Messing around with existing variables to "overload" their meaning, just to contrive an exit condition is a point where bugs can get introduced.
    Thanks for the input salem. That's kinda what I was thinking, that my alternate approach was using a variable for something other than its original meaning. It seems like one of those "it's not gonna affect THIS code, but it's just laying the seeds for a bad habit" situations.

    I hadn't considered using break. It feels like a bit of a "cheat" to me, for some reason. Sorta like doing something in an infinite loop, but using a return function as the only escape.

IMN logo majestic logo threadwatch logo seochat tools logo