The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.
|
 |
|
Dev Shed Forums
> Programming Languages - More
> Software Design
|
Page 2 -
efficient way to find only the closest objects
Page 2 - Discuss efficient way to find only the closest objects in the Software Design forum on Dev Shed. efficient way to find only the closest objects Software design forum discussing design principles and non-language specific algorithms. Get help with logic, algebraic, or relational concepts.
|
|
 |
|
|
|
|
|

Dev Shed Forums Sponsor:
|
|
|

March 13th, 2002, 07:36 PM
|
|
Contributing User
|
|
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
Fast Distance calculation
Hi,
I noticed that a lot of people have been advocating multipass solutions to this problem - ie:
1) Quick Reject in X
2) Quick Reject in Y
3) Distance calculation
I would, however, be rather surprised if it wasn't faster to do a single pass test based on the magnitude of the distance squared:
Code:
if(dx*dx + dy*dy > rad2)
{
//...
}
I'll do some timing results and see.
|

March 13th, 2002, 08:07 PM
|
|
Contributing User
|
|
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
Results.
Basically, a single magnitude squared test turns out to be about 20-25% faster than the two-layer test, on a P4 under VC++ 6.0.
Time for 10^8 distance rejection/acceptance calculations was:
1) Magnitude squared: 917084 / (1.19 * 10^6) s
2) Two-layer: 1286742 / (1.19 * 10^6) s
The code follows. arrayx[] and arrayy[] are initialised to random values in the range 0...1000, as are px and py.
Magnitude squared:
Code:
for(ctr = 0, count=0; ctr < 1000; ctr++)
{
dx = px-arrayx[ctr];
dy = py-arrayy[ctr];
if(dx*dx + dy*dy < 100)
{
count++;
}
}
Two-layer + magnitude squared:
Code:
for(ctr = 0, count = 0; ctr < 1000; ctr++)
{
dx = px-arrayx[ctr];
dy = py-arrayy[ctr];
if(dx > -10 && dx < 10 && dy > -10 && dy < 10)
{
if(dx*dx+dy*dy < 100) count++;
}
}
Neither of these is safe if dx*dx + dy*dy > 0x7FFFFFFF. Options for making this safe include using 64-bit integers for the mag^2 calculations, and also using double-precision numbers.
The timing results for the double precision case are even more skewed in favor of magintude squared only:
10^8 double precision distance tests:
1) Mag-squared = 1.5582 / 1.19 s
2) Two-layer = 4.2211 / 1.19 s
Last edited by The Ostrich : March 13th, 2002 at 09:32 PM.
|

March 13th, 2002, 08:11 PM
|
|
Contributing User
|
|
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
I figure that the 3x discrepancy in the timing results for the double precision case is probably because either dx or dy is being swapped into / out of memory more times.
Or it could just be that the more complex condition breaks the branch prediction.
|

March 13th, 2002, 09:21 PM
|
|
Contributing User
|
|
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
Humble Pie
Whoops.
If I replace that UGLY if statement with
Code:
//integer version
if(abs(dx) < 10 && abs(dy) < 10) ...
Code:
//double precision
if(fabs(dx) < 10 && fabs(dy) < 10) ...
Then the axis-aligned bounding box is faster. Duh. After that change and some inner-loop optimisation, we get:
Integer:
1) Mag^2 test: 9.4 / 1.19 s
2) Two-level: 6.7 / 1.19 s
Double Prec
1) Mag^2 test: 1.52 / 1.19 s
2) Two-level: 9.2 / 1.19 s
Last edited by The Ostrich : March 13th, 2002 at 09:36 PM.
|

March 13th, 2002, 11:55 PM
|
|
Junior Member
|
|
Join Date: Dec 2001
Posts: 28
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
so... what you're saying is... the two loops are faster if you're dealing with integers, and one loop is faster if dealing with doubles (like me)?
|

March 14th, 2002, 08:14 PM
|
|
Contributing User
|
|
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
No. One loop which does two tests is faster in both cases, provided that you use the abs() and fabs() macros / functions. What I mean is:
Code:
N = ...
N2 = N*N;
for(ctr = 0; ctr < thingy; ctr++)
{
dx = ...
dy = ...
if(fabs(dx) < N && fabs(dy) < N)
{
if(dx*dx + dy*dy < N2) ...
}
}
I believe that this form, and this form only, is the fastest one.
Important notes:
1) There is only one loop
2) fabs(n) is used
3) I never take a square root.
Last edited by The Ostrich : March 14th, 2002 at 08:18 PM.
|

March 23rd, 2002, 11:14 AM
|
|
Contributing User
|
|
Join Date: Mar 2001
Location: Dublin
Posts: 413
Time spent in forums: 2 h 18 m 18 sec
Reputation Power: 13
|
|
|
I've no idea of the speed gains / tradeoffs here but you could also complement the quick rejection with a quick acceptance before resorting to the distance formula .. for this two dimensional example 50% of the unknown area (that not rejected by testing the axes individually) can be quickly accepted.
Quick reject if distance >20 on either axis
Quick accept if distance <20/(2^0.5) on both axes
(I think - off the top of my head)
Again don't know if this speeds things up but I'd guess so...
|

March 26th, 2002, 06:12 PM
|
|
Junior Member
|
|
Join Date: Mar 2002
Posts: 6
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
Quote: | I believe that this form, and this form only, is the fastest one. |
Wrong, at least when you wrote C++ code
This code should be a little faster:
Code:
const float maxDist=20;
const float maxDist2 = maxDist*maxDist;
for (pointsDatatype* cur=points; cur != points.end(); ++cur)
{
float distx = testPoint.x - cur->x;
if (fabs(distx) > maxDist) continue;
float disty = testPoint.y - cur->y;
if (fabs(disty) > maxDist
|| (distx*distx + disty*disty) > maxDist2) continue;
// if we have reached this far, the point is near
// do something now ...
}
1. There is no need to calculate disty, when the point can already be rejected by calculating distx
2. Use pointers instead of an an integer and the [] operator
3. use ++cur instead of cur++
This code is also not the fastest, you can do loop unrolling which should be at least 20% or so faster.
Do you need to do this calculation for every point? If so, there are some better algorithms do do this test.
Last edited by martinus : March 26th, 2002 at 06:14 PM.
|

March 27th, 2002, 03:34 AM
|
|
Canta como rafaé
|
|
Join Date: Feb 2001
Location: Barcelona
Posts: 74
Time spent in forums: < 1 sec
Reputation Power: 13
|
|
Well, if we are talking about fast code, this is not by far the fastest code.
Code:
const float maxDist=20;
const float maxDist2 = maxDist*maxDist;
for (pointsDatatype* cur=points; cur != points.end(); ++cur)
{
float distx = testPoint.x - cur->x;
if (fabs(distx) > maxDist) continue;
float disty = testPoint.y - cur->y;
if (fabs(disty) > maxDist
|| (distx*distx + disty*disty) > maxDist2) continue;
// if we have reached this far, the point is near
// do something now ...
}
I don't have time to write a new code, but I see that
1 - You are using "const float" instead of "#define". Do you think is this an improvement ? So, every time you do if ( ... > maxDist), the program has to go to the data area, fetch the value, and then compare 2 registers, instead of simply comparing to a constant value.
2 - You are calling the function "points.end ()" in every loop in "for". Isn't it more efficient to keep that value in a "register float" to avoid the delay of calling to a function ?
3 - For the first rejections, I won't use float values. It does not mind if the rejection checks are not as precise as using floats, but it's worth to use integers.
Code:
Example (integer version in parentheses): 23.7 (23)- 3.2 (3) = 20.5 (20) => rejected (not rejected)
But the number of mistakes would be very small and they would be checked in the real distance test.
4 - With the use of integers, you can always do as for the abs:
Code:
if (res & 0x80000000) res = abs (res);
and avoid another function call.
5 - Loop unrolling only works when the size of the array is finite.
6 - If you use C++, all of those testPoint.x are more costly that iterating an array in C.
7 - And the last question ... why is it more efficient to use ++cur instead of cur++ ?
The only improvement of your code is not to calculate disty if you can reject it by distx.
Where are the days in which when we wanted speed we went directly to assembler ? But if you don't know assembler, you can always try to use -O3 in gcc and hope a bit luck.
__________________
Thrasher
'Y se ahogaron los dooos
No eran duros pa pagar, cuñaaoo !!'
El vagamundo - El risitas y su cuñao
|

March 27th, 2002, 04:57 AM
|
|
Junior Member
|
|
Join Date: Mar 2002
Posts: 6
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
1. const float: With every modern compiler this should be as fast as a #define. one should not use #defines in C++ code, inlined code is much cleaner and as fast.
2. There is no function calling overhead, this procedure is inline and does not need any calculation.
3. have a look at this table Relative Performance of Common Operations (PIII-733)
* and + operations are as fast as for integers, so you should stick to floats. Converting from float to int and reverse would also cost time.
4. fabs() should also be inlined, and it is a direct assembler call which clears a flag
5. I have never seen an infinite array, probably I do not have enough RAM 
If you mean that unrolling only works for fixed sized arrays, have a look at this code:
Code:
int i;
for (i=0; i<max % 4; ++i)
doSomething;
for (; i<max; i += 4) {
doSomething;
doSomething;
doSomething;
doSomething;
}
6. It should be faster using temporary variables, but the same applies to code in C, there is no difference in speed. Believe me, a loop which iterates with a pointer (or an interator in C++) is always faster than iterating an integer and using [].
7. the Prefix ++ is always faster than the postfix ++, because the postfix operator has to temporarily save it's value, increment it and return the temporary value. The Prefix ++ does not need a temporarily object.
martin
|

March 27th, 2002, 06:17 AM
|
|
Canta como rafaé
|
|
Join Date: Feb 2001
Location: Barcelona
Posts: 74
Time spent in forums: < 1 sec
Reputation Power: 13
|
|
|
Hi
So, as I understand, points 1, 2, 4, 5 and 7 relies on compiler optimizations. Is it too difficult to make them by hand avoiding that the compiler couldn't handle those optimizations? So, don't say your code is faster, your compiler code is faster.
In point number 3 you might be right, for an Xbox (PIII). But in order to believe those statistics, use the Intel official clock ticks for every function.
In point 6 I wanted to say that, having an array 'int * a' and accessing as '*act' is faster than an array of objects and accessing the values as 'act -> x'.
Point 7 is nonsense. If you are not assigning i++ to any variable, there's no assignation nor backuping of the previous value.
|

March 27th, 2002, 12:36 PM
|
|
Junior Member
|
|
Join Date: Mar 2002
Posts: 6
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
I do not understand your first point. Of course (every) code relies on compiler optimisations. But compilers are dumb, and above code makes it easier for them to generate fast code.
Yes, point 7 is wrong: "i++ can be slower or as fast as ++i, so one should prefer ++i" should be correct.
Whenever you use i++ and think it is as fast as ++i, you are heavily relying on your compiler
BTW: Only trust benchmarks, never assume anything 
|

March 28th, 2002, 03:27 AM
|
|
Canta como rafaé
|
|
Join Date: Feb 2001
Location: Barcelona
Posts: 74
Time spent in forums: < 1 sec
Reputation Power: 13
|
|
If you are always relying on benchmarks, you will fall into
ignorance.
The increment discussion: Let's make some tests
Code:
int main (void) {
register int i = 0;
register int j;
/* i ++ */
i ++;
/* ++ i */
++ i;
/* j = i++ */
j = i ++;
/* j = ++ i */
j = ++ i;
return 0;
}
Now let's compile with
Code:
gcc testcode.c -S -o testcode.asm -O0 -Wall
Note that I've disabled any compiler optimizations with -O0.
The compiler assembler result is
Code:
main:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
/* register int i = 0; */
movl $0, %eax
/* i ++ */
incl %eax
/* ++ i */
incl %eax
/* j = i ++ */
movl %eax, -4(%ebp) /* Saves the value of i in j */
incl %eax /* Increments register i */
/* j = i ++ */
incl %eax /* Increments the value of i */
movl %eax, -4(%ebp) /* Saves the value of i in j */
/* return 0 */
movl $0, %eax
leave
ret
j is stored in -4(%ebp)
Now, please, tell me the difference with preincrement and
postincrement.
Benchmarks are good tools for analyzing the temporary overall
of an application, but relying on them to assure something is
the fastest way is a silly thing. A program is a program and
its environment, and that's what benchmarks show.
If you suspected that I used optimizations, this is the ouput
with the first level of optimization
Code:
main:
pushl %ebp
movl %esp, %ebp
movl $0, %eax
popl %ebp
ret
This is a clever compiler, uh ?
Last edited by Thrasher : March 28th, 2002 at 03:36 AM.
|

March 28th, 2002, 01:21 PM
|
|
Junior Member
|
|
Join Date: Mar 2002
Posts: 6
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
This discussion is getting boring. Have you ever overloaded the ++ operator? Ever used the STL extensively? Do you always know in advance what a good compiler generates? g++, visual C++, Visual Age, Intel or even KAI produce very different output.
(My last posting here)
|

April 9th, 2002, 09:26 PM
|
|
Contributing User
|
|
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
Martinus:
Before you assert code is faster, it is an absolute necessity to get timing results from it. I am not saying that my code was unequivocally faster, merely that I tried various different combinations of form, and the one I put up was the one I found to have the fastest timing results.
Having said this, I would be very surprised if your code was faster. The reason being that the more if statements you have, the more likely you are to get branch prediction errors.
Thrasher: Nice tip on abs - but the above comment applies. I would try it out, and time both, but I can't be arsed.
|
Developer Shed Advertisers and Affiliates
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Rate This Thread |
Linear Mode
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|