Software Design
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsProgramming Languages - MoreSoftware Design

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #16  
Old March 13th, 2002, 07:36 PM
The Ostrich The Ostrich is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28 The Ostrich User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Fast Distance calculation

Hi,

I noticed that a lot of people have been advocating multipass solutions to this problem - ie:

1) Quick Reject in X
2) Quick Reject in Y
3) Distance calculation

I would, however, be rather surprised if it wasn't faster to do a single pass test based on the magnitude of the distance squared:

Code:
if(dx*dx + dy*dy > rad2)
{
   //...
}


I'll do some timing results and see.

Reply With Quote
  #17  
Old March 13th, 2002, 08:07 PM
The Ostrich The Ostrich is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28 The Ostrich User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Results.

Basically, a single magnitude squared test turns out to be about 20-25% faster than the two-layer test, on a P4 under VC++ 6.0.

Time for 10^8 distance rejection/acceptance calculations was:

1) Magnitude squared: 917084 / (1.19 * 10^6) s

2) Two-layer: 1286742 / (1.19 * 10^6) s

The code follows. arrayx[] and arrayy[] are initialised to random values in the range 0...1000, as are px and py.

Magnitude squared:
Code:
for(ctr = 0, count=0; ctr < 1000; ctr++)
{
   dx = px-arrayx[ctr];		
   dy = py-arrayy[ctr];
   if(dx*dx + dy*dy < 100)
   {
      count++;
   }
}


Two-layer + magnitude squared:
Code:
for(ctr = 0, count = 0; ctr < 1000; ctr++)
{
   dx = px-arrayx[ctr];
   dy = py-arrayy[ctr];
   if(dx > -10 && dx < 10 && dy > -10 && dy < 10)
   {
      if(dx*dx+dy*dy < 100) count++;
   }
}


Neither of these is safe if dx*dx + dy*dy > 0x7FFFFFFF. Options for making this safe include using 64-bit integers for the mag^2 calculations, and also using double-precision numbers.

The timing results for the double precision case are even more skewed in favor of magintude squared only:

10^8 double precision distance tests:

1) Mag-squared = 1.5582 / 1.19 s

2) Two-layer = 4.2211 / 1.19 s

Last edited by The Ostrich : March 13th, 2002 at 09:32 PM.

Reply With Quote
  #18  
Old March 13th, 2002, 08:11 PM
The Ostrich The Ostrich is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28 The Ostrich User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
I figure that the 3x discrepancy in the timing results for the double precision case is probably because either dx or dy is being swapped into / out of memory more times.

Or it could just be that the more complex condition breaks the branch prediction.

Reply With Quote
  #19  
Old March 13th, 2002, 09:21 PM
The Ostrich The Ostrich is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28 The Ostrich User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Humble Pie

Whoops.

If I replace that UGLY if statement with

Code:
//integer version
if(abs(dx) < 10 && abs(dy) < 10) ...


Code:
//double precision
if(fabs(dx) < 10 && fabs(dy) < 10) ...


Then the axis-aligned bounding box is faster. Duh. After that change and some inner-loop optimisation, we get:

Integer:

1) Mag^2 test: 9.4 / 1.19 s

2) Two-level: 6.7 / 1.19 s

Double Prec

1) Mag^2 test: 1.52 / 1.19 s

2) Two-level: 9.2 / 1.19 s

Last edited by The Ostrich : March 13th, 2002 at 09:36 PM.

Reply With Quote
  #20  
Old March 13th, 2002, 11:55 PM
mccutchen mccutchen is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Dec 2001
Posts: 28 mccutchen User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
so... what you're saying is... the two loops are faster if you're dealing with integers, and one loop is faster if dealing with doubles (like me)?

Reply With Quote
  #21  
Old March 14th, 2002, 08:14 PM
The Ostrich The Ostrich is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28 The Ostrich User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
No. One loop which does two tests is faster in both cases, provided that you use the abs() and fabs() macros / functions. What I mean is:

Code:
N = ...
N2 = N*N;

for(ctr = 0; ctr < thingy; ctr++)
{
    dx = ...
    dy = ...
    if(fabs(dx) < N && fabs(dy) < N)
    {
        if(dx*dx + dy*dy < N2) ...
    }
}


I believe that this form, and this form only, is the fastest one.

Important notes:

1) There is only one loop
2) fabs(n) is used
3) I never take a square root.

Last edited by The Ostrich : March 14th, 2002 at 08:18 PM.

Reply With Quote
  #22  
Old March 23rd, 2002, 11:14 AM
epl epl is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2001
Location: Dublin
Posts: 413 epl User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 2 h 18 m 18 sec
Reputation Power: 13
I've no idea of the speed gains / tradeoffs here but you could also complement the quick rejection with a quick acceptance before resorting to the distance formula .. for this two dimensional example 50% of the unknown area (that not rejected by testing the axes individually) can be quickly accepted.

Quick reject if distance >20 on either axis
Quick accept if distance <20/(2^0.5) on both axes

(I think - off the top of my head)

Again don't know if this speeds things up but I'd guess so...

Reply With Quote
  #23  
Old March 26th, 2002, 06:12 PM
martinus martinus is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Posts: 6 martinus User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Send a message via ICQ to martinus
Quote:
I believe that this form, and this form only, is the fastest one.

Wrong, at least when you wrote C++ code

This code should be a little faster:
Code:
const float maxDist=20;
const float maxDist2 = maxDist*maxDist;

for (pointsDatatype* cur=points; cur != points.end(); ++cur)
{
    float distx = testPoint.x - cur->x;
    if (fabs(distx) > maxDist) continue;
    float disty = testPoint.y - cur->y;
    if (fabs(disty) > maxDist
	|| (distx*distx + disty*disty) > maxDist2) continue;
    // if we have reached this far, the point is near
    // do something now ...
}

1. There is no need to calculate disty, when the point can already be rejected by calculating distx
2. Use pointers instead of an an integer and the [] operator
3. use ++cur instead of cur++

This code is also not the fastest, you can do loop unrolling which should be at least 20% or so faster.

Do you need to do this calculation for every point? If so, there are some better algorithms do do this test.

Last edited by martinus : March 26th, 2002 at 06:14 PM.

Reply With Quote
  #24  
Old March 27th, 2002, 03:34 AM
Thrasher Thrasher is offline
Canta como rafaé
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Feb 2001
Location: Barcelona
Posts: 74 Thrasher User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 13
Send a message via ICQ to Thrasher
Well, if we are talking about fast code, this is not by far the fastest code.

Code:
const float maxDist=20;
const float maxDist2 = maxDist*maxDist;

for (pointsDatatype* cur=points; cur != points.end(); ++cur)
{
    float distx = testPoint.x - cur->x;
    if (fabs(distx) > maxDist) continue;
    float disty = testPoint.y - cur->y;
    if (fabs(disty) > maxDist
	|| (distx*distx + disty*disty) > maxDist2) continue;
    // if we have reached this far, the point is near
    // do something now ...
}


I don't have time to write a new code, but I see that

1 - You are using "const float" instead of "#define". Do you think is this an improvement ? So, every time you do if ( ... > maxDist), the program has to go to the data area, fetch the value, and then compare 2 registers, instead of simply comparing to a constant value.

2 - You are calling the function "points.end ()" in every loop in "for". Isn't it more efficient to keep that value in a "register float" to avoid the delay of calling to a function ?

3 - For the first rejections, I won't use float values. It does not mind if the rejection checks are not as precise as using floats, but it's worth to use integers.
Code:
Example (integer version in parentheses): 23.7 (23)- 3.2 (3) = 20.5 (20) => rejected (not rejected)
But the number of mistakes would be very small and they would be checked in the real distance test.

4 - With the use of integers, you can always do as for the abs:
Code:
if (res & 0x80000000) res = abs (res);
and avoid another function call.

5 - Loop unrolling only works when the size of the array is finite.

6 - If you use C++, all of those testPoint.x are more costly that iterating an array in C.

7 - And the last question ... why is it more efficient to use ++cur instead of cur++ ?

The only improvement of your code is not to calculate disty if you can reject it by distx.

Where are the days in which when we wanted speed we went directly to assembler ? But if you don't know assembler, you can always try to use -O3 in gcc and hope a bit luck.
__________________
Thrasher



'Y se ahogaron los dooos
No eran duros pa pagar, cuñaaoo !!'
El vagamundo - El risitas y su cuñao

Reply With Quote
  #25  
Old March 27th, 2002, 04:57 AM
martinus martinus is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Posts: 6 martinus User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Send a message via ICQ to martinus
1. const float: With every modern compiler this should be as fast as a #define. one should not use #defines in C++ code, inlined code is much cleaner and as fast.

2. There is no function calling overhead, this procedure is inline and does not need any calculation.

3. have a look at this table Relative Performance of Common Operations (PIII-733)
* and + operations are as fast as for integers, so you should stick to floats. Converting from float to int and reverse would also cost time.

4. fabs() should also be inlined, and it is a direct assembler call which clears a flag

5. I have never seen an infinite array, probably I do not have enough RAM
If you mean that unrolling only works for fixed sized arrays, have a look at this code:
Code:
int i;
for (i=0; i<max % 4; ++i) 
    doSomething;

for (; i<max; i += 4) {
    doSomething;
    doSomething;
    doSomething;
    doSomething;
}


6. It should be faster using temporary variables, but the same applies to code in C, there is no difference in speed. Believe me, a loop which iterates with a pointer (or an interator in C++) is always faster than iterating an integer and using [].

7. the Prefix ++ is always faster than the postfix ++, because the postfix operator has to temporarily save it's value, increment it and return the temporary value. The Prefix ++ does not need a temporarily object.

martin

Reply With Quote
  #26  
Old March 27th, 2002, 06:17 AM
Thrasher Thrasher is offline
Canta como rafaé
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Feb 2001
Location: Barcelona
Posts: 74 Thrasher User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 13
Send a message via ICQ to Thrasher
Hi

So, as I understand, points 1, 2, 4, 5 and 7 relies on compiler optimizations. Is it too difficult to make them by hand avoiding that the compiler couldn't handle those optimizations? So, don't say your code is faster, your compiler code is faster.

In point number 3 you might be right, for an Xbox (PIII). But in order to believe those statistics, use the Intel official clock ticks for every function.

In point 6 I wanted to say that, having an array 'int * a' and accessing as '*act' is faster than an array of objects and accessing the values as 'act -> x'.

Point 7 is nonsense. If you are not assigning i++ to any variable, there's no assignation nor backuping of the previous value.

Reply With Quote
  #27  
Old March 27th, 2002, 12:36 PM
martinus martinus is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Posts: 6 martinus User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Send a message via ICQ to martinus
I do not understand your first point. Of course (every) code relies on compiler optimisations. But compilers are dumb, and above code makes it easier for them to generate fast code.

Yes, point 7 is wrong: "i++ can be slower or as fast as ++i, so one should prefer ++i" should be correct.
Whenever you use i++ and think it is as fast as ++i, you are heavily relying on your compiler

BTW: Only trust benchmarks, never assume anything

Reply With Quote
  #28  
Old March 28th, 2002, 03:27 AM
Thrasher Thrasher is offline
Canta como rafaé
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Feb 2001
Location: Barcelona
Posts: 74 Thrasher User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 13
Send a message via ICQ to Thrasher
If you are always relying on benchmarks, you will fall into
ignorance.

The increment discussion: Let's make some tests

Code:
int main (void) {

        register int i = 0;
        register int j;

        /* i ++ */
        i ++;

        /* ++ i */
        ++ i;

        /* j = i++ */
        j = i ++;

        /* j = ++ i */
        j = ++ i;

        return 0;
}


Now let's compile with

Code:
gcc testcode.c -S -o testcode.asm -O0 -Wall


Note that I've disabled any compiler optimizations with -O0.
The compiler assembler result is

Code:
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        
        /* register int i = 0; */
        movl    $0, %eax

        /* i ++ */
        incl    %eax

        /* ++ i */
        incl    %eax

        /* j = i ++ */
        movl    %eax, -4(%ebp)  /* Saves the value of i in j */
        incl    %eax            /* Increments register i */

        /* j = i ++ */
        incl    %eax            /* Increments the value of i */
        movl    %eax, -4(%ebp)  /* Saves the value of i in j */

        /* return 0 */
        movl    $0, %eax
        leave
        ret


j is stored in -4(%ebp)

Now, please, tell me the difference with preincrement and
postincrement.

Benchmarks are good tools for analyzing the temporary overall
of an application, but relying on them to assure something is
the fastest way is a silly thing. A program is a program and
its environment, and that's what benchmarks show.

If you suspected that I used optimizations, this is the ouput
with the first level of optimization

Code:
main:
        pushl   %ebp
        movl    %esp, %ebp
        movl    $0, %eax
        popl    %ebp
        ret


This is a clever compiler, uh ?

Last edited by Thrasher : March 28th, 2002 at 03:36 AM.

Reply With Quote
  #29  
Old March 28th, 2002, 01:21 PM
martinus martinus is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Posts: 6 martinus User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Send a message via ICQ to martinus
This discussion is getting boring. Have you ever overloaded the ++ operator? Ever used the STL extensively? Do you always know in advance what a good compiler generates? g++, visual C++, Visual Age, Intel or even KAI produce very different output.
(My last posting here)

Reply With Quote
  #30  
Old April 9th, 2002, 09:26 PM
The Ostrich The Ostrich is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2002
Location: Oxford, UK
Posts: 28 The Ostrich User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Martinus:

Before you assert code is faster, it is an absolute necessity to get timing results from it. I am not saying that my code was unequivocally faster, merely that I tried various different combinations of form, and the one I put up was the one I found to have the fastest timing results.

Having said this, I would be very surprised if your code was faster. The reason being that the more if statements you have, the more likely you are to get branch prediction errors.

Thrasher: Nice tip on abs - but the above comment applies. I would try it out, and time both, but I can't be arsed.

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming Languages - MoreSoftware Design > efficient way to find only the closest objects

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap