
#1
September 13th, 2012, 04:13 PM
 Shaun_B
Contributing User

Join Date: Sep 2012
Posts: 62
Time spent in forums: 1 Day 1 h 9 m 54 sec
Reputation Power: 3
Using bitwise operators for multiplication.

I was explaining this to someone at work:

#define TIMES16(a) a<<4

which used to be quite a quick way to do multiplication (perhaps this is no longer so with modern CPUs?)

Anyway, I wrote a quick-and-dirty demonstrator showing the principle up to X*12, so I thought I'd share it. If the principles are still broadly the same, this sort of thing might save you a few clock cycles, so it could be useful if you're working on a big project that does a lot of multiplication (or on older hardware). If anyone needs an explanation, I'll do my best.

Code:
```
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

/**
 * A programme demonstrating the use of bitwise operators for multiplication.
 * Tested using Code::Blocks on Windows 7 Professional SP1.
 *
 * @Author:  Shaun B
 * @Version: 1.0.0.1 - 2012-09-13
 * @Todo:
 **/

static unsigned int a = 0;
static unsigned int times[13]; /* indices 0..12 are used below */

int main(void)
{
    system("cls");
    printf("Demonstration of using bitwise operators to multiply whole numbers.\n\n");
    printf("C:\\>");
    scanf("%u", &a);
    times[ 0] = 0;
    times[ 1] = a;                           /* times  1 */
    times[ 2] = a << 1;                      /* times  2 */
    times[ 3] = (a << 1) + a;                /* times  3 */
    times[ 4] = a << 2;                      /* times  4 */
    times[ 5] = (a << 2) + a;                /* times  5 */
    times[ 6] = (a << 2) + (a << 1);         /* times  6 */
    times[ 7] = (a << 2) + (a << 1) + a;     /* times  7 */
    times[ 8] = a << 3;                      /* times  8 */
    times[ 9] = (a << 3) + a;                /* times  9 */
    times[10] = (a << 3) + (a << 1);         /* times 10 */
    times[11] = (a << 3) + (a << 1) + a;     /* times 11 */
    times[12] = (a << 3) + (a << 2);         /* times 12 */
    for (int i = 0; i <= 12; i++)
    {
        printf("%4u x %4d = %10u\n", a, i, times[i]);
    }
    printf("\nPress any key to end the demonstration...");
    _getch();
    return 0;
}
```

#2
September 13th, 2012, 05:05 PM
 bdb
Contributing User

Join Date: Aug 2012
Posts: 156
Time spent in forums: 1 Week 15 h 48 m 11 sec
Reputation Power: 33
The compiler should be able to optimize multiplication by a constant better than you can.

Anyway, your macro (like all function-like macros) should use parentheses around each argument and around the whole expression:

Code:
```
#define TIMES16(a) a<<4
#define TIMES16PAREN(a) ((a) << 4)

/* ... */

... 5 + TIMES16(j - 3) * 2 ...      // 5 + j - 3 << 4 * 2
... 5 + TIMES16PAREN(j - 3) * 2 ... // 5 + ((j - 3) << 4) * 2
```

#3
September 13th, 2012, 05:32 PM
 Shaun_B
Contributing User

Join Date: Sep 2012
Posts: 62
Time spent in forums: 1 Day 1 h 9 m 54 sec
Reputation Power: 3
Thanks for the advice regarding macros. I suppose this would be more useful if you were writing assembly (which I love doing).

When I did some testing on my PC, I found that using look-up tables was a few clock cycles quicker than multiplication; for instance, this:

http://pastebin.com/MtkfELP8

was quicker than this:

http://pastebin.com/mrsgUg8W

on average.

When I did some MIPS stuff, I passed it on to a games programmer who had worked with the PSP quite a bit, and he advised me to use bitwise operators rather than the multiplication operator.

Regards,

Shaun.

#4
September 13th, 2012, 05:36 PM
 requinix
Forgetful

Join Date: Mar 2007
Location: Washington, USA
Posts: 13,468
Time spent in forums: 5 Months 2 Weeks 2 Days 3 h 55 m 33 sec
Reputation Power: 9259
Quote:
 Originally Posted by Shaun_B Edit: You should be able to use | instead of +, like: Code: `times[12]= (a<<3)|(a<<2); /**Times 12*/`

Uh, what? Bitwise OR won't carry digits that addition would. Consider 7+3=10 while 7|3=7.

#5
September 13th, 2012, 05:45 PM
 Shaun_B
Contributing User

Join Date: Sep 2012
Posts: 62
Time spent in forums: 1 Day 1 h 9 m 54 sec
Reputation Power: 3
Quote:
 Originally Posted by requinix Uh, what? Bitwise OR won't carry digits that addition would. Consider 7+3=10 while 7|3=7.

I've just worked it out, my logic was elsewhere.

But you could use it with something like:

Code:
```
unsigned char a = 0x0f; // value for the high nybble
unsigned char b = 0x0a; // value for the low nybble
unsigned char c = (a << 4) | b; // c == 0xfa
```

Regards,

Shaun.

Last edited by Shaun_B : September 13th, 2012 at 06:33 PM. Reason: I'm tired... typos ahoy!

#6
September 13th, 2012, 10:16 PM
 Lux Perpetua
Contributing User

Join Date: Feb 2004
Location: San Francisco Bay
Posts: 1,939
Time spent in forums: 1 Month 1 Week 3 h 27 m 29 sec
Reputation Power: 1312
Quote:
 Originally Posted by Shaun_B I was explaining to someone at work this: #define TIMES16(a) a<<4 which used to be quite a quick way to multiplication (perhaps this is not so with modern CPUs?)
It depends somewhat on the processor, but multiplication instructions are still among the slowest integer arithmetic instructions on any machine I know. That said:
Quote:
 Originally Posted by bdb The compiler should be able to optimize multiplications by a constant better than you
The compiler is certainly not required to generate 'mul' instructions just because your C expression uses multiplication, so this is probably true, especially since there are generally multiple ways to implement multiplication by a given constant in terms of shifts and additions, some of which are likely to be faster than others. For instance, on Intel processors, shift instructions are typically slower than addition, so e.g. to multiply by twelve, (a+a+a)<<2 may well be faster than (a<<3) + (a<<2). (For the record, the former is indeed how GCC compiles a*12.)

N. B.: things like this are highly machine-dependent; for example, the story is completely different on ARM, where arithmetic instructions get a free left or right shift on one operand. This is yet another reason to trust the compiler unless you're definitely targeting a very specific architecture (and perhaps even then).

#7
September 14th, 2012, 08:40 AM
 clifford
Contributing User

Join Date: Aug 2003
Location: UK
Posts: 4,961
Time spent in forums: 1 Month 4 Days 2 h 35 m 15 sec
Reputation Power: 1801
Quote:
 Originally Posted by Shaun_B If the principles are still broadly the same, this sort of thing might save you a few processor clicks, so if you're working on a big project which uses lots of multiplication (or on older hardware), it could be useful.

Still applicable in many embedded microcontroller applications, and this need not even be "older hardware": there are many modern 8-bit microcontrollers used in applications with real-time maths requirements, many of them without a hardware multiplier, and those that have one are often still slower than a shift or shift-and-add sequence.

However, most compilers will at least translate a multiply-by-power-of-two into a shift. YMMV with non-power-of-two multiplicands.

The technique is only useful, of course, when multiplying by a constant; a variable multiplied by a variable cannot be similarly optimised.

For more complex expressions and use in fixed-point calculations involving both multiply and divide, see this article.
Shaun_B agrees: Thanks for the link!

#8
September 15th, 2012, 11:06 AM
 Shaun_B
Contributing User

Join Date: Sep 2012
Posts: 62
Time spent in forums: 1 Day 1 h 9 m 54 sec
Reputation Power: 3
Quote:
 Originally Posted by Lux Perpetua It depends somewhat on the processor, but multiplication instructions are still among the slowest integer arithmetic instructions on any machine I know. That said:The compiler is certainly not required to generate 'mul' instructions just because your C expression uses multiplication, so this is probably true, especially since there are generally multiple ways to implement multiplication by a given constant in terms of shifts and additions, some of which are likely to be faster than others. For instance, on Intel processors, shift instructions are typically slower than addition, so e.g. to multiply by twelve, (a+a+a)<<2 may well be faster than (a<<3) + (a<<2). (For the record, the former is indeed how GCC compiles a*12.) N. B.: things like this are highly machine-dependent; for example, the story is completely different on ARM, where arithmetic instructions get a free left or right shift on one operand. This is yet another reason to trust the compiler unless you're definitely targeting a very specific architecture (and perhaps even then).

I suppose if memory isn't an issue and you have fast access to RAM, then pre-calculation is still useful? My interest in optimised code stems from the fact that I'm an 8-bit guy, and making things happen on very limited systems such as the Sinclair ZX81 is something of a fun challenge. However, I do of course program for more modern platforms and web stuff.

Many thanks,

Shaun.

#9
September 15th, 2012, 01:10 PM
 EEmaestro
Contributing User

Join Date: Apr 2012
Posts: 59
Time spent in forums: 1 Day 18 h 13 m 7 sec
Reputation Power: 2
Quote:
 Originally Posted by Shaun_B I suppose if memory isn't the issue, and you have fast access to RAM, then pre-calculations are still useful? My interest in optimised code stems from the fact that I'm an 8-bit guy and making things happen on very limited systems such as the Sinclair ZX81 is something of a fun challenge. However, I do of course program for more modern platforms and web stuff. Many thanks, Shaun.

When I think of code optimization for a specific microprocessor, I think of two code attributes: speed and accuracy. For example, maybe I can recode a function to be an order of magnitude faster. Does it matter? That is a question you must answer in the context of your embedded application and its data structures, and you must answer it by making BOTH speed AND accuracy measurements before and after the code changes. It might not matter that the revised code is ten times faster if it contains an obscure bug that only reveals itself once in every ten million iterations, for example.

After reading Kyle York's article, I also recall the code attribute of accuracy. When evaluating this attribute, you must ask yourself "is the code as accurate as it needs to be?" for this application, not "is the code as accurate as possible?"

Now suppose you do increase your function's speed by an order of magnitude. Good work! Now ask yourself: what difference does this make?

Obviously it makes a great deal of difference time-wise if your function is executed ten million times per second, and less difference if your function is executed ten times per second. This "hidden" scale factor is built into all embedded threads: the repetition rate. You need at least a rudimentary quantitative idea of the function's repetition rate. If you don't have one, then make high and low estimates by asking yourself "What is the fastest this function needs to be called?" and "How many times per second would this function need to be called if the user is operating this device as slowly as Grandma in mid-January with her arthritic finger?" By asking yourself these questions, you can sometimes set upper and lower limits on the repetition rate of a piece of code. Then you can see, in a hierarchical fashion, which sets of instructions can most benefit from code optimization.

Sometimes (but not always) there are tradeoffs between memory usage and execution speed. That is to say, sometimes you can make a function do a math operation much faster, but at the cost of using more memory. You must also ask yourself two questions: (1) can this memory afford to be used for this purpose, and at what cost(s)? and (2) if I recode this function using so much more memory, what will be the overall speed increase (taking into consideration that using more memory takes more time) for my application? You can make a stab at answering the second question by making measurements. The first question is more subjective, as it involves attributes your product might be given in the future if more memory is available to it.

 Viewing: Dev Shed Forums > Programming Languages > C Programming > Using bitwise operators for multiplication.