September 13th, 2012, 04:13 PM

Using bitwise operators for multiplication.
I was explaining to someone at work this:
#define TIMES16(a) a<<4
which used to be quite a quick way to do multiplication (though perhaps this is no longer so on modern CPUs?)
Anyway, I wrote a quick-and-dirty demonstrator to show the principle up to X*12, so I thought I'd share it. If the principles are still broadly the same, this sort of thing might save you a few processor cycles, so if you're working on a big project which uses lots of multiplication (or on older hardware), it could be useful. If anyone needs an explanation, I'll do my best.
Code:
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

/**
 * A programme demonstrating the use of bitwise operators for multiplication.
 * Tested using Code::Blocks on Windows 7 Professional SP1.
 *
 * @Author: Shaun B
 * @Version: 1.0.0.1 20120913
 * @Todo:
 **/

static unsigned int a = 0;
static int times[13]; /* indices 0 to 12 are used, so 13 elements are needed */

int main()
{
    system("cls");
    printf("Demonstration of using bitwise operators to multiply whole numbers.\n\n");
    printf("Please enter a number\n");
    printf("C:\\>");
    scanf("%u", &a);
    times[ 0] = 0;
    times[ 1] = a;                 /* times 1  */
    times[ 2] = a<<1;              /* times 2  */
    times[ 3] = (a<<1)+a;          /* times 3  */
    times[ 4] = a<<2;              /* times 4  */
    times[ 5] = (a<<2)+a;          /* times 5  */
    times[ 6] = (a<<2)+(a<<1);     /* times 6  */
    times[ 7] = (a<<2)+(a<<1)+a;   /* times 7  */
    times[ 8] = a<<3;              /* times 8  */
    times[ 9] = (a<<3)+a;          /* times 9  */
    times[10] = (a<<3)+(a<<1);     /* times 10 */
    times[11] = (a<<3)+(a<<1)+a;   /* times 11 */
    times[12] = (a<<3)+(a<<2);     /* times 12 */
    for (int i = 0; i <= 12; i++)
    {
        printf("%4u x %4d = %10d\n", a, i, times[i]);
    }
    printf("\nPress any key to end demonstration...");
    _getch();
    return 0;
}
September 13th, 2012, 05:05 PM

The compiler should be able to optimize multiplications by a constant better than you can. :)
Anyway, your macro (like all function-like macros) should use parentheses around its arguments and around the whole expression:
Code:
#define TIMES16(a) a<<4
#define TIMES16PAREN(a) ((a) << 4)
/* ... */
... 5 + TIMES16(j - 3) * 2 ...      // expands to 5 + j - 3 << 4 * 2
... 5 + TIMES16PAREN(j - 3) * 2 ... // expands to 5 + ((j - 3) << 4) * 2
September 13th, 2012, 05:32 PM

Thanks for the advice regarding macros. I suppose if you were writing assembly (which I love doing), this would be more useful.
When I did some testing on my PC, I found that using a lookup table was a few cycles quicker than multiplication; for instance, this:
http://pastebin.com/MtkfELP8
was quicker than this:
http://pastebin.com/mrsgUg8W
on average.
When I did some MIPS stuff, I passed it on to a games programmer who had worked with the PSP quite a bit, and he advised me to use bitwise operators rather than the multiplication operator.
Regards,
Shaun.
September 13th, 2012, 05:36 PM

Originally Posted by Shaun_B
Edit: You should be able to use | instead of +, like:
Code:
times[12]= (a<<3)|(a<<2); /**Times 12*/
Uh, what? Bitwise OR won't carry digits that addition would. Consider 7+3=10 while 7|3=7.
September 13th, 2012, 05:45 PM

Originally Posted by requinix
Uh, what? Bitwise OR won't carry digits that addition would. Consider 7+3=10 while 7|3=7.
I've just worked it out, my logic was elsewhere.
But you could use it with something like:
Code:
a = 0x0f; // will become the high nybble
b = 0x0a; // the low nybble
c = (a<<4)|b; // c is now 0xfa
Regards,
Shaun.
Last edited by Shaun_B; September 13th, 2012 at 06:33 PM.
Reason: I'm tired... typos ahoy!
September 13th, 2012, 10:16 PM

Originally Posted by Shaun_B
I was explaining to someone at work this:
#define TIMES16(a) a<<4
which used to be quite a quick way to multiplication (perhaps this is not so with modern CPUs?)
It depends somewhat on the processor, but multiplication instructions are still among the slowest integer arithmetic instructions on any machine I know. That said:
Originally Posted by bdb
The compiler should be able to optimize multiplications by a constant better than you :)
The compiler is certainly not required to generate 'mul' instructions just because your C expression uses multiplication, so this is probably true, especially since there are generally multiple ways to implement multiplication by a given constant in terms of shifts and additions, some of which are likely to be faster than others. For instance, on Intel processors, shift instructions are typically slower than addition, so e.g. to multiply by twelve, (a+a+a)<<2 may well be faster than (a<<3) + (a<<2). (For the record, the former is indeed how GCC compiles a*12.)
N.B.: things like this are highly machine-dependent; for example, the story is completely different on ARM, where arithmetic instructions get a free left or right shift on one operand. This is yet another reason to trust the compiler unless you're definitely targeting a very specific architecture (and perhaps even then).
September 14th, 2012, 08:40 AM

Originally Posted by Shaun_B
If the principles are still broadly the same, this sort of thing might save you a few processor cycles, so if you're working on a big project which uses lots of multiplication (or on older hardware), it could be useful.
Still applicable in many embedded microcontroller applications, and this need not even be "older hardware": there are many modern 8-bit microcontroller applications with real-time maths requirements, and many of them do not have a hardware multiplier; those that do are often slower than a shift or shift+add operation.
However, most compilers will at least translate a multiply-by-power-of-two into a shift. YMMV with non-power-of-two multiplicands.
The technique is only useful, of course, when multiplying by a constant. A variable multiplied by a variable cannot be similarly optimised.
For more complex expressions and use in fixedpoint calculations involving both multiply and divide, see this article.
September 15th, 2012, 11:06 AM

Originally Posted by Lux Perpetua
It depends somewhat on the processor, but multiplication instructions are still among the slowest integer arithmetic instructions on any machine I know. That said: The compiler is certainly not required to generate 'mul' instructions just because your C expression uses multiplication, so this is probably true, especially since there are generally multiple ways to implement multiplication by a given constant in terms of shifts and additions, some of which are likely to be faster than others. For instance, on Intel processors, shift instructions are typically slower than addition, so e.g. to multiply by twelve, (a+a+a)<<2 may well be faster than (a<<3) + (a<<2). (For the record, the former is indeed how GCC compiles a*12.)
N.B.: things like this are highly machine-dependent; for example, the story is completely different on ARM, where arithmetic instructions get a free left or right shift on one operand. This is yet another reason to trust the compiler unless you're definitely targeting a very specific architecture (and perhaps even then).
I suppose if memory isn't an issue and you have fast access to RAM, then precalculation is still useful? My interest in optimised code stems from the fact that I'm an 8-bit guy, and making things happen on very limited systems such as the Sinclair ZX81 is something of a fun challenge. However, I do of course program for more modern platforms and web stuff.
Many thanks,
Shaun.
September 15th, 2012, 01:10 PM

Originally Posted by Shaun_B
I suppose if memory isn't an issue and you have fast access to RAM, then precalculation is still useful? My interest in optimised code stems from the fact that I'm an 8-bit guy, and making things happen on very limited systems such as the Sinclair ZX81 is something of a fun challenge. However, I do of course program for more modern platforms and web stuff.
Many thanks,
Shaun.
When I think of code optimization for a specific microprocessor, I think of two code attributes: speed and accuracy. For example, maybe I can recode a function to be an order of magnitude faster. Does it matter? That is a question you must answer in the context of your embedded application and its data structures, by making BOTH speed AND accuracy measurements before and after the code changes. It might not matter that the revised code is ten times faster if it contains an occult bug that only reveals itself once in every ten million iterations, for example.
After reading Kyle York's article, I am also reminded of the code attribute of accuracy. When evaluating this attribute, you must ask yourself "is the code as accurate as it needs to be for this application?", not "is the code as accurate as possible?"
Now suppose you do increase your function's speed by an order of magnitude. Good work! Now ask yourself: what difference does this make?
Obviously it makes a great deal of difference time-wise if your function is executed ten million times per second, and less difference if it is executed ten times per second. This "hidden" scale factor is built into all embedded threads: it is the repetition rate. You need to have at least a rudimentary quantitative idea of the function's repetition rate. If you don't have one, then make high and low estimates by asking yourself "What is the fastest this function would need to be called?" and "How many times per second would this function need to be called if the user is operating this device as slowly as Grandma in mid-January with her arthritic finger?" By asking yourself these questions, you can sometimes set upper and lower limits on the repetition rate of a piece of code. Then you can see, in a hierarchical fashion, which sets of instructions can most benefit from optimization.
Sometimes (but not always) there are trade-offs between memory usage and execution speed. That is to say, sometimes you can make a function do a maths operation much faster, but at the cost of using more memory. You must also ask yourself the questions (1) can this memory afford to be used for this purpose, and at what cost(s)? and (2) if I recode this function using so much more memory, what will be the overall speed increase (taking into consideration that using more memory takes more time) for my application? You can make a stab at answering the second question by making measurements. The first question is more subjective, as it involves attributes your product might be given in the future if more memory is available to it.