#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    62
    Rep Power
    3

    Cool Using bitwise operators for multiplication.


    I was explaining to someone at work this:

    #define TIMES16(a) a<<4

    which used to be quite a quick way to multiplication (perhaps this is not so with modern CPUs?)

    Anyway, I wrote a quick-n-dirty demonstrator to show the principle up to X*12, so I thought that I'd share it. If the principles are still broadly the same, this sort of thing might save you a few processor clicks, so if you're working on a big project which uses lots of multiplication (or on older hardware), it could be useful. If anyone needs an explanation then I'll do my best.

    Code:
    #include <stdio.h>
    #include <conio.h>
    #include <stdlib.h>
    
    /**
     * A programme demonstrates using bitwise operators for multiplication
     * Tested using Code::Blocks and Windows 7 Professional SP1
     *
     * @Author:     Shaun B
     * @Version:	1.0.0.1 - 2012-09-13
     * @Todo:
     *
     **/
    
    static unsigned int a=0;
    static int times[12];
    int main();
    
    int main()
    {
        system("cls");
        printf("Demonstration of using bitwise operators to multiply whole numbers.\n\n");
        printf("Please enter a number\n");
        printf("C:\\>");
        scanf("%d",&a);
        times[ 0]=  0;
        times[ 1]=  a;                          /**Times  1*/
        times[ 2]=  a<<1;                       /**Times  2*/
        times[ 3]=  (a<<1)+a;                   /**Times  3*/
        times[ 4]=  a<<2;                       /**Times  4*/
        times[ 5]=  (a<<2)+a;                   /**Times  5*/
        times[ 6]=  (a<<2)+(a<<1);              /**Times  6*/
        times[ 7]=  (a<<2)+(a<<1)+(a);          /**Times  7*/
        times[ 8]=  a<<3;                       /**Times  8*/
        times[ 9]=  (a<<3)+(a);                 /**Times  9*/
        times[10]=  (a<<3)+(a<<1);              /**Times 10*/
        times[11]=  (a<<3)+(a<<1)+(a);          /**Times 11*/
        times[12]=  (a<<3)+(a<<2);              /**Times 12*/
        for(int i=0; i<=12; i++)
        {
            printf("%4d x %4d = %10d\n", a, i, times[i]);
        }
        printf("\nPress the any key to end demonstration...");
        _getch();
        return 0;
    }
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    156
    Rep Power
    33
    The compiler should be able to optimize multiplications by a constant better than you :)

    Anyway, your macro (all function-like macros) should use parenthesis around arguments and around the whole expression

    Code:
    #define TIMES16(a) a<<4
    #define TIMES16PAREN(a) ((a) << 4)
    
    /* ... */
    
    ... 5 + TIMES16(j - 3) * 2 ...      // 5 + j - 3 << 4 * 2
    ... 5 + TIMES16PAREN(j - 3) * 2 ... // 5 + ((j - 3) << 4) * 2
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    62
    Rep Power
    3
    Thanks for the advice regarding Macros. I suppose if you were to do assembly (which I love doing), this would be more useful.

    When I did some testing on my PC, I found that using look-up tables were a few clicks quicker than multiplication, for instance this:

    http://pastebin.com/MtkfELP8

    was quicker than this:

    http://pastebin.com/mrsgUg8W

    on average.

    When I did some MIPs stuff, I passed it onto a games programmer that had worked with the PSP quite a bit, and he advised me to use bitwise operators rather than the multiplication operands.

    Regards,

    Shaun.
  6. #4
  7. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    Originally Posted by Shaun_B
    Edit: You should be able to use | instead of +, like:
    Code:
    times[12]=  (a<<3)|(a<<2);              /**Times 12*/
    Uh, what? Bitwise OR won't carry digits that addition would. Consider 7+3=10 while 7|3=7.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    62
    Rep Power
    3
    Originally Posted by requinix
    Uh, what? Bitwise OR won't carry digits that addition would. Consider 7+3=10 while 7|3=7.
    I've just worked it out, my logic was elsewhere.

    But you could use it with something like:

    Code:
    a= 0x0f; // High nybbles
    b= 0x0a; // Low nybbles
    c= (a<<4)|b;
    Regards,

    Shaun.
    Last edited by Shaun_B; September 13th, 2012 at 05:33 PM. Reason: I'm tired... typos ahoy!
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    San Francisco Bay
    Posts
    1,939
    Rep Power
    1313
    Originally Posted by Shaun_B
    I was explaining to someone at work this:

    #define TIMES16(a) a<<4

    which used to be quite a quick way to multiplication (perhaps this is not so with modern CPUs?)
    It depends somewhat on the processor, but multiplication instructions are still among the slowest integer arithmetic instructions on any machine I know. That said:
    Originally Posted by bdb
    The compiler should be able to optimize multiplications by a constant better than you :)
    The compiler is certainly not required to generate 'mul' instructions just because your C expression uses multiplication, so this is probably true, especially since there are generally multiple ways to implement multiplication by a given constant in terms of shifts and additions, some of which are likely to be faster than others. For instance, on Intel processors, shift instructions are typically slower than addition, so e.g. to multiply by twelve, (a+a+a)<<2 may well be faster than (a<<3) + (a<<2). (For the record, the former is indeed how GCC compiles a*12.)

    N. B.: things like this are highly machine-dependent; for example, the story is completely different on ARM, where arithmetic instructions get a free left or right shift on one operand. This is yet another reason to trust the compiler unless you're definitely targeting a very specific architecture (and perhaps even then).
  12. #7
  13. Contributing User

    Join Date
    Aug 2003
    Location
    UK
    Posts
    5,075
    Rep Power
    1802
    Originally Posted by Shaun_B
    If the principles are still broadly the same, this sort of thing might save you a few processor clicks, so if you're working on a big project which uses lots of multiplication (or on older hardware), it could be useful.
    Still applicable in many embedded microcontroller applications - this need not even be "older hardware" there are many modern 8 bit microcontroller applications with real-time maths requirements and many of then do not have a hardware multiplier, and those that do are often slower than a shift or shift+add operation.

    However most compilers will at least translate a multiply-by-power-of-two into a shift. YMMV with non-power-of-two multiplicands.

    The technique is only useful of course when multiplying by a constant. A variable multiplied by a variable cannot be similarly optimised.

    For more complex expressions and use in fixed-point calculations involving both multiply and divide, see this article.

    Comments on this post

    • Shaun_B agrees : Thanks for the link!
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    62
    Rep Power
    3
    Originally Posted by Lux Perpetua
    It depends somewhat on the processor, but multiplication instructions are still among the slowest integer arithmetic instructions on any machine I know. That said:The compiler is certainly not required to generate 'mul' instructions just because your C expression uses multiplication, so this is probably true, especially since there are generally multiple ways to implement multiplication by a given constant in terms of shifts and additions, some of which are likely to be faster than others. For instance, on Intel processors, shift instructions are typically slower than addition, so e.g. to multiply by twelve, (a+a+a)<<2 may well be faster than (a<<3) + (a<<2). (For the record, the former is indeed how GCC compiles a*12.)

    N. B.: things like this are highly machine-dependent; for example, the story is completely different on ARM, where arithmetic instructions get a free left or right shift on one operand. This is yet another reason to trust the compiler unless you're definitely targeting a very specific architecture (and perhaps even then).
    I suppose if memory isn't the issue, and you have fast access to RAM, then pre-calculations are still useful? My interest in optimised code stems from the fact that I'm an 8-bit guy and making things happen on very limited systems such as the Sinclair ZX81 is something of a fun challenge. However, I do of course program for more modern platforms and web stuff.

    Many thanks,

    Shaun.
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Posts
    59
    Rep Power
    3
    Originally Posted by Shaun_B
    I suppose if memory isn't the issue, and you have fast access to RAM, then pre-calculations are still useful? My interest in optimised code stems from the fact that I'm an 8-bit guy and making things happen on very limited systems such as the Sinclair ZX81 is something of a fun challenge. However, I do of course program for more modern platforms and web stuff.

    Many thanks,

    Shaun.
    When I think of code optimization for a specific microprocessor, I think of two code attributes--speed and accuracy. For example, maybe I can recode a function to be an order of magnitude faster. Does it matter? That is a question you must answer in the context of your embedded application and its data structures. You must answer this question by making BOTH speed AND accuracy measurements before and after the code changes. It might not matter if the revised code is ten times faster if it contains an occult bug that only reveals itself once in every ten million iterations, for example.

    After reading Kyle York's article, I also recall the code attribute of accuracy. When evaluating this attribute, you must ask yourself "is the code as accurate as it needs be ?" for this application, not "is the code as accurate as possible?"

    Now suppose you do increase your function's speed by an order of magnitude Good work! Now ask yourself, what difference does this make?

    Obviously it makes a great deal of difference time-wise if your function is executed ten million times per second, and less difference if your function is executed ten times per second. This "hidden" scale factor is built into all embedded threads. It is repetition rate. You need to have at least a rudimentary quantitative idea of the function's repetition rate. If you don't have an idea, then you need to make high and low estimates by asking yourself "What is the fastest this function needs to be called ?" and "What is the number of times per second this function would need to be called if the user is operating this device as slowly as Grandma in mid-January with her arthritic finger? " By asking yourself these questions, you can sometimes set upper and lower limits on the repetition rate of a piece of code. Then you can see, in a hierachical fashion, what sets of instructions can most benefit from optimizing code.

    Sometimes (but not always) there are tradeoffs between memory usage and execution speed. That is to say, sometimes you can make a function do a math operation much faster but at the cost of using more memory. You must also ask yourself the questions (1) Can this memory afford to be used for this purpose and at what cost(s)? and (2) if I recode this function using so much more memory, what will be the overall speed increase (taking into consideration using more memory takes more time) for my application ? You can make a stab an answering this second question by making measurements. The first question is more subjective as it involves attributes your product might be given in the future if more memory is available to it.

IMN logo majestic logo threadwatch logo seochat tools logo