#1
  1. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,640
    Rep Power
    4247

    Puzzle of the Month 05-05-2008


    First, here's wishing you all a Happy Sinco De Mayo. With that said, today's puzzle has an especially strong reason to be here -- it happened to keep me up all weekend. This bug existed in our code for a few weeks, but was only triggered by some other recent changes to our codebase. I discovered the cause of the bug on Friday, but due to our policy of not releasing patches on Friday, I couldn't fix it. Instead I kept getting called by our 3rd party monitoring company every time our app crashed, which really sucks, because I didn't write the bug in the first place -- I should really have given those guys the phone # of the guy that wrote it :D

    With all that ranting done with, here's the code in question (slightly modified to remove a few things irrelevant to this puzzle). The code attempts to concatenate as many attributes as possible into a buffer, so that it can log the line to a file. It is perfectly all right to discard any data that cannot fit into the buffer, so if you have more data than the size of the buffer, you can simply discard the rest of it.
    Code:
    static void create_attr_line(attribute_t *attributes, int num_attribs, char *buf_str, int max_len)
    {
            int         i;
            int         offset = 0;
            attribute_t *p_attr;
    
            for (i = 0; i < num_attribs; i++) {
                p_attr = attributes[i];
                offset += snprintf(buf_str + offset, max_len - offset, "%d:%s:%d:%d:%d:%s:%d:%d:%s;", 
                         p_attr->attr.business_id,
                         p_attr->attr.update_date,
                         p_attr->attr.event_type_id,
                         p_attr->attr.group_id,
                         p_attr->attr.subgroup_id,
                         p_attr->attr.attribute_name,
                         p_attr->score,
                         p_attr->count,
                         p_attr->attr.attribute_value
                );
            }
    }
    
    // called like this:
    char attr_buf[4096];
    create_ups_string(ctx->attributes, ctx->attribute_count, attr_buf, sizeof(attr_buf));
    At first glance, the code appears to be doing some proper size checking to make sure that the size of the buffer cannot be exceeded. However there is a serious problem with it. For the purposes of this discussion, the compiler is assumed to be ANSI compliant (in my case, it is gcc running on Linux).
    The puzzle is to point out where the buffer overflow is. Next question is to point out how to fix it.
    BONUS QUESTION: Why will Microsoft Visual C++ produce different results with the same code?
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  2. #2
  3. Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jun 2005
    Posts
    5,964
    Rep Power
    4852
    I'm going to shoot from the hip, here. Some versions of the function returned -1 when the data was truncated, instead of the number of characters that would have been written if all had gone well. This could garfle up maxlen-offset.

    Incidentally, what are you doing if a format error causes -1 to be returned?
    Write no code whose complexity leaves you wondering what the hell you did.
    Politically Incorrect DaWei on Pointers Grumpy on Exceptions
  4. #3
  5. Paris est magique!
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2004
    Location
    France!
    Posts
    370
    Rep Power
    96
    Originally Posted by sizablegrin
    I'm going to shoot from the hip, here. Some versions of the function returned -1 when the data was truncated, instead of the number of characters that would have been written if all had gone well. This could garfle up maxlen-offset.
    This would cause a buffer overflow only if offset goes lower than 0 (due to adding -1 one or several times) - well a buffer underflow :p
    It would generally overwrite previous data (the last character).


    As of glibc 2.1, snprintf returns the number of char that would have benn written if the buffer was large enough:

    Originally Posted by man snprintf
    ... if the output was truncated due to this limit then the return value is the number of characters (not including the trailing '\0') which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated. (See also below under NOTES.) If an output error is encountered, a negative value is returned.
    This way, the offset continue to grow, even if the limit of the buffer is reached.

    Oh, oh, wait! what happens if offset grows above max_len? (maxlen - offset) would be negative, no data will be printed in the buf... Nope, since size_t is generally typedef-ed as an unsigned int, it is cast as unsigned (a very high value!). All characters are likely to be copied. Bang.


    The solution would be to test after snprintf() if the result is 0 <= res < max_len - offset.
    If not, return.

    Or, replace (max_len - offset) with (max_len - offset > 0 ? (max_len - offset) : 0)

    With MSVC, snprintf() returns -1 if all characters could not be copied into the buffer. So, generally, it would work (see above).
    etienne:~ > %blow
    fg: %blow: no such job


    There are 10 kind of people:
    - those who know binary
    - those who don't.
  6. #4
  7. Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jun 2005
    Posts
    5,964
    Rep Power
    4852
    Okay, I'll shoot from the other hip. If the function is ANSI compiliant and the data is truncated, the return will be the number of characters that would have been written had size been sufficiently large. Suppose this is 4100. Offset goes beyond the 4096 length. Maxlen - offset is negative. However, that's an unsigned value, no negative values. The next operation will write beyond the buffer with an invalid length value.

    Comments on this post

    • etienne141 agrees : you got it, cow-boy
    Write no code whose complexity leaves you wondering what the hell you did.
    Politically Incorrect DaWei on Pointers Grumpy on Exceptions
  8. #5
  9. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,640
    Rep Power
    4247
    Boy you guys are good. I also got a PM from L7Sqr along the same lines.
    1. As you guys pointed out, once it goes -ve and gets converted to unsigned, all hell breaks loose.
    2. For the record, the patch we did was to break out of the loop by detecting when offset > max_len. Like I mentioned above, it is ok for the last bit of data to be truncated for our purposes.
    3. As you guys pointed out, _snprintf() for VC++ returns a negative value if the buffer is not big enough, whereas ANSI says that if the output was truncated then the return value is the number of characters (not including the trailing '\0') which would have been written to the final string if enough space had been available. This is why some people implement their own version of snprintf for portability (e.g. Apache Portable Library with apr_snprintf).

    BTW a small issue that no one has pointed out yet -- some compilers/libraries (e.g. some old versions of glibc, libdb (from Sleepycat) and Solaris libc) implement snprintf() very badly. They would disregard the size parameter and simply call sprintf() :eek: !!! It didn't apply to our particular situation, but I thought I should throw out a warning to anyone reading this to watch out for those ancient libcs floating around.

    Anyway, I hope you guys had fun solving this puzzle! I can tell you my weekend wasn't very pleasant though, what with answering all those calls due to this darn bug.
    Last edited by Scorpions4ever; May 6th, 2008 at 12:32 AM.
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2009
    Posts
    8
    Rep Power
    0

    Thumbs up replace max_len-offset by sizeof(p_attr)


    Hi

    The problem is with snprintf() function, for the first time
    offset =0 and max_len-offset(=max_len=4096) no.of characters are printed in the buffer, which is size of the buffer. As the set of attributes < max_len, the rest of the data printed in buf_str will be some garbage. which will lead to mislead the procedure.

    And at second time offset = 4096 , hence the buf_str is untouched. No data is formatted from the second set of attributes.

    To avoid this use sizeof(p_attr) in the place of max_len-offset.

    Regards,
    sailaja

    Originally Posted by Scorpions4ever
    First, here's wishing you all a Happy Sinco De Mayo. With that said, today's puzzle has an especially strong reason to be here -- it happened to keep me up all weekend. This bug existed in our code for a few weeks, but was only triggered by some other recent changes to our codebase. I discovered the cause of the bug on Friday, but due to our policy of not releasing patches on Friday, I couldn't fix it. Instead I kept getting called by our 3rd party monitoring company every time our app crashed, which really sucks, because I didn't write the bug in the first place -- I should really have given those guys the phone # of the guy that wrote it :D

    With all that ranting done with, here's the code in question (slightly modified to remove a few things irrelevant to this puzzle). The code attempts to concatenate as many attributes as possible into a buffer, so that it can log the line to a file. It is perfectly all right to discard any data that cannot fit into the buffer, so if you have more data than the size of the buffer, you can simply discard the rest of it.
    Code:
    static void create_attr_line(attribute_t *attributes, int num_attribs, char *buf_str, int max_len)
    {
            int         i;
            int         offset = 0;
            attribute_t *p_attr;
    
            for (i = 0; i < num_attribs; i++) {
                p_attr = attributes[i];
                offset += snprintf(buf_str + offset, max_len - offset, "%d:%s:%d:%d:%d:%s:%d:%d:%s;", 
                         p_attr->attr.business_id,
                         p_attr->attr.update_date,
                         p_attr->attr.event_type_id,
                         p_attr->attr.group_id,
                         p_attr->attr.subgroup_id,
                         p_attr->attr.attribute_name,
                         p_attr->score,
                         p_attr->count,
                         p_attr->attr.attribute_value
                );
            }
    }
    
    // called like this:
    char attr_buf[4096];
    create_ups_string(ctx->attributes, ctx->attribute_count, attr_buf, sizeof(attr_buf));
    At first glance, the code appears to be doing some proper size checking to make sure that the size of the buffer cannot be exceeded. However there is a serious problem with it. For the purposes of this discussion, the compiler is assumed to be ANSI compliant (in my case, it is gcc running on Linux).
    The puzzle is to point out where the buffer overflow is. Next question is to point out how to fix it.
    BONUS QUESTION: Why will Microsoft Visual C++ produce different results with the same code?
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2009
    Posts
    9
    Rep Power
    0
    Originally Posted by Scorpions4ever
    Code:
            for (i = 0; i < num_attribs; i++) {
                p_attr = attributes[i];
                offset += snprintf(buf_str + offset, max_len - offset, "%d:%s:%d:%d:%d:%s:%d:%d:%s;", 
                         p_attr->attr.business_id,
                         p_attr->attr.update_date,
                         p_attr->attr.event_type_id,
                         p_attr->attr.group_id,
                         p_attr->attr.subgroup_id,
                         p_attr->attr.attribute_name,
                         p_attr->score,
                         p_attr->count,
                         p_attr->attr.attribute_value
                );
            }

    the offset variable value can exceed the size of the buf_str as the loop proceeds

    Comments on this post

    • nattylife disagrees : did you see the date of the original post?

IMN logo majestic logo threadwatch logo seochat tools logo