#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    142
    Rep Power
    2

    Several questions regarding strings


    I had several things I didn't understand about some strings, so I did experiments.
    I was able to come up with my own answers to the questions I had, but just in case I got lucky with the experiments, I want to ask whether my answers are true.

    #1.
    what is inside an empty string?
    (i.e. a string that is only declared)
    -> garbage

    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main()
    {
        char str[6];
        printf("%c", str[0]);
    }
    I got a yen mark for the output.
    At first I thought that the first character of an empty string would be the null character but that doesn't seem to be the case.

    #2
    then what happens if I concatenate a string to an empty string?
    -> I might get lucky and get the string appended, but in most cases I get some garbage in front of the appended string.
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main()
    {
        char str[18];
        strcat(str, "more");
        printf("%s", str);
    }
    #3
    what happens if a character in the delimiter is part of the string that is not a delimiter?
    -> strtok breaks up the string if it encounters any character that's part of the delimiter.
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main()
    {
        char str[10] = "I.am.happy";
        char chunk[6];
        strcpy(chunk, strtok(str, "m."));
        printf("%s\n", chunk);
    
    }
    I expected to get "I.a", but I got "I". I reasoned this happened because "I.a" includes a period, which is in the delimiter, so strtok returned "I".

    #4 (not an experiment question)
    strncpy doesn't add the null character for you.
    At the moment, I am manually adding the null character after each call to strncpy.
    Is there a quicker way to get around this problem?
    (It's a problem because if I want to print a string immediately after copying something onto that string, I get garbage after the copied message, which results from there being no null character to indicate the end of string to printf.)

    #5
    does the string returned by strtok have a null character appended to it?
    -> no.
    Code:
    #include <stdio.h>
    #include <string.h>

    int main()
    {
        char str[10] = "I.am.happy";
        printf("%s", strtok(str, ".,"));
    
    
        printf("%s", strtok(NULL, ".,"));
    
        printf("%s", strtok(NULL, ".,"));
    
    
    }
    I got garbage after the string "happy".

    #6
    if I declare a string of 20 characters, but store only 13 characters, I know that the 14th character is the null character. what is in the 15th-20th characters?
    -> all are null characters.
    Code:
    #include <stdio.h>
    #include <string.h>
    #define SIZE 20
    int main()
    {
        int i;
        char s1[SIZE] = "Jan. 30, 1996";
        for(i = 13; i < SIZE; ++i )
            printf("|%c|\n", s1[i]);
    }
    The result from this experiment was different from what the textbook said, which was that only the 14th character was the null character and all the characters after that were garbage.
    (I suppose it doesn't make much of a difference as long as printf is concerned though, since printf stops at the first null character it encounters.)
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,841
    Rep Power
    480
    Read the man pages. Your strtok conclusion is incorrect.
    Originally Posted by man strtok
    Each call to strtok() returns a pointer to a null-terminated string containing the next token. This string does not include the delimiting byte. If no more tokens are found, strtok() returns NULL.
    Your program of question 1 is safe to execute, but the result is not interesting unless you happen to think you know the state of the computer. And computers these days can take a stupendous number of states.

    Now, by question 2 you should realize this is a program you must not run! strcat knows nothing about the declaration of its argument. It scans forward from str for the end of the string, an ASCII nul byte, and copies the other argument starting at that location. In an uninitialized array that nul byte could be anywhere in memory, including past the end of the array. What opcodes can be constructed from parts of "more"?
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. Contributing User
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jan 2003
    Location
    USA
    Posts
    7,145
    Rep Power
    2222
    Originally Posted by 046
    #1.
    what is inside an empty string?
    (i.e. a string that is only declared)
    -> garbage

    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main()
    {
        char str[6];
        printf("%c", str[0]);
    }
    I got a yen mark for the output.
    At first I thought that the first character of an empty string would be the null character but that doesn't seem to be the case.
    What you wrote is not an empty string. Specifically, an empty string would be "". In an actual empty string, the first character would be the null-terminator and who cares what's in the rest because it doesn't matter.

    What you created would at best be called an "uninitialized string". In that case, it contains whatever garbage is there from before because it's a local array. You need to initialize the string before you try to print it; eg,
    char str[6] = "";

    BTW, if it were a global array (ie, outside a function) then it would be zero'd out automatically and would automatically be an empty string. But even then, I very much prefer to explicitly initialize all variables and arrays, if not in the declarations then at least in the code before I attempt to use them.

    Originally Posted by 046
    #2
    then what happens if I concatenate a string to an empty string?
    It would work if you truly have an empty string. But your example does not create an empty string, but rather an uninitialized string, in which case you're hosed and you end up with garbage as you observed.

    Initialize your string to be an empty string if your code will depend on that.

    Originally Posted by 046
    #3
    The delimiter string is a list of characters that serve as a delimiter. With "m." you are telling strtok to use either a period or a lower case m as a delimiter. It does not use the entire delimiter string as a single delimiter. If that is what you want to do, then you need to use a different technique.

    BTW, when you call strtok repeatedly to parse the same string then you can give it different delimiters each time, depending on the syntax of the string that you are parsing.

    Originally Posted by 046
    #4 (not an experiment question)
    strncpy doesn't add the null character for you.
    At the moment, I am manually adding the null character after each call to strncpy.
    Read the documentation on strncpy:
    Originally Posted by man page
    Description

    The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to by dest. The strings may not overlap, and the destination string dest must be large enough to receive the copy. Beware of buffer overruns! (See BUGS.)

    The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.

    If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.
    The main purpose of strncpy is to safe-guard against buffer overflow. You will only see the problem of no null-terminator when the source string is at least as long as the number of characters you want to copy.

    Originally Posted by 046
    #6
    if I declare a string of 20 characters, but store only 13 characters, I know that the 14th character is the null character. what is in the 15th-20th characters?
    -> all are null characters.
    No, the correct answer is "Who cares?". It doesn't matter what characters follow the null-terminator because they are not part of the string.

    We see from the man pages that strncpy will pad the rest of the array, up to the number you tell it, with null characters. But other functions, such as strcpy, do not (nor could strcpy, since it has no idea how big the destination array is). In your experiment the rest of the array is filled with null characters because the array was initialized in its declaration: when a string literal is shorter than the array it initializes, C zero-fills the remaining elements. That holds only for the initialization, not for anything written into the array later.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    142
    Rep Power
    2
    @b49P23TIvg
    Thanks for the correction.

    What opcodes can be constructed from parts of "more"?
    I don't understand this question. I looked up opcode in wiki, which told me that opcodes tell the processor what operations to do.

    @dwise_aol
    Specifically, an empty string would be "".
    So that's how you get an empty string. I had "uninitialized" and "empty" mixed up.

    BTW, if it were a global array (ie, outside a function) then it would be zero'd out automatically and would automatically be an empty string.
    I didn't know about "global" stuff. Since you said "outside a function", I placed the declaration before the function main, then printed the characters in main, and sure enough they were null characters.


    Also I learned about the man page through the two replies I got today.
    Thanks for that, and I'll check them from now on.
  8. #5
  9. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,841
    Rep Power
    480
    What opcodes can be constructed from parts of "more"?
    Your strcat of "more" into an unknown location in memory could land in an executable region of memory. You'd have stuck the contiguous 4 bytes and a nul, without particular alignment, where they would be read as instructions, overwriting the instructions that were meant to be there.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    142
    Rep Power
    2
    Your strcat of "more" into an unknown location in memory could land in an executable region of memory. You'd have stuck the contiguous 4 bytes and a nul, without particular alignment, where they would be read as instructions, overwriting the instructions that were meant to be there.
    Thank you.
    So strcat needs to find the null terminator to know where to append, but in an uninitialized array there may not be one where I expect, so it can store "more" at some random location, and that location could be one whose contents are treated as instructions.
