#1
    Can't understand C variable scope, functions and arrays


    I've been trying for about a week but I just can't understand it. I'm reading K&R and I've also looked at videos and other guides on arrays, pointers and functions.

    I want to create a function that asks for the user's name and sets a variable's value that I can use in main(). I can't figure out how to do it.

    Here's an example of what I don't understand.

    Why does this work:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    void question(void);
    char *answer;
    
    int main(void){    
    	printf("What is your name?");
    	question();
    	printf("Your name is %s", answer);
    	return 0;
    }
    
    
    void question() {
        answer = "john";
    }

    but not this:

    Code:
    #include <stdio.h>
    #include <string.h>
    
    void question(void);
    char *answer;
    
    int main(void){    
    	printf("What is your name?");
    	question();
    	printf("Your name is %s", answer);
    	return 0;
    }
    
    
    void question(void){
    	fgets(answer, 100, stdin);
    }
#2
    Code:
    char *answer;
    
    void question(void){
      answer = "some characters";	/* A */
    
      fgets(answer, 100, stdin);	/* B */
    }
    char*answer;

    The compiler reserves space for an address,
    and treats the memory as a pointer to character.
    Since you defined the variable outside a function definition
    the memory is not on the stack.
    And it's initialized to 0 before your program runs.

    In case A, "some characters" is an array of characters (including a trailing NUL character), and the assignment stores the address of its first character in answer.

    Doubly incorrect case B tries to store up to 100 characters into memory starting at the address given by answer.
    We already know that answer is 0, and you just can't store data into memory location 0. (Someone can, of course, but it's not normally you.) But suppose you could store data there: you still didn't allocate space for 100 characters, or for any characters at all. If it were allowed you'd overwrite memory you don't own, with the horrific consequence of WWIII as the program continued to process the unknown.
#3
    Thanks for your reply. Since you linked it, I'm trying to understand the "Memory Layout of C Programs" link as well as your post. I don't get it and it's annoying me. I tried reading it from the start but text segment refers to other areas. So I tried reading heap and stack and have been doing so for a few hours now along with other stuff on the web.

    Originally Posted by b49P23TIvg
    The compiler reserves space for an address,
    and treats the memory as a pointer to character.
    What do you mean by treats the memory as a pointer to character?

    Since you defined the variable outside a function definition
    the memory is not on the stack.
    And it's initialized to 0 before your program runs.

    In case A "some characters" is an array of characters (including a trailing NUL character), and assigns the address of the first character to answer.

    Doubly incorrect case B tries to store up to 100 characters into the address starting with that given by answer.
    We already know that answer is 0, and you just can't store data into memory location 0. (Someone can, of course, but it's not normally you.) But suppose you could store data there, you didn't allocate space for 100 characters, or for any characters at all. If it were allowed you'd overwrite memory with the horrific consequence of WWIII as the program continued to process the unknown.
    I don't get it. Why does B try to store memory at memory location 0 and not A?
#4
    I don't get it. Why does B try to store memory at memory location 0 and not A?
    Where I wrote
    Code:
    void question(void){
      answer = "some characters";	/* A */
    
      fgets(answer, 100, stdin);	/* B */
    }
    I rudely condensed your two situations into one function that "looks like working code" but was explained as "two completely separate cases". Sorry. Yes, if the function were used as I wrote it, fgets would start storing characters at the 's' of "some" in that string. Modifying a string literal is undefined behavior in C; on most current systems the literal lives in read-only memory and the program would crash.

    Now let's suppose characters are 1 byte long and integers occupy 4 bytes.

    char*pc = (char*)8;

    int*pi = (int*)8;

    pc is treated like a pointer to character.
    pc+1 is the address 9.
    *(pc+1) fetches 1 byte from memory at address 9.


    pi is treated like a pointer to integer.
    pi+1 is the address 12.
    *(pi+1) fetches 4 bytes from memory starting at address 12.

#5
    Originally Posted by Jimmy #1
    Why does this work:

    . . .

    but not this:
    First I'll answer your immediate question. Then later I'll try to address your more general questions.

    Your first program is this:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    void question(void);
    char *answer;
    
    int main(void){    
    	printf("What is your name?");
    	question();
    	printf("Your name is %s", answer);
    	return 0;
    }
    
    
    void question() {
        answer = "john";
    }
    answer is a character pointer. Because it's a global, it's initialized before main() runs to a null pointer, which on most systems is represented as address zero. On old real-mode Intel systems such as MS-DOS, physical location zero held a special and vitally important data structure, the Interrupt Vector Table (IVT), and nothing stopped a program from clobbering it, which is one reason MS-DOS was so prone to crashing. Current operating systems protect themselves and their users by giving each process (what your program is called when it's running) its own range of memory addresses (its "memory space") and by leaving the page at address zero unmapped, so location zero in your process is not the IVT; it's simply memory you don't own. Any process that tries to access a memory location outside its own memory space gets terminated immediately, usually with an "access violation" or "segmentation fault" error.

    While that's a problem for your second program, it's not for the first. In your first program's question() function you have this assignment statement:
    answer = "john";
    Here's what it does. "john" is a literal string, so what the compiler did was to create that string in the program's memory (on most systems in a read-only location) and remember the memory address of that string. That memory address is a char pointer value; a pointer is a variable that contains the memory address of other data, such that it "points to" that data. answer is also a char pointer. What that assignment statement does is assign the address of the string to answer.

    So now answer points to that literal string. answer is no longer a null pointer; it has been given a valid address. And the only use that you put answer to is to read what it points to. But string literals must not be modified (doing so is undefined behavior, and on most systems they sit in read-only memory), so if you were to try to write to it, your program would most likely be terminated with a memory access violation; you can look at a read-only location, but you cannot change its contents.

    Now, if you wanted to copy one string to another, then that is not the way to do it. C has no built-in string type. C-style strings are merely a programming convention wherein char arrays are used as strings which are null-terminated (meaning that a zero character, '\0', marks the end of the string). The standard library provides a set of string manipulation functions via the string.h header file. If you want to compare two strings, you need to use a form of strcmp(); comparing two char pointers (which is effectively what a char array name is) will only compare the actual memory addresses and not their contents. If you want to copy a string, then you need to use a form of strcpy(); using an assignment statement will only assign the address itself to the pointer on the left, such that now both will be pointing to the exact same string and the only copy of that string in existence. And if that "pointer" on the left is an array name, then your program will not even compile since you cannot change the location of an array (one of the more important ways in which an array name is not the same as a pointer). But of course, for strcpy to work, the destination pointer must point to memory that has been allocated and that is accessible; I will discuss that at the end of this message.

    Now for your second program:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    void question(void);
    char *answer;
    
    int main(void){    
    	printf("What is your name?");
    	question();
    	printf("Your name is %s", answer);
    	return 0;
    }
    
    
    void question(void){
    	fgets(answer, 100, stdin);
    }
    Again, answer is a global, so it's initialized to a null pointer and points at no usable memory. The fgets() call in question() attempts to write up to 100 bytes starting at that address, which is undefined behavior. On a current operating system address zero isn't mapped into your process, so the operating system terminates your program with extreme prejudice. Having read what I described above, you know why that happened.

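    In order for that fgets call to work, answer must point to memory that fgets can safely write to. The simple solution would be to declare answer as a char array of size 100. fgets(answer, 100, stdin) stores at most 99 characters and then appends the null-terminator itself, so the size you pass should be the full size of the buffer. (With functions that don't take a size, such as strcpy(), you are the one who must leave room for one more character for the null-terminator.) Or you could use malloc() or calloc() to allocate a block of memory large enough (100 bytes) from the heap. But however you do it, you must somehow give answer memory to point to so that fgets() can use it.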

#6
    Memory is a physical component in your computer's hardware. It is organized as a consecutive sequence of locations, each containing a value; in personal computers, each memory location contains one byte, 8 bits, of data. Each and every memory location has a unique memory address. In C, a pointer is a variable that contains a memory address. The concept of pointers is really that simple.


    Concentrate on C's concepts of storage classes and in particular on the concepts of static storage vs auto. What that basically boils down to is the question of where static and auto variables are stored and hence when they exist during the execution of the program. Variables that are declared within functions are local variables and, by default, are stored in temporary memory, in auto storage, that only exists while the function is running and hence will cease to exist when you return from that function. In Intel machines (eg, Windows, Linux), that temporary memory is the stack. Variables that are declared outside of functions are global variables and are stored in static storage which exists throughout the execution of the program. Those are stored in the DATA and BSS segments, which are read/write; constant data such as literal strings usually goes in the read-only TEXT segment or a separate read-only data area.

    Consider what happens when you declare a local variable as static. It is still only visible and accessible within the function, but instead of being stored on the stack it's stored in the DATA segment. That means that a static local variable does not cease to exist when you return from the function, but rather it persists throughout the running of the program. The practical use for this is a variable that remembers what it was set to the last time you called that function.

    When you run your program, the operating system reads information in the executable file to determine how big a memory space to allocate for it. Then the operating system copies the program's executable code into memory, since that's the only place that the program can be executed from (ie, you cannot run a program from the disk; it has to be from memory), and it also sets up the different memory segments for the program:

    1. TEXT -- read-only segment that contains the executable code and constant data that cannot change; eg, literal strings. This segment is protected from being changed.

    2. DATA and BSS -- read/write segments for global variables, differing only in how they're initialized during start-up.
    This is the static storage area.

    3. STACK -- read/write segment for the stack, which is what the entire mechanism of function calls is based on. When you call a function, memory is allocated on the top of the stack ("pushed onto the stack") for that function and its local variables plus other data needed for the function call (including the return address). When you return from a function, then that memory is deallocated ("popped off the stack") and is ready to be reused by the next function call. As you make function calls within function calls, you push more and more onto the stack, and as you return from those calls you pop each successive layer off the stack, such that the top of the stack is always the current function's memory block.
    This is the auto storage area.

    4. HEAP -- read/write segment of memory used for dynamic memory allocation via the malloc() family of functions.


    Now, that appears to me to explain it all. If there are still parts you don't understand, please try to ask specific questions.

    Last edited by dwise1_aol; November 21st, 2012 at 01:35 PM.
#7
    Thanks for the help. I don't fully understand it and still have to re-read it a few more times tomorrow, however thanks to the replies I can at least do these things practically, although I still don't see the use in pointers.
#8
    Originally Posted by Jimmy #1
    ... , although I still don't see the use in pointers.
    I understand. I thought the exact same thing the very first time that I was taught about pointers. I immediately understood the concept of pointers, since it is identical to indirect addressing in assembly, yet interesting though the idea may be I could see no practical use. That lecture was a special presentation given us towards the end of the semester. Starting with the next semester I was using pointers most of the time, not because it was required but rather because they were so useful for the programs I was writing. And that was for other languages (PL/I, Pascal) which lack C's very vital need for pointers.

    Just about every programming language has functions and procedures (in C, procedures are functions that return void) and a mechanism for passing arguments to those functions and procedures. Those mechanisms involve two basic concepts:

    1. call-by-value. The argument is evaluated and a copy of that value is passed to the function. All operations performed by the function on that value, including changing it, are local to the function and do not affect the original argument in any way. Thus, a call-by-value argument can be any expression, including another function call.

    2. call-by-reference. In this scheme, the address of the argument is passed to the function, which uses that address to access the original argument. Thus, any changes that the function makes to the argument will change the original argument. A call-by-reference argument must be something with an address, such as a variable (in other languages) or an actual address (AKA pointer).

    In C, you need to be able to do call-by-reference, but C passes every argument by value; as with strings, there is no built-in mechanism for call-by-reference. Thus, just as with strings, in C you need to fake it, you need to implement it yourself. In order to fake call-by-reference, you need pointers.


    Here's a simple example: a swap function. You have two integers, a and b, that you must swap such that you end up with a holding b's old value and vice-versa. Here's the program without pointers:
    Code:
    #include <stdio.h>
    
    void swap(int a, int b)     /* a and b are copies of the caller's values */
    {
        int temp;
    
        temp = a;
        a = b;                  /* only the local copies are exchanged */
        b = temp;
    }
    
    int main()
    {
        int x = 10;
        int y = 20;
    
        printf("Before swap:  x = %d, y = %d\n", x, y);
        swap(x,y);
        printf("After swap:  x = %d, y = %d\n", x, y);
        
        return 0;
    }
    Here's what we get when we run it:
    Before swap: x = 10, y = 20
    After swap: x = 10, y = 20
    That is not what we want. Even though swap() did swap the values, that had no effect on the original arguments, rendering swap() useless to us.

    Now I'll implement call-by-reference using pointers:
    Code:
    #include <stdio.h>
    
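    void swap(int *a, int *b)   /* a and b hold the addresses of the caller's variables */
    {
        int temp;
    
        temp = *a;              /* read the value stored at address a */
        *a = *b;                /* write through the pointers, changing the originals */
        *b = temp;
    }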
    
    int main()
    {
        int x = 10;
        int y = 20;
    
        printf("Before swap:  x = %d, y = %d\n", x, y);
        swap(&x,&y);
        printf("After swap:  x = %d, y = %d\n", x, y);
        
        return 0;
    }
    Note that I use the address operator (&) in the function call and that the function now expects pointer arguments and within the function I dereference those pointers to get to their values and to store new values at those addresses. Here's the output:
    Before swap: x = 10, y = 20
    After swap: x = 20, y = 10
    Which is exactly what we needed swap() to do for us. Now it is useful.

    That is only one use for pointers. Another common one is that you can create arrays during run-time by calloc'ing from the heap, which requires that you use a pointer.

    You will learn how vital pointers are.
