#1
  1. Commie Mutant Traitor
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Jun 2004
    Location
    Alpharetta, GA
    Posts
    1,806
    Rep Power
    1570

    HOWTO: Structures in assembly language


    I've been in a class on assembly programming recently, and I wrote the following piece up in response to a question from another student. I thought it might be worth posting here, as I know that we get questions along these lines from time to time, and it might be useful even for those who aren't doing any assembly work. The code examples are for the MIPS processor, and specifically the SPIM simulator, but the ideas mostly apply to pretty much any more or less conventional architecture.

    Conceptually at least, a structure in assembly language is like a structure in any other language: a fixed group of heterogeneous variables which have their own names but which are part of a larger composite variable. For example, in C++, a struct might look like
    Code:
    struct {
       int foo, bar;
       char baz;
    } quux;
    The vars inside the braces would all be part of the composite variable quux; to access one part of quux, say foo, you would use quux.foo or something similar.

    It's much the same in assembly language, except of course that the types of the variables are implicit rather than explicit. In the SPIM assembly language, the definition of structures has to be a set of programming conventions rather than an actual program construct; some more powerful assemblers do have ways of defining structures, but that is nothing more than a syntax support for such a convention, more or less. So, if we agree to use labels which start with the name of the structure as a whole, followed by a period, and then the name of the element in question, you would end up with:
    Code:
    quux:      		 # blind label marking the start of the 
               		 # structure, the same actual address as 
               	 	 # quux.foo in this case
    quux.foo:	.space 4 # a four-byte integer variable
    quux.bar:    	.space 4 # another int variable
    quux.baz:    	.space 1 # a one-byte char variable 
    		.align 4 # fix the word alignment after baz
    Note that because baz was a one byte value, we need to pad three bytes after it to ensure word alignment. The whole structure is 9 bytes long (12 if you count the padding, which can be important later).

    Now, you might say that this is a lot of pointless work for a single structure; while you can treat the composite variable as a single structure, you really have three separate values, with separate labels. In this case, it is true that they could have been just as easily treated as separate data values; however, this is equally true in the C example above. However, declaring a single struct in this manner is useful for showing the logical and conceptual connections between the elements; it shows that they are represent something as a whole.

    That having been said, the real power in the idea of data structures is not from declaring a single structure in this manner, but from declaring types of structures. Again, in C++, you could define a structure type:
    Code:
    struct quux {
       int foo, bar;
       char baz;
    };
    
    quux flarp, *frink;
    Now, instead of having a single struct named quux, we define a sort of pattern or template (not a template in the C++ sense, but bear with me) for defining structures of the same type. In assembly language, this sort of structure type would be defined not as a series of labels, but as a series of offsets, like so:
    Code:
    quux.sizeof = 12  # size of quux structs, incl. padding
    quux.foo    = 0   # first element of the quux structure
    quux.bar    = 4   # 2nd element of the quux structure
    quux.baz    = 8   # third element of the quux structure
    You would then define a variable to hold it that is the size of the whole structure:
    Code:
    flarp    .space quux.sizeof
    Note defining a pointer to a struct would still be the regular size of a pointer, one full word in the case of MIPS:
    Code:
    frink    .space 4   # size of the ptr, not quux
    
    # ...  do things ...
    	la	$t0, flarp
            la	$t1, frink
    	sw	$t0  ($t1)   # frink now points to flarp
    At this point, the structure variables that make up flarp are implicitly defined by the offsets rather than defined by explicit labels. To use the struct you would get a pointer to the start of the struct variable, then use that pointer plus the defined offsets to access the structure elements:
    Code:
    	lw	$t0, flarp
    	lb	$t1, quux.baz($to)   # get the value of flarp.baz
    Now we're ready to see the real advantage of structures: defining values at runtime, either on the stack or in the heap. To dynamically allocate a struct of a given type, you just allocate the space need for it:
    Code:
    	addi	$sp, $sp, -20
    	sw	$fp, $fp.fp($sp)
    	move	$fp, $sp
    	sw	$ra, fp.ra($fp)
    	addi	$t0, $fp, -20	# location of the struct on the stack
    	la	$t1, frink
    	sw	$t0, ($t1)	# frink now points to the structure
    Note that the 12 bytes has to be in addition to anything else you are saving on the stack, such as the frame pointer, the return address, the s-registers, and so on; the struct (and any other local variables) would come after everything else being saved. This is also why we padded the struct to word alignment rather than halfword alignment - in the MIPS processor, values on the stack are always assigned in word increments (the same applies to heap structures).

    Just as in C, you have to be careful with pointers to values on the stack; once the stack is cleared, the variable is gone, and trying to use a pointer to it after that may severely corrupt your stack. This can be a tricky problem, as it will only become apparent if the stack grows up to that point in memory again; it may not show up for several function calls later.

    To get a structure on the heap, you would do much the same:
    Code:
    	li	$a0, quux.sizeof
    	li	$v0, malloc	# malloc == syscall 9
    	syscall
    	sw	$v0, frink	# save the pointer to the allocated space
    Again, you have to be careful not to access a structure pointer after the struct has been freed; while this doesn't come up in SPIM (as there is no system call to free memory, and thus anything allocated in heap never gets reclaimed), it certainly would be important in a real-world system.

    In either case, you now can use the structure offsets to handle the elements of the structure, just as if they were in the global memory (at least throughout the lifetime of the structure variable). However, with heap variables, you can also create more elaborate data structures using structs and pointers, such as a linked list or tree:
    Code:
    charNode.sizeof	= 8
    charNode.letter = 4	# again, note the need to pad the aligment
    charNode.next	= 4	# pointer to the next struct in the list
    
    charList	.space	4	# ptr to head of character list
    
    
    # add an 'a' into the front of the list
    
    	li	$a0, charNode.sizeof
    	li	$v0, malloc
    	syscall
    	move	$a0, $v0
    	jal	insert
    
    # ...
    
    insert:
    	# sanity check - is the ptr to the new node valid?
    	bnez	$a0, insert_add
    	li	$v0, -1		# return error code
    	jr	$ra	
    	
    insert_add:
    	lw	$t0, charList   	# get list head ptr
    	sw	$t0, charNode.next($a0)	# set node.next == old head
    	sw	$a0, charList		# set the head to the new node
    	move	$v0, $a0		# return ptr to new list head
    	jr	$ra
    At this point, I've gone waaay far afield, but I hope that this has clarified more than it has confused. HTH, C&CW.

    Comments on this post

    • Yegg` agrees : Simply stunning ;). This is one of those posts that anyone interested in assembly should definately take a look at.
    Last edited by Schol-R-LEA; December 10th, 2006 at 03:19 PM.
    Rev First Speaker Schol-R-LEA;2 JAM LCF ELF KoR KCO BiWM TGIF
    #define KINSEY (rand() % 7) λ Scheme is the Red Pill
    Scheme in Short Understanding the C/C++ Preprocessor
    Taming Python A Highly Opinionated Review of Programming Languages for the Novice, v1.1

    FOR SALE: One ShapeSystem 2300 CMD, extensively modified for human use. Includes s/w for anthro, transgender, sex-appeal enhance, & Gillian Anderson and Jason D. Poit clone forms. Some wear. $4500 obo. tverres@et.ins.gov
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2006
    Posts
    271
    Rep Power
    15
    Non-toy assemblers implement structs and build the offsets for you automatically.
  4. #3
  5. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    Do they implement structures in the manner that Schol-R-LEA did, or do they use macros?
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2006
    Posts
    271
    Rep Power
    15
    They implement them like a C compiler would from my perspective as a programmer.

    Microsoft MASM has had this capability (and unions) for over 20 years.
  8. #5
  9. Commie Mutant Traitor
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Jun 2004
    Location
    Alpharetta, GA
    Posts
    1,806
    Rep Power
    1570
    This is true; I mentioned it in passing, but didn't give more detail because a) I originally wrote this for some fellow students in a class that used SPIM, b) the syntax for structures is different from one assembler to another, and c) the assembler pretty much just automates what I was showing, and I thought it important to show what the structures become in memory, at runtime.
    Rev First Speaker Schol-R-LEA;2 JAM LCF ELF KoR KCO BiWM TGIF
    #define KINSEY (rand() % 7) λ Scheme is the Red Pill
    Scheme in Short Understanding the C/C++ Preprocessor
    Taming Python A Highly Opinionated Review of Programming Languages for the Novice, v1.1

    FOR SALE: One ShapeSystem 2300 CMD, extensively modified for human use. Includes s/w for anthro, transgender, sex-appeal enhance, & Gillian Anderson and Jason D. Poit clone forms. Some wear. $4500 obo. tverres@et.ins.gov
  10. #6
  11. Commie Mutant Traitor
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Jun 2004
    Location
    Alpharetta, GA
    Posts
    1,806
    Rep Power
    1570
    I know that I'm getting into a bit of an ego-wank with this, but I just came across another e-mail conversation I'd had with the same student around that time, explaining the use of jump tables (again, in SPIM assembly). It's not a very complete explanation, or as clear as (I hope) the structures one was, but I think I might as well share it for the same reasons (I've reformatted it somewhat for readability):

    Originally Posted by Schol-R-LEA
    Originally Posted by Todd S.
    I was never clear on his version of the case statement.

    I think it starts like this
    Code:
    #######################
       .data
    case_options:  .asciiz  "OCND:
                  .align 4
    case_actions   .asciiz  "OPEN,CLOSE,NEXT,DONE"
       .text
    ########################
    ### and then something like this
    ########################
        la    $s0,case_options
        la    $s1,case_actions
        li    $v0,read_char
        move  $t0,$v0
        li   $t1,0
        j    case
    caseloop:
        lb   $t2,0($s0)    # Load first byte in valid selection
       bne  $t0,$t2,caseloop
    If this is correct so far, I have 2 questions
    1. how exactly do you choose the corresponding action?
    2. Do you count though the loop, or how do you set up a default action?
    Technically, what he uses there isn't a case statement; what he's doing is replacing the case statement with a data structure called a 'jump table', which holds an array of function pointers. He's then using a second table, an array of characters, as a lookup table: the position of the character code in the lookup table corresponds to the matching function pointer in the jump table.

    The main thing you are doing with this that you are creating an array of pointers to functions, then somehow indexing that array to find the function you want to call ([the professor] used a lookup table in his example, but you can use any means you like, so long as you end up with the correct offset for the function pointer you want). You then load the address of the function into a register and use the 'jump and link register' (jalr) instruction to call the actual function.

    Here's a more correct version of the code you gave (though I haven't actually tested it, so there are probably still some errors in it):

    Code:
    #######################
       .data
    case_options:  .asciiz  "OCND"
                  .align 4
    case_actions   .asciiz  "OPEN,CLOSE,NEXT,DONE"
       .text
    ########################
    ### and then something like this
    ########################
       la    $s0,case_options
       la    $s1,case_actions
       li    $v0,read_char
       syscall
       move  $t0,$v0
       
       b     case
    caseloop:
       beq   $t0, $t2, case_found   # if a match, break out of loop
    
       addi  $s0, $s0, 1   # increment the lookup index by one char
       addi  $s1, $s1, 4   # increment the jump table index by one ptr
    case:
       lb    $t2, 0($s0)    # get the current lookup value
       bnez  $t2,caseloop   # if not at end of lookup table, continue
    
       # no match found, give an error
       # ... some code here that handles the error case
    
    # match found, now jump to the selection
    case_found:
       jalr  $s1          # call the function pointed to by $s1
    
    # and continue with the menu loop, if any
    All of this depends on there being functions named OPEN, CLOSE, NEXT, and DONE, of course. One advantage of a jump table is that to add new functions to it and all you need do is add the function pointer to the first array, and a corresponding character code in the same position in the lookup table.

    It might be easier to get an idea of what is happening with the equivalent C++ code:
    Code:
    void open();
    void close();
    void next();
    void done();
    
    bool menu(char ch)
    {
        // note that a function's name by itself
        // is treated as a pointer to the function
        void (*jump_table)()[] = {open, close, next, done};
    
        jtbl_options = "OCND";
        // the indices of the chars in the string array
        // corresponds to those of the functions
        // they represent
    
        for (int i = 0; NULL != jtbl_options[i]; ++i)
        {
            if (ch == jtbl_options[i])
            {
                (jump_table[i])();
                return true;
            }
        }
        
        // ch is not one of the listed options
        return false;
    }
    OK, so maybe that didn't help much. Anyway...
    Despite the last line of the original - HTH.
    Rev First Speaker Schol-R-LEA;2 JAM LCF ELF KoR KCO BiWM TGIF
    #define KINSEY (rand() % 7) λ Scheme is the Red Pill
    Scheme in Short Understanding the C/C++ Preprocessor
    Taming Python A Highly Opinionated Review of Programming Languages for the Novice, v1.1

    FOR SALE: One ShapeSystem 2300 CMD, extensively modified for human use. Includes s/w for anthro, transgender, sex-appeal enhance, & Gillian Anderson and Jason D. Poit clone forms. Some wear. $4500 obo. tverres@et.ins.gov

IMN logo majestic logo threadwatch logo seochat tools logo