
Thread: Assembly in Genesis / 32X projects

  1. #1
    Chilly Willy (ESWAT Veteran, joined Feb 2009)

    Using Assembly: AKA - Now we're cooking with GAS

    Let me start by saying inline assembly is a joke - don't bother. If you're going to use some assembly in your project, do it right and make a separate assembly file. It's actually easier than trying to remember all those ridiculous formats inline assembly requires. It's also much more powerful.

    Let's start with 68000 assembly, which you can use in Genesis or 32X projects. The standard assembler in the GNU toolchain is AS (often called GAS); it is part of binutils, not of the GNU Compiler Collection itself. You use "standard" Motorola syntax; the main differences from more common assemblers are the directives. Let's look at a few.

    Code:
            .text
    This sets the current section to the code section. With the linker script I supply with my Genesis toolchain, this puts any code or data following this directive into the rom.

    Code:
            .data

    This sets the current section to the data section, which puts any code or data following the directive into ram... with the proper startup code. In actuality, the code and data after this directive are still stored in the rom, but the linker resolves every reference to them as if they were already at their ram address, so once the startup code copies the section into ram, those references are correct. If you look at the crt0.s file in my example, you'll see the loop that copies the code and data in the data section to ram.
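    The copy step can be pictured with a small C model. This is only a sketch: the real crt0.s is 68000 assembly and gets its addresses from symbols defined in the linker script, while here the hypothetical parameters ram_start, ram_end and rom_image stand in for those symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the crt0 .data copy loop.
   rom_image:          where the linker stored .data in ROM (its load address)
   ram_start, ram_end: where the code expects .data to be (its run address) */
void copy_data_section(uint8_t *ram_start, const uint8_t *ram_end,
                       const uint8_t *rom_image)
{
    /* Copy until the whole section is in RAM; after this, every reference
       the linker resolved to the RAM address points at valid data. */
    while (ram_start < ram_end)
        *ram_start++ = *rom_image++;
}
```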

    Let's briefly look at a few handy directives you'll commonly use.

    Code:
            .byte   0x00,0x01,0x02,0x03
            .word   0x0001,0x0203
            .long   0x00010203
    These directives make tables of bytes, words, and longs, respectively. You can have one value (as in the line with .long) or multiple values (as in the other two). Values are stored in big-endian format since that is the format the 68000 uses. These directives do NOT handle alignment of the data, so be careful with words and longs, which must be on at least a word (even) boundary to avoid an address error exception. Which brings us to our next directive.
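    Because the 68000 is big-endian, all three lines above emit exactly the same four bytes: 00 01 02 03. The C sketch below (put_be is a hypothetical helper, not part of any toolchain) writes values the way .word and .long lay them out, most significant byte first, which can be handy when generating tables on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `value` as a `size`-byte big-endian number, the way .word (size 2)
   and .long (size 4) store it on the 68000: most significant byte first. */
void put_be(uint8_t *out, uint32_t value, size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = (uint8_t)(value >> (8 * (size - 1 - i)));
}
```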

    Code:
            .align  2
    This directive advances the current address, if needed, to the next multiple of the boundary specified. For the 68000 version of AS, the value of the align directive is in bytes: 2 means align to two bytes, 4 means align to 4 bytes, etc. When in doubt, use an align directive. Be liberal - it doesn't hurt and is needed before words, longs, and instructions to avoid an exception. Only byte data can be at an odd address. Many hacks and homebrew programs that work in emulators but fail on real hardware have data or code at an odd address somewhere because they forgot to use the align directive.
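    To see what .align does and which accesses would trap, here is a small C model. align_up and m68k_access_ok are hypothetical helper names; align_up assumes the boundary is a power of two, as it always is for these directives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effect of ".align n" in 68000 AS, where n is a byte count (a power of two):
   round addr up to the next multiple of n, or leave it if already aligned. */
uint32_t align_up(uint32_t addr, uint32_t n)
{
    return (addr + n - 1) & ~(n - 1);
}

/* 68000 rule: byte accesses may use any address, but word and long
   accesses at an odd address raise an address error. */
bool m68k_access_ok(uint32_t addr, unsigned size)
{
    return size == 1 || (addr & 1) == 0;
}
```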

    Here are two more handy directives.

    Code:
            .ascii  "SEGA MD Example "
            .asciz  "SEGA MD Example "
    These turn the string into a byte table of the ASCII values of the string. The difference is the asciz directive always adds a byte of 0x00 to the end of the string. That would be necessary for standard C strings, which MUST be null terminated.
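    A C string literal already contains the terminator that .asciz adds, which is easy to confirm with sizeof. The declarations below are only an illustration with made-up names. Note that the 17-byte .asciz version leaves the current address odd if it started even, so follow it with an .align before any word data or code.

```c
#include <stddef.h>

/* Same bytes as .ascii: 16 characters, no terminator (legal in C when the
   array size is given explicitly and equals the string length). */
const char ascii_version[16] = "SEGA MD Example ";

/* Same bytes as .asciz: 16 characters plus the trailing 0x00. */
const char asciz_version[] = "SEGA MD Example ";
```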

    Now let's look at labels. AS for the 68000 lets you use labels exactly as they are named in C: a C function or variable called foo is simply foo in assembly, with no special characters needed. The labels are case sensitive, so watch the capitalization! Any label you use in assembly that isn't defined in the file is assumed to be defined in another file, so you don't need an extern directive like you see in C (AS accepts .extern for compatibility, but ignores it). To make a label visible to other files, use the global directive.

    Code:
            .global gTicks
    gTicks:
            .long   0
    The global directive tells AS that the specified label may be seen by other files. Labels defined without the global directive stay local to the file, so there is no need for anything like C's static keyword. Labels may refer to code or data, and it's up to the programmer to use them properly. There is no type safety - you can easily use a C array of longs as bytes in an assembly file. You could use C code as data, or vice versa.

    So now how do we write assembly that properly interacts with C? The rules for this are called the Application Binary Interface, or ABI. The 68000 ABI says that when C calls an assembly language function, the function must preserve registers D2 to D7 and A2 to A6. D0, D1, A0, and A1 may be modified freely without saving or restoring them. If you can, only use those registers in your assembly for better speed. When you have to use more registers, you must push them on the stack before you use them (move.l for one register, movem.l for several), and pop them off the stack before returning. When the code is entered, the stack pointer (SP, which is A7) points to a long that is the return address. The long at the stack pointer + 4 is the first argument passed to the function, if any. The long at the stack pointer + 8 is the second argument, if any. Additional arguments are accessed on the stack the same way, at + 12, + 16, + 20, etc. First, in this case, means the left-most argument, with arguments further right at higher addresses. All arguments are pushed as long values regardless of size! If you pass a byte from C to assembly, it's pushed on the stack as a long. The return value, if any, should be left in D0.
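    The offsets follow a simple rule: 4 bytes for the return address, one long for each argument to the left, plus one long for every register you have already pushed. The sketch below (m68k_arg_offset is a hypothetical name) just encodes that rule; with one saved register, as in get_pad further down, the first argument moves from + 4 to + 8.

```c
/* Offset from SP of argument `index` (0 = left-most) in a 68000 function
   called from C, after the function has pushed `saved` long registers
   (count each register saved by movem.l separately).
   The leading 4 skips the return address; every argument takes a long. */
int m68k_arg_offset(int index, int saved)
{
    return 4 + 4 * index + 4 * saved;
}
```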

    Let's look at a simple example.

    Code:
            .align  2
    
    | short set_sr(short new_sr);
    | set SR, return previous SR
    | entry: arg = SR value
    | exit:  d0 = previous SR value
            .global set_sr
    set_sr:
            moveq   #0,d0
            move.w  sr,d0
            move.l  4(sp),d1
            move.w  d1,sr
            rts
    The "|" character indicates the start of a comment. You can also use /* */ for comments. Notice how we put an align directive before the function. Do this if your code ever follows byte data to avoid exceptions. Notice that since we only use D0 and D1, we don't need to save or restore any registers. Notice that the argument (short new_sr) is accessed at the stack pointer + 4. Notice how we put the old version of the status register in D0 as the return value.
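    A common use for set_sr on the Genesis is masking interrupts around code that must not be interrupted. The C helper below is a hypothetical illustration (sr_with_ipl is not part of any library): it builds a new SR value by replacing the 68000 interrupt priority mask, bits 8 to 10, while keeping the supervisor bit (bit 13) and everything else.

```c
#include <stdint.h>

/* Hypothetical helper: return `sr` with the 68000 interrupt mask
   (bits 8-10) set to `level` (0-7); all other bits are kept. */
uint16_t sr_with_ipl(uint16_t sr, unsigned level)
{
    return (uint16_t)((sr & ~0x0700u) | ((level & 7u) << 8));
}

/* Typical use on the Genesis (needs set_sr from the assembly file):
 *   short old = set_sr(0x2700);   supervisor mode, all interrupts masked
 *   ... code that must not be interrupted ...
 *   set_sr(old);                  restore the previous SR
 */
```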

    Now let's look at a more complex example.

    Code:
    | short get_pad(short pad);
    | return buttons for selected pad
    | entry: arg = pad index (0 or 1)
    | exit:  d0 = pad value (0 0 0 1 M X Y Z S A C B R L D U) or (0 0 0 0 0 0 0 0 S A C B R L D U)
            .global get_pad
    get_pad:
            move.l  d2,-(sp)
            move.l  8(sp),d0        /* first arg is pad number */
            cmpi.w  #1,d0
            bhi     no_pad
            add.w   d0,d0
            addi.l  #0xA10003,d0    /* pad data port (0xA10003 or 0xA10005) */
            movea.l d0,a0
            bsr.b   get_input       /* - 0 s a 0 0 d u - 1 c b r l d u */
            move.w  d0,d1
            andi.w  #0x0C00,d0
            bne.b   no_pad
            bsr.b   get_input       /* - 0 s a 0 0 d u - 1 c b r l d u */
            bsr.b   get_input       /* - 0 s a 0 0 0 0 - 1 c b m x y z */
            move.w  d0,d2
            bsr.b   get_input       /* - 0 s a 1 1 1 1 - 1 c b r l d u */
            andi.w  #0x0F00,d0      /* 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 */
            cmpi.w  #0x0F00,d0
            beq.b   common          /* six button pad */
            move.w  #0x010F,d2      /* three button pad */
    common:
            lsl.b   #4,d2           /* - 0 s a 0 0 0 0 m x y z 0 0 0 0 */
            lsl.w   #4,d2           /* 0 0 0 0 m x y z 0 0 0 0 0 0 0 0 */
            andi.w  #0x303F,d1      /* 0 0 s a 0 0 0 0 0 0 c b r l d u */
            move.b  d1,d2           /* 0 0 0 0 m x y z 0 0 c b r l d u */
            lsr.w   #6,d1           /* 0 0 0 0 0 0 0 0 s a 0 0 0 0 0 0 */
            or.w    d1,d2           /* 0 0 0 0 m x y z s a c b r l d u */
            eori.w  #0x1FFF,d2      /* 0 0 0 1 M X Y Z S A C B R L D U */
            move.w  d2,d0
            move.l  (sp)+,d2
            rts
    
    | 3-button/6-button pad not found
    no_pad:
            move.w  #0xF000,d0      /* SEGA_CTRL_NONE */
            move.l  (sp)+,d2
            rts
    
    | read single phase from controller
    get_input:
            move.b  #0x00,(a0)
            nop
            nop
            move.b  (a0),d0
            move.b  #0x40,(a0)
            lsl.w   #8,d0
            move.b  (a0),d0
            rts
    Notice how we use D2, so we must save it at the start and restore it before we return. Because we pushed D2 onto the stack, the arguments are now one long further from the stack pointer; that is why the argument is now accessed at the stack pointer + 8 instead of + 4. Notice that we use both kinds of comments in this function. Be sure to comment your code well enough that you can figure out what you were doing when you look at it years later. Again, the return value is left in D0.
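    The bit juggling in get_pad is easier to check in C. The model below is a hypothetical sketch (pad_pack is not from the original code) that repeats the same steps on the values get_input would return, so you can feed it test patterns on a PC. It assumes phase1, phase3 and phase4 are the results of the first, third and fourth get_input calls, with the TH=0 read in the high byte, the TH=1 read in the low byte, and buttons reading 0 when pressed.

```c
#include <stdint.h>

/* Hypothetical C model of get_pad's bit shuffling.
     phase1: first read  (- 0 s a 0 0 d u - 1 c b r l d u)
     phase3: third read  (- 0 s a 0 0 0 0 - 1 c b m x y z on a six button pad)
     phase4: fourth read (bits 8-11 all 1 only on a six button pad) */
uint16_t pad_pack(uint16_t phase1, uint16_t phase3, uint16_t phase4)
{
    uint16_t d1 = phase1;
    uint16_t d2;

    if (phase1 & 0x0C00)            /* these bits read 0 on a 3/6 button pad */
        return 0xF000;              /* SEGA_CTRL_NONE */

    if ((phase4 & 0x0F00) == 0x0F00)
        d2 = phase3;                /* six button pad */
    else
        d2 = 0x010F;                /* three button pad */

    d2 = (uint16_t)((d2 & 0xFF00) | ((d2 << 4) & 0x00FF)); /* lsl.b #4,d2 */
    d2 = (uint16_t)(d2 << 4);                                /* lsl.w #4,d2 */
    d1 &= 0x303F;                                            /* andi.w #0x303F,d1 */
    d2 = (uint16_t)((d2 & 0xFF00) | (d1 & 0x00FF));          /* move.b d1,d2 */
    d1 = (uint16_t)(d1 >> 6);                                /* lsr.w #6,d1 */
    d2 |= d1;                                                /* or.w d1,d2 */
    return (uint16_t)(d2 ^ 0x1FFF);                          /* eori.w #0x1FFF,d2 */
}
```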

    We are not going to cover more in this post as this should be enough to get you going. If you wish to know more of the directives, especially macros, consult the binutils documentation for AS.


    Now let's look at AS for the SH2. It's very nearly the same as AS for the 68000, as all platforms use most of the same directives, so we mainly need to concern ourselves with the differences. The data directives are the same, but alignment is stricter: words need to be on word boundaries, like the 68000, but longs need to be on long boundaries. The 68000 can take longs on any word boundary; the SH2 cannot. Which brings us to the first difference in the directives: the value used with the align directive is a power of two, so .align n moves the current address to the next multiple of 2^n bytes. For example, each line in the 68000 block below requests the same alignment as the matching line in the SH2 block.

    68000
    Code:
            .align  2
            .align  4
            .align  16
    SH2
    Code:
            .align  1
            .align  2
            .align  4
    SH2 code must be word aligned, but is slightly faster if aligned to a 16 byte boundary (.align 4), which is also the size of an SH2 cache line.
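    The difference between the two assemblers' .align arguments, and the SH2's stricter access rule, can be summed up in a few lines of C. This is a sketch with hypothetical helper names; sh2_access_ok assumes the access size is 1, 2 or 4 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* SH AS ".align n" takes a power of two, so the boundary in bytes is 1 << n.
   68000 AS ".align n" takes the byte count directly. */
uint32_t sh_align_bytes(unsigned n)
{
    return 1u << n;
}

/* SH2 rule: every access must be naturally aligned
   (words on 2-byte boundaries, longs on 4-byte boundaries). */
bool sh2_access_ok(uint32_t addr, unsigned size)
{
    return (addr & (size - 1)) == 0;
}
```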

    Our next difference is in the labels. The SH2 toolchain adds a "_" (underscore) in front of every C name, so in assembly you must put an underscore in front of any label that is defined in C code, or that you wish to access from C. For example:

    Code:
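    ! Cache clear line function
    ! On entry: r4 = ptr - should be 16 byte aligned
    ! _cache_flush is a .long literal (not shown) that is ORed with ptr
    ! to form the cache purge address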
    
            .align  4
            .global _CacheClearLine
    _CacheClearLine:
            mov.l   _cache_flush,r0
            or      r0,r4
            mov     #0,r0
            mov.l   r0,@r4
            rts
            nop
    This function would be called from C like this.

    Code:
        CacheClearLine(&data_variable);
    A few more differences are immediately apparent. You use "!" for a comment instead of "|"; you can still use /* */ for comments. The next difference is in how arguments are passed: the first argument is passed in R4, the second in R5, the third in R6, and the fourth in R7. Any other arguments are passed on the stack, so use no more than four arguments for best speed. The return value, if any, goes in R0. R0 to R7 may be modified freely; you must save R8 to R14, and also the PR register if your function calls other functions (a bsr or jsr overwrites PR with the new return address). For example.

    Code:
    ! Entry: r4 = sound buffer to fill
    
            .global _fill_buffer
    _fill_buffer:
            sts.l   pr,@-r15                /* save return address */
            mov.l   r8,@-r15
            mov.l   r9,@-r15
            mov.l   r10,@-r15
            mov.l   r11,@-r15
            mov.l   r12,@-r15
            mov.l   r13,@-r15
            mov.l   r14,@-r15
            mov     r4,r14
    
    ! lots of code left out here as it isn't needed for the example
    
            mov.l   @r15+,r14
            mov.l   @r15+,r13
            mov.l   @r15+,r12
            mov.l   @r15+,r11
            mov.l   @r15+,r10
            mov.l   @r15+,r9
            mov.l   @r15+,r8
            lds.l   @r15+,pr                /* restore return address */
            rts
            nop
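    Since CacheClearLine works on a single 16-byte cache line, clearing a larger buffer means calling it once for every line the buffer touches. The C sketch below computes those line addresses; cache_lines_for is a hypothetical helper, and the 16-byte line size is that of the SH2 cache. On the 32X you would pass each resulting address to CacheClearLine.

```c
#include <stddef.h>
#include <stdint.h>

#define SH2_CACHE_LINE 16u  /* SH2 cache line size in bytes */

/* Hypothetical helper: store the start address of every 16-byte cache line
   overlapped by [start, start + len) in out[] (at most max entries) and
   return how many were stored. */
size_t cache_lines_for(uintptr_t start, size_t len, uintptr_t *out, size_t max)
{
    size_t n = 0;

    if (len == 0)
        return 0;

    uintptr_t line = start & ~(uintptr_t)(SH2_CACHE_LINE - 1);  /* round down */
    uintptr_t end = start + len;

    while (line < end && n < max) {
        out[n++] = line;            /* on the 32X: CacheClearLine((void *)line) */
        line += SH2_CACHE_LINE;
    }
    return n;
}
```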
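    A practical example of an assembly routine is my code to stretch the frame buffer contents: it takes the first half of each line and doubles every pixel horizontally to fill the entire line, optionally (when interp is positive) blending each added pixel with its right-hand neighbor.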

    Code:
    ! void ScreenStretch(int src, int width, int height, int interp);
    ! On entry: r4 = src pointer, r5 = width, r6 = height, r7 = interpolate
    
            .align  4
            .global _ScreenStretch
    _ScreenStretch:
            cmp/pl  r7
            bt      ss_interp
    
    ! stretch screen without interpolation
    
    0:
            mov     r5,r3
            shll    r3
            mov     r3,r2
            shll    r2
            add     r4,r3
            add     r4,r2
    1:
            add     #-2,r3
            mov.w   @r3,r0
            extu.w  r0,r1
            shll16  r0
            or      r1,r0
            mov.l   r0,@-r2
            cmp/eq  r3,r4
            bf      1b
    
            /* next line */
            mov.w   ss_pitch,r0
            dt      r6
            bf/s    0b
            add     r0,r4
            rts
            nop
    
    ss_interp:
    
    ! stretch screen with interpolation
    
    0:
            mov     r5,r3
            shll    r3
            mov     r3,r2
            shll    r2
            add     r4,r3
            add     r4,r2
            mov     #0,r7
    1:
            add     #-2,r3
            mov.w   @r3,r0
            mov.w   ss_mask,r1
            and     r0,r1               /* masked curr pixel */
            shll16  r0
            add     r1,r7               /* add to masked prev pixel */
            shlr    r7                  /* blended pixel */
            or      r7,r0               /* curr pixel << 16 | blended pixel */
            mov     r1,r7               /* masked prev pixel = masked curr pixel */
            mov.l   r0,@-r2
            cmp/eq  r3,r4
            bf      1b
    
            /* next line */
            mov.w   ss_pitch,r0
            dt      r6
            bf/s    0b
            add     r0,r4
            rts
            nop
    
    ss_mask:
            .word   0x7BDE
    ss_pitch:
            .word   640
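    For testing or porting, the same algorithm can be written as a C reference model. This is a sketch under a few assumptions: the frame buffer is treated as an array of 16-bit pixels, the 640-byte pitch from ss_pitch is 320 pixels, and stretch_line and screen_stretch are hypothetical names. Like the assembly, it walks each line from right to left so no source pixel is overwritten before it is read, and in interpolation mode the rightmost blend uses 0 as its neighbor, just as the assembly starts with R7 = 0. The 0x7BDE mask clears the low bit of each 5-bit color field (and bit 15), so adding two masked pixels and shifting right averages each field without carries spilling into the next one.

```c
#include <stddef.h>
#include <stdint.h>

#define SS_PITCH_PIXELS 320  /* ss_pitch: 640 bytes = 320 16-bit pixels */
#define SS_MASK 0x7BDE       /* same value as ss_mask */

/* Stretch one line in place: the first `width` pixels become 2 * width. */
void stretch_line(uint16_t *line, int width, int interp)
{
    uint16_t prev = 0;  /* masked pixel to the right; the asm starts with R7 = 0 */

    for (int i = width - 1; i >= 0; i--) {  /* right to left, like the asm */
        uint16_t cur = line[i];
        uint16_t second = cur;              /* no interpolation: plain copy */

        if (interp > 0) {                   /* cmp/pl r7: interp must be positive */
            uint16_t masked = (uint16_t)(cur & SS_MASK);
            second = (uint16_t)((masked + prev) >> 1);  /* average with right neighbor */
            prev = masked;
        }
        line[2 * i] = cur;         /* high word of the long the asm stores */
        line[2 * i + 1] = second;  /* low word */
    }
}

/* Apply stretch_line to `height` lines starting at src. */
void screen_stretch(uint16_t *src, int width, int height, int interp)
{
    for (int y = 0; y < height; y++)
        stretch_line(src + (size_t)y * SS_PITCH_PIXELS, width, interp);
}
```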
    As far as the sections go, with my 32X toolchain, the text section still goes to the rom, while data goes to the SDRAM.

    Okay, that's enough for now. I encourage people to look at the assembly files in the Tic Tac Toe examples, as well as the Yeti3D example.

  2. #2
    Archer (Wildside Expert, joined Jan 2011, Netherlands, The Hague)

    As you know, I already modified your Tic Tac Toe example, and now I have an idea of how to make my sprite move.

    Thanks for this excellent tutorial.
    Learning the genesis one bit at a time

  3. #3
    Chilly Willy

    No problem - just something I whipped up last night. If you have any questions (as I know I skipped a lot), be sure to ask.

  4. #4
    Archer

    YES, I made my sprite movable with the D-pad!

    Here is how in ASM (of course, the sprite is loaded before this):
    Code:
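            .align  2
    | void sprite_update(int x, int y);
    | writes one sprite entry (Y, size/link, tile, X) to VRAM
            .global sprite_update
    sprite_update:
            move.l  8(sp),d0            /* second arg: y */
            move.l  4(sp),d1            /* first arg: x */
            lea     0xC00000,a0         /* VDP data port */
            lea     0xC00004,a1         /* VDP control port */
            move.l  #0x68000002,(a1)    /* VRAM write to 0xA800 */
            move.w  d0,(a0)             /* Y position */
            move.w  #0x0001,(a0)        /* size 1x1 tile, link 1 */
            move.w  #0x0081,(a0)        /* tile 0x81, palette 0 */
            move.w  d1,(a0)             /* X position */
            rts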
    I can call this function from C and just pass the X and Y values as parameters. I am going to extend it to make all sprite options parameters.
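    One way to extend this, sketched in C: build all four words of a VDP sprite attribute entry from parameters, then write them in the same order sprite_update does (Y, size/link, tile attributes, X). make_sprite_entry is a hypothetical helper. It assumes the standard sprite entry layout (horizontal size in bits 10-11 and vertical size in bits 8-9 of the second word, link in bits 0-6) and converts screen pixels to sprite coordinates by adding 128, since the VDP places the top-left corner of the screen at sprite position (128, 128). That offset is also why a sprite with coordinates below 128 is off-screen.

```c
#include <stdint.h>

/* Hypothetical helper: fill out[] with one sprite attribute entry.
   x, y:         screen position in pixels
   hsize, vsize: sprite size in tiles (1-4 each)
   link:         index of the next sprite in the list (0 ends the list)
   tile_attr:    priority, palette, flip bits and tile index, as one word */
void make_sprite_entry(uint16_t out[4], int x, int y,
                       unsigned hsize, unsigned vsize,
                       unsigned link, uint16_t tile_attr)
{
    out[0] = (uint16_t)((y + 128) & 0x3FF);              /* Y position */
    out[1] = (uint16_t)((((hsize - 1) & 3u) << 10) |
                        (((vsize - 1) & 3u) << 8) |
                        (link & 0x7Fu));                 /* size and link */
    out[2] = tile_attr;                                  /* tile attributes */
    out[3] = (uint16_t)((x + 128) & 0x1FF);              /* X position */
}
```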

    I have attached a working demo. The sprite is hiding at the top left, so move it down and right and it will become visible.

    I would like some examples of how to pass byte and word parameters. I have a lot more questions, but for now I am very happy this is working.

  5. #5
    Chilly Willy

    As I mentioned in the article, on the 68000, ALL arguments are passed as longs regardless of their type. If you want to load just the byte or word rather than the whole long, all you need to do is remember that the 68000 is big-endian, so the low byte or word is at the end of the long. So if the first argument is a byte, you read from sp + 7, or if it's a word, you read from sp + 6. The same goes for any other argument: take its long offset and add 2 for a word or 3 for a byte.
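    Here is the same idea modeled in C, as a sketch: store_arg_slot, load_byte_arg and load_word_arg are hypothetical names, and the 4-byte array stands in for one argument slot on the stack. In assembly, the equivalent loads for the first argument are move.b 7(sp),d0 for a byte and move.w 6(sp),d0 for a word.

```c
#include <stdint.h>

/* Model of one 68000 argument slot: the argument is widened to a long and
   stored big-endian, so its low byte is the last byte of the slot. */
void store_arg_slot(uint8_t slot[4], uint32_t value)
{
    slot[0] = (uint8_t)(value >> 24);
    slot[1] = (uint8_t)(value >> 16);
    slot[2] = (uint8_t)(value >> 8);
    slot[3] = (uint8_t)value;
}

/* Byte argument: slot offset + 3 (first argument: sp + 7). */
uint8_t load_byte_arg(const uint8_t slot[4])
{
    return slot[3];
}

/* Word argument: slot offset + 2 (first argument: sp + 6). */
uint16_t load_word_arg(const uint8_t slot[4])
{
    return (uint16_t)((slot[2] << 8) | slot[3]);
}
```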

  6. #6
    Archer

    Quote (originally posted by Chilly Willy):
    As I mentioned in the article, on the 68000, ALL arguments are passed as longs regardless of what they are. If you want to just load the byte or word rather than the whole long, all you need to do is remember the 68000 is big-endian. So if the first argument is a byte, you would read from sp + 7, or if it's a word, you read from sp + 6. Same for any other argument. Just add 2 for words and 3 for bytes.
    I am sorry for the stupid question; I read it too quickly at work. I should have read it more closely. Thanks for explaining it again.

  7. #7
    Chilly Willy

    No problem - it gave me the chance to speak a little more about it, particularly about loading a word or byte from the stack.

  8. #8
    Lastcallhall (Raging in the Streets, joined Jun 2010)

    I wish I was smart enough to understand this... Still, it's a valuable service you're offering up, CW.
