Using Assembly: AKA - Now we're cooking with GAS
Let me start by saying inline assembly is a joke - don't bother. If you're going to use some assembly in your project, do it right and make a separate assembly file. It's actually easier than trying to remember all those ridiculous formats inline assembly requires. It's also much more powerful.
Let's start with 68000 assembly; you may use it for Genesis or 32X projects. The standard assembler that goes with the GNU Compiler Collection is AS (the GNU assembler, hence GAS); it is a part of binutils. You use "standard" Motorola syntax; the main difference from more common assemblers is in the directives. Let's look at a few.
The .text directive sets the current section to the code section. With the linker script I supply with my Genesis toolchain, this puts any code or data following this directive into the rom.
The .data directive sets the current section to the data section. This puts any code or data following the directive into ram... with the proper startup code. In actuality, the code and data after this directive are still in the rom, but all references are set so that when the startup code copies this code and data into ram, they will be referenced properly. If you look at the crt0.s file in my example, you'll see the loop that copies the code and data in the data section to ram.
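As a minimal sketch of the two section directives in use (the labels entry_stub and frame_counter are made up for illustration, and where each section ends up depends on your linker script and startup code):

```asm
        .text                   /* what follows goes in the code section (rom) */
entry_stub:                     /* hypothetical label */
        rts

        .data                   /* what follows is copied to ram by the startup code */
frame_counter:                  /* hypothetical label */
        .long   0               /* initial value lives in rom until crt0 copies it */
```

You can switch back and forth between the two sections as often as you like in one file; AS collects everything for each section together.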
Let's briefly look at a few handy directives you'll commonly use.
The .byte, .word, and .long directives make tables of bytes, words, and longs, respectively. You can have one value or multiple values separated by commas. Values are stored in big-endian format since that is the format the 68000 uses. These directives do NOT handle alignment of the data, so be careful with words and longs, which must be on at least a word boundary to avoid an exception. Which brings us to our next directive.
The .align directive advances the current address to the next boundary specified. For the 68000 version of AS, the value of the align directive is in bytes. 2 means align to two bytes, 4 means align to 4 bytes, etc. When in doubt, use an align directive. Be liberal - it doesn't hurt and is needed before words, longs, and instructions to avoid an exception. Only byte data can be on a byte boundary. Hacks or homebrew that work in emulators but fail on real hardware usually have data or code on a byte boundary somewhere because the author forgot to use the align directive.
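A short sketch of the three data directives (the label names and values are invented for illustration):

```asm
        .data
byte_table:                     /* hypothetical label */
        .byte   0x01, 0x02, 0x03, 0x04  /* four bytes: 01 02 03 04 */
word_table:                     /* hypothetical label */
        .word   0x1234, 0xABCD          /* two words, stored as 12 34 AB CD */
one_long:                       /* hypothetical label */
        .long   0x00FF0000              /* one long, stored as 00 FF 00 00 */
```

The words and the long only land on even addresses here because the byte table happens to hold an even number of bytes; with three bytes they would not.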
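For instance, a sketch with made-up labels showing where the align directive is needed after byte data:

```asm
        .data
flag:                           /* hypothetical label */
        .byte   1               /* one byte: the current address is now odd */
        .align  2               /* advance to the next even address */
value:                          /* hypothetical label */
        .word   0x1234          /* safe: on a word boundary */
        .align  4               /* advance to the next multiple of 4 bytes */
big_value:                      /* hypothetical label */
        .long   0x12345678      /* on a long boundary */
```

Without the first align directive, reading value as a word on real hardware would raise an address error.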
Here are two more handy directives.
The .ascii and .asciz directives turn a string into a byte table of the ASCII values of the string. The difference is that the asciz directive always adds a byte of 0x00 to the end of the string. That is necessary for standard C strings, which MUST be null terminated.
.ascii "SEGA MD Example "
.asciz "SEGA MD Example "
Now let's look at labels. AS for the 68000 allows you to use labels exactly the same as in C. There are no special characters needed. The labels are case sensitive, so watch the capitalization! Any label you use in assembly that isn't defined in the file is assumed to be defined in another file, so you don't need an extern declaration like you do in C. To make a label visible to other files, you use the global directive.
The .global directive tells AS that the specified label may be seen by other files. Any label defined without the global directive is local to the file, so there is no static directive like you see in C. Labels may refer to code or data. It's up to the programmer to use what the labels refer to properly. There is no type safety - you can easily use a C array of longs as bytes in an assembly file. You could use C code as data, or vice versa.
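Here is a small sketch of how these label rules play out. All the names (reset_ticks, tick_count, local_helper) are hypothetical; tick_count is assumed to be a long variable defined in some C file.

```asm
| C side (assumed):  long tick_count;  void reset_ticks(void);

        .text
        .align  2
        .global reset_ticks     /* visible to C and to other assembly files */
reset_ticks:
        clr.l   tick_count      /* not defined here, so AS leaves it for the linker */
        bsr     local_helper
        rts

local_helper:                   /* no .global: only visible inside this file */
        rts
```

Nothing stops you from treating tick_count as bytes or words instead of a long; AS will not complain.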
So now how do we write assembly code that properly interacts with C? The rules for this are referred to as the Application Binary Interface, or ABI. The 68000 ABI says that an assembly language function called from C must preserve registers D2 to D7, and A2 to A6. D0, D1, A0, and A1 may be modified freely without saving or restoring them. If you can, only use those registers in your assembly for better speed. When you have to use more registers, you must push them on the stack before you do so, and pop them off the stack before returning.
When the code is entered, the stack pointer points to a long that is the return address. The long at the stack pointer + 4 is the first argument passed to the function, if any. The long at the stack pointer + 8 is the second argument passed to the function, if any. Additional arguments, if any, are accessed on the stack in a similar manner, being at + 12, + 16, + 20, etc. First, in this case, means the left-most argument, with the arguments to its right found at higher addresses. All arguments are pushed as long values regardless of size! If you pass a byte from C to assembly, it's pushed on the stack as a long. The return value, if any, should be left in D0.
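To make the stack layout concrete, here is a minimal sketch of a two-argument function. The function add_shorts is hypothetical and exists only to show the offsets.

```asm
| short add_shorts(short a, short b);   (hypothetical)
| entry: 4(sp) = a, 8(sp) = b, each pushed as a long
| exit:  d0 = a + b

        .text
        .align  2
        .global add_shorts
add_shorts:
        move.l  4(sp),d0        /* first (left-most) argument */
        move.l  8(sp),d1        /* second argument */
        add.w   d1,d0           /* result goes back in d0 */
        rts
```

Only D0 and D1 are touched, so nothing needs to be saved or restored.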
Let's look at a simple example.
The "|" character indicates the start of a comment. You can also use /* */ for comments. Notice how we put an align directive before the function. Do this if your code ever follows byte data to avoid exceptions. Notice that since we only use D0 and D1, we don't need to save or restore any registers. Notice that the argument (short new_sr) is accessed at the stack pointer + 4. Notice how we put the old version of the status register in D0 as the return value.
| short set_sr(short new_sr);
| set SR, return previous SR
| entry: arg = SR value
| exit: d0 = previous SR value
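A body matching the description above could look like the following. Treat it as a sketch reconstructed from that description rather than the exact original listing.

```asm
        .text
        .align  2               /* in case this follows byte data */
        .global set_sr
set_sr:
        moveq   #0,d0           /* clear d0 so the upper word is zero */
        move.w  sr,d0           /* d0 = previous SR, our return value */
        move.l  4(sp),d1        /* new_sr was pushed as a long at sp + 4 */
        move.w  d1,sr           /* load the new SR (needs supervisor mode) */
        rts
```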
Now let's look at a more complex example.
Notice how we use D2, so we must save it at the start, and restore it before we return. Because we pushed D2 onto the stack, any arguments are now one long further back on the stack; that is why the argument is now accessed at the stack pointer + 8 instead of + 4. Notice that we use both kinds of comments in this function. Be sure to comment your code well enough that you can figure out what you are doing when you look at your code years later. Again, the return value is left in D0.
| short get_pad(short pad);
| return buttons for selected pad
| entry: arg = pad index (0 or 1)
| exit: d0 = pad value (0 0 0 1 M X Y Z S A C B R L D U) or (0 0 0 0 0 0 0 0 S A C B R L D U)
move.l 8(sp),d0 /* first arg is pad number */
addi.l #0xA10003,d0 /* pad control register */
bsr.b get_input /* - 0 s a 0 0 d u - 1 c b r l d u */
bsr.b get_input /* - 0 s a 0 0 d u - 1 c b r l d u */
bsr.b get_input /* - 0 s a 0 0 0 0 - 1 c b m x y z */
bsr.b get_input /* - 0 s a 1 1 1 1 - 1 c b r l d u */
andi.w #0x0F00,d0 /* 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 */
beq.b common /* six button pad */
move.w #0x010F,d2 /* three button pad */
lsl.b #4,d2 /* - 0 s a 0 0 0 0 m x y z 0 0 0 0 */
lsl.w #4,d2 /* 0 0 0 0 m x y z 0 0 0 0 0 0 0 0 */
andi.w #0x303F,d1 /* 0 0 s a 0 0 0 0 0 0 c b r l d u */
move.b d1,d2 /* 0 0 0 0 m x y z 0 0 c b r l d u */
lsr.w #6,d1 /* 0 0 0 0 0 0 0 0 s a 0 0 0 0 0 0 */
or.w d1,d2 /* 0 0 0 0 m x y z s a c b r l d u */
eori.w #0x1FFF,d2 /* 0 0 0 1 M X Y Z S A C B R L D U */
| 3-button/6-button pad not found
move.w #0xF000,d0 /* SEGA_CTRL_NONE */
| read single phase from controller
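The save and restore pattern from get_pad can also be seen in isolation in this small sketch. The function sum3 is hypothetical and only illustrates how the argument offsets shift after a push.

```asm
| long sum3(long a, long b, long c);   (hypothetical)
| exit: d0 = a + b + c

        .text
        .align  2
        .global sum3
sum3:
        move.l  d2,-(sp)        /* save d2; arguments are now 4 bytes further away */
        move.l  8(sp),d0        /* a: was at 4(sp) on entry */
        move.l  12(sp),d1       /* b: was at 8(sp) on entry */
        move.l  16(sp),d2       /* c: was at 12(sp) on entry */
        add.l   d1,d0
        add.l   d2,d0           /* return value in d0 */
        move.l  (sp)+,d2        /* restore d2 before returning */
        rts
```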
We are not going to cover more in this post as this should be enough to get you going. If you wish to know more of the directives, especially macros, consult the binutils documentation for AS.
Now let's look at AS for the SH2. It's very nearly the same as AS for the 68000 as all platforms use most of the same directives. So we mainly need to concern ourselves with the differences. The data directives are the same, but there is a difference as far as alignment goes - words need to be on word boundaries, like the 68000, but longs need to be on long boundaries; the 68000 can take longs on word or long boundaries - the SH2 cannot. Which brings us to the first difference in the directives - on the SH2, the value used with the align directive is the power of two of the boundary, not the number of bytes. For example, the following are the same.
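As a sketch, these two fragments ask for the same long boundary. They belong in two different source files, one for each assembler, and are not meant to be assembled together.

```asm
/* in a 68000 source file: the value is a byte count */
        .align  4               /* next 4 byte boundary */

/* in an SH2 source file: the value is a power of two */
        .align  2               /* 2 to the power 2 = 4, also the next 4 byte boundary */
```

Be careful when moving data tables from one CPU to the other: an .align 2 copied over from 68000 code gives a 4 byte boundary on the SH2, and an .align 4 gives a 16 byte boundary.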
SH2 code must be word aligned, but is slightly faster if aligned to a 16 byte boundary (.align 4).
Our next difference is in the labels - the SH2 assembler needs a "_" (underscore) character in front of labels if they are defined in C code, or if you wish to access the label from C. For example.
This function would be called from C like this.
! Cache clear line function
! On entry: r4 = ptr - should be 16 byte aligned
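The underscore rule can be sketched with a tiny function. The name add_two is hypothetical. The sketch also relies on two facts: the SH C calling convention returns values in r0, and rts has a delay slot.

```asm
! int add_two(int a, int b);   (hypothetical)
! On entry: r4 = a, r5 = b
! On exit:  r0 = a + b
! Called from C as:  sum = add_two(3, 4);

        .text
        .align  2
        .global _add_two        /* C sees this as add_two, without the underscore */
_add_two:
        mov     r4,r0           /* r0 = first argument */
        rts
        add     r5,r0           /* delay slot: runs before the return completes */
```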
Some more differences are immediately apparent. You use "!" for a comment instead of "|"; you can still use /* */ for comments. The next difference is in how arguments are passed in; the first argument is passed in R4, the second in R5, the third in R6, and the fourth in R7. Any other arguments are pushed on the stack. Use no more than four arguments for best speed. On the SH2, you must save R8 to R14 if you use them, and the PR register if your routine calls another function, since the call overwrites PR. For example.
! Entry: r4 = sound buffer to fill
sts.l pr,@-r15 /* save return address */
! lots of code left out here as it isn't needed for the example
lds.l @r15+,pr /* restore return address */
As far as the sections go, with my 32X toolchain, the text section still goes to the rom, while data goes to the SDRAM.
A practical example of an assembly routine is my code to stretch the frame buffer contents - it takes half a line of data and doubles it to fill an entire line.
! void ScreenStretch(int src, int width, int height, int interp);
! On entry: r4 = src pointer, r5 = width, r6 = height, r7 = interpolate
! stretch screen without interpolation
/* next line */
! stretch screen with interpolation
and r0,r1 /* masked curr pixel */
add r1,r7 /* add to masked prev pixel */
shlr r7 /* blended pixel */
or r7,r0 /* curr pixel << 16 | blended pixel */
mov r1,r7 /* masked prev pixel = masked curr pixel */
/* next line */
Okay, that's enough for now. I encourage people to look at the assembly files in the Tic Tac Toe examples, as well as the Yeti3D example.