N64 Assembly Tutorial - Lesson 4
Updated: N64 Assembly YouTube Series
The same content, updated with more explanation and a live walkthrough of how it's done, in the style of a Twitch stream.
https://www.youtube.com/playlist?list=PLjwOF_LvxhqTXVUdWZJEVZxEUG5qt8fsA
Lesson 4
Is it Magic?
Some programmers write their code, pass it to the compiler, and consider it done if the program works as expected. Most of the time they end up debugging some small part that wasn't quite right. In a high-level language it is still easy to think of the compiler as performing magic. Since we are writing assembly code, we need to pull back the curtain and see how the tricks are performed.
A quick reminder about our environment:
There is no operating system
The assembler is outputting our exact instructions (up to now)
No ELF or PE formatted output
No code libraries are used (yet?)
The bass assembler is very capable, and the way it works is explained in the included doc file (C:\bass\doc\bass.html). I'm going to try to give you an easy mental model of what it does:
Merge all include & insert files (recursively) into a single source file (in memory).
Parse out the replacement blocks & values (macros, defines, constants, variables, scopes)
Replace the replaceable values in the source file.
Calculate any "static calculations"
Example of addition: sw t0,$07C0+16(t1)
Write out the results, calculating Label Locations for branch and jump instructions.
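To make the "static calculations" step concrete, here is a small sketch built on the addition example above. The second line is what I would expect the assembler to emit once the arithmetic has been done; it is an illustration worked by hand, not captured bass output.

```asm
sw t0, $07C0+16(t1)   // as typed: the offset is written as a sum
sw t0, $07D0(t1)      // after the static calculation: $07C0 + 16 = $07D0
```

Both lines should assemble to the same machine instruction, so use whichever form explains your intent better.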
Constants
Constants work much the same way in every programming language: they improve code readability and maintainability. Because assembly language is so terse to begin with, constants matter even more here. I will use the constants available in Peter Lemon's headers from now on, and I'll probably create another header, so check for downloads at the bottom of each lesson. Create your own!
Syntax
constant name(value)
Example
constant BLACK_LOW($00FF)
constant BLACK_HIGH(0)
constant BLACK($000000FF) // See Pseudo Instructions below for use
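As a sketch of how a constant is used, the lines below reuse the BLACK_LOW definition from the example. The last instruction is only there for comparison.

```asm
constant BLACK_LOW($00FF)

addiu t0, r0, BLACK_LOW   // the assembler replaces BLACK_LOW with $00FF
addiu t0, r0, $00FF       // the same instruction, written without the constant
```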
Macros
At first glance Macros look like a function call since they can accept parameters. Not True!
The following rules/guidelines apply to Macros:
They are blocks of code that are copied to the location they are called from
The parameters don't have a type
The parameters are calculated at compile time
If used in multiple places, a new copy will be put in each location.
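Here is a minimal sketch of those rules. StorePixel is a made-up macro, not something from Peter Lemon's headers, and I am assuming the "macro name(parameters) { ... }" form with "{parameter}" substitution that his headers use. Writing real macros is a topic for a later lesson; this only shows the copy-in-place behavior.

```asm
// Hypothetical macro, for illustration only
macro StorePixel(color, address) {
  sw {color}, 0({address})   // parameters are pasted in as text, they have no type
}

StorePixel(t0, t1)   // a copy is placed here: sw t0, 0(t1)
StorePixel(t0, t4)   // a second, separate copy is placed here: sw t0, 0(t4)
```

Nothing is called or returned from at run time; each use simply grows the assembled output by one more copy of the body.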
Pseudo Instructions
Pseudo Instructions are very common on the MIPS platform. Sometimes a single real instruction can do several different jobs with slightly different syntax, and giving each job its own name makes the code more intuitive. In other cases one pseudo instruction gets expanded into several real instructions: commonly 2, but as many as 4. I've hesitated to use these because I feel that they make debugging more complicated, since the code you debug isn't all written by you; some of it was generated by the assembler. Now that we have started using Macros, though, our source code and assembled code aren't going to match anyway. The following instructions can cut the number of code lines in half. Just remember that they reduce readability during debugging, although they may also reduce the number of errors and so shorten the debug portion of development time.
WARNING
Do NOT use a pseudo instruction in a Delay Slot!
If it is a multi-instruction pseudo instruction (the most common kind), only the first of its instructions sits in the Delay Slot, so the group will not be executed together as you intended.
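A sketch of what goes wrong, assuming 'li' with a large value expands to a lui/ori pair (as described in the next section). The labels and register choices are made up for this example.

```asm
FillLoop:
  sw t0, 0(t1)
  addi t1, t1, 4
  bne t1, t2, FillLoop
  li t3, 71*320*4      // BAD: becomes lui + ori, and only the lui sits in the delay slot

// Safer: load the value before the loop and keep the delay slot to one real instruction
  li t3, 71*320*4
FillLoop2:
  sw t0, 0(t1)
  addi t1, t1, 4
  bne t1, t2, FillLoop2
  nop                  // delay slot
```

In the first loop, every time the branch is taken t3 only receives the upper half of the value. The ori half runs just once, after the loop finally falls through.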
li - Load Immediate & la - Load Address
These two instructions behave identically; the different names are a convenience for the reader, making it clear whether you are working with a value or a memory location. The assembler will convert each of them into 2 assembly instructions.
Example:
li t3, 71*320*4
la t2, $A010'0000
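Working the first example through by hand, this is the pair of instructions I would expect 'li' to turn into. It matches the manual fix used for the flag exercise later in this lesson; the exact instructions bass picks may differ.

```asm
li t3, 71*320*4      // 71 * 320 * 4 = 90,880 = $0001 6300

// roughly equivalent to writing:
lui t3, $0001        // upper 16 bits: t3 = $0001 0000
ori t3, t3, $6300    // lower 16 bits: t3 = $0001 6300
```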
The Ethiopia Flag Exercise (Follow-Up)
This exercise was more challenging than I originally thought, and it's a long enough code sample that we will use it as our code to restructure and improve for readability.
First, let's talk about some of the code issues that I ran into. Hopefully this section can help you fix your code or give some more background on each issue.
Off by 1 Error (First Pixel is Wrong Color)
If the delay slot of the looping instruction was used to increment the memory position, you may have noticed that the Memory Position ended up 1 pixel (4 bytes) beyond where you planned.
The problem has at least 2 possible fixes:
Manually calculate the final pixel position (Last Position - 4)
Instead of 'bne' use a new similar instruction 'bnel'
The lowercase L adds the word "likely" to the instruction description. Like "immediate", this extra word changes the behavior: the Delay Slot is executed only if the Branch is taken. The result is that on the last branch test, where the values are equal, the Delay Slot isn't executed and the Memory location isn't incremented. Please try this solution in your own program.
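A minimal sketch of the 'bnel' version of the fill loop, using the same register roles as my flag code (t0 = color, t1 = current pixel, t2 = last pixel to fill). The label name is made up.

```asm
FillLoop:
  sw t0, 0(t1)             // write one pixel
  bnel t1, t2, FillLoop    // keep looping until t1 reaches t2
  addi t1, t1, 4           // delay slot: skipped when the branch is not taken

// execution continues here with t1 == t2, not t2 + 4
```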
Compile Time Calculation Failed
I was using the following line to calculate the number of bytes/pixels that needed to be filled with the new color.
addi t2, t2, 71 * 320 * 4
This line seemed intuitive, didn't show any warnings or errors during compile and seemed to work. It really didn't work. The first sign was that the color ended in the middle of the line not even near the end. The other issue wasn't obvious until I had added the second color block and realized the middle color wasn't in the middle of the screen.
What went wrong?
Hand calculating 71 * 320 * 4 gives 90,880, or $0001 6300. Only the $6300 was being used in the instruction, because the instruction only allows a 16-bit value (15 bits plus a sign bit), so the result was truncated (i.e. the upper bytes of the result were lost). Also, if the 71 had been 80, the result would be $0001 9000, which truncates to $9000. That value has the sign bit set, so it would be treated as a negative number (-$7000).
My fix, since we use the value 3 times, was to load the t3 register with the result of the 71 * 320 * 4 calculation:
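The arithmetic, written out as comments next to the offending instruction. The numbers are worked by hand, and the second instruction is my reading of what effectively ran, not disassembled output.

```asm
// 71 * 320 * 4 =  90,880 = $0001 6300 -> low 16 bits = $6300 = +25,344
// 80 * 320 * 4 = 102,400 = $0001 9000 -> low 16 bits = $9000 = -28,672 when sign-extended

addi t2, t2, 71 * 320 * 4   // as typed
addi t2, t2, $6300          // what effectively ran: adds 25,344 instead of 90,880
```

25,344 bytes is a little under 20 lines of 320 pixels, which is why the color stopped partway through a line long before the band should have ended.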
lui t3, $0001
ori t3,t3,$6300
Then change the 3 occurrences of the above 'addi' line to:
add t2, t2, t3
Clean up the Code
Now we are going to take my solution to the Ethiopia Flag Challenge and use the tricks we just learned to improve the readability and maintainability of the code. Don't type this code in; it is about to get much smaller.
arch n64.cpu
endian msb
output "Lesson4.N64", create
fill 1052672
origin $00000000
base $80000000
include "../LIB/N64.INC"
include "N64_HEADER.ASM"
insert "../LIB/N64_BOOTCODE.BIN"
Start:
lui t0,$BFC0
addi t1,r0,8
sw t1,$7FC(t0)
Video_Init:
lui a0, $A440
addi t0, r0, 3 // 2 = 16 BPP, 3 = 32 BPP
// Gamma, Dither, Serrate, Anti-Alias, Diagnostic
sw t0, $0000(a0)
lui t0, $A010 // Frame Buffer RDRAM Location
sw t0, $0004(a0)
addi t0, r0, 320 // Width in Pixels
sw t0, $0008(a0)
addi t0, r0, $200 // VI vertical intr
sw t0, $000C(a0)
addi t0, r0, 352
sw t0, $0010(a0)
// Conflicting documentation, sticking with a known good NTSC Value
lui t0, $3E5
addi t0, t0, $2239
sw t0, $0014(a0)
addi t0, r0, 525 // Number of Half-Lines Per field
sw t0, $0018(a0)
addi t0, r0, 0 // PAL 5-bit Leap pattern, NTSC = 0
sll t0, t0, 16 // Move current value left by 16 bits
addi t0, t0, 3093 // Total Duration Of A Line In 1/4 Pixel
sw t0, $001C(a0) // 28
addi t0, r0, 3093
sll t0, t0, 16
addi t0, t0, 3093
sw t0, $0020(a0) // 32
addi t0, r0, 108
sll t0, t0, 16
addi t0, t0, 748
sw t0, $0024(a0) // 36
addi t0, r0, 37
sll t0, t0, 16
addi t0, t0, 511
sw t0, $0028(a0) // 40
addi t0, r0, 14
sll t0, t0, 16
addi t0, t0, 516
sw t0, $002C(a0) // 44
addi t0, r0, 0 // Horizontal Sub Pixel Offset
sll t0, t0, 16 // Move current value left by 16 bits
addi t0, t0, 512 // Horizontal Scale Up Factor
sw t0, $0030(a0) // 48
addi t0, r0, 0 // Horizontal Sub Pixel Offset
sll t0, t0, 16 // Move current value left by 16 bits
addi t0, t0, 1024 // Vertical Scale Up Factor
sw t0, $0034(a0) // 52
nop // Marker NOP's
nop
// Red
//lui t0, $FF00
// Green
//lui t0, $00FF
// Blue
// ori t0, r0, $FF00
// Transparency / Black
addiu t0, r0, $00FF
setupTopBlackBar:
// Buffer Start
lui t1, $A010
// Buffer End
lui t2, $A010
ori t2, t2, 13 * 320 * 4
nop
// Marker NOP's
nop
loopTopBlackBar:
sw t0, 0(t1)
bnel t1,t2,loopTopBlackBar
addi t1,t1,4
nop // Marker NOP
CalcMajorLines:
nop
lui t3, $0001
addi t3, t3, $6300
nop
setupGreen:
nop
lui t0, $00FF
add t2, t2, t3
nop
loopGreen:
sw t0, 0(t1)
bnel t1,t2,loopGreen
addi t1,t1,4
nop // Marker NOP
setupYellow:
nop
lui t0, $FFFF
add t2, t2, t3
loopYellow:
sw t0, 0(t1)
bnel t1,t2,loopYellow
addi t1,t1,4
nop // Marker NOP
setupRed:
nop
lui t0, $FF00
add t2, t2, t3
loopRed:
sw t0, 0(t1)
bnel t1,t2,loopRed
addi t1,t1,4
nop // Marker NOP
setupBottomBlackBar:
addiu t0, r0, $00FF
addi t2, t2, 13 * 320 * 4 // addi, not ori: the low 16 bits of t2 are no longer zero here
loopBottomBlackBar:
sw t0, 0(t1)
bnel t1,t2,loopBottomBlackBar
addi t1,t1,4
nop // Marker NOP
addi t2, t2, 71 * 320 * 4
Loop:
j Loop
nop // Delay Slot
The first step is to use 2 Macros that are included in Peter Lemon's Header
include "LIB/N64_GFX.INC" // Include Graphics Macros
N64_INIT() // Run N64 Initialization Routine
ScreenNTSC(320, 240, BPP32, $A0100000) // Screen NTSC: 320x240, 32BPP, DRAM $A0100000
That saved us almost 50 lines of code. There is a link in the sample if you need to download the other header file.
The first of these lines is clearly another include, which we need for the ScreenNTSC() Macro.
N64_INIT() is a very simple Macro containing the three instructions that keep the N64 from repeatedly resetting. ScreenNTSC() is a great Macro that initializes the video to exactly the same values we used previously. Please review the code for these in the header; they are very well written and use constants, which improves their readability. Also note the use of pseudo instructions: my code needed nearly 50 lines to perform the initialization, while the header does it in closer to 25.
Looking at the code that is left, it seems we could use some of the new pseudo instructions and a couple of COLOR Constants. Writing our own macros is a topic for later.
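As a rough sketch of where this is heading, the first two blocks of the flag could look something like the following. BLACK is the constant from earlier in this lesson; GREEN, FRAME_BUFFER, BAR_BYTES and BAND_BYTES are names I made up for this example, and I am assuming 'li' and 'la' accept a constant or a static calculation as their value.

```asm
constant BLACK($000000FF)
constant GREEN($00FF0000)         // hypothetical name
constant FRAME_BUFFER($A0100000)  // hypothetical name
constant BAR_BYTES(13*320*4)      // hypothetical: bytes in a 13-line black bar
constant BAND_BYTES(71*320*4)     // hypothetical: bytes in a 71-line color band

  la t1, FRAME_BUFFER             // current pixel
  la t2, FRAME_BUFFER+BAR_BYTES   // end of the top black bar
  li t3, BAND_BYTES               // replaces the lui/addi pair in CalcMajorLines
  li t0, BLACK
loopTopBlackBar:
  sw t0, 0(t1)
  bnel t1, t2, loopTopBlackBar
  addi t1, t1, 4                  // delay slot

  li t0, GREEN
  add t2, t2, t3                  // end of the green band
loopGreen:
  sw t0, 0(t1)
  bnel t1, t2, loopGreen
  addi t1, t1, 4                  // delay slot
```

The yellow and red bands follow the same pattern, so only the color constant changes from block to block.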
If the Challenges from Lesson 3 seemed hard, I encourage you to go back and give them another try since you have learned these new features. Try the additional Challenge:
Adjust the Transparency
Pick a solid background color
For each line increment the Transparency byte
240 lines on the screen so use the range $10 - $FF