N64 Assembly Tutorial - Lesson 4

Updated: N64 Assembly YouTube Series

The same content updated with more explanation and showing how it's done. Twitch stream style.

https://www.youtube.com/playlist?list=PLjwOF_LvxhqTXVUdWZJEVZxEUG5qt8fsA


Is it Magic?

Some programmers write code, pass it to the compiler, and consider the job done if the program works as expected. Most of the time they end up debugging some little part that wasn't quite right. When working in a high-level language, it's easy to think of the compiler as performing magic. Since we are writing assembly code, we need to pull back the curtain and see how the tricks are performed.

A quick reminder about our environment:

  • There is no operating system

  • The assembler is outputting our exact instructions (up to now)

    • No ELF or PE formatted output

  • No code libraries are used (yet?)

The bass compiler is very capable, and how it works is explained in the included doc file (C:\bass\doc\bass.html). Here I'll try to give an easy mental model for what it does:

  • Merge all include & insert files (recursively) into a single source file (in memory).

  • Parse out the replacement blocks & values (macros, defines, constants, variables, scopes)

  • Replace the replaceable values in the source file.

  • Evaluate any static (compile-time) calculations

    • Example of addition: sw t0,$07C0+16(t1)

  • Write out the results, calculating Label Locations for branch and jump instructions.
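To make that mental model concrete, here is a toy Python sketch of steps 3 and 4 only: it swaps constant names for their values and folds simple additions such as $07C0+16. It works under simplifying assumptions (only `+`, `$` for hex, whole-word constant names) and is an illustration, not how bass is actually implemented.

```python
import re

# One operand: a $-prefixed hex number or a decimal number
NUMBER = r"(?:\$[0-9A-Fa-f]+|\d+)"

def to_int(token: str) -> int:
    """Convert a $hex or decimal token to an integer."""
    return int(token[1:], 16) if token.startswith("$") else int(token)

def expand(line: str, constants: dict[str, int]) -> str:
    """Toy model of constant replacement followed by static calculation."""
    # Step 3: replace each constant name (whole word only) with its value
    for name, value in constants.items():
        line = re.sub(rf"\b{re.escape(name)}\b", str(value), line)

    # Step 4: fold sums like $07C0+16 into a single hex number
    def fold(match: re.Match) -> str:
        total = sum(to_int(t) for t in match.group(0).split("+"))
        return f"${total:04X}"

    return re.sub(rf"{NUMBER}(?:\+{NUMBER})+", fold, line)

print(expand("sw t0,$07C0+16(t1)", {}))                       # sw t0,$07D0(t1)
print(expand("sw t0,OFFSET+16(t1)", {"OFFSET": 0x07C0}))      # sw t0,$07D0(t1)
```

The hypothetical constant OFFSET shows why the order matters: the name has to be replaced before the addition can be calculated.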

Constants

The use of constants is pretty similar in all programming languages: they mostly enhance code readability and maintainability. Since assembly language starts off being very terse, constants matter even more. From now on I will use the constants available in Peter Lemon's headers, and I'll probably create another header, so check for downloads at the bottom of each lesson. Create your own!

Syntax

constant name(value)

Example

constant BLACK_LOW($00FF)

constant BLACK_HIGH(0)

constant BLACK($000000FF) (See Pseudo Instructions below for use)
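The BLACK_HIGH/BLACK_LOW pair is just the 32-bit BLACK color split into its two 16-bit halves. A small Python sketch of that split, assuming the 32-bit RGBA layout this lesson's color code uses (red in the top byte, transparency in the bottom byte). The helpers `rgba` and `split_halves` are illustrative only and are not part of Peter Lemon's headers.

```python
def rgba(r: int, g: int, b: int, a: int) -> int:
    """Pack 8-bit channels into one 32-bit RGBA word (red in the top byte)."""
    return (r << 24) | (g << 16) | (b << 8) | a

def split_halves(value: int) -> tuple[int, int]:
    """Split a 32-bit word into its (high, low) 16-bit halves."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

BLACK = rgba(0, 0, 0, 0xFF)                  # $000000FF
BLACK_HIGH, BLACK_LOW = split_halves(BLACK)
print(f"BLACK=${BLACK:08X} HIGH=${BLACK_HIGH:04X} LOW=${BLACK_LOW:04X}")
```

For red, `split_halves(rgba(0xFF, 0, 0, 0))` gives ($FF00, $0000), which matches the `lui t0, $FF00` used for red later in this lesson.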

Macros

At first glance, Macros look like function calls since they can accept parameters. They are not!

The following rules/guidelines apply to Macros:

  • They are blocks of code that are copied to the location they are called from

  • The parameters don't have a type

  • The parameters are calculated at compile time

  • If used in multiple places, a new copy will be put in each location.
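A toy Python model of that copy-and-paste behavior: every "call" pastes a fresh copy of the body with the parameters swapped in as plain text, so two calls produce two copies. The macro body and the `{name}` placeholder syntax are hypothetical, chosen for this sketch.

```python
# Hypothetical macro body: store a pixel, then advance the address register
MACRO_PARAMS = ["value", "addr"]
MACRO_BODY = ["sw {value}, 0({addr})", "addi {addr}, {addr}, 4"]

def expand_macro(args: list[str]) -> list[str]:
    """Paste a copy of the body, replacing each {param} with its argument text."""
    lines = []
    for line in MACRO_BODY:
        for param, arg in zip(MACRO_PARAMS, args):
            line = line.replace("{" + param + "}", arg)  # untyped, textual swap
        lines.append(line)
    return lines

# Two "calls" -> two separate copies in the output program
program = expand_macro(["t0", "t1"]) + expand_macro(["t3", "t2"])
for line in program:
    print(line)
```

Unlike a function there is no jump and no return. The cost is code size, because every use adds another copy of the body.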

Pseudo Instructions

Pseudo Instructions are very common on the MIPS platform. Sometimes a single instruction can do several different things with slightly different syntax, and it's more intuitive to give each use its own name. In other cases one pseudo instruction gets expanded into as many as 4 instructions, most commonly 2. I've hesitated to use them because they make debugging more complicated: some of the code you debug wasn't written by you, it was generated by the compiler. Now that we have started using Macros, our source code and compiled code aren't going to match anyway. The following instructions can cut the number of code lines in half. Just remember that they reduce readability during debugging, although they may also reduce the number of errors and actually shorten the debugging portion of development.

WARNING

Do NOT use a pseudo instruction in a Delay Slot!

If it is a multi-instruction pseudo instruction (the most common kind), only the first of the generated instructions lands in the delay slot; when the branch is taken, the rest of them are not executed.
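Here is a Python sketch of a `bne` whose delay slot holds a two-instruction pseudo instruction. It assumes `li t3, $16300` expands to `lui t3, $0001` followed by `ori t3, t3, $6300`; the exact expansion bass emits may differ, but any two-instruction expansion has the same problem.

```python
def run_li_in_delay_slot(value: int, branch_taken: bool) -> int:
    """Model a bne whose delay slot holds 'li t3, value' (assumed lui + ori).
    Only the lui sits in the delay slot; the ori is the instruction after it,
    so it only runs when the branch falls through."""
    lui_imm, ori_imm = (value >> 16) & 0xFFFF, value & 0xFFFF
    t3 = lui_imm << 16          # delay slot: for bne the lui always runs
    if not branch_taken:
        t3 |= ori_imm           # fall-through: the ori runs next
    return t3

print(hex(run_li_in_delay_slot(0x16300, branch_taken=False)))  # 0x16300
print(hex(run_li_in_delay_slot(0x16300, branch_taken=True)))   # 0x10000
```

When the branch is taken, execution continues at the branch target right after the lui, so t3 ends up holding only the upper half of the value.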

li - Load Immediate & la - Load Address

These two instructions behave identically; the different names are a convenience for the reader, making it clear whether you are working with a value or a memory location. The compiler converts each of these into 2 assembly instructions.

Example:

li t3, 71*320*4

la t2, $A010'0000
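The split that a two-instruction load needs can be worked out by hand: the upper 16 bits go to a `lui` and the lower 16 bits to an `ori`, which zero-extends its immediate, so no sign correction is needed. A Python sketch that prints such a pair for the two examples above. The lesson doesn't show the exact instructions bass emits, so treat this as an illustration of the arithmetic only.

```python
def li_pair(reg: str, value: int) -> list[str]:
    """Return a lui/ori pair that loads a 32-bit value into reg."""
    value &= 0xFFFFFFFF
    upper, lower = value >> 16, value & 0xFFFF
    return [f"lui {reg}, ${upper:04X}", f"ori {reg}, {reg}, ${lower:04X}"]

print(li_pair("t3", 71 * 320 * 4))   # 90,880 = $0001 6300
print(li_pair("t2", 0xA010_0000))
```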

The Ethiopia Flag Exercise (Follow-Up)

This exercise was more challenging than I originally thought, and the code sample is long enough that we will use it as the code we restructure to improve readability.

First, let's talk about some of the code issues I ran into. Hopefully this section helps you fix your code or explains more of the background on each issue.

Off by 1 Error (First Pixel is Wrong Color)

If you used the delay slot of the looping instruction to increment the memory position, you may have noticed that the memory position ended up 1 pixel (4 bytes) beyond where you planned.

The problem has at least 2 possible fixes:

  1. Manually calculate the final pixel position (Last Position - 4)

  2. Instead of 'bne', use the similar instruction 'bnel'

The lowercase L adds the word "likely" to the instruction description, and, like the word "immediate", it changes the behavior: the Delay Slot is executed only if the branch is taken. As a result, on the last branch test, where the values are equal, the Delay Slot isn't executed and the memory location isn't incremented. Please try this solution in your own program.
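A Python simulation of the loop makes the difference visible. It models `sw` / `bne` or `bnel` / `addi` (delay slot) on a tiny made-up framebuffer (2 rows of 4 pixels per stripe, addresses starting at 0) and checks the color of the first pixel of the second stripe. The stripe size and the `fill` helper are assumptions for the sketch; only the branch semantics come from the lesson.

```python
def fill(mem: dict, t1: int, t2: int, color: str, likely: bool) -> int:
    """Model of:  loop: sw t0,0(t1) / bne(l) t1,t2,loop / addi t1,t1,4
    Returns the value of t1 after the loop exits."""
    while True:
        mem[t1] = color          # sw t0, 0(t1)
        taken = t1 != t2         # the branch test uses t1 before the delay slot
        if taken or not likely:  # bne: delay slot always runs; bnel: only if taken
            t1 += 4              # addi t1, t1, 4
        if not taken:
            return t1

def first_pixel_of_second_stripe(likely: bool) -> str:
    """Draw a black stripe then a green one; report the green stripe's first pixel."""
    mem = {}
    stripe_bytes = 2 * 4 * 4           # 2 rows x 4 pixels x 4 bytes
    t1, t2 = 0, stripe_bytes           # t2 = where the next stripe starts
    t1 = fill(mem, t1, t2, "black", likely)
    t2 += stripe_bytes
    fill(mem, t1, t2, "green", likely)
    return mem[stripe_bytes]

print("bne: ", first_pixel_of_second_stripe(likely=False))  # black (wrong)
print("bnel:", first_pixel_of_second_stripe(likely=True))   # green
```

In both versions the loop stores at t2 itself, which is the first pixel of the next stripe. The only difference is whether t1 is left on that pixel (bnel), so the next stripe overwrites it, or one pixel past it (bne), so it stays the old color.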

Compile Time Calculation Failed

I was using the following line to calculate the number of bytes/pixels that needed to be filled with the new color.

addi t2, t2, 71 * 320 * 4

This line seemed intuitive, produced no warnings or errors during compile, and seemed to work. It didn't. The first sign was that the color ended in the middle of a line, not even near the end. The other issue wasn't obvious until I added the second color block and realized the middle color wasn't in the middle of the screen.

What went wrong?

Calculating by hand, 71 * 320 * 4 = 90,880, or $0001 6300. Only the $6300 was used in the instruction, because addi only accepts a 16-bit immediate (15 bits + sign bit). So the result was truncated (i.e. the upper bytes of the result were lost). Worse, if the 71 had been 80, the result would be $0001 9000, which after truncation becomes $9000; its sign bit is set, so it is treated as a negative value (-28,672, or -$7000).
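The truncation can be checked with a few lines of Python. `as_simm16` is a hypothetical helper that mimics keeping only the low 16 bits of a value and reading them as a signed immediate; it models the behavior described above, not bass itself.

```python
def as_simm16(value: int) -> int:
    """Keep the low 16 bits of value and read them as a signed 16-bit number."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low

for rows in (71, 80):
    full = rows * 320 * 4
    imm = as_simm16(full)
    print(f"{rows} rows: wanted {full} (${full:05X}), got {imm}")
```

For 71 rows this prints a wanted value of 90880 but an immediate of 25344 ($6300); for 80 rows the immediate is -28672.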

Since the value is used 3 times, my fix was to load the result of the 71 * 320 * 4 calculation into the t3 register once:

lui t3, $0001

ori t3,t3,$6300

Then change the 3 occurrences of the above 'addi' line to:

add t2, t2, t3
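With the band size held in t3, each `add t2, t2, t3` moves the end address forward by exactly 71 lines. A Python sketch of the end address t2 should hold for each band, assuming the layout of the flag code in this lesson: 13-line black bars, three 71-line color bands, 320 pixels x 4 bytes per line, and the frame buffer at $A010'0000.

```python
BASE = 0xA010_0000          # frame buffer start (lui t1, $A010)
LINE = 320 * 4              # bytes per line at 32 BPP
BAR = 13 * LINE             # black bar: $4100 bytes, fits in 16 bits
BAND = 71 * LINE            # color band: $16300 bytes, held in t3

ends = {}
t2 = BASE + BAR
ends["top black"] = t2
for band in ("green", "yellow", "red"):
    t2 += BAND              # add t2, t2, t3
    ends[band] = t2
ends["bottom black"] = t2 + BAR

for name, addr in ends.items():
    print(f"{name:>12}: ${addr:08X}")
```

The black bar length is small enough for a 16-bit immediate, but the band length is not, which is exactly why it lives in a register.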

Clean up the Code

Now we are going to take my solution to the Ethiopia Flag Challenge and use the tricks we just learned to improve the readability and maintainability of the code. Don't type this code in; it will get much smaller.

arch n64.cpu

endian msb

output "Lesson4.N64", create

fill 1052672

origin $00000000

base $80000000

include "../LIB/N64.INC"

include "N64_HEADER.ASM"

insert "../LIB/N64_BOOTCODE.BIN"

Start:

lui t0,$BFC0

addi t1,r0,8

sw t1,$7FC(t0)

Video_Init:

lui a0, $A440

addi t0, r0, 3 // 2 = 16 BPP, 3 = 32 BPP

// Gamma, Dither, Serrate, Anti-Alias, Diagnostic

sw t0, $0000(a0)

lui t0, $A010 // Frame Buffer RDRAM Location

sw t0, $0004(a0)

addi t0, r0, 320 // Width in Pixels

sw t0, $0008(a0)

addi t0, r0, $200 // VI vertical intr

sw t0, $000C(a0)

addi t0, r0, 352

sw t0, $0010(a0)

// Conflicting documentation, sticking with a known good NTSC Value

lui t0, $3E5

addi t0, t0, $2239

sw t0, $0014(a0)

addi t0, r0, 525 // Number of Half-Lines Per field

sw t0, $0018(a0)

addi t0, r0, 0 // PAL 5-bit Leap pattern, NTSC = 0

sll t0, t0, 16 // Move current value left by 16 bits

addi t0, t0, 3093 // Total Duration Of A Line In 1/4 Pixel

sw t0, $001C(a0) // 28

addi t0, r0, 3093

sll t0, t0, 16

addi t0, t0, 3093

sw t0, $0020(a0) // 32

addi t0, r0, 108

sll t0, t0, 16

addi t0, t0, 748

sw t0, $0024(a0) // 36

addi t0, r0, 37

sll t0, t0, 16

addi t0, t0, 511

sw t0, $0028(a0) // 40

addi t0, r0, 14

sll t0, t0, 16

addi t0, t0, 516

sw t0, $002C(a0) // 44

addi t0, r0, 0 // Horizontal Sub Pixel Offset

sll t0, t0, 16 // Move current value left by 16 bits

addi t0, t0, 512 // Horizontal Scale Up Factor

sw t0, $0030(a0) // 48

addi t0, r0, 0 // Vertical Sub Pixel Offset

sll t0, t0, 16 // Move current value left by 16 bits

addi t0, t0, 1024 // Vertical Scale Up Factor

sw t0, $0034(a0) // 52

nop // Marker NOPs

nop

// Red

//lui t0, $FF00

// Green

//lui t0, $00FF

// Blue

// ori t0, r0, $FF00

// Transparency / Black

addiu t0, r0, $00FF

setupTopBlackBar:

// Buffer Start

lui t1, $A010

// Buffer End

lui t2, $A010

ori t2, t2, 13 * 320 * 4

nop

// Marker NOPs

nop

loopTopBlackBar:

sw t0, 0(t1)

bnel t1,t2,loopTopBlackBar

addi t1,t1,4

nop // Marker NOP

CalcMajorLines:

nop

lui t3, $0001

addi t3, t3, $6300

nop

setupGreen:

nop

lui t0, $00FF

add t2, t2, t3

nop

loopGreen:

sw t0, 0(t1)

bnel t1,t2,loopGreen

addi t1,t1,4

nop // Marker NOP

setupYellow:

nop

lui t0, $FFFF

add t2, t2, t3

loopYellow:

sw t0, 0(t1)

bnel t1,t2,loopYellow

addi t1,t1,4

nop // Marker NOP

setupRed:

nop

lui t0, $FF00

add t2, t2, t3

loopRed:

sw t0, 0(t1)

bnel t1,t2,loopRed

addi t1,t1,4

nop // Marker NOP

setupBottomBlackBar:

addiu t0, r0, $00FF

addi t2, t2, 13 * 320 * 4 // add, not ori: t2 already holds a running total

loopBottomBlackBar:

sw t0, 0(t1)

bnel t1,t2,loopBottomBlackBar

addi t1,t1,4

nop // Marker NOP


Loop:

j Loop

nop // Delay Slot
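Many of the Video_Init lines above build one 32-bit register value out of two 16-bit fields: load the high field, `sll` it left 16 bits, then `addi` the low field (or, for $0014, `lui` then `addi`). A Python sketch that reproduces the words that code writes. The `pack` helper is hypothetical, and it only mirrors `addi` while the low field is below $8000, since `addi` sign-extends its immediate.

```python
def pack(high: int, low: int) -> int:
    """Mirror of: addi t0,r0,high / sll t0,t0,16 / addi t0,t0,low."""
    if not 0 <= low < 0x8000:
        raise ValueError("addi would sign-extend this low field")
    return ((high << 16) + low) & 0xFFFFFFFF

vi_words = {
    0x14: pack(0x3E5, 0x2239),   # lui t0,$3E5 / addi t0,t0,$2239
    0x1C: pack(0, 3093),         # leap pattern | line duration
    0x20: pack(3093, 3093),
    0x24: pack(108, 748),
    0x28: pack(37, 511),
    0x2C: pack(14, 516),
    0x30: pack(0, 512),          # horizontal scale up factor
    0x34: pack(0, 1024),         # vertical scale up factor
}
for offset, word in vi_words.items():
    print(f"VI +${offset:04X}: ${word:08X}")
```

Having the final words in hex makes it easier to compare them against the values ScreenNTSC() writes.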

The first step is to use 2 Macros that are included in Peter Lemon's headers:

include "LIB/N64_GFX.INC" // Include Graphics Macros

N64_INIT() // Run N64 Initialization Routine

ScreenNTSC(320, 240, BPP32, $A0100000) // Screen NTSC: 320x240, 32BPP, DRAM $A0100000

That saved us almost 50 lines of code. There is a link in the sample if you need to download the other header file.

The first of these lines is clearly another include, which we need for the ScreenNTSC() Macro.

N64_INIT() is a very simple Macro containing the three instructions that keep the N64 from repeatedly resetting. ScreenNTSC() is a great macro that initializes the video to exactly the same values we used previously. Please review the code for these in the header; they are very well written and use constants, which improves their readability. Also note the use of pseudo instructions: my code took nearly 50 lines to perform the initialization, while the header's version is closer to 25 lines.

Looking at the code that is left, it seems we could use some of the new pseudo instructions and a couple of COLOR Constants. Writing our own macros is a topic for later.

If the Challenges from Lesson 3 seemed hard, I encourage you to go back and give them another try now that you have learned these new features. Then try this additional Challenge:

  1. Adjust the Transparency

    1. Pick a solid background color

    2. For each line increment the Transparency byte

    3. There are 240 lines on the screen, so use the range $10 - $FF
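A Python sketch of the values this challenge asks for, assuming the RGBA layout used in this lesson (transparency in the low byte) and a hypothetical solid blue background. In assembly, one way to get this is to add 1 to the color register after each line of 320 pixels.

```python
def line_color(rgb: int, line: int) -> int:
    """32-bit RGBA word for one screen line: fixed RGB, transparency = $10 + line."""
    alpha = 0x10 + line                      # line 0 -> $10, line 239 -> $FF
    return ((rgb & 0xFFFFFF) << 8) | alpha

BLUE = 0x0000FF                              # hypothetical background (R, G, B)
colors = [line_color(BLUE, y) for y in range(240)]
print(f"first ${colors[0]:08X}, last ${colors[-1]:08X}")
```

Because $10 + 239 is exactly $FF, the transparency byte never overflows into the blue channel.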
