The sequenced music in the Zelda 64 games uses a MIDI-like format that is a slightly updated version of the one used by earlier games such as Super Mario 64. A sequence generally has three parts:
Sequence Header
Controls sequence-wide settings such as master volume, tempo and looping.
Channel Header
Controls volume, pan, etc. and points to music data to be played back on each channel.
Music Data
Contains the data for which notes should be played, their velocity, length, etc.
Format
From here, all numbers will be given in hexadecimal unless stated otherwise.
Sequence Header Commands
D3 xx
Appears to relate to the sequence type or format. xx is usually 80 in Super Mario 64 and 20 in Zelda 64.
D5 xx
Unknown. xx usually takes the value of 32. For sequences that don't play any music xx is 46.
D7 xxxx
Enables the channels given by xxxx, a 16-bit mask with one bit per channel.
9x yyyy
Points to a channel header offset. x is the channel number (0 - F) and yyyy is the offset of that channel's header relative to the start of the sequence file.
DB xx
Master volume control, xx is the volume.
DD xx
Tempo control, xx is the tempo value in beats per minute.
CC yyxx
Unknown. yy starts at zero and increases each time the command appears, xx is (always?) 73.
FD xx / FD yyyy
Timestamp: the number of 'ticks' to wait before the next command is read, relative to tempo. The value is variable length. Values up to 7F are stored as a single byte xx; larger values are stored as two bytes yyyy, where yyyy is the value plus 8000.
FB xxxx
Loop point. xxxx is the offset that playback loops from, relative to the start of the sequence file.
D6 xxxx
Disables channels given by xxxx.
FF
Marks the end of the sequence header.
Note: some sequences have the following data after the first four bytes of their sequence header:
D7FFFF 8776CCFF7786F3 vv F2 ww C801F3 xx [C801FA yyyy] FB zzzz
vv, ww and xx are unknown. yyyy and zzzz seem to be pointers to 9x commands for different parts of the sequence within the sequence header. The part in square brackets is repeated in some sequences. Its purpose is currently unknown, but it may relate to resuming playback mid-sequence after returning to an area from a house or shop.
Channel Header Commands
C4
Initialises the channel for music playback.
8x yyyy
Points to music data to be played on the current channel. x selects the 'note layer' to be used (8 - B); up to four note layers can be loaded per channel. yyyy is the offset of the music data to be played, relative to the start of the sequence file.
DF xx (yy)
Channel volume. xx is the volume value. yy is a timestamp used between control changes (for example, when the volume is changing continuously).
DD xx
Channel pan. xx is the pan amount, 00 = hard left, 3F = centre, 7F = hard right.
E9 xx
Priority, set to xx. How it is used is unknown.
D4 xx
Effects level (echo). xx is the effect amount.
D8 xx (yy)
Vibrato amount, set to xx. Higher values can produce odd sounds. yy is a timestamp used between control changes.
D3 xx (yy)
Pitch bend amount, set to xx (signed). yy is a timestamp used between control changes.
C2 xx
Transposition, signed. xx is the number of semitones to transpose by.
C1 xx
Sets the instrument number, xx, to be used for the current channel.
FD xx / FD yyyy
Timestamp (see sequence header commands).
FF
Marks the end of the channel header.
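Two of the channel header commands above need a little interpretation: the signed arguments of C2 and D3, and the 8x yyyy layer pointer. A minimal sketch follows, assuming signed bytes are two's complement, offsets are big-endian, and the layer values 8 - B map to layer indices 0 to 3. The function names are made up for this example.

```python
def to_signed8(value):
    """Interpret a byte (0-255) as a two's-complement signed value."""
    return value - 0x100 if value & 0x80 else value


def parse_layer_pointer(data, pos):
    """Decode an '8x yyyy' command at data[pos] into (layer_index, offset)."""
    op = data[pos]
    if not 0x88 <= op <= 0x8B:
        raise ValueError(f"{op:02X} is not a note layer pointer command")
    offset = int.from_bytes(data[pos + 1:pos + 3], "big")
    # Assumption: x = 8 is the first layer, x = B the fourth.
    return op - 0x88, offset


# C2 FE would transpose down two semitones, C2 0C up one octave.
print(to_signed8(0xFE), to_signed8(0x0C))
# Hand-made bytes: first layer, music data at offset 0123.
print(parse_layer_pointer(bytes.fromhex("88 0123"), 0))
```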
Music Data
Things get a little more complicated here.
nn tt(tt) vv gg
Play note command, valid when nn >= 00 and nn <= 3F, where:
nn = Note value
tt(tt) = Variable length timestamp
vv = Note velocity
gg = Gate time
nn tt(tt) vv
Play note command, valid when nn >= 40 and nn <= 7F, where:
nn = Note value + 40
tt(tt) = Variable length timestamp
vv = Note velocity
The gate time is carried over from the previous play note command.
nn vv gg
Play note command, valid when nn >= 80 and nn <= BF, where:
nn = Note value + 80
vv = Note velocity
gg = Gate time
The timestamp is carried over from the previous play note command.
C0 xx / C0 yyyy
Timestamp / rest. Used when no music needs to be played for a set amount of time. xx/yyyy is the variable length timestamp value.
FC xxxx
Jumps to offset xxxx relative to the start of the sequence file and plays the data from there until it reaches the next FF command.
FF
Marks the end of the current music data.
Note: It seems that some of the channel header commands are valid for use in the music data but the only one that's really useful is the C2 command (transposition).
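The three play note forms described above differ only in which fields are stored and which are carried over from the previous note. The following sketch decodes a single play note command under that reading. It assumes the variable length timestamp uses the same encoding as the FD command, and the names read_varlen and decode_note are made up for this example.

```python
def read_varlen(data, pos):
    """Return (value, new_pos) for a variable-length value at data[pos]."""
    first = data[pos]
    if first & 0x80:
        # Two-byte form: the stored value is the real value plus 0x8000.
        return ((first << 8) | data[pos + 1]) - 0x8000, pos + 2
    return first, pos + 1


def decode_note(data, pos, prev_delay=0, prev_gate=0):
    """Decode one play note command at data[pos].

    Returns ((note, delay, velocity, gate), new_pos).
    """
    nn = data[pos]
    pos += 1
    if nn <= 0x3F:
        # nn tt(tt) vv gg: every field is stored.
        note = nn
        delay, pos = read_varlen(data, pos)
        velocity, gate = data[pos], data[pos + 1]
        pos += 2
    elif nn <= 0x7F:
        # nn tt(tt) vv: gate time reused from the previous note.
        note = nn - 0x40
        delay, pos = read_varlen(data, pos)
        velocity = data[pos]
        pos += 1
        gate = prev_gate
    elif nn <= 0xBF:
        # nn vv gg: timestamp reused from the previous note.
        note = nn - 0x80
        velocity, gate = data[pos], data[pos + 1]
        pos += 2
        delay = prev_delay
    else:
        raise ValueError(f"{nn:02X} is not a play note command")
    return (note, delay, velocity, gate), pos


# Hand-made bytes: the same note value (27) in each of the three forms.
print(decode_note(bytes.fromhex("27 30 64 80"), 0))
print(decode_note(bytes.fromhex("67 8130 50"), 0, prev_gate=0x80))
print(decode_note(bytes.fromhex("A7 64 40"), 0, prev_delay=0x30))
```

A full music data reader would call this in a loop, feeding each note's delay and gate back in as prev_delay and prev_gate, and would handle C0, FC and FF separately.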
Sequence Pointer Table
The sequence pointer table starts at 0xBCC6A0 in the OoT Debug ROM and follows this format:
xxxx 0000 0000 0000 0000 0000 0000 0000
xxxx = number of sequences
The header is followed by one entry per sequence in this format:
xxxxxxxx yyyyyyyy zzzz 0000 00000000
xxxxxxxx = Pointer to the start of the sequence in the sequenced music file (Audioseq)
yyyyyyyy = Length in bytes of the sequence
zzzz = Sequence type
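The table layout above maps directly onto fixed-size records: a header row followed by entries of the same width. A minimal sketch of reading it follows, assuming big-endian fields and that the table bytes have already been read out of the ROM. The name parse_sequence_table is made up for this example, and the demo table is hand-built rather than taken from a real ROM.

```python
import struct

ROW_SIZE = 16  # The header and every entry occupy sixteen bytes (decimal).


def parse_sequence_table(table):
    """Return a list of (start, length, seq_type) tuples, one per sequence."""
    (count,) = struct.unpack_from(">H", table, 0)
    entries = []
    for index in range(count):
        # Skip the header row, then read pointer, length and type;
        # the remaining six bytes of each entry are zero padding.
        offset = ROW_SIZE * (index + 1)
        entries.append(struct.unpack_from(">IIH", table, offset))
    return entries


# Hand-built table with two made-up entries.
demo = struct.pack(">H", 2) + bytes(14)
demo += struct.pack(">IIH", 0x00000000, 0x00001230, 2) + bytes(6)
demo += struct.pack(">IIH", 0x00001230, 0x00000400, 2) + bytes(6)
for start, length, seq_type in parse_sequence_table(demo):
    print(f"start={start:08X} length={length:08X} type={seq_type:04X}")
```

If the pointer is an offset from the start of Audioseq (an assumption here), a sequence could then be sliced out of the file's bytes as audioseq[start:start + length].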
Moving Audioseq
To move the Audioseq file to another location in ROM, you need to update its entry in the file table, recalculate the ROM's CRC and change the assembly code that loads sequences from the file. This code is at 0xB5A4AC in the Debug ROM and looks like:
3C05 xxxx 24A5 yyyy
or if you prefer the assembly code:
LUI $A1,xxxx
ADDIU $A1,$A1,yyyy
xxxx = upper half of address
yyyy = lower half of address
Remember that ADDIU sign-extends its 16-bit immediate, so if yyyy is greater than 0x7FFF you need to add 1 to xxxx to compensate.
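The adjustment above can be computed rather than done by hand. The sketch below splits a 32-bit address into the two halves and builds the eight bytes of the LUI/ADDIU pair shown earlier. The function names and the example addresses are made up for illustration, and writing the bytes into the ROM is left out.

```python
def split_address(address):
    """Split a 32-bit address into (xxxx, yyyy) for LUI/ADDIU."""
    lo = address & 0xFFFF
    hi = (address >> 16) & 0xFFFF
    if lo > 0x7FFF:
        # ADDIU will subtract 0x10000 - lo, so LUI must load one more.
        hi = (hi + 1) & 0xFFFF
    return hi, lo


def build_patch(address):
    """Return the bytes for 'LUI $A1,xxxx' followed by 'ADDIU $A1,$A1,yyyy'."""
    hi, lo = split_address(address)
    return (bytes.fromhex("3C05") + hi.to_bytes(2, "big")
            + bytes.fromhex("24A5") + lo.to_bytes(2, "big"))


# Made-up addresses: the first needs no adjustment, the second does.
print(build_patch(0x00094870).hex().upper())
print(build_patch(0x0004C8A0).hex().upper())
```

For the second address the lower half is C8A0, which is above 7FFF, so the upper half becomes 0005 instead of 0004: 50000 minus 3760 gives back 4C8A0.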