
6502

This section of my website is about coding for the 6502 family of processors (650x). There are several members of this family. In the 21st Century, they are mainly used for embedded controllers. In the past, they were also used as controllers but more often as the primary CPU of products from Apple, Atari, Commodore, and Nintendo (either personal computers or game consoles).

 
Links  

Here are some links to 6502 coding that you may find useful:
 
Multiply  

The 650x processors do not have a generic multiply (or divide) machine instruction; it must be done in software!  (Multiplying or dividing by two is possible with a single shift or rotate instruction, left or right.)  The routine most often used and published is essentially the long multiplication (or division) that we all learned (I hope) in elementary school.  Several alternatives have been attempted.  One uses logarithms, which is analogous to using a slide rule; it can be fast but suffers from rounding errors.  Another method (I'm not sure who developed it) uses a neat algebraic identity: a*b = f(a+b) - f(a-b), where f(x) is simply x*x/4.  This is perfectly accurate and much faster than "long multiply" routines!  As with most programming designs, that speed is paid for in memory.
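The identity is easy to check on a host machine. Here is a minimal Python sketch (not 6502 code; the name `quarter_square_product` is made up for illustration) that tries every pair of unsigned 8-bit values. It assumes f(x) is rounded down, which is harmless because a+b and a-b are always both even or both odd, so both terms lose the same fraction.

```python
def f(x):
    """Quarter square, rounded down: floor(x*x / 4)."""
    return (x * x) // 4


def quarter_square_product(a, b):
    """Multiply via a*b = f(a+b) - f(|a-b|)."""
    return f(a + b) - f(abs(a - b))


# Worked example: 13 * 17
# f(30) = 900 // 4 = 225, f(4) = 16 // 4 = 4, and 225 - 4 = 221
print(quarter_square_product(13, 17))

# Exhaustive check over every pair of unsigned 8-bit values.
mismatches = [(a, b) for a in range(256) for b in range(256)
              if quarter_square_product(a, b) != a * b]
print(len(mismatches))
```

Taking the absolute value of a-b changes nothing mathematically (the value is squared), but it keeps the table index non-negative.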
 
Below is the fastest code that I have developed (or have read about) to perform an 8-bit by 8-bit multiply (the product is 16 bits).  It is for unsigned values.  Signed numbers can be used if several changes are made (basically when a+b and a-b are calculated, the overflow flag must be tested and adjustments made when that flag is set).
MultAX:
   ;Inputs:
      ; A = multiplicand
      ; X = multiplier
      ;both unsigned
   ;uses self-modification (not suitable for ROM)
   ;uses all registers (A,X,Y)
   ;Results:
      ; A = product (low byte)
      ; X = product (high byte)
 
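   STA getLow+1    ;patch low byte of table address: LDA sqrLow+A,X
   STA getHigh+1   ;same for the high-byte table
   SEC
   SBC idTab,X     ;A = A - X
   BCS doMult      ;no borrow: A already holds |A - X|
   SBC #0          ;borrowed (carry clear): subtract 1 more...
   EOR #255        ;...then invert, giving X - A with carry set
doMult:
   TAY             ;Y = |A - X|
getLow:
   LDA sqrLow,X    ;operand patched above: low byte of f(A+X)
   SBC sqrLow,Y    ;minus low byte of f(|A-X|); carry is set on entry
   PHA             ;save low byte of product
getHigh:
   LDA sqrHigh,X   ;operand patched above: high byte of f(A+X)
   SBC sqrHigh,Y   ;minus high byte of f(|A-X|), with borrow
   TAX             ;X = high byte of product
   PLA             ;A = low byte of product
   RTS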

   .ALIGN 256 ;tables should start on page boundary

sqrLow:
   .BYTE {0*0/4, 1*1/4, 2*2/4, 3*3/4, 4*4/4, 5*5/4, ... 511*511/4} & 255

sqrHigh:
   .BYTE {0*0/4, 1*1/4, 2*2/4, 3*3/4, 4*4/4, 5*5/4, ... 511*511/4} / 256

idTab:
   .BYTE 0, 1, 2, 3, 4, 5, 6, ... 255
The code above requires (on average) 46 CPU cycles.  The "long multiplication" method requires well over 100 cycles!  So it is really a trade-off: do you want speed (at the cost of 1280 = 256*5 bytes of tables) or small size (no tables, but more than 2x slower)?  Only you can decide!
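Since the table listings are abbreviated, one way to produce the full data is a small host-side generator. The Python sketch below is only an illustration: the helper names `quarter_squares` and `byte_lines` are made up, and it assumes your assembler accepts `.BYTE` with `$`-prefixed hex values (adjust the formatting if yours differs).

```python
def quarter_squares(n=512):
    """floor(i*i/4) for i = 0 .. n-1."""
    return [(i * i) // 4 for i in range(n)]


def byte_lines(label, values, per_line=8):
    """Format values as 'label:' followed by rows of '.BYTE' data."""
    lines = [label + ":"]
    for i in range(0, len(values), per_line):
        row = ", ".join("$%02X" % v for v in values[i:i + per_line])
        lines.append("   .BYTE " + row)
    return lines


squares = quarter_squares()
sqr_low = [v & 255 for v in squares]   # low byte of each entry
sqr_high = [v >> 8 for v in squares]   # high byte (entry / 256, rounded down)
id_tab = list(range(256))              # identity table: idTab[i] = i

for label, table in (("sqrLow", sqr_low), ("sqrHigh", sqr_high), ("idTab", id_tab)):
    print("\n".join(byte_lines(label, table)))

# 512 + 512 + 256 entries of one byte each
total_bytes = len(sqr_low) + len(sqr_high) + len(id_tab)
```

The largest entry is 511*511/4 = 65280 (rounded down), so every value fits in 16 bits and the split into two byte tables loses nothing.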
 
An alternate version which needs 256 fewer bytes for tables (only 1024 instead of 1280, or 20% fewer bytes) can easily be accomplished by slowing down the code by 2 cycles (so 48 instead of 46, or 4% slower) by changing this line:
 
   SBC idTab,X

into this:
 
   STX diff+1
diff:
   SBC #0    ;operand is modified!

Some experienced 650x programmers may be wondering where addition (ADC) is being performed, since I already said the method is based on f(a PLUS b) - f(a-b).  Well, the answer is the CPU's own addressing mode: indexed by X.  The routine first stores A into the low byte of the table address inside the LDA instructions (which is why the tables must start on a page boundary), so instructions like LDA sqrLow,X implicitly do the addition when the CPU adds the X register to the patched base address.
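To see the indexing trick and the borrow handling in one place, here is a host-side Python model of MultAX. It is a sketch of my reading of the routine, not an emulator: the function name `mult_ax` and the variable names are invented for illustration, and each step is commented with the instructions it stands for.

```python
SQR = [(i * i) // 4 for i in range(512)]
SQR_LOW = [v & 255 for v in SQR]
SQR_HIGH = [v >> 8 for v in SQR]


def mult_ax(a, x):
    """Model of MultAX; returns (low, high) as left in A and X."""
    # STA getLow+1 / STA getHigh+1: A becomes the low byte of the table
    # address, so the later ",X" indexing reads entry a + x.
    sum_index = a + x

    # SEC / SBC idTab,X, then the SBC #0 / EOR #255 fix-up when the
    # subtraction borrowed: Y ends up holding |a - x|.
    diff = (a - x) & 255
    if a < x:
        diff = ((diff - 1) & 255) ^ 255
    y = diff

    # LDA sqrLow,X / SBC sqrLow,Y with carry set (no borrow coming in).
    low = SQR_LOW[sum_index] - SQR_LOW[y]
    borrow = 1 if low < 0 else 0
    low &= 255

    # LDA sqrHigh,X / SBC sqrHigh,Y, consuming the borrow from the low byte.
    high = (SQR_HIGH[sum_index] - SQR_HIGH[y] - borrow) & 255
    return low, high


# Compare against ordinary multiplication for all 65536 input pairs.
failures = sum(1 for a in range(256) for x in range(256)
               if mult_ax(a, x) != ((a * x) & 255, (a * x) >> 8))
print(failures)
```

Note that a + x can reach 510, which is why the square tables need 512 entries rather than 256.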
 
You can reduce the size of tables further (only 256 entries instead of 512) if you use real ADC (not the sneaky index by X trick just described) and apply fix-up code at the end.  That version is a bit messy and about 50% slower so I won't show it here.

 
Mirror Byte  
There are several ways to accomplish this task (reverse the bits within a byte).  A lengthy discussion can be found here.  To spare you all the details, below is a quick summary of the results from several contributors:
 
Name         Size (bytes)  Delay (cycles)  Speed (65536/cycles)  Efficiency (speed/bytes)  Power (speed^2/bytes)
Generic      10            100             655                   66                        42.9k
Mafiosino    9             84              780                   87                        67.6k
H2Obsession  37            32              2048                  55                        113k
Mega-Table   260           6               10923                 42                        459k
Mafiosino has the best efficiency: about 3x slower than my code, but he uses 4x fewer bytes!  Once again, it really depends on your priority of speed versus size.  In summary (0% bias), you should use Mafiosino if size is your primary concern, or a Mega-Table if speed is your primary concern... but I still believe my idea is great if you want a compromise. (I hate to muddy the waters, but there are other factors like RAM and register use to consider too.)
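The derived columns follow directly from the size and delay figures. A short Python sketch (host-side; the `metrics` helper is a made-up name) recomputes them from the formulas in the column headings, using the sizes and cycle counts from the table above:

```python
ROUTINES = [
    # (name, size in bytes, delay in cycles), copied from the table
    ("Generic", 10, 100),
    ("Mafiosino", 9, 84),
    ("H2Obsession", 37, 32),
    ("Mega-Table", 260, 6),
]


def metrics(size, cycles):
    """Return (speed, efficiency, power) as defined in the column headings."""
    speed = 65536 / cycles
    efficiency = speed / size
    power = speed * speed / size
    return speed, efficiency, power


for name, size, cycles in ROUTINES:
    speed, efficiency, power = metrics(size, cycles)
    print("%-12s %6.0f %4.0f %6.1fk" % (name, speed, efficiency, power / 1000))
```

Plugging in your own byte and cycle counts makes it easy to see where another routine would rank.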
 
Anyway, here is the code for each...
Generic:
   LDX #7
Loop:
   ASL
   ROR temp
   DEX
   BPL Loop
   LDA temp
 
Mafiosino:
   STA temp
   LDA #1
Loop:
   LSR temp
   ROL
   BCC Loop

H2Obsession:
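   TAY             ;save the original byte
   AND #$0f
   TAX             ;X = low nibble
   TYA
   LSR
   LSR
   LSR
   LSR
   TAY             ;Y = high nibble
   LDA revTab,y    ;reversed high nibble (in both halves)
   EOR revTab,x
   AND #$0f        ;low half = rev(high) EOR rev(low), high half = 0
   EOR revTab,x    ;low half = rev(high), high half = rev(low)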
;....
revTab:
   .byte $00, $88, $44, $cc, $22, $aa, $66, $ee
   .byte $11, $99, $55, $dd, $33, $bb, $77, $ff
 
Mega_Table:
   TAX
   LDA revTab,x

revTab:
   .byte $00, $80, $40, $c0, $20, $a0, $60, $e0 ...
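Both tables (each listing calls its table revTab) can be generated rather than typed by hand. The Python sketch below runs on a host machine, and all of its names are invented for illustration. It builds the 16-entry nibble table and the 256-entry table, then checks that the EOR / AND / EOR merge gives the same answer as a plain bit-by-bit reversal for every byte.

```python
def reverse_bits(value, width=8):
    """Reference bit reversal, one bit at a time (like the Generic loop)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# 16-entry table: reversed nibble duplicated into both halves of the byte.
NIBBLE_TAB = [reverse_bits(n, 4) * 0x11 for n in range(16)]

# 256-entry table: the fully reversed byte for every input.
MEGA_TAB = [reverse_bits(n) for n in range(256)]


def reverse_via_nibbles(value):
    """Model of the EOR / AND #$0f / EOR merge in the H2Obsession routine."""
    x = value & 0x0F          # low nibble  -> X
    y = value >> 4            # high nibble -> Y
    a = NIBBLE_TAB[y] ^ NIBBLE_TAB[x]
    a &= 0x0F                 # keep only the low half
    return a ^ NIBBLE_TAB[x]  # high half = rev(low), low half = rev(high)


print(", ".join("$%02x" % v for v in NIBBLE_TAB))
print(", ".join("$%02x" % v for v in MEGA_TAB[:8]))

# Every byte should reverse identically by both methods.
bad = [v for v in range(256) if reverse_via_nibbles(v) != MEGA_TAB[v]]
print(len(bad))
```

Duplicating each reversed nibble into both halves of its table entry is what lets one 16-byte table serve both lookups.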
 

© H2Obsession 2013,2014