Title: 8080 PRNG Part 2
Date: November 21, 2018
Tags: altair programming
========================================

Before moving on to the next program (which still doesn't work right), I'll dig
into the Xorshift PRNG, explain how it works and give a little overview of 8080
assembly programming.

Xorshift was chosen for being very low on resource requirements.  There are very
few, easy calculations and little memory used.  My initial approach is very
straightforward and I didn't take into account cycle times or code reuse or loop
costs.  There could be some improvements with loop unrolling or using
subroutines for reused code. And I could have chosen a set of fewer total
shifts.  Everything has trade-offs, though:  execution speed, program size,
stack size.

If you're not familiar with assembly at all, the 8080 instruction set is fairly
easy to wrap your head around.  Briefly, The 8080 CPU is 8 bits with 16 bits of
memory addressing.  This is accomplished by pairing 8 bit registers and
providing a few 16 bit OP codes that implicitly use the pairs.

The registers are B, C, D, E, H, L, and A (the accumulator).  There is also PSW
which contains the flag bits.  Mostly it's important as a register when making
subroutine calls.  Otherwise, you don't eve use it like a register.  When an OP
code references a register pair, specifying register B will operate on registers
B and C as a 16 bit value and the same for D and E and H and L.  H and L are
special in that when an OP code modifies memory, the 16 bit address is read from
those registers exclusively.  The accumulator is special in that many operations
assume use of that register.  These special properties means you have to
carefully chose which registers to use to reduce unnecessary shuffling or
reading and writing to memory to preserve data for later.

Here is the code as originally written.  I'll then pull out pieces and describe
what it's doing.  Later, we might play with optimization choices.

The code is presented with the following columns:
address, OP code, address parameter or immediate value in octal, OP code in
octal, comment.


000     LHLD            052 ; Load seed into HL
001             111Q    111 ; Address 000111Q - low bits
002             000Q    000 ; high bits
003     MVI A           076 ; Zero accumulator
004             000Q    000
005     CMP H           274 ; Check if H is zero
006     JNZ             302 ; If not 0, start
007             016Q    016 ; Starting address 000016Q - low bits
010             000Q    000 ; high bits
011     CMP L           275 ; Check if L is zero
012     JNZ             302 ; If not 0, start
013             016Q    016
014             000Q    000
015     HLT             166 ; If HL is zero, halt as an error
016     MVI A           076 ; Copy a into accumulator
017             015Q    015 ; a = 13
020     MOV B,H         104 ; Store original seed - H into B
021     MOV C,L         115 ; L into C
022     DAD H           051 ; 16bit left shift HL
023     DCR A           075 ; Decrement accumulator
024     JNZ             302 ; Check if we've shifted enough times
025             022Q    022 ; If no, jump to shift again
026             000Q    000
027     MOV A,L         175 ; Copy low bits
030     XRA C           251 ; XOR with seed low bits
031     MOV L,A         157 ; Store low bits
032     MOV A,H         174 ; Copy high bits
033     XRA B           250 ; XOR with seed high bits
034     MOV H,A         147 ; Store high bits
035     MOV B,H         104 ; Store copy of
036     MOV C,L         115 ; modified seed
037     MVI D           026 ; Copy b into D
040             011Q    011 ; b = 9
041     STC             067 ; Clear carry bit by setting to 1
042     CMC             077 ; and complimenting
043     MOV A,H         174 ; Copy high bits
044     RAR             037 ; Rotate right through carry (thereby saving what falls off the end)
045     MOV H,A         147 ; Store shifted high bits
046     MOV A,L         175 ; Copy low bits
047     RAR             037 ; Rotate right through carry (picking up what fell off the high bits)
050     MOV L,A         157 ; Store rotated low bits
051     DCR D           025 ; Decrement counter
052     JNZ             302 ; If not zero
053             041Q    041 ; shift again
054             000Q    000
055     MOV A,L         175 ; Copy low bits
056     XRA C           251 ; XOR with modified seed
057     MOV L,A         157 ; Store low bits
060     MOV H,A         174 ; Copy high bits
061     XRA B           250 ; XOR with modified seed
062     MOV H,A         147 ; Store high bits
063     MOV B,H         104 ; Store copy
064     MOV C,L         115 ; of modified seed
065     MVI A           076 ; Store c into accumulator
066             007Q    007 ; c = 7
067     DAD H           051 ; 16bit left shift
070     DCR A           075 ; Decrement counter
071     JNZ             302 ; If not zero
072             067Q    067 ; shift again
073             000Q    000 ;
074     MOV A,L         175 ; Copy low bits
075     XRA C           251 ; XOR with modified seed
076     MOV L,A         157 ; Save low bits
077     MOV A,H         174 ; Copy high bits
100     XRA B           250 ; XOR with modified seed
101     MOV H,A         147 ; Store high bits
102     SHLD            042 ; Store random number
103             111Q    111 ; as new seed
104             000Q    000 ; for next iteration
105     HLT             166 ; Stop

111             053Q    053 ; Initial seed 555 - low bits
112             002Q    002 ; high bits


72 lines, or memory addresses.  72 bytes or 576 bits.  No stack used except for
2 bytes to store the seed (which doubles as the current random number).  All
other data is hard coded.

## Breaking it down ##
# Input check #
One caveat of the Xorshift algorithm is that you can't have a seed of zero and
consequently the series will never include zero as a random value.  So we need
to make sure that either the high byte or the low byte of the seed are not zero
before we begin.  As soon as either byte is not zero, begin, otherwise we halt
as an error.


> 000  LHLD       052 ; Load seed into HL
> 001        111Q 111 ; Address 111Q
> 002        000Q 000
> 003  MVI A      076 ; Zero accumulator
> 004        000Q 000
> 005  CMP H      274 ; Check if H is zero
> 006  JNZ        302 ; If not 0, start
> 007        016Q 016 ; Starting address 000016Q - low bits
> 010        000Q 000 ; high bits
> 011  CMP L      275 ; Check if L is zero
> 012  JNZ        302 ; If not 0, start
> 013        016Q 016
> 014        000Q 000
> 015  HLT        166 ; If HL is zero, halt as an error


I start by reading the seed from address 000111Q which was chosen simply because
it was past the end of the program.  LHLD is one of the special OP codes that
assumes a register pair.  Then we need to set the accumulator to zero so we can
do comparisons with the bytes in H and L to use CMP to check for zero.  I'm not
sure if there a more efficient way to do this.  I haven't thought of one.

# Shift left #


> 016 MVI A       076 ; Copy a into accumulator
> 017        015Q 015 ; a = 13
> 020 MOV B,H     104 ; Store original seed - H into B
> 021 MOV C,L     115 ; L into C
> 022 DAD H       051 ; 16bit left shift HL
> 023 DCR A       075 ; Decrement accumulator
> 024 JNZ         302 ; Check if we've shifted enough times
> 025        022Q 022 ; If no, jump to shift again
> 026        000Q 000


Here is our first left shift.  First, save the seed unshifted as we'll need it
to XOR with later.  This uses the DAD instruction which does a 16 bit addition
of one of the register pairs.  Adding a number to itself is equivalent to a left
shift.  This allow me to operate on the full 16 bits in one instruction.  You'll
see that shifting right is a little bit more complicated.  Then all we do is
keep tack of how many times we shift.  When the accumulator reaches zero, the
zero bit will be set and JNZ won't jump and execution will continue on to
XORing.

# XORing #


> 027 MOV A,L     175 ; Copy low bits
> 030 XRA C       251 ; XOR with seed low bits
> 031 MOV L,A     157 ; Store low bits
> 032 MOV A,H     174 ; Copy high bits
> 033 XRA B       250 ; XOR with seed high bits
> 034 MOV H,A     147 ; Store high bits
> 035 MOV B,H     104 ; Store copy of
> 036 MOV C,L     115 ; modified seed


Using the accumulator to hold a byte a time, we can XOR with the original,
unshifted bytes.  This shifted and XORed result is then used going forward, we
don't have to care about the original seed any more.  We XOR three times so this
might benefit from code reuse by making into a subroutine.  While a subroutine
might mean less program memory used, it will mean stack space is needed and the
context switching might add execution time.  We can avoid the stack by jumping
but reducing hard coded addresses has it benefits, too.

# Shifting right #


> 037 MVI D       026 ; Copy b into D
> 040        011Q 011 ; b = 9
> 041 STC         067 ; Clear carry bit by setting to 1
> 042 CMC         077 ; and complimenting
> 043 MOV A,H     174 ; Copy high bits
> 044 RAR         037 ; Rotate right through carry
> 045 MOV H,A     147 ; Store shifted high bits
> 046 MOV A,L     175 ; Copy low bits
> 047 RAR         037 ; Rotate right through carry
> 050 MOV L,A     157 ; Store rotated low bits
> 051 DCR D       025 ; Decrement counter
> 052 JNZ         302 ; If not zero
> 053        041Q 041 ; shift again
> 054        000Q 000


Shifting right doesn't have a nice 16 bit shortcut.  We have to shift one byte
at a time and take advantage of the carry bit.  If we clear the carry bit to
start with we will always be pushing a zero into the high end.  If the catch
what falls off the low end of the byte in the carry bit, then when we shift the
low byte, it'll pull it into the high end.  RAR rotates the accumulator right
through the carry bit.  Rotate, not shift, meaning whatever falls off one end,
gets put back on the other end.  Using the carry bit prevents what falls off
from getting put back on. but instead gives us a place to hold onto it.


Right shift[0]


Right shifting only happens once, and I don't know of a cleaner way to do it.
The only optimization, I've mentioned previously, is to use XRA A at address 041
instead of STC and CMC to clear the carry bit in half the time.

After this, it's another XOR, a left shift, and a final XOR before writing out
the random number over the original seed in memory.  This is the random number
that can be consumed by another program and it also serves as the next seed to
generate the next number in the sequence.

Another benefit of putting the reused code sections into subroutines, is they
might be usable by other programs in the future.  So it matters what the overall
plan is.  If all we are doing is spitting out a random number without context,
avoiding CALLs and RETs and stack use is very efficient.  If we built a BASIC
type interpreter environment, these might become primitives for many operations
and we'd benefit from generalizing and creating subroutines in saved space.


[0] gopher://kagu-tsuchi.com:70/I/blog/images/PRNG_rshift.png