Title: 8080 PRNG Part 2

Date: November 21, 2018
Tags: altair programming
========================================

Before moving on to the next program (which still doesn't work right), I'll dig
into the Xorshift PRNG, explain how it works and give a little overview of 8080
assembly programming.

Xorshift was chosen for its very low resource requirements. It needs only a
few easy calculations and little memory. My initial approach is very
straightforward and I didn't take into account cycle times or code reuse or loop
costs. There could be some improvements with loop unrolling or using
subroutines for reused code. And I could have chosen a set of fewer total
shifts. Everything has trade-offs, though: execution speed, program size,
stack size.

If you're not familiar with assembly at all, the 8080 instruction set is fairly
easy to wrap your head around. Briefly, the 8080 CPU is 8 bits with 16 bits of
memory addressing. This is accomplished by pairing 8 bit registers and
providing a few 16 bit OP codes that implicitly use the pairs.

The registers are B, C, D, E, H, L, and A (the accumulator). There is also PSW
which contains the flag bits. Mostly it's important as a register when making
subroutine calls. Otherwise, you don't ever use it like a register. When an OP
code references a register pair, specifying register B will operate on registers
B and C as a 16 bit value, and the same goes for D and E and for H and L. H and
L are special in that most OP codes that read or modify memory take the 16 bit
address from those registers. The accumulator is special in that many
operations assume use of that register. These special properties mean you have
to carefully choose which registers to use to reduce unnecessary shuffling or
reading and writing to memory to preserve data for later.
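
As a rough illustration of register pairing, here is a small Python model (not
8080 code). The helper names pair and split are made up for this sketch. It
shows how two 8 bit values act as one 16 bit value, using the seed 555 from the
listing below as the example.

```python
def pair(high, low):
    """Combine two 8-bit register values into the 16-bit value of the pair."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split(value):
    """Split a 16-bit value into (high, low) bytes, e.g. for H and L."""
    return (value >> 8) & 0xFF, value & 0xFF


h, l = split(555)
# Print the bytes the way the listing writes them: 3 octal digits plus Q.
print(f"H = {h:03o}Q, L = {l:03o}Q")
print(pair(h, l))
```

The low byte goes at the lower address and the high byte at the next one, which
is the order LHLD and SHLD expect.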

Here is the code as originally written. I'll then pull out pieces and describe
what it's doing. Later, we might play with optimization choices.

The code is presented with the following columns:
address, OP code mnemonic (or, for operand bytes, the address or immediate value
in octal), the byte's value in octal, comment.

000 LHLD 052 ; Load seed into HL
001 111Q 111 ; Address 000111Q - low bits
002 000Q 000 ; high bits
003 MVI A 076 ; Zero accumulator
004 000Q 000
005 CMP H 274 ; Check if H is zero
006 JNZ 302 ; If not 0, start
007 016Q 016 ; Starting address 000016Q - low bits
010 000Q 000 ; high bits
011 CMP L 275 ; Check if L is zero
012 JNZ 302 ; If not 0, start
013 016Q 016
014 000Q 000
015 HLT 166 ; If HL is zero, halt as an error
016 MVI A 076 ; Copy a into accumulator
017 015Q 015 ; a = 13
020 MOV B,H 104 ; Store original seed - H into B
021 MOV C,L 115 ; L into C
022 DAD H 051 ; 16bit left shift HL
023 DCR A 075 ; Decrement accumulator
024 JNZ 302 ; Check if we've shifted enough times
025 022Q 022 ; If no, jump to shift again
026 000Q 000
027 MOV A,L 175 ; Copy low bits
030 XRA C 251 ; XOR with seed low bits
031 MOV L,A 157 ; Store low bits
032 MOV A,H 174 ; Copy high bits
033 XRA B 250 ; XOR with seed high bits
034 MOV H,A 147 ; Store high bits
035 MOV B,H 104 ; Store copy of
036 MOV C,L 115 ; modified seed
037 MVI D 026 ; Copy b into D
040 011Q 011 ; b = 9
041 STC 067 ; Clear carry bit by setting to 1
042 CMC 077 ; and complementing
043 MOV A,H 174 ; Copy high bits
044 RAR 037 ; Rotate right through carry (thereby saving what falls off the end)
045 MOV H,A 147 ; Store shifted high bits
046 MOV A,L 175 ; Copy low bits
047 RAR 037 ; Rotate right through carry (picking up what fell off the high bits)
050 MOV L,A 157 ; Store rotated low bits
051 DCR D 025 ; Decrement counter
052 JNZ 302 ; If not zero
053 041Q 041 ; shift again
054 000Q 000
055 MOV A,L 175 ; Copy low bits
056 XRA C 251 ; XOR with modified seed
057 MOV L,A 157 ; Store low bits
060 MOV A,H 174 ; Copy high bits
061 XRA B 250 ; XOR with modified seed
062 MOV H,A 147 ; Store high bits
063 MOV B,H 104 ; Store copy
064 MOV C,L 115 ; of modified seed
065 MVI A 076 ; Store c into accumulator
066 007Q 007 ; c = 7
067 DAD H 051 ; 16bit left shift
070 DCR A 075 ; Decrement counter
071 JNZ 302 ; If not zero
072 067Q 067 ; shift again
073 000Q 000 ;
074 MOV A,L 175 ; Copy low bits
075 XRA C 251 ; XOR with modified seed
076 MOV L,A 157 ; Save low bits
077 MOV A,H 174 ; Copy high bits
100 XRA B 250 ; XOR with modified seed
101 MOV H,A 147 ; Store high bits
102 SHLD 042 ; Store random number
103 111Q 111 ; as new seed
104 000Q 000 ; for next iteration
105 HLT 166 ; Stop

111 053Q 053 ; Initial seed 555 - low bits
112 002Q 002 ; high bits

72 lines, one per memory address: 72 bytes, or 576 bits. No stack is used; the
only data memory is the 2 bytes that store the seed (which doubles as the
current random number). All other data is hard coded.

## Breaking it down ##
# Input check #
One caveat of the Xorshift algorithm is that you can't have a seed of zero and
consequently the series will never include zero as a random value. So we need
to make sure that either the high byte or the low byte of the seed is not zero
before we begin. As soon as we find a nonzero byte, we begin; otherwise we halt
as an error.

> 000 LHLD 052 ; Load seed into HL
> 001 111Q 111 ; Address 111Q
> 002 000Q 000
> 003 MVI A 076 ; Zero accumulator
> 004 000Q 000
> 005 CMP H 274 ; Check if H is zero
> 006 JNZ 302 ; If not 0, start
> 007 016Q 016 ; Starting address 000016Q - low bits
> 010 000Q 000 ; high bits
> 011 CMP L 275 ; Check if L is zero
> 012 JNZ 302 ; If not 0, start
> 013 016Q 016
> 014 000Q 000
> 015 HLT 166 ; If HL is zero, halt as an error

I start by reading the seed from address 000111Q which was chosen simply because
it was past the end of the program. LHLD is one of the special OP codes that
assumes a register pair. Then we need to set the accumulator to zero so that CMP
can compare it against the bytes in H and L to check for zero. I'm not sure if
there is a more efficient way to do this. I haven't thought of one.
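
To see why a zero seed has to be rejected, here is a small Python model (not
8080 code; the function names are invented for this sketch). It shows that zero
maps to zero, mirrors the two comparisons in the listing, and also models one
possible alternative that is not in the original program: OR the two bytes
together (on the 8080 that would be MOV A,H followed by ORA L) and test the
result once, since the OR is zero only when both bytes are zero.

```python
MASK = 0xFFFF  # keep values to 16 bits, like the HL pair


def step(x):
    """One pass of the shift-and-XOR sequence (13 left, 9 right, 7 left)."""
    x ^= (x << 13) & MASK
    x ^= x >> 9
    x ^= (x << 7) & MASK
    return x


def seed_ok_two_compares(h, l):
    """Mirror the listing: start as soon as either byte is nonzero."""
    if h != 0:
        return True
    if l != 0:
        return True
    return False  # both bytes zero: the listing halts here


def seed_ok_single_or(h, l):
    """Assumed alternative: OR the bytes and test the result once."""
    return (h | l) != 0


print(step(0))  # shifting and XORing zero can only give zero
```

If that alternative were used, it would need one test and one jump instead of
two of each, and it would not need the accumulator zeroed first.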

# Shift left #

> 016 MVI A 076 ; Copy a into accumulator
> 017 015Q 015 ; a = 13
> 020 MOV B,H 104 ; Store original seed - H into B
> 021 MOV C,L 115 ; L into C
> 022 DAD H 051 ; 16bit left shift HL
> 023 DCR A 075 ; Decrement accumulator
> 024 JNZ 302 ; Check if we've shifted enough times
> 025 022Q 022 ; If no, jump to shift again
> 026 000Q 000

Here is our first left shift. First, save the seed unshifted as we'll need it
to XOR with later. The shift itself uses the DAD instruction, which adds one of
the register pairs to HL as a 16 bit value. Adding a number to itself is
equivalent to a left shift, so DAD H lets me operate on the full 16 bits in one
instruction. You'll see that shifting right is a little bit more complicated.
Then all we do is keep track of how many times we shift. When the accumulator
reaches zero, the zero bit will be set, JNZ won't jump, and execution will
continue on to XORing.
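
The claim that adding a number to itself is a left shift can be checked with a
short Python model (not 8080 code). The function dad_h is a made-up stand-in
for DAD H: it doubles a 16 bit value, keeps the low 16 bits, and reports the
bit that overflowed as the carry.

```python
def dad_h(hl):
    """Model DAD H: HL = HL + HL, kept to 16 bits; the overflow is the carry."""
    total = hl + hl
    return total & 0xFFFF, total >> 16  # (new HL, carry)


hl = 555                  # the seed from the listing
for _ in range(13):       # a = 13, counted down in the accumulator
    hl, carry = dad_h(hl)

# Thirteen doublings give the same result as a 16-bit left shift by 13.
print(hl == (555 << 13) & 0xFFFF)
```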

# XORing #

> 027 MOV A,L 175 ; Copy low bits
> 030 XRA C 251 ; XOR with seed low bits
> 031 MOV L,A 157 ; Store low bits
> 032 MOV A,H 174 ; Copy high bits
> 033 XRA B 250 ; XOR with seed high bits
> 034 MOV H,A 147 ; Store high bits
> 035 MOV B,H 104 ; Store copy of
> 036 MOV C,L 115 ; modified seed

Using the accumulator to hold a byte at a time, we can XOR with the original,
unshifted bytes. This shifted and XORed result is then used going forward; we
don't have to care about the original seed any more. We XOR three times so this
might benefit from code reuse by making it into a subroutine. While a subroutine
might mean less program memory used, it will mean stack space is needed and the
context switching might add execution time. We can avoid the stack by jumping
but reducing hard coded addresses has its benefits, too.

# Shifting right #

> 037 MVI D 026 ; Copy b into D
> 040 011Q 011 ; b = 9
> 041 STC 067 ; Clear carry bit by setting to 1
> 042 CMC 077 ; and complementing
> 043 MOV A,H 174 ; Copy high bits
> 044 RAR 037 ; Rotate right through carry
> 045 MOV H,A 147 ; Store shifted high bits
> 046 MOV A,L 175 ; Copy low bits
> 047 RAR 037 ; Rotate right through carry
> 050 MOV L,A 157 ; Store rotated low bits
> 051 DCR D 025 ; Decrement counter
> 052 JNZ 302 ; If not zero
> 053 041Q 041 ; shift again
> 054 000Q 000

Shifting right doesn't have a nice 16 bit shortcut. We have to shift one byte
at a time and take advantage of the carry bit. If we clear the carry bit to
start with, we will always be pushing a zero into the high end. If we catch
what falls off the low end of the high byte in the carry bit, then when we shift
the low byte, it'll pull it into the high end. RAR rotates the accumulator right
through the carry bit. Rotate, not shift, meaning whatever falls off one end
gets put back on the other end. Using the carry bit prevents what falls off
from getting put back on, and instead gives us a place to hold onto it.

Right shift[0]
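
The same idea as a Python model (not 8080 code; rar and shift_right_16 are
names invented for this sketch). rar stands in for the RAR instruction, and
shift_right_16 repeats the clear carry, rotate H, rotate L sequence from the
listing.

```python
def rar(a, carry):
    """Model RAR: rotate A right through carry. Returns (new A, new carry)."""
    new_carry = a & 1                 # bit 0 falls off into the carry
    new_a = (carry << 7) | (a >> 1)   # the old carry comes in at bit 7
    return new_a, new_carry


def shift_right_16(h, l):
    """One 16-bit right shift of the H,L pair built from two RARs."""
    carry = 0                  # STC then CMC leaves the carry cleared
    h, carry = rar(h, carry)   # zero enters the top of H, H's bit 0 is saved
    l, carry = rar(l, carry)   # the saved bit enters the top of L
    return h, l


h, l = 0x62, 0x2B
for _ in range(9):             # b = 9, counted down in D
    h, l = shift_right_16(h, l)

# Nine passes match a plain 16-bit right shift by 9.
print(((h << 8) | l) == (0x622B >> 9))
```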

Right shifting only happens once, and I don't know of a cleaner way to do it.
The only optimization, which I've mentioned previously, is to use XRA A at
address 041 instead of STC and CMC to clear the carry bit in half the time.

After this, it's another XOR, a left shift, and a final XOR before writing out
the random number over the original seed in memory. This is the random number
that can be consumed by another program and it also serves as the next seed to
generate the next number in the sequence.
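
For checking results by hand, the whole sequence can be modeled in a few lines
of Python. This is a sketch of the algorithm as the listing implements it
(shift left 13, shift right 9, shift left 7, with an XOR after each), not a
cycle-accurate emulation; the function name xorshift16 is my own.

```python
def xorshift16(seed, a=13, b=9, c=7):
    """One step of the 16-bit shift-and-XOR sequence used in the listing."""
    x = seed & 0xFFFF
    if x == 0:
        raise ValueError("seed must not be zero")  # the listing halts instead
    x ^= (x << a) & 0xFFFF   # left shift, bits past bit 15 are dropped
    x ^= x >> b              # right shift, zeros come in at the top
    x ^= (x << c) & 0xFFFF   # second left shift
    return x


x = 555  # initial seed from addresses 111 and 112
for _ in range(5):
    x = xorshift16(x)
    # Show each value plus the two bytes as they would be stored by SHLD.
    print(f"{x:5d}  low = {x & 0xFF:03o}Q  high = {x >> 8:03o}Q")
```

Feeding each result back in as the next seed is the same thing the program
does by storing the result over the seed.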

Another benefit of putting the reused code sections into subroutines is that
they might be usable by other programs in the future. So it matters what the
overall plan is. If all we are doing is spitting out a random number without
context, avoiding CALLs and RETs and stack use is very efficient. If we built a
BASIC type interpreter environment, these might become primitives for many
operations and we'd benefit from generalizing and creating subroutines in saved
space.

[0] gopher://kagu-tsuchi.com:70/I/blog/images/PRNG_rshift.png