Title: Altair Boot Loader
Date: November 24, 2019
Tags: altair programming
========================================

Entering everything via the front panel was going to get overly tedious very
quickly.  Early on, boot loaders were produced that would bootstrap more complex
programs read from paper tape, cassette, and eventually floppy disk.

Famously, Microsoft got its start by writing a BASIC interpreter for the new
Altair 8800.  Of course, no one was expected to load BASIC via the front panel
switches every time they turned the machine on.  BASIC was available on paper
tape and cassette.  Some teletype models had paper tape readers built in and
cassette players were, of course, quite common in the household.

To load BASIC or other programs from these devices into the Altair's memory
quickly, very simple programs were included to bootstrap the process.  These
bootstrap programs had to be entered manually, so they were very short, with
little to no feedback.  They simply copied bytes from the serial port or
cassette interface and wrote them to memory.

As my programs get more complex, I need similar functionality to load them
more quickly.  I don't have a paper tape punch and reader, so I'll cheat and
use my laptop to stream the bytes to the serial port.
To the Altair, it's all the same and I've already decided to cheat and write
code in a text editor and use my laptop as a terminal.  I might as well use it
as a paper tape reader, too.

Paper tapes and cassettes store binary data, but I stream my hand-assembled
code as ASCII text, so the loader has to do a little extra work to convert it
into binary data before writing to memory.  Also, since I'm connecting to the
Altair with a terminal emulator, I included some simple feedback.

Let's take a look at an actual boot loader from the day.

4K BASIC boot loader:

0000            org     0
               ;
               ; 2SIO Loader for 4K BASIC Version 3.2
               ;
               ;  ** Set sense switches A11, A10 ON **
               ;
0000    3E03            mvi     a,3             ;reset then init ACIA
0002    D310            out     010h
0004    3E15            mvi     a,015h          ;015h for 8N1, 011h for 8N2
0006    D310            out     010h

               ; H = msb of load address
               ; L = lsb of load address = length of loader = leader byte

0008    21AE0F          lxi     h,00faeh        ;4K BASIC V3.2
               ;       lxi     h,00fc2h        ;4K BASIC V4.0
               ;       lxi     h,01fc2h        ;8K BASIC v4.0
               ;       lxi     h,03fc2h        ;Extended BASIC V4.0
               ;       lxi     h,07ec2h        ;Extended Disk BASIC v5.0

000B    311A00  loop    lxi     sp,stack        ;init SP so a RET jumps to loop
000E    DB10            in      010h            ;get 2sio status
0010    0F              rrc                     ;new byte available?
0011    D0              rnc                     ;no (jumps back to loop)
0012    DB11            in      011h            ;get the byte
0014    BD              cmp     l               ;new byte = leader byte?
0015    C8              rz                      ;yes (jumps back to loop)
0016    2D              dcr     l               ;not leader, decrement address
0017    77              mov     m,a             ;store the byte (reverse order)
0018    C0              rnz                     ;loop until L = 0
0019    E9              pchl                    ;jump to code just downloaded

001A    0B00    stack   dw      loop
001C                   end

The format is a little different from the one I use because it's an assembler's
printable listing.  Addresses are in the first column and assembled bytes are
in the second, both in hexadecimal.  Operands are on the same line as their
opcode, so you'll see the address advance by the full length of each
instruction.

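You can see how short this is.  It sets up the serial port, then loads HL with
the load address.  The low byte in L does triple duty: it is the number of
bytes to read, the value of the leader bytes on the tape, and the low half of
the address.  Bytes are read in a loop.  Anything equal to L is skipped as
leader; after that each byte is stored at the next lower address until L
reaches zero, and PCHL jumps to what was just loaded.  Since only L counts
down, this stage can pull in at most 0AEh (174) bytes, which the listing's own
comment calls the "length of loader", so what it fetches is presumably a
further loader at the head of the tape that reads in the rest.  BASIC is then
executed automatically.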
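
To convince myself I was reading that loop correctly, here is a rough Python
model of it.  This is only a sketch: the function name run_loader and the dict
standing in for memory are my own inventions, and the serial port handshake is
left out entirely.

```python
def run_loader(tape, load_addr=0x0FAE):
    """Model the loop at 000Bh-0019h of the listing.

    tape is an iterable of byte values arriving from the serial port.
    Returns (memory, entry) where memory maps address -> byte and entry
    is the address PCHL would jump to.
    """
    memory = {}
    h, l = load_addr >> 8, load_addr & 0xFF
    for byte in tape:
        if byte == l:                  # cmp l / rz: equal to L, skip it
            continue
        l = (l - 1) & 0xFF             # dcr l
        memory[(h << 8) | l] = byte    # mov m,a (addresses run downward)
        if l == 0:                     # rnz falls through to pchl
            return memory, (h << 8) | l
    raise ValueError("tape ended before L reached zero")


# Eight leader bytes followed by 0AEh bytes of payload.  The payload values
# are all above 0AEh so none of them can be mistaken for the current L.
payload = [0xF0 | (i & 0x0F) for i in range(0xAE)]
memory, entry = run_loader([0xAE] * 8 + payload)
print(hex(entry), len(memory))
```

The first payload byte lands at 0FADh and the last at 0F00h, which is also
where execution continues.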

Nothing remarkable except they use a little trick to save bytes: the stack
pointer is set to an address holding the start of the loop, so they can use
returns instead of jumps.  Returns don't need to be provided with an address
like jumps do, which saves 2 bytes each time.  However, the LXI at address
000BH to reset the stack pointer and the stack value defined at 001AH cost 5
bytes.  There are 3 returns which, if written as jumps, would have added 6
bytes.  So ultimately they save one byte.  Maybe not worth much here, but with
a larger set of comparisons that could add up to some savings.  Every byte
counts, especially when entering data manually through the front panel.
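
The byte accounting is easy to play with.  A small sketch, with names of my
own choosing, that shows where the trick starts to pay off:

```python
JMP_SIZE = 3        # opcode plus a 2-byte address
RET_SIZE = 1        # opcode only
SETUP_COST = 3 + 2  # the LXI SP instruction plus the 2-byte word at 'stack'


def bytes_saved(branches):
    """Net bytes saved by using returns instead of jumps for this many branches."""
    return branches * (JMP_SIZE - RET_SIZE) - SETUP_COST


for n in range(1, 6):
    print(n, bytes_saved(n))
```

With fewer than 3 branches back to the loop the trick costs more than it
saves; at 3 it is one byte ahead, and every branch after that is worth 2 more.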

My boot loader:

; Requires first 2 bytes, high then low, as start address in xx xxx xxx form
; Reads bytes in xx xxx xxx format until invalid character

; start up
177400  LXI SP          061 ; Set stack pointer
177401          000Q    000
177402          000Q    000
177403  MVI A           076 ; Reset ACIA
177404          003Q    003
177405  OUT             323 ; for terminal port
177406          020Q    020
177407  MVI A           076 ; 9600 baud 8N1 no interrupts
177410          025Q    025
177411  OUT             323 ; for terminal port
177412          020Q    020

; print prompt
177413  MVI B           006
177414          '\r'    015
177415  CALL            315 ; print char
177416          142Q    142
177417          377Q    377
177420  MVI B           006
177421          '\n'    012
177422  CALL            315 ; print char
177423          142Q    142
177424          377Q    377
177425  MVI B           006
177426          '<'     074
177427  CALL            315 ; print char
177430          142Q    142
177431          377Q    377

; read address
; as  xx xxx xxx  xx xxx xxx
; not x xxx xxx  xxx xxx xxx
177432  CALL            315 ; get byte
177433          070Q    070
177434          377Q    377
177435  MOV H,D         142 ; store high byte
177436  CALL            315 ; get byte
177437          070Q    070
177440          377Q    377
177441  MOV L,D         152 ; store low byte

; read opcode
177442  CALL            315 ; read byte
177443          070Q    070
177444          377Q    377

; write
177445  MOV A,D         172 ; put opcode in A
177446  MOV M,A         167 ; write opcode to address
177447  INX H           043 ; Increment address
177450  MVI B           006 ; print dot
177451          '.'     056
177452  CALL            315 ; print char
177453          142Q    142
177454          377Q    377
177455  JMP             303 ; goto read opcode
177456          042Q    042
177457          377Q    377

; error
177460  MVI B           006 ; Error
177461          '!'     041
177462  CALL            315 ; print char
177463          142Q    142
177464          377Q    377
177465  JMP             303 ; stop here
177466          065Q    065
177467          377Q    377

; read byte
; first char (only 2 bits allowed)
177470  MVI E           036 ; set number of bits
177471          374Q    374
177472  MVI D           026 ; clear opcode
177473          000Q    000
177474  MVI B           006 ; set char counter
177475          003Q    003
; char
177476  CALL            315 ; get char
177477          131Q    131
177500          377Q    377
177501  MOV C,A         117 ; save in C
177502  ANA E           243 ; check for valid octal
177503  XRI             356
177504          060Q    060
177505  JNZ             302 ; error
177506          060Q    060
177507          377Q    377
177510  MOV A,D         172 ; get opcode
177511  RLC             007 ; shift 3
177512  RLC             007
177513  RLC             007
177514  MOV D,A         127 ; store in D
177515  MOV A,C         171 ; restore char
177516  ANI             346 ; convert from ASCII to number
177517          007Q    007
177520  ADD D           202 ; add existing octal
177521  MOV D,A         127 ; store back to D
177522  MVI E           036 ; set number of bits
177523          370Q    370
177524  DCR B           005
177525  JNZ             302 ; next char
177526          076Q    076
177527          377Q    377
177530  RET             311

; get char
177531  IN              333 ; wait for character
177532          020Q    020
177533  RRC             017
177534  JNC             322 ; goto wait
177535          131Q    131
177536          377Q    377
177537  IN              333 ; read character
177540          021Q    021
177541  RET             311

; write char to terminal
; expects character in B register
177542  IN              333 ; Read status
177543          020Q    020
177544  RRC             017
177545  RRC             017
177546  JNC             322 ; Not ready to send
177547          142Q    142
177550          377Q    377
177551  MOV A,B         170 ; Get char from B register
177552  OUT             323 ; Write to terminal
177553          021Q    021
177554  RET             311

Right off the bat, this is quite a bit longer.  I need to accommodate arbitrary
programs, so I don't hardcode a length and don't bother with leader bytes.  I
read all data as ASCII characters instead of binary bytes so I can type
programs in manually if I want to.  That requires a large conversion routine.
I also use the first two bytes to define a start address.

I print a prompt because I originally intended this to be a more interactive
program, but then remembered it's just a bootstrap.  The interactive program
can be loaded next.  I left the prompt in so I'd know my terminal was connected
and the boot loader was running.  I also added feedback for each byte written
to memory, so I know the load is progressing, and a minimalist error output
for when invalid data is encountered.

Still pretty simple, but versatile enough to load anything I need.  I have
another copy of this loader assembled to a 000000Q start address.  This way I
can enter the boot loader at either end of memory and then load a program at
the other end.  I decided not to jump to the start address automatically once
the load finishes so I can reset and load more data to another location.
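
On the laptop side, the stream the loader expects can be produced with a few
lines of Python.  This is a sketch under my reading of the listing: the start
address goes first as two bytes, high then low, and every byte is sent as
exactly three octal digits with nothing in between, since any other character
takes the error path.  The name encode_for_loader is made up for the example,
and actually writing the text to the serial port is left out.

```python
def encode_for_loader(start_addr, data):
    """Return the ASCII text to send to the loader.

    start_addr is a 16-bit address; data is an iterable of byte values.
    Each byte becomes three octal digits (000 to 377).
    """
    if not 0 <= start_addr <= 0xFFFF:
        raise ValueError("start address must fit in 16 bits")
    stream = bytes([start_addr >> 8, start_addr & 0xFF]) + bytes(data)
    return "".join(f"{b:03o}" for b in stream)


# Load MVI A,003Q / OUT 020Q at address 0100h.
print(encode_for_loader(0x0100, [0x3E, 0x03, 0xD3, 0x10]))
# prints 001000076003323020
```

Note that each half of the address is encoded separately, so 0100h goes out as
001 000 rather than as the 16-bit octal number 000400.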

The only thing worth looking at in my code is 'read byte'.  Since I'm using
octal for data and assembling to octal, it takes a little translating to get a
byte from ASCII characters.  A byte is three octal digits, and the first of the
three only ever represents 2 bits, digits 0 through 3.  I loop 3 times, once
per character.  The E register holds a mask of the bits that must not be set
in a valid digit.  I set it to 374Q for the first pass and AND it with the
character; if the result is anything other than 060Q, the ASCII offset for
'0', the character isn't a digit from 0 to 3.  I rotate D left by 3 bits and
add the new digit to it.  I then set the mask to 370Q to allow 3 bits, digits
0 - 7.  Loop twice more and D will contain the single byte represented by the
3 ASCII characters.  I wondered whether I could drop the clear of the D
register at 177472Q, but RLC rotates rather than shifts: bits leaving the top
come back in at the bottom, so junk left in D would end up in the result.  The
clear stays.
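
For anyone who finds the octal listing hard to follow, here is the same
routine modelled in Python.  It is a sketch that mirrors the listing step by
step, not a replacement for it; the function names are mine, and an exception
stands in for the jump to the error handler.

```python
def rlc(a):
    """8080 RLC: rotate the 8-bit value left, bit 7 wraps around to bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF


def read_byte(chars, d=0):
    """Convert three ASCII octal digits into one byte.

    chars is an iterable of three character codes.  d is the starting
    content of the D register; 0 matches the MVI D at 177472Q.
    """
    it = iter(chars)
    e = 0o374                         # first digit: only 0 to 3 allowed
    for _ in range(3):                # B counts the three characters
        c = next(it)                  # CALL get char
        if (c & e) ^ 0o060:           # ANA E / XRI 060Q / JNZ error
            raise ValueError(f"invalid octal character {chr(c)!r}")
        a = rlc(rlc(rlc(d)))          # three RLCs
        d = (a + (c & 0o007)) & 0xFF  # ANI 007Q / ADD D
        e = 0o370                     # remaining digits: 0 to 7
    return d


print(oct(read_byte(b"315")))
# prints 0o315
```

The d parameter is only there for experimenting: passing a non-zero value
shows what the routine would return if D were not cleared on entry.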

Working with hexadecimal would be easier since it's always 4 bits, and you only
need 2 ASCII characters to represent a byte.
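
As a rough idea of what the hexadecimal version might look like, here is a
sketch in Python rather than 8080 code.  The function names are hypothetical,
and I'm assuming uppercase letters only.  Both characters are handled the same
way, though the letters A to F need their own offset because they don't follow
'9' directly in ASCII.

```python
def hex_nibble(c):
    """Convert one ASCII code ('0'-'9' or 'A'-'F') to its 4-bit value."""
    if 0x30 <= c <= 0x39:             # '0' to '9'
        return c - 0x30
    if 0x41 <= c <= 0x46:             # 'A' to 'F'
        return c - 0x41 + 10
    raise ValueError(f"invalid hex character {chr(c)!r}")


def read_hex_byte(chars):
    """Convert two ASCII hex digits into one byte."""
    hi, lo = chars
    return (hex_nibble(hi) << 4) | hex_nibble(lo)


print(hex(read_hex_byte(b"CD")))
# prints 0xcd
```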

A little long, and a little flawed, but it'll help me enter longer programs
more quickly.  I'll need the help soon.

--------------------------------------------------------------------------------

There was another way to enter programs into the Altair.  A little easier than
the front panel, but not as easy as loading from media with a boot loader.  And
if you wrote the program yourself on paper, you'd need a way to load it into the
Altair before you could save it to media, if you had that option.

Monitors, small programs that let you examine and modify memory, filled that
gap.  I'll talk about those next.


[0] https://en.wikipedia.org/wiki/Altair_BASIC
   gopher://gopherpedia.com:70/0/Altair BASIC