* * * * *
Unit testing from inside an assembler, part III
I'm done with the “unit testing” backend for my 6809 assembler [1]. The mini-
Forth engine [2] is working out fine, although the number of words increased
from 41 to 47 to support some conveniences (like indexing and string
comparison). It took some work to support, but the range of assertions one
can make in the code is extensive. For example, a test case for this bit of
code [3] (which I do need to discuss, but that's a post for another time)
looks like this:
-----[ Assembly ]-----
test sts [$3333,x]
next pshs pc,u,y,x,dp,b,a,cc
.test "STS"
ldx #.results
ldy #test
jsr init
.assert /x = .results , "X=results"
.assert /y = .next , "Y=next"
.assert @@/0,x = .address
.assert @@/2,x = .opcode
.assert @@/4,x = .operand
.assert @@/6,x = .topcode
.assert @@/8,x = .toperand
.assert @.nowrite = $12 , "overwrite"
.assert @/-47,s = $01 , "stack mod?"
.assert .address = "0800"z , "hex address"
.assert .opcode = "10EF"z , "hex opcode"
.assert .operand = "993333"z , "hex operand"
.assert .topcode = "STS"z , "decoded opcode"
.assert .toperand = "[3333,X]"z , "decoded operand"
rts
results fdb .address
fdb .opcode
fdb .operand
fdb .topcode
fdb .toperand
address rmb 5
opcode rmb 5
operand rmb 7
topcode rmb 9
toperand rmb 19
nowrite nop
.endtst
-----[ END OF LINE ]-----
The code being tested is a 6809 disassembler written in 6809 assembly code (I
wrote it a few years back, so any testing now is academic). The .TEST
directive takes an optional string as the name of the test. If one isn't
given, it uses the last non-local label seen in the source code as the name
of the test. The first two assertions:
-----[ Assembly ]-----
.assert /x = .results , "X=results"
.assert /y = .next , "Y=next"
-----[ END OF LINE ]-----
assert that the X register points to .results and the Y register points to
next. I use the leading slash to denote a register instead of a label. One
can use register names for labels and it's mostly unambiguous as the register
is typically part of the mnemonic itself. The only exception is for the A, B
and D registers, and then only in the indexed addressing mode, as you can use
the A, B or D register for an offset. But in the context of the .ASSERT
directive it makes it easier to parse the intent if I use '/' to designate a
register. Each register, and each bit in the condition code register (like
/cc.z for the zero-flag) can be used. The bit after the comma, “X=results”,
will be printed if the check fails:
-----[ data ]-----
test-disasm.asm:7: warning: W0015: STS:13 X=results: test failed:
-----[ END OF LINE ]-----
(there can be text after the “test failed” bit, thus the colon).
The next few lines:
-----[ Assembly ]-----
.assert @@/0,x = .address
.assert @@/2,x = .opcode
.assert @@/4,x = .operand
.assert @@/6,x = .topcode
.assert @@/8,x = .toperand
-----[ END OF LINE ]-----
assert the contents of memory pointed to by X. The double “@” fetches 16 bits
from the address following, and in the first line, this is the address in the
X register. The second line retrieves the 16 bits from the address two bytes
past where the X register points. You could write that second line as:
-----[ Assembly ]-----
.assert @@(/x + 2) = .opcode
-----[ END OF LINE ]-----
but a little syntactic sugar never hurts, and it mimics the native method of
using the index registers. This was possibly the hardest bit of code to
write, as the indexed addressing mode of the 6809, while great from an
assembly programmer's perspective, is a nightmare from an
assembler-implementer's perspective. Even here, where it's simplified, it
was a pain to get right, but I think it was worth it.
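To make the two fetch operators concrete, here is a rough Python model of them. This is only a sketch, not the assembler's actual code: the names fetch8, fetch16 and mem are made up for illustration. The one fact it leans on is that the 6809 is big-endian, so a 16-bit fetch reads the high byte first.

```python
# Hypothetical model of the '@' (8-bit) and '@@' (16-bit) fetch operators.
# The 6809 stores 16-bit values big-endian: high byte at the lower address.

def fetch8(mem, addr):
    """Model of '@addr': one byte from memory (addresses wrap at 64K)."""
    return mem[addr & 0xFFFF]

def fetch16(mem, addr):
    """Model of '@@addr': two bytes, high byte first."""
    return (fetch8(mem, addr) << 8) | fetch8(mem, addr + 1)

mem = bytearray(65536)            # 64K of zeroed memory for the example
x = 0x0900                        # pretend X points at a table of pointers
mem[x + 2 : x + 4] = b"\x09\x0F"  # second table entry holds $090F

# '@@/2,x' and '@@(/x + 2)' reduce to the same address calculation.
assert fetch16(mem, x + 2) == 0x090F
```

Either spelling ends up as "take the register value, add the offset, fetch"; the indexed form just hides the addition.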
The next two lines:
-----[ Assembly ]-----
.assert @.nowrite = $12 , "overwrite"
.assert @/-47,s = $01 , "stack mod?"
-----[ END OF LINE ]-----
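check that the given addresses, nowrite and a byte 47 bytes below the system
stack pointer, contain certain 8-bit values (a single “@” fetches 8 bits).
The byte at nowrite is a NOP instruction (opcode $12) placed right after the
result buffers, so the first assertion fails if anything writes past them.
Each byte of the memory in the virtual 6809 system is filled with the value 1
(it can be changed on the command line), so each untouched byte will contain
a 1. I picked that value since it's an illegal opcode, which the emulator
will trap.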
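The fill-value idea can be sketched in a few lines of Python. This is an illustration only: the untouched helper, the stack address and the $AA marker bytes are all hypothetical. The byte count is real, though, since pshs pc,u,y,x,dp,b,a,cc pushes 12 bytes.

```python
FILL = 0x01   # default fill value; $01 is an illegal opcode on the 6809

def untouched(mem, addr, fill=FILL):
    """True if the byte still holds the fill value, i.e. nothing wrote to it."""
    return mem[addr & 0xFFFF] == fill

mem = bytearray([FILL] * 65536)   # whole address space pre-filled
s = 0x8000                        # hypothetical initial stack pointer

# Simulate a routine pushing 12 bytes below S (the stack grows downward).
for i in range(1, 13):
    mem[s - i] = 0xAA

assert not untouched(mem, s - 12)   # the deepest pushed byte was written
assert untouched(mem, s - 47)       # far enough down, still the fill value
```

A check like this can only show that a byte still holds the fill value; a write that happens to store a 1 would go unnoticed.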
The final few lines:
-----[ Assembly ]-----
.assert .address = "0800"z , "hex address"
.assert .opcode = "10EF"z , "hex opcode"
.assert .operand = "993333"z , "hex operand"
.assert .topcode = "STS"z , "decoded opcode"
.assert .toperand = "[3333,X]"z , "decoded operand"
-----[ END OF LINE ]-----
do indeed do a string compare. And therein lies a tale. Again, this is a
form of syntactic sugar; the first line is equivalent to:
-----[ Assembly ]-----
.assert @.address=$30 && @(.address+1)=$38 && @(.address+2)=$30 && @(.address+3)=$30 && @(.address+4)=0
-----[ END OF LINE ]-----
This was the second hardest bit to support, is a bit fragile, and, if I'm
honest, a hack. First, the string literal has to be on the right hand side of
the conditional, and worse, there's no easy way to enforce this in the
assembler (so I currently don't). Second, the right hand side has to be a
literal string; you can't compare two different memory regions from the 6809
VM (Virtual Machine). Third, there's a limit of one string literal per
.ASSERT directive, because supporting more than one would vastly complicate
the already somewhat complicated code (this “unit test” backend is already
30% of the entire assembler).
To keep from having to add a ton of code for the conditional checks to
support two different primitive types, or to keep from having to create a
duplicate set of string conditionals, I cheated (or came up with a brilliant
hack—take your pick). The code generated is:
-----[ miniForth ]-----
VM_LIT .address
VM_SCMP
VM_EQ
VM_EXIT
-----[ END OF LINE ]-----
That VM_SCMP is hiding things. It knows which string literal to use (as it's
part of the VM program and there's only space for one string literal per
.ASSERT directive), and it leaves two values on the stack: -1,0 if the
result is less than, 1,0 if the result is greater than, and 0,0 if the result
is equal. This way, the conditional operators can work as is.
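Here is a toy stack machine in Python that shows why pushing two values lets the integer conditionals work unchanged. It is a sketch under assumptions, not the assembler's implementation: the opcode names are borrowed from the listing, but the run function, the tuple encoding of the program, and the choice to compare the string in memory against the literal (rather than the other way round) are all mine.

```python
# Toy model of the SCMP trick; only the opcode names come from the post.

def run(program, mem, literal):
    """Run a list of (opcode, args...) tuples and return the EXIT value."""
    stack = []
    for op, *args in program:
        if op == "LIT":
            stack.append(args[0])
        elif op == "SCMP":
            addr = stack.pop()
            actual = bytes(mem[addr:addr + len(literal)])
            # Push the ordering (-1, 0 or 1) followed by a 0, so the
            # ordinary integer conditionals compare "ordering <op> 0".
            stack.append((actual > literal) - (actual < literal))
            stack.append(0)
        elif op == "EQ":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        elif op == "LT":
            b, a = stack.pop(), stack.pop()
            stack.append(a < b)
        elif op == "EXIT":
            return stack.pop()
    raise RuntimeError("program ran off the end")

mem = bytearray(65536)
mem[0x0900:0x0905] = b"0800\x00"          # what the code under test wrote
program = [("LIT", 0x0900), ("SCMP",), ("EQ",), ("EXIT",)]
assert run(program, mem, b"0800\x00") is True
```

Swapping EQ for LT needs no string-specific code: the ordering value -1 is compared against the 0 that SCMP left behind, and -1 < 0 holds.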
Oh, those “z”s on the end of each string literal? Well, the assembler
supports several methods of storing string data in memory. There's the
standard C NUL terminated strings; the OS-9 [4] method of setting bit 7 of
the last character of the string, and the sometimes used method where the
first character of the string is actually the length. I originally had
separate non-standard directives for these, so when I wanted string
comparisons, I needed a way to say which method a literal uses. Then it hit
me: use a suffix on the string. “Z” for the NUL terminated one (“Z” stands
for “zero”), “H” for the bit 7 set (“H” for “high-bit”) and “C” for counted
strings. And if I'm using the suffixes for the “unit test” backend, why not
in general? So I replaced the .ASCIIZ and .ASCIIH directives (I was
contemplating adding counted strings but I never got around to adding
.ASCIIC) with just .ASCII and the use of a suffix (with no suffix, the string
is left as-is).
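The three layouts are easy to show side by side. The Python below is a sketch of the encodings only, not of how the assembler implements them; the encode function is hypothetical, and it assumes the suffix is case-insensitive and that a counted string uses a single length byte.

```python
def encode(text, suffix=""):
    """Return the bytes a string literal with the given suffix would occupy."""
    data = text.encode("ascii")
    suffix = suffix.upper()
    if suffix == "":
        return data                        # no suffix: left as-is
    if suffix == "Z":
        return data + b"\x00"              # NUL terminated, as in C
    if suffix == "H":
        if not data:
            raise ValueError("high-bit strings need at least one character")
        return data[:-1] + bytes([data[-1] | 0x80])   # bit 7 set on last char
    if suffix == "C":
        if len(data) > 255:
            raise ValueError("counted strings hold at most 255 characters")
        return bytes([len(data)]) + data   # length byte first
    raise ValueError("unknown suffix: " + suffix)

assert encode("STS", "z") == b"STS\x00"
assert encode("STS", "h") == b"ST\xd3"     # 'S' is $53; with bit 7 set, $D3
assert encode("STS", "c") == b"\x03STS"
```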
So, back on track. The expressions can get quite involved. Some examples:
-----[ Assembly ]-----
.assert /b = -(@lfsr & 1) & $B4
.assert @tvalue = $10*3+(1<<3)+2*2+(7-5)+1
.assert @@(tvalue + 1) = $10+3+1<<3+2*2+7-5+1
-----[ END OF LINE ]-----
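The first of those assertions leans on a two's-complement trick worth spelling out: -(v & 1) is 0 when the low bit of v is clear and all ones when it is set, so ANDing the result with $B4 yields either 0 or $B4. A minimal Python sketch follows; the mask_for name is made up, and how the assembler itself sizes intermediate values isn't covered here.

```python
def mask_for(value, mask=0xB4):
    """Return mask if the low bit of value is set, otherwise 0."""
    # -(0) is 0; -(1) is all ones in two's complement, so the AND keeps mask.
    return -(value & 1) & mask

assert mask_for(0x02) == 0x00   # low bit clear
assert mask_for(0x03) == 0xB4   # low bit set
```

Judging by the label name lfsr, this is presumably the feedback term of a linear-feedback shift register, selected without a branch.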
You are also not limited to using the .ASSERT, .TRON (TRace ON) and .TROFF
(TRace OFF) directives inside a .TEST directive. You can put them anywhere in
the codebase, and if that code is executed as part of a “unit test”, they'll
trigger (and if you aren't using the “unit test” backend, they're ignored
outright).
There are other changes too—each backend will parse its own command line
options, I added some new warnings (such as a warning for self-modifying
code), and the memory of the virtual 6809 can have various protections (read-
only, write-only, execute-only, trace) set from the command line for further
testing.
Now I just need to update the README.txt file and release the code.
[1]
https://github.com/spc476/a09
[2]
gopher://gopher.conman.org/0Phlog:2023/12/01.1
[3]
https://github.com/spc476/6809-DISASM
[4]
https://en.wikipedia.org/wiki/OS-9