* * * * *
The speed of Microsoft's BASIC floating point routines
I was curious about how fast Microsoft's BASIC (Beginners' All-purpose
Symbolic Instruction Code) floating point [1] routines were. This is easy
enough to test, now that I can time assembly code inside the assembler [2].
The code calculates -(2π)^3/3! three ways: with the Color BASIC routines,
with IEEE (Institute of Electrical and Electronics Engineers)-754 single
precision, and with IEEE-754 double precision.
First, Color BASIC:
-----[ Assembly ]-----
                .tron   timing
ms_fp           ldx     #.tau
                jsr     CB.FP0fx        ; FP0 = .tau
                ldx     #.tau
                jsr     CB.FMULx        ; FP0 = FP0 * .tau
                ldx     #.tau
                jsr     CB.FMULx        ; FP0 = FP0 * .tau
                jsr     CB.FP1f0        ; FP1 = FP0
                ldx     #.fact3
                jsr     CB.FP0fx        ; FP0 = 3!
                jsr     CB.FDIV         ; FP0 = FP1 / FP0
                neg     CB.fp0sgn       ; FP0 = -FP0
                ldx     #.answer
                jsr     CB.xfFP0        ; .answer = FP0
                .troff
                rts
.tau            fcb     $83,$49,$0F,$DA,$A2
.fact3          fcb     $83,$40,$00,$00,$00
.answer         rmb     5
                fcb     $86,$A5,$5D,$E7,$30 ; precalculated result
-----[ END OF LINE ]-----
I can't use the .FLOAT directive here since that only supports either the
Microsoft format or IEEE-754 but not both. So for this test, I have to define
each float byte by byte. The last line is what the result should be (taken
from a memory dump of the VM (Virtual Machine) after running). Also, tau is
2π [3], just in case that wasn't clear. This ran in 8,742 cycles over 2,124
instructions, an average of 4.12 cycles per instruction (I modified the
assembler to record this additional information).
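As a sanity check on those hand-written bytes, here is a minimal Python sketch
that decodes the 5-byte packed format. It assumes the layout described in [1]:
an excess-128 exponent byte, followed by a 32-bit mantissa whose top bit stores
the sign and stands in for an implied leading 1. The name decode_mbf5 is made
up for this example; it is not part of the assembler or the ROM.

```python
import math

def decode_mbf5(raw):
    """Decode a 5-byte packed Microsoft float (assumed layout, see above)."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    exponent = raw[0]
    if exponent == 0:
        return 0.0  # a zero exponent byte means the value is zero
    sign = -1.0 if raw[1] & 0x80 else 1.0
    # Put the implied leading 1 back where the sign bit is stored.
    mantissa = int.from_bytes(bytes([raw[1] | 0x80]) + bytes(raw[2:]), "big")
    # The mantissa is a fraction in [0.5, 1), scaled by 2^(exponent - 128).
    return sign * (mantissa / 2**32) * 2.0 ** (exponent - 128)

tau = decode_mbf5(bytes([0x83, 0x49, 0x0F, 0xDA, 0xA2]))
fact3 = decode_mbf5(bytes([0x83, 0x40, 0x00, 0x00, 0x00]))
answer = decode_mbf5(bytes([0x86, 0xA5, 0x5D, 0xE7, 0x30]))

print(tau, fact3, answer)
```

Under that assumed layout the three byte strings decode to roughly 6.2831853,
exactly 6.0, and roughly -41.3417, which is tau cubed over 3!, negated.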
Next up, IEEE-754 single precision:
-----[ Assembly ]-----
                .tron   timing
ieee_single     ldu     #.tau
                ldy     #.tau
                ldx     #.answer
                ldd     #.fpcb
                jsr     REG
                fcb     FMUL    ; .answer = .tau * .tau
                ldu     #.tau
                ldy     #.answer
                ldx     #.answer
                ldd     #.fpcb
                jsr     REG
                fcb     FMUL    ; .answer = .answer * .tau
                ldu     #.answer
                ldy     #.fact3
                ldx     #.answer
                ldd     #.fpcb
                jsr     REG
                fcb     FDIV    ; .answer = .answer / 3!
                ldy     #.answer
                ldx     #.answer
                ldd     #.fpcb
                jsr     REG
                fcb     FNEG    ; .answer = -.answer
                .troff
                rts
.fpcb           fcb     FPCTL.single | FPCTL.rn | FPCTL.proj
                fcb     0
                fcb     0
                fcb     0
                fdb     0
.tau            .float  6.283185307
.fact3          .float  3!
.answer         .float  0
                .float  -(6.283185307 ** 3 / 3!)
-----[ END OF LINE ]-----
The floating point control block (.fpcb) configures the MC6839 to use single
precision, round-to-nearest rounding and projective closure (not sure what
that is, but it's the default value). And it does calculate the correct
result. It's amazing that code written 42 years ago for an 8-bit CPU (Central
Processing Unit) works flawlessly. What it isn't is fast. This code took
14,204 cycles over 2,932 instructions (average 4.84 cycles per instruction).
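For comparison, the same constants and calculation can be reproduced in
IEEE-754 single precision with Python's standard struct module. This is only a
sketch: it assumes the .float directive emits big-endian binary32 values (the
natural byte order on the 6809), which is an assumption here rather than
something checked against the assembler's output. The helper names are made up
for this example.

```python
import math
import struct

def ieee_single_bytes(value):
    """Return the big-endian IEEE-754 binary32 encoding of value."""
    return struct.pack(">f", value)

def ieee_single_value(raw):
    """Decode 4 big-endian bytes as an IEEE-754 binary32 value."""
    return struct.unpack(">f", raw)[0]

def to_single(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return ieee_single_value(ieee_single_bytes(x))

tau_bytes = ieee_single_bytes(math.tau)
fact3_bytes = ieee_single_bytes(6.0)

# Redo the calculation, rounding to single precision after every step,
# mirroring the FMUL, FMUL, FDIV and FNEG calls in the listing.
tau = to_single(math.tau)
answer = to_single(tau * tau)
answer = to_single(answer * tau)
answer = to_single(answer / 6.0)
answer = -answer

print(tau_bytes.hex(), fact3_bytes.hex(), ieee_single_bytes(answer).hex(), answer)
```

Under that assumption tau encodes as 40 C9 0F DB and 3! as 40 C0 00 00, and
the final value comes out at about -41.3417.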
The higher average cycle count per instruction could be due to position
independent addressing modes, but I'm not entirely sure what it's doing to
take about 1.6 times as long. The ROM (Read Only Memory) does use the IEEE-754
extended format (10 bytes) internally, with more bit shifts to extract the
exponent and mantissa, but that much longer?
Perhaps it's code to deal with ±∞ and NaNs (Not a Number).
The IEEE-754 double precision version is the same, except that the floating
point control block configures double precision and the data uses .FLOATD
instead of .FLOAT; otherwise the code is identical. The timing result,
however, isn't. It took 31,613 cycles over 6,865 instructions (average 4.60
cycles per instruction). Being twice the size, it took a bit more than twice
the time of single precision (about 2.2 times), which is about what I
expected.
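The ratios between the three runs are easy to lose track of, so here is a
small Python sketch that recomputes the averages and relative timings from the
figures quoted above. The dictionary and function names are only for
illustration.

```python
# Cycle and instruction counts as reported in the text above.
runs = {
    "Color BASIC": (8742, 2124),
    "IEEE-754 single": (14204, 2932),
    "IEEE-754 double": (31613, 6865),
}

def cycles_per_instruction(cycles, instructions):
    """Average number of cycles spent per executed instruction."""
    return cycles / instructions

def slowdown(name, baseline="Color BASIC"):
    """How many times as many cycles `name` took compared to `baseline`."""
    return runs[name][0] / runs[baseline][0]

for name, (cycles, instructions) in runs.items():
    cpi = cycles_per_instruction(cycles, instructions)
    print(f"{name:16} {cycles:6} cycles  {cpi:.2f} cycles/instruction  "
          f"{slowdown(name):.2f}x BASIC")
```

On these numbers single precision takes about 1.62 times as many cycles as
Color BASIC, and double precision about 2.23 times as many as single precision
(3.62 times as many as Color BASIC).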
The final bit of code just loads the ROMs into memory, and calls each
function to get the timing:
-----[ Assembly ]-----
                org     $2000
                incbin  "mc6839.rom"
REG             equ     $203D   ; register-based entry point
                org     $A000
                incbin  "bas12.rom"
                .opt    test prot rw,$00,$FF            ; Direct Page for BASIC
                .opt    test prot rx,$2000,$2000+8192   ; MC6839 ROM
                .opt    test prot rx,$A000,$A000+8192   ; BASIC ROM
                .test   "BASIC"
                lbsr    ms_fp
                rts
                .endtst
                .test   "IEEE-SINGLE"
                lbsr    ieee_single
                rts
                .endtst
                .test   "IEEE-DOUBLE"
                lbsr    ieee_double
                rts
                .endtst
-----[ END OF LINE ]-----
Really, the only surprising thing here was just how fast Microsoft BASIC was
at floating point.
[1]
https://en.wikipedia.org/wiki/Microsoft_Binary_Format
[2]
gopher://gopher.conman.org/0Phlog:2023/12/19.3
[3]
https://tauday.com/tau-manifesto