\ clear screen, cls clears white, bls clears black
: cls vbase@ 9600 FOR AFT 0 over ! cell+ THEN NEXT drop ;
: bls vbase@ 9600 FOR AFT ffffffff over ! cell+ THEN NEXT drop ;
\ fill screen with random values
: noise vbase@ 9600 FOR AFT random over ! cell+ THEN NEXT drop ;
Duplicate table of values
=========================
\ ( addr n ) duplicate n values of table starting at position n
: dupcells cells 2dup + swap cmove ;
Create permutation
==================
\ ( addr n ) create sequence of n values
: seq FOR AFT r@ over ! cell+ THEN NEXT drop ;
\ ( a b -- ) exchange content of two cells
: exg 2dup @ swap @ rot ! swap ! ;
\ ( mask addr -- addr+rnd ) add masked random cell offset to addr
: +random swap random and cells + ;
\ ( addr n -- ) shuffle n values at addr
: shuffle
dup 1 - ( addr n mask )
rot rot ( mask addr n )
FOR AFT
2dup +random ( mask addr addr+rnd )
over r@ cells + ( mask addr addr+rnd addr+i )
exg ( mask addr )
THEN NEXT 2drop ;
: randombytes FOR AFT random over c! 1 + THEN NEXT drop ;
Square root
===========
: bmask 1 swap lshift ;
: sqrt 0 f FOR 2dup r@ bmask dup rot or dup * rot u>= and or NEXT nip ;
Table plotter
=============
\ just a test for caching the vbase@ result
: vbase@ dolit [ vbase@ ] , ;
\ draw dots without clipping
\ bdot ( x y ) draws black, wdot ( x y ) draws white
: vcell a0 * vbase@ + over 5 rshift 2 lshift + ;
: mask not 1f and 1 swap lshift ;
: bdot vcell swap mask over @ or swap ! ;
: wdot vcell swap mask not over @ and swap ! ;
\ add clipping
: bdot 2dup 3c0 u< swap 500 u< and IF bdot exit THEN 2drop ;
: wdot 2dup 3c0 u< swap 500 u< and IF wdot exit THEN 2drop ;
\ ( x y -- x ) draw scaled and vertically centered dot, keep x
: svdot plotscale @ arshift ycenter over swap bdot ;
\ ( addr n -- ) plot values vertically centered on screen
: plot
0 swap ( addr x n )
FOR AFT ( addr x )
2dup cells + @ ( addr x y )
svdot ( addr x )
1+
THEN NEXT
2drop ;
\ ( addr n -- ) plot byte values
: cplot 0 swap FOR AFT 2dup + c@ svdot 1+ THEN NEXT 2drop ;
Sine table generator
====================
variable x variable y
: >xy y ! x ! ;
: xy> x @ y @ ;
\ Calculate points for 0 to 45 degrees rotating a point (4000,0) to
\ a small angle and using the result for the next iteration.
\ Note that computational error accumulates over time. So the
\ number of accurate values and precision is limited.
\ Use y values for sin values 0 to 45 degrees.
\ Use x values for sin values of 90-45 to 45 degrees.
\ Create the remaining parts of the curve by mirroring the computed part.
: cosa 3fffb ;
: sina 649 ;
: xysincos cosa * swap sina * ;
: int 12 rshift ;
: rotate 2dup swap xysincos - int rot rot xysincos + int ;
variable sintab 400 cells allot
: gensin90
4000 0 >xy
7f FOR
y @ over 7f r@ - cells + !
x @ over 80 r@ + cells + !
xy> rotate >xy
NEXT drop
;
: scale90 ff FOR dup @ 2 lshift over ! cell+ NEXT drop ;
: gensin180 ff FOR dup r@ cells + @ over 1ff r@ - cells + ! NEXT drop ;
: gensin360 1ff FOR dup r@ cells + @ negate over 200 r@ + cells + ! NEXT drop ;
: gensintab sintab dup gensin90 dup scale90 dup gensin180 gensin360 ;
: .sintab sintab 100 .table ;
: plotsintab home cls 8 plotscale ! sintab 400 plot ;
\ VBL count measurements for compile test
\
\ compile test (forth version of ask_string_match)
\ 4277 on Hatari
\ 3532 on TT
\
\ assembly version of ask_string_match (with both long-wise and byte-wise compare)
\ 2305 on Hatari
\ 1925 on TT
\
\ assembly version of ask_string_match (only byte-wise compare)
\ 2301 on Hatari
\ 1879 on TT (only byte compare)
\
\ assembly version of cmove -> not much effect
\ 2297 on Hatari
\ 1869 on TT
\
\ assembly version of dentry_by_name
\ 623 on Hatari
\ 468 on TT
\
\ switch to 32-bit XTAs
\ 622 on Hatari
\ 417 on TT
\
\ assembly versions of not and negate (after execution counting)
\ 522 on Hatari
\ 349 on TT
\ VBL count measurements for linetest
\
\ 1816 on Hatari
\ 755 on TT
\
\ using dblt instead of dbf
\ 1813 on Hatari
\ 684 on TT
\
\ replace conditional branch by scc/and -> is not worth it
\ 2177 on Hatari
\ 682 on TT
\
\ switch to 32-bit XTAs, optimized not and negate
\ 1800 on Hatari
\ 669 on TT
: blacklines
13f FOR vbase@ r@ 2 lshift 0 13f r@ - 2 lshift 3bf blackline NEXT
f0 FOR vbase@ 0 r@ 2 lshift 4ff f0 r@ - 2 lshift blackline NEXT ;
: whitelines
13f FOR vbase@ r@ 2 lshift 0 13f r@ - 2 lshift 3bf whiteline NEXT
f0 FOR vbase@ 0 r@ 2 lshift 4ff f0 r@ - 2 lshift whiteline NEXT ;
: linetest 4 FOR blacklines whitelines NEXT ;
Dithering table
===============
\ noisetab has 8 entries, the resulting bitmask has eight corresponding bits
: newnoisetab create here 100 dup cells allot 2dup seq shuffle ;
\ ( threshold noisetab -- bitmask )
: bitpattern
0 ( threshold noiseptr result )
7 FOR
>r ( threshold noiseptr r: result )
2dup @ ( threshold noiseptr threshold noiseval r: result )
< 1 and ( threshold noiseptr bit r: result )
r> 1 lshift or ( threshold noiseptr result )
>r cell+ r> ( threshold next-noiseptr result )
NEXT
nip nip ( result )
;
\ ( noisetab dstbase ) generate table of bitmask bytes for each gray value
: genshades
ff FOR
over r@ swap ( noisetab dstbase i noisetab )
bitpattern ( noisetab dstbase pattern )
over r@ + c! ( noisetab dstbase pattern dst )
NEXT 2drop
;
\ ( dst -- ) fill 160x120 chunky buffer with x+y pattern
variable y
: chunkypattern
0 y !
77 FOR
9f FOR r@ y @ + over c! 1 + NEXT
y @ 1 + y !
NEXT drop
;
c2m8x8
======
Simple byte-wise loop -> 17.1 VBLs on Hatari, 12.7 VBLs on TT
\ 3,29 VBLs on Hatari
\ 3.04 VBLs on Atari TT
\
\ With horizontal and vertical interlacing (128 fade phases until black)
\
\ 1.46 VBLs on Hatari
\ 0.83 VBLs on Atari TT
\ The fade table has 32 unique entries, each with a different bit
\ The second 32 entries are a copy of the first part.
variable phase 0 phase !
: nextphase phase dup @ 1 + 7f and swap ! ;
: onephase home phase @ vbase@ fadetab fade2black nextphase ;
: test 7f FOR onephase NEXT ;
: bench xtalit test vblbench ;
Voxel
=====
\ convert upper-left part of 256x256 texture 'hmap' into 160x120 chunky buffer
: c2m chunkybuf vbase@ 8shades c2m8x8 ;
: tex2cln a0 cmove ;
: tex2chunky 77 FOR 2dup tex2cln a0 + swap 100 + swap NEXT 2drop ;
: tex2chunkybuf hmap chunkybuf tex2chunky ;
\ tweaked version to display the chunky buffer at half resolution
: tex2cln 7f FOR over c@ over c! 1 + swap 2 + swap NEXT 2drop ;
: smalltex2chunky 77 FOR 2dup tex2cln a0 + swap 200 + swap NEXT 2drop ;
\
\ walking the coordinate system, wrap at boundaries
\
\ xy+ ( pos(x,y) dir(dx,dy)) -- pos(x+dx,y+dy) ), consider negative dx,dy, wrap
\ each value is a 8.8 fixpoint value
\ the upper parts can by used for indexing into a 256x256 texture
\ x uses bits 0..15, y uses bits 16..31
: #( 28 hold ;
: #) 29 hold ;
: #, 2c hold ;
: #x,y # # #. # # 20 hold #, # # #. # # ;
: #xy #) #x,y #( ;
: .xy space <# #xy #> type ;
: xmask ffff ;
: ymask ffff0000 ;
: onlyx xmask and ;
: onlyy ymask and ;
: fp swap 8 lshift or ;
: xy 10 lshift or ;
: x<->y dup onlyx swap onlyy 10 rshift ;
: xy+ 2dup onlyx + onlyx >r onlyy + onlyy r> or ;
: xy- 2dup onlyx - onlyx >r onlyy - onlyy r> or ;
: xy/ >r dup onlyx r@ / onlyx swap onlyy r> / onlyy or ;
: .8steps swap 7 FOR over xy+ dup .xy NEXT 2drop ;
: .: space 3a emit ;
\ gather xy paths of left and right boundaries of the view port
\ ( dir pos buffer -- dir pos' buffer' )
: step
>r \ ( dir pos )
over xy+ \ ( dir pos' )
r> \ ( dir pos' buffer )
2dup ! \ store to buffer
cell+ \ ( dir pos' buffer' )
;
\ ( direction pos buffer -- )
: walk
zsteps FOR AFT step THEN NEXT
2drop drop
;
\ ( n -- ) fill zdirs buffer based on the information from lsight and rsight
: calczdir cells dup rsight + @ over lsight + @ xy- a0 xy/ swap zdirs + ! ;
: calczdirs zsteps FOR AFT r@ calczdir THEN NEXT ;
\ ( z height -- y ) project height to y value, depending on z
: h2y
1 rshift 77 + \ ( z z height/2+128 )
swap \ ( z height/2+128 z )
2 lshift \ ( z height/2+128 z*4 )
zproj \ ( z y' )
;
\ ( z -- ) apply perspective projection of hline values into yline values
: h2yline
a0 FOR AFT \ ( y )
dup \ ( z z )
hline r@ + c@ \ ( z z height )
h2y
yline r@ + c!
THEN NEXT
;
\ ( z buffer -- ) generate 256 h2y table entries for the given z value
: genh2yline
ff FOR
over r@ \ ( z buffer z height )
h2y \ ( z buffer y )
over r@ + c!
NEXT
;
\ ( buffer ) generate zsteps number of h2yline tables
: genh2table
zsteps FOR AFT
r@ over r@ 8 lshift + genh2yline THEN NEXT ;
\ ( h2ytable ) plot all h2yline tables
: ploth2ytables zsteps FOR AFT h2ytable r@ 8 lshift + 100 cplot THEN NEXT ;
\ ( lut v -- v' )
: lut@ + c@ ;
\ ( to lut from n ) map n byte values using a 256-byte look-up-table
: mapb2b
FOR AFT \ ( to lut from )
2dup \ ( to lut from lut from )
c@ lut@ \ ( to lut from v' )
>r 1 + \ ( to lut from' R: v' )
rot \ ( lut from' to R: v' )
r> over c! \ ( lut from' to )
1 + \ ( lut from' to' )
rot rot \ ( to' lut from' )
THEN NEXT drop 2drop
;
\ ( z -- ) render z line of chunkybuf
: renderzlevel
a0 FOR AFT
yline r@ + c@
dup r@ visible? IF
dup yclip r@ + c!
a0 * chunkybuf + r@ +
tline r@ + c@
swap
c!
0
THEN
drop
THEN NEXT drop
;
\ version that uses the assembly routine renderline
: renderzlevel drop yline tline yclip chunkybuf renderline ;
\
\ Column dithering
\
\ Create table of 256 8-bit masks out of a 8x8 noise pattern (64
\ distinct shuffled values).
\ Make table 8 entries larger to allow the access of up to eight
\ consecutive bytes at any table position.
\
: newnoisetab create here 40 dup cells allot 2dup seq shuffle ;
newnoisetab noise
create dithertab 108 allot
: gendithertab
108 FOR AFT
r@ 2 rshift
noise r@ 7 and 3 lshift cells +
bitpattern
dithertab r@ + c!
THEN NEXT ;
: showdithertab
ff FOR
dithertab r@ + c@
vbase@ r@ a0 * + 80 + c!
NEXT ;
gendithertab
showdithertab
Multiplication and division tables
==================================
The gradient values g0..g3 are successively added to the color value.
To compute the ascent, the y values and texture values of the previous line are kept in
pyline and ptline buffers.
The gradient values are stored in the gline buffer with 32 bit per entry.
Each entry contains g0..g3.
Voxel context data structure
============================
before the change: bench 157 VBLs on Hatari / 56 VBLs on TT
Certain maps or tables (tmap, hmap, mul8x8tab, div8x8tab) are
adjusted for signed index access.
create voxelctx
80 0 fp 80 0 fp xy , \ view position
4 0 fp 0 0 fp xy , \ direction of left sight line
0 0 fp 4 0 fp xy , \ direction of right sight line
tmap 8000 + ,
hmap 8000 + ,
lsight ,
rsight ,
0 , \ current zline
tline ,
ptline ,
hline ,
yclip ,
yline ,
pyline ,
gline ,
h2ytable ,
mul8x8tab 10000 + ,
div8x8tab 10000 + ,
chunkybuf ,
zsteps ,
2 , \ level of detail
1f , \ max column height mask
Introduction of oddline, swapped tlines and ylines (to avoid copying).
-> 159 VBLs on Hatari / 60 VBLs on TT
The indirection of forth words is expensive.
create tex 10000 allot
tex 400 randombytes
tex 20 100 csplice
: lines20to100 FOR AFT dup scale20to100 100 + THEN NEXT drop ;
tex 20 lines20to100
tex 100 100 cimagerotl
tex 100 lines20to100
tmptex 10000 ff cexpose
tmptex tex 10000 cimage+
Timing effects to mod replay
============================
* Use song position and pattern position as time base of
an music-frame counter (mf)
* The lowest 6 bits contain the pattern position
* The upper bits are used as song-position counter,
which is incremented each time, the song position
changes
variable mf variable osongpos
\ update the low 6 bit with current pattern position
: upd_mf_patternpos mf dup @ ffc0 and patternpos or swap ! ;
\ increment bits 6 and above if the song position changed
: upd_mf_songpos
osongpos \ ( varaddr )
dup @ songpos \ ( varaddr oldval currval )
dup rot \ ( varaddr currval curval oldval )
= not IF \ ( varaddr currval )
2dup swap ! \ update osongpos
40 mf +! \ advance music frame
THEN 2drop ;
: mf upd_mf_songpos upd_mf_patternpos mf @ ;
Mechanism for running and switching effects
===========================================
\ Use various portions for the chunkybuf for hosting several
\ consecutive other buffers. The 'textbufs' and 'maskbufs'
\ tables reference the individual portions. The corresponding
\ words 'textbuf' and 'maskedbuf' return a buffer's address
\ for a given index.