(2023-05-03) Ode to 64K virtual machines
----------------------------------------
Many, if not most, VMs and interpreters that are meant to be implemented on
old hardware, or to recreate the old-hardware experience anywhere else, are
designed around a 16-bit address bus and, as such, a maximum of 65536 bytes
of addressable memory. This started back in the 1970s (when CPUs themselves
had 8-bit buses) with Wozniak's SWEET16 and continues to this day, including
but not limited to Uxn, VTL-2, CHIP-8 with all its flavors, PICO-8,
Minicube64, OK64, many Forth implementations, most Subleq and other OISC
implementations, and even my own Equi. So, why does this work, and why do I
personally consider this amount of RAM (and 16-bit environments as a whole)
optimal for day-to-day low-power human-scale computing?
First, let's get the most obvious thing out of the way. To be as efficient
as possible, we want the maximum address to equal the maximum value of the
machine word. Hence, for systems with 8-bit machine words, 256 bytes of RAM
would be architecturally perfect, but, you guessed it, that's way too little
and we can hardly fit anything in there... and if we do, let's remember how
hard it was to program for the Atari 2600, which had just 128 bytes of RAM.
So instead we allocate two 8-bit machine words (or one 16-bit machine word)
to store an address, which gives us 65536 bytes of addressable space, which,
you guessed it, was enough for an entire generation of home computers and
gaming consoles, especially considering that there was usually even less
actual RAM and a good part of the address space was dedicated to video,
input, sound and other ports and internal needs.
OK, so we have found out why 64K of address space is the smallest
comfortable amount. Now, why not more? Well, we can do more, but the next
comfortable stop is 24 bits, or 16 MiB. To be honest, programs that occupy
that much space are already far from human-scale. Of course, there are
legitimate scenarios for storing large amounts of data (not code) in RAM
all at once, like BWT-based compression (which generally performs better
with larger block sizes). Even then, you can optimize your algorithms to
use processing methods more suitable for working with storage (e.g. in the
case of BWT, replace quicksort with an external merge sort). The point is,
there should be virtually nothing to fill that much RAM with; otherwise
something else is definitely going wrong.
I'm not saying that memory usage over 64KB per application/VM should be
prohibited, but it must be strongly justified. Otherwise, DIY and LPC
projects and platforms will eventually end up in the same state that
mainstream software/platform development is in today.