> My VLSI tools take a chip from conception through testing. Perhaps
> 500 lines of source code. Cadence, Mentor Graphics do the same, more
> or less. With how much source/object code?

-- Chuck Moore, the inventor of Forth

This is a personal account of my experience implementing and using the
Forth programming language and the stack machine
architecture. "Implementing and using" -- in that order, pretty much; a
somewhat typical order, as will become apparent.

It will also become clear why, having defined the instruction set of a
processor designed to run Forth that went into production, I don't
consider myself a competent Forth programmer (now is the time to warn
that my understanding of Forth is just that -- my own understanding;
I wouldn't count on it too much.)

Why the epigraph about Chuck Moore's VLSI tools? Because Forth is very
radical. Black Square kind of radical. An approach to programming
seemingly leaving out most if not all of programming:

> ...Forth does it differently. There is no syntax, no redundancy, no
> typing. There are no errors that can be detected. ...there are no
> parentheses. No indentation. No hooks, no compatibility. ...No
> files. No operating system.

I've never been a huge fan of suprematism or modernism in
general. However, a particular modernist can easily get my attention
if he's a genius in a traditional sense, with superpowers. Say, he
memorizes note sheets upon the first brief glance like Shostakovich
did.

Now, I've seen chip design tools by the likes of Cadence and Mentor
Graphics. Astronomically costly licenses. Geological run times. And
nobody quite knows what they do. To me, VLSI tools in 500 lines
qualify as a superpower, enough to grab my attention.

So, here goes.

***

I was intrigued with Forth ever since I read about it in Bruce Eckel's
book on C++, a 198-something edition; he said there that
"extensibility got a bad reputation due to languages like Forth, where
a programmer could change everything and effectively create a
programming language of his own". WANT!

A couple of years later, I looked for info on the net, which seemed
somewhat scarce. An unusual-looking language. Parameters and results
passed implicitly on a stack. 2 3 + instead of
2+3. Case-insensitive. Nothing about the extensibility business
though.

I thought of nothing better than to dive into the source of an
implementation, pForth -- and I didn't need anything better, as my mind
was immediately blown away by the following passage right at the top
of system.fth, the part of pForth implemented in Forth on top of the C
interpreter:

   : (   41 word drop ; immediate
   ( That was the definition for the comment word. )
   ( Now we can add comments to what we are doing! )

Now. we. can. add. comments. to. what. we. are. doing.

What this does is define a word (Forth's name for a function) called
"(". "(" is executed at compile time (as directed by IMMEDIATE). It
tells the compiler to read bytes from the source file (that's what the
word called, um, WORD is doing), until a ")" -- ASCII 41 -- is
found. Those bytes are then ignored (the pointer to them is removed
from the stack with DROP). So effectively, everything inside "( ... )"
becomes a comment.

Wow. Yeah, you definitely can't do that in C++. (You can in Lisp but
they don't teach you those parts at school. They teach the pure
functional parts, where you can't do things that you can in
C++. Bastards.)
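
And you can keep going in the same style. A line comment, for
instance, can be defined the same way -- a sketch of mine, assuming
this Forth's WORD accepts a newline (ASCII 10) as its delimiter just
as it accepts 41:

   : \   10 word drop ; immediate
   ( Everything from \ to the end of the line is now ignored. )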

Read some more and...

   \ conditional primitives
   : IF     ( -- f orig )  ?comp compile 0branch  conditional_key >mark     ; immediate
   : THEN   ( f orig -- )  swap ?condition  >resolve   ; immediate
   : BEGIN  ( -- f dest )  ?comp conditional_key <mark   ; immediate
   : AGAIN  ( f dest -- )  compile branch  swap ?condition  <resolve  ; immediate
   : UNTIL  ( f dest -- )  compile 0branch swap ?condition  <resolve  ; immediate
   : AHEAD  ( -- f orig )  compile branch   conditional_key >mark     ; immediate

Conditional primitives?! Looks like conditional primitives aren't --
they define them here. This COMPILE BRANCH business modifies the code
of a function that uses IF or THEN, at compile time. THEN -- one part
of the conditional -- writes (RESOLVEs) a branch offset to a point in
code saved (MARKed) by IF, the other part of the conditional.

It's as if a conventional program modified the assembly instructions
generated from it at compile time. What? How? Who? How do I wrap my
mind around this?

Shocked, I read the source of pForth.

Sort of understood how Forth code was represented and
interpreted. Code is this array of "execution tokens" -- function
pointers, numbers and a few built-ins like branches, basically. A
Forth interpreter keeps an instruction pointer into this array (ip), a
data stack (ds), and a return stack (rs), and does this:

   while(true) {
    switch(*ip) {
     //arithmetics (+,-,*...):
     case PLUS: ds.push(ds.pop() + ds.pop()); ++ip; break;
     //stack manipulation (drop,swap,rot...):
     case DROP: ds.pop(); ++ip; break;
     //literal numbers (1,2,3...):
     case LITERAL: ds.push(ip[1]); ip+=2; break;
     //control flow:
     case COND_BRANCH: if(!ds.pop()) ip+=ip[1]; else ip+=2; break;
     case RETURN: ip = rs.pop(); break;
     //user-defined words: save return address & jump
     default: rs.push(ip+1); ip = *ip; break;
    }
   }

That's it, pretty much. Similar, say, to the virtual stack machine
used to implement Java. One difference is that compiling a Forth
program is basically writing to the code array in a WYSIWYG
fashion. COMPILE SOMETHING simply appends the address of the word
SOMETHING to the end of the code array. So does plain SOMETHING when
Forth is compiling rather than interpreting, as it is between a colon
and a semicolon, that is, when a word is defined.

So

   : DRAW-RECTANGLE 2DUP UP RIGHT DOWN LEFT ;

simply appends {&2dup,&up,&right,&down,&left,RETURN} to the code
array. Very straightforward. There are no parameters or
declaration/expression syntax as in...

   void drawRectangle(int width, int height) {
     up(height);
     right(width);
     down(height);
     left(width);
   }

..to make it less than absolutely clear how the source code maps to
executable code. "C maps straightforwardly to assembly"? Ha! Forth
maps straightforwardly to assembly. Well, to the assembly language of
a virtual stack machine, but still. So one can understand how
self-modifying code like IF and THEN works.
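
For instance, under the token-array representation above, here's
roughly what a word using IF and THEN compiles to (a sketch of mine --
exact token layouts differ between Forths):

   : ABS ( n -- |n| ) DUP 0 < IF NEGATE THEN ;
   \ appends roughly: {&dup, LITERAL,0, &<, COND_BRANCH,offset, &negate, RETURN}
   \ IF appends COND_BRANCH and an empty offset slot, and MARKs the slot;
   \ THEN RESOLVEs the slot so a false flag branches just past &negate.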

On the other hand, compared to drawRectangle, it is somewhat unclear
what DRAW-RECTANGLE does. What are those 2 values on the top of the
stack that 2DUP duplicates before meaningful English names appear in
DRAW-RECTANGLE's definition? This is supposed to be ameliorated by
stack comments:

   : DRAW-RECTANGLE ( width height -- ) ... ;

..tells us that DRAW-RECTANGLE expects to find height at the top of the
stack, and width right below it.

I went on to sort of understand CREATE/DOES> -- a further extension of
this compile-time self-modifying code business that you use to "define
defining words" (say, CONSTANT, VARIABLE, or CLASS). The CREATE part
says what should be done when words (say, class names) are defined by
your new defining word. The DOES> part says what should be done when
those words are used. For example:

   : CONSTANT
      CREATE ,
      DOES> @
   ;
   \ usage example:
   7 CONSTANT DAYS-IN-WEEK
   DAYS-IN-WEEK 2 + . \ should print 9

CREATE means that every time CONSTANT is called, a name is read from
the source file (similarly to what WORD would have done). Then a new
word is created with that name (as a colon would have done). This word
records the value of HERE -- something like sbrk(0), a pointer past the
last allocated data item. When the word is executed, it pushes the
saved address onto the data stack, then calls the code after
DOES>. The code after CREATE can put some data after HERE, making it
available later to the DOES> part.

With CONSTANT, the CREATE part just saves its input (in our example,
7) -- the comma word does this: *HERE++ = ds.pop(); The DOES> part then
fetches the saved number -- the @ sign is the fetch word: ds.push(
*ds.pop() );

CONSTANT works somewhat similarly to a class, CREATE defining its
constructor and DOES> its single method:

   class Constant
     def initialize(x); @x=x; end
     def does(); @x; end
   end
   daysInWeek = Constant.new(7)
   print daysInWeek.does() + 2

..But it's much more compact on all levels.
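
VARIABLE is even simpler than CONSTANT: a CREATEd word pushes the
address of its data field by default, so no DOES> part is needed at
all. A sketch (standard Forths define it about like this, give or
take the initialization):

   : VARIABLE  CREATE 0 , ;
   \ usage example:
   VARIABLE COUNTER
   5 COUNTER !  \ store 5 at the CREATEd address
   COUNTER @ .  \ fetch it back; should print 5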

Another example is defining C-like structs. Stripped down to their
bare essentials (and in Forth things tend to be stripped down to their
bare essentials), you can say that:

   struct Rectangle {
     int width;
     int height;
   };

..simply gives 8 (the structure size) a new name Rectangle, and gives 0
and 4 (the members' offsets) new names, width and height. Here's one
way to implement structs in Forth:

   struct
     cell field width
     cell field height
   constant rectangle

   \ usage example:
   \ here CREATE is used just for allocation
   create r1 rectangle allot \ r1=HERE; HERE+=8
   2 r1 width !
   3 r1 height !
   : area dup width @ swap height @ * ;
   r1 area . \ should print 6

CELL is the size of a word; we could say "4 field width" instead of
"cell field width" on 32b machines. Here's the definition of FIELD:

    : field ( struct-size field-size -- new-struct-size )
       create over , +
       does> @ +
    ;

Again, pretty compact. The CREATE part stores the offset, a.k.a. the
current struct size (OVER does ds.push(ds[1]), comma does
*HERE++=ds.pop()), then adds the field size to the struct size,
updating it for the next call to FIELD. The DOES> part fetches the
offset, and adds it to the top of the stack, supposedly containing the
object base pointer, so that "rect width" or "rect height" compute
&rect.width or &rect.height, respectively. Then you can access this
address with @ or ! (fetch/store). STRUCT simply pushes 0 to the top
of the data stack (initial size value), and at the end, CONSTANT
consumes the struct size:

   struct \ data stack: 0
     cell ( ds: 0 4 ) field width  ( ds: 4 )
     cell ( ds: 4 4 ) field height ( ds: 8 )
   constant rectangle ( ds: as before STRUCT )

You can further extend this to support polymorphic methods -- METHOD
would work similarly to FIELD, fetching a function pointer ("execution
token") through a vtable pointer and an offset kept in the CREATEd
part. A basic object system in Forth can thus be implemented in one
screen (a Forth code size unit -- 16 lines x 64 characters).
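
Here's my guess at a minimal METHOD along those lines. Everything
below is a sketch of one possible layout, assuming each object keeps
a pointer to its vtable in its first cell:

   : method ( vt-size -- new-vt-size )
      create dup , cell +  \ save this method's vtable offset, grow the vtable
      does> @              \ ( obj offset )
      over @ +             \ fetch the object's vtable pointer, add the offset
      @ execute            \ fetch the execution token from the slot, call it
   ;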

To this day, I find it shocking that you can define defining words
like CONSTANT, FIELD, CLASS, METHOD -- something reserved to built-in
keywords and syntactic conventions in most languages -- and you can do
it so compactly using such crude facilities so trivial to
implement. Back when I first saw this, I didn't know about DEFMACRO
and how it could be used to implement the defining words of CLOS such
as DEFCLASS and DEFMETHOD (another thing about Lisp they don't teach
in schools). So Forth was completely mind-blowing.

And then I put Forth aside.

It seemed more suited for number crunching/"systems programming" than
text processing/"scripting", whereas it is scripting that is the best
trojan horse for pushing a language into an organization. Scripting is
usually mission-critical without being acknowledged as such, and many
scripts are small and standalone. Look how many popular "scripting
languages" there are as opposed to "systems programming
languages". Then normalize it by the amount of corporate backing a
language got on its way to popularity. Clearly scripting is the best
trojan horse.

In short, there were few opportunities to play with Forth at work, so
I didn't. I fiddled with the interpreter and with the metaprogramming
and then left it at that without doing any real programming.

Here's what Jeff Fox, a prominent member of the Forth community who'd
worked with Chuck Moore for years, has to say about people like me:

> Forth seems to mean programming applications to some and porting Forth
> or dissecting Forth to others. And these groups don't seem to have
> much in common.
>
> ...One learns one set of things about frogs from studying them in their
> natural environment or by getting a doctorate in zoology and
> specializing in frogs. And people who spend an hour dissecting a dead
> frog in a pan of formaldehyde in a biology class learn something else
> about frogs.
>
> ...One of my favorite examples was that one notable colorforth [a Forth
> dialect] enthusiast who had spent years studying it, disassembling it,
> reassembling it and modifying it, and made a lot of public comments
> about it, but had never bothered running it and in two years of
> 'study' had not been able to figure out how to do something in
> colorforth as simple as:
>
>     1 dup +
>
> ...[such Forth users] seem to have little interest in what it does, how
> it is used, or what people using it do with it. But some spend years
> doing an autopsy on dead code that they don't even run.
>
> Live frogs are just very different than dead frogs.

Ouch. Quite an assault not just on a fraction of a particular
community, but on language geeks in general.

> I guess I feel that I could say that if it isn't solving a
> significant real problem in the real world it isn't really Forth.

True, I guess, and equally true from the viewpoint of someone
extensively using any non-mainstream language and claiming enormous
productivity gains for experts. Especially true for the core (hard
core?) of the Forth community, Forth being their only weapon. They
actually live in Forth; it's DIY taken to the extreme, something
probably unparalleled in the history of computing, except, perhaps,
the case of Lisp environments and Lisp machines (again).

Code running on Forth chips. Chips designed with Forth CAD
tools. Tools developed in a Forth environment running on the bare
metal of the desktop machine. No standard OS, file system or
editor. All in recent years when absolutely nobody else would attempt
anything like it. They claim to be 10x to 100x more productive than C
programmers (a generic pejorative term for non-Forth programmers; Jeff
Fox is careful to put "C" in quotes, presumably either to make the
term more generic or more pejorative).

> ...people water down the Forth they do by not exercising most of the
> freedom it offers... by using Forth only as debugger or a yet another
> inefficient scripting language to be used 1% of the time.
>
> Forth is about the freedom to change the language, the compiler, the
> OS or even the hardware design and is very different than
> programming languages that are about fitting things to a fixed
> language syntax in a narrow work context.

What can be said of this? If, in order to "really" enter a programming
culture, I need to both "be solving a significant real problem in the
real world" and exercising "the freedom to change the language, the
compiler, the OS or even the hardware design", then there are very few
options for entering this culture indeed. The requirement for "real
world work" is almost by definition incompatible with "the freedom to
change the language, the compiler, the OS and the hardware design".

And then it so happened that I started working on a real-world project
about as close to Forth-level DIY as possible. It was our own
hardware, with our own OS, our own compilers, designed to run
our own application. We did use standard CAD tools, desktop operating
systems and editors, and had standard RISC cores in the chip and
standard C++ cross compilers for them. Well, everyone has
weaknesses. Still, the system was custom-tailored, embedded,
optimized, standalone, with lots of freedom to exercise -- pretty close
to the Forth way, in one way.

One part of the system was an image processing co-processor, a
variation on the VLIW theme. Its memory access and control flow were
weird and limited, to the effect that you could neither access nor
jump to an arbitrary memory address. It worked fine for the
processing-intensive parts of our image processing programs.

We actually intended to glue those parts together with a few "control
instructions" setting up the plentiful control registers of this
machine. When I tried, it quickly turned out, as was to be expected,
that those "control instructions" must be able to do, well, everything
-- arithmetic, conditions, loops. In short, we needed a CPU.

We thought about buying a CPU, but it was unclear how we could use an
off-the-shelf product. We needed to dispatch VLIW instructions from
the same instruction stream. We also needed a weird mixture of
features: no caches, no interrupts, no need for more than 16 address
bits, but a need to access 64 data bits, and 32-bit arithmetic.

We thought about making our own CPU. The person with the overall
responsibility for the hardware design gently told me that I was out
of my mind. CPUs have register files and pipelines and pipeline
stalls and dependency detection to avoid those stalls, and it's all
too complicated.

And then I asked, how about a stack machine? No register file. Just a
3-stage pipeline -- fetch, decode, execute. No problem with register
dependencies, always pop inputs from the top of the stack, push the
result.

He said it sounded easy enough alright, we could do that. "It's just
like my RPN calculator. How would you program it?" "In Forth!"

I defined the instruction set in a couple of hours. It mapped to Forth
words as straightforwardly as possible, plus it had a few things Forth
doesn't have that C might need, as a kind of insurance (say, access to
16-bit values in memory).

This got approved and implemented; not that it became the schedule
bottleneck, but it was harder than we thought. Presumably that was
partly the result of not reading "Stack Computers: the new wave", and
not studying the chip designs of Forth's creator Chuck Moore,
either. I have a feeling that knowledgeable people would have sneered
at this machine: it was trivial to compile Forth to it, but at the
cost of complicating the hardware.

But I was satisfied -- I got a general-purpose CPU for setting up my
config regs at various times through my programs, and as a side
effect, I got a Forth target. And even if it wasn't the most
cost-effective Forth target imaginable, it was definitely time to
start using Forth at work.

(Another area of prior art on stack machines that I failed to study in
depth was 4stack -- an actual VLIW stack machine, with 4 data stacks as
suggested by its name. I was very interested in it, especially during
the time when we feared implementation problems with the multi-port
register file feeding our multiple execution units. I didn't quite
figure out how programs would map to 4stack and what the efficiency
drop would be when one had to spill stuff from the data stacks to
other memory because of data flow complications. So we just went for a
standard register file and it worked out.)

The first thing I did was write a Forth cross-compiler for the machine
-- a 700-line C++ file (and for reasons unknown, the slowest-compiling
C++ code that I have ever seen).

I left out all of the metaprogramming stuff. For instance, none of the
Forth examples above, the ones that drove me to Forth, could be made
to work in my own Forth. No WORD, no COMPILE, no IMMEDIATE, no
CREATE/DOES>, no nothing. Just colon definitions, RPN syntax, flow
control words built into the compiler. "Optimizations" -- trivial
constant folding so that 1 2 + becomes 3, and inlining -- :INLINE 1 + ;
works just like : 1 + ; but is inlined into the code of the caller. (I
was working on the bottlenecks so saving a CALL and a RETURN was a big
deal.) So I had that, plus inline assembly for the VLIW
instructions. Pretty basic.

I figured I didn't need the more interesting metaprogramming stuff for
my first prototype programs, and I could add it later if it turned out
that I was wrong. It was weird to throw away everything I originally
liked the most, but I was all set to start writing real
programs. Solving real problems in the real world.

It was among the most painful programming experiences in my life.

All kinds of attempts at libraries and small test programs aside, my
biggest program was about 700 lines long (that's 1 line of compiler
code for 1 line of application code). Here's a sample function:

   : mean_std ( sum2 sum inv_len -- mean std )
     \ precise_mean = sum * inv_len;
     tuck u* \ sum2 inv_len precise_mean
     \ mean = precise_mean >> FRAC;
     dup FRAC rshift -rot3 \ mean sum2 inv_len precise_mean
     \ var = (((unsigned long long)sum2 * inv_len) >> FRAC) - (precise_mean * precise_mean >> (FRAC*2));
     dup um* nip FRAC 2 * 32 - rshift -rot \ mean precise_mean^2 sum2 inv_len
     um* 32 FRAC - lshift swap FRAC rshift or \ mean precise_mean^2 sum*inv_len
     swap - isqrt \ mean std
   ;

Tuck u*.

This computes the mean and the standard deviation of a vector given
the sum of its elements, the sum of their squares, and the inverse of
its length. It uses scaled integer arithmetic: inv_len is an integer
keeping (1<<FRAC)/length. How it arranges the data on the stack is
beyond me. It was beyond me at the time when I wrote this function, as
indicated by the plentiful comments documenting the stack state,
amended by wimpy C-like comments ("C"-like comments) explaining the
meaning of the postfix expressions.

This nip/tuck business in the code? Rather than a reference to the
drama series on plastic surgery, these are names of Forth stack
manipulation words. You can look them up in the standard. I forgot
what they do, but it's, like, ds.insert(2,ds.top()), ds.remove(1),
this kind of thing.

Good Forth programmers reportedly don't use much of those. Good Forth
programmers arrange things so that they flow on the stack. Or they use
local variables. My DRAW-RECTANGLE definition above, with a 2DUP, was
reasonably flowing by my standards: you get width and height,
duplicate both, and have all 4 data items -- width,height,width,height
-- consumed by the next 4 words. Compact, efficient -- little stack
manipulation. Alternatively we could write:

   : DRAW-RECTANGLE { width height }
     height UP
     width RIGHT
     height DOWN
     width LEFT
   ;

Less compact, but very readable -- not really, if you think about it,
since nobody knows how much stuff UP leaves on the stack and what
share of that stuff RIGHT consumes, but readable enough if you assume
the obvious. One reason not to use locals is that Chuck Moore hates
them:

> I remain adamant that local variables are not only useless, they are
> harmful.
>
> If you are writing code that needs them you are writing non-optimal
> code. Don't use local variables. Don't come up with new syntax for
> describing them and new schemes for implementing them. You can make
> local variables very efficient especially if you have local registers
> to store them in, but don't. It's bad. It's wrong.
>
> It is necessary to have [global] variables. ... I don't see any use for
> [local] variables which are accessed instantaneously.

Another reason not to use locals is that it takes time to store and
fetch them. If you have two items on a data stack on a hardware stack
machine, + will add them in one cycle. If you use a local, then it
will take a cycle to store its value with { local_name }, and a cycle
to fetch its value every time you mention local_name. On the first
version of our machine, it was worse as fetching took 2 cycles. So
when I wrote my Forth code, I had to make it "flow" for it to be fast.
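
A toy example of the trade-off (the names are mine):

   : square-locals ( n -- n*n ) { x } x x * ;  \ one store, two fetches, then *
   : square-flow   ( n -- n*n ) dup * ;        \ one DUP, then *

The first version spends cycles moving n in and out of the local that
the second spends on a single DUP.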

The abundance of DUP, SWAP, -ROT and -ROT3 in my code shows that
making it flow wasn't very easy. One problem is that every stack
manipulation instruction also costs a cycle, so I started wondering
whether I was already past the point where I had a net gain. The other
problem was that I couldn't quite follow this flow.

Another feature of good Forth code, which supposedly helps achieve the
first good feature ("flow" on the stack), is factoring. Many small
definitions.

> Forth is highly factored code. I don't know anything else to say
> except that Forth is definitions. If you have a lot of small
> definitions you are writing Forth. In order to write a lot of small
> definitions you have to have a stack.

In order to have really small definitions, you do need a stack, I
guess -- or some other implicit way of passing parameters around; if
you do that explicitly, definitions get bigger, right? That's how you
can get somewhat Forth-y with Perl -- passing things through the
implicit variable $_: call chop without arguments, and you will have
chopped $_.

Anyway, I tried many small definitions:

   :inline num_rects params @ ;
   :inline sum  3 lshift gray_sums + ;
   :inline sum2 3 lshift gray_sums 4 + + ;
   :inline rect_offset 4 lshift ;
   :inline inv_area rect_offset rects 8 + + @ ;
   :inline mean_std_stat ( lo hi -- stat )
     FRAC lshift swap 32 FRAC - rshift or
   ;
   : mean_std_loop
    \ inv_global_std = (1LL << 32) / MAX(global_std, 1);
    dup 1 max 1 swap u/mod-fx32 drop \ 32 frac bits

    num_rects \ start countdown
    begin
     1 - \ rects--
     dup sum2 @
     over sum @
     pick2 inv_area
     mean_std \ global_mean global_std inv_global_std rectind mean std
     rot dup { rectind } 2 NUM_STATS * * stats_arr OFT 2 * + + { stats }
     \ stats[OFT+0] = (short)( ((mean - global_mean) * inv_global_std) >> (32 - FRAC) );
     \ stats[OFT+1] = (short)( std * inv_global_std >> (32 - FRAC) );
     pick2       um* mean_std_stat stats 2 + h! \ global_mean global_std inv_global_std mean
     pick3 - over m* mean_std_stat stats h!
     rectind ?dup 0 = \ quit at rect 0
    until
    drop 2drop
   ;

I had a bunch of those short definitions, and yet I couldn't get rid
of heavy functions with DUP and OVER and PICK and "C" comments to make
any sense of it. This stack business just wasn't for me.

> Stacks are not popular. It's strange to me that they are not. There
> is just a lot of pressure from vested interests that don't like
> stacks, they like registers.

But I actually had a vested interest in stacks, and I began to like
registers more and more. The thing is, expression trees map perfectly
to stacks: (a+b)*(c-d) becomes a b + c d - *. Expression graphs,
however, start to get messy: (a+b)*a becomes a dup b + *, and this dup
cluttering things up is a moderate example. And an "expression graph"
simply means that you use something more than once. How come this
clutters up my code? This is reuse. A kind of factoring, if you
like. Isn't factoring good?
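
For example, (a+b)*(a-b) needs each input twice, and it shows (a toy
of mine):

   : prod-sum-diff ( a b -- p ) over over + -rot - * ;

Two OVERs and a -ROT just to mention a and b twice.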

In fact, now that I thought of it, I didn't understand why stacks were
so popular. Vested interests, perhaps? Why do the JVM bytecode, the
.NET bytecode and even CPython's bytecode all target stack VMs? Why
not use registers the way LLVM does?

Speaking of which. I started to miss a C compiler. I downloaded
LLVM. (7000 files plus a huge precompiled gcc binary. 1 hour to build
from scratch. So?) I wrote a working back-end for the stack machine
within a week. Generating horrible code. Someone else wrote an
optimizing back-end in about two months.

After a while, the optimizing back-end's code wasn't any worse than my
hand-coded Forth. Its job was somewhat easier than mine since by the
time it arrived, it only took 1 cycle to load a local. On the other
hand, loads were fast as long as they weren't interleaved with stores
-- some pipeline thing. So the back-end was careful to reorder things
so that huge sequences of loads went first and then huge sequences of
stores. Would be a pity to have to do that manually in Forth.

You have no idea how much fun it is to just splatter named variables
all over the place, use them in expressions in whatever order you
want, and have the compiler schedule things. Although you do it all
day. And that was pretty much the end of Forth on that machine; we
wrote everything in C.

What does this say about Forth? Not much except that it isn't for
me. Take Prolog. I know few things more insulting than having to code
in Prolog. Whereas Armstrong developed Erlang in Prolog and liked it
much better than reimplementing Erlang in C for speed. I can't imagine
how this could be, but this is how it was. People are different.

Would a good Forth programmer do better than me? Yes, but not just at
the level of writing the code differently. Rather, at the level of
doing everything differently. Remember the freedom quote? "Forth is
about the freedom to change the language, the compiler, the OS or even
the hardware design".

..And the freedom to change the problem.

Those computations I was doing? In Forth, they wouldn't just write them
differently. They wouldn't implement them at all. In fact, we didn't
implement them after all, either. The algorithms which made it into
production code were very different -- in our case, more
complicated. In the Forth case, they would have been less
complicated. Much less.

Would less complicated algorithms work? I don't
know. Probably. Depends on the problem though. Depends on how you
define "work", too.

The tiny VLSI toolchain from the epigraph? I showed Chuck Moore's
description of that to an ASIC hacker. He said it was very simplistic
-- no way you could do with that what people are doing with standard
tools.

But Chuck Moore isn't doing that, under the assumption that you don't
need to. Look at the chips he's making. 144-core, but the cores (nodes)
are tiny -- why would you want them big, if you feel that you can do
anything with almost no resources? And they use 18-bit
words. Presumably under the assumption that 18 bits is a good
quantity, not too small, not too large. Then they write an application
note about implementing the MD5 hash function:

> MD5 presents a few problems for programming a Green Arrays
> device. For one thing it depends on modulo 32 bit addition and
> rotation. Green Arrays chips deal in 18 bit quantities. For another,
> MD5 is complicated enough that neither the code nor the set of
> constants required to implement the algorithm will fit into one or
> even two or three nodes of a Green Arrays computer.

Then they solve these problems by manually implementing 32b addition
and splitting the code across nodes. But if MD5 weren't a standard,
you could implement your own hash function without going to all this
trouble.

In his chip design tools, Chuck Moore naturally did not use the
standard equations:

> Chuck showed me the equations he was using for transistor models in
> OKAD and compared them to the SPICE equations that required solving
> several differential equations. He also showed how he scaled the
> values to simplify the calculation. It is pretty obvious that he has
> sped up the inner loop a hundred times by simplifying the
> calculation. He adds that his calculation is not only faster but
> more accurate than the standard SPICE equation. ... He said, "I
> originally chose mV for internal units. But using 6400 mV = 4096
> units replaces a divide with a shift and requires only 2 multiplies
> per transistor. ...  Even the multiplies are optimized to only step
> through as many bits of precision as needed."

This is Forth. Seriously. Forth is not the language. Forth the
language captures nothing, it's a moving target. Chuck Moore
constantly tweaks the language and largely dismisses the ANS standard
as rooted in the past and bloated. Forth is the approach to
engineering aiming to produce as small, simple and optimal a system as
possible, by shaving off as many requirements of every imaginable kind
as you can.

That's why its metaprogramming is so amazingly compact. It's similar
to Lisp's metaprogramming in much the same way bacterial genetic code
is similar to that of humans -- both reproduce. Humans also do many
other things that bacteria can't (...No compatibility. No files. No
operating system). And have a ton of useless junk in their DNA, their
bodies and their habitat.

Bacteria have no junk in their DNA. Junk slows down the copying of the
DNA which creates a reproduction bottleneck so junk mutations can't
compete. If it can be eliminated, it should. Bacteria are small,
simple, optimal systems, with as many requirements shaved off as
possible. They won't conquer space, but they'll survive a nuclear war.

This stack business? Just a tiny aspect of the matter. You have
complicated expression graphs? Why do you have complicated expression
graphs? The reason Forth the language doesn't have variables is
because you can eliminate them, therefore they are junk, therefore you
should eliminate them. What about those expressions in your Forth
program? Junk, most likely. Delete!

I can't do that.

I can't face people and tell them that they have to use 18b words. In
fact I take pride in the support for all the data types people are
used to from C in our VLIW machine. You can add signed bytes, and
unsigned shorts, and you even have instructions adding bytes to
shorts. Why? Do I believe that people actually need all those
combinations? Do I believe that they can't force their 16b unsigned
shorts into 15b signed shorts to save hardware the trouble?

OF COURSE NOT.

They just don't want to. They want their 16 bits. They whine about
their 16th bit. Why do they want 16 and not 18? Because they grew up
on C. "C". It's completely ridiculous, but nevertheless, people are
like that. And I'm not going to fight that, because I am not
responsible for algorithms, other people are, and I want them happy,
at least to a reasonable extent, and if they can be made happier at a
reasonable cost, I gladly pay it. (I'm not saying you can't market a
machine with limited data type support; I'm just using this as an
example of the kind of junk I'm willing to carry that the Forth way
says to shave off.)

Why pay this cost? Because I don't do algorithms, other people do, so
I have to trust them and respect their judgment to a large
extent. Because you need superhuman abilities to work without
layers. My minimal stack of layers is -- problem, software,
hardware. People working on the problem (algorithms, UI, whatever)
can't do software, not really. People doing software can't do
hardware, not really. And people doing hardware can't do software,
etc.

The Forth way of focusing on just the problem you need to solve seems
to more or less require that the same person or a very tightly united
group focus on all three of these things, and pick the right
algorithms, the right computer architecture, the right language, the
right word size, etc. I don't know how to make this work.

My experience is, you try to compress the 3 absolutely necessary
layers to 2, you get a disaster. Have your algorithms people talk
directly to your hardware people, without going through software
people, and you'll get a disaster. Because neither understands
software very well, and you'll end up with an unusable
machine. Something with elaborate computational capabilities that
can't be put together into anything meaningful. Because gluing it
together, dispatching, that's the software part.

So you need at least 3 teams, or people, or hats, that are to an
extent ignorant about each other's work. Even if you're doing
everything in-house, which, according to Jeff Fox, was essentially a
precondition to "doing Forth". So there's another precondtion -- having
people being able to do what at least 3 people in their respective
areas normally do, and concentrating on those 3 things at the same
time. Doing the cross-layer global optimization.

It's not how I work. I don't have the brain nor the knowledge nor the
love for prolonged meditation. I compensate with, um, people skills. I
find out what people need, that is, what they think they need, and I
negotiate, and I find reasonable compromises, and I include my narrow
understanding of my part -- software tools and such -- into those
compromises. This drags junk in. I live with that.

> I wish I knew what to tell you that would lead you to write good
> Forth. I can demonstrate. I have demonstrated in the past, ad
> nauseam, applications where I can reduce the amount of code by 90%
> and in some cases 99%. It can be done, but in a case by case
> basis. The general principle still eludes me.

And I think he can, especially when compatibility isn't a must. But
not me.

I still find Forth amazing, and I'd gladly hack on it upon any
opportunity. It still gives you the most bang for the buck -- it
implements the most functionality in the least space. So still a great
fit for tiny targets, and unlikely to be surpassed. Both because it's
optimized so well and because the times when only bacteria survived in
the amounts of RAM available are largely gone so there's little
competition.

As to using Forth as a source of ideas on programming and language
design in general -- not me. I find that those ideas grow out of an
approach to problem solving that I could never apply.