# On Forth
As a fan of minimalistic technology, I am extremely compelled by the
Forth language and programming paradigm. For those who are not
familiar with it, Forth is a language that seems largely designed to
facilitate its own implementation with minimal effort.
Instead of explaining the idea at length, let's show a Forth
4 2 + 3 *
You may recognize this as an arithmetic expression in Reverse Polish
Notation. This is a side effect of how Forth is implemented. The
program is a sequence of tokens, separated by whitespace. While
interpreting this sequence of tokens, Forth will search for each
token in a dictionary of "words" (named subroutines). If it finds it
in the dictionary, it will execute the corresponding code. If it
doesn't, it will try to interpret the token as an integer. If that
succeeds, it will push the integer to a stack. Otherwise, it will
display an error message (usually the helpful "?").
When the program above is executed, 4 and 2 will be pushed to the
stack. After that, the word "+" will be executed. "+" will take the
top two elements of the stack, add them and push the result back to
the stack. After it has executed, we have only the number 6 on the
stack. After that we push three and end up with the numbers 3 and 6
on the stack. Finally we execute the word "*" which will pull the
top two elements of the stack, multiply them and push the result.
Thus we end up with (4+2)*6 on the stack.
This is first of all a very convenient style of notation. Unlike
in-fix notation where we need to memorize operator precedence and
track parentheses, reverse polish notation is simply read from left
to right. It's also point free so far in the sense that any
subsequence of the program is a valid program in itself that can be
applied to the stack. This has some important implications for
Second, it's relatively easy to implement.
Let's say that we want to apply the program above not only to the
number 4 but to 5 and 7 as well. We could write:
4 2 + 3 *
5 2 + 3 *
7 2 + 3 *
and consider ourselves done. If you are a programmer you might think
that this common behavior is best factored into a function, in case
we need to change it in the future. In Forth, we can define a word
to represent the operation:
: ADJUST 2 + 3 * ;
Here we define the word ADJUST. The definition starts with a colon
followed by a single token representing the name of the new word
followed the sequence of tokens it represents, terminated by a
semicolon. Note that there are no function arguments and no return
values. These operations implicitly operate directly on the stack.
Note also that this is a significant departure from the
interpretation loop described above. The ":" word, when executed,
will take over the interpreter and read the next token ("ADJUST")
and write that into the dictionary as the name of the next entry. It
will then enter the compilation mode of the interpreter. In
compilation mode, instead of executing words and pushing numbers it
will compile code to execute words and push numbers. How it is
compiled exactly depends on the implementation, but for the sake of
this post, let's assume that it compiles it down to machine code.
So we compile, in pseudo assembly:
When we are calling, Forth uses a secondary stack for return
addresses, so we have two separate stacks; a data stack and a return
Finally, how is the word definition terminated if we're in
compilation mode? Shouldn't ";" be compiled just like any word? Here
is another departure: words may individually override the compiler
and execute immediately instead of being compiled. This is usually
represented as a flag in the word dictionary definition.
Let's correct our understanding of the interpreter loop given this
Executing = true
1. Read the next token
2. Look for the token in the word dictionary.
3. If found:
a) If executing or if the word is flagged for immediate execution:
immediately call the code of the word
b) Else, compile the definition of the word at the current compilation
4. Else if the token can be parsed as a number:
a) If executing: push that number to the stack
b) Else, compile code that pushes that number to the stack
5. Else, if neither a number nor word in the dictionary, display an
error message and set Executing to true
A little more complex, but now the complete core of the Forth
interpreter. How, in these terms is ";" achieved? It's a word
flagged for immediate execution that will switch the interpreter to
execution mode and write the return code to the current compilation
So far we've only touched simple arithmetic and the definition of
our ADJUST word, but how is conditional execution achieved in this
model? Let's first see how it looks in practice:
: DIVISIBLE-BY-THREE 3 MOD 0 = ;
: F DUP DIVISIBLE-BY-THREE IF . ELSE DROP THEN ;
We're introducing six new words here. MOD first, which pushes the
remainder of the integer division of two elements popped off the
stack. "=" which compares two elements popped off the stack and
pushes a value representing a boolean value; TRUE if they are and
FALSE if they are not. "." which pops one value off the stack and
prints it as an integer. "DUP" which pops an element off the stack
and pushes it twice, duplicating it. "DROP" which simply pops an
element off the stack without doing anything else.
Finally we have "IF", "ELSE" and "THEN". The order of these tokens
are starting to look a little Yoda-like, but it has a simple
explanation. "IF" will pop one element off the stack. If it's FALSE,
it will jump past the "ELSE" word and continue execution. If it's
TRUE, it will not jump and will execute the code between "IF" and
"THEN", jumping past "ELSE".
Our program defines a word "DIVISIBLE-BY-THREE" that tests whether
the top of the stack is divisible by three. It also defines the word
"F" which will print the number on the top of the stack if it's
divisible by three and drop it if it isn't. We call F with the
number 9 on the stack, so our program should print 9.
Now that we understand what this does in practice, we can consider
IF/ELSE/THEN are implemented. the IF word is executed immediately.
It pushes the current compilation address to the return stack and
compiles a code that branches depending on the top of the stack. A
placeholder address is used for the branch target. By the time we've
passed "IF", our definition of the "F" word looks like this in
We also have the address of our branch-if-true instruction on the
return stack. We continue compiling our print word. We then reach
the "THEN" word which again is marked for immediate execution.
"THEN" will get the current compilation addressand write the correct
branch target to the address pointed at by the top of the return
stack. It will then push the current compilation address to the
return stack and compile a jump instruction, again with a
placeholder target. Our pseudo-assembly code now looks like this:
We then compile DROP call. Finally, "THEN" will execute immediately,
get the current compilation address and write the correct jump
target to the address pointed at by the top of the return stack.
With some clever use of immediate execution words we have defined
IF...ELSE...THEN entirely in terms of Forth to show that these words
are just words like any other without any special meaning assigned
How could the definitions actually look in Forth?
: IF IMMEDIATE HERE >R COMPILE-PLACEHOLDER-BRANCH ;
: ELSE IMMEDIATE HERE R> RESOLVE-FORWARD-REFERENCE COMPILE-PLACEHOLDER-JUMP ;
: THEN IMMEDIATE HERE R> RESOLVE-FORWARD-REFERENCE ;
Let's define the words we depend on here:
- HERE -- push the current compilation address to the data stack
- >R -- pop the top of the data stack and push it to the return
- R> -- pop the top of the return stack and push it to the data
- IMMEDIATE -- flag the currently defined word as an immediate
execution word. This is in itself an immediate execution word.
- COMPILE-PLACEHOLDER-BRANCH -- write to HERE a branch instruction
with a placeholder target
- COMPILE-PLACEHOLDER-JUMP -- same thing, but for a jump.
- RESOLVE-FORWARD-REFERENCE -- pop two addresses off the stack and
change the jump or branch instruction pointed at by the top
address to jump or branch to the second address on the stack.
I think the ultimate benefit of this model is that Forth can be
defined in terms of a small set of primitive words. We need some
small code to get the interpreter and dictionary data structure,
some basic arithmetic words and some stack and memory manipulation
words. From that we can build a more complete language
incrementally. Because you need so little code to bootstrap a Forth
implementation, we typically see it implemented for small, modest
machines. Probably the most widely known implementation of Forth and
Forth application is Open Firmware. If you have a computer that
boots using Open Firmware (say, a late PowerPC Mac) you can actually
boot into a full-fledged Forth development environment. Forth
programs are usually indistinguishible from their execution
environment; there is no clear distinction between the compiler and
the application and the application in itself often extends the
compiler in the same sense IF/ELSE/THEN might.
The inventor, Chuck Moore, was using languages with a meticulous
compilation process at the time he invented Forth. Because Forth
compilation is entirely incremental and relatively simple (compared
to e.g. FORTRAN), it will simply compile as you type it at little
cost in time. This enables a fast write-compile-run loop that from
the initial interpreter can be controlled entirely from a terminal
without multiple compilation passes to break the flow.
Chuck Moore found this useful in his career since he could quickly
adapt to the clients' requirements and even let them define their
own Forth words in terms of higher level words. For example, a user
at a telescope may not know how to interact with a stepper motor or
how to trigger the camera in terms of the low level operations, but
given words like "TURN", "PITCH" "DEGREES" and "PHOTOGRAPH" they
might write a program like
TURN -15 DEGREES
PITCH 60 DEGREES
Chuck Moore has been a proponent of Forth since he invented it. His
latest iteration is called colorForth, a bootable Forth environment
where colors are used to indicate semantic differences in words
instead of punctuation like ":" or ";". His work is extremely
colored by his uncompromising minimalism and disregard for
convention. In colorForth, he wrote a chip CAD and simulator suite.
Using this software, he designed a clockless 144 CPU chip, where the
individual CPUs have 64 18-bit words of RAM. I wholly recommend
looking into his work history. He's not a frequent public speaker,
but there are some talks on youtube, and he used to run a blog and a
website that are now mirrored on github. He strikes me as not
entirely reasonable, driven sometimes to fallacy largely by an
almost religious conviction to his ideals, but you know what they
say about reasonable people and changing the world...
I've written this mostly as a reminder of some of the concepts I've
understood while using and researching Forth. Some of it may be
wrong and I'm most certainly omitting some important details. That
said, I will refer to this document when I eventually write my own