❝A whole stack of memories never equal one little hope.❞
Charles M. Schulz
Once again there was an funny code challenge from John Crickett. This time a custom Forth interpreter was to be implemented. Forth is a very interesting language that I got to know in my youth. Unfortunately, I don’t remember exactly, but I probably tried it out on my Commodore 128D at some point.
Forth is a stack-oriented programming language, which can be very confusing for anyone who has only ever seen C and Pascal-like languages. Forth programs work on a stack. You can store numbers on this stack and work with them in a variety of ways. The individual instructions in Forth are called words and I would like to present some very simple ones here with the following Forth program.
10 2 5 * + . cr
The first three words are integer numbers and these are placed on the stack one after the other. This is followed by *
as the word for the multiplication. The multiplication takes the two top values from the stack, multiplies them and places the product back on top of the stack. So where there were just 10
, 2
and 5
, there are now only 10
and 10
. Next comes the word +
for addition. Addition works in the same way as multiplication and places the sum on top of the stack. The stack now only contains the value 20
. The program is not finished yet, because the words .
and cr
are still at the end. The word .
takes the top entry from the stack and writes it to the output. cr
writes a line break to the output.
In addition to these simple commands, there are also definitions to define your own words, loops and conditional constructs. The following two lines are a simple example of definitions.
: rot-13 0 do rot13 loop cr ; : rot13 13 - emit ;
rot-13
is a word with which a word is decoded from some number using Caesar Encryption. After the :
, the definition begins with the name of the new word. This is followed by the words from which the definition is composed until the final ;
. Within this definition is a do-loop loop in which the word rot13
is called. The loop accesses two values on the stack, the end value and the start value of the loop counter. The start value is specified as 0
within rot-13
, the end value must be stored on the stack before rot-13
is called.
The second word rot13
has a simpler structure. It subtracts 13
from the current value on the stack and then outputs it as a character.
96 91 82 87 4 rot-13
This call of rot-13
generates the output JENS
, because the first four characters are the corresponding ASCII codes plus 13
and the parameter 4
is the end value of the loop in rot-13
.
So how do you build an interpreter for Forth? First of all, you need to understand the relatively simple grammar of the language. The best way to solve this problem is with a parser.
Definition : <COLON> <IDENTIFIER> (Word)+ <SEMICOLON> ; IfThenElse : <IF> (Word)* [ <ELSE> (Word)* ] <THEN> ; DoLoop : <DO> (Word)+ <LOOP> ; Word : ( <INTEGER> | <DOT> | <PLUS> | <MINUS> | <CR> | <EMIT> | Definition | <IDENTIFIER> | IfThenElse | DoLoop | <I> ) ; Root : (Word)* <EOF> ;
This is the core grammar with some omissions in the Word
rule. This is for didactic purposes to improve readability. This grammar does not master the Forth standard but only the part that was required in the competition. As always, the concepts of KISS and YAGNI should be taken to heart.
The classes generated by this CongoCC grammar can be supplemented with additional code via the Inject mechanism. As so often, I have used the Visitor pattern here.
INJECT DoLoop : import de.schegge.forth.ForthVisitor; { public <IN, OUT> OUT accept(ForthVisitor<IN, OUT> visitor, IN input) { return visitor.visit(this, input); } }
As the grammar is not particularly complex, two implementations of the ForthVisitor
interface are sufficient.
public class ForthBuilder implements ForthVisitor<AtomicInteger, List<Action>> { private static final WordBuilder visitor = new WordBuilder(); @Override public List<Action> visit(Root root, AtomicInteger input) { return root.children().stream().map(c -> c.accept(visitor, input)).filter(Objects::nonNull).toList(); } @Override public List<Action> visit(Definition definition, AtomicInteger input) { return List.of(definition.accept(visitor, input)); } @Override public List<Action> visit(IfThenElse ifThenElse, AtomicInteger input) { return List.of(ifThenElse.accept(visitor, input)); } @Override public List<Action> visit(DoLoop doLoop, AtomicInteger input) { return List.of(doLoop.accept(visitor, input)); } @Override public List<Action> visit(Token token, AtomicInteger input) { return token.getType() == TokenType.EOF ? List.of() : List.of(token.accept(visitor, input)); } @Override public List<Action> visit(Node node, AtomicInteger input) { return List.of(node.accept(visitor, input)); } }
The ForthBuilder
is the entry point for transforming the parse tree into a Forth program in the form of a List<Action>
. As the result of parsing can be any (almost any) Node
, it must be possible to visit all variants. Almost all of them forward the processing directly to the second visitor WordBuilder
. The exceptions are Root
and Token
. Root
is created when the Forth program actually consists of a list of words. Therefore, all children nodes are run through here and the words supplied by them are added as a list. A possible null
value is filtered out of the list; this is created by the Token
EOF
, which does not return a word but null
. Token also has special processing because of EOF
, because in this case an empty list is to be returned.
Root : (Word)* ;
Empty programs are created if the input is empty or only contains a comment. This is stated by the root rule, which is also valid without a single word. Then the parse tree consists only of EOF
.
The second visitor WordBuilder
is more complex because it has to extract the information-containing nodes for Definition
, DoLoop
and IfThenElse
.
public Action visit(Definition definition, AtomicInteger input) { String name = definition.get(1).toString(); List<Action> definitionActions = parseSequence(2, definition.size() - 1, definition, input); return context -> context.definitions().put(name, definitionActions); }
For Definition
, this is the second node, which contains the name and the nodes 2 to size() -1, which contain the words that make up the new definition.
The Forth basic vocabulary is processed in the visit method for Token
.
public Action visit(Token token, AtomicInteger input) { return switch (token.getType()) { case TokenType.BYE -> context -> { throw new GoodByeException(); }; case TokenType.PLUS -> context -> context.push(context.pop() + context.pop()); // ... }
The words BYE
and PLUS
are shown here as an example. Both return an Action
in the form of a lamda. A GoodBeException
is thrown by the Action
for BYE
, which exits the processing loop somewhat inelegantly. The lamda for PLUS
fetches two values from the stack with pop
and writes the sum to the stack with push
.
The rest of the implementation only takes care of parsing the input and calling the execute
methods of the actions during program execution. If you want to take a look at the Forth interpreter, the link can be found on GitHub. Have fun.
https://github.com/CodingChallengesFYI/SharedSolutions/blob/main/Solutions/challenge-forth.md