A Custom Forth Interpreter

❝A whole stack of memories never equal one little hope.❞

Charles M. Schulz

Once again there was an funny code challenge from John Crickett. This time a custom Forth interpreter was to be implemented. Forth is a very interesting language that I got to know in my youth. Unfortunately, I don’t remember exactly, but I probably tried it out on my Commodore 128D at some point.

Forth is a stack-oriented programming language, which can be very confusing for anyone who has only ever seen C and Pascal-like languages. Forth programs work on a stack. You can store numbers on this stack and work with them in a variety of ways. The individual instructions in Forth are called words and I would like to present some very simple ones here with the following Forth program.

10 2 5 * + . cr

The first three words are integer numbers and these are placed on the stack one after the other. This is followed by * as the word for the multiplication. The multiplication takes the two top values from the stack, multiplies them and places the product back on top of the stack. So where there were just 10, 2 and 5, there are now only 10 and 10. Next comes the word + for addition. Addition works in the same way as multiplication and places the sum on top of the stack. The stack now only contains the value 20. The program is not finished yet, because the words . and cr are still at the end. The word . takes the top entry from the stack and writes it to the output. cr writes a line break to the output.

In addition to these simple commands, there are also definitions to define your own words, loops and conditional constructs. The following two lines are a simple example of definitions.

: rot-13 0 do rot13 loop cr ;
: rot13 13 - emit ;

rot-13 is a word with which a word is decoded from some number using Caesar Encryption. After the :, the definition begins with the name of the new word. This is followed by the words from which the definition is composed until the final ;. Within this definition is a do-loop loop in which the word rot13 is called. The loop accesses two values on the stack, the end value and the start value of the loop counter. The start value is specified as 0 within rot-13, the end value must be stored on the stack before rot-13 is called.

The second word rot13 has a simpler structure. It subtracts 13 from the current value on the stack and then outputs it as a character.

96 91 82 87 4 rot-13

This call of rot-13 generates the output JENS, because the first four characters are the corresponding ASCII codes plus 13 and the parameter 4 is the end value of the loop in rot-13.

So how do you build an interpreter for Forth? First of all, you need to understand the relatively simple grammar of the language. The best way to solve this problem is with a parser.

Definition :
    <COLON> <IDENTIFIER> (Word)+ <SEMICOLON>
;

IfThenElse :
   <IF> (Word)* [ <ELSE> (Word)* ] <THEN>
;

DoLoop :
  <DO> (Word)+ <LOOP>
;

Word :
    ( <INTEGER>
    | <DOT>
    | <PLUS>
    | <MINUS>
    | <CR>
    | <EMIT>
    | Definition
    | <IDENTIFIER>
    | IfThenElse
    | DoLoop
    | <I>
    )
;

Root :
    (Word)*
    <EOF>
;

This is the core grammar with some omissions in the Word rule. This is for didactic purposes to improve readability. This grammar does not master the Forth standard but only the part that was required in the competition. As always, the concepts of KISS and YAGNI should be taken to heart.

The classes generated by this CongoCC grammar can be supplemented with additional code via the Inject mechanism. As so often, I have used the Visitor pattern here.

INJECT DoLoop :
  import de.schegge.forth.ForthVisitor;
{
   public <IN, OUT> OUT accept(ForthVisitor<IN, OUT> visitor, IN input) {
      return visitor.visit(this, input);
   }
}

As the grammar is not particularly complex, two implementations of the ForthVisitor interface are sufficient.

public class ForthBuilder implements ForthVisitor<AtomicInteger, List<Action>> {
    private static final WordBuilder visitor = new WordBuilder();
    
    @Override
    public List<Action> visit(Root root, AtomicInteger input) {
        return root.children().stream().map(c -> c.accept(visitor, input)).filter(Objects::nonNull).toList();
    }

    @Override
    public List<Action> visit(Definition definition, AtomicInteger input) {
        return List.of(definition.accept(visitor, input));
    }

    @Override
    public List<Action> visit(IfThenElse ifThenElse, AtomicInteger input) {
        return List.of(ifThenElse.accept(visitor, input));
    }

    @Override
    public List<Action> visit(DoLoop doLoop, AtomicInteger input) {
        return List.of(doLoop.accept(visitor, input));
    }

    @Override
    public List<Action> visit(Token token, AtomicInteger input) {
        return token.getType() == TokenType.EOF ? List.of() : List.of(token.accept(visitor, input));
    }

    @Override
    public List<Action> visit(Node node, AtomicInteger input) {
        return List.of(node.accept(visitor, input));
    }
}

The ForthBuilder is the entry point for transforming the parse tree into a Forth program in the form of a List<Action>. As the result of parsing can be any (almost any) Node, it must be possible to visit all variants. Almost all of them forward the processing directly to the second visitor WordBuilder. The exceptions are Root and Token. Root is created when the Forth program actually consists of a list of words. Therefore, all children nodes are run through here and the words supplied by them are added as a list. A possible null value is filtered out of the list; this is created by the Token EOF, which does not return a word but null. Token also has special processing because of EOF, because in this case an empty list is to be returned.

Root : 
(Word)* 
;

Empty programs are created if the input is empty or only contains a comment. This is stated by the root rule, which is also valid without a single word. Then the parse tree consists only of EOF.

The second visitor WordBuilder is more complex because it has to extract the information-containing nodes for Definition, DoLoop and IfThenElse.

public Action visit(Definition definition, AtomicInteger input) {
  String name = definition.get(1).toString();
  List<Action> definitionActions = parseSequence(2, definition.size() - 1, definition, input);
  return context -> context.definitions().put(name, definitionActions);
}

For Definition, this is the second node, which contains the name and the nodes 2 to size() -1, which contain the words that make up the new definition.

The Forth basic vocabulary is processed in the visit method for Token.

public Action visit(Token token, AtomicInteger input) {
  return switch (token.getType()) {
    case TokenType.BYE -> context -> { throw new GoodByeException(); };
    case TokenType.PLUS -> context -> context.push(context.pop() + context.pop());
    // ...
}

The words BYE and PLUS are shown here as an example. Both return an Action in the form of a lamda. A GoodBeException is thrown by the Action for BYE, which exits the processing loop somewhat inelegantly. The lamda for PLUS fetches two values from the stack with pop and writes the sum to the stack with push.

The rest of the implementation only takes care of parsing the input and calling the execute methods of the actions during program execution. If you want to take a look at the Forth interpreter, the link can be found on GitHub. Have fun.

https://github.com/CodingChallengesFYI/SharedSolutions/blob/main/Solutions/challenge-forth.md

Leave a Comment