Rocinante – the class generator

“Code generation, like drinking alcohol, is good in moderation.”

Alex Lowe

In the first article on Project Rocinante, we discussed processing Protocol Buffer definition files with CongoCC. In this post, we will create the first Java classes based on these definitions.

The starting point for source code generation is the output of the CongoCC-based Protocol Buffer parser. This parser builds an abstract syntax tree (AST) from the input, whose nodes represent the recognized syntactic elements.

To generate Java classes, we need to extract all the necessary data from the AST and provide it to our code generator. We use the Visitor-Pattern, which has already done a good job in the Hamcrest Matcher Generator and Enum Converter Generator projects.

Before we go into the details of our new Visitor, I would like to explain why we need the Visitor-Pattern at all. Couldn't we simply walk through the tree and collect all the information? After all, we have a grammar, and surely the tree follows it?

But the AST is a sparse tree that only contains the syntactic elements that are actually needed. The next example shows what this means. Let’s take a look at a very simple grammar.

MultiplicativeExpression :
    <NUMBER> ( (<TIMES>|<DIVIDE>|<MODULO>) <NUMBER> )*
;

A multiplicative expression consists of a single number, optionally followed by any sequence of operator-number pairs. The expression 1*1 results in an AST of the form MultiplicativeExpression -> NUMBER, TIMES, NUMBER. The AST for the expression 42, however, consists only of a NUMBER. Put simply, every node that would have only a single child is omitted and replaced by that child.
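The following sketch mimics this behavior for the grammar above in plain Java. It is not the parser CongoCC generates: the node types NumberNode, OperatorNode and MultiplicativeExpression and the helpers parseChildren and reduce are hypothetical, and the tokenizer does not check that numbers and operators alternate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified node types for the MultiplicativeExpression grammar
interface Node {}
record NumberNode(int value) implements Node {}
record OperatorNode(char symbol) implements Node {}
record MultiplicativeExpression(List<Node> children) implements Node {}

class SparseTreeDemo {
  // Splits the input into number and operator nodes (no check of the alternation)
  static List<Node> parseChildren(String input) {
    List<Node> children = new ArrayList<>();
    int i = 0;
    while (i < input.length()) {
      char c = input.charAt(i);
      if (Character.isDigit(c)) {
        int start = i;
        while (i < input.length() && Character.isDigit(input.charAt(i))) {
          i++;
        }
        children.add(new NumberNode(Integer.parseInt(input.substring(start, i))));
      } else if (c == '*' || c == '/' || c == '%') {
        children.add(new OperatorNode(c));
        i++;
      } else {
        throw new IllegalArgumentException("Unexpected character: " + c);
      }
    }
    return children;
  }

  // Mimics the sparse AST: a node with a single child is replaced by that child
  static Node reduce(List<Node> children) {
    return children.size() == 1 ? children.get(0) : new MultiplicativeExpression(children);
  }

  public static void main(String[] args) {
    System.out.println(reduce(parseChildren("1*1"))); // a MultiplicativeExpression with three children
    System.out.println(reduce(parseChildren("42")));  // NumberNode[value=42], no wrapper node
  }
}
```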

Omitting these nodes has serious consequences for evaluating the tree, because we have to check the type of a node before processing it. In our example: is it a MultiplicativeExpression or a NUMBER? A naive implementation checks the type of the current node and calls the appropriate processing routine depending on that type.

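// Naive dispatch: check the runtime type of each node by hand
void evaluateExpression(Node node) {
  if (node instanceof MultiplicativeExpression expression) {
    // operator expression such as 1*1
    evaluateMultiplicativeExpression(expression);
  } else if (node instanceof NUMBER number) {
    // single number such as 42, no MultiplicativeExpression node exists
    evaluateNumber(number);
  }
}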

The more node types have to be considered, the longer and uglier such chains of instanceof checks become.

The Visitor-Pattern is a more elegant choice, because each node selects its own processing method on the Visitor. Every node class has an accept method that takes the Visitor as a parameter. In this accept method, the node calls the visit method intended for its type on the Visitor. This technique is known as double dispatch.
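Applied to the grammar above, the pattern can be sketched as follows. ExpressionVisitor, EvaluatingVisitor and the node records are hypothetical and much simpler than CongoCC's generated classes; in particular, the operators are stored as plain characters instead of token nodes. The point is that the evaluator never checks a node type itself: each node's accept method selects the matching visit overload.

```java
import java.util.List;

// Hypothetical visitor for the MultiplicativeExpression grammar: one visit method per node type
interface ExpressionVisitor<O> {
  O visit(NumberNode number);
  O visit(MultiplicativeExpression expression);
}

// Every node implements accept and thereby picks "its" visit method (double dispatch)
interface Node {
  <O> O accept(ExpressionVisitor<O> visitor);
}

record NumberNode(int value) implements Node {
  @Override
  public <O> O accept(ExpressionVisitor<O> visitor) {
    return visitor.visit(this); // resolves to visit(NumberNode)
  }
}

// operands.get(i + 1) is combined with the running result using operators.get(i)
record MultiplicativeExpression(List<Node> operands, List<Character> operators) implements Node {
  @Override
  public <O> O accept(ExpressionVisitor<O> visitor) {
    return visitor.visit(this); // resolves to visit(MultiplicativeExpression)
  }
}

// Evaluates an expression without a single instanceof check
class EvaluatingVisitor implements ExpressionVisitor<Integer> {
  @Override
  public Integer visit(NumberNode number) {
    return number.value();
  }

  @Override
  public Integer visit(MultiplicativeExpression expression) {
    int result = expression.operands().get(0).accept(this);
    for (int i = 0; i < expression.operators().size(); i++) {
      int right = expression.operands().get(i + 1).accept(this);
      result = switch (expression.operators().get(i)) {
        case '*' -> result * right;
        case '/' -> result / right;
        case '%' -> result % right;
        default -> throw new IllegalStateException("Unknown operator");
      };
    }
    return result;
  }
}

class VisitorDemo {
  public static void main(String[] args) {
    EvaluatingVisitor evaluator = new EvaluatingVisitor();
    Node single = new NumberNode(42); // the sparse tree for "42"
    Node product = new MultiplicativeExpression(
        List.of(new NumberNode(6), new NumberNode(7)), List.of('*')); // the tree for "6*7"
    System.out.println(single.accept(evaluator));  // 42
    System.out.println(product.accept(evaluator)); // 42
  }
}
```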

The ProtoVisitor shown here declares a long list of visit methods. Each takes the visited node class as its first parameter and an input of type I as its second, and returns a result of type O.

public interface ProtoVisitor<I, O> {
  // Every visit method has a default implementation that ignores the node
  default O visit(Proto proto, I input) {
    return null;
  }

  default O visit(MessageProduction message, I input) {
    return null;
  }

  // ...
}

The MessageProduction class has an accept method that calls the visit method for MessageProduction on the ProtoVisitor implementation passed to it.

public class MessageProduction extends BaseNode {

  public <I, O> O accept(ProtoVisitor<I, O> visitor, I input) {
    return visitor.visit(this, input);
  }
}

As a result, calling accept on any node invokes the visit method for that node's type: either the implementation provided by the concrete visitor or, if the visitor does not need to handle this type, the default method.
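A minimal, self-contained sketch of this mechanism: the node records and the trimmed-down ProtoVisitor below are simplified, hypothetical stand-ins for the generated classes, and MessageNameVisitor is an invented example. It overrides only the method for MessageProduction, so visiting an EnumProduction falls through to the default method and yields null. The two type parameters show how an input (here a prefix string) goes into the visit and a result comes out.

```java
import java.util.List;

// Simplified stand-ins for the generated node classes (hypothetical, not CongoCC output)
interface ProtoNode {
  <I, O> O accept(ProtoVisitor<I, O> visitor, I input);
}

record MessageProduction(String name) implements ProtoNode {
  public <I, O> O accept(ProtoVisitor<I, O> visitor, I input) {
    return visitor.visit(this, input);
  }
}

record EnumProduction(String name) implements ProtoNode {
  public <I, O> O accept(ProtoVisitor<I, O> visitor, I input) {
    return visitor.visit(this, input);
  }
}

interface ProtoVisitor<I, O> {
  // Default: ignore the node and return null
  default O visit(MessageProduction message, I input) {
    return null;
  }

  default O visit(EnumProduction enumProduction, I input) {
    return null;
  }
}

// Only messages are of interest; enums fall through to the default method
class MessageNameVisitor implements ProtoVisitor<String, String> {
  @Override
  public String visit(MessageProduction message, String prefix) {
    return prefix + message.name();
  }
}

class DefaultMethodDemo {
  public static void main(String[] args) {
    List<ProtoNode> nodes = List.of(new MessageProduction("Example"), new EnumProduction("Color"));
    MessageNameVisitor visitor = new MessageNameVisitor();
    for (ProtoNode node : nodes) {
      System.out.println(node.accept(visitor, "message ")); // "message Example", then "null"
    }
  }
}
```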

The work of our ProtoVisitor implementation begins in the visit method for Proto instances.

class GeneratorVisitor implements ProtoVisitor<GeneratorContext, Void> {

  @Override
  public Void visit(Proto proto, GeneratorContext context) {
    proto.childrenOfType(PackageProduction.class).forEach(x -> x.accept(this, context));
    proto.childrenOfType(EnumProduction.class).forEach(x -> x.accept(this, context));
    proto.childrenOfType(MessageProduction.class).forEach(x -> x.accept(this, context));
    return null;
  }
    
  // ...
}

This method is easy to understand: the accept method is called for every child node of type PackageProduction, EnumProduction or MessageProduction. All other children are ignored, which means that our first implementation does not support imports and global options.
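The effect of this selection can be sketched with a simplified, hypothetical node class. Its childrenOfType only imitates the method used above by filtering the direct children by class, and node classes such as ImportNode and OptionNode are invented for the illustration. Regardless of their order in the source file, packages are processed before enums and enums before messages, while imports and options are never selected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tree: only what is needed to show type-based child selection
class SimpleNode {
  private final List<SimpleNode> children = new ArrayList<>();

  SimpleNode add(SimpleNode child) {
    children.add(child);
    return this;
  }

  // Returns all direct children that are instances of the given class, in source order
  <T extends SimpleNode> List<T> childrenOfType(Class<T> type) {
    return children.stream().filter(type::isInstance).map(type::cast).toList();
  }
}

class PackageNode extends SimpleNode {}
class ImportNode extends SimpleNode {}
class OptionNode extends SimpleNode {}
class EnumNode extends SimpleNode {}
class MessageNode extends SimpleNode {}

class ChildrenOfTypeDemo {
  // Mirrors visit(Proto): packages, then enums, then messages; imports and options are skipped
  static List<String> processingOrder(SimpleNode proto) {
    List<String> order = new ArrayList<>();
    proto.childrenOfType(PackageNode.class).forEach(x -> order.add("package"));
    proto.childrenOfType(EnumNode.class).forEach(x -> order.add("enum"));
    proto.childrenOfType(MessageNode.class).forEach(x -> order.add("message"));
    return order;
  }

  public static void main(String[] args) {
    SimpleNode proto = new SimpleNode()
        .add(new PackageNode())
        .add(new ImportNode())
        .add(new MessageNode())
        .add(new OptionNode())
        .add(new EnumNode());
    System.out.println(processingOrder(proto)); // [package, enum, message]
  }
}
```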

Processing a PackageProduction is simple. If the file contains one, its package name is stored directly in the GeneratorContext.

@Override
public Void visit(PackageProduction packageProduction, GeneratorContext context) {
  context.setPackageName(packageProduction.get(1).toString());
  return null;
}
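The article only shows how the GeneratorContext is used (setPackageName, getPackageName, addMessage, getMessages and getEnums). A minimal sketch of such a collector could look like this; addEnum and the placeholder records MessagePattern and EnumPattern are assumptions for illustration, not Rocinante's actual classes. If a file has no package statement, the package name simply stays null, which fits the package?? check in the message template.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placeholders for the pattern classes (not Rocinante's real ones)
record MessagePattern(String name) {}
record EnumPattern(String name) {}

// Hypothetical GeneratorContext: a mutable collector filled by the visitor and read by the generator
class GeneratorContext {
  private String packageName; // stays null if the .proto file has no package statement
  private final List<EnumPattern> enums = new ArrayList<>();
  private final List<MessagePattern> messages = new ArrayList<>();

  void setPackageName(String packageName) { this.packageName = packageName; }
  String getPackageName() { return packageName; }

  void addEnum(EnumPattern enumPattern) { enums.add(enumPattern); }
  List<EnumPattern> getEnums() { return List.copyOf(enums); }

  void addMessage(MessagePattern message) { messages.add(message); }
  List<MessagePattern> getMessages() { return List.copyOf(messages); }
}

class GeneratorContextDemo {
  public static void main(String[] args) {
    GeneratorContext context = new GeneratorContext();
    context.setPackageName("example.pkg"); // hypothetical package name
    context.addMessage(new MessagePattern("Example"));
    System.out.println(context.getPackageName() + " " + context.getMessages());
  }
}
```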

The other two processing methods are more sophisticated, but very similar. Since the previous article promised to show the generation of Java sources for MessageProduction, it is used here for illustration purposes.

For a MessageProduction, the GeneratorVisitor only extracts the message name and creates a MessagePattern instance, which holds all the data required for generation. The evaluation of the MessageBody is then delegated to a MessageVisitor, which adds further information to the MessagePattern instance. To keep this article from getting even longer, the MessageVisitor's mode of operation will be presented in a subsequent article.

@Override
public Void visit(MessageProduction message, GeneratorContext context) {
  MessagePattern messagePattern = new MessagePattern(message.get(1).toString());
  context.addMessage(messagePattern);
  message.get(2).accept(MESSAGE_VISITOR, new MessagePatternWithContext(messagePattern, context));
  return null;
}
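The article does not show MessagePattern itself. Judging from the names the message template accesses (message.name, message.fields, field.name, field.optional, field.type.type and field.type.wrapper), its data model might look roughly like the following sketch. FieldType, FieldPattern, the type mapping and addField are hypothetical, and it is assumed that the template engine exposes getters such as getName() under the property name. The recordHeader method reproduces in plain Java what the template's list directive does for the record header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical mapping of a few proto scalar types to a Java type and its wrapper class
enum FieldType {
  INT32("int", "Integer"),
  INT64("long", "Long"),
  BOOL("boolean", "Boolean"),
  DOUBLE("double", "Double"),
  STRING("String", "String");

  private final String type;
  private final String wrapper;

  FieldType(String type, String wrapper) {
    this.type = type;
    this.wrapper = wrapper;
  }

  public String getType() { return type; }
  public String getWrapper() { return wrapper; }
}

// Hypothetical field description, accessed in the template as field.name, field.type and field.optional
class FieldPattern {
  private final String name;
  private final FieldType type;
  private final boolean optional;

  FieldPattern(String name, FieldType type, boolean optional) {
    this.name = name;
    this.type = type;
    this.optional = optional;
  }

  public String getName() { return name; }
  public FieldType getType() { return type; }
  public boolean isOptional() { return optional; }
}

// Hypothetical MessagePattern: the message name plus the fields added later by the MessageVisitor
class MessagePattern {
  private final String name;
  private final List<FieldPattern> fields = new ArrayList<>();

  MessagePattern(String name) { this.name = name; }

  public String getName() { return name; }
  public List<FieldPattern> getFields() { return fields; }
  void addField(FieldPattern field) { fields.add(field); }
}

class MessagePatternDemo {
  // Plain-Java equivalent of the record header built by the template's list directive
  static String recordHeader(MessagePattern message) {
    StringJoiner components = new StringJoiner(", ", "record " + message.getName() + "(", ")");
    for (FieldPattern field : message.getFields()) {
      FieldType type = field.getType();
      components.add((field.isOptional() ? type.getWrapper() : type.getType()) + " " + field.getName());
    }
    return components.toString();
  }

  public static void main(String[] args) {
    MessagePattern book = new MessagePattern("Book"); // hypothetical message
    book.addField(new FieldPattern("title", FieldType.STRING, false));
    book.addField(new FieldPattern("pages", FieldType.INT32, true));
    System.out.println(recordHeader(book)); // record Book(String title, Integer pages)
  }
}
```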

Now that all the information needed for the message class has been collected, the source code can be generated. For each MessagePattern instance, the generate method of the ProtoGenerator calls createMessageFile, which in turn calls createMessageClass. This detour exists because createMessageFile also creates the associated IO class. That class will only be discussed in the next article, which covers the encoding and decoding of Protocol Buffer messages.

public void generate(String source, ProtoSources protoSources, ProtoDestinations destinations) throws IOException {
  ProtocolbuffersParser parser = new ProtocolbuffersParser(source, protoSources);
  parser.Proto();
  GeneratorContext context = new GeneratorContext();
  parser.rootNode().accept(GENERATOR_VISITOR, context);
  Configuration configuration = new Configuration();
  context.getEnums().forEach(x -> createEnumClass(x, context, configuration, destinations));
  context.getMessages().forEach(m -> createMessageFile(m, context, configuration, destinations));
}

The createMessageClass method uses a FreshMarker template to generate the source code. Since the message class is a pure transfer object, Rocinante generates a record with one component for each field of the message. For an optional field, the component uses the corresponding wrapper class instead of a primitive type, so that a missing value can be null, and an additional getter returns the value as an Optional.

private static void createMessageClass(MessagePattern message, GeneratorContext context, Configuration configuration, ProtoDestinations destinations) {
  Template template = configuration.getTemplate("message", """
      <#if package??>
      package ${package};
                
      </#if>              
      import java.util.Optional;

      public record ${message.name}(<#list message.fields as field with loop><#if field.optional>${field.type.wrapper}<#else>${field.type.type}</#if> ${field.name}<#if loop?has_next>, </#if></#list>) {
        <#list message.fields as field with loop>
          <#if field.optional>
        public Optional<${field.type.wrapper}> get${field.name?capitalize}() {
          return Optional.ofNullable(${field.name}());
        }
          </#if>
        </#list>
      }
      """);
  Map<String, Object> model = new HashMap<>();
  model.put("package", context.getPackageName());
  model.put("message", message);
  StringWriter writer = new StringWriter();
  template.process(model, writer);
  try {
    destinations.writeFile(context.getPackageName(), message.getName() + ".java", writer.toString());
  } catch (IOException e) {
    throw new ProtoWriteException(e);
  }
}
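To illustrate the result, here is roughly what the template would produce for a hypothetical message Book with a string field title and an optional int32 field pages (ignoring exact whitespace). The generated file declares the record public; in this sketch it is package-private so that it compiles together with the small demo class in one file.

```java
import java.util.Optional;

// Sketch of the expected output for the hypothetical message Book:
// title is a regular field, pages is optional and therefore uses the wrapper type Integer
record Book(String title, Integer pages) {
  public Optional<Integer> getPages() {
    return Optional.ofNullable(pages());
  }
}

class BookDemo {
  public static void main(String[] args) {
    Book withPages = new Book("Rocinante", 42);
    Book withoutPages = new Book("Rocinante", null);
    System.out.println(withPages.getPages());    // Optional[42]
    System.out.println(withoutPages.getPages()); // Optional.empty
  }
}
```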

Now that the source code for our example message is ready, we can create an instance like this.

Example example = new Example("Rocinante", true, 42);

In the next article, we will see how this instance is encoded into the following byte sequence.

0a 09 52 6f 63 69 6e 61 6e 74 65 10 01 18 2a
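This byte sequence can already be checked by hand against the standard Protocol Buffers wire format: each field starts with a tag (the field number shifted left by three bits, combined with the wire type), followed by either a varint or, for strings, a length and the UTF-8 bytes. The sketch below assumes, as the bytes suggest, that fields 1, 2 and 3 of Example are a string, a bool and an int32. It is a hand-written illustration, not the IO class Rocinante generates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Minimal hand-written encoder for the protobuf wire format (a sketch, not Rocinante's IO class)
class WireFormatSketch {
  static final int VARINT = 0;           // wire type 0: int32, int64, bool, ...
  static final int LENGTH_DELIMITED = 2; // wire type 2: string, bytes, embedded messages

  // Writes an unsigned varint: 7 bits per byte, high bit set on all but the last byte
  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  // A tag is the field number shifted left by three bits, combined with the wire type
  static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
    writeVarint(out, ((long) fieldNumber << 3) | wireType);
  }

  // field1..field3 are placeholder names; the real field names of Example are not shown here
  static byte[] encode(String field1, boolean field2, int field3) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] text = field1.getBytes(StandardCharsets.UTF_8);
    writeTag(out, 1, LENGTH_DELIMITED); // 0a
    writeVarint(out, text.length);      // 09
    out.write(text, 0, text.length);    // 52 6f 63 69 6e 61 6e 74 65
    writeTag(out, 2, VARINT);           // 10
    writeVarint(out, field2 ? 1 : 0);   // 01
    writeTag(out, 3, VARINT);           // 18
    writeVarint(out, field3);           // 2a
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] bytes = encode("Rocinante", true, 42);
    System.out.println(HexFormat.ofDelimiter(" ").formatHex(bytes));
  }
}
```

Running main should print the same 15 bytes as shown above.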
