“Code generation, like drinking alcohol, is good in moderation.”
Alex Lowe
In the first article on Project Rocinante, we discussed the processing of Protocol Buffer definition files with CongoCC. In this post we will create the first Java classes based on these definitions.
The starting point for source code generation is the result produced by the CongoCC-based Protocol Buffer Parser. This parser generates an Abstract-Syntax-Tree (AST) from the input data, which contains nodes for the recognized syntactic elements.
To generate Java classes, we need to extract all the necessary data from the AST and provide it to our code generator. We use the Visitor-Pattern, which has already done a good job in the Hamcrest Matcher Generator and Enum Converter Generator projects.
Before we go into the details of our new Visitor, I would like to explain why we need the Visitor-Pattern at all. You would think that we could simply wander through the tree and collect all the information? After all, we have a syntax and the tree will probably follow it?
But the AST is a sparse tree that only contains the syntactic elements that are actually needed. The next example shows what this means. Let’s take a look at a very simple grammar.
MultiplicativeExpression : <NUMBER> ( (<TIMES>|<DIVIDE>|<MODULO>) <NUMBER> )* ;
A multiplicative expression consists of a single number, or of a number followed by any sequence of operator-number pairs. The expression 1*1 results in an AST of the form MultiplicativeExpression -> NUMBER, TIMES, NUMBER. However, the AST for the expression 42 is only NUMBER. Put simply, all nodes that have only a single sub-node are omitted.
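The omission rule can be illustrated with a small sketch. This is not CongoCC's tree-building code; the types `Token` and `MultiplicativeExpression` and the helpers `build` and `close` are hypothetical names chosen for this illustration, and the input is simply split on whitespace instead of being parsed properly.

```java
import java.util.ArrayList;
import java.util.List;

public class SparseTreeSketch {

    /** Hypothetical node types for illustration; not the classes CongoCC generates. */
    public interface Node {}

    public record Token(String type, String image) implements Node {}

    public record MultiplicativeExpression(List<Node> children) implements Node {}

    /** Closes a node: if it has exactly one child, the child takes its place. */
    public static Node close(List<Node> children) {
        return children.size() == 1
                ? children.get(0)
                : new MultiplicativeExpression(List.copyOf(children));
    }

    /** Turns whitespace-separated input into tokens and closes the node (no validation). */
    public static Node build(String input) {
        List<Node> children = new ArrayList<>();
        for (String image : input.trim().split("\\s+")) {
            children.add(new Token(typeOf(image), image));
        }
        return close(children);
    }

    private static String typeOf(String image) {
        return switch (image) {
            case "*" -> "TIMES";
            case "/" -> "DIVIDE";
            case "%" -> "MODULO";
            default -> "NUMBER";
        };
    }

    public static void main(String[] args) {
        System.out.println(build("1 * 1")); // a MultiplicativeExpression with three children
        System.out.println(build("42"));    // only the NUMBER token, no wrapper node
    }
}
```

The caller of `build` cannot know in advance which of the two node types it will receive, which is exactly the problem discussed next.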
This has serious consequences for the evaluation of the tree, because we have to check the type of a node before processing it. In our example, is it a MultiplicativeExpression or is it a NUMBER? A naive implementation checks the type of the current node and calls the appropriate processing routine depending on the context.
void evaluateExpression(Node node) {
    if (node instanceof MultiplicativeExpression expression) {
        // dispatches to the overload for MultiplicativeExpression (assumed to exist elsewhere)
        evaluateExpression(expression);
    } else if (node instanceof NUMBER number) {
        evaluateNumber(number);
    }
}
Depending on how many possible types have to be considered, the processing code can become very ugly.
The Visitor-Pattern is a more elegant choice, because the nodes themselves choose their processing method on the Visitor. Each node class has an accept method with the Visitor as a parameter. In this accept method, the node calls its intended method on the Visitor.
The ProtoVisitor shown here has a longer list of visit methods that have the visited class as their first parameter.
public interface ProtoVisitor<I, O> {
    default O visit(Proto proto, I input) {
        return null;
    }

    default O visit(MessageProduction message, I input) {
        return null;
    }

    // ...
}
The MessageProduction class has an accept method that calls the above visit method for MessageProduction on implementations of ProtoVisitor.
public class MessageProduction extends BaseNode {
    public <I, O> O accept(ProtoVisitor<I, O> visitor, I input) {
        return visitor.visit(this, input);
    }
}
This means that a type-specific implementation is called for any node when the accept method is called, or the default method if no implementation is required.
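To see the double dispatch and the default-method fallback in isolation, here is a minimal, self-contained sketch based on the multiplication example from above. All names in it (`ExprVisitor`, `NumberToken`, `Evaluator`, `NumberLabeler`) are hypothetical and do not come from the Rocinante sources, and the expression node is simplified to two operands.

```java
public class VisitorSketch {

    /** Visitor with default methods, so an implementation overrides only what it needs. */
    public interface ExprVisitor<I, O> {
        default O visit(MultiplicativeExpression expression, I input) { return null; }
        default O visit(NumberToken number, I input) { return null; }
    }

    public interface Node {
        <I, O> O accept(ExprVisitor<I, O> visitor, I input);
    }

    public record NumberToken(int value) implements Node {
        @Override
        public <I, O> O accept(ExprVisitor<I, O> visitor, I input) {
            return visitor.visit(this, input); // selects visit(NumberToken, I)
        }
    }

    public record MultiplicativeExpression(Node left, Node right) implements Node {
        @Override
        public <I, O> O accept(ExprVisitor<I, O> visitor, I input) {
            return visitor.visit(this, input); // selects visit(MultiplicativeExpression, I)
        }
    }

    /** Overrides both methods and multiplies the operands recursively. */
    public static class Evaluator implements ExprVisitor<Void, Integer> {
        @Override
        public Integer visit(MultiplicativeExpression expression, Void input) {
            return expression.left().accept(this, input) * expression.right().accept(this, input);
        }

        @Override
        public Integer visit(NumberToken number, Void input) {
            return number.value();
        }
    }

    /** Overrides only the number case; every other node type uses the default. */
    public static class NumberLabeler implements ExprVisitor<Void, String> {
        @Override
        public String visit(NumberToken number, Void input) {
            return "number " + number.value();
        }
    }

    public static void main(String[] args) {
        Node tree = new MultiplicativeExpression(new NumberToken(6), new NumberToken(7));
        System.out.println(tree.accept(new Evaluator(), null));
        System.out.println(tree.accept(new NumberLabeler(), null)); // default method returns null
    }
}
```

No `instanceof` check appears anywhere: the static type of `this` inside each `accept` method picks the right overload at compile time.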
The work of our ProtoVisitor implementation begins in the visit method for Proto instances.
class GeneratorVisitor implements ProtoVisitor<GeneratorContext, Void> {
    @Override
    public Void visit(Proto proto, GeneratorContext context) {
        proto.childrenOfType(PackageProduction.class).forEach(x -> x.accept(this, context));
        proto.childrenOfType(EnumProduction.class).forEach(x -> x.accept(this, context));
        proto.childrenOfType(MessageProduction.class).forEach(x -> x.accept(this, context));
        return null;
    }

    // ...
}
This method is quite simple to understand. For all sub-nodes of type PackageProduction, EnumProduction and MessageProduction, their accept method is called.
For our first implementation, this means that we do not support imports and global options.
The PackageProduction processing is simple. If a PackageProduction exists, the package name from the PackageProduction is saved directly in the GeneratorContext.
@Override
public Void visit(PackageProduction packageProduction, GeneratorContext context) {
    context.setPackageName(packageProduction.get(1).toString());
    return null;
}
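The article does not show the GeneratorContext itself. The following is only a guess at its shape, reduced to the accessors that appear in the snippets (`setPackageName`, `getPackageName`, `addMessage`, `getMessages`, `getEnums`). The class name `GeneratorContextSketch`, the `addEnum` method, the `EnumPattern` type and the name-only pattern records are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GeneratorContextSketch {

    /** Hypothetical stand-ins for the pattern classes; only the name is modelled here. */
    public record MessagePattern(String name) {}

    public record EnumPattern(String name) {}

    // Stays null when the .proto file contains no package statement.
    private String packageName;
    private final List<EnumPattern> enums = new ArrayList<>();
    private final List<MessagePattern> messages = new ArrayList<>();

    public void setPackageName(String packageName) {
        this.packageName = packageName;
    }

    public String getPackageName() {
        return packageName;
    }

    // addEnum is an assumed counterpart to addMessage.
    public void addEnum(EnumPattern pattern) {
        enums.add(pattern);
    }

    public void addMessage(MessagePattern pattern) {
        messages.add(pattern);
    }

    /** Read-only views, so that only the visitor methods above add entries. */
    public List<EnumPattern> getEnums() {
        return Collections.unmodifiableList(enums);
    }

    public List<MessagePattern> getMessages() {
        return Collections.unmodifiableList(messages);
    }
}
```

Keeping the collected data in one mutable context object lets each visit method stay small: it records what it found and returns null.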
The other two processing methods are more sophisticated, but very similar. Since the previous article promised to show the generation of Java sources for MessageProduction, it is used here for illustration purposes.
The evaluation in the GeneratorVisitor for a MessageProduction is limited to the extraction of the name and the creation of a MessagePattern instance. All the data required for generation is stored in this instance. The evaluation of the MessageBody is then delegated to a MessageVisitor, which inserts further information into the MessagePattern instance. Its mode of operation will be presented in a subsequent article so as not to make this article even larger.
@Override
public Void visit(MessageProduction message, GeneratorContext context) {
    MessagePattern messagePattern = new MessagePattern(message.get(1).toString());
    context.addMessage(messagePattern);
    message.get(2).accept(MESSAGE_VISITOR, new MessagePatternWithContext(messagePattern, context));
    return null;
}
Now that all the necessary information for the message class has finally been collected, the source code can be generated. In its generate method, the ProtoGenerator indirectly calls createMessageClass for each MessagePattern instance. The call is indirect because generate invokes the createMessageFile method, in which the associated IO class is also created. However, this class will only be discussed in the next article, when it comes to encoding and decoding Protocol Buffer messages.
public void generate(String source, ProtoSources protoSources, ProtoDestinations destinations) throws IOException {
    ProtocolbuffersParser parser = new ProtocolbuffersParser(source, protoSources);
    parser.Proto();

    GeneratorContext context = new GeneratorContext();
    parser.rootNode().accept(GENERATOR_VISITOR, context);

    Configuration configuration = new Configuration();
    context.getEnums().forEach(x -> createEnumClass(x, context, configuration, destinations));
    context.getMessages().forEach(m -> createMessageFile(m, context, configuration, destinations));
}
The createMessageClass method uses a FreshMarker template to generate the source code. As the message class is a pure transfer object, Rocinante creates a record. The record receives a component for each attribute of the message. For an optional field, a getter method that returns an Optional is also generated. For primitive data types, the corresponding wrapper classes are used here.
private static void createMessageClass(MessagePattern message, GeneratorContext context, Configuration configuration, ProtoDestinations destinations) {
    Template template = configuration.getTemplate("message", """
            <#if package??>
            package ${package};
            </#if>
            import java.util.Optional;
            public record ${message.name}(<#list message.fields as field with loop><#if field.optional>${field.type.wrapper}<#else>${field.type.type}</#if> ${field.name}<#if loop?has_next>, </#if></#list>) {
            <#list message.fields as field with loop>
            <#if field.optional>
              public Optional<${field.type.wrapper}> get${field.name?capitalize}() {
                return Optional.ofNullable(${field.name}());
              }
            </#if>
            </#list>
            }
            """);
    Map<String, Object> model = new HashMap<>();
    model.put("package", context.getPackageName());
    model.put("message", message);
    StringWriter writer = new StringWriter();
    template.process(model, writer);
    try {
        destinations.writeFile(context.getPackageName(), message.getName() + ".java", writer.toString());
    } catch (IOException e) {
        throw new ProtoWriteException(e);
    }
}
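To make the template more tangible, here is a hand-written approximation of the kind of record it could produce. The article does not show the underlying message definition, so the field names `name`, `active` and `number`, and the choice of `active` as the optional field, are assumptions for illustration only. The record is nested in a wrapper class merely to keep the sketch in a single file.

```java
import java.util.Optional;

public class GeneratedRecordSketch {

    /**
     * Approximation of a generated record. Field names are hypothetical.
     * The optional field uses the wrapper type Boolean, so it can be null.
     */
    public record Example(String name, Boolean active, int number) {

        // Mirrors the template: get + capitalized field name, wrapping the accessor result.
        public Optional<Boolean> getActive() {
            return Optional.ofNullable(active());
        }
    }

    public static void main(String[] args) {
        Example example = new Example("Rocinante", true, 42);
        System.out.println(example.name());
        System.out.println(example.getActive().isPresent());

        // An unset optional field is represented by null in the record component.
        Example withoutFlag = new Example("Rocinante", null, 42);
        System.out.println(withoutFlag.getActive().isPresent());
    }
}
```

The plain accessor `active()` still returns the nullable wrapper value, while `getActive()` offers the null-safe view.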
Now that the source code for our example message is ready, we can create an instance like this.
Example example = new Example("Rocinante", true, 42);
In the next article we will see how the following byte sequence is encoded from it.
0a 09 52 6f 63 69 6e 61 6e 74 65 10 01 18 2a