Rocinante – a poor man's protocol buffer

“Trust your gut before your head gets in the way.”

Josephus Miller

For a software developer who wants to improve his own skills, it is always helpful to try out new topics. A nice side effect of such small projects is that you always get new material for articles like this one.

From time to time, you are lucky and a current project or some research leads to a new topic.
If you are not lucky enough to find such topics yourself, you should visit John Crickett's Coding Challenges page. New challenges are regularly introduced there, and community solutions are presented as well.

Personally, I recently stumbled across the Protocol Buffer documentation from Google and found the binary representation of messages interesting. However, I didn't really like the grammar of the proto files or the generated Java API. In particular, the very brief documentation of the semantics raised more questions than it answered.

syntax = "proto3";

package de.schegge.example;

message Example {
  optional string text = 1;
  bool flag = 2;
  int32 number = 3;
}

A simple proto file consists of a few optional constructs. In this case, these are a syntax statement that specifies the protocol version, a package statement that specifies a namespace, and a message statement that describes a protocol buffer message with three fields.

What is peculiar about the official grammar is that some of the semantic rules would be unnecessary with a better formulated syntax. For example, Syntax may only be used once, at the beginning of the file. Similarly, Package may only appear once, somewhere after Syntax. But the file format description does not say which elements of the file the Package definition applies to: all elements in the file, or only those after the Package definition?

The Protocol Buffer project will be called Rocinante and I have already designed a logo for the project and the cover image for this article. However, it’s actually not the horse that gives the project its name, but the spaceship.

The first milestone should be a simple cut-through for my own Protocol Buffer library: a parser for proto files, the generation of Java classes from the parsed data, and rudimentary binary input and output.
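To give an idea of what the binary output part involves, here is a minimal sketch of how the Example message from the proto file above could be written in the Protocol Buffers wire format. The class WireSketch and its methods are hypothetical and not part of Rocinante. The sketch assumes the proto3 convention that fields holding their default value (false, 0) are not written at all.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Rocinante code: encodes the Example message by hand.
final class WireSketch {
  // Wire types defined by the Protocol Buffers encoding.
  static final int VARINT = 0;
  static final int LEN = 2;

  // Writes a varint: 7 payload bits per byte, high bit set on all but the last byte.
  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  // A field key is the field number shifted left by three bits plus the wire type.
  static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
    writeVarint(out, ((long) fieldNumber << 3) | wireType);
  }

  static byte[] encodeExample(String text, boolean flag, int number) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (text != null) { // optional string text = 1 (null means "not set" here)
      byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
      writeTag(out, 1, LEN);
      writeVarint(out, utf8.length);
      out.write(utf8, 0, utf8.length);
    }
    if (flag) { // bool flag = 2, default false is omitted
      writeTag(out, 2, VARINT);
      writeVarint(out, 1);
    }
    if (number != 0) { // int32 number = 3, default 0 is omitted
      writeTag(out, 3, VARINT);
      writeVarint(out, number); // sign-extended: negative values take ten bytes
    }
    return out.toByteArray();
  }
}
```

Calling WireSketch.encodeExample("hi", true, 150) yields the nine bytes 0A 02 68 69 10 01 18 96 01: a key and length for the string, followed by its UTF-8 bytes, and then one key and varint each for the flag and the number.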

The tool of choice for writing a parser has been the CongoCC parser generator by Jonathan Revusky for some time now. CongoCC and its predecessor JavaCC 21 have already been celebrated here in several articles and CongoCC should also show its strengths in this project.

The following excerpt from the CongoCC grammar file base-protocol-buffers.ccc shows the input rule together with an interesting CongoCC feature. The CongoCC preprocessor allows different variants of a grammar to be formulated in the same file. The lower part shows the rule from the original grammar, and the upper part shows the more clearly formulated rule from the Rocinante grammar. If the variable rosi is defined, the Rocinante grammar is generated; otherwise, the original grammar is generated.

Proto :
#if rosi
  [Syntax]
  [Package]
  (Import)*
  ( Option | Message | Enum )*
#else
  ( Syntax | Package | Import | Option | Message | Enum | <SEMICOLON>)*
#endif
  <EOF>
;

The Rocinante grammar takes into account the special semantics of Syntax and Package and also introduces the rule that Import statements belong after Package and before the other constructs. For Java developers, this is the only correct interpretation of an import statement.
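The ordering that the stricter rule enforces can also be expressed as a small check in plain Java. This is only an illustration of the rule [Syntax] [Package] (Import)* (Option | Message | Enum)*. The names OrderSketch, Kind and isValidOrder are made up for this sketch, and the real check is of course done by the generated parser.

```java
import java.util.List;

// Hypothetical sketch, not Rocinante code: validates the statement order of a proto file.
final class OrderSketch {
  enum Kind { SYNTAX, PACKAGE, IMPORT, OPTION, MESSAGE, ENUM }

  // Maps each statement kind to the phase of the file it belongs to.
  static int phase(Kind kind) {
    switch (kind) {
      case SYNTAX: return 0;
      case PACKAGE: return 1;
      case IMPORT: return 2;
      default: return 3; // Option, Message and Enum may be mixed freely
    }
  }

  static boolean isValidOrder(List<Kind> statements) {
    int current = -1;
    for (Kind kind : statements) {
      int next = phase(kind);
      // Phases never go backwards, and Syntax and Package appear at most once.
      if (next < current || (next == current && next < 2)) {
        return false;
      }
      current = next;
    }
    return true;
  }
}
```

With this check, a list such as SYNTAX, PACKAGE, IMPORT, MESSAGE is accepted, while MESSAGE followed by IMPORT, or two SYNTAX statements, are rejected. The original grammar would accept all of these at the syntax level and leave the rejection to a later semantic check.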

The base-protocol-buffers.ccc file is included by protocol-buffers.ccc, which uses INJECT statements to give the nodes generated from the grammar rules an additional method. This method implements the visitor pattern on the necessary nodes.

INJECT Proto :
  import de.schegge.rosinante.core.ProtoVisitor;
{
  public <I, O> O accept(ProtoVisitor<I, O> visitor, I input) {
    return visitor.visit(this, input);
  }
}
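How such an accept method is used can be sketched with a few stand-in classes. The real ProtoVisitor interface and the generated node classes are not shown in this article, so everything below (ProtoNode, MessageNode, QualifiedNames and the shape of ProtoVisitor) is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real interface: one visit method per node type.
interface ProtoVisitor<I, O> {
  O visit(ProtoNode node, I input);
  O visit(MessageNode node, I input);
}

// Hypothetical stand-in for a generated Message node.
final class MessageNode {
  final String name;
  MessageNode(String name) { this.name = name; }

  <I, O> O accept(ProtoVisitor<I, O> visitor, I input) {
    return visitor.visit(this, input);
  }
}

// Hypothetical stand-in for the generated Proto node.
final class ProtoNode {
  final List<MessageNode> messages;
  ProtoNode(List<MessageNode> messages) { this.messages = messages; }

  <I, O> O accept(ProtoVisitor<I, O> visitor, I input) {
    return visitor.visit(this, input);
  }
}

// Example visitor: the input is a package name, the output the qualified message names.
final class QualifiedNames implements ProtoVisitor<String, List<String>> {
  @Override
  public List<String> visit(ProtoNode node, String packageName) {
    List<String> names = new ArrayList<>();
    for (MessageNode message : node.messages) {
      names.addAll(message.accept(this, packageName)); // double dispatch via accept
    }
    return names;
  }

  @Override
  public List<String> visit(MessageNode node, String packageName) {
    return List.of(packageName + "." + node.name);
  }
}
```

The two type parameters let each visitor decide what it carries down the tree (I) and what it returns (O), so the node classes themselves need no knowledge of code generation or validation.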

The original grammar is generated if the protocol-buffers.ccc file is used. The Rocinante grammar is generated if the following file rosi.ccc is used. The rosi.ccc file includes the protocol-buffers.ccc file and additionally defines the variable rosi. This activates the alternative paths, as in the Proto rule in the base-protocol-buffers.ccc file.

#define rosi

INCLUDE "protocol-buffers.ccc"

The Rocinante Parser based on CongoCC generates the following Java source file from the example file. The next article shows exactly how this works.

package de.schegge.example;
           
import java.util.Optional;
                     
record Example(String text, boolean flag, int number) {
  public Optional<String> getText() {
    return Optional.ofNullable(text());
  }
}
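A short sketch of how calling code might work with such a record: the optional field is read through getText(), while the other fields use the regular record accessors. The record is repeated here only so that the sketch compiles on its own, and the class ExampleUsage with its method describe is hypothetical.

```java
import java.util.Optional;

// Copy of the generated record, repeated so that the sketch is self-contained.
record Example(String text, boolean flag, int number) {
  public Optional<String> getText() {
    return Optional.ofNullable(text());
  }
}

// Hypothetical caller, not generated by Rocinante.
final class ExampleUsage {
  static String describe(Example example) {
    // An unset optional field arrives as null in the record and as an empty Optional here.
    String text = example.getText().orElse("<no text>");
    return text + " " + example.flag() + " " + example.number();
  }
}
```

For new Example(null, true, 42), describe returns "<no text> true 42".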
