Rocinante – repeated fields

“But before you come to any conclusions
Try walking in my shoes”

Walking in My Shoes – Depeche Mode

The Rocinante library has now passed some important milestones. It can read and write messages and masters the scalar base types. The next step is now to support Repeated Fields. Repeated fields are the Protocol Buffer counterpart to lists and arrays in other protocols.

The Repeated Fields got their name from their implementation. Normally, as you can see in the figure above, a record for a field should only appear once in a message. If a field appears several times, Protocol Buffer recipients are advised to regard the last record found as the value sent. Sub-Messages are a different matter, but Rocinante has not yet mastered this.

The above figure has two interpretations, depending on the type of field 5. For a normal field, it has the value of the last record. As a repeated field, its value is a list of the values of the three records. In this example, it is also irrelevant that a record for field 7 is between the second and third record.

This characteristic of Protocol Buffer messages could be used, for example, to attach trace data to a message without decoding the message beforehand. It is completely sufficient to add the binary representation of a trace field to the binary representation of the message.

For the implementation of the Repeated Fields, not much changes to the existing code. In the templates, the field type for a Repeated Field changes to a generic list.

A list must therefore be initialised for the read method. As with the optional field, the wrapper type must also be used here in the case that the field type is a primitive type.

<#if field.repeated>
  List<${field.type.wrapper}> ${field.name} = new ArrayList<>();
<#elseif field.optional>
  ${field.type.wrapper} ${field.name} = null;
<#else>
  ${field.type.type} ${field.name} = ${field.type.initial};
</#if>

In addition to initialising the local variable, the assignment of the read-in value must also change. The List#add method is used instead of the simple assignment.

<#if (field.repeated)>
  ${field.name}.add(protoInputStream.read${field.type.io}(wireType));
<#else>
  ${field.name} = protoInputStream.read${field.type.io}(wireType);
</#if>

The solution for the write method is somewhat simpler. Instead of calling the corresponding ProtoOutputStream method for each element of the list, there is an overloaded method for lists in the ProtoOutputStream. Why this is the case is explained by another feature of the Repeated Fields.

For a simple list of scalars, the coding described above is very inefficient. Since each record consists of a field type and the actual value, in the worst case there is an overhead of more than 50% to the user data.

An alternative coding for Repeated Fields uses the wire type LEN. Fields with the wire type LEN contain a value consisting of the actual payload and its length as a VarInt inserted before it. In the actual payload, the list values are coded one after the other.

This Packed Repeated Fields are possible for wire types VARINT, I64 and I32. A packed version is not necessary for the LEN wire type. The payload is usually much greater than the overhead.

The Rocinante ProtoOutputStream provides special methods for writing Packed Repeated Fields. The writeRepeatedVarInt is shown here as an example.

private <T> void writeRepeatedVarInt(int fieldNumber, List<T> values, Function<T, Integer> convert) throws IOException {
  ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
  ProtoOutputStream protoOutputStream = new ProtoOutputStream(outputStream);
  for (T value : values) {
    protoOutputStream.writeVariableByteInt(convert.apply(value));
  }
  byte[] bytes = outputStream.toByteArray();
  writeWireTypeAndFieldNumber(WireType.LEN, fieldNumber);
  writeVariableByteInt(bytes.length);
  write(bytes);
}

First, the values from the list are stored in a byte array using a local ProtoOutputStream. Then a record is written with the wire type LEN by first writing the length of the byte array as VarInt and then the byte array itself in the stream. A Function is passed to the method so that this method can be used not only for int32 (Function.identity()), but also for sint32 (value -> (value >> 31) ^ (value << 1)) and bool (value -> value ? 1 : 0).

There is a small problem here that will become even more noticeable with sub-messages. The previously supported types with the wire type LEN have a payload whose length in bytes is easy to determine. With a list of VarInt, the length of the payload is only known after the values have been coded. However, the length must be coded as VarInt before the actual payload. With relatively short lists, extra memory for the additional byte array only needs to be provided for a short time. It becomes more awkward later if nested sub-messages are to be encoded.

When a Packed Repeated Field is decoded, the wire type must be checked first. For a variety of reasons, a wire type VARINT, I32 or I64 can also be entered here instead of a wire type LEN. In this case, the field was not encoded as a Packed Repeated Field but as a Repeated Field or a normal field.

The following readIntegers method returns a single-element list for the wire type VARINT. For the wire type LEN a list of all found values is returned. For this the length of the payload is read first and then all elements contained in the payload are read.

private <T> List<T> readIntegers(WireType wireType, Function<Integer, T> converter) throws IOException {
  if (wireType == WireType.VARINT) {
    return List.of(converter.apply(readVariableByteInt()));
  }
  if (wireType == WireType.LEN) {
    int length = readVariableByteInt();
    LimitedProtoInputStream limitedProtoInputStream = new LimitedProtoInputStream(this, length);
    List<T> result = new ArrayList<>();
    do {
      result.add(converter.apply(limitedProtoInputStream.readVariableByteInt()));
    } while (limitedProtoInputStream.available() > 0);
    return result;
  }
  throw new IOException("wrong type: " + wireType);
}

To prevent the method from being complicated by the length checks, the reading of the values is outsourced to a subclass of the ProtoInputStream. This LimitedProtoInputStream can only read the specified number of bytes from the original InputStream.

The readIntegers method is used in the Proto classes to insert the INT32 values into the result list. The template becomes somewhat more complex at this point, as the result list must be filled with List#addAll in the case of a Packed Repeated Field.

<#if (field.repeated)>
  <#if (field.type.packable)>                                    ${field.name}.addAll(protoInputStream.read${field.type.io}s(wireType));
  <#else>
${field.name}.add(protoInputStream.read${field.type.io}(wireType));
  </#if>
<#else>
  ${field.name} = protoInputStream.read${field.type.io}(wireType);
</#if>

This concludes the implementation of Packed Repeated Fields in Rocinante for the time being. However, this chapter must be reopened at the latest when Sub-Messages are implemented.

Leave a Comment