“But before you come to any conclusions
Walking in My Shoes – Depeche Mode
Try walking in my shoes”
The Rocinante library has now passed some important milestones. It can read and write messages and masters the scalar base types. The next step is now to support Repeated Fields. Repeated fields are the Protocol Buffer counterpart to lists and arrays in other protocols.
The Repeated Fields got their name from their implementation. Normally, as you can see in the figure above, a record for a field should only appear once in a message. If a field appears several times, Protocol Buffer recipients are advised to regard the last record found as the value sent. Sub-Messages are a different matter, but Rocinante has not yet mastered this.
The above figure has two interpretations, depending on the type of field 5. For a normal field, it has the value of the last record. As a repeated field, its value is a list of the values of the three records. In this example, it is also irrelevant that a record for field 7 is between the second and third record.
This characteristic of Protocol Buffer messages could be used, for example, to attach trace data to a message without decoding the message beforehand. It is completely sufficient to add the binary representation of a trace field to the binary representation of the message.
For the implementation of the Repeated Fields, not much changes to the existing code. In the templates, the field type for a Repeated Field changes to a generic list.
A list must therefore be initialised for the read
method. As with the optional field, the wrapper type must also be used here in the case that the field type is a primitive type.
<#if field.repeated> List<${field.type.wrapper}> ${field.name} = new ArrayList<>(); <#elseif field.optional> ${field.type.wrapper} ${field.name} = null; <#else> ${field.type.type} ${field.name} = ${field.type.initial}; </#if>
In addition to initialising the local variable, the assignment of the read-in value must also change. The List#add
method is used instead of the simple assignment.
<#if (field.repeated)> ${field.name}.add(protoInputStream.read${field.type.io}(wireType)); <#else> ${field.name} = protoInputStream.read${field.type.io}(wireType); </#if>
The solution for the write
method is somewhat simpler. Instead of calling the corresponding ProtoOutputStream
method for each element of the list, there is an overloaded method for lists in the ProtoOutputStream
. Why this is the case is explained by another feature of the Repeated Fields.
For a simple list of scalars, the coding described above is very inefficient. Since each record consists of a field type and the actual value, in the worst case there is an overhead of more than 50% to the user data.
An alternative coding for Repeated Fields uses the wire type LEN
. Fields with the wire type LEN
contain a value consisting of the actual payload and its length as a VarInt inserted before it. In the actual payload, the list values are coded one after the other.
This Packed Repeated Fields are possible for wire types VARINT
, I64
and I32
. A packed version is not necessary for the LEN
wire type. The payload is usually much greater than the overhead.
The Rocinante ProtoOutputStream
provides special methods for writing Packed Repeated Fields. The writeRepeatedVarInt
is shown here as an example.
private <T> void writeRepeatedVarInt(int fieldNumber, List<T> values, Function<T, Integer> convert) throws IOException { ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); ProtoOutputStream protoOutputStream = new ProtoOutputStream(outputStream); for (T value : values) { protoOutputStream.writeVariableByteInt(convert.apply(value)); } byte[] bytes = outputStream.toByteArray(); writeWireTypeAndFieldNumber(WireType.LEN, fieldNumber); writeVariableByteInt(bytes.length); write(bytes); }
First, the values from the list are stored in a byte
array using a local ProtoOutputStream
. Then a record is written with the wire type LEN
by first writing the length of the byte
array as VarInt and then the byte
array itself in the stream. A Function
is passed to the method so that this method can be used not only for int32
(Function.identity()
), but also for sint32
(value -> (value >> 31) ^ (value << 1)
) and bool
(value -> value ? 1 : 0
).
There is a small problem here that will become even more noticeable with sub-messages. The previously supported types with the wire type LEN
have a payload whose length in bytes is easy to determine. With a list of VarInt, the length of the payload is only known after the values have been coded. However, the length must be coded as VarInt before the actual payload. With relatively short lists, extra memory for the additional byte
array only needs to be provided for a short time. It becomes more awkward later if nested sub-messages are to be encoded.
When a Packed Repeated Field is decoded, the wire type must be checked first. For a variety of reasons, a wire type VARINT, I32 or I64 can also be entered here instead of a wire type LEN. In this case, the field was not encoded as a Packed Repeated Field but as a Repeated Field or a normal field.
The following readIntegers
method returns a single-element list for the wire type VARINT
. For the wire type LEN
a list of all found values is returned. For this the length of the payload is read first and then all elements contained in the payload are read.
private <T> List<T> readIntegers(WireType wireType, Function<Integer, T> converter) throws IOException { if (wireType == WireType.VARINT) { return List.of(converter.apply(readVariableByteInt())); } if (wireType == WireType.LEN) { int length = readVariableByteInt(); LimitedProtoInputStream limitedProtoInputStream = new LimitedProtoInputStream(this, length); List<T> result = new ArrayList<>(); do { result.add(converter.apply(limitedProtoInputStream.readVariableByteInt())); } while (limitedProtoInputStream.available() > 0); return result; } throw new IOException("wrong type: " + wireType); }
To prevent the method from being complicated by the length checks, the reading of the values is outsourced to a subclass of the ProtoInputStream
. This LimitedProtoInputStream
can only read the specified number of bytes from the original InputStream
.
The readIntegers
method is used in the Proto classes to insert the INT32
values into the result list. The template becomes somewhat more complex at this point, as the result list must be filled with List#addAll
in the case of a Packed Repeated Field.
<#if (field.repeated)> <#if (field.type.packable)> ${field.name}.addAll(protoInputStream.read${field.type.io}s(wireType)); <#else> ${field.name}.add(protoInputStream.read${field.type.io}(wireType)); </#if> <#else> ${field.name} = protoInputStream.read${field.type.io}(wireType); </#if>
This concludes the implementation of Packed Repeated Fields in Rocinante for the time being. However, this chapter must be reopened at the latest when Sub-Messages are implemented.