Notes on Google Protocol Buffers

Created 03/04/15
Updated 04/10/15, Updated 05/12/15, Updated 01/05/16, Updated 01/08/16

Introduction

Google Protocol Buffers is a method of serializing structured data. It is useful for developing programs that communicate with each other over a wire, or for storing data. The method involves an interface description language that describes the structure of some data, and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

The design goals for Protocol Buffers emphasized simplicity and performance. In particular, it was designed to be smaller and faster than XML. Protocol Buffers supports a wide range of language bindings, but in this document we focus on Java.

Protocol Buffers is widely used at Google for storing and interchanging all kinds of structured information. The method serves as a basis for a custom remote procedure call (RPC) system that is used for nearly all intermachine communication at Google.[3] Protocol Buffers is very similar to the Apache Thrift protocol (used by Facebook, for example), except that the public Protocol Buffers implementation does not include a concrete RPC protocol stack to use for defined services.

Though the primary purpose of Protocol Buffers is to facilitate network communication, its simplicity and speed make it an alternative to data-centric C++ classes and structs, especially where interoperability with other languages or systems might be needed in the future.

Purpose of Evaluation

We looked at Protocol Buffers during the early work with Kafka (March-April 2014), in which the choice of serialization format was up to us. Other options considered included Avro and Thrift. See our document “Notes on Apache Kafka”.

Resources

https://developers.google.com/protocol-buffers/

Current version is 2.6.1, released October 2014. Version 3.x is under development.
That will include a new version of the IDL capabilities.

Concepts

This is very similar in concept to the TIBCO Message format.

A software developer defines data structures (called messages) and services in a proto definition file (.proto) and compiles it with protoc. This compilation generates code that can be invoked by a sender or recipient of these data structures. For example, example.proto will produce example.pb.cc and example.pb.h, which will define C++ classes for each message and service that example.proto defines.

Canonically, messages are serialized into a binary wire format which is compact and forward- and backward-compatible, but not self-describing (that is, there is no way to tell the names, meaning, or full datatypes of fields without an external specification). There is no defined way to include or refer to such an external specification (schema) within a Protocol Buffers file. The officially supported implementation includes an ASCII serialization format,[4] but this format, though self-describing, loses the forward- and backward-compatibility behavior, and is thus not a good choice for applications other than debugging.

In our case, the classes that protoc generates for these messages are Java classes. These classes are well-behaved in that they work much like regular JavaBeans POJOs, with getters and list accessors like our standard Java classes. Unlike ordinary beans, the generated message objects are immutable: the setters live on the corresponding Builder. At the programmer level, you work with instances of these classes.

You create new instances of these classes by using their corresponding Builder object, which has setters for the various fields and a build() method that returns the actual object.

Serialize into a byte array using the toByteArray() method.
Deserialize from a byte array using the static parseFrom() method of the generated class.

Differences from TIBCO Message format

TIBCO messages were self-describing.
As such, there was no schema definition; instead, the field values were packed or unpacked in the order applied.

Reviewing the packing (binary)

It is quite efficient. For instance, even an integer which is declared as a 32-bit value will take only 1 byte for the value itself if the value is small enough, because the default integer types use a variable-width (varint) encoding. Fixed-width integer types are also available.

Getting Started

The project on GitHub appeared to contain both the source code of Protocol Buffers and a set of examples. Instead, we created our own project in Java and specified the Protocol Buffers library as a dependency. The project was derived from their examples, and has a method to add a person to a byte-array representation of an address book, and a method to read from the address book.

Creating Instances

Instances are created with the Builder pattern.

Packing and Unpacking

    byte[] array = object.toByteArray();   // serialize the message into a new byte array
    object.writeTo(output);                // or write the serialized bytes to an OutputStream
    <className>.parseFrom(byteArray);      // deserialize; parseFrom is static on the generated class

Related Tools

To examine the generated file as hexadecimal bytes, we installed the HexView plugin for IntelliJ.

Additional Capabilities in the IDL

Oneof

If you have a message with many optional fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.

Oneof fields are like optional fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members. You can check which value in a oneof is set (if any) using a special case() or WhichOneof() method, depending on your chosen language.

To define a oneof in your .proto you use the oneof keyword followed by your oneof name, in this case test_oneof:

    message SampleMessage {
      oneof test_oneof {
        string name = 4;
        SubMessage sub_message = 9;
      }
    }

You then add your oneof fields to the oneof definition. You can add fields of any type, but cannot use the required, optional, or repeated keywords.
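The packing observation earlier in these notes (a 32-bit integer can occupy a single byte) follows from the varint encoding, which stores 7 bits of the value per byte and uses the high bit of each byte as a "more bytes follow" flag. The following is a minimal sketch of that idea, not the library's own implementation: the class and method names (VarintSketch, encodeVarint, decodeVarint) are hypothetical, and only non-negative values are handled.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of the Protocol Buffers library.
final class VarintSketch {

    // Encode a non-negative int as a base-128 varint:
    // 7 payload bits per byte, high bit set on every byte except the last.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    // Decode a varint produced by encodeVarint back into an int.
    static int decodeVarint(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this varint
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] samples = {1, 127, 128, 300, 1_000_000};
        for (int v : samples) {
            byte[] encoded = encodeVarint(v);
            System.out.println(v + " -> " + encoded.length
                    + " byte(s), decoded " + decodeVarint(encoded));
        }
    }
}
```

Values up to 127 fit in one byte and 128 is the first value to need two. This byte count covers only the value; on the wire each field is also preceded by a tag that identifies the field number and wire type.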
Using Maven to Build

There is a plugin, release 0.4.4, available from sergei-ivanov. To use it, write a Maven build section that looks like:

    <build>
      <plugins>
        <plugin>
          <groupId>com.google.protobuf.tools</groupId>
          <artifactId>maven-protoc-plugin</artifactId>
          <version>0.4.4</version>
          <configuration>
            <protocExecutable>/usr/local/bin/protoc</protocExecutable>
          </configuration>
          <executions>
            <execution>
              <goals>
                <goal>compile</goal>
                <goal>test-compile</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>

The plugin is located in this repo:

    <pluginRepositories>
      <pluginRepository>
        <id>protoc-plugin</id>
        <url>https://dl.bintray.com/sergei-ivanov/maven/</url>
      </pluginRepository>
    </pluginRepositories>

Example Projects Created

PB01: Simple addPerson, listPersons methods on AddressBook.

PB02: Handles game state as used in the RallyOn game’s leaderboard fetch/send logic. A Leaderboard consists of a Game plus a list of PlayerScores. A PlayerScore has a player and a score. There are AddPlayerScore and DisplayLeaderboard methods. Data types used include string, int32, and date (stored as int64). There are also messages that describe Tracking and Membership (Join/Leave) events.
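PB02 stores dates as int64, but these notes do not record which unit was used. As a sketch only, assuming the field carries milliseconds since the Unix epoch (UTC), the conversion on either side of the message could look like the following. The names TimestampSketch, toWireValue and fromWireValue are hypothetical and do not come from the PB02 project.

```java
import java.time.Instant;

// Hypothetical helper; assumes the int64 field holds epoch milliseconds (UTC).
final class TimestampSketch {

    // Convert a point in time to the long that would be placed in the int64 field.
    static long toWireValue(Instant when) {
        return when.toEpochMilli();
    }

    // Convert the long read from the int64 field back to a point in time.
    static Instant fromWireValue(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        Instant recorded = Instant.parse("2016-01-08T12:00:00Z");
        long wire = toWireValue(recorded);
        System.out.println(wire + " -> " + fromWireValue(wire));
    }
}
```

Whatever unit is chosen, writer and reader have to agree on it, because the int64 field itself carries no unit or time-zone information.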