Notes on Google Protocol Buffers

Created 03/04/15
Updated 04/10/15, Updated 05/12/15, Updated 01/05/16, Updated 01/08/16
Introduction
Google Protocol Buffers is a method of serializing structured data. It is useful in developing programs to
communicate with each other over a wire or for storing data. The method involves an interface description
language that describes the structure of some data and a program that generates source code from that description
for generating or parsing a stream of bytes that represents the structured data. The design goals for Protocol Buffers
emphasized simplicity and performance. In particular, it was designed to be smaller and faster than XML.
Protocol Buffers supports a wide range of language bindings, but this document focuses on Java.
Protocol Buffers is widely used at Google for storing and interchanging all kinds of structured information. The
method serves as a basis for a custom remote procedure call (RPC) system that is used for nearly all intermachine communication at Google.[3]
Protocol Buffers is very similar to the Apache Thrift protocol (used by Facebook, for example), except that the
public Protocol Buffers implementation does not include a concrete RPC protocol stack to use for
defined services.
Though the primary purpose of Protocol Buffers is to facilitate network communication, its simplicity and speed
make Protocol Buffers an alternative to data-centric C++ classes and structs, especially where interoperability with
other languages or systems might be needed in the future.
Purpose of Evaluation
We looked at Protocol Buffers during the early work with Kafka (March-April 2014), in which the serialization
format was up to us. Other options considered included Avro and Thrift. See our document “Notes on Apache
Kafka”.
Resources
https://developers.google.com/protocol-buffers/
Current version is 2.6.1, released October 2014.
Version 3.x is under development; it will include a new version of the IDL (the proto3 syntax).
Concepts
This is very similar in concept to the TIBCO Message format.
A software developer defines data structures (called messages) and services in a proto definition file (.proto) and
compiles it with protoc. This compilation generates code that can be invoked by a sender or recipient of these data
structures. For example, when generating C++, example.proto will produce example.pb.cc and example.pb.h, which
define C++ classes for each message and service that example.proto defines.
Canonically, messages are serialized into a binary wire format which is compact and forward- and backward-compatible,
but not self-describing (that is, there is no way to tell the names, meaning, or full datatypes of fields
without an external specification). There is no defined way to include or refer to such an external specification
(schema) within a Protocol Buffers file. The officially supported implementation includes an ASCII serialization
format,[4] but this format, though self-describing, loses the forward- and backward-compatibility behavior, and is
thus not a good choice for applications other than debugging.
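To make "not self-describing" concrete, the sketch below hand-encodes a hypothetical message with two fields, int32 id = 1 and string name = 2, following the layout described in Google's encoding documentation: each field is prefixed by a varint key equal to (field_number << 3) | wire_type, where wire type 0 is a varint and wire type 2 is length-delimited. The class and method names are made up for illustration; in practice the protoc-generated classes do this work. The thing to notice is that the output carries only field numbers and wire types, never the field names, which is why a reader needs the .proto schema.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical hand encoder for: message Sample { int32 id = 1; string name = 2; }
final class WireSketch {
    // Wire types from the protobuf encoding documentation (only the two used here).
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Unsigned varint: 7 payload bits per byte, least significant group first,
    // with the high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Every field starts with a key: (field_number << 3) | wire_type.
    static void writeKey(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    static byte[] encode(int id, String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeKey(out, 1, WIRETYPE_VARINT);
        // The int is widened with sign extension, so a negative id takes 10 bytes, as int32 does.
        writeVarint(out, id);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        writeKey(out, 2, WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length); // length prefix, then the raw UTF-8 bytes
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Formats bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only field numbers and wire types appear; the names "id" and "name" do not.
        System.out.println(hex(encode(150, "testing")));
    }
}
```

Running main prints 08 96 01 12 07 74 65 73 74 69 6e 67: key 08 (field 1, varint), 96 01 (150), key 12 (field 2, length-delimited), length 07, then the seven ASCII bytes of "testing".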
For Java, protoc generates developer-oriented classes for these messages (using the --java_out option rather than
--cpp_out). These classes are well-behaved and work much like regular JavaBeans-style POJOs: the message classes
have getters and list accessors like our standard Java classes, while the setters live on a companion Builder,
because the messages themselves are immutable.
At the programmer level, you work with instances of these classes.
You create new instances of these classes through their corresponding Builder object, which has setters for the
various fields and a build() method that returns the finished message.
Serialize into a byte array using the toByteArray() method.
Deserialize from a byte array using the static parseFrom() method on the message class.
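For readers new to the pattern, the sketch below is a hand-written, much smaller stand-in for what protoc generates: an immutable message class with getters, a nested Builder that carries the setters, build(), and toBuilder() for making a modified copy. The names Person, name, id, and emails are hypothetical, chosen to echo the address-book example; the accessor names getEmailsList() and addEmails() imitate the convention generated code uses for a repeated field. This is not protoc output and it leaves out serialization entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hand-written stand-in for a protoc-generated message; NOT generated code, and it omits serialization.
final class Person {
    private final String name;
    private final int id;
    private final List<String> emails;

    private Person(Builder builder) {
        this.name = builder.name;
        this.id = builder.id;
        // Defensive copy so later builder changes cannot leak into the built message.
        this.emails = Collections.unmodifiableList(new ArrayList<>(builder.emails));
    }

    // The message itself only has read accessors.
    String getName() { return name; }
    int getId() { return id; }
    List<String> getEmailsList() { return emails; }

    static Builder newBuilder() {
        return new Builder();
    }

    // "Modifying" an immutable message means copying it into a new builder.
    Builder toBuilder() {
        Builder builder = new Builder().setName(name).setId(id);
        builder.emails.addAll(emails);
        return builder;
    }

    // Setters live on the builder and return it, so calls can be chained.
    static final class Builder {
        private String name = "";
        private int id;
        private final List<String> emails = new ArrayList<>();

        private Builder() {}

        Builder setName(String value) { this.name = value; return this; }
        Builder setId(int value) { this.id = value; return this; }
        Builder addEmails(String value) { this.emails.add(value); return this; }

        Person build() { return new Person(this); }
    }

    public static void main(String[] args) {
        Person ann = Person.newBuilder().setName("Ann").setId(7).addEmails("ann@example.com").build();
        Person renamed = ann.toBuilder().setName("Anne").build();
        System.out.println(ann.getName() + " / " + renamed.getName());
    }
}
```

The original message is left untouched by toBuilder(); only the newly built copy carries the changed name.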
Differences from TIBCO Message format
TIBCO messages were self-describing.
As such, there was no schema definition; instead, the field values were packed or unpacked in the order in which they were applied.
Reviewing the packing (binary)
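It is quite efficient. For instance, even a field declared as a 32-bit integer (int32) needs only 1 byte for its value
when the value is small enough (0 to 127), because int32 uses a variable-width encoding. There are both fixed-width
(fixed32, fixed64) and variable-width (int32, int64, sint32, sint64) integer types.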
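The sketch below compares how many bytes a single value occupies under three integer field types. It follows the rules in Google's encoding documentation: varints carry 7 payload bits per byte; int32 sign-extends negative values to 64 bits, so they always take 10 bytes; sint32 applies ZigZag encoding first so that small negative numbers stay small; fixed32 always takes 4 bytes. The sizes exclude the 1-byte field key, and the class name is hypothetical.

```java
// Hypothetical helper comparing encoded sizes of one value under three integer field types.
final class VarintSizes {
    // fixed32 always occupies exactly 4 bytes, whatever the value.
    static final int FIXED32_SIZE = 4;

    // Bytes needed for an unsigned varint: one byte per 7 bits of payload.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // int32 widens negatives with sign extension to 64 bits, so they always take 10 bytes.
    static int int32Size(int value) {
        return varintSize(value);
    }

    // sint32 ZigZag-encodes first (0, -1, 1, -2 ... map to 0, 1, 2, 3 ...), keeping small negatives small.
    static int sint32Size(int value) {
        int zigZag = (value << 1) ^ (value >> 31);
        return varintSize(Integer.toUnsignedLong(zigZag));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 127, 128, 300, 16_384, -1, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.printf("%11d  int32=%2d  sint32=%d  fixed32=%d%n",
                    v, int32Size(v), sint32Size(v), FIXED32_SIZE);
        }
    }
}
```

In the printed table, 128 is the first value that needs a second byte as int32, and -1 costs 10 bytes as int32 but only 1 byte as sint32, which is why sint32 is the usual choice for fields that are often negative.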
Getting Started
The example project on GitHub appeared to contain both the Protocol Buffers source code and a set of examples.
Rather than building from it, we created a Java project and declared the Protocol Buffers library as a dependency.
The project was derived from their examples; it has a method that adds a person to a byte-array representation of an
address book, and a method that reads from the address book.
Creating Instances
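Instances are created with the Builder pattern: obtain a builder from the static newBuilder() method on the message
class, call its setters, then call build().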
Packing and Unpacking
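byte[] array = message.toByteArray();              // serialize to a byte array
message.writeTo(outputStream);                     // serialize to an OutputStream (throws IOException)
<className> copy = <className>.parseFrom(array);   // deserialize; throws InvalidProtocolBufferException on bad input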
Related Tools
To examine the generated file as hexadecimal bytes, we installed the HexView plugin for IntelliJ.
Additional Capabilities in the IDL
Oneof
If you have a message with many optional fields, at most one of which will be set at any time, you can
enforce this behavior and save memory by using the oneof feature.
Oneof fields are like optional fields except all the fields in a oneof share memory, and at most one field can be set at
the same time. Setting any member of the oneof automatically clears all the other members. You can check which
value in a oneof is set (if any) using a special case() or WhichOneof() method, depending on your chosen
language.
To define a oneof in your .proto, you use the oneof keyword followed by your oneof name, in this
case test_oneof:
message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}
You then add your oneof fields to the oneof definition. You can add fields of any type, but cannot use
the required, optional, or repeated keywords.
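The sketch below imitates the oneof behavior described above in plain Java for the SampleMessage example: both members share a single slot, and setting one replaces the other. It is not generated code. Generated Java puts the setters on the message's Builder and exposes the active member through a getTestOneofCase()-style method; the enum constants, the SubMessage stand-in, and the class name used here are hypothetical.

```java
import java.util.Objects;

// Plain-Java imitation of oneof semantics for SampleMessage; NOT protoc output.
final class SampleOneof {
    // Hypothetical case enum: one constant per member plus "nothing set".
    enum TestOneofCase { NAME, SUB_MESSAGE, NOT_SET }

    // Stand-in for the SubMessage type referenced in the .proto example.
    static final class SubMessage {
        final int value;
        SubMessage(int value) { this.value = value; }
    }

    // All members share one slot; this sharing is where the memory saving comes from.
    private Object testOneof;
    private TestOneofCase testOneofCase = TestOneofCase.NOT_SET;

    void setName(String name) {
        testOneof = Objects.requireNonNull(name); // replaces any sub_message
        testOneofCase = TestOneofCase.NAME;
    }

    void setSubMessage(SubMessage subMessage) {
        testOneof = Objects.requireNonNull(subMessage); // replaces any name
        testOneofCase = TestOneofCase.SUB_MESSAGE;
    }

    void clearTestOneof() {
        testOneof = null;
        testOneofCase = TestOneofCase.NOT_SET;
    }

    TestOneofCase getTestOneofCase() { return testOneofCase; }

    boolean hasName() { return testOneofCase == TestOneofCase.NAME; }

    // Reading a member that is not currently set yields its default value.
    String getName() { return hasName() ? (String) testOneof : ""; }

    public static void main(String[] args) {
        SampleOneof msg = new SampleOneof();
        msg.setName("alice");
        msg.setSubMessage(new SubMessage(42));
        System.out.println(msg.getTestOneofCase() + ", name=\"" + msg.getName() + "\"");
    }
}
```

After setSubMessage(), the earlier name is gone: the case reports SUB_MESSAGE and getName() falls back to the empty default.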
Using Maven to Build
There is a Maven plugin (maven-protoc-plugin, release 0.4.4) available from sergei-ivanov.
To use it, add a build section to the Maven pom.xml that looks like:
<build>
  <plugins>
    <plugin>
      <groupId>com.google.protobuf.tools</groupId>
      <artifactId>maven-protoc-plugin</artifactId>
      <version>0.4.4</version>
      <configuration>
        <protocExecutable>/usr/local/bin/protoc</protocExecutable>
      </configuration>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>test-compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
The plugin is hosted in the following plugin repository, which also needs to be declared in the pom.xml:
<pluginRepositories>
  <pluginRepository>
    <id>protoc-plugin</id>
    <url>https://dl.bintray.com/sergei-ivanov/maven/</url>
  </pluginRepository>
</pluginRepositories>
Example Projects Created
PB01
Simple addPerson, listPersons methods on AddressBook
PB02
Handles game state as used in the RallyOn game’s leaderboard fetch/send logic.
A Leaderboard consists of a Game plus a list of PlayerScores. A PlayerScore has a player and a score.
There are AddPlayerScore and DisplayLeaderboard methods.
Data types used include string, int32, and date (stored as int64).
There are also messages that describe Tracking and Membership (Join/Leave) events.
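Since PB02 stores dates as int64, some convention is needed for what the number means. The sketch below assumes milliseconds since the Unix epoch (UTC); the notes do not state the unit, so this is an assumption, and the class and method names are hypothetical. It uses only java.time.Instant.

```java
import java.time.Instant;

// Hypothetical helpers; ASSUMES dates are stored as milliseconds since 1970-01-01T00:00:00Z.
final class DateAsInt64 {
    // Converts an Instant to the int64 value written into the message.
    static long toInt64(Instant when) {
        return when.toEpochMilli(); // sub-millisecond precision is dropped
    }

    // Rebuilds the Instant after parsing the message.
    static Instant fromInt64(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        long stored = toInt64(now);
        System.out.println(stored + " -> " + fromInt64(stored));
    }
}
```

Whichever unit is chosen, sender and receiver must agree on it, because the int64 field itself carries no unit information.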