THE CURIOUS CASE OF PROTOBUFS…
De-mystifying Google’s hottest binary protocol

Prasanna Kanagasabai
Jovin Lobo

About us

Prasanna Kanagasabai:
  Security Engineer @ ThoughtWorks
  Member of null - The Open Security Community
  Author of IronSAP, a module over IronWASP
  Speaker @ nullcon-Delhi, Clubhack, IIT Guwahati and various null meetups

Jovin Lobo:
  Associate Consultant @ Aujas Networks
  Member of null - The Open Security Community
  Author of GameOver, a Linux distro for learning web security
  Has spoken at nullCon and GNUnify

Agenda
  Introduction
  Anatomy of Protobufs
  Defining message formats in .proto files
  Protobuf compiler
  Python API to read and write messages
  Encoding scheme
  Problem statement
  Decoding like-a-pro with IronWasp ‘Protobuf Decoder’

Introduction

Protocol Buffers, a.k.a. Protobufs:
  Protobufs are Google's own way of serializing structured data.
  Extensible, language-neutral and platform-neutral.
  Smaller, faster and simpler to implement.
  APIs for Java, C++ and Python.

Anatomy

Overview: defining a .proto file.

    #> less Example.proto
    message Conference {
      required string conf_name = 1;
      required int32 no_of_days = 2;
      optional string email = 3;
    }

  1, 2 and 3 are unique tags. They identify the fields in the binary encoding.
  For compactness, prefer tags 1 to 15: their key fits in a single byte, while tags 16 to 2047 need two bytes.

Compiling

Syntax:
    protoc -I=$_input_Dir --python_out=$_out_Dir $_Path_ProtoFile
Eg:
    protoc -I=. --python_out=. Example.proto

This generates an Example_pb2.py file in the specified destination directory.

$ProtoFile_pb2.py

The Protobuf compiler generates special descriptors for all your messages, enums, and fields. It also generates empty classes, one for each message type.

Reading and writing messages using the Protobuf binary format:
  SerializeToString() serializes the message and returns it as a string.
  ParseFromString(data) parses a message from the given string.

Demo: Protobuf… how it works

Encoding
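The wire format for a single varint field, which this section encodes and the later slides decode by hand, can be sketched in plain Python without the protobuf library. This is a minimal illustration, not the library's implementation: the function names are hypothetical, and it assumes non-negative integers and wire type 0 (Varint) only.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (hypothetical helper)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        group = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow: set the MSB
        else:
            out.append(group)         # last group: MSB stays 0
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at data[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # groups arrive least significant first
        shift += 7
        if not byte & 0x80:               # MSB clear: this was the last byte
            return result, pos


def encode_varint_field(field_tag, value):
    """Encode key + value for a wire type 0 field (hypothetical helper)."""
    key = (field_tag << 3) | 0  # low 3 bits hold the wire type (0 = Varint)
    return encode_varint(key) + encode_varint(value)


def decode_varint_field(data):
    """Return (field_tag, wire_type, value) for a single wire type 0 field."""
    key, pos = decode_varint(data)
    wire_type = key & 0x07      # last 3 bits of the key
    field_tag = key >> 3        # remaining bits of the key
    if wire_type != 0:
        raise ValueError("this sketch only decodes wire type 0 (Varint)")
    value, _ = decode_varint(data, pos)
    return field_tag, wire_type, value


encoded = encode_varint_field(1, 290)
print(encoded.hex())                 # 08a202
print(decode_varint_field(encoded))  # (1, 0, 290)
```

Running it on field tag 1 with value 290 reproduces the bytes 08 A2 02 used in the slides, and decoding those bytes gives back the tag, wire type and value.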
example2.proto

    message Ex1 {
      required int32 num = 1;  // field tag
    }

Code snippet:

    import example2_pb2  # module generated by protoc from example2.proto

    obj = example2_pb2.Ex1()
    obj.num = 290  # field value
    obj.SerializeToString()

Output:
    08 A2 02                    # hex
    000010001010001000000010    # binary

Problem statement

This is what freaked him out:
    08 A2 02
    000010001010001000000010

Let's decode it.
  Step 1: Find the wire type.
  Step 2: Find the field tag.
  Step 3: Find the field value.

Step 1: Finding the wire type

    0000 1000 1010 0010 0000 0010

To find the wire type, take the first byte:
    0000 1000
Drop the MSB from the first byte:
    000 1000
The last 3 bits give the wire type:
    0001 000
The wire type is 000; type 0 is Varint.

Wire types
    0  Varint
    1  64-bit
    2  Length-delimited
    3  Start group
    4  End group
    5  32-bit

Step 2: Field tag

What we already have is 0001000. Right-shift this value by 3 bits; the remaining bits give the field tag.
    0001000  ->  0001
'0001' is 1, so the field tag = 1.

Step 3: Find the field value

    0000 1000 1010 0010 0000 0010
Drop the first byte (the key we have already decoded):
    1010 0010 0000 0010
Drop the MSB from each of these bytes; it only signals whether another byte follows:
    010 0010   000 0010
Reverse the order of these 7-bit groups, since a varint stores the least significant group first:
    000 0010   010 0010
Read together, 100100010 is 256 + 32 + 2 = 290.
So the value of the field = 290.

So we successfully decoded the Ex1 message from example2.proto:
    08 A2 02  ->  field tag = 1, wire type = 0 (Varint), value = 290

Demo: Let's do this live

Automating all this with IronWasp Protobuf Decoder

About IronWasp:
IronWasp is an open-source web security scanner. It is designed to be customizable to the extent that users can create their own custom security scanners with it.
Author: Lavakumar Kuppan (@lavakumark)
Website: www.ironwasp.org

ProtoBuf Decoder

Road Map for Protobuf Decoder

    0110100000111101000001011011
    1001111001001000000101000101
    1101010110010101110011011101
    0001101001011011110110111001
    1100110010000000111111

Hmmm … Decoding …… Done … It says ……

Any Questions?

Thank You