Red Hat
Oct 28, 2010
by Galder Zamarreño

Over the past few days I’ve been playing around with Google Protocol Buffers (Protobufs) and Infinispan in order to find out whether Protobufs could act as a generic marshalling layer. By generic, I mean marshalling/unmarshalling logic that would take an object built from a protoc-compiled class in one of the supported languages (Java, C++ or Python) and convert it into a byte[] on the marshalling side and, vice versa, convert a byte[] back into an instance of the original type. If I managed to achieve this, Infinispan would potentially have a layer that could take a byte[] generated by a Python client and convert it into a Java object in a generic way.

Infinispan differs from many applications that use Protobufs because it has no control over what users put into it, so such a generic marshalling layer cannot make any assumptions about the type when converting a byte[] into a Java/Python/C++ object. So, the traditional methods used by Protobufs, for example in Java, don’t really work for Infinispan. In other words, if you have a Person Java class generated with the protoc compiler, you’d transform the byte[] into an instance of Person by doing something along the lines of:

 byte[] buffer = ...;
 Person me = Person.parseFrom(buffer);

As I said earlier, such a generic marshalling layer could not do that because the type is unknown. If you look at Java marshalling strategies such as Java Serialization or JBoss Marshalling, they don’t have this problem. From the payload byte[] alone, they can figure out the type, instantiate it and populate it accordingly. So, I wondered whether something similar could be built on top of Protobufs. Taking Java as the example for the rest of the article, a naive approach would be to simply prepend the payload generated by Protobufs with a UTF-8 String containing the Java class name. The receiving end could read this, locate the class, and then call the static parseFrom method using reflection. However, what sort of Java class name would a Python generic marshaller provide?
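As a rough sketch of that naive approach, the snippet below prepends a Java class name to a payload and uses reflection on the reading side. It uses only the JDK so it can run on its own: the nested Greeting class is a hypothetical stand-in for a protoc-generated class (it merely mimics the static parseFrom(byte[]) and toByteArray() pair), and the class and method names are my own, not Infinispan code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;

final class NaiveClassNameMarshaller {

    // Hypothetical stand-in for a protoc-generated message class.
    public static final class Greeting {
        public final String text;
        Greeting(String text) { this.text = text; }
        public byte[] toByteArray() { return text.getBytes(StandardCharsets.UTF_8); }
        public static Greeting parseFrom(byte[] data) {
            return new Greeting(new String(data, StandardCharsets.UTF_8));
        }
    }

    // Writes the Java class name first, then the raw payload.
    static byte[] marshal(String className, byte[] payload) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeUTF(className); // 2-byte length + (modified) UTF-8 name
            out.write(payload);      // the rest of the buffer is the payload
            out.flush();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the class name, loads the class and calls its static parseFrom.
    static Object unmarshal(byte[] buffer) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer));
            String className = in.readUTF();
            byte[] payload = new byte[in.available()];
            in.readFully(payload);
            Class<?> clazz = Class.forName(className);
            Method parseFrom = clazz.getMethod("parseFrom", byte[].class);
            return parseFrom.invoke(null, (Object) payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot unmarshal payload", e);
        }
    }

    public static void main(String[] args) {
        Greeting original = new Greeting("hello");
        byte[] buffer = marshal(Greeting.class.getName(), original.toByteArray());
        Greeting copy = (Greeting) unmarshal(buffer);
        System.out.println(copy.text); // prints "hello"
    }
}
```

Note that the buffer carries a Java-specific name, which is exactly the piece of information a non-Java writer cannot supply.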

A potential solution for this is hinted at in the Techniques section of the Protobufs documentation. By generating a FileDescriptorSet out of your .proto files, you can build a mapping at runtime between the proto language descriptor names and the corresponding Java class names. The proto language descriptor name is the same irrespective of the target language, so if the marshaller prepended it to the payload, the reading side could read it, figure out the corresponding Java class name and use the method explained earlier. Let’s look at this in greater detail, taking the Protobufs Basic Java Tutorial as a starting point:

  1. First, let’s generate the FileDescriptorSet file:
     ./protoc -I=src/main/proto \
        --descriptor_set_out=src/main/desc/addressbook.desc \
        src/main/proto/addressbook.proto
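  2. Next, on startup, inspect the generated .desc file and map each Protobuf full descriptor name to the corresponding fully qualified Java class name:
     import java.io.FileInputStream;
     import java.util.HashMap;
     import java.util.Map;
     import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
     import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
     import com.google.protobuf.Descriptors.Descriptor;
     import com.google.protobuf.Descriptors.FileDescriptor;
     Map<String, String> mapping = new HashMap<String, String>();
     FileDescriptorSet descriptorSet = FileDescriptorSet.parseFrom(
        new FileInputStream("src/main/desc/addressbook.desc"));
     for (FileDescriptorProto fdp : descriptorSet.getFileList()) {
        // No dependencies are passed in, so the .proto must not import others
        FileDescriptor fd = FileDescriptor.buildFrom(fdp,
           new FileDescriptor[]{});
        for (Descriptor descriptor : fd.getMessageTypes()) {
           // Generated message classes are nested inside the outer class
           String className = fdp.getOptions().getJavaPackage() + "."
              + fdp.getOptions().getJavaOuterClassname() + "$"
              + descriptor.getName();
           mapping.put(descriptor.getFullName(), className);
        }
     }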
  3. Once the mapping is in place, it’s time to take a Protobuf object and generate a byte[] from it. To make sure the reading side can figure out which Java class to use, the writing side needs to prepend the Protobuf-generated byte[] with the Protobuf descriptor name:
     import java.io.ByteArrayOutputStream;
     import com.acme.protobuf.AddressBookProtos;
     import com.google.protobuf.Message;
     AddressBookProtos.Person.Builder personBuilder =
        AddressBookProtos.Person.newBuilder();
     personBuilder.setName("Galder Zamarreño");
     // Any other required fields must be set before calling build()
     AddressBookProtos.Person person = personBuilder.build();
     byte[] buffer = objectToByteBuffer(person);
     public byte[] objectToByteBuffer(Object o) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Message message = (Message) o;
        byte[] name = message.getDescriptorForType().getFullName()
           .getBytes("UTF-8");
        baos.write(name.length); // TODO: Length as int and not byte
        // Write the full descriptor name, i.e. protobuf.Person
        baos.write(name);
        byte[] messageBytes = message.toByteArray();
        baos.write(messageBytes.length); // TODO: Length as int and not byte
        // Write the Protobuf-encoded message itself
        baos.write(messageBytes);
        return baos.toByteArray();
     }
  4. With the buffer written, it’s time to read it now:
     import java.io.ByteArrayInputStream;
     import java.lang.reflect.Method;
     import com.acme.protobuf.AddressBookProtos;
     AddressBookProtos.Person person2 = (AddressBookProtos.Person)
        objectFromByteBuffer(buffer);
     public Object objectFromByteBuffer(byte[] buffer) throws Exception {
        ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
        // The first byte holds the length of the descriptor name
        byte[] name = new byte[bais.read()];
        bais.read(name); // TODO: Read fully??
        // Get the class name associated with the descriptor name
        String className = mapping.get(new String(name, "UTF-8"));
        Class<?> clazz = Thread.currentThread().getContextClassLoader()
           .loadClass(className);
        Method parseFromMethod = clazz.getMethod("parseFrom", byte[].class);
        // The next byte holds the length of the Protobuf message
        byte[] message = new byte[bais.read()];
        bais.read(message); // TODO: Read fully??
        return parseFromMethod.invoke(null, message);
     }
  5. Finally, you can assert that the object written and the object read are equal:
     assert person.equals(person2);
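The TODOs in steps 3 and 4 matter in practice: write(int) on a ByteArrayOutputStream stores only the low eight bits, so any name or message longer than 255 bytes gets a wrong length prefix. One possible way to address them, sketched here with JDK classes only, is to write the lengths as 4-byte ints and read with readFully. The LengthPrefixedFraming and Frame names are made up for this illustration, and the Protobuf parsing step is left out so the snippet stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedFraming {

    // Simple holder for the two parts read back from a framed buffer.
    static final class Frame {
        final String descriptorName;
        final byte[] messageBytes;
        Frame(String descriptorName, byte[] messageBytes) {
            this.descriptorName = descriptorName;
            this.messageBytes = messageBytes;
        }
    }

    // Layout: [int name length][name bytes][int message length][message bytes]
    static byte[] frame(String descriptorName, byte[] messageBytes) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            byte[] name = descriptorName.getBytes(StandardCharsets.UTF_8);
            out.writeInt(name.length);         // 4-byte big-endian length
            out.write(name);
            out.writeInt(messageBytes.length);
            out.write(messageBytes);
            out.flush();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Frame unframe(byte[] buffer) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer));
            byte[] name = new byte[in.readInt()];
            in.readFully(name);                // fails with EOFException if truncated
            byte[] messageBytes = new byte[in.readInt()];
            in.readFully(messageBytes);
            return new Frame(new String(name, StandardCharsets.UTF_8), messageBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[300]; // too long for a single length byte
        Frame result = unframe(frame("protobuf.Person", payload));
        // prints "protobuf.Person 300"
        System.out.println(result.descriptorName + " " + result.messageBytes.length);
    }
}
```

The price is six extra bytes per payload compared with the single-byte prefixes; a variable-length integer encoding would be another option if size matters more.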

That’s it! It’s worth noting that this approach has some disadvantages. It uses reflection to call the parsing method, which is slower than calling the static parseFrom method directly on the Person class. On top of that, users need to take the extra step of generating the FileDescriptorSet, which then has to be passed to the generic marshalling layer. Finally, adding the descriptor name to the payload increases its size.
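Part of the reflection cost is the repeated class and method lookup. One possible mitigation, which is an assumption on my side and not something I have measured, is to resolve each parseFrom Method once and cache it by descriptor name. The sketch below uses only the JDK; Note is a hypothetical stand-in for a generated class, and the lookups counter exists purely to show that the resolution runs once. The call itself still goes through Method.invoke, so this removes the lookup cost only.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class CachedParserLookup {

    // Hypothetical stand-in for a protoc-generated message class.
    public static final class Note {
        public final String text;
        Note(String text) { this.text = text; }
        public static Note parseFrom(byte[] data) {
            return new Note(new String(data, StandardCharsets.UTF_8));
        }
    }

    // Descriptor name -> Java class name, as built from the FileDescriptorSet.
    private final Map<String, String> mapping;
    // Descriptor name -> resolved parseFrom method, filled lazily.
    private final Map<String, Method> parsers = new ConcurrentHashMap<>();
    // Counts how often the reflective lookup actually runs (illustration only).
    final AtomicInteger lookups = new AtomicInteger();

    CachedParserLookup(Map<String, String> mapping) {
        this.mapping = mapping;
    }

    Object parse(String descriptorName, byte[] payload) {
        Method parseFrom = parsers.computeIfAbsent(descriptorName, this::resolve);
        try {
            return parseFrom.invoke(null, (Object) payload);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot parse " + descriptorName, e);
        }
    }

    private Method resolve(String descriptorName) {
        lookups.incrementAndGet();
        String className = mapping.get(descriptorName);
        if (className == null) {
            throw new IllegalArgumentException("Unknown descriptor: " + descriptorName);
        }
        try {
            return Class.forName(className).getMethod("parseFrom", byte[].class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No parseFrom on " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("demo.Note", Note.class.getName());
        CachedParserLookup lookup = new CachedParserLookup(mapping);
        byte[] payload = "hi".getBytes(StandardCharsets.UTF_8);
        lookup.parse("demo.Note", payload);
        Note second = (Note) lookup.parse("demo.Note", payload);
        // prints "hi after 1 lookup(s)"
        System.out.println(second.text + " after " + lookup.lookups.get() + " lookup(s)");
    }
}
```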

So, this approach is not flawless by any means. In fact, we will instead use a marshaller based on Apache Avro, which suits our use case better (see the “Portable Serialization For Hot Rod With Apache Avro” wiki for more info on its use; the actual Apache Avro marshaller code is in our SVN repository). However, I thought this would be of interest to anyone out there who might be trying to do something similar, or who has done it in a different way.

Feedback appreciated :)
