What is Protobuf?

Protobuf, which is short for “Protocol Buffers,” is an efficient, language-agnostic data serialization mechanism. It enables developers to define structured data in a .proto file, which is then used to generate source code that can write and read data from different data streams.

Here, we will review the history of Protobuf, how it works, and what makes it different from other data formats. We’ll then go over some of Protobuf’s use cases, advantages, and best practices.

History of Protobuf

Protobuf was originally developed by engineers at Google who needed an efficient way to serialize structured data across various internal services. This version of Protobuf—known as Proto1—was used internally at Google before being released as an open source project in 2008. The initial public release—known as Proto2—included a basic serialization framework and support for a select number of programming languages, such as Python, C++, and Java.

After the public release, Protobuf quickly gained traction for its efficiency and speed, and its status as an open source project gave more developers the opportunity to contribute. In 2015, Google released gRPC, which is a schema-driven framework that facilitates service-to-service communication in distributed environments. Protobuf’s portability and efficiency make it the preferred data format for working with gRPC APIs, and the widespread adoption of gRPC has greatly contributed to the growth and popularity of Protobuf.

Google released Proto3 in 2016, shortly after the release of gRPC. This new version of Protobuf includes several changes that emphasize simplicity and uniformity across different languages. It removed features such as required fields and custom default values, it initially dropped explicit field presence for scalar fields, and it packs repeated scalar fields by default for a more compact encoding.

How does Protobuf work?

Protobuf uses a binary data format, which is more compact and faster to read and write than text-based formats. It also provides an interface definition language (IDL) that makes it easy to define the structure of the data to be serialized.

A Protobuf file is saved using the .proto file extension. The .proto file is written in Protobuf’s IDL format, and it contains all of the information about the structure of the data. The data is modeled as “messages,” which are groups of name-value pairs. Here’s an example of a simple Protobuf message in a .proto file:

syntax = "proto3";

message Customer {
    int32 id = 1;
    string name = 2;
    string email = 3;
    optional string address = 4;
}

In this example, the Customer message contains four fields: id, name, email, and address. Each field declares a type, a name, and a unique field number that identifies it in the binary encoding. A field can also carry a label such as optional or repeated. Proto3 does not accept the required label that Proto2 supported, so the fields here are declared without it.

The .proto file can be compiled into several programming languages using Protoc, which is the Protobuf compiler. This compiler generates source code in the programming language that the developer specifies. This source code includes classes and methods for writing, reading, and manipulating the message type defined in the .proto file.

When you have data to store or transmit, you create instances of the generated classes and populate them with your data. These instances are then serialized to a binary format. When reading the data, the binary format is deserialized back into instances of the classes generated from your .proto file. This allows you to easily access the structured data.

The binary data format produced by Protobuf is platform-independent and can be used to exchange data between different systems, applications, or services, even if they are implemented in different programming languages or run on different platforms.
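To make the binary format less abstract, here is a minimal sketch that encodes two fields of the Customer message by hand. It is for illustration only and assumes non-negative integers and a Node.js runtime with TextEncoder. The helper names (encodeVarint, encodeInt32Field, encodeStringField) are hypothetical and are not part of any Protobuf library. In a real project you would rely on the code that protoc generates.

```javascript
// Illustrative only: real projects should use the code generated by protoc.

// Encode a non-negative integer as a base-128 varint (7 payload bits per byte).
function encodeVarint(value) {
  const bytes = [];
  while (value > 0x7f) {
    bytes.push((value % 128) | 0x80); // low 7 bits with the continuation bit set
    value = Math.floor(value / 128);
  }
  bytes.push(value); // last byte has no continuation bit
  return bytes;
}

// Every field starts with a tag: (field number << 3) | wire type.
const WIRE_VARINT = 0; // int32, int64, bool, enum
const WIRE_LEN = 2; // string, bytes, nested messages

// Assumes value >= 0; negative int32 values need 64-bit sign extension.
function encodeInt32Field(fieldNumber, value) {
  return [...encodeVarint((fieldNumber << 3) | WIRE_VARINT), ...encodeVarint(value)];
}

function encodeStringField(fieldNumber, text) {
  const utf8 = new TextEncoder().encode(text);
  return [
    ...encodeVarint((fieldNumber << 3) | WIRE_LEN),
    ...encodeVarint(utf8.length), // length prefix
    ...utf8,
  ];
}

// Customer with id = 1001 (field 1) and name = "John Doe" (field 2).
const encoded = Uint8Array.from([
  ...encodeInt32Field(1, 1001),
  ...encodeStringField(2, "John Doe"),
]);

const hex = Array.from(encoded, (b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex); // 08 e9 07 12 08 4a 6f 68 6e 20 44 6f 65
```

Notice that the field names never appear in the output. Only the field numbers and values are written, which is why both sides need the .proto file to interpret the bytes.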

How can I generate code with Protoc?

As we discussed above, a .proto file can be compiled into different languages using the Protobuf compiler (Protoc). Follow the installation instructions in the official Protobuf documentation to install the compiler on your computer.

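The command below compiles a .proto file into JavaScript. Note that recent protoc releases no longer bundle the JavaScript generator, so you may also need to install the separate protoc-gen-js plugin:

protoc --js_out=import_style=commonjs,binary:. customers.proto

This generates JavaScript code in a file named customers_pb.js in the same directory using CommonJS syntax. The generated file depends on the google-protobuf runtime package, so install it from npm first. Then, in a new JavaScript file, copy and paste the code below: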

const Schema = require("./customers_pb");
const john = new Schema.Customer();
john.setId(1001);
john.setName("John Doe");
john.setEmail("John.doe@example.com");
john.setAddress("123 Main Street, Anytown, USA 12345");

This code uses the generated class methods to set values for the schema we modeled. We created a new instance of the Customer class and populated its fields. We can read the customer’s data back with the matching getter methods, as shown below:

john.getId();
john.getName();
john.getEmail();
john.getAddress();

This data can be serialized to binary format using the serializeBinary() method:

const bytes = john.serializeBinary();
console.log("binary " + bytes);

The serialized binary version is lightweight and can be stored or transported over a network.

How is Protobuf different from other data formats?

Protobuf is distinct from other data serialization formats in several ways. First, it is schema-based and models data as messages, which are name-value pairs. This data is strongly typed, and each data structure is defined along with its type. It also has a compilation step that generates serialization code from a .proto file. This generated code can be used to create data from already-modeled schemas in the .proto file, as well as to serialize and deserialize this data. Protobuf’s use of a binary serialization format makes it compact and efficient, and it provides direct support for defining RPCs in its .proto file.

Other data formats like XML, JSON, and YAML are not schema-based by default. JSON uses key-value pairs and XML uses tags to structure data. Additionally, JSON, XML, and YAML do not require a compilation step: they are self-describing text formats that carry their field names alongside the values, so a generic parser can read them at runtime without generated code.
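The practical effect of leaving field names out of the payload can be estimated with a rough size comparison. The sketch below computes how many bytes the example customer would occupy in Protobuf’s wire format and compares that with the same data as JSON. The helpers varintSize and stringFieldSize are hypothetical names written for this illustration, and the calculation assumes the four-field Customer message shown earlier with non-negative values.

```javascript
// Rough size estimate; helper names are illustrative, not a library API.

// Bytes needed to store a non-negative integer as a varint.
function varintSize(n) {
  let size = 1;
  while (n > 0x7f) {
    n = Math.floor(n / 128);
    size++;
  }
  return size;
}

// A string field is: tag + length prefix + UTF-8 bytes.
function stringFieldSize(fieldNumber, text) {
  const len = new TextEncoder().encode(text).length;
  return varintSize((fieldNumber << 3) | 2) + varintSize(len) + len;
}

// An integer field is: tag + varint value.
function intFieldSize(fieldNumber, value) {
  return varintSize(fieldNumber << 3) + varintSize(value);
}

const customer = {
  id: 1001,
  name: "John Doe",
  email: "John.doe@example.com",
  address: "123 Main Street, Anytown, USA 12345",
};

const protoSize =
  intFieldSize(1, customer.id) +
  stringFieldSize(2, customer.name) +
  stringFieldSize(3, customer.email) +
  stringFieldSize(4, customer.address);

const jsonSize = new TextEncoder().encode(JSON.stringify(customer)).length;

console.log({ protoSize, jsonSize });
```

For a message dominated by strings like this one, the saving comes mostly from dropping the field names and punctuation. Messages with many numeric fields tend to shrink further.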

What are the advantages of working with Protobuf?

Protobuf offers a variety of advantages over other data formats, including:

  • Efficiency: Protobuf serializes data into a binary format, which is much more compact than equivalent data in text-based formats like JSON or XML. This compactness translates to reduced storage and bandwidth usage. Additionally, Protobuf’s serialization and deserialization processes are typically faster when compared to JSON.
  • Cross-language support: Protobuf provides support for multiple programming languages, which makes it easy to integrate across polyglot microservice architectures.
  • Strong typing with a clear schema: Protobuf requires developers to define a clear schema for data in .proto files. This schema-based approach ensures that the data structure is explicitly defined, which leads to better consistency, easier maintenance, and early detection of errors.
  • Backward and forward compatibility: Protobuf is designed to handle changes in the schema without breaking compatibility with older versions. This means that you can add new fields to your data structures without affecting existing code, which is crucial for long-term maintenance of a system.
  • Efficient network usage: Protobuf’s compact binary format makes it an excellent choice for network communication, especially in environments where bandwidth is limited, such as mobile networks or IoT devices.
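The compatibility point above follows from how a reader walks the bytes: each field announces its number and wire type, so a reader can skip fields it does not recognize. The sketch below is a simplified, hand-written decoder that knows only fields 1 (id) and 2 (name) and is handed a message that also contains a field 5 written by a hypothetical newer schema. The function names are illustrative, the input is assumed to be well-formed, and unknown fields are simply discarded here, whereas generated code may handle them differently.

```javascript
// Simplified reader for illustration; assumes well-formed input.

// Read a varint starting at pos; returns the value and the next position.
function readVarint(bytes, pos) {
  let result = 0;
  let multiplier = 1;
  while (true) {
    const byte = bytes[pos++];
    result += (byte & 0x7f) * multiplier;
    multiplier *= 128;
    if ((byte & 0x80) === 0) return { value: result, pos };
  }
}

// This reader only knows field 1 (id) and field 2 (name).
function decodeCustomer(bytes) {
  const customer = { id: 0, name: "" };
  let pos = 0;
  while (pos < bytes.length) {
    const tag = readVarint(bytes, pos);
    pos = tag.pos;
    const fieldNumber = tag.value >>> 3;
    const wireType = tag.value & 7;

    if (wireType === 0) {
      // Varint: read it, keep it only if the field is known.
      const v = readVarint(bytes, pos);
      pos = v.pos;
      if (fieldNumber === 1) customer.id = v.value;
    } else if (wireType === 2) {
      // Length-delimited: the length prefix tells us how far to jump.
      const len = readVarint(bytes, pos);
      const payload = bytes.subarray(len.pos, len.pos + len.value);
      pos = len.pos + len.value;
      if (fieldNumber === 2) customer.name = new TextDecoder().decode(payload);
    } else if (wireType === 1) {
      pos += 8; // fixed 64-bit value, skipped
    } else if (wireType === 5) {
      pos += 4; // fixed 32-bit value, skipped
    } else {
      throw new Error("unsupported wire type " + wireType);
    }
  }
  return customer;
}

const newerMessage = Uint8Array.from([
  0x08, 0xe9, 0x07, // field 1 (id) = 1001
  0x28, 0x2a, // field 5 = 42, unknown to this reader
  0x12, 0x08, 0x4a, 0x6f, 0x68, 0x6e, 0x20, 0x44, 0x6f, 0x65, // field 2 (name) = "John Doe"
]);

const decoded = decodeCustomer(newerMessage);
console.log(decoded); // { id: 1001, name: 'John Doe' }
```

The older reader still recovers id and name even though the writer added a field it has never heard of.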

What are some best practices for working with Protobuf?

When working with Protobuf, it’s important to adhere to the following best practices to ensure that data is properly structured and optimized for performance:

  • Define clear and consistent Protobuf schemas: Write .proto files with clarity and consistency. For instance, use clear naming conventions for messages and fields, and organize your schema logically.
  • Use descriptive and meaningful field names: Choose field names that clearly describe their content and purpose. It’s important to avoid using ambiguous or generic names because the Protobuf compiler uses these names for the generated class methods.
  • Maintain backward and forward compatibility: Avoid removing existing fields, changing their meaning, or reusing their field numbers. If you must make changes, deprecate fields rather than removing them. Additionally, mark the field numbers and names of any removed fields as reserved to prevent future conflicts.
  • Minimize use of the Any type: While the Any type provides flexibility, it can be less efficient and more error-prone. Use specific field types wherever possible to benefit from strong typing and clearer interfaces.
  • Optimize for size and efficiency: Use appropriate data types for fields to minimize size. For example, choose the most efficient integer type (e.g., int32, int64, uint32, or sint32) based on the expected range and sign of the values.
  • Version your Protobuf files: Maintain version control of your .proto files, especially in a team environment or when your APIs are used by external clients. This practice makes it easier to track changes and manage compatibility over time.
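To illustrate why the choice of integer type matters, the sketch below estimates how many bytes a value occupies on the wire as an int32 versus a sint32. Protobuf stores these types as varints, and a negative int32 is sign-extended to 64 bits, while sint32 applies ZigZag encoding first. The function names are hypothetical and exist only for this illustration. The code assumes a runtime with BigInt support.

```javascript
// Size estimates for illustration; not part of any Protobuf library.

// Bytes needed to store an unsigned 64-bit BigInt as a varint.
function varintSize(unsigned) {
  let size = 1;
  while (unsigned > 0x7fn) {
    unsigned >>= 7n;
    size++;
  }
  return size;
}

// int32: negative values are sign-extended to 64 bits before encoding.
function int32Size(value) {
  return varintSize(BigInt.asUintN(64, BigInt(value)));
}

// sint32: ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
function sint32Size(value) {
  const n = BigInt(value);
  const zigzag = n >= 0n ? n * 2n : -n * 2n - 1n;
  return varintSize(zigzag);
}

for (const value of [1, 300, -1, -1000]) {
  console.log(value, "int32:", int32Size(value), "sint32:", sint32Size(value));
}
```

Small positive numbers cost about the same either way, but a field that often holds negative values is considerably cheaper as sint32.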
