Introducing Bond, Microsoft's latest data serialization framework

Microsoft Bond is a newly introduced framework developed by Microsoft for serializing data with defined schemas.

Let’s start by reviewing the areas where data serialization plays a crucial role:

Persisting data in various forms like files, streams, NoSQL databases, and BigData systems.
Transmitting data across networks, inter-process communication channels, etc.

It’s worth noting that these applications frequently handle data with pre-defined structures, referred to as schemas. A schema encompasses:

Structure: Defines the hierarchy, relationships, and order of data elements.
Semantics: Provides meaning to data, like age representing the number of years since birth.


It’s important to understand that all data inherently possesses a schema, whether implicitly defined or directly supported by the programming language in use. As data structures become more complex, developers often resort to creating supporting data transfer objects (DTOs) and custom input/output (IO) code, often in multiple programming languages. Managing these components, especially as they grow and change, can quickly turn into a maintenance nightmare. This is where serialization frameworks prove invaluable.

The primary advantage of serialization frameworks lies in their ability to define an abstract representation of data schemas, independent of any specific programming language or platform. This representation is written in a domain-specific language (DSL).

With such a domain-specific language (DSL), developers can model data schemas for specific applications. While the schema definition can exist in various forms, serialization frameworks typically favor one that aligns well with their DSL. To illustrate, consider the well-known example of XSD and XML.

XSD serves as the DSL, while XML is the preferred format for defining documents adhering to that schema. Tools like “xsd.exe” can generate DTO classes from XSD, effectively representing the schema in another form. The beauty of this approach is that you can seamlessly convert between XML and DTOs, maintaining semantic equivalence, all thanks to the underlying XSD schema. In essence, a serialization framework offers a DSL to define data schemas in a format optimized for the framework.

The next step involves translating the abstract data schema into concrete entities within a programming language. Serialization frameworks achieve this through dedicated code generation tools.

These tools generate all the necessary code for various target programming languages, enabling client applications to interact with schematized data. This typically includes generating DTOs, proxies, and other supporting structures. This step is crucial for statically typed languages and optional for dynamically typed (duck-typed) languages.

Finally, the data needs to be persisted or transmitted. This involves serializing the structured data into a stream of bytes (or text) and deserializing it back when needed.

Serialization frameworks address this by introducing the concept of protocols. A protocol defines a set of rules governing how structured data, adhering to a specific schema, should be serialized and deserialized. Each protocol is usually implemented across all programming languages and platforms supported by the framework. The broader the language/platform support, the more protocol implementations are required.

For instance, a framework that supports a JSON protocol must provide JSON reader/writer implementations for each supported language, such as C# and C++, on each supported platform, such as Windows and Linux.

In summary, a modern data serialization framework offers:

Abstractions: DSL for defining schemas and Protocols for data encoding/decoding.
Code generation tools: To generate language-specific code for working with data.
Protocol implementations: Concrete implementations of protocols for various languages and platforms.

Microsoft Bond, a modern data serialization framework, provides a powerful DSL, flexible protocols, code generators for C++ and C#, and efficient protocol implementations for Windows, Linux, and Mac OS X.

For many years, Bond remained an internal technology within Microsoft. However, thanks to Microsoft’s Open Source initiative, Bond is now publicly available on GitHub: Microsoft Bond.

Data Serialization Competitors

The competitive landscape among software giants has resulted in the emergence of several serialization frameworks:

Google Inc. - Google Protocol Buffers
Facebook Inc. - Thrift, currently maintained by Apache
Apache Software Foundation - Avro

As expected, these frameworks are incompatible with each other, which isn’t a significant concern unless your public API relies on one of them.

Each framework has its strengths and weaknesses, so the choice ultimately depends on the specific requirements of your project.

Why Bond?

The official justification for choosing Bond can be found here: “Why Bond”.

Here’s a concise summary:

Bond supports a comprehensive type system, including generics.
It handles schema versioning and ensures both forward and backward compatibility.
It allows for runtime schema manipulation.
It provides support for various collection types: vectors, maps, and lists.
It offers type-safe lazy serialization through its “bonded” feature.
It allows for pluggable protocols (formats) with marshaling and transcoding capabilities.

It’s important to note that Bond follows a “pay-to-play” philosophy. The more features you utilize, the greater the impact on size and speed. This approach provides developers with substantial flexibility.

For the sake of transparency, let’s also acknowledge Bond’s limitations:

Bond primarily targets the Microsoft technology stack, with support for C++ and C#; Java support is not yet available.
It lacks support for union types, often referred to as “oneof” in Protocol Buffers.

What About Performance?

When comparing frameworks, developers often prioritize performance benchmarks. However, it’s essential to remember that these frameworks encompass DSLs, code generators, and protocols. Focusing solely on protocol performance overlooks the valuable features offered by DSLs and code generators. In some instances, a superior DSL can be far more advantageous than a marginal improvement in serialization speed.

Apart from speed, the space efficiency of encodings supported by various protocols can also be crucial. Conducting performance and space comparisons using your specific data is highly recommended. This practical approach provides the most accurate assessment of the benefits offered by each framework.
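One way to run such a comparison is a small harness that accepts any serializer as a delegate. The sketch below is only that: the `SerializationBenchmark` class and its `Measure` method are hypothetical names, not part of Bond or any other framework, and a single timed loop is a rough estimate rather than a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: measures payload size and average time per call
// for any function that turns your data into bytes.
public static class SerializationBenchmark
{
    public static (int Bytes, double MicrosecondsPerCall) Measure(
        Func<byte[]> serialize, int iterations = 10000)
    {
        if (serialize == null) throw new ArgumentNullException(nameof(serialize));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // Warm-up call so one-time setup costs are not included in the timing.
        byte[] payload = serialize();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            payload = serialize();
        }
        stopwatch.Stop();

        double microseconds = stopwatch.Elapsed.TotalMilliseconds * 1000.0 / iterations;
        return (payload.Length, microseconds);
    }
}
```

You would wrap each framework's serializer in a lambda that returns the encoded bytes for one of your own records, call `Measure` once per framework, and compare the resulting sizes and timings.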

This article includes a demo project showcasing the use of the Bond framework. It demonstrates how to read records from the Windows Application Event Log, serialize them as Bond objects, and then deserialize them back.

To build and run the demo, you only need Visual Studio. No additional software installations are required.

Using Microsoft Bond

Getting Bond

You can find the necessary Bond packages for your platform(s) in the official guide on getting Bond.

For .NET projects, obtaining Bond is straightforward:

    install-package Bond.CSharp

The package includes:

Code generator (gbc.exe) located in the bin folder
.NET libraries
MSBuild tasks

Workflow

Working with Bond typically involves the following steps:

Familiarize yourself with Bond’s DSL and define your data schema by creating “.bond” file(s).
Utilize the code generator (“gbc.exe”) to generate DTOs for your chosen programming language.
Reference the generated files and the Bond runtime libraries in your project.

For a more automated approach, consider using MSBuild tasks provided by the framework to streamline code generation.

DSL Features Overview

Before diving into writing your first “.bond” file, it’s essential to understand its syntax and features. A comprehensive guide to the Bond IDL can be found on the official documentation page. Here’s a quick overview of the basic features:

Modules: You can split large schemas into multiple files and include them using the “import” statement.
Namespace: Namespaces in Bond serve the same purpose as in C++/C#.
User-defined structs: The fundamental building blocks for defining custom data types.
Forward declaration: Essential for defining recursive data structures.
Basic types: Bond supports standard types like “bool,” various integer types (“uint8” through “uint64,” “int8” through “int64”), floating-point types (“float,” “double”), and text types (“string,” “wstring”).
Container types: Bond provides “blob” for raw data, along with collections like “list,” “vector,” “set,” and “map<K, T>.” It also supports nullable types.
Custom typed aliases and mapping: Useful for scenarios where you want to represent a concept differently in your code and on the wire, e.g., using “DateTime” in C# but serializing it as ticks (“int64”).
Custom attributes: Allow you to include annotations that influence code generation, providing customization points.

To make things clearer, let’s look at an example:

namespace MyProject
    
struct MyRecord
{
    0: string Name = "Noname";
    1: vector<double> Constants;
}

In this example, “0” and “1” are the field ordinals: unique integer identifiers for each field within the struct, which need not be contiguous. The optional = "Noname" sets the field’s default value.
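A slightly larger schema can touch several more of the features listed earlier. The following is a sketch written from memory of the IDL, so check it against the official documentation before relying on it; the names `Timestamp`, `TreeNode`, and the `Doc` attribute are invented for illustration.

```
namespace MyProject

// Type alias: code can say Timestamp while the wire carries a plain int64.
using Timestamp = int64;

// Forward declaration, needed because TreeNode refers to itself below.
struct TreeNode;

// Custom attribute: free-form metadata made available to code generation.
[Doc("A node in a labelled tree")]
struct TreeNode
{
    0: string Label = "root";
    1: Timestamp Created;
    2: nullable<string> Comment;
    3: vector<TreeNode> Children;
    4: map<string, double> Metrics;
    10: blob Payload;
}
```

Note the jump from ordinal 4 to 10: leaving gaps is allowed and can be handy for grouping related fields that may be added later.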

Code Generation

The Bond framework provides a code generation tool implemented in Haskell. To generate C# and C++ code from a “.bond” schema file via the command line, you would use:

gbc c# example.bond
gbc c++ example.bond

Supported Protocols (formats)

Bond offers out-of-the-box support for three main categories of protocols:

Tagged protocols: “CompactBinary” and “FastBinary”
These protocols embed schema metadata directly into the serialized payload. This self-describing characteristic allows consumers to interpret the data even without prior knowledge of the producer’s schema.
Untagged protocols: “SimpleBinary”
Untagged protocols serialize only the data itself. Consequently, consumers must obtain the schema information through external means. This type of protocol is often favored in storage scenarios, as it allows storing the schema once (e.g., in a database table), reducing metadata overhead when dealing with numerous records sharing the same schema.
DOM-based protocols: “SimpleJson” and “SimpleXml”
DOM-based protocols parse the entire payload into an in-memory Data Object Model. Deserialization then involves querying this DOM. They are typically employed for text-based encodings like JSON or XML.
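To make the tagged versus untagged distinction concrete, here is a toy encoder in C#. It is deliberately not Bond’s wire format: the `ToyProtocols` class, the type codes, and the field layout are all invented for illustration. It only shows why per-field tags cost extra bytes and what they buy the consumer.

```csharp
using System;
using System.IO;

// Toy illustration only: this is NOT Bond's wire format.
public static class ToyProtocols
{
    private const byte TypeString = 1; // hypothetical type codes
    private const byte TypeInt32 = 2;

    // Untagged: values are written back to back in schema order, no metadata.
    public static byte[] WriteUntagged(string name, int age)
    {
        var stream = new MemoryStream();
        using (var w = new BinaryWriter(stream))
        {
            w.Write(name); // length-prefixed UTF-8
            w.Write(age);  // 4 bytes, little-endian
        }
        return stream.ToArray();
    }

    // Tagged: every value is preceded by its field id and a type code.
    public static byte[] WriteTagged(string name, int age)
    {
        var stream = new MemoryStream();
        using (var w = new BinaryWriter(stream))
        {
            w.Write((ushort)0); w.Write(TypeString); w.Write(name);
            w.Write((ushort)1); w.Write(TypeInt32); w.Write(age);
        }
        return stream.ToArray();
    }

    // A consumer that only knows field 1 can skip everything else,
    // because the type code tells it how to step over each value.
    public static int ReadAge(byte[] tagged)
    {
        using (var r = new BinaryReader(new MemoryStream(tagged)))
        {
            while (r.BaseStream.Position < r.BaseStream.Length)
            {
                ushort id = r.ReadUInt16();
                byte type = r.ReadByte();
                if (id == 1 && type == TypeInt32) return r.ReadInt32();
                switch (type)
                {
                    case TypeString: r.ReadString(); break;
                    case TypeInt32: r.ReadInt32(); break;
                    default: throw new InvalidDataException("Unknown type code " + type);
                }
            }
        }
        throw new InvalidDataException("Field 1 not found");
    }
}
```

For the input ("Ann", 30), the untagged payload is 8 bytes and the tagged one is 14: the six extra bytes are the two field headers. In exchange, `ReadAge` works without knowing what field 0 is, which is the same property that lets a tagged protocol tolerate fields added by a newer version of a schema.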

The Bond runtime library provides corresponding Reader and Writer classes for each protocol, handling the actual serialization and deserialization processes.

Using these protocols is straightforward, only slightly more involved than using the familiar “JsonConvert.SerializeObject()” method:

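using Bond;
using Bond.IO.Safe;
using Bond.Protocols;

// MyRecord is generated by gbc from the schema above (namespace MyProject).
var record = new MyRecord
{
    Name = "FooBar",
    Constants = { 3.14, 6.28 }
};

// Serialize the record into an in-memory buffer using Compact Binary.
var output = new OutputBuffer();
var writer = new CompactBinaryWriter<OutputBuffer>(output);
Serialize.To(writer, record);

// Deserialize a copy from the bytes that were just written.
var input = new InputBuffer(output.Data);
var reader = new CompactBinaryReader<InputBuffer>(input);
record = Deserialize<MyRecord>.From(reader);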
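Switching protocols should mostly be a matter of swapping the reader and writer. The sketch below does the same round trip through SimpleJson. It assumes the Bond.CSharp package is referenced, and the `SimpleJsonWriter`/`SimpleJsonReader` constructor overloads are written from memory, so verify them against the version you install. To keep the snippet self-contained, `MyRecord` is a hand-written stand-in for the class gbc would normally generate, and `JsonRoundTrip` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Bond;
using Bond.Protocols;

// Hand-written stand-in for the gbc-generated class (assumption:
// generated code carries equivalent Schema and Id attributes).
[Schema]
public class MyRecord
{
    [Id(0)] public string Name { get; set; } = "Noname";
    [Id(1)] public List<double> Constants { get; set; } = new List<double>();
}

public static class JsonRoundTrip
{
    public static void Main()
    {
        var record = new MyRecord { Name = "FooBar", Constants = { 3.14, 6.28 } };

        // Serialize to JSON text; Flush pushes buffered output to the TextWriter.
        var text = new StringWriter();
        var jsonWriter = new SimpleJsonWriter(text);
        Serialize.To(jsonWriter, record);
        jsonWriter.Flush();
        Console.WriteLine(text.ToString());

        // Read the JSON back into a new object.
        var jsonReader = new SimpleJsonReader(new StringReader(text.ToString()));
        var copy = Deserialize<MyRecord>.From(jsonReader);
        Console.WriteLine(copy.Name);
    }
}
```

Because the JSON output is human-readable, this variant is also handy for inspecting what a record contains while debugging.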

What’s Next?

If you find Bond compelling and have spare time for coding, consider contributing to its development. While I won’t enumerate all the benefits of contributing to open-source projects, I know many developers seek opportunities to make a difference. Here are a couple of ideas:

Port Bond to Java or another mainstream programming language.
Implement schema import/export functionality to enable interoperability with other DSLs (e.g., converting between “.proto” and “.bond” files).

No matter how you choose to engage with Bond, I highly recommend reaching out to Adam Sapek beforehand. As the project lead, they can guide you toward the areas where contributions are most needed and impactful.