Enhancing Website Speed with Data Caching
Modern websites often gather information from various sources like databases and external APIs. A common scenario is user authentication, where a website retrieves the user’s record from its database and then enhances it with additional data fetched through API calls to other services. To ensure a smooth and responsive user experience, it’s crucial to minimize the time spent on resource-intensive operations like database queries and API requests. Data caching emerges as a valuable optimization technique to address this need.
Typically, processes store their working data in memory. When a web server operates within a single process, like Node.js/Express, caching this data becomes straightforward using an in-memory cache within the same process. However, load-balanced web servers utilize multiple processes, and even with a single-process setup, there might be a requirement for the cache to persist beyond server restarts. This scenario calls for an external caching solution such as Redis, which necessitates data serialization for storage and deserialization for retrieval.
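As a concrete illustration, here is a minimal cache-aside sketch against Redis, assuming the node-redis v4 client and a hypothetical `fetchUserProfile` standing in for the expensive lookups:

```javascript
const { createClient } = require('redis')

// Hypothetical stand-in for the expensive database/API work.
async function fetchUserProfile(userId) {
  return { id: userId, name: 'Ada' }
}

// Cache-aside lookup: Redis only stores strings and buffers, so every value
// must be serialized on the way in and deserialized on the way out.
async function getUserProfile(client, userId) {
  const cached = await client.get(`user:${userId}`)
  if (cached) return JSON.parse(cached) // note: this yields a plain object

  const profile = await fetchUserProfile(userId)
  await client.set(`user:${userId}`, JSON.stringify(profile), { EX: 300 }) // 5-minute TTL
  return profile
}

async function main() {
  const client = createClient() // assumes Redis on localhost:6379
  await client.connect()
  console.log(await getUserProfile(client, 42))
  await client.quit()
}

main()
```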
While serialization and deserialization are relatively simple in statically typed languages like C#, JavaScript's dynamic nature presents a unique challenge. Although ECMAScript 6 (ES6) introduced classes, a class's fields and their types aren't declared up front: they come into existence only when assigned, which may not happen until well after instantiation. The return types of getters and functions carry no schema, either. On top of that, a class's structure can be modified at runtime, making its final shape impossible to predict. (Runtime modification of a type is achievable in C# via reflection, but it's generally discouraged there; in JavaScript it's commonplace.)
This very challenge confronted me while working on the Toptal core team a few years ago. We were tasked with developing a responsive agile dashboard for our teams, drawing data from multiple sources: our work-tracking system, project management tool, and a database. The site, built with Node.js/Express, employed an in-memory cache to reduce the load on these data sources. However, our frequent deployments, a result of our iterative development process, led to multiple server restarts daily, invalidating the cache and negating its benefits.
An out-of-process cache like Redis seemed like the solution. However, my research revealed a lack of robust serialization libraries for JavaScript. The default JSON.stringify/JSON.parse methods return generic objects, stripping away the functions defined on the original classes' prototypes. Without those prototypes, we would have had to refactor the application around an alternative design.
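The problem takes only a few lines to demonstrate:

```javascript
class User {
  constructor(name) { this.name = name }
  greet() { return `Hello, ${this.name}!` }
}

const original = new User('Ada')
console.log(original.greet()) // "Hello, Ada!"

// Round-trip through JSON: the data survives, the prototype does not.
const revived = JSON.parse(JSON.stringify(original))
console.log(revived.name)            // "Ada"
console.log(revived instanceof User) // false -- it's a plain Object
console.log(typeof revived.greet)    // "undefined" -- the method is gone
```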
Key Requirements for a JavaScript Serialization Library
To enable serialization and deserialization of arbitrary JavaScript data while ensuring interchangeability between deserialized and original representations, a suitable library needed to fulfill these requirements:
- Preservation of prototypes (functions, getters, setters) in deserialized representations, mirroring the original objects.
- Support for intricate nested data structures, including arrays and maps, with accurate prototype mapping for nested objects.
- Idempotency: the ability to serialize and deserialize the same objects multiple times without unintended side effects.
- A serialization format easily transferable over TCP and compatible with storage in services like Redis.
- Minimal code modifications to designate a class as serializable.
- Fast and efficient library routines.
- Ideally, a mechanism to handle deserialization of older class versions through mapping or versioning.
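To make several of these requirements concrete, here is the kind of round-trip check a candidate library would need to pass; `encode` and `decode` are hypothetical stand-ins for the library under test:

```javascript
const assert = require('assert')

class Point {
  constructor(x, y) { this.x = x; this.y = y }
  get magnitude() { return Math.hypot(this.x, this.y) }
}

function checkRoundTrip(encode, decode) {
  const original = new Point(3, 4)

  // Requirement: the wire format is a plain string/buffer, safe for TCP or Redis.
  const wire = encode(original)
  assert(typeof wire === 'string' || Buffer.isBuffer(wire))

  // Requirement: prototypes (methods, getters) survive the round trip.
  const revived = decode(wire, Point)
  assert(revived instanceof Point)
  assert.strictEqual(revived.magnitude, 5)

  // Requirement: idempotency -- a second round trip changes nothing.
  assert.deepStrictEqual(decode(encode(revived), Point), revived)
}
```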
Building a Solution: Tanagra.js
To address this gap, I created Tanagra.js, a versatile serialization library for JavaScript. The name draws inspiration from a memorable Star Trek: The Next Generation episode where the Enterprise crew grapples with deciphering the language of an enigmatic alien species. This library aims to prevent such communication breakdowns in the realm of data serialization.
Tanagra.js prioritizes simplicity and efficiency. It currently supports Node.js (with potential for in-browser use) and ES6 classes, including classes containing Maps. The main implementation uses JSON, and an experimental version targets Google Protocol Buffers. The library relies solely on standard JavaScript (tested with ES6 and Node.js), with no need for experimental features, Babel transpiling, or TypeScript.
To mark a class as serializable, a simple method call is used during export:
```javascript
module.exports = serializable(Foo, myUniqueSerialisationKey)
```
The method returns a proxy to the class, intercepting the constructor to inject a unique identifier. If not provided, the class name serves as the default identifier. This key, serialized alongside the data, is also accessible as a static field in the class. For classes containing nested types requiring serialization, they are specified in the method call as well:
```javascript
module.exports = serializable(Foo, [Bar, Baz], myUniqueSerialisationKey)
```
This mechanism extends to nested types from previous class versions, allowing for scenarios like serializing a Foo1 and deserializing it into a Foo2.
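For example, if Foo2 supersedes Foo1 but registers under the same serialization key, entities cached before a deployment can still be read after it. This is a sketch assuming only the key-lookup behavior described above (and that `serializable` is exported from tanagra-core, per the module list below):

```javascript
// foo.js, before a deployment (version 1 of the class)
const serializable = require('tanagra-core').serializable

class Foo1 {
  constructor() { this.value = 42 }
}

module.exports = serializable(Foo1, 'foo-key')
```

```javascript
// foo.js, after the deployment: registering under the same key means entities
// serialized as Foo1 now deserialize into Foo2 instances.
const serializable = require('tanagra-core').serializable

class Foo2 {
  constructor() { this.value = 0; this.label = '' }
  describe() { return `${this.label}: ${this.value}` }
}

module.exports = serializable(Foo2, 'foo-key')
```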
During serialization, the library builds a global map associating keys with classes; during deserialization, this map resolves nested types from the keys embedded in the serialized data. The top-level class type, however, must still be supplied explicitly:
```javascript
const foo = decodeEntity(serializedFoo, Foo)
```
An experimental auto-mapping library automates this process by analyzing the module tree and generating mappings based on class names. However, this approach is only reliable for uniquely named classes.
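Conceptually, the key-to-class machinery behaves like the following sketch (illustrative only, not the library's actual internals):

```javascript
// Global registry built up as classes are marked serializable.
const registry = new Map()

function register(clazz, key = clazz.name) {
  registry.set(key, clazz)
  return clazz
}

// Reviving an entity means looking up its class by the embedded key and
// reattaching the prototype to the plain parsed data.
function revive(plainObject, key) {
  const clazz = registry.get(key)
  return Object.assign(Object.create(clazz.prototype), plainObject)
}

class Greeter {
  constructor(name) { this.name = name }
  greet() { return `Hello, ${this.name}!` }
}
register(Greeter)

const wire = JSON.stringify({ key: 'Greeter', data: new Greeter('Ada') })
const { key, data } = JSON.parse(wire)
console.log(revive(data, key).greet()) // "Hello, Ada!"
```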
Project Structure
The project is organized into several modules:
- tanagra-core - houses shared functionalities used by various serialization formats, including the method for marking classes as serializable.
- tanagra-json - handles data serialization into JSON format.
- tanagra-protobuf - provides serialization into Google Protocol Buffers format (experimental).
- tanagra-protobuf-redis-cache - a helper library for storing serialized Protocol Buffers data in Redis.
- tanagra-auto-mapper - analyzes the Node.js module tree to create a class map, aiming to eliminate the need for specifying the deserialization type (experimental).
It’s worth noting that the library consistently uses US English spellings.
Putting It into Practice: Example Usage
The following sketch demonstrates declaring a serializable class and using the tanagra-json module to serialize and deserialize it. The `decodeEntity` call matches the one shown earlier; the `init` and `encodeEntity` calls should be read as assumed counterparts:
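```javascript
// bar.js
const serializable = require('tanagra-core').serializable

class Bar {
  constructor() { this.when = new Date().toISOString() }
}

module.exports = serializable(Bar)
```

```javascript
// foo.js
const serializable = require('tanagra-core').serializable
const Bar = require('./bar')

class Foo {
  constructor(bar) {
    this.someNumber = 123
    this.someString = 'hello, world!'
    this.bar = bar                      // nested serializable type
    this.barMap = new Map([['a', bar]]) // Maps of serializable types work too
  }

  describe() { return `${this.someString} (${this.someNumber})` }
}

module.exports = serializable(Foo, [Bar])
```

```javascript
// app.js
const { init, encodeEntity, decodeEntity } = require('tanagra-json') // assumed exports
const Foo = require('./foo')
const Bar = require('./bar')

init() // assumed one-time setup

const encoded = encodeEntity(new Foo(new Bar())) // a string: safe to store in Redis
// ...later, possibly in a different process...
const decoded = decodeEntity(encoded, Foo)

console.log(decoded instanceof Foo) // true
console.log(decoded.describe())     // "hello, world! (123)" -- methods survive
```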
Evaluating Performance
To gauge the library’s effectiveness, I compared the performance of both the JSON serializer and the experimental protobufs serializer against the native JSON.parse and JSON.stringify methods. Each approach underwent 10 trials.
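A harness along the following lines produces the columns reported below; it's a sketch of the methodology rather than the exact code used:

```javascript
// Time N trials of a serialization function and report the average with and
// without the first trial (which pays any one-off schema-caching cost).
function benchmark(serialize, value, trials = 10) {
  const timings = []
  for (let i = 0; i < trials; i++) {
    const start = process.hrtime.bigint()
    serialize(value)
    timings.push(Number(process.hrtime.bigint() - start) / 1e6) // ns -> ms
  }
  const avg = xs => xs.reduce((a, b) => a + b, 0) / xs.length
  return { avgInclFirst: avg(timings), avgExclFirst: avg(timings.slice(1)) }
}

// Example: the control group measures the native serializer.
console.log(benchmark(JSON.stringify, { hello: 'world' }))
```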
The tests were conducted on my 2017 Dell XPS15 laptop, equipped with 32GB of RAM and running Ubuntu 17.10.
The test subject was the serialization of a nested object along the following lines (a representative sketch rather than the exact benchmark fixture):
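```javascript
// Representative nested structure: class instances, arrays of instances, and a Map.
class Baz {
  constructor(n) { this.n = n }
}

class Bar {
  constructor() { this.createdAt = new Date().toISOString() }
}

class Foo {
  constructor(bar, bazzes) {
    this.id = 123
    this.name = 'benchmark-foo'
    this.bar = bar                                               // nested instance
    this.bazzes = bazzes                                         // array of instances
    this.lookup = new Map(bazzes.map((b, i) => [`baz-${i}`, b])) // Map of instances
  }
}

const testSubject = new Foo(new Bar(), [new Baz(1), new Baz(2)])
```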
Write Performance Results
| Serialization method | Avg. incl. first trial (ms) | St. dev. incl. first trial (ms) | Avg. excl. first trial (ms) | St. dev. excl. first trial (ms) |
|---|---|---|---|---|
| JSON | 0.115 | 0.0903 | 0.0879 | 0.0256 |
| Google Protobufs | 2.00 | 2.748 | 1.13 | 0.278 |
| Control group (native `JSON.stringify`) | 0.0155 | 0.00726 | 0.0139 | 0.00570 |
Read Performance Results
| Serialization method | Avg. incl. first trial (ms) | St. dev. incl. first trial (ms) | Avg. excl. first trial (ms) | St. dev. excl. first trial (ms) |
|---|---|---|---|---|
| JSON | 0.133 | 0.102 | 0.104 | 0.0429 |
| Google Protobufs | 2.62 | 1.12 | 2.28 | 0.364 |
| Control group (native `JSON.parse`) | 0.0135 | 0.00729 | 0.0115 | 0.00390 |
Analyzing the Findings and Charting the Course
Excluding first trials, the JSON serializer carries an overhead of roughly 6x on writes and 9x on reads compared with native serialization. The experimental protobufs serializer lags much further behind: about 13x slower than the JSON serializer on writes, over 20x slower on reads, and on the order of 100x slower than native serialization overall.
The internal caching of schema and structural information within each serializer has a noticeable impact on performance. The JSON serializer’s initial write operation is about four times slower than subsequent writes, while the protobufs serializer exhibits a ninefold difference. This highlights that writing objects with cached metadata is significantly faster in both libraries.
A similar trend is observed for read operations. The JSON library’s first read is roughly four times slower than the average, while the protobufs library shows a speed difference of about two and a half times.
The performance bottlenecks in the protobufs serializer currently relegate it to experimental status. It’s recommended only if the Protocol Buffers format is a strict requirement. However, further optimization is worthwhile as the format’s terseness compared to JSON makes it more suitable for network transmission. Notably, Stack Exchange employs this format for its internal caching mechanisms.
While the JSON serializer exhibits better performance, it’s still considerably slower than the native implementation. Although this difference might be negligible for small object trees (a few milliseconds added to a 50ms request wouldn’t cripple a website’s performance), it could pose challenges for handling very large object graphs, making it a key area for improvement.
Future Plans for Tanagra.js
Currently in its beta stage, Tanagra.js is undergoing continuous development. The JSON serializer, having undergone substantial testing, demonstrates stability. The roadmap for the upcoming months includes:
- Performance optimizations for both serializers
- Enhanced support for pre-ES6 JavaScript
- Integration of ES-Next decorators
To my knowledge, Tanagra.js stands alone in its ability to serialize complex, nested object data in JavaScript and deserialize it back to its original type. If you find yourself working on projects that could benefit from this functionality, I encourage you to explore the library, share your valuable feedback, and consider contributing to its development.