Adding 130 million rows to SQLite every minute...using a scripting language

The other day, I came across this blog post Inserting One Billion Rows in SQLite Under A Minute, which felt serendipitous because I had been enhancing my own SQLite/Objective-S adapter. (The author later added “Towards” to the post’s title, acknowledging that the goal wasn’t fully realized yet).

My SQLite adapter stemmed from my earlier article series on optimizing JSON performance. That deep dive was prompted by the abysmal performance of Swift's Codable at this simple, yet crucial, task. To summarize: Swift's JSON coder lagged at around 10MB/s. By adopting a streaming strategy and some fine-tuning, we boosted it to approximately 200MB/s.

Since then, my focus has been on making Objective-S more UI-friendly. Its object-literal syntax makes UI definition as effortless as “declarative” functional approaches like React or SwiftUI. However, it still leverages familiar AppKit or UIKit objects and avoids the silly idea that the UI is purely a function of the model. Oh, and it offers truly functional live previews. But I’ll delve into that another time.

So, I’m gradually approaching a ToDoMVC, a benchmark that feels somewhat natural to me. While I still love working with JSON files, and my previous articles demonstrated their impressive speed, I acknowledge that many prefer a “real” database, particularly on the back-end. So, I aimed to build that capability as well. One of my benchmarks for Objective-S is its ability to construct a superior Rails equivalent. (At this stage, I’m confident of achieving this).

A good design can be evaluated by stress-testing it. A helpful stress test is measuring its speed, as it reveals whether the creation is lean or burdened with unnecessary layers and indirections.

This is particularly relevant in Scripted Components (pdf) systems, which blend a relatively slow but adaptable interactive scripting language with swift, optimized components. The challenge lies in merging the scripting language’s flexibility with the speed benefits of the components, avoiding the need to constantly adapt and optimize components for individual use cases, or suffer from slow performance despite having fast components. I suspected that the streaming method I’ve been using successfully for JSON and Objective-C would also excel in this more demanding environment.

Spoiler: It did!

The Benchmark

The benchmark was a slightly adjusted version of the script powering a tasks backend. This script, like the example, creates a tasks database and populates it with sample rows. However, instead of two rows, it inserts 10 million, or even a hundred million.


```
#!env stsh
#-taskbench:dbref

class Task {
    var id.
    var done.
    var title.
    -description { "". }
    +sqlForCreate {
        '( [id] INTEGER PRIMARY KEY, [title] VARCHAR(220) NOT NULL, [done] INTEGER );'.
    }
}.

scheme todo : MPWAbstractStore {
    var db.
    var tasksTable.
    -initWithRef:ref {
        this:db := (MPWStreamQLite alloc initWithPath:ref path).
        this:tasksTable := #MPWSQLTable{ #db: this:db, #tableClass: Task, #name: 'tasks' }.
        this:db open.
        self.
    }
    -createTable {
        this:tasksTable create.
        this:tasksTable := this:db tables at:'tasks'.
        this:tasksTable createEncoderMethodForClass: Task.
    }
    -createTaskListToInsert:log10ofSize {
        baseList ← #( #Task{ #title: 'Clean Room', #done: false },
                      #Task{ #title: 'Check Twitter', #done: true } ).
        …replicate … taskList.
    }
    -insertTasks {
        taskList := self createTaskListToInsert:6.
        1 to:10 do: {
            this:tasksTable insert:taskList.
        }.
    }
}.

todo := todo alloc initWithRef:dbref.
todo createTable.
todo insertTasks.
```


(For brevity, I’ve omitted the method that expands the two tasks into the million-entry list, as it was large and not directly relevant).

In this example, we define the Task class and use it to generate the SQL Table. Alternatively, we could have created the table directly and generated a Tasks class from it.

Running this script produces the following output:


```
time ./taskbench-sqlite.st /tmp/tasks1.db
./taskbench-sqlite.st /tmp/tasks1.db  4.07s user 0.20s system 98% cpu 4.328 total

ls -al /tmp/tasks1.db*
-rw-r--r--  1 marcel  wheel   214M Jul 24 20:11 /tmp/tasks1.db

sqlite3 /tmp/tasks1.db 'select count(id) from tasks;'
10000000
```


We successfully inserted 10 million rows in 4.328 seconds, resulting in several hundred megabytes of SQLite data. Running it for a minute would have yielded 138 million rows. Impressive! In comparison, the original article reported 11M rows/minute for CPython, 40M rows/minute for PyPy, and 181M rows/minute for Rust, albeit on a slower Intel MacBook Pro, while I used an M1 Air. I compiled and ran the Rust version on my M1 Air, achieving 100 million rows in 21 seconds. This is slightly more than twice as fast as my Objective-S script, but with a simpler schema (CHAR(6) vs. VARCHAR(220)) and less data (1.5GB vs. 2.1GB for 100M rows).

Optimizing SQLite for Speed

Initially, the script’s performance was significantly slower. The primary bottleneck was inefficient SQLite usage, mainly inserting each row individually without batching. When SQLite encounters an INSERT (or UPDATE) outside a transaction, it automatically wraps it in a generated transaction and commits it after processing. As SQLite prioritizes the atomicity of transactions on disk, this process is slow. Very slow.

The class responsible for SQLite inserts is a Polymorphic Write Stream, meaning it understands arrays. When it encounters one, it sends itself the beginArray message, writes the array’s contents, and concludes by sending itself the endArray message. Since writing an array suggests writing the entire structure, it was an opportune moment to introduce transactions:


```
-(void)beginArray {
    sqlite3_step(begin_transaction);
    sqlite3_reset(begin_transaction);
}

-(void)endArray {
    sqlite3_step(end_transaction);
    sqlite3_reset(end_transaction);
}
```


Now, writing multiple objects as a single transaction is as simple as placing them in an array, as demonstrated in the benchmark code. While there were other minor optimizations, after this change, less than 10% of the total time was spent within SQLite. It was time to shift focus to optimizing my code, the caller.

Column Keys and Cocoa Strings

My initial assumption was that the remaining bottleneck lay within the Objective-S interpreter. It turned out to be Cocoa string handling. Not only was I dynamically generating SQLite parameter placeholder keys, allocating new NSString objects for every column of every row, but retrieving character data from an NSString object now also involves intricate and sluggish internal machinery built on encoding-conversion streams. -UTF8String is detrimental to performance, and the other methods seem to funnel through the same inefficient mechanism. I suspect crippling NSString’s performance is a way to make other string handling appear comparatively better.

After some refactoring, the code now looks up the incoming NSString key in a dictionary that maps it to the SQLite parameter index, successfully circumventing string-processing and character access.

JIT-Compiling the Encoder Method…Without a JIT

You may have noticed that the class definition in the benchmark code lacks an explicit encoder method; it only defines its instance variables and some utility methods. So, how is the class data encoded for the SQLTable? Is KVC used? No, that would be rather slow, although it could be a viable fallback option.

The solution lies in the createEncoderMethodForClass: method. As its name implies, this method dynamically generates an encoder method by combining code blocks, transforms the top-level code into a method using imp_implementationWithBlock(), and then adds this method to the target class using class_addMethod().


```
-(void)createEncoderMethodForClass:(Class)theClass
{
    NSArray *ivars=[theClass allIvarNames];
    if ( [[ivars lastObject] hasPrefix:@"_"]) {
        ivars=(NSArray*)[[ivars collect] substringFromIndex:1];
    }

    NSMutableArray *copiers=[[NSMutableArray arrayWithCapacity:ivars.count] retain];
    for (NSString *ivar in ivars) {
        MPWPropertyBinding *accessor=[[MPWPropertyBinding valueForName:ivar] retain];
        [ivar retain];
        [accessor bindToClass:theClass];

        id objBlock=^(id object, MPWFlattenStream* stream){
            [stream writeObject:[accessor valueForTarget:object] forKey:ivar];
        };
        id intBlock=^(id object, MPWFlattenStream* stream){
            [stream writeInteger:[accessor integerValueForTarget:object] forKey:ivar];
        };
        int typeCode = [accessor typeCode];

        if ( typeCode == 'i' || typeCode == 'q' || typeCode == 'l' || typeCode == 'B' ) {
            [copiers addObject:Block_copy(intBlock)];
        } else {
            [copiers addObject:Block_copy(objBlock)];
        }
    }
    void (^encoder)( id object, MPWFlattenStream *writer) = Block_copy( ^void(id object, MPWFlattenStream *writer) {
        for ( id block in copiers ) {
            void (^encodeIvar)(id object, MPWFlattenStream *writer)=block;
            encodeIvar(object, writer);
        }
    });
    void (^encoderMethod)( id blockself, MPWFlattenStream *writer) = ^void(id blockself, MPWFlattenStream *writer) {
        [writer writeDictionaryLikeObject:blockself withContentBlock:encoder];
    };
    IMP encoderMethodImp = imp_implementationWithBlock(encoderMethod);
    class_addMethod(theClass, [self streamWriterMessage], encoderMethodImp, "v@:@" );
}
```


Interestingly, I hadn’t created this method specifically for this purpose. I had already developed it for JSON-coding. Since both the JSON-encoder and the SQLite writer are Polymorphic Write Streams (as are their corresponding decoder/parser counterparts), the same method worked seamlessly for both.

(Note: This encoder-generator currently doesn’t handle all data types, and this is intentional).

Extracting Data from Objective-S Objects

The encoder method utilizes MPWPropertyBinding objects for efficient access to instance variables via the object’s accessors, caching IMPs and converting data as needed. This makes them both performant and versatile. However, the accessors generated by Objective-S for its instance variables were quite convoluted, as they relied on the same mechanism used for Objective-S methods, which only operate on objects, not primitive data types.

To ensure seamless interoperability with Objective-C, which expects methods capable of handling non-object data types, all non-object method arguments are converted to objects on input, and return values are converted back from objects to primitive values on output.

Consequently, even accessors for primitive types like the integer “id” or the boolean “done” would have their values converted to and from objects by the interface machinery. As previously mentioned, I was somewhat taken aback that this inefficiency was overshadowed by the NSString-based key handling.

In fact, one motivation for pursuing the SQLite insert benchmark was to justify addressing this convoluted mechanism. Ultimately, resolving it proved to be far less daunting than anticipated, employing a technique strikingly similar to the encoder-generator mentioned earlier, just simpler.

We employ different blocks parameterized with the offset to the instance variable based on its type. The setter-generator code snippet below illustrates this, showcasing how the object-case code differs due to retain-count management:


```
#define pointerToVarInObject( type, anObject, offset )  ((type*)(((char*)anObject)+(offset)))

#ifndef __clang_analyzer__
// This leaks because we are installing into the runtime, can't remove after

-(void)installInClass:(Class)aClass
{
    SEL aSelector=NSSelectorFromString([self objcMessageName]);
    const char *typeCode=NULL;
    int ivarOffset = (int)[ivarDef offset];
    IMP getterImp=NULL;
    switch ( ivarDef.objcTypeCode ) {
        case 'd':
        case '@':
            typeCode = "v@:@";
            void (^objectSetterBlock)(id object,id arg) = ^void(id object,id arg) {
                id *p=pointerToVarInObject(id,object,ivarOffset);
                if ( *p != arg ) {
                    [*p release];
                    [arg retain];
                    *p=arg;
                }
            };
            getterImp=imp_implementationWithBlock(objectSetterBlock);
            break;
        case 'i':
        case 'l':
        case 'B':
            typeCode = "v@:l";
            void (^intSetterBlock)(id object,long arg) = ^void(id object,long arg) {
                *pointerToVarInObject(long,object,ivarOffset)=arg;
            };
            getterImp=imp_implementationWithBlock(intSetterBlock);
            break;
        default:
            [NSException raise:@"invalidtype"
                        format:@"Don't know how to generate set accessor for type '%c'",ivarDef.objcTypeCode];
            break;
    }
    if ( getterImp && typeCode ) {
        class_addMethod(aClass, aSelector, getterImp, typeCode );
    }
}
#endif
```


At this juncture, profiling revealed that roughly two-thirds of the execution time was spent within sqlite3_ functions, indicating that further optimizations were entering the realm of diminishing returns.

Linear Scans Outperform Dictionaries

One final noticeable overhead was the (string) key to parameter index mapping which, after the previous optimizations, remained an NSDictionary mapping NSString keys to NSNumber indices. As you likely know, NSDictionary isn’t renowned for its speed. One idea was to replace it with an MPWFastrStringTable. However, this would have necessitated either finding a way to access NSString character data quickly or modifying the protocol.

Instead, I opted for a brute-force approach: storing pointers to the NSString key objects directly in a C array indexed by the SQLite parameter index. Before falling back to the dictionary lookup (which I retain for safety), a linear scan is performed on this array using the incoming string pointer. This simple trick effectively eliminated the parameter-index lookup from my profiles.
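A minimal sketch of the idea (hypothetical names; the real code lives in the Objective-C adapter): because the caller passes the same interned key objects on every row, comparing pointer identity is sufficient, and a linear scan over a handful of pointers beats a hash lookup.

```c
#include <stddef.h>

/* Hypothetical sketch of the pointer-scan cache. Key objects are compared
   by identity, so a linear scan over a small array replaces the
   NSDictionary hash lookup on the hot path. */
#define MAX_COLUMNS 16

typedef struct {
    const void *keys[MAX_COLUMNS];   /* cached key-object pointers, in parameter order */
    int count;
} ParamCache;

/* Remember a key; returns its 1-based SQLite parameter index. */
int intern_key(ParamCache *cache, const void *key) {
    cache->keys[cache->count++] = key;
    return cache->count;
}

/* Fast path: scan for the pointer itself; 0 means "fall back to the slow lookup". */
int index_for_key(const ParamCache *cache, const void *key) {
    for (int i = 0; i < cache->count; i++) {
        if (cache->keys[i] == key) {
            return i + 1;            /* SQLite parameter indices are 1-based */
        }
    }
    return 0;
}
```

For a table with a handful of columns the scan touches at most a few cache lines, which is why it disappears from the profile.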

Conclusion

With these final adjustments, the code is probably nearing peak performance. The remaining speed difference compared to the Rust code can be attributed to handling more data and a more intricate schema, as well as the need to retrieve data from materialized objects, while the Rust code simply generates SQLite calls on-the-fly.

All this is achieved within a slow, interpreted scripting language, with all variable components (data class, control flow) defined within that language. So, while I eagerly anticipate the native compiler for Objective-S, it’s reassuring to know that it’s not essential for exceptional performance and that the fundamental design of these APIs is robust.

Licensed under CC BY-NC-SA 4.0
Last updated on Jul 02, 2024 10:35 +0100