Compiler Writers Gone Wild: The Madness of ARC

This week on CWGW, let’s explore a crash that shouldn’t happen but does.

I’m working on a project where the most common crash lately originates from an NSOutlineView delegate method:

- (BOOL)outlineView:(NSOutlineView *)outlineView isGroupItem:(id)item
{
    return NO;
} 

The team initially disregarded this crash because it seemed illogical and they had other priorities. The app is quite stable, so luckily, crashes were infrequent. When they asked me for help, I was also stumped. On x86, this code should simply put a zero into the eax register and return. Crashing should be impossible[1](#note1), leading everyone to assume the stack traces were inaccurate, as they often are.

I had recently reviewed the project settings and noticed we were compiling with -O0, disabling optimizations. This made me suspect that Automatic Reference Counting (ARC) was performing unnecessary retains. My suspicion proved correct: otool -Vt showed that ARC transformed our straightforward return NO; into this:

```
-[SomeOutlineViewDelegeate outlineView:isGroupItem:]:
00000001001bfdb0 pushq %rbp
00000001001bfdb1 movq %rsp, %rbp
00000001001bfdb4 subq $0x30, %rsp
00000001001bfdb8 leaq -0x18(%rbp), %rax
00000001001bfdbc movq %rdi, -0x8(%rbp)
00000001001bfdc0 movq %rsi, -0x10(%rbp)
00000001001bfdc4 movq $0x0, -0x18(%rbp)
00000001001bfdcc movq %rax, %rdi
00000001001bfdcf movq %rdx, %rsi
00000001001bfdd2 movq %rcx, -0x30(%rbp)
00000001001bfdd6 callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfddb leaq -0x20(%rbp), %rdi
00000001001bfddf movq $0x0, -0x20(%rbp)
00000001001bfde7 movq -0x30(%rbp), %rsi
00000001001bfdeb callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfdf0 leaq -0x20(%rbp), %rdi
00000001001bfdf4 movabsq $0x0, %rsi
00000001001bfdfe movl $0x1, -0x24(%rbp)
00000001001bfe05 callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfe0a movabsq $0x0, %rsi
00000001001bfe14 leaq -0x18(%rbp), %rax
00000001001bfe18 movq %rax, %rdi
00000001001bfe1b callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfe20 movb $0x0, %r8b
00000001001bfe23 movsbl %r8b, %eax
00000001001bfe27 addq $0x30, %rsp
00000001001bfe2b popq %rbp
00000001001bfe2c retq
00000001001bfe2d nopl (%rax)
```

That’s a lot of code! ARC functions by generating numerous retains and releases (hidden within objc_storeStrong()), relying on a specialized optimization pass to remove extraneous operations and leave only the essential retains/releases. Enabling the “standard” -Os optimization results in this much more sensible output:

```
-[WLTaskListsDataSource outlineView:isGroupItem:]:
00000001000e958a pushq %rbp
00000001000e958b movq %rsp, %rbp
00000001000e958e xorl %eax, %eax
00000001000e9590 popq %rbp
00000001000e9591 retq
```

That’s more like it!
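To make the unoptimized listing easier to follow, here is a rough C model of what those four objc_storeStrong() calls amount to. The store-strong logic follows the semantics described in the Clang ARC documentation (retain the new value, store it, release the old one); everything else, including the Object struct, the counters, and the function names, is made up for illustration and is not the runtime's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an Objective-C object: just a reference count. */
typedef struct {
    int retain_count;
} Object;

/* Counters so the churn can be observed; not part of any real runtime. */
int retain_calls = 0;
int release_calls = 0;

void object_retain(Object *obj) {
    if (obj == NULL) return;            /* nil is ignored */
    obj->retain_count++;
    retain_calls++;
}

void object_release(Object *obj) {
    if (obj == NULL) return;            /* nil is ignored */
    assert(obj->retain_count > 0);
    obj->retain_count--;
    release_calls++;
}

/* Modeled on the documented semantics of objc_storeStrong():
   retain the new value, store it, then release the old value. */
void store_strong(Object **location, Object *value) {
    Object *previous = *location;
    if (value == previous) return;
    object_retain(value);
    *location = value;
    object_release(previous);
}

/* Model of the -O0 method body: each object argument is copied into a
   strong local on entry and cleared again on exit, in the same order as
   the four calls in the disassembly. */
int is_group_item_unoptimized(Object *outline_view, Object *item) {
    Object *outline_view_local = NULL;
    Object *item_local = NULL;
    store_strong(&outline_view_local, outline_view);  /* retain */
    store_strong(&item_local, item);                  /* retain */
    int result = 0;                                   /* return NO; */
    store_strong(&item_local, NULL);                  /* release */
    store_strong(&outline_view_local, NULL);          /* release */
    return result;
}
```

In this model a call that only returns NO still performs two retains and two releases on its arguments, so the method touches both objects even though it never uses them.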

The cause of the crashes remains unclear, as all involved objects appeared normal in the debugger. However, we’ve removed the confusion of “impossible” crashes, increasing our chances of actually debugging the issue.

Another concern is performance. I benchmarked this equivalent program:

```
#import <Foundation/Foundation.h>

@interface Hi : NSObject {}
- (BOOL)doSomething:arg1 with:arg2;
@end

@implementation Hi
- (BOOL)doSomething:arg1 with:arg2
{
    return NO;
}
@end

int main(int argc, char *argv[])
{
    Hi *hi = [Hi new];
    for (int i = 0; i < 100000000; i++) {
        [hi doSomething:hi with:hi];
    }
    return 0;
}
```

On my 13" MacBook Pro, it takes approximately 0.5 seconds with ARC off and 13 seconds with ARC on. This 26x slowdown creates a non-obvious performance model that’s hard to predict and manage. One of Objective-C’s strengths was its straightforward performance model, allowing for relatively fast code with minimal effort, despite some inherently slower aspects of the language.

Relying solely on optimizer writers worries me. Historically, this approach hasn’t worked well for me. With ARC, the optimizer may not recognize when a retain/release is unnecessary, requiring you to sprinkle __unsafe_unretained qualifiers in specific places (not many, but identifying them is key).

Effective optimization has always needed human guidance alongside automated assistance; “trust the compiler” doesn’t inspire confidence. My concern is amplified by increasingly erratic compiler optimizations. For example, clang sees no issue in producing two different values when reading through two pointers that compare equal, with no store between the two reads (source: http://blog.regehr.org/archives/767):

```
#include <stdio.h>
#include <stdlib.h>

int main() {
  int *p = (int*)malloc(sizeof(int));
  int *q = (int*)realloc(p, sizeof(int));
  *p = 1;
  *q = 2;
  if (p == q)
    printf("%d %d\n", *p, *q);
}
```

This code, tested with clang-600.0.34.4 on my machine, outputs a nonsensical “1 2”. I’ve encountered more such examples, discussed in my post cc -Osmartass. Swift exacerbates this with its expensive default semantics and reliance on the compiler to rectify inefficiencies.
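Returning to the realloc example for a moment: for contrast, here is a minimal sketch of the same allocation pattern that only ever touches the pointer returned by realloc(), so the result does not hinge on what the optimizer assumes about the stale pointer p. The function name is hypothetical, and the error handling is my addition rather than part of the original example.

```c
#include <stdlib.h>

/* Hypothetical helper: performs the same two stores as the example above,
   but only through the pointer that realloc() returned.
   Returns the value read back, or -1 if allocation fails. */
int last_value_after_realloc(void) {
    int *p = malloc(sizeof(int));
    if (p == NULL) return -1;

    int *q = realloc(p, sizeof(int));
    if (q == NULL) {
        free(p);            /* on failure the original block is still valid */
        return -1;
    }

    /* From here on only q is used; p is treated as dead. */
    *q = 1;
    *q = 2;

    int result = *q;
    free(q);
    return result;
}
```

Written this way there is only one live name for the storage, which leaves the optimizer nothing to disagree with itself about.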

Based on my observations and tests, this approach can lead to performance discrepancies exceeding a 100x factor between normal and -Ofast optimized builds. This is unacceptable and makes code comprehension and optimization significantly harder. We might end up examining assembly more frequently when optimizing Swift than we did with Objective-C, questioning why the optimizer failed to handle specific code sections.

I recall the “Java optimization” WWDC sessions where we were encouraged to embrace Java. Essentially, we were given a model of HotSpot’s JIT optimizer capabilities. Optimizing code required understanding the generated code, the optimizer’s limited capabilities, and translating that back to the source code. It seemed easier to directly write the desired assembly or use portable Macro Assembler, or even an object-oriented version.

[1]: Well, it could crash if the stack had already reached its limit.
