Performance of XML on iPhone

Soon after I started developing for iPhone, I came across a neat piece of sample code named XML Performance (login needed). Having some prior experience with high-performance XML processing code, I was immediately curious.

This example compares Cocoa’s NSXMLParser with a custom parser built on libxml2, using the download of a top 300 songs chart from iTunes as a benchmark.

Enhanced Responsiveness with libxml2 over NSXMLParser

My previous work led me to believe that libxml2 would be considerably faster. However, I assumed this speed advantage would matter less as I/O data rates decreased (WiFi to 3G to EDGE), because I/O time would ultimately dwarf processing time. Boy, was I wrong!

Although my assumptions about overall performance were technically accurate, I completely overlooked responsiveness. The NSXMLParser sample would appear to freeze for 3 to 50 seconds before displaying any results, depending on the network. This is obviously a terrible user experience. Conversely, the libxml2 example began presenting results almost instantly. While it was also slightly faster overall, this difference seemed trivial compared to the continuous flow of results it provided.

The key difference here is incremental processing. While NSXMLParser’s -initWithContentsOfURL: method seems to download the entire document before processing it, the libxml2-based code in the sample downloads and processes the XML in small chunks.
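To make the idea of incremental processing concrete, here is a minimal sketch in plain C. It is not code from the sample and it is not a real XML parser: `ItemScanner`, `scanner_init`, and `scanner_feed` are hypothetical names, and the scanner only looks for closing `</item>` tags. What it shows is the essential property: state survives between chunks, so results can be reported as soon as the bytes that complete them arrive, even when a tag is split across two network reads. In real code, libxml2's push interface (`xmlCreatePushParserCtxt` and `xmlParseChunk`) plays this role.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: counts closing </item> tags in data that
   arrives in arbitrary chunks. All names here are made up. */
typedef struct {
    size_t matched;  /* bytes of "</item>" matched so far, kept across chunks */
    int    items;    /* complete items seen so far */
} ItemScanner;

static const char kCloseTag[] = "</item>";

void scanner_init(ItemScanner *s)
{
    assert(s != NULL);
    s->matched = 0;
    s->items = 0;
}

/* Feed one chunk of bytes; returns how many items this chunk completed. */
int scanner_feed(ItemScanner *s, const char *chunk, size_t len)
{
    assert(s != NULL && chunk != NULL);
    int before = s->items;
    for (size_t i = 0; i < len; i++) {
        char c = chunk[i];
        if (c == kCloseTag[s->matched]) {
            s->matched++;
            if (s->matched == sizeof kCloseTag - 1) {
                s->items++;      /* a result is available right now */
                s->matched = 0;
            }
        } else {
            /* '<' occurs only at the start of the pattern, so on a
               mismatch we either restart at 1 or at 0. */
            s->matched = (c == kCloseTag[0]) ? 1 : 0;
        }
    }
    return s->items - before;
}
```

A caller would invoke `scanner_feed` once per block of bytes received from the network and update the UI whenever it returns a nonzero count, instead of waiting for the whole download to finish.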

However, using libxml2 has its downsides. The libxml2 code is about twice as long as the NSXMLParser code, at roughly 150 lines (excluding comments and whitespace). If you’ve worked with NSXMLParser before, you know it’s already quite cumbersome. Now imagine double the “fun” with 150 lines of code for a rudimentary parser that handles only 5 tags. Luckily, there’s a better approach.

A Simpler Solution: Objective-XML’s SAX

If you’ve already built a Cocoa-(Touch-)based parser using NSXMLParser, simply integrate Objective-XML into your project and swap out NSXMLParser for MPWSAXParser. Everything else will function identically, except significantly faster (even surpassing libxml2) and with improved responsiveness on slower connections thanks to incremental processing.

It’s worth noting that Objective-XML, like NSXMLParser, lacked incremental processing until recently. This was a silly oversight on my part, stemming from my failure to take into account that latency lags bandwidth. This has been rectified, and both MPWMAXParser and MPWSAXParser now offer URL-based parsing methods that process their input incrementally.

In essence, Objective-XML acts as a drop-in replacement for NSXMLParser, providing the performance and responsiveness of libxml2 without the coding nightmare.

Even Simpler: Messaging API for XML (MAX)

However, even a Cocoa implementation of the SAX API isn’t exactly the epitome of coding simplicity. Objective-XML offers MAX, an API designed to streamline this process. MAX seamlessly integrates XML processing with Objective-C messaging through these two key features:

  • Clients receive element-specific messages for processing
  • The parser manages nesting, which is controlled by the client

Consider this code snippet for constructing Song objects from iTunes item elements, which illustrates these two features:

    -itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
    {
      Song *song=[[Song alloc] init];
      [song setArtist:[children objectForTag:artist_tag]];
      [song setAlbum:[children objectForTag:album_tag]];
      [song setTitle:[children objectForTag:title_tag]];
      [song setCategory:[children objectForTag:category_tag]];
      [song setReleaseDate:[parseFormatter dateFromString:[children objectForTag:releasedate_tag]]];
      [self parsedSong:song];
      [song release];
      return nil;
    }

When MAX encounters a complete item element, it sends the -itemElement:attributes:parser: message to its client. This eliminates the need for clients to perform string processing on tag names or manage partial state, as SAX parsers require. The method builds a Song object using data from the element’s child elements and passes it directly to the rest of the app through the parsedSong: message. It returns nil, so MAX won’t build a tree at this level.

Artist, album, title, and category are values of child elements nested within the item element. All of these child elements share the same processing code, which simply returns the character content of the element:

    -defaultElement:children attributes:atrs parser:parser
    {
        return [[children combinedText] retain];
    }

Unlike the previous method, this one returns an object rather than nil. MAX uses this value to construct a DOM-like structure, which the next higher level consumes, in this case the -itemElement:attributes:parser: method shown earlier. Unlike a traditional DOM, the MAX tree structure consists of domain-specific objects returned incrementally by the client.

These code examples demonstrate MAX’s ability to function as both a DOM and SAX parser, controlled simply by whether the processing methods return objects (DOM) or not (SAX). They also showcase both element-specific and generic processing.
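The return-value convention can be illustrated with a small sketch. This is not Objective-XML's implementation; it is a plain C stand-in, written so that it compiles without the library, and every name in it (`Frame`, `Handler`, `end_element`, `default_handler`, `item_handler`, `songs_delivered`) is hypothetical. It assumes a parser that calls a handler whenever an element is complete: a non-NULL result is stored with the parent (DOM-like), while a NULL result leaves nothing behind (SAX-like).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* One element being processed: its text plus whatever the handlers of
   its child elements returned. Hypothetical structure. */
typedef struct {
    const char *text;
    const void *children[MAX_CHILDREN];
    size_t      count;
} Frame;

/* A handler is called once its element is complete. */
typedef const void *(*Handler)(const Frame *self);

/* What the assumed parser does at each end tag. */
void end_element(Frame *parent, const Frame *element, Handler handler)
{
    assert(parent != NULL && element != NULL && handler != NULL);
    const void *result = handler(element);
    if (result != NULL && parent->count < MAX_CHILDREN) {
        parent->children[parent->count++] = result;  /* DOM-like: keep it */
    }
    /* NULL result: SAX-like, nothing is retained for the parent. */
}

/* Generic leaf handler: hands the element's text up to the parent. */
const void *default_handler(const Frame *self)
{
    return self->text;
}

static int songs_delivered = 0;

/* Element-specific handler: consumes its children, passes the result to
   the application (here just a counter), and returns NULL. */
const void *item_handler(const Frame *self)
{
    (void)self;
    songs_delivered++;
    return NULL;
}
```

With this arrangement the choice between tree building and streaming is made per element by the client code, which mirrors how the two Objective-C methods above cooperate.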

In the iTunes Song parsing example, I could create a MAX parser using roughly half the code required for the NSXMLParser example, a ratio I’ve observed in larger projects too. As for performance, it’s slightly better than MPWSAXParser, making it faster than libxml2 and significantly faster than NSXMLParser.

Summary and Conclusion

The somewhat misleadingly titled “XML Performance” sample code for iPhone highlights the importance of managing latency for perceived end-user performance, while revealing little about actual XML processing speed.

The sample code effectively demonstrates NSXMLParser’s performance limitations. However, its proposed solution of using libxml2 is not ideal due to the significant increase in code complexity. Objective-XML provides a two-pronged solution: a drop-in replacement for NSXMLParser that matches libxml2’s performance and latency benefits, and a new API that’s not only faster but also much more straightforward than both NSXMLParser and libxml2.

Licensed under CC BY-NC-SA 4.0
Last updated on Aug 08, 2022 01:39 +0100