Lazy Loading in SqlAlchemy

I’ve been exploring SQLAlchemy’s Object-Relational Mapping (ORM) capabilities, and it seems quite powerful, similar to .Net’s Entity Framework. Even though I usually don’t delve deep into architectural patterns like DDD or Hexagonal, I believe that striving for Persistence Ignorance, even in simpler projects, is beneficial. The documentation advices using the Declatative Mapping style suggests using imperative mapping, which is what I’ve been doing.

One of the compelling aspects of ORMs is their handling of related entities. Consider a scenario with a “Blog” entity and its associated “Post” entities. Often, it’s more efficient to retrieve the posts only when explicitly accessed, rather than loading them upfront with the blog. This technique is known as Lazy Loading. In .Net’s Entity Framework, properties referring to other entities are called “Navigation Properties.” Lazy Loading is accomplished through Proxy classes. Essentially, when a navigation property marked for lazy loading is accessed, the ORM dynamically creates a proxy object. This proxy inherits from the original entity class and intercepts the access to the navigation property, triggering a database query to load the related data. This mechanism is detailed further here: here.

To summarize, lazy loading is achieved by:

  1. Maintaining a reference to the database context within the proxy object.
  2. Overriding navigation properties to trigger data loading from the database using the stored context when accessed.

Crucially, the proxy object inherits from the entity class, necessitating virtual navigation properties and preventing the entity class from being sealed.

SQLAlchemy’s ORM also offers lazy loading. Given Python’s dynamic nature, where attributes can be added or removed at runtime, the need for proxy classes isn’t immediately apparent. I initially suspected that lazy loading might be implemented by overriding the __getattribute__ method within the entity class. However, the actual implementation is slightly different.

Let’s illustrate this with an example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
class Post:
    # Interestingly, SQLAlchemy doesn't call the __init__ method for entities
    def __init__(self, post_id, title, content):
        print("Post __init__")
        self.post_id = post_id
        self.title = title
        self.content = content

class Blog:
    def __init__(self, blog_id, url, title):
        print("Blog __init__")
        self.blog_id = blog_id
        self.url = url
        self.title = title
        self.posts: List[Post] = []

For this example, I’m utilizing imperative mapping and specifying the blog-to-posts relationship as lazy:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
metadata = MetaData()
mapper_registry = registry(metadata=metadata)

table_blog = Table(
    "Blogs",
    mapper_registry.metadata,
    Column("BlogId", Integer, primary_key=True, autoincrement=True),
    Column("Url", String),
    Column("Title", String),
)

table_post = Table(
    "Posts",
    mapper_registry.metadata,
    Column("PostId", Integer, primary_key=True, autoincrement=True),
    Column("BlogId", ForeignKey("Blogs.BlogId")),
    Column("Title", String),
    Column("Content", String),

)

def start_mappers():
    mapper_registry.map_imperatively(entities.Blog, table_blog,
      properties={
            "blog_id": table_blog.c.BlogId, 
            "url": table_blog.c.Url, 
            "title": table_blog.c.Title, 
            "posts": relationship(entities.Post, lazy="select")
            # The default value for relationship.lazy is "select", enabling lazy loading. 
        }
    )

    mapper_registry.map_imperatively(
        entities.Post,
        table_post,
        properties={
            "post_id": table_post.c.PostId, 
            "title": table_post.c.Title,
            "content": table_post.c.Content
        }      
    )

When this mapping function is executed, SQLAlchemy adds attributes to our entity classes. Before mapping, the Blog class primarily contains the __init__ method. However, after mapping, attributes of type sqlalchemy.orm.attributes.InstrumentedAttribute are added for each column defined in the mapping.

An intriguing observation is that SQLAlchemy doesn’t invoke the __init__ method when reconstructing objects from database rows. While the provided explanation old versions of SqlAlchemy might not be for version 2.0, it still holds true (the print statements within the __init__ methods are never executed). Essentially, the ORM bypasses __init__ and directly restores attributes on the instance, similar to Python’s pickle module.

Consequently, even though my Blog.__init__ initializes the posts attribute (the relationship property) to an empty list, this initialization is effectively skipped. This means that initially retrieved Blog instances lack the posts attribute. A freshly retrieved Blog instance looks like this, with no sign of the posts attribute:

{’_sa_instance_state’: , ’title’: ‘Deploy To Nenyures’, ‘blog_id’: 1, ‘url’: ‘deploytonenyures.blogspot.com’}

So, how does SQLAlchemy implement lazy loading? As mentioned earlier, Python’s attribute lookup process is sophisticated attribute lookup. When accessing myBlog.posts for the first time, Python won’t find posts in the instance’s dictionary. Instead, it locates posts within the Blog class as an InstrumentedAttribute—which happens to be a descriptor. My hypothesis is that the descriptor’s __get__ method is responsible for querying the database, retrieving the related Post entities, and adding them as an attribute to the Blog instance. Subsequently, any further access to myBlog.posts will find the loaded posts directly within the instance.

Here’s how the same Blog instance looks after the initial access to posts, triggering the lazy loading:

{’_sa_instance_state’: , ’title’: ‘Deploy To Nenyures’, ‘blog_id’: 1, ‘url’: ‘deploytonenyures.blogspot.com’, ‘posts’: [, , ]}

Licensed under CC BY-NC-SA 4.0
Last updated on Mar 15, 2024 18:13 +0100