To prevent Stack Overflows, utilize async-await in Python

In a previous article, I discussed preventing stack overflows in JavaScript’s deeply recursive functions using async functions and await 'nothing'. This technique leverages JavaScript’s ability to await anything, pausing execution and queuing a “continuation” function to resume after the awaited Promise resolves.

Python’s asyncio presents a different scenario. While JavaScript’s async functions consistently return Promises, Python offers various awaitables like coroutines, Tasks, and Futures. It’s important to note that Python coroutines, defined with the async keyword, don’t return Futures directly. They suspend and resume within await calls, ultimately returning a value, unlike JavaScript’s Promise-based resolution.

A Python coroutine suspends when the last coroutine in the call stack explicitly yields control using yield, returning a Future or None. This Future, upon reaching the event loop, sets a callback (via add_done_callback) to resume the coroutine. This behavior, as highlighted in This answer, clarifies that await in Python doesn’t automatically yield to the event loop but rather initiates the awaitable, potentially leading to suspension.

Numerous resources delve into the implementation of Python coroutines using generators, yield from, Tasks, and Futures. Resources like https://tenthousandmeters.com/blog/python-behind-the-scenes-12-how-asyncawait-works-in-python, this, and https://leimao.github.io/blog/Python-AsyncIO-Awaitable-Coroutine-Future-Task provide valuable insights. A crucial takeaway is the role of a controlling Task in a chain of coroutine calls, either explicitly created with asyncio.create_task or implicitly by the event loop. As explained in the “tenthousandmeters” article, task.__step() manages the coroutine’s execution by adding a callback to the Future, effectively scheduling the coroutine’s resumption.

Considering these distinctions, I attempted to adapt the stack overflow prevention from JavaScript to Python. Initially, I replaced "await 'whatever'" with "await asyncio.sleep(0)" hoping to relinquish control to the event loop and prevent stack overflow in recursive scenarios.

The initial Python code:

1
2
3
4
5
6
7
8
9
def build_message(msg, counter):
    """
    Normal recursion. Will overflow the stack at around iteration 1000
    """
    if counter == 0:
        return msg
    # print(f"{counter}")
    msg += "a"
    return build_message(msg, counter-1)

The async version, which fails with a stack overflow:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
async def async_build_message_KO(msg, counter):
    """
    Try to prevent the stackoverflow by making it a coroutine and suspending it via asyncio.sleep
    It does not work
       """
    if counter == 0:
        return msg
    sleeper = asyncio.sleep(0.001)
    print(f"{msg[0]} - {counter}")
    # print(f"{counter} - {type(sleeper).__name__}")
    await sleeper

    msg += "a"
    return await async_build_message_KO(msg, counter-1)

Despite introducing asyncio.sleep(0.001) to force suspension, the code still encountered a stack overflow after around 1000 iterations. This unexpected behavior led me to research further, discovering a solution using asyncio.create_task as shown in I found a solution. This approach, demonstrated below, successfully prevents stack overflows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
async def async_build_message_OK(msg, counter):
    if counter == 0:
        return msg
    print(msg[0])
    # It works fine, both if implemented as Tail recursion:
    # msg += "a"
    # return await asyncio.create_task(async_build_message_2(msg, counter-1))
    
    # or as Non Tail recursion:
    msg = await asyncio.create_task(async_build_message_OK(msg, counter-1))
    msg += "a"
    return msg

This revised code effectively handles deep recursion without stack overflow, whether employing tail recursion or not (although Python doesn’t optimize tail calls).

The reason the initial attempt failed lies in the way Python manages coroutines and their stack frames. Python’s interpreter keeps track of function calls using a stack of stack frame objects. These stack frames reside in the heap, and while CPython likely uses a traditional stack internally, it manages the Python stack frames as a list with a finite size.

Furthermore, Python generators, which form the foundation of native coroutines, encapsulate both a stack frame and a code object. As explained in a very interesting way, the stack frame, allocated on the heap, stores a pointer to the last executed bytecode instruction, guiding the interpreter upon resumption. This structure means each coroutine has an associated stack frame. Even with await asyncio.sleep(), the recursive calls create a growing stack of coroutines and their stack frames, ultimately leading to overflow.

The success of the create_task approach stems from its ability to avoid a linear stack of coroutines. By wrapping each recursive call in a Task, we create a chain of Tasks in the heap, each managing its own coroutine. These Tasks are interconnected through callbacks established using add_done_callback. When a coroutine finishes, it sets the result of its controlling Task, prompting the event loop to schedule the next Task in the chain, effectively mimicking the asynchronous behavior achieved through Promises in JavaScript.

In essence, both the JavaScript solution from my previous article and the Python solution using asyncio.create_task achieve the same goal: preventing stack overflows by avoiding a direct, synchronous call stack. JavaScript accomplishes this implicitly through Promises returned by async functions, while Python requires explicit Task creation to achieve a similar effect.

Interestingly, JavaScript’s generator implementation, as detailed in explained here, closely resembles Python’s approach. By maintaining an ExecutionContext object and a code pointer, JavaScript generators can pause and resume execution, preserving their state between invocations. Similarly, JavaScript’s async/await mechanism, explained in seems also to be based on generators, utilizes an iterator object to manage the generator’s execution context and track its paused state.

Both Python and JavaScript’s implementations offer an intriguing contrast to .NET’s approach, where generators (referred to as “iterators”) rely heavily on compiler transformations to create state machine-like structures. Understanding these nuanced differences across languages provides valuable insights into the inner workings of asynchronous programming and stack management.

a post