GraalPy is a Python implementation

A few years back, I explored the intriguing world of GraalVM and the Truffle interpreter framework, mistakenly believing that the various Truffle language implementations (like those for Ruby and Python) held more theoretical value than practical use. I imagined their impact would be limited to a handful of Java applications seeking scripting capabilities or access to Python’s AI and scientific libraries. How wrong I was!

A recent dive into Truffle’s Python implementation, GraalPy, revealed a thriving and active project (mirroring TruffleRuby). GraalPy, a Python 3.10 implementation, acts as a drop-in substitute for conventional CPython when working exclusively with standard-library modules or pure-Python external modules. Compatibility, however, needs to be verified when using external extension modules.

GraalPy boasts impressive performance gains, claiming to outperform CPython by a factor of 3 or 4 in certain applications. While I won’t delve into benchmarking, I’d like to shed light on GraalPy’s inner workings.
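As a rough illustration of the kind of workload where such speedups are claimed, here is a small CPU-bound micro-benchmark that runs unchanged on CPython and GraalPy. The function and numbers are my own sketch, not from the GraalPy benchmarks; absolute timings will vary by machine and runtime:

```python
import timeit

def hot_loop(n: int) -> int:
    # A tight numeric loop -- the kind of hot path a JIT can specialize.
    total = 0
    for i in range(n):
        x = i * 0.5
        total += int(x * x) % 7
    return total

if __name__ == "__main__":
    elapsed = timeit.timeit(lambda: hot_loop(100_000), number=10)
    print(f"10 runs took {elapsed:.3f}s, result={hot_loop(100_000)}")
```

Running the same script with `python3` and with `graalpy` is the simplest way to compare the two runtimes on your own hardware.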

Built using the Truffle framework, GraalPy is more than just an interpreter. It comprises a Java-based interpreter paired with a high-performance JIT compiler (GraalVM). When GraalPy encounters a Python source file, it compiles it into a .pyc file (similar to CPython), containing Python bytecodes. The Graal interpreter then constructs an Abstract Syntax Tree (AST) from these bytecodes and proceeds to interpret them. As execution progresses, the interpreter code (in Java bytecodes) undergoes specialization/partial evaluation, optimizing the interpretation of the AST. This optimized code is then fed to the Graal Compiler, generating highly efficient native code. Finally, the magic of advanced JIT optimizations and deoptimizations, developed by brilliant minds, kicks in, resulting in a high-performance interpreter!
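The .pyc step can be observed with CPython’s own tooling; GraalPy caches bytecode analogously, though its bytecode format and cache layout are its own. This sketch only demonstrates the CPython-style mechanism:

```python
import os
import py_compile
import tempfile

# Write a tiny module and compile it to a .pyc file, mirroring the
# bytecode-caching step that both CPython and GraalPy perform before
# interpreting a source file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("def greet():\n    return 'hello'\n")
    pyc_path = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
    compiled_ok = os.path.exists(pyc_path)
print("bytecode file created:", compiled_ok)
```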

While this understanding took root two years ago, GraalPy’s innovative use of Native Image adds another layer of intrigue. The Graal JIT compiler, written in Java, can be compiled into native code itself. The GraalVM documentation explains:

There are two operating modes of the Graal compiler when used as the HotSpot JIT compiler: as pre-compiled machine code (“libgraal”), or as dynamically executed Java bytecode (“jargraal”).

libgraal: the Graal compiler is compiled ahead-of-time into a native shared library. In this operating mode, the shared library is loaded by the HotSpot VM. The compiler uses memory separate from the HotSpot heap. It runs fast from the start since it does not need to warm up. This is the default and recommended mode of operation.

jargraal: the Graal compiler goes through the same warm-up phase that the rest of the Java application does. That is, it is first interpreted before its hot methods are compiled. This mode is selected with the -XX:-UseJVMCINativeLibrary command line option.

This is where it gets interesting. The Graal compiler, capable of both JIT and AOT compilation, can be integrated into your JVM process as a native library (.so, .dll, etc.) or a Java JAR. AOT compilation is a prerequisite for native library integration. When included as a Java JAR, it undergoes initial interpretation followed by JIT compilation (for frequently executed methods) by the less-optimized C1 compiler. The documentation elaborates:

The primary benefit of libgraal is that compilations are fast from the start. This is because the compiler is running compiled from the get-go, by-passing the HotSpot interpreter altogether. Furthermore, it’s compiled by itself. By contrast, jargraal is compiled by C1. The result is that the compiled code of the compiler is more optimized with libgraal than with jargraal.

GraalPy offers a choice between a “Native Standalone” version and a standard Java version. Opting for the Native Standalone version entails more than just utilizing the native libgraal. The Java code for both the Truffle interpreter and its dependent Java standard libraries is compiled to native code. The GraalPy documentation states:

You can download GraalPy as a standalone distribution for Oracle GraalVM or GraalVM Community Edition. There are two standalone types to choose from:

Native Standalone: This contains a Native Image compiled launcher
JVM Standalone: This contains Python in the JVM configuration

When seeking a drop-in CPython replacement for enhanced performance, the Native Standalone version comes highly recommended. An interesting quick reference explains:

The native runtime is most compatible with CPython. It runs as an ahead-of-time compiled Python launcher, which starts up faster and uses less memory.

The JVM runtime can interoperate with Java and other GraalVM languages can be added to it. You can identify it by the -jvm suffix in its name.

A simple unzip of the “Native Standalone” version on Ubuntu presents a launcher app (graalpy), a directory housing Python 3.10 modules, and a hefty 350MB libpythonvm.so shared library. This library presumably contains SubstrateVM along with the natively compiled Truffle interpreter and its required Java Standard Library components.
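A quick way to check which implementation a given launcher actually is, useful when a `graalpy` binary sits next to a system `python3`. This runs on any implementation; as far as I know GraalPy reports `graalpy` here, while CPython reports `cpython`:

```python
import platform
import sys

# Identify the running Python implementation and version.
# CPython reports "cpython"; GraalPy reports "graalpy".
name = sys.implementation.name
print(f"implementation: {name}, version: {platform.python_version()}")
```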

Now, the crux of the matter: If Truffle’s power lies in optimizing specialized interpreter sections using the Graal JIT compiler, what’s the point if the interpreter is already AOT compiled to native code? This very question, which has puzzled others as well, appeared on Stack Overflow, along with a very interesting explanation:

- Question
My understanding is that AOT compilation with native-image will result in methods compiled to native code that are run in the special-purpose SubstrateVM. Also, that the Truffle framework relies on dynamically gathered profiling information to determine which trees of nodes to partially evaluate. And that PE works by taking the JVM bytecode of the nodes in question and analyzing it with the help of the Graal JIT compiler. And here’s where I’m confused. If we pass a Truffle interpreter through native-image, the code for each node’s methods will be native code. How can PE proceed, then? In fact, is Graal even available in SubstrateVM?

- Answer
Besides the native code of the interpreter, Substrate VM also stores in the image a representation of the interpreter (a group of methods that conform the interpreter) for partial evaluation. The format of this representation is not JVM bytecodes, but the graphs already parsed into Graal IR form. PE runs on these graphs producing even smaller, optimized graphs which are then fed to the Graal compiler, so yes SVM ships the Graal compiler as well in the native image. Why the Graal graphs and not the bytecodes? Bytecodes were used in the past, but storing the graphs directly saves the (bytecodes to Graal IR) parsing step.

This response is truly remarkable. It reveals that SubstrateVM, used in Native Image applications, goes beyond memory management and garbage collection. It also incorporates the Graal Compiler, which can be invoked through the JVM Compiler Interface (JVMCI) using not just Java bytecodes but Graal IR graphs directly.

Initially, I was ecstatic, assuming GraalPy would finally overcome CPython’s notorious Global Interpreter Lock (GIL) bottleneck. Sadly, that’s not the case just yet. GraalPy developers, participating in the discussion on eliminating the GIL from CPython, clarified that they too employ a GIL to maintain compatibility with C extensions. This is disappointing, especially considering TruffleRuby’s success in shedding the GIL, unlike its C-based counterpart, YARV.
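Because GraalPy also keeps a GIL, pure-Python threading behaves much like CPython’s: CPU-bound threads interleave rather than run in parallel, and compound operations on shared state still need explicit locking. A minimal sketch that runs the same on either runtime:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n: int) -> None:
    # The GIL serializes bytecode execution, but a read-modify-write like
    # "counter += 1" is still not atomic, so an explicit lock keeps the
    # final count deterministic.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final count:", counter)  # 4 threads x 10,000 increments = 40000
```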

While researching Truffle interpreters, including GraalPy, I came across references to PyPy, a JIT-based Python implementation. Comparisons were drawn between its Tracing JIT approach and Truffle’s Partial Evaluation method. Interestingly, the GraalPy and PyPy teams are now collaborating on HPy, aiming to create a superior C extension API for Python.


Licensed under CC BY-NC-SA 4.0
Last updated on Feb 19, 2023 14:33 +0100