In our pursuit of creating a lightweight programming language within the C++ framework, we first built a tokenizer three weeks ago, followed by implementing expression evaluation over the subsequent two weeks.
We’ve now reached the stage where we can finalize and present a complete programming language. While it may not be as feature-rich as a fully developed language, it will encompass all the essential components, including a remarkably small memory footprint.
I’m often amused by how newly established companies structure the FAQ sections on their websites. Instead of addressing common user queries, they answer questions they wish people would ask. I’m going to adopt the same approach here. People who follow my work frequently inquire about why Stork doesn’t compile to bytecode or at least an intermediate language.
Why Doesn’t Stork Compile to Bytecode?
I’m more than happy to address this question. My objective was to create a scripting language with a small memory footprint that could be seamlessly integrated with C++. While I don’t have a rigid definition of “small-footprint,” I envision a compiler compact enough to be easily portable to less powerful devices, consuming minimal memory during execution.

Speed wasn’t my primary focus, as time-critical tasks are best handled in C++. However, if extensibility is needed, a language like Stork could prove valuable.
I’m not claiming that there aren’t other, potentially superior, languages capable of accomplishing similar tasks (take Lua, for example). It would be a shame if none existed. I simply aim to give you a clear understanding of this language’s intended use case.
Given its embedded nature within C++, leveraging existing C++ features instead of building an entirely new ecosystem for the same purpose feels much more practical. Moreover, I find this approach significantly more engaging.
As always, the complete source code can be found on my GitHub page. Now, let’s delve into the progress we’ve made.
Changes
Previously, Stork was incomplete, making it difficult to identify all its shortcomings. However, as it has taken a more solidified form, I’ve made the following adjustments to elements introduced in earlier parts:
- Functions are no longer treated as variables. A separate function_lookup is now present within compiler_context. To avoid any ambiguity, function_param_lookup has been renamed to param_lookup.
- The mechanism for calling functions has been modified. The runtime_context now includes a call method that accepts a std::vector of arguments, stores the previous return value index, pushes the arguments onto the stack, updates the return value index, invokes the function, pops the arguments from the stack, restores the original return value index, and finally returns the result. This eliminates the need to maintain a separate stack for return value indices, as we can now utilize the C++ stack for this purpose.
- RAII classes have been incorporated into compiler_context, returned by calls to its member functions scope and function, respectively. In their constructors, these objects generate new instances of local_identifier_lookup and param_identifier_lookup, subsequently restoring the previous state within their destructors.
- An RAII class has been introduced in runtime_context, returned by the get_scope member function. This object stores the stack size upon construction and restores it in its destructor.
- The const keyword and constant objects have been removed due to their non-essential nature, although they could potentially be beneficial.
- The var keyword has been eliminated as it is currently unnecessary.
- A sizeof keyword has been added to determine array sizes during runtime. While this naming choice might seem unconventional to C++ programmers, as sizeof in C++ operates during compile time, I opted for this keyword to prevent conflicts with common variable names like size.
- A tostring keyword has been included for explicit conversion to the string type. It can’t be implemented as a function due to our restriction against function overloading.
- Various minor modifications have been made.
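To make the new calling sequence more concrete, here is a minimal sketch of the save, push, invoke, pop, and restore steps described for the call method. It is not Stork's actual code: mini_context, its stack of plain doubles, the reverse push order, and the param and retval helpers are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, heavily simplified stand-in for runtime_context.
// Values are plain doubles; every name here is illustrative only.
class mini_context {
public:
    using function = std::function<void(mini_context&)>;

    double call(const function& f, const std::vector<double>& args) {
        assert(static_cast<bool>(f));  // calling an empty function is a bug

        // The caller's return value index lives in a C++ local for the
        // duration of the call, so no separate index stack is required.
        const std::size_t old_retval_idx = retval_idx_;
        const std::size_t base = stack_.size();

        // Push arguments in reverse so that param(0) sits just below the
        // return value slot (an assumption of this sketch).
        for (std::size_t i = args.size(); i > 0; --i) {
            stack_.push_back(args[i - 1]);
        }

        // The return value slot goes on top of the arguments.
        retval_idx_ = stack_.size();
        stack_.push_back(0.0);

        f(*this);

        const double result = stack_[retval_idx_];
        stack_.resize(base);           // pop the return slot and the arguments
        retval_idx_ = old_retval_idx;  // restore the caller's index
        return result;
    }

    double& retval() { return stack_[retval_idx_]; }
    double param(std::size_t i) const { return stack_[retval_idx_ - 1 - i]; }
    std::size_t stack_size() const { return stack_.size(); }

private:
    std::vector<double> stack_;
    std::size_t retval_idx_ = 0;
};
```

Because the previous index is an ordinary local variable, nested and recursive calls each keep their own copy on the C++ stack, which is the point made in the list above.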
Syntax
Given the close resemblance of our syntax to C and its derivative languages, I’ll focus on the less obvious aspects.
Variable type declarations follow this structure:
- void, used exclusively for function return types
- number
- string
- T[] represents an array containing elements of type T
- R(P1,...,Pn) signifies a function that returns type R and accepts arguments of types P1 to Pn. An & prefix to these types indicates pass-by-reference.
Function declaration adheres to this format: [public] function R name(P1 p1, … Pn pn)
It’s mandatory to prefix functions with the keyword function. The public prefix enables calls to the function from C++. If a function doesn’t return a value, the call yields the default value of its return type.
We support for-loops with declarations within the first expression. Similar to C++17, if-statements and switch-statements with an initialization expression are also permitted. An if-statement begins with an if-block, potentially followed by multiple elif-blocks, and optionally concludes with an else-block. Variables declared within the initialization expression of the if-statement are accessible in all its blocks.
An optional number following a break statement allows exiting multiple nested loops. For instance, consider the code:
In this case, the break 2 statement will exit both loops. This number is validated during compilation. Quite impressive, isn’t it?
Compiler
This iteration introduces numerous features, but delving into excessive detail might deter even the most dedicated readers. Therefore, I’ll intentionally omit a substantial aspect – the compilation process.
Compilation was covered extensively in the first and second parts of this blog series. The focus then was on expressions, but the compilation of other elements doesn’t differ significantly.
Let’s illustrate with an example of compiling while statements:
As you can observe, it’s fairly straightforward. The code parses the while keyword, followed by (, then constructs a number expression (we don’t utilize booleans), and finally parses ).
Subsequently, it compiles a block statement, which may or may not be enclosed within { and } (single-statement blocks are allowed), ultimately creating a while statement.
The first two function arguments should be familiar. The third, possible_flow, indicates the permissible flow-changing commands (continue, break, return) within the current parsing context. While this information could be stored within an object if compilation statements were member functions of a compiler class, I prefer to avoid excessively large classes, which the compiler would inevitably become. Passing an additional argument, especially a small one, is harmless, and might even enable code parallelization in the future.
Another noteworthy aspect of the compilation process is how function calls are handled.
To support scenarios where two functions call each other, we can either adopt the C approach of forward declaration or employ two compilation phases.
I opted for the latter. Upon encountering a function definition, we parse its type and name into an object called incomplete_function and skip the body without interpreting it, simply counting curly brace nesting levels until the brace that opened the body is closed. Tokens encountered during this process are collected and stored within incomplete_function, and a function identifier is added to the compiler_context.
Once the entire file is parsed, each function is fully compiled, allowing them to call any other function within the file and access global variables.
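The body-skipping step of the first phase can be sketched as follows. This is only an illustration under simplifying assumptions: tokens are plain strings, and collect_body is a hypothetical helper rather than a function from the Stork sources.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type: the real tokenizer produces richer objects.
using token = std::string;

// Removes tokens from the front of `input`, starting at the opening '{',
// and collects them until the matching '}' has been consumed.
// Nothing inside the body is interpreted; only the nesting depth is tracked.
std::vector<token> collect_body(std::deque<token>& input) {
    if (input.empty() || input.front() != "{") {
        throw std::runtime_error("expected '{' at the start of a function body");
    }
    std::vector<token> body;
    int depth = 0;
    while (!input.empty()) {
        token t = std::move(input.front());
        input.pop_front();
        if (t == "{") {
            ++depth;
        } else if (t == "}") {
            --depth;
        }
        body.push_back(std::move(t));
        if (depth == 0) {
            return body;  // the brace that opened the body is now closed
        }
    }
    throw std::runtime_error("unexpected end of input inside a function body");
}
```

The collected tokens can then be replayed through the regular statement compiler in the second phase, once every function identifier is known.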
However, global variables can be initialized by calls to these functions, potentially leading to the classic “chicken and egg” problem if the functions access uninitialized variables.
In such cases, the issue is resolved by throwing a runtime_exception – a relatively lenient approach considering access violation is a more fitting consequence for such code.
The Global Scope
Two types of entities can exist in the global scope:
- Global variables
- Functions
Global variables can be initialized with an expression returning the correct type. Each global variable is provided with an initializer.
Because each initializer returns an lvalue, it acts as a constructor for its global variable. When no expression is specified for a global variable, a default initializer is created.
This is the initialize member function within runtime_context:
It is called from the constructor. Because it first clears the global variable container, it can also be called explicitly to reset the runtime_context state.
As mentioned earlier, we need a mechanism to detect access to uninitialized global variables. This is achieved with the following global variable accessor:
If its first argument evaluates to false, runtime_assertion throws a runtime_error with the relevant message.
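A minimal sketch of how initialization and the guarded accessor fit together might look like this. The class globals_sketch, the exception type runtime_error_sketch, and the use of shared pointers to double as variables are assumptions for illustration; only the overall idea (initialize in order, assert on access) comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime error type.
struct runtime_error_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

inline void runtime_assertion(bool condition, const char* message) {
    if (!condition) {
        throw runtime_error_sketch(message);
    }
}

// Hypothetical slice of a runtime context that only manages globals.
class globals_sketch {
public:
    using variable_ptr = std::shared_ptr<double>;
    using initializer = std::function<variable_ptr(globals_sketch&)>;

    explicit globals_sketch(std::vector<initializer> initializers)
        : initializers_(std::move(initializers)) {
        initialize();
    }

    // Clears all globals, then runs the initializers in declaration order.
    void initialize() {
        globals_.clear();
        for (const initializer& init : initializers_) {
            globals_.push_back(init(*this));
        }
    }

    // A global that has not been constructed yet has no slot in the
    // container, so the index check doubles as an "initialized" check.
    variable_ptr& global(std::size_t idx) {
        runtime_assertion(idx < globals_.size(),
                          "Uninitialized global variable access");
        return globals_[idx];
    }

private:
    std::vector<initializer> initializers_;
    std::vector<variable_ptr> globals_;
};
```

In this sketch, an initializer that reads a global declared later in the file triggers the assertion, while reading an earlier one works.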
Each function is implemented as a lambda capturing a single statement, which is then evaluated with the runtime_context received by the function.
Function Scope
As demonstrated with the while-statement compilation, the compiler operates recursively, starting with the block statement representing the entire function’s block.
Here’s the abstract base class for all statements:
The only function apart from the default ones is execute, responsible for executing the statement logic on the runtime_context and returning the flow, which dictates the subsequent program flow.
The static creator functions are self-explanatory. They exist to prevent constructing an illogical flow, that is, one with a non-zero break_level and a type other than flow_type::f_break.
The consume_break function creates a break flow with a decremented break level. If the break level reaches zero, it returns a normal flow.
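A flow class with these properties could be sketched as follows. The enumerator names other than flow_type::f_break and the exact member layout are assumptions. The essential parts are the private constructor, which forces all creation through the static functions, and consume_break.

```cpp
#include <cassert>

// f_break is named in the text; the other enumerators are assumed.
enum struct flow_type { f_normal, f_break, f_continue, f_return };

class flow {
public:
    flow_type type() const { return type_; }
    int break_level() const { return break_level_; }

    // Only break flows may carry a non-zero break level.
    static flow normal_flow() { return flow(flow_type::f_normal, 0); }
    static flow continue_flow() { return flow(flow_type::f_continue, 0); }
    static flow return_flow() { return flow(flow_type::f_return, 0); }
    static flow break_flow(int break_level) {
        assert(break_level > 0);
        return flow(flow_type::f_break, break_level);
    }

    // Uses up one break level; when none are left, execution continues
    // normally after the statement that consumed the break.
    flow consume_break() const {
        assert(type_ == flow_type::f_break);
        return break_level_ == 1 ? normal_flow()
                                 : break_flow(break_level_ - 1);
    }

private:
    flow(flow_type type, int break_level)
        : type_(type), break_level_(break_level) {}

    flow_type type_;
    int break_level_;
};
```

With this shape, a break 2 inside two nested loops is consumed once by each loop: the inner loop turns it into a break of level 1, and the outer loop turns that into a normal flow.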
Let’s examine the various statement types:
Here, simple_statement represents a statement derived from an expression. Every expression can be compiled into one returning void, enabling the creation of a simple_statement. As break, continue, and return cannot be part of an expression, simple_statement returns flow::normal_flow().
The block_statement maintains a std::vector of statements, executing them sequentially. If any statement returns a non-normal flow, it’s immediately returned. This statement utilizes a RAII scope object to allow local scope variable declarations.
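As a rough sketch of how a block statement and the RAII scope object cooperate, consider the code below. The names context_sketch, push_local, and fixed_flow are hypothetical, flow is reduced to a plain struct, and the scope object is constructed directly instead of being returned from get_scope, to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

enum struct flow_type { f_normal, f_break, f_continue, f_return };
struct flow { flow_type type; int break_level; };  // simplified for the sketch

// Hypothetical stand-in for runtime_context: a stack of doubles.
class context_sketch {
public:
    // RAII object: remembers the stack size and restores it on destruction.
    class scope {
    public:
        explicit scope(context_sketch& ctx)
            : ctx_(ctx), saved_size_(ctx.stack_.size()) {}
        ~scope() { ctx_.stack_.resize(saved_size_); }
        scope(const scope&) = delete;
        scope& operator=(const scope&) = delete;
    private:
        context_sketch& ctx_;
        std::size_t saved_size_;
    };

    void push(double v) { stack_.push_back(v); ++total_pushes_; }
    std::size_t stack_size() const { return stack_.size(); }
    std::size_t total_pushes() const { return total_pushes_; }

private:
    std::vector<double> stack_;
    std::size_t total_pushes_ = 0;
};

class statement {
public:
    virtual ~statement() = default;
    virtual flow execute(context_sketch& ctx) = 0;
};
using statement_ptr = std::unique_ptr<statement>;

class block_statement : public statement {
public:
    explicit block_statement(std::vector<statement_ptr> statements)
        : statements_(std::move(statements)) {}

    flow execute(context_sketch& ctx) override {
        context_sketch::scope guard(ctx);  // locals vanish when we leave
        for (const statement_ptr& s : statements_) {
            flow f = s->execute(ctx);
            if (f.type != flow_type::f_normal) {
                return f;  // propagate break, continue, or return at once
            }
        }
        return flow{flow_type::f_normal, 0};
    }

private:
    std::vector<statement_ptr> statements_;
};

// Hypothetical helper: declares a local by pushing a value onto the stack.
struct push_local : statement {
    double value;
    explicit push_local(double v) : value(v) {}
    flow execute(context_sketch& ctx) override {
        ctx.push(value);
        return flow{flow_type::f_normal, 0};
    }
};

// Hypothetical helper: always returns the same flow.
struct fixed_flow : statement {
    flow result;
    explicit fixed_flow(flow f) : result(f) {}
    flow execute(context_sketch&) override { return result; }
};
```

Whichever way the block is left, including an early return caused by a non-normal flow, the destructor of the scope object trims the stack back to its size at block entry.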
The local_declaration_statement evaluates the expression responsible for creating a local variable and pushes the newly created variable onto the stack.
The break_statement has its break level determined during compilation. It simply returns the flow corresponding to that break level.
The continue_statement simply returns flow::continue_flow().
Both return_statement and return_void_statement return flow::return_flow(). The sole difference is that the former evaluates an expression and sets the result as the return value before returning.
The if_statement, created for one if-block, zero or more elif-blocks, and one (potentially empty) else-block, evaluates each expression until one evaluates to 1. It then executes that block and returns the result. If no expression evaluates to 1, the execution result of the last (else) block is returned.
An if_declare_statement is used when the first part of an if-clause contains declarations. It pushes all declared variables onto the stack before executing its base class (if_statement).
The switch_statement executes its statements sequentially, but first jumps to the appropriate index obtained from expression evaluation. If any statement returns a non-normal flow, that flow is immediately returned. In case of flow_type::f_break, it consumes one break level.
The switch_declare_statement allows a declaration in its header, but not within its body.
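The jump-then-fall-through behavior of a switch can be sketched like this. It assumes a flat list of statements, a map from case values to starting indices, and a default index; statements are modeled as plain callables that record their execution in a trace. None of these names come from the Stork sources.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

enum struct flow_type { f_normal, f_break, f_continue, f_return };
struct flow { flow_type type; int break_level; };  // simplified for the sketch

// One break level is used up; at the last level the flow becomes normal.
inline flow consume_break(flow f) {
    return f.break_level <= 1 ? flow{flow_type::f_normal, 0}
                              : flow{flow_type::f_break, f.break_level - 1};
}

// Hypothetical statement model: a callable that logs into `trace`.
using statement = std::function<flow(std::vector<int>& trace)>;

class switch_sketch {
public:
    switch_sketch(std::vector<statement> statements,
                  std::unordered_map<double, std::size_t> cases,
                  std::size_t default_index)
        : statements_(std::move(statements)),
          cases_(std::move(cases)),
          default_index_(default_index) {}

    flow execute(double selector, std::vector<int>& trace) const {
        // Jump to the index registered for the selector, or to the default.
        auto it = cases_.find(selector);
        std::size_t idx = (it == cases_.end()) ? default_index_ : it->second;

        // From there on, statements run sequentially (fall-through).
        for (; idx < statements_.size(); ++idx) {
            flow f = statements_[idx](trace);
            if (f.type == flow_type::f_break) {
                return consume_break(f);  // the switch absorbs one level
            }
            if (f.type != flow_type::f_normal) {
                return f;  // continue and return pass through untouched
            }
        }
        return flow{flow_type::f_normal, 0};
    }

private:
    std::vector<statement> statements_;
    std::unordered_map<double, std::size_t> cases_;
    std::size_t default_index_;
};
```

Setting the default index to the number of statements would model a switch without a default label, since the loop body would then never run for unmatched values.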
Both while_statement and do_while_statement execute their body statement as long as their expression evaluates to 1. If the execution yields flow_type::f_break, they consume it and return. In the case of flow_type::f_return, they return the flow. Normal execution or continue result in no action.
While it may seem that continue has no effect, it does impact the inner statement. For instance, a block_statement wouldn’t be fully evaluated.
It’s interesting to note that while_statement is implemented using C++’s while loop, and do_while_statement using C++’s do-while loop.
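The loop behavior described above can be sketched with a plain C++ while loop. In this illustration, loop_context, lambda_statement, and while_statement_sketch are hypothetical names, the condition is a callable returning a number, and any non-zero value is treated as true, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

enum struct flow_type { f_normal, f_break, f_continue, f_return };
struct flow { flow_type type; int break_level; };  // simplified for the sketch

inline flow consume_break(flow f) {
    return f.break_level <= 1 ? flow{flow_type::f_normal, 0}
                              : flow{flow_type::f_break, f.break_level - 1};
}

struct loop_context { int i = 0; };  // hypothetical stand-in for runtime_context

struct statement {
    virtual ~statement() = default;
    virtual flow execute(loop_context& ctx) = 0;
};
using statement_ptr = std::unique_ptr<statement>;
using condition = std::function<double(loop_context&)>;

// Hypothetical helper that turns a callable into a statement.
class lambda_statement : public statement {
public:
    explicit lambda_statement(std::function<flow(loop_context&)> f)
        : f_(std::move(f)) {}
    flow execute(loop_context& ctx) override { return f_(ctx); }
private:
    std::function<flow(loop_context&)> f_;
};

class while_statement_sketch : public statement {
public:
    while_statement_sketch(condition cond, statement_ptr body)
        : cond_(std::move(cond)), body_(std::move(body)) {}

    flow execute(loop_context& ctx) override {
        while (cond_(ctx) != 0) {
            flow f = body_->execute(ctx);
            switch (f.type) {
                case flow_type::f_break:
                    return consume_break(f);  // leave the loop, use one level
                case flow_type::f_return:
                    return f;                 // unwind towards the function
                default:
                    break;  // normal or continue: evaluate the condition again
            }
        }
        return flow{flow_type::f_normal, 0};
    }

private:
    condition cond_;
    statement_ptr body_;
};
```

Nesting two such loops shows how a break with level 2 unwinds: the inner loop returns a break of level 1, which the outer loop consumes before returning a normal flow.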
for_statement and for_statement_declare are implemented similarly to while_statement and do_while_statement. They inherit from the for_statement_base class, which handles most of the logic. for_statement_declare is created when the first part of the for loop involves variable declaration.

These are all the statement classes we have, forming the fundamental building blocks of our functions. Upon creation, the runtime_context stores these functions. Functions declared with the public keyword can be invoked by name.
This concludes the core functionality of Stork. The remaining features are additions aimed at enhancing the language’s usability.
Tuples
Arrays, being homogeneous containers, can only hold elements of a single type. Structures are the go-to solution for heterogeneous containers.
However, tuples offer a simpler alternative. While they can store elements of different types, these types must be defined during compilation. Here’s an example of tuple declaration in Stork:
This declares a pair consisting of a number and a string, and initializes it.
Initialization lists can also be used to initialize arrays. If the expression types within the initialization list don’t match the variable type, a compiler error is triggered.
Since arrays are implemented as containers of variable_ptr, we get runtime tuple implementation without any extra effort. Type checking for contained variables is handled during compilation.
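To illustrate why a container of variable pointers is enough for tuples at runtime, here is a small sketch. The variable hierarchy, the choice of std::deque as the container, and the helpers make_variable and value_of are assumptions for illustration. The unchecked static_cast stands in for the guarantee that the compiler has already verified the element types.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type-erased variable, held through a shared pointer.
struct variable {
    virtual ~variable() = default;
};
using variable_ptr = std::shared_ptr<variable>;

template <typename T>
struct variable_impl : variable {
    T value;
    explicit variable_impl(T v) : value(std::move(v)) {}
};

// Arrays and tuples share one runtime representation in this sketch.
using array = std::deque<variable_ptr>;

template <typename T>
variable_ptr make_variable(T value) {
    return std::make_shared<variable_impl<T>>(std::move(value));
}

// No runtime type check: the element type was validated at compile time
// of the script, so a static_cast suffices here.
template <typename T>
T& value_of(const variable_ptr& v) {
    return static_cast<variable_impl<T>&>(*v).value;
}
```

A pair of a number and a string is then just a two-element container whose slots happen to hold different variable_impl instantiations.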
Modules
It would be beneficial to abstract away implementation details from Stork users, providing a more user-friendly interface.
This is achieved through the following class, presented without implementation specifics:
The load and try_load functions handle loading and compiling the Stork script from the specified path. While the former can throw a stork::error, the latter catches it and prints it to the output, if provided.
The reset_globals function re-initializes global variables.
Both add_external_functions and create_public_function_caller should be called prior to compilation. The former adds a C++ function that can be invoked from Stork. The latter creates a callable object for calling the Stork function from C++. A compile-time error is generated during Stork script compilation if the public function type doesn’t match R(Args…).
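The relationship between load and try_load can be sketched as a throwing function wrapped by a non-throwing one. Everything below is hypothetical: module_sketch and error_sketch are illustrative names, the signatures are assumed, and the load body only checks that the file can be opened instead of compiling anything.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical stand-in for stork::error.
struct error_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class module_sketch {
public:
    // Throws error_sketch when the script cannot be loaded.
    void load(const char* path) {
        std::ifstream file(path);
        if (!file) {
            std::ostringstream message;
            message << "cannot open '" << path << "'";
            throw error_sketch(message.str());
        }
        // A real implementation would tokenize and compile the script here.
    }

    // Never throws: reports failure through the return value and,
    // if a stream is provided, prints the error message to it.
    bool try_load(const char* path, std::ostream* err = nullptr) noexcept {
        try {
            load(path);
            return true;
        } catch (const error_sketch& e) {
            if (err != nullptr) {
                *err << e.what() << '\n';
            }
        } catch (...) {
            // Any other failure is also reported as "not loaded".
        }
        return false;
    }
};
```

A host application that cannot tolerate exceptions would call try_load and pass a stream such as std::cerr, while code that already has error handling in place can call load directly.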
I’ve included several standard functions that can be added to the Stork module.
Example
Let’s consider an example Stork script:
And the corresponding C++ code:
Standard functions are added to the module before compilation, and the trace and rnd functions are used within the Stork script. The greater function is included as a demonstration.
The script is loaded from the “test.stk” file located in the same directory as “main.cpp” (using the __FILE__ preprocessor definition), and then the main function is called.
The script generates a random array, sorts it in ascending order using the less comparator, and then in descending order using the greater comparator, written in C++.
As you can see, the code is highly readable for anyone familiar with C or any C-derived programming language.
What to Do Next?
There are several features I’d like to implement in Stork:
- Structures
- Classes and inheritance
- Inter-module calls
- Lambda functions
- Dynamically-typed objects
Time constraints and scope limitations have prevented their implementation thus far. I’ll strive to update my GitHub page with new versions as I implement these features during my free time.
Wrapping Up
We have successfully created a new programming language!
This endeavor has consumed a significant portion of my spare time over the past six weeks. However, I can now write scripts and witness them running, which is what I’ve been doing lately – often scratching my head in confusion every time it crashes unexpectedly. Some bugs were minor, while others were quite challenging. There were times when I was embarrassed by poor decisions I had publicly shared. Nevertheless, I persisted, fixing issues and continuing to code.
During this process, I learned about if constexpr, a feature I had never used before. I also gained a deeper understanding of rvalue-references and perfect forwarding, along with other lesser-known C++17 features I don’t encounter regularly.
While the code isn’t flawless – a claim I would never make – it’s in good shape, largely adheres to good programming practices, and most importantly, it functions as intended.
Creating a new programming language from scratch might seem like a ludicrous idea to the average person, or even the average programmer. However, that’s all the more reason to undertake such a project – to prove to yourself that it’s possible. Consider it a challenging puzzle that exercises your mind and keeps you mentally sharp.
In our daily programming routines, we often encounter mundane tasks. We can’t always choose the exciting aspects and must power through tedious work at times. Professional developers prioritize delivering high-quality code to their employers and providing for themselves, which can sometimes lead to avoiding programming during their free time. This can dampen the enthusiasm we experienced in our early programming days.
If you don’t have to, don’t let that spark die. Work on projects that pique your interest, even if they’ve been done before. Having fun doesn’t require justification.
And if you can somehow integrate these personal projects – even partially – into your professional work, consider yourself fortunate! Not many have that luxury.
The code for this installment will be frozen in a dedicated branch on my GitHub page.