Guide to Advanced Metaprogramming: Building a Ruby DSL

Domain-specific languages (DSLs) are a potent tool for simplifying the programming and configuration of intricate systems. As a software engineer, you likely engage with numerous DSLs daily without realizing it.

This article will delve into the concept of DSLs, their ideal use cases, and guide you through creating your own DSL in Ruby using advanced metaprogramming techniques. This article builds on Nikola Todorovic’s introduction to Ruby metaprogramming, previously featured on the Toptal Blog. We recommend familiarizing yourself with that article if you are new to metaprogramming.

Defining a Domain-Specific Language

DSLs are languages tailored to a specific application domain or use case. Their specialized nature limits their applicability to general-purpose software development. DSLs manifest in various forms, including:

  • Markup languages like HTML and CSS, designed for structuring, populating, and styling web pages. These languages lack the capability to write arbitrary algorithms, thus fitting the DSL definition.
  • Macro and query languages (e.g., SQL), operating on top of existing systems or programming languages, with inherent limitations that classify them as DSLs.
  • DSLs that leverage the syntax of an established programming language in a way that mimics a distinct mini-language.

This last category, referred to as an internal DSL, will be the focus of our upcoming example. But first, let’s examine some prominent examples of internal DSLs. Rails’ route definition syntax exemplifies this concept:

Rails.application.routes.draw do
  root to: "pages#main"

  resources :posts do
    get :preview

    resources :comments, only: [:new, :create, :destroy]
  end
end

This Ruby code resembles a customized route definition language, thanks to metaprogramming techniques that enable its clean and user-friendly interface. Notice how the DSL’s structure employs Ruby blocks, while method calls such as get and resources function as keywords within this mini-language.

Metaprogramming features even more prominently in the RSpec testing library:

describe UsersController, type: :controller do
  before do
    allow(controller).to receive(:current_user).and_return(nil)
  end

  describe "GET #new" do
    subject { get :new }

    it "returns success" do
      expect(subject).to be_success
    end
  end
end

This code snippet also illustrates fluent interfaces, which allow declarations to be interpreted as natural language sentences, enhancing code readability:

# Stubs the `current_user` method on `controller` to always return `nil`
allow(controller).to receive(:current_user).and_return(nil)

# Asserts that `subject.success?` is truthy
expect(subject).to be_success
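
To make the idea of a fluent interface concrete, here is a minimal sketch that is unrelated to how RSpec is actually implemented. Each method records its argument and returns self, which is what allows calls to be chained into a sentence-like expression. The QueryBuilder class and its methods are hypothetical names invented for this illustration.

```ruby
# Hypothetical example class; not part of RSpec, ActiveRecord, or Arel.
class QueryBuilder
  def initialize(table)
    @table = table
    @conditions = []
    @limit = nil
  end

  # Each "keyword" stores its argument and returns `self`,
  # which is what makes chaining possible.
  def where(condition)
    @conditions << condition
    self
  end

  def limit(count)
    @limit = count
    self
  end

  # Terminal method: turns the collected state into a string.
  def to_sql
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE #{@conditions.join(' AND ')}" unless @conditions.empty?
    sql += " LIMIT #{@limit}" if @limit
    sql
  end
end

QueryBuilder.new("posts").where("status = 'published'").limit(10).to_sql
# => "SELECT * FROM posts WHERE status = 'published' LIMIT 10"
```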

Another instance of a fluent interface is ActiveRecord and Arel’s query interface, which utilizes an abstract syntax tree internally for constructing complex SQL queries:

Post.                               # =>
  select([                          # SELECT
    Post[Arel.star],                #   `posts`.*,
    Comment[:id].count.             #     COUNT(`comments`.`id`)
      as("num_comments"),           #       AS num_comments
  ]).                               # FROM `posts`
  joins(:comments).                 # INNER JOIN `comments`
                                    #   ON `comments`.`post_id` = `posts`.`id`
  where.not(status: :draft).        # WHERE `posts`.`status` <> 'draft'
  where(                            # AND
    Post[:created_at].lte(Time.now) #   `posts`.`created_at` <=
  ).                                #     '2017-07-01 14:52:30'
  group(Post[:id])                  # GROUP BY `posts`.`id`

While Ruby’s expressiveness and metaprogramming capabilities make it well suited for building DSLs, internal DSLs are not exclusive to Ruby. Here’s a JavaScript test using the Jasmine framework:

describe("Helper functions", function() {
  beforeEach(function() {
    this.helpers = window.helpers;
  });

  describe("log error", function() {
    it("logs error message to console", function() {
      spyOn(console, "log").and.returnValue(true);
      this.helpers.log_error("oops!");
      expect(console.log).toHaveBeenCalledWith("ERROR: oops!");
    });
  });
});

Although not as elegant as the Ruby examples, this syntax demonstrates that well-chosen names and creative syntax utilization enable internal DSL creation in almost any language.

The advantage of internal DSLs lies in their avoidance of a separate parser, a component often challenging to implement correctly. Using the syntax of their implementation language also ensures seamless integration with existing codebases.

However, this comes at the cost of syntactic freedom. Internal DSLs must adhere to the syntactic rules of their host language. The degree of compromise depends on the language itself. Verbose, statically typed languages like Java and VB.NET offer less flexibility compared to dynamic, metaprogramming-rich languages like Ruby.

Constructing Our Own: A Ruby DSL for Class Configuration

Our example Ruby DSL will be a reusable configuration engine. It will define configuration attributes for a Ruby class using a simplified syntax. Incorporating configuration capabilities into a class is a common requirement in Ruby, particularly for configuring external gems and API clients. A standard solution involves an interface similar to this:

MyApp.configure do |config|
  config.app_id = "my_app"
  config.title = "My App"
  config.cookie_name = "my_app_session"
end

Let’s implement this interface initially and then refine it iteratively. We’ll add features, improve syntax clarity, and enhance reusability.

For this interface to function, the MyApp class requires a class method called configure. This method takes a block, executes it by yielding to it, and passes in a configuration object. This object, in turn, has accessor methods for reading and writing configuration values:

class MyApp
  # ...

  class << self
    def config
      @config ||= Configuration.new
    end

    def configure
      yield config
    end
  end

  class Configuration
    attr_accessor :app_id, :title, :cookie_name
  end
end

After the configuration block executes, we can readily access and modify the values:

MyApp.config
=> #<MyApp::Configuration:0x2c6c5e0 @app_id="my_app", @title="My App", @cookie_name="my_app_session">

MyApp.config.title
=> "My App"

MyApp.config.app_id = "not_my_app"
=> "not_my_app"

While functional, this implementation does not yet feel enough like a custom language to be considered a DSL. We will address this gradually. Our next step is decoupling the configuration functionality from the MyApp class, making it generic and applicable to various use cases.

Achieving Reusability

Currently, replicating similar configuration capabilities in another class would necessitate copying both the Configuration class and its associated setup methods. We’d also need to modify the attr_accessor list to accommodate the new configuration attributes. To circumvent this, let’s relocate the configuration features into a separate module called Configurable. With this change, our MyApp class would appear as follows:

class MyApp
  include Configurable

  # ...
end

All configuration-related elements now reside within the Configurable module:

module Configurable
  def self.included(host_class)
    host_class.extend ClassMethods
  end

  module ClassMethods
    def config
      @config ||= Configuration.new
    end

    def configure
      yield config
    end
  end

  class Configuration
    attr_accessor :app_id, :title, :cookie_name
  end
end
The notable addition here is the self.included method. Module inclusion in Ruby only incorporates instance methods. Therefore, our config and configure class methods wouldn’t be added to the host class by default. However, if a module defines a module-level method named included, Ruby calls it whenever that module is included in a class, passing the including class as an argument. This allows us to manually extend the host class with the methods contained in ClassMethods:

def self.included(host_class)     # called when we include the module in `MyApp`
  host_class.extend ClassMethods  # adds our class methods to `MyApp`
end
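
The difference between plain inclusion and the included hook can be checked with a small experiment. This is only a sketch; the module and class names (PlainMixin, HookedMixin, WithoutHook, WithHook) are made up for the demonstration and do not appear in the article's module.

```ruby
# Hypothetical names, for illustration only.
module PlainMixin
  def instance_hello
    "hello from an instance"
  end

  module ClassMethods
    def class_hello
      "hello from the class"
    end
  end
end

module HookedMixin
  # Ruby calls this hook each time the module is included somewhere.
  def self.included(host_class)
    host_class.extend PlainMixin::ClassMethods
  end
end

class WithoutHook
  include PlainMixin   # only instance methods are mixed in
end

class WithHook
  include PlainMixin
  include HookedMixin  # the hook additionally extends the class
end

WithoutHook.new.instance_hello         # => "hello from an instance"
WithoutHook.respond_to?(:class_hello)  # => false
WithHook.class_hello                   # => "hello from the class"
```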

Our work isn’t finished yet. Next, we need to enable the specification of supported attributes within the host class that includes the Configurable module. An ideal solution would resemble this:

class MyApp
  include Configurable.with(:app_id, :title, :cookie_name)

  # ...
end

Surprisingly, this code is syntactically valid. include is not a keyword but a method expecting a Module object as its parameter. As long as we provide an expression that returns a Module, the inclusion will proceed smoothly. Therefore, instead of directly including Configurable, we need a method called with. This method will generate a new, customized module with the specified attributes:

module Configurable
  def self.with(*attrs)
    # Define anonymous class with the configuration attributes
    config_class = Class.new do
      attr_accessor *attrs
    end

    # Define anonymous module for the class methods to be "mixed in"
    class_methods = Module.new do
      define_method :config do
        @config ||= config_class.new
      end

      def configure
        yield config
      end
    end

    # Create and return new module
    Module.new do
      singleton_class.send :define_method, :included do |host_class|
        host_class.extend class_methods
      end
    end
  end
end

Let’s break down this code. The entire Configurable module now comprises a single with method, with all operations occurring within it. We initiate the process by creating a new anonymous class using Class.new to house our attribute accessor methods. Since Class.new accepts the class definition as a block and blocks have access to external variables, we can seamlessly pass the attrs variable to attr_accessor.

def self.with(*attrs)           # `attrs` is created here
  # ...
  config_class = Class.new do   # class definition passed in as a block
    attr_accessor *attrs        # we have access to `attrs` here
  end

This ability of Ruby blocks to access external variables is why they are classified as closures. They “close over” the environment in which they were defined, not the one in which they are eventually executed. This distinction is crucial. Regardless of when or where our define_method blocks eventually execute, they retain access to the config_class and class_methods variables. This access persists even after the with method completes and returns. The following example illustrates this behavior:

def create_block
  foo = "hello"            # define local variable
  return Proc.new { foo }  # return a new block that returns `foo`
end

block = create_block       # call `create_block` to retrieve the block

block.call                 # even though `create_block` has already returned,
=> "hello"                 #   the block can still return `foo` to us

Armed with this understanding of blocks, we can proceed to define an anonymous module and store it in the class_methods variable. This module will hold the class methods that will be added to the host class upon inclusion of our generated module. We utilize define_method to define the config method because we require access to the external config_class variable from within the method. Defining it with the def keyword wouldn’t grant this access because standard method definitions using def are not closures. However, define_method accepts a block, enabling this functionality:

config_class = # ...               # `config_class` is defined here
# ...
class_methods = Module.new do      # define new module using a block
  define_method :config do         # method definition with a block
    @config ||= config_class.new   # even two blocks deep, we can still
  end                              #   access `config_class`
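
The claim that def is not a closure while define_method is can be verified with a short experiment. The following sketch uses a throwaway anonymous class and a local variable named greeting, both invented for this demonstration.

```ruby
# Hypothetical demo, for illustration only.
greeting = "hello"

demo_class = Class.new do
  # `def` starts a fresh scope, so the outer `greeting` is not visible here.
  # Ruby treats the bare word as a method call that fails at runtime.
  def with_def
    greeting
  end

  # The block passed to `define_method` closes over `greeting`.
  define_method :with_define_method do
    greeting
  end
end

demo = demo_class.new
demo.with_define_method  # => "hello"

begin
  demo.with_def          # raises NameError (undefined local variable or method)
rescue NameError => error
  error.class
end
```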

Finally, we invoke Module.new to create the module to be returned. Within this module, we need to define our self.included method. Unfortunately, the def keyword is not an option here, as the method needs access to the external class_methods variable. Consequently, we resort to define_method with a block again. However, this time, we apply it to the singleton class of the module since we are defining a method on the module instance itself. Moreover, because define_method was a private method of Module before Ruby 2.5, we use send to invoke it instead of calling it directly; this also works on newer versions, where the method is public:

class_methods = # ...
# ...
Module.new do
  singleton_class.send :define_method, :included do |host_class|
    host_class.extend class_methods  # the block has access to `class_methods`
  end
end
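
A possibly more readable alternative, which the article's code does not use, is Ruby's define_singleton_method. It is a public method that also takes a block, so no send is needed. The sketch below assumes a stand-in class_methods module and a hypothetical Host class.

```ruby
# Stand-in for the `class_methods` module built inside `with`.
class_methods = Module.new do
  def configure
    "configure called"
  end
end

generated_module = Module.new do
  # Inside the `Module.new` block, `self` is the new module, so this
  # defines `included` on the module itself, much like `def self.included`.
  define_singleton_method :included do |host_class|
    host_class.extend class_methods  # still a closure over `class_methods`
  end
end

# Hypothetical host class.
class Host; end
Host.include generated_module

Host.configure  # => "configure called"
```

Whether this reads better than the singleton_class.send form is a matter of taste; both define the same hook.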

That was a deep dive into metaprogramming. But was the added complexity justified? The ease of use speaks for itself:

class SomeClass
  include Configurable.with(:foo, :bar)

  # ...
end

SomeClass.configure do |config|
  config.foo = "wat"
  config.bar = "huh"
end

SomeClass.config.foo
=> "wat"

Yet, we can do even better. Our next step is refining the syntax within the configure block to enhance the module’s usability.

Syntax Enhancement

One remaining aspect we can improve is the repetitive use of config on each line within the configuration block. An ideal DSL would implicitly understand that everything within the configure block operates within the context of our configuration object. This would allow us to achieve the same result with a cleaner syntax:

MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name "my_app_session"
end

Let’s implement this improvement. We require two key elements: a mechanism to execute the block passed to configure within the configuration object’s context and a modification to the accessor methods. These methods should write a value if an argument is provided and return the value when called without an argument. Here’s a possible implementation:

module Configurable
  def self.with(*attrs)
    not_provided = Object.new

    config_class = Class.new do
      attrs.each do |attr|
        define_method attr do |value = not_provided|
          if value === not_provided
            instance_variable_get("@#{attr}")
          else
            instance_variable_set("@#{attr}", value)
          end
        end
      end

      attr_writer *attrs
    end

    class_methods = Module.new do
      # ...

      def configure(&block)
        config.instance_eval(&block)
      end
    end

    # Create and return new module
    # ...
  end
end

The simpler change involves running the configure block within the context of the configuration object. Utilizing Ruby’s instance_eval method on an object allows the execution of an arbitrary block of code as if it were running within that object. Consequently, when the configuration block calls the app_id method on the first line, that call is directed to our configuration class instance.
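
The effect of instance_eval on self can be seen in isolation with a small sketch. The Settings class below is hypothetical and much simpler than the generated configuration class; it exists only to show where bare method calls and instance variables are resolved.

```ruby
# Hypothetical class, for illustration only.
class Settings
  def title(value)
    @title = value
  end

  def current_title
    @title
  end
end

settings = Settings.new

# Inside the block, `self` is `settings`, so the bare call `title "..."`
# is dispatched to the Settings instance.
settings.instance_eval do
  title "My App"
end

settings.current_title  # => "My App"

# The flip side: instance variables are resolved on the new `self` as well,
# so an `@ivar` set by the calling code is not visible inside the block.
@outer = "outside"
settings.instance_eval { @outer }  # => nil
```

Local variables from the calling code remain accessible inside the block, because the block is still a closure. Only self, and with it method dispatch and instance variable lookup, changes.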

Modifying the attribute accessor methods in config_class is a bit more involved. To grasp this, we need to understand the behind-the-scenes workings of attr_accessor. Let’s consider the following attr_accessor call:

class SomeClass
  attr_accessor :foo, :bar
end

This is equivalent to defining a reader and writer method for each specified attribute:

class SomeClass
  def foo
    @foo
  end

  def foo=(value)
    @foo = value
  end

  # and the same with `bar`
end

So, when we used attr_accessor *attrs in the original code, Ruby automatically generated the attribute reader and writer methods for us for every attribute in attrs. This resulted in the standard accessor methods: app_id, app_id=, title, title=, and so on.

In our enhanced version, we aim to retain the standard writer methods to ensure the proper functioning of assignments like this:

MyApp.config.app_id = "not_my_app"
=> "not_my_app"

We achieve this by continuing to auto-generate the writer methods using attr_writer *attrs. However, we can no longer rely on the standard reader methods. They need to be modified to support writing the attribute as well, accommodating this new syntax:

MyApp.configure do
  app_id "my_app" # assigns a new value
  app_id          # reads the stored value
end

To generate the reader methods ourselves, we iterate through the attrs array. For each attribute, we define a method that either returns the current value of the corresponding instance variable (if no new value is provided) or writes the new value if specified:

not_provided = Object.new
# ...
attrs.each do |attr|
  define_method attr do |value = not_provided|
    if value === not_provided
      instance_variable_get("@#{attr}")
    else
      instance_variable_set("@#{attr}", value)
    end
  end
end

Here, we leverage Ruby’s instance_variable_get method to read an instance variable with an arbitrary name, and instance_variable_set to assign a new value to it. It’s important to note that the variable name must be prefixed with an “@” sign in both cases, hence the string interpolation.

You might be wondering why we use a blank object as the default value for “not provided” instead of nil. The reason is straightforward: nil is a valid value that might be intentionally set for a configuration attribute. Testing for nil wouldn’t allow us to differentiate between these two scenarios:

MyApp.configure do
  app_id nil # expectation: assigns nil
  app_id     # expectation: returns current value
end

The blank object stored in not_provided is designed to be equal only to itself. This ensures that it won’t be inadvertently passed into our method, causing an unintended read instead of a write.
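
The sentinel idea can be tried in isolation. In the sketch below, the read_or_write lambda is a hypothetical stand-in for one generated accessor, backed by a plain hash. Note one deliberate deviation from the article's code: the comparison uses equal?, which tests object identity, instead of ===. The expression value === not_provided asks the passed value to perform the comparison, and a class such as Object answers true for any object, so identity is arguably the safer check.

```ruby
not_provided = Object.new          # unique sentinel, identical only to itself
store = { app_id: "my_app" }       # hypothetical backing store

# Hypothetical stand-in for one generated accessor.
read_or_write = lambda do |value = not_provided|
  if value.equal?(not_provided)    # identity check: was an argument passed?
    store[:app_id]
  else
    store[:app_id] = value
  end
end

read_or_write.call       # => "my_app"  (read: no argument given)
read_or_write.call(nil)  # => nil       (write: nil is a legitimate value)
store[:app_id]           # => nil

# Why identity matters: `===` is dispatched to the left-hand operand.
Object === not_provided        # => true, because Class#=== is an `is_a?` test
Object.equal?(not_provided)    # => false
```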

Incorporating Support for References

Let’s add one more feature to enhance our module’s versatility: the ability to reference a configuration attribute from another:

MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name { "#{app_id}_session" }
end

MyApp.config.cookie_name
=> "my_app_session"

We’ve introduced a reference from cookie_name to the app_id attribute. The expression containing the reference is enclosed in a block, enabling delayed evaluation of the attribute value. The idea is to evaluate the block later, when the attribute is read, rather than during definition. This prevents issues arising from defining attributes in an “incorrect” order:

SomeClass.configure do
  foo "#{bar}_baz"     # expression evaluated here
  bar "hello"
end

SomeClass.config.foo
=> "_baz"              # wrong: `bar` was still unset here

Wrapping the expression in a block prevents its immediate evaluation. We can store the block for later execution when retrieving the attribute value:

SomeClass.configure do
  foo { "#{bar}_baz" }  # stores block, does not evaluate it yet
  bar "hello"
end

SomeClass.config.foo    # `foo` evaluated here
=> "hello_baz"          # correct!

Adding support for delayed evaluation using blocks requires minimal changes to the Configurable module. In fact, we only need to adjust the attribute method definition:

define_method attr do |value = not_provided, &block|
  if value === not_provided && block.nil?
    result = instance_variable_get("@#{attr}")
    result.is_a?(Proc) ? instance_eval(&result) : result
  else
    instance_variable_set("@#{attr}", block || value)
  end
end

When setting an attribute, the block || value expression saves the block (if provided) or the value itself. Subsequently, when reading the attribute, we check if it’s a block and evaluate it using instance_eval. If not, we return it as before.

However, supporting references introduces its own set of considerations and edge cases. For instance, consider what might happen if you attempt to read any attribute in this configuration:

SomeClass.configure do
  foo { bar }
  bar { foo }
end
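
With the accessor shown above, reading either attribute makes foo and bar evaluate each other until Ruby raises SystemStackError. One possible mitigation, sketched here as an assumption and not as part of the article's module, is to track which attributes are currently being resolved and raise a descriptive error when one is entered a second time. GuardedConfig, CircularReferenceError, and the @resolving variable are hypothetical names.

```ruby
# Hypothetical sketch; not part of the article's Configurable module.
class CircularReferenceError < StandardError; end

class GuardedConfig
  NOT_PROVIDED = Object.new

  # Defines a combined reader/writer, similar to the article's accessor.
  def self.attribute(name)
    define_method name do |value = NOT_PROVIDED, &block|
      if value.equal?(NOT_PROVIDED) && block.nil?
        resolve(name)
      else
        instance_variable_set("@#{name}", block || value)
      end
    end
  end

  attribute :foo
  attribute :bar

  private

  # Reads an attribute, evaluating stored blocks and detecting cycles.
  def resolve(name)
    @resolving ||= []
    if @resolving.include?(name)
      chain = (@resolving + [name]).join(" -> ")
      raise CircularReferenceError, "circular reference: #{chain}"
    end

    stored = instance_variable_get("@#{name}")
    return stored unless stored.is_a?(Proc)

    @resolving.push(name)
    begin
      instance_eval(&stored)
    ensure
      @resolving.pop  # always unwind, even when an error propagates
    end
  end
end

config = GuardedConfig.new
config.instance_eval do
  foo { bar }
  bar { foo }
end

begin
  config.foo
rescue CircularReferenceError => error
  error.message  # => "circular reference: foo -> bar -> foo"
end
```

The guard adds some bookkeeping to every read, so whether it is worth including depends on how likely such cycles are in practice.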

The Final Module

We’ve developed a robust module for making arbitrary classes configurable. It utilizes a clean and straightforward DSL that even supports referencing configuration attributes from one another:

class MyApp
  include Configurable.with(:app_id, :title, :cookie_name)

  # ...
end

MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name { "#{app_id}_session" }
end

Here’s the complete module implementing our DSL, concisely written in 36 lines of code:

module Configurable
  def self.with(*attrs)
    not_provided = Object.new

    config_class = Class.new do
      attrs.each do |attr|
        define_method attr do |value = not_provided, &block|
          if value === not_provided && block.nil?
            result = instance_variable_get("@#{attr}")
            result.is_a?(Proc) ? instance_eval(&result) : result
          else
            instance_variable_set("@#{attr}", block || value)
          end
        end
      end

      attr_writer *attrs
    end

    class_methods = Module.new do
      define_method :config do
        @config ||= config_class.new
      end

      def configure(&block)
        config.instance_eval(&block)
      end
    end

    Module.new do
      singleton_class.send :define_method, :included do |host_class|
        host_class.extend class_methods
      end
    end
  end
end

Looking at this intricate Ruby code, which is arguably difficult to decipher and maintain, you might question whether the effort was worthwhile simply to enhance a DSL’s aesthetics. The answer depends on the context, leading us to the final point of this article.

Ruby DSLs: When to Use and When to Avoid Them

As we streamlined the external syntax of our DSL, we increasingly relied on complex metaprogramming tricks internally. This resulted in an implementation that might prove challenging to comprehend and modify in the future. Like many aspects of software development, this involves a trade-off that requires careful evaluation.

For a DSL to justify its implementation and maintenance overhead, it should offer substantial benefits. These benefits often come from reusability across various scenarios, effectively distributing the cost across multiple use cases. Frameworks and libraries frequently incorporate their own DSLs precisely because they cater to a large developer base, with each developer benefiting from the productivity gains of these embedded languages.

Therefore, a general guideline is to consider building DSLs when you, your fellow developers, or your application’s end-users will derive significant value from them. If you choose to create a DSL, prioritize comprehensive test coverage and clear documentation of its syntax. This is crucial, as understanding a DSL’s functionality solely from its implementation can be very difficult. Your future self and fellow developers will appreciate your foresight.

Licensed under CC BY-NC-SA 4.0