Metaprogramming Pitfalls
One of my most liked posts on the elixir forum has long been a post, where I shed some light onto what quote/2
and unquote/1
are about in metaprogramming on elixir. The post seems to have stood the test of time given I still get consistent likes on this post from time to time even with this being over 4 years old at the time of writing this blog post.
In this blogpost I want to extend what I’ve talked about on the forum. I also want to get into a bit more detail on the complexities around macros dealing with module bodies and functions within them. I mentioned this in the forum post but didn’t provide much guidance on it – that simply wasn’t necessary in the context of thread. I’ve however seen people bump into that complexity again and again, so it’s time to document it a bit more thoroughly here.
Metaprogramming
To start a quick introduction into metaprogramming. Metaprogramming means one can use elixir code to generate elixir code and more importantly elixir also has means of making use of such code generation within the language itself.
Usually people quickly start to talk about macros, but macros are not actually necessary to use metaprogramming, so I’ll defer talking about macros to later in this blog post.
Unquote fragments
The easiest way to get into metaprogamming on elixir in my opinion is using so called “unquote fragments”. While there’s a special name I’d still like you to retain the fact that it refers to the same unquote/1
used elsewhere, though it denotes its usage in a specific context. That specific context is the definition of functions. If you don’t know yet what unquote/1
is don’t worry, it’ll become clear later.
To demonstrate I’ll reuse the example I made in the forum post.
Consider wanting to setup a function, which turns an integer log level into the corresponding atom name of the log level. That’s easy enough to be done manually:
defmodule LogHandler do
def log_level(1), do: :error
def log_level(2), do: :warning
def log_level(3), do: :info
end
While it would be totally unnecessary in a production codebase, especially given such a short list, let’s entertain the idea of using metaprogramming and generating those function heads from code.
I suggested above that elixir allows creating code from code, so let’s just naively attempt to do that. To generate those functions all we’d need to do would be having knowledge about the mapping from integer log level to atom and be able to iterate that mapping to create the function heads.
defmodule LogHandler do
# Naive attempt - non-functional
for {i, level} <- [{1, :error}, {2, :warning}, {3, :info}] do
def log_level(i), do: level
end
end
For a naive attempt this is surprisingly close to a working solution. I can easily see this being an actual attempt of someone knowing elixir, but not its metaprogramming functionality.
Trying to compile this there are a few errors and warnings however. The following two will show up 3 times each.
error: undefined variable "level"
#log_handler.ex:4: LogHandler.log_level/1
warning: variable "i" is unused (if the variable is not meant to be used, prefix it with an underscore)
#log_handler.ex:4: LogHandler.log_level/1
…
The reason for those errors and warnings should be quite obvious by looking at the next code snippet, which shows the code the above logic would generate.
defmodule LogHandler do
def log_level(i), do: level
def log_level(i), do: level
def log_level(i), do: level
end
Again – not far off, but clearly not correct.
To explain how to make this work we need to take a step back. Within the for
comprehension above we have the following line of code:
def log_level(i), do: level
This is valid (even if non-functional) elixir code. Nothing in the elixir compiler can tell that the intention is to replace the i
and level
with the values provided by the outer for
comprehension. That’s where unquote/1
and unquote fragments comes in.
A naive understanding of unquote fragments would be that it signifies to the compiler to inject the value provided to unquote(…)
into the code surrounding the call – kinda like string interpolation (#{}
), but for code. I’ll go more into the caveats around unquote/1
usage later, for now this naive view should be enough.
Small sidenote: You can use unquote fragments to customize any parameter of the def/2
macro, which includes even the function name. Remember that with explicitly removing a few layers of syntax sugar the prev. code would kinda look like this: def(log_level(i), [{:do, level}])
. The function name and parameters list is the first parameter to def/2
.
Getting back to the example let’s tell elixir to use the i
and level
variables from the surrounding code in the module body:
defmodule LogHandler do
for {i, level} <- [{1, :error}, {2, :warning}, {3, :info}] do
def log_level(unquote(i)), do: unquote(level)
end
end
Small change and the module compiles just fine to exactly the code we manually wrote out right at the start of the blog post.
That’s great success and essentially all you need to make use of unquote fragements. But there’s more to understand to not eventually run into issues.
If you’ve worked with elixir macros before you might also be wondering why there’s no quote/2
involved here. The TLDR is that it’s kinda integrated into def/2
and how the module body is executed/evaluated while compiling elixir code, while functions aren’t evaluated at that time. For more context simply keep reading.
Abstract Syntax Tree
Elixir – like many other languages – parses elixir code into an abstract syntax tree (AST) before turning that AST via various additional steps into some representation, which can be executed by the VM. In Elixir that AST is a deeply nested list of mostly tuple-3 values and some literals – where the value in the AST is exactly the value it represents in code (see the syntax reference for exhaustive details on that). Let’s look at an example to kinda know how this might look like, but with no intention to fully understand that result.
MyModule.some_function(the_parameter)
# turned to AST
{
{
:.,
[],
[
{:__aliases__, [alias: false], [:MyModule]},
:some_function
]
},
[],
[
{:the_parameter, [], Elixir}
]
}
The latter is the abstract syntax tree, which represents the piece of code at the start in a structured manner. Having an AST to work with generally makes it a lot simpler to programatically work with code compared to needing to work on the plain elixir syntax itself, where multiple ways to express the same code exist.
Why would this be relevant though? Because the AST is the base primitive of metaprogramming and it’ll become more apparent when looking at macros.
Macros
Macros are the way by which elixir allows for encapulation of metaprogramming code, kinda like functions encapsulate distinct pieces of runtime executable code. This works by allowing macros to turn one piece of code – the parameters of the macro – into other code – which is returned from the macro.
For explaining that let’s go back to showing things in practice.
Sticking with LogHandler
consider that the definition of those log levels is actually maintained in an external file and you don’t want to hardcode the values in your module. Instead the intention is to reference that external file to get to the information.
To start out we could still do this in the module’s body:
defmodule LogHandler do
pairs =
"log_definition.json"
|> File.read!()
|> Jason.decode!()
|> Enum.map(fn %{"int" => int, "label" => label} ->
# String.to_atom/1 can be fine at compile time.
# Manually writing out the code would've
# introduced those atoms anyways.
{int, String.to_atom(label)}
end)
for {i, level} <- pairs do
def log_level(unquote(i)), do: unquote(level)
end
end
That works, but is a bit verbose. “Verbosity” is usually a prime reason for people to start using macros and macros really are a useful tool to prevent writing boilerplate. But do we really need to “replace code with other code” here? Not really.
The following does work just fine, while not using any macros:
defmodule DefinitionParser do
def log_pairs do
"log_definition.json"
|> File.read!()
|> Jason.decode!()
|> Enum.map(fn %{"int" => int, "label" => label} ->
{int, String.to_atom(label)}
end)
end
end
defmodule LogHandler do
for {i, level} <- DefinitionParser.log_pairs() do
def log_level(unquote(i)), do: unquote(level)
end
end
We don’t need macros here, because we don’t need to generate any code. Everything done by DefinitionParser
can be handled by calling functions.
Let’s extend the requirements a bit. The code above has a flaw: If the external file changes the elixir code won’t update until some other factor, like a clean build or change in the modules source file, causes a recompilation. Luckily elixir actually comes with support for the usecase of depending on external files. One can use @external_resource
to declare a dependency on an external file. Whenever a project is compiled the referenced file(s) will be checked for changes, forcing the module to recompile if changes are found.
We could put that module attribute manually in LogHandler
, but really DefinitionParser
should know the path not LogHandler
. Just calling log_pairs
should be enough signal for the @external_resource
attribute to be set automatically.
Side note: You might notice me being cautious in explaining the reasoning for macros. A guideline of mine is that “a macro cannot do something if you cannot manually write out the code yourself”. However that also means you want to have a good reason for macros to justify their complexities. Here we use it to combine setting a module attribute with calling a function – the same could be made to work without them, but it might be prone to error.
Let’s implement this new feature using a macro:
defmodule DefinitionParser do
defmacro log_pairs do
quote do
@external_resource "log_definition.json"
# Pull this into a separate function and
# call that one in a real codebase.
"log_definition.json"
|> File.read!()
|> Jason.decode!()
|> Enum.map(fn %{"int" => int, "label" => label} ->
{int, String.to_atom(label)}
end)
end
end
end
defmodule LogHandler do
require DefinitionParser
for {i, level} <- DefinitionParser.log_pairs() do
def log_level(unquote(i)), do: unquote(level)
end
end
This new macro will make the module attribute be set and the pairs be fetched. This happens as at compile time the call to DefinitionParser.log_pairs()
will be replaced with all the code returned from the macro.
Dealing with AST
The macro above doesn’t look much different to the function before, but it works a lot different. Macros unlike functions don’t deal with data flowing though a system, but they deal with AST – just AST. Any incoming parameters are supplied as AST and the macro needs to return AST. This is often missed by people starting to work with macros. Literals looking the exact same in AST form sometimes make it seems as if the macro would get “values” passed, but it’s really not. This becomes quite obvious once non literals are inputs to macros.
To drive this point home here’s a few examples:
defmacro parameter(a) do
…
end
# Literals
parameter(1) # a == 1
parameter(:atom) # a == :atom
parameter([a: 2]) # a == [a: 2]
# Non-literal
parameter(%{a: 2}) # a == {:%{}, [], [a: 2]}
parameter(variable) # a == {:variable, [], Elixir}
parameter(function()) # a == {:function, [], []}
The non-literal ones are rather deceptive. Reading the code calling the macro you might expect the macro to get the value of e.g. the variable or the value returned by function()
, but again a macro transforms code to other code – and the code is “a variable” or “a function call”, not the values they might evaluate to.
This makes even more sense with parameter()
being a macro, hence being called when the code is compiled, but variable
or function()
are to be evaluated at runtime – as in only after compilation finished. This trips people up anyways though.
Receiving AST over evaluated data however doesn’t mean that macros cannot make use of those parameters. They can either be useful by extracting information out of the received AST itself or by making the received AST part of their returned AST.
quote/2
/unquote/1
With a lot of the context handled we can finally start talking about unquote/1
– and the quote/2
that I silently snuck into the macro implementation above. Both in combination allow us to work with AST without needing to deal with a deeply nested list structure, but by composing snippets of code.
quote/2
allows us to write any (syntactically correct) elixir code and it’ll return the AST of that code. That approach makes understanding and maintaining macro code a lot easier than trying to figure out what a large tree of tuples does. If you’re curious check just how much AST this one line of code would be.
defmacro example do
quote do
@external_resource "log_definition.json"
end
end
Looking at quote/2
you might have already made a connection to the start of the blog post though. quote/2
has the same problem as we had there. How to differenciate code, which is meant to be returned to the caller of the macro as-is, from the code, which needs to be dynamic and to be supplied by the macro or parameters of the macro. That’s what unquote/1
is used for.
defmacro example(root) do
quote do
@external_resource Path.join(
unquote(root),
"log_definition.json"
)
end
end
To be more specific in my description, compared to earlier, unquote/1
expects to be given AST (either literals or the AST of a non literal, see Macro.escape/1
) and inject this into it’s outer “quoted” context.
In the case of a macro this quoted context is everything within quote do … end
. In the case of an unquote fragment I’m actually fuzzy on the exact details, but essentially def/2
kinda does something like quote do
interally. By my understanding it evaluates everything to AST and stores it somewhere, so it can eventually be turned into compiled code for those function found within some beam files.
Usage of unquote/1
outside of those two situations is not allowed.
Not sure this fully answered the question of why unquote/1
works for functions, but at least all this should’ve provided some parallels to what happens with macros vs. when using unquote fragments.
Side note: Debugging macros can sometimes be annoying. A tip I have is to use quote/2
to do so. I like to run quote do {Code I want to have generated} end
and compare the result with what the macro returns. I often found that to be simpler that trying to Macro.to_string
the AST and then needing to deal with the “inconsistencies” between multiple ways to write the same code.
Unquote fragments within macros
This one is a tricky one. Even people who kinda know their way around macros get tripped up by this, but maybe all the above does already give you a hint of the problem to be dealt with.
If unquote/1
allows for injecting AST into a quoted context, how do you deal with the case where those contexts are nested within each other. Without the use of additional options it would be ambiguous.
To illustrate the problem in actual code:
@doc """
Create a log mapping function for the provided log levels
## Example
defmodule MyLogA do
import MacroMod
log_mapping([{1, :info}, {2, :error}])
end
"""
defmacro log_mapping(pairs) do
quote do
for {i, level} <- unquote(pairs) do
def log_level(unquote(i)), do: unquote(level)
end
end
end
We as humans can kinda infer that i
and level
are meant to be supplied by the for
and pairs
is meant to be supplied from the macro’s parameter. The compiler however cannot do that. It would in this case expect both i
and level
to be variables provided by the macro, while the variables set by the for
would create a warning later in compilation for being unused (comparable to the warnings a the beginning of the blog post).
What would be the solution here? It’s additional options on quote/2
. Having options on quote/2
handles all nested variants where quote/2
is involved, and def/2
within def/2
(without any quote/2
blocks) is not allowed.
The first option to look at is :unquote
. By setting it to false
the current quote/2
block will keep unquote/1
calls as unquote/1
calls in its returned AST instead of injecting the AST of the unquote/1
parameter.
defmacro log_mapping(pairs) do
quote unquote: false do
for {i, level} <- unquote(pairs) do
def log_level(unquote(i)), do: unquote(level)
end
end
end
This still doesn’t work, but this time for a different error. The complete code as written within the quote/2
block would be returned as AST from the macro. Including the unquote(pairs)
. That unquote(pairs)
would then fail in the caller to the macro, because unquote/1
would be evaluated in a place where it’s not within a quote do
nor within def/2
.
We fixed the unquote(i)
and unquote(level)
, but broke the unquote(pairs)
. Given we cannot use unquote/1
to provide pairs
to the quote/2
we need yet another option: :bind_quoted
. This option lets the caller of quote/2
provide data to inject into the code body, but in a different way than using unquote/1
. Given :bind_quoted
also implicitly sets unquote: false
there’s also no need to supply both options.
defmacro log_mapping(pairs) do
quote bind_quoted: [pairs: pairs] do
for {i, level} <- pairs do
def log_level(unquote(i)), do: unquote(level)
end
end
end
As you can see the unquote/1
around pairs
within for
is gone. The bind_quoted
does make sure pairs
is available as a variable just before the code of the quote/2
body. Turning the returned AST of the macro back into code would similar to the following:
pairs = # unquoted input to log_mapping
for {i, level} <- pairs do
def log_level(unquote(i)), do: unquote(level)
end
Once def/2
is evaluated it would then handle the unquote/1
s on the log_level/1
function definition(s) like described in the beginning of the blog post.
The same options would also allow multiple nested quote/2
to work. But I’d expect that at that point you’re in deep metaprogramming territory, when there’s multiple levels of quote/2
to juggle.