In this article, I want to demonstrate a nice way to write functional-style code in Ruby. It is a way to write non-trivial pure functions, without a bunch of weird non-idiomatic code. Also, the acronym is PFAAO, which I think sounds pretty cool.
Pure Functions
Pure functions are functions that have no side-effects, and whose return values are only determined by their arguments. Given the same arguments, a pure function will always return the same value. It does nothing other than generate a return value – it doesn’t affect anything, anywhere else in the program.
Pure functions are the core concept of functional programming (FP). I’m going to gloss over the benefits of FP, because there are plenty of other articles about that. Let’s just say that pure functions are good, and you should be writing them wherever possible.
The Example: A JSON To XML Converter
To demonstrate this pattern, I’m going to write a program that converts a JSON document to XML. The whole example project is available on GitHub, here: https://github.com/tomdalling/pure-function-as-an-object
The input looks like this:
{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }
{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }
{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }
{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }
{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }
And the output looks like this:
<?xml version="1.0"?>
<cities count="5">
<city id="01001" name="AGAWAM" state="MA" location="-72.622739,42.070206" population="15338"/>
<city id="01002" name="CUSHMAN" state="MA" location="-72.51565,42.377017" population="36963"/>
<city id="01005" name="BARRE" state="MA" location="-72.108354,42.409698" population="4546"/>
<city id="01007" name="BELCHERTOWN" state="MA" location="-72.410953,42.275103" population="10579"/>
<city id="01008" name="BLANDFORD" state="MA" location="-72.936114,42.182949" population="1240"/>
</cities>
We will only be using the Ruby 2.3.0 standard library, and the Ox gem for writing the XML.
Let’s get started!
Designing The API
After gathering the requirements, the next step is to ask: how much of this can be implemented with pure functions?
As it turns out, pretty much the entire thing could be a pure function. Reading the input is non-pure, and writing the output is also non-pure, but the entire conversion from JSON to XML is pure. The same input JSON always produces the same output XML.
The API for this converter could be used like this:
input = File.read('input.json')
output = JSON2XML.convert(input)
File.write('output.xml', output)
This example code could easily be written as a test.
Just use some dummy input, and assert that output
is correct.
Because the File.read
and File.write
calls are non-pure, I have purposely kept them separate from the conversion code.
That allows the conversion step to be implemented as a pure function – it just takes a string and returns a string.
This API looks pretty good to me, assuming that the input data isn’t too big. If the input data set was huge, we wouldn’t be able to load the whole thing into memory, so we would need so write some sort of streaming API. But, for the sake of demonstration, let’s assume that we’ve investigated this possibility and decided that it’s very unlikely that the input will ever be more than a few megabytes.
The Motivation
The public API consists of a single method, which will be the pure function. Let’s start with that:
require 'json'
require 'ox'
module JSON2XML
def self.convert(input_json)
# TODO: implementation goes here
end
end
We could easily implement all the functionality by writing a bunch of pure methods on the JSON2XML
module, without defining any new classes.
It would work, but I personally don’t think that this is how Ruby is designed to be used.
It’s not what most people would consider to be idiomatic Ruby.
We would be fighting the language – going against the grain.
That’s not fun.
To use Ruby the way it is designed to be used, we will need to define a class here. However, the class is unnecessary. It’s an implementation detail. The public API is just a single pure function, so nobody should ever need to instantiate an object of the class we’re about to create.
This is the motivation behind the pattern: we want the public API to be a simple pure function, but we want to implement that function with a private class.
The Pattern
Without further ado, here is the pattern:
require 'json'
require 'ox'
class JSON2XML
def self.convert(input_json)
new(input_json).send(:xml)
end
private
def initialize(input_json)
@input_json = input_json
end
def xml
# TODO: implementation goes here
end
end
Instead of a module, JSON2XML
is now a class.
Everything below the convert
method is marked as private, signalling to other developers that convert
is the only thing they should be calling, and everything else is an implementation detail.
The xml
method can’t be called directly because it is private, so we use send(:xml)
to get around that.
The convert
class method:
- creates an object of its own class,
- passing its argument into the initializer,
- and calls a single method on the object, to generate the return value.
That’s the overview of how the pattern works.
Implementation Advantages
Now let’s finish off the implementation of the JSON to XML converter, to demonstrate some advantages of using this pattern.
The xml
method is empty at the moment.
It’s supposed to return the output XML as a string, as required by the convert
class method.
How do you generate XML?
One approach is to take some sort of XML “document” object, and serialize it.
Let’s do that:
def xml
Ox.dump(document, with_xml: true)
end
“But wait!”
I hear you say.
“There is no document
object. It doesn’t exist.”
Let’s create it, then.
How do you make a document object?
Since we’re using the Ox gem, we need to instantiate an Ox::Document
and fill it with all the output.
XML documents are only supposed to have a single root node in them, so all the content will have to be inside that.
def document
Ox::Document.new(version: '1.0').tap do |doc|
doc << root_node
end
end
“But wait!”
I hear you say, again.
“Where did this root_node
come from? That’s not defined anywhere.”
Let’s create it, then.
How do you make the root node?
In Ox, you instantiate a Ox::Element
object.
We have to fill the root node with all the output data, so each city will have its own node inside the root node.
Also, the root node has a “count” attribute on it, for reasons that I will explain later.
def root_node
Ox::Element.new('cities').tap do |root|
root[:count] = city_nodes.size
city_nodes.each { |city| root << city }
end
end
“But wait!”
I hear you say, for a third time.
“city_nodes
doesn’t exist either!”
Let’s create it.
I’m sure you’re getting the gist of what’s happening here.
Each city node is also an Ox::Element
object.
We can create a city node from each line of the JSON input.
def city_nodes
input_json.each_line.map { |line| parse_city_node(line) }
end
The input_json
method is missing.
That is the string value that was passed into the initializer.
We can implement that with a simple attr_reader :input_json
.
The parse_city_node
method needs to be implemented, too.
Here it is:
def parse_city_node(line)
Ox::Element.new('city').tap do |city|
attrs = JSON.parse(line)
city[:id] = attrs.fetch('_id')
city[:name] = attrs.fetch('city')
city[:state] = attrs.fetch('state')
city[:location] = attrs.fetch('loc').join(',')
city[:population] = attrs.fetch('pop').to_s
end
end
“But wait!” I don’t hear you say, because the implementation is now complete. Here is the whole class:
class JSON2XML
def self.convert(input_json)
new(input_json).send(:xml)
end
private
attr_reader :input_json
def initialize(input_json)
@input_json = input_json
end
def xml
Ox.dump(document, with_xml: true)
end
def document
Ox::Document.new(version: '1.0').tap do |doc|
doc << root_node
end
end
def root_node
Ox::Element.new('cities').tap do |root|
root[:count] = city_nodes.size
city_nodes.each { |city| root << city }
end
end
def city_nodes
input_json.each_line.map { |line| parse_city_node(line) }
end
def parse_city_node(line)
Ox::Element.new('city').tap do |city|
attrs = JSON.parse(line)
city[:id] = attrs.fetch('_id')
city[:name] = attrs.fetch('city')
city[:state] = attrs.fetch('state')
city[:location] = attrs.fetch('loc').join(',')
city[:population] = attrs.fetch('pop').to_s
end
end
end
What I’ve just demonstrated is a top-down decomposition of the entire implementation. You start with the desired output – in this case, an XML string – and work backwards. You write the code that you wish existed, and implement it later. This is a nice way to break down complicated algorithms into smaller, and smaller pieces.
This is all made possible by Ruby’s syntax. Ruby blurs the line between local variables and a method calls. This allows us to write identifiers representing values that we wish we had, as if they already exist as local variables, and then implement them later as methods. We’re working with grain of the language, and this is one of the benefits.
Notice how none of the methods have side effects.
Each method in the class is itself a pure function.
The @input_json
instance variable is never changed – it’s essentially a constant.
Because of that, we would expect that every method would always have the same return value.
For example, if I call the xml
method three times, I would get three identical strings.
To use FP terminology, all of the methods have referential transparency.
Performance Optimisation
Astute readers may have noticed a performance hiccup in the root_node
method.
def root_node
Ox::Element.new('cities').tap do |root|
root[:count] = city_nodes.size
city_nodes.each { |city| root << city }
end
end
I put the “count” attribute on the root node to demonstrate a common performance problem with this pattern.
The problem is that city_nodes
is called two times.
That means we’re parsing the entire data set twice.
That is unnecessary, and roughly doubles the run time of the conversion.
Thankfully, due to all the methods being pure, we have a simple solution to this problem: memoization. Memoization is the caching of return values from pure functions.
We’ve already seen that the city_nodes
method always returns the same value.
That means we can cache the return value the first time the function is called.
On all subsequent calls, we can just return the cached value without running the rest of the function.
Cache invalidation can be a tricky problem, but not in this case. When do we need to invalidate the cache? Never! It’s a pure function. The return value literally never changes. We can just forget about cache invalidation completely.
Here is the solution:
def city_nodes
@city_nodes ||=
input_json.each_line.map { |line| parse_city_node(line) }
end
The function implementation is exactly the same, except for the @city_nodes ||=
line that has been added.
This trick isn’t specific to the PFAAO pattern, it’s just normal, idiomatic Ruby.
Now we can call city_nodes
as many times as we like, from any other method, without having to worry about performance.
This is a cleaner solution than storing the return value in a local variable before using it.
When To Use PFAAO
This pattern is good for implementing complicated pure functions. I’ve written a DOCX to HTML converter this way, quite happily.
If the implementation is simple, it’s not really worth defining a new class – just write a slightly larger function. If you’re writing a pure function and it’s growing out of control, that is the time to consider using this pattern.
Where the implementation is mostly pure, but not completely, you can still use this pattern. The implementation will be more complicated, but still have a small and simple public API. Ruby isn’t Haskell. We can use side effects, in a controlled fashion, wherever it makes sense.
If your implementation requires a lot of mutable state, PFAAO is probably a bad fit.
Optional Extras
-
Disallow instantiation of the class.
Unfortunately, makinginitialize
private does not prevent instantiation of the class. The fact that everything is private except the class method should indicate that the class isn’t meant to be instantiated. If that’s not strict enough for you, you can make Ruby raise an error by adding this line of code to the class definition:private_class_method :new
-
Make it quack like a Proc.
Objects that act like functions, such asProc
objects, are invoked using thecall
method in idiomatic Ruby. I named the class methodconvert
in the example above, but if you changed that tocall
you could use the class as if it were aProc
. In that case, I would change the class name toXML2JSONConverter
to keep the word “convert” in there. -
Make it convertable to a Proc.
Let’s say you’re using this PFAAO as a block fiarly often, like this:documents.map{ |d| XML2JSON.convert(d) }
It would be nicer to just pass the block argument using ampersand syntax like this:
documents.map(&XML2JSON)
To make this a reality, implement the
to_proc
method on the class, like this:def self.to_proc method(:convert).to_proc end
Conclusion
It is possible, and even preferable, to write Ruby in a functional style. No monads or category theory required. Complicated pure functions can be written in idiomatic Ruby.
Again, you can get the code for the article from GitHub: https://github.com/tomdalling/pure-function-as-an-object
We’ve also learnt that PFAAO is a cool acronym. Say it out loud. PFAAO. PFAAO.
PFAAO…