In the series on shortcut fusion (here and here) we discovered two techniques to remove intermediate data structure in list operation pipelines. This time, let’s actually implement the very first of these!
To be more precise:
foldLeft
as an abstraction over list operations. This
is because for lists, foldLeft
and foldRight
can be used to implement
similar operations, and it turns out (as we will see) that foldLeft
is a
bit nicer to optimize in our scheme.This post is not meant to be a detailed tutorial on how to use LMS. I will however attempt to give just as much information as is required to understand this post in a self-contained manner.
On that note, multi-stage programming (MSP) is a technique to separate the
compilation of a program into multiple stages. In an MSP setting, we explicitly
specify which parts of the program can be evaluated at a later stage (and by
corollary which parts can be evaluated now). Running such an annotated program
p0
will yield a new program p1
. This new program is behaviourally equivalent
to p0
. The difference is that the parts that were not delayed have been
evaluated away, and the parts which were explicitly delayed are expressions in
p1
. Essentially, MSP is a way to perform safe runtime code generation.
Closely related to MSP is the concept of partial evaluation. With this technique, if a program receives a static and a dynamic input, the static part of the program is evaluated away, so that the residual program (which still needs a dynamic input to yield a result) is specialized for the particular static input. This residual program, being specialized, is expected to have better performance than to original program. The main difference with MSP is that the static parts of the program are inferred, rather than explicitly specified.
LMS
LMS is short for Lightweight Modular Staging. It is a staging/runtime code
generation framework in Scala. The execution of expressions is delayed through
the use of a special type constructor, Rep[T]
. In other words:
val x: T = ...
is a constant in the generated code.val x: Rep[T] = ...
is an expression of type T
in the
generated code.The following diagram captures how to think of an LMS program:
Instead of writing a program as in the left-bottom corner, we will be writing a
program containing Rep
types, as in the top left corner. LMS will then take
care of going to the top-right corner, and beyond.
The type checker allows us to combine Rep
expressions in a fairly seamless
way, so it feels like we are writing zero-stage programs. However, we are in
fact composing code generators when we write an expression of type Rep[T]
.
This is a very important concept to keep in mind for later on.
Exercise 1: What is the difference between Rep[T => U]
and Rep[T] => Rep[U]
?
To answer the question, let us desugar the types:
Rep[T => U]
desugars to Rep[Function[T, U]]
. This means that in the
generated code (top-right corner) there will be a function.Rep[T] => Rep[U]
desugars to Function[Rep[T], Rep[U]]
. This is an
unstaged function. It only exists in the first stage. Applying it to an
input of type Rep[T]
effectively inlines the code of the function at the
application spot!This is the one of the great tricks of writing LMS programs. We want to generate code which contains little/no overhead. Generating functions when not absolutely necessary is therefore futile. This thought is the driving force behind designing a good staged library. Indeed, many higher order functions can take unstaged functions as parameters, and we get inlining for free!
Armed with some knowledge about LMS, let us shift our focus to the main topic.
The signature of foldLeft
is given below:
It takes a zero element, a combination function, and applies it to a list. We
have seen that the essence of foldLeft
lies in the first parameter list: it
can be applied to other collections as well. So we can abstract the second parameter
list away for now. Using the guiding design principles, we come up with the
following types:
The enclosing trait FoldLefts
mixes in some of LMS’ building blocks which help
in composing code generators[4]. Although it may seem like a long list,
these are the only building blocks required for this example. In particular, we
want to be able to write a bit of mutable code (LiftVariables
) and while loops
(While
). The Manifest
annotation on polymorphic types is specific to code
generation.
Exercise 2: Pay close attention to the type of abstract class FoldLeft
.
What is staged, and what is not?
As promised, we are using unstaged functions. Note that we are in fact also
using unstaged tuples. There is once again a subtle but all important difference
between (Rep[A], Rep[S])
and Rep[(A, S)]
.
The actual implementation of the foldLeft
function for lists can be given as
follows:
I choose to implement it using while loops and mutable (but controlled)
variables rather than recursively. Why? I want to generate low-level code for
the JVM, which is better at optimizing while loops than recursive functions.
This also explains the choice of foldLeft
over foldRight
.
Exercise 3: Why does fromList
take a Rep[List[A]]
and not a List[Rep[A]]
?
Well, because at the end of the day, we still need to provide an initial list to the fold pipeline. And this list is an actual input to the program, i.e. it is unknown at staging time.
Let’s make sure our code runs. What does that mean? It means:
FoldLeft
generates lower-level code,
containing no foldlefts.Let’s implement the identity function, for a sanity check:
This function needs to be implemented in the right trait to get everything to work properly. Please take a look at the full code (links in the code section below). A few things to pay attention to:
Rep[List]
, not on List
.
This is because we have inherited the ability to operate on Rep[List]
from
the ListOps
trait.++
), which is
basically algorithmic suicide. You are right. For really efficient code, we
would use ListBuffer
instead.S
as List[Int]
so
early in the pipeline, when we declare the fld
value. Any ideas on how to
improve upon that are welcome!The generated code for foldLeftId
has the following form:
As we can see, we are left with a barebone while loop, exactly what we wished for.
Exercise 4: Write a function fromRange
which creates an instance of
foldLeft
over a range of integers:
We are finally equipped to implement our good friends map
, filter
, flatMap
.
Let me get you started with map
:
If you have read the previous post on foldr/build fusion, or implemented a map
on lists using fold, this should be obvious. The subtle point, however is that
the function taken as parameter is an unstaged function.
Calling comb(acc, f(elem))
does not only inline the body of comb
, but also
the body of f
, in the right place. This is where all the magic is actually
happening. Once again, please take a look at the appropriate test case (and
generated code) to make sure you get it.
Exercise 5: Write the filter
function:
Once again, p
is a staged function, so applying it inlines the body of the
predicate.
Exercise 6: Write the flatMap
function:
Hold on! What is the meaning of the type of f
? Let’s expand it:
We have a curried function. To be precise, we have a curried, unstaged function.
If we can fully apply the function (as opposed to partially applying it) we can
inline not only the body of f
, but also the body of the resulting FoldLeft
.
This way, we don’t have to worry about generating code that represents a
FoldLeft
after all! Here goes:
As soon as we get a nestedFld
, we immediately apply it, so all is well.
Exercise 7: What if the function passed to flatMap
creates a FoldLeft
using fromList
?
In that case, we do indeed have an issue: we will end up creating a list in the
generated code. So we should avoid writing such code. It’s much better to use
something like fromRange
, or even a List[Rep[B]]
. Because the programmer is
expected to know the shape of the FoldLeft
resulting from the flatMap
call,
we can also reasonably expect her/him to use staged/unstaged structures
intelligently.
We have just implemented a compiler optimization for fusion! And we have done it
by writing almost only library-like code (modulo Rep
types). I don’t know
about you, but I find this rather elegant! We have done this by leveraging basic
MSP concepts, and the LMS framework. To be absolutely sure, you can take a look
at the test cases, where compositions of higher-order functions are implemented.
The generated code has the expected shape.
In terms of fusion algorithms, how powerful are we? Well, we are no more
powerful than foldr/build fusion: we face the same issues with zip
as them. We
are also at least as powerful as them: we can fuse all the functions they can.
To convince yourself, you could implement concat
as an exercise.
So it seems we are as powerful as them (we might need to prove this more formally ). We are arguably more elegant, because we do not rely on beta-reduction or other underlying compiler optimizations. We have implemented a staged library instead!
Update: I had a few discussions with Dmitry, and see things in a slightly different light with respect to elegance. As I understand, foldr/build fusion is implemented in Haskell for lists (among others). The way it is implemented is to use Haskell’s rewrite rule system to get the shortcut fusion in motion. Haskell’s compiler then gets into action, and can perform inlining and other optimizations that we do through LMS here. That is also arguably quite elegant indeed!
In a language like Scala, the compiler cannot do as well, mainly because of virtual dispatch. Working with the LMS library, we have a closed world during code generation time. Hence we can gain more optimization power, and safely perform inlining etc.
The code used in this post can be accessed through the following files:
If you want to run this code locally, please follow instructions for setup on
the readme here. The sbt
command to
run the particular test is test-only barbedwire.FoldLeftSuite
.