Staged Parser Combinators and Recursion

In the last post we saw how to stage parser combinators. We added alternation and concatenation, and CPS-encoded the parse result datas tructure to avoid intermediate allocations for it.

One important thing we didn’t do yet is recursion. Indeed, many data formats (JSON, programming languages, serialization formats) have a nested, self-similar structure. And as we are staging parser combinators, we should be able to stage recursive parsers too. Which is what we will do in this post. For once, we will slightly steer away from our principle of doing everything as a library, and use some minimal LMS internals.

A Playful Diversion

But first, a cute little exercise, thanks to Alex, in which we explore recursion a bit. We all know how to write the Fibonacci function:

def fib(n: Int): Int = if (n == 0 || n == 1) 1 else fib(n - 1) + fib(n - 2)

The problem with this implementation is that it computes many values more than once. For example, here is the trace for fib(4)

fib(4)
↪ fib(3) + fib(2)
↪ fib(2) + fib(1) + fib(2)
↪ fib(1) + fib(0) + fib(1) + fib(2)
↪ ...

We compute fib(2) twice, which is not necessary. We could for instance memo-ize results as we go instead.

Exercise 1: Suppose you are given a cache in scope. Reimplement fib so that it memo-izes intermediate results.

val cache = scala.collection.mutable.HashMap.empty[Int, Int]
def memofib(n: Int) = ???

Now we may have more than one function that we would like to memo-ize. It is tedious to have to create a cache for each such function. Could we do better?

Exercise 2: Define a function memo that takes a function (Int => Int) and returns a memo-ized version of it. You can use caches if need be. Reimplement fib using memo.

def memo(f: Int => Int): Int => Int = ???

It’s probably easier to implement backwards. The real trick lies in implementing fib using memo. Following the types, we can get to something like:

def fib: Int => Int = memo { i => if (i == 0 || i == 1) 1 else ??? }

And then we realize that we need to refer to fib in the expression we pass to memo: after all, we are trying to implement a recursive function. But, and here lies the trick, we need to refer to the same fib that we declare. And for that we cannot declare fib using a def. It has to be the same function value, and def creates a new value at every invocation. And we therefore have few options other than resorting to laziness:

lazy val fib: Int => Int = memo { i =>
  if (i == 0 || i == 1) 1
  else fib(i - 1) + fib(i - 2)
}

Indeed, thanks to laziness, the first time we use fib a reference for it will be created and then properly bound to further uses of fib. As a result any internal structures that memo uses will be bound to that reference. And that should definitely give you enough info to implement memo now!

Recursion and Staging

So what is the problem with recursion and staging? Seems that it should work out of the box? Well, consider the following recursive, staged parser, which parses digits and adds them up:

def adder: Parser[Int] = (digit2Int ~ adder) map { case (x, y) => x + y } | digit2Int

In a staged setting, adder is a code generator for a parser. So every time we encounter it we generate code for it. Being recursive, this process will last forever, theoretically. In practice we get stack overflows. Neither are desirable, of course.

For recursive parsers, we need to generate two different types of code for the same parser:

We need to remember whether we have encountered a recursive parser, and we can use memo-ization for that! Just like the memo function above, we define a rec combinator which wraps a recursive parser. It is similar to the fix combinator. Our example now becomes:

lazy val adder: Parser[Int] = rec {
  (digit2Int ~ adder) map { case (x, y) => x + y } | digit2Int
}

One can see the use of lazy vals as a drawback, but it turns out that it is a common pattern even for standard parsers. For example packrat parsers need to be declared as lazy vals to function correctly [1].

The implementation of rec turns out to be very close to that of memo:

trait StagedParsersExp
    extends StagedParsers
    ...
    with ParseResultOpsExp
    with FunctionsExp {

  val store = new scala.collection.mutable.HashMap[Parser[_], Sym[_]]

  def rec[T: Typ](p: Parser[T]) = Parser[T] { in =>

    import Parser._

    val myFun: Rep[Input => ParseResult[T]] = store.get(p) match {

      case Some(f) =>
        scala.Console.println("we have a function call")
        val realf = f.asInstanceOf[Exp[Input => ParseResult[T]]]
        realf

      case None =>
        scala.Console.println("first time we see this guy, creating a new symbol")

        val funSym = fresh[Input => ParseResult[T]]

        store += (p -> funSym)
        val f = p.toParseResult
        val g = createDefinition(funSym, doLambdaDef(f))
        store -= p

        funSym
    }

    val res: Rep[ParseResult[T]] = myFun(in)
    conditional(res.isEmpty,
      ParseResultCPS.Failure(res.next),
      ParseResultCPS.Success(res.get, res.next)
    )
  }
}

Some important points:

The bottomline

So that’s it! We have now added recursion to our staged parser combinators. Before we conclude, there are a few points worth discussing.

Alternatives to staging recursion

It might be a bit disappointing that we have to use an explicit rec combinator because the information about recursion seems directly available anyway. A possibility would be to use macros or reflection to inspect the structure of a parser, and add the rec combinator at compile time. But if we use macros, we could directly do partial evaluation with them, and just leave recursive functions untouched. This is what Eric does in his parser combinators and macros work[2].

A note about laziness

In this post we saw that lazy vals are very useful for late binding of recursive constructs. This is another classic functional programming trick. In a language like Haskell, because everything is lazy by default we don’t realize it as much, but in Scala we are required to actively think about it.

We also need to be a bit careful with definitions of our combinators, in order to propage the laziness correctly. This is why, for example, the sequencing combinator ~ takes a by-name parameter:

def ~[U: Typ](that: => Parser[U]): Parser[(T, U)]

If it didn’t, the value of the right hand side would be computed when the expression is created, and in a recursive case this blows up.

The code

The code used in this post can be accessed through the following files:

If you want to run this code locally, please follow instructions for setup on the readme here. The sbt command to run the test is test-only stagedparsec.RecParsersSuite

References

  1. Packrat Parsing in Scala, Jonnalagedda., Tech. Report 2009
  2. Accelerating Parser Combinators with Macros, Béguet et al., Scala 2014.
  3. Staged Parser Combinators for Efficient Data Processing, Jonnalagedda et al., OOPSLA 2014
Manohar Jonnalagedda 04 September 2015