Friday, June 8, 2012

The Kansas University Rewrite Engine (KURE)

The Kansas University Rewrite Engine (KURE) is a Haskell-hosted DSL for strategic programming. We've just released the third version of KURE, which adds lenses for navigation and a variant set of combinators to make change detection easier.

This post just overviews the basics, and gives a simple example of usage.

KURE Basics


KURE is based around the following data type:
data Translate c m a b = Translate {apply :: c -> a -> m b}

translate :: (c -> a -> m b) -> Translate c m a b
translate = Translate

There are a lot of type parameters, but the essential idea is that Translate represents a transformation that can be applied to a value of type a in a context c, and produces a value of type b in the monad m. Actually, we require m to be a MonadPlus, as this allows us to encode notions of success and failure, which are integral to strategic programming. Specifically, mzero represents failure and mplus is a "catch" for both mzero and fail. To avoid clutter we'll omit the class constraints, but just imagine that wherever you see an m there's a (MonadPlus m => ...) to go with it.

We also define a synonym for the special case when the result and argument types coincide:
type Rewrite c m a = Translate c m a a
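As a small illustration of translate and of mzero encoding failure, here is a hypothetical helper (not part of KURE's API; the implied MonadPlus constraint on m is omitted, as above) that succeeds, returning its argument unchanged, only when a predicate holds:
acceptR :: (a -> Bool) -> Rewrite c m a
acceptR p = translate $ \ _ a -> if p a then return a else mzero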

Translate itself forms a monad (and an arrow, and a bunch of other structures besides), which provides us with a lot of combinators for free. Two key definitions are composition and bind:
(>>>) :: Translate c m a b -> Translate c m b d -> Translate c m a d
t1 >>> t2 = translate $ \ c -> apply t1 c >=> apply t2 c

(>>=) :: Translate c m a b -> (b -> Translate c m a d) -> Translate c m a d
t >>= f = translate $ \ c a -> do b <- apply t c a
                                  apply (f b) c a

Observe the difference: composition takes the result of the first translation as the argument to the second translation, whereas bind uses the result to determine the second translation, but then applies that second translation to the original argument.

Another useful combinator is <+> (from the ArrowPlus class), which acts as a catch for Translate:
(<+>) :: Translate c m a b -> Translate c m a b -> Translate c m a b
t1 <+> t2 = translate $ \ c a -> apply t1 c a `mplus` apply t2 c a

We can now write strategic programming code, such as the classic try combinator:
tryR :: Rewrite c m a -> Rewrite c m a
tryR r = r <+> idR

Where idR is the identity rewrite:
idR :: Rewrite c m a
idR = translate $ \ _ -> return
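Other classic strategies can be derived in the same style. For example, here is a sketch of a repeat combinator that applies a rewrite until it fails (an illustrative definition, not necessarily the one KURE ships with):
repeatR :: Rewrite c m a -> Rewrite c m a
repeatR r = tryR (r >>> repeatR r)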

Finally, one combinator new to this version of KURE is >+>, a sequential composition of rewrites that allows one of the two rewrites to fail:
(>+>) :: Rewrite c m a -> Rewrite c m a -> Rewrite c m a
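The composite succeeds if either rewrite succeeds, applying both in order where possible. One plausible definition in terms of the combinators above (a sketch, not necessarily KURE's actual implementation) is:
r1 >+> r2 = (r1 >>> tryR r2) <+> r2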

Example: Arithmetic Expressions with Fibonacci


Now let's consider an example. Take a data type of arithmetic expressions augmented with a Fibonacci primitive:
data Arith = Lit Int | Add Arith Arith | Sub Arith Arith | Fib Arith

To keep things simple, we'll work with an empty context, and use Maybe as our MonadPlus:
type RewriteA = Rewrite () Maybe Arith

Let's start with some rewrites that perform basic arithmetic simplification:
addLitR :: RewriteA
addLitR = do Add (Lit m) (Lit n) <- idR
             return (Lit (m + n))

subLitR :: RewriteA
subLitR = do Sub (Lit m) (Lit n) <- idR
             return (Lit (m - n))

We're exploiting the fact that Translate is a monad to use do-notation - something we have found extremely convenient. If the pattern match fails, this will just trigger the fail method of the monad, which we can then catch as desired.

Using >+>, we can combine these two rewrites into a single rewrite for arithmetic simplification:
arithR :: RewriteA
arithR = addLitR >+> subLitR

Next a more interesting rewrite, unfolding the definition of Fibonacci:
fibLitR :: RewriteA
fibLitR = do Fib (Lit n) <- idR
             case n of
               0 -> return (Lit 0)
               1 -> return (Lit 1)
               _ -> return (Add (Fib (Sub (Lit n) (Lit 1)))
                                (Fib (Sub (Lit n) (Lit 2))))

Tree Traversals


Thus far, we've only discussed rewrites that apply to the entire data structure we're working with. But a key feature of KURE (and strategic programming) is the ability to traverse a structure applying rewrites to specific locations. For example, the anyR combinator applies a rewrite to each immediate child of a node, succeeding if any of those rewrites succeed:
anyR :: RewriteA -> RewriteA

At first glance this might sound simple, but there are a number of issues. Most notably, what if the children have distinct types from each other and from their parent? How should such a combinator be typed? This isn't an issue in this simple Fibonacci example, as there is only one type (Arith), but in general you could have an AST with multiple mutually recursive non-terminals. KURE solves this by constructing a sum data type of all non-terminals in the AST, and having traversal combinators operate over this data type (using Associated Types to specify the sum type for each non-terminal); a toy sketch of the idea is given below. This is the most significant feature of KURE, but it'd take too long to explain the details here. You can read about it in either of the KURE papers.
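To give the flavour of that encoding with a toy example (this is only an illustration, not KURE's actual classes), consider an AST with two mutually recursive non-terminals and the sum type that a traversal would operate over:
-- A toy AST with two mutually recursive non-terminals.
data Exp  = EVar String | EAdd Exp Exp | EStmt Stmt Exp
data Stmt = SAssign String Exp | SSeq Stmt Stmt

-- The "generic" sum of every non-terminal in the AST.  Traversal
-- combinators such as anyR are typed over this sum: each child is
-- injected into it, rewritten, and then projected back out.
data GenericAST = GExp Exp | GStmt Stmt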

Using the anyR combinator (amongst others), KURE defines various traversal strategies (we just give the specialised types here):
anybuR :: RewriteA -> RewriteA
anybuR r = anyR (anybuR r) >+> r

innermostR :: RewriteA -> RewriteA
innermostR r = anybuR (r >>> tryR (innermostR r))

anybuR traverses a tree in a bottom-up manner, applying the rewrite to every node, whereas innermostR performs a fixed-point traversal, continuing until no more rewrites can be successfully applied. For example, we can define an evaluator for Arith using this strategy:
evalR :: RewriteA
evalR = innermostR (arithR >+> fibLitR)
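Since the context is just (), running a rewrite amounts to using the apply field of Translate directly. For example, in GHCi (assuming Arith derives Show so that results can be displayed):
GHCi> apply evalR () (Fib (Lit 5))
Just (Lit 5)

GHCi> apply addLitR () (Sub (Lit 1) (Lit 2))
Nothing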

Release


KURE 2.0.0 is now available on Hackage.

You can find this Fibonacci example, and several others, bundled with the source package. For a non-trivial example, KURE is being used as the underlying rewrite engine for the HERMIT tool. HERMIT hasn't been released yet, but you can read about it in this paper:

Introducing the HERMIT Equational Reasoning Framework

The paper also describes how KURE uses lenses.

Sunday, June 3, 2012

Introducing the HERMIT Equational Reasoning Framework



Imagine you are sitting at your terminal wishing your Haskell program would go faster. The optimization flag has been turned on, and you know of an unimplemented transformation that could help. What do you do? You could add a new optimization pass to GHC, taking part in the dark art of tuning heuristics to allow it to play well with others. Or you could experiment, using HERMIT.


First Example of HERMIT


As an example we use the Fibonacci function, not because it is interesting, but because it is so well known.


module Main where

fib :: Int -> Int
fib n = if n < 2 then 1 else fib(n-1) + fib(n-2)

Compiling with -O2, and using Criterion to average over 100 tests, we observe that fib 35 runs in 124.0ms±2.6ms on our development laptop.


To enable further optimization of fib, we want to try unrolling the recursive calls. Like the GHC optimizer, we want to do this without changing the source, which is clear and concise.


To do so, we fire up HERMIT, choosing to use the command-line interface (because the GUI version is not written yet). HERMIT uses the GHC plugin mechanism to insert itself into the optimization pipeline as a rather non-traditional compiler pass, capturing programs mid-compilation and allowing the user to manipulate them.



ghc -fplugin=Language.HERMIT.Plugin \
    -fplugin-opt=Language.HERMIT.Plugin:main:Main:mode=i \
    -fforce-recomp -O2 -dcore-lint Main.hs
[1 of 1] Compiling Main ( Main.hs, Main.o )
...
module main:Main where
fib :: Int -> Int
...
hermit> _



GHC has compiled our program into its intermediate form, called GHC Core, and HERMIT is asking for input. At this point we can start exploring our captured program. There are several bindings, but let us consider just fib.



hermit> consider 'fib
rec fib = λ n →
  case (<) ▲ $fOrdInt n (I# 2) of wild
    False →
      (+) ▲ $fNumInt
          (fib ((-) ▲ $fNumInt n (I# 1)))
          (fib ((-) ▲ $fNumInt n (I# 2)))
    True → I# 1


The consider command moves the focus onto the specified function, and the REPL displays the function's abstract syntax. This is HERMIT's "compact" pretty printer for GHC Core. Core programs necessarily contain a substantial amount of information, and in this render engine HERMIT uses ▲ as a placeholder for type arguments. Notice that we refer to a Haskell syntactic name using the Template Haskell convention of prefixing it with a tick.


Next we want to unroll fib, to see if this improves performance. Specifically, we want to inline all calls to fib (but nothing else) in a bottom-up manner. HERMIT uses strategic programming as its basis, so we build a strategy using two HERMIT functions.



hermit> any-bu (inline 'fib)
rec fib = λ n →
  case (<) ▲ $fOrdInt n (I# 2) of wild
    False →
      (+) ▲ $fNumInt
          ((λ n →
              case (<) ▲ $fOrdInt n (I# 2) of wild
                False →
                  (+) ▲ $fNumInt
                      (fib ((-) ▲ $fNumInt n (I# 1)))
                      (fib ((-) ▲ $fNumInt n (I# 2)))
                True → I# 1)
           ((-) ▲ $fNumInt n (I# 1)))
          ((λ n →
              case (<) ▲ $fOrdInt n (I# 2) of wild
                False →
                  (+) ▲ $fNumInt
                      (fib ((-) ▲ $fNumInt n (I# 1)))
                      (fib ((-) ▲ $fNumInt n (I# 2)))
                True → I# 1)
           ((-) ▲ $fNumInt n (I# 2)))
    True → I# 1


Each inner instance of fib has been replaced with its definition, effectively unrolling fib. There are many more simplifications that can be done, such as β-reduction, but GHC is good at this. For now, we just want to test this version, so we type exit and GHC resumes compiling our program.


Now we time this new version. Again, taking the average over 100 tests using Criterion, we observe that the post-HERMIT fib 35 runs in about 76.7ms±3.6ms. We have modified GHC's internal representation of our program mid-compilation, improving it such that the resulting program runs measurably faster. Furthermore, we did not need to alter the original program to do so.


More Information


If you are interested in learning more about HERMIT, we've just submitted a paper to the Haskell Symposium, called Introducing the HERMIT Equational Reasoning Framework, written by Andrew Farmer, Ed Komp, Neil Sculthorpe, and myself. A first release of HERMIT will be available via hackage in the near future.


Andy Gill

Saturday, May 19, 2012

Monad Reification in Haskell and the Sunroof Javascript compiler

It is possible to reify a monad in Haskell. This took us by surprise; we did not think this was possible. Here is the story of how reification of a monad works.

Reification


Reification is the observation of underlying structure. One reason for reification is to cross-compile a Haskell expression to execute on a different platform. This has been done many times, and a great reference for this is Conal Elliott, Sigbjorn Finne and Oege de Moor's Compiling Embedded Languages.

The trick is to build a structure where a direct interpretation would typically be used. A canonical example is arithmetic.
data Expr where
  Lit :: Int -> Expr
  Add :: Expr -> Expr -> Expr

instance Num Expr where
  e1 + e2       = Add e1 e2
  fromInteger i = Lit (fromInteger i)

Now we can do things like
GHCi> 1 + 2
Add (Lit 1) (Lit 2)

and get the structure of the computation of the expression, not just the value. This is called a deeply embedded language; the structure is deeply embedded inside the expression.

From here we can use this structure to, among other things, compile and execute this expression in a different setting. In "Compiling Embedded Languages", the target was executing C graphics code.
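For instance, here are two sketched interpretations of the reified structure (hypothetical helpers, not from the paper): an evaluator back to Int, and a printer that could serve as the front end of a compiler to another language:
-- Written against the two-constructor Expr above.
evalExpr :: Expr -> Int
evalExpr (Lit i)     = i
evalExpr (Add e1 e2) = evalExpr e1 + evalExpr e2

-- Render the expression as, say, C source text.
toC :: Expr -> String
toC (Lit i)     = show i
toC (Add e1 e2) = "(" ++ toC e1 ++ " + " ++ toC e2 ++ ")"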

Reification of Functions


It is possible to reify a function. The trick here is to provide a prototypical version of the input argument, and observe how it is used in the result.
data Expr where
  -- new constructor
  Var :: String -> Expr
  -- as before
  Lit :: Int -> Expr
  Add :: Expr -> Expr -> Expr

reifyFn :: (Expr -> Expr) -> Expr
reifyFn fn = fn (Var "x")

Now we can reify the function.
> let f x = x + 2 :: Expr
> reifyFn f
Add (Var "x") (Lit 2)

We use this trick extensively in Kansas Lava, and other DSLs.

Reification of Monads


Providing a deep embedding of a Monad is straightforward. Consider:
data M :: * -> * where
  Return  :: a -> M a
  Bind    :: M a -> (a -> M b) -> M b
  GetChar :: M Char
  PutChar :: Char -> M ()

The issue is: how do we reify Bind? Or, more specifically, how do we provide the prototypical variable of type a needed to reify the second argument of Bind? For a long time, I assumed this was not possible without a post-hoc constraint on Bind, requiring a type that could provide the appropriate Var "..".

However, there is a way of doing this, by normalizing the structure of the monad. This trick was introduced by Chuan-kai Lin in Unimo, and is used by Heinrich Apfelmus in his operational package on Hackage. We work with operational, because (1) Heinrich Apfelmus contacted us, pointing out that his library could simplify our ad-hoc unrolling mechanism, and (2) it was available on Hackage.

operational uses the left-identity and associativity monad laws to normalize a monadic program into a stream of primitive instructions terminated by a return.
Program ::= Primitive >>= Program
          | return a
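For reference, the two laws used are the standard monad laws; applying them left-to-right re-associates binds to the right and eliminates returns on the left, until the program starts with either a primitive bind or a final return:
return a >>= k   =  k a                       -- left identity
(m >>= k) >>= h  =  m >>= (\ x -> k x >>= h)  -- associativity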

Using operational is easy: you define the primitives, and then you can view your program.
import Control.Monad.Operational

data MI :: * -> * where
  GetChar :: MI Char
  PutChar :: Char -> MI ()

-- bind and return provided by operational
type M a = Program MI a

compile :: M a -> ...
compile = eval . view
  where
    eval :: ProgramView MI a -> ...
    eval (PutChar ch :>>= g) = ...
    eval (GetChar :>>= g)    = ...
    eval (Return b)          = ...

This effectively gives you a deep embedding. Neat stuff.
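For example, here is a sketched interpreter for M in IO, together with a tiny program using it (echo is just an illustration; singleton is operational's injection of a primitive instruction into Program):
runIO :: M a -> IO a
runIO m = case view m of
            Return a         -> return a
            GetChar  :>>= k  -> getChar >>= runIO . k
            PutChar c :>>= k -> putChar c >> runIO (k ())

echo :: M ()
echo = do c <- singleton GetChar
          singleton (PutChar c)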

Sunroof: The Javascript Monad


Javascript implementations on all major browsers provide a powerful API for building interactive web pages. We want to use Javascript libraries, but program in Haskell, by using a Javascript monad.

A usable model can be built using a simple translation of a fixed set of function calls into Javascript commands. With careful construction, we can combine commands before sending them, optimizing network usage. The challenging part is having the Javascript return values in an efficient manner. Consider this Haskell code:
c <- getContext "my-canvas"
... some use of c ...

In a simple transaction model, getContext invokes a Javascript command on the client, returning the response as c. However, we would prefer the whole code fragment to be compiled to Javascript such that the binding and use of c are performed on the client directly, with no intermediate client<->server communication. And thanks to the ideas inside Unimo and operational we can!

We do this by constraining the returned values of all the primitives to be reifiable via a constraint on GADT constructors. In (the simplified version of) our Javascript compiler, Javascript function calls are implemented with the JS_Call primitive.
data JSInst :: * -> * where
  JS_Call :: (Sunroof a) => String -> [JSValue] -> JSInst a
  ...

This is the key step: the Sunroof constraint provides the ability to generate a prototypical a. The Unimo trick works for constrained polymorphic types as well as monomorphic types.

So, from our list of primitives, the operational package allows us to build our Javascript monad, with the monad instance for JSM provided by Program.
type JSM a = Program JSInst a

For technical reasons, Program is abstract in operational, so the library provides view to give a normalized form of the monadic code. In the case of JS_Call, bind corresponds to normal sequencing, where the result of the function call is assigned to a variable, whose name has already been passed to the rest of the computation for compilation. newVar, assignVar and showVar are provided by the Sunroof class.
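To make the code below easier to follow, here is a rough sketch of the shape such a class might take (the method names come from the text above; the real Sunroof class may well differ):
class Sunroof a where
  newVar    :: CompM a      -- generate a fresh prototypical value, naming a Javascript variable
  assignVar :: a -> String  -- render an assignment prefix for that variable
  showVar   :: a -> String  -- render the value as Javascript text
With these in hand, the (simplified) compiler is: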
compile :: Sunroof c => JSM c -> CompM String
compile = eval . view
  where
    showArgs :: [JSValue] -> String
    showArgs = intercalate "," . map show

    eval :: Sunroof b => ProgramView JSInst b -> CompM String
    eval (JS_Call nm args :>>= g) = do
        a    <- newVar
        code <- compile (g a)
        return $ assignVar a ++ nm ++ "("
              ++ showArgs args ++ ");" ++ code
    ...
    eval (Return b) = return $ showVar b

This critically depends on the type-checking extensions used for compiling GADTs, and scales to additional primitives, provided they are constrained on their polymorphic result, like JS_Call.

Using compile, we compile our Sunroof Javascript DSL to Javascript, and now a bind in Haskell results in a value binding in Javascript. A send command compiles the Javascript expressed in monadic form and sends it to the browser for execution.
send :: (Sunroof a) => JSM a -> IO a

The Javascript code then responds with the return value, which can be used as an argument to future calls to send.

We can write a trivial example which draws a circle that follows the mouse:
drawing_app :: Document -> IO ()
drawing_app doc = do
    ...
    send doc $ loop $ do
      event <- waitFor "mousemove"
      let (x,y) = (event ! "x", event ! "y")
      c <- getContext "my-canvas"
      c <$> beginPath()
      c <$> arc(x, y, 20, 0, 2 * pi, false)
      c <$> fillStyle := "#8ED6FF"
      c <$> fill()

The following code is generated by Sunroof (on the Haskell server) and then executed entirely on the client:
var loop0 = function() {
    waitFor("mousemove", function(v1) {
        var v2 = getContext("my-canvas");
        (v2).beginPath();
        (v2).arc(v1["x"], v1["y"], 20, 0, 2*Math.PI, false);
        (v2).fillStyle = "#8ED6FF";
        (v2).fill();
        loop0();
    })
}; loop0();

Voilà! A Haskell-based Javascript monad reified and transmitted to a browser.

Close


This blog article is adapted from the short paper Haskell DSLs for Interactive Web Services, submitted to XLDI 2012, written by Andrew Farmer and myself.

The lesson, I suppose, is to never assume something is not possible in Haskell. We only stumbled onto this when we were experimenting with a variant of monadic bind with class constraints, and managed to remove the constraints. We've never seen an example of using operational or Unimo that constrains the primitives so that they can generate a prototypical value, as in the function reification trick above. If anyone has seen this, please point it out, and we'll be happy to cite it.

We would like to thank Heinrich Apfelmus for pointing out that we could rework our compiler to use operational, and for providing us with a suitable template for its usage.

Let the reification of monads begin!

Andy Gill