Translating Common Lisp to Haskell—a case study
- The Common Lisp program: an assembler for FRACTRAN
- An awkward step
- A macro is a tagless final encoding!
- Going beyond: a pretty printer
- Conclusion
Lisp programs have their own particular style, often involving mutable state, macros, meta-programming and more, which is to be expected of such a flexible language. Haskell, on the other hand, seems to be a different world altogether, encapsulating effects with monads, a static type system, and occasionally, use of Template Haskell.
So how difficult would it be to translate a Common Lisp program to Haskell, in a way that makes the translated code seem idiomatic? The answer is through a careful choice of what Haskell features to use, in this case, monad transformers, but also a lesser-known technique—the tagless final style.
Get the full code here.
The Common Lisp program: an assembler for FRACTRAN
A couple of years ago, malisper wrote a blog post on writing an assembler for the esoteric programming language FRACTRAN in Common Lisp. It’s quite a nice display of the power of Common Lisp, particularly the macro system. We’re going to display a Common Lisp code block followed by its translated Haskell variant, when possible.
(defparameter *cur-inst-prime* nil)
(defparameter *next-inst-prime* nil)
(defparameter *lisptran-labels* nil)
(defparameter *lisptran-vars* nil)
(defparameter *next-new-prime* nil)
Unsurprisingly, this global state is encapsulated with the state
monad. ExceptT
appears too because we want to be able to throw an
error when a label is not found in the map. We’re also using the
Math.NumberTheory.Primes
package to generate the infinite stream of
primes for us.
import qualified Data.Map as M
import qualified Math.NumberTheory.Primes as P
data CompState =
CompState
{ currInstPrime, nextInstPrime :: Integer
, labels, vars :: M.Map String Integer
, primes :: [P.Prime Integer]
, gensymCount :: Integer
}
deriving (Show)
newtype Comp a =
Comp { runComp :: ExceptT String (State CompState) a }
deriving ( Functor, Applicative, Monad,
, MonadState CompState, MonadError String)
new-prime
generates a fresh prime and places it in
*next-new-prime*
. In our case, we have an infinite list of primes,
so newPrime
should just advance the list and return the old head.
(defun new-prime ()
"Returns a new prime we haven't used yet."
(prog1 *next-new-prime*
(setf *next-new-prime*
(loop for i from (+ *next-new-prime* 1)
if (prime i)
return i))))
newPrime :: Comp Integer
newPrime = do
l <- gets primes
modify (\s -> s {primes = tail l})
return (P.unPrime (head l))
advance
is a sequence of assignments, so the translation is
straightforward.
(defun advance ()
(setf *cur-inst-prime* *next-inst-prime*
*next-inst-prime* (new-prime)))
advance :: Comp ()
advance = do
c <- gets nextInstPrime
p <- newPrime
modify (\s -> s {currInstPrime = c, nextInstPrime = p})
prime-for-label
looks up a label and returns its value if found, and
inserts it otherwise. prime-for-var
is defined similarly.
(defun prime-for-label (label)
(or (gethash label *lisptran-labels*)
(setf (gethash label *lisptran-labels*)
(new-prime))))
primeForLabel :: String -> Comp Integer
primeForLabel label = do
labels <- gets labels
case M.lookup label labels of
Just p -> return p
Nothing -> do
p <- newPrime
modify (\s -> s {labels = M.insert label p labels})
return p
An awkward step
Now we run into a little bit of an issue;
(defmacro deftran (name args &body body)
"Define a Lisptran macro."
(setf (gethash ',name *lisptran-macros*)
(lambda ,args ,@body)))
We don’t have macros in Haskell! This is where our translation starts
to diverge. In such a case, it is useful to read the rest of the Lisp
code and see the larger structures at play, in this case, how the
deftran
macro is used, for instance, in the definitions of addi
,
subi
and >=i
.
(deftran addi (x y)
(prog1 (list (/ (* *next* (expt (prime-for-var x) y))
*cur*))
(advance)))
(deftran subi (x y) ((addi x ,(- y))))
(deftran >=i (var val label)
(prog1 (let ((restore (new-prime)))
(list (/ restore
(expt (prime-for-var var) val)
*cur-inst-prime*)
(/ (* (prime-for-label label)
(expt (prime-for-var var) val))
restore)
(/ *next-inst-prime* *cur-inst-prime*)))
(advance)))
It would seem that we are stuck. We could generate lists of
Rationals
, but the use of advance
forces us to use the State
monad. Furthermore, in subi
, it calls addi
!
One approach would be to express the instructions as a data type;
type Var = String
type Label = String
data Instr = Addi Var Int
| Subi Var Int
| Jge Var Int Label
...
But we lose a critical feature of macros, that they can be used in
other macros, such as subi
calling addi
, and when we add a new
instruction, we have to go through the entire codebase to handle the
extra case, this is the infamous expression problem. Fortunately,
much work has been carried out in attempting to resolve this, with one
promising approach being the tagless final approach. That is, can
express we addi
, subi
and more using a typeclass, rather than a
data declaration? The answer is a resounding yes.
A macro is a tagless final encoding!
class MonadState repr => FracComp repr where
lit :: Integer -> repr [Rational]
label :: String -> repr [Rational]
addi :: String -> Integer -> repr [Rational]
jge :: String -> Integer -> String -> repr [Rational]
gensym :: repr String
subi :: String -> Integer -> repr [Rational]
subi x y = addi x (-y)
Now the definition of subi
looks just like the Lisp one! What’s
going on in this typeclass is that repr
is a higher-kinded type,
repr :: * -> *
. The FracComp
typeclass has a constraint, repr
has to support being a State monad, because we will need a notion of
sequencing label generation to assemble programs correctly.
This extends naturally to deftran
definitions that have side
effects, for instance, gensym
in goto
.
(deftran goto (label) `((>=i ,(gensym) 0 ,label)))
goto :: FracComp repr => String -> repr [Rational]
goto dest = do
g <- gensym
jge g 0 dest
That’s neat, but now we only have a typeclass, we need to actually
instantiate it. Indeed, Comp
can be made an instance of FracComp
.
instance FracComp Comp where
addi x 0 = primeForVar x $> []
addi x y = do
g <- (^ abs y) <$> primeForVar x
f <-
if y < 0
then (%) <$> gets nextInstPrime <*> ((* g) <$> gets currInstPrime)
else (%) <$> ((* g) <$> gets nextInstPrime) <*> gets currInstPrime
advance
return [f]
gensym = newsym
newsym = do
n <- gets gensymCount
modify (\s -> s {gensymCount = n + 1})
return ('t' : show n)
There’s a little bit of a hiccup when y
is negative, because raising
to a negative exponent raises an error. Otherwise, the code is
remarkably close to Lisp.
Now we need to actually assemble a program. assemble
initializes
the state to the initial state.
(defun assemble (insts)
"Compile the given Lisptran program into Fractran.
Returns two values. The first is the Fractran program
and the second is the alphabet of the program."
(let* ((*cur-prime* 2)
(*cur-inst-prime* (new-prime))
(*next-inst-prime* (new-prime))
(*lisptran-labels* (make-hash-table))
(*lisptran-vars* (make-hash-table)))
(values (assemble-helper insts)
(alphabet *lisptran-vars*))))
initState =
let (c:n:p) = P.primes
in (CompState
{ currInstPrime = P.unPrime c
, nextInstPrime = P.unPrime n
, primes = p
, labels = mempty
, vars = mempty
, gensymCount = 0
})
run a = a & runComp & runExceptT & (`evalState` initState)
Now, we want to run the assembler. Something like this;
λ> [addi "x" 3] :: FracComp repr => [repr [Rational]]
λ> assemble [addi "x" 3] :: FracComp f => f [Rational]
So, assemble
should have the following type:
assemble :: FracComp repr => [repr [Rational]] -> repr [Rational]
We can calculate it as follows;
λ> :t [addi "x" 3]
it :: FracComp repr => [repr [Rational]]
λ> :t sequence [addi "x" 3]
it :: FracComp m => m [[Rational]]
λ> :t concat <$> sequence [addi "x" 3]
it :: FracComp f => f [Rational]
And for kicks, we can generalize concat
to join
, yielding our final
result.
assemble :: (Traversable m, Monad m, Monad f) => m (f (m a)) -> f (m a)
assemble l = join <$> sequence l
λ> run (assemble [addi "x" 3])
Right [375 % 2]
The genius of the tagless final approach is that it lets us define new data variants, in this case, new modular pieces of FRACTRAN code.
Some examples;
(deftran while (test &rest body)
(let ((gstart (gensym))
(gend (gensym)))
`((goto ,gend)
,gstart
,@body
,gend
(,@test ,gstart))))
(deftran zero (var)
`((while (>=i ,var 1)
(subi ,var 1))))
(deftran move (to from)
(let ((gtemp (gevnsym)))
`((zero ,to)
(while (>=i ,from 1)
(addi ,gtemp 1)
(subi ,from 1))
(while (>=i ,gtemp 1)
(addi ,to 1)
(addi ,from 1)
(subi ,gtemp 1)))))
while test body = do
gstart <- gensym
gend <- gensym
assemble
(concat [[goto gend,
label gstart],
body,
[label gend, test gstart]])
zero var = while (jge var 1) [subi var 1]
move to from = do
gtemp <- gensym
assemble
[ zero to
, while (jge from 1)
[addi gtemp 1, subi from 1]
, while (jge gtemp 1)
[addi to 1, addi from 1, subi gtemp 1]
]
adds a b = do
gtemp <- gensym
assemble
[ while (jge b 1) [addi gtemp 1, subi b 1]
, while (jge gtemp 1) [addi a 1, addi b 1, subi gtemp 1]
]
With the tagless final embedding, we can write Haskell functions that
generate FRACTRAN programs as easily as we construct values. For
instance, a function that takes an integer n
and returns a FRACTRAN
program that computes the sum of the numbers from 1 to n
inclusive.
sumTo :: FracComp repr => Integer -> [repr [Rational]]
sumTo n = [ addi "c" 0
, addi "n" n
, while (jge "n" 0)
[adds "c" "n", subi "n" 1]]
Now let’s see the assembler in action!
λ> runAssembler (sumTo 10)
Right [847425747 % 2,13 % 3,19 % 13,11 % 3,11 % 29,31 % 11,41 % 31,
23 % 11,23 % 47,2279 % 23,59 % 301,59 % 41,67 % 413,329 % 67,61 % 59,
73 % 61,83 % 73,71 % 61,71 % 97,445 % 71,707 % 89,103 % 5353,
103 % 83,109 % 5459,5141 % 109,107 % 103,113 % 749,113 % 19,
131 % 113,29 % 131,127 % 113]
Going beyond: a pretty printer
We’re done. Let’s see what directions we can take our newly
translated FRACTRAN assembler. Since we used the tagless final
approach, we can do cool things such as interpreting the values under
a different semantic domain. In other words, a fully assembled and
final (pun intended) program FracComp f => f [Rational]
has a
concrete type that depends on the appropriate choice of f
, which in
turn depends on the call site! In particular, we can let f
be the
newtype S
, defined as
newtype S a = S { unS :: StateT Int (Writer [Doc]) a }
deriving (Functor, Applicative, Monad,
MonadWriter [Doc], MonadState Int)
And write the FracComp
instance for S
.
instance FracComp S where
lit i = tell [text (show i)] $> []
label l = tell ["label" <+> text l] $> []
addi l x = tell ["addi" <+> text l <+> text (show x)] $> []
jge l x dest = tell ["jge" <+> text l <+> text (show x) <+> text dest] $> []
gensym = gets (('g' :) . show) <* modify (+ 1)
pretty :: Traversable t => t (S a) -> Doc
pretty x = x
& fmap unS
& sequence
& (`evalStateT` 0)
& execWriter
& vcat
pretty
works by unwrapping the t (S a)
to a stateful writer, then
handling the state and writing.
-- Traversable t
pretty x = x :: t (S a)
& fmap unS :: t (StateT Int (Writer [Doc]) a)
& sequence :: StateT Int (Writer [Doc]) (t a)
& (`evalStateT` 0) :: Writer [Doc] (t a)
& execWriter :: [Doc]
& vcat :: Doc
λ> pretty (sumTo 10)
addi n 10
jge g2 0 g1
label g0
jge g6 0 g5
label g4
addi g3 1
addi n -1
label g5
jge n 1 g4
jge g9 0 g8
label g7
addi c 1
addi n 1
addi g3 -1
label g8
jge g3 1 g7
addi n -1
label g1
jge n 0 g0
But we have just defined the pretty printers for the basic opcodes,
let’s also write specialized printers for the high-level constructs
like while
. Once again, tagless final helps us achieve this.
instance FracComp S where
lit i = tell [text (show i)] $> []
label l = tell ["label" <+> text l] $> []
addi l x = tell [text l <+> "+=" <+> text (show x)] $> []
jge l x dest = tell [text l <+> ">=" <+> (text (show x) <+> text dest)] $> []
gensym = gets (('g' :) . show) <* modify (+ 1)
--------------------------------------------------------------
jle l x dest = tell [text l <+> "<=" <+> (text (show x) <+> text dest)] $> []
adds l x = tell [text l <+> "+=" <+> text x] $> []
subi l x = tell [text l <+> "-=" <+> text (show x)] $> []
goto l = tell ["goto" <+> text l] $> []
while test body = do
censor ((\x -> "while " <> x <> "{") <$>) (test "")
censor (nest 2 <$>) (sequence body)
tell ["}"]
return []
As a result, we can now output FRACTRAN programs in a language resembling C.
λ> pretty (sumTo 10)
c += 0
n += 10
while n >= 0 {
c += n
n -= 1
}
Conclusion
Porting code can be challenging, as there are multiple facets to
consider, for instance, what if the target language lacked a feature
of the source language? Keeping it idiomatic across paradigms adds
additional challenges. In this translation, some Lisp functions were
omitted entirely, either because they were not needed or did not fit
with the model (for instance, the assemble-helper
function).
Nevertheless, code translation is a (in my opinion) good way to deepen
understanding and practice.
Get the full code here.