Matlux

A journey deep into the the strata of software design.

Anamorphic Adventure in Clojure

Summary

Where is the unfold function in Clojure? What is an Anamorphism? What are the options currently available to implement and use Anamorphisms (a.k.a. the “unfold” function) in Clojure?

Introduction

I have been using Clojure for a while and I’m fond of Functional Programming constructs which invariably have had the ability to provide very powerful abstraction leading to profound questioning.

A few months back something strange happened to me. I was talking to a colleague who was showing off his implementation of a Functional Programming library in Java. Although I was amused to see at what length (reluctant) Java programmers are able to go, to morph Java into a pseudo functional language, I realised the implementation provided a standard higher order function which I never came across before in Clojure i.e. the unfold function (in Java).

It was a wake up call. I quickly started reading about rich concepts like Anamorphism, Catamorphism and Hylomorphism. I also wanted to go back to the root of it all: Haskell to see how all of this was implemented and used. I wanted to brush up my knowledge of FP theory in order to find out more about all those fundamental FP patterns which were not widely advertised in the Clojure community.

How to learn more about Anamorphism

My path was not linear. It made me stroll across a variety of instructive resources:

What is an anamorphism?

Theorical definition

According to Meijer an anamorphism h is defined below by wrapping the relevant ingredients between concave lenses:

h = [(g, p)]

Where

h ∈ B –> A* , p ∈ B –> bool and g ∈ B –> A || B

For ease of understanding, the above could be translated into Haskell as:

1
2
3
h :: B -> [A]
p :: B -> Bool
g :: B -> (A, B)

An anamorphism is more often called unfold. Let see how it was implemented in Haskell.

Haskell definition

1
2
3
4
5
6
7
8
-- type signature
unfoldr :: (b -> Maybe (a,b)) -> b -> [a]

-- implementation
unfoldr g b  =
  case g b of
   Just (a,new_b) -> a : unfoldr g new_b
   Nothing        -> []

In both above definitions unfold needs a function (called g of 1 arguments returning a pair) and a seed b. The function g is there to expend the seed into a collection or sequence. It is done step by step (calling g iteratively) returning a sequence value and a new seed on each step. However the Haskell implementation does not explicitly need a predicate p to stop the expension of the sequence. This is because the Haskell g function returns a Maybe (a,b) type. Maybe can return Nothing which becomes the terminating condition.

Anamorphism in Clojure

iterate function

The Clojure idiomatic answer to the use case that we are trying to solve with unfold are traditionally answered by using iterate or when finer control is necessary lazy-seq. lazy-seq is very low level, it requires the use of recursion. iterate is a higher level function more comparable to unfold and is (unsurprisingly) implemented using lazy-seq as a building block.

Here is the iterate signature (Haskell syntax):

1
iterate :: (a -> a) -> a -> [a]

The implementation is:

1
2
3
(defn iterate
  "Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects"
  [f x] (cons x (lazy-seq (iterate f (f x)))))

At first glance, iterate binds the concept of seed with the result in the resulting collection. The result is a collection of seeds which recursively gets transformed by the function g.

However, according to Meijer iterate can be written in term of an unfold.

  • Does it mean we can express any algorithm written with unfold with iterate? – Probably but it will need some level of wrapping around the call.
  • Could unfold be written in terms of iterate? – We will see if it’s possible further down.

Otherwise, has the Clojure community tried to crack this nut before?

On Clojure blogs, I found the following two implementations:

Clojure implementation 1: Clojure Unfold and Anamorphisms

1
2
3
4
5
6
7
8
9
(defn unfold1
  ([p f g seed tail-g]
   (lazy-seq
     (if (p seed)
       (tail-g seed)
       (cons (f seed)
             (unfold1 p f g (g seed) tail-g)))))
  ([p f g seed]
     (unfold1 p f g seed (fn [_] ()))))

Erico’s proposed implementation works well. However it diverges from the above two definitions in two ways:

  • it introduces additional functions like f and tail-g
  • g function is not following the signature defined by Meijer and Haskell. Instead its signature is g :: B -> B i.e. it takes a seed and returns a new seed. It needs f to extract the value to inject in the output collection from each seed.
  • finally is makes use of a predicate p which the Haskell implementation avoids.

This implementation in practice works as it can implement the same anamorphisms as the Haskell implementations (maybe more). However I personally find that the presence of 5 arguments is adding complexity. I’m not convinced 5 arguments are adding any meaningful feature or reducing the unfold concept to its simplest form (but I’m happy to be proven wrong in the comment section).

Clojure implementation 2: Unhygienic (“anaphoric”) Clojure macros for fun and profit

1
2
3
4
5
6
7
8
(defn unfold2 [next done? seed]
  ((fn unfold* [seed]
     (lazy-seq
      (when-not (done? seed)
      (let [[value new-seed] (next seed)]
        (cons value
              (unfold* new-seed))))))
   seed))

Amalloy’s proposed implementation is, in comparison, simpler and matching the signature of Meijer’s definition. However it’s not matching the simplicity of the Haskell definition because it needs the predicate p/done? (3 arguments instead of 2 arguments).

Can we do better? Can we match the Haskell implementation in Clojure?

Matching the Haskell implementation of unfold in Clojure

Can we also use iterate to write the function instead of relying on the low level lazy-seq?

The reason the Haskell implementation is so concise is because the Maybe type is available. Because a Maybe can return Nothing, there is a value returned by g that can be used to terminate the recursion.

Maybe has the following formal definition:

1
data Maybe a = Nothing | Just a

Maybe is a data type that can return a value (Just a) or Nothing. Clojure does not have a Maybe type as such but it has nil which is idiomatically used exactly like a Nothing in the case of a Maybe. We can use the nil as a terminating event.

Here it is:

1
2
3
4
5
(defn unfold [g seed]
  (->> (g seed)
       (iterate (comp g second))
       (take-while identity)
       (map first)))

(take-while identity) terminates the expension of the sequence when g returns nil.

unfold Implementation comparison

Let’s put the above implementation of unfold to the test. To compare them, we’re going to implement the same function which makes use of the unfold function in Haskell and each Clojure implementation to judge their relative merits.

The unfold test function specification

toBinary is a function which takes a number and returns a binary sequence representing the same number.

1
toBinary :: Integer -> [Integer]

Example:

1
2
toBinary 16
=> [1,0,0,0,0]

Haskell

1
2
3
4
5
toBinary x =  reverse ((unfoldr g) x)
  where
    g :: Integer -> Maybe (Integer,Integer)
    g 0 = Nothing
    g n = Just (n `mod` 2, n `div` 2)

Clojure 1

1
2
3
4
(defn to-binary [x]
  (let [g (fn [[a n]] [(mod n 2) (int (/ n 2))])
        p (fn [[a seed]] (= seed 0))]
    (reverse (next (unfold1 p first g [nil x] (fn [seed] (list (first seed))))))))

This one is the trickiest. It’s necessary to use a few tricks to make this work:

  • have to pass [nil x] instead of the x feels very wrong
  • the use of the (fn [seed] (list (first seed))) feels contrived
  • the use of the next to trim the nil at the end is also not great

Clojure 2

1
2
3
4
(defn to-binary [x]
  (let [g (fn [n] [(mod n 2) (int (/ n 2))])
        p (fn [seed] (= seed 0))]
    (reverse (unfold2 g p x))))

This one is OK. Nothing wrong, just 3 parameters instead of 2. Readability is good.

Clojure unfold with iterate

1
2
3
4
5
(defn to-binary [x]
  (let [g (fn [n]
            (when-not (zero? n)
              [(mod n 2) (int (/ n 2))]))]
    (reverse (unfold g x))))

Succinct and matching Haskell’s implementation. This is our preferred implementation but it’s debatable which between clojure2 and this example, is the best.

Thanks

Thanks to Nathan Matthews for helping to write the unfold function in Clojure using iterate.