Advent of code, or how I learned to love refactoring.

I have been toying around with Clojure for five or six years now and whilst I really enjoy the way it allows me to think and to solve problems I find it difficult to come up with small but fun projects to work on to help me learn the language. I have spent many happy hours on sites such as the venerable project euler and more recently 4clojure and learned plenty but I don’t think either is as good a resource for problems which help you become a better coder as advent of code does because neither encourages as much refactoring.

OK, ok I've only done five so far..

Advent of code Christmas tree

The site has 25 days of questions and each one follows a similar pattern: (disclaimer: I haven’t completed all of them yet or even looked at them all so for all I know some of the later ones break the mould but I hope not).

  1. Explain some rules which typically will require you to parse text to numbers
  2. Provide some short examples which could be used as unit tests
  3. Provide access to a unique set of input to your problem

I will now show you as an example how the first problem helped me to write better code. (I have slightly exaggerated how bad some of my initial solutions were to highlight the learning and have given you full access to my inner monologue.)

Day 1

I’ve read the rubric and it is clear that I am going to need a function which converts a brace to a plus or minus one. In none of the examples do I see anything other than a ‘(‘ or a ‘)’ so I assume I can get away with the following:

(defn brace->movement
  [brace]
  (if (= "(" brace)
    1
    -1))

The first functional programming I did was some ML 16 or 17 years ago so I leap to recursion to solve the rest of the problem.

(defn calculate-floor
  [braces floor]
  (if (empty? braces)
    floor
    (let [h (first braces)
      m (brace->movement h)
      f (+ floor m)]
      (calculate-floor (rest braces) f))))

I start testing …

user> (calculate-floor "(())" 0)
-4
user> (calculate-floor "()()" 0)
-4

well this is not going well, both of these should have resulted in ‘0’. It looks like something must be going wrong with my brace->movement parsing function so I will just test that at the REPL. (I guess I should have done that first huh?)

user> (brace->movement "(")
1
user> (brace->movement ")")
-1

Hmmm.. nope, not that. Everything working perfectly. OK something other than ‘(‘ or ‘)’ must be being passed to that function. It must be first that is the problem:

user> (first "(())")
(

Ah ha! Of course a string is a sequence of characters and if I call first on a sequence of characters it must give me a character. So now do I go for a function that parses a character such as:

(defn brace->movement'
  [brace]
  (if (= ( brace)
    1
    -1))

or one leave it untouched and convert the calling function to pass a string:

(defn calculate-floor'
  [braces floor]
  (if (empty? braces)
    floor
    (let [h (str (first braces))
          m (brace->movement h)
          f (+ floor m)]
      (calculate-floor' (rest braces) f))))

I prefer the second alternative as I have already proved to myself that the parse function works correctly when passed a string.

back to testing…

user> (calculate-floor' "(())" 0)
0
user> (calculate-floor' "()()" 0)
0

Woo hoo! And not only that, it passes all the test cases. The actual problem is a fairly long string which I don’t really want to copy and paste in to the REPL so I create a helper function to load a string from a file by adapting some code I saw in a different project written by a friend of mine (this code “just worked” and allowed me to focus on the problem that I actually wanted to solve so I didn’t question it then and I am not going to go into it now):

(ns advent.core
  (:require [clojure.java.io :as io]))

(defn read-input-string
  "returns the single string read from the file"
  [file]
  (with-open [rdr (io/reader file)]
    (first (line-seq rdr))))

I test it

user> (require '[advent.core :refer [read-input-string]])
nil
user> (def data (read-input-string "data/day-1"))
#'user/data
user> (take 10 data)
(( ) ( ) ( ( ) ( ) ()

All looks good so I can wrap up my functions to answer day 1.

(ns advent.day-1
  (:require [advent.core :refer [read-input-string]]))

(defn day-1
  []
  (let [braces (read-input-string "data/day-1")]
    (calculate-floor' braces 0)))

user> (day-1)
java.lang.StackOverflowError: null...

Oh dear oh dear. I tried only using half of the input and I do get an answer. At first I think maybe I should divide and conquer – split the input string into a list of lists each of which is short enough to not cause a stack overflow error then combine the answers (I have heard of map reduce you see). But come on. This really isn’t a lot of input; 7000 characters. Map reduce is for big data sets and yes it might work for me here but seems like overkill and I really only want exactly the right amount of kill.

OK, maybe recursion isn’t the best way to go (NOTE: there is no problem with recursion to solve this problem and one way was suggested to me later with I go through at the end of this piece). So what other choices do I have? Well actually I start off with a list of braces I want to convert them to a list of plus or minus ones then sum them. I know that I can sum a sequence of numbers using (reduce + numbers). And actually I can create the list of numbers by using map.

(defn calculate-floor''
  [braces]
  (let [numbers (map brace->movement braces)]
    (reduce + numbers)))

Run the test cases again…

user> (calculate-floor'' "(())")
-4

Drat! I forgot the character to string conversion. But, if I supply a sequence of strings to the function it should be fine, which I can do by calling map str on a string:

user> (map str "(())")
("(" "(" ")" ")")
user> (calculate-floor'' (map str "(())"))
0

And it passes all the rest of the tests too. Good news so I am happy with this function, now to modify the calling function to pass the right thing.

(defn day-1'
  []
  (let [braces (read-input-string "data/day-1")
        list   (map str braces)]
    (calculate-floor'' list)))

That gives me an answer (280 if you are interested) which I submit and find I am correct! Woohoo! Now I am given access to the extension problem. Before I go on to that let me just give the a tidied up and complete version of all the code so far.

(ns advent.core
  (:require [clojure.java.io :as io]))

(defn read-input-string
  "returns the single string read from the file"
  [file]
  (with-open [rdr (io/reader file)]
    (first (line-seq rdr))))
(ns advent.day-1
  (:require [advent.core :refer [read-input-string]]))

(defn brace->movement
  [brace]
  (if (= "(" brace)
    1
    -1))

(defn calculate-floor
  [braces]
  (let [numbers (map brace->movement braces)]
    (reduce + numbers)))

(defn day-1
  []
  (let [braces (read-input-string "data/day-1")
        list (map str braces)]
    (calculate-floor list)))

That looks pretty short, but I don’t think that some of the intermediate assignments make much sense. The body of calculate-floor could actually be a one line without any loss of clarity:

(reduce + (map brace->movement braces))

and why introduce list in day-1? I know I need a list of strings as input so lets do the map first thing. The body would now be:

(let [braces (map str (read-input-string "data/day-1"))]
  (calculate-floor braces))

So now the cleaned up version is:

(ns advent.day-1
  (:require [advent.core :refer [read-input-string]]))

(defn brace->movement
  [brace]
  (if (= "(" brace)
    1
    -1))

(defn calculate-floor
  [braces]
  (reduce + (map brace->movement braces)))

(defn day-1
  []
  (let [braces (map str (read-input-string "data/day-1"))]
    (calculate-floor braces)))

calculate-floor looks a bit ridiculous now – is it really necessay to define a function for that one liner? I don’t think so as the only function which is not part of the language itself is brace->movement and that has already been tested so I refactor the code again to end up with:

(ns advent.day-1
  (:require [advent.core :refer [read-input-string]]))
(defn brace->movement
  [brace]
  (if (= "(" brace)
    1
    -1))

(defn day-1
  []
  (let [braces (map str (read-input-string "data/day-1"))]
    (reduce + (map brace->movement braces))))

I’m pretty happy with that, it is succinct and clear (in my estimation) and still gives the correct answer. If I had a gripe it would be that the brace->movement function is not as tightly defined as it could be; it does pass the unit tests and does its job well enough but maybe the extension introduces some new kind of brace. I decide it is worth another refactor:

(defn brace->movement
  [brace]
  (cond
    (= "(" brace) 1
    (= ")" brace) -1))

Much more satisfactory – the function does exactly what it needs to and doesn’t understand anthing else which means I should be able to pick up if there is input other than ‘(‘ or ‘)’ more quickly.

The extension

And so on to the extension. Again, these tend to follow a fairly standard pattern:

  1. Add some further constraints or stopping conditions
  2. Provide some examples/test cases for the extension
  3. Calcuate the answer using the same input set as the original problem.

For the day 1 extension I need to know the position of the brace which first makes the sum negative. First of all I notice that my refactored answer is far less testable than it used to be – I have only a single function which does everything and it accepts no input. Being able to test and refactor each method at the REPL individually whilst I was building up the solution was invaluable so I set up a stub function which will do the work for the extension and hook it into the existing functionality in a way that will allow me to test the extension:

(defn extension
  "given a sequence of plus and minus ones will return the first position for which the sum becomes negative"
  [movements])

(defn day-1
  []
  (let [braces (map str (read-input-string "data/day-1"))
        movements (map brace->movement braces)
        ans (reduce + movements)
        ext (extension movements)]
    (println "answer: " ans " extension: " ext)))

First of all lets create a function to convert the sequence of movements to a sequence of maps where each map contains the original movement and the index; I’ve come across map-indexed and think that is pretty much exactly what I need:

(defn add-indices
  [movements]
  (map-indexed (fn [idx mvmt] {:idx (inc idx) :mvmt mvmt}) movements))

I have now learnt my lesson and know I should test this whilst it is fresh in my mind, so here goes (though I called the argument “movements” it can be anything which makes testing nice and easy):

user> (add-indices [:a :b :c])
({:idx 1, :mvmt :a} {:idx 2, :mvmt :b} {:idx 3, :mvmt :c})

Recursion didn’t work so well for the original problem but that was because the sequence was too long and I needed to process every value in it. In this case I expressly don’t want to process every value but stop when a certain criteria has been hit so I don’t think reduce is going to help me much and I go back to recursion. A helper function extension to do the work:

(defn extension
  [movements sum]
  (let [h (first movements)
        s (+ sum (:mvmt h))]
    (if (< s 0)
      (:idx h)
      (extension (rest movements) s))))

And I can now test it:

user> (extension (add-indices [-1]) 0)
1
user> (extension (add-indices [1 -1 1 -1 -1]) 0)
5

Seems to be working on the test cases so I jump in with both feet:

user> (def braces (map str (read-input-string "data/day-1")))
#'user/braces
user> (def movements (map brace->movement braces))
#'user/movements
user> (def indexed (add-indices movements))
#'user/indexed
user> (extension indexed 0)
1797

I submit this answer and presto! I am right. But I am not satisfied; I have a solution but I don’t have a good one. What if the answer was actually 7000? I would not have been able to find that instead I would have a stack overflow error again. In fact for all I know if it had been the 3501st value that pushed Santa over the edge into the basement that would have caused a stack overflow. To prove that I have a poor solution I devise a stress test:

user> (def stress-input (flatten [(repeat 3500 "(") (repeat 3500 ")") braces]))
#'user/stress-input
user> (def movements' (map brace->movement stress-input))
#'user/movements'
user> (def indexed (add-indices movements))
#'user/indexed'
user> (extension indexed' 0)
stack overflow error

expecting 8797

Recursion seems to be the right answer, but how to do that in a scalable way? Well, I have heard of loop .. recur as a Clojurey thing so off I go to investigate that:

(defn extension
  [movements]
  (loop [indexed (add-indices movements)
         floor   0]
    (let [n          (first indexed)
          next-floor (+ floor (:mvmt n))]
      (if (< next-floor 0)
        (:idx n)
        (recur (rest indexed) next-floor)))))

user> (extension movements)
1797
user> (extension movements')
8797

And the complete solution to day 1 is now:

(ns advent.day-1
  (:require [advent.core :refer [read-input-string]]))

(defn brace->movement
  [brace]
  (cond
    (= "(" brace) 1
    (= ")" brace) -1))

(defn add-indices
  [movements]
  (map-indexed (fn [idx mvmt] {:idx (inc idx) :mvmt mvmt}) movements))

(defn extension
  [movements]
  (loop [indexed (add-indices movements)
         floor   0]
    (let [n          (first indexed)
          next-floor (+ floor (:mvmt n))]
      (if (< next-floor 0)
        (:idx n)
        (recur (rest indexed) next-floor)))))

(defn answer
  []
  (let [braces    (map str (read-input-string "data/day-1"))
        movements (map brace->movement braces)
        ans       (reduce + movements)
        ext       (extension movements)]
    (println "answer: " ans " extension: " ext)))

And that is a solution I am happy enough to leave; it has passed all the required tests and the extra stress tests I imposed. I could perhaps spend a little while coming up with better names for functions or variables but they are clear enough for me and even though I am writing this post several months after implementing the code I found the code easy enough to understand which is usually a good sign.

Conclusions

So why is it that I think advent of code is particularly good as a source of interesting problems to help people learn to code or improve how they code? (And though I have done this one in Clojure I think that many of the points are equally applicable to any language.)

In actual fact when I first solved this problem I didn’t do nearly as much refactoring as I have shown here, but when I found that each of the next few problems followed a similar pattern it encouraged me to approach them as I have shown above. I found myself writing functions which had a single responsibility and were designed to be testable and reusable. By problem three I found myself automatically assessing a solution and a function for its flexibility; for instance what if Santa had not started on the ground floor? How much code would I have to change to make that extension? How much code could I reuse from the initial solution? I also love the fact that the naïve solution (in this case recursion) will only get you so far which forced me to find out more about the language. What else have I learnt? Perhaps that you don’t always need to define a function; once I was more familiar with Clojure it seemed silly that I ever felt the need to define the original calculate-floors function and once it had been refactored to a reduce it was even more obvious to me that it didn’t need to be a named function. So why would I define a named function now? If it involved more than one line I think I would. If I wanted to test how it behaved I would. And when wouldn’t I? If it was something trivially testable at the REPL – for example (reduce + ...) or (map str ...) and I expect that the more familiar I become with the language the more likely I am to consider something trivially testable at the REPL.

Review

At Metail we like code review – so much in fact that we have double pass reviews across the board. That even goes for blog posts. We know it doesn’t guarantee perfection, but we know it does catch a lot of mistakes. Small things like typos are simply fixed by the editor/reviewer. The reviewer for this piece went beyond that by checking the implementation of the code too and has made a suggestion for an improvement. Part of the process of being open to refactoring (and code improvements) is accepting that you aren’t perfect and you didn’t get it right first time. So in this spirit rather than change the code throughout to the improved version I think I should give it as a final improvement at the end.

It is more idiomatic to use destructuring than first and rest so the extension method *could* be rewritten as:

(defn extension
  [movements]
  (loop [[head & tail] (add-indices movements)
         floor   0]
    (let [next-floor (+ floor (:mvmt head))]
      (if (< next-floor 0)
        (:idx head)
        (recur tail next-floor)))))

It was also mentioned that list was a really poor name for a variable in Clojure as it is the name of a function – but as I fixed that one myself by refactoring I wanted to leave it it.

Review 2

How fitting that we should come back to recursion. I honestly felt quite proud that after all this time I remembered my first computer science class when an enthusiastic lecturer demonstrated that tail recursion rapidly caused stack overflow issues (or whatever they are in ML) and then introduced an accumulator as an argument to the function and Lo! no more problems. So my first solution harked back to that – obviously I must have missed something because my solution had an argument I thought was an accumulator but still blew up.

Handily Clojure thinks of people like me and saves them from themselves by providing a way to do recursion without having to remember any computer science (or using loop). So I could have rewritten my original calculation as:

(defn calculate-floor
  [braces floor]
  (if (empty? braces)
    floor
    (let [f (first braces)
          r (rest braces)
          m (brace->movement f)]
      (recur r (+ floor m)))))

However the review and I both agreed that the version using map which I found myself was better than this implementation. They also suggested further changes to the brace->movement function – using case rather than cond and went on to say that if this were a true code review reduce and reduced would be brought up as interesting corners of the language to explore. It was then that I backed away.

Thanks to the reviewers for the opportunity to learn 🙂

Now this post is written I can get back to the next problem and I very much hope that there will be another set come advent 2016.

Be in the FROW

Receive the latest fashion trends, new collections and perks

You may also like

Meetup Preview: Onyx - distributed computation for the cloud
10/Mar/2016

Metail is hosting the next meetup of Cambridge NonDysfunctional Programmers next Thursday, 17th March. This month we’ll be taking a...

Read More
Think Stats in Clojure Part III: Exploring Data
29/Feb/2016

This is the third instalment of our Think Stats study group; we are working through Allen Downey’s Think Stats, implementing...

Read More
The A-Z of A/B testing
15/Feb/2016

Dr Shrividya Ravi spoke about the statistics of A/B testing at the Data Insights Cambridge meetup. It’s now live on...

Read More