Dynamic Languages and Types

2016-01-05

clojure

Many proponents of static typing, including many of the commenters who responded to this post, strongly believe that its presence in a language directly translates into correctness, even though there is very little empirical evidence to support this assertion. More importantly, the assumption is that the safety guarantees provided by static typing can’t be provided via any other mechanism.

To me, and many others who work with statically-typed languages, the biggest benefit of types is auto documentation.

Here’s an example. Let’s suppose the developer is encountering this code, in a generic statically-typed language, for the first time.

Account generateAccount(Event e)

It’s not clear what what generateAccount actually does, but we know it takes an Event and returns an Account. In an IDE, it’s easy to click onto Event or Account and see what values they describe to get a sense of what’s happening. If Account and Event are immutable, we have a pretty good idea of what to expect. The second guarantee is that generateAccount can’t take Potato and return Tomato where Potato and Tomato have no relation to Event and Account. That’s simply not allowed. We pass in Event and get back an Account, and that’s it. Sure it might throw an exception, but the approximate understanding of the function exists, and we can move on to other code.

In a dynamic language, the same function might look something like this:

def generateAccount(e)

It’s not at all obvious what ‘e’ is and what generateAccount returns, and we’ll need to get into the internals of the function to figure it out. That’s a lot more effort than simply reading the types by sight in the previous example. And what if there are 30 of these functions? And if we do finally figure out what it is generateAccount does and comment above generateAccount describing the types informally, we have to hope that no one changes the comments. All of this function “metadata” is maintained by the compiler in the statically defined generateAccount.

Every developer that I’ve talked to who favors static typing agrees this is the crux of the issue. It’s not monads, or linear types or even type inference that people want. What’s essential is being able to look at a function and figure out the inputs and outputs to get an approximate understanding without having to dig into the internals.

I believe this situation causes some people to prefer statically-typed languages, because the dynamic languages they are used to are not powerful enough, and don’t have maintainers who were sufficiently thoughtful to give this feature out of the box. My two favorite languages Clojure and Racket, both solve this problem in a powerful and compelling way. I am going to show off a little bit of Clojure’s way of addressing this problem. Luckily for me someone thought of a decent example already. Here’s the challenge recreated here:

The JSON data is in the following format

{
    "Genesis": {
        "1": {
            "1": "In the beginning..." ,
            "2": "..."
        },
        "2": { ... }
    },
    "Exodus": { ... },
    ...
}

In JSON, keys aren’t ordered and must be strings.

The goal is to parse the JSON file, order everything correctly (which means parsing the keys into integers), and produce an array that looks like this:

[
    ("Genesis", 1, 1, "In the beginning..")
    ("Genesis", 1, 2, "")
]

The data structure can be a tuple or some new type.

Here’s the Clojure code to do the challenge:

(def book-order ["Genesis", "Exodus", "Leviticus", "Numbers", "Deuteronomy",
    "Joshua", "Judges", "Ruth", "1 Samuel", "2 Samuel", "1 Kings", "2 Kings",
    "1 Chronicles", "2 Chronicles", "Ezra", "Nehemiah", "Esther", "Job",
    "Psalms", "Proverbs", "Ecclesiastes", "Song of Solomon", "Isaiah",
    "Jeremiah", "Lamentations", "Ezekiel", "Daniel", "Hosea", "Joel", "Amos",
    "Obadiah", "Jonah", "Micah", "Nahum", "Habakkuk", "Zephaniah", "Haggai",
    "Zechariah", "Malachi", "Matthew", "Mark", "Luke", "John", "Acts",
    "Romans", "1 Corinthians", "2 Corinthians", "Galatians", "Ephesians",
    "Philippians", "Colossians", "1 Thessalonians", "2 Thessalonians",
    "1 Timothy", "2 Timothy", "Titus", "Philemon", "Hebrews", "James",
    "1 Peter", "2 Peter", "1 John", "2 John", "3 John", "Jude", "Revelation"])
 
(def numeric-str (s/and string? st/numeric? (s/conformer #(Integer/parseInt %)) pos-int?))
(s/def ::bible   (s/map-of ::book ::chapter))
(s/def ::book    (set book-order))
(s/def ::chapter (s/map-of numeric-str ::verse :conform-keys true))
(s/def ::verse   (s/map-of numeric-str (s/and string? not-empty) :conform-keys true))
(s/def ::sorted-bible (s/coll-of (s/tuple ::book pos-int? pos-int? string?)))
 
(defn get-books [bible]
  (map (fn [book] [book, (bible book)]) book-order))
 
(defn get-chapters [books]
  (mapcat (fn [[book chapter]]
            (let [i (into (sorted-map) chapter)]
              (map (fn [[n c]] [book, n, c]) i)))
          books))
 
(defn get-verses [verses]
  (mapcat (fn [[b c d]]
            (let [i (into (sorted-map) d)]
              (map (fn [[v t]] [b, c, v, t]) i))) verses))
 
(defn parse-bible []
  (let [parse (comp #(s/conform ::sorted-bible %) get-verses get-chapters get-books)
        bible (->> "./resources/ESV.json" slurp parse-string (s/conform ::bible))]
    (if (= ::s/invalid bible)
      (throw (ex-info "not valid bible data" (s/explain-data bible)))
      (parse bible))))

This code is pretty straightforward, but I do want to call attention to the s/def spec section. A ::bible is a map whose keys must exist in book-order. The value of bible is chapter which is a map whose keys are numeric strings. The values of chapter are maps named verse whose keys are again numeric strings and values are nonempty strings. All of this done declaratively. Spec not only validates data it can also transform it. In this case, we can conform the keys to the spec which means that once we validate the bible map, the numeric keys will be converted into integers.

The upside of spec is that now we have robust schema validation. The program will abort early if the data passed to it violates spec, so we don’t have to do error handling in the program’s functions, because the data is in good shape. We can just concentrate on transforming it. An even bigger upside is that now it’s easy to tell what kind of data goes in, just follow the definition of ::bible. What data comes out? Follow ::sorted-bible.

Best of all, the errors are data! Here’s a corrupt piece of data that violates the spec rule being tested against the spec. The name “Ge” is not in the set of valid books. Since the error is data, we can interrogate it.

=> (s/explain-data ::bible {"Ge" {"1" {"1" "In the..."}}})
 
#:clojure.spec{:problems ({:path [0], :pred (set book-order), :val "Ge", :via [:bible.core/bible :bible.core/book], :in ["Ge" 0]})}
 
=> (-> (s/explain-data ::bible {"Ge" {"1" {"1" "In the..."}}}) ::s/problems first :pred)
 
(set book-order)

Here are some more cases of bad data.

=> (s/explain-data ::bible {"Genesis" {"1" {"1" "In the..."}}}) ;Good data
 
nil
 
=> (s/explain-data ::bible {"Genesis" {"A" {"1" "In the..."}}}) ;A is not a number
 
#:clojure.spec{:problems ({:path [1 0], :pred numeric?, :val "A", :via [:bible.core/bible :bible.core/chapter], :in ["Genesis" 1 "A" 0]})}
 
=> (s/explain-data ::bible {"Genesis" {"1" {"1" 42}}}) ;42 is not a string
 
#:clojure.spec{:problems ({:path [1 1 1], :pred string?, :val 42, :via [:bible.core/bible :bible.core/chapter :bible.core/verse], :in ["Genesis" 1 "1" 1 "1" 1]})}

With just a few lines of code, we get all of the following:

The ability to verify that data is in the shape and form the program expects cuts down on error checking
Some free data parsing, in this case the string to int conversion
A robust system for error messages that can be customized, because it’s just data
Spec allows powerful predicates like (set book-order) and numeric-str which are not easy to do in a conventional typed language. Often types provide limited guarantees like “this key is a String” and not the fact that it’s numeric or nonempty. These are exactly the type of invariants we want but rarely get out of types.
Specs can be composed and reused in arbitrary ways
Using spec to generate test data or just get an example output of what kind of data conforms to the spec.

For example:

=> (pprint (ffirst (s/exercise ::sorted-bible 1)))
 
[["2 Kings" 1 1 "Z"]
 ["Esther" 2 1 "1"]
 ["1 Samuel" 1 1 "7Tk9"]
 ["Haggai" 2 2 "8"]]

Since specs live in a registry, we can query the registry for specs that are defined in our program programmatically in the REPL. This is another aspect that’s desirable in a large code base, because it enables reuse.
We don’t have know how the specs look like beginning development. It’s possible to incrementally build up a program in the REPL and once we’re happy with the code, to define the specs. Even this is completely optional. Personally, I find that I like writing specs upfront to conform the input to my code and once I figure out the flow of my code to write specs again afterwards.
This is a subtle point, but it’s important to realize that spec supports the creation of open systems that are amenable to change and end up less brittle. The following spec

(s/keys :req [::a ::b])

validates that a map contains two keys ::a and ::b if there are other keys in the map that don’t have specs defined they are left alone. If there’s no spec for ::a, the only thing validated is that a key ::a is present. When requirements change, spec allows for a smooth progression to new requirements. We don’t have to make all of our decisions upfront and our code can still operate.

There are many more uses for specs. I recommend following the excellent tutorial here to learn more.

I don’t believe that static typing is uniquely qualified to give programmers safety guarantees or better readability. The github link for the challenge provides a Haskell solution along side a Clojure one. Since Haskell is treated as the current poster child for statically typed languages, it would be great for somebody to provide a comparable Haskell solution that gives similar guarantees.

Contents