Haskell at work

Much as our native languages help shape our mindsets and personas, our choice of programming languages undoubtedly has a profound effect on the programming experience and the results we produce. Unfortunately, many of these effects are subtle and hard to quantify, which makes the choice of programming language a notoriously heated topic. My personal interest in programming languages is both practical and academic. While I’m always on the lookout for a better tool to get my job done, I leave some room for pure exploration that won’t yield any apparent benefits. This is exactly how I picked up Haskell. In this post, I’m going to focus on using Haskell to solve a regular task at work and the lessons I learned doing that.

This article won’t teach you any Haskell. Try “Learn You a Haskell for Great Good!” or “Haskell Book” instead. For my Russian readers, I can also recommend a comprehensive online course by Denis Moskvin that I took last year.

One of my favorite challenges for a new programming language is writing a command line utility for automating a repetitive task at work. This time, I needed a simple tool that would query the Postmark Messages API and save the matching messages into a local directory. It would be easy for me to write it in Ruby using the official Postmark library, but always taking the familiar path doesn’t leave much opportunity to learn. So, I decided to give Haskell a chance on this one.

Preparation

When I was re-learning Haskell last year (I’d had some experience with Haskell from college), a friend pointed me to Stack — a modern Haskell toolkit that takes care of installing GHC and common dependencies. On Mac, Stack is available from Homebrew. Once installed, you can start a new project by running stack new PROJECT_NAME simple, which will produce a project directory with the minimum of boilerplate.

To make things even more interesting, I decided to test Visual Studio Code. Being a long-time user of Sublime Text, I never really got on the Atom train. Atom always felt too heavyweight and awkward, but seeing more and more people migrate away from the slowly decaying Sublime Text wasn’t reassuring either. While I’m not charmed by Code, it provides solid Haskell support via the Haskell Syntax Highlighting and Haskell ghc-mod packages. It gave me type hints and auto-completion, and let me run GHCi (Haskell’s interactive console) right in the editor. What more could one ask for?

Implementation

When designing a command line utility, it is always a good idea to begin by documenting all supported arguments and options. When called with no arguments, our utility will display the following help banner:

Usage: ./librarian [options] <api_token>
  -l n           --limit=n                Limit the maximum number of loaded messages (default 100)
  -t TARGET_DIR  --target-dir=TARGET_DIR  The directory where to put downloaded messages (defaults to current dir)
                 --from=YYYY-MM-DD        Start date
                 --to=YYYY-MM-DD          End date

The program accepts a required API token argument and some optional parameters that determine its behavior. Like Clojure, Haskell encourages modeling your problem domain as pure data structures and transformation functions. For example, I store the program settings as the following Haskell record type:

data Settings =
  Settings
    { fromDate :: Maybe Day
    , toDate :: Maybe Day
    , serverToken :: String
    , limit :: Int
    , targetDir :: FilePath
    , apiEndpoint :: String
    } deriving Show

Using the System.Console.GetOpt module that ships with GHC’s base library, we can represent the available flags as a sum type and then catalog all supported options in a list:

data Flag
  = ServerToken String
  | TargetDir FilePath
  | FromDate Day
  | ToDate Day
  | Limit Int
  deriving Show

supportedOptions :: [OptDescr Flag]
supportedOptions =
  [ Option ['l'] ["limit"] (ReqArg (Limit . read) "n") "Limit the maximum number of loaded messages (default 100)"
  , Option ['t'] ["target-dir"] (ReqArg TargetDir "TARGET_DIR") "The directory where to put downloaded messages (defaults to current dir)"
  , Option [] ["from"] (ReqArg (FromDate . readDate) "YYYY-MM-DD") "Start date"
  , Option [] ["to"] (ReqArg (ToDate . readDate) "YYYY-MM-DD") "End date"
  ]
  where
    readDate = parseTimeOrError False defaultTimeLocale "%Y-%m-%d"

We can now get the usage banner shown above “for free” by passing a header line and supportedOptions to the usageInfo function of GetOpt.
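
Turning the parsed flags into a Settings value is mostly a fold over the flag list. The following is a minimal sketch of how that could look, not the code from the actual program: defaultSettings, applyFlag, parseArgs, and usageHeader are hypothetical names, the option descriptions are shortened, and the default API endpoint is an assumption.

```haskell
import Data.Time (Day, defaultTimeLocale, parseTimeOrError)
import System.Console.GetOpt

data Settings = Settings
  { fromDate :: Maybe Day
  , toDate :: Maybe Day
  , serverToken :: String
  , limit :: Int
  , targetDir :: FilePath
  , apiEndpoint :: String
  } deriving Show

data Flag
  = ServerToken String
  | TargetDir FilePath
  | FromDate Day
  | ToDate Day
  | Limit Int
  deriving Show

supportedOptions :: [OptDescr Flag]
supportedOptions =
  [ Option ['l'] ["limit"] (ReqArg (Limit . read) "n") "Maximum number of messages"
  , Option ['t'] ["target-dir"] (ReqArg TargetDir "TARGET_DIR") "Output directory"
  , Option [] ["from"] (ReqArg (FromDate . readDate) "YYYY-MM-DD") "Start date"
  , Option [] ["to"] (ReqArg (ToDate . readDate) "YYYY-MM-DD") "End date"
  ]
  where
    readDate = parseTimeOrError False defaultTimeLocale "%Y-%m-%d"

usageHeader :: String
usageHeader = "Usage: ./librarian [options] <api_token>"

-- Hypothetical defaults; the endpoint value is an assumption.
defaultSettings :: String -> Settings
defaultSettings token = Settings
  { fromDate = Nothing
  , toDate = Nothing
  , serverToken = token
  , limit = 100
  , targetDir = "."
  , apiEndpoint = "https://api.postmarkapp.com"
  }

-- Each flag overrides exactly one field of the settings record.
applyFlag :: Settings -> Flag -> Settings
applyFlag s (ServerToken t) = s { serverToken = t }
applyFlag s (TargetDir d) = s { targetDir = d }
applyFlag s (FromDate d) = s { fromDate = Just d }
applyFlag s (ToDate d) = s { toDate = Just d }
applyFlag s (Limit n) = s { limit = n }

-- Exactly one positional argument (the API token) is required; anything
-- else yields the parse errors followed by the usage banner.
parseArgs :: [String] -> Either String Settings
parseArgs args =
  case getOpt Permute supportedOptions args of
    (flags, [token], []) -> Right (foldl applyFlag (defaultSettings token) flags)
    (_, _, errs) -> Left (concat errs ++ usageInfo usageHeader supportedOptions)
```

In this sketch, calling parseArgs with the result of getArgs gives either a ready-to-use Settings value or a message to print before exiting. Note that read and parseTimeOrError still fail with a runtime error on malformed input, so a sturdier version would validate those values explicitly.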

I’ll skip the prosaic task of loading options and leap straight to HTTP requests. Haskell’s HTTP story is a bit complicated (more on that below), so it took me several attempts to find a suitable HTTP library. I ended up using two packages: http-conduit and http-client-tls (for TLS support). You work with HTTP by creating and configuring an instance of a request and then running it via the facilities provided by http-conduit. Once you figure out how to map HTTP responses onto your program data structures, making HTTP requests becomes a breeze.

The data type I chose for representing a loaded message is also straightforward. It only has to account for the fact that Postmark doesn’t return message bodies when working with the collection resource. A separate request has to be made to retrieve these fields, which is why textBody and htmlBody are wrapped in a Maybe container to avoid having two different data types.

data Message =
  Message
    { messageId :: String
    , subject :: String
    , from :: String
    , textBody :: Maybe String
    , htmlBody :: Maybe String
    } deriving Show

To support exchanging data of this type with the Postmark API, I needed to implement instances of the FromJSON and ToJSON type classes exposed by aeson, the library that provides JSON support for Haskell.
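
As an illustration, a FromJSON instance for this type might look roughly like the sketch below. It is not the instance from the actual program: the JSON field names (MessageID, Subject, From, TextBody, HtmlBody) are my assumption about the shape of the Postmark response, decodeMessage is a hypothetical helper, and the ToJSON side is omitted.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), decode, withObject, (.:), (.:?))
import Data.ByteString.Lazy (ByteString)

data Message = Message
  { messageId :: String
  , subject :: String
  , from :: String
  , textBody :: Maybe String
  , htmlBody :: Maybe String
  } deriving (Show, Eq)

-- (.:) requires the key to be present, while (.:?) yields Nothing when
-- the key is missing, which covers the body-less collection responses.
instance FromJSON Message where
  parseJSON = withObject "Message" $ \o ->
    Message
      <$> o .: "MessageID"
      <*> o .: "Subject"
      <*> o .: "From"
      <*> o .:? "TextBody"
      <*> o .:? "HtmlBody"

-- Hypothetical helper: decode a single message from raw JSON bytes.
decodeMessage :: ByteString -> Maybe Message
decodeMessage = decode
```

With an instance like this in scope, the same Message type can be decoded both from a collection entry (bodies absent) and from a details response (bodies present).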

I implemented two Postmark API calls: one, getMessages, fetches a page of outbound messages, and the other, getFullMessage, loads an individual message by ID to retrieve its contents.

-- Fetch a single page of outbound messages, optionally limited to a date range.
getMessages :: Request -> Int -> Int -> Maybe Day -> Maybe Day -> IO MessagesResponse
getMessages request offset perPage fromDate toDate =
  liftM getResponseBody $ httpJSON request'
  where
    request' =
      request |> setRequestMethod "GET"
              |> setRequestPath "/messages/outbound"
              |> setRequestQueryString
                  [ ("count", Just $ (pack . show) perPage)
                  , ("offset", Just $ (pack . show) offset)
                  -- show renders a Day as YYYY-MM-DD
                  , ("fromdate", fmap (pack . show) fromDate)
                  , ("todate", fmap (pack . show) toDate)
                  ]

-- Load the full version of a message (including its bodies) by ID.
getFullMessage :: Request -> Message -> IO Message
getFullMessage request message = do
  putStrLn $ "Loading message with ID: " ++ (messageId message)
  liftM getResponseBody $ httpJSON request'
  where
    request' =
      request |> setRequestMethod "GET"
              |> (setRequestPath $ pack $ "/messages/outbound/" ++ messageId message ++ "/details")
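
The |> operator used in these snippets is not part of the standard library. A minimal sketch, assuming it is plain left-to-right function application (the same idea as & from Data.Function), could be defined like this; the normalize function is a hypothetical usage example.

```haskell
import Data.Char (toLower)

infixl 1 |>

-- Feed the value on the left into the function on the right.
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Hypothetical example: the steps read top to bottom, like a pipeline.
normalize :: String -> String
normalize s =
  s |> map toLower
    |> filter (/= ' ')
```

Because the operator is left-associative, a chain such as request |> f |> g is read as g (f request), which is what lets the request modifiers be listed in the order they are applied.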

I could use these two functions to piece everything together, but I got an itch to implement a simplified version of the approach that I’d adopted in the official Postmark Ruby library. There, I provide an alternative “lazy” collections API that loads resources in batches on demand. Thus, a “take 100 from messages” kind of operation would only fetch enough entries to return the first one hundred to the caller. Implementing this in Haskell turned out to be nontrivial, because of the specifics of how it executes I/O actions. By default, Haskell will not allow you to “defer” an I/O operation until its result is needed. To work around this, I had to use the unsafeInterleaveIO function from the scarily-named System.IO.Unsafe module. The resulting messages function uses getMessages under the hood to page through the entire collection.

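-- Lazily page through the whole collection: the first batch is fetched
-- right away, and every following batch is requested only when the list
-- is consumed past the items loaded so far.
messages :: Request -> Int -> Int -> Maybe Day -> Maybe Day -> IO [Message]
messages request offset batchSize fromDate toDate = do
  MessagesResponse total page <- getMessages request offset batchSize fromDate toDate
  -- Recurse for the next batch unless this one reached the end of the collection.
  let nextRequest | offset + batchSize < total = messages request (offset + batchSize) batchSize fromDate toDate
                  | otherwise = return []
  -- unsafeInterleaveIO postpones nextRequest until its result is demanded.
  liftM (page ++) $ unsafeInterleaveIO nextRequest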
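
To see the deferral in isolation, here is a small self-contained sketch of the same pattern with the network taken out. It is an illustration under stated assumptions rather than code from the program: fetchPage is a hypothetical stand-in for getMessages that serves a fake collection of integers and counts its own calls in an IORef.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for getMessages: returns one page of the fake
-- collection [0 .. total - 1] and records that a "request" was made.
fetchPage :: IORef Int -> Int -> Int -> Int -> IO [Int]
fetchPage calls total offset batchSize = do
  modifyIORef' calls (+ 1)
  return (take batchSize [offset .. total - 1])

-- Same shape as messages: the first page is loaded eagerly, while the
-- rest of the list is an action that runs only if it is demanded.
pagedItems :: IORef Int -> Int -> Int -> Int -> IO [Int]
pagedItems calls total offset batchSize = do
  page <- fetchPage calls total offset batchSize
  let next
        | offset + batchSize < total =
            pagedItems calls total (offset + batchSize) batchSize
        | otherwise = return []
  rest <- unsafeInterleaveIO next
  return (page ++ rest)

-- Take the first n items and report how many pages were fetched for it.
takeCounting :: Int -> Int -> Int -> IO ([Int], Int)
takeCounting n total batchSize = do
  calls <- newIORef 0
  items <- pagedItems calls total 0 batchSize
  let wanted = take n items
  -- Force the whole prefix so that the deferred fetches actually happen.
  _ <- return $! length wanted
  fetched <- readIORef calls
  return (wanted, fetched)
```

For example, takeCounting 15 100 10 should need only two page fetches, because taking fifteen items never looks past the second page. The price of this convenience is that the "requests" now happen whenever the list is consumed, which is exactly why the function lives in a module with Unsafe in its name.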

The full source of the program is available as a public gist on GitHub.

Impressions

Every programmer knows the joy of getting things right on the first try. While Haskell isn’t entirely free from runtime errors, the trite “if it compiles, it probably works” mantra isn’t too far from the truth. It’s hard to say if static typing saved me time on a project this small, but every time I managed to compile the program, it worked as expected.

I expected JSON parsing to be a major pain in the neck, but thanks to aeson and type classes, it was a piece of cake to implement. I also enjoyed how declarative the pure code is. Any side effects are immediately apparent from the type signatures, a property not commonly found in mainstream languages.

Although it pales in comparison with the Clojure REPL, the process of iteratively growing your program in GHCi works relatively well: the ability to omit type declarations and the option to stub unfinished functions with undefined both come in handy.

Now, to the “bad stuff”. As I mentioned above, it was surprisingly hard to find a suitable HTTP library. First, there’s the seemingly outdated HTTP package, which doesn’t support TLS. For secure connections, it recommends using http-streams or the http-client package. The latter was my initial choice, but it made even trivial operations, like setting a custom HTTP header, unreasonably hard. Thus, I ended up using http-conduit, the package that http-client was split out of; http-client seems to have lost most of the high-level functionality during that “refactoring”.

Haskell’s built-in strings are implemented as linked lists of characters, which makes them ill-suited for many real-world applications. Naturally, Haskell has means to alleviate this issue. The two I’m aware of are the bytestring and text packages, which provide efficient implementations of byte strings and Unicode strings respectively. Both aeson and http-conduit use byte strings internally, so I had to adapt. As one would expect, having more than one type of string in your code raises entropy quickly. The OverloadedStrings language extension (which makes string literals polymorphic) helps somewhat, but it doesn’t entirely remove the annoyance of having to pack and unpack various string types.
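
To make the friction concrete, here is a small sketch of the conversions involved. The helper names (stringToBytes, bytesToString, detailsPath) are hypothetical and chosen for illustration; the conversion functions themselves come from the text and bytestring packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- With OverloadedStrings, the same literal syntax can produce any of
-- the string types, depending on the type that is expected.
resourceName :: Text
resourceName = "messages"

basePath :: ByteString
basePath = "/messages/outbound"

-- Values that are not literals still need explicit conversions.
-- Going through Text and UTF-8 keeps non-ASCII characters intact,
-- unlike Data.ByteString.Char8.pack, which truncates each character
-- to 8 bits.
stringToBytes :: String -> ByteString
stringToBytes = encodeUtf8 . T.pack

bytesToString :: ByteString -> String
bytesToString = T.unpack . decodeUtf8

-- Mixing a literal prefix with a runtime String forces a conversion.
detailsPath :: String -> ByteString
detailsPath messageId = basePath <> "/" <> stringToBytes messageId <> "/details"
```

The literals are painless, but every runtime String that has to end up in a ByteString (or the other way around) needs one of these conversion hops.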

I’ve been an occasional Haskell programmer for about a year now, but I still often struggle with operator precedence. There are moments when I just helplessly stare at a type error resulting from operators being applied in an unforeseen order. I guess it will get better with experience, but at this point, I feel compelled to wrap every operation in comforting parentheses just to escape the frustration.
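
Two typical traps, as a small illustration (the function names are made up for the example). Function application binds tighter than any operator, (.) is declared infixr 9, (++) is infixr 5, and ($) is infixr 0, so seemingly innocent rearrangements parse very differently:

```haskell
-- Works: (.) binds tighter than ($), so this is (length . words) $ s.
wordCount :: String -> Int
wordCount s = length . words $ s

-- Rejected by the type checker: application binds tighter than (.),
-- so the expression parses as length . (words s).
-- wordCount s = length . words s

-- Works: application binds tighter than (++).
label :: Int -> String
label n = "count: " ++ show n

-- Rejected by the type checker: ($) has the lowest precedence, so the
-- expression parses as ("count: " ++ show) $ n.
-- label n = "count: " ++ show $ n

-- The "comforting parentheses" version leaves nothing to precedence.
labelWords :: String -> String
labelWords s = "count: " ++ (show ((length . words) s))
```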

Another issue is verbosity. 187 lines of code isn’t a lot, but at least half of it is spent declaring various data types used by the program. Yes, once all types are declared, Haskell code is very terse and expressive. Still, even if this initial time investment will pay off on larger projects, it’s something I can’t ignore.

I’m not giving up on Haskell just yet, and I might write more about my experiences with it in the future. I cherish the mathematical precision of Haskell programs but struggle with the day-to-day practicalities of the platform. Many of my frustrations can be attributed to my personal lack of experience, and many others might be addressed by evolutionary improvements. Neither, though, diminishes Haskell’s uniqueness.