Recently there was a discussion on Twitter about formatting, which prompted me to summarise my thoughts on coding style and the style that I personally adhere to (thread). Here I recount and elaborate on the points made.
The tldr version of my style advice is that the code should be self-explanatory and easy to modify. Self-explanatory code has a logical control and data structure, and reads like a crisp essay or news article.
Although I focus on Haskell because it’s my primary programming language, the advice here is mostly just general-purpose engineering hygiene.
Pointfree style is a great tool for writing self-explanatory code, to an extent. Use it whenever it makes the code clearer.
Good functional style emphasises clear data flow.
Good imperative style emphasises clear control flow.
I have no idea what good object-oriented style is. Objects obscure both control and data flow by entangling mutable functions and mutable data into a big mess.
(But, I hear you object, what about Scala? Well, good Scala code is basically good functional code—they use the object system as a module system.)
This is functional: you are building a function `h . g . f` and then applying it.
This is imperative. This is bad. Now you have to care about the concrete variable in question when reading this.
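The contrast between the two styles can be sketched roughly as follows. The bodies of `f`, `g` and `h`, and the names `functional` and `imperative`, are placeholders invented for illustration; only the `h . g . f` pipeline comes from the text.

```haskell
-- Hypothetical stages; the arithmetic is a placeholder.
f, g, h :: Int -> Int
f = (+ 1)
g = (* 2)
h = subtract 3

-- Functional: build the pipeline h . g . f, then apply it.
functional :: Int -> Int
functional x = (h . g . f) x

-- Imperative: thread named intermediate values through each step.
imperative :: Int -> Int
imperative x =
  let a = f x
      b = g a
      c = h b
  in c
```

Both compute the same result, but in the second version the reader has to follow `a`, `b` and `c` to see what is happening.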
This comes into play when you have a large function.
When reading `grossImperativeFlow`, you have to track the variable as it gets passed around. It takes more mental effort and causes you to think at a low level of abstraction.
When reading `niceFunctionalFlow`, you don’t care about the variable; in fact, there is no variable! There are only the actions to concern yourself with. Your attention is called to the main `stage2 . stage1` pipeline. The specifics of the stages are offset to the `where` binding, where you can focus your attention if you need to know.
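A minimal sketch of the two shapes, assuming a toy task (strip leading spaces, then uppercase). Only the names `grossImperativeFlow`, `niceFunctionalFlow`, `stage1` and `stage2` come from the text; the stage bodies are made up.

```haskell
import Data.Char (toUpper)

-- The reader has to track `trimmed` and `shouted` by hand.
grossImperativeFlow :: String -> String
grossImperativeFlow input =
  let trimmed = dropWhile (== ' ') input
      shouted = map toUpper trimmed
  in shouted

-- The pipeline is the headline; the details live in the where clause.
niceFunctionalFlow :: String -> String
niceFunctionalFlow = stage2 . stage1
  where
    stage1 = dropWhile (== ' ')  -- hypothetical first stage
    stage2 = map toUpper         -- hypothetical second stage
```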
Naming your functions makes explicit data flow. Naming your data makes explicit control flow.
Each `let`-binder should refer to a meaningful unit of work. Excessive `let`s obscure data flow.
For example, this (source) is very difficult to read.
There are too many binders, verbosely naming unnecessary details that you shouldn’t be caring about. The flow is obscured.
    adjustLine :: Int -> Text -> Text
    adjustLine desiredPrefixLength oldLine = newLine
      where
        (prefix, suffix) = Data.Text.breakOn "=" oldLine

        actualPrefixLength = Data.Text.length prefix

        additionalSpaces = desiredPrefixLength - actualPrefixLength

        spaces = Data.Text.replicate additionalSpaces " "

        newLine = Data.Text.concat [ prefix, spaces, suffix ]
I would write it like this. (See my stance on import names below).
The high-level logical structure is clear, and the unnecessary details are hidden in the `where`. The processing is split into three clear steps, each of which is given a meaningful short name. You break the text in two, adjust the first thing, then put the pieces back together again.
At a glance, you know what steps the data[1] is flowing through, rather than knowing what the different intermediate data are but not what’s actually going on.
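One way such a rewrite could look is sketched below. This is a reconstruction from the description above, not necessarily the exact code I had in mind; the step names `split`, `pad` and `glue` and the `T` import alias are assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bifunctor (first)
import Data.Text (Text)
import qualified Data.Text as T

-- Break the line in two, pad the first piece, glue the pieces back together.
adjustLine :: Int -> Text -> Text
adjustLine n = glue . first pad . split
  where
    split = T.breakOn "="
    -- T.replicate yields the empty text for counts <= 0, so long prefixes pass through.
    pad prefix = prefix <> T.replicate (n - T.length prefix) " "
    glue (prefix, suffix) = prefix <> suffix
```

The top line reads as the whole story; the `where` clause is there only if you need the details.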
With regards to the logical organisation of code, a file or module is like a chapter, a function is like a section, and each newline-separated block in a function is like a paragraph.
The verbose `adjustLine` function above is an example of excessive whitespace. The logical organisation of the subroutines is obscured by the spacing.
It’s fine (even preferable) to make such functions terse and pointfree and give them single character/short names like `munge`, etc. The point is that the reader doesn’t care about these things and he shouldn’t need to.
In fact, it is sometimes advantageous to obscure your code with pointfree style, to force the reader to ignore the irrelevant details.
Traversal boilerplate is mundane code; do your best to avoid writing it. Abstract it with recursion schemes.
If you have 6 nested fmaps, you absolutely do not care what those intermediate containers are.
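Recursion schemes need a library, so here is the simpler composition trick instead, as a sketch: composing `fmap` with itself reaches through several layers without naming any of them. The helper `fmap3` and the sample value are made up for illustration.

```haskell
-- Compose fmap with itself to map through three functor layers at once.
fmap3 :: (Functor f, Functor g, Functor h) => (a -> b) -> f (g (h a)) -> f (g (h b))
fmap3 = fmap . fmap . fmap

-- Maybe, list and pair are the three layers here; the caller never mentions them.
bumped :: Maybe [(String, Int)]
bumped = fmap3 (+ 1) (Just [("a", 1), ("b", 2)])
```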
I do not use IDEs nor debuggers, a practice which shapes my following opinions. However, I do believe that codebases maintained by hackers with simple tools are kept in better shape than codebases generated by IDE monkeys, probably out of necessity. (Much like how those who commute by walking or cycling are on average in better shape than those who drive.)
Haskell is great for navigating code with simple tools like grep because types and terms are kept separate, but not so separate, as in ML, that you need special plugins to jump between `.mli` files. The type ascription token `::` is the marker for the function definition and should always be kept on the same line as the function name.
With this in mind, my new preferred multiline formatting style for type signatures is:
I used to write multiline signatures like this:
However, this style is quite bad because, to keep things in alignment, you have to reindent whenever you rename the function, which costs extra keystrokes and causes lots of diff noise.
The following style is even worse:
While everything has a nice, uniform offset, the `::` token has been separated from the function name, neutering its utility when grepping through the code.
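The three layouts discussed above can be illustrated as follows. The function names and bodies are invented, and the first layout is only one arrangement that satisfies both constraints (`::` on the name's line, no realignment on rename); treat it as an assumption rather than the definitive form.

```haskell
-- Keeps `::` on the name's line; the fixed indent survives a rename.
clampPreferred ::
  Int ->
  Int ->
  Int ->
  Int
clampPreferred lo hi = max lo . min hi

-- Aligned under `::`; renaming the function forces every line to be reindented.
clampAligned :: Int
             -> Int
             -> Int
             -> Int
clampAligned lo hi = max lo . min hi

-- Uniform offset, but `::` no longer shares a line with the name.
clampDetached
  :: Int
  -> Int
  -> Int
  -> Int
clampDetached lo hi = max lo . min hi
```

With the first two layouts, searching for `clampPreferred ::` or `clampAligned ::` lands on the definition. With the third, the same search finds nothing.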
Format lists like this, so the first line is not special and can be commented out easily.
I’m a lot less strict about this, but it’s sometimes convenient to split up long chains of function compositions:
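For instance, a chain split one stage per line might look like this sketch. The `slugify` function is a made-up example.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (intercalate)

-- Read bottom to top: drop punctuation, lowercase, split into words, join with dashes.
slugify :: String -> String
slugify =
  intercalate "-"
  . words
  . map toLower
  . filter (\c -> isAlphaNum c || c == ' ')
```

Each stage sits on its own line, so adding, removing or reordering a stage touches exactly one line of the diff.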
This is self-explanatory. The converse to this statement is that common things should be unqualified or have short imports. There is a special place in Enterprise Java Hell for you if you do this:
    import qualified Control.Monad.Trans.Class
    import qualified Control.Monad.Trans.IO
    import qualified Control.Monad.Reader
    import qualified Data.Bifunctor

    doSomething = do
      ctx <- Control.Monad.Reader.ask -- wtf??
      ...
      let xy = Data.Bifunctor.bimap f g someBifunctor -- seriously wtf is wrong with you??
      ...
      -- my fingers are hurting just typing this out
      Control.Monad.Trans.Class.lift (doSomething ctx)
      ...
      -- I have no idea what's going on anymore because of all the goddamn import noise
      Control.Monad.Trans.IO.liftIO (putStrLn "done")
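By contrast, here is a small sketch of the lighter convention: a common combinator imported unqualified, and a short alias for the rest. The `shout` function and the alias `C` are made up for illustration.

```haskell
import Data.Bifunctor (bimap)   -- common combinator: import it unqualified
import qualified Data.Char as C -- short alias instead of the full module path

-- Uppercase the label and bump the counter.
shout :: (String, Int) -> (String, Int)
shout = bimap (map C.toUpper) (+ 1)
```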
Use type aliases, type constraints, and typeclasses liberally. Abstraction is the name of the game; that’s why you’re using Haskell!
…and provide your own stable API (e.g. Time, Regex, Compression) so that you can swap the libraries out easily.
This is just good old-school engineering hygiene. But if you don’t do this, please observe the above point and import the library consistently (for ease of
[1] You shouldn’t be caring about the data anyhow.