[rabbitmq-discuss] State monad question

Matthew Sackman matthew at rabbitmq.com
Fri Apr 1 00:44:51 BST 2011


On Thu, Mar 31, 2011 at 01:19:13PM -0500, Jon Brisbin wrote:
> Saw the traffic today about the State/Cut stuff and looked at the
> code. But I feel like the guy who misses the joke and everyone else is
> laughing.
> 
> Can someone throw me a bone here and let me in on why the hubbub?
> What's the significance? Look at this as a teachable moment. ;)

Monads are a useful and powerful control-flow mechanism in any language.
However, for them to be easily used, the language tends to need to
directly support them in syntax, or have a very flexible and extensible
syntax.

In Erlang, let's say we have the three lines of code:

    A = foo(),
    B = bar(A, dog),
    ok.

They are three, straight forward statements, which are evaluated
consecutively. What a Monad gives you is control over what happens
before the first statement, between the statements, and after the last
statment. It is this control that makes them so powerful.

Now of course, there's absolutely nothing profound going on here: you
could just implement as follows:

    begin(),
    A = foo(),
    middle(),
    B = bar(A, dog),
    middle(),
    ok,
    end().

But that's not quite powerful enough, because it would be nice if
instead the extra funs had some understanding that A and B have been
bound to, and that they knew what was coming next. So instead, we could
introduce a sort of chaining function called 'andthen':

    andthen(foo(),
            fun (A) -> andthen(bar(A, dog),
                               fun (B) -> ok end)).

Thus the function 'andthen' takes all results from the previous
expression, and controls how and whether they are passed to the next
expression.

As defined, the 'andthen' function is the monadic function '>>='.

Now it's very very difficult to read the program with the 'andthen'
function (especially as Erlang is so annoying and doesn't allow us to
define new infix functions), which is why some special syntax is
necessary. Haskell has it's 'do' notation, and so I've borrowed from
that and abused Erlang's list comprehensions. Haskell also has lovely
type-classes, which I've sort of faked. So, with erlando, you can write
in Erlang:

    do([Monad ||
        A <- foo(),
        B <- bar(A, dog),
        ok]).

which is readable and straightforward, but is transformed into:

    Monad:'>>='(foo(),
                fun (A) -> Monad:'>>='(bar(A, dog),
                                       fun (B) -> ok end))

There is no intention that this latter form is any more readable than
the 'andthen' form - it is not. However, it should be clear that the
function Monad:'>>='/2 now has complete control over what happens: does
the fun on the RHS ever get invoked? If so, with what value?

A few years ago, I was working entirely in Haskell and I watched new
people pick up the language and it became something of a rite of passage
that everyone wrote their own (awful) tutorial on monads. I never did
that, and I think that looking at them from a different language helps
see them in a different light: it's simply a control structure that
allows you to inject all sorts of other functionality between and around
"normal" looking lines of code.

One of the things it can help with, as Alvaro pointed out, is dealing
with endless nested 'case' statements, especially when dealing with
errors. Somthing like a maybe or error monad can completely get rid of
the nesting, because it is the maybe:'>>='/2 function that detects
whether or not the lhs errored, and if it did error, the rhs is never
invoked.

See http://hg.rabbitmq.com/erlando/file/a1c8f1c874c6/test/src/maybe.erl
or http://hg.rabbitmq.com/erlando/file/a1c8f1c874c6/test/src/error.erl -
hopefully it should seem completely trivial at this point how this
works.

The other thing is that we have quite a lot of test code that does
things like:

   test(S0) ->
     S1 = foo:a(S0),
     S2 = foo:b(S1),
     ...
     S40 = foo:aw(S39),
     ...

This is _horrible_ code because whenever you change it, you have to
rename a billion variables, which is surprisingly error prone and makes
the diff much larger. The State monad can be used to completely hide the
state that's being passed around here. I don't really want to try and
explain how the State monad works (ultimately, rather than evaluating
the rhs, the '>>=' function returns a new function which simply
maybe-modifies the state), but to give you a small example, compare:

http://hg.rabbitmq.com/rabbitmq-server/file/default/src/rabbit_tests.erl#l2293
with
http://hg.rabbitmq.com/rabbitmq-server/file/e227a13c93a4/src/rabbit_tests.erl#l2296

They are the same test. The latter has been rewritten to use the State
monad (and it also uses 'cuts' which are a form of partial application
which I also implemented in erlando today). Now the 2nd version is a
fraction longer - it has to do some work to set up the monad and it also
suffers from going over the 80-column limit and so having to wrap lines,
but which looks more maintainable to you, and which is actually simpler
to read and to modify?

I hope that sheds a little light on it all.

The parse transformers, some monad implementations, and some tests for
this stuff are all available at http://hg.rabbitmq.com/erlando/

Matthew



More information about the rabbitmq-discuss mailing list