[rabbitmq-discuss] Erlang client RPC and dropped messages

Noah Fontes nfontes at cynigram.com
Mon Apr 26 21:20:55 BST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Matthew,

Thanks a lot for the clarification! I was thoroughly confused and this
all makes a lot more sense now. I think I'll try the transactional
method (or perhaps even work on replacing our current system with
rabbitmq-shovel).

Much appreciated,

Noah

Matthew Sackman wrote:
> Hi Noah,
> 
> On Mon, Apr 26, 2010 at 01:58:24PM -0500, Noah Fontes wrote:
>> 1. The connection to the remote RabbitMQ exchange is dropped (often this
>> is because I accidentally let way too many messages build up and the
>> node crashes, but that's a topic for another day and I'm guessing the
>> new persister is going to fix this issue quite nicely); however, no-one
>> is notified of the dropped connection because as far as I can tell the
>> checks for this are only run when data actually goes across the connection.
> 
> Correct, though you could set heartbeat to non-zero. That should get you
> more prompt notification.
> 
>> 2. A message is read from the local queue and published to the remote
>> exchange (via amqp_channel:cast/3), which appears to be successful.
> 
> Be aware of cast. Cast returns as soon as the message has been added to
> the writer's mailbox (and actually can be sooner...). Needless to say,
> this does not suggest the message has made it out of the socket, or even
> been looked at by the socket writer. In general, I tend to use
> amqp_channel:call, not cast, for almost everything as it avoids millions
> of messages backing up in the mailbox of the writer process.
> 
>> Relevant comments from rabbit_writer.erl are included here:
>> %% So instead we lift the code from prim_inet:send/2, which is what
>> %% gen_tcp:send/2 calls, do the first half here and then just process
>> %% the result code in handle_message/2 as and when it arrives.
>> %%
>> %% This means we may end up happily sending data down a closed/broken
>> %% socket, but that's ok since a) data in the buffers will be lost in
>> %% any case (so qualitatively we are no worse off than if we used
>> %% gen_tcp:send/2), and b) we do detect the changed socket status
>> %% eventually, i.e. when we get round to handling the result code.
> 
> Indeed, so using call, not cast, pretty much gets you to this point, but
> obviously no further.
> 
>> 3. After the message is "written" to the exchange, the connection is
>> seen as closed, messages are sent out to listening Erlang processes, and
>> a new connection is subsequently re-established by my code.
>>
>> However, at this point the message that caused the connection drop to be
>> noticed is permanently lost; since the connection wasn't actually active
>> when it was published it can't possibly be rejected, and since no errors
>> were thrown at publish-time, it appears as if the message was sent
>> successfully. In our code, this results in ~50% data loss when a node
>> unexpectedly goes down.
> 
> Right. What you're doing is fine, but with your approach you clearly
> need to hold on to the most recent message you received in case the
> connection drops and you then need to resend it. Furthermore, as you're
> using cast, you could have millions of messages queued up in the
> writer process's mailbox, all of which would be lost.
> 
> However, you're not acking. So, when the connection to the remote
> drops, you could also drop the connection to the local broker. Then,
> the local broker will
> requeue everything that's not been acked, and when you reconnect, you'll
> find it all there. Now at that point, any ordering guarantees you had go
> out the window, but that may not be a concern. Assuming you're using
> basic.consume, you could set qos to prefetch of 1, which would then mean
> that at most one message is buffered in the client without it being
> acked, significantly limiting your exposure.
> 
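
A minimal sketch of the prefetch suggestion above, assuming a recent
RabbitMQ Erlang client (the include path and record fields are taken from
that client and may differ in the 2010 release). The module name
prefetch_sketch and the function subscribe/2 are hypothetical, and Channel
is assumed to be an already-open channel on the local broker.

```erlang
-module(prefetch_sketch).
-export([subscribe/2]).

%% Assumption: include path of a recent amqp_client release.
-include_lib("amqp_client/include/amqp_client.hrl").

%% Limit the channel to one unacknowledged delivery, then consume with
%% acknowledgements required. If the connection to the local broker is
%% dropped, the broker requeues whatever has not been acked, which here
%% is at most one message.
subscribe(Channel, Queue) ->
    #'basic.qos_ok'{} =
        amqp_channel:call(Channel, #'basic.qos'{prefetch_count = 1}),
    #'basic.consume_ok'{consumer_tag = Tag} =
        amqp_channel:subscribe(Channel,
                               #'basic.consume'{queue = Queue,
                                                no_ack = false},
                               self()),
    Tag.
```

The calling process would then ack each delivery with #'basic.ack'{} only
once it is satisfied the message has reached the remote side.
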
> Finally, if you want to be really sure, use transactions to the
> destination - that way you know you have to hang on to everything you've
> published up until you get the commit_ok back, and then you are
> guaranteed that it's been received.
> 
> And in an obvious plug, our shovel is capable of dealing with these
> issues. ;)
> 
> Matthew
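
For reference, the transactional approach described above might look
roughly like the following. This is only a sketch under assumptions: it
uses record names and the include path from a recent RabbitMQ Erlang
client, the module tx_publish_sketch and its functions are hypothetical,
and Channel is assumed to be an open channel on the remote broker.

```erlang
-module(tx_publish_sketch).
-export([init/1, publish_batch/4]).

%% Assumption: include path of a recent amqp_client release.
-include_lib("amqp_client/include/amqp_client.hrl").

%% Put the channel into transactional mode once, before publishing.
init(Channel) ->
    #'tx.select_ok'{} = amqp_channel:call(Channel, #'tx.select'{}),
    ok.

%% Publish every payload with call (not cast), then commit. The caller
%% must keep Payloads until this function returns ok: only the
%% tx.commit_ok reply indicates that the broker has accepted them. If
%% the connection drops first, a call fails or the pattern match
%% crashes, and the caller still holds the batch and can resend it
%% after reconnecting.
publish_batch(Channel, Exchange, RoutingKey, Payloads) ->
    Publish = #'basic.publish'{exchange = Exchange,
                               routing_key = RoutingKey},
    lists:foreach(
      fun(Payload) ->
              ok = amqp_channel:call(Channel, Publish,
                                     #amqp_msg{payload = Payload})
      end,
      Payloads),
    #'tx.commit_ok'{} = amqp_channel:call(Channel, #'tx.commit'{}),
    ok.
```

Committing per batch rather than per message trades a larger resend
window for fewer round trips to the broker.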

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkvV9acACgkQhitK+HuUQJSKwgCgn881Wc100Ck+h+1DknhhwEsl
rgQAnjUyC68Ehg1SRqFWeMUCEapmmaQc
=JuyV
-----END PGP SIGNATURE-----


