[rabbitmq-discuss] RabbitMQ and a two-site deployment connected via WAN.

Tue May 4 16:07:56 BST 2010

On 04/05/10 11:29, Daniel Pittman <daniel at rimspace.net> said:
<snip>
> I figured in most of a year things might have moved on; is it still likely
> that shovel will avoid losing messages across a WAN split where Erlang
> clustering would lose them?

Yes, but they are really two different concepts. A cluster is effectively
one big RabbitMQ server; when a node vanishes, it's as if that node's
queues just don't exist. With the shovel you have two separate servers.
For your case the shovel is more reliable, since messages can back up in
either broker while the other is not visible; BUT it makes the topology of
the whole thing visible to clients (and it is something you have to manage).
This may not be a problem, of course.
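
To make the "messages can back up in either broker" point concrete, here
is a toy Python model of the store-and-forward idea. It is only a sketch:
it does not use the shovel or any AMQP client, and the class and attribute
names are made up for illustration.

```python
from collections import deque


class StoreAndForwardLink:
    """Toy model of a shovel-style link between two separate brokers."""

    def __init__(self):
        self.local_backlog = deque()  # messages held on the local broker
        self.remote_queue = []        # messages delivered to the remote broker
        self.link_up = True           # whether the remote broker is visible

    def publish(self, message):
        # Publishing always succeeds locally, whatever the WAN is doing.
        self.local_backlog.append(message)
        self.drain()

    def drain(self):
        # Forward in order, and only while the remote broker is visible.
        while self.link_up and self.local_backlog:
            self.remote_queue.append(self.local_backlog.popleft())


link = StoreAndForwardLink()
link.publish("m1")
link.link_up = False  # WAN split: the remote broker disappears
link.publish("m2")
link.publish("m3")
backlog_during_split = list(link.local_backlog)
link.link_up = True   # WAN restored
link.drain()
```

During the split the two later messages simply wait in the local backlog,
and they are forwarded in order once the link returns. In a cluster, by
contrast, the queues on the vanished node would just appear not to exist.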

>> Not in 1.7.2. Versions of this optimisation existed in previous versions
>> of RabbitMQ, but have since been disabled since they broke some ordering
>> guarantees.
>
> OK.  Given I do value stability and correctness over performance, in 1.7.2 is
> this going to be one copy per subscriber on the remote node over the WAN?

Yes.
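
As back-of-envelope arithmetic for what that means on the WAN link, here
is a small Python sketch. The function name and the example figures are
made up for illustration; it counts message payload only and ignores
framing overhead.

```python
def wan_payload_bytes(message_bytes, remote_subscribers):
    """Payload crossing the WAN when each remote subscriber gets its own copy."""
    return message_bytes * remote_subscribers


# Hypothetical example: a 2 KiB message with 5 subscribers on the remote node.
per_subscriber_copies = wan_payload_bytes(2048, 5)
```

So WAN traffic for a message grows linearly with the number of
subscribers on the remote node.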

>> There's a new, correct, version of this optimisation on branch bug19844
>> in Mercurial (so, err, requiring you to compile from source again). It's
>> in a fairly reasonable state and hopefully should get merged into the
>> default branch soon and thus find its way into the next release. You are
>> very welcome, and indeed encouraged, to test it out before that happens.
>
> Depending, I might well do so — but given we are testing the message queue
> system, we are very unlikely to actually find problems with this.

Fair enough :)

<snip>
> I want to understand better what, for example, high latency or low bandwidth
> links mean in the context of a WAN Erlang cluster: if latency gets high
> enough, do messages time-out and get retried?  Is this congestion-controlled?
>
> When the Erlang messages start to back up, do they take down the cluster?
>
> I *think* the answer is that Erlang is good about this, and time-out control
> is application specific, but there doesn't seem to be a good guide to what
> there is that I *should* be worrying about.

No, there isn't :( To answer your specific questions:

If the latency gets very high but the node is not down or disconnected,
messages will back up indefinitely: they are not retried as such, but
they are not dropped either. Under very high load or latency, Erlang
can decide that the other node is down, depending on the "ticktime"
(the kernel's net_ticktime setting). See this thread for more:

http://markmail.org/message/wakptvfyogtgqnen
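
As a rough illustration of how the ticktime plays into this: as I
understand the Erlang kernel documentation, the setting is net_ticktime
(default 60 seconds), ticks go out every quarter of that, and an
unresponsive node is declared down somewhere between 75% and 125% of it.
The Python below is just that arithmetic; the function name is made up
for illustration.

```python
def tick_window(net_ticktime=60):
    """Return (tick interval, earliest detection, latest detection) in seconds."""
    tick_interval = net_ticktime / 4  # a tick is sent four times per ticktime
    earliest = net_ticktime * 0.75    # node may be declared down from here...
    latest = net_ticktime * 1.25      # ...up to here
    return tick_interval, earliest, latest


default_window = tick_window()       # default net_ticktime of 60 seconds
relaxed_window = tick_window(120)    # a more tolerant setting for a slow link
```

Raising the ticktime makes the cluster more tolerant of a slow or
congested link, at the cost of taking longer to notice a node that
really has gone away.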

Hope this helps.

Cheers, Simon

-- 
Simon MacMullen
Staff Engineer, RabbitMQ
SpringSource, a division of VMware