[rabbitmq-discuss] [Minimum Air Induction] Introducing Shovel: An AMQP Relay

Sat Sep 20 15:27:37 BST 2008

Valentino,

On Sat, Sep 20, 2008 at 9:56 AM, Valentino Volonghi <dialtone at gmail.com> wrote:
> Yes, I am tracking tip, I didn't notice the big changes in lib_amqp.erl
> so I'll refactor my code to use them.

lib_amqp evolved as a high-level convenience API that sits on top of
the fine-grained low-level API. Whilst the low-level API is stable
from a design and implementation perspective (it seems to be the
correct approach and is quite simple and expressive), the higher-level
lib_amqp *may* have to evolve further to cover more common use
cases. I say *may* because any changes would come out of the community
using it and seeing how well it meets their needs. So if there are any
issues or ideas with lib_amqp, now is an ideal time to start a
(separate) discussion thread about them, so the outcome can flow into
1.0.

> I hope so :). The main problem is the absolute necessity to not lose
> any single one of the messages. Nothing can be lost.

Sounds familiar. If you use transactional persistent messaging, this
will be guaranteed. Sounds like an expensive setup for log statements
though :-)

> Here is already the first 'problem', if the known exchange is down the line
> would be lost forever, this is simple enough though and rabbitmq would
> run embedded in mochiweb together with shovel. Every queue durable and
> every message too. In this case if mochiweb fails I won't have to worry, if
> shovel disconnects it won't send lines to anyone and wouldn't even remove
> them from the queue so nothing is lost here, if rabbitmq dies I hope it
> brings everything down with itself, traffic is rebalanced on the remaining
> servers and nothing is lost.

Phew! That was a sentence.....what do you mean by Rabbit bringing
everything down with itself?

And if you're using an embedded RabbitMQ instance, how is the Shovel
application supposed to failover to other Rabbit nodes?

I think this scenario requires a bit more elaboration.

> Right after this component there's another rabbitmq server, that we can call
> local rabbitmq, which is local to the mochiweb server, in the same subnet.
> This server would collect everything that various mochiweb+rabbitmq+shovel
> servers send, persist it and forward it to a central location. Again,
> everything is durable so there should be no risk of losing messages.

Why have this middleman? Why not just have the embedded Rabbit
instances forward straight to the remote brokers?

> In the central location there would be a final rabbitmq server that will
> wait for data. Attached to it there would be several consumers that fetch
> data and store it in various databases in small transactions (let's say
> one transaction every 50-100 log lines).

Have you considered doing the coalescing in Shovel (i.e. on the
sending side rather than on the receiving side)?

Maybe you also want to compress stuff if you're sending it over a WAN.
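
To make the coalescing and compression idea concrete, here is a rough
sketch in Python rather than Erlang. Everything in it is an assumption
for illustration: the function names, the JSON encoding and the batch
size of 50 (taken from the low end of your 50-100 figure) are not part
of Shovel or lib_amqp.

```python
import json
import zlib


def coalesce(lines, batch_size=50):
    """Group log lines into lists of at most batch_size items."""
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


def pack(batch):
    """Serialize one batch as JSON and compress it into one message body."""
    return zlib.compress(json.dumps(batch).encode("utf-8"))


def unpack(body):
    """Reverse of pack(): decompress and decode one message body."""
    return json.loads(zlib.decompress(body).decode("utf-8"))
```

The sender would publish one pack()ed body per batch and the consumer
would unpack() it and store the whole batch in a single database
transaction. Log lines tend to be repetitive, so they usually compress
well, but that is worth measuring on real data.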

>
> So, how should shovel behave:
>
> Well, it should be pretty sure that every message was delivered to the final
> location, so I think its way of working would be:
>
>  1. receive message from embedded consumer
>  2. publish message to remote host
>  3. wait for ack
>  4. ack the rabbitmq container
>  5. the rabbitmq container at this point can remove the message

What happens when Shovel fails between steps 3 and 4? Or when there is
a network failure just after the remote broker sends the ack and just
before Shovel would have received it? This sounds like the Two
Generals' Problem: neither side can ever be certain that its last
message arrived. Maybe there is something you can do in the
application to achieve the idempotency your application requires.
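
As a rough illustration of what application-level idempotency could
look like, here is a small Python simulation. It assumes the sender
attaches a unique id to every message; that id, the DedupingStore class
and the relay() function are all hypothetical names for this sketch and
not anything Shovel or RabbitMQ provides.

```python
class DedupingStore:
    """Stores each message at most once, keyed by a sender-assigned id."""

    def __init__(self):
        self.seen = set()
        self.rows = []

    def handle(self, msg_id, payload):
        """Return True if stored, False if this id was already processed."""
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        self.rows.append(payload)
        return True


def relay(queue, store, lose_ack_for=()):
    """Forward (msg_id, payload) pairs, redelivering when the ack is lost.

    lose_ack_for holds ids whose first delivery reaches the store but whose
    ack never reaches the local broker, so the message stays queued.
    """
    pending = list(queue)
    lost = set(lose_ack_for)
    deliveries = 0
    while pending:
        msg_id, payload = pending[0]
        store.handle(msg_id, payload)  # steps 2-3: publish, remote accepts
        deliveries += 1
        if msg_id in lost:
            lost.discard(msg_id)       # failure before step 4: no local ack,
            continue                   # so the same message is sent again
        pending.pop(0)                 # steps 4-5: local ack, message removed
    return deliveries
```

With three messages and one lost ack, relay() makes four deliveries but
the store ends up with three rows. In a real system the set of seen ids
would have to live in the same database transaction as the rows, and it
would need some bound on how long ids are remembered.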

> Now... I'm not sure if there's an ack confirmation message so that the
> consumer is 100% sure that the confirmation was received, I suppose there
> isn't so this means that the system will maybe have duplicates at the end
> and I'll have to take care of this somehow (any suggestions?).

Not quite sure what you mean here. Can you elaborate?

> Another small problem is the current state of Shovel where it basically
> crashes when a connection is dropped, a change that I would like to make
> (or I would like to see) is that it should be able to reconnect to the
> remote host with an exponential backoff so that it starts retransmitting
> as soon as possible.

Sure, the OTP supervisor could potentially handle this.
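
For reference, the backoff logic itself is small. The following is a
language-neutral sketch written in Python, not OTP code; the function
names, the base delay of one second, the doubling factor and the cap
are all assumptions chosen for illustration.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield an endless sequence of delays: base, base*factor, ... up to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_with_backoff(connect, max_attempts=10, sleep=time.sleep, **kw):
    """Call connect() until it succeeds, sleeping longer after each failure."""
    delays = backoff_delays(**kw)
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let the caller (or supervisor) decide
            sleep(next(delays))
```

Here connect is whatever callable opens the connection to the remote
broker. Capping the delay keeps the relay from waiting arbitrarily long
once the remote host is back, which matters if the local queue is
filling up in the meantime.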

> I've read a bit of the archives and I see there are some problems with
> memory growth and rabbitmq simply crashing... I hope this doesn't become a
> problem for us, but... let's say that the web servers generate messages of
> about 1K each (could be much less, but this is for the example's sake),
> now the local rabbitmq instance is down for some reason, basically
> mochiweb will remain alive as long as it has memory which means about
> 4GB/1MB/sec (let's say that the server generates 1000 requests per
> second) that is about 66 minutes. If the problem instead is in the central
> rabbitmq basically each local rabbitmq is subject to the traffic of every
> web server, if we have 8 webservers we obtain 4000/8MB/sec or about
> 8 minutes.
>
> So it basically means that we have 8 minutes to react to such a failure.
> Does this also sound reasonable? and if so... What possible fixes can I
> look for? Ultimately... does this sound like something that rabbitmq can
> be good at?
> Ultimately... does this sound like something that rabbitmq can be good at?

ATM, queues are memory bound, so as indicated in a previous thread,
you would have to calibrate this with your own application and
production scenario. Just test it and find out where the limit is.
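
As a starting point for that calibration, the back-of-envelope figures
quoted above can be reproduced with a few lines of Python. This is only
an estimate under stated assumptions: decimal units (4 GB = 4e9 bytes,
1K = 1000 bytes), a steady message rate, and no per-message overhead in
the broker, which a real test would certainly show. The function name
is made up for this sketch.

```python
def minutes_until_full(memory_bytes, msgs_per_sec, msg_bytes, producers=1):
    """Minutes before unconsumed messages fill memory_bytes at a steady rate."""
    bytes_per_sec = msgs_per_sec * msg_bytes * producers
    return memory_bytes / bytes_per_sec / 60.0


# One web server buffering its own traffic: 4e9 / 1e6 = 4000 s.
one_server = minutes_until_full(4e9, 1000, 1000)

# One local broker absorbing eight web servers: 4e9 / 8e6 = 500 s.
eight_servers = minutes_until_full(4e9, 1000, 1000, producers=8)
```

That gives roughly 66.7 minutes and 8.3 minutes, in line with the
numbers in the quoted text. Actual headroom will be lower once broker
overhead is counted, which is why measuring it is the only reliable
answer.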

BTW, we do intend to implement the disk overflow mechanism discussed
with Edwin. Just don't know when it'll get done.

>>> I took part of the code I'm using now to replace lib_shovel.erl from a
>>> guy on IRC #erlang that shared it, I had my version of it which was
>>> pretty similar but his looked more cleaned up so I'm using that now.
>>
>> Who was that?
>
> Ah, can't remember.

No probs. I'm just not on IRC so much lately because, whilst it is a
quick way to answer questions, it does take up a lot of time.

Ben