[rabbitmq-discuss] [Minimum Air Induction] Introducing Shovel: An AMQP Relay

Sat Sep 20 09:56:01 BST 2008

Hi all,
Small intro: I'm trying to use Shovel (LINK HERE) to create a push-based
system that can move log lines from a remote location to centralized
storage. Being a remote location means that messages have to cross the
internet.

On Sep 20, 2008, at 12:13 AM, Ben Hood wrote:

> Valentino,
>
> On Sat, Sep 20, 2008 at 1:57 AM, Valentino Volonghi <dialtone at gmail.com> wrote:
>> Well... I'm currently rewriting the lib_shovel.erl file using a cleaner
>> version of it.
>
> See comment in previous email.

Yes, I am tracking tip. I hadn't noticed the big changes in lib_amqp.erl,
so I'll refactor my code to use them.

>> SOURCE ---> LOCAL RABBITMQ (FORWARD) -----> CENTRAL RABBITMQ
>>
>> And in CENTRAL RABBITMQ there would be all the listeners.
>
> Sounds like a good idea.

I hope so :). The main problem is that we absolutely cannot lose a
single message. Nothing can be lost.

>> The other thing I need is absolute reliability: if a message is not
>> transmitted, don't remove it from the queue, and ack it when it is
>> delivered. And if a connection is dropped, instead of crashing it
>> should retry to connect and not deliver until it's connected back.
>
> Sounds reasonable. Currently Shovel does not have any reliability.
> Maybe you can elaborate on the concrete scenarios you are trying to
> cover with this.

Sure, basically the source of log lines would be a mochiweb server that,
for relevant requests, would send messages to a known exchange.

Here is already the first 'problem': if the known exchange is down, the
line would be lost forever. This is simple enough to handle, though:
rabbitmq would run embedded in mochiweb together with shovel, with every
queue and every message durable. In this case, if mochiweb fails I won't
have to worry. If shovel disconnects, it won't send lines to anyone and
won't even remove them from the queue, so nothing is lost here. If
rabbitmq dies, I hope it brings everything down with it, traffic is
rebalanced onto the remaining servers and nothing is lost.

Right after this component there's another rabbitmq server, which we can
call the local rabbitmq: it is local to the mochiweb server, in the same
subnet. This server would collect everything that the various
mochiweb+rabbitmq+shovel servers send, persist it and forward it to a
central location. Again, everything is durable, so there should be no
risk of losing messages.

In the central location there would be a final rabbitmq server that will
wait for data. Attached to it there would be several consumers that
fetch data and store it in various databases in small transactions
(let's say one transaction every 50-100 log lines). Even here everything
should be durable, so there should be no problem.
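
To make the "small transactions" idea concrete, here is a minimal sketch
in Python. It is only an illustration under assumptions: sqlite3 stands
in for whichever database is used, and open_store, store_in_batches and
the ack callback are hypothetical names, not part of rabbitmq or Shovel.
The point it shows is that a batch is acked only after its transaction
has committed.

```python
import sqlite3

BATCH_SIZE = 50  # assumption: the text suggests 50-100 lines per transaction


def open_store(path=":memory:"):
    """Open a throwaway SQLite store (a stand-in for the real database)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS loglines (line TEXT)")
    return conn


def _flush(batch, conn, ack):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO loglines (line) VALUES (?)",
            [(line,) for _, line in batch],
        )
    # Ack only after the commit, so a crash before this point means the
    # broker still holds the messages and can redeliver them.
    for tag, _ in batch:
        ack(tag)


def store_in_batches(deliveries, conn, ack, batch_size=BATCH_SIZE):
    """Store (delivery_tag, line) pairs, one transaction per batch."""
    batch = []
    for delivery in deliveries:
        batch.append(delivery)
        if len(batch) >= batch_size:
            _flush(batch, conn, ack)
            batch = []
    if batch:  # final partial batch
        _flush(batch, conn, ack)
```

Acking after the commit is what keeps a crash from losing lines, at the
price of possible redelivery of a batch that committed but was not acked.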

So, how should shovel behave?

Well, it should be sure that every message was delivered to the final
location, so I think its way of working would be:

  1. receive message from embedded consumer
  2. publish message to remote host
  3. wait for ack
  4. ack the rabbitmq container
  5. the rabbitmq container at this point can remove the message

Then of course this procedure is repeated on each machine until we reach
the central location.
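
The five steps above can be sketched as a small loop. This is a
simulation in Python, not Shovel's actual code: LocalQueue is a
hypothetical in-memory stand-in for the embedded broker, and
publish_and_wait is an assumed callable that returns True once the
remote side has confirmed the message (how that confirmation is obtained
is left open).

```python
from collections import deque


class LocalQueue:
    """Hypothetical stand-in for the embedded broker's durable queue."""

    def __init__(self, messages):
        self.pending = deque(messages)  # not yet handed to the relay
        self.unacked = {}               # handed out, waiting for an ack
        self._next_tag = 0

    def get(self):
        if not self.pending:
            return None
        self._next_tag += 1
        body = self.pending.popleft()
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        # Step 5: only now may the broker forget the message.
        del self.unacked[tag]

    def requeue_unacked(self):
        # Put unacked messages back at the front, preserving their order.
        for tag in sorted(self.unacked, reverse=True):
            self.pending.appendleft(self.unacked.pop(tag))


def relay(local, publish_and_wait):
    """Forward messages; ack locally only after the remote side confirms.

    Returns True when the queue is drained, False if a publish failed.
    """
    while True:
        item = local.get()                      # step 1
        if item is None:
            return True
        tag, body = item
        try:
            confirmed = publish_and_wait(body)  # steps 2 and 3
        except ConnectionError:
            confirmed = False
        if not confirmed:
            local.requeue_unacked()             # nothing is removed
            return False
        local.ack(tag)                          # step 4
```

If the link drops between the remote confirmation and the local ack, the
message is sent again on the next run, so this gives at-least-once
delivery rather than exactly-once.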

Now... I'm not sure if there's an ack confirmation message, so that the
consumer is 100% sure that its ack was received. I suppose there isn't,
which means that the system may end up with duplicates and I'll have to
take care of this somehow (any suggestions?).
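
One common way to tolerate such duplicates is to make the final write
idempotent. A minimal sketch in Python, assuming each log line carries a
unique message_id assigned at the source; that id, the table layout and
the function names are assumptions for illustration, not something
Shovel or rabbitmq provides here, and sqlite3 is only a stand-in for the
real database.

```python
import sqlite3


def open_dedup_store(path=":memory:"):
    """Open a store whose primary key rejects repeated message ids."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loglines ("
        "message_id TEXT PRIMARY KEY, line TEXT)"
    )
    return conn


def store_once(conn, message_id, line):
    """Insert a log line unless this message_id was already stored.

    Returns True if the row was inserted, False if it was a duplicate.
    """
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO loglines (message_id, line) VALUES (?, ?)",
                (message_id, line),
            )
    except sqlite3.IntegrityError:
        # The primary key already exists: a redelivered message.
        return False
    return True
```

With this, a redelivered message can simply be acked again without being
stored twice.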

Another small problem is the current state of Shovel, where it basically
crashes when a connection is dropped. A change that I would like to make
(or would like to see) is that it should be able to reconnect to the
remote host with an exponential backoff, so that it starts
retransmitting as soon as possible.
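
As an illustration of the reconnect behaviour, here is a minimal
exponential backoff sketch in Python. The base delay, factor, cap and
jitter are assumptions chosen for the example, and connect is a
hypothetical callable that raises ConnectionError while the remote host
is unreachable.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield base, base*factor, base*factor**2, ... capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_with_backoff(connect, sleep=time.sleep, jitter=random.random):
    """Call connect() until it succeeds, sleeping longer after each failure.

    The actual sleep is between 50% and 100% of the nominal delay, so that
    several relays do not all reconnect at the same instant.
    """
    for delay in backoff_delays():
        try:
            return connect()
        except ConnectionError:
            sleep(delay * (0.5 + 0.5 * jitter()))
```

Nothing is consumed from the local queue while this loop is running, so
messages simply accumulate there until the connection is back.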

Does this use case sound reasonable?

I've read a bit of the archives and I see there are some problems with
memory growth and rabbitmq simply crashing. I hope this doesn't become a
problem for us, but let's say that the web servers generate messages of
about 1K each (it could be much less, but this is for the example's sake)
and that each server handles 1000 requests per second, i.e. about
1MB/sec. If the local rabbitmq instance is down for some reason,
mochiweb will remain alive as long as it has memory, which means about
4GB / (1MB/sec) = 4000 seconds, or about 66 minutes. If the problem is
instead in the central rabbitmq, each local rabbitmq is subject to the
traffic of every web server: with 8 web servers we get
4000MB / (8MB/sec) = 500 seconds, or about 8 minutes.
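
The same estimate as a small Python calculation, using the figures from
the paragraph above (4GB of memory, 1K messages, 1000 requests per
second, 8 web servers) and decimal units, i.e. 1MB = 1000K. The function
name is just for illustration.

```python
def minutes_until_full(memory_mb, msg_kb, msgs_per_sec, sources=1):
    """Minutes before `memory_mb` fills up at the given message rate."""
    rate_mb_per_sec = msg_kb * msgs_per_sec * sources / 1000.0
    return memory_mb / rate_mb_per_sec / 60.0


# One web server buffering its own traffic.
print(round(minutes_until_full(4000, 1, 1000), 1))             # 66.7
# One local rabbitmq buffering the traffic of 8 web servers.
print(round(minutes_until_full(4000, 1, 1000, sources=8), 1))  # 8.3
```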

So it basically means that we have 8 minutes to react to such a failure.
Does this also sound reasonable? And if so, what possible fixes can I
look for? Ultimately, does this sound like something that rabbitmq can
be good at?

Machine failures can be anything, from hardware failure to anything
else.

>> I took part of the code I'm using now to replace lib_shovel.erl from
>> a guy on IRC #erlang that shared it. I had my version of it which was
>> pretty similar, but his looked more cleaned up so I'm using that now.
>
> Who was that?

Ah, can't remember.

> BTW a couple of points:
>
> 1. Bear in mind the refactoring that is being discussed on the list to
> get the core client into a 1.0 state;

Yep, will do.

> 2. I think this discussion should go over the mailing list, because
> this currently has the widest reach, so other people are made aware of
> what is going on and can potentially offer help to you - so if you don't
> mind I will turn this into a list topic;

This email is CCed to the ML, with reply-to set to it as well.

> 3. It would be really cool if your changes were in a github fork, so
> that they are publicly visible (obviously posting the patches to the
> list would be sharing it as well). An example of this is Peter who is
> maintaining a fork of the as3-amqp client, and I regularly fold his
> changes back into the upstream tree.

I'll start this tree soon too.

--
Valentino Volonghi aka Dialtone
Now running MacOS X 10.5
Home Page: http://www.twisted.it
http://www.adroll.com
