[rabbitmq-discuss] Channel takes everyone down

James Aimonetti james at 2600hz.com
Thu Jul 21 14:11:34 BST 2011



>Really though, why would your processes be exiting with a non-normal
>reason? That suggests some sort of crash/bug and thus tearing down the
>connection _could_ be considered the right thing to do. One way around
>this though would be to use several connections rather than several
>channels.

I have many consumers that listen for a given type of request. They
spawn a handler for each request to see whether they should handle that
particular request. Those handlers will exit non-normally because there
are a lot of pattern-matching-as-assertion lines to make sure the
handler is processing a request it knows how to respond to, a la "let it
crash". We have redundancy in the cluster, so if segment A has clients
A-M and segment B has clients N-Z (not really how it's done), both
segments will receive a call for client P. Each segment will spawn at
least two handlers for the call, but the handlers on segment A will
crash pretty quickly, while segment B will happily process that
particular call.
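
The pattern-matching-as-assertion style described above might look
something like this (a hypothetical sketch, not the actual 2600hz code;
the module, function names, and the client-ownership check are all
invented for illustration):

    %% call_handler_sketch.erl -- "let it crash" handler sketch.
    %% A handler is spawned per request; it asserts via pattern
    %% matching that this segment owns the client. A mismatch raises
    %% badmatch, so the handler exits non-normally, as described above.
    -module(call_handler_sketch).
    -export([spawn_handler/2]).

    spawn_handler(Segment, Request) ->
        spawn(fun() -> handle(Segment, Request) end).

    handle(Segment, {call, ClientId, CallId}) ->
        %% pattern-matching-as-assertion: crash unless we own this client
        true = owns_client(Segment, ClientId),
        process_call(CallId).

    %% hypothetical ownership rule matching the A-M / N-Z example
    owns_client(segment_a, ClientId) -> ClientId >= $a andalso ClientId =< $m;
    owns_client(segment_b, ClientId) -> ClientId >= $n andalso ClientId =< $z.

    process_call(CallId) ->
        io:format("handling call ~p~n", [CallId]).

On segment A, a request for client $p fails the `true = ...` match and
the handler crashes immediately; on segment B the same request proceeds.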

More specifically, for example, there is a listener for incoming
telephone calls that spawns a handler for that particular call-id.
Events for that call are published with the routing key
call.event.CALLID. This listener can crash for a variety of reasons,
valid or not, but while it is out of service it can't know the state of
the call. (Having a durable queue store those messages isn't an option
because we run primarily with ram_nodes: we want to be able to put this
on an embedded device that may not have a hard drive, or may have a
flash card which we want to avoid writing to.)

Each call handler has a channel to consume over, but since call handlers
can live a few milliseconds, a few minutes, or even hours (depending on
how long the call lasts), I've found it hard to simulate the call
volumes needed to track down when and why my channel/consumer monitoring
process "orphans" a channel (the consumer goes down but the channel
isn't closed). So my quick fix was to link them directly.
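
The "link them directly" fix amounts to something like the following
sketch, assuming the RabbitMQ Erlang client (amqp_connection /
amqp_channel); `start_consumer`, `consume_loop`, and the no_ack choice
are my own illustrative assumptions, not the actual code:

    %% Sketch: tie each consumer's lifetime to its channel's lifetime.
    %% The channel returned by amqp_connection:open_channel/1 is itself
    %% a process, so link/1 means a non-normal consumer exit takes the
    %% channel down with it -- no orphaned channels to sweep up.
    start_consumer(Connection, Queue) ->
        spawn(fun() ->
            {ok, Channel} = amqp_connection:open_channel(Connection),
            link(Channel),  %% consumer dies => channel closes, and vice versa
            #'basic.consume_ok'{} =
                amqp_channel:subscribe(Channel,
                                       #'basic.consume'{queue = Queue,
                                                        no_ack = true},
                                       self()),
            consume_loop(Channel)
        end).

    consume_loop(Channel) ->
        receive
            {#'basic.deliver'{}, Msg} ->
                handle_message(Msg),       %% hypothetical helper
                consume_loop(Channel)
        end.

The trade-off is that a normal, orderly consumer shutdown now also needs
to close the channel explicitly (or exit normally so the link is a
no-op), but it removes the monitoring-process race entirely.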

I've stayed away from multiple connections as I thought channels were a
lighter-weight concept than connections. One day, I'd like to see if we
can't increase throughput by having multiple connections as well, but
for now what we have serves us well. And, like I said, this issue is
relatively easy to clean up manually and only becomes a problem after
weeks of running, but when we hit the embedded device phase, our memory
constraints will start to expose these types of leaks faster.

Sorry for the ramble. Will be watching for those updates!

Take care,

James

-- 
James Aimonetti
Distributed Systems Engineer / DJ MC_

2600hz | http://2600hz.com
sip:james at 2600hz.com
tel: 415.886.7905
