[rabbitmq-discuss] AMQP restart

tom kelly ttom.kelly at gmail.com
Thu Oct 18 21:09:41 BST 2012


Hi Tim,

Thanks for your answers!

My original question about the links & supervisors came from needing to know
that when my mystery error (cause still unknown, but more than likely caused
by me) occurred, I'd get some kind of exit or notification so that I could
do something about it. I've now built a wrapper that monitors the
connection & channel, so I can happily forget about meddling with the
supervision tree :-)

While doing this I've come to understand the library better, and I added
confirmation & retry handling to my wrapper relatively easily. So I just
want to say: good work on rabbitmq-erlang-client, guys!
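The wrapper itself isn't shown in the thread. A rough sketch of the confirm-and-retry bookkeeping such a wrapper needs follows, written in Python rather than Erlang and not using the rabbitmq-erlang-client API. `ConfirmTracker`, `publish_fn` and all method names are hypothetical. It assumes the broker numbers publisher confirms per channel starting at 1, and it follows Tim's advice below: anything still unconfirmed when the channel dies is treated as not accepted and republished on the new channel. In an Erlang wrapper, this state would naturally live in the process that monitors the connection and channel and receives the `basic.ack`/`basic.nack` confirms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for "send this payload on the current channel".
PublishFn = Callable[[bytes], None]


@dataclass
class ConfirmTracker:
    publish_fn: PublishFn
    next_seq: int = 1  # confirms are numbered per channel, starting at 1
    unconfirmed: Dict[int, bytes] = field(default_factory=dict)

    def publish(self, payload: bytes) -> int:
        """Record the payload under its sequence number, then send it."""
        seq = self.next_seq
        self.next_seq += 1
        self.unconfirmed[seq] = payload
        self.publish_fn(payload)
        return seq

    def _settle(self, seq: int, multiple: bool) -> List[bytes]:
        """Remove and return the payloads covered by one confirm, oldest first."""
        tags = sorted(s for s in self.unconfirmed if s <= seq) if multiple else [seq]
        return [self.unconfirmed.pop(s) for s in tags if s in self.unconfirmed]

    def on_ack(self, seq: int, multiple: bool = False) -> None:
        """The broker accepted these messages: stop tracking them."""
        self._settle(seq, multiple)

    def on_nack(self, seq: int, multiple: bool = False) -> None:
        """The broker rejected these messages: send them again on this channel."""
        for payload in self._settle(seq, multiple):
            self.publish(payload)

    def on_channel_restart(self, new_publish_fn: PublishFn) -> int:
        """The channel died: presume unconfirmed messages lost and republish them."""
        pending = [self.unconfirmed[s] for s in sorted(self.unconfirmed)]
        self.publish_fn = new_publish_fn
        self.next_seq = 1  # numbering restarts on the new channel
        self.unconfirmed = {}
        for payload in pending:
            self.publish(payload)
        return len(pending)


# Usage: two lists play the role of the old and the reopened channel.
old_wire: List[bytes] = []
new_wire: List[bytes] = []
tracker = ConfirmTracker(old_wire.append)
for msg in (b"a", b"b", b"c"):
    tracker.publish(msg)
tracker.on_ack(2, multiple=True)          # a and b confirmed; c still pending
tracker.on_channel_restart(new_wire.append)  # c is resent as sequence 1
```

Republishing everything unconfirmed gives at-least-once delivery. A message the broker accepted just before the channel died, but whose confirm never arrived, will be sent twice, so consumers may need to tolerate duplicates.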

//TTom.


On Tue, Oct 16, 2012 at 12:57 PM, Tim Watson <tim at rabbitmq.com> wrote:

> Hi!
>
> On 15 Oct 2012, at 16:07, tom kelly wrote:
>
> > Hi List,
> > New user here, debugging some code I inherited, so apologies if my
> questions below are irrelevant.
> >
> > I'm investigating an amqp crash that I've seen in my logs a few times
> and after a code review of the amqp component I'm a bit concerned that my
> connections may be dying & silently failing when this crash occurs.
> >
>
> Ok, let's look at that then.
>
> > I'm using an older version that unlinked from the process that called
> "start_link"; does anyone know why? I'm publishing through this channel
> by calling amqp_channel:cast, so now I'm worried that if the connection &
> channel were closed down, everything I thought I was publishing after
> this error just silently failed. And because of the unlink, there's no way
> the application would have known.
> >
>
> I'm really not too sure about this unlink business, but if you could
> clarify where that was happening, I can probably look through the hg
> logs to try and figure out what was going on there.
>
> > I plan on upgrading to the latest version, but I'm not sure that it has all
> the features to help solve this problem. I see that the unlink is gone and
> the supervision policy is still {one_for_all, 0, 1}, so I guess this means I
> have to trap exits and I'm responsible for reopening the connection &
> channel if it dies?
>
> My reading of the supervision hierarchy is thus:
>
> The application has a top-level simple_one_for_one supervisor for all
> connections, which starts one amqp_connection_sup per connection. This just
> ensures that each connection can actually be started, and the
> connection_sup is temporary, so no restarts will ever take place. This is
> presumably what you'd expect, as we're not trying to second-guess how long
> your connections need to live.
>
> The actual connection consists of a few processes: the gen_connection,
> connection_type_sup and channel_sup_sup. Their supervisor is one_for_all,
> and the gen_connection process is an 'intrinsic' worker, so a
> non-normal exit will kill the supervisor (and sibling processes), but a
> normal exit will take everyone else down cleanly.
>
> Now the channel_sup_sup starts a temporary worker (amqp_channel_sup) and
> that starts an intrinsic worker.  So all in all, it looks to me as if the
> connection and channel will be properly re-established if a non-normal exit
> occurs.
>
> > But before I restart it, what happens to any attempts to publish
> messages? I see there's new confirmation functionality that sounds like it
> might do what's required, but from my reading it seems that if amqp_channel
> is shut down after a crash on the connection then all the confirm info is
> discarded. Is there no way to keep this process alive and try to re-open
> the connection immediately on failure?
> >
>
> I'm not really sure what you're asking here, but my reading of the client
> is that if you're expecting a confirm and you've not seen it, then you
> can/should assume the message wasn't accepted by the broker. If you're
> asking about tracking the confirms between channel instances, then yes,
> you'll need to do that yourself, using whatever mechanism suits your design
> (shared/stable storage, stateful parent process, etc).
>
> > I'm just about to plug in the new version and play with the
> confirmations, but any explanations of the current design might help
> enormously,
> > Thanks,
> > //TTom.
>
>
> Well, I hope my comments have made things a bit clearer and not worse! Please
> *do* feel free to come back with any questions, or to clear up anything
> I've not explained properly.
>
> Cheers,
> Tim
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>