[rabbitmq-discuss] Pika reconnection error

Mon Feb 21 00:30:25 GMT 2011

I'd do a few things:

Since you're using 0.5.2, I'd use the SimpleReconnectionStrategy handler
as-is. Rather than extending the class as you are doing now, create an
on_connected method in your app and register it with the
Connection.addStateChangeHandler() method so that you are notified when you
are connected. Do all of your setup in that event handler.
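
A minimal sketch of that arrangement follows. It is not tested against
0.5.2: FakeConnection is a hypothetical stand-in so the example runs without
Pika or a broker, and the handler signature (connection, is_connected) is an
assumption. Only the addStateChangeHandler name comes from this message.

```python
class FakeConnection(object):
    """Hypothetical stand-in for a Pika 0.5.2 connection (not the real class)."""

    def __init__(self):
        self._handlers = []

    def addStateChangeHandler(self, handler):
        # Register a single callback that is told about every state change.
        self._handlers.append(handler)

    def simulate_state_change(self, is_connected):
        # Test helper (hypothetical): pretend the broker connection changed state.
        for handler in self._handlers:
            handler(self, is_connected)


class Consumer(object):
    def __init__(self, connection):
        self.connected = False
        self.setup_count = 0
        connection.addStateChangeHandler(self.on_state_change)

    def on_state_change(self, connection, is_connected):
        # Assumed signature: one callback with a bool flag.
        if is_connected:
            self.on_connected(connection)
        else:
            self.connected = False

    def on_connected(self, connection):
        # Real code would open a channel, declare the queue and start
        # consuming here, so the setup is redone on every reconnect.
        self.connected = True
        self.setup_count += 1


connection = FakeConnection()
consumer = Consumer(connection)
connection.simulate_state_change(True)   # initial connect
connection.simulate_state_change(False)  # server goes away
connection.simulate_state_change(True)   # reconnect
print(consumer.setup_count)
```

The point of the design is that the setup runs once per successful
connection, no matter which component triggered the reconnect.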

There was a bug report against the reconnection handler system on the 0.5.2
branch, but I did not trace it back.

You might be better served by checking out the current version, as I have
tested the SRS (SimpleReconnectionStrategy) pretty extensively, including
cases where the server goes away and comes back.  If you use 0.9.3 (or the
soon-to-be-released 0.9.4), the single callback function with a bool flag
from 0.5.2 is replaced by a separate callback registration for each state:

Connection.add_on_close_callback(callback)
Connection.add_on_open_callback(callback)

In either version, I would suggest that the reconnection strategy class is
not the right place to handle your post-connection setup.
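
As an illustration of that separation, here is a small sketch under stated
assumptions. FakeConnection and RetryOnlyStrategy are hypothetical stand-ins
so the example runs without Pika or a broker; the callback signatures are
assumed, and only the add_on_open_callback and add_on_close_callback names
come from this message.

```python
class FakeConnection(object):
    """Hypothetical stand-in exposing the two per-state registration methods."""

    def __init__(self):
        self._on_open = []
        self._on_close = []

    def add_on_open_callback(self, callback):
        self._on_open.append(callback)

    def add_on_close_callback(self, callback):
        self._on_close.append(callback)

    def simulate_open(self):
        # Test helper (hypothetical): pretend the connection was established.
        for callback in self._on_open:
            callback(self)

    def simulate_close(self):
        # Test helper (hypothetical): pretend the server went away.
        for callback in self._on_close:
            callback(self)


class RetryOnlyStrategy(object):
    """Hypothetical strategy: decides when to reconnect and nothing else."""

    def __init__(self):
        self.attempts = 0

    def on_connection_closed(self, connection):
        self.attempts += 1
        # A real strategy would schedule a reconnect after a delay;
        # here the reconnect is simulated as succeeding immediately.
        connection.simulate_open()


events = []


def on_open(connection):
    # Application-owned setup: channel, queue declaration, consumer.
    events.append("setup")


def on_close(connection):
    events.append("closed")


connection = FakeConnection()
strategy = RetryOnlyStrategy()
connection.add_on_open_callback(on_open)
connection.add_on_close_callback(on_close)
connection.add_on_close_callback(strategy.on_connection_closed)

connection.simulate_open()   # initial connect
connection.simulate_close()  # server goes away, strategy reconnects
print(events, strategy.attempts)
```

The strategy never touches queues or consumers; the application's on-open
callback redoes the setup each time the connection comes back.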

Regards,

Gavin

On Sun, Feb 20, 2011 at 7:13 PM, Jason J. W. Williams <
jasonjwwilliams at gmail.com> wrote:

> BTW this problem only seems to occur when rabbit@Phantome is restarted
> while the consumer is connected to it. Reconnections occur properly
> when restarting rabbit_1@Phantome and rabbit_2@Phantome while the
> consumer is connected.
>
> Status of node rabbit_1@Phantome ...
> [{running_applications,[{rabbit,"RabbitMQ","2.3.0"},
>                        {mnesia,"MNESIA  CXC 138 12","4.4.16"},
>                        {os_mon,"CPO  CXC 138 46","2.2.5"},
>                        {sasl,"SASL  CXC 138 11","2.1.9.2"},
>                        {stdlib,"ERTS  CXC 138 10","1.17.2"},
>                        {kernel,"ERTS  CXC 138 10","2.14.2"}]},
>  {nodes,[{disc,[rabbit_1@Phantome,rabbit@Phantome]},
>         {ram,[rabbit_2@Phantome]}]},
>  {running_nodes,[rabbit@Phantome,rabbit_2@Phantome,rabbit_1@Phantome]}]
> ...done.
>
> -J
>
> On Sun, Feb 20, 2011 at 5:04 PM, Jason J. W. Williams
> <jasonjwwilliams at gmail.com> wrote:
> > Hi Guys,
> >
> > Having some issues with a test cluster setup. It's a 3-node cluster
> > with 2 disk nodes and 1 RAM node. The RAM node is joined to both disk
> > nodes.
> >
> > rabbit@Phantome - disk
> > rabbit_1@Phantome - disk
> > rabbit_2@Phantome - RAM
> >
> > In front of the cluster is an HAProxy instance listening on port 5670
> > doing simple round-robin TCP load balancing between the cluster
> > members.
> >
> > As long as all the nodes are running I can have my consumer connect
> > through the load balancer to any node and successfully consume.
> > Likewise with the producer. However, when I shut down the node the
> > consumer is currently connected to and allow Pika to do the
> > reconnection, I get this error:
> >
> > error: uncaptured python exception, closing channel
> > <pika.asyncore_adapter.RabbitDispatcher connected at 0x632350> (<class
> > 'pika.exceptions.ChannelClosed'>:Connection.Close(class_id = 0,
> > method_id = 0, reply_code = 541, reply_text = 'INTERNAL_ERROR')
> > [/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/asyncore.py|read|74]
> > [/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/asyncore.py|handle_read_event|413]
> > [build/bdist.macosx-10.6-universal/egg/pika/asyncore_adapter.py|handle_read|86]
> > [build/bdist.macosx-10.6-universal/egg/pika/connection.py|on_data_available|268]
> > [build/bdist.macosx-10.6-universal/egg/pika/connection.py|_login2|375]
> > [build/bdist.macosx-10.6-universal/egg/pika/connection.py|handle_connection_open|201]
> > [cluster_test_consumer.py|on_connection_open|53]
> > [build/bdist.macosx-10.6-universal/egg/pika/spec.py|queue_declare|3003]
> > [build/bdist.macosx-10.6-universal/egg/pika/channel.py|_rpc|187]
> > [build/bdist.macosx-10.6-universal/egg/pika/connection.py|_rpc|326]
> > [build/bdist.macosx-10.6-universal/egg/pika/channel.py|wait_for_reply|125]
> > [build/bdist.macosx-10.6-universal/egg/pika/channel.py|_ensure|84])
> >
> > Every few seconds Pika will attempt to reconnect the consumer and the
> > error repeats. The only unusual error in the Rabbit logs occurs on the
> > first reconnection: https://gist.github.com/836441
> >
> > The rest of the time during these errors the logs look like this:
> > https://gist.github.com/836442
> >
> > The consumer is disconnecting and reconnecting to HAProxy because the
> > stats show the connection moving between backend nodes.
> >
> > The producer continues to operate correctly through the load balancer.
> >
> > Consumer code is here: https://gist.github.com/836445
> >
> > Any help/ideas are greatly appreciated.  Thank you in advance.
> >
> > -J
> >