[rabbitmq-discuss] HA active/active cluster in a bad state

Bryan Murphy bmurphy1976 at gmail.com
Thu Oct 13 21:28:39 BST 2011


On Thu, Oct 13, 2011 at 10:48 AM, Matthew Sackman <matthew at rabbitmq.com> wrote:

> On Thu, Oct 13, 2011 at 10:44:43AM -0500, Bryan Murphy wrote:
> > I'll try to get it into a bad state later today.  If I can manage that, I
> > can easily grant temporary remote access to anybody who needs it.
>
> That'd be great, many thanks. However, please note that almost all of
> the Rabbit team is in London, UK, and so there's the usual fun-and-games
> with timezones...
>
> Matthew
>

I've managed to get it into a bad state, but yet again the symptoms differ
from what I've seen before.  This time I'm seeing the following:

/etc/init.d/rabbitmq-server stop works
/etc/init.d/rabbitmq-server start never exits

rabbitmqctl list_queues works
rabbitmqctl cluster_status works

Sending messages to the server fails:

WARNING:root:Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mediafly/bus/__init__.py", line 111, in publish
    connection = pika.BlockingConnection(host)
  File "/usr/local/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 32, in __init__
    BaseConnection.__init__(self, parameters, None, reconnection_strategy)
  File "/usr/local/lib/python2.7/dist-packages/pika/adapters/base_connection.py", line 50, in __init__
    reconnection_strategy)
  File "/usr/local/lib/python2.7/dist-packages/pika/connection.py", line 170, in __init__
    self._connect()
  File "/usr/local/lib/python2.7/dist-packages/pika/connection.py", line 228, in _connect
    self.parameters.port or  spec.PORT)
  File "/usr/local/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 36, in _adapter_connect
    BaseConnection._adapter_connect(self, host, port)
  File "/usr/local/lib/python2.7/dist-packages/pika/adapters/base_connection.py", line 58, in _adapter_connect
    self.socket.connect((host, port))
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
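
For anyone trying to reproduce this: the traceback bottoms out in a plain
socket connect being refused, so the same check can be made without pika.
A minimal sketch, assuming the broker listens on the default AMQP port 5672
on localhost (both are placeholders, and the helper name
port_accepts_connections is made up for illustration; it is not part of pika
or of our code):

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers "Connection refused", timeouts and name-resolution errors.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Assumed values: "localhost" and 5672 (the default AMQP port).
    print(port_accepts_connections("localhost", 5672))
```

If this reports False while rabbitmqctl still responds, that would suggest
the Erlang node is up but nothing is listening for AMQP connections, which
would match what I'm seeing.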

I was able to get it into this state by repeatedly stopping and/or killing
the first node in the cluster and restarting it while simultaneously trying
various ways of tickling it with our application.

startup_log is stuck at "starting database", but there's no activity going on
in the cluster, and I'd be surprised if I've sent more than 100 messages
since I provisioned it.

I can provide remote access to whoever needs it; I just need their public
ssh key.

Thanks,
Bryan