[rabbitmq-discuss] rabbitmq 2.6.1 cluster failure recovery
alain
alain.dazzi at gmail.com
Tue Oct 4 00:04:47 BST 2011
Hi Simon,
Well, I got it to work this time... It is unclear what was causing the
issue, because I could ping the other node and nmap would report that
port 5672 was open. I ended up switching to a different cluster of
Ubuntu machines running 10.04 rather than 11.04 (also, the 2 machines I
am using are now on the same subnet). I hard-wired the node IPs in
/etc/hosts, and the producer/consumer now connect by name.
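
For anyone who hits the same thing: ping and an open 5672 only show
that AMQP clients can reach the broker. As far as I understand it,
clustering goes through Erlang distribution, which needs the peer's
short hostname to resolve and the epmd port (4369 by default) to be
reachable. Below is a rough sketch of a check that can be run from each
node. The helper names (resolves, port_open, check_peer) are made up for
illustration, and the sketch does not probe the dynamically assigned
distribution port, so a clean report is not proof that clustering will
work.

```python
import socket


def resolves(hostname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peer(hostname, ports=(4369, 5672)):
    """Collect name resolution and port reachability for one peer node.

    4369 is the default epmd port and 5672 the default AMQP port; both
    are assumptions about an unmodified install.
    """
    ip = resolves(hostname)
    report = {"host": hostname, "ip": ip, "ports": {}}
    if ip is not None:
        for port in ports:
            report["ports"][port] = port_open(ip, port)
    return report
```

On node-2 this would be called as check_peer("cumulonimbus"), and on
node-1 as check_peer("nuage-informatique"). An "ip" of None means the
/etc/hosts entry (or DNS) is missing for that name.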
I was able to create a mirrored queue and push messages between a
producer and multiple consumers. The setup seems to survive when either
one of the nodes goes down and comes back online. It worked with 1 disc
node + 1 RAM node as well as with 2 disc nodes.
In both my pika producer and receiver code I added ...
ha = {}
ha["x-ha-policy"]="all"
...
# declare queue
channel.queue_declare(queue=qname, passive=False, durable=True,
exclusive=False, auto_delete=False,
arguments=ha)
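
In case the fragment above is hard to place, here is a more complete
sketch of the same declaration. The function names, the default host
"cumulonimbus" and the queue name "test-queue" are placeholders I picked
for illustration. The only part taken from my real code is the
"x-ha-policy" argument and the queue_declare parameters. Producer and
consumer should both declare the queue with the same arguments.

```python
def ha_arguments(policy="all"):
    """Queue arguments asking the broker to mirror the queue.

    "x-ha-policy" is the argument used in this thread (RabbitMQ 2.6.x).
    """
    return {"x-ha-policy": policy}


def declare_mirrored_queue(channel, qname):
    """Declare a durable, mirrored queue on an already open channel."""
    return channel.queue_declare(
        queue=qname,
        passive=False,
        durable=True,
        exclusive=False,
        auto_delete=False,
        arguments=ha_arguments(),
    )


def main(host="cumulonimbus", qname="test-queue"):
    """Connect to one cluster node by name and declare the queue.

    host and qname are placeholder values, not taken from a real setup.
    """
    import pika  # imported here so the helpers above work without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    declare_mirrored_queue(channel, qname)
    connection.close()
```

Calling main() against a running node declares the queue and then
disconnects; nothing is published or consumed in this sketch.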
Thanks!
-A
On Oct 3, 4:26 am, Simon MacMullen <si... at rabbitmq.com> wrote:
> Hi Alain.
>
> When you see timeout_waiting_for_tables, that should mean that the node
> you're trying to start:
>
> * Could not find any other cluster nodes running
>
> * Was not the last node to shut down
>
> From your explanation it sounds like node-1 *is* running while you
> restart node-2 - is that correct? In that case, can node-2 definitely
> see node-1? (i.e. it can ping cumulonimbus)
>
> Cheers, Simon
>
> On 01/10/11 01:25, Alain Dazzi wrote:
>
> > Hi,
>
> > I can't get my rabbitmq cluster to recover from a dead node. So
> > perhaps someone can help ...
>
> > node-1 (cumulonimbus)
> > Linux cumulonimbus 2.6.38-11-server #50-Ubuntu SMP Mon Sep 12 21:34:27
> > UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
> > ii rabbitmq-server 2.6.1-1
> > root@cumulonimbus:~# ls -1 /usr/lib/rabbitmq/lib/rabbitmq_server-2.6.1/plugins/
> > amqp_client-2.6.1.ez
> > mochiweb-1.3-rmq2.6.1-git9a53dbd.ez
> > rabbitmq_management-2.6.1.ez
> > rabbitmq_management_agent-2.6.1.ez
> > rabbitmq_management_visualiser-2.6.1.ez
> > rabbitmq_mochiweb-2.6.1.ez
> > README
> > webmachine-1.7.0-rmq2.6.1-hg0c4b60a.ez
>
> > node-2 (nuage-informatique)
> > Linux nuage-informatique 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12
> > 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
> > ii rabbitmq-server 2.6.1-1
>
> > 1/ stop both servers and set up the same .erlang.cookie value; restart nodes
>
> > 2/ on node1 I create a cluster
> > rabbitmqctl stop_app
> > rabbitmqctl reset
> > rabbitmqctl cluster rabbit@nuage-informatique rabbit@cumulonimbus
> > Clustering node rabbit@cumulonimbus with ['rabbit@nuage-informatique',
> > rabbit@cumulonimbus] ...
> > ...done.
>
> > 3/ This creates 2 disc nodes !!!
>
> > 4/ run a test and pass data successfully
>
> > 5/ restart node-2 (service rabbitmq-server stop)
> > service rabbitmq-server start ... fails with ...
> > root@nuage-informatique:~/Desktop# service rabbitmq-server start
> > Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
> > rabbitmq-server.
> > Erlang has closed
> > Crash dump was written to: erl_crash.dump
> > Kernel pid terminated (application_controller)
> > ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})
>
> > Activating RabbitMQ plugins ...
> > 1 plugins activated:
> > * rabbitmq_management_agent-2.6.1
>
> > +---+   +---+
> > |   |   |   |
> > |   |   |   |
> > |   |   |   |
> > |   +---+   +-------+
> > |                   |
> > | RabbitMQ  +---+   |
> > |           |   |   |
> > |   v2.6.1  +---+   |
> > |                   |
> > +-------------------+
> > AMQP 0-9-1 / 0-9 / 0-8
> > Copyright (C) 2007-2011 VMware, Inc.
> > Licensed under the MPL. See http://www.rabbitmq.com/
>
> > node : rabbit@nuage-informatique
> > app descriptor :
> > /usr/lib/rabbitmq/lib/rabbitmq_server-2.6.1/sbin/../ebin/rabbit.app
> > home dir : /var/lib/rabbitmq
> > config file(s) : (none)
> > cookie hash : qHpvLciGsi5o4f8ScVzyWg==
> > log : /var/log/rabbitmq/rab... at nuage-informatique.log
> > sasl log : /var/log/rabbitmq/rab... at nuage-informatique-sasl.log
> > database dir : /var/lib/rabbitmq/mnesia/rabbit@nuage-informatique
> > erlang version : 5.7.4
>
> > -- rabbit boot start
> > starting file handle cache server ...done
> > starting worker pool ...done
> > starting database
> > ...BOOT ERROR: FAILED
> > Reason: {error,
> > {timeout_waiting_for_tables,
> > [rabbit_user,rabbit_user_permission,rabbit_vhost,
> > rabbit_durable_route,rabbit_durable_exchange,
> > rabbit_durable_queue]}}
> > Stacktrace: [{rabbit_mnesia,wait_for_tables,1},
> > {rabbit_mnesia,check_schema_integrity,0},
> > {rabbit_mnesia,ensure_schema_integrity,0},
> > {rabbit_mnesia,init,0},
> > {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
> > {rabbit,run_boot_step,1},
> > {rabbit,'-start/2-lc$^0/1-0-',1},
> > {rabbit,start,2}]
> > {"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}
>
> > At this point I have to re-install node-2 to recover.
>
> > Any idea why?
>
> > Thank you,
>
> > next I would like to test mirrored q but obviously this has to work first...
>
> > -Alain
> > _______________________________________________
> > rabbitmq-discuss mailing list
> > rabbitmq-disc... at lists.rabbitmq.com
> > https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
> --
> Simon MacMullen
> RabbitMQ, VMware