<div dir="ltr">Hi.<div><br></div><div> I've tried to upgrade my cluster to 3.3.0.</div><div><br></div><div> Disclosure: we had seen some strange artifacts of (maybe mnesia) failures, as there were a few queues that showed up in the lists, but I could never remove them or even query their properties. I hoped that the upgrade would solve this.</div>
<div><br></div><div> Stop node2, node3, node1. (as node1 should run be the last and first)</div><div><br></div><div> node1: apt-get upgrade</div><div> node1: service rabbitmq-server start</div><div><br></div><div><div><br>
</div><div><div>/var/log/rabbitmq/startup_log<br></div><div><br></div><div></div></div><div> RabbitMQ 3.3.0. Copyright (C) 2007-2013 GoPivotal, Inc.</div><div> ## ## Licensed under the MPL. See <a href="http://www.rabbitmq.com/">http://www.rabbitmq.com/</a></div>
<div> ## ##</div><div> ########## Logs: /var/log/rabbitmq/rabbit@node1.log</div><div> ###### ## /var/log/rabbitmq/rabbit@node1-sasl.log</div><div> ##########</div><div> Starting broker...dets: file "/var/lib/rabbitmq/mnesia/rabbit@node1/schema.DAT" not properly closed, repairing ...</div>
<div><br></div><div><br></div><div>BOOT FAILED</div><div>===========</div><div><br></div><div>Timeout contacting cluster nodes: [rabbit@node3,rabbit@node2].</div><div><br></div><div>DIAGNOSTICS</div><div>===========</div>
<div><br></div><div>attempted to contact: [rabbit@node3,rabbit@node2]</div><div><br></div><div>rabbit@node3:</div><div> * found rabbit (port 25672)</div><div> * TCP connection succeeded</div><div> * suggestion: hostname mismatch?</div>
<div> * suggestion: is the cookie set correctly?</div><div>rabbit@node2:</div><div> * found rabbit (port 25672)</div><div> * TCP connection succeeded</div><div> * suggestion: hostname mismatch?</div><div> * suggestion: is the cookie set correctly?</div>
<div><br></div><div>current node details:</div><div>- node name: rabbit@node1</div><div>- home dir: /var/lib/rabbitmq</div><div>- cookie hash: XSquoZCiCPIfZlTilwy1Dg==</div></div><div><br></div><div><div>BOOT FAILED</div>
<div>===========</div><div><br></div><div>Error description:</div><div> {could_not_start,rabbit,</div><div> {bad_return,</div><div> {{rabbit,start,[normal,[]]},</div><div> {'EXIT',</div>
<div> {rabbit,failure_during_boot,</div><div> {error,</div><div> {timeout_waiting_for_tables,</div><div> [rabbit_listener,rabbit_semi_durable_route,</div>
<div> rabbit_route,rabbit_reverse_route,</div><div> rabbit_topic_trie_node,rabbit_topic_trie_edge,</div><div> rabbit_topic_trie_binding,rabbit_exchange,</div>
<div> rabbit_exchange_serial,rabbit_queue,gm_group,</div><div> mirrored_sup_childspec]}}}}}}}</div><div><br></div><div>Log files (may contain more information):</div>
<div> /var/log/rabbitmq/rabbit@node1.log</div><div> /var/log/rabbitmq/rabbit@node1-sasl.log</div><div> </div><div>{"init terminating in do_boot",{rabbit,failure_during_boot,{could_not_start,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot,{error,{timeout_waiting_for_tables,[ra</div>
<div>bbit_listener,rabbit_semi_durable_route,rabbit_route,rabbit_reverse_route,rabbit_topic_trie_node,rabbit_topic_trie_edge,rabbit_topic_trie_binding,rabbit_exchange,rabbit_exchange_serial,rabbit_queue</div><div>,gm_group,mirrored_sup_childspec]}}}}}}}}}</div>
</div><div><br></div><div><br></div><div><br></div><div>WHY is it waiting for node2, node3, if node1 was the last to stop, and it should come online in itself without them?<br></div><div><br></div><div>The cookie is the same, the hostnames are correct.</div>
<div><br></div><div>Although:</div><div><br></div><div>/var/log/rabbitmq/startup_err<br></div><div><div>Crash dump was written to: erl_crash.dump</div><div>init terminating in do_boot ()</div></div><div><br></div><div><br>
</div><div>/var/log/rabbitmq/rabbit@node1-sasl.log <br></div><div><br></div><div><div>=CRASH REPORT==== 2-Apr-2014::19:51:57 ===</div><div> crasher:</div><div> initial call: application_master:init/4</div><div> pid: <0.118.0></div>
<div> registered_name: []</div><div> exception exit: {bad_return,</div><div> {{rabbit,start,[normal,[]]},</div><div> {'EXIT',</div><div> {rabbit,failure_during_boot,</div>
<div> {error,</div><div> {timeout_waiting_for_tables,</div><div> [rabbit_listener,rabbit_semi_durable_route,</div><div> rabbit_route,rabbit_reverse_route,</div>
<div> rabbit_topic_trie_node,rabbit_topic_trie_edge,</div><div> rabbit_topic_trie_binding,rabbit_exchange,</div><div> rabbit_exchange_serial,rabbit_queue,gm_group,</div>
<div> mirrored_sup_childspec]}}}}}}</div><div> in function application_master:init/4 (application_master.erl, line 133)</div><div> ancestors: [<0.117.0>]</div><div> messages: [{'EXIT',<0.119.0>,normal}]</div>
<div> links: [<0.117.0>,<0.7.0>]</div><div> dictionary: []</div><div> trap_exit: true</div><div> status: running</div><div> heap_size: 4185</div><div> stack_size: 27</div><div> reductions: 322</div>
<div> neighbours:</div></div><div><br></div><div>rabbitmq.config:</div><div><br></div><div><div>[{rabbit, [{vm_memory_high_watermark, 0.45},{vm_memory_high_watermark, 0.5},{disk_free_limit, 10000000000},{hipe_compile,false}]}].</div>
</div><div><br></div><div><br></div><div>erl_crash.dump<br></div><div><br></div><div><div>=erl_crash_dump:0.2</div><div>Wed Apr 2 19:51:58 2014</div><div>Slogan: init terminating in do_boot ()</div><div>System version: Erlang R16B01 (erts-5.10.2) [source] [64-bit] [smp:4:4] [async-threads:30] [hipe] [kernel-poll:true]</div>
<div>Compiled: Wed Jul 10 10:49:45 2013</div><div>Taints:</div><div>Atoms: 22590</div><div>=memory</div><div>total: 41536280</div><div>processes: 14801392</div><div>processes_used: 14801320</div><div>system: 26734888</div>
<div>atom: 703377</div><div>atom_used: 673775</div><div>binary: 1330608</div><div>code: 19294518</div><div>ets: 859648</div><div>=hash_table:atom_tab</div><div>size: 19289</div><div>used: 13349</div><div>objs: 22590</div>
<div>depth: 8</div><div>=index_table:atom_tab</div><div>size: 23552</div><div>limit: 1048576</div><div>entries: 22590</div><div>=hash_table:module_code</div><div>size: 397</div><div>used: 288</div><div>objs: 539</div><div>
depth: 5</div><div>=index_table:module_code</div><div>size: 1024</div><div>limit: 65536</div><div>entries: 539</div><div>=hash_table:export_list</div><div>size: 6421</div><div>used: 4793</div></div><div><div>objs: 8988</div>
<div>depth: 6</div><div>=index_table:export_list</div><div>size: 9216</div><div>limit: 524288</div><div>entries: 8988</div><div>=hash_table:export_list</div><div>size: 6421</div><div>used: 4788</div><div>objs: 8956</div>
<div>
depth: 6</div><div>=hash_table:process_reg</div><div>size: 47</div><div>used: 24</div><div>objs: 31</div><div>depth: 2</div><div>=hash_table:fun_table</div><div>size: 3203</div><div>used: 2189</div><div>objs: 4009</div><div>
depth: 6</div><div>=hash_table:node_table</div><div>size: 5</div><div>used: 3</div><div>objs: 4</div><div>depth: 2</div><div>=hash_table:dist_table</div><div>size: 5</div><div>used: 3</div><div>objs: 4</div><div>depth: 2</div>
<div>=allocated_areas</div><div>sys_misc: 126685</div><div>static: 524928</div><div>atom_space: 360536 330934</div><div>atom_table: 342841</div><div>module_table: 187920</div><div>export_table: 375636</div></div><div><div>
export_list: 1581888</div><div>register_table: 468</div><div>fun_table: 25714</div><div>module_refs: 25824</div><div>loaded_code: 17097536</div><div>dist_table: 99875</div><div>node_table: 323</div><div>bits_bufs_size: 0</div>
<div>bif_timer: 80200</div><div>link_lh: 208</div><div>process_table: 12582912</div><div>port_table: 786432</div><div>ets_misc: 49192</div><div>=allocator:sys_alloc</div><div>option e: true</div><div>option m: libc</div>
<div>
option tt: 131072</div><div>option tp: 0</div><div>=allocator:temp_alloc[0]</div><div>versions: 2.1 3.0</div><div>option e: true</div><div>option t: option acul: %d</div><div><br></div><div>option ramv: 5</div><div>option sbct: 5865745</div>
<div>option asbcst: 524288</div><div>option rsbcst: 4145152</div><div>option rsbcmt: 90</div><div>option rmbcmt: 80</div><div>option mmbcs: 100</div><div>option mmsbc: 131072</div><div>option mmmbc: 256</div><div>option lmbcs: 18446744073709551615</div>
<div>option smbcs: 10485760</div><div>option mbcgs: 1048576</div><div>option mbsd: 3</div><div>option as: gf</div><div>mbcs blocks: 0 4 4</div></div><div><div>mbcs blocks size: 0 9848 9848</div><div>mbcs carriers: 1 1 1</div>
<div>mbcs mseg carriers: 1</div><div>mbcs sys_alloc carriers: 0</div><div>mbcs carriers size: 1048576 1048576 1048576</div><div>mbcs mseg carriers size: 1048576</div><div>mbcs sys_alloc carriers size: 0</div><div>sbcs blocks: 0 0 0</div>
<div>sbcs blocks size: 0 0 0</div><div>sbcs carriers: 0 0 0</div><div>sbcs mseg carriers: 0</div><div>sbcs sys_alloc carriers: 0</div><div>sbcs carriers size: 0 0 0</div><div>sbcs mseg carriers size: 0</div><div>sbcs sys_alloc carriers size: 0</div>
<div>temp_alloc calls: 8506</div><div>temp_free calls: 8506</div><div>temp_realloc calls: 1</div><div>mseg_alloc calls: 1</div><div>mseg_dealloc calls: 0</div><div>mseg_realloc calls: 0</div><div>sys_alloc calls: 0</div>
<div>
sys_free calls: 0</div><div>sys_realloc calls: 0</div><div>=allocator:temp_alloc[1]</div><div>versions: 0.9 3.0</div><div>option e: true</div><div>option t: option acul: %d</div><div><br></div><div>option ramv: 5</div><div>
option sbct: 5865745</div><div>option asbcst: 524288</div><div>option rsbcst: 4145152</div><div>option rsbcmt: 90</div><div>option rmbcmt: 80</div><div>option mmbcs: 100</div><div>option mmsbc: 131072</div><div>option mmmbc: 256</div>
</div><div>......</div><div><br></div><div>....</div><div><br></div><div><div>=allocator:alloc_util</div><div>option mmc: 1024</div><div>option ycs: 1048576</div><div>=allocator:instr</div><div>option m: false</div><div>
option s: false</div>
<div>option t: false</div><div>=proc:<0.0.0></div><div>State: Running</div><div>Name: init</div><div>Spawned as: otp_ring0:start/2</div><div>Spawned by: []</div><div>Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div>
<div>Number of heap fragments: 0</div><div>Heap fragment data: 0</div><div>Link list: [<0.3.0>, <0.7.0>, <0.6.0>]</div><div>Reductions: 57344</div><div>Stack+heap: 6772</div><div>OldHeap: 6772</div><div>
Heap unused: 3114</div><div>OldHeap unused: 6325</div><div>Memory: 109264</div><div>Program counter: 0x00007f94178c6f80 (init:boot_loop/2 + 64)</div><div>CP: 0x0000000000000000 (invalid)</div><div>=proc:<0.3.0></div>
<div>State: Waiting</div><div>Name: erl_prim_loader</div><div>Spawned as: erlang:apply/2</div><div>Spawned by: <0.2.0></div><div>Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div><div>Number of heap fragments: 0</div>
<div>Heap fragment data: 0</div><div>Link list: [<0.0.0>, #Port<0.0>]</div><div>Reductions: 3093307</div><div>Stack+heap: 1598</div></div><div><div>OldHeap: 4185</div><div>Heap unused: 945</div><div>OldHeap unused: 2668</div>
<div>Memory: 47136</div><div>Program counter: 0x00007f94179369b8 (erl_prim_loader:loop/3 + 176)</div><div>CP: 0x0000000000000000 (invalid)</div><div>arity = 0</div><div>=proc:<0.6.0></div><div>State: Waiting</div><div>
Name: error_logger</div><div>Spawned as: proc_lib:init_p/5</div><div>Spawned by: <0.2.0></div><div>Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div><div>Number of heap fragments: 0</div><div>
Heap fragment data: 0</div><div>Link list: [<0.30.0>, <0.0.0>, {from,<0.46.0>,#Ref<0.0.0.64>}, {from,<0.47.0>,#Ref<0.0.0.69>}]</div><div>Reductions: 38274</div><div>Stack+heap: 4185</div>
<div>OldHeap: 6772</div><div>Heap unused: 2304</div><div>OldHeap unused: 5548</div><div>Memory: 88776</div><div>Program counter: 0x00007f9417962c50 (gen_event:fetch_msg/5 + 80)</div><div>CP: 0x0000000000000000 (invalid)</div>
<div>arity = 0</div><div>=proc:<0.7.0></div><div>State: Waiting</div><div>Name: application_controller</div><div>Spawned as: erlang:apply/2</div><div>Spawned by: <0.2.0></div><div>Started: Wed Apr 2 19:51:25 2014</div>
<div>Message queue length: 0</div><div>Number of heap fragments: 0</div><div>Heap fragment data: 0</div><div>Link list: [<0.0.0>, <0.35.0>, <0.9.0>]</div><div>Reductions: 208958</div><div>Stack+heap: 6772</div>
</div><div><div>OldHeap: 28690</div><div>Heap unused: 3968</div><div>OldHeap unused: 19495</div><div>Memory: 284712</div><div>Program counter: 0x00007f941638a1c8 (gen_server:loop/6 + 264)</div><div>CP: 0x0000000000000000 (invalid)</div>
<div>arity = 0</div><div>=proc:<0.9.0></div><div>State: Waiting</div><div>Spawned as: proc_lib:init_p/5</div><div>Spawned by: <0.8.0></div><div>Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div>
<div>Number of heap fragments: 0</div><div>Heap fragment data: 0</div><div>Link list: [<0.10.0>, <0.7.0>]</div><div>Reductions: 44</div><div>Stack+heap: 376</div><div>OldHeap: 376</div><div>Heap unused: 335</div>
<div>OldHeap unused: 171</div><div>Memory: 6992</div><div>Program counter: 0x00007f941639b430 (application_master:main_loop/2 + 64)</div><div>CP: 0x0000000000000000 (invalid)</div><div>arity = 0</div><div>=proc:<0.10.0></div>
<div>State: Waiting</div><div>Spawned as: application_master:start_it/4</div><div>Spawned by: <0.9.0></div><div>Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div><div>Number of heap fragments: 0</div>
<div>Heap fragment data: 0</div><div>Link list: [<0.11.0>, <0.9.0>]</div><div>Reductions: 69</div><div>Stack+heap: 233</div><div>OldHeap: 0</div><div>Heap unused: 86</div></div><div><div>OldHeap unused: 0</div>
<div>Memory: 2736</div><div>Program counter: 0x00007f941639d720 (application_master:loop_it/4 + 72)</div><div>CP: 0x0000000000000000 (invalid)</div><div>arity = 0</div><div>=proc:<0.11.0></div><div>State: Waiting</div>
<div>Name: kernel_sup</div><div>Spawned as: proc_lib:init_p/5</div><div>Spawned by: <0.10.0></div><div>Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div><div>Number of heap fragments: 0</div>
<div>
Heap fragment data: 0</div><div>Link list: [<0.10.0>, <0.13.0>, <0.12.0>, <0.18.0>, <0.25.0>, <0.24.0>, <0.16.0>, <0.27.0>, <0.32.0>, <0.31.0>, <0.29.0>, <0.26.0>]</div>
<div>Reductions: 52300</div><div>Stack+heap: 6772</div><div>OldHeap: 28690</div><div>Heap unused: 3529</div><div>OldHeap unused: 24822</div><div>Memory: 285072</div><div>Program counter: 0x00007f941638a1c8 (gen_server:loop/6 + 264)</div>
<div>CP: 0x0000000000000000 (invalid)</div><div>arity = 0</div><div>=proc:<0.12.0></div><div>State: Waiting</div><div>Name: rex</div><div>Spawned as: proc_lib:init_p/5</div><div>Spawned by: <0.11.0></div><div>
Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div><div>Number of heap fragments: 0</div><div>Heap fragment data: 0</div><div>Link list: [<0.11.0>, {from,<22585.2.0>,#Ref<22585.0.0.50>}, {to,<0.155.0>,#Ref<0.0.0.767>}]</div>
<div>Reductions: 858</div><div>Stack+heap: 2586</div><div>OldHeap: 376</div><div>Heap unused: 1188</div></div><div><div>OldHeap unused: 350</div><div>Memory: 24824</div><div>Program counter: 0x00007f941638a1c8 (gen_server:loop/6 + 264)</div>
<div>CP: 0x0000000000000000 (invalid)</div><div>arity = 0</div><div>=proc:<0.13.0></div><div>State: Waiting</div><div>Name: global_name_server</div><div>Spawned as: proc_lib:init_p/5</div><div>Spawned by: <0.11.0></div>
<div>Started: Wed Apr 2 19:51:25 2014</div><div>Message queue length: 0</div><div>Number of heap fragments: 0</div><div>Heap fragment data: 0</div><div>Link list: [<0.11.0>, <0.15.0>, <0.14.0>]</div><div>
Reductions: 757</div><div>Stack+heap: 610</div><div>OldHeap: 376</div><div>Heap unused: 353</div><div>OldHeap unused: 280</div><div>Memory: 8904</div><div>Program counter: 0x00007f941638a1c8 (gen_server:loop/6 + 264)</div>
<div>CP: 0x0000000000000000 (invalid)</div><div>arity = 0</div><div>=proc:<0.14.0></div><div>State: Waiting</div><div>Spawned as: erlang:apply/2</div><div>Spawned by: <0.13.0></div><div>Started: Wed Apr 2 19:51:25 2014</div>
<div>Message queue length: 0</div><div>Number of heap fragments: 0</div><div>Heap fragment data: 0</div><div>Link list: [<0.13.0>]</div><div>Reductions: 1023</div><div>Stack+heap: 1598</div><div>OldHeap: 376</div><div>
Heap unused: 1088</div><div>OldHeap unused: 308</div></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div> I can't start node2 and node3, as they were stopped before node1, so they won't start till it gets up or the locks get removed.</div>
<div><br></div><div> So currently I've got a three legged cluster down on its six knees.</div><div><br></div><div>---</div><div><br></div><div> I'm trying to start node2 and node3 to have at least one running but they always fail, even if I copied back the previous disk state, and upgraded again...</div>
<div><br></div><div> Not a single node running currently, looks like I have to: </div><div> - reinstall each rabbitmq-server with clean /var/lib/rabbitmq dirs</div><div> - rebuild cluster</div><div> - rebuild the virtual hosts</div>
<div> - recreate all users and permissions </div><div> - all queues and exchanges and policies</div><div><br></div><div> Thats not the way upgrade should happen. :(</div><div><br></div><div><br></div><div> Peter</div></div>