<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">You're aware that there is no eager synchronisation of HA queues, yes?</div>
So it's only by the unsynchronised head of each queue being consumed<br>
that synchronisation occurs.</blockquote><div><br></div><div>Yes, I allow this to happen.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">
Did the previously downed node really come back up and join the cluster</div>
correctly? Does the output of rabbitmqctl cluster_status on each of the<br>
3 nodes report all 3 nodes are running?</blockquote><div><br></div><div>I couldn't tell you without restarting the tests, but the management plugin reports all three as up, and producers and consumers reconnect to the downed node without problems once it has come back.</div>
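<div>For what it's worth, this is the check I plan to run next time; the node names rabbit@node1 etc. are placeholders for ours:</div>

```shell
# Ask each node for its own view of the cluster. On a healthy 3-node
# cluster, every node should list all three under running_nodes.
# Node names below are placeholders; substitute your own.
for node in rabbit@node1 rabbit@node2 rabbit@node3; do
  rabbitmqctl -n "$node" cluster_status
done
```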
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">
> I've also found that sometimes queues stop delivering messages when certain<br>
> nodes go down (even after being left for minutes), despite being in HA mode<br>
> (haven't been able to dig into this more yet).<br>
<br>
</div>When this happens, could you check the logs please on all nodes for any<br>
entries regarding the queues. If a node with a queue master goes down<br>
then there should be entries about some slave on another node being<br>
promoted, but even if it's just a slave that dies, there should be<br>
entries in the logs that show others have noticed that.</blockquote><div><br></div><div>Yes, will look for that.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
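<div>Sketch of what I'll look for; the log path assumes a default install, and the patterns are guesses at the wording of the promotion entries:</div>

```shell
# Scan all node logs for mirror/slave promotion activity around the
# time the node went down. Path and patterns are assumptions based on
# a default package install.
grep -iE "promot|slave|mirror" /var/log/rabbitmq/*.log
```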
<div class="im">
> Sometimes connections to nodes which have gone down are still shown<br>
> and get stuck. <br>
<br>
</div>Interesting. That might be a bug in the mgmt plugin. Does rabbitmqctl<br>
list_connections also show such phantom connections?</blockquote><div><br></div><div>Will check next time I see it - running some more tests this PM.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
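<div>Will compare the UI against something like the following; the info items name, peer_host and state should show whether the broker itself still believes the connection exists:</div>

```shell
# List connections as the broker sees them, to compare with the
# management UI. If a connection to a downed node still appears here,
# it is not just a mgmt-plugin display bug.
rabbitmqctl list_connections name peer_host state
```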
<div class="im">> Today, while bringing up the cluster from scratch (shutdown all instances,</div><div class="im">
> wipe mnesia, restart) I've got 3 nodes running, but an HA queue with 1<br>
> master, 2 synced slaves and 1 unsynced slave. Other queues are showing 1<br>
> master and 2 synced slaves as expected. (see<br>
> <a href="http://www.evernote.com/shard/s53/sh/b6345885-88d1-4d21-9614-24abda75a1cb/c2a0dd265b39d21f3e8c336c67ced979" target="_blank">http://www.evernote.com/shard/s53/sh/b6345885-88d1-4d21-9614-24abda75a1cb/c2a0dd265b39d21f3e8c336c67ced979</a><br>
> )<br>
<br>
</div>Well, drain the "unsynced" queue and it'll become synced. </blockquote></div><br clear="all"><div>My issue here is that this one queue claims to have more copies than there are nodes in the cluster. To check whether this is a management-plugin bug, I tried rabbitmqctl list_queues name slave_pids synchronised_slave_pids, but I can't get the cluster to list them at all; the command has just sat there for the last 15 minutes.</div>
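<div>For the record, next time I'll wrap the listing in a timeout so a hang is at least bounded; timeout here is the coreutils utility, not a rabbitmqctl option:</div>

```shell
# Give the listing 60 seconds before giving up, so a hung cluster
# doesn't block the shell indefinitely (exit status 124 on timeout).
timeout 60 rabbitmqctl list_queues name slave_pids synchronised_slave_pids
```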
<div><br></div><div>A</div>