[rabbitmq-discuss] Shovel stops receiving acks from cluster
matthias at rabbitmq.com
Thu Aug 23 15:00:36 BST 2012
On 22/08/12 17:04, Jon Bergli Heier wrote:
> I just set up two new VMs with RabbitMQ and clustered them. I was still
> able to reproduce the problem, but where it previously took one or two
> tries, this time it took me 12 tries (of stopping and starting nodes). I
> also tried disabling the other shovels on the shovel node (by commenting
> them out in the config) and removing all other queues and exchanges; with
> that it took 15 tries.
> I attached the results of the same commands with the new setup. qa-test1 and
> qa-test2 are the new nodes I set up. qa-test1 was the trigger in both cases.
We are pretty sure this isn't a problem with the shovel but rather some
edge case in the combination of ha queues and confirms.
Unfortunately we still have not been able to reproduce this :(
So the next best thing we can do is grab some more state from your
shovel destination cluster...
1) Repeat the above test with the minimal shovel config until the
"stuck" state is reached.
2) Capture both the logs and sasl logs for rabbit1 and rabbit2 - we only
need to see what happened in this test run, so feel free to discard
anything before then.
3) Run 'rabbitmqctl report > report.txt' for rabbit1 (all subsequent
rabbitmqctl invocations below are for rabbit1 too).
4) Look for the "Channels" section in the result of (3). It should
contain one channel with a messages_unconfirmed of 1000. Grab the pid of
that - the first column, something looking like <rabbit1@i.1.755.0> -
and feed it into the following:
rabbitmqctl eval 'sys:get_status(rabbit_misc:string_to_pid("ThePid")).'
(replacing ThePid with the pid you obtained, including the <>).
5) Look for the "Queues" section in the result of (3). It should contain
the shovel's destination queue. Grab the pid of that - the first column
- and feed it into the same eval as above, capturing the output in queue.txt
6) Same as (5) but this time grab the slave pid from the list of slaves
in the 7th column and feed the eval output into slave.txt
7) Send us everything captured in the above steps
Note that some of the evals might take a few seconds to complete.
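
If it helps, steps 3-6 could be wrapped in a small shell sketch along these lines. The helper names status_expr and capture_status and the file name channel.txt are assumptions made for illustration (only report.txt, queue.txt and slave.txt are named above), and the pids shown are placeholders to be replaced with the ones copied out of report.txt.

```shell
#!/bin/sh
# Sketch only: status_expr, capture_status and channel.txt are hypothetical
# names chosen for this example, not part of rabbitmqctl.

# Build the Erlang expression from step 4 for a given pid string
# (the pid must include the surrounding <>).
status_expr() {
    printf 'sys:get_status(rabbit_misc:string_to_pid("%s")).' "$1"
}

# Run the eval for one pid and save its output to the given file.
capture_status() {
    rabbitmqctl eval "$(status_expr "$1")" > "$2"
}

# To be run by hand on rabbit1 once the pids are known:
#   rabbitmqctl report > report.txt
#   capture_status '<channel pid>' channel.txt
#   capture_status '<queue pid>'   queue.txt
#   capture_status '<slave pid>'   slave.txt
```

Single-quoting the pid keeps the shell from treating < and > as redirections.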