[rabbitmq-discuss] Shovel stops receiving acks from cluster

Matthias Radestock matthias at rabbitmq.com
Thu Aug 23 15:00:36 BST 2012


Jon,

On 22/08/12 17:04, Jon Bergli Heier wrote:
> I just set up two new VMs with RabbitMQ and clustered them. I was still
> able to reproduce the problem, but compared to one or two tries it took
> me 12 tries (of stopping and starting nodes). I also tried disabling the other
> shovels on the shovel node (by commenting them out in the config) and removing
> all other queues and exchanges, with that it took 15 tries.
>
> I attached the results of the same commands with the new setup, qa-test1 and
> qa-test2 are the new nodes I set up. qa-test1 was the trigger in both cases.

We are pretty sure this isn't a problem with the shovel but rather some
edge case in the combination of HA queues and confirms.

Unfortunately we still have not been able to reproduce this :(

So the next best thing we can do is grab some more state from your 
shovel destination cluster...

1) Repeat the above test with the minimal shovel config until the 
"stuck" state is reached.

2) Capture both the logs and sasl logs for rabbit1 and rabbit2 - we only
need to see what happened in this test run, so feel free to discard
anything before then

3) Run 'rabbitmqctl report > report.txt' for rabbit1 (all subsequent
rabbitmqctl invocations below are for rabbit1 too)

4) Look for the "Channels" section in the result of (3). It should
contain one channel with a messages_unconfirmed of 1000. Grab the pid of
that - the first column, something looking like <rabbit1@i.1.755.0> -
and feed it into the following:

rabbitmqctl eval 'sys:get_status(rabbit_misc:string_to_pid("ThePid")).' > channel.txt

(replacing ThePid with the pid you obtained, including the <>).

5) Look for the "Queues" section in the result of (3). It should contain 
the shovel's destination queue. Grab the pid of that - the first column 
- and feed it into the same eval as above, capturing the output in queue.txt

6) Same as (5) but this time grab the slave pid from the list of slaves
in the 7th column and feed the eval output into slave.txt

7) Send us everything captured in the above steps


Note that some of the evals might take a few seconds to complete.
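
If it saves some typing, steps 3 to 6 could be wrapped in a small shell
helper along these lines. This is only a sketch: the function names
status_expr and capture_status and the <ChannelPid>, <QueuePid> and
<SlavePid> placeholders are made up for illustration, and you still need
to read the real pids out of report.txt by hand. The only rabbitmqctl
invocations used are the two given in the steps above, and everything is
assumed to be run against rabbit1.

```shell
#!/bin/sh
# Sketch only: helper names and placeholder pids are illustrative
# assumptions, not part of RabbitMQ.

# Build the Erlang expression from step 4 for a given pid string.
# $1 = pid exactly as shown in the report, including the <>
status_expr() {
    printf 'sys:get_status(rabbit_misc:string_to_pid("%s")).' "$1"
}

# Run the eval from step 4 for one pid and capture its output.
# $1 = pid including the <>, $2 = output file
capture_status() {
    rabbitmqctl eval "$(status_expr "$1")" > "$2"
}

# Intended usage, with the placeholders replaced by pids from report.txt:
#
#   rabbitmqctl report > report.txt              # step 3
#   capture_status '<ChannelPid>' channel.txt    # step 4
#   capture_status '<QueuePid>'   queue.txt      # step 5
#   capture_status '<SlavePid>'   slave.txt      # step 6
```

Single-quote the pid when calling capture_status so the shell does not
treat the < and > as redirections.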


Regards,

Matthias.

