[rabbitmq-discuss] Active-active crash report

Vadim Chekan kot.begemot at gmail.com
Fri Apr 27 00:27:48 BST 2012


Hi Matthew,

Thanks for your reply. At least the web UI reports that all 3 nodes are 2.8.1,
so I guess they are up to date.
Reproducing this bug is difficult. I'll work on it for maybe 2 more days,
trying to find a combination that reproduces it reliably. Very preliminarily,
I have the impression that the "exclusive" queue flag has something to do with
it, but I will experiment with TTL too, as you suggested.
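
For reference, the combinations in question could be enumerated along these
lines. This is only a sketch under assumptions: the function name and the
dictionary layout are illustrative, only the x-ha-policy and x-message-ttl
arguments come from the crash log quoted below, and nothing here talks to
the broker.

```python
from itertools import product


def queue_variants(ttl_ms=600000):
    """Yield every combination of the queue settings under suspicion.

    Hypothetical helper: it only builds dictionaries describing each
    queue declaration; pass them to whatever AMQP client is in use.
    """
    for exclusive, auto_delete, with_ttl in product([False, True], repeat=3):
        # Mirrored on all nodes, as in the crash log.
        arguments = {"x-ha-policy": "all"}
        if with_ttl:
            # 600000 ms is the TTL value seen in the crash log.
            arguments["x-message-ttl"] = ttl_ms
        yield {
            "exclusive": exclusive,
            "auto_delete": auto_delete,
            "arguments": arguments,
        }


if __name__ == "__main__":
    for index, variant in enumerate(queue_variants()):
        print(index, variant)
```

Running each variant while stopping a node, and noting which ones crash,
would narrow down whether "exclusive", TTL, or both are involved.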

I don't know if it is related, but I had a strange situation where there was
an exclusive auto-delete queue without a connection (the connection was
reported as "unknown"), and it was impossible to delete it because of a
"resource lock" error.
Hopefully I'll be able to report more tomorrow.

Thanks,
Vadim.

On Thu, Apr 26, 2012 at 3:31 PM, Matthew Sackman <matthew at rabbitmq.com> wrote:

> Hi Vadim,
>
> On Thu, Apr 26, 2012 at 01:01:20PM -0700, Vadim Chekan wrote:
> > I'm testing my active-active setup (2.8.1, linux 64) and I am randomly
> > running into some crashes when I'm stopping a node. I can stop one node
> > but another one fails along with it. Below is a crash log.
> >
> > =ERROR REPORT==== 26-Apr-2012::12:15:59 ===
> > Discarding message
> > {'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}}
> > from <0.1955.0> to <0.26823.834>
> >  in an old incarnation (2) of this node (3)
>
> I'm worried about these messages. Someone else on this list has seen
> this sort of thing too and it's causing them trouble. I've not seen this
> issue myself in testing which is frustrating. However, that's not the
> cause of your crash in this case (I think).
>
> > ** Generic server <0.1800.0> terminating
> > ** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}}
> > ** When Server state == {state,
> >                             {amqqueue,
> >                                 {resource,<<"/">>,queue,<<"test_29">>},
> >                                 true,false,<0.1433.0>,
> >                                 [{<<"x-ha-policy">>,longstr,<<"all">>},
> >                                  {<<"x-message-ttl">>,signedint,600000}],
> >                                 <0.1799.0>,[],all},
> >                             <0.1801.0>,
> >                             {dict,0,16,16,8,80,48,
> >                                 {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> >                                  []},
> >                                 {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> >                                   [],[]}}},
> >                             #Fun<rabbit_mirror_queue_master.1.2951048>,
> >                             #Fun<rabbit_mirror_queue_master.2.72654940>}
> > ** Reason for termination ==
> > ** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}},
> >     [{rabbit_mirror_queue_coordinator,handle_cast,2},
> >      {gen_server2,handle_msg,2},
> >      {proc_lib,wake_up,3}]}
>
> Well this is very odd. We fixed a bug that looked like this, but it got
> fixed in 2.7.1 (and was related to x-ha-policy = nodes). Could you just check
> that you really are running 2.8.1? We're not aware of any bug in this
> area in 2.8.1, but that's certainly not saying there's not one there! Is
> there any particular sequence of events that you can perform that
> reliably triggers this crash? Could you also check the logs of the other
> nodes (both .log and -sasl.log) to see if there are further crash reports
> in there?
>
> Also, lots of bugs have been discovered relating to the code
> changes made to add DLX support in 2.8.1, especially in relation to HA.
> It's possible one of the issues I found with TTL and HA is causing this.
> 2.8.2 should be out soonish which might introduce fewer new bugs than it
> fixes, but in the meantime, could you try without the TTL and see if
> that helps?
>
> Matthew


