[rabbitmq-discuss] latency in starting/stopping a rabbitmq node

Alexandru Scvorţov alexandru at rabbitmq.com
Mon Sep 12 15:51:25 BST 2011


Hi Praveen,

Thanks a lot for the info.

I can now replicate the slow start-up.  The good news is that the
broker eventually does start up (I left it overnight, so I'm not sure
how long it actually took), but it's not immediately obvious how to
speed that up.  We'll look into it further.

The fix to speed up the shutdown didn't make it into 2.6.1 :,(  It'll
probably be in the release after that.

> It says that it is rebuilding the index from scratch..and that mnesia is
> overloaded with  write_threshold and then time_threshold.
> I'm not very sure I understand what they really mean. :(

That's not really a problem; it just indicates that Mnesia, the broker's
internal database, is being used heavily.

> Can you please tell me if these configs are ok, or am I missing something?

They look fine.

There's not much you can do about the slow shutdown and slow recovery
right now.  Sorry.

Cheers,
Alex

On Fri, Sep 09, 2011 at 11:57:09AM -0700, Praveen M wrote:
> Hi Alex, thanks for your email. That helped a lot.
> 
> To answer your question about the hang in the "starting exchange, queue and
> binding recovery.." step on creating 100,000 durable queues and restarting
> the broker,
> 
> *Is it really hung?  Is it using the CPU or disk at all at this time?  Is
> there anything in the logs (both the rabbit and SASL logs)?*
>
> The SASL log doesn't have anything. But the rabbit log has something.
> 
> I have attached the .log file for your reference.
> 
> It says that it is rebuilding the index from scratch..and that mnesia is
> overloaded with  write_threshold and then time_threshold.
> I'm not very sure I understand what they really mean. :(
> 
> My /etc/rabbitmq/rabbitmq.config file entry is as follows:
> 
> [ {mnesia, [{dump_log_write_threshold, 50000}, {dc_dump_limit, 40}]},
> {rabbit, [{vm_memory_high_watermark, 0.34}]}].
> 
> Can you please tell me if these configs are ok, or am I missing something?
> 
> Also, I checked the IO and CPU. When I start the broker after creating the
> 100,000 queues, both IO and CPU shoot up for the first minute; once
> everything required has been loaded into memory there is no more IO
> activity, but the CPU usage stays high.
> 
> From top, the values are as below; the CPU usage almost always stays high
> and never goes down.
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 13997 root      20   0 3184m 2.5g 2324 S  210 21.2  8:17.97  beam.smp
> 
> Feel free to let me know if you need more info. I can provide you with
> memory dumps and stack traces if required.
> 
> Thanks a lot for your help.
> Praveen
> 
> 
> 
> On Fri, Sep 9, 2011 at 3:12 AM, Alexandru Scvorţov
> <alexandru at rabbitmq.com> wrote:
> 
> > Hi Praveen,
> >
> > > However I realized when i wanted to shutdown the broker before starting a
> > > new test, the stop command (rabbitmqctl stop) took a long time
> > > to complete.
> >
> > We are aware of this problem.  The fix is currently going through QA and
> > will probably be in the next release, which should be around fairly
> > soon.
> >
> > > Query 1)
> > > I am curious as to what causes the latency in stopping the broker when a
> > > rabbitmqctl stop command is issued. It seems to have something to do with
> > > the number of queues created, as the stop time increases proportionally
> > > with the number of queues.
> >
> > Internally, when we terminate a queue, we do a few file operations.  This
> > is usually not a problem, but when you close a connection with 100 000s
> > of queues, a comparable number of file operations gets scheduled.  Erlang's
> > IO system then does some expensive operations on this long queue and
> > ends up processing the operations in quadratic time.  The fix going
> > through QA brings this down to linear time; for instance, I can delete
> > 40k queues in 20s (compared to 211s on the latest release).
> >
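To illustrate why the time blows up as the backlog grows, here is a toy model in Python. It is not Erlang's IO code or the actual RabbitMQ fix. As a simplifying assumption, it treats the "expensive operations" as one pass over everything still queued before each operation is handled, and compares that with handling each operation once. The function names are hypothetical.

```python
from collections import deque
from typing import Iterable


def drain_with_rescan(ops: Iterable[object]) -> int:
    """Toy model of a backlog handled in quadratic time.

    Before handling the operation at the head, the whole remaining backlog
    is walked once, so N operations cost N + (N-1) + ... + 1 steps.
    """
    pending = list(ops)
    steps = 0
    while pending:
        steps += len(pending)  # one pass over everything still queued
        pending.pop(0)         # then handle the head operation
    return steps


def drain_in_order(ops: Iterable[object]) -> int:
    """Toy model of the same backlog handled in linear time: one step per op."""
    pending = deque(ops)
    steps = 0
    while pending:
        pending.popleft()
        steps += 1
    return steps


for n in (1_000, 2_000, 4_000):
    print(n, drain_with_rescan(range(n)), drain_in_order(range(n)))
# prints:
# 1000 500500 1000
# 2000 2001000 2000
# 4000 8002000 4000
```

Doubling the backlog roughly quadruples the rescanning cost but only doubles the in-order cost. That is the difference between "quadratic" and "linear" in the explanation above.
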
> > > Query 2)
> > > In the case of durable queues, I measured the time taken to restart the
> > > broker after stopping it (both a clean and an unclean stop).
> > > I found that even after a clean or unclean stop, restarting the broker
> > > took only about 20 seconds on average.
> > > However, when I created 50000 durable queues, did an unclean stop (just
> > > aborted the broker) and tried to restart the broker, it still hadn't
> > > started after 6 minutes (when I gave up)...
> > > It was hung at the "starting exchange, queue and binding recovery.." step.
> > > It would be great if someone could explain what could cause this.
> >
> > I can't reproduce this.  Declaring 100 000 durable queues, killing the
> > broker and re-starting it seems to work fine.  It takes about 1 min on my
> > machine.
> >
> > Is it really hung?  Is it using the CPU or disk at all at this time?  Is
> > there anything in the logs (both the rabbit and SASL logs)?
> >
> > > It would be great if someone could answer the above queries or give me
> > > some pointers.
> >
> > There's not much you can do at the moment except avoiding terminating a
> > large number of queues at the same time.
> >
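One way to follow the advice above is to spread explicit deletions out over time instead of issuing them all at once. The sketch below is a hypothetical helper, not a RabbitMQ or client-library API. It assumes the slowdown comes from a long backlog of queue terminations, so keeping each burst small should keep that backlog short. It only applies to queues a client deletes itself. It does not change what happens when the whole broker is stopped.

```python
import time
from typing import Callable, Iterable


def delete_in_batches(
    queue_names: Iterable[str],
    delete_queue: Callable[[str], None],
    batch_size: int = 500,
    pause_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete queues a batch at a time, pausing between batches.

    Returns the number of batches issued. `sleep` is injectable so the
    pacing can be tested without actually waiting.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    names = list(queue_names)
    batches = 0
    for start in range(0, len(names), batch_size):
        for name in names[start:start + batch_size]:
            delete_queue(name)  # e.g. a client call that deletes one queue
        batches += 1
        if start + batch_size < len(names):  # no pause after the last batch
            sleep(pause_s)
    return batches
```

With the pika client, for example, `delete_queue` could be `lambda name: channel.queue_delete(queue=name)` on an open blocking channel. The default batch size and pause are guesses to tune, not values suggested in this thread.
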
> > Hope this clears things up.
> >
> > Cheers,
> > Alex
> >
> > On Thu, Sep 08, 2011 at 07:38:14PM -0700, Praveen M wrote:
> > > Hi,
> > >
> > > I'm a rabbitmq newbie and am trying to run some experiments to figure out
> > > if rabbitmq would serve my use case.
> > >
> > > I would like to create hundreds of thousands of queues (one for each of
> > > my customers).
> > >
> > > I ran various tests.
> > >
> > > I'm using the latest 2.6.0 server and 2.6.0 client, and ran the following
> > > tests with both durable and non-durable queues:
> > >
> > > 1) create 1000 queues, produce, consume
> > > 2) create 10000 queues, produce, consume
> > > 3) create 50000 queues, produce, consume
> > >
> > > It works like a charm, and the memory usage even with 50,000 queues seems
> > > very reasonable (on the order of 1-1.7 GB).
> > >
> > > However, I noticed that when I wanted to shut down the broker before
> > > starting a new test, the stop command (rabbitmqctl stop) took a long time
> > > to complete.
> > >
> > > I made a small chart, listed below, of how long the stop command takes to
> > > execute after the test creates N queues.
> > > Also, for durable queues, I found some odd numbers for the time taken to
> > > restart the queues after a clean or unclean (aborting the broker) stop.
> > >
> > > *NON_DURABLE_QUEUES TEST*
> > >
> > > No. of queues   Stop time
> > > 1000            10.7 seconds
> > > 10000           2 minutes
> > > 50000           11 minutes
> > >
> > > *DURABLE_QUEUES TEST*
> > >
> > > No. of queues   Start time   Stop time
> > > 1000            2 seconds    10 seconds
> > > 10000           24 seconds   2 minutes
> > >
> > > After an improper shutdown (crash):
> > > 10000 queues: the broker recovers in 20 seconds.
> > > 50000 queues: even after 6 minutes the queues haven't started.
> > >
> > >
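The thread does not say how these timings were taken. One way to collect comparable stop timings is a small wrapper around the command. This assumes `rabbitmqctl` is on the PATH of a user allowed to control the node, and the helper name is hypothetical.

```python
import subprocess
import time
from typing import Sequence


def time_command(cmd: Sequence[str]) -> float:
    """Run `cmd` to completion and return its wall-clock duration in seconds.

    Raises subprocess.CalledProcessError if the command exits non-zero, so a
    failed stop is not silently recorded as a (short) timing.
    """
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    subprocess.run(list(cmd), check=True)
    return time.monotonic() - start
```

For example, `time_command(["rabbitmqctl", "stop"])` after each test run would give one row of a chart like the one above.
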
> > > Query 1)
> > > I am curious as to what causes the latency in stopping the broker when a
> > > rabbitmqctl stop command is issued. It seems to have something to do with
> > > the number of queues created, as the stop time increases proportionally
> > > with the number of queues.
> > >
> > > Query 2)
> > > In the case of durable queues, I measured the time taken to restart the
> > > broker after stopping it (both a clean and an unclean stop).
> > > I found that even after a clean or unclean stop, restarting the broker
> > > took only about 20 seconds on average.
> > > However, when I created 50000 durable queues, did an unclean stop (just
> > > aborted the broker) and tried to restart the broker, it still hadn't
> > > started after 6 minutes (when I gave up)...
> > > It was hung at the "starting exchange, queue and binding recovery.." step.
> > > It would be great if someone could explain what could cause this.
> > >
> > > It would be great if someone could answer the above queries or give me
> > > some pointers.
> > >
> > > Thank you for your help,
> > > --
> > > -Praveen
> >
> > > _______________________________________________
> > > rabbitmq-discuss mailing list
> > > rabbitmq-discuss at lists.rabbitmq.com
> > > https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> >
> >
> 
> 
> -- 
> -Praveen
