[rabbitmq-discuss] Memory Management Concerns / Questions
andy.berryman at channeladvisor.com
Thu Dec 22 19:35:44 GMT 2011
Thanks for the quick reply guys.  With the help of one of my systems
guys, I think that we have made some progress on this issue.  After
performing further tests with my application and closely monitoring
the server node itself, we noticed some unexpected messages being
written to disk for the queue, which is transient.  That immediately
confused us.  After some creative googling, we came across the
following page, which helped us a great deal ...
Given that information and some tweaks to the tests, I was finally
able to reproduce the memory throttling of publishers that I was
expecting to see.  And I think that I now have a better understanding
of the RAM usage and statistics and how they tie to the watermark value.
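As a rough illustration of how the watermark relates to RAM: the threshold is the configured fraction of the node's installed memory, and publishers are throttled once the broker's memory use crosses it. The sketch below only shows the arithmetic. The 8 GiB node size and the function name `watermark_bytes` are assumptions for illustration; the 0.4 default is the 40% setting mentioned in the original post.

```python
def watermark_bytes(total_ram_bytes: int, fraction: float = 0.4) -> int:
    """Return the memory threshold (in bytes) for a given watermark fraction.

    `fraction` mirrors the 40% default from the original post; the node
    size passed in is whatever the machine actually has installed.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_bytes * fraction)


GIB = 1024 ** 3

# Hypothetical node with 8 GiB of RAM (an assumption, not from the thread).
threshold = watermark_bytes(8 * GIB)
print(f"throttling starts at roughly {threshold / GIB:.1f} GiB")
```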
But this gets me to my next concern ... The throttling of the
publishers appears to be a blocking operation from within the
"BasicPublish" method and as best I can tell, I'm not seeing any sort
of timeout. This indefinite blocking would be pretty bad if it were
to occur in my production environment. Is there a way that I can
specify some sort of timeout for the blocking operation? I see an
overload for "BasicPublish" which has boolean params for "immediate"
and "mandatory".  Is either of those meant to help in this situation?
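One generic way to bound a blocking call, independent of any particular client library, is to run it on a worker thread and wait for the result with a deadline. The sketch below is in Python rather than C# and does not use the RabbitMQ client at all: `fake_blocked_publish` is a hypothetical stand-in for a publish call that the broker is throttling, and `publish_with_timeout` and `PublishTimeout` are names invented for this example.

```python
import concurrent.futures
import threading


class PublishTimeout(Exception):
    """Raised when a publish call does not return within the deadline."""


def publish_with_timeout(executor, publish, message, timeout_s):
    """Run the blocking `publish(message)` on `executor`, waiting at most
    `timeout_s` seconds for it to return."""
    future = executor.submit(publish, message)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread is still stuck inside publish(); only the
        # caller has been released.
        raise PublishTimeout(
            f"publish still blocked after {timeout_s}s"
        ) from None


# Hypothetical stand-in for a throttled publish: blocks until released.
release = threading.Event()


def fake_blocked_publish(message):
    release.wait()
    return "sent " + message


pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
try:
    publish_with_timeout(pool, fake_blocked_publish, "hello", 0.1)
    outcome = "published"
except PublishTimeout:
    outcome = "timed out"
finally:
    release.set()             # unblock the stand-in so the worker can exit
    pool.shutdown(wait=True)

print(outcome)
```

Note that the timeout only frees the caller. The underlying call stays blocked on its worker thread, so in a real application the caller would still have to decide what to do with that connection (for example, close it) and whether the message might yet be delivered.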
On Dec 22, 1:31 pm, Steve Powell <st... at rabbitmq.com> wrote:
> By the 'latest release' I presume you mean 2.7.0 (a week ago that would be
> true, but now we have released 2.7.1.)
> Please can you show us your rabbitmq log after a crash? The test environment
> case would be interesting, though the production system is probably
> experiencing issues which are of an application nature.
> If the consumers were failing (getting exceptions) for some application reason,
> and they were responsible for acknowledging the messages, then it is entirely
> likely that the messages they failed to process are being re-queued, and the
> queue is building up without being drained. The application exceptions are
> therefore very interesting, and you should take care that a consumer
> acknowledges the message WHEN IT HAS BEEN DEALT WITH -- even if that means it
> was an error that has been logged/passed on, or whatever.
> In the latest release message re-queuing preserves the order (for a single
> consumer) so a failing message might reappear in the queue at the head -- this
> would cause it to be re-processed more-or-less straight away, and if it is a
> message payload logical error of some kind, this is likely to fail again --
> and so on. The previous releases did not try to preserve message order, and
> so the failing messages could be overtaken by non-failing ones. This would not
> show up as a bottleneck during high load.
> I'm interested in your RAM configuration. It is entirely possible for rabbitmq
> to run out of memory even if there is a threshold set. Continual high
> publication rates, especially with new publishers all the time, will not be
> blocked entirely even then. This might mean that the test you ran is giving
> you misleading information.
> When the memory piles up you could also issue a rabbitmqctl report, which
> should tell us the general situation.
> Steve Powell (a perplexed bunny)
> ----------some more definitions from the SPD----------
> avoirdupois (phr.) 'Would you like peas with that?'
> distribute (v.) To denigrate an award ceremony.
> definite (phr.) 'It's hard of hearing, I think.'
> modest (n.) The most mod.
> On 22 Dec 2011, at 16:12, AndyB wrote:
> > I'm running into some problems that I'm hoping that someone here can
> > help me with. Let me first state my setup. I'm running the latest
> > release of RabbitMQ on CentOS (64bit) in a clustered configuration
> > with 2 nodes in my Production environment. In my development
> > environment I only have 1 node though. My code is written in C# and
> > is using the downloaded SDK from the website. The memory
> > configuration value is set to 40% as the default.
> > I ran into a problem in my Production environment a week ago where
> > work was building up on my queue faster than my consumers were
> > processing it.  Unfortunately I wasn't able to perform any debugging or
> > metrics gathering before the system was recycled or the queue was
> > purged.  I'm still not sure exactly which happened.  But what
> > I can tell you is that it looked to me like exceptions were occurring
> > for both the publishers and consumers and nothing was really
> > happening. I have since added more consumers and have not had the
> > problem occur again. But this obviously has me concerned.
> > I am currently in the process of trying to reproduce this problem in
> > my development environment, but I'm running into some confusing
> > results. My test case is to run multiple publishers sending messages
> > over and over as fast as possible while also having a single consumer
> > processing those messages with a delay.  The goal, obviously, is to
> > force messages to pile up in memory on the node to trigger the memory
> > alert.  Since I have both publishers and consumers connected, I'm
> > expecting to see the publishers begin to get some sort of exception
> > saying that they can't submit any more work while the consumer continues
> > to process. But what I'm actually seeing is that the publishers
> > continue piling on and the consumer continues to process, but the
> > machine eventually runs out of disk space and crashes.
> > Is there anyone that can advise me on what I'm doing wrong or help me
> > figure out what changes I can make?
> > Thanks
> > Andy
> > _______________________________________________
> > rabbitmq-discuss mailing list
> > rabbitmq-disc... at lists.rabbitmq.com