[rabbitmq-discuss] RabbitMQ Stability Issues with large queue - 2011-12-28

Tue Jan 3 14:10:18 GMT 2012

Hi,

The logfile indicates that the memory alarm has triggered, preventing
further publishes. A broker in this state will only allow fetching. Is
this what you see? What are the symptoms of the crash you observed?

According to the report, your one queue alone uses 3.7 GB while the total
memory limit is only 4.8 GB. I would suggest lowering
vm_memory_high_watermark to the default value of 0.4 or lower so that
the broker starts conserving memory sooner.
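
As a rough sketch of that change: the watermark is a fraction of installed
RAM, and the limit in your log (5153960755 bytes) is 0.6 of 8 GiB. At 0.4 the
limit would come out at roughly 3.2 GiB. I am assuming here that you edit the
same rabbit section you posted and leave the other keys as they are:

```erlang
%% rabbitmq.config (fragment). Only the watermark value differs from the
%% posted config; the remaining keys are assumed to stay unchanged.
[
 {rabbit, [
   %% Fraction of installed RAM at which the memory alarm fires.
   %% 0.4 * 8 GiB is about 3.2 GiB, versus about 4.8 GiB at 0.6.
   {vm_memory_high_watermark, 0.4}
 ]}
].
```

The broker needs a restart to pick up a change made in the config file.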

If you see segfaults then consider disabling HiPE. This feature is
experimental and could be the cause of your problem.
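
For reference, a minimal sketch of turning HiPE off, assuming the rest of the
rabbit section stays as in your posted config:

```erlang
%% rabbitmq.config (fragment). hipe_compile was true in the posted config;
%% other keys in the rabbit section are assumed to stay unchanged.
[
 {rabbit, [
   %% Run the broker without HiPE native compilation.
   {hipe_compile, false}
 ]}
].
```

If the segfaults stop after a restart with this setting, HiPE is the likely
culprit.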

On 28/12/11 22:20, DawgTool wrote:
> ==> dc001.log <==
> =INFO REPORT==== 28-Dec-2011::17:00:00 ===
> vm_memory_high_watermark set. Memory used:8662988728 allowed:5153960755
> 
> =INFO REPORT==== 28-Dec-2011::17:00:00 ===
>     alarm_handler: {set,{{vm_memory_high_watermark,'dc001@rmquat-m01'},[]}}

>  {vm_memory_high_watermark,0.5999999999767169},
>  {vm_memory_limit,5153960755}]

>  {vm_memory_high_watermark,0.6}]

>>  {rabbit,                    [{vm_memory_high_watermark, 0.6},
>>                               {collect_statistics_interval, 5000},
>>                               {hipe_compile, true}
>>                              ]
>>  },
>>  {rabbitmq_management,       [ {http_log_dir, "/data/rabbitmq/dc001/rabbit-mgmt"} ] },
>>  {rabbitmq_management_agent, [ {force_fine_statistics, true} ] }

>> [17:32] <dawgtool> background. doing some testing on 2.7.0 : 4 servers 2vcpu, 8gb ram, 80gb disc.
>> [17:32] <dawgtool> cluster is set up all disc, currently one exchange durable fanout
>> [17:33] <dawgtool> one queue also durable bound to the exchange.
>> [17:33] <dawgtool> i'm pushing about 5M records, payload is ~500bytes each record
>> [17:34] <dawgtool> rate is about 14k/s (which seems pretty slow)
>> [17:35] <dawgtool> but my problem is, I'm testing a case where the consumers are busy or unavailable, so the queues would be filling up.
>> [17:35] <dawgtool> even after slowing the publish rate to about 4k/s the mirrored queue does not complete on any of the cluster's nodes other than master.
>> [17:37] <dawgtool> memory seems to be the biggest issue here, as the servers will grow past the high water mark, and eventually crash one at a time.
>> [17:37] <dawgtool> once they are restarted, most servers in the cluster will have about 200k to 300k of messages in their queue
>> [17:40] <dawgtool> so question is, why is so much memory being consumed (on disk these records are about 5.5GB) RabbitMQ pushes to 7.9 real, 11.9 virtual (swapping).
>> [17:40] <dawgtool> why is the queue not stopping the publishers (RAM based clusters seem to stop the publisher until it can be spilled to disk)
>> [17:41] <dawgtool> Why is mirroring unreliable in this test.