[rabbitmq-discuss] RabbitMQ failure under high load

Michał Kiędyś michal at kiedys.net
Wed Jun 27 12:19:04 BST 2012


Dear Simon,

My tool uses company-internal libraries, so I cannot publish it.
Would you like more details of this test so that you can reproduce it on
your own?

Regards,
MK

2012/6/27 Simon MacMullen <simon at rabbitmq.com>

> Hi Michał - please can you keep rabbitmq-discuss on CC?
>
> So as I said, the limit is only the point at which Rabbit stops accepting
> new messages. In the general case this should be enough to stop further
> memory consumption - but in your case it looks like it isn't. If you were
> able to post your test tool in a way that would make it easy for us to run,
> then that might be the easiest way for us to help you. At the moment we
> just don't have enough information.
>
> Cheers, Simon
>
>
> On 27/06/12 09:36, Michał Kiędyś wrote:
>
>> Simon,
>>
>> My question comes from the fact that Rabbit can consume more than 4GB
>> even when the limit is set to 1.6GB.
>> In this scenario the node reports usage of 2.7GB, but real usage is
>> more than 4GB.
>>
>> rabbit at arch-task-mq-8
>> <http://arch-task-mq-7:55672/#/nodes/rabbit%40arch-task-mq-8>
>>
>> File descriptors: 734 / 1024
>> Socket descriptors: 701 / 829
>> Erlang processes: 5795 / 1048576
>> Memory: 2.7GB (?)  (1.6GB high watermark)
>> Disk space: 49.6GB  (4.0GB low watermark)
>> Uptime: 12m 33s
>> Type: RAM
>>
>>
>>
>> After a while the kernel kills the Rabbit process:
>>
>> Mem-info:
>> DMA per-cpu:
>> cpu 0 hot: high 186, batch 31 used:8
>> cpu 0 cold: high 62, batch 15 used:48
>> cpu 1 hot: high 186, batch 31 used:108
>> cpu 1 cold: high 62, batch 15 used:55
>> cpu 2 hot: high 186, batch 31 used:118
>> cpu 2 cold: high 62, batch 15 used:53
>> cpu 3 hot: high 186, batch 31 used:89
>> cpu 3 cold: high 62, batch 15 used:55
>> DMA32 per-cpu: empty
>> Normal per-cpu: empty
>> HighMem per-cpu: empty
>> Free pages:       12076kB (0kB HighMem)
>> Active:0 inactive:741324 dirty:0 writeback:9 unstable:0 free:3023
>> slab:101876 mapped:3649 pagetables:2586
>> DMA free:12092kB min:8196kB low:10244kB high:12292kB active:0kB
>> inactive:2965168kB present:4202496kB pages_scanned:32 all_unreclaimable?
>> no
>> lowmem_reserve[]: 0 0 0 0
>> DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
>> present:0kB pages_scanned:0 all_unreclaimable? no
>> lowmem_reserve[]: 0 0 0 0
>> Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
>> present:0kB pages_scanned:0 all_unreclaimable? no
>> lowmem_reserve[]: 0 0 0 0
>> HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
>> present:0kB pages_scanned:0 all_unreclaimable? no
>> lowmem_reserve[]: 0 0 0 0
>> DMA: 172*4kB 533*8kB 170*16kB 41*32kB 11*64kB 1*128kB 1*256kB 1*512kB
>> 0*1024kB 1*2048kB 0*4096kB = 12632kB
>> DMA32: empty
>> Normal: empty
>> HighMem: empty
>> Swap cache: add 4358, delete 4243, find 0/0, race 0+0
>> Free swap  = 1031136kB
>> Total swap = 1048568kB
>> Free swap:       1031136kB
>> 1050624 pages of RAM
>> 26588 reserved pages
>> 17300 pages shared
>> 83 pages swap cached
>> Out of Memory: Kill process 2213 (rabbitmq-server) score 14598295 and
>> children.
>> Out of memory: Killed process 2227 (beam.smp).
>>
>>
>>
>> Is this OK?
>>
>>
>> Regards,
>> MK
>>
>> 2012/6/22 Simon MacMullen <simon at rabbitmq.com>
>>
>>
>>    Hi Michał.
>>
>>    This is quite vague - if we can't see the source of your test tool
>>    it's hard to see what it's actually doing.
>>
>>    The server can use more memory than the high watermark; that's just
>>    the point at which it stops accepting new messages from the network.
>>    This should greatly cut the extent to which it can consume more
>>    memory, but will not eliminate it.
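>>
>>    (With the default configuration the watermark is 40% of physical
>>    RAM; on this 4GB box that works out to the 1717986918-byte
>>    "allowed" figure in the logs below.)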
>>
>>    There is an existing issue where the processes used by connections
>>    do not close when the connection is closed and memory use is above
>>    the watermark. When the memory use drops the processes will go.
>>    Could your test application be opening new connections?
>>
>>    Also, you say:
>>
>>
>>        The readers were disconnected by the server ahead of time.
>>
>>
>>    does this mean that huge numbers of messages are building up in the
>>    server? Note that in the default configuration there is a
>>    per-message cost in memory of a hundred bytes or so even when the
>>    message has been paged out to disc, so that might explain why so
>>    much memory is being used.
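>>
>>    (As a rough worked example: at 100 bytes per paged-out message, the
>>    1.6GB watermark corresponds to roughly 17 million backlogged
>>    messages from per-message overhead alone.)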
>>
>>    I hope this helps explain what you are seeing. But I'm not exactly
>>    sure what you are doing...
>>
>>    Cheers, Simon
>>
>>
>>    On 22/06/12 14:09, Michał Kiędyś wrote:
>>
>>        Hi,
>>
>>        Software version: 2.8.2
>>        The cluster has been stressed with 1000 writers and 100 readers.
>>        Message size is 100kB.
>>        Test configuration per node (a sketch of the worker loop these
>>        settings imply follows the listings):
>>
>>        _readers node #1_
>>
>>        test.ConnectionPerWorker=true
>>        test.WritersCount=0
>>        test.ReadersCount=33
>>        test.Durable=true
>>        test.QueuesCount=1
>>        test.AutoAck=false
>>        test.ExchangeType=direct
>>        test.QueueNamePrefix=direct
>>        test.Host=arch-task-mq-7.atm
>>
>>        _readers node #2_
>>
>>        test.ConnectionPerWorker=true
>>        test.WritersCount=0
>>        test.ReadersCount=33
>>        test.Durable=true
>>        test.QueuesCount=1
>>        test.AutoAck=false
>>        test.ExchangeType=direct
>>        test.QueueNamePrefix=direct
>>        test.Host=arch-task-mq-8.atm
>>
>>        _readers node #3_
>>
>>        test.ConnectionPerWorker=true
>>        test.WritersCount=0
>>        test.ReadersCount=33
>>        test.Durable=true
>>        test.QueuesCount=1
>>        test.AutoAck=false
>>        test.ExchangeType=direct
>>        test.QueueNamePrefix=direct
>>        test.Host=arch-task-mq-8.atm
>>
>>        _writers node #4_
>>
>>        test.ConnectionPerWorker=true
>>        test.WritersCount=333
>>        test.ReadersCount=0
>>        test.Durable=true
>>        test.QueuesCount=1
>>        test.AutoAck=false
>>        test.ExchangeType=direct
>>        test.QueueNamePrefix=direct
>>        test.BodySize=102400
>>        # available units: s(seconds), m(minutes), h(hours), d(days)
>>        test.TestDuration=3h
>>        test.Host=arch-task-mq-8.atm
>>
>>        _writers node #5_
>>
>>        test.ConnectionPerWorker=true
>>        test.WritersCount=333
>>        test.ReadersCount=0
>>        test.Durable=true
>>        test.QueuesCount=1
>>        test.AutoAck=false
>>        test.ExchangeType=direct
>>        test.QueueNamePrefix=direct
>>        test.BodySize=102400
>>        # available units: s(seconds), m(minutes), h(hours), d(days)
>>        test.TestDuration=3h
>>        test.Host=arch-task-mq-7.atm
>>
>>        _writers node #6_
>>
>>        test.ConnectionPerWorker=true
>>        test.WritersCount=334
>>        test.ReadersCount=0
>>        test.Durable=true
>>        test.QueuesCount=1
>>        test.AutoAck=false
>>        test.ExchangeType=direct
>>        test.QueueNamePrefix=direct
>>        test.BodySize=102400
>>        # available units: s(seconds), m(minutes), h(hours), d(days)
>>        test.TestDuration=3h
>>        test.Host=arch-task-mq-8.atm
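>>
>>        A minimal sketch of what each writer worker presumably runs,
>>        reconstructed from the settings above (the WriterWorker class,
>>        exchange name and routing key are guesses; only the property
>>        values come from the configs):
>>
>>        import com.rabbitmq.client.Channel;
>>        import com.rabbitmq.client.Connection;
>>        import com.rabbitmq.client.ConnectionFactory;
>>        import com.rabbitmq.client.MessageProperties;
>>
>>        // Hypothetical writer: one AMQP connection per worker
>>        // (test.ConnectionPerWorker=true), publishing persistent 100kB
>>        // bodies to a durable direct exchange for three hours.
>>        public class WriterWorker implements Runnable {
>>            public void run() {
>>                try {
>>                    ConnectionFactory factory = new ConnectionFactory();
>>                    factory.setHost("arch-task-mq-8.atm");      // test.Host
>>                    Connection conn = factory.newConnection();
>>                    Channel ch = conn.createChannel();
>>                    // test.ExchangeType=direct, test.Durable=true;
>>                    // "direct-0" is a guessed name from QueueNamePrefix
>>                    ch.exchangeDeclare("direct-0", "direct", true);
>>                    byte[] body = new byte[102400];             // test.BodySize
>>                    long deadline = System.currentTimeMillis()
>>                            + 3L * 60 * 60 * 1000;              // TestDuration=3h
>>                    while (System.currentTimeMillis() < deadline) {
>>                        ch.basicPublish("direct-0", "direct-0",
>>                                MessageProperties.MINIMAL_PERSISTENT_BASIC, body);
>>                    }
>>                    conn.close();
>>                } catch (Exception e) {
>>                    e.printStackTrace();
>>                }
>>            }
>>        }
>>
>>        Readers would be the mirror image: one connection per worker,
>>        basicConsume on the single durable queue with autoAck=false,
>>        acking each delivery.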
>>
>>
>>        _Actual tests state:_
>>
>>        Running worker-1000w-100r-100kB
>>        Preparing tests on arch-task-mq-1
>>        Preparing tests on arch-task-mq-2
>>        Preparing tests on arch-task-mq-3
>>        Preparing tests on arch-task-mq-4
>>        Preparing tests on arch-task-mq-5
>>        Preparing tests on arch-task-mq-6
>>        Preparations done, starting testing procedure
>>        Start tests on arch-task-mq-1
>>        Start tests on arch-task-mq-2
>>        Start tests on arch-task-mq-3
>>        Start tests on arch-task-mq-4
>>        Start tests on arch-task-mq-5
>>        Start tests on arch-task-mq-6
>>        Waiting for tests to finish
>>        Tests done on arch-task-mq-5
>>        Tests done on arch-task-mq-6
>>        Tests done on arch-task-mq-4
>>
>>
>>        The readers were disconnected by the server ahead of time.
>>
>>
>>        _Actual cluster state (data from the Management Plugin view):_
>>
>>        rabbit at arch-task-mq-7
>>        File descriptors: 392 / 1024
>>        Socket descriptors: 334 / 829
>>        Erlang processes: 2885 / 1048576
>>        Memory: 540.2MB  (1.6GB high watermark)
>>        Disk space: 49.6GB  (4.0GB low watermark)
>>        Uptime: 21h 14m
>>        Type: Disc Stats *
>>
>>        rabbit at arch-task-mq-8
>>        File descriptors: 692 / 1024
>>        Socket descriptors: 668 / 829
>>        Erlang processes: 5522 / 1048576
>>        Memory: 1.8GB (?)  (1.6GB high watermark)
>>        Disk space: 46.1GB  (4.0GB low watermark)
>>        Uptime: 21h 16m
>>        Type: RAM
>>
>>        The number of processes is growing all the time even though no
>>        messages are being published or received.
>>        All publishers have been blocked. After some time I killed the
>>        publisher processes, but RabbitMQ still sees them as connected and
>>        blocked. :)
>>
>>        Some logs:
>>
>>        mkiedys at arch-task-mq-8:/var/log/rabbitmq$ cat rabbit at arch-task-mq-8.log |grep vm_memory_high|tail -n 20
>>        vm_memory_high_watermark clear. Memory used:1709148224 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2135174984 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1593121728 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2043534608 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1681947128 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2088225952 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1710494800 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2208875080 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1713902032 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2122564032 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1663616264 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2098909664 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1712666136 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2088814360 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1640273568 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2116966952 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1715305176 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2186572648 allowed:1717986918
>>        vm_memory_high_watermark clear. Memory used:1716620504 allowed:1717986918
>>        vm_memory_high_watermark set. Memory used:2180898440 allowed:1717986918
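>>
>>        (Each "set" line shows usage overshooting the 1717986918-byte
>>        allowance, i.e. the 1.6GB watermark, by roughly 300-500MB before
>>        the alarm clears again.)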
>>
>>        mkiedys at arch-task-mq-8:/var/log/rabbitmq$ cat rabbit at arch-task-mq-8.log |grep vm_memory_high|wc -l
>>        2935
>>
>>        Why does the server consume more memory than the 1.6GB limit?
>>
>>        Regards,
>>        MK
>>
>>
>>
>>        _______________________________________________
>>        rabbitmq-discuss mailing list
>>        rabbitmq-discuss at lists.rabbitmq.com
>>        https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>>
>>
>>    --
>>    Simon MacMullen
>>    RabbitMQ, VMware
>>
>>
>>
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>

