[rabbitmq-discuss] RabbitMQ failure under high load

Simon MacMullen simon at rabbitmq.com
Wed Jun 27 11:36:24 BST 2012


Hi Michał - please can you keep rabbitmq-discuss on CC?

So as I said, the limit is only the point at which Rabbit stops 
accepting new messages. In the general case this should be enough to 
stop further memory consumption - but in your case it looks like it 
isn't. If you were able to post your test tool in a way that would make 
it easy for us to run, then that might be the easiest way for us to help 
you. At the moment we just don't have enough information.
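
For what it's worth, the watermark itself is just a fraction of installed 
RAM set in rabbitmq.config - a minimal sketch (paths and values will vary 
with your install; the default 0.4 on a 4GB box gives exactly the 1.6GB 
you are seeing):

    %% /etc/rabbitmq/rabbitmq.config
    [
      {rabbit, [
        %% stop accepting new messages from the network once the broker
        %% is using more than 40% of installed RAM
        {vm_memory_high_watermark, 0.4}
      ]}
    ].

Raising that value wouldn't help here though; the question is why memory 
keeps growing after Rabbit has stopped accepting new messages.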

Cheers, Simon

On 27/06/12 09:36, Michał Kiędyś wrote:
> Simon,
>
> My question comes from the fact that Rabbit can consume even more than
> 4GB when the limit is set to 1.6GB.
> In this scenario it reports usage at 2.7GB, but real usage is more than 4GB.
>
> rabbit at arch-task-mq-8
> <http://arch-task-mq-7:55672/#/nodes/rabbit%40arch-task-mq-8>
> File descriptors (used / available):    734 / 1024
> Socket descriptors (used / available):  701 / 829
> Erlang processes (used / available):    5795 / 1048576
> Memory:                                 2.7GB (?)  (1.6GB high watermark)
> Disk space:                             49.6GB     (4.0GB low watermark)
> Uptime:                                 12m 33s
> Type:                                   RAM
>
>
> After a while the kernel kills the Rabbit process:
>
> Mem-info:
> DMA per-cpu:
> cpu 0 hot: high 186, batch 31 used:8
> cpu 0 cold: high 62, batch 15 used:48
> cpu 1 hot: high 186, batch 31 used:108
> cpu 1 cold: high 62, batch 15 used:55
> cpu 2 hot: high 186, batch 31 used:118
> cpu 2 cold: high 62, batch 15 used:53
> cpu 3 hot: high 186, batch 31 used:89
> cpu 3 cold: high 62, batch 15 used:55
> DMA32 per-cpu: empty
> Normal per-cpu: empty
> HighMem per-cpu: empty
> Free pages:       12076kB (0kB HighMem)
> Active:0 inactive:741324 dirty:0 writeback:9 unstable:0 free:3023
> slab:101876 mapped:3649 pagetables:2586
> DMA free:12092kB min:8196kB low:10244kB high:12292kB active:0kB
> inactive:2965168kB present:4202496kB pages_scanned:32 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
> present:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
> present:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
> present:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> DMA: 172*4kB 533*8kB 170*16kB 41*32kB 11*64kB 1*128kB 1*256kB 1*512kB
> 0*1024kB 1*2048kB 0*4096kB = 12632kB
> DMA32: empty
> Normal: empty
> HighMem: empty
> Swap cache: add 4358, delete 4243, find 0/0, race 0+0
> Free swap  = 1031136kB
> Total swap = 1048568kB
> Free swap:       1031136kB
> 1050624 pages of RAM
> 26588 reserved pages
> 17300 pages shared
> 83 pages swap cached
> Out of Memory: Kill process 2213 (rabbitmq-server) score 14598295 and
> children.
> Out of memory: Killed process 2227 (beam.smp).
>
>
>
> Is this OK?
>
>
> Regards,
> MK
>
> 2012/6/22 Simon MacMullen <simon at rabbitmq.com>
>
>     Hi Michał.
>
>     This is quite vague - if we can't see the source of your test tool
>     it's hard to see what it's actually doing.
>
>     The server can use more memory than the high watermark; that's just
>     the point at which it stops accepting new messages from the network.
>     This should greatly cut the extent to which it can consume more
>     memory, but will not eliminate it.
>
>     There is an existing issue where the processes used by connections
>     do not close when the connection is closed and memory use is above
>     the watermark. When the memory use drops the processes will go.
>     Could your test application be opening new connections?
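>
>     One quick way to check (just a sketch; these rabbitmqctl commands are
>     run on the broker host) is to watch whether the connection and channel
>     counts keep climbing, and where the memory is going:
>
>         rabbitmqctl list_connections name state
>         rabbitmqctl list_channels connection number
>         rabbitmqctl status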
>
>     Also, you say:
>
>
>         The readers have been disconnected by the server ahead of time.
>
>
>     does this mean that huge numbers of messages are building up in the
>     server? Note that in the default configuration there is a
>     per-message cost in memory of a hundred bytes or so even when the
>     message has been paged out to disc, so that might explain why so
>     much memory is being used.
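>
>     As a rough back-of-the-envelope figure (assuming ~100 bytes of
>     bookkeeping per message, as above):
>
>         10,000,000 queued messages x ~100 bytes  =  ~1GB of RAM
>
>     even if every message body has been paged out to disc.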
>
>     I hope this helps explain what you are seeing. But I'm not exactly
>     sure what you are doing...
>
>     Cheers, Simon
>
>
>     On 22/06/12 14:09, Michał Kiędyś wrote:
>
>         Hi,
>
>         Software version: 2.8.2
>         The cluster has been stressed with 1000 writers and 100 readers.
>         Message
>         size is 100kB.
>         Test configuration:
>
>         _readers node #1_
>
>         test.ConnectionPerWorker=true
>         test.WritersCount=0
>         test.ReadersCount=33
>         test.Durable=true
>         test.QueuesCount=1
>         test.AutoAck=false
>         test.ExchangeType=direct
>         test.QueueNamePrefix=direct
>         test.Host=arch-task-mq-7.atm
>
>         _readers node #2_
>
>         test.ConnectionPerWorker=true
>         test.WritersCount=0
>         test.ReadersCount=33
>         test.Durable=true
>         test.QueuesCount=1
>         test.AutoAck=false
>         test.ExchangeType=direct
>         test.QueueNamePrefix=direct
>         test.Host=arch-task-mq-8.atm
>
>         _readers node #3_
>
>         test.ConnectionPerWorker=true
>         test.WritersCount=0
>         test.ReadersCount=33
>         test.Durable=true
>         test.QueuesCount=1
>         test.AutoAck=false
>         test.ExchangeType=direct
>         test.QueueNamePrefix=direct
>         test.Host=arch-task-mq-8.atm
>
>         _writers node #4_
>
>         test.ConnectionPerWorker=true
>         test.WritersCount=333
>         test.ReadersCount=0
>         test.Durable=true
>         test.QueuesCount=1
>         test.AutoAck=false
>         test.ExchangeType=direct
>         test.QueueNamePrefix=direct
>         test.BodySize=102400
>         # available units: s(seconds), m(minutes), h(hours) d(days)
>         test.TestDuration=3h
>         test.Host=arch-task-mq-8.atm
>
>         _writers node #5_
>
>         test.ConnectionPerWorker=true
>         test.WritersCount=333
>         test.ReadersCount=0
>         test.Durable=true
>         test.QueuesCount=1
>         test.AutoAck=false
>         test.ExchangeType=direct
>         test.QueueNamePrefix=direct
>         test.BodySize=102400
>         # available units: s(seconds), m(minutes), h(hours) d(days)
>         test.TestDuration=3h
>         test.Host=arch-task-mq-7.atm
>
>         _writers node #6_
>
>         test.ConnectionPerWorker=true
>         test.WritersCount=334
>         test.ReadersCount=0
>         test.Durable=true
>         test.QueuesCount=1
>         test.AutoAck=false
>         test.ExchangeType=direct
>         test.QueueNamePrefix=direct
>         test.BodySize=102400
>         # available units: s(seconds), m(minutes), h(hours) d(days)
>         test.TestDuration=3h
>         test.Host=arch-task-mq-8.atm
>
>
>         _Actual tests state:_
>
>         Running worker-1000w-100r-100kB
>         Preparing tests on arch-task-mq-1
>         Preparing tests on arch-task-mq-2
>         Preparing tests on arch-task-mq-3
>         Preparing tests on arch-task-mq-4
>         Preparing tests on arch-task-mq-5
>         Preparing tests on arch-task-mq-6
>         Preparations done, starting testing procedure
>         Start tests on arch-task-mq-1
>         Start tests on arch-task-mq-2
>         Start tests on arch-task-mq-3
>         Start tests on arch-task-mq-4
>         Start tests on arch-task-mq-5
>         Start tests on arch-task-mq-6
>         Waiting for tests to finish
>         Tests done on arch-task-mq-5
>         Tests done on arch-task-mq-6
>         Tests done on arch-task-mq-4
>
>
>         The readers have been disconnected by the server ahead of time.
>
>
>         _Actual cluster state (data from Management Plugin view):_
>
>         rabbit at arch-task-mq-7
>         File descriptors (used / available):    392 / 1024
>         Socket descriptors (used / available):  334 / 829
>         Erlang processes (used / available):    2885 / 1048576
>         Memory:                                 540.2MB    (1.6GB high watermark)
>         Disk space:                             49.6GB     (4.0GB low watermark)
>         Uptime:                                 21h 14m
>         Type:                                   Disc Stats *
>
>         rabbit at arch-task-mq-8
>         File descriptors (used / available):    692 / 1024
>         Socket descriptors (used / available):  668 / 829
>         Erlang processes (used / available):    5522 / 1048576
>         Memory:                                 1.8GB (?)  (1.6GB high watermark)
>         Disk space:                             46.1GB     (4.0GB low watermark)
>         Uptime:                                 21h 16m
>         Type:                                   RAM
>
>         The number of processes keeps growing even though no messages are
>         being published or received.
>         All publishers have been blocked. After some time I killed the
>         publisher processes, but RabbitMQ still sees them as connected and
>         blocked. :)
>
>         Some logs:
>
>         mkiedys at arch-task-mq-8:/var/log/rabbitmq$
>         cat rabbit at arch-task-mq-8.log | grep vm_memory_high | tail -n 20
>         vm_memory_high_watermark clear. Memory used:1709148224 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2135174984 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1593121728 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2043534608 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1681947128 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2088225952 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1710494800 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2208875080 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1713902032 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2122564032 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1663616264 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2098909664 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1712666136 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2088814360 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1640273568 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2116966952 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1715305176 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2186572648 allowed:1717986918
>         vm_memory_high_watermark clear. Memory used:1716620504 allowed:1717986918
>         vm_memory_high_watermark set. Memory used:2180898440 allowed:1717986918
>
>         mkiedys at arch-task-mq-8:/var/log/rabbitmq$
>         cat rabbit at arch-task-mq-8.log | grep vm_memory_high | wc -l
>         2935
>
>         Why does the server consume more memory than the 1.6GB limit?
>
>         Regards,
>         MK
>
>
>
>         _______________________________________________
>         rabbitmq-discuss mailing list
>         rabbitmq-discuss at lists.rabbitmq.com
>         https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>
>
>     --
>     Simon MacMullen
>     RabbitMQ, VMware
>
>


-- 
Simon MacMullen
RabbitMQ, VMware

