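<div dir="ltr"><div><div>Carl,<br><br></div>This is possibly an I/O problem. The 120 s in the message is the kernel's hung-task timeout (kernel.hung_task_timeout_secs): the warning fires when a task stays blocked for that long. One way that can happen is when the operating system flushes cached (dirty) file system data to disk: once enough dirty data has built up, writes become blocked/synchronous until the flush completes. If your server has a lot of RAM and slow storage, and you're doing a lot of writes, a flush could take long enough to trigger this. The kernel parameter vm.dirty_ratio sets the percentage of memory that may hold dirty pages before writers are blocked. Its default depends on the kernel version (40% on older kernels, 20% on many newer ones), so check the current value with "sysctl vm.dirty_ratio" and look for any override in /etc/sysctl.conf. You could try lowering this value so that flushes happen more often but are smaller...<br>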
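<br>To get a feel for the numbers, here is a rough sketch (not a tuning tool) that reads vm.dirty_ratio and estimates how much dirty data could pile up before writers block. The helper names are made up for illustration, and it assumes the ratio applies to total RAM; the kernel actually applies it to available memory, so treat the result as an upper bound.<br>
<pre>
```python
from pathlib import Path


def read_vm_setting(name, proc_root="/proc/sys/vm"):
    """Return the integer value of a vm.* sysctl, or None if it cannot be read."""
    try:
        return int((Path(proc_root) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def mem_total_bytes(meminfo_path="/proc/meminfo"):
    """Return MemTotal from /proc/meminfo in bytes, or None if unavailable."""
    try:
        for line in Path(meminfo_path).read_text().splitlines():
            if line.startswith("MemTotal:"):
                # The value is reported in kB, e.g. "MemTotal:  16384256 kB".
                return int(line.split()[1]) * 1024
    except (OSError, ValueError, IndexError):
        pass
    return None


def dirty_limit_bytes(total_bytes, ratio_percent):
    """Approximate ceiling on dirty page-cache data, assuming the ratio is
    applied to total RAM (a simplification, see the note above)."""
    return total_bytes * ratio_percent // 100


if __name__ == "__main__":
    total = mem_total_bytes()
    ratio = read_vm_setting("dirty_ratio")  # reads 0 if vm.dirty_bytes is set instead
    if total is None or ratio is None:
        print("could not read /proc (not a Linux host?)")
    else:
        limit_gib = dirty_limit_bytes(total, ratio) / 2**30
        print(f"vm.dirty_ratio = {ratio}%")
        print(f"approx. dirty-data ceiling: {limit_gib:.1f} GiB")
```
</pre>
For example, 64 GB of RAM at 40% allows roughly 25 GB of unwritten data, which slow disks may need a long time to flush. To experiment, something like "sysctl -w vm.dirty_ratio=10" (as root; the value 10 is only an illustration) changes it until the next reboot.<br>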
<br></div>Brett<br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 8, 2014 at 1:36 PM, carlhoerberg <span dir="ltr">&lt;<a href="mailto:carl.hoerberg@gmail.com" target="_blank">carl.hoerberg@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I have a cluster where the nodes sometimes crash under high load. Nothing is shown<br>
in the RabbitMQ logs, but this shows up in syslog:<br>
<br>
kernel: [582840.748073] INFO: task beam:9794 blocked for more than 120 seconds.<br>
kernel: [582840.748082] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br>
kernel: [582840.748088] beam            D ffff8800efc13700     0  9794 9718 0x00000000<br>
kernel: [582840.748092]  ffff8800e7bc3cb8 0000000000000282 0000000000000000 ffffffffffffffe0<br>
kernel: [582840.748095]  ffff8800e7bc3fd8 ffff8800e7bc3fd8 ffff8800e7bc3fd8 0000000000013700<br>
kernel: [582840.748098]  ffff88000249c4d0 ffff88000249ade0 00007f7c2aa289e0 ffff8800024b1180<br>
kernel: [582840.748101] Call Trace:<br>
kernel: [582840.748109]  [&lt;ffffffff81659bbf&gt;] schedule+0x3f/0x60<br>
kernel: [582840.748113]  [&lt;ffffffff8106c535&gt;] exit_mm+0x85/0x130<br>
kernel: [582840.748116]  [&lt;ffffffff8106c74e&gt;] do_exit+0x16e/0x450<br>
kernel: [582840.748120]  [&lt;ffffffff8109e4d9&gt;] ? futex_wait_queue_me+0xc9/0x100<br>
kernel: [582840.748122]  [&lt;ffffffff8109e14f&gt;] ? __unqueue_futex+0x3f/0x80<br>
kernel: [582840.748126]  [&lt;ffffffff8107ad4a&gt;] ? __dequeue_signal+0x6a/0xb0<br>
kernel: [582840.748128]  [&lt;ffffffff8106cbd4&gt;] do_group_exit+0x44/0xa0<br>
kernel: [582840.748131]  [&lt;ffffffff8107d8cc&gt;] get_signal_to_deliver+0x21c/0x420<br>
kernel: [582840.748135]  [&lt;ffffffff81014825&gt;] do_signal+0x45/0x130<br>
kernel: [582840.748137]  [&lt;ffffffff810a126c&gt;] ? do_futex+0x7c/0x1b0<br>
kernel: [582840.748139]  [&lt;ffffffff810a14e2&gt;] ? sys_futex+0x142/0x1a0<br>
kernel: [582840.748142]  [&lt;ffffffff81091d7f&gt;] ? __put_cred+0x3f/0x50<br>
kernel: [582840.748144]  [&lt;ffffffff81014ad5&gt;] do_notify_resume+0x65/0x80<br>
kernel: [582840.748147]  [&lt;ffffffff81664350&gt;] int_signal+0x12/0x17<br>
<br>
RabbitMQ 3.3.1, Erlang 17, Ubuntu 12.04<br>
<br>
<br>
<br>
--<br>
View this message in context: <a href="http://rabbitmq.1065348.n5.nabble.com/beam-blocking-tp35412.html" target="_blank">http://rabbitmq.1065348.n5.nabble.com/beam-blocking-tp35412.html</a><br>
</blockquote></div><br></div>