[rabbitmq-discuss] Rabbitmq falling over & losing messages

Ben Hood 0x6e6562 at gmail.com
Sat Nov 29 00:06:18 GMT 2008


On Fri, Nov 28, 2008 at 10:47 AM, Toby White
<toby.o.h.white at googlemail.com> wrote:
> I've pulled the latest repository version, and run the same tests. (This
> also required me to upgrade the version of Erlang I was running on from 11b5
> to 12b3)
> The outcome is basically the same, but I get slightly different errors
> logged.

So are you saying that with the latest version of Rabbit you are still
losing messages that are marked as persistent (as you indicated in
your first post)?

> The publishing script gets an exception like so:
> [{exit,{timeout,{gen_server,call,[<0.230.0>,{commit,{{1,<0.247.0>},82082}},5000]}}}]',
> (90, 20), 'Channel.tx_commit')

This client-side trace looks normal for a server crash.

> and the rabbit error message is very slightly different:
> =ERROR REPORT==== 28-Nov-2008::10:11:40 ===
> connection <0.242.0> (running), channel 1 - error:
> {amqp,internal_error,
>      "commit failed:
> [{exit,{timeout,{gen_server,call,[<0.230.0>,{commit,{{1,<0.247.0>},82082}},5000]}}}]",
>      'tx.commit'}
> (followed by a dump of the whole queue as before.)
> Furthermore, the following appears on the console:
> (rabbit at domU-12-31-39-02-61-F6)3>
> /usr/lib/erlang/lib/os_mon-2.1.6/priv/bin/memsup: Erlang has closed.
>                           Erlang has closed
> Crash dump was written to: erl_crash.dump
> eheap_alloc: Cannot allocate 729810240 bytes of memory (of type "heap").
> Aborted

Ok, this looks normal for a case when Rabbit runs out of memory
because you have flooded it with messages.

Currently the only preventative action against this is the
channel.flow command - see this article for the background :

At the moment, producer throttling requires a well-behaved client,
i.e. one that obeys the channel.flow command. The Python client
currently isn't well behaved in this respect.
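
For illustration only, here is a minimal Python sketch of what obeying
channel.flow could look like on the producer side. The names used
(FlowControlledPublisher, on_channel_flow, publish) are hypothetical
and do not belong to any existing AMQP client; the send callable
stands in for whatever actually writes a message to the socket.

```python
import threading


class FlowControlledPublisher:
    """Hypothetical wrapper; not the API of any real AMQP client."""

    def __init__(self, send):
        # 'send' is an assumed callable that actually writes one message.
        self._send = send
        self._flow_open = threading.Event()
        # Flow is treated as active until the broker says otherwise.
        self._flow_open.set()

    def on_channel_flow(self, active):
        """Call this whenever a channel.flow frame arrives from the broker."""
        if active:
            self._flow_open.set()
        else:
            self._flow_open.clear()

    def publish(self, message, timeout=None):
        """Block while flow is paused; return False if the wait timed out."""
        if not self._flow_open.wait(timeout):
            return False
        self._send(message)
        return True
```

A client built this way stops publishing as soon as the broker sends
channel.flow with active set to false, and resumes when it is set to
true again, so the broker gets a chance to drain its queues instead of
running out of memory.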

> even though, as best I can tell from watching the output of top, the erlang
> process never took more than about 10% of available memory.

Do you not see anything in the log about an alarm handler for memory, e.g.

=INFO REPORT==== 9-Nov-2008::15:13:31 ===
    alarm_handler: {set,{system_memory_high_watermark,[]}}


I find the memory statistic a bit strange - the alarm handler kicks in
by default at 95%.

Simon is currently looking into an issue with the way Erlang reports
memory consumption on Linux - maybe he can shed some light on what
may be going on with your installation.

Also, can you give some more details about your environment? Are you
running Xen?


