[rabbitmq-discuss] Debugging rabbitmq crash

Scott Brooks scott at beamdog.com
Wed Jun 23 16:09:45 BST 2010


Which version of Erlang are you running?  If it is older than R13B03,
I'd recommend upgrading.

Even though you are fully consuming all the messages, at that rate the
GC may not be aggressive enough to clean up that memory.
The Erlang GC has seen some huge improvements, and if you are on R12,
that may be the problem.

Here is a post from LShift about the differences even between R13B02 and R13B03:
http://www.lshift.net/blog/2009/12/01/garbage-collection-in-erlang

As for debugging steps, I would watch the memory usage of your beam
process to see how fast it is growing, and also run

    rabbitmqctl list_queues name messages messages_ready \
        messages_unacknowledged messages_uncommitted memory

to watch the stats as far as RabbitMQ is concerned.
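
To put a number on "how fast it is growing", here is a rough Python
sketch, assuming a Linux box where /proc/<pid>/status exposes a VmRSS
line. The helper names (parse_vm_rss_kb, read_rss_kb,
growth_kb_per_hour, watch) are made up for illustration and are not
part of RabbitMQ or Erlang. It samples the resident memory of the beam
process and fits a straight line through the samples.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vm_rss_kb(f.read())


def growth_kb_per_hour(samples):
    """Least-squares slope of (seconds, rss_kb) samples, scaled to kB/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 3600.0


def watch(pid, interval_s=60, count=10):
    """Take `count` samples `interval_s` apart and return the growth rate."""
    samples = []
    for i in range(count):
        if i:
            time.sleep(interval_s)
        samples.append((time.time(), read_rss_kb(pid)))
    return growth_kb_per_hour(samples)
```

Calling watch(<pid of beam>) with the defaults takes about nine minutes
and returns the trend in kB per hour. A figure that stays clearly
positive over several runs would suggest memory is not being reclaimed,
while a value hovering around zero would point away from a slow leak.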

Scott

On Wed, Jun 23, 2010 at 4:48 AM, Matt Calder <mvcalder at gmail.com> wrote:
> I am looking for guidance on how to debug a crash of rabbit. The
> situation is this. There is a pair of processes running on a single
> machine. One process sends a message to the other, and the other
> responds. This is happening at ~20 message pairs per second. There is
> a single direct exchange and two queues, one for messages going one
> way, the other for messages going the other way. The queues are not
> durable, and not auto delete. The processes run for ~15 hours without
> issue over a single connection. Then the connection closes. Here is
> the rabbit.log:
>
> =INFO REPORT==== 22-Jun-2010::18:03:03 ===
> accepted TCP connection on 0.0.0.0:5672 from 127.0.0.1:55824
>
> =INFO REPORT==== 22-Jun-2010::18:03:03 ===
> starting TCP connection <0.15621.0> from 127.0.0.1:55824
>
> =INFO REPORT==== 23-Jun-2010::01:59:00 ===
>    alarm_handler: {set,{system_memory_high_watermark,[]}}
>
> =INFO REPORT==== 23-Jun-2010::02:02:55 ===
>    alarm_handler: {clear,system_memory_high_watermark}
>
> =WARNING REPORT==== 23-Jun-2010::06:27:37 ===
> exception on TCP connection <0.15425.0> from 127.0.0.1:47025
> connection_closed_abruptly
>
> =INFO REPORT==== 23-Jun-2010::06:27:37 ===
> closing TCP connection <0.15425.0> from 127.0.0.1:47025
>
> =WARNING REPORT==== 23-Jun-2010::06:27:37 ===
> exception on TCP connection <0.15621.0> from 127.0.0.1:55824
> connection_closed_abruptly
>
> =INFO REPORT==== 23-Jun-2010::06:27:37 ===
> closing TCP connection <0.15621.0> from 127.0.0.1:55824
>
>
> So, there appeared to be a memory issue, but it also appeared to pass.
> I watched the process on and off and it seemed to be using a steady
> 15% of available memory according to top. All the other logs are
> either old or empty. Notably, rabbit-sasl.log is empty but was
> apparently touched at the time of the crash:
>
> -rw-r--r-- 1 rabbitmq rabbitmq     0 2010-06-23 06:27 rabbit-sasl.log
>
> I can reproduce this, though it takes the 15 hours or so.
>
> If anyone can guide me through the steps necessary to debug this, I
> would appreciate it. Specific suggestions like "your problem is X"
> are of course welcome, but I am also interested in the process. Thank you,
>
> Matt