[rabbitmq-discuss] no next heap size found after crash

Thu Apr 25 13:37:36 BST 2013

Hi Matthias,

I have read this discussion, "http://markmail.org/message/dklyh24mrihuloiv#query:+page:1+mid:ishyo3ijnto25wwx+state:results", and I think it may work for us.

In our Rabbit database directory, "/var/lib/rabbitmq/mnesia/rabbit@xxx-hostname/msg_store_persistent", there are 54284 .rdq files. If I move half of them out, start Rabbit, wait for it to process all messages, stop it, and then move the other half back in, is that safe? Or is there any chance of corrupting the Rabbit database?
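
Before moving anything, a read-only inventory of the message store can confirm the file count and show how much space the .rdq files take. The following is only a sketch: the node directory name is a placeholder, and the function name `summarize_rdq` is made up for illustration. It does not touch or move any files.

```python
from pathlib import Path


def summarize_rdq(store_dir):
    """Return (file_count, total_bytes) for the .rdq files directly in store_dir."""
    files = [p for p in Path(store_dir).glob("*.rdq") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Placeholder path; replace the node name with the real one.
    store = "/var/lib/rabbitmq/mnesia/rabbit@xxx-hostname/msg_store_persistent"
    count, size = summarize_rdq(store)
    print(f"{count} .rdq files, {size / 2**30:.1f} GiB")
```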

Regards,

----- Original Message -----
From: "Matthias Radestock" <matthias at rabbitmq.com>
To: "Discussions about RabbitMQ" <rabbitmq-discuss at lists.rabbitmq.com>
Cc: "Gilles Danycan" <gdanycan at slip-software.com>, "Daniel Jimbel" <djimbel at beverlydata.com>
Sent: Thursday, April 25, 2013 8:22:40 AM
Subject: Re: [rabbitmq-discuss] no next heap size found after crash

Gilles,

On 25/04/13 07:04, Gilles Danycan wrote:
> Unfortunately I have a new problem now :(
> I'm still on RabbitMQ 3.0.4.40411, Erlang R14A.
> Server: 1 TB disk, 48 GB RAM, dual Xeon, 16 cores.
>
> The server just rebooted, apparently without reason, and RabbitMQ is not
> able to start after the reboot.
> I get the message:
>
> Crash dump was written to: erl_crash.dump
> no next heap size found: -2063447786, offset 0
> or
> Crash dump was written to: erl_crash.dump
> eheap_alloc: Cannot allocate 18446744065900154288 bytes of memory (of
> type "heap").
>
> I tried changing vm_memory_high_watermark in the config file, but
> the second message appears.

I suggest upgrading Erlang. To R16B if you can. That will certainly get 
rid of the negative number reported.
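
As a sanity check on the two figures quoted above, the sketch below reinterprets them as fixed-width integers. It assumes (nothing in this thread confirms it) that both values come from a size wrapping around in 32-bit or 64-bit arithmetic. The helper names are made up for illustration.

```python
def as_signed(value, bits):
    """Reinterpret an unsigned integer as a two's-complement signed one."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def as_unsigned(value, bits):
    """Reinterpret a signed integer as an unsigned one of the given width."""
    return value & ((1 << bits) - 1)


# The eheap_alloc request, read as a signed 64-bit number (assumption).
print(as_signed(18446744065900154288, 64))  # -7809397328

# The "next heap size" value, read as an unsigned 32-bit number (assumption).
print(as_unsigned(-2063447786, 32))  # 2231519510
```

Read this way, the huge allocation request would be a negative size seen as an unsigned 64-bit number, which is at least consistent with the negative value in the first message.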

This has come up before - see 
http://rabbitmq.1065348.n5.nabble.com/Rabbit-won-t-restart-no-next-heap-size-found-td22195.html 
- and at the time we also identified an issue with excessive memory usage 
on recovery after an unclean shutdown. That got fixed in 3.0.0 though.

So let's see whether an Erlang upgrade makes a difference. If not we can 
look at other options.

> Unfortunately the Rabbit partition is:
> /dev/sda2             910G  864G 1006M 100% /var
>
> I don't understand, because we have only 5 million messages across all
> queues. Last night I deleted some messages (more than 1 million),
> but the available disk space didn't change.
> Is there some problem with Erlang or the indexes? We consume and requeue
> messages, and when we don't requeue a message the disk space used stays
> the same.

Disk space is reclaimed by a garbage collector. That only kicks in when 
garbage exceeds 50% of the allocated space, so deleting 1m out of 5m 
messages won't be enough to trigger a gc.
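
The arithmetic behind that can be sketched as follows. This is a simplification: it assumes all messages are the same size, so message counts stand in for bytes, and it takes the 50% figure from the paragraph above. The function names are hypothetical and are not part of RabbitMQ.

```python
GC_THRESHOLD = 0.5  # "garbage exceeds 50% of the allocated space"


def garbage_fraction(deleted, total):
    """Fraction of allocated space that is garbage, assuming equal-sized messages."""
    return deleted / total


def gc_would_trigger(deleted, total, threshold=GC_THRESHOLD):
    """True if the garbage fraction is above the threshold."""
    return garbage_fraction(deleted, total) > threshold


print(garbage_fraction(1_000_000, 5_000_000))  # 0.2
print(gc_would_trigger(1_000_000, 5_000_000))  # False
print(gc_would_trigger(3_000_000, 5_000_000))  # True
```

Under these assumptions, 1m deleted out of 5m is 20% garbage, well short of the threshold.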

Regards,

Matthias.