[rabbitmq-discuss] Help pinpointing an error

Jaime Herazo B. jherazo at beverlydata.com
Fri Aug 31 10:35:12 BST 2012


Hi.

I'm new to RabbitMQ and to messaging platforms in general. I look after
an existing RabbitMQ setup as part of my sysadmin duties. Other people
here know it much better than I do, but I'm the one who has to take
care of the servers.

Today a RabbitMQ instance went down. I restarted the service, only to be
greeted by screams, as apparently many messages were lost in the process.
(As far as I understood, this couldn't happen once queues were marked as
"Durable", but it did.)

The "Reason for termination" was:

{{badmatch,[{file_summary,2064936,4810835,2064935,2064937,16780759,true,1}]},
 [
  {rabbit_msg_store,combine_files,3},
  {rabbit_msg_store_gc,attempt_action,3},
  {rabbit_msg_store_gc,handle_cast,2},
  {gen_server2,handle_msg,2},
  {proc_lib,wake_up,3}
 ]
}

I'm having trouble even identifying what this means, let alone
preventing it from happening again. The service started up just fine
after the restart, so it was probably a transient error, but the fact
that it took all the messages in the queue with it is troubling.

Can you please point me towards resources for handling these kinds of
problems in the future without losing data? What did I do wrong?

Also, do you see a hint of what went wrong there, or do I need to give
more info for this?

Thanks for any help or hints.


