[rabbitmq-discuss] Mnesia Corruption Bug
Simon MacMullen
simon at rabbitmq.com
Thu Jun 13 13:49:42 BST 2013
Hi Lee. I would be interested to know how you got the machine into that
state.
There is a bug with a similar stack trace that will be fixed in the next
release - but I don't think it's the same bug. In your case we are
seeing a message which has been published and delivered according to the
queue index, but only published (and not delivered) according to the
queue index's journal. As the journal should always record either the
same state as the main index or a newer one, this should be impossible.
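To make that invariant concrete, here is a minimal sketch in Python. It is
a simplified model and not RabbitMQ's actual code: the MsgState ordering
and the journal_is_consistent helper are hypothetical names introduced
only for illustration.

```python
from enum import IntEnum


class MsgState(IntEnum):
    # Hypothetical ordering of per-message states; a higher value is "newer".
    PUBLISHED = 1
    DELIVERED = 2
    ACKED = 3


def journal_is_consistent(index_state, journal_state):
    """Return True when the journal is at least as new as the main index."""
    return journal_state >= index_state


# The state described in this thread: the main index says "delivered",
# while the journal only says "published".
print(journal_is_consistent(MsgState.DELIVERED, MsgState.PUBLISHED))  # prints False
```

Under this model the journal may be ahead of the main index but never
behind it, which is why the state in your trace looks impossible.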
So to eliminate obvious causes of weirdness first: are you using an
unusual filesystem, or mounting the filesystem with unusual options?
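If it helps to answer that, the following Python sketch picks out the
filesystem type and mount options for a directory from text in the format
of /proc/mounts on Linux. The find_mount helper, the sample mount table and
the /var/lib/rabbitmq/mnesia path are assumptions for illustration;
substitute the directory your node actually uses.

```python
def find_mount(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount containing path.

    mounts_text is expected in /proc/mounts format; path must be absolute.
    Returns None when no mount point matches.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, options.split(","))
    return best


# Made-up sample data, not taken from the machine in this thread.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /var/lib/rabbitmq xfs rw,noatime 0 0\n"
)

print(find_mount("/var/lib/rabbitmq/mnesia", SAMPLE_MOUNTS))
```

On a real machine you would pass the contents of /proc/mounts, for example
open("/proc/mounts").read(), in place of SAMPLE_MOUNTS.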
Cheers, Simon
On 13/06/13 12:36, Lee Hambley wrote:
> Posting this to the list after some discussion on IRC with bob2351 on
> irc.freenode.net.
>
> We have a *slightly* strange situation with using RabbitMQ, we start it
> under `runit`, and it effectively believes that it's running in the
> foreground. I have anecdotal evidence that this causes other problems,
> but at least not anything that hurts too often (i.e. you lose "persistent
> messages" in this setup)
>
> That all aside, attached ( https://gist.github.com/leehambley/5773039 )
> is a stacktrace from a problematic box, we couldn't get it to recover
> (single node, single replica, etc, etc) - we simply deleted the mnesia
> database, which worked well enough.
>
> Some information about our environment:
>
> $ erl --version
> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8]
> [async-threads:0] [kernel-poll:false]
> $ dpkg --list | grep rabbit
> ii rabbitmq-server 3.0.4-1 AMQP server written in Erlang
> $ sudo RABBITMQ_NODENAME=ourproject rabbitmqctl status
> Status of node ourproject at carla ...
> [{pid,8055},
> {running_applications,
> [{rabbitmq_management,"RabbitMQ Management Console","3.0.4"},
> {rabbitmq_management_agent,"RabbitMQ Management Agent","3.0.4"},
> {rabbit,"RabbitMQ","3.0.4"},
> {os_mon,"CPO CXC 138 46","2.2.7"},
> {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.0.4"},
> {webmachine,"webmachine","1.9.1-rmq3.0.4-git52e62bc"},
> {mochiweb,"MochiMedia Web Server","2.3.1-rmq3.0.4-gitd541e9a"},
> {xmerl,"XML parser","1.2.10"},
> {inets,"INETS CXC 138 49","5.7.1"},
> {mnesia,"MNESIA CXC 138 12","4.5"},
> {amqp_client,"RabbitMQ AMQP Client","3.0.4"},
> {sasl,"SASL CXC 138 11","2.1.10"},
> {stdlib,"ERTS CXC 138 10","1.17.5"},
> {kernel,"ERTS CXC 138 10","2.14.5"}]},
> {os,{unix,linux}},
> {erlang_version,
> "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8]
> [async-threads:30] [kernel-poll:true]\n"},
> {memory,
> [{total,33984216},
> {connection_procs,756760},
> {queue_procs,325576},
> {plugins,218728},
> {other_proc,9518440},
> {mnesia,93728},
> {mgmt_db,148472},
> {msg_index,71528},
> {other_ets,1145600},
> {binary,604208},
> {code,17266925},
> {atom,1550457},
> {other_system,2283794}]},
> {vm_memory_high_watermark,0.4},
> {vm_memory_limit,6656894566},
> {disk_free_limit,1000000000},
> {disk_free,11247643770880},
> {file_descriptors,
> [{total_limit,924},
> {total_used,23},
> {sockets_limit,829},
> {sockets_used,12}]},
> {processes,[{limit,1048576},{used,345}]},
> {run_queue,0},
> {uptime,2692}]
> ...done.
>
>
> I believe this bug is already being tracked internally, and I post the
> report here in the hope that I'll have a place to attach a snapshot of
> an mnesia database the next time this happens to us, or that someone
> else might find this report and be able to contribute. Finally,
> selfishly, in the hope that I'll get notified when this gets fixed, and
> I upgrade, and sleep at night again.
>
> - Lee Hambley
>
--
Simon MacMullen
RabbitMQ, Pivotal