[rabbitmq-discuss] Rabbitmq boot failure with "tables_not_present"

Thu Jan 17 00:38:43 GMT 2013

Hi Jerry,

From: rabbitmq-discuss-bounces at lists.rabbitmq.com On Behalf Of Jerry Kuch
Sent: Wednesday, January 16, 2013 4:02 PM
To: Discussions about RabbitMQ
Subject: Re: [rabbitmq-discuss] Rabbitmq boot failure with "tables_not_present"

On Wed, Jan 16, 2013 at 11:54 AM, Zhao, Shanyu <shanyu.zhao at intel.com> wrote:

The relevant part of the log is shown below. The problem is that these log messages repeat every 7-8 seconds and can continue for as long as 80 minutes before Rabbit finally starts up correctly. During this time, any connection to the RabbitMQ cluster gets a disconnected exception.

Any idea on what might have caused this problem?
=INFO REPORT==== 16-Jan-2013::14:11:37 ===
Error description:
   {case_clause,{error,tables_not_present}}

Log files (may contain more information):
   /var/log/rabbitmq/rabbit@ip-10-0-2-97.log
   /var/log/rabbitmq/rabbit@ip-10-0-2-97-sasl.log

Stack trace:
   [{rabbit_mnesia,discover_cluster,1},
    {rabbit_mnesia,init_from_config,0},
    {rabbit_mnesia,init,0},
    {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
    {rabbit,run_boot_step,1},
    {rabbit,'-start/2-lc$^0/1-0-',1},
    {rabbit,start,2},
    {application_master,start_it_old,4}]

=INFO REPORT==== 16-Jan-2013::14:11:38 ===
    application: rabbit
    exited: {bad_return,
                {{rabbit,start,[normal,[]]},
                 {'EXIT',
                     {rabbit,failure_during_boot,
                         {case_clause,{error,tables_not_present}}}}}}
    type: temporary
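
To put a number on how long the boot failure lasted, the timestamps in the report headers above can be extracted from the log. The following is a minimal sketch, assuming the log uses the "=INFO REPORT==== 16-Jan-2013::14:11:37 ===" header format shown in the excerpt; the function name failure_window and the default marker are hypothetical choices for illustration, not part of RabbitMQ.

```python
import re
from datetime import datetime

# Matches report headers such as "=INFO REPORT==== 16-Jan-2013::14:11:37 ==="
REPORT_RE = re.compile(
    r"^=\w+ REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ==="
)


def failure_window(log_text, marker="tables_not_present"):
    """Return (first, last, count) over reports whose body contains marker.

    Returns None when no report mentions the marker. Each report is
    counted once, even if the marker appears on several of its lines.
    """
    hits = []
    current = None      # timestamp of the report currently being read
    counted = False     # whether the current report was already recorded
    for line in log_text.splitlines():
        match = REPORT_RE.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%d-%b-%Y::%H:%M:%S")
            counted = False
        elif marker in line and current is not None and not counted:
            hits.append(current)
            counted = True
    if not hits:
        return None
    return hits[0], hits[-1], len(hits)
```

Reading the node's log file into a string and passing it to failure_window gives the first and last time the error was reported; subtracting the two yields the length of the outage.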

You mention that you sometimes see this after a redeploy. Depending on how you redeployed, did the nodes cluster successfully in the first place? The error means that some of the tables in Mnesia, the Erlang distributed database that Rabbit relies on to maintain broker metadata, were not found. That suggests some prior state or configuration was lost during your redeploy process.

I think that while the error logs are being generated, the cluster may not have formed successfully. As part of the deployment scripts, I delete everything in /var/lib/rabbitmq/mnesia to recover from scenarios in which the cluster cannot be formed. Here is the relevant part of the deployment scripts:

sudo("bash -c 'echo XXXXXXXXXXXXXXXX > /var/lib/rabbitmq/.erlang.cookie'")
sudo("chown rabbitmq /var/lib/rabbitmq/.erlang.cookie")
sudo("chmod 600 /var/lib/rabbitmq/.erlang.cookie")
sudo("rm -fr /var/lib/rabbitmq/mnesia")

What I want to achieve after redeployment is to erase the previous state completely and let the cluster start from a clean state. That's why I erased the /mnesia folder (is there a better way to do that?). The problem is that sometimes the error messages show up for a few minutes and then everything works fine, but other times I saw the error message being logged for 80 minutes before the cluster worked correctly. Do you have any suggestions?
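
For comparison with the rm -fr step above, a node's state can also be wiped through rabbitmqctl, using its stop_app, reset and start_app commands. The following is a minimal sketch only: the helper names clean_reset_commands and reset_node are hypothetical, it assumes the Erlang node is running and reachable by rabbitmqctl, and it has not been checked against the tables_not_present failure described here.

```python
def clean_reset_commands():
    """Return the rabbitmqctl commands for resetting one node (assumed approach)."""
    return [
        # Stop the RabbitMQ application but keep the Erlang node running.
        "rabbitmqctl stop_app",
        # Return the node to its original state and remove it from any cluster.
        "rabbitmqctl reset",
        # Start the application again with empty state.
        "rabbitmqctl start_app",
    ]


def reset_node(run):
    """Run the reset sequence with the given command runner.

    `run` is any callable that executes one shell command string,
    for example the sudo function used in the deployment script.
    """
    for command in clean_reset_commands():
        run(command)
```

In the deployment script this would be called as reset_node(sudo) on each node. Passing the runner in as an argument keeps the sequence easy to dry-run, for example with reset_node(print).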

Thanks,
Shanyu
