[rabbitmq-discuss] Really bizarre startup issue...

Robert Nickel rnickel at scea.com
Thu May 6 18:05:55 BST 2010


On 2010.05.06 05:56:56 +0100, Matthias Radestock wrote:
> Robert,
>
> Robert Nickel wrote:
>> On sdcloudsh01, contents of /etc/rabbitmq files:
>>   rabbitmq.conf:
>>     NODENAME=regsvc@sdcloudsh01
>>   rabbitmq.config:
>>     [
>>       {rabbit, []}
>>     ].
>>   rabbitmq_cluster.config:
>>     [ 'regsvc@sdcloudsh01','regsvc@sdcloudsh02' ].
>>
>> When starting the rabbitmq server using /sbin/service rabbitmq-server start,
>> the service fails
>
> Does rabbit start up fine if you a) remove all the above configuration  
> files, and b) delete the database directory (usually  
> /var/lib/rabbitmq/mnesia)?

Cleaned out the files and ran the test:

    [root@sdcloudsh01 ~]# ls /etc/rabbitmq/
    [root@sdcloudsh01 ~]# ls /var/lib/rabbitmq/
    [root@sdcloudsh01 ~]# /etc/init.d/rabbitmq-server start
    Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_log, _err
    rabbitmq-server.
    [root@sdcloudsh01 ~]# ls -l /var/log/rabbitmq/*
    -rw-r--r-- 1 root root 34 May  6 09:40 /var/log/rabbitmq/startup_err
    -rw-r--r-- 1 root root 58 May  6 09:40 /var/log/rabbitmq/startup_log
    [root@sdcloudsh01 ~]# cat /var/log/rabbitmq/*
    Error: {node_start_failed,normal}
    Starting all nodes...
    Starting node rabbit@sdcloudsh01...

Same results, unfortunately.

>> the following outputs are in /var/log/rabbitmq/startup_err and log:
>>
>>   _log:
>>     Starting all nodes...
>>     Starting node regsvc@sdcloudsh01...
>>   _err:
>>     Error: {node_start_failed,normal}
>
> Are there any other non-empty log files in /var/log/rabbitmq?

None.  See listing above.

>> After a bunch of troubleshooting, I noticed that if I strace the above
>> command, everything works fine:
>>
>>   strace -f /sbin/service rabbitmq-server start
>>
>> Terminating the strace leaves the rabbit server running happily.
>
> That would suggest some sort of race / timing issue. Strange.

Agreed.  My initial thought is that the parent process is terminating before
the fork somehow.  I don't have a good way to validate this, as I have zero
experience with Erlang and how to debug it.
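
One way the timing theory could be checked is to keep polling the node for a
while after the init script reports FAILED, to tell a slow start apart from a
node that really exited.  A minimal sketch, assuming a POSIX shell; the retry
helper and its arguments are made up for illustration and are not part of
RabbitMQ or its init script:

```shell
# retry ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: run CMD up to ATTEMPTS times, sleeping DELAY seconds
# after each failure. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Run right after the failed start, for example:

    retry 10 1 rabbitmqctl status

Assuming rabbitmqctl status exits non-zero while the node is down, a success
after a few attempts would point to a slow start rather than a dead node.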

FWIW:

    [root@sdcloudsh01 ~]# uname -rvmi
    2.6.18-53.el5PAE #1 SMP Mon Nov 12 02:55:09 EST 2007 i686 i386

The rabbitmq user has no .bash* files other than .bash_history.

Thank you,
  --Robert


