[rabbitmq-discuss] possible bug starting rabbit with NODENAME set?

Edwin Fine rabbitmq-discuss_efine at usa.net
Fri May 16 22:36:16 BST 2008


Matthias,

I can also successfully run /usr/sbin/rabbitmq-server from the command line
if I change -sname to -name. I just have to make sure that I change the
actual mnesia dir to match the NODENAME.

But I think that in addition to the rabbitmq-multi script, rabbit_multi.erl
will need some serious tweaking. I think it was written with local
clustering in mind and did not adequately consider the effect of using
fully-qualified domain names with -name.

This is my analysis of the code and behavior. If I am mistaken, I am sure
you will point out where and why :)

Firstly, rabbit_multi.erl hard-codes the node name "rabbit":

    NodeName = if NodeNumber == 0 ->
                       %% For compatibility with running a single node
                       "rabbit";
                  true ->
                       "rabbit_" ++ integer_to_list(NodeNumber)
               end,
    {NodePid, Started} = start_node(NodeName, "0.0.0.0", 5672 + NodeNumber,
                                    RpcTimeout),

and then goes on to cut everything off after the "@" in the node name in
localnode():

    Node = rabbit_misc:localnode(list_to_atom(NodeName)),
    case rpc:call(Node, os, getpid, []) of

Even if NodeName weren't hard-coded to "rabbit", localnode would strip off
everything after the "@" anyway. So if rabbit@myexample.com were running
locally, but started with -name, the rpc call would fail because Erlang
would need the full node name to reach it.
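
To make the mismatch concrete, here is a minimal sketch. The module and
function names are made up for illustration, and short_part/1 only mimics
the stripping as I describe it above; it is not the actual
rabbit_misc:localnode code.

```erlang
-module(nodename_sketch).
-export([short_part/1, same_node/2]).

%% Hypothetical helper: keep only the text before the first "@".
%% A name without "@" is returned unchanged.
short_part(NodeName) when is_list(NodeName) ->
    lists:takewhile(fun (C) -> C =/= $@ end, NodeName).

%% Node names are atoms, so a lookup name reaches a running node
%% only if the two atoms are identical.
same_node(LookupName, RunningName) ->
    list_to_atom(LookupName) =:= list_to_atom(RunningName).
```

With these definitions, short_part("rabbit@myexample.com") gives "rabbit",
and same_node("rabbit", "rabbit@myexample.com") is false, which is the
situation the rpc call ends up in when the node was started with -name.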

Anyway, the rpc fails because rabbit isn't running yet, so rabbit_multi now
tries to start it (which fails for the reasons mentioned above). Just
before that, to add insult to injury, the code overwrites any NODENAME,
NODE_IP_ADDRESS and NODE_PORT values that might already have been set in
the environment. This explains why RabbitMQ times out waiting for Mnesia to
start: it can't find the node. Even if I called the node
'rabbit@something.com' and not 'grendel@something.com', it would not work,
because it would be trying to rpc to 'rabbit', which does not and never
will exist while starting up with -name.

    start_node(NodeName, NodeIPAddress, NodePort, RpcTimeout) ->
        os:putenv("NODENAME", NodeName),
        os:putenv("NODE_IP_ADDRESS", NodeIPAddress),
        os:putenv("NODE_PORT", integer_to_list(NodePort)),
        Node = rabbit_misc:localnode(list_to_atom(NodeName)),
        case rpc:call(Node, os, getpid, []) of
            {badrpc, _} ->
                Port = run_cmd(script_filename()),
                Started = wait_for_rabbit_to_start(Node, RpcTimeout, Port),
                Pid = case rpc:call(Node, os, getpid, []) of
                          {badrpc, _} -> throw(cannot_get_pid);
                          PidS -> list_to_integer(PidS)
                      end,
                {{Node, Pid}, Started};
            PidS ->
                Pid = list_to_integer(PidS),
                throw({node_already_running, Node, Pid})
        end.
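
One possible way to avoid clobbering the environment, purely as a sketch: the
module and the putenv_if_unset/2 helper below are hypothetical and not part
of RabbitMQ. It relies only on os:getenv/1 returning false for an unset
variable and on os:putenv/2.

```erlang
-module(env_sketch).
-export([putenv_if_unset/2]).

%% Hypothetical helper: set Var only when it has no value yet, so a
%% NODENAME exported by the caller survives. Returns the value in effect.
putenv_if_unset(Var, Default) ->
    case os:getenv(Var) of
        false ->
            %% Variable is unset: fall back to the default.
            true = os:putenv(Var, Default),
            Default;
        Existing ->
            %% Variable already set by the caller: leave it alone.
            Existing
    end.
```

Something like putenv_if_unset("NODENAME", NodeName) would then keep a
caller-supplied node name instead of replacing it. Whether that is the right
behavior when starting several nodes is a separate question.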

I respectfully suggest that some more thought needs to go into
rabbit_multi.erl and perhaps the startup/node name architecture.

Regards,
Edwin

On Fri, May 16, 2008 at 3:06 AM, Matthias Radestock <matthias at lshift.net>
wrote:

> Edwin,
>
> Edwin Fine wrote:
>
>> On this topic, does RabbitMQ have to use a short name (-sname) for its
>> node name? I had issues trying to use -name instead of -sname.
>>
>
> changing rabbitmq-server to use -name instead of -sname works fine for me,
> though rabbitmq-multi would need some tweaking to get working again after
> such a change.
>
> What problems did you encounter?
>
>
> Matthias.
>
>