[rabbitmq-discuss] getting started, broker runs; can't get status

Doug Barth dougbarth at gmail.com
Tue Feb 3 17:48:30 GMT 2009


Hey all,

This is Doug at Interactive Mediums. I work with Dave, so you may see
us alternate back and forth responding to this thread as we work
through this issue. Thanks a lot for the help so far.

On Feb 3, 9:45 am, Dmitriy Samovskiy
<dmitriy.samovs... at cohesiveft.com> wrote:
> I suspect that your response from net_adm:names() will be {error,timeout} and it will
> appear not immediately but after some time (within 30 seconds). Could you please confirm.

We don't get exactly that error. Instead we are getting
{error,address}.

  [root at db1 home]# erl -sname foo -cookie coo
  Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-threads:0] [hipe] [kernel-poll:false]

  Eshell V5.6.3  (abort with ^G)
  (foo at db1)1> net_adm:names().

  ... long pause ...

  {error,address}
  (foo at db1)2>
  BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
         (v)ersion (k)ill (D)b-tables (d)istribution


> After you exit from erl, please do grep -r /usr/lib/erlang/erts-5.6.5/bin /var/log/*
> (please replace 5.6.5 with emulator version that is displayed when you start erl).
> Anything of interest in the output? I am particularly looking for something related to
> auth, or access being denied. Interesting lines are likely to come from auth.log or its
> equivalent on your system, and there should be many similar lines (unless your syslog
> suppresses dup lines - mine didn't). If nothing shows up, maybe try similar greps -
> erlang, erts.

grep didn't turn up anything under /var/log. I grepped for erlang,
erts, and the absolute path to the Erlang bin directory.

> And finally, if you have or can get strace on the target system, could you please run this:
>
> % strace -e trace=write -o erl_strace.log erl -sname -cookie coo

I'm assuming that there should have been a "foo" after -sname in the
command above. I added that when running the next commands.

> and do the same net_adm:names() in erlang shell. When it times out, exit erlang shell and
> take a look at erl_strace.log. I expect that at the end of that file you will see many
> lines like this:
>
> --- SIGCHLD (Child exited) @ 0 (0) ---
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> --- SIGCHLD (Child exited) @ 0 (0) ---
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> --- SIGCHLD (Child exited) @ 0 (0) ---
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> --- SIGCHLD (Child exited) @ 0 (0) ---
> --- SIGPIPE (Broken pipe) @ 0 (0) ---

This is all we're getting:

  --- SIGCHLD (Child exited) @ 0 (0) ---
  --- SIGCHLD (Child exited) @ 0 (0) ---
  write(6, "\0", 1)                       = 1
  --- SIGINT (Interrupt) @ 0 (0) ---
  write(6, "I", 1)                        = 1
  --- SIGINT (Interrupt) @ 0 (0) ---
  write(6, "I", 1)                        = 1
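
For reference, a small Python sketch for tallying the signal lines in
erl_strace.log, which makes it easy to check whether the SIGPIPE lines
Dmitriy expected show up at all. The helper name count_signals is
hypothetical, and the regex assumes strace prints signals as lines
beginning with "--- SIGNAME", as in the output above.

```python
import re
from collections import Counter

# Matches strace signal lines such as "--- SIGCHLD (Child exited) @ 0 (0) ---".
SIGNAL_RE = re.compile(r"^\s*--- (SIG[A-Z0-9]+)\b")


def count_signals(lines):
    """Return a Counter mapping signal name to how often it was delivered."""
    matches = (SIGNAL_RE.match(line) for line in lines)
    return Counter(m.group(1) for m in matches if m)
```

Usage would be something like count_signals(open("erl_strace.log")).
Feeding it the seven lines above gives two SIGCHLD, two SIGINT, and no
SIGPIPE.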

> Feel free to do strace with -e trace=all or -e verbose=all.

We ran this test using strace -e trace=all on the machine that is
failing as well as on a different machine that is working fine. The
machine that is failing is running RHEL 5.2 (64-bit). The box that
works is running CentOS 5 (32-bit).

From those logs, the two differences that jumped out are:
  1) Calls to mmap on the 64-bit machine have the MAP_32BIT flag set. I'm
assuming this is expected, since our new machine is running a 64-bit OS.
      -mmap2(NULL, 10489856, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x68b4000
      +mmap(NULL, 10489856, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x471a4000

  2) Erlang seems to drop into a loop waiting on a futex. On the 64-bit
machine, the FUTEX_WAIT calls fail with "EAGAIN (Resource temporarily
unavailable)". On the 32-bit machine, those same calls always succeed,
though they repeat the same number of times.

    +futex(0x7fff9f5fdab0, FUTEX_WAKE, 1)    = 1
    +futex(0x7fff9f5fdad8, FUTEX_WAKE, 1)    = 1
    +futex(0x7fff9f5fdadc, FUTEX_WAIT, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
    +futex(0x7fff9f5fdab0, FUTEX_WAKE, 1)    = 0
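
A side note on reading these: according to the futex(2) man page,
FUTEX_WAIT returns EAGAIN when the value at the address no longer
matches the expected value at the time of the call, so these failures
may be ordinary lock contention rather than the cause of the hang. To
compare the two machines by counts instead of by eye, here is a rough
Python sketch that tallies futex calls by operation and outcome. The
name tally_futex is hypothetical, and the regex assumes each futex call
sits on a single, unwrapped line of the strace log.

```python
import re
from collections import Counter

# Captures the futex operation, the return value, and an errno name if present.
FUTEX_RE = re.compile(
    r"futex\(0x[0-9a-f]+, (FUTEX_\w+)[^)]*\)\s*=\s*(-?\d+)(?:\s+(E[A-Z]+))?"
)


def tally_futex(lines):
    """Count futex calls keyed by (operation, errno name or "ok")."""
    counts = Counter()
    for line in lines:
        m = FUTEX_RE.search(line)
        if m:
            op, _ret, errno_name = m.groups()
            counts[(op, errno_name or "ok")] += 1
    return counts
```

Running it over the strace log from each machine and comparing the two
Counters should show whether the EAGAIN results are the only difference
in futex behavior.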

> What Linux distro are you using? uname -a? Any particular details how you installed the
> OS? If it's safe to share, maybe output of "rpm -qa" or "dpkg -l"?

[textme at db1 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.2 (Tikanga)
[textme at db1 ~]$ uname -a
Linux db1.interactivemediums.com 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

> And like Matthias said before, access to box or image of the box will be most helpful.

We'll get in touch with you guys off list to get you access so you can
poke around. Expect an email from Dave.

--
Doug Barth



