[rabbitmq-discuss] TCP/SSL Listeners stuck, not responding

Mark Steele msteele at beringmedia.com
Tue Dec 15 16:38:22 GMT 2009


Hi folks,

I'm trying to troubleshoot a problem with the SSL listener. After
running for some time it simply stops responding, and I haven't been
able to work out why. The logs show:

=ERROR REPORT==== 15-Dec-2009::09:31:00 ===
Error in process <0.55.0> on node 'rabbit@d1750-1' with exit value:
{{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2},{cpu_sup,measurement_server_loop,1}]}

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19037.10> from 1.2.3.4:44437

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.18940.10> from 1.2.3.4:44433

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.18962.10> from 1.2.3.4:44434

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19156.10> from 1.2.3.4:44442

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19178.10> from 1.2.3.4:44443

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19059.10> from 1.2.3.4:44438

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19081.10> from 1.2.3.4:44439

=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19245.10> from 1.2.3.4:44446
<snip>

# cat /etc/rabbitmq/rabbitmq.conf
SERVER_START_ARGS='-rabbit ssl_listeners [{"2.3.4.5",9000}] -rabbit ssl_options [{cacertfile,"/etc/rabbitmq/ca.crt"},{certfile,"/etc/rabbitmq/cert.crt"},{keyfile,"/etc/rabbitmq/cert.key"},{fail_if_no_peer_cert,true}]'
NODE_IP_ADDRESS=127.0.0.1
CONSOLE_LOG=reuse

# netstat -nlp
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        3      0 2.3.4.5:9000            0.0.0.0:*               LISTEN      7943/beam.smp
tcp        0      0 127.0.0.1:5672          0.0.0.0:*               LISTEN      7943/beam.smp
tcp        0      0 0.0.0.0:54160           0.0.0.0:*               LISTEN      7943/beam.smp
tcp        0      0 0.0.0.0:4369            0.0.0.0:*               LISTEN      11107/epmd
<snip>
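
A note on reading the netstat output: on Linux, Recv-Q for a socket in the LISTEN state reports how many connections are sitting in the accept queue, so the 3 on port 9000 suggests the kernel has completed connections that the application never accepted. A minimal sketch for spotting this automatically, assuming `netstat -nlp`-style output with one socket per line (the function name `backlogged_listeners` is hypothetical, not part of any tool):

```python
def backlogged_listeners(netstat_text):
    """Return (local_address, recv_q) for LISTEN sockets whose Recv-Q is non-zero."""
    flagged = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        try:
            recv_q = int(fields[1])
        except ValueError:
            continue  # header or malformed line
        if recv_q > 0:
            flagged.append((fields[3], recv_q))
    return flagged


# Two lines taken from the output above, unwrapped.
sample = """\
tcp        3      0 2.3.4.5:9000            0.0.0.0:*               LISTEN      7943/beam.smp
tcp        0      0 127.0.0.1:5672          0.0.0.0:*               LISTEN      7943/beam.smp
"""
print(backlogged_listeners(sample))
```

Only the SSL listener is flagged here; the plain listener on 127.0.0.1:5672 has an empty queue.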

Looking at what's going on with strace doesn't seem to help either:

# strace -p 7943
Process 7943 attached - interrupt to quit
select(0, NULL, NULL, NULL, NULL
<no more output, stuck here>

# tcpdump 'tcp src port 9000 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s 0 -vv -i eth1
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C <no output...>
0 packets captured
0 packets received by filter
0 packets dropped by kernel


# tcpdump 'tcp dst port 9000 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s 0 -vv -i eth1
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:06:54.496422 IP (tos 0x0, ttl 50, id 61746, offset 0, flags [DF], proto TCP (6), length 113)
    68.230.240.91.58871 > 208.94.50.28.9000: Flags [P.], cksum 0x4897 (correct), seq 620713031:620713092, ack 967624400, win 114, options [nop,nop,TS val 69409028 ecr 404972440], length 61
<snip, lots of output>
..s7..B..')8.p..>b.O...Vd............Y.3WT....I.&C...Q....:c....g'...$........4.....
................}.s........Z.pp.h.7..D.J..,..s
....c.b6<U...73oB.W...H.E.A.....$P..S..E.]...6!j...[D....I.2......o1.>.
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel

Clients are trying to connect to the server, but no response is being
sent back to them.
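
The same symptom can be checked from a client machine without tcpdump. The sketch below separates "TCP connect fails" from "TCP connects but the TLS handshake never gets an answer", which is what the captures above point to. It is a rough probe under stated assumptions: the function name and result strings are hypothetical, and certificate verification is turned off because it only tests whether the listener answers at all. Since the server is configured with fail_if_no_peer_cert, a healthy listener would most likely report "tls_failed" here (a prompt rejection) rather than "tls_ok".

```python
import socket
import ssl


def probe_tls_listener(host, port, timeout=5.0):
    """Classify how far a connection attempt gets: TCP connect, then TLS handshake."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "tcp_timeout"
    except OSError:
        return "tcp_failed"

    # Liveness check only: no hostname or certificate verification.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        # wrap_socket performs the handshake and inherits the socket timeout.
        with context.wrap_socket(sock):
            return "tls_ok"
    except socket.timeout:
        # Connected, but the peer never answered the ClientHello.
        return "tls_handshake_timeout"
    except (ssl.SSLError, OSError):
        # The peer answered, but the handshake was rejected or broke off.
        return "tls_failed"
    finally:
        sock.close()


# Example call (address and port taken from the config above):
#   probe_tls_listener("2.3.4.5", 9000)
```

A result of "tls_handshake_timeout" would match a listener whose accept queue is filling up while nothing in the application picks the connections up.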

However, when I connect to the localhost (non-SSL) interface, it works fine.

# erl -version
Erlang (SMP,ASYNC_THREADS,HIPE) (BEAM) emulator version 5.7.4

# rabbitmqctl status
Status of node 'rabbit@d1750-1' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.0"},
                        {ssl,"Erlang/OTP SSL application","3.10.7"},
                        {crypto,"CRYPTO version 1","1.6.3"},
                        {mnesia,"MNESIA  CXC 138 12","4.4.12"},
                        {os_mon,"CPO  CXC 138 46","2.2.4"},
                        {sasl,"SASL  CXC 138 11","2.1.8"},
                        {stdlib,"ERTS  CXC 138 10","1.16.4"},
                        {kernel,"ERTS  CXC 138 10","2.13.4"}]},
 {nodes,['rabbit@d1750-1']},
 {running_nodes,['rabbit@d1750-1']}]
...done.

# openssl
OpenSSL> version
OpenSSL 0.9.8l 5 Nov 2009

# uname -a
Linux D1750-1 2.6.31-gentoo-r6 #1 SMP Thu Nov 26 16:30:46 EST 2009 i686 Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux

We're running this on Gentoo and have tried various combinations of
compile-time options (without hipe, without SMP, without kpoll), all
with the same results.

Does anyone have an idea how we can go about troubleshooting this?
The amount of time it stays up appears to be random (sometimes hours,
sometimes minutes). I'm guessing it's SSL-related; otherwise I'd imagine
other users would have caught it already.
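
One avenue that may be worth ruling out, offered as an assumption rather than a diagnosis: the `system_limit` error in the log means some system limit was hit, and open file descriptors are one candidate. A minimal Linux-only sketch that compares a process's open descriptor count with its soft limit by reading /proc (the function name `fd_usage` is hypothetical, and the pid 7943 is the beam.smp pid from the netstat output):

```python
import os


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a process, read from /proc (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units
                value = line.split()[3]
                soft_limit = None if value == "unlimited" else int(value)
                break
    return open_fds, soft_limit


# Example call (needs root or the same user as the target process):
#   fd_usage(7943)
```

If the count sits close to the limit while the listener is stuck, that would support the descriptor theory; if not, the limit being hit is something else.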

When it does crash, I get this on the console:

Erlang has closed

Cheers,

--
Mark Steele
Director of development
Bering Media Inc.



