[rabbitmq-discuss] TCP/SSL Listeners stuck, not responding
Mark Steele
msteele at beringmedia.com
Tue Dec 15 16:38:22 GMT 2009
Hi folks,
I'm trying to troubleshoot a problem with our RabbitMQ broker: after
running for a while, the SSL listener stops responding, and I haven't
been able to work out why. The logs show:
=ERROR REPORT==== 15-Dec-2009::09:31:00 ===
Error in process <0.55.0> on node 'rabbit@d1750-1' with exit value:
{{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2},{cpu_sup,measurement_server_loop,1}]}
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19037.10> from 1.2.3.4:44437
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.18940.10> from 1.2.3.4:44433
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.18962.10> from 1.2.3.4:44434
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19156.10> from 1.2.3.4:44442
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19178.10> from 1.2.3.4:44443
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19059.10> from 1.2.3.4:44438
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19081.10> from 1.2.3.4:44439
=INFO REPORT==== 15-Dec-2009::09:31:00 ===
closing TCP connection <0.19245.10> from 1.2.3.4:44446
<snip>
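The `system_limit` reason in the crash report means the Erlang VM hit some system resource limit. The log does not say which one, so treating it as file-descriptor exhaustion is only an assumption worth ruling out. A minimal sketch that compares a process's open descriptors with its soft limit by reading `/proc` on Linux; the function names and the `proc_root` parameter are hypothetical, introduced here for illustration:

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count entries in /proc/<pid>/fd (needs root or the same user)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def soft_nofile_limit(pid, proc_root="/proc"):
    """Return the soft 'Max open files' limit, or None if unlimited/absent."""
    with open(os.path.join(proc_root, str(pid), "limits")) as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


def fd_headroom(pid, proc_root="/proc"):
    """Descriptors still available before the soft limit, or None."""
    limit = soft_nofile_limit(pid, proc_root)
    if limit is None:
        return None
    return limit - count_open_fds(pid, proc_root)
```

Calling `fd_headroom(7943)` against the `beam.smp` process while the listener is stuck would show whether the descriptor count is close to the limit. If there is plenty of headroom, the limit being hit is something else (for example the Erlang port or process limit).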
# cat /etc/rabbitmq/rabbitmq.conf
SERVER_START_ARGS='-rabbit ssl_listeners [{"2.3.4.5",9000}] -rabbit ssl_options [{cacertfile,"/etc/rabbitmq/ca.crt"},{certfile,"/etc/rabbitmq/cert.crt"},{keyfile,"/etc/rabbitmq/cert.key"},{fail_if_no_peer_cert,true}]'
NODE_IP_ADDRESS=127.0.0.1
CONSOLE_LOG=reuse
# netstat -nlp
Proto Recv-Q Send-Q Local Address      Foreign Address    State    PID/Program name
tcp        3      0 2.3.4.5:9000       0.0.0.0:*          LISTEN   7943/beam.smp
tcp        0      0 127.0.0.1:5672     0.0.0.0:*          LISTEN   7943/beam.smp
tcp        0      0 0.0.0.0:54160      0.0.0.0:*          LISTEN   7943/beam.smp
tcp        0      0 0.0.0.0:4369       0.0.0.0:*          LISTEN   11107/epmd
<snip>
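One detail in the `netstat` output may be worth a closer look: the listening socket on port 9000 has a Recv-Q of 3. On Linux, the netstat man page describes this column for listening sockets (kernel 2.6.18 and later) as the current backlog, so a non-zero value there suggests connections are queued and the process has not accepted them. A small sketch for picking out such sockets from `netstat -nlp` text; the function name is hypothetical, and it assumes one socket per line, as `netstat` prints it on a wide terminal:

```python
def backlogged_listeners(netstat_output):
    """Return (local_address, recv_q) for LISTEN sockets with Recv-Q > 0."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        recv_q = int(fields[1])
        if recv_q > 0:
            hits.append((fields[3], recv_q))
    return hits


sample = """\
tcp 3 0 2.3.4.5:9000 0.0.0.0:* LISTEN 7943/beam.smp
tcp 0 0 127.0.0.1:5672 0.0.0.0:* LISTEN 7943/beam.smp
"""
print(backlogged_listeners(sample))  # [('2.3.4.5:9000', 3)]
```

Running this periodically against live `netstat -nlp` output could show whether the backlog on the SSL port starts growing at the moment the listener stops responding.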
Looking at what's going on with strace doesn't seem to help either:
# strace -p 7943
Process 7943 attached - interrupt to quit
select(0, NULL, NULL, NULL, NULL
<no more output, stuck here>
# tcpdump 'tcp src port 9000 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s 0 -vv -i eth1
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C <no output...>
0 packets captured
0 packets received by filter
0 packets dropped by kernel
# tcpdump 'tcp dst port 9000 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s 0 -vv -i eth1
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:06:54.496422 IP (tos 0x0, ttl 50, id 61746, offset 0, flags [DF], proto TCP (6), length 113)
    68.230.240.91.58871 > 208.94.50.28.9000: Flags [P.], cksum 0x4897 (correct), seq 620713031:620713092, ack 967624400, win 114, options [nop,nop,TS val 69409028 ecr 404972440], length 61
<snip, lots of output>
..s7..B..')8.p..>b.O...Vd............Y.3WT....I.&C...Q....:c....g'...$........4.....
................}.s........Z.pp.h.7..D.J..,..s
....c.b6<U...73oB.W...H.E.A.....$P..S..E.]...6!j...[D....I.2......o1.>.
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel
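For readers unfamiliar with the filter used in both captures: the expression `ip[2:2] - ((ip[0]&0xf)<<2) - ((tcp[12]&0xf0)>>2)` is the IP total length minus the IP header length minus the TCP header length, i.e. the TCP payload size, so `!= 0` keeps only packets that carry data. The same arithmetic as a sketch in Python; the function name and the synthetic packet are illustrative, not captured data:

```python
def tcp_payload_length(ip_packet: bytes) -> int:
    """Compute the TCP payload size the same way the tcpdump filter does."""
    ip_header_len = (ip_packet[0] & 0x0F) << 2          # IHL in 32-bit words -> bytes
    total_len = int.from_bytes(ip_packet[2:4], "big")   # ip[2:2]
    tcp = ip_packet[ip_header_len:]
    tcp_header_len = (tcp[12] & 0xF0) >> 2              # data offset in words -> bytes
    return total_len - ip_header_len - tcp_header_len


# Synthetic packet shaped like the one in the capture: IP total length 113,
# a 20-byte IP header, and a 32-byte TCP header (20 bytes plus 12 bytes of
# options for nop,nop,timestamp).
pkt = bytearray(113)
pkt[0] = 0x45                        # IPv4, IHL = 5 words
pkt[2:4] = (113).to_bytes(2, "big")  # total length
pkt[20 + 12] = 0x80                  # TCP data offset = 8 words
print(tcp_payload_length(bytes(pkt)))  # 61
```

The result, 61, matches the `length 61` that tcpdump reports for the captured packet.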
Clients are trying to connect to the server, but no response is being
sent back to them.
However, when I connect to the localhost (non-SSL) interface, it works fine.
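To tell "the listener answers but rejects the handshake" apart from "the listener never answers", a client-side probe with a timeout may help. The following is only a sketch using Python's standard `ssl` module; the function name and return strings are made up for illustration. It does not present a client certificate, so against a server configured with `fail_if_no_peer_cert` a failed handshake is the expected healthy result, while a timeout matches the stuck behaviour described above.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Return 'ok', 'connect-failed', 'handshake-failed' or 'handshake-timeout'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect-failed"
    # Certificate verification is disabled on purpose: the probe only checks
    # whether the server responds at all, not whether it is trustworthy.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(sock):
            return "ok"
    except socket.timeout:
        # The server never replied to the ClientHello within the timeout.
        return "handshake-timeout"
    except (ssl.SSLError, OSError):
        # The server replied, but the handshake was rejected or broken off.
        return "handshake-failed"
    finally:
        sock.close()
```

Run from a cron job or a loop, something like `probe_tls("2.3.4.5", 9000)` could record when the listener stops answering, which might then be lined up with the timestamps in the broker log.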
# erl -version
Erlang (SMP,ASYNC_THREADS,HIPE) (BEAM) emulator version 5.7.4
# rabbitmqctl status
Status of node 'rabbit@d1750-1' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.0"},
                        {ssl,"Erlang/OTP SSL application","3.10.7"},
                        {crypto,"CRYPTO version 1","1.6.3"},
                        {mnesia,"MNESIA CXC 138 12","4.4.12"},
                        {os_mon,"CPO CXC 138 46","2.2.4"},
                        {sasl,"SASL CXC 138 11","2.1.8"},
                        {stdlib,"ERTS CXC 138 10","1.16.4"},
                        {kernel,"ERTS CXC 138 10","2.13.4"}]},
 {nodes,['rabbit@d1750-1']},
 {running_nodes,['rabbit@d1750-1']}]
...done.
# openssl
OpenSSL> version
OpenSSL 0.9.8l 5 Nov 2009
# uname -a
Linux D1750-1 2.6.31-gentoo-r6 #1 SMP Thu Nov 26 16:30:46 EST 2009
i686 Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux
We're running this on Gentoo and have tried various combinations of
compile-time options (without HiPE, without SMP, without kpoll), all
with the same results.
Does anyone have an idea how we could go about troubleshooting this?
The time it stays up appears to be random (sometimes hours, sometimes
minutes). I'm guessing it's SSL-related; otherwise I'd imagine other
users would have caught it already.
When it does crash, I get this on the console:
Erlang has closed
Cheers,
--
Mark Steele
Director of development
Bering Media Inc.