[rabbitmq-discuss] [help] [beginner] server stops sending messages; publish (in transaction) hangs on commit
Alistair Bayley
alistair at abayley.org
Thu Feb 23 02:47:21 GMT 2012
Hello,
I'm looking for ways to debug a rabbitmq server that has been running
since 2010-02-01. Sometime in the last 2 days it stopped delivering
messages. I've seen this happen before, and restarting fixes it, but
I'd like to get to the bottom of why it stops delivering messages, so
I'd prefer to run tests while it is still in this state.
We're using the python amqplib client. The various versions are:
$ uname -srvmpio
Linux 2.6.32-37-generic-pae #81-Ubuntu SMP Fri Dec 2 22:24:22 UTC 2011
i686 unknown unknown GNU/Linux
$ sudo rabbitmqctl report
Reporting server status on {{2012,2,23},{2,18,37}}
Status of node 'rabbit at rabbitmq1' ...
[{pid,11649},
{running_applications,[{rabbit,"RabbitMQ","2.7.1"},
{os_mon,"CPO CXC 138 46","2.2.4"},
{sasl,"SASL CXC 138 11","2.1.8"},
{mnesia,"MNESIA CXC 138 12","4.4.12"},
{stdlib,"ERTS CXC 138 10","1.16.4"},
{kernel,"ERTS CXC 138 10","2.13.4"}]},
{os,{unix,linux}},
{erlang_version,"Erlang R13B03 (erts-5.7.4) [source] [rq:1]
[async-threads:30] [hipe] [kernel-poll:true]\n"},
{memory,[{total,107054896},
{processes,72060236},
{processes_used,72055268},
{system,34994660},
{atom,736077},
{atom_used,727538},
{binary,15125896},
{code,6063325},
{ets,10643172}]},
{vm_memory_high_watermark,0.3999999992195367},
{vm_memory_limit,102503219}]
...
Python 2.6.5
python-amqplib 0.6.1-1
I've used this python test program:
#----------------------------------------------
"""
To setup for this test you must say (on rabbitmq1):
sudo rabbitmqctl add_user scheduler scheduler
sudo rabbitmqctl add_vhost /test
sudo rabbitmqctl set_permissions -p /test scheduler ".*" ".*" ".*"
"""
from amqplib import client_0_8 as amqp
conn = amqp.Connection(host="rabbitmq1:5672", userid="scheduler",
password="scheduler", virtual_host="/test", insist=False)
chan = conn.channel()
chan.queue_declare("alistair_test_q", durable=True, auto_delete=False)
chan.exchange_declare("alistair_test_ex", type="direct", durable=True,
auto_delete=False)
chan.queue_bind("alistair_test_q", "alistair_test_ex", routing_key="alistair")
msg = amqp.Message("test", delivery_mode=2)
print "tx_select"
chan.tx_select()
print "send message"
chan.basic_publish(msg, "alistair_test_ex", "alistair")
print "tx_commit"
chan.tx_commit()
#----------------------------------------------
It hangs on the chan.tx_commit().
tcpdump shows this (pruned summary, 172.22.2.53 is the client,
172.22.2.50 is the rabbitmq server):
23 0.047038 172.22.2.53 172.22.2.50 AMQP Tx.Select
24 0.047369 172.22.2.50 172.22.2.53 AMQP Tx.Select-Ok
25 0.047607 172.22.2.53 172.22.2.50 AMQP Basic.Publish
26 0.086293 172.22.2.50 172.22.2.53 TCP amqp > 34000 [ACK] Seq=365
Ack=385 Win=6864 Len=0 TSV=951060977 TSER=951060626
27 0.086469 172.22.2.53 172.22.2.50 AMQP Content-Header Content-Body Tx.Commit
28 0.126293 172.22.2.50 172.22.2.53 TCP amqp > 34000 [ACK] Seq=365
Ack=432 Win=6864 Len=0 TSV=951060987 TSER=951060636
You can see the client Tx.Commit (and it gets an ACK) but no
Tx.Commit-Ok comes back from the server. The client hangs here until I
ctrl-C it.
$ sudo rabbitmqctl list_queues -p /test name messages
Listing queues ...
alistair_test_q 0
...done.
A good session on a different broker shows this (again pruned,
172.22.3.2 is client, 172.22.3.3 is rabbitmq):
23 0.051783 172.22.3.2 172.22.3.3 AMQP Tx.Select
24 0.052205 172.22.3.3 172.22.3.2 AMQP Tx.Select-Ok
25 0.052517 172.22.3.2 172.22.3.3 AMQP Basic.Publish
26 0.090882 172.22.3.3 172.22.3.2 TCP amqp > 45215 [ACK] Seq=365
Ack=385 Win=6864 Len=0 TSV=2487670438 TSER=2160225287
27 0.091095 172.22.3.2 172.22.3.3 AMQP Content-Header Content-Body Tx.Commit
28 0.091288 172.22.3.3 172.22.3.2 TCP amqp > 45215 [ACK] Seq=365
Ack=432 Win=6864 Len=0 TSV=2487670438 TSER=2160225297
29 0.097365 172.22.3.3 172.22.3.2 AMQP Tx.Commit-Ok
30 0.097703 172.22.3.2 172.22.3.3 AMQP Tx.Select
31 0.097926 172.22.3.3 172.22.3.2 AMQP Tx.Select-Ok
32 0.098190 172.22.3.2 172.22.3.3 AMQP Basic.Get
33 0.098900 172.22.3.3 172.22.3.2 AMQP Basic.Get-Ok Content-Header Content-Body
34 0.099780 172.22.3.2 172.22.3.3 AMQP Tx.Commit
35 0.100436 172.22.3.3 172.22.3.2 AMQP Tx.Commit-Ok
$ sudo rabbitmqctl list_queues -p /test name messages
Listing queues ...
alistair_test_q 1
...done.
What else can I do to figure out what is wrong? Is there a way to get
more verbose logging without restarting the server? Are there some
other tools that'll help with debugging this?
Thanks,
Alistair
More information about the rabbitmq-discuss
mailing list