[rabbitmq-discuss] [help] [beginner] server stops sending messages; publish (in transaction) hangs on commit

Alistair Bayley alistair at abayley.org
Thu Feb 23 02:47:21 GMT 2012


I'm looking for ways to debug a rabbitmq server that has been running
since 2010-02-01. Sometime in the last 2 days it stopped delivering
messages. I've seen this happen before, and restarting fixes it, but
I'd like to get to the bottom of why it stops delivering messages, so
I'd prefer to run tests while it is still in this state.

We're using the python amqplib client. The various versions are:

$ uname -srvmpio
Linux 2.6.32-37-generic-pae #81-Ubuntu SMP Fri Dec 2 22:24:22 UTC 2011 i686 unknown unknown GNU/Linux

$ sudo rabbitmqctl report
Reporting server status on {{2012,2,23},{2,18,37}}

Status of node 'rabbit@rabbitmq1' ...
                        {os_mon,"CPO  CXC 138 46","2.2.4"},
                        {sasl,"SASL  CXC 138 11","2.1.8"},
                        {mnesia,"MNESIA  CXC 138 12","4.4.12"},
                        {stdlib,"ERTS  CXC 138 10","1.16.4"},
                        {kernel,"ERTS  CXC 138 10","2.13.4"}]},
 {erlang_version,"Erlang R13B03 (erts-5.7.4) [source] [rq:1] [async-threads:30] [hipe] [kernel-poll:true]\n"},

Python 2.6.5
python-amqplib 0.6.1-1

I've used this python test program:

# To set up for this test you must run (on rabbitmq1):
#   sudo rabbitmqctl add_user scheduler scheduler
#   sudo rabbitmqctl add_vhost /test
#   sudo rabbitmqctl set_permissions -p /test scheduler ".*" ".*" ".*"
from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="rabbitmq1:5672", userid="scheduler",
    password="scheduler", virtual_host="/test", insist=False)
chan = conn.channel()
chan.queue_declare("alistair_test_q", durable=True, auto_delete=False)
chan.exchange_declare("alistair_test_ex", type="direct", durable=True,
    auto_delete=False)
chan.queue_bind("alistair_test_q", "alistair_test_ex", routing_key="alistair")

# Publish one persistent message inside a transaction.
msg = amqp.Message("test", delivery_mode=2)
print "tx_select"
chan.tx_select()
print "send message"
chan.basic_publish(msg, "alistair_test_ex", "alistair")
print "tx_commit"
chan.tx_commit()

It hangs on the chan.tx_commit().
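So that a test run does not block forever, the blocking call can be wrapped in a watchdog. This is only a minimal sketch using the standard library's signal module: call_with_timeout and CallTimeout are hypothetical names (not part of amqplib), and it assumes a Unix host and that the call is made from the main thread.

```python
import signal


class CallTimeout(Exception):
    """Raised when the wrapped call does not return in time (hypothetical helper)."""


def call_with_timeout(func, seconds):
    """Run func() and raise CallTimeout if it is still blocked after `seconds`."""
    def on_alarm(signum, frame):
        raise CallTimeout("no reply after %s seconds" % seconds)

    # Install our handler, remembering the previous one so it can be restored.
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending timer
        signal.signal(signal.SIGALRM, previous)
```

In the test program this would be used as call_with_timeout(chan.tx_commit, 10), so a missing Tx.Commit-Ok shows up as an exception instead of a hang.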

tcpdump shows this (pruned summary; columns are packet number, time in seconds, protocol, info):

23	0.047038	AMQP	Tx.Select
24	0.047369	AMQP	Tx.Select-Ok
25	0.047607	AMQP	Basic.Publish
26	0.086293	TCP	amqp > 34000 [ACK] Seq=365 Ack=385 Win=6864 Len=0 TSV=951060977 TSER=951060626
27	0.086469	AMQP	Content-Header Content-Body Tx.Commit
28	0.126293	TCP	amqp > 34000 [ACK] Seq=365 Ack=432 Win=6864 Len=0 TSV=951060987 TSER=951060636

You can see the client Tx.Commit (and it gets an ACK) but no
Tx.Commit-Ok comes back from the server. The client hangs here until I
ctrl-C it.
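The same check can be done mechanically on a pruned summary like the ones in this message. The following is a rough sketch, not a general AMQP decoder: unanswered and SYNC_METHODS are hypothetical names, the column layout (packet number, time, protocol, info) is assumed to match the dumps shown here, and only the synchronous methods that appear in these traces are tracked.

```python
# AMQP methods seen in these traces that expect a synchronous reply.
SYNC_METHODS = ("Tx.Select", "Tx.Commit", "Basic.Get")


def unanswered(summary):
    """Return the synchronous methods in `summary` that never got a reply."""
    pending = []
    for line in summary.splitlines():
        fields = line.split(None, 3)  # packet number, time, protocol, info
        if len(fields) < 4 or fields[2] != "AMQP":
            continue  # skip TCP ACKs, wrapped lines and blanks
        for token in fields[3].split():
            if token.endswith("-Ok") or token == "Basic.Get-Empty":
                # A reply: clear the oldest matching request, if any.
                request = token.rsplit("-", 1)[0]
                if request in pending:
                    pending.remove(request)
            elif token in SYNC_METHODS:
                pending.append(token)
    return pending


hung = """23\t0.047038\tAMQP\tTx.Select
24\t0.047369\tAMQP\tTx.Select-Ok
25\t0.047607\tAMQP\tBasic.Publish
27\t0.086469\tAMQP\tContent-Header Content-Body Tx.Commit"""
assert unanswered(hung) == ["Tx.Commit"]
```

Fed the AMQP lines from a session where every request is answered, the function returns an empty list.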

$ sudo rabbitmqctl list_queues -p /test name messages
Listing queues ...
alistair_test_q 0

A good session on a different broker shows this (again a pruned summary; columns are packet number, time in seconds, protocol, info):

23	0.051783	AMQP	Tx.Select
24	0.052205	AMQP	Tx.Select-Ok
25	0.052517	AMQP	Basic.Publish
26	0.090882	TCP	amqp > 45215 [ACK] Seq=365 Ack=385 Win=6864 Len=0 TSV=2487670438 TSER=2160225287
27	0.091095	AMQP	Content-Header Content-Body Tx.Commit
28	0.091288	TCP	amqp > 45215 [ACK] Seq=365 Ack=432 Win=6864 Len=0 TSV=2487670438 TSER=2160225297
29	0.097365	AMQP	Tx.Commit-Ok
30	0.097703	AMQP	Tx.Select
31	0.097926	AMQP	Tx.Select-Ok
32	0.098190	AMQP	Basic.Get
33	0.098900	AMQP	Basic.Get-Ok Content-Header Content-Body
34	0.099780	AMQP	Tx.Commit
35	0.100436	AMQP	Tx.Commit-Ok

$ sudo rabbitmqctl list_queues -p /test name messages
Listing queues ...
alistair_test_q 1

What else can I do to figure out what is wrong? Is there a way to get
more verbose logging without restarting the server? Are there some
other tools that'll help with debugging this?

