[rabbitmq-discuss] hanging on message sending with py-amqplib

tsuraan tsuraan at gmail.com
Fri Oct 1 18:56:27 BST 2010


> Ok, so you should be seeing a fair amount of disk activity.

Sort of; the disks on the affected machines seem to resist doing any
work (i.e. they're crap).  I think I am maxing them out, though, which
I guess is the problem here.

> Are the consumers part of the same clients that are doing the sending or
> are they different clients?

The consumers also do sending on their own.  It turns out that it's a
single consumer that is the source of the problem.  That consumer gets
messages via a subscription, does some pretty IO heavy work
(parsing/indexing file data), and then publishes new messages and acks
the ones it was working on.  All the queue work is done
transactionally, so the acks and the publishes happen at once.  All my
consumers are also producers, but none of them smash the disk as badly
as that one, which is probably why the problem goes away when I shut
off that consumer.

> At what rate are the consumers consuming messages? Rabbit is optimised
> to get rid of messages quickly, thus if a queue is quite long, it can
> drive consumers as fast as possible, but will ignore publishes until the
> queue has become empty... however, that should really only express
> itself internally within Rabbit - the client shouldn't actually see any
> impact on publishing, unless it was also doing something synchronous
> from time to time like tx.commit. Are you finding it blocks when it hits
> tx.commit?

Yeah, every tx.commit blocks while that one consumer is running.
Can the priority of publishes/commits be elevated when they're part of
a transaction?  If I understand you correctly, starvation seems really
likely for transactional producer/consumer processes.  Would setting a
really low prefetch for this consumer make rabbit send it fewer
unacknowledged messages, and thus provide some degree of throttling?
I don't think I'm setting a prefetch limit on that channel at all right
now, which might explain why it can keep working even when rabbit is
unable to handle any tx.commits or publishes.
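
The throttling I have in mind would just be something like the sketch
below on the busy consumer's channel (assuming amqplib's basic_qos; the
prefetch_count value is only an example):

    from amqplib import client_0_8 as amqp

    conn = amqp.Connection(host="localhost", userid="guest", password="guest")
    chan = conn.channel()

    # ask rabbit to stop delivering once one message is unacked on this
    # channel; this needs to be issued before basic_consume
    chan.basic_qos(prefetch_size=0, prefetch_count=1, a_global=False)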

> Seeing as you're using a client to shovel messages from one broker to
> another, I would suggest experimenting with using the shovel - I'd
> configure it in the new broker (where it's easiest to install) and just
> have it drag all messages over from the old broker. If that works, then
> it does point to something going wrong in the python client library.

To get two brokers running, I'd need to give them different node names
and configure the old one to use a non-standard port, right?  Is there
anything else I'd need to do to get two rabbits running on the same
machine?  Also, does shovel run in transactional mode to guarantee that
no messages are lost?

> Socket flushing will block if the broker has hit its high memory
> watermarks. This is because we now use TCP backpressure to block any
> clients which are sending us messages. If you have the sender and the
> consumer on the same connection, this will certainly affect you, and it
> is possible that memory use will be worse (more fragmented) if you have
> a consumer and publisher at the same time rather than just a publisher,
> so you might just be hitting the limits more often. It could be that the
> socket flush doesn't correctly return even when the socket becomes
> unblocked.

I'm guessing that maybe it wasn't a permanent block; it was just down
to how unhappy the machines' disks are with running rabbit and my
indexing at the same time.  I never saw this under bug21673 (last
checkout in April, though); did the way publishes get starved by sends
change since then?

> Brilliant timing ;) Hmm, check disk access on the broker - if
> transactions are occurring then there should be a fair amount of disk
> activity. Probably a dumb question, but you have done a tx.select on the
> publishing channel first?

Yeah, tx.select is the first thing I do on all my channels.
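
i.e. the publishing side is set up along these lines (just a sketch,
with placeholder names), and it's the tx_commit call that stalls:

    from amqplib import client_0_8 as amqp

    conn = amqp.Connection(host="localhost", userid="guest", password="guest")
    chan = conn.channel()
    chan.tx_select()                    # first thing on every channel

    chan.basic_publish(amqp.Message("payload", delivery_mode=2),
                       exchange="work", routing_key="to-index")
    chan.tx_commit()                    # this is the call that blocks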

