[rabbitmq-discuss] hanging on message sending with py-amqplib

Sat Oct 2 18:53:09 BST 2010

> Why is your water mark set so low? I thought you said you had 2GB of memory
> in the machine, which with the default setting of the mark at 0.4 would be
> about 819Mb.

The machine runs quite a few processes.  My understanding of the high
water mark is that the erlang process running rabbit will use at most
2*(watermark * total_ram), so a watermark of 0.1 will result in rabbit
consuming at most 20% of the machine's RAM, correct?  The default
value of 0.4 would let rabbit consume 80% of RAM, which isn't so good
for the other stuff running on the machine.  I know the ideal is that
rabbit gets its own machine, but that isn't an option for us; we sell
individual computers.
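To make the arithmetic above concrete, here is a small sketch for a 2GB machine. The factor of two is the poster's own understanding of how far the Erlang process can go past the watermark, not a confirmed RabbitMQ rule. It is treated as an assumption in the code, and the function names are hypothetical.

```python
# Sketch of the watermark arithmetic for a 2GB machine.
# ASSUMPTION: the 2x factor is the poster's understanding, not a documented limit.
TOTAL_RAM_MB = 2 * 1024  # "2GB of memory in the machine"


def watermark_mb(fraction: float, total_mb: float = TOTAL_RAM_MB) -> float:
    """Memory level at which the broker raises its low-memory alarm."""
    return fraction * total_mb


def assumed_ceiling_mb(fraction: float, total_mb: float = TOTAL_RAM_MB) -> float:
    """Poster's assumed worst case: twice the watermark."""
    return 2 * watermark_mb(fraction, total_mb)


for fraction in (0.1, 0.4):
    share = assumed_ceiling_mb(fraction) / TOTAL_RAM_MB
    print(f"{fraction}: alarm at {watermark_mb(fraction):.1f} MB, "
          f"assumed ceiling {assumed_ceiling_mb(fraction):.1f} MB ({share:.0%} of RAM)")
```

At 0.1 the alarm fires at about 205 MB, which matches the "200Mb" mentioned in the reply. At the default 0.4 it fires at about 819 MB.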

> Toke would probably help since you have one queue with nearly 600k messages.
> But 200Mb is really very little to give to a rabbit that is supposed to
> handle the message volume you have. So I'd start by increasing that.

We're not too worried about having really high rabbit performance;
rabbit's ability to handle hundreds of thousands of messages per
second is awesome, but the rest of our system can't handle more than a
few thousand messages per minute, so rabbit isn't a bottleneck.  I had
hoped that setting a low watermark would just make rabbit keep more
messages on disk and fewer in RAM, and it would just have to feed from
its disk queues more often than it would with a higher watermark.

> A prefetch of 1000 is quite high. I'd suggest lowering that.

That process gets messages from rabbit, does indexing on the
associated data (not stored in rabbit) and does a checkpoint per
minute on a postgres database, a transactional search index, and on
rabbit.  If it acknowledges messages from rabbit before they are
committed, then they can be lost, which is bad.  Checkpoints on the
search index are somewhat expensive; doing them once a minute isn't
bad, but more frequent checkpoints mean busier disks, which means an
unhappy machine.  The number of messages we can process per checkpoint
interval is effectively limited by the prefetch limit, and going under
a thousand means our indexer is idle until the next checkpoint.
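The following sketch shows why the prefetch limit caps throughput when acks are deferred to a once-a-minute checkpoint. `CheckpointingConsumer` and its injected `ack` callable are hypothetical. They stand in for a real channel so the logic runs without a broker. With py-amqplib the limit would presumably be set with something like `channel.basic_qos(prefetch_size=0, prefetch_count=1000, a_global=False)`, and the ack would be `channel.basic_ack(tag, multiple=True)`. The simulation also assumes the queue never runs dry.

```python
from typing import Callable, List


class CheckpointingConsumer:
    """Hypothetical sketch: defer acks until a checkpoint, then ack them in one go.

    `ack` stands in for something like channel.basic_ack(tag, multiple=True);
    it is injected so the window logic can run here without a broker.
    """

    def __init__(self, prefetch_count: int, ack: Callable[[int], None]) -> None:
        self.prefetch_count = prefetch_count
        self.ack = ack
        self.pending: List[int] = []  # processed but not yet acknowledged

    def can_receive(self) -> bool:
        # The broker stops delivering once prefetch_count messages are unacked.
        return len(self.pending) < self.prefetch_count

    def handle(self, delivery_tag: int) -> None:
        if not self.can_receive():
            raise RuntimeError("broker would not deliver past the prefetch limit")
        # ... index the associated data here (omitted) ...
        self.pending.append(delivery_tag)

    def checkpoint(self) -> int:
        # Commit postgres and the search index first (omitted), then ack.
        if not self.pending:
            return 0
        # Delivery tags grow per channel, so acking the newest tag with
        # multiple=True also covers every earlier pending message.
        self.ack(self.pending[-1])
        done = len(self.pending)
        self.pending.clear()
        return done


# Simulate three one-minute checkpoint intervals with an always-full queue.
acked: List[int] = []
consumer = CheckpointingConsumer(prefetch_count=1000, ack=acked.append)
tag = 0
per_interval: List[int] = []
for _minute in range(3):
    while consumer.can_receive():  # deliveries stop once the window is full
        tag += 1
        consumer.handle(tag)
    per_interval.append(consumer.checkpoint())

print(per_interval)  # [1000, 1000, 1000]
print(acked)         # [1000, 2000, 3000]
```

With a 60-second checkpoint, the consumer can never process more than `prefetch_count` messages per minute. Lowering the prefetch lowers that ceiling in direct proportion.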

> Blocking happens at the *connection* level. A client connection that hasn't
> done any publishing will not be blocked.

Ok, so if I have a few channels on a connection, and one of the
channels is only getting messages, acking messages, and committing
while another channel is publishing, then the connection will be
blocked on a low memory alert.  If I move the publishing channel and
the consuming channel to different connections, then the consuming
channel will never be blocked?  I can do that easily enough, and it
might just fix things.
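The following toy model encodes the rule exactly as stated in the quoted reply: on a memory alarm, only connections that have published are blocked. It is not RabbitMQ's implementation. `ToyConnection` and `memory_alarm` are hypothetical names. With py-amqplib the split would presumably mean two separate `amqp.Connection(...)` objects, each opening its own `channel()`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyConnection:
    """Toy model, not RabbitMQ code: tracks whether any channel on it has published."""
    name: str
    has_published: bool = False
    blocked: bool = False

    def publish(self) -> None:
        if self.blocked:
            raise RuntimeError(f"{self.name}: publish would hang while blocked")
        self.has_published = True


def memory_alarm(connections: List[ToyConnection]) -> None:
    # Rule as stated in the reply: only connections that have published get blocked.
    for conn in connections:
        if conn.has_published:
            conn.blocked = True


# Layout 1: publishing and consuming channels share one connection.
shared = ToyConnection("shared")
shared.publish()
memory_alarm([shared])

# Layout 2: publisher and consumer each get their own connection.
publisher_conn = ToyConnection("publisher")
consumer_conn = ToyConnection("consumer")
publisher_conn.publish()
memory_alarm([publisher_conn, consumer_conn])

print(shared.blocked, publisher_conn.blocked, consumer_conn.blocked)  # True True False
```

Under this rule, the consumer's acks and commits in the second layout travel on a connection that is never blocked. In the first layout, they share the blocked connection with the publisher.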