[rabbitmq-discuss] Feature Req / Bug list

Thu Oct 3 22:03:16 BST 2013

Hey everyone,

I've recently been doing a deployment of a 5 node rabbit cluster, and found
some rough edges I thought I should share. I realize many of these are
feature reqs, but I'm hoping that I just haven't discovered the proper
configuration to deal with some of these issues, or have misunderstood
Rabbit's behaviour. If not, hopefully they can become feature req items
that'll make a dev schedule at some point.

All items below were discovered while deploying 3.1.5 over the past few
days. Hosts in question have 24 sandy bridge HT cores, 64GB of RAM, XFS
filesystem, running on CentOS 6. Cluster is 5 nodes, with a default HA
policy on all queues of exact/3/automatic-sync.

HA / Clustering:

- expected queues to be distributed evenly among cluster machines, instead
got all queues on first 3 machines in the cluster, nothing on the last 2.
- expected message reads from a mirror machine for a queue to do the read
i/o locally, so as to spread out workload, but it appears to always go to
the host where the queue was created.
- this led to a single node with ~35k active open filehandles, and 4 nodes
with ~90. not an optimum distribution of read workload.
- expected that if system a queue was created on is permanently removed
(shut down and "rabbitmqctl forget_cluster_node hostname"'d), automatic
sync would ensure there's the right number of copies replicated, but
instead it just left every single queue under replicated.
- when a new policy is applied that defines specific replication nodes, or
a number of copies using 'exact, and auto-sync is set, sometimes it just
syncs the first replica and leaves any others unsynced and calls it job
done. This is bad.
- had to add a new global HA policy and delete the existing one before
rabbit fixed my queue replication.
- Attempted to create small per-queue policies to redistribute messages and
then delete the per-queue policies, but this often leads to a inconsistent
cluster state where queues continued to show as being part of a policy that
was already deleted, attempt to resync, and get stuck, unable to complete
or switch back to the global default policy.
- sometimes the cluster refuses to accept any more policy commands. Have to
fully shut down and restart the cluster to clear this condition.
- sometimes policies applied to empty and inactive queues don't get
correctly applied, and the queue hangs on "resyncing / 100%". this makes no
sense, given the queue is empty, and requires a full cluster restart to
clear.
- would like to see a tool to redistribute queues amongst available cluster
machines according to HA policy. Ideally something that happens
automatically on queue creation, cluster membership and policy changes, but
would take something manual I could run out of cron.
- I've managed to get the cluster into an inconsistent state a /lot/ using
the HA features, so it feels like they need more automated stress testing
and bulletproofing.

Persistent message storage:

- it appears as if messages are put into very small batch files on the
filesystem (1-20 MB)
- this causes the filesystem to thrash if your IO isn't good at random IO
(SATA disks) and you have lots of persistent messages (>200k messages
500kB-1MB in size) that don't fit in RAM.
- this caused CentOS 6 kernel to kill erlang after stalling the XFS
filesystem for > 120s.
- if a node crashes, Rabbit seems to rescan the entire on-disk datastore
before continuing, instead of using some sort of checkpointing or
journaling system to quickly recover from a crash.
- all of above should be solvable by using an existing append-only
datastore like eLevelDB or Bitcask.
- we solved for now by using SSDs, but this bumps up the cost of each RMQ
node, and doesn't solve the node crash recovery problem, just speeds up the
process somewhat.

Web API:
- API seems to block when cluster is busy, even for informational GETs, so
you can't determine what's going on with the cluster.
- Some API operations seem to block until they complete (like putting a new
policy), while others return immediately even though they're definitely not
completed yet (like deleting a policy). It's not documented which have
which behaviour, or why they don't just all block until op is completed.

Hopefully you guys can educate me on what I'm doing wrong in some of these
scenarios, or how to mitigate some of these issues. Any issue that requires
taking down and restarting the cluster to fix is especially troubling.

Thanks,
Graeme
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20131003/da60c375/attachment.htm>