[rabbitmq-discuss] A few questions about custom rabbit_msg_store_index implementations

Wed Aug 10 17:27:18 BST 2011

Hi Michael,

On Wed, Aug 03, 2011 at 06:43:41PM +0400, Michael Klishin wrote:
> I am looking at implementing my own rabbit_msg_store_index that would use
> Redis (probably with JSON serialization
> of Erlang data types, I need basic queue inspection capabilities and don't
> care much about performance degradation compared to
> raw ETS).
> 
> My first question is, could someone from the rabbitmq team briefly explain
> when each of the functions is used? insert and delete
> are pretty self-explanatory but some others are not. This is what I am
> referring to:
> 
> behaviour_info(callbacks) ->
>     [{new,            1},
>      {recover,        1},
>      {lookup,         2},
>      {insert,         2},
>      {update,         2},
>      {update_fields,  3},
>      {delete,         2},
>      {delete_by_file, 2},
>      {terminate,      1}];

I'm afraid this isn't what you're looking for. A brief explanation of
the architecture is as follows:

Each queue is a process which has its API in amqqueue, and its
implementation in amqqueue_process. amqqueue_process does quite
high-level things though and isn't concerned with the storage of
messages in any way. Thus every queue is parameterised with an
implementation of backing_queue. There is currently one implementation
of backing_queue which is variable_queue (though there used to be
another called invariable_queue which used the old persister). The
purpose of the backing_queue interface is to abstract out all concerns
about storage of messages, memory pressure and so on.

variable_queue itself depends on queue_index and msg_store. queue_index
is there so that when queues need to, they can store to disk ordering
information and queue-msg-specific data (i.e. in _this_ queue, msg B
follows msg A, and in _this_ queue, msg A has been delivered). Note that
because the same msg can end up in multiple different queues (and that
msg will have the same identity in every queue), it's important to
identify which bits of information are specific to a msg in a particular
queue. The storage of that information on disk is the purpose of
queue_index.
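
To make the per-queue part concrete, here is a rough in-memory sketch
of the kind of information queue_index is responsible for. It is not
the real queue_index (which journals to disk); the module name
toy_queue_index, its functions and the {SeqId, MsgId, Delivered} entry
shape are all assumptions made up for illustration.

```erlang
-module(toy_queue_index).
-export([new/0, publish/2, deliver/2, entries/1, demo/0]).

%% Hypothetical per-queue state: {NextSeqId, Entries}, where Entries is
%% a list of {SeqId, MsgId, Delivered}, newest first. The real
%% queue_index stores this kind of data on disk; this toy does not.
new() -> {0, []}.

%% Record that MsgId is the next message in *this* queue.
publish(MsgId, {Next, Entries}) ->
    {Next + 1, [{Next, MsgId, false} | Entries]}.

%% Record that the message at SeqId has been delivered in *this* queue.
deliver(SeqId, {Next, Entries}) ->
    {SeqId, MsgId, _} = lists:keyfind(SeqId, 1, Entries),
    {Next, lists:keyreplace(SeqId, 1, Entries, {SeqId, MsgId, true})}.

%% Entries in queue order (oldest first).
entries({_Next, Entries}) -> lists:reverse(Entries).

%% The same msg id appears in two queues with different per-queue state.
demo() ->
    Q1 = deliver(0, publish(<<"msg-b">>, publish(<<"msg-a">>, new()))),
    Q2 = publish(<<"msg-a">>, new()),
    {entries(Q1), entries(Q2)}.
```

In demo/0, msg-a is marked delivered in the first queue but not in the
second, even though it is the same msg with the same identity.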

msg_store is a node-global process that is concerned with the storage of
msgs themselves on disk. It is, if you like, a very very specialised
form of key-value store where updates are not allowed, clients of the
store must behave in a particular way, and entries are reference counted.
The msg_store knows nothing about queues and cares not about any
ordering issues. The msg_store has to have a couple of indexes which
allow it to keep track of where msgs are. Thus the msg_store_index is
simply there to provide the mapping from msg id to
{file,offset,length,ref_count}. The reason it's abstracted is because in
its normal form (i.e. an ets table), each entry costs RAM, and there is
one entry per message. If you actually need to have queues of such
length that they are bounded by disk space only, then you install
rabbitmq-toke, which replaces this ets mapping with a tokyo-cabinet
mapping which is a bit slower, but the mapping has fixed memory
footprint and is itself disk based. Thus you eliminate what is the last
major per-message RAM cost.
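
As an illustration of what that mapping amounts to, here is a minimal
ets-backed sketch. It is not the index module that ships with the
broker and it does not implement the rabbit_msg_store_index behaviour;
the module name toy_msg_index, the function names, the argument order,
the row layout and the file name used in demo/0 are assumptions made
up for this example.

```erlang
-module(toy_msg_index).
-export([new/0, insert/5, lookup/2, incr_ref/2, decr_ref/2, demo/0]).

%% Hypothetical row layout: {MsgId, File, Offset, Length, RefCount},
%% keyed on MsgId (tuple position 1, the ets default).
-define(REF_COUNT_POS, 5).

new() -> ets:new(toy_msg_index, [set]).

%% A msg id is inserted once, with an initial reference count of 1.
insert(Tab, MsgId, File, Offset, Length) ->
    true = ets:insert_new(Tab, {MsgId, File, Offset, Length, 1}),
    ok.

%% Tells you where the msg lives, not the msg itself.
lookup(Tab, MsgId) ->
    case ets:lookup(Tab, MsgId) of
        [{MsgId, File, Offset, Length, RefCount}] ->
            {File, Offset, Length, RefCount};
        [] ->
            not_found
    end.

incr_ref(Tab, MsgId) ->
    ets:update_counter(Tab, MsgId, {?REF_COUNT_POS, 1}).

%% Drop the entry once nothing refers to the msg any more.
decr_ref(Tab, MsgId) ->
    case ets:update_counter(Tab, MsgId, {?REF_COUNT_POS, -1}) of
        0 -> true = ets:delete(Tab, MsgId), 0;
        N -> N
    end.

demo() ->
    Tab = new(),
    ok = insert(Tab, <<"msg-a">>, "0.rdq", 0, 128),
    2 = incr_ref(Tab, <<"msg-a">>),
    Found = lookup(Tab, <<"msg-a">>),
    1 = decr_ref(Tab, <<"msg-a">>),
    0 = decr_ref(Tab, <<"msg-a">>),
    {Found, lookup(Tab, <<"msg-a">>)}.
```

Every row lives in RAM for as long as the msg is referenced, which is
the per-message cost described above.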

Regardless of the entries in this mapping, they tell you nothing about
which queues the msgs belong to, nor which order they occur in the
queues. This mapping tells you _where_ msgs can be found. It doesn't
give you the msgs themselves. It also can't be safely used on its own:
there are other mappings which tell you information about the state of
the files, and in some cases (e.g. a disk GC going on), files may be
locked as msgs are moved around and compacted. Thus at various points,
the contents of the msg_store_index mapping are wrong and can't be used.

I suspect what you want to do is to implement backing_queue such that it
stores msgs in something like redis. It shouldn't be too hard to do that
for the basic cases, especially if performance isn't a concern.

I trust you've read
http://www.rabbitmq.com/blog/2011/01/20/rabbitmq-backing-stores-databases-and-disks/

Hope that helps,

Matthew