[rabbitmq-discuss] RabbitMQ and batch processing
Laing, Michael
michael.laing at nytimes.com
Tue May 20 13:09:05 BST 2014
> I like the idea of being able to keep a large "buffer" so to speak of
> recently sent messages available for analysis. So, a failed batch job could
> very easily inspect the actual messages that were persisted/sent in an
> effort to debug. That is pretty sweet. Cassandra is very nice for this as
> well, since persistence to C* can be delegated to a consumer in the
> buffer layer or core. Do you simply shovel everything to a particular path
> queue for something to persist to Cassandra? Or is that done inline
> somewhere before publishing to RabbitMQ? I assume that that kind of thing
> is done out-of-band of the client.
>
'cache_push' and 'cache_pull' are currently done in the central (switching)
core in parallel with other operations. They will be run on our autoscaling
edge nodes also as soon as demand warrants. The clients never work directly
with the cache and, in fact, we have used DynamoDB, Riak, etc. in the past
as implementations. We are happiest with Cassandra because it is very fast,
free to use, easy to manage, and scales.
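Because clients never touch the cache directly, the store behind 'cache_push' and 'cache_pull' can be swapped without them noticing. The following is only a rough sketch of that idea, not our actual code: the `CacheBackend` interface, the `InMemoryBackend` stand-in, and the function signatures are all assumptions made for illustration.

```python
from typing import Dict, Optional, Protocol


class CacheBackend(Protocol):
    """Hypothetical storage interface; the name and methods are assumptions."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryBackend:
    """Stand-in for a real store (Cassandra, DynamoDB, Riak, ...)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def cache_push(backend: CacheBackend, key: str, value: bytes) -> None:
    # The service depends only on the interface, so the backend can change.
    backend.put(key, value)


def cache_pull(backend: CacheBackend, key: str) -> Optional[bytes]:
    # Returns None when the key is not cached.
    return backend.get(key)
```

A Cassandra-backed class would implement the same two methods, and nothing outside the service would need to change.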
>
>
>> We also 'journal' all messages with ttl of 30 days. It has worked well
>> for us and we can easily scale up and down. The fast generalized cache
>> makes it possible to support quite interesting apps.
>>
>
> Hmm. I'm not sure I follow. What do you mean by journal?
>
Each message is stored by timestamp like a log. It's slightly more
complicated than that in practice, to avoid hotspots, but the idea is to
have a complete sequential record of all activity for analysis and replay.
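One common way to keep a time-ordered journal from piling all writes onto a single partition is to combine a time bucket with a small shard number. The sketch below illustrates that general technique only; it is not a description of our schema. The hourly bucket, the shard count of 16, and the function name are assumptions, and the TTL constant simply restates the 30 days mentioned above in seconds.

```python
import zlib
from datetime import datetime, timezone
from typing import Tuple

NUM_SHARDS = 16                          # assumed shard count, for illustration
JOURNAL_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days expressed in seconds


def journal_partition(message_id: str, ts: datetime) -> Tuple[str, int]:
    """Return a (time bucket, shard) pair for a journal entry.

    `ts` should be timezone-aware; it is normalized to UTC so that every
    writer computes the same hourly bucket.
    """
    bucket = ts.astimezone(timezone.utc).strftime("%Y%m%d%H")
    # crc32 is stable across processes, unlike the built-in hash() for str.
    shard = zlib.crc32(message_id.encode("utf-8")) % NUM_SHARDS
    return bucket, shard
```

Within each (bucket, shard) partition, entries would be ordered by timestamp. A replay then reads all shards of a bucket and merges them by time.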
>
> I am also curious about some of the things you do with Fabrik... For
> example, is persistence handled by the client (indirectly via the Fabrik
> library) or some other component service that listens to RabbitMQ?
>
The fabrik handles persistence on behalf of the client. We have
parallelized event-driven Python services that do this using the pika and
cassandra-driver modules. Examples and benchmarks, including a version of
our 'rabbit_helpers' framework, will be open-sourced for my Cassandra Day
presentation on 21 June in NYC at our office.
To be clear, each of our Python services is quite small and focused: they
subscribe to one or a few queues and publish to an exchange or two. So
'cache_pull' and 'cache_push', and our other services, are not libraries
that others use; they are services that respond only via AMQP. Hence they
are 'black boxes', with internals hidden, and are easy to test.
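To illustrate why a service that speaks only in messages is easy to test, here is a minimal sketch of a 'cache_pull'-style handler with the transport abstracted away. Everything specific in it is an assumption for illustration: the JSON message format, the `key` and `reply_to` fields, and the `make_cache_pull_handler` and `publish` names. In a real deployment the handler would be wired to an AMQP consumer and publisher; that wiring is not shown here.

```python
import json
from typing import Callable, Dict

# Hypothetical publish callable: (routing_key, body) -> None
Publish = Callable[[str, bytes], None]


def make_cache_pull_handler(store: Dict[str, str],
                            publish: Publish) -> Callable[[bytes], None]:
    """Build a message handler that answers lookups by publishing a reply."""

    def on_message(body: bytes) -> None:
        request = json.loads(body)
        key = request["key"]
        # A missing key yields a null value rather than an error.
        reply = {"key": key, "value": store.get(key)}
        publish(request["reply_to"], json.dumps(reply).encode("utf-8"))

    return on_message
```

A test can then feed a message in and inspect what comes out, without a broker:

```python
sent = []
handler = make_cache_pull_handler({"a": "1"}, lambda rk, b: sent.append((rk, b)))
handler(json.dumps({"key": "a", "reply_to": "replies"}).encode("utf-8"))
```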