<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, May 19, 2014 at 1:10 PM, Laing, Michael <span dir="ltr"><<a href="mailto:michael.laing@nytimes.com" target="_blank">michael.laing@nytimes.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div class=""><div><br>

> It's taking longer than I thought :) Bits and pieces are in the pipeline - our RabbitMQ / Python / Cassandra benchmarks will be out there by Cassandra Day NYC, here at the NYTimes, 21 Jun. A big piece of our rabbit_helpers Python framework is included.
<div class="">
<div></div></div></div></div></div></blockquote><div><br></div><div>I totally appreciate that. For about two files of Ruby code, it took me 2-3 weeks to get it open sourced. :) I appreciate the effort though! Looking forward to its eventual announcement, whenever that may be.</div>

> We let them pass pretty much whatever size message they want. If it is over a configurable size, currently 10k, we try gzipping it; if it is still over another configurable amount, currently 100k, we push it to S3 and place a signed URL in the metadata.

Yeah, we already have one of our users gzipping before sending their messages. It actually helped considerably with memory and disk. I really like the idea of doing this on the RabbitMQ side. We would have to support Java and Ruby for this, but I think that's not entirely terrible... we could probably figure out a way to do this reasonably with JRuby. We'll see.
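
On our side I picture the publish path looking roughly like this Python sketch (the 10k/100k thresholds are yours; the bucket, key scheme, and boto3 as the S3 client are my assumptions):

    import gzip
    import uuid

    import boto3  # assumption: S3 offload via the AWS SDK

    GZIP_THRESHOLD = 10 * 1024   # "configurable size, currently 10k"
    S3_THRESHOLD = 100 * 1024    # "another configurable amount, currently 100k"
    BUCKET = "fabrik-large-messages"  # hypothetical bucket

    s3 = boto3.client("s3")

    def prepare_outgoing(body, metadata):
        """Gzip oversized bodies; offload huge ones to S3, leaving a signed URL."""
        if len(body) > GZIP_THRESHOLD:
            body = gzip.compress(body)
            metadata["content_encoding"] = "gzip"
        if len(body) > S3_THRESHOLD:
            key = str(uuid.uuid4())
            s3.put_object(Bucket=BUCKET, Key=key, Body=body)
            metadata["s3_url"] = s3.generate_presigned_url(
                "get_object",
                Params={"Bucket": BUCKET, "Key": key},
                ExpiresIn=3600)
            body = b""  # the payload now travels out of band
        return body, metadata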

> The proxy clusters buy us a lot: forcing non-persistence, buffering the core clusters, making copies for message replay, allowing us to redirect the message flows among core clusters, etc. And they are relatively local to our internal customers. We don't run any code on them other than shovels and federation. Each internal customer has its own vhost.

Yeah, this part is pretty key to me as well. Having a buffer in front of the core RabbitMQ clusters is essential to things like seamless upgrades, cluster migration, scaling, configuration changes, etc. I want to be able to bring up an entirely new cluster with all of the new changes, begin shoveling data from the old cluster to the new cluster, and then flip the proxy over to the new cluster as the shovel empties the old cluster's queues.
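
With the new dynamic shovels that cutover should be drivable entirely through the management HTTP API. A sketch of the drain step (host, credentials, and the shovel naming are made up):

    import json

    import requests  # assumption: driving the RabbitMQ management HTTP API

    MGMT = "http://proxy.example.com:15672"  # hypothetical management endpoint
    AUTH = ("admin", "secret")               # hypothetical credentials

    def drain_to_new_cluster(vhost, queue, old_uri, new_uri):
        """Create a dynamic shovel that moves a queue's backlog from the old
        cluster to the new one, deleting itself once the backlog is empty."""
        value = {
            "src-uri": old_uri,
            "src-queue": queue,
            "dest-uri": new_uri,
            "dest-queue": queue,
            "delete-after": "queue-length",  # stop after the initial backlog
        }
        # Note: the vhost must be URL-encoded if it contains special characters.
        resp = requests.put(
            "{0}/api/parameters/shovel/{1}/drain-{2}".format(MGMT, vhost, queue),
            auth=AUTH,
            headers={"content-type": "application/json"},
            data=json.dumps({"value": value}))
        resp.raise_for_status()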

We already have the pattern where every internal customer has its own vhost. If two services want to communicate, they are given access to an appropriate shared vhost -- as well as having their own. Basically, multiple pub-sub channels that any two services can communicate on. This has worked out well, but right now they all reside on the same clusters. That is not great, because it means one user can cause an outage for every service using our RabbitMQ offering, which makes for a pretty bad user experience.
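
Giving two services a shared vhost is just a management-API permissions call per service; a sketch with made-up names:

    import json

    import requests

    MGMT = "http://rabbit.example.com:15672"  # hypothetical management endpoint
    AUTH = ("admin", "secret")                # hypothetical credentials

    def grant_access(vhost, user):
        """Give a service full configure/write/read rights on a vhost."""
        resp = requests.put(
            "{0}/api/permissions/{1}/{2}".format(MGMT, vhost, user),
            auth=AUTH,
            headers={"content-type": "application/json"},
            data=json.dumps({"configure": ".*", "write": ".*", "read": ".*"}))
        resp.raise_for_status()

    # Two services that want to talk get a shared vhost, plus their own:
    grant_access("svc-a.svc-b", "svc-a")
    grant_access("svc-a.svc-b", "svc-b")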

> Our customers don't actually pay attention to the fact that they go through a proxy - it is all 'fabrik' to them.

That's exactly where I want to head.

> Our proxies are 2-way.
>
> The publish-to exchange in the proxy has a queue bound to it which is shoveled to the core cluster. The queue will buffer the messages until they are shoveled. The core does whatever. A proxy consume-from exchange is federated to the core as its upstream. The core publishes whatever to that exchange. Consumers create queues and bind to the proxy consume-from exchange, implicitly signaling the upstream to send matching messages. This is one way of configuring the plumbing.

Fabrik is responsible for the creation of this plumbing? I.e., users interact with Fabrik like a standard AMQP library, but when they, for example, create the exchange, Fabrik takes over and creates the publish-to exchange, proxy queue, and shovel configuration... then when they bind a queue to the exchange, Fabrik creates the consume-from exchange and the federation configuration on the proxy and core?
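
If so, I imagine the federation half is a couple of parameter/policy calls against the proxy's management API, something like this sketch (upstream name, exchange pattern, and URIs are made up):

    import json

    import requests

    MGMT = "http://proxy.example.com:15672"  # hypothetical management endpoint
    AUTH = ("admin", "secret")               # hypothetical credentials

    def wire_consume_from(vhost, core_uri):
        """Make the core cluster the federation upstream of the proxy's
        consume-from exchange."""
        # Define the upstream (parameters are wrapped in a "value" object).
        resp = requests.put(
            "{0}/api/parameters/federation-upstream/{1}/core".format(MGMT, vhost),
            auth=AUTH,
            headers={"content-type": "application/json"},
            data=json.dumps({"value": {"uri": core_uri}}))
        resp.raise_for_status()
        # Federate every exchange named consume-from* via a policy.
        resp = requests.put(
            "{0}/api/policies/{1}/federate-consume".format(MGMT, vhost),
            auth=AUTH,
            headers={"content-type": "application/json"},
            data=json.dumps({"pattern": "^consume-from",
                             "definition": {"federation-upstream": "core"},
                             "apply-to": "exchanges"}))
        resp.raise_for_status()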

Ahhh... and you don't have to maintain state, because all of the state is kept in RabbitMQ. Each new client that comes up attempts to create the appropriate queues/exchanges, but these become no-ops because they already exist.
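
That's the nice property of AMQP declares. In pika, for example, rerunning something like this on every client start is harmless as long as the arguments match (all names hypothetical):

    import pika  # assumption: any AMQP client behaves the same here

    conn = pika.BlockingConnection(pika.ConnectionParameters(
        host="proxy.example.com", virtual_host="customer-a"))  # hypothetical
    ch = conn.channel()

    # Redeclaring an existing exchange/queue with identical arguments is a
    # no-op, so every client can run this unconditionally at startup.
    ch.exchange_declare(exchange="consume-from", exchange_type="topic",
                        durable=True)
    ch.queue_declare(queue="customer-a.events", durable=True)
    ch.queue_bind(queue="customer-a.events", exchange="consume-from",
                  routing_key="events.#")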

> Event-driven, no MxN, fast, reliable, flexible, cheap. We have two forms of persistence: S3 for big messages, and Cassandra for memory. So most of the fabrik focuses on the routing of small messages. Cassandra lets us 'replay' messages selectively: show me the messages I missed, give me the messages sent yesterday during a 5 minute period, give me the latest 10 messages on this topic, etc. And it lets us gather event messages for near real-time and longitudinal analysis.

So do you persist every message sent via Fabrik? I think we could do this for some things, but we would have to have TTLs for others. We could even have our persistence store honor TTLs that are assigned during queue creation, I think. I wonder if Cassandra has something built in that would make this easy for us.
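
Actually, it looks like Cassandra's built-in per-write TTL would do exactly this -- written data expires on its own after the given number of seconds. A sketch with the DataStax Python driver (keyspace and schema are made up):

    from cassandra.cluster import Cluster  # DataStax Python driver (assumed)

    session = Cluster(["cassandra.example.com"]).connect("fabrik")  # hypothetical

    # USING TTL expires the written columns after ttl_seconds; a TTL of 0
    # means "keep forever", which covers the persist-everything case too.
    insert = session.prepare(
        "INSERT INTO messages (topic, sent_at, metadata, body) "
        "VALUES (?, ?, ?, ?) USING TTL ?")

    def persist(topic, sent_at, metadata, body, ttl_seconds=0):
        session.execute(insert, (topic, sent_at, metadata, body, ttl_seconds))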

> That's what we do. Actually we store the json-like stuff in the headers property as a 'metadata' item. The body is opaque to the fabrik - we treat it as binary.

I like this. So you have a consistent message format which, when simplified, resembles:

    { metadata: {}, body: "" }
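
In pika terms I'd expect publishing to look something like this (exchange and metadata fields are made up; the body stays opaque bytes):

    import pika

    conn = pika.BlockingConnection(
        pika.ConnectionParameters(host="proxy.example.com"))  # hypothetical
    ch = conn.channel()

    ch.basic_publish(
        exchange="publish-to",
        routing_key="events.example",
        body=b"\x89PNG...",  # opaque binary payload, never inspected
        properties=pika.BasicProperties(
            headers={"metadata": {"source": "svc-a", "ts": 1400500000}}))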

Thanks again.