[rabbitmq-discuss] Design for global data ingestion
videlalvaro at gmail.com
Wed Nov 13 13:33:51 GMT 2013
I've been trying to setup something like this, more like an
experiment/research to see how far federation could be taken across AWS
availability zones, but I have to admit that I've been sidetracked by
various projects and conferences.
When I was living in China I remember that a company there used to have a
similar problem as you, when sending data to the US.
Considering that you have upstreams at various locations in the world,
could it be the case that you have better TCP bandwidth from China to
another non US location, say HK, Japan, Singapore, or Korea?
If that's the case you could perhaps try to form a different federation
graph that would allow you to achieve some max flow routing. Say publishing
from China to a country that has better bandwidth than from China to the
US, and then from that country to the US, or to a third country and so on.
I know this might sound nice theoretically but I haven't tried it. I throw
this out nonetheless, in case it might lead you to a solution.
On Thu, Nov 7, 2013 at 6:47 PM, Dustin <dustink.ml at gmail.com> wrote:
> Hello All!
> I wanted to shoot this out to see if anyone has had any experience with
> using RabbitMQ for a mass global data ingestion pipeline. A small
> disclaimer, I'm a total RMQ noob :)
> We currently have a fan-in design, where we have a single downstream 2
> node HA cluster in the same data center as our data warehouse. We have
> around 22 upstreams (also 2 node HA clusters) located in datacenters all
> over the world. The configuration is extremely simple. We have a single
> direct exchange, which everything publishes to. Each application uses a
> specified routing key for that application. We end up with queue per
> application (currently around 10). We are running 3.0.0 on the downstream
> cluster (been waiting for a maintenance window to upgrade) and 3.1.5 on the
> This design has held up well, and we are averaging around 20k/sec messages
> a day.
> We have ran into 2 problems which won't allow us to scale any further.
> The first is the max bandwidth for a single TCP connection across the
> globe (specifically between the US and China). The second is we have maxed
> out the CPU for the federation clients on the downstream (SSL is enabled,
> I'm not sure how much CPU overhead that adds).
> For the CPU issue, I figured the newly added federated queues would be a
> perfect solution to the problem. I can setup additional Rabbits on the
> downstream side, setup the federation links, and have everything load
> balance nicely. The only thing it doesn't address is the max bandwidth for
> a single TCP connection. Because of our initial design, we would run into
> max bandwidth problems for each queue.
> Our current objective is to be able to send 20k/sec messages from each
> datacenter for a single application. In China, the most we can do is
> around 2.5k/sec (ends up being around 1.6MB/sec, this is on a good day).
> Because this message load will be from a single application, with the
> current design, it will be tied to a single routing key. So for China, I
> would need around 8 TCP connections to do this.
> For this use case, message order doesn't matter. Does anyone have any
> ideas on how I can setup multiple federation links that will be load
> balanced? Here are some ideas I have, but they all feel hacky.
> 1) On the upstreams, use a consistent hash exchange, with exchange to
> exchange bindings to 8 direct exchanges, which would be federated.
> 2) Run multiple instances of RMQ on the downstream machines, and use
> federated queues. Total number of instances across all machines should be
> greater than 8.
> My apologies in advance if I'm missing something obvious. Please let me
> know if I'm trying to fit a round peg in a square hole. :)
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the rabbitmq-discuss