[rabbitmq-discuss] Active/Active: shutdown of one service brings down the cluster

Vadim Chekan kot.begemot at gmail.com
Thu Feb 16 03:37:00 GMT 2012


Hi Jerry,

My use case is a typical pub/sub. There is a farm of applications which
broadcast their state to each other periodically. Each app creates its own
queue and binds it to a well-known broadcast (fanout) exchange.
It is typical for pub/sub to use transient queues.
Besides pub/sub broadcast, applications use messaging for critical use
cases, such as quick cache invalidation, which is why HA is configured.

> Note that you can't straightforwardly redeclare a queue that had been on
> a node that's gone down.  The cluster's metadata will still know about it
> and prevent you from redeclaring it.

The above is the only reason I still declare transient queues as HA. I
don't want a failed node to cause errors in applications.

Unless you mean that the above applies to persistent queues only?
Otherwise I do not see how a transient queue can be used in a clustered
environment. If a node goes down and there is no way to re-declare a queue
with the same name, it very likely breaks the application tier.

To summarize,
  critical use cases -> HA cluster (AKA active/active)
  pub/sub use case -> transient queues
  don't want failed node to cause lost queue -> forced to declare transient
HA queue?
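
For illustration, the two options weighed in this thread (a random string
in the queue name versus a transient HA queue) can be sketched in Python.
This is only a sketch under assumptions: the function names are
hypothetical, the option keys follow the AMQP 0-9-1 queue.declare fields
as common Python clients spell them, and "x-ha-policy" is assumed to be the
queue argument that requests mirroring on the 2.x brokers of this period.
No broker connection is made; the code only builds the name and options.

```python
import uuid


def unique_queue_name(prefix: str) -> str:
    """Return a per-instance queue name: the prefix plus a random suffix."""
    # A random suffix means a restarted application never collides with a
    # name the cluster metadata still remembers from a downed node.
    return f"{prefix}.{uuid.uuid4().hex}"


def queue_declare_options(mirrored: bool) -> dict:
    """Build queue.declare options for a transient pub/sub queue."""
    # Assumption: {"x-ha-policy": "all"} asks the broker to mirror the queue.
    arguments = {"x-ha-policy": "all"} if mirrored else {}
    return {
        "durable": False,     # transient: does not survive a broker restart
        "exclusive": True,    # tied to the connection that declared it
        "auto_delete": True,  # removed once its last consumer cancels
        "arguments": arguments,
    }


# Hypothetical prefix; each application instance would pick its own.
name = unique_queue_name("app-state")
options = queue_declare_options(mirrored=False)
print(name, options)
```

With a client library, the resulting name and options would be passed to
its queue declare call, and the queue then bound to the fanout exchange.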

Thanks,
Vadim.

On Wed, Feb 15, 2012 at 2:09 PM, Jerry Kuch <jerryk at vmware.com> wrote:

> Hi, Vadim.
>
> I apologize if I'm misunderstanding you...  I'm not entirely sure why you'd
> want a *transient* queue to be HA.  Unless you've developed a mistaken
> understanding that reading from a queue when you're connected to a
> particular cluster node, N, requires a mirror of that queue on the node N:
> it, in fact, does not, and Rabbit will get messages from the queue
> internally and deliver them to your consumer on whichever node it's
> connected to.
>
> So if I'm reading you correctly and your scenario is that you want to
> declare transient queues and access them from any cluster node, then you
> don't have any extra work to do.  Just declare that transient queue,
> period, and access it freely.  You don't have to worry about where it is,
> and you'd use HA only if you wanted the queue to remain available if the
> node it lived on went down.
>
> Take a look at:  http://www.rabbitmq.com/clustering.html
>
> It summarizes what data lives where in a cluster.  In particular, that
> "All data/state required for the operation of a RabbitMQ broker is
> replicated across all nodes, for reliability and scaling, with full ACID
> properties. An exception to this are message queues, which by default
> reside on the node that created them, though *they are visible and
> reachable from all nodes*."
>
> Make sense?
>
> Best regards,
> Jerry
>
>
>
>
> ----- Original Message -----
> From: "Vadim Chekan" <kot.begemot at gmail.com>
> To: "Jerry Kuch" <jerryk at vmware.com>
> Cc: rabbitmq-discuss at lists.rabbitmq.com
> Sent: Wednesday, February 15, 2012 12:11:45 PM
> Subject: Re: [rabbitmq-discuss] Active/Active: shutdown of one service
> brings down the cluster
>
> Hi Jerry,
>
> So is there a better way to declare transient queues than declaring them
> as HA?
> The only alternative I can see is adding a random string to the queue
> name. Which way is preferred?
>
> Vadim.
>
>
> On Tue, Feb 14, 2012 at 12:45 PM, Jerry Kuch < jerryk at vmware.com > wrote:
>
>
> Hi, Vadim:
>
> A client doesn't need to be connected directly to the node on which
> a queue and its attendant Erlang process reside. If your load balancer
> sends you to any live node in the cluster you can consume from the queue
> of your choice, as long as it's still alive.
>
> Note that you can't straightforwardly redeclare a queue that had been on
> a node that's gone down. The cluster's metadata will still know about it
> and prevent you from redeclaring it. This is intentional, to avoid the
> confusion that would result if you succeeded at the redeclare, a new
> queue of the same name and properties came into existence on another node,
> and then the original, downed node came back up in the cluster...
>
> Best regards,
> Jerry
>
>
>
> ----- Original Message -----
> From: "Vadim Chekan" < kot.begemot at gmail.com >
> To: "Simon MacMullen" < simon at rabbitmq.com >,
> ghanna at verticalsearchworks.com
> Cc: rabbitmq-discuss at lists.rabbitmq.com
> Sent: Tuesday, February 14, 2012 12:40:05 PM
> Subject: Re: [rabbitmq-discuss] Active/Active: shutdown of one service
> brings down the cluster
>
>
> Hi Simon,
>
> Thanks for looking into the logs. Since we fixed the channel leak in our
> application we have not experienced any more problems.
>
> Regarding transient queues in HA: I am just not sure how the system would
> behave when a non-HA queue is declared in a clustered environment. The
> documentation describes in great detail what happens to mirrored queues,
> but I can't find anything about non-HA queues in an HA cluster. The queue
> will be created on a single server, and the application should be ready to
> re-declare the queue in case of failover. So far so good. But how does it
> work with a load balancer? When a request is made against a server which
> does not have a given queue, will the cluster "know" where the queue is and
> proxy the request to the proper server?
>
> Thanks,
> Vadim.
>
>
> On Mon, Feb 13, 2012 at 10:09 AM, Simon MacMullen < simon at rabbitmq.com >
> wrote:
>
>
>
> On 10/02/12 00:20, Vadim Chekan wrote:
>
>
> I think we nailed down a problem. We had a channel leak in our
> application. With ~50 connections we had >90 channels per connection and
> growing. This definitely correlates with high CPU usage.
>
> What I still do not understand is whether it triggered Rabbit into an
> unstable state or whether it was something else. Maybe increasing latencies
> in message handling caused cluster members to flip neighbor aliveness
> status back and forth? Just speculating here: could timeouts because of
> high load cause network fragmentation, where every node temporarily does
> not see its neighbors, becomes a master, then sees a neighbor, freaks out,
> etc.?
>
> That's plausible, but I don't think that's what's happening (there's
> nothing about network partitioning in the logs).
>
>
>
>
> I've attached logs from all 3 cluster members. They are polluted with
> load balancer "ping".
>
> Thanks. I've had a poke at this but nothing is leaping out at me yet. I'll
> keep at it though.
>
> One thing that's a bit odd: you seem to be creating HA / transient /
> autodelete / exclusive queues. So although they're "HA", they will vanish
> if any of the following happens:
>
> * The entire cluster goes down (transient) or
> * All consumers for a queue cancel (autodelete) or
> * The connection that created them closes (exclusive)
>
> Is this intentional? It seems like an odd use of HA.
>
> Cheers, Simon
>
>
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>
>
>
> --
> From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is
> explicitly specified
>
>


