[rabbitmq-discuss] Active/active HA setup

Fri Sep 3 12:11:29 BST 2010

Cool.

So, I would suggest going ahead with this.  I think we have explored
the main issues.  Thank you very much for asking about it on the list!

Do please tell us how you get on...

alexis

On Fri, Sep 3, 2010 at 11:15 AM,  <jiri at krutil.com> wrote:
> Alexis
>
> Yes, precisely. One request queue bound to one request exchange per broker
> plus one response exchange per broker, all durable.
>
> Responses go via the response exchange to client-specific exclusive
> auto-delete queues according to message reply-to attribute.
>
> Sorry for not being clear about that.
>
> Cheers
> Jiri
>
>
>> Jiri
>>
>> Ahh.  So maybe I misunderstood something.  Is it the case that there
>> is exactly one 'request' queue on each broker?
>>
>> alexis
>>
>>
>> On Fri, Sep 3, 2010 at 10:33 AM,  <jiri at krutil.com> wrote:
>>>
>>> Alexis
>>>
>>> I don't think that exclusive queues are a problem. Our clients use
>>> auto-delete server-named exclusive queues for receiving responses, so
>>> every time a client re-connects, it must re-create and re-bind its
>>> exclusive queue(s) anyway, even with a single broker.
>>>
>>> The common exchange and queue where requests are sent are pre-declared as
>>> durable on both brokers.
>>>
>>> So I really don't see any resources that require migration.
>>>
>>> Regards
>>> Jiri
>>>
>>>
>>>> Jiri
>>>>
>>>> It makes some sense, but you *will* need to migrate resources, or
>>>> recreate them, to continue working on the secondary.  You will not
>>>> need to migrate messages, but there is other broker state that you
>>>> will need: for example, a consumer's exclusive queue on the primary
>>>> will need to exist on the secondary, with the correct bindings,
>>>> connection, etc.
>>>>
>>>> So my question is: do you plan to create that queue (etc) on the
>>>> secondary after the primary fails?  If "yes" then a potential issue is
>>>> managing the failover window size and (possible, but less likely)
>>>> resource limits on the secondary.  If "no" then you will need to
>>>> 'pre build' the spare resources somehow.
>>>>
>>>> These are not necessarily difficult issues to solve in any particular
>>>> instance; however, I am highlighting them in response to your request
>>>> for feedback.  These issues do become deeper when you try to generalise
>>>> to multiple different failure scenarios, of course.
>>>>
>>>> alexis
>>>>
>>>>
>>>> On Fri, Sep 3, 2010 at 8:40 AM, <jiri at krutil.com> wrote:
>>>>
>>>>> Alexis
>>>>>
>>>>> Our plan is to have two brokers with the same setup, both being
>>>>> actively used at the same time, but each broker serving a different
>>>>> set of clients.
>>>>>
>>>>> Our back-end service will be connected to and process requests from
>>>>> both brokers at the same time.
>>>>>
>>>>> When one broker fails, the clients will lose their connection and will
>>>>> have to reconnect, ending up on the other broker. Messages that were
>>>>> on the dead broker will be lost.
>>>>>
>>>>> So we don't need migration of resources between brokers in case of
>>>>> failure; we only need the clients to move their connections to the
>>>>> other broker.
>>>>>
>>>>> Hope that makes sense...
>>>>>
>>>>> Jiri
>>>>>
>>>>>
>>>>>
>>>>>> Jiri
>>>>>>
>>>>>> Cool.  So yes messages will then only arrive out of order in the case
>>>>>> where some arrive from the secondary before 'delayed' messages from
>>>>>> the failed primary; and then, for reordering them, it suffices to know
>>>>>> which broker they came from.  (In the absence of failure, TCP should
>>>>>> take care of reordering).
>>>>>>
>>>>>> I think the issues will be:
>>>>>>
>>>>>> 1. Deciding when to stop listening to a primary.  Given consumers
>>>>>> don't care about message loss, I would suggest "as soon as consumers
>>>>>> are aware of primary failure, then they should ignore further messages
>>>>>> from the primary"
>>>>>>
>>>>>> 2. Failover time.  AIUI you want to minimise this by having a copy of
>>>>>> the whole queue/exchange/binding set-up on both brokers.  But how
>>>>>> exactly do you plan to do this?
>>>>>>
>>>>>> alexis
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 3, 2010 at 8:12 AM,  <jiri at krutil.com> wrote:
>>>>>>
>>>>>>> Alexis
>>>>>>>
>>>>>>> The answer is no - a client can send requests to only one broker at
>>>>>>> any given moment. The client connects via a load balancer to one of
>>>>>>> the brokers and stays connected all the time. The client does not even
>>>>>>> know that there are two brokers (it only sees one IP address).
>>>>>>>
>>>>>>> I think requests may be delivered out of order only if a client fails
>>>>>>> over to another broker. Then messages sent to one broker can get
>>>>>>> mixed up with messages sent to the other.
>>>>>>>
>>>>>>> My concern was: are there any other issues with this kind of setup
>>>>>>> that I might have missed? Does anyone have experience with this?
>>>>>>>
>>>>>>> Thanks a lot for your help
>>>>>>> Jiri
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Jiri
>>>>>>>>
>>>>>>>> You say that "Some clients send requests to one broker, some to the
>>>>>>>> other".
>>>>>>>>
>>>>>>>> Does this mean that one client publisher can send messages
>>>>>>>> (requests) to both brokers, in such a way that a pair of messages may
>>>>>>>> arrive out of order if one is sent to each broker?
>>>>>>>>
>>>>>>>> If the answer is no, then I think my answer stands, because causal
>>>>>>>> order will be preserved even if messages are lost.  That is: messages
>>>>>>>> that arrive successfully will not be out of order with each other.
>>>>>>>>
>>>>>>>> If the answer is yes, then I am not sure how you can recover global
>>>>>>>> ordering without imposing it at the publisher using sequence numbers
>>>>>>>> at the app level.
>>>>>>>>
>>>>>>>> Does this make sense?
>>>>>>>>
>>>>>>>> alexis
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Sep 2, 2010 at 9:46 PM, Jiri Krutil <jiri at krutil.com> wrote:
>>>>>>>>
>>>>>>>>> Alexis
>>>>>>>>>
>>>>>>>>> Sorry, I probably didn't express myself well.
>>>>>>>>>
>>>>>>>>> We don't plan a primary and secondary broker, but a pair of brokers
>>>>>>>>> that are both active at the same time. A load balancer divides client
>>>>>>>>> connections between these brokers. A request queue with the same name
>>>>>>>>> exists on both brokers, but with different contents. Some clients send
>>>>>>>>> requests to one broker, some to the other. Our back-end server listens
>>>>>>>>> to both queues, processes requests and sends each response to an
>>>>>>>>> exclusive client queue on the broker from where the request came.
>>>>>>>>>
>>>>>>>>> Ideally this would be transparent to the clients, because the brokers
>>>>>>>>> would be hidden by a virtual IP address. Of course it can't be
>>>>>>>>> transparent to the back-end server, which needs to talk to both
>>>>>>>>> brokers at the same time.
>>>>>>>>>
>>>>>>>>> So (a) is correct, but (b) not.
>>>>>>>>>
>>>>>>>>> Hope that makes it a bit clearer...
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> Jiri
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Jiri
>>>>>>>>>>
>>>>>>>>>> That answered my questions.  Now, as I understood your example:
>>>>>>>>>>
>>>>>>>>>> a. you don't mind messages being lost
>>>>>>>>>> *and*
>>>>>>>>>> b. you don't use the secondary until after the primary has failed.
>>>>>>>>>>
>>>>>>>>>> Note that if consumption is completely 'fire and forget' then it is
>>>>>>>>>> possible that a message from the primary may *arrive* after a message
>>>>>>>>>> from the secondary.  But this can happen whether you use sequence
>>>>>>>>>> numbers or not.
>>>>>>>>>>
>>>>>>>>>> So if the primary broker fails, why not just forget all undelivered
>>>>>>>>>> messages?  Consumers will know that any message consumed from the
>>>>>>>>>> secondary must be later in *all* orderings than any message consumed
>>>>>>>>>> from the primary.  So, additional sequence numbering is not necessary.
>>>>>>>>>>
>>>>>>>>>> alexis
>>>>>>>>>>
>>>>>>>>>>
>
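
The rule suggested in the quoted discussion (once a consumer knows a
broker has failed, it ignores any further messages from that broker) can
be sketched roughly as below.  This is a minimal illustration in plain
Python, not client-library code: the event tuples, the broker labels "A"
and "B", and the FailoverFilter name are all assumptions.  It relies on
the consumer knowing which broker each message came from, and on message
loss being acceptable.

```python
class FailoverFilter:
    """Rejects messages from any broker already known to have failed."""

    def __init__(self):
        self.failed = set()

    def mark_failed(self, broker):
        self.failed.add(broker)

    def accept(self, broker):
        return broker not in self.failed


def deliver(events):
    """Replay (kind, broker, body) events and return the bodies kept."""
    flt = FailoverFilter()
    delivered = []
    for kind, broker, body in events:
        if kind == "fail":
            # The consumer has just noticed that this broker is gone.
            flt.mark_failed(broker)
        elif flt.accept(broker):
            delivered.append(body)
        # Otherwise: a delayed message from a failed broker, dropped.
    return delivered


events = [
    ("msg", "A", "m1"),
    ("msg", "A", "m2"),
    ("fail", "A", None),
    ("msg", "B", "m3"),
    ("msg", "A", "m2-delayed"),  # arrives after the failure was noticed
    ("msg", "B", "m4"),
]
delivered = deliver(events)
```

With this filter, everything kept from broker B is later than everything
kept from broker A, so no extra sequence numbers are needed in this
scenario.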