[rabbitmq-discuss] Active/active HA setup

Fri Sep 3 10:48:01 BST 2010

Jiri

Ahh.  So maybe I misunderstood something.  Is it the case that there
is exactly one 'request' queue on each broker?

alexis

On Fri, Sep 3, 2010 at 10:33 AM,  <jiri at krutil.com> wrote:
> Alexis
>
> I don't think that exclusive queues are a problem. Our clients use
> auto-delete server-named exclusive queues for receiving responses, so every
> time a client re-connects, it must re-create and re-bind its exclusive
> queue(s) anyway, even with a single broker.
>
> The common exchange and queue where requests are sent are pre-declared as
> durable on both brokers.
>
> So I really don't see any resources that require migration.
>
> Regards
> Jiri
>
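
A minimal sketch of the reconnect step described above: after every
(re)connect the client declares a fresh server-named exclusive queue and
binds it again, regardless of which broker it landed on. This assumes a
channel object that behaves like a pika BlockingChannel; the thread does
not say which client library is in use, and the function name, exchange
name, and routing key are hypothetical.

```python
def declare_reply_queue(channel, exchange, routing_key):
    """Re-create the client's private reply queue after a (re)connect.

    `channel` is assumed to behave like a pika BlockingChannel.
    """
    # An empty queue name asks the broker to generate one (server-named).
    result = channel.queue_declare(queue="", exclusive=True, auto_delete=True)
    queue_name = result.method.queue
    # The binding is lost together with the old queue, so bind again.
    channel.queue_bind(queue=queue_name, exchange=exchange,
                       routing_key=routing_key)
    return queue_name
```

Because the same call runs on first connect and on failover, nothing about
the reply queue has to be copied between brokers.
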
>
>> Jiri
>>
>> It makes some sense, but you *will* need to migrate resources, or recreate
>> them, to continue working on the secondary.  You will not need to migrate
>> messages, but there is other broker state that you will need: for example,
>> a consumer's exclusive queue on the primary will need to exist on the
>> secondary, with the correct bindings, connection, etc.
>>
>> So my question is: do you plan to create that queue (etc) on the secondary
>> after the primary fails?  If "yes" then a potential issue is managing the
>> failover window size and (possible, but less likely) resource limits on the
>> secondary.  If "no" then you will need to 'pre build' the spare resources
>> somehow.
>>
>> These are not necessarily difficult issues to solve in any particular
>> instance; however, I am highlighting them in response to your request for
>> feedback.  These issues do become deeper when you try to generalise to
>> multiple different failure scenarios, of course.
>>
>> alexis
>>
>>
>> On Fri, Sep 3, 2010 at 8:40 AM, <jiri at krutil.com> wrote:
>>
>>> Alexis
>>>
>>> Our plan is to have two brokers with the same setup, both being actively
>>> used at the same time, but each broker serving a different set of clients.
>>>
>>> Our back-end service will be connected to and process requests from both
>>> brokers at the same time.
>>>
>>> When one broker fails, the clients will lose connection and will have to
>>> reconnect, ending up on the other broker. Messages that were on the dead
>>> broker will be lost.
>>>
>>> So we don't need migration of resources between brokers in case of failure;
>>> we only need the clients to move their connections to the other broker.
>>>
>>> Hope that makes sense...
>>>
>>> Jiri
>>>
>>>
>>>
>>>> Jiri
>>>>
>>>> Cool.  So yes messages will then only arrive out of order in the case
>>>> where some arrive from the secondary before 'delayed' messages from
>>>> the failed primary; and then, for reordering them, it suffices to know
>>>> which broker they came from.  (In the absence of failure, TCP should
>>>> take care of reordering).
>>>>
>>>> I think the issues will be:
>>>>
>>>> 1. Deciding when to stop listening to a primary.  Given consumers
>>>> don't care about message loss, I would suggest "as soon as consumers
>>>> are aware of primary failure, then they should ignore further messages
>>>> from the primary".
>>>>
>>>> 2. Failover time.  AIUI you want to minimise this by having a copy of
>>>> the whole queue/exchange/binding set-up on both brokers.  But how
>>>> exactly do you plan to do this?
>>>>
>>>> alexis
>>>>
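
A minimal sketch of point 1 above: once a consumer learns that a broker has
failed, it drops anything that still arrives from that broker. The class
and method names are hypothetical, and how the failure is detected is left
out.

```python
class FailoverFilter:
    """Drop messages from any broker the consumer knows has failed."""

    def __init__(self):
        self.failed = set()

    def mark_failed(self, broker):
        """Record that `broker` is considered dead from now on."""
        self.failed.add(broker)

    def accept(self, broker, message):
        """Return the message, or None if it came from a failed broker."""
        if broker in self.failed:
            return None
        return message
```

This relies on message loss being acceptable: late messages from the dead
broker are discarded rather than reordered.
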
>>>>
>>>> On Fri, Sep 3, 2010 at 8:12 AM,  <jiri at krutil.com> wrote:
>>>>
>>>>> Alexis
>>>>>
>>>>> The answer is no - a client can send requests to only one broker at any
>>>>> given moment. The client connects via a load balancer to one of the
>>>>> brokers and stays connected all the time. The client does not even know
>>>>> that there are two brokers (it only sees one IP address).
>>>>>
>>>>> I think requests may be delivered out of order only if a client fails
>>>>> over to another broker. Then messages sent to one broker can get mixed
>>>>> up with messages sent to the other.
>>>>>
>>>>> My concern was: are there any other issues with this kind of setup that
>>>>> I might have missed? Does anyone have experience with this?
>>>>>
>>>>> Thanks a lot for your help
>>>>> Jiri
>>>>>
>>>>>
>>>>>
>>>>>> Jiri
>>>>>>
>>>>>> You say that "Some clients send requests to one broker, some to the
>>>>>> other".
>>>>>>
>>>>>> Does this mean that one client publisher can send messages (requests)
>>>>>> to both brokers, in such a way that a pair of messages may arrive out
>>>>>> of order if one is sent to each broker?
>>>>>>
>>>>>> If the answer is no, then I think my answer stands, because causal
>>>>>> order will be preserved even if messages are lost.  That is: messages
>>>>>> that arrive successfully will not be out of order with each other.
>>>>>>
>>>>>> If the answer is yes, then I am not sure how you can recover global
>>>>>> ordering without imposing it at the publisher using sequence numbers
>>>>>> at the app level.
>>>>>>
>>>>>> Does this make sense?
>>>>>>
>>>>>> alexis
>>>>>>
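
A minimal sketch of the "sequence numbers at the app level" idea above: the
publisher stamps each message, and the consumer releases one publisher's
messages in stamp order, whichever broker carried them. All names and the
message layout are hypothetical.

```python
import heapq
import itertools


class SequencedPublisher:
    """Stamps each outgoing message with a per-publisher sequence number."""

    def __init__(self, publisher_id):
        self.publisher_id = publisher_id
        self._counter = itertools.count()

    def stamp(self, body):
        return {"publisher": self.publisher_id,
                "seq": next(self._counter),
                "body": body}


class Resequencer:
    """Releases one publisher's messages in sequence order, buffering gaps."""

    def __init__(self):
        self._next = 0        # next sequence number to release
        self._pending = []    # min-heap of (seq, body)

    def push(self, message):
        """Accept a message; return the bodies that are now in order."""
        heapq.heappush(self._pending, (message["seq"], message["body"]))
        released = []
        while self._pending and self._pending[0][0] == self._next:
            released.append(heapq.heappop(self._pending)[1])
            self._next += 1
        return released
```

As written, a lost message stalls the resequencer indefinitely, so a setup
that tolerates message loss would also need a timeout or a rule for
skipping gaps.
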
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 2, 2010 at 9:46 PM, Jiri Krutil <jiri at krutil.com> wrote:
>>>>>>
>>>>>>> Alexis
>>>>>>>
>>>>>>> Sorry, I probably didn't express myself well.
>>>>>>>
>>>>>>> We don't plan a primary and secondary broker, but a pair of brokers
>>>>>>> that are both active at the same time. A load balancer divides client
>>>>>>> connections between these brokers. A request queue with the same name
>>>>>>> exists on both brokers, but with different contents. Some clients send
>>>>>>> requests to one broker, some to the other. Our back-end server listens
>>>>>>> to both queues, processes requests and sends each response to an
>>>>>>> exclusive client queue on the broker from where the request came.
>>>>>>>
>>>>>>> Ideally this would be transparent to the clients, because the brokers
>>>>>>> would be hidden by a virtual IP address. Of course it can't be
>>>>>>> transparent to the back-end server, which needs to talk to both
>>>>>>> brokers at the same time.
>>>>>>>
>>>>>>> So (a) is correct, but (b) not.
>>>>>>>
>>>>>>> Hope that makes it a bit clearer...
>>>>>>>
>>>>>>> Cheers
>>>>>>> Jiri
>>>>>>>
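
A minimal sketch of the back-end routing described above: the server keeps
one publish function per broker and sends each response through the broker
the request arrived on, since the client's exclusive queue exists only
there. The class, the broker labels, and the publish callables are
hypothetical stand-ins for real connections.

```python
class BackEnd:
    """Consumes from several brokers and replies via the broker of origin."""

    def __init__(self, publishers, handler):
        # publishers: broker name -> callable(reply_queue, body)
        self.publishers = publishers
        # handler: callable(request_body) -> response_body
        self.handler = handler

    def on_request(self, broker, reply_to, body):
        """Process one request and reply on the same broker it came from."""
        response = self.handler(body)
        self.publishers[broker](reply_to, response)
```

Each request has to carry the name of the client's reply queue, which in
AMQP would normally travel in the reply-to message property.
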
>>>>>>>
>>>>>>>
>>>>>>>> Jiri
>>>>>>>>
>>>>>>>> That answered my questions.  Now, as I understood your example:
>>>>>>>>
>>>>>>>> a. you don't mind messages being lost
>>>>>>>> *and*
>>>>>>>> b. you don't use the secondary until after the primary has failed.
>>>>>>>>
>>>>>>>> Note that if consumption is completely 'fire and forget' then it is
>>>>>>>> possible that a message from the primary may *arrive* after a
>>>>>>>> message from the secondary.  But this can happen whether you use
>>>>>>>> sequence numbers or not.
>>>>>>>>
>>>>>>>> So if the primary broker fails, why not just forget all undelivered
>>>>>>>> messages?  Consumers will know that any message consumed from the
>>>>>>>> secondary must be later in *all* orderings than any message consumed
>>>>>>>> from the primary.  So, additional sequence numbering is not necessary.
>>>>>>>>
>>>>>>>> alexis
>>>>>>>>