[rabbitmq-discuss] Client API error recovery
Lev Walkin
vlm at lionet.info
Sat Apr 11 15:35:24 BST 2009
Here I continue testing the Erlang client library.
Suppose I want to establish a connection with a remote broker, and
subscribe to some channel events. How do I make sure the broker is still
there and has not actually disappeared?
Well, let's try stopping the broker with `rabbitmqctl stop` and see what we
can discover by monitoring the processes the Erlang way:
1> Connection = lib_amqp:start_connection("test-amq").
<0.33.0>
2> Channel = lib_amqp:start_channel(Connection).
<0.40.0>
3> {erlang:monitor(process, Connection), erlang:monitor(process, Channel)}.
{#Ref<0.0.0.41>,#Ref<0.0.0.42>}
4> R = fun() -> receive X -> X after 0 -> empty end end.
#Fun<erl_eval.20.67289768>
5> % Doing `rabbitmqctl stop` on the broker.
Broker forced connection: 320 -> <<"CONNECTION_FORCED - broker forced
connection closure with reason 'shutdown'">>
Channel 1 is shutting down due to: normal
5> R().
{'DOWN',#Ref<0.0.0.41>,process,<0.33.0>,normal}
6> R().
{'DOWN',#Ref<0.0.0.42>,process,<0.40.0>,normal}
7> R().
empty
8>
So far so good: if the broker goes away, both the Connection and the Channel
go away as well. It seems I can simply assume that whenever I receive a 'DOWN'
message for the Connection, I must re-establish it.
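That assumption ("a 'DOWN' for the Connection means reconnect") can be expressed as a small watcher process. This is only a sketch: the module name `reconnect_watcher`, its `get_connection`/`stop` messages, and the `Connect` argument are hypothetical. `Connect` is any zero-arity fun that returns a connection pid; in the session above it could be `fun() -> lib_amqp:start_connection("test-amq") end`.

```erlang
-module(reconnect_watcher).
-export([start/1]).

%% Hypothetical watcher. Connect is a zero-arity fun that opens a new
%% connection and returns its pid, e.g.
%%   fun() -> lib_amqp:start_connection("test-amq") end
start(Connect) ->
    spawn(fun() -> connect_and_watch(Connect) end).

connect_and_watch(Connect) ->
    case catch Connect() of
        Conn when is_pid(Conn) ->
            Ref = erlang:monitor(process, Conn),
            watch(Connect, Conn, Ref);
        _Failure ->
            %% Broker not reachable (or Connect failed): retry in a second.
            timer:sleep(1000),
            connect_and_watch(Connect)
    end.

watch(Connect, Conn, Ref) ->
    receive
        {'DOWN', Ref, process, Conn, _Reason} ->
            %% The connection process exited: open a fresh one.
            connect_and_watch(Connect);
        {get_connection, From} ->
            %% Let a client ask for the current connection pid.
            From ! {connection, self(), Conn},
            watch(Connect, Conn, Ref);
        stop ->
            erlang:demonitor(Ref, [flush]),
            ok
    end.
```

The watcher only helps if the Connection process actually exits when the broker goes away; the `killall beam` experiment that follows tests exactly that.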
But let's try a hard failure: killing the RabbitMQ beam process with
`killall beam` on the broker machine. This is a realistic failure scenario.
1> Connection = lib_amqp:start_connection("amq1").
<0.33.0>
2> Channel = lib_amqp:start_channel(Connection).
<0.40.0>
3> {erlang:monitor(process, Connection), erlang:monitor(process, Channel)}.
{#Ref<0.0.0.41>,#Ref<0.0.0.42>}
4> R = fun() -> receive X -> X after 0 -> empty end end.
#Fun<erl_eval.20.67289768>
5> % Doing `killall beam` on the broker.
Channel 1 is shutting down due to: normal
5> R().
{'DOWN',#Ref<0.0.0.42>,process,<0.40.0>,normal}
6> R().
empty
7> is_process_alive(Connection).
true
8>
You see, the channel is dead, but the connection is still alive. What can we
do with this connection? Can we start RabbitMQ again, open a new Channel and
see whether the connection has reconnected automatically?
8> f(Channel).
ok
9> Channel = lib_amqp:start_channel(Connection).
Channel 1 is shutting down due to: {writer,send_failed,badarg}
=ERROR REPORT==== 11-Apr-2009::07:23:49 ===
** Generic server <0.33.0> terminating
** Last message in was {open_channel,none,<<>>}
** When Server state == {connection_state,"guest","guest","test-amq",
                         #Port<0.488>,<<"/">>,<0.37.0>,<0.38.0>,0,0,
                         amqp_network_driver,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                           []},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                            [],[]}}}}
** Reason for termination ==
** {normal,{gen_server,call,[<0.50.0>,{call,{'channel.open',<<>>}}]}}
** exception exit: {{normal,
                     {gen_server,call,
                      [<0.50.0>,{call,{'channel.open',<<>>}}]}},
                    {gen_server,call,[<0.33.0>,{open_channel,none,<<>>}]}}
     in function gen_server:call/2
10>
No, it hasn't. The Connection we have been holding on to all this time has
not been operational. It was alive in Erlang process terms, but useless,
because we could not use it. Moreover, we cannot discover that the
connection is dead unless we try to use it.
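Since the only way to find out is to use the connection, one defensive option is to wrap every use in a `try` and treat an exit as "connection gone". A minimal sketch: `safe_use` and `try_use/1` are hypothetical names, and it only catches exits raised in the calling process by a failed `gen_server:call/2` (such as the `{{normal, ...}, ...}` exit above), not exit signals that might arrive over a link.

```erlang
-module(safe_use).
-export([try_use/1]).

%% Run Fun, e.g. fun() -> lib_amqp:start_channel(Connection) end, and turn
%% an exit raised in this process (such as a failed gen_server:call/2 to a
%% broken or dead connection) into an error tuple instead of a crash.
try_use(Fun) ->
    try Fun() of
        Result -> {ok, Result}
    catch
        exit:Reason -> {error, Reason}
    end.
```

On `{error, _}` the caller would discard the connection and open a new one, rather than trusting `is_process_alive/1`, which returned `true` for the broken connection in the session above.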
But let's try using that non-operational Connection again and see how it
behaves:
10> f(Channel), Channel = lib_amqp:start_channel(Connection).
** exception exit: {noproc,
                    {gen_server,call,[<0.33.0>,{open_channel,none,<<>>}]}}
     in function gen_server:call/2
11> is_process_alive(Connection).
false
12>
Oh, it's dead! It didn't die when it received the connection failure
from the broker; it died when we tried to use it, a considerable time later.
Let's see whether the RabbitMQ client library notices this hard failure at all
when no channels are open:
16> f(), Connection = lib_amqp:start_connection("test-amq").
<0.66.0>
17> erlang:monitor(process, Connection).
#Ref<0.0.0.94>
18> R = fun() -> receive X -> X after 0 -> empty end end.
#Fun<erl_eval.20.67289768>
19> length(processes()).
35
20> % Killing the broker
20> length(processes()).
33
21> R().
empty
22> is_process_alive(Connection).
true
23>
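Yes, it does, partly: two processes exit (the process count drops from 35 to
33). But the Connection does not die "in full": no 'DOWN' message arrives,
and the Connection process is still alive.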
I believe the right behavior for the Connection process is to die as soon
as it detects a network failure, instead of lingering in a non-operational
state. Lingering like this prevents Erlang monitoring from taking care of
network issues between the client and the broker.
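The fail-fast behavior proposed above can be illustrated with a process that owns a TCP socket in active mode and exits with a non-normal reason as soon as the socket reports `tcp_closed` or `tcp_error`, so that any monitor or link fires immediately. This is a hypothetical sketch: `fail_fast_conn` is not part of the RabbitMQ client and speaks no AMQP; it relies only on `gen_tcp` and `spawn_monitor/1`.

```erlang
-module(fail_fast_conn).
-export([start/2]).

%% Hypothetical connection owner. Returns {Pid, MonitorRef}; the caller gets
%% a 'DOWN' message as soon as the socket fails or the connect is refused.
start(Host, Port) ->
    spawn_monitor(fun() -> init(Host, Port) end).

init(Host, Port) ->
    case gen_tcp:connect(Host, Port, [binary, {active, true}]) of
        {ok, Sock} -> loop(Sock);
        {error, Reason} -> exit({connect_failed, Reason})
    end.

loop(Sock) ->
    receive
        {tcp, Sock, _Data} ->
            %% A real client would decode protocol frames here.
            loop(Sock);
        {tcp_closed, Sock} ->
            %% Peer closed the socket: die instead of lingering.
            exit(connection_closed);
        {tcp_error, Sock, Reason} ->
            exit({socket_error, Reason})
    end.
```

When the broker's beam is killed, the broker host's kernel closes its sockets, so a client like this sees `tcp_closed` promptly. If the broker host disappears without closing the connection (power loss, network partition), no such message arrives, and a heartbeat or read timeout would be needed to detect the failure.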
--
Lev Walkin
vlm at lionet.info