[rabbitmq-discuss] Client API error recovery

Lev Walkin vlm at lionet.info
Sat Apr 11 15:35:24 BST 2009


I am continuing my evaluation of the Erlang client library.

Suppose I want to establish a connection with a remote broker and 
subscribe to some channel events. How do I ensure the broker is still 
there and has not actually disappeared?

Well, let's try stopping the broker using `rabbitmqctl stop` and see what 
we can discover by monitoring the processes the Erlang way:

1> Connection = lib_amqp:start_connection("test-amq").
<0.33.0>
2> Channel = lib_amqp:start_channel(Connection).
<0.40.0>
3> {erlang:monitor(process, Connection), erlang:monitor(process, Channel)}.
{#Ref<0.0.0.41>,#Ref<0.0.0.42>}
4> R = fun() -> receive X -> X after 0 -> empty end end.
#Fun<erl_eval.20.67289768>
5> % Doing `rabbitmqctl stop` on the broker.
Broker forced connection: 320 -> <<"CONNECTION_FORCED - broker forced 
connection closure with reason 'shutdown'">>
Channel 1 is shutting down due to: normal
5> R().
{'DOWN',#Ref<0.0.0.41>,process,<0.33.0>,normal}
6> R().
{'DOWN',#Ref<0.0.0.42>,process,<0.40.0>,normal}
7> R().
empty
8>


So far so good: if the broker disappears, both Connection and Channel 
disappear as well. It seems I can just assume that if I receive a 'DOWN' 
message on a Connection, I must re-establish it.
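
For illustration, here is a minimal sketch of that "reconnect on 'DOWN'"
idea. The module name conn_watch and the {connected, Pid} message are
made up for this example and are not part of the client library. The
connect fun is passed in by the caller, and I am assuming that a failed
connection attempt shows up as an exception, which I have not verified
for lib_amqp:start_connection/1.

```erlang
-module(conn_watch).
-export([start/2]).

%% Spawn a watcher. ConnectFun is a zero-arity fun that returns the pid
%% of a fresh connection. Owner receives {connected, Pid} every time a
%% connection is (re)established. Both names are hypothetical.
start(ConnectFun, Owner) ->
    spawn(fun() -> loop(ConnectFun, Owner) end).

loop(ConnectFun, Owner) ->
    Connection = connect(ConnectFun),
    Ref = erlang:monitor(process, Connection),
    Owner ! {connected, Connection},
    receive
        {'DOWN', Ref, process, Connection, _Reason} ->
            %% The connection process is gone: pause briefly, reconnect.
            timer:sleep(100),
            loop(ConnectFun, Owner)
    end.

%% Assumption: a failed attempt raises an exception. Retry once per
%% second until an attempt succeeds.
connect(ConnectFun) ->
    try
        ConnectFun()
    catch
        _:_ ->
            timer:sleep(1000),
            connect(ConnectFun)
    end.
```

It could be started as
conn_watch:start(fun() -> lib_amqp:start_connection("test-amq") end, self()).
Note that this only helps when the Connection process really exits.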

But let's try doing a hard failure: killing the RabbitMQ beam using 
`killall beam` on the broker machine. This is a realistic failure scenario.

1> Connection = lib_amqp:start_connection("amq1").
<0.33.0>
2> Channel = lib_amqp:start_channel(Connection).
<0.40.0>
3> {erlang:monitor(process, Connection), erlang:monitor(process, Channel)}.
{#Ref<0.0.0.41>,#Ref<0.0.0.42>}
4> R = fun() -> receive X -> X after 0 -> empty end end.
#Fun<erl_eval.20.67289768>
5> % Doing `killall beam` on the broker.
Channel 1 is shutting down due to: normal
5> R().
{'DOWN',#Ref<0.0.0.42>,process,<0.40.0>,normal}
6> R().
empty
7> is_process_alive(Connection).
true
8>

You see, the channel is dead, but the connection is still alive. What can 
we do with this connection? Can we restart RabbitMQ, open a new Channel 
and see whether it has reconnected automatically?

8> f(Channel).
ok
9> Channel = lib_amqp:start_channel(Connection).
Channel 1 is shutting down due to: {writer,send_failed,badarg}

=ERROR REPORT==== 11-Apr-2009::07:23:49 ===
** Generic server <0.33.0> terminating
** Last message in was {open_channel,none,<<>>}
** When Server state == {connection_state,"guest","guest","test-amq",
                             #Port<0.488>,<<"/">>,<0.37.0>,<0.38.0>,0,0,
                             amqp_network_driver,
                             {dict,0,16,16,8,80,48,
                                 {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  []},
                                 {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                   [],[]}}}}
** Reason for termination ==
** {normal,{gen_server,call,[<0.50.0>,{call,{'channel.open',<<>>}}]}}
** exception exit: {{normal,
                         {gen_server,call,
                             [<0.50.0>,{call,{'channel.open',<<>>}}]}},
                     {gen_server,call,[<0.33.0>,{open_channel,none,<<>>}]}}
      in function  gen_server:call/2
10>

No, it hasn't. The Connection we have been holding on to all this time 
was not operational. It was alive in Erlang process terms, but useless, 
because nothing could be done with it. Moreover, we cannot discover that 
the connection is dead unless we try to use it.
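
Since the only way to find out is to try, such an attempt can at least
be wrapped so that the exit becomes a return value. This is only a
sketch; conn_probe is a made-up name, not something the client library
provides.

```erlang
-module(conn_probe).
-export([probe/1]).

%% Run OpenFun (a zero-arity fun that uses the connection, for example
%% by opening a channel) and turn an exit into an error tuple.
probe(OpenFun) ->
    try OpenFun() of
        Result -> {ok, Result}
    catch
        exit:Reason -> {error, Reason}
    end.
```

Usage would look like
conn_probe:probe(fun() -> lib_amqp:start_channel(Connection) end).
It does not make the failure any cheaper to detect, because the
connection still has to be exercised.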

But let's try using that non-operational Connection again and see how it 
behaves:

10> f(Channel), Channel = lib_amqp:start_channel(Connection).
** exception exit: {noproc,
                     {gen_server,call,[<0.33.0>,{open_channel,none,<<>>}]}}
      in function  gen_server:call/2
11> is_process_alive(Connection).
false
12>

Oh, it's dead! It didn't die when the connection to the broker failed; 
it died when we tried to use it a considerable time later.

Let's see if the RabbitMQ client library actually notices this hard 
failure when no channels are open:

	16> f(), Connection = lib_amqp:start_connection("test-amq").
	<0.66.0>
	17> erlang:monitor(process, Connection).
	#Ref<0.0.0.94>
	18> R = fun() -> receive X -> X after 0 -> empty end end.
	#Fun<erl_eval.20.67289768>
	19> length(processes()).
	35
	20> % Killing the broker
	20> length(processes()).
	33
	21> R().
	empty
	22> is_process_alive(Connection).
	true
	23>

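Yes, it does: the process count drops from 35 to 33, so something on the
client side reacted to the failure. But the Connection does not die "in
full": it stays alive, and no 'DOWN' message arrives.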

I believe the right behavior for the Connection process is to die as
soon as it sees a network failure, rather than lingering in a
non-operational state. That would let ordinary Erlang monitoring take
care of network issues between the client and the broker.
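
To show what I mean, here is a rough sketch of a process that owns a TCP
socket and exits the moment the socket closes. This is not the client
library's code: the module name fail_fast_conn and the exit reasons are
invented for the example, and it uses plain gen_tcp instead of AMQP.

```erlang
-module(fail_fast_conn).
-behaviour(gen_server).
-export([start/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start an unlinked process that connects to Host:Port.
start(Host, Port) ->
    gen_server:start(?MODULE, {Host, Port}, []).

init({Host, Port}) ->
    %% {active, true}: socket events arrive as messages to this process.
    case gen_tcp:connect(Host, Port, [binary, {active, true}]) of
        {ok, Socket}    -> {ok, Socket};
        {error, Reason} -> {stop, Reason}
    end.

handle_call(_Request, _From, Socket) -> {reply, ok, Socket}.

handle_cast(_Msg, Socket) -> {noreply, Socket}.

%% Stop as soon as the peer goes away, so every monitor gets 'DOWN'.
handle_info({tcp_closed, Socket}, Socket) ->
    {stop, {shutdown, socket_closed}, Socket};
handle_info({tcp_error, Socket, Reason}, Socket) ->
    {stop, {shutdown, {socket_error, Reason}}, Socket};
handle_info(_Other, Socket) ->
    {noreply, Socket}.

terminate(_Reason, _Socket) -> ok.

code_change(_OldVsn, Socket, _Extra) -> {ok, Socket}.
```

With something like this, a process that called erlang:monitor/2 on the
connection would receive {'DOWN', Ref, process, Pid, {shutdown,
socket_closed}} without having to touch the connection first.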


-- 
Lev Walkin
vlm at lionet.info



