[rabbitmq-discuss] Windows service recovery
Barnea at 3i-mind.com
Mon May 28 14:46:24 BST 2012
"Based on this reading, it ought to be 'possible' to verify that an instance is still up, but I'm a little unclear on how you would confirm that the said program was the one that erlsrv 'ought' to be monitoring. "
This is even worse, because if I run some other erl process and both the erlsrv process and RabbitMQ's erl process go down, it is possible that erlsrv will fail to spawn RabbitMQ for no good reason, having mistaken the other erl process for its own...
A possible solution: erlsrv saves (on disk, for example) the PID of the erl.exe it needs to monitor, and on startup, if there is an erl.exe with the same PID, that is the one.
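To make the idea concrete, here is a rough sketch of the PID-file check in Python. It is not how erlsrv works today, only an illustration of the proposal. The function names, the JSON record format and the `running` argument (a PID to image name mapping that the caller would have to build from the OS process list) are all made up for the example.

```python
import json
import os
import tempfile

ERL_IMAGE = "erl.exe"  # image name mentioned in the thread


def save_monitored_pid(path, pid):
    """Record the PID of the erl.exe that the service spawned (hypothetical helper)."""
    # Write to a temporary file and rename it, so a crash cannot leave
    # a half-written record behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"pid": pid}, handle)
    os.replace(tmp, path)


def load_monitored_pid(path):
    """Return the recorded PID, or None if there is no usable record."""
    try:
        with open(path) as handle:
            return int(json.load(handle)["pid"])
    except (OSError, ValueError, KeyError, TypeError):
        return None


def find_monitored(path, running):
    """Return the recorded PID if an erl.exe with that PID is running, else None.

    `running` is an assumed mapping of PID -> image name, supplied by the caller.
    """
    pid = load_monitored_pid(path)
    if pid is not None and running.get(pid, "").lower() == ERL_IMAGE:
        return pid
    return None
```

With this check, an unrelated erl.exe running under a different PID is not mistaken for the monitored one, so the service would go ahead and spawn RabbitMQ. The sketch does not cover the case where the OS recycles the saved PID for another erl.exe.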
From: Tim Watson [mailto:tim at rabbitmq.com]
Sent: Monday, May 28, 2012 15:39
To: Lior Barnea
Cc: Emile Joubert; rabbitmq-discuss at lists.rabbitmq.com
Subject: Re: [rabbitmq-discuss] Windows service recovery
On 28/05/12 13:38, Lior Barnea wrote:
> Erlsrv is the service instance, acting as a watchdog (spawning it on startup) for erl.exe which runs rabbitMQ.
> As I said before, the scenario in which erlsrv goes down without erl.exe is currently hypothetical, but its behaviour after restarting, because I killed it, is wrong.
> For my QA team: "It's unlikely to happen" equals "It can
> happen" :)
Yeah, you're right on both counts. The behaviour is wrong and 'it can happen' in theory. This question now needs to move over to the erlang-questions mailing list and get raised with the OTP team, as they're responsible for erlsrv and should be able to get it fixed.
From a brief glance at the (erlsrv) code, my reading of it is that there is no link between erlsrv and erl.exe other than the use of the fifo pipes provided by to_erl, used to handle interactions (such as shut down). Based on this reading, it ought to be 'possible' to verify that an instance is still up, but I'm a little unclear on how you would confirm that the said program was the one that erlsrv 'ought' to be monitoring.