[rabbitmq-discuss] Windows service recovery

Tim Watson watson.timothy at gmail.com
Mon May 28 14:51:54 BST 2012


On 28 May 2012, at 14:46, Lior Barnea wrote:

> "Based on this reading, it ought to be 'possible' to verify that an instance is still up, but I'm a little unclear on how you would confirm that the said program was the one that erlsrv 'ought' to be monitoring. "
> 
> This is even worse, because if I run some other erl process and both the erlsrv process and RabbitMQ's erl process go down, it is possible that erlsrv will fail to spawn RabbitMQ for no reason...
> 

Yes, well, that's just my reading of the erlsrv code. Today, erlsrv simply won't start, which is a different problem altogether.

> A possible solution - erlsrv will save (on disk, for example) the PID of the erl.exe it needs to monitor and, on startup, if there is an erl.exe with the same PID, it is the one.

That's pretty much what standalone rabbit (rabbit-server) does now on Unix, although the pid is actually created by the wrapper script, which in turn runs `exec erl <....etc>`. This doesn't happen in the Windows batch files, nor am I aware of a mechanism for doing so in such scripts, so the behaviour you're describing for Windows services clearly belongs in erlsrv.
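
To make the pid-file idea concrete, here is a rough sketch in Python. It is only an illustration, not how rabbit-server or erlsrv is actually implemented: the function names and the pid-file path are made up for the example, and the liveness check is POSIX-only (on Windows, `os.kill(pid, 0)` does not perform a harmless probe, so a different check would be needed there). Note that a bare PID match cannot prove the process is the same program, since PIDs can be reused.

```python
import os


def write_pid_file(path, pid):
    """Record the PID of the process the watchdog is meant to monitor."""
    with open(path, "w") as f:
        f.write(str(pid))


def read_pid_file(path):
    """Return the saved PID, or None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """POSIX-only probe: signal 0 checks for existence without signalling."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # no such process
    except PermissionError:
        return True    # process exists but belongs to another user
    return True


def should_spawn(path):
    """On watchdog startup: spawn a new node only if the saved PID is not alive."""
    return not is_running(read_pid_file(path))
```

A watchdog would call `should_spawn("/hypothetical/path/node.pid")` on startup and, if it returns False, attach to the existing process instead of starting a second one.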

Needless to say, you're going to need to have some conversations on the erlang-questions mailing list about this one, as we're not responsible for developing or maintaining erlsrv at all.

Cheers,
Tim

> 
> -----Original Message-----
> From: Tim Watson [mailto:tim at rabbitmq.com] 
> Sent: Monday, May 28, 2012 15:39
> To: Lior Barnea
> Cc: Emile Joubert; rabbitmq-discuss at lists.rabbitmq.com
> Subject: Re: [rabbitmq-discuss] Windows service recovery
> 
> On 28/05/12 13:38, Lior Barnea wrote:
>> Erlsrv is the service instance, acting as a watchdog (spawning it on startup) for erl.exe which runs rabbitMQ.
>> 
>> As I said before, the scenario in which erlsrv goes down without erl.exe is currently hypothetical, but its behavior after it is restarted because I killed it is wrong.
>> 
>> For my QA team: "It's unlikely that it will happen" equals "It can happen" :)
> 
> Yeah, you're right on both counts. The behaviour is wrong and 'it can happen' in theory. This question now needs to move over to the erlang-questions mailing list and get raised with the OTP team, as they're responsible for erlsrv and should be able to get it fixed.
> 
> From a brief glance at the (erlsrv) code, my reading of it is that there is no link between erlsrv and erl.exe other than the use of the fifo pipes provided by to_erl, used to handle interactions (such as shut down). Based on this reading, it ought to be 'possible' to verify that an instance is still up, but I'm a little unclear on how you would confirm that the said program was the one that erlsrv 'ought' to be monitoring.
> 
> 


