[rabbitmq-discuss] RabbitMQ stopping when trying to delete locked mnesia files?
simon at rabbitmq.com
Wed May 30 11:49:54 BST 2012
On 29/05/12 13:01, Øyvind Tjervaag wrote:
> I've had this problem a couple of times now where it seems, if I'm
> deciphering the logs correctly, RabbitMQ crashes when it tries to
> delete mnesia files that are locked by backup or virus-scanning. I
> thought I had seen something about this on this list a while back,
> but I can't seem to find it now.
That's a correct decipherment.
> I'm running RabbitMQ version 2.8.2 and Erlang R15B01 on Windows 2008
> R2 (64 bit). Now, I've told the IT-ops-people to stop locking any
> files in the mnesia folder, but I think RabbitMQ should not fail this
> badly when it tries deleting a locked file?
Hmm. We do in general assume that file operations on files owned by
Rabbit will succeed; it's hard to know in the general case what else to do.
In the crash that you're seeing, Rabbit was not able to delete an old
file from the message store. That's a *comparatively* benign event, but
the question of what Rabbit should do in this case is still not obvious.
Should it ignore the fact that the delete failed (and thus leak that
file)? Should it maintain records of which deletions have failed with
the intent of retrying them later? (And does that get persisted?) Should
the message store hang until the file can be deleted? (That could be a
long time, and you won't accept any new persistent messages until then.)
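To make the "keep records and retry later" option concrete, here is a
minimal sketch in Python. It is not how Rabbit's message store works; the
DeferredDeleter class and its method names are invented for illustration,
and it assumes a locked file shows up as an OSError on delete. Note that
the records live only in memory, so it leaves the persistence question
open: anything still pending is forgotten (and the file leaked) on restart.

```python
import os


class DeferredDeleter:
    """Hypothetical helper: remember failed deletes and retry them later."""

    def __init__(self, remove=os.remove):
        self._remove = remove   # injectable so a locked file can be simulated
        self.pending = set()    # paths whose deletion failed (in memory only)

    def delete(self, path):
        """Try to delete now; on failure, record the path instead of crashing."""
        try:
            self._remove(path)
        except FileNotFoundError:
            # Already gone, so there is nothing left to retry.
            self.pending.discard(path)
            return True
        except OSError:
            # Covers PermissionError, which is what a locked file typically
            # raises on Windows.
            self.pending.add(path)
            return False
        self.pending.discard(path)
        return True

    def retry_pending(self):
        """Retry every recorded deletion; return the paths still failing."""
        for path in list(self.pending):
            self.delete(path)
        return set(self.pending)
```

Something would still have to call retry_pending() periodically, and decide
how long a path may stay pending before it is reported as a problem.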
But it's worse than that - while Rabbit in general tries to open files
and keep them open, it will close and reopen files when it is running
low on file descriptors. If reopening (for example) a queue index file
fails, it's *really* not obvious what our plan B could be.
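The close-and-reopen pattern looks roughly like the Python sketch below.
Again, this is an illustration under assumptions rather than Rabbit's
actual file handle code: the HandleCache class, its limit parameter and the
least-recently-used eviction policy are all made up for the example. The
point is the reopen inside get(): if another process has locked the file in
the meantime, the error simply propagates and the caller has nothing to
fall back on.

```python
from collections import OrderedDict


class HandleCache:
    """Hypothetical cache keeping at most `limit` files open at once."""

    def __init__(self, limit, opener=open):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self._opener = opener
        self._open = OrderedDict()  # path -> file object, least recently used first

    def get(self, path):
        """Return an open handle, reopening the file if it was closed earlier."""
        if path in self._open:
            self._open.move_to_end(path)
            return self._open[path]
        # Make room first by closing the least recently used handle.
        while len(self._open) >= self.limit:
            _, victim = self._open.popitem(last=False)
            victim.close()
        # The step with no obvious plan B: an OSError raised here (for
        # example because the file is locked) goes straight to the caller.
        handle = self._opener(path, "ab")
        self._open[path] = handle
        return handle

    def close_all(self):
        """Close every cached handle."""
        for handle in self._open.values():
            handle.close()
        self._open.clear()
```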
Rabbit really needs to know that its files are under its control.