[rabbitmq-discuss] millions of unack'd messages in a day-- disk store instead of ram?

Sun May 10 13:02:52 BST 2009

Hi Brian,

On Fri, May 08, 2009 at 10:41:07AM -0700, Brian Sullivan wrote:
>> [snip] Beware though that the broker you are using could have internal
>> messages timeout on it - this was fixed in bug 20546 which is in our
>> "default" development branch, not the stable "v1_5" branch. As such, you
>> may see messages coming out of the broker which indicate strange timeouts
>> occurred. [snip]
>
> Would this timeout error you mention possibly be the reason I have seen 
> this error crop up once every few days?  If so, do you know when the fix 
> will make it to the stable branch?
>
> =INFO REPORT==== 1-May-2009::16:29:00 ===
> starting TCP connection <0.17070.115> from 10.0.4.56:51169
>
> =ERROR REPORT==== 1-May-2009::19:09:53 ===
> connection <0.17070.115> (running), channel 1 - error:
> {{timeout,{gen_server,call,[<0.17078.115>,stat]}},

Yep. That is a synchronous call within rabbit timing out. By default,
OTP gives such calls a timeout of 5 seconds, and on a very heavily loaded
machine a call can easily take longer than that to complete. We have now
changed all these timeouts to 'infinity', for two reasons: it lets rabbit
keep operating in heavily loaded situations, and we don't think there's
any value in trying to catch such timeout errors. There's nothing we could
do other than retry when a timeout like this occurs, so it's better to
remove the timeout completely.
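
As a rough illustration of the difference, here is a minimal sketch. It is
not rabbit's actual code: the module name, the stat/2 wrapper and the
300 ms delay standing in for a loaded machine are all made up for the
example. Only gen_server:call/3 and its timeout argument are real OTP API;
gen_server:call/2 behaves like call/3 with a timeout of 5000 ms.

```erlang
-module(call_timeout_demo).
-behaviour(gen_server).

-export([run/0, stat/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical delay that simulates a server too busy to answer quickly.
-define(SLOW_MS, 300).

%% Synchronous request with an explicit timeout: either a number of
%% milliseconds or the atom 'infinity'.
stat(Pid, Timeout) ->
    gen_server:call(Pid, stat, Timeout).

run() ->
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    %% A finite timeout shorter than the server's delay makes the caller
    %% exit with {timeout, {gen_server, call, [...]}}, the same shape as
    %% the term in the error report.
    Short = try stat(Pid, 100)
            catch exit:{timeout, _} -> timed_out
            end,
    %% With 'infinity' the caller simply waits until the reply arrives.
    Patient = stat(Pid, infinity),
    ok = gen_server:stop(Pid),
    {Short, Patient}.

init([]) ->
    {ok, 0}.

%% The state counts how many stat requests have been answered so far.
handle_call(stat, _From, Count) ->
    timer:sleep(?SLOW_MS),
    {reply, {ok, Count}, Count + 1}.

handle_cast(_Msg, Count) ->
    {noreply, Count}.
```

Note that the server still finishes the first request after the caller has
given up on it; the timeout only affects the caller, which is part of why
catching it buys so little.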

I'd be interested to know if there are any specific circumstances which
cause your broker to emit these timeout errors.

As for these changes landing in a stable branch: it's very unlikely
they'll make it into 1.5. I suspect they will be in 1.6, but we don't have
a timetable for when we plan to fork that. Sorry I can't help more.

Matthew