[rabbitmq-discuss] Windows RabbitMQ Crashes and Blue Screens under Load

Jerry Kuch jerryk at vmware.com
Wed Jan 18 21:08:41 GMT 2012


Hi, James...

Sorry, not yet in any meaningful way, but it's on my list and I hope
to get to it in the next couple of days...

Jerry

----- Original Message -----
From: "James Poole" <james.poole at rsa.com>
To: "James Poole" <james.poole at rsa.com>, "Jerry Kuch (VMware)" <jerryk at vmware.com>
Cc: rabbitmq-discuss at lists.rabbitmq.com
Sent: Wednesday, January 18, 2012 12:52:51 PM
Subject: RE: [rabbitmq-discuss] Windows RabbitMQ Crashes and Blue Screens under Load

Has anyone had a chance to investigate this crash?  I can re-send the repro source files if needed.

Thanks,
James

-----Original Message-----
From: Poole, James
Sent: Friday, January 13, 2012 3:45 PM
To: Kuch, Jerry (VMware)
Cc: rabbitmq-discuss at lists.rabbitmq.com
Subject: RE: [rabbitmq-discuss] Windows RabbitMQ Crashes and Blue Screens under Load

Jerry,

I have modified EmitLog.java and ReceiveLogs.java from the tutorials on the website to reproduce the crash (attached).  If the mailing list strips the attachments, just ping me and I'll send a copy directly.

Both files need the address in the factory.setHost() call changed to point at your broker, and the EmitLog process must be given the path to a file of 2 MB or more as its argument.
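
For anyone rebuilding the repro from the description alone, here is a minimal sketch of the payload-loading step. It assumes the modified EmitLog simply reads the file named on the command line into the message body. The class name `ReproPayload` and the reading of "2 MB" as 2 * 1024 * 1024 bytes are hypothetical choices; the attached EmitLog.java may do this differently. The publish call itself needs the RabbitMQ Java client and a running broker, so it is only indicated in a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the attached EmitLog.java: loads the file
// passed on the command line as the message body and checks that it is at
// least 2 MB (taken here as 2 * 1024 * 1024 bytes).
final class ReproPayload {
    static final long MIN_PAYLOAD_BYTES = 2L * 1024 * 1024;

    static byte[] load(Path path) throws IOException {
        long size = Files.size(path);
        if (size < MIN_PAYLOAD_BYTES) {
            throw new IllegalArgumentException(
                path + " is " + size + " bytes; the repro expects at least " + MIN_PAYLOAD_BYTES);
        }
        return Files.readAllBytes(path);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ReproPayload <path-to-2MB+-file>");
            System.exit(1);
        }
        byte[] body = load(Paths.get(args[0]));
        // In EmitLog, this array would be the body argument of
        // channel.basicPublish(...) once factory.setHost(...) points at your broker.
        System.out.println("payload ready: " + body.length + " bytes");
    }
}
```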

Thanks for looking into this.

-James

-----Original Message-----
From: Jerry Kuch [mailto:jerryk at vmware.com]
Sent: Wednesday, January 11, 2012 1:44 PM
To: Poole, James
Cc: rabbitmq-discuss at lists.rabbitmq.com
Subject: Re: [rabbitmq-discuss] Windows RabbitMQ Crashes and Blue Screens under Load

James:  Out of curiosity, have you tried the new 64-bit release of Erlang for Windows in your environment?  The address space size limitations of the 32-bit version have been associated with crashy Rabbits in the past (although bringing your memory high watermark value down, so that the back-pressure mechanisms engage when the broker is in less trouble, may help).  I think you can scare up the new Erlang here:

http://www.erlang.org/download/otp_win64_R15B.exe

Until recently there was no 64-bit Erlang, so even those running on 64-bit Windows boxes were still relegated to 32-bit VMs.
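
To make the address-space point concrete, here is a rough sketch of how a fractional memory high watermark becomes a byte threshold. It assumes, rather than quotes from the RabbitMQ source, that the broker caps the memory it plans for at the 32-bit process address space (taken here as the 2 GB default user address space on 32-bit Windows) and that publishing connections are blocked once used memory exceeds the threshold. `WatermarkSketch` and its methods are hypothetical names.

```java
// Hypothetical sketch, not RabbitMQ source: approximates how a memory high
// watermark fraction turns into a byte threshold. Assumes a 32-bit VM can only
// plan for its address space (2 GB assumed for 32-bit Windows) and that
// publishers are blocked once used memory exceeds the threshold.
final class WatermarkSketch {
    static final long GIB = 1024L * 1024 * 1024;
    static final long WIN32_ADDRESS_SPACE = 2 * GIB; // assumption: default user address space

    // Byte threshold for a given amount of RAM, VM word size and watermark fraction.
    static long threshold(long installedRam, boolean is32BitVm, double watermark) {
        long usable = is32BitVm ? Math.min(installedRam, WIN32_ADDRESS_SPACE) : installedRam;
        return (long) (usable * watermark);
    }

    // Assumed alarm rule: block publishers while usage is above the threshold.
    static boolean publishersBlocked(long usedBytes, long thresholdBytes) {
        return usedBytes > thresholdBytes;
    }

    public static void main(String[] args) {
        long ram = 4 * GIB;
        System.out.println("32-bit VM, 0.4: " + threshold(ram, true, 0.4) + " bytes");
        System.out.println("32-bit VM, 0.2: " + threshold(ram, true, 0.2) + " bytes");
        System.out.println("64-bit VM, 0.4: " + threshold(ram, false, 0.4) + " bytes");
    }
}
```

With 4 GB of RAM, the sketch gives a threshold of about 819 MiB at a watermark of 0.4 on a 32-bit VM, about 410 MiB at 0.2, and about 1,638 MiB on a 64-bit VM. Lowering the watermark trades throughput for headroom: the alarm fires while the process is further from the point where allocations start failing.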

I am curious about the different results between a physical machine and a virtualized one, with one showing a "clean" Erlang VM crash and the other exhibiting a blue-screen, fatal OS-wrecker...

Is the traffic you're using to bring these systems down part of a large or proprietary app, or can you extract a bare-minimum piece of code that brings the pain and share it with us?  If you can do the latter, we can more easily investigate the situation within VMware, since the difference in behavior between bare metal and virtualization is disquieting...

Best regards,
Jerry

----- Original Message -----
From: "james poole" <james.poole at rsa.com>
To: rabbitmq-discuss at lists.rabbitmq.com
Sent: Wednesday, January 11, 2012 10:32:23 AM
Subject: [rabbitmq-discuss] Windows RabbitMQ Crashes and Blue Screens under Load

We’ve let loose one of our testing ninjas on RabbitMQ for load testing, and we’re consistently running into issues when the high memory watermark is hit.

Windows Server 2003 32-bit, Erlang R15B 32-bit, RabbitMQ 2.7.1

2,000 consumers, each with its own queue bound to a direct exchange
1 producer, publishing a 2 MB message to the exchange once every second, for a total of 50 seconds
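
A quick back-of-the-envelope sketch of the load this profile generates, assuming every one of the 2,000 queues is bound with the routing key the producer publishes with (so each message is routed to all of them) and taking 2 MB as 2 * 1024 * 1024 bytes. `LoadEstimate` is a hypothetical helper, not part of the repro.

```java
// Back-of-the-envelope sketch of the load-test profile (hypothetical helper,
// not part of the repro). Assumes all 2,000 queues are bound with the routing
// key the producer uses, so every message is routed to every queue, and treats
// 2 MB as 2 * 1024 * 1024 bytes.
final class LoadEstimate {
    static final long MIB = 1024L * 1024;

    // Total bytes the single producer publishes to the exchange.
    static long publishedBytes(int messages, long bodyBytes) {
        return messages * bodyBytes;
    }

    // Number of message copies routed to queues (one per queue per message).
    static long deliveries(int messages, int queues) {
        return (long) messages * queues;
    }

    // Total bytes that must be written to consumers if every copy is delivered.
    static long deliveredBytes(int messages, int queues, long bodyBytes) {
        return deliveries(messages, queues) * bodyBytes;
    }

    public static void main(String[] args) {
        int queues = 2_000;  // one queue per consumer
        int messages = 50;   // one message per second for 50 seconds
        long body = 2 * MIB; // 2 MB message body

        System.out.println("published:  " + publishedBytes(messages, body) / MIB + " MiB");
        System.out.println("deliveries: " + deliveries(messages, queues));
        System.out.println("delivered:  " + deliveredBytes(messages, queues, body) / MIB + " MiB");
        System.out.println("per second: " + deliveredBytes(1, queues, body) / MIB + " MiB");
    }
}
```

Under these assumptions the producer sends only 100 MiB in total, but the broker has to hand out 100,000 copies, about 200,000 MiB in all, or roughly 4,000 MiB for each one-second publish.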

Everything behaves as expected until the memory footprint hits the high watermark, at which point:

On a physical machine: the ERL process crashes and a dump file is created
On a virtual machine: a Blue Screen of Death is shown and the server reboots

VM environment = VMware, Inc.® vCenter Lab Manager 4.0 (4.0.3.1318)

One other note: we see the same problem with Erlang R14B04 and RabbitMQ 2.7.0.

I have looked through the log file and also turned on the console debug output, and nothing jumps out as an error. If needed, I can upload the minidump from the Blue Screen and the ERL crash dump file; just point me to where to do it.

Let me know if there is anything else I can do to help get this fixed.

In the rabbit log, there are no errors, and only a few warnings 20 seconds before the crash:

=INFO REPORT==== 11-Jan-2012::10:55:53 ===
closing TCP connection <0.4405.0> from 10.6.64.104:57830

=WARNING REPORT==== 11-Jan-2012::10:55:53 ===
exception on TCP connection <0.20552.0> from 10.6.64.104:59521
connection_closed_abruptly

In the console output log file for the physical machine, this is the only message I see:

starting direct_client ...done
starting notify cluster nodes ...done

broker running
Eshell V5.9 (abort with ^G)
(rabbit at QEDLP082)1>
Crash dump was written to: C:/Documents and Settings/Administrator.QEDLP/Application Data/RabbitMQ/erl_crash.dump
eheap_alloc: Cannot allocate 6731340 bytes of memory (of type "heap").
in message_loop
win32sysinfo:Erlang has closed.

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss at lists.rabbitmq.com
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss