[rabbitmq-discuss] RabbitMQ broker crashing under heavy load with mirrored queues
Venkat
vveludan at gmail.com
Mon Jan 23 00:39:27 GMT 2012
Hi Steve, sorry for the delayed response. Please find the following:
> is it still true that the rabbitmqctl report
> command is failing on t-2? I wondered if this might be something to do with the
> user you are running the rabbitmqctl command under? Try it with and without
> sudo, for example.
I was running the command without sudo. I will try with sudo and let
you know.
> Your message rate of 40k with only one exception/loss is good, isn't it? I'm not
> certain that the recreate connection code you have used is all necessary, but if
> it works for you, that's fine. What made you put in a 2-second delay (why a
> delay and why 2 seconds)?
Steve, the check interval in the HAProxy config was set to 2 seconds,
so I set the delay to 2 seconds as well.
While posting 40K messages, one message was sent through the retry path
when I brought down the broker. In other words, all 40K messages were
posted to the queue.
But the consumer was losing 4K to 5K messages. I ran several
tests, and 4K-5K messages were lost consistently.
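
A note on the retry described above: the 2-second pause is tied to the HAProxy check interval, so that by the time the producer retries, the proxy has presumably stopped routing to the dead broker. A minimal sketch of that retry-with-fixed-delay idea in plain Java follows. `DelayedRetry`, `call`, `maxAttempts` and `delayMillis` are hypothetical names used for illustration, not spring-amqp API, and the action passed in stands for whatever actually publishes the message.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of spring-amqp): retries an action with a fixed delay. */
final class DelayedRetry {
    private DelayedRetry() {}

    /**
     * Runs the action, making at most maxAttempts attempts in total.
     * Sleeps delayMillis between attempts and rethrows the last failure.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException ie) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Give the proxy time to notice the dead broker before retrying.
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

With a `RabbitTemplate` named `template` (assumed here), `DelayedRetry.call(() -> { template.convertAndSend(msg); return null; }, 2, 2000L)` would try once, wait 2 seconds, and try one more time. A retry like this only covers publishes that fail with an exception; it says nothing about messages the client believed were sent.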
Finally, I used a channel transaction while posting messages, as follows:
@Bean
public RabbitTemplate rabbitTemplate() {
    RabbitTemplate template = new RabbitTemplate(connectionFactory());
    template.setChannelTransacted(true);
    template.setMessageConverter(messageConverter());
    configureMDBTemplate(template);
    return template;
}
Steve, I am not sure whether I can use confirms with spring-amqp. That is
why I used a channel transaction.
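
For reference, publisher confirms work by having the broker acknowledge each publish by its sequence number, optionally with a "multiple" flag meaning "everything up to and including this number". The sketch below shows only the client-side bookkeeping for that scheme in plain Java. `ConfirmTracker` and its method names are hypothetical; wiring it to a real channel (for example via the RabbitMQ Java client's `confirmSelect()` and a `ConfirmListener`) is left out, and whether the spring-amqp version used here exposes confirms is not settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical bookkeeping for publisher confirms; the broker wiring is omitted. */
final class ConfirmTracker {
    // Sequence number -> message body, for every publish not yet confirmed.
    private final ConcurrentNavigableMap<Long, String> outstanding =
            new ConcurrentSkipListMap<>();

    /** Record a message just before it is published under the given sequence number. */
    void onPublish(long seqNo, String body) {
        outstanding.put(seqNo, body);
    }

    /** Broker acked seqNo; with multiple=true, all entries up to and including seqNo. */
    void onAck(long seqNo, boolean multiple) {
        if (multiple) {
            outstanding.headMap(seqNo, true).clear();
        } else {
            outstanding.remove(seqNo);
        }
    }

    /** Broker nacked; returns the bodies that the caller should publish again. */
    List<String> onNack(long seqNo, boolean multiple) {
        List<String> toResend = new ArrayList<>();
        if (multiple) {
            ConcurrentNavigableMap<Long, String> nacked = outstanding.headMap(seqNo, true);
            toResend.addAll(nacked.values());
            nacked.clear();
        } else {
            String body = outstanding.remove(seqNo);
            if (body != null) {
                toResend.add(body);
            }
        }
        return toResend;
    }

    /** Number of publishes that have been neither acked nor nacked so far. */
    int unconfirmedCount() {
        return outstanding.size();
    }
}
```

Anything still counted by `unconfirmedCount()` when a connection drops is a candidate for republishing, which is the information a plain retry-on-exception does not give.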
Even with the channel transaction, 4K to 5K messages were lost.
This loss was consistent across 10-15 runs.
I verified the loss by keeping the queue consumer stopped,
so that I could track the received message count.
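
A count alone shows that messages went missing but not which ones. One way to narrow it down, sketched below under the assumption that every test message carries a unique numeric ID that both producer and consumer log, is to compare the two ID sets after a run. `LossReport`, `missing` and `duplicates` are hypothetical names, not part of any RabbitMQ or Spring API.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for comparing what was published with what was consumed. */
final class LossReport {
    private LossReport() {}

    /** Returns the IDs that were sent but never received, in ascending order. */
    static TreeSet<Long> missing(Collection<Long> sentIds, Collection<Long> receivedIds) {
        TreeSet<Long> result = new TreeSet<>(sentIds);
        result.removeAll(new HashSet<>(receivedIds));
        return result;
    }

    /** Returns the IDs that were received more than once, in ascending order. */
    static TreeSet<Long> duplicates(Collection<Long> receivedIds) {
        Set<Long> seen = new HashSet<>();
        TreeSet<Long> dups = new TreeSet<>();
        for (Long id : receivedIds) {
            // Set.add returns false when the ID was already present.
            if (!seen.add(id)) {
                dups.add(id);
            }
        }
        return dups;
    }
}
```

If the missing IDs cluster around the moment the broker was brought down, that points at the failover window; if they are spread evenly over the run, the cause is more likely elsewhere.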
Thanks
Venkat
On Jan 16, 10:21 am, Steve Powell <st... at rabbitmq.com> wrote:
> Hi Venkat,
>
> I'm not at all sure what is happening with the cluster nodes. But it is hard to
> tell with the information provided. It looks as though your nodes are both
> running as disc nodes happily -- is it still true that the rabbitmqctl report
> command is failing on t-2? I wondered if this might be something to do with the
> user you are running the rabbitmqctl command under? Try it with and without
> sudo, for example.
>
> Your message rate of 40k with only one exception/loss is good, isn't it? I'm not
> certain that the recreate connection code you have used is all necessary, but if
> it works for you, that's fine. What made you put in a 2-second delay (why a
> delay and why 2 seconds)?
>
> The only other thing I might suggest is that you investigate publisher confirms.
> This is a lightweight way of knowing that a publish actually got through to the
> rabbitmq node and was successfully passed on (or stored). See
> [http://www.rabbitmq.com/blog/2011/02/10/introducing-publisher-confirms/] for an
> introduction using Java, and
> [http://www.rabbitmq.com/extensions.html#publishing] for the AMQP details. It
> may be just what you want to know when wondering if your message is lost.
>
> Steve Powell (a happy bunny)
> ----------some more definitions from the SPD----------
> avoirdupois (phr.) 'Would you like peas with that?'
> distribute (v.) To denigrate an award ceremony.
> definite (phr.) 'It's hard of hearing, I think.'
> modest (n.) The most mod.
>
> On 13 Jan 2012, at 05:03, Venkat wrote:
>
> (You did start_app after the cluster command, didn't you??? :-))
>
> Hi Steve, I did restart the app.
> Following are the steps I have performed on both nodes:
>
> Starting the second node t-4:
> ./rabbitmq-server -detached
>
> Steps to join t-4 node to t-2:
> /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/rabbitmqctl stop_app
> /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/rabbitmqctl reset
> /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/rabbitmqctl cluster rabbit@t-2 rabbit@t-4
> Clustering node 'rabbit@t-4' with ['rabbit@t-2',
> 'rabbit@t-4'] ...
> ...done.
> /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/rabbitmqctl start_app
> Starting node 'rabbit@t-4' ...
> ...done.
>
> Running cluster_status on t-4 node:
> [ecloud@t-4 sbin]$ /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/rabbitmqctl cluster_status
> Cluster status of node 'rabbit@t-4' ...
> [{nodes,[{disc,['rabbit@t-4','rabbit@t-2']}]},
> {running_nodes,['rabbit@t-2','rabbit@t-4']}]
> ...done.
>
> Running cluster_status on t-2 node (to which t-4 is joined):
> [ecloud@t-2 vv]$ /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/rabbitmqctl cluster_status
> Cluster status of node 'rabbit@t-2' ...
> [{nodes,[{disc,['rabbit@t-4','rabbit@t-2']}]},
> {running_nodes,['rabbit@t-4','rabbit@t-2']}]
> ...done.
>
> --------------------------------------------------------------------------------
> I have been testing the HA feature with a different scenario.
> In my previous test the messages were pumped in through a SOAP service,
> which pumped messages at a slow rate.
> This time I used a test that pumps in messages by calling a plain Java
> service. I have also increased the number of messages from 20K to 40K.
> I am finding that messages are lost while being pumped into the queue.
> As you mentioned earlier, this could be due to connecting to a dead
> broker.
> I modified the producer code to wait 2 seconds and then
> set a fresh ConnectionFactory, as follows:
>
> @Override
> public void convertAndSend(final Object message) throws AmqpException {
>     MessageProperties props = null;
>     try {
>         props = new MessageProperties();
>         props.setDeliveryMode(MessageDeliveryMode.PERSISTENT); // setting delivery mode as PERSISTENT
>         send(getMessageConverter().toMessage(message, props));
>     } catch (AmqpException amqpe) {
>         System.out.println("Exception occurred while sending:" + amqpe.getMessage());
>         try {
>             Thread.sleep(2000);
>         } catch (InterruptedException e) {
>             e.printStackTrace();
>         }
>         Properties props1 = FrameworkServiceLocator.getInstance()
>                 .getCommonsConfigurationService(ServiceConstants.DMB_COMMONS_CONFIG_SERVICE)
>                 .getProperties(CommonsConfigurationConstants.RABBIT_MQ_CONFIG_NAME);
>         String rabbitMQUser = props1.getProperty(CommonsConfigurationConstants.RABBITMQ_USER);
>         String rabbitMQPassword = props1.getProperty(CommonsConfigurationConstants.RABBITMQ_PASSWORD);
>         String rabbitMQHost = props1.getProperty(CommonsConfigurationConstants.RABBITMQ_HOST);
>         String rabbitMQChannelCacheSize = props1.getProperty(CommonsConfigurationConstants.RABBITMQ_CHANNEL_CACHE_SIZE);
>         CachingConnectionFactory connectionFactory = new CachingConnectionFactory(rabbitMQHost);
>         connectionFactory.setChannelCacheSize(Integer.parseInt(rabbitMQChannelCacheSize));
>         connectionFactory.setUsername(rabbitMQUser);
>         connectionFactory.setPassword(rabbitMQPassword);
>         setConnectionFactory(connectionFactory);
>         try {
>             send(getMessageConverter().toMessage(message, props));
>         } catch (AmqpException e1) {
>             e1.printStackTrace();
>         }
>     }
> }
>
> After making this change, I saw the following exception once while
> sending 40K messages:
> java.net.SocketException: Broken pipe.
> I ran the test 10-15 times. Each time 5K-6K messages were lost,
> but this exception occurred only once.
>
> Thanks
> Venkat
>
> On Jan 11, 12:55 pm, Steve Powell <st... at rabbitmq.com> wrote:
> Hi Venkat,
>
> This time there were no messages lost. All 20K messages were
> processed.
>
> That's great.
>
> I'm trying to figure out what might be wrong with
> rabbitmqctl report; I'll get back to you.
>
> Meanwhile, running
> rabbitmqctl -n rabbit@t-2 status
> ON NODE t-4 might be interesting.
>
> Also, can you tell us the output from
> rabbitmqctl cluster_status
> on both nodes, please.
>
> It is not clear if you have issued the stop_app and start_app and
> reset/force_reset commands properly (you probably have), so could you follow
> the steps as described in the clustering guide, and issue
> rabbitmqctl cluster_status on both nodes after each cluster change?
> We should be able to see where things went wrong, then.
>
> (You did start_app after the cluster command, didn't you??? :-))
>
> Cheers,
>
> Steve Powell (a hoppy bunny)
> ----------some more definitions from the SPD----------
> avoirdupois (phr.) 'Would you like peas with that?'
> distribute (v.) To denigrate an award ceremony.
> definite (phr.) 'It's hard of hearing, I think.'
> modest (n.) The most mod.
>
> On 11 Jan 2012, at 01:22, Venkat wrote:> ...
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-disc... at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss