[rabbitmq-discuss] Having some issues with RabbitMQ

Sun Jul 18 01:31:45 BST 2010

Thanks Matthew, comments inline.

On Jul 17, 2010, at 4:14 PM, Matthew Sackman wrote:

> Hi Christian,
> 
> On Sat, Jul 17, 2010 at 02:54:11PM -0700, Christian Legnitto wrote:
>> http://bit.ly/9P3IEX
> 
> I replied on your blog with (mostly) the following, but it's easier to
> keep the discussion on the mailing list.
> 
> The new persister branch is regularly merged into from the current
> default branch. Whilst it is correct to say it is currently based off
> the 1.8 release, that’s only true in the sense of the above. The new
> persister stores messages in a completely different format from the old,
> and there is currently no tool to allow upgrading from a released
> version of Rabbit to the new persister without losing persistent
> messages.
> 
> The issue you ran into when going 1.7 to 1.8 is subtly different. Whilst
> both use the old persister, both the on-disk format of the messages when
> they are written to disk, and a database schema changed, again,
> resulting in no state-maintaining upgrade path. To date we have never
> produced a tool which can do upgrades maintaining state when database
> schema or on-disk formats have changed.

Right, I understand this. I eventually made sure to delete the entire /var/db/rabbitmq (?) directory when changing versions...I didn't care about the messages in there anyway. I'm saying that even after clearing out the data, it didn't seem to be using the new persister (or, the broker behavior looked the same).

> You talk about users creating queues. I think that what you want is for
> all users to use queues which have server-generated names, thus you
> guarantee they are private, and you want to declare them “exclusive”,
> which means that when the connection that created the queue disappears,
> the queue itself (and any bindings to the queue) also automatically get
> deleted.
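A toy, in-memory model of the queue lifecycle described above may make it concrete. It is only a sketch under stated assumptions, not how RabbitMQ is implemented: `ToyBroker`, `declare_queue`, `bind`, `close_connection`, the "updates" exchange and the `gen-N` naming are all hypothetical, and a real server chooses its own unguessable names.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Hypothetical in-memory model of exclusive, server-named queues (not RabbitMQ code)."""
    queues: dict = field(default_factory=dict)    # queue name -> owning connection, or None
    bindings: set = field(default_factory=set)    # {(exchange, queue), ...}
    _ids: itertools.count = field(default_factory=itertools.count)

    def declare_queue(self, conn, name="", exclusive=False):
        # An empty name asks the "server" to choose one; "gen-N" is a made-up format.
        if not name:
            name = f"gen-{next(self._ids)}"
        self.queues[name] = conn if exclusive else None
        return name

    def bind(self, exchange, queue):
        self.bindings.add((exchange, queue))

    def close_connection(self, conn):
        # Exclusive queues owned by this connection go away, along with their bindings.
        doomed = {name for name, owner in self.queues.items() if owner == conn}
        for name in doomed:
            del self.queues[name]
        self.bindings = {(ex, q) for ex, q in self.bindings if q not in doomed}


broker = ToyBroker()
q1 = broker.declare_queue("connection1", exclusive=True)
broker.bind("updates", q1)
broker.close_connection("connection1")
print(q1 in broker.queues, broker.bindings)   # False set()
```

Non-exclusive queues (owner `None`) are left alone by `close_connection`, which is the distinction the rest of this thread turns on.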
> 
> I quote this text from
> http://www.rabbitmq.com/admin-guide.html#access-control:
> 
> “Some AMQP operations can create resources with server-generated names.
> Every user has configure, write and read permissions for such resources.
> However, the names are strong and not discoverable as part of the
> protocol, only through management functionality. Therefore these
> resources are in effect private to the user unless they choose to
> divulge their names to other users.”
> 
> Thus I think that if you force users to create server-named queues, you
> don’t need to grant any write privileges to your public user. It’ll need
> read access to the exchange to create the binding, and it should
> automatically have write access to create the binding to the private
> queues. If this doesn’t work please let us know.
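The permission check described above can be sketched as follows. This assumes the operation-to-permission table as we read it in the access-control section of the admin guide (verify it against the version actually deployed), and it assumes the permission regexes are matched as an unanchored search, so patterns should be anchored with `^` and `$`. `REQUIRED`, `Permissions`, `allowed`, the "public" user and the `gen-1` queue name are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed operation -> [(permission, resource kind)] table; check it against the admin guide.
REQUIRED = {
    "queue.declare": [("configure", "queue")],
    "queue.bind": [("write", "queue"), ("read", "exchange")],
    "basic.publish": [("write", "exchange")],
    "basic.consume": [("read", "queue")],
}


@dataclass
class Permissions:
    configure: str  # each field is a regex over resource names
    write: str
    read: str


def allowed(perms, op, queue="", exchange="", server_named=frozenset()):
    """Return True if every permission the operation needs is granted (toy model)."""
    names = {"queue": queue, "exchange": exchange}
    for perm, kind in REQUIRED[op]:
        name = names[kind]
        if name in server_named:
            continue  # server-generated names: every user has all three permissions
        if re.search(getattr(perms, perm), name) is None:
            return False
    return True


# A hypothetical "public" user: may read anything, configure and write nothing by name.
public = Permissions(configure="^$", write="^$", read=".*")
private = frozenset({"gen-1"})  # stands in for a server-named queue
print(allowed(public, "queue.bind", queue="gen-1", exchange="updates", server_named=private))  # True
print(allowed(public, "basic.publish", exchange="updates"))  # False
```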

I saw that and I'm not sure that is what I want. There will be users creating queues with scripts running locally on laptops and such. With your suggestion I believe this would happen:

	1. User has a local script "foo.py" running on a laptop, which connects ("connection1"), gets a queue with a server-defined name (call it "queue1") and reads messages
	2. User goes to a meeting and the laptop sleeps, causing RabbitMQ to close connection1
	3. User comes back, foo.py is running but has thrown an exception (or perhaps is listening on a dead connection, not sure what happens here)
	4. User restarts foo.py, getting a new connection ("connection2") to a new server-generated queue ("queue2")
	5. All messages sent between the closing of connection1 and creation of connection2 never make it into queue2, so foo.py possibly missed a bunch of messages
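Steps 1 to 5 above can be simulated with a toy fanout exchange that copies each message only to the queues bound at the moment it is published. This is a sketch, not client code: `ToyFanout` and the message names are hypothetical, and it assumes no other queue is bound to the exchange. In that case, unless the publisher sets the AMQP `mandatory` flag, an unroutable message is simply dropped.

```python
from collections import deque


class ToyFanout:
    """Hypothetical fanout exchange: copies a message to every queue bound at publish time."""

    def __init__(self):
        self.bound = {}  # queue name -> deque of messages

    def bind(self, queue):
        self.bound[queue] = deque()

    def unbind(self, queue):
        # Models an exclusive queue (and its binding) being deleted on disconnect.
        self.bound.pop(queue, None)

    def publish(self, msg):
        # With nothing bound, the loop body never runs and the message goes nowhere.
        for q in self.bound.values():
            q.append(msg)


ex = ToyFanout()
ex.bind("queue1")        # step 1: foo.py connects with connection1
ex.publish("m1")
ex.unbind("queue1")      # step 2: laptop sleeps, connection1 is closed
ex.publish("m2")         # published while nothing is listening
ex.publish("m3")
ex.bind("queue2")        # step 4: foo.py restarts with a fresh server-named queue
ex.publish("m4")
received = list(ex.bound["queue2"])
print(received)          # ['m4']: m2 and m3 were never queued anywhere
```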

If I am mistaken, please correct me... this is all new :-)

What I want:

	1. User has a local script "foo.py" running on a laptop, which connects ("connection1"), gets a queue with a script-defined unique name (call it "queue1") and reads messages
	2. User goes to a meeting and the laptop sleeps, causing RabbitMQ to close connection1
	3. User comes back, foo.py is running but has thrown an exception (or perhaps is listening on a dead connection, not sure what happens here)
	4. User restarts foo.py, getting a new connection ("connection2") and connecting to the existing queue1, which has been queueing messages while connection1 has been closed
	5. foo.py can choose to empty queue1 before processing (if it is the sort of script that doesn't maintain state) or can choose to process the "old" messages
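The desired flow can be modelled the same way with a named queue that is neither exclusive nor auto-delete, so it outlives connection1. Again this is a hypothetical toy: `ToyBroker`, `declare`, `purge` and the queue name "alice.foo.queue1" are invented for illustration. In AMQP itself, the counterpart of `purge()` here is the `queue.purge` method.

```python
from collections import deque


class ToyBroker:
    """Hypothetical model: named, non-exclusive queues survive their connection."""

    def __init__(self):
        self.queues = {}  # queue name -> deque of undelivered messages

    def declare(self, name):
        # Redeclaring an existing name returns the same queue (same attributes assumed).
        return self.queues.setdefault(name, deque())

    def publish(self, msg):
        # Fanout-style: every declared queue gets a copy.
        for q in self.queues.values():
            q.append(msg)

    def purge(self, name):
        # Drop everything waiting in the queue and report how many messages went.
        count = len(self.queues[name])
        self.queues[name].clear()
        return count


broker = ToyBroker()
q = broker.declare("alice.foo.queue1")        # step 1
broker.publish("m1")
first = q.popleft()                           # foo.py reads m1
# Step 2: connection1 drops; nothing happens to the queue itself.
broker.publish("m2")
broker.publish("m3")
q = broker.declare("alice.foo.queue1")        # step 4: same name, same queue
backlog = list(q)                             # step 5: process the old messages...
dropped = broker.purge("alice.foo.queue1")    # ...or discard them instead
print(first, backlog, dropped)                # m1 ['m2', 'm3'] 2
```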

So, if the server takes care of creating the queue, there is no way for the client to reconnect to it when it comes back (and no queue will be there anyway, as the server has cleared it). Creating a named queue takes write permission, correct? If so, that means my public user could send (possibly nefarious) messages in for others to consume, correct? Is this use case a non-starter with the current permissions system? Like I said, this appeared to work with 1.7 with read-only access for the public user.

> The 1.8 semantic changes concern what happens when you *re*declare a
> queue. Previously, if the queue already existed and you redeclare it,
> but with different attributes, it would still come back with an OK
> result. This is misleading because it could lead the user to think that
> a queue had been created with the specified attributes when in fact it
> has not. Thus now, you must ensure you redeclare with the same
> attributes as created the queue otherwise the redeclaration will fail
> and close the channel. Full details can be found in the lower half of
> http://lists.rabbitmq.com/pipermail/rabbitmq-announce/2010-June/000025.html

Ok, I was aware of this because I had hit it in testing :-). I guess before, it would silently use the previous/first declaration, and now it returns an error? This is fine, as in my use case there are no changes between connection1 and connection2 above (of course, they could change the attributes and name it something else if needed).
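The 1.8 redeclaration rule Matthew describes amounts to an equivalence check, which can be sketched like this. It is a toy model: `QueueAttrs` covers only three of the declare flags, and `PreconditionFailed` is a hypothetical stand-in for the channel-closing error the broker actually sends.

```python
from dataclasses import dataclass


class PreconditionFailed(Exception):
    """Hypothetical stand-in for the error that closes the channel."""


@dataclass(frozen=True)
class QueueAttrs:
    durable: bool = False
    exclusive: bool = False
    auto_delete: bool = False


class ToyBroker:
    def __init__(self):
        self.queues = {}  # queue name -> attributes it was first declared with

    def declare(self, name, attrs):
        existing = self.queues.setdefault(name, attrs)
        if existing != attrs:
            # Since 1.8: a redeclare with different attributes fails instead of
            # silently returning OK for the original queue.
            raise PreconditionFailed(f"{name!r}: have {existing}, asked for {attrs}")
        return name


broker = ToyBroker()
broker.declare("queue1", QueueAttrs(durable=True))
broker.declare("queue1", QueueAttrs(durable=True))   # identical attributes: fine
try:
    broker.declare("queue1", QueueAttrs(durable=False))
except PreconditionFailed as err:
    print("redeclare failed:", err)
```

Because connection1 and connection2 declare "queue1" with identical attributes, the second declaration passes the check.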

> 
> I am very curious about you managing to get the new persister to crash.
> Could you provide the rabbit logs (or the end of the logs) which should
> show some sort of stack trace? The new persister just *should not*
> crash.

It doesn't actually crash (bad choice of words on my part). The whole server is hung from the standpoint of any client I can use (python mainly). The BQL client won't even connect or give me a prompt at that point, it just hangs. amqp-utils (ruby, but may use the same lib as python) hangs and doesn't let me do anything. I end up stopping the server, clearing the data directory, and restarting it (which clearly wouldn't work in production). FWIW I got the same behavior with the old persister, which is why I thought I perhaps wasn't turning the new one on even though I am using the branch.

> One thing that might be happening is that Rabbit is raising flow
> control, to request that publishers stop sending further messages to
> Rabbit – even with the new persister this can happen sometimes to allow
> disks to catch up, but this tends to only be necessary at high data
> rates. The client must respond with a flow_ok message to the broker to
> confirm it understands the flow control, and it must then not send any
> further messages – this is usually handled by the AMQP client library as
> it just makes any subsequent publishes block – until it receives a
> further flow control message from the broker, informing it that it can
> resume. Now I notice you’re using a python client, and they have
> historically not supported flow control, which can lead to Rabbit
> forcibly disconnecting clients that do not respond appropriately to the
> flow control messages. This may crash a badly written client, but it
> should not crash the broker itself.

Yeah, I saw that flow control is generally not supported by the python libs (though I see http://gist.github.com/399282), but I'm not sure I would have hit it with ~30 msgs per second going to 10 queues.
The fact that the BQL plugin stopped working was suspect. I'm not sure how that's written, but I assumed it used the Erlang client and would allow me to clear queues and get everything unblocked. So even though it isn't a crash, the behavior is the same... I can't read anything, publish anything, or clear queues to unblock, via carrot (Python), amqp-utils (Ruby), and the BQL plugin (Erlang, I guess). Perhaps all the libs I am using to interact with the broker barf with flow control?
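Matthew's description of what a client owes the broker can be sketched on the client side: stop publishing when flow is switched off, answer with flow_ok, and resume when it is switched back on. `FlowControlledChannel`, `on_flow` and `publish` are hypothetical names, not the API of carrot, amqp-utils or any other library. The timeout only keeps the demo from hanging.

```python
import threading


class FlowControlledChannel:
    """Hypothetical client-side channel honouring flow control (not a real AMQP client)."""

    def __init__(self):
        self._may_publish = threading.Event()
        self._may_publish.set()
        self.sent = []      # messages actually handed to the "broker"
        self.replies = []   # flow_ok replies sent back to the broker

    def on_flow(self, active):
        # Broker asked us to stop (False) or resume (True): act first, then confirm.
        if active:
            self._may_publish.set()
        else:
            self._may_publish.clear()
        self.replies.append(("flow_ok", active))

    def publish(self, msg, timeout=None):
        # Block while flow is off; give up and return False if the timeout expires.
        # Simplification: a flow change arriving between wait() and append() is not guarded.
        if not self._may_publish.wait(timeout):
            return False
        self.sent.append(msg)
        return True


ch = FlowControlledChannel()
ch.publish("m1")
ch.on_flow(False)                                    # broker raises flow control
sent_while_paused = ch.publish("m2", timeout=0.05)   # blocks, then times out
timer = threading.Timer(0.05, ch.on_flow, args=(True,))  # broker lifts it shortly
timer.start()
resumed = ch.publish("m3", timeout=2)                # blocks until flow resumes
timer.join()
print(sent_while_paused, resumed, ch.sent)           # False True ['m1', 'm3']
```

A client that instead keeps publishing and never sends flow_ok is the case Matthew warns may be forcibly disconnected.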

I also notice the status plugin says "memory (used/available) = 1498MB / 810MB" with the new persister...is that expected? I thought that it would always stay under the max and just flush to disk. Is my VM too wimpy?

> You have a very interesting use case, and there is absolutely nothing
> about it that shouldn’t work perfectly well with RabbitMQ.

Whew, that's good to hear. There may be some administration-type stuff that moves us off Rabbit, but I am enjoying working with it on the prototype system in the meantime and would love to get it solid for us.

Thanks again for the quick and detailed reply!

Christian