[rabbitmq-discuss] webmachine error

Fri Nov 8 11:26:21 GMT 2013

On 08/11/2013 10:55AM, Ceri Storey wrote:
> (08/11/13 08:27), Simon MacMullen wrote:
>> The AMQP (and STOMP and MQTT) specs all say that names should be
>> encoded in UTF-8 (and usually don't provide a way to specify an
>> alternate encoding, so it's not obvious what an invalid UTF-8 byte
>> sequence could be taken to mean).
>>
>> Unfortunately the broker does not enforce this. A future version
>> probably will.
> If we submit a patch for this, could we get it folded into a minor
> release in the 3.2.x series? For example, a partial workaround would be
> to percent-encode non-utf-8 queue names (and percents) before passing
> them to the JSON machinery. I'd guess you'd be wary of changing the name
> validation in a minor release, though.

So there are two ways to fix this. Ultimately I think we want to do both:

1) Prevent AMQP (and STOMP and...) clients from sending badly-formed 
strings.

2) Deal with such strings in mgmt if they are already there.

We need this belt-and-braces approach since there are quite a lot of 
ways badly-formed strings could get into the broker (indeed the OP was 
using the UDP exchange).

We currently have a work in progress for 1) (which would not get 
released until 3.3.0 since it could break weird clients). I was planning 
on working on 2) at some point and releasing it in a bugfix release.

But if you want to contribute 2) instead that would help it happen 
faster. I'm not convinced that percent-encoding is the right thing to do 
though; certainly I don't think it lets you get round-trip access to 
things with badly formed names. If you plan on un-encoding strings as 
they come in to let that happen then all clients need to know this is 
how we work, otherwise they could send a string which is malformed from 
the POV of percent-encoding. Suddenly we don't speak normal JSON / URLs 
any more (and URL bits get percent encoded twice!)

If it only goes one way then that's more workable - but you still have 
the double-percent-encoding thing, which is not very loveable. How do 
you distinguish "myqueue%80%80" which is the result of a badly-formed 
name from "myqueue%80%80" which just happens to be called that?
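
To make the ambiguity concrete, here is a rough Python sketch (not
anything from the broker or mgmt code; percent_encode_invalid and its
escape_percent flag are names made up for illustration). It keeps
well-formed UTF-8 as-is and percent-encodes only the bytes that do not
decode:

```python
def percent_encode_invalid(raw: bytes, escape_percent: bool = False) -> str:
    """Hypothetical one-way encoder for display purposes only."""
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN,
    # so the bad bytes can be found again after decoding.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # An undecodable byte: emit it as %XX.
            out.append("%{:02X}".format(code - 0xDC00))
        elif ch == "%" and escape_percent:
            # Optionally escape literal percents as well.
            out.append("%25")
        else:
            out.append(ch)
    return "".join(out)


malformed = b"myqueue\x80\x80"   # two stray continuation bytes
literal = b"myqueue%80%80"       # a queue really called that

# Without escaping percents the two names collide.
assert percent_encode_invalid(malformed) == percent_encode_invalid(literal)

# Escaping percents too keeps them apart, but the literal name now
# carries %25, which gets percent-encoded again inside a URL.
assert percent_encode_invalid(literal, escape_percent=True) == "myqueue%2580%2580"
```

So escaping the percents as well does disambiguate, at the price of
the double encoding.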

I would just swap in U+FFFD REPLACEMENT CHARACTER instead.
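
As a minimal sketch of what that means (in Python rather than Erlang,
purely for illustration; display_name is a made-up helper, not a mgmt
function), decoding with the "replace" error handler substitutes
U+FFFD for anything that is not well-formed UTF-8:

```python
import json


def display_name(raw: bytes) -> str:
    """Hypothetical helper: make a name safe to hand to a JSON encoder."""
    # Bytes that do not decode as UTF-8 become U+FFFD REPLACEMENT CHARACTER.
    return raw.decode("utf-8", errors="replace")


name = display_name(b"myqueue\x80\x80")
assert name == "myqueue\ufffd\ufffd"

# The result is an ordinary string, so it survives a JSON round trip.
assert json.loads(json.dumps({"name": name})) == {"name": name}

# The mapping is lossy: different malformed names can look identical.
assert display_name(b"q\x80") == display_name(b"q\x81")
```

This is display-only: it gives no round-trip access to the original
bytes, but it also does not change what a well-formed name means.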

Also bear in mind that this doesn't just happen with object names: 
basic.properties, arguments and so on can all contain badly-formed strings.
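
In other words any fix has to walk whole nested structures, not just
the name field. A rough Python sketch of the idea, assuming the data
has already been turned into dicts, lists and byte strings (sanitise
and the sample structure are invented for illustration):

```python
import json


def sanitise(value):
    """Hypothetical recursive clean-up of byte strings before JSON encoding."""
    if isinstance(value, bytes):
        # Badly-formed bytes become U+FFFD REPLACEMENT CHARACTER.
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        # Keys can be badly formed too, not only values.
        return {sanitise(k): sanitise(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitise(v) for v in value]
    # Numbers, booleans, None and real strings pass through untouched.
    return value


queue = {
    "name": b"q\x80",
    "arguments": {b"x-\xff": [b"ok", 1, b"\xc3"]},
}

clean = sanitise(queue)
assert clean == {
    "name": "q\ufffd",
    "arguments": {"x-\ufffd": ["ok", 1, "\ufffd"]},
}

# Everything left is plain text or numbers, so the JSON encoder accepts it.
json.dumps(clean)
```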

Apologies if any of the above is grandmother-sucking-eggs territory.

Cheers, Simon