[rabbitmq-discuss] webmachine error
Simon MacMullen
simon at rabbitmq.com
Fri Nov 8 11:26:21 GMT 2013
On 08/11/2013 10:55AM, Ceri Storey wrote:
> (08/11/13 08:27), Simon MacMullen wrote:
>> The AMQP (and STOMP and MQTT) specs all say that names should be
>> encoded in UTF-8 (and usually don't provide a way to specify an
>> alternate encoding, so it's not obvious what an invalid UTF-8 byte
>> sequence could be taken to mean).
>>
>> Unfortunately the broker does not enforce this. A future version
>> probably will.
> If we submit a patch for this, could we get it folded into a minor
> release in the 3.2.x series? For example, a partial workaround would be
> to percent-encode non-utf-8 queue names (and percents) before passing
> them to the JSON machinery. I'd guess you'd be wary of changing the name
> validation in a minor release, though.
So there are two ways to fix this. Ultimately I think we want to do both:
1) Prevent AMQP (and STOMP and...) clients from sending badly-formed
strings.
2) Deal with such strings in mgmt if they are already there.
We need this belt-and-braces approach since there are quite a lot of
ways badly-formed strings could get into the broker (indeed the OP was
using the UDP exchange).
We currently have a work in progress for 1) (which would not get
released until 3.3.0 since it could break weird clients). I was planning
on working on 2) at some point and releasing it in a bugfix release.
But if you want to contribute 2) instead that would help it happen
faster. I'm not convinced that percent-encoding is the right thing to do
though; certainly I don't think it lets you get round-trip access to
things with badly formed names. If you plan on un-encoding strings as
they come in to let that happen then all clients need to know this is
how we work, otherwise they could send a string which is malformed from
the POV of percent-encoding. Suddenly we don't speak normal JSON / URLs
any more (and URL bits get percent-encoded twice!).
If it only goes one way then that's more workable - but you still have
the double-percent-encoding thing, which is not very loveable. How do
you distinguish "myqueue%80%80" which is the result of a badly-formed
name from "myqueue%80%80" which just happens to be called that?
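To make the ambiguity concrete, here is a minimal Python sketch. The helpers naive_escape and escape_with_percent are hypothetical names made up for illustration; neither is part of RabbitMQ or the management plugin. The first escapes only the invalid bytes, so two different names collide. The second also escapes literal percents, which removes the collision but doubles up as soon as the name goes into a URL.

```python
from urllib.parse import quote


def naive_escape(name: bytes) -> str:
    """Hypothetical one-way escape: bytes that are not valid UTF-8 become %XX."""
    # surrogateescape maps each undecodable byte 0x80..0xFF to U+DC80..U+DCFF.
    text = name.decode("utf-8", errors="surrogateescape")
    return "".join(
        "%{:02X}".format(ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )


def escape_with_percent(name: bytes) -> str:
    """Hypothetical variant that escapes literal percents first."""
    return naive_escape(name.replace(b"%", b"%25"))


bad = b"myqueue\x80\x80"     # badly-formed name
literal = b"myqueue%80%80"   # well-formed name that happens to contain percents

# Without escaping percents, the two names are indistinguishable.
assert naive_escape(bad) == naive_escape(literal) == "myqueue%80%80"

# Escaping percents separates them...
assert escape_with_percent(bad) == "myqueue%80%80"
assert escape_with_percent(literal) == "myqueue%2580%2580"

# ...but the name is percent-encoded again when it is placed in a URL path.
assert quote(escape_with_percent(literal), safe="") == "myqueue%252580%252580"
```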
I would just swap in U+FFFD REPLACEMENT CHARACTER instead.
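As a rough illustration of the replacement idea (in Python rather than the broker's Erlang, and not the actual mgmt code), the standard decoder can already do this via errors="replace". The function name to_display_name is an assumption made up for this sketch.

```python
import json


def to_display_name(raw: bytes) -> str:
    """Decode a name as UTF-8, swapping U+FFFD in for anything malformed."""
    return raw.decode("utf-8", errors="replace")


name = to_display_name(b"myqueue\x80\x80")

# Each stray continuation byte becomes one REPLACEMENT CHARACTER.
assert name == "myqueue\ufffd\ufffd"

# The result is an ordinary string, so it survives JSON encoding unchanged.
assert json.loads(json.dumps({"name": name}))["name"] == name

# Well-formed names pass through untouched.
assert to_display_name("caf\u00e9".encode("utf-8")) == "caf\u00e9"
```

This is lossy by design: different malformed names can map to the same output, so it is only good for display, not for round-tripping.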
Also bear in mind that this doesn't just happen with object names:
basic.properties, arguments and so on can all contain badly-formed strings.
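Because the bad bytes can sit anywhere in a nested structure, any fix has to walk the whole term before it is serialised. A hedged Python sketch of that walk follows; the function scrub and the shape of the sample data are assumptions for illustration, not the real layout of basic.properties or queue arguments.

```python
import json
from typing import Any


def scrub(value: Any) -> Any:
    """Recursively replace malformed UTF-8 in byte strings with U+FFFD."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        # Keys can be malformed too, so they are scrubbed as well.
        return {scrub(k): scrub(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [scrub(v) for v in value]
    # Numbers, booleans, None and already-decoded strings are left alone.
    return value


# Made-up sample: a header key and a list element both contain invalid bytes.
sample = {"properties": {"headers": {b"x-\xff": [b"ok", b"bad\x80", 7]}}}
clean = scrub(sample)

assert clean == {"properties": {"headers": {"x-\ufffd": ["ok", "bad\ufffd", 7]}}}
json.dumps(clean)  # serialises without raising
```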
Apologies if any of the above is grandmother-sucking-eggs territory.
Cheers, Simon