[rabbitmq-discuss] webmachine error
ceri at lshift.net
Fri Nov 8 11:40:53 GMT 2013
(08/11/13 11:26), Simon MacMullen wrote:
> On 08/11/2013 10:55AM, Ceri Storey wrote:
>> (08/11/13 08:27), Simon MacMullen wrote:
>>> The AMQP (and STOMP and MQTT) specs all say that names should be
>>> encoded in UTF-8 (and usually don't provide a way to specify an
>>> alternate encoding, so it's not obvious what an invalid UTF-8 byte
>>> sequence could be taken to mean).
>>> Unfortunately the broker does not enforce this. A future version
>>> probably will.
>> If we submit a patch for this, could we get it folded into a minor
>> release in the 3.2.x series? For example, a partial workaround would be
>> to percent-encode non-utf-8 queue names (and percents) before passing
>> them to the JSON machinery. I'd guess you'd be wary of changing the name
>> validation in a minor release, though.
> So there are two ways to fix this. Ultimately I think we want to do both:
> 1) Prevent AMQP (and STOMP and...) clients from sending badly-formed strings.
> 2) Deal with such strings in mgmt if they are already there.
> We need this belt-and-braces approach since there are quite a lot of
> ways badly-formed strings could get into the broker (indeed the OP was
> using the UDP exchange).
> We currently have a work in progress for 1) (which would not get
> released until 3.3.0 since it could break weird clients). I was
> planning on working on 2) at some point and releasing it in a bugfix release.
> But if you want to contribute 2) instead that would help it happen
> faster. I'm not convinced that percent-encoding is the right thing to
> do though; certainly I don't think it lets you get round-trip access
> to things with badly formed names. If you plan on un-encoding strings
> as they come in to let that happen then all clients need to know this
> is how we work, otherwise they could send a string which is malformed
> from the POV of percent-encoding. Suddenly we don't speak normal JSON
> / URLs any more (and URL bits get percent encoded twice!)
I'd like to imagine that a client would just send back the string from
the JSON with appropriate encoding; but then again, my years of dealing
with the web tell me that is wildly optimistic.
> If it only goes one way then that's more workable - but you still have
> the double-percent-encoding thing, which is not very loveable. How do
> you distinguish "myqueue%80%80" which is the result of a badly-formed
> name from "myqueue%80%80" which just happens to be called that?
You'd need to include '%' in the set of things that are escaped, so you
end up with "myqueue%2580%2580", which looks pretty fugly. But you're
right that there's a big risk of breaking things unexpectedly. As for
clients that consume the Management API, I know of Michael Klishin's
approach of using a plain HTTP client with string interpolation to
construct URLs. Do you know of any others?
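To make the escaping concrete, here is a minimal Python sketch of the idea
(not RabbitMQ code; the function names escape_name and unescape_name are
hypothetical). It assumes names arrive as raw bytes, percent-encodes every
byte that is not part of a valid UTF-8 sequence, and also escapes a literal
'%' so the two cases from the example above stay distinguishable.

```python
def escape_name(raw: bytes) -> str:
    """Percent-encode bytes that are not valid UTF-8, plus literal '%'."""
    # surrogateescape maps each undecodable byte 0x80..0xFF to U+DC80..U+DCFF,
    # so invalid bytes can be told apart from properly decoded characters.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "%":
            out.append("%25")
        elif 0xDC80 <= cp <= 0xDCFF:
            out.append("%{:02X}".format(cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)


def unescape_name(escaped: str) -> bytes:
    """Inverse of escape_name; raises ValueError on a malformed escape."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            out.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            out += escaped[i].encode("utf-8")
            i += 1
    return bytes(out)


badly_formed = escape_name(b"myqueue\x80\x80")   # "myqueue%80%80"
literal = escape_name(b"myqueue%80%80")          # "myqueue%2580%2580"
```

The round trip only works if every client knows to call the inverse, and a
client that sends a string with a stray '%' hits the ValueError path, which
is the compatibility risk raised above.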
> I would just swap in U+FFFD REPLACEMENT CHARACTER instead.
> Also bear in mind that this doesn't just happen with object names,
> basic.properties, arguments and so on can all contain badly-formed strings.
You're right; it's potentially quite a rabbit hole.
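For comparison, the U+FFFD substitution suggested above can be sketched in
Python as follows. This is only an illustration under the assumption that
names are held as raw bytes; display_name is a hypothetical helper, not part
of the management plugin.

```python
import json


def display_name(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for undecodable bytes."""
    return raw.decode("utf-8", errors="replace")


shown = display_name(b"myqueue\x80\x80")

# The result is an ordinary Unicode string, so it serialises as normal JSON.
payload = json.dumps({"name": shown})

# The substitution is one-way: distinct badly-formed names can collide.
collides = display_name(b"q\x80") == display_name(b"q\x81")
```

This keeps the output as plain JSON with no extra escaping convention for
clients to learn, at the cost of not being able to address a badly-formed
name unambiguously.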
> Apologies if any of the above is grandmother-sucking-eggs territory.
I honestly can't imagine that sucking eggs has anywhere near as many weird
corner cases as character encodings on the web :)
> Cheers, Simon