[rabbitmq-discuss] Cannot send message with STOMP

Fri Apr 27 14:12:45 BST 2012

Lionel,

> Without colon escaping, the header line "a:b:c" is ambiguous and can be
> parsed either as (name "a" & value "b:c") or (name "a:b" & value "c").

No. It's not ambiguous. It is the name "a" and value "b:c". Colons only
need to be escaped in the name, NOT in the value. The FIRST colon
delimits the name. That's it.
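
The first-colon rule described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the names
split_header and unescape_name are hypothetical, the escape table is the
STOMP 1.1 set (\n, \c, \\), and the value is returned untouched to show
the "escape colons in the name only" reading rather than what the 1.1
spec actually requires.

```python
# Hypothetical helpers; not taken from any STOMP library.
ESCAPES = {"\\\\": "\\", "\\c": ":", "\\n": "\n"}


def unescape_name(text: str) -> str:
    """Undo backslash escapes in a header name, rejecting unknown ones."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            pair = text[i:i + 2]
            if pair not in ESCAPES:
                raise ValueError(f"bad escape sequence in {text!r}")
            out.append(ESCAPES[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def split_header(line: str) -> tuple[str, str]:
    """Split a header line at the FIRST colon; later colons stay in the value."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"no colon in header line: {line!r}")
    # Assumption: only the name is unescaped; the value is passed through as-is.
    return unescape_name(name), value


print(split_header("a:b:c"))     # name "a", value "b:c"
print(split_header("a\\cb:c"))   # name "a:b", value "c"
```

With this reading, "a:b:c" has exactly one parse, and a colon in the name
is still expressible as "a\cb:c".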

> The reason behind this being in the spec is that the broker may need to
> understand (= decode) what is in the header, for instance for envelope
> based routing (e.g. a topic exchange or JMS-style message selectors).

If the broker (or anyone else) stipulates they need a header value, they
define how it is encoded. If a client needs to know what it "means" they
will also know how to decode it.  UTF-8 strings are fine, but why force
everyone to use them?

>> Well, my point is we had (almost) enough already and 1.1 gave us less.
>> How is any binary OCTET sequence worse than any UTF-8 text string?
> 
> Because it lacks information on how to decode/interpret it.

No, it doesn't. And saying it is UTF-8 doesn't tell anyone anything useful
anyway. That information is implicit -- like UTF-8 is implicit. If the
broker needs to pass headers to some other process/system it must
understand what format is required AND HOW THAT IS RENDERED IN UTF-8. Raw
binary would be a nightmare, as would IEEE floating point, or object
references, or anything other than a String.

I repeat: there is no conceivable advantage in this rule, and it makes
life harder for brokers and clients.

Steve Powell
steve at rabbitmq.com
[wrk: +44-2380-111-528] [mob: +44-7815-838-558]

On 27 Apr 2012, at 11:45, Lionel Cons wrote:

> Steve Powell writes:
>> Here is the problem: you are forcing us to escape colons. Why?
> 
> Without colon escaping, the header line "a:b:c" is ambiguous and can be
> parsed either as (name "a" & value "b:c") or (name "a:b" & value "c").
> 
> In STOMP 1.1, the line above is invalid and, depending on the exact header,
> it should appear as "a:b\cc" or "a\cb:c".
> 
>> I really don't understand this. 1.1 limits headers to UTF-8 sequences.
>> There is no conceivable benefit in doing this -- all binary sequences
>> could have been used (with escapes for a few) but you chose to restrict
>> it.
> 
> This came from the assumption that headers usually are a table of text
> strings (as opposed to binary strings). With Unicode, most text strings of
> most languages can be expressed. With UTF-8, a standard encoding is defined
> and it is backward compatible with US-ASCII, which many people use.
> 
> The reason behind this being in the spec is that the broker may need to
> understand (= decode) what is in the header, for instance for envelope based
> routing (e.g. a topic exchange or JMS-style message selectors). If the
> encoding is not defined, how do I know how to encode/decode the lowercase e
> with acute accent? Should it be 0xE9 (ISO-8859-1) or 0xC3A9 (UTF-8) or
> 0x00E9 (UTF-16)?
> 
> UTF-8 was not the only solution but it was felt to be standard, widely
> available, compact and, last but not least, backward compatible with
> US-ASCII. Another
> option would have been MIME-style encoding such as "=?ISO-8859-1?Q?a?="
> which is more flexible (allows per header entry encoding) but quite noisy
> and probably far less easy to implement than UTF-8. Maybe this will be
> reconsidered in a later spec.
> 
>> Well, my point is we had (almost) enough already and 1.1 gave us less.
>> How is any binary OCTET sequence worse than any UTF-8 text string?
> 
> Because it lacks information on how to decode/interpret it.
> 
> Cheers,
> 
> Lionel
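
The three byte sequences quoted above for the lowercase e with acute
accent can be checked with a short Python sketch. The helper name
encodings_of is hypothetical, and "utf-16-be" is chosen here as an
assumption so that no byte order mark is prepended.

```python
def encodings_of(ch: str) -> dict[str, str]:
    """Return the hex bytes of one character under three encodings."""
    codecs = ("iso-8859-1", "utf-8", "utf-16-be")
    return {codec: ch.encode(codec).hex() for codec in codecs}


E_ACUTE = "\u00e9"  # lowercase e with acute accent

for codec, hex_bytes in encodings_of(E_ACUTE).items():
    print(f"{codec:12} 0x{hex_bytes.upper()}")

# Decoding with the wrong assumption either garbles the text or fails.
utf8_bytes = E_ACUTE.encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")  # two characters instead of one
print(len(misread))

try:
    E_ACUTE.encode("iso-8859-1").decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The same octets are readable only if both sides agree on the encoding,
whether that agreement comes from the spec or from a convention between
broker and client.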