Discussion:
[xdg-email] Percent encoding problems
Michael Bäuerle
2014-08-20 13:46:03 UTC
The reported version number is 1.0.2, the source package is named
1.1.0-rc1 for its toplevel directory.

The first thing is only a side note:
The 'xdg-email' script requests '/bin/sh' as its interpreter, so
according to [1] it should do the comparison with "=" instead
of "==" to be portable.
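
As a minimal sketch of the portable form (the function name is_yes is hypothetical and not taken from xdg-email), a string comparison under /bin/sh would use a single "=":

```shell
#!/bin/sh
# is_yes VALUE: succeed if VALUE is exactly the string "yes".
# Hypothetical helper for illustration only.
is_yes()
{
    # POSIX test(1) specifies "=" for string equality; "==" is an
    # extension that some /bin/sh implementations reject.
    [ "$1" = "yes" ]
}

if is_yes "yes"; then
    echo "matched"
fi
```

Shells such as bash accept "==" inside [ ], which is why the non-portable form can go unnoticed until the script runs under a stricter /bin/sh.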

The real problem is the handling of e-mail addresses:
If the address contains a '-', it is percent-encoded (even though this
is not required). But if it contains a '?', this stays a literal '?'
(which is not allowed inside a "mailto:" URI according to [2]).

The problem is that the '-' character is used literally inside a regex
bracket expression (where it has the meaning of a range, like in "a-z"),
look at [3] (Paragraph 7) for the syntax definition.
The resulting range spans over the '?' character and prevents its
percent encoding.

The suggested patches are attached. The '-' is escaped with a backslash.
Maybe the solution from [3] to put it at the beginning or the end is
cleaner but less obvious.
In [3] there is also a note that ranges are only guaranteed to work as
expected in the POSIX locale. Therefore the patch sets the locale to
POSIX before starting awk.
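
The range effect can be reproduced in isolation. The sketch below is only an illustration: the helper in_class and both bracket expressions are assumptions, not copied from xdg-email. In the first expression ".-_" is read as a range from '.' (0x2E) to '_' (0x5F), which contains '?' (0x3F) but not '-' (0x2D). In the second, '-' is placed last and is therefore a literal.

```shell
#!/bin/sh
# in_class CHAR REGEX: print "yes" if CHAR matches the awk regex REGEX,
# otherwise print "no". The locale is forced to C so that ranges follow
# byte order.
in_class()
{
    printf '%s\n' "$1" | LC_ALL=C awk -v re="$2" '{
        if ($0 ~ re) print "yes"; else print "no"
    }'
}

# "-" in the middle of the bracket expression: ".-_" is a range.
in_class '?' '^[a-zA-Z0-9.-_]$'    # prints "yes" ('?' wrongly treated as safe)
in_class '-' '^[a-zA-Z0-9.-_]$'    # prints "no"  ('-' would be encoded)

# "-" at the end of the bracket expression: it is a literal.
in_class '?' '^[a-zA-Z0-9._-]$'    # prints "no"
in_class '-' '^[a-zA-Z0-9._-]$'    # prints "yes"
```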


[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html
[2] https://tools.ietf.org/html/rfc2368#section-2
[3] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
Rex Dieter
2014-08-23 14:10:24 UTC
Next time, please file a bug.

For now, I think all of these issues are already fixed in 1.1.0-rc2 (and
latest git), except for escaping the '-' character, which I will commit
shortly.

Thanks.

-- Rex
Rex Dieter
2014-08-23 14:16:49 UTC
Post by Michael Bäuerle
The problem is that the '-' character is used literally inside a regex
bracket expression (where it has the meaning of a range, like in "a-z"),
look at [3] (Paragraph 7) for the syntax definition.
The resulting range spans over the '?' character and prevents its
percent encoding.
Testing myself, I cannot reproduce the problem you describe. Can you give a
reproducible example?

I tried:

xdg-email rdieter-***@bar.com

and my email client opens correctly (with rdieter-***@bar.com as expected,
instead of something percent-encoded as you suggested would happen).

Fwiw, the same happens for me whether I include your suggested fix or
not.

-- Rex
Rex Dieter
2014-08-23 16:30:14 UTC
Post by Rex Dieter
Post by Michael Bäuerle
The problem is that the '-' character is used literally inside a regex
bracket expression (where it has the meaning of a range, like in "a-z"),
look at [3] (Paragraph 7) for the syntax definition.
The resulting range spans over the '?' character and prevents its
percent encoding.
Testing myself, I cannot reproduce the problem you describe. Can you give
a reproducible example?
instead of something percent-encoded as you suggested would happen).
Fwiw, it happens the same for me whether I include your suggested fix or
not.
My mail client testing this (thunderbird) was handling the percent-encoded
input for me :-/

After adding some extra debugging I do see that unpatched code was passing
on:
mailto:rdieter%***@bar.com


Which mail client(s) did you use that didn't handle this?
Michael Bäuerle
2014-08-25 08:34:47 UTC
Post by Rex Dieter
Which mail client(s) did you use that didn't handle this?
I got a report from a user for whom the "-" doesn't work with Thunderbird
on an Ubuntu system. Don't ask me why Thunderbird doesn't like it. And
yes, your example above is valid, and it is Thunderbird's fault for
not decoding it correctly.

The reason why I reported the problem is that it's not the intent of
xdg-email to encode the '-' (the backslash is missing). Letting awk
interpret the '-' as a range operator breaks the encoding of some
characters inside the range that are not allowed to appear literally.
And this is a bug in xdg-email, see below.

I use claws-mail without a desktop environment. Therefore I have created
a 'xdg-email-hook.sh' script with the following content:
----------------------------------------------------------------------
#! /bin/sh
# Print the URI received from xdg-email, then hand it to the mail client.
echo "xdg-email-hook: $1"
claws-mail "$1"
----------------------------------------------------------------------

The echo shows me (without the backslash-patch):
----------------------------------------------------------------------
$ xdg-email --utf8 'rdieter-***@bar.com'
xdg-email-hook: mailto:rdieter%***@bar.com
$ xdg-email --utf8 'rdieter?***@bar.com'
xdg-email-hook: mailto:rdieter?***@bar.com
----------------------------------------------------------------------
The first example is valid (as stated above), but the second is
invalid. Both ('-' and '?') are allowed to be part of the local-part of
an addr-spec according to RFC5322. But a literal '?' is not allowed to
be part of a URI (with this meaning) according to RFC2368.
As a result claws-mail correctly truncates the address and shows
"rdieter" in the address field instead of "rdieter?***@bar.com".

With the backslash-patch it works as expected (and IMHO as intended by
the authors of xdg-email, because the '-' doesn't need percent encoding):
---------------------------------------------------------------------
$ xdg-email --utf8 'rdieter-***@bar.com'
xdg-email-hook: mailto:rdieter-***@bar.com
$ xdg-email --utf8 'rdieter?***@bar.com'
xdg-email-hook: mailto:rdieter%***@bar.com
---------------------------------------------------------------------
That the '-' now works for the noted Thunderbird user too is a nice
side effect.
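
The intended behaviour can be sketched as a small stand-alone encoder. This is an illustration under assumptions, not the code from xdg-email or from the patch: the function name encode_addr, the set of characters left unencoded, and the example addresses are all hypothetical. It places the '-' at the end of the bracket expression (the alternative mentioned in [3]) and runs awk in the POSIX locale.

```shell
#!/bin/sh
# encode_addr ADDRESS: percent-encode every byte outside a small safe set.
# Hypothetical helper; the safe set is an assumption for illustration.
encode_addr()
{
    printf '%s\n' "$1" | LC_ALL=C awk '
    BEGIN {
        # Map each single-byte character to its numeric code.
        for (i = 1; i <= 255; i++) ord[sprintf("%c", i)] = i
    }
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            # "-" is listed last, so it is a literal and not a range.
            if (c ~ /^[@A-Za-z0-9._-]$/)
                out = out c
            else
                out = out sprintf("%%%02X", ord[c])
        }
        print out
    }'
}

encode_addr 'user-name@example.com'    # prints user-name@example.com
encode_addr 'user?name@example.com'    # prints user%3Fname@example.com
```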

This example shows why the '?' is reserved by RFC2368 and must be
percent encoded:
---------------------------------------------------------------------
$ xdg-email --utf8 --subject 'test' 'rdieter?***@bar.com'
xdg-email-hook: mailto:rdieter%***@bar.com?subject=test
---------------------------------------------------------------------
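
The truncation seen earlier follows from the role of '?' as the separator between the recipient and the header fields. A simplified model of that split is sketched below; the function mailto_recipient and the example URIs are hypothetical, and this is not how claws-mail is actually implemented.

```shell
#!/bin/sh
# mailto_recipient URI: print the part between "mailto:" and the first "?".
# Simplified model of how a client could separate the recipient from the
# header fields; hypothetical helper for illustration only.
mailto_recipient()
{
    rest=${1#mailto:}        # drop the scheme prefix
    addr=${rest%%[?]*}       # drop everything from the first "?" onwards
    printf '%s\n' "$addr"
}

# A literal "?" in the address is taken as the start of the header fields.
mailto_recipient 'mailto:user?name@example.com'
# prints user

# With the "?" percent-encoded, the whole address survives the split.
mailto_recipient 'mailto:user%3Fname@example.com?subject=test'
# prints user%3Fname@example.com
```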
Rex Dieter
2014-08-25 12:32:16 UTC
Fix committed,

http://cgit.freedesktop.org/xdg/xdg-utils/commit/?id=7cd846d62e17f36be2f7d29e56188ddf6a6d72cb