Re: [bug-gettext] broken handling of unicode code point escapes in Tcl
From: Guido Berhoerster
Subject: Re: [bug-gettext] broken handling of unicode code point escapes in Tcl
Date: Wed, 26 Jun 2013 11:27:22 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
* Daiki Ueno <address@hidden> [2013-06-26 04:22]:
> Guido Berhoerster <address@hidden> writes:
>
> > I still wonder why you're substituting \u escapes with unicode
> > characters at all, as that potentially allows unescaped control
> > sequences which make the .po file quite fragile?
>
> I agree that interpreting \u escapes might cause confusing output for
> Unicode control characters, but I don't think it is totally unuseful.
>
> I can think of at least a couple of benefits of the current behavior:
>
> 1. translators are provided with decoded (human-readable) strings
> 2. strings escaped in different escaping schemes (e.g. \U in Python) can
> be unified
>
> Perhaps an idea might be to introduce gettext-specific Unicode escaping
> scheme (which may only escape control characters) and add an option to
> xgettext to use it.
It can be a bit more complicated than just control characters:
certain space characters such as U+00A0 (no-break space), U+202F
(narrow no-break space) or U+2001 (em quad) are not control
characters, but they are just as non-obvious in a .po file. Maybe
a better option would be to substitute only alphanumeric and
punctuation characters, rather than everything that is not a
control character.
Or you could simply add an option to not substitute \u escapes
at all. That is the behavior of the various native Tcl
.msg-format extractors that float around (e.g. those included in
tkabber or coccinella), and it is what I'd personally prefer.
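
As a rough illustration of the "alphanumeric and punctuation only"
idea, here is a minimal Python sketch. It is not xgettext code: the
function name decode_safe_escapes is hypothetical, it assumes escapes
of exactly four hex digits (Tcl also accepts shorter ones), and it
approximates "alphanumeric and punctuation" with the Unicode general
categories L*, N* and P*. Everything else, including spaces and
control characters, stays escaped.

```python
import re
import unicodedata

# Matches either an escaped backslash (left untouched) or a \uXXXX escape.
# Assumption: exactly four hex digits, a simplification of Tcl's syntax.
_ESCAPE_RE = re.compile(r"\\(?:\\|u([0-9A-Fa-f]{4}))")


def decode_safe_escapes(text):
    """Replace \\uXXXX only when it denotes a letter, number or punctuation."""

    def substitute(match):
        hex_digits = match.group(1)
        if hex_digits is None:
            # An escaped backslash: keep it so a following "u" is not decoded.
            return match.group(0)
        char = chr(int(hex_digits, 16))
        # General categories: L* letters, N* numbers, P* punctuation.
        if unicodedata.category(char)[0] in "LNP":
            return char
        # Spaces (Z*), controls and surrogates (C*), etc. stay escaped.
        return match.group(0)

    return _ESCAPE_RE.sub(substitute, text)
```

With this sketch, "caf\u00e9" becomes readable for translators, while
"\u00a0", "\u202f", "\u2001" and control characters such as "\u001b"
are left as escapes in the extracted string.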
--
Guido Berhoerster