[Gnu-arch-users] Patch Logs vs. character sets
From: Tom Lord
Subject: [Gnu-arch-users] Patch Logs vs. character sets
Date: Tue, 25 May 2004 11:44:01 -0700 (PDT)
Aaron mentioned his belief that patch logs will eventually be UTF-8.
I don't think so -- I think that would be a mistake.
Instead:
~ Patch log entries can be in any character set and encoding
form that is a superset of ASCII.
~ All header data which arch wants to be parsable will be
in ASCII, using Pika escaping and Unicode for non-ASCII
character data.
~ A header name is any non-empty string containing no `:' and
no whitespace (whitespace in general, not just _ascii_
whitespace). Header names that arch cares about will be ASCII.
~ An optional header will be used to specify encoding form,
e.g.:
Encoding: iso-8859-1
or
Encoding: utf-8
~ Some commands produce as output non-parsed fragments from
patch logs. One example is the "--summary" option
that many commands take (e.g., `tla missing --summary').
Another example is an automatically constructed ChangeLog.
Most of these (ChangeLogs being the exception) should infer
the user's preferred character set from the locale and
transcode log message data appropriately. For example,
if a log message is encoded in iso-8859-4 but my terminal
understands utf-8, `tla missing --summary' should recode
the summary line in utf-8 before printing it.
(If transcoding isn't possible because the destination set
can't represent a particular character or because arch
doesn't know how, then Pika escaping can be used for
non-ASCII characters.)
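
To make the `--summary' case concrete, here is a rough Python sketch, not arch code. It assumes a log is a block of "Name: value" header lines followed by a blank line and a body, that the summary lives in a "Summary" header, and that a log without an "Encoding" header is plain ASCII; that default is my assumption, not part of the proposal. Python's built-in `backslashreplace' error handler stands in for Pika escaping, whose syntax is not spelled out here. The function names are made up for illustration.

```python
import locale


def parse_log(raw: bytes):
    """Split a raw patch log into (headers, body), all kept as bytes.

    Simplification: no continuation lines, and header names are
    matched exactly as written.
    """
    head, _, body = raw.partition(b"\n\n")
    headers = {}
    for line in head.split(b"\n"):
        name, sep, value = line.partition(b":")
        if sep:
            headers[name] = value.strip()
    return headers, body


def user_encoding() -> str:
    """Infer the user's preferred encoding from the locale."""
    return locale.getpreferredencoding(False)


def summary_for_terminal(raw: bytes, terminal_encoding: str) -> bytes:
    """Recode the Summary header from the log's encoding to the terminal's.

    Characters the terminal encoding cannot represent are escaped
    with backslashreplace (a stand-in for Pika escaping).
    """
    headers, _ = parse_log(raw)
    # Assumption: a missing Encoding header means plain ASCII.
    source = headers.get(b"Encoding", b"ascii").decode("ascii")
    text = headers.get(b"Summary", b"").decode(source, errors="backslashreplace")
    return text.encode(terminal_encoding, errors="backslashreplace")
```

A command would call summary_for_terminal(raw_log, user_encoding()) before printing. Keeping header values as bytes until the Encoding header has been read avoids guessing at the encoding of non-parsed fields.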
Log excerpts injected into ChangeLogs should also be automatically
transcoded but in that case, the target encoding should be taken
from a comment in the ChangeLog.
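
The ChangeLog case could look something like the sketch below. The comment syntax is hypothetical: I am assuming a line of the form "# encoding: NAME" somewhere in the ChangeLog, and a fallback of utf-8 when no such comment exists. Neither is settled by this proposal. As before, `backslashreplace' stands in for Pika escaping and the function names are invented for illustration.

```python
import re

# Hypothetical comment form: "# encoding: iso-8859-1" on a line by itself.
_ENCODING_COMMENT = re.compile(
    rb"^#\s*encoding:\s*([A-Za-z0-9._-]+)\s*$", re.MULTILINE
)


def changelog_encoding(changelog: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the ChangeLog comment, else the default."""
    match = _ENCODING_COMMENT.search(changelog)
    return match.group(1).decode("ascii") if match else default


def recode_excerpt(excerpt: bytes, source_encoding: str, changelog: bytes) -> bytes:
    """Transcode a log excerpt into the encoding the ChangeLog declares."""
    target = changelog_encoding(changelog)
    text = excerpt.decode(source_encoding, errors="backslashreplace")
    # Unrepresentable characters are escaped rather than dropped.
    return text.encode(target, errors="backslashreplace")
```

The point of reading the target from the file rather than the locale is that a ChangeLog is a shared artifact: two users with different locales should generate the same bytes.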
-t