

Re: [bug-gnulib] quote characters in stds

From: Karl Berry
Subject: Re: [bug-gnulib] quote characters in stds
Date: Thu, 9 Jun 2005 20:51:17 -0400

    The main point is that it transmits the perception that 

Now I understand.  Thanks.

    These two paragraphs seem out of place:

I had been thinking of that as referring only to quotation characters,
but I see that you are right.  Not sure what rms will think, but it does
seem cleaner to have two separate sections, so let's try that.

Trying to take both your latest comments into account, now I have the
following ...

@node Character set
@section Character set
@cindex character set
@cindex encodings
@cindex ASCII characters
@cindex non-ASCII characters

Sticking to the ASCII character set (plain text, 7-bit characters) is
preferred in GNU source code comments, text documents, and other
contexts, unless there is good reason to do something else because of
the domain at hand.

If you need to use non-ASCII characters, for example to represent
names of contributors, you should normally stick with one encoding, as
one cannot in general mix encodings reliably.  
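A project that wants to enforce the plain-ASCII preference mechanically can simply look for any byte above 0x7F. The following is only a minimal C sketch of such a check. The function name @code{is_plain_ascii} is hypothetical and is not part of Gnulib or any standard library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the LEN bytes at BUF are all
   7-bit ASCII (0x00-0x7F).  Any byte above 0x7F means some non-ASCII
   encoding (Latin1, UTF-8, ...) is in use.  */
static bool
is_plain_ascii (const char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) buf[i] > 0x7f)
      return false;
  return true;
}
```

Such a check only detects non-ASCII bytes. It cannot tell which encoding they belong to, which is one reason mixing encodings in one file is hard to undo.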

@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for
quotation characters in messages to users: preferably 0x60 (`) for
left quotes and 0x27 (') for right quotes.  If using ` is unacceptable
in your application, other possibilities are using ' for both opening
and closing, or 0x22 (") for both opening and closing.  It is OK, but
not required, to use locale-specific quotes in other locales.
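As an illustration only, here is a minimal C sketch that wraps a string in each of the ASCII quoting conventions listed above. The function @code{quote_ascii} and the @code{enum quote_style} names are hypothetical, invented for this example. A real program may well prefer the Gnulib modules instead of hand-written code like this.

```c
#include <stddef.h>
#include <stdio.h>

/* The three ASCII quoting conventions; names are hypothetical.  */
enum quote_style
{
  QUOTE_GRAVE_APOSTROPHE,   /* `like this'  (preferred) */
  QUOTE_APOSTROPHES,        /* 'like this'  */
  QUOTE_DOUBLE              /* "like this"  */
};

/* Write ARG into BUF (SIZE bytes), surrounded by the opening and
   closing quote characters of STYLE.  Returns what snprintf returns:
   the length of the full result, or a negative value on error.  */
static int
quote_ascii (char *buf, size_t size, const char *arg,
             enum quote_style style)
{
  char lq, rq;                  /* left and right quote characters */

  switch (style)
    {
    case QUOTE_APOSTROPHES:
      lq = rq = '\'';           /* 0x27 for both */
      break;
    case QUOTE_DOUBLE:
      lq = rq = '"';            /* 0x22 for both */
      break;
    default:
      lq = '`';                 /* 0x60 opens */
      rq = '\'';                /* 0x27 closes */
      break;
    }
  return snprintf (buf, size, "%c%s%c", lq, arg, rq);
}
```

With @code{QUOTE_GRAVE_APOSTROPHE}, quoting @code{foo.c} produces `foo.c'.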

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
and @code{quotearg} modules provide a reasonably straightforward way
to support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.
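To show why an embedded quote character needs care, here is one simple approach, sketched in C: precede any apostrophe or backslash inside the quoted text with a backslash, so that a program parsing the output can still find the real closing quote. This is not how Gnulib's @code{quotearg} is implemented and is not meant to match its behavior; @code{quote_name} is a hypothetical name used only for this sketch.

```c
#include <stddef.h>

/* Hypothetical sketch, not Gnulib code: copy NAME into BUF as `NAME',
   preceding each apostrophe or backslash inside NAME with a backslash
   so the closing apostrophe stays unambiguous.  BUF must hold at
   least 2 * strlen (NAME) + 3 bytes.  Returns the result's length.  */
static size_t
quote_name (char *buf, const char *name)
{
  size_t n = 0;

  buf[n++] = '`';
  for (const char *p = name; *p; p++)
    {
      if (*p == '\'' || *p == '\\')
        buf[n++] = '\\';
      buf[n++] = *p;
    }
  buf[n++] = '\'';
  buf[n] = '\0';
  return n;
}
```

For example, the name it's comes out as `it\'s'.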

In any case, the documentation for your program should clearly specify
how it does quoting, if different from the preferred method of ` and
'.  This is especially important if the output of your program is ever
likely to be parsed by another program.

Quotation characters are a difficult area in the computing world at
this time: there are no true left or right quote characters in ASCII,
or even Latin1; the ` character we use was standardized as a grave
accent.  Latin1 does have paired standalone accents, but it seems
wrong in principle to abuse them as quotes.  Also, Latin1 is still not
universally usable.

Unicode contains the unambiguous quote characters required, and its
common encoding UTF-8 is upward compatible with ASCII@.  However,
Unicode and UTF-8 are not universally well-supported, either.

This may change over the next few years, and then we will revisit this.
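To make the upward compatibility of UTF-8 with ASCII concrete: every code point below 0x80 is encoded as the identical single ASCII byte, while Unicode's LEFT and RIGHT SINGLE QUOTATION MARK (U+2018 and U+2019) take three bytes each. The minimal encoder below is a sketch only; the name @code{utf8_encode_bmp} is hypothetical, and it assumes code points below 0x10000 and does not reject surrogates.

```c
#include <stddef.h>

/* Minimal UTF-8 encoder for a code point CP below 0x10000 (surrogates
   are not checked).  Writes 1-3 bytes to OUT and returns how many.
   CP below 0x80 yields the same single byte as ASCII.  */
static size_t
utf8_encode_bmp (unsigned int cp, unsigned char out[3])
{
  if (cp < 0x80)
    {
      out[0] = cp;                              /* 0xxxxxxx */
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = 0xC0 | (cp >> 6);                /* 110xxxxx */
      out[1] = 0x80 | (cp & 0x3F);              /* 10xxxxxx */
      return 2;
    }
  out[0] = 0xE0 | (cp >> 12);                   /* 1110xxxx */
  out[1] = 0x80 | ((cp >> 6) & 0x3F);           /* 10xxxxxx */
  out[2] = 0x80 | (cp & 0x3F);                  /* 10xxxxxx */
  return 3;
}
```

For example, encoding 0x2018 yields the bytes E2 80 98, and 0x2019 yields E2 80 99, whereas the ASCII apostrophe 0x27 stays the single byte 0x27.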

