texinfo-commits

branch master updated: Recommend UTF-8 only as encoding


From: Patrice Dumas
Subject: branch master updated: Recommend UTF-8 only as encoding
Date: Sat, 20 Aug 2022 12:25:49 -0400

This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new 75a29eb1a8 Recommend UTF-8 only as encoding
75a29eb1a8 is described below

commit 75a29eb1a8e16df2fe0120e486fc5f884dafb185
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Sat Aug 20 18:24:57 2022 +0200

    Recommend UTF-8 only as encoding
    
    * doc/texinfo.texi (@code{@@documentencoding}): describe non
    UTF-8 encodings as being mainly for older manuals.
    (@code{@@documentencoding}, Info Format Regular Nodes): describe
    that cross-references between Info files with different character
    encodings with non-ASCII characters in node names fail in
    the '@code{@@documentencoding}' node.  Remove unneeded information
    from 'Info Format Regular Nodes'.
    Do not recommend using 7bit ASCII for portability, UTF-8 should
    be portable.
    (@code{@@euro}): remove information on the 8bit encoding since
    they should not be used anymore.
    
    (Internationalization of Document Strings): remove @ from
    documentlanguage and documentencoding, these are not @-commands
    in this context.
    
    (Other Customization Variables): update the description of defaults
    in Info for *_QUOTE_SYMBOL.
---
 ChangeLog        | 23 +++++++++++++++
 doc/texinfo.texi | 89 ++++++++++++++++++++++----------------------------------
 tp/TODO          |  2 --
 3 files changed, 58 insertions(+), 56 deletions(-)
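
For manual maintainers acting on this recommendation, a minimal sketch (not part of the commit) of how one might check which encoding a Texinfo source declares. It assumes the declaration sits on a line by itself, as the manual describes, and that the default is UTF-8 when no declaration is present. The names `declared_encoding`, `is_legacy` and `LEGACY_ENCODINGS` are hypothetical; the encoding list is copied from the table in the diff below.

```python
import re

# Encodings the manual still lists as supported, mainly for older manuals
# (taken from the @documentencoding table in doc/texinfo.texi).
LEGACY_ENCODINGS = {
    "us-ascii", "iso-8859-1", "iso-8859-15", "iso-8859-2", "koi8-r", "koi8-u",
}

# Assumption: the command is written on a line by itself, starting in
# column 0.  An escaped "@@documentencoding" inside an @example does not match.
_DOCENC = re.compile(r"^@documentencoding[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def declared_encoding(texinfo_source: str) -> str:
    """Return the lowercased declared encoding, or "utf-8" if none is declared."""
    match = _DOCENC.search(texinfo_source)
    return match.group(1).lower() if match else "utf-8"


def is_legacy(texinfo_source: str) -> bool:
    """True if the source declares one of the non-UTF-8 encodings listed above."""
    return declared_encoding(texinfo_source) in LEGACY_ENCODINGS
```

A manual for which `is_legacy` returns true is a candidate for conversion to UTF-8, in particular if it uses non-ASCII characters in node names that other Info files cross-reference.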

diff --git a/ChangeLog b/ChangeLog
index 55b4ab2676..a28cb82289 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,26 @@
+2022-08-20  Patrice Dumas  <pertusus@free.fr>
+
+       Recommend UTF-8 only as encoding
+
+       * doc/texinfo.texi (@code{@@documentencoding}): describe non
+       UTF-8 encodings as being mainly for older manuals.
+       (@code{@@documentencoding}, Info Format Regular Nodes): describe
+       that cross-references between Info files with different character
+       encodings with non-ASCII characters in node names fail in
+       the '@code{@@documentencoding}' node.  Remove uneeded information
+       from 'Info Format Regular Nodes'.
+       Do not recommend using 7bit ASCII for portability, UTF-8 should
+       be portable.
+       (@code{@@euro}): remove information on the 8bit encoding since
+       they should not be used anymore.
+
+       (Internationalization of Document Strings): remove @ from
+       documentlanguage and documentencoding, these are not @-commands
+       in this context.
+
+       (Other Customization Variables): update the description of defaults
+       in Info for *_QUOTE_SYMBOL.
+
 2022-08-20  Gavin Smith  <gavinsmith0123@gmail.com>
 
        Fix texinfo.tex with XeTeX
diff --git a/doc/texinfo.texi b/doc/texinfo.texi
index 4ec546490f..516f0e10ee 100644
--- a/doc/texinfo.texi
+++ b/doc/texinfo.texi
@@ -3086,9 +3086,10 @@ single space.  For example:
 
 @c Consistently with the HTML Cross-reference Node Name Expansion
 @c spaces and newlines generated by @-commands should also be
-@c collapsed to a single space.  If we want to be explicit, we
-@c could add a line corresponding to
+@c collapsed to a single space.  If we wanted to be explicit, we
+@c could have added a line corresponding to
 @c @node @  foo @: @* bar,
+@c However, it is best to leave those special cases non documented
 
 @noindent all define the same node, namely @samp{foo bar}.
 @c FIXME to be removed.  In 2022 both Info readers find the node.
@@ -10267,19 +10268,6 @@ Use the @code{@@euro@{@}} command to generate `@euro{}'.  Where
 possible, this is the symbol for the Euro currency.  Otherwise, the
 word @samp{Euro} is used.
 
-Texinfo cannot magically synthesize support for the Euro symbol where
-the underlying system (fonts, software, whatever) does not support it.
-Therefore, you may find it preferable to use the word ``Euro''.  (In
-banking contexts, the abbreviation for the Euro is EUR@.)
-
-@cindex ISO 8859-15, and Euro
-@cindex Latin 9, and Euro
-In order to get the Euro symbol in encoded Info output, for example,
-it is necessary to specify @code{@@documentencoding ISO-8859-15} or
-@code{@@documentencoding UTF-8} (@xref{@code{@@documentencoding}}.)
-The Euro symbol is in ISO 8859-15 (aka Latin@tie{}9), and is
-@emph{not} in the more widely-used ISO 8859-1 (Latin@tie{}1).
-
 @pindex feymr10
 @cindex Euro font
 The Euro symbol does not exist in the standard @TeX{} fonts (which
@@ -12409,33 +12397,34 @@ the official web site for ISO@tie{}3166 can be found via
 @cindex Document input encoding
 
 In the default case, the input and output document encoding are assumed
-to be UTF-8, which is compatible with 7-bit ASCII.  The
-@code{@@documentencoding} command declares the input document encoding, and
-also affects the encoding of the output.  Write it on a line by itself, with a
-valid encoding specification following, near the beginning of the file if your
-document encoding is not the default encoding or if you want to set the
-encoding explicitly.
+to be UTF-8, the vast global character encoding, expressed in 8-bit bytes.
+UTF-8 is compatible with 7-bit ASCII.  It is recommended to use UTF-8
+encoding for the Texinfo manuals.
+
+The @code{@@documentencoding} command declares the input document encoding,
+and also affects the encoding of the output.  Write it on a line by itself,
+with a valid encoding specification following, near the beginning of the file
+if your document encoding is not the default encoding.
 
 @example
 @@documentencoding @var{enc}
 @end example
 
-Texinfo supports these encodings:
+Using UTF-8 should always be the best choice for the encoding.
+Texinfo still supports additional encodings, mainly for compatibility with
+older manuals:
 
 @table @code
 @item US-ASCII
 Character encoding based on the English alphabet.
 
-@item UTF-8
-The default.  The vast global character encoding, expressed in 8-bit bytes.
-
 @item ISO-8859-1
 @itemx ISO-8859-15
 @itemx ISO-8859-2
 @cindex Euro symbol, and encodings
-These specify the standard encodings for Western European (the first
-two) and Eastern European languages (the third), respectively.  ISO
-8859-15 replaces some little-used characters from 8859-1 (e.g.,
+These specify the pre UTF-8 standard encodings for Western European
+(the first two) and Eastern European languages (the third), respectively.
+ISO 8859-15 replaces some little-used characters from 8859-1 (e.g.,
 precomposed fractions) with more commonly needed ones, such as the
 Euro symbol (@euro{}).
 
@@ -12443,10 +12432,10 @@ A full description of the encodings is beyond our scope here;
 one useful reference is @uref{http://czyborra.com/charsets/iso8859.html}.
 
 @item koi8-r
-This is the commonly used encoding for the Russian language.
+This was a commonly used encoding for the Russian language before UTF-8.
 
 @item koi8-u
-This is the commonly used encoding for the Ukrainian language.
+This was a commonly used encoding for the Ukrainian language before UTF-8.
 
 @end table
 
@@ -12521,12 +12510,10 @@ load different fonts in the preamble and use
 @@end latex
 @end example
 
-For maximum portability of Texinfo documents across the many different
-user environments in the world, we recommend sticking to 7-bit ASCII
-in the input unless your particular manual needs a substantial amount
-of non-ASCII, e.g., it's written in German.  You can use the
-@code{@@U} command to insert an occasional needed character
-(@pxref{Inserting Unicode}).
+Cross-references between Info files with different character encodings
+with non-ASCII characters in node names fail.  We strongly recommend
+using UTF-8 only as the encoding for manuals with non-ASCII characters
+in cross-references sources or destinations.
 
 
 @node Conditionals
@@ -16731,7 +16718,8 @@ in DocBook.  Undefined in the default case in HTML and set to @code{&rsquo;}
 if @code{USE_NUMERIC_ENTITY} is not set, to @code{&#8217;} if set, and
 to a quote character if @option{--enable-encoding} is set and the output
 encoding includes that character.
-The default for Info is the same as @code{OPEN_QUOTE_SYMBOL} (see below).
+The default for Info is set the same as for @code{OPEN_QUOTE_SYMBOL},
+except that the Unicode code is a closing quote (see below).
 
 @item COMMAND_LINE_ENCODING
 Encoding used to decode command-line arguments.  Default is based on the locale
@@ -16758,8 +16746,9 @@ the encoding of input file names, such as file names specified as
 the locale encoding instead.  Default is set, except on MS-Windows where
 the locale encoding is used by default.
 
-Note that this is for file names only; @code{@@documentencoding} is always
-used for the encoding of file content (@pxref{@code{@@documentencoding}}).
+Note that this is for file names only; the default encoding or
+@code{@@documentencoding} is always used for the encoding of file
+content (@pxref{@code{@@documentencoding}}).
 
 The @code{INPUT_FILE_NAME_ENCODING} variable overrides this variable.
 
@@ -16914,11 +16903,11 @@ Undefined in the default case in HTML and set to @code{&lsquo;}
 if @code{USE_NUMERIC_ENTITY} is not set, to @code{&#8217;} if set, and
 to a quote character if @option{--enable-encoding} is set and the output
 encoding includes that character.
-For Info, the default depends on the enabled document encoding
-(@pxref{@code{@@documentencoding}}); if no document encoding is set, or the
-encoding is US-ASCII, etc., @samp{'} is used.  This character usually appears
-as an undirected single quote on modern systems.  If the document encoding is
-Unicode, the Info output uses a Unicode left quote.
+For Info, the default depends on the enabled document encoding.  If
+@option{--disable-encoding} is set or the document encoding is not UTF-8,
+@samp{'} is used.  This character usually appears
+as an undirected single quote on modern systems.  Otherwise, the Info
+output uses a Unicode left quote.
 
 @item OUTPUT_ENCODING_NAME
 Normalized encoding name used for output files.  Should be a usable
@@ -17146,10 +17135,10 @@ The expansion of a translation string is done like this:
 
 @enumerate
 @item First, the string is translated.  The locale
-is @var{@@documentlanguage}@code{.}@var{@@documentencoding}.
+is @var{documentlanguage}@code{.}@var{documentencoding}.
 
 @cindex @code{us-ascii} encoding, and translations
-If the @var{@@documentlanguage} has the form @samp{ll_CC}, that is
+If the @var{documentlanguage} has the form @samp{ll_CC}, that is
 tried first, and then just @samp{ll}.  If that does not exist, and the
 encoding is not @code{us-ascii}, then @code{us-ascii} is tried.
 
@@ -23899,14 +23888,6 @@ characters (@samp{CTRL-?}, character number 127).  @command{makeinfo} adds
 these characters when needed in the default case.  Note that not all Info
 readers recognize this syntax.  @xref{Info Node Names Constraints}.
 
-The use of non-ASCII characters in the names of nodes is permitted,
-but can cause problems in cross-references between nodes in Info files
-with different character encodings, and also when node names from many
-different files are listed (for example, with the @option{--apropos}
-option to the standalone Info browser), so we recommend avoiding them
-whenever feasible.  For example, prefer the use of the ASCII
-apostrophe character (@t{'}) to Unicode directional quotes.
-
 The @t{<general text>} of the node can include the special constructs
 described next.
 
diff --git a/tp/TODO b/tp/TODO
index e72c7ba372..cef004e5a1 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -558,8 +558,6 @@ or Info.pm).  No idea whether it is right or wrong.
 in hyphenation: only text and accent commands, and should
 only appear in toplevel
 
-use definfoenclose information in Convert::Text?
-
 From vincent Belaïche. About svg image files in HTML:
 
 I don't think that supporting svg would be easy: its seems that to embed an


