branch master updated: Recommend UTF-8 only as encoding
From: Patrice Dumas
Subject: branch master updated: Recommend UTF-8 only as encoding
Date: Sat, 20 Aug 2022 12:25:49 -0400
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new 75a29eb1a8 Recommend UTF-8 only as encoding
75a29eb1a8 is described below
commit 75a29eb1a8e16df2fe0120e486fc5f884dafb185
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Sat Aug 20 18:24:57 2022 +0200
Recommend UTF-8 only as encoding
* doc/texinfo.texi (@code{@@documentencoding}): describe non
UTF-8 encodings as being mainly for older manuals.
(@code{@@documentencoding}, Info Format Regular Nodes): describe
that cross-references between Info files with different character
encodings with non-ASCII characters in node names fail in
the '@code{@@documentencoding}' node. Remove unneeded information
from 'Info Format Regular Nodes'.
Do not recommend using 7bit ASCII for portability, UTF-8 should
be portable.
(@code{@@euro}): remove information on the 8bit encoding since
they should not be used anymore.
(Internationalization of Document Strings): remove @ from
documentlanguage and documentencoding, these are not @-commands
in this context.
(Other Customization Variables): update the description of defaults
in Info for *_QUOTE_SYMBOL.
---
ChangeLog | 23 +++++++++++++++
doc/texinfo.texi | 89 ++++++++++++++++++++++----------------------------------
tp/TODO | 2 --
3 files changed, 58 insertions(+), 56 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 55b4ab2676..a28cb82289 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,26 @@
+2022-08-20 Patrice Dumas <pertusus@free.fr>
+
+ Recommend UTF-8 only as encoding
+
+ * doc/texinfo.texi (@code{@@documentencoding}): describe non
+ UTF-8 encodings as being mainly for older manuals.
+ (@code{@@documentencoding}, Info Format Regular Nodes): describe
+ that cross-references between Info files with different character
+ encodings with non-ASCII characters in node names fail in
+ the '@code{@@documentencoding}' node. Remove unneeded information
+ from 'Info Format Regular Nodes'.
+ Do not recommend using 7bit ASCII for portability, UTF-8 should
+ be portable.
+ (@code{@@euro}): remove information on the 8bit encoding since
+ they should not be used anymore.
+
+ (Internationalization of Document Strings): remove @ from
+ documentlanguage and documentencoding, these are not @-commands
+ in this context.
+
+ (Other Customization Variables): update the description of defaults
+ in Info for *_QUOTE_SYMBOL.
+
2022-08-20 Gavin Smith <gavinsmith0123@gmail.com>
Fix texinfo.tex with XeTeX
diff --git a/doc/texinfo.texi b/doc/texinfo.texi
index 4ec546490f..516f0e10ee 100644
--- a/doc/texinfo.texi
+++ b/doc/texinfo.texi
@@ -3086,9 +3086,10 @@ single space. For example:
@c Consistently with the HTML Cross-reference Node Name Expansion
@c spaces and newlines generated by @-commands should also be
-@c collapsed to a single space. If we want to be explicit, we
-@c could add a line corresponding to
+@c collapsed to a single space. If we wanted to be explicit, we
+@c could have added a line corresponding to
@c @node @ foo @: @* bar,
+@c However, it is best to leave those special cases non documented
@noindent all define the same node, namely @samp{foo bar}.
@c FIXME to be removed. In 2022 both Info readers find the node.
@@ -10267,19 +10268,6 @@ Use the @code{@@euro@{@}} command to generate
`@euro{}'. Where
possible, this is the symbol for the Euro currency. Otherwise, the
word @samp{Euro} is used.
-Texinfo cannot magically synthesize support for the Euro symbol where
-the underlying system (fonts, software, whatever) does not support it.
-Therefore, you may find it preferable to use the word ``Euro''. (In
-banking contexts, the abbreviation for the Euro is EUR@.)
-
-@cindex ISO 8859-15, and Euro
-@cindex Latin 9, and Euro
-In order to get the Euro symbol in encoded Info output, for example,
-it is necessary to specify @code{@@documentencoding ISO-8859-15} or
-@code{@@documentencoding UTF-8} (@xref{@code{@@documentencoding}}.)
-The Euro symbol is in ISO 8859-15 (aka Latin@tie{}9), and is
-@emph{not} in the more widely-used ISO 8859-1 (Latin@tie{}1).
-
@pindex feymr10
@cindex Euro font
The Euro symbol does not exist in the standard @TeX{} fonts (which
@@ -12409,33 +12397,34 @@ the official web site for ISO@tie{}3166 can be found
via
@cindex Document input encoding
In the default case, the input and output document encoding are assumed
-to be UTF-8, which is compatible with 7-bit ASCII. The
-@code{@@documentencoding} command declares the input document encoding, and
-also affects the encoding of the output. Write it on a line by itself, with a
-valid encoding specification following, near the beginning of the file if your
-document encoding is not the default encoding or if you want to set the
-encoding explicitly.
+to be UTF-8, the vast global character encoding, expressed in 8-bit bytes.
+UTF-8 is compatible with 7-bit ASCII. It is recommended to use UTF-8
+encoding for the Texinfo manuals.
+
+The @code{@@documentencoding} command declares the input document encoding,
+and also affects the encoding of the output. Write it on a line by itself,
+with a valid encoding specification following, near the beginning of the file
+if your document encoding is not the default encoding.
@example
@@documentencoding @var{enc}
@end example
-Texinfo supports these encodings:
+Using UTF-8 should always be the best choice for the encoding.
+Texinfo still supports additional encodings, mainly for compatibility with
+older manuals:
@table @code
@item US-ASCII
Character encoding based on the English alphabet.
-@item UTF-8
-The default. The vast global character encoding, expressed in 8-bit bytes.
-
@item ISO-8859-1
@itemx ISO-8859-15
@itemx ISO-8859-2
@cindex Euro symbol, and encodings
-These specify the standard encodings for Western European (the first
-two) and Eastern European languages (the third), respectively. ISO
-8859-15 replaces some little-used characters from 8859-1 (e.g.,
+These specify the pre UTF-8 standard encodings for Western European
+(the first two) and Eastern European languages (the third), respectively.
+ISO 8859-15 replaces some little-used characters from 8859-1 (e.g.,
precomposed fractions) with more commonly needed ones, such as the
Euro symbol (@euro{}).
@@ -12443,10 +12432,10 @@ A full description of the encodings is beyond our
scope here;
one useful reference is @uref{http://czyborra.com/charsets/iso8859.html}.
@item koi8-r
-This is the commonly used encoding for the Russian language.
+This was a commonly used encoding for the Russian language before UTF-8.
@item koi8-u
-This is the commonly used encoding for the Ukrainian language.
+This was a commonly used encoding for the Ukrainian language before UTF-8.
@end table
@@ -12521,12 +12510,10 @@ load different fonts in the preamble and use
@@end latex
@end example
-For maximum portability of Texinfo documents across the many different
-user environments in the world, we recommend sticking to 7-bit ASCII
-in the input unless your particular manual needs a substantial amount
-of non-ASCII, e.g., it's written in German. You can use the
-@code{@@U} command to insert an occasional needed character
-(@pxref{Inserting Unicode}).
+Cross-references between Info files with different character encodings
+with non-ASCII characters in node names fail. We strongly recommend
+using UTF-8 only as the encoding for manuals with non-ASCII characters
+in cross-references sources or destinations.
@node Conditionals
@@ -16731,7 +16718,8 @@ in DocBook. Undefined in the default case in HTML and
set to @code{&rsquo;}
if @code{USE_NUMERIC_ENTITY} is not set, to @code{&#8217;} if set, and
to a quote character if @option{--enable-encoding} is set and the output
encoding includes that character.
-The default for Info is the same as @code{OPEN_QUOTE_SYMBOL} (see below).
+The default for Info is set the same as for @code{OPEN_QUOTE_SYMBOL},
+except that the Unicode code is a closing quote (see below).
@item COMMAND_LINE_ENCODING
Encoding used to decode command-line arguments. Default is based on the locale
@@ -16758,8 +16746,9 @@ the encoding of input file names, such as file names
specified as
the locale encoding instead. Default is set, except on MS-Windows where
the locale encoding is used by default.
-Note that this is for file names only; @code{@@documentencoding} is always
-used for the encoding of file content (@pxref{@code{@@documentencoding}}).
+Note that this is for file names only; the default encoding or
+@code{@@documentencoding} is always used for the encoding of file
+content (@pxref{@code{@@documentencoding}}).
The @code{INPUT_FILE_NAME_ENCODING} variable overrides this variable.
@@ -16914,11 +16903,11 @@ Undefined in the default case in HTML and set to
@code{&lsquo;}
if @code{USE_NUMERIC_ENTITY} is not set, to @code{&#8217;} if set, and
to a quote character if @option{--enable-encoding} is set and the output
encoding includes that character.
-For Info, the default depends on the enabled document encoding
-(@pxref{@code{@@documentencoding}}); if no document encoding is set, or the
-encoding is US-ASCII, etc., @samp{'} is used. This character usually appears
-as an undirected single quote on modern systems. If the document encoding is
-Unicode, the Info output uses a Unicode left quote.
+For Info, the default depends on the enabled document encoding. If
+@option{--disable-encoding} is set or the document encoding is not UTF-8,
+@samp{'} is used. This character usually appears
+as an undirected single quote on modern systems. Otherwise, the Info
+output uses a Unicode left quote.
@item OUTPUT_ENCODING_NAME
Normalized encoding name used for output files. Should be a usable
@@ -17146,10 +17135,10 @@ The expansion of a translation string is done like
this:
@enumerate
@item First, the string is translated. The locale
-is @var{@@documentlanguage}@code{.}@var{@@documentencoding}.
+is @var{documentlanguage}@code{.}@var{documentencoding}.
@cindex @code{us-ascii} encoding, and translations
-If the @var{@@documentlanguage} has the form @samp{ll_CC}, that is
+If the @var{documentlanguage} has the form @samp{ll_CC}, that is
tried first, and then just @samp{ll}. If that does not exist, and the
encoding is not @code{us-ascii}, then @code{us-ascii} is tried.
@@ -23899,14 +23888,6 @@ characters (@samp{CTRL-?}, character number 127).
@command{makeinfo} adds
these characters when needed in the default case. Note that not all Info
readers recognize this syntax. @xref{Info Node Names Constraints}.
-The use of non-ASCII characters in the names of nodes is permitted,
-but can cause problems in cross-references between nodes in Info files
-with different character encodings, and also when node names from many
-different files are listed (for example, with the @option{--apropos}
-option to the standalone Info browser), so we recommend avoiding them
-whenever feasible. For example, prefer the use of the ASCII
-apostrophe character (@t{'}) to Unicode directional quotes.
-
The @t{<general text>} of the node can include the special constructs
described next.
diff --git a/tp/TODO b/tp/TODO
index e72c7ba372..cef004e5a1 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -558,8 +558,6 @@ or Info.pm). No idea whether it is right or wrong.
in hyphenation: only text and accent commands, and should
only appear in toplevel
-use definfoenclose information in Convert::Text?
-
From vincent Belaïche. About svg image files in HTML:
I don't think that supporting svg would be easy: its seems that to embed an
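
Illustration (not part of the commit): the new text in the
'@code{@@documentencoding}' node says that cross-references between Info
files with different character encodings fail when node names contain
non-ASCII characters. A minimal Python sketch of why that can happen
follows. It assumes that node names are compared byte for byte, which is
an assumption about reader behaviour rather than something the commit
states; the node name "Café" and the helper same_node are hypothetical.

```python
# Illustrative sketch only. "Café" is a hypothetical node name and
# same_node() is a hypothetical helper, not part of Texinfo or Info.
node_name = "Café"

# The same node name as it would be stored in two Info files that were
# generated with different document encodings.
latin1_bytes = node_name.encode("iso-8859-1")
utf8_bytes = node_name.encode("utf-8")


def same_node(a: bytes, b: bytes) -> bool:
    """Compare two node names byte for byte (assumed reader behaviour)."""
    return a == b


# The non-ASCII name differs between the two encodings, so a lookup
# based on raw bytes would not find the target node.
print(same_node(latin1_bytes, utf8_bytes))  # False

# A pure ASCII name is encoded identically in both, so it still matches.
print(same_node("Cafe".encode("iso-8859-1"), "Cafe".encode("utf-8")))  # True
```

With UTF-8 used for every manual, both sides of a cross-reference carry
the same bytes, which is consistent with the recommendation made in the
commit.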