Encoding of LilyPond console output

lilypond-devel

From:	Wilbert Berendsen
Subject:	Encoding of LilyPond console output
Date:	Sat, 31 Dec 2011 14:44:01 +0100

Hi all,

When dealing with files and directories with accented letters
(non-ASCII filenames) I came across a number of small issues, which I
need to handle correctly in Frescobaldi, my LilyPond companion.

Everything boils down to the question of which encoding LilyPond uses
for its 8-bit console output. I had always assumed that this was UTF-8,
which worked correctly on Linux: both filenames with accented letters
and translated messages (such as French, with many accented letters)
always showed up correctly in the LilyPond console output. Frescobaldi
reads that output as an 8-bit byte stream and decodes it into Unicode
strings using the UTF-8 encoding.

Then, on Windows, I discovered that filenames do not use UTF-8
encoding, but rather 'mbcs' or something like that (what is returned
by sys.getfilesystemencoding() in Python).

So I changed Frescobaldi to use that encoding when reading LilyPond
console output, but then we discovered that translated messages (such
as the French ones) with accented characters show up garbled (clearly
something like UTF-8 displayed as Latin1).
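
The garbling described above can be reproduced in a few lines of Python.
This is only an illustration: the helper name show_as_latin1 is made up
for this example, and Latin-1 stands in for whatever encoding the reader
actually uses on a given Windows system.

```python
def show_as_latin1(text):
    """Encode text as UTF-8, then decode the bytes as Latin-1.

    This mimics a reader that assumes a single-byte encoding while the
    writer produced UTF-8 (hypothetical helper, for illustration only).
    """
    raw = text.encode("utf-8")      # what a UTF-8 writer emits
    return raw.decode("latin-1")    # what a Latin-1 reader displays


garbled = show_as_latin1("é")
print(garbled)  # each accented letter turns into two characters

# The damage is reversible as long as no bytes were dropped.
restored = garbled.encode("latin-1").decode("utf-8")
```

Each accented letter occupies two bytes in UTF-8, so a single-byte
decoder shows two characters in its place.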

So again I changed Frescobaldi, and now it reads the console output
byte stream and parses that for file references (such as: file.ly:12:3:
error: blabla) and decodes those filenames using the filesystem
encoding, and the rest using UTF-8.
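
A minimal sketch of this two-encoding approach, not the actual
Frescobaldi code: the regular expression FILE_REF and the function
decode_line are assumptions made for illustration, and the filesystem
encoding is passed in as a parameter because the value reported by
sys.getfilesystemencoding() depends on the platform and Python version.

```python
import re
import sys

# Hypothetical pattern for references of the form "file.ly:12:3: ...".
# It works on bytes, because the line is not decoded yet.
FILE_REF = re.compile(rb"^(?P<name>.+?):(?P<line>\d+):(?P<col>\d+):")


def decode_line(raw, fs_encoding=None):
    """Decode one line of console output given as bytes.

    A leading filename is decoded with the filesystem encoding;
    everything else is decoded as UTF-8.
    """
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    match = FILE_REF.match(raw)
    if match is None:
        # No file reference: treat the whole line as UTF-8.
        return raw.decode("utf-8", errors="replace")
    name = match.group("name").decode(fs_encoding, errors="replace")
    rest = raw[match.end("name"):].decode("utf-8", errors="replace")
    return name + rest


# Example: filename bytes in Latin-1, message bytes in UTF-8.
raw = "café.ly".encode("latin-1") + b":12:3: " + "erreur: entrée".encode("utf-8")
print(decode_line(raw, "latin-1"))
```

Decoding with errors="replace" keeps the reader from raising an
exception on unexpected bytes, at the cost of showing a replacement
character instead.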

This seems to work well: file references are parsed correctly and
messages are still readable.

Only some other messages from LilyPond that show filenames, like
"processing `file.ly'...", show the filename in a wrong encoding,
because the filename is written as-is in the filesystem encoding,
intermingled in a message encoded as UTF-8. This can also be seen on
the Windows console (both CMD and the Git bash console).
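
The following sketch shows why no single decoding can recover such a
line. The message text is a made-up French example, not real LilyPond
output, and cp1252 is only assumed as the filesystem encoding; the
function name try_decodings is hypothetical as well.

```python
def try_decodings(raw, encodings=("utf-8", "cp1252")):
    """Decode the same bytes with each encoding, replacing invalid bytes."""
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


# Filename written as-is in the (assumed) filesystem encoding ...
name = "café.ly".encode("cp1252")
# ... embedded in a message that is otherwise encoded as UTF-8.
line = "Traitement de « ".encode("utf-8") + name + " »...".encode("utf-8")

results = try_decodings(line)
for enc, text in results.items():
    print(enc, "->", text)
```

Decoding as UTF-8 damages the filename, while decoding as cp1252
damages the surrounding message, so either choice leaves part of the
line garbled.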

So, putting everything together: is my analysis of the mixed output
encodings LilyPond uses on stdout and stderr correct?

And in line with this: can LilyPond be made more aware of this, and use
the same encoding for all output (correctly encoding filenames)? Or am
I wrong?

With many regards and best wishes to all for the new year!
Wilbert

-- 
Wilbert Berendsen
(http://www.wilbertberendsen.nl)
