Re: Wrong letter in title
From: David Kastrup
Subject: Re: Wrong letter in title
Date: Sun, 30 Sep 2018 15:58:53 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

David Kastrup <address@hidden> writes:
> Davide Liessi <address@hidden> writes:
>
>> On Sun, 20 May 2018 at 18:35, Davide Liessi
>> <address@hidden> wrote:
>>> The file
>>>
>>> \version "2.19.81"
>>> \header { title = "č" }
>>> { b1 }
>>>
>>> results in a PDF with correct printed title (lowercase c with caron)
>>> but wrong title field in metadata (Ċ, i.e. uppercase c with dot
>>> above).
>>
>> On Sun, 20 May 2018 20:52:58 +0200 David Kastrup wrote:
>>> Ghostscript bug when converting PostScript output to PDF. The
>>> PostScript reads (pasted from less' display)
>>>
>>> mark /Creator (LilyPond 2.21.0)
>>> /Title (<FE><FF>^A^M)
>>> /DOCINFO pdfmark
>>>
>>> which is the correct UTF-16BE string with BOM. Ghostscript however
>>> converts the ^M (0x0d) into ^J (0x0a), basically converting an ASCII CR
>>> to an ASCII LF. Unfortunately, we are not in the middle of ASCII here.
>>
>> Actually, it turns out that the behaviour of GhostScript is not wrong
>> and this is probably a bug in how LilyPond produces the PostScript
>> file.
>>
>> PostScript strings must either properly escape non-ASCII or ASCII
>> non-printable bytes, e.g., as \ddd with ddd the octal representation,
>> or they must be defined as a hexadecimal string (see [1], pages
>> 29–31).
>
> Uh WHAT? To quote:
>
> The \ddd form may be used to include any 8-bit character constant in
> a string. One, two, or three octal digits may be specified, with
> high-order overflow ignored. This notation is preferred for
> specifying a character outside the recommended ASCII character set
> for the PostScript language, since the notation itself stays within
> the standard set and thereby avoids possible difficulties in
> transmitting or storing the text of the program. It is recommended
> that three octal digits always be used, with leading zeros as
> needed, to prevent ambiguity. The string (\0053), for example,
> contains two characters—an ASCII 5 (Control-E) followed by the digit
> 3—whereas the strings (\53) and (\053) contain one character, the
> ASCII character whose code is octal 53 (plus sign).
>
> Recommended/preferred is not at all equivalent to "must". However, one
> problem indeed is that strings as such have no notion of encoding, and
> an unescaped CR, LF, or CRLF is read as the same single newline. So at
> least those bytes, when they occur as part of UTF-16, would warrant
> escaping.
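
To make the failure concrete, here is a minimal Python sketch of what
happens to the title bytes. It is an illustration only, not code from
LilyPond or Ghostscript: pdfmark_bytes and normalize_eol are hypothetical
helper names, and normalize_eol merely assumes the scanner folds an
unescaped CR or CR LF inside a literal string into LF.

```python
import codecs


def pdfmark_bytes(text: str) -> bytes:
    """Encode text as UTF-16 big-endian with a leading byte-order mark,
    matching the <FE><FF>^A^M bytes seen in the /Title string above."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def normalize_eol(raw: bytes) -> bytes:
    """Assumed scanner behaviour for an unescaped literal string:
    CR LF and bare CR are both read as a single LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


original = pdfmark_bytes("\u010d")   # lowercase c with caron
mangled = normalize_eol(original)

print(original.hex(" "))             # bytes as written to the PostScript file
print(mangled.hex(" "))              # bytes after end-of-line folding
print(mangled.decode("utf-16"))      # the character that ends up in the metadata
```

The low byte of U+010D happens to be 0x0D, so the fold turns it into
U+010A, which is the uppercase C with dot above from the original report.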
Tracker issue: 5422 (https://sourceforge.net/p/testlilyissues/issues/5422/)
Rietveld issue: 345090043 (https://codereview.appspot.com/345090043)
Issue description:
Escape nul, cr, newline in PDF metadata
I wasn't really aware that the strings remain pure 8-bit strings on
input and that the UTF-16 interpretation is the private business of the
pdfmark command. So thanks for that pointer, which made it possible to
tackle this fairly long-known bug.
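
For illustration, the escaping idea can be sketched in Python as follows.
This is not the actual patch: ps_literal and pdfmark_title are
hypothetical names, and the sketch is deliberately more conservative than
the issue title, escaping every byte outside printable ASCII as \ddd in
addition to NUL, CR, LF, parentheses and the backslash.

```python
import codecs

# Bytes that must never appear raw inside a ( ... ) literal in this sketch:
# NUL, LF and CR (as in the issue description), plus the string delimiters
# and the escape character itself.
_MUST_ESCAPE = frozenset(b"\x00\n\r()\\")


def ps_literal(raw: bytes) -> str:
    """Return raw as a PostScript literal string, using three-digit
    octal escapes for anything that is not plain printable ASCII."""
    out = []
    for byte in raw:
        if byte in _MUST_ESCAPE or byte < 0x20 or byte > 0x7E:
            out.append(f"\\{byte:03o}")
        else:
            out.append(chr(byte))
    return "(" + "".join(out) + ")"


def pdfmark_title(text: str) -> str:
    """Encode text as UTF-16BE with BOM, then escape it for pdfmark."""
    raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return ps_literal(raw)


print(pdfmark_title("\u010d"))  # prints (\376\377\001\015)
```

With the CR written as \015, no raw end-of-line byte is left in the file
for the scanner to fold, so the UTF-16 code units reach pdfmark intact.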
--
David Kastrup