Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (iss

lilypond-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (iss

From:	reinhold . kainhofer
Subject:	Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046)
Date:	Sat, 16 Apr 2011 17:40:17 +0000

Reviewers: carl.d.sorensen_gmail.com, reinhold_kainhofer.com, Carl,

Message:
On 2011/04/16 14:43:02, Carl wrote:

Bertrand's patch has now been pushed.  If you can finish your patch,

then we can

get it pushed and applied to stable/2.14


Patchset 2 now includes that patch, so now all pdf metadata should be
correctly encoded and escaped (the escaping needs to be done AFTER
encoding).

Cheers,
Reinhold


Description:
pdf-metadata: Use UTF-16BE for metadata if required (fix #1502)

All Latin1 metadata strings need to be printed out to the .ps file
in Latin1 (or UTF-16BE, but definitely NOT in UTF-8), and all
non-Latin1 strings need to use UTF-16BE encoding.

Please review this at http://codereview.appspot.com/4398046/

Affected files:
  A input/regression/pdfmark-metadata-unicode.ly
  A lily/pdf-scheme.cc
  M scm/framework-ps.scm


Index: input/regression/pdfmark-metadata-unicode.ly

diff --git a/input/regression/pdfmark-metadata-unicode.lyb/input/regression/pdfmark-metadata-unicode.ly

new file mode 100644

index0000000000000000000000000000000000000000..5f51a620c0753917bdf4d6c7d22f1499b784b138

--- /dev/null
+++ b/input/regression/pdfmark-metadata-unicode.ly
@@ -0,0 +1,26 @@
+\version "2.13.60"
+
+
+\header
+{
+
+  texidoc = "PDF metadata need either Latin1 encoding (not UTF8) or full
+  UTF-16BE with BOM. The title field uses full UTF-16 (russian characters,
+  euro, etc), while the composer uses normal european diacrits (which need
+  to be encoded as Latin1, not as UTF8). Closing parenthesis need to be
+  escaped by a backslash AFTER encoding!"
+

+ % Non-latin1 text, requiring UTF-16BE (with BOM) encoding in PDFmetatdata:

+  % closing parentheses and backslashed need to be escaped AFTER encoding!
+  title = "UTF-16BE title:² € ĂĄœŖŮůſЖюљ)\\\n ¡"

+ % Latin1 text, requiring at least PDFDocEncoding in PDF metadata, allLatin1

+  % characters coincide, so no special encoding is required, just print out
+  % the Latin1 characters (NOT the utf8 bytes!)
+  composer = "Latin1 composer (with special chars): Jöhånñ Strauß"
+  poet = "UTF-16BE with parentheses: ) € ĂĄœŖŮůſЖюљ"
+}
+
+\score
+{
+  \new Staff c'1
+}
\ No newline at end of file
Index: lily/pdf-scheme.cc
diff --git a/lily/pdf-scheme.cc b/lily/pdf-scheme.cc
new file mode 100644

index0000000000000000000000000000000000000000..6d717c55ad70dfd732462ca750536d049edece76

--- /dev/null
+++ b/lily/pdf-scheme.cc
@@ -0,0 +1,60 @@
+/*
+  This file is part of LilyPond, the GNU music typesetter.
+
+  Copyright (C) 2011 Reinhold Kainhofer <address@hidden>
+
+  LilyPond is free software: you can redistribute it and/or modify
+  it under the terms of the GNU General Public License as published by
+  the Free Software Foundation, either version 3 of the License, or
+  (at your option) any later version.
+
+  LilyPond is distributed in the hope that it will be useful,
+  but WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  GNU General Public License for more details.
+
+  You should have received a copy of the GNU General Public License
+  along with LilyPond.  If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <glib.h>
+using namespace std;
+
+#include "lily-guile.hh"
+
+
+LY_DEFINE (ly_encode_string_for_pdf, "ly:encode-string-for-pdf",
+          1, 0, 0, (SCM str),
+          "Check whether the string needs to be encoded for PDF output 
(Latin1,"
+          " PDFDocEncoding or in the most general case UTF-16BE).")
+{
+  LY_ASSERT_TYPE (scm_is_string, str, 1);
+  char *p = ly_scm2str0 (str);
+  char *g = NULL;
+  const char *charset;
+  gsize bytes_written = 0;
+  g_get_charset (&charset); /* The current locale */
+
+  /* First, try to convert to ISO-8859-1 (no encodings required) */
+  g = g_convert (p, -1, "ISO-8859-1", charset, 0, &bytes_written, 0);
+  /* If that fails, we have to resolve to full UTF-16BE */
+  if (!g) {

+ char *g_without_BOM = g_convert (p, -1, "UTF-16BE", charset, 0,&bytes_written, 0);

+    /* prepend the BOM manually, g_convert doesn't do it! */
+    g = new char[bytes_written+3];
+    g[0] = (char)254;
+    g[1] = (char)255;
+    memcpy (&g[2], g_without_BOM, bytes_written+1); // Copy string + \0
+    free (g_without_BOM);
+    bytes_written += 2;
+  }
+  free (p);
+
+  /* Convert back to SCM object and return it */
+  if (g) {
+    return scm_from_locale_stringn (g, bytes_written);
+  } else {
+    return str;
+  }
+
+}
Index: scm/framework-ps.scm
diff --git a/scm/framework-ps.scm b/scm/framework-ps.scm

indexeb4e54553158a3c203b4f02bba640b8cd7e1dda4..b0359bcd884cc2a710aa0d58a51d93c483b02006100644

--- a/scm/framework-ps.scm
+++ b/scm/framework-ps.scm
@@ -413,12 +413,16 @@
 ;;; Create DOCINFO pdfmark containing metadata
 ;;; header fields with pdf prefix override those without the prefix
 (define (handle-metadata header port)
+  (define (metadata-encode val)
+    ;; First, call ly:encode-string-for-pdf to encode the string (latin1 or
+    ;; utf-16be), then escape all parentheses and backslashes
+    (ps-quote (ly:encode-string-for-pdf val)))
   (define (metadata-lookup-output overridevar fallbackvar field)
     (let* ((overrideval (ly:modules-lookup (list header) overridevar))
           (fallbackval (ly:modules-lookup (list header) fallbackvar))
           (val (if overrideval overrideval fallbackval)))
       (if val
-         (format port "/~a (~a)\n" field (ps-quote (markup->string val))))))

+ (format port "/~a (~a)\n" field (metadata-encode (markup->stringval))))))

   (display "[ " port)
   (metadata-lookup-output 'pdfcomposer 'composer "Author")
   (format port "/Creator (LilyPond ~a)\n" (lilypond-version))

[Prev in Thread]

Current Thread

[Next in Thread]

pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046), Carl . D . Sorensen, 2011/04/14
- Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046), Reinhold Kainhofer, 2011/04/14
- Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046), Carl . D . Sorensen, 2011/04/16
- Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046), reinhold . kainhofer <=
- Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046), Carl . D . Sorensen, 2011/04/16

Prev by Date: Re: lily-git.tcl not working?
Next by Date: Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046)
Previous by thread: Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046)
Next by thread: Re: pdf-metadata: Use UTF-16BE for metadata if required (fix #1502) (issue4398046)
Index(es):
- Date
- Thread