Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue44

lilypond-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue44

From:	reinhold . kainhofer
Subject:	Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061)
Date:	Tue, 26 Apr 2011 20:47:43 +0000

Reviewers: jan.nieuwenhuizen,

Message:
On 2011/04/26 13:19:01, jan.nieuwenhuizen wrote:

LGTM

Some nitpicks, see below


New patch is up, which includes all of Jan's nitpicks (in particular,
GNU coding standard for braces, char const*, BOM comment and definition
as \xFE\xFF). I also fixed a memleak by using scm_take_str instead of
scm_from_locale_stringn (which does NOT eventually free its first
argument). This also makes more sense as scm_take_str is meant for
arbitrary binary data, while scm_from_locale_stringn might mess up when
the locale handling changes in GUILE.

Cheers,
Reinhold


Description:
Fix "make doc" error: Character set messup in pdf-scheme.cc

In "make doc", somehow the LANG env variable is cleared, so
glib returned charset=ANSI_X3.4-1968 for the input string. This would
disallow any accented characters in the input string.
However, we know that the input string is by definition always UTF-8,
so the solution is not to use the current locale, but hardcode UTF-8.

Please review this at http://codereview.appspot.com/4449061/

Affected files:
  M lily/pdf-scheme.cc


Index: lily/pdf-scheme.cc
diff --git a/lily/pdf-scheme.cc b/lily/pdf-scheme.cc

index6d717c55ad70dfd732462ca750536d049edece76..c10fe59418f50e9ab1e72c482cc3208a8bb35efc100644

--- a/lily/pdf-scheme.cc
+++ b/lily/pdf-scheme.cc
@@ -20,41 +20,63 @@
 #include <glib.h>
 using namespace std;

+#include "international.hh"
+#include "warn.hh"
 #include "lily-guile.hh"


 LY_DEFINE (ly_encode_string_for_pdf, "ly:encode-string-for-pdf",
           1, 0, 0, (SCM str),
-          "Check whether the string needs to be encoded for PDF output 
(Latin1,"
-          " PDFDocEncoding or in the most general case UTF-16BE).")
+          "Encode the given string to either Latin1 (which is a subset of "
+          "the PDFDocEncoding) or if that's not possible to full UTF-16BE "
+          "with Byte-Order-Mark (BOM).")
 {
   LY_ASSERT_TYPE (scm_is_string, str, 1);
   char *p = ly_scm2str0 (str);
   char *g = NULL;
-  const char *charset;
+  char const*charset="UTF-8"; // Input is ALWAYS UTF-8!
   gsize bytes_written = 0;
-  g_get_charset (&charset); /* The current locale */

-  /* First, try to convert to ISO-8859-1 (no encodings required) */
+  /* First, try to convert to ISO-8859-1 (no encodings required). This will
+   * fail, if the string contains accented characters, so we do not check
+   * for errors. */
   g = g_convert (p, -1, "ISO-8859-1", charset, 0, &bytes_written, 0);
   /* If that fails, we have to resolve to full UTF-16BE */
-  if (!g) {

- char *g_without_BOM = g_convert (p, -1, "UTF-16BE", charset, 0,&bytes_written, 0);

-    /* prepend the BOM manually, g_convert doesn't do it! */
-    g = new char[bytes_written+3];
-    g[0] = (char)254;
-    g[1] = (char)255;
-    memcpy (&g[2], g_without_BOM, bytes_written+1); // Copy string + \0
-    free (g_without_BOM);
-    bytes_written += 2;
-  }
+  if (!g)
+    {
+      GError *e = NULL;

+ char *g_without_BOM = g_convert (p, -1, "UTF-16BE", charset, 0,&bytes_written, &e);

+      if (e != NULL)
+        {

+ warning (_f("Conversion of string \'%s\' to UTF-16befailed: %s", p, e->message));

+          g_error_free (e);
+        }
+      /* UTF-16BE allows/recommends a byte-order-mark (BOM) of two bytes
+       * \xFE\xFF at the begin of the string. The pdfmark specification
+       * requires it and depends on it to distinguish PdfDocEncoding from
+       * UTF-16BE. As g_convert does not automatically prepend this BOM
+       * for UTF-16BE (only for UTF-16, which uses lower endian by default,
+       * though), we have to prepend it manually. */

+ if (g_without_BOM) // conversion to UTF-16be might have failed(shouldn't!)

+        {
+          g = new char[bytes_written+3];
+          char const *BOM = "\xFE\xFF";
+          strcpy (g, BOM);

+ memcpy (&g[2], g_without_BOM, bytes_written+1); // Copy string +\0

+          free (g_without_BOM);
+          bytes_written += 2;
+        }
+    }
   free (p);

   /* Convert back to SCM object and return it */
-  if (g) {
-    return scm_from_locale_stringn (g, bytes_written);
-  } else {
-    return str;
-  }
+  if (g)
+    {

+ return scm_take_str (g, bytes_written); // scm_take_str eventuallyfrees g!

+    }
+  else
+    {
+      return str;
+    }

 }

[Prev in Thread]

Current Thread

[Next in Thread]

Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061), jan . nieuwenhuizen, 2011/04/27
- Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061), reinhold . kainhofer <=
- Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061), percival . music . ca, 2011/04/28
- Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061), n . puttock, 2011/04/28
- Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061), Carl . D . Sorensen, 2011/04/30

Prev by Date: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061)
Next by Date: Re: Doc: NR rewrite of 3.2 Titles and Headers (issue4124056)
Previous by thread: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061)
Next by thread: Re: Fix "make doc" error: Character set messup in pdf-scheme.cc (issue4449061)
Index(es):
- Date
- Thread