[bug-gettext] [PATCH] format-xml: Add format string parser for XML
From: Daiki Ueno
Subject: [bug-gettext] [PATCH] format-xml: Add format string parser for XML
Date: Thu, 15 Jan 2015 19:07:30 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)
Now that we have split the libexpat wrapper out into its own C file, it
is not that hard to implement the "xml-format" flag that we discussed a
long time ago:
https://lists.gnu.org/archive/html/bug-gettext/2012-04/msg00013.html
https://lists.gnu.org/archive/html/bug-gettext/2013-05/msg00010.html
I'm attaching a patch, which works as follows:
$ cat test.po
#, xml-format
msgid "0"
msgstr "<"

#, xml-format
msgid "<foo>0</foo>"
msgstr "<foo><bar>0</bar></foo>"

#, xml-format
msgid "<foo>foo</foo>"
msgstr "<foo>FOO!</foo>"
$ msgfmt --check-format test.po
test.po:3: 'msgstr' is not a valid XML format string, unlike 'msgid'. Reason:
error while parsing: not well-formed (invalid token)
test.po:7: incompatible XML tree structure 'msgid' and 'msgstr'
msgfmt: found 2 fatal errors
It checks that each XML fragment is well-formed and that the translation
preserves the element tree structure of the original. I wonder whether
the latter check is too rigid, but it could be useful in some cases.
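To make the tree-structure check concrete, here is a minimal standalone
sketch of the comparison idea. It is not the code in the patch: the patch
lets expat report the elements, while this toy scanner (the hypothetical
helpers tag_skeleton and same_tree are my own names) just copies the tag
markers out of the string. It does not check well-formedness, and it
assumes no '>' appears inside attribute values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the expat-based parser: copy only the <tag> and
   </tag> markers of S into a freshly allocated string, dropping text
   content and attributes.  It does NOT check well-formedness.  */
static char *
tag_skeleton (const char *s)
{
  size_t len;
  char *out;
  size_t n = 0;
  bool in_tag = false;      /* inside <...> */
  bool in_name = false;     /* still copying the element name */

  assert (s != NULL);
  len = strlen (s);
  out = malloc (len + 1);   /* the skeleton is never longer than S */
  if (out == NULL)
    return NULL;

  for (size_t i = 0; i < len; i++)
    {
      char c = s[i];
      if (!in_tag)
        {
          if (c == '<')
            {
              in_tag = in_name = true;
              out[n++] = c;
            }
        }
      else if (c == '>')
        {
          in_tag = in_name = false;
          out[n++] = c;
        }
      else if (in_name && (c == ' ' || c == '\t' || c == '\n'))
        in_name = false;    /* attributes follow: stop copying */
      else if (in_name)
        out[n++] = c;
    }
  out[n] = '\0';
  return out;
}

/* Return true if MSGID and MSGSTR have the same tag skeleton.  */
static bool
same_tree (const char *msgid, const char *msgstr)
{
  char *a = tag_skeleton (msgid);
  char *b = tag_skeleton (msgstr);
  bool same = a != NULL && b != NULL && strcmp (a, b) == 0;
  free (a);
  free (b);
  return same;
}
```

With this sketch, same_tree ("<foo>foo</foo>", "<foo>FOO!</foo>") holds
because only the text differs, while the pair "<foo>0</foo>" and
"<foo><bar>0</bar></foo>" is rejected because the translation adds an
element. Comparing flat strings rather than real trees keeps the check
simple, at the cost of rejecting translations that reorder or add markup.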
Regards,
--
Daiki Ueno
From b72241c63c84845119ee556103f9b4f036b4275d Mon Sep 17 00:00:00 2001
From: Daiki Ueno <address@hidden>
Date: Thu, 15 Jan 2015 18:24:39 +0900
Subject: [PATCH] format-xml: Add format string parser for XML
* gettext-tools/src/libexpat-compat.h (XML_SetUserData): New declaration.
* gettext-tools/src/libexpat-compat.c (p_XML_SetUserData): New variable.
(XML_SetUserData): New function.
(load_libexpat): Expose "XML_SetUserData".
* gettext-tools/src/xgettext.c (flag_table_xml): New variable.
(xgettext_record_flag): Initialize flag_table_xml.
* gettext-tools/src/message.h (enum format_type): New enumeration value
format_xml.
(NFORMATS): Increase to 28.
* gettext-tools/src/message.c (format_language): Add "xml".
(format_language_pretty): Add "XML".
* gettext-tools/src/format.h (formatstring_xml): New declaration.
* gettext-tools/src/format.c (formatstring_parsers): Register
formatstring_xml.
* gettext-tools/src/format-xml.c: New file.
* gettext-tools/src/Makefile.am (FORMAT_SOURCE): Add format-xml.c.
(xgettext_SOURCES): Move libexpat-compat.c to...
(COMMON_SOURCE): ...here.
* gettext-tools/libgettextpo/Makefile.am (libgettextpo_la_LDFLAGS):
Add @LTLIBEXPAT@.
(libgettextpo_la_AUXSOURCES): Add ../src/libexpat-compat.c.
* gettext-tools/tests/xgettext-9: Adjust PO output.
* gettext-tools/tests/format-xml-1: New file.
* gettext-tools/tests/Makefile.am (TESTS): Add new test.
---
gettext-tools/libgettextpo/ChangeLog | 5 +
gettext-tools/libgettextpo/Makefile.am | 4 +-
gettext-tools/src/ChangeLog | 20 ++++
gettext-tools/src/Makefile.am | 6 +-
gettext-tools/src/format-xml.c | 179 +++++++++++++++++++++++++++++++++
gettext-tools/src/format.c | 3 +-
gettext-tools/src/format.h | 1 +
gettext-tools/src/libexpat-compat.c | 13 +++
gettext-tools/src/libexpat-compat.h | 1 +
gettext-tools/src/message.c | 6 +-
gettext-tools/src/message.h | 5 +-
gettext-tools/src/xgettext.c | 6 ++
gettext-tools/tests/ChangeLog | 6 ++
gettext-tools/tests/Makefile.am | 1 +
gettext-tools/tests/format-xml-1 | 52 ++++++++++
gettext-tools/tests/xgettext-9 | 1 +
16 files changed, 300 insertions(+), 9 deletions(-)
create mode 100644 gettext-tools/src/format-xml.c
create mode 100644 gettext-tools/tests/format-xml-1
diff --git a/gettext-tools/libgettextpo/ChangeLog b/gettext-tools/libgettextpo/ChangeLog
index c439706..0ee7dac 100644
--- a/gettext-tools/libgettextpo/ChangeLog
+++ b/gettext-tools/libgettextpo/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-15 Daiki Ueno <address@hidden>
+
+ * Makefile.am (libgettextpo_la_LDFLAGS): Add @LTLIBEXPAT@.
+ (libgettextpo_la_AUXSOURCES): Add ../src/libexpat-compat.c.
+
2014-12-24 Daiki Ueno <address@hidden>
* gettext 0.19.4 released.
diff --git a/gettext-tools/libgettextpo/Makefile.am b/gettext-tools/libgettextpo/Makefile.am
index b4c07f7..518a779 100644
--- a/gettext-tools/libgettextpo/Makefile.am
+++ b/gettext-tools/libgettextpo/Makefile.am
@@ -62,6 +62,7 @@ libgettextpo_la_AUXSOURCES = \
../src/read-catalog-abstract.c \
../src/read-catalog.c \
../src/plural-table.c \
+ ../src/libexpat-compat.c \
../src/format-c.c \
../src/format-sh.c \
../src/format-python.c \
@@ -87,6 +88,7 @@ libgettextpo_la_AUXSOURCES = \
../src/format-kde.c \
../src/format-boost.c \
../src/format-lua.c \
+ ../src/format-xml.c \
../src/format.c \
../src/plural-exp.c \
../src/plural-eval.c \
@@ -105,7 +107,7 @@ libgettextpo_la_LIBADD = libgnu.la $(WOE32_LIBADD) $(LTLIBUNISTRING)
libgettextpo_la_LDFLAGS = \
-version-info $(LTV_CURRENT):$(LTV_REVISION):$(LTV_AGE) \
-rpath $(libdir) \
- @LTLIBINTL@ @LTLIBICONV@ -lc -no-undefined
+ @LTLIBINTL@ @LTLIBICONV@ @LTLIBEXPAT@ -lc -no-undefined
# Tell the mingw or Cygwin linker which symbols to export.
if WOE32DLL
diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog
index 0a4dbdb..5c126c8 100644
--- a/gettext-tools/src/ChangeLog
+++ b/gettext-tools/src/ChangeLog
@@ -1,3 +1,23 @@
+2015-01-15 Daiki Ueno <address@hidden>
+
+ format-xml: Add format string parser for XML
+ * libexpat-compat.h (XML_SetUserData): New declaration.
+ * libexpat-compat.c (p_XML_SetUserData): New variable.
+ (XML_SetUserData): New function.
+ (load_libexpat): Expose "XML_SetUserData".
+ * xgettext.c (flag_table_xml): New variable.
+ (xgettext_record_flag): Initialize flag_table_xml.
+ * message.h (enum format_type): New enumeration value format_xml.
+ (NFORMATS): Increase to 28.
+ * message.c (format_language): Add "xml".
+ (format_language_pretty): Add "XML".
+ * format.h (formatstring_xml): New declaration.
+ * format.c (formatstring_parsers): Register formatstring_xml.
+ * format-xml.c: New file.
+ * Makefile.am (FORMAT_SOURCE): Add format-xml.c.
+ (xgettext_SOURCES): Move libexpat-compat.c to...
+ (COMMON_SOURCE): ...here.
+
2015-01-13 Daiki Ueno <address@hidden>
* x-c.c (phase5_get): Reset raw_expected at the beginning of the
diff --git a/gettext-tools/src/Makefile.am b/gettext-tools/src/Makefile.am
index 9f2325f..d190ea9 100644
--- a/gettext-tools/src/Makefile.am
+++ b/gettext-tools/src/Makefile.am
@@ -106,7 +106,7 @@ CSHARPCOMPFLAGS = @CSHARPCOMPFLAGS@
COMMON_SOURCE = message.c po-error.c po-xerror.c \
read-catalog-abstract.c po-lex.c po-gram-gen.y po-charset.c \
read-po.c read-properties.c read-stringtable.c open-catalog.c \
-dir-list.c str-list.c
+dir-list.c str-list.c libexpat-compat.c
# xgettext and msgfmt deal with format strings.
if !WOE32DLL
@@ -140,7 +140,8 @@ FORMAT_SOURCE += \
format-kde.c \
format-boost.c \
format-lua.c \
- format-javascript.c
+ format-javascript.c \
+ format-xml.c
# libgettextsrc contains all code that is needed by at least two programs.
libgettextsrc_la_SOURCES = \
@@ -180,7 +181,6 @@ xgettext_SOURCES += \
x-c.c x-po.c x-sh.c x-python.c x-lisp.c x-elisp.c x-librep.c x-scheme.c \
x-smalltalk.c x-java.c x-csharp.c x-awk.c x-ycp.c x-tcl.c x-perl.c x-php.c \
x-rst.c x-glade.c x-lua.c x-javascript.c x-vala.c x-gsettings.c \
- libexpat-compat.c \
x-desktop.c
if !WOE32DLL
msgattrib_SOURCES = msgattrib.c
diff --git a/gettext-tools/src/format-xml.c b/gettext-tools/src/format-xml.c
new file mode 100644
index 0000000..03542a6
--- /dev/null
+++ b/gettext-tools/src/format-xml.c
@@ -0,0 +1,179 @@
+/* XML format strings.
+ Copyright (C) 2001-2004, 2006-2009, 2015 Free Software Foundation, Inc.
+ Written by Daiki Ueno.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "format.h"
+#include "c-ctype.h"
+#include "xalloc.h"
+#include "xvasprintf.h"
+#include "format-invalid.h"
+#include "gettext.h"
+#include "libexpat-compat.h"
+
+#define _(str) gettext (str)
+
+#if DYNLOAD_LIBEXPAT || HAVE_LIBEXPAT
+
+#define XML_FRAGMENT_NS "https://www.gnu.org/s/gettext/xml/fragment"
+
+struct element_state
+{
+ char *buffer;
+ size_t bufmax;
+ size_t buflen;
+};
+
+/* Callback called when <element> is seen. */
+static void
+start_element_handler (void *data, const char *name,
+ const char **attributes)
+{
+ struct element_state *p = data;
+ size_t namelen, taglen;
+
+ namelen = strlen (name);
+ taglen = namelen + 2;
+ if (p->buflen + taglen + 1 > p->bufmax)
+ {
+ p->bufmax = 2 * p->bufmax;
+ if (p->bufmax < p->buflen + taglen + 1)
+ p->bufmax = p->buflen + taglen + 1;
+ p->buffer = xrealloc (p->buffer, p->bufmax);
+ }
+ sprintf (p->buffer + p->buflen, "<%s>", name);
+ p->buflen += taglen;
+}
+
+/* Callback called when </element> is seen. */
+static void
+end_element_handler (void *data, const char *name)
+{
+ struct element_state *p = data;
+ size_t namelen, taglen;
+
+ namelen = strlen (name);
+ taglen = namelen + 3;
+ if (p->buflen + taglen + 1 > p->bufmax)
+ {
+ p->bufmax = 2 * p->bufmax;
+ if (p->bufmax < p->buflen + taglen + 1)
+ p->bufmax = p->buflen + taglen + 1;
+ p->buffer = xrealloc (p->buffer, p->bufmax);
+ }
+ sprintf (p->buffer + p->buflen, "</%s>", name);
+ p->buflen += taglen;
+}
+#endif
+
+static void *
+format_parse (const char *format, bool translated, char *fdi,
+ char **invalid_reason)
+{
+#if DYNLOAD_LIBEXPAT || HAVE_LIBEXPAT
+ if (LIBEXPAT_AVAILABLE ())
+ {
+ XML_Parser parser;
+ struct element_state data;
+ char *fragment;
+
+ parser = XML_ParserCreate (NULL);
+ if (parser == NULL)
+ {
+ *invalid_reason = xasprintf (_("memory exhausted"));
+ return NULL;
+ }
+
+ XML_SetElementHandler (parser, start_element_handler, end_element_handler);
+
+ memset (&data, 0, sizeof (data));
+ XML_SetUserData (parser, &data);
+
+ fragment = xasprintf ("<?xml version='1.0' encoding='UTF-8'?>"
+ "<gt:fragment xmlns:gt='%s'>%s</gt:fragment>",
+ XML_FRAGMENT_NS, format);
+ if (XML_Parse (parser, fragment, strlen (fragment), 0) == 0)
+ {
+ *invalid_reason =
+ xasprintf (_("error while parsing: %s"),
+ XML_ErrorString (XML_GetErrorCode (parser)));
+ free (data.buffer);
+ free (fragment);
+ XML_ParserFree (parser);
+ return NULL;
+ }
+
+ if (XML_Parse (parser, NULL, 0, 1) == 0)
+ {
+ *invalid_reason =
+ xasprintf (_("error while parsing: %s"),
+ XML_ErrorString (XML_GetErrorCode (parser)));
+ free (data.buffer);
+ free (fragment);
+ XML_ParserFree (parser);
+ return NULL;
+ }
+
+ free (fragment);
+ XML_ParserFree (parser);
+ return data.buffer;
+ }
+#endif
+ return xstrdup ("");
+}
+
+static int
+format_get_number_of_directives (void *descr)
+{
+ return 0;
+}
+
+static bool
+format_check (void *msgid_descr, void *msgstr_descr, bool equality,
+ formatstring_error_logger_t error_logger,
+ const char *pretty_msgid, const char *pretty_msgstr)
+{
+ char *tree1 = msgid_descr;
+ char *tree2 = msgstr_descr;
+ bool err = false;
+
+ if (strcmp (tree1, tree2) != 0)
+ {
+ if (error_logger)
+ error_logger (_("incompatible XML tree structure '%s' and '%s'"),
+ pretty_msgid, pretty_msgstr);
+ err = true;
+ }
+
+ return err;
+}
+
+struct formatstring_parser formatstring_xml =
+{
+ format_parse,
+ free,
+ format_get_number_of_directives,
+ NULL,
+ format_check
+};
diff --git a/gettext-tools/src/format.c b/gettext-tools/src/format.c
index c73ad7d..14adf5d 100644
--- a/gettext-tools/src/format.c
+++ b/gettext-tools/src/format.c
@@ -60,7 +60,8 @@ struct formatstring_parser *formatstring_parsers[NFORMATS] =
/* format_kde */ &formatstring_kde,
/* format_boost */ &formatstring_boost,
/* format_lua */ &formatstring_lua,
- /* format_javascript */ &formatstring_javascript
+ /* format_javascript */ &formatstring_javascript,
+ /* format_xml */ &formatstring_xml
};
/* Check whether both formats strings contain compatible format
diff --git a/gettext-tools/src/format.h b/gettext-tools/src/format.h
index d92532d..f2e04f7 100644
--- a/gettext-tools/src/format.h
+++ b/gettext-tools/src/format.h
@@ -122,6 +122,7 @@ extern DLL_VARIABLE struct formatstring_parser formatstring_kde;
extern DLL_VARIABLE struct formatstring_parser formatstring_boost;
extern DLL_VARIABLE struct formatstring_parser formatstring_lua;
extern DLL_VARIABLE struct formatstring_parser formatstring_javascript;
+extern DLL_VARIABLE struct formatstring_parser formatstring_xml;
/* Table of all format string parsers. */
extern DLL_VARIABLE struct formatstring_parser *formatstring_parsers[NFORMATS];
diff --git a/gettext-tools/src/libexpat-compat.c b/gettext-tools/src/libexpat-compat.c
index ad680db..9176b1b 100644
--- a/gettext-tools/src/libexpat-compat.c
+++ b/gettext-tools/src/libexpat-compat.c
@@ -195,6 +195,16 @@ XML_SetCommentHandler (XML_Parser parser, XML_CommentHandler handler)
}
+static void (*p_XML_SetUserData) (XML_Parser parser,
+ void *userData);
+
+void
+XML_SetUserData (XML_Parser parser, void *userData)
+{
+ (*p_XML_SetUserData) (parser, userData);
+}
+
+
static int (*p_XML_Parse) (XML_Parser parser, const char *s,
int len, int isFinal);
@@ -300,6 +310,9 @@ load_libexpat ()
&& (p_XML_SetCommentHandler =
(void (*) (XML_Parser, XML_CommentHandler))
dlsym (handle, "XML_SetCommentHandler")) != NULL
+ && (p_XML_SetUserData =
+ (void (*) (XML_Parser, void *))
+ dlsym (handle, "XML_SetUserData")) != NULL
&& (p_XML_Parse =
(int (*) (XML_Parser, const char *, int, int))
dlsym (handle, "XML_Parse")) != NULL
diff --git a/gettext-tools/src/libexpat-compat.h b/gettext-tools/src/libexpat-compat.h
index 2ff6465..004f692 100644
--- a/gettext-tools/src/libexpat-compat.h
+++ b/gettext-tools/src/libexpat-compat.h
@@ -76,6 +76,7 @@ void XML_SetElementHandler (XML_Parser parser,
void XML_SetCharacterDataHandler (XML_Parser parser,
XML_CharacterDataHandler handler);
void XML_SetCommentHandler (XML_Parser parser, XML_CommentHandler handler);
+void XML_SetUserData (XML_Parser parser, void *userData);
int XML_Parse (XML_Parser parser, const char *s, int len, int isFinal);
enum XML_Error XML_GetErrorCode (XML_Parser parser);
int64_t XML_GetCurrentLineNumber (XML_Parser parser);
diff --git a/gettext-tools/src/message.c b/gettext-tools/src/message.c
index 586675f..c7680b3 100644
--- a/gettext-tools/src/message.c
+++ b/gettext-tools/src/message.c
@@ -60,7 +60,8 @@ const char *const format_language[NFORMATS] =
/* format_kde */ "kde",
/* format_boost */ "boost",
/* format_lua */ "lua",
- /* format_javascript */ "javascript"
+ /* format_javascript */ "javascript",
+ /* format_xml */ "xml"
};
const char *const format_language_pretty[NFORMATS] =
@@ -91,7 +92,8 @@ const char *const format_language_pretty[NFORMATS] =
/* format_kde */ "KDE",
/* format_boost */ "Boost",
/* format_lua */ "Lua",
- /* format_javascript */ "JavaScript"
+ /* format_javascript */ "JavaScript",
+ /* format_xml */ "XML"
};
diff --git a/gettext-tools/src/message.h b/gettext-tools/src/message.h
index bf2215a..ad631ad 100644
--- a/gettext-tools/src/message.h
+++ b/gettext-tools/src/message.h
@@ -69,9 +69,10 @@ enum format_type
format_kde,
format_boost,
format_lua,
- format_javascript
+ format_javascript,
+ format_xml
};
-#define NFORMATS 27 /* Number of format_type enum values. */
+#define NFORMATS 28 /* Number of format_type enum values. */
extern DLL_VARIABLE const char *const format_language[NFORMATS];
extern DLL_VARIABLE const char *const format_language_pretty[NFORMATS];
diff --git a/gettext-tools/src/xgettext.c b/gettext-tools/src/xgettext.c
index 28d28a0..1c91f36 100644
--- a/gettext-tools/src/xgettext.c
+++ b/gettext-tools/src/xgettext.c
@@ -169,6 +169,7 @@ static flag_context_list_table_ty flag_table_php;
static flag_context_list_table_ty flag_table_lua;
static flag_context_list_table_ty flag_table_javascript;
static flag_context_list_table_ty flag_table_vala;
+static flag_context_list_table_ty flag_table_xml;
/* If true, recognize Qt format strings. */
static bool recognize_format_qt;
@@ -1825,6 +1826,11 @@ xgettext_record_flag (const char *optionstring)
name_start, name_end,
argnum, value, pass);
break;
+ case format_xml:
+ flag_context_list_table_insert (&flag_table_xml, 0,
+ name_start, name_end,
+ argnum, value, pass);
+ break;
default:
abort ();
}
diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog
index f8cb454..83fdd27 100644
--- a/gettext-tools/tests/ChangeLog
+++ b/gettext-tools/tests/ChangeLog
@@ -1,3 +1,9 @@
+2015-01-15 Daiki Ueno <address@hidden>
+
+ * xgettext-9: Adjust PO output.
+ * format-xml-1: New file.
+ * Makefile.am (TESTS): Add new test.
+
2015-01-13 Daiki Ueno <address@hidden>
* xgettext-c-20: Improve test coverage of raw string tests.
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am
index 5a0d3c0..342969e 100644
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -135,6 +135,7 @@ TESTS = gettext-1 gettext-2 gettext-3 gettext-4 gettext-5 gettext-6 gettext-7 \
format-ycp-1 format-ycp-2 \
format-lua-1 format-lua-2 \
format-javascript-1 format-javascript-2 \
+ format-xml-1 \
plural-1 plural-2 \
gettextpo-1 \
lang-c lang-c++ lang-objc lang-sh lang-bash lang-python-1 \
diff --git a/gettext-tools/tests/format-xml-1 b/gettext-tools/tests/format-xml-1
new file mode 100644
index 0000000..7d02566
--- /dev/null
+++ b/gettext-tools/tests/format-xml-1
@@ -0,0 +1,52 @@
+#! /bin/sh
+. "${srcdir=.}/init.sh"; path_prepend_ . ../src
+
+# Test recognition of XML format strings.
+
+cat <<\EOF > f-x-1.data
+# Invalid: not wellformed
+msgid "0"
+msgstr "<"
+# Invalid: different tree
+msgid "<foo>0</foo>"
+msgstr "<foo><bar>0</bar></foo>"
+# Valid: only text has changed
+msgid "<foo>foo</foo>"
+msgstr "<foo>FOO!</foo>"
+EOF
+
+: ${MSGFMT=msgfmt}
+n=0
+while read comment; do
+ read msgid_line
+ read msgstr_line
+ n=`expr $n + 1`
+ cat <<EOF > f-x-1-$n.po
+#, xml-format
+${msgid_line}
+${msgstr_line}
+EOF
+ fail=
+ if echo "$comment" | grep 'Valid:' > /dev/null; then
+ if ${MSGFMT} --check-format -o f-x-1-$n.mo f-x-1-$n.po; then
+ :
+ else
+ fail=yes
+ fi
+ else
+ ${MSGFMT} --check-format -o f-x-1-$n.mo f-x-1-$n.po 2> /dev/null
+ if test $? = 1; then
+ :
+ else
+ fail=yes
+ fi
+ fi
+ if test -n "$fail"; then
+ echo "Format string checking error:" 1>&2
+ cat f-x-1-$n.po 1>&2
+ exit 1
+ fi
+ rm -f f-x-1-$n.po f-x-1-$n.mo
+done < f-x-1.data
+
+exit 0
diff --git a/gettext-tools/tests/xgettext-9 b/gettext-tools/tests/xgettext-9
index 9489be0..2329230 100755
--- a/gettext-tools/tests/xgettext-9
+++ b/gettext-tools/tests/xgettext-9
@@ -36,6 +36,7 @@ msgstr ""
#. xhtml-format
#. xml-format
#: xg-test9.c:5
+#, xml-format
msgid "seamew"
msgstr ""
--
2.1.0