bug-gettext

Re: xgettext and XML files - extracting strings


From: Shaun McCance
Subject: Re: xgettext and XML files - extracting strings
Date: Fri, 05 May 2023 16:43:47 -0400
User-agent: Evolution 3.46.4 (3.46.4-1.fc37)

On Sat, 2023-04-29 at 10:14 -0300, Jamenson Espindula wrote:
> On Wed, 26 Apr 2023 at 08:24, Bruno Haible <bruno@clisp.org> wrote:
> > 
> > Jamenson Espindula wrote:
> > >     45     <para>This book is licensed under a <xref linkend="CC"/>.</para>
> > >     46     <para>Computer instructions may be extracted from the book under the
> > >     47           <xref linkend="MIT"/>.</para>
> > >     48
> > >     49     <para><trademark class='registered'>Linux</trademark> is a registered trademark of
> > >     50     Linus Torvalds.</para>
> > >     51
> > > 
> > >  = = = End XML file = = =
> > > 
> > > I tried to extract the strings. The POT file produced is:
> > > 
> > >  = = = Begin "bookinfo.pot" file = = =
> > > ...
> > > 
> > > #: /home/jamenson/repos/upstream/lfs-git/prologue/bookinfo.xml:49
> > > msgid "Linux"
> > > msgstr ""
> > > 
> > >  = = = End "bookinfo.pot" file = = =
> > > 
> > > As you can see, lines 45 and 46 were skipped.
> > 
> > Also, from line 49, only the <trademark> element was extracted, not the
> > <para> element.
> > 
> > It seems to me like XML elements with nested XML sub-elements are not
> > supported.
> > 
> > If you want support for these, I would look at other tools that produce
> > POT files, not xgettext directly. For DocBook, there is the 'poxml' package
> > https://invent.kde.org/sdk/poxml .
> > 
> > Bruno
> > 
> > 
> > 
> Thank you for your response.
> 
> In addition to the package you suggested, there is "itstool", which is
> a script written in Perl. Maybe there are other packages out there too.

Hi there. itstool maintainer here. It's Python actually. :)

> But my point of view is: none of them does any of the real work of
> string extraction. The real work of string extraction is performed by
> the binary executable, that is, "xgettext". So I think I should first
> learn how "xgettext" works before deciding to ease the work. Do you
> agree?

itstool doesn't actually use xgettext for string extraction. It has its
own algorithms, based on the W3C Internationalization Tag Set (ITS),
that it uses to determine what constitutes a translation unit. It uses
that to create a POT file. You then use gettext and whatever other
tools you like to handle translations and generate an MO file. itstool
then handles creating translated XML files from the source XML file and
the MO file.
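
To make "translation unit" a bit more concrete, here is a minimal Python
sketch of the general idea: treat each block-level element as one message
and keep its inline markup inside the message, rather than extracting
only leaf elements. This is an illustration only, not itstool's actual
algorithm and not a real ITS implementation. The BLOCK_TAGS set, the
helper names, and the <section> wrapper around the sample are all
assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: these elements each form one translation unit.
# Real ITS rules are far richer than a fixed set of tag names.
BLOCK_TAGS = {"para"}

# The paragraphs quoted earlier in the thread, wrapped in a made-up
# <section> element so that the sample is a well-formed document.
SAMPLE = """<section>
  <para>This book is licensed under a <xref linkend="CC"/>.</para>
  <para>Computer instructions may be extracted from the book under the
        <xref linkend="MIT"/>.</para>
  <para><trademark class='registered'>Linux</trademark> is a registered trademark of
  Linus Torvalds.</para>
</section>"""


def inner_xml(elem):
    """Return the content of elem (text plus inline markup) as one string."""
    parts = [elem.text or ""]
    for child in elem:
        # ET.tostring() also emits the child's tail text, so the text
        # that follows an inline element is not lost.
        parts.append(ET.tostring(child, encoding="unicode"))
    # Collapse whitespace runs, as is usual for paragraph-like content.
    return " ".join("".join(parts).split())


def extract_units(xml_text):
    """Collect one message per block element, in document order."""
    root = ET.fromstring(xml_text)
    return [inner_xml(e) for e in root.iter() if e.tag in BLOCK_TAGS]


units = extract_units(SAMPLE)
for unit in units:
    print(unit)
```

With this approach each <para> becomes a single message that still
contains its <xref> or <trademark> markup, so the translator sees the
whole sentence. The sketch ignores nested block elements, namespaces, and
everything else a real tool has to deal with.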

We wrote itstool before gettext had real XML support. These days,
gettext handles XML well, with support for certain ITS data categories.
For most data-oriented XML formats, gettext does a great job, and
there's no need to introduce another tool. But for document-oriented
formats like Mallard, DocBook, and DITA, itstool has some tricks that
make document translation much nicer.

--
Shaun




