bug-texinfo

Re: [PATCH] implement --enable-encoding for UTF-8 info files


From: Eli Zaretskii
Subject: Re: [PATCH] implement --enable-encoding for UTF-8 info files
Date: Sat, 06 Oct 2007 14:49:09 +0200

> From: Bruno Haible <address@hidden>
> Date: Sat, 6 Oct 2007 13:37:24 +0200
> Cc: address@hidden
> 
> Eli Zaretskii wrote:
> > > !       for (i = 0; i < sizeof (unicode_map) / sizeof (unicode_map[0]); i++)
> > > !         if (strcmp (html, unicode_map[i].html) == 0)
> > > !           return unicode_map[i].unicode;
> > 
> > unicode_map[] has over 200 entries.  I think linear search is not
> > really appropriate for such a long list.
> 
> Here is a revised patch, using binary search.

Thanks!

> If even binary search is not fast enough, one can also use gperf for
> maximal speed lookup.

No, I think binary search is okay for a list like this.

> + /* List of HTML entities.  */
> + static struct { const char *html; unsigned int unicode; } unicode_map[] = {
> + /* Extracted from http://www.w3.org/TR/html401/sgml/entities.html through
> +    sed -n -e 's|<!ENTITY \([^ ][^ ]*\) *CDATA "[&]#\([0-9][0-9]*\);".*|  { "\1", \2 },|p'

I get empty output when I run this sed command on entities.html
downloaded with wget.  I think that's because the downloaded file uses
"&lt;" and "&amp;" instead of the literal "<" and "&" that you seem to
have in your copy of the file.
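If the downloaded page really does carry the definitions in that entity-escaped form, the extraction command can be adapted by escaping "<" and "&" in the pattern the same way. A hedged sketch, tested here against one hypothetical sample line rather than the actual file:

```shell
# A sample entity definition in the escaped form the downloaded page
# appears to use (hypothetical; modeled on the HTML 4.01 definitions):
line='&lt;!ENTITY nbsp   CDATA "&amp;#160;" -- no-break space --&gt;'

# Bruno's extraction command with literal '<' and '&' replaced by
# their entity-escaped forms '&lt;' and '&amp;':
printf '%s\n' "$line" \
  | sed -n -e 's|&lt;!ENTITY \([^ ][^ ]*\) *CDATA "&amp;#\([0-9][0-9]*\);".*|  { "\1", \2 },|p'
```

On the sample line above this prints `  { "nbsp", 160 },`, the same C initializer the original command produces from an unescaped copy of the file.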



