Re: [PATCH] implement --enable-encoding for UTF-8 info files
From: Eli Zaretskii
Subject: Re: [PATCH] implement --enable-encoding for UTF-8 info files
Date: Sat, 06 Oct 2007 14:49:09 +0200
> From: Bruno Haible <address@hidden>
> Date: Sat, 6 Oct 2007 13:37:24 +0200
> Cc: address@hidden
>
> Eli Zaretskii wrote:
> > > ! for (i = 0; i < sizeof (unicode_map) / sizeof (unicode_map[0]); i++)
> > > !   if (strcmp (html, unicode_map[i].html) == 0)
> > > !     return unicode_map[i].unicode;
> >
> > unicode_map[] has over 200 entries. I think linear search is not
> > really appropriate for such a long list.
>
> Here is a revised patch, using binary search.
Thanks!
> If even binary search is not fast enough, one can also use gperf for
> maximal speed lookup.
No, I think binary search is okay for a list like this.
> + /* List of HTML entities. */
> + static struct { const char *html; unsigned int unicode; } unicode_map[] = {
> +   /* Extracted from http://www.w3.org/TR/html401/sgml/entities.html through
> +      sed -n -e 's|<!ENTITY \([^ ][^ ]*\) *CDATA "[&]#\([0-9][0-9]*\);".*|  { "\1", \2 },|p'
I get empty output when I run this sed command on entities.html
downloaded with wget.  I think that's because the downloaded file uses
"&lt;" and "&amp;" instead of the literal "<" and "&" characters that
you seem to have in your copy of the file.
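If the downloaded entities.html indeed has the markup entity-encoded, one way to cope is to match the encoded forms in the sed pattern. This is a sketch only, assuming the file contains lines of the shape "&amp;lt;!ENTITY name CDATA "&amp;amp;#NNN;" ..."; the printf below just simulates one such line:

```shell
# Simulate one entity-encoded line from the downloaded file and extract
# the { "name", codepoint } pair by matching "&lt;" and "&amp;#"
# instead of the literal "<" and "&#".
printf '&lt;!ENTITY nbsp CDATA "&amp;#160;" -- no-break space --&gt;\n' |
sed -n -e 's|&lt;!ENTITY \([^ ][^ ]*\) *CDATA "&amp;#\([0-9][0-9]*\);".*|  { "\1", \2 },|p'
# prints:   { "nbsp", 160 },
```

Alternatively, piping the file through a decoding step first (turning &amp;lt; back into <) would let the original sed command run unmodified.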