Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algo
From: Pádraig Brady
Subject: Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm
Date: Thu, 29 Jul 2010 10:05:56 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 28/07/10 22:32, Bruno Haible wrote:
> Pádraig Brady wrote:
>> I would suggest a new function due to the
>> way I see this function called most often.
>>
>> /* definitely not sure of this name */
>> uint8_t *
>> u8_str_u8_chr (const uint8_t *s, const uint8_t *c, size_t size)
>> {
>> switch (size)
>> {
>> case 1:
>> return (uint8_t *) strchr ((const char *) s, *c);
>> case 2:
>> //use logic from current u8_strchr()
>> case 3:
>> ...
>> case 4:
>> ...
>> }
>> }
>> ...
>> while ((f = u8_str_u8_chr (s, (const uint8_t *) "–", 3)))
>
> Such an API does not appear very robust to me: it is quite easy to
> mistakenly pass a string consisting of more or fewer than one character as
> the second argument. If the argument to be searched for is given as a
> UTF-8 string rather than as a ucs4_t

It's not that confusing to me, but fair enough.

> I would rather recommend using
> the u8_strstr function.

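Bruno's robustness point can be made concrete: if the character to search for is passed as a code point rather than as a UTF-8 string, the needle is exactly one character by construction. Below is a minimal sketch, assuming a 32-bit ucs4_t as in libunistring; `encode_utf8` and `find_char_utf8` are hypothetical names for illustration, not libunistring API. It encodes the code point itself and then defers to strchr() or strstr().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t ucs4_t;  /* assumption: a Unicode code point, as in libunistring */

/* Hypothetical helper: encode UC as UTF-8 into BUF, NUL-terminated.
   Returns the number of bytes written (excluding the NUL), or 0 if UC
   is a surrogate or lies beyond U+10FFFF.  */
static size_t
encode_utf8 (ucs4_t uc, uint8_t buf[5])
{
  size_t n;

  if (uc < 0x80)
    {
      buf[0] = (uint8_t) uc;
      n = 1;
    }
  else if (uc < 0x800)
    {
      buf[0] = (uint8_t) (0xC0 | (uc >> 6));
      buf[1] = (uint8_t) (0x80 | (uc & 0x3F));
      n = 2;
    }
  else if (uc < 0x10000)
    {
      if (uc >= 0xD800 && uc < 0xE000)
        return 0;  /* surrogates have no UTF-8 encoding */
      buf[0] = (uint8_t) (0xE0 | (uc >> 12));
      buf[1] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      buf[2] = (uint8_t) (0x80 | (uc & 0x3F));
      n = 3;
    }
  else if (uc < 0x110000)
    {
      buf[0] = (uint8_t) (0xF0 | (uc >> 18));
      buf[1] = (uint8_t) (0x80 | ((uc >> 12) & 0x3F));
      buf[2] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      buf[3] = (uint8_t) (0x80 | (uc & 0x3F));
      n = 4;
    }
  else
    return 0;
  buf[n] = 0;
  return n;
}

/* Hypothetical helper: first occurrence of character UC in the
   NUL-terminated UTF-8 string S, or NULL.  The caller cannot pass
   "more or fewer than one character", since UC is a single code point.  */
static const uint8_t *
find_char_utf8 (const uint8_t *s, ucs4_t uc)
{
  uint8_t needle[5];
  size_t n;

  if (uc == 0)
    return NULL;  /* design choice of this sketch: U+0000 is "not found" */
  n = encode_utf8 (uc, needle);
  if (n == 0)
    return NULL;  /* not a valid Unicode scalar value */
  if (n == 1)
    return (const uint8_t *) strchr ((const char *) s, needle[0]);
  return (const uint8_t *) strstr ((const char *) s, (const char *) needle);
}
```

For example, searching for U+2013 (EN DASH) encodes it as the three bytes E2 80 93 internally, so the caller never has to spell out a byte count like the `3` in the proposed `u8_str_u8_chr` call.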
I wonder whether we could speed that up for UTF-8
by simply deferring to strstr().
I haven't tested this, so feel free to bin it.
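Deferring to strstr() is valid for UTF-8 because the encoding is self-synchronizing: in valid input the needle begins with a lead byte (never a continuation byte of the form 10xxxxxx), so a byte-wise match can only start at a character boundary, and since the needle consists of complete characters the match also ends on one. Below is a minimal sketch of that fast path, assuming valid, NUL-terminated UTF-8 input; `sketch_u8_strstr` and `is_char_boundary` are hypothetical names, and this is not the gnulib code, although it follows the same order of checks as the patch in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the UTF-8 fast path for substring search.
   Assumes both strings are valid, NUL-terminated UTF-8.  */
static uint8_t *
sketch_u8_strstr (const uint8_t *haystack, const uint8_t *needle)
{
  /* An empty needle matches at the start of the haystack.  */
  if (needle[0] == 0)
    return (uint8_t *) haystack;
  /* A one-byte needle is an ASCII character: strchr suffices.  */
  if (needle[1] == 0)
    return (uint8_t *) strchr ((const char *) haystack, needle[0]);
  /* Multibyte needle: a plain byte-wise search is also a
     character-wise search for valid UTF-8.  */
  return (uint8_t *) strstr ((const char *) haystack, (const char *) needle);
}

/* Nonzero if P points at the first byte of a UTF-8 character,
   i.e. *P is not a continuation byte 10xxxxxx.  */
static int
is_char_boundary (const uint8_t *p)
{
  return (*p & 0xC0) != 0x80;
}
```

Whether this is actually faster than the existing loop depends on the libc's strstr() implementation, so it would still need measuring.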
cheers,
Pádraig.
commit 8b154a3421de21254e628085ccf22ce736947635
Author: Pádraig Brady <address@hidden>
Date: Thu Jul 29 08:16:20 2010 +0100
unistr/u8-strstr: simplify and probably speed up the UTF-8 case
* lib/unistr/u-strstr.h (UTF8_MODE): New define, so that the code
for the UTF-8 case can be selected at compile time.
* lib/unistr/u8-strstr.c (u8_strstr): Use strstr() for UTF-8 needles
longer than 1 byte, as it's simpler and probably faster.
Also add a comment about when using u8_strchr() may be faster.
* modules/unistr/u8-strstr: Depend on strstr-simple, so that we don't
read out-of-bounds memory with glibc 2.10 on 64-bit platforms.
diff --git a/ChangeLog b/ChangeLog
index 897387c..d3f8ccc 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2010-07-29 Pádraig Brady <address@hidden>
+
+ * lib/unistr/u8-strstr.c (u8_strstr): Use strstr() as it's probably
+ faster.
+
2010-07-26 Paul R. Eggert <address@hidden>
timespec: use cast and not conditional, as truncation isn't possible
diff --git a/lib/unistr/u-strstr.h b/lib/unistr/u-strstr.h
index df32be8..9fb64cd 100644
--- a/lib/unistr/u-strstr.h
+++ b/lib/unistr/u-strstr.h
@@ -28,6 +28,13 @@ FUNC (const UNIT *haystack, const UNIT *needle)
if (needle[1] == 0)
return U_STRCHR (haystack, first);
+#if UTF8_MODE
+ /* Optimize/simplify the UTF-8 case.
+ Note to users of u8_strstr(), if passing a single multibyte character
+ as a needle, then it may be faster to convert the needle to ucs4_t
+ and use u8_strchr(), for longer haystacks. */
+ return (uint8_t *) strstr ((const char *) haystack, (const char *) needle);
+#else
/* Search for needle's first unit. */
for (; *haystack != 0; haystack++)
if (*haystack == first)
@@ -44,6 +51,7 @@ FUNC (const UNIT *haystack, const UNIT *needle)
return (UNIT *) haystack;
}
}
+#endif
return NULL;
}
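The #else branch keeps the generic search for the 16- and 32-bit instantiations, which, judging by this patch, do not define UTF8_MODE; an identifier that is not a macro evaluates to 0 in an #if expression, so those files compile the fallback. The middle of that loop is elided from the diff hunk, so the following is only a sketch of a naive first-unit search over NUL-terminated 32-bit strings, with the hypothetical name `naive_u32_strstr`; it is not the gnulib implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: naive search for NEEDLE in HAYSTACK, both
   NUL-terminated arrays of 32-bit units.  Scan for the needle's first
   unit, then compare the remaining units.  Worst case O(n*m).  */
static const uint32_t *
naive_u32_strstr (const uint32_t *haystack, const uint32_t *needle)
{
  uint32_t first = needle[0];

  if (first == 0)
    return haystack;  /* empty needle matches immediately */

  for (; *haystack != 0; haystack++)
    if (*haystack == first)
      {
        /* Compare the rest of the needle against the haystack.  */
        const uint32_t *h = haystack + 1;
        const uint32_t *n = needle + 1;

        while (*n != 0 && *h == *n)
          {
            h++;
            n++;
          }
        if (*n == 0)
          return haystack;  /* whole needle matched */
        if (*h == 0)
          return NULL;      /* haystack too short for any later match */
      }
  return NULL;
}
```

The early exit is safe because, once the haystack runs out during a comparison, every later starting position has even fewer units left than the needle needs.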
diff --git a/lib/unistr/u8-strstr.c b/lib/unistr/u8-strstr.c
index cce37ad..37f2aa4 100644
--- a/lib/unistr/u8-strstr.c
+++ b/lib/unistr/u8-strstr.c
@@ -20,9 +20,12 @@
/* Specification. */
#include "unistr.h"
+#include <string.h>
+
/* FIXME: Maybe walking the string via u8_mblen is a win? */
#define FUNC u8_strstr
#define UNIT uint8_t
#define U_STRCHR u8_strchr
+#define UTF8_MODE 1
#include "u-strstr.h"
diff --git a/modules/unistr/u8-strstr b/modules/unistr/u8-strstr
index 5996917..2531ec1 100644
--- a/modules/unistr/u8-strstr
+++ b/modules/unistr/u8-strstr
@@ -7,6 +7,7 @@ lib/unistr/u-strstr.h
Depends-on:
unistr/base
+strstr-simple
configure.ac:
gl_LIBUNISTRING_MODULE([0.9], [unistr/u8-strstr])