bug-gnulib
Re: speed up u8_strstr


From: Bruno Haible
Subject: Re: speed up u8_strstr
Date: Sat, 31 Jul 2010 22:10:11 +0200
User-agent: KMail/1.9.9

Paolo Bonzini wrote:
> If there is one, use the faster u8_strchr algorithm (you can 
> just use u8_strchr, even though that does a useless conversion back to 
> UTF-8).

Nice suggestion. Implemented as follows:


2010-07-31  Bruno Haible  <address@hidden>

        unistr/u8-strstr, unistr/u16-strstr: Optimize the one-character case.
        * lib/unistr/u-strstr.h (FUNC): When the needle contains only one
        character, perform the search using U_STRCHR.
        * lib/unistr/u8-strstr.c (U_STRMBTOUC): New macro.
        * lib/unistr/u16-strstr.c (U_STRMBTOUC): Likewise.
        * modules/unistr/u8-strstr (Depends-on): Add unistr/u8-strmbtouc.
        * modules/unistr/u16-strstr (Depends-on): Add unistr/u16-strmbtouc.
        Suggested by Paolo Bonzini.

--- lib/unistr/u-strstr.h.orig  Sat Jul 31 22:05:02 2010
+++ lib/unistr/u-strstr.h       Sat Jul 31 21:59:37 2010
@@ -1,5 +1,5 @@
 /* Substring test for UTF-8/UTF-16/UTF-32 strings.
-   Copyright (C) 1999, 2002, 2006, 2009-2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2002, 2006, 2010 Free Software Foundation, Inc.
    Written by Bruno Haible <address@hidden>, 2002.
 
    This program is free software: you can redistribute it and/or modify it
@@ -24,10 +24,20 @@
   if (first == 0)
     return (UNIT *) haystack;
 
-  /* Is needle nearly empty?  */
+  /* Is needle nearly empty (only one unit)?  */
   if (needle[1] == 0)
     return U_STRCHR (haystack, first);
 
+#ifdef U_STRMBTOUC
+  /* Is needle nearly empty (only one character)?  */
+  {
+    ucs4_t first_uc;
+    int count = U_STRMBTOUC (&first_uc, needle);
+    if (count > 0 && needle[count] == 0)
+      return U_STRCHR (haystack, first_uc);
+  }
+#endif
+
   /* Search for needle's first unit.  */
   for (; *haystack != 0; haystack++)
     if (*haystack == first)
--- lib/unistr/u16-strstr.c.orig        Sat Jul 31 22:05:02 2010
+++ lib/unistr/u16-strstr.c     Sat Jul 31 21:56:25 2010
@@ -1,5 +1,5 @@
 /* Substring test for UTF-16 strings.
-   Copyright (C) 1999, 2002, 2006, 2009-2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2002, 2006, 2010 Free Software Foundation, Inc.
    Written by Bruno Haible <address@hidden>, 2002.
 
    This program is free software: you can redistribute it and/or modify it
@@ -25,4 +25,5 @@
 #define FUNC u16_strstr
 #define UNIT uint16_t
 #define U_STRCHR u16_strchr
+#define U_STRMBTOUC u16_strmbtouc
 #include "u-strstr.h"
--- lib/unistr/u8-strstr.c.orig Sat Jul 31 22:05:02 2010
+++ lib/unistr/u8-strstr.c      Sat Jul 31 21:56:25 2010
@@ -1,5 +1,5 @@
 /* Substring test for UTF-8 strings.
-   Copyright (C) 1999, 2002, 2006, 2009-2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2002, 2006, 2010 Free Software Foundation, Inc.
    Written by Bruno Haible <address@hidden>, 2002.
 
    This program is free software: you can redistribute it and/or modify it
@@ -25,4 +25,5 @@
 #define FUNC u8_strstr
 #define UNIT uint8_t
 #define U_STRCHR u8_strchr
+#define U_STRMBTOUC u8_strmbtouc
 #include "u-strstr.h"
--- modules/unistr/u16-strstr.orig      Sat Jul 31 22:05:02 2010
+++ modules/unistr/u16-strstr   Sat Jul 31 22:04:36 2010
@@ -8,6 +8,7 @@
 Depends-on:
 unistr/base
 unistr/u16-strchr
+unistr/u16-strmbtouc
 
 configure.ac:
 gl_LIBUNISTRING_MODULE([0.9], [unistr/u16-strstr])
--- modules/unistr/u8-strstr.orig       Sat Jul 31 22:05:02 2010
+++ modules/unistr/u8-strstr    Sat Jul 31 22:04:43 2010
@@ -8,6 +8,7 @@
 Depends-on:
 unistr/base
 unistr/u8-strchr
+unistr/u8-strmbtouc
 
 configure.ac:
 gl_LIBUNISTRING_MODULE([0.9], [unistr/u8-strstr])
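
For readers who want to try the idea outside gnulib, here is a minimal standalone sketch of the one-character fast path for UTF-8. It is not the code from the patch. The names sketch_u8_char_len and sketch_u8_strstr are hypothetical, the length helper is only a simplified stand-in for u8_strmbtouc (it checks the lead byte and the continuation bytes, nothing more), and the multi-character case simply falls back to the C library's strstr. Unlike the patch, which decodes the needle to a ucs4_t and hands it to U_STRCHR, this sketch searches for the needle's bytes directly and so skips the conversion back to UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT gnulib's u8_strmbtouc.  Returns the byte length
   of the UTF-8 character at S (0 for the terminating NUL), or -1 if the
   lead byte or a continuation byte is malformed.  Overlong and surrogate
   encodings are not rejected here; this is only a sketch.  */
static int
sketch_u8_char_len (const uint8_t *s)
{
  uint8_t c = s[0];
  int len, i;

  if (c == 0)
    return 0;
  if (c < 0x80)
    return 1;
  else if (c >= 0xC2 && c < 0xE0)
    len = 2;
  else if (c >= 0xE0 && c < 0xF0)
    len = 3;
  else if (c >= 0xF0 && c < 0xF5)
    len = 4;
  else
    return -1;

  /* A NUL byte fails this test, so we never read past the terminator.  */
  for (i = 1; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      return -1;
  return len;
}

/* Hypothetical sketch of a substring search with a one-character fast
   path.  Both arguments must be NUL-terminated.  */
static const uint8_t *
sketch_u8_strstr (const uint8_t *haystack, const uint8_t *needle)
{
  int len;

  assert (haystack != NULL && needle != NULL);

  /* Empty needle: matches at the start.  */
  if (needle[0] == 0)
    return haystack;

  /* Is needle exactly one character?  */
  len = sketch_u8_char_len (needle);
  if (len > 0 && needle[len] == 0)
    {
      const char *p = (const char *) haystack;

      /* Scan for the lead byte, then compare the remaining bytes.
         strncmp stops at NUL, so it never overruns the haystack.  */
      while ((p = strchr (p, needle[0])) != NULL)
        {
          if (strncmp (p, (const char *) needle, (size_t) len) == 0)
            return (const uint8_t *) p;
          p++;
        }
      return NULL;
    }

  /* General case: byte-wise search (stand-in for the real algorithm).  */
  return (const uint8_t *) strstr ((const char *) haystack,
                                   (const char *) needle);
}
```

With the haystack "na\xc3\xaf" "ve" (the word "naïve"), a needle of "\xc3\xaf" takes the fast path and yields a pointer to byte offset 2, while a needle of "ve" takes the general path.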


