[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algo
From: |
Pádraig Brady |
Subject: |
Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm |
Date: |
Thu, 29 Jul 2010 11:40:28 +0100 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 |
On 29/07/10 10:26, Paolo Bonzini wrote:
> On 07/29/2010 11:05 AM, Pádraig Brady wrote:
>>> I would better recommend to use
>>> the u8_strstr function.
>>
>> I wonder could we speed that up for UTF-8
>> by just deferring to strstr() ?
>> I've not tested this so feel free to bin it.
>
> An alternative idea: check if there is more than 1 character using
> u8_mbtouc. If there is one, use the faster u8_strchr algorithm (you can
> just use u8_strchr, even though that does a useless conversion back to
> UTF-8). If there is more than one, use strstr directly.
I thought of that but thought it would be too much overhead for the
general case since it seemed like it would need strlen, u8_mbsnlen & u8_mbtouc,
with conversions back and forth.
> And this gives another opportunity: we can now define u8_strstr and
> u16_strstr to always return a pointer to a valid UTF-8 sequence:
>
> /* Find the first occurrence of NEEDLE in HAYSTACK. For UTF-8 and
> UTF-16, the returned value will never point in the middle of a
> multibyte sequence or surrogate pair. If this cannot be satisfied,
> for example due to an invalid NEEDLE, NULL is returned. */
>
> Why? Because the only cases when you can get a false positive are: 1)
> when you have illegal sequences at the beginning of the needle, which
> now we could easily check; 2) when both the needle and haystack have the
> same illegal UTF-8 in it, which I think could be ignored.
>
> For u16_strstr you need to look for an invalid surrogate character at
> the beginning of the needle.
>
> Thoughts? Would anybody give a shot to the above?
But with those benefits it might be worth it?
cheers,
Pádraig.
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, (continued)
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/23
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/23
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Pádraig Brady, 2010/07/27
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/27
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Pádraig Brady, 2010/07/27
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/27
- Re: ucs4_t type, Bruno Haible, 2010/07/28
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/28
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Pádraig Brady, 2010/07/29
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm,
Pádraig Brady <=
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29
- Message not available
- Message not available
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/29
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/31
- guarantees of u8_mbtouc/u8_strmbtouc (was Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm), Paolo Bonzini, 2010/07/29
- Re: guarantees of u8_mbtouc/u8_strmbtouc, Bruno Haible, 2010/07/31
- Re: guarantees of u8_mbtouc/u8_strmbtouc, Paolo Bonzini, 2010/07/31
- Re: guarantees of u8_mbtouc/u8_strmbtouc, Bruno Haible, 2010/07/31
- Message not available
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29