Re: case-mapping part of a Unicode string

bug-gnulib

From:	Bruno Haible
Subject:	Re: case-mapping part of a Unicode string
Date:	Wed, 1 Jul 2009 02:04:22 +0200
User-agent:	KMail/1.9.9

Paolo Bonzini wrote:
> Regarding this part, how could I use the incremental context 
> computation?

The most likely use cases for these incremental functions are
  - when your data structure is not a single string, but a
    concatenation of strings,
  - when you cache many casing_prefix_context_t and casing_suffix_context_t
    objects at various points of your string, and make insertions or
    deletions in the string here and there (like a word processor) and
    don't want to recompute all of the contexts.
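
To make the first use case concrete, here is a minimal sketch of how contexts compose across a concatenation of pieces. It is a toy model, not libunistring code: all toy_* names are hypothetical, the "context" is reduced to the nearest non-ignorable character, and treating apostrophe and hyphen as the only ignorable characters is an assumption made for illustration. The point it shows is the direction of composition: a prefix context is extended by folding forwards over the pieces before the one of interest, a suffix context by folding backwards over the pieces after it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for casing_prefix_context_t / casing_suffix_context_t.
   Each remembers only the nearest non-ignorable character (0 = none).  */
typedef struct { char last_char; } toy_prefix_context;
typedef struct { char first_char; } toy_suffix_context;

/* Assumption of this toy model: only ' and - are "ignorable".  */
static int
toy_is_ignorable (char c)
{
  return c == '\'' || c == '-';
}

/* Context of concat (A, s), given the context 'a' of A.
   Scans s from its end; falls back to 'a' if s is all ignorable.  */
toy_prefix_context
toy_prefixes_context (const char *s, size_t n, toy_prefix_context a)
{
  assert (s != NULL || n == 0);
  while (n > 0)
    {
      n--;
      if (!toy_is_ignorable (s[n]))
        {
          a.last_char = s[n];
          break;
        }
    }
  return a;
}

/* Context of concat (s, A), given the context 'a' of A.
   Scans s from its start; falls back to 'a' if s is all ignorable.  */
toy_suffix_context
toy_suffixes_context (const char *s, size_t n, toy_suffix_context a)
{
  assert (s != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!toy_is_ignorable (s[i]))
      {
        a.first_char = s[i];
        break;
      }
  return a;
}

/* Contexts surrounding piece k of a string stored as separate pieces.  */
void
toy_contexts_around (const char *const *pieces, const size_t *lens,
                     size_t npieces, size_t k,
                     toy_prefix_context *pre, toy_suffix_context *suf)
{
  assert (k < npieces);
  pre->last_char = 0;
  for (size_t i = 0; i < k; i++)            /* forwards over pieces before k */
    *pre = toy_prefixes_context (pieces[i], lens[i], *pre);
  suf->first_char = 0;
  for (size_t i = npieces; i > k + 1; i--)  /* backwards over pieces after k */
    *suf = toy_suffixes_context (pieces[i - 1], lens[i - 1], *suf);
}
```

With libunistring itself, the two folding steps would presumably be calls to u8_casing_prefixes_context and u8_casing_suffixes_context, which take the same (s, n, a_context) argument shape; the intermediate contexts at each piece boundary are the values one would cache in the word-processor scenario.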

> I can imagine how to use u8_casing_prefixes_context, but  
> not the same for suffixes:
> 
>    forgot_the_type_name pre = u8_casing_prefix_context (s, pointer3 - s);
>    forgot_the_type_name suf = u8_casing_suffix_context (pointer4,
>                                                      s_end - pointer4);
>    size_t uppercased_part_3_4_len;
>    uint8_t *uppercased_part_3_4 =
>      u8_ct_toupper (pointer3, pointer4 - pointer3, pre, suf,
>                     iso639_language, NULL, NULL,
>                  &uppercased_part_3_4_len);
> 
>    pre = u8_casing_prefixes_context (pointer4, pointer6 - pointer4, pre);
>    suf = u8_casing_suffix_context (pointer8, s_end - pointer8, suf);
>    size_t uppercased_part_6_8_len;
>    uint8_t *uppercased_part_6_8 =
>      u8_ct_toupper (pointer6, pointer8 - pointer6, pre, suf,
>                     iso639_language, NULL, NULL,
>                  &uppercased_part_6_8_len);

Yes, this is it, exactly. I was a bit lazy in assuming that uppercasing
the first substring will not change the uppercasing result of the second
substring. This is likely, but I cannot prove it. If you want to do it
totally right, the code you gave does it.

> Is this intentional? Does u8_casing_suffixes_context make sense only  
> when scanning backwards?

Exactly.

> If so, you should add something like: "u8_casing_prefixes_context is 
> convenient if you are scanning a string forwards, while 
> u8_casing_suffixes_context does the same function when scanning 
> backwards.  For the suffix context on forward scans you should just pass 
> the entire remaining part of the string u8_casing_suffix_context, and 
> likewise for the prefix context on backward scans."

Your understanding is right. But there are many more ways of handling
strings than "scanning" them forwards or backwards.

Btw, I've now reduced the computation time of these contexts by a constant
factor.

Bruno
