Re: case-mapping part of a Unicode string
From: Bruno Haible
Subject: Re: case-mapping part of a Unicode string
Date: Tue, 30 Jun 2009 10:36:14 +0200
User-agent: KMail/1.9.9
Paolo Bonzini wrote:
> Can you show an example of case converting the 3rd, 6th and 7th
> character of a string?
First, you need the indices (in terms of units) of the 3rd, 4th, 6th and 8th
characters. The 4th and 8th mark where the two ranges to be converted end:
  const uint8_t *s = ...;
  const uint8_t *s_end = s + u8_strlen (s) + 1;
  const uint8_t *p = s;
  int i;
  ucs4_t dummy;
  for (i = 0; i < 3; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer3 = p;
  for (i = 3; i < 4; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer4 = p;
  for (i = 4; i < 6; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer6 = p;
  for (i = 6; i < 8; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer8 = p;
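The repeated loops above all do the same thing: turn a character index into an offset in units. As a rough sketch of that step, here is a hypothetical helper (utf8_offset_of_char is not a libunistring function; it is a plain-C stand-in for the u8_next loop, and it assumes the input is well-formed, NUL-terminated UTF-8):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Returns the offset, in UTF-8 units (bytes), of the character with
   0-based index n in the NUL-terminated string s.  If s has fewer than
   n characters, returns the offset of the terminating NUL.
   Assumes s is well-formed UTF-8.  */
size_t
utf8_offset_of_char (const uint8_t *s, size_t n)
{
  assert (s != NULL);
  size_t offset = 0;
  while (n > 0 && s[offset] != 0)
    {
      /* Step over the lead unit, then over any continuation units
         (those of the form 10xxxxxx).  */
      offset++;
      while ((s[offset] & 0xC0) == 0x80)
        offset++;
      n--;
    }
  return offset;
}
```

With such a helper, pointer3 would correspond to s + utf8_offset_of_char (s, 3), and likewise for the other three pointers. Unlike u8_next, it does not decode or validate the characters it skips.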
Then you convert the two substrings:
  size_t uppercased_part_3_4_len;
  uint8_t *uppercased_part_3_4 =
    u8_ct_toupper (pointer3, pointer4 - pointer3,
                   u8_casing_prefix_context (s, pointer3 - s),
                   u8_casing_suffix_context (pointer4, s_end - pointer4),
                   iso639_language, NULL, NULL, &uppercased_part_3_4_len);
  size_t uppercased_part_6_8_len;
  uint8_t *uppercased_part_6_8 =
    u8_ct_toupper (pointer6, pointer8 - pointer6,
                   u8_casing_prefix_context (s, pointer6 - s),
                   u8_casing_suffix_context (pointer8, s_end - pointer8),
                   iso639_language, NULL, NULL, &uppercased_part_6_8_len);
Then you can glue the pieces together:
  size_t total = (pointer3 - s) + uppercased_part_3_4_len
                 + (pointer6 - pointer4)
                 + uppercased_part_6_8_len + (s_end - pointer8);
  uint8_t *result = (uint8_t *) xmalloc (total * sizeof (uint8_t));
  uint8_t *q = result;
  u8_cpy (q, s, pointer3 - s); q += pointer3 - s;
  u8_cpy (q, uppercased_part_3_4, uppercased_part_3_4_len);
  q += uppercased_part_3_4_len;
  u8_cpy (q, pointer4, pointer6 - pointer4); q += pointer6 - pointer4;
  u8_cpy (q, uppercased_part_6_8, uppercased_part_6_8_len);
  q += uppercased_part_6_8_len;
  u8_cpy (q, pointer8, s_end - pointer8);
  free (uppercased_part_3_4);
  free (uppercased_part_6_8);
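The gluing step is a splice: copy the unchanged units before a range, then the converted units, then the unchanged units after it. A minimal sketch of a single splice follows, using a hypothetical helper (splice_units is an illustrative name, not a libunistring or gnulib function) that uses plain malloc and memcpy instead of xmalloc and u8_cpy and handles one range rather than two:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libunistring or gnulib.
   Returns a freshly malloc'ed copy of src[0..src_len) in which the units
   src[start..end) are replaced by repl[0..repl_len).  Stores the length
   of the result in *result_len.  Returns NULL if allocation fails.
   Assumes src and repl are non-NULL.  */
uint8_t *
splice_units (const uint8_t *src, size_t src_len,
              size_t start, size_t end,
              const uint8_t *repl, size_t repl_len,
              size_t *result_len)
{
  assert (start <= end && end <= src_len);
  size_t total = start + repl_len + (src_len - end);
  uint8_t *result = malloc (total > 0 ? total : 1);
  if (result == NULL)
    return NULL;
  uint8_t *q = result;
  memcpy (q, src, start);               q += start;     /* units before */
  memcpy (q, repl, repl_len);           q += repl_len;  /* replacement */
  memcpy (q, src + end, src_len - end);                 /* units after */
  *result_len = total;
  return result;
}
```

Since the converted part may differ in length from the original range (case mappings can change the number of units), the total has to be computed from the replacement's length, not from end - start.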
And finally use u8_normalize to normalize the result to NFC (optional, but
recommended):
  size_t normalized_len;
  uint8_t *normalized =
    u8_normalize (UNINORM_NFC, result, total, NULL, &normalized_len);
  free (result);
  return normalized;
Bruno