bug-gnulib

Re: case-mapping part of a Unicode string


From: Bruno Haible
Subject: Re: case-mapping part of a Unicode string
Date: Tue, 30 Jun 2009 10:36:14 +0200
User-agent: KMail/1.9.9

Paolo Bonzini wrote:
> Can you show an example of case converting the 3rd, 6th and 7th 
> character of a string?

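First, you need the positions (in terms of units, i.e. bytes for UTF-8) of
the characters at indices 3, 4, 6 and 8, counting from 0. They delimit the
two substrings to convert, [3,4) and [6,8):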

  const uint8_t *s = ...;
  /* s_end points just past the terminating NUL, so that the NUL is
     carried over into the result.  */
  const uint8_t *s_end = s + u8_strlen (s) + 1;

  const uint8_t *p = s;
  int i;
  ucs4_t dummy;
  for (i = 0; i < 3; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer3 = p;
  for (i = 3; i < 4; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer4 = p;
  for (i = 4; i < 6; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer6 = p;
  for (i = 6; i < 8; i++)
    p = u8_next (&dummy, p);
  const uint8_t *pointer8 = p;
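
To illustrate why these are offsets in units rather than character counts,
here is a minimal sketch in plain C. Note that utf8_char_offset is a
hypothetical helper, not a libunistring function; it assumes valid,
NUL-terminated UTF-8 and only skips over bytes, whereas u8_next also
decodes each character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring: return the offset, in
   bytes, of the character with 0-based index N in the NUL-terminated
   string S.  Assumes S is valid UTF-8.  If S has fewer than N
   characters, the offset of the terminating NUL is returned.  */
size_t
utf8_char_offset (const uint8_t *s, size_t n)
{
  assert (s != NULL);
  size_t off = 0;
  while (n > 0 && s[off] != 0)
    {
      /* Step over the lead byte, then over any continuation bytes
         (those of the form 10xxxxxx).  */
      off++;
      while ((s[off] & 0xC0) == 0x80)
        off++;
      n--;
    }
  return off;
}
```

For the bytes 61 C3 A9 62 E2 82 AC 63 (five characters: a, e-acute, b,
euro sign, c), the character at index 2 starts at byte offset 3 and the
character at index 4 starts at byte offset 7.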

Then you convert the two substrings:

  size_t uppercased_part_3_4_len;
  uint8_t *uppercased_part_3_4 =
    u8_ct_toupper (pointer3, pointer4 - pointer3,
                   u8_casing_prefix_context (s, pointer3 - s),
                   u8_casing_suffix_context (pointer4, s_end - pointer4),
                   iso639_language, NULL, NULL, &uppercased_part_3_4_len);
  size_t uppercased_part_6_8_len;
  uint8_t *uppercased_part_6_8 =
    u8_ct_toupper (pointer6, pointer8 - pointer6,
                   u8_casing_prefix_context (s, pointer6 - s),
                   u8_casing_suffix_context (pointer8, s_end - pointer8),
                   iso639_language, NULL, NULL, &uppercased_part_6_8_len);

Then you can glue the pieces together:

  size_t total = (pointer3 - s) + uppercased_part_3_4_len
                 + (pointer6 - pointer4)
                 + uppercased_part_6_8_len + (s_end - pointer8);
  uint8_t *result = (uint8_t *) xmalloc (total * sizeof (uint8_t));
  uint8_t *q = result;
  u8_cpy (q, s, pointer3 - s);
  q += pointer3 - s;
  u8_cpy (q, uppercased_part_3_4, uppercased_part_3_4_len);
  q += uppercased_part_3_4_len;
  u8_cpy (q, pointer4, pointer6 - pointer4);
  q += pointer6 - pointer4;
  u8_cpy (q, uppercased_part_6_8, uppercased_part_6_8_len);
  q += uppercased_part_6_8_len;
  u8_cpy (q, pointer8, s_end - pointer8);

  free (uppercased_part_3_4);
  free (uppercased_part_6_8);
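
If there are more than two ranges to convert, the gluing step could be
written as a loop over (pointer, length) pieces. The following is only a
sketch under assumptions: struct piece and concat_pieces are hypothetical
names, and it uses plain malloc and memcpy in place of xmalloc and u8_cpy,
so the caller has to check for NULL.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: one piece of the output, as a pointer plus a
   length in bytes.  */
struct piece
{
  const uint8_t *data;
  size_t len;
};

/* Hypothetical helper: concatenate NPIECES pieces into a freshly
   allocated buffer and store the total length in *TOTALP.  Returns
   NULL if allocation fails; the caller frees the result.  */
uint8_t *
concat_pieces (const struct piece *pieces, size_t npieces, size_t *totalp)
{
  size_t total = 0;
  for (size_t i = 0; i < npieces; i++)
    total += pieces[i].len;

  /* Allocate at least one byte so that an empty result is still a
     valid pointer that can be passed to free.  */
  uint8_t *result = malloc (total > 0 ? total : 1);
  if (result == NULL)
    return NULL;

  uint8_t *q = result;
  for (size_t i = 0; i < npieces; i++)
    {
      if (pieces[i].len > 0)
        memcpy (q, pieces[i].data, pieces[i].len);
      q += pieces[i].len;
    }
  *totalp = total;
  return result;
}
```

In the example above, the pieces would be the unchanged prefix, the first
uppercased part, the unchanged middle, the second uppercased part, and the
unchanged tail including the terminating NUL.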

And finally use u8_normalize to normalize the result to NFC (optional, but
recommended):

  size_t normalized_len;
  uint8_t *normalized =
    u8_normalize (UNINORM_NFC, result, total, NULL, &normalized_len);
  free (result);
  return normalized;

Bruno



