bug-gnulib

Re: [bug-gnulib] new module proposal: strip


From: Bruno Haible
Subject: Re: [bug-gnulib] new module proposal: strip
Date: Mon, 4 Sep 2006 14:21:18 +0200
User-agent: KMail/1.9.1

Hello Davide,

The first loop looks fine; it is safe in multibyte locales. But regarding the second loop:

> > In multibyte strings you cannot "go backwards". You have to write the
> > algorithm in a way that progresses from the first to the last multibyte
> > character. (*) In this case, you can do so by moving from first to last,
> > remembering the position of the last non-whitespace character. More
> > precisely, a pointer pointing after this character. When you have
> > reached the end of the string, you put a '\0' where the remembered pointer
> > points to, and are done.
> Done. Thanks for the suggestions.

Well, that's not what I meant. By doing "x--" you are still stepping backwards
byte by byte, and you can't safely do that in a multibyte string. Also, the
total running time of the strlen calls now adds up to O(n^2) in the worst case.
What I meant is something like

   char *last_non_space = d;
   for (mbi_init(i, d, strlen(d)); mbi_avail(i); mbi_advance(i)) {
     ...
   }
   *last_non_space = '\0';
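For readers without gnulib's mbiter module at hand, the same single forward pass can be sketched with plain ISO C mbrtowc() instead of the mbi_* macros. This is a minimal sketch under stated assumptions, not the proposed module: the function name rtrim_mb is hypothetical, and treating an invalid or incomplete byte sequence as a single non-space character is a choice made here for illustration. Note that strlen is called only once, so the whole pass is O(n).

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: remove trailing whitespace from S in place,
   advancing one multibyte character at a time (never stepping back).  */
static void
rtrim_mb (char *s)
{
  char *end;
  char *p;
  char *last_non_space;   /* points just past the last non-space character */
  mbstate_t state;

  assert (s != NULL);
  end = s + strlen (s);   /* the only strlen call: O(n) overall */
  p = s;
  last_non_space = s;
  memset (&state, 0, sizeof state);

  while (p < end)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, p, (size_t) (end - p), &state);

      if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        {
          /* Invalid or incomplete sequence (n == 0 cannot occur before
             END): assumption, consume one byte as a non-space character
             and reset the conversion state to resynchronize.  */
          p++;
          last_non_space = p;
          memset (&state, 0, sizeof state);
        }
      else
        {
          p += n;
          if (!iswspace ((wint_t) wc))
            last_non_space = p;
        }
    }
  *last_non_space = '\0';
}
```

A program that wants the string interpreted in the user's locale would call setlocale (LC_ALL, "") before using this; in the default "C" locale it still handles ASCII input.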

Also, fallback code should be provided for systems without multibyte
string functions. This fallback code is generally more efficient than the
multibyte code, but it is only applicable when MB_CUR_MAX == 1. Therefore, in
other files (see strstr.c etc.), we generally use this template:

do_something(...)
{
  #if HAVE_MBRTOWC
  if (MB_CUR_MAX > 1)
    {
      ... here comes the multibyte (<wctype.h>) variant ...
    }
  else
  #endif
    {
      ... here comes the unibyte (<ctype.h>) variant ...
    }
}
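Applied to trailing-whitespace removal, the template might look like the sketch below. HAVE_MBRTOWC would normally be set by configure; here it is hard-coded (an assumption) so the fragment compiles on its own, and the function name strip_trailing is hypothetical. The multibyte branch uses mbrtowc()/iswspace() from <wchar.h> and <wctype.h>; the unibyte branch uses isspace() from <ctype.h> and treats every byte as one character.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Assumption: normally provided by configure.  */
#ifndef HAVE_MBRTOWC
# define HAVE_MBRTOWC 1
#endif

/* Hypothetical: strip trailing whitespace from S in place.  */
static void
strip_trailing (char *s)
{
  char *last_non_space = s;   /* just past the last non-space character */

  assert (s != NULL);
#if HAVE_MBRTOWC
  if (MB_CUR_MAX > 1)
    {
      /* Multibyte (<wctype.h>) variant: scan forward character by character.  */
      char *end = s + strlen (s);
      char *p = s;
      mbstate_t state;

      memset (&state, 0, sizeof state);
      while (p < end)
        {
          wchar_t wc;
          size_t n = mbrtowc (&wc, p, (size_t) (end - p), &state);

          if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
            {
              /* Invalid/incomplete sequence: consume one byte as a non-space.  */
              p++;
              last_non_space = p;
              memset (&state, 0, sizeof state);
            }
          else
            {
              p += n;
              if (!iswspace ((wint_t) wc))
                last_non_space = p;
            }
        }
    }
  else
#endif
    {
      /* Unibyte (<ctype.h>) variant: every byte is one character.  */
      char *p;

      for (p = s; *p != '\0'; p++)
        if (!isspace ((unsigned char) *p))
          last_non_space = p + 1;
    }
  *last_non_space = '\0';
}
```

The cast to unsigned char in the unibyte branch matters: passing a negative char value to isspace() is undefined behavior on platforms where char is signed.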

Bruno



