Some broken UTF-8 sequence causes bash to infinite loop.

From: Morita Sho
Subject: Some broken UTF-8 sequence causes bash to infinite loop.
Date: Wed, 14 Nov 2007 20:11:27 +0900
User-agent: Mozilla-Thunderbird (X11/20071009)

Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -g -O2
uname output: Linux debian 2.6.22-3-k7 #1 SMP Mon Oct 22 22:51:54 UTC 2007 i686 GNU/Linux
Machine Type: i686-pc-linux-gnu

Bash Version: 3.2
Patch Level: 0
Release Status: release


When I `cd` into a directory whose name contains a broken UTF-8 sequence, bash stops responding and consumes 100% of the CPU.
Ctrl-C doesn't work, so I have to send SIGKILL to stop bash.
I have confirmed that this problem occurs in the official bash-3.2 and in Debian's bash-3.1dfsg-8.

After some tests, source code review, and tracing with gdb, I found a way to reproduce this problem. Run bash in the ja_JP.UTF-8 locale (probably any other locale where MB_CUR_MAX is larger than 2 will do as well),
and put a broken UTF-8 sequence into the PS1 variable:
  $ LANG=ja_JP.UTF-8 bash
  $ PS1="\202\314\217\244\220l"
Unfortunately, this does NOT always reproduce the problem.
Sometimes bash hangs, and sometimes it does not.

I tried to track down the bug, and I think the function
_rl_find_next_mbchar_internal in lib/readline/mbutil.c has a small bug.

Here is the relevant part of the _rl_find_next_mbchar_internal function:
  size_t tmp;
  ...
      tmp = mbrtowc (&wc, string + point, strlen (string + point), &ps);
      while (tmp > 0 && wcwidth (wc) == 0)
        {
          point += tmp;
          ...
        }

* tmp is declared as size_t. (Note: size_t is an unsigned type.)
* The return value of mbrtowc is stored in tmp. (Note: mbrtowc returns (size_t)-1 or (size_t)-2 on error.)
* Since tmp is unsigned, -1 is treated as 0xFFFFFFFF (on this 32-bit machine),
  so tmp > 0 is true even when mbrtowc returned -1 or -2.
* When mbrtowc returns an error, wc is not set to a valid value,
  so the return value of wcwidth(wc) is unpredictable.
  This can cause bash to loop infinitely.
If wcwidth(wc) returns non-zero, the while {} block is skipped and the problem does not occur. But if wcwidth(wc) returns 0, then because tmp is -1 (0xFFFFFFFF), point += tmp decrements point instead of advancing it.
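
The unsigned comparison described above can be checked in isolation. The following is only a minimal sketch, not code from bash: naive_ok and advance are hypothetical helpers that mimic the "tmp > 0" test and the "point += tmp" step, assuming point is an int as in the excerpt.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the original loop test.  For a size_t
   this is true for every nonzero value, including the mbrtowc error
   codes (size_t)-1 and (size_t)-2. */
static int
naive_ok (size_t tmp)
{
  return tmp > 0;
}

/* Hypothetical helper mirroring "point += tmp" with an int point.
   Adding (size_t)-1 wraps around, so point moves back by one byte. */
static int
advance (int point, size_t tmp)
{
  point += tmp;
  return point;
}
```

Here naive_ok ((size_t)-1) and naive_ok ((size_t)-2) both report success, and advance (5, (size_t)-1) yields 4 rather than a larger offset.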

I think the condition "tmp > 0" is not a correct way to detect errors.
It should be replaced with something like this:
      while (!(MB_NULLWCH (tmp) || MB_INVALIDCH (tmp)) && wcwidth (wc) == 0)

With this change, bash no longer hangs when I set an invalid UTF-8 sequence in PS1.
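
For illustration, here is a self-contained sketch of a loop using the corrected condition. It is not the readline source: skip_zero_width is a hypothetical stand-in for the relevant part of the function, and the two macros are written out here on the assumption that they match readline's definitions (zero means a null wide character, (size_t)-1 and (size_t)-2 mean an invalid or incomplete sequence).

```c
#define _XOPEN_SOURCE 700   /* needed for the wcwidth() declaration */
#include <string.h>
#include <wchar.h>

/* Assumed to be equivalent to the macros readline uses. */
#define MB_INVALIDCH(x) ((x) == (size_t)-1 || (x) == (size_t)-2)
#define MB_NULLWCH(x)   ((x) == 0)

/* Hypothetical helper: starting at byte offset point, skip over
   zero-width characters and return the new offset.  wc is examined
   only after mbrtowc has reported a successful conversion. */
static int
skip_zero_width (const char *string, int point)
{
  mbstate_t ps;
  wchar_t wc;
  size_t tmp;

  memset (&ps, 0, sizeof (ps));
  tmp = mbrtowc (&wc, string + point, strlen (string + point), &ps);
  while (!(MB_NULLWCH (tmp) || MB_INVALIDCH (tmp)) && wcwidth (wc) == 0)
    {
      point += tmp;
      tmp = mbrtowc (&wc, string + point, strlen (string + point), &ps);
    }
  return point;
}
```

Because the error test comes first and && short-circuits, wcwidth is never called on an unset wc, and point is never modified with an error code.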

I suggest the following patch to fix this problem.
--- bash-3.2/lib/readline/mbutil.c.orig 2007-11-14 06:09:23.000000000 +0900
+++ bash-3.2/lib/readline/mbutil.c      2007-11-14 06:08:29.000000000 +0900
@@ -128,12 +128,10 @@
   if (find_non_zero)
       tmp = mbrtowc (&wc, string + point, strlen (string + point), &ps);
-      while (tmp > 0 && wcwidth (wc) == 0)
+      while (!(MB_NULLWCH (tmp) || MB_INVALIDCH (tmp)) && wcwidth (wc) == 0)
          point += tmp;
          tmp = mbrtowc (&wc, string + point, strlen (string + point), &ps);
-         if (MB_NULLWCH (tmp) || MB_INVALIDCH (tmp))
-           break;

Morita Sho <address@hidden>
