bug-gnulib

regex_internal: uninitialized memory access (long)


From: Assaf Gordon
Subject: regex_internal: uninitialized memory access (long)
Date: Mon, 13 Aug 2018 15:51:45 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

Hello all,

I suspect there is an uninitialized memory access deep inside
regex_internal.c under very particular circumstances.

This was first reported by "project-repo <address@hidden>"
as part of his fuzzing efforts, here:
https://lists.gnu.org/r/sed-devel/2018-08/msg00017.html

I've been able to pinpoint the cause, but I'm still learning
the code so can't suggest a fix yet.

The offending combination:
1. UTF-8 locale
2. case insensitive regex (REG_ICASE)
3. using gnulib's regex even on a glibc system
   ( --with-included-regex )
4. _REGEX_LARGE_OFFSETS=1 in config.h, causing "regoff_t" to be
   ssize_t instead of int.
5. regex containing a valid multibyte character whose
   uppercase form is different, resulting in "re_string_t->offsets_needed=1"
6. regex containing backslash-NUL.
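
To illustrate item 5 with the bytes used in the reproducer further down:
\xe1\xbe\xbe decodes to U+1FBE (GREEK PROSGEGRAMMENI). As far as I can
tell its uppercase mapping is U+0399, which needs only two bytes in UTF-8,
so the upper-cased buffer no longer lines up byte-for-byte with the input.
A minimal sketch of that arithmetic (the helper names are hypothetical and
are not part of gnulib; the case mapping itself is assumed, not computed):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 3-byte UTF-8 sequence.  Illustration only: the lead and
   continuation bytes are not validated. */
uint32_t decode_utf8_3(const unsigned char *s)
{
  assert(s != NULL);
  return ((uint32_t) (s[0] & 0x0F) << 12)
         | ((uint32_t) (s[1] & 0x3F) << 6)
         | (uint32_t) (s[2] & 0x3F);
}

/* Number of bytes UTF-8 needs to encode code point CP. */
size_t utf8_len(uint32_t cp)
{
  if (cp < 0x80)
    return 1;
  if (cp < 0x800)
    return 2;
  if (cp < 0x10000)
    return 3;
  return 4;
}
```

Calling decode_utf8_3() on the three pattern bytes gives 0x1FBE, and
utf8_len() reports 3 bytes for U+1FBE but 2 bytes for U+0399.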

Then the problem is:
1. build_wcs_upper_buffer() allocates the 'offsets' member
   of a "re_string_t" but does not initialize all elements.
2. re_string_peek_byte_case() accesses an uninitialized element.
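
The pattern can be mimicked in isolation. The sketch below is not gnulib
code: the type and function names are hypothetical, the 64-bit element
type is an assumption standing in for "regoff_t" with
_REGEX_LARGE_OFFSETS, and the stored values are arbitrary. It fills an
array with 0xBC, initializes only a prefix, and then counts the elements
that still hold the fill pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a wide regoff_t (assumption: 64 bits). */
typedef long long demo_off_t;

#define POISON_BYTE 0xBC

/* Allocate NALLOC offsets, fill every byte with POISON_BYTE, then
   initialize only the first NINIT entries.  Returns NULL if the
   allocation fails. */
demo_off_t *alloc_partial_offsets(size_t nalloc, size_t ninit)
{
  assert(ninit <= nalloc);
  demo_off_t *offsets = malloc(nalloc * sizeof *offsets);
  if (offsets == NULL)
    return NULL;
  memset(offsets, POISON_BYTE, nalloc * sizeof *offsets);
  for (size_t i = 0; i < ninit; i++)
    offsets[i] = (demo_off_t) i;
  return offsets;
}

/* Count the entries among the first N that still hold the fill pattern. */
size_t count_poisoned(const demo_off_t *offsets, size_t n)
{
  demo_off_t poison;
  memset(&poison, POISON_BYTE, sizeof poison);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (offsets[i] == poison)
      count++;
  return count;
}
```

With alloc_partial_offsets(6, 3), count_poisoned() finds three untouched
elements. Using such an element as an index into a byte buffer would
produce a wild address like the one valgrind reports below.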


Steps to reproduce it reliably are below.
"--with-included-regex" is needed to force using gnulib,
and to force _REGEX_LARGE_OFFSETS to be 1 (bug does not happen without
it).
The tiny patch just initializes the memory to 0xBC, to ensure
that the uninitialized memory access triggers a segfault.

  git clone git://git.sv.gnu.org/sed.git
  cd sed
  ./bootstrap
  patch -p1 < regex-int-add-memset.patch
  ./configure --with-included-regex CFLAGS="-O0 -g"
  make
  printf "/\xe1\xbe\xbe\x5c\x00/I" > 1.sed
  sed/sed -f 1.sed < /dev/null


With valgrind:
====
==29631== Memcheck, a memory error detector
==29631== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==29631== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==29631== Command: sed/sed -f 1.sed
==29631==
==29631== Invalid read of size 1
==29631==    at 0x122403: re_string_peek_byte_case (regex_internal.c:860)
==29631==    by 0x127FD0: peek_token (regcomp.c:1830)
==29631==    by 0x127E93: fetch_token (regcomp.c:1790)
==29631==    by 0x129605: parse_expression (regcomp.c:2459)
==29631==    by 0x128CE8: parse_branch (regcomp.c:2221)
==29631==    by 0x128AFB: parse_reg_exp (regcomp.c:2173)
==29631==    by 0x1289DE: parse (regcomp.c:2141)
==29631==    by 0x12573F: re_compile_internal (regcomp.c:803)
==29631==    by 0x12473D: rpl_re_compile_pattern (regcomp.c:230)
==29631==    by 0x111A57: compile_regex_1 (regexp.c:113)
==29631==    by 0x111CD4: compile_regex (regexp.c:190)
==29631==    by 0x10C73C: compile_address (compile.c:953)
==29631== Address 0xbcbcbcbcc21b51e6 is not stack'd, malloc'd or (recently) free'd
==29631==
==29631==
==29631== Process terminating with default action of signal 11 (SIGSEGV)
==29631==  General Protection Fault
==29631==    at 0x122403: re_string_peek_byte_case (regex_internal.c:860)
==29631==    by 0x127FD0: peek_token (regcomp.c:1830)
==29631==    by 0x127E93: fetch_token (regcomp.c:1790)
==29631==    by 0x129605: parse_expression (regcomp.c:2459)
==29631==    by 0x128CE8: parse_branch (regcomp.c:2221)
==29631==    by 0x128AFB: parse_reg_exp (regcomp.c:2173)
==29631==    by 0x1289DE: parse (regcomp.c:2141)
==29631==    by 0x12573F: re_compile_internal (regcomp.c:803)
==29631==    by 0x12473D: rpl_re_compile_pattern (regcomp.c:230)
==29631==    by 0x111A57: compile_regex_1 (regexp.c:113)
==29631==    by 0x111CD4: compile_regex (regexp.c:190)
==29631==    by 0x10C73C: compile_address (compile.c:953)
==29631==
==29631== HEAP SUMMARY:
==29631==     in use at exit: 8,094 bytes in 16 blocks
==29631==   total heap usage: 54 allocs, 38 frees, 16,395 bytes allocated
==29631==
==29631== LEAK SUMMARY:
==29631==    definitely lost: 0 bytes in 0 blocks
==29631==    indirectly lost: 0 bytes in 0 blocks
==29631==      possibly lost: 0 bytes in 0 blocks
==29631==    still reachable: 8,094 bytes in 16 blocks
==29631==         suppressed: 0 bytes in 0 blocks
==29631== Rerun with --leak-check=full to see details of leaked memory
==29631==
==29631== For counts of detected and suppressed errors, rerun with: -v
==29631== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
====


With GDB, breaking at "re_string_peek_byte_case"
and checking the upper-to-lower offsets, the last 3 elements
are clearly not initialized (they contain "0xBC").
===
$ gdb sed/sed
(gdb) b re_string_peek_byte_case
Breakpoint 1 at 0x1a2d1: file lib/regex_internal.c, line 845.
(gdb) r -f 1.sed < /dev/null
Starting program: /tmp/sed/sed/sed -f 1.sed < /dev/null

Breakpoint 1, re_string_peek_byte_case (pstr=0x7fffffffdc30, idx=1) at lib/regex_internal.c:845
845       if (BE (!pstr->mbs_allocated, 1))
(gdb) x /6xg pstr->offsets
0x55555578fef0: 0x0000000000000000      0x0000000000000001
0x55555578ff00: 0x0000000000000003      0xbcbcbcbcbcbcbcbc
0x55555578ff10: 0xbcbcbcbcbcbcbcbc      0xbcbcbcbcbcbcbcbc
(gdb)

===


Interestingly, if sed is compiled with native glibc's regex code,
and with _REGEX_LARGE_OFFSETS not defined (meaning "regoff_t" is int),
the "offsets" elements are initialized correctly:

====
$ ./configure CFLAGS="-O0 -g"
$ make
$ gdb sed/sed
(gdb) b re_string_peek_byte_case
Function "re_string_peek_byte_case" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (re_string_peek_byte_case) pending.
(gdb) r -f 1.sed < /dev/null
Starting program: /tmp/sed/sed/sed -f 1.sed < /dev/null

Breakpoint 1, peek_token (address@hidden, address@hidden,
    address@hidden) at regcomp.c:1796
1796    regcomp.c: No such file or directory.
(gdb) s
re_string_peek_byte_case (idx=1, pstr=0x7fffffffdc70) at regex_internal.c:840
840     regex_internal.c: No such file or directory.
(gdb) x /10xw pstr->offsets
0x555555777e30: 0x00000000      0x00000001      0x00000003      0x00000004
0x555555777e40: 0x00000000      0x00000000      0x000003d1      0x00000000
0x555555777e50: 0x00000000      0x00000000
====


I don't have a fix yet, but if anyone has ideas, feedback is welcome.

I found one very old glibc bug report from 2005 where Paul mentions
REGEX_LARGE_OFFSET; I'm not sure whether it is relevant:
 https://sourceware.org/bugzilla/show_bug.cgi?format=multiple&id=1281


regards,
 - assaf

Attachment: regex-int-add-memset.patch
Description: Text Data

