Re: One-line build-test from busybox statically built segfaults on GNU/Hurd

From: Samuel Thibault
Subject: Re: One-line build-test from busybox statically built segfaults on GNU/Hurd
Date: Sun, 23 Nov 2014 19:48:15 +0100
User-agent: Mutt/1.5.21+34 (58baf7c9f32f) (2010-12-30)


Svante Signell wrote on Thu 20 Nov 2014 12:42:30 +0100:
> (11:03:58) mjt: hello. it looks like hurd-i386 is the only arch where my
> one-line build-test program fails -- see
> https://buildd.debian.org/status/package.php?p=busybox

I've had a closer look.  My initial guess was wrong.  What actually
happens is much more horrible :)

The test uses getpwnam, which dlopens an nss module, I guess something
like /lib/i386-gnu/libnss_compat-2.19.so. We then get the following
backtrace:

(gdb) bt
#0  __mig_dealloc_reply_port (arg=86) at ../sysdeps/mach/hurd/mig-reply.c:41
#1  0x0124ba3c in __vm_allocate_rpc (target_task=0, address=0x1001a14, 
    size=1048576, anywhere=1)
#2  0x0107edd0 in __vm_allocate (target_task=0, address=0x1001a14, 
    size=1048576, anywhere=1)
#3  0x0117f58d in __mmap (addr=0x0, len=1048576, prot=6, flags=2, fd=-1, 
    offset=0) at ../sysdeps/mach/hurd/mmap.c:51
#4  0x011018a2 in sysmalloc (av=0x121a4e0 <main_arena>, nb=376)
    at malloc.c:2495
#5  _int_malloc (av=av@entry=0x121a4e0 <main_arena>, bytes=bytes@entry=372)
    at malloc.c:3800
#6  0x01102bb8 in __libc_malloc (bytes=bytes@entry=372) at malloc.c:2891
#7  0x01103f11 in malloc_hook_ini (sz=372, 
    caller=0x10f2a7c <__fopen_internal+28>) at hooks.c:32
#8  0x01102c8e in __libc_malloc (bytes=372) at malloc.c:2883
#9  0x010f2a7c in __fopen_internal (
    filename=filename@entry=0x11dd7da "/etc/nsswitch.conf", 
    mode=mode@entry=0x11da8ee "rce", is32=is32@entry=1) at iofopen.c:73
#10 0x010f2b5b in _IO_new_fopen (filename=0x11dd7da "/etc/nsswitch.conf", 
    mode=0x11da8ee "rce") at iofopen.c:103
#11 0x011ae143 in nss_parse_file (fname=0x11dd7da "/etc/nsswitch.conf", 
    fname=0x11dd7da "/etc/nsswitch.conf") at nsswitch.c:552
#12 __nss_database_lookup (database=0x1041917 "passwd_compat", 
    alternate_name=0x0, defconfig=0x10418b8 "nis", ni=0x10441e8 <ni>)
    at nsswitch.c:125
#13 0x0103e3e9 in init_nss_interface () at nss_compat/compat-pwd.c:105
#14 0x0103f43d in _nss_compat_getpwnam_r (name=0x80d4ad0 "root", 
    pwd=0x8103cd8 <resbuf>, buffer=0x8106970 "", buflen=1024, errnop=0x8105528)
    at nss_compat/compat-pwd.c:865
#15 0x0805c17d in getpwnam_r ()
#16 0x0805c039 in getpwnam ()
#17 0x08048b75 in main () at test.c:6

Notice the addresses of the libc functions: nss_parse_file is not
actually using the functions from the static executable, but from
*another*, dlopened libc!!  That libc is not getting initialized enough:
notably, __mach_task_self_ there is still zero, so all RPCs return
EMACH_SEND_INVALID_DEST.  That includes the RPC which frees the reply
port used for the __vm_allocate RPC: it fails the same way, so its own
reply port gets freed through yet another RPC, which fails too, and so
on.
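
To make the cascade easier to picture, here is a toy model of it. It is
only a sketch: none of the toy_* names exist in Mach or glibc, the real
code paths are more involved, and the depth cap is an assumption added
so that the toy terminates.

```c
#include <assert.h>

/* Toy stand-ins only: none of these names exist in Mach or glibc. */
typedef unsigned int toy_port_t;
enum { TOY_SUCCESS = 0, TOY_SEND_INVALID_DEST = 1 };

/* 0 models the uninitialized __mach_task_self_ of the second libc. */
static toy_port_t toy_task_self;

static int toy_rpc(int depth, int max_depth, int *deepest);

/* Freeing a reply port is itself an RPC sent to the task port. */
static void toy_dealloc_reply_port(int depth, int max_depth, int *deepest)
{
    (void) toy_rpc(depth, max_depth, deepest);
}

/* Any RPC sent to the task port: it fails when that port is 0, and the
   failure path then frees the reply port that was set up for it. */
static int toy_rpc(int depth, int max_depth, int *deepest)
{
    if (depth > *deepest)
        *deepest = depth;
    if (toy_task_self != 0)
        return TOY_SUCCESS;
    /* The cascade described above has no such limit; the cap (an
       assumption of this sketch) keeps the toy finite. */
    if (depth < max_depth)
        toy_dealloc_reply_port(depth + 1, max_depth, deepest);
    return TOY_SEND_INVALID_DEST;
}

/* Returns how many nested deallocation RPCs one initial RPC triggers. */
int toy_cascade_depth(toy_port_t task_self, int max_depth)
{
    int deepest = 0;
    assert(max_depth >= 0);
    toy_task_self = task_self;
    (void) toy_rpc(0, max_depth, &deepest);
    return deepest;
}
```

With a zero task port, toy_cascade_depth(0, 1000) runs into the cap,
while any non-zero task port gives a depth of 0.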

So in short: the issue is dlopening libraries from a static binary, I
guess we have never tested that. The same happens with dlopening other
libraries, for instance libz:

#include <stdio.h>
#include <dlfcn.h>
#include <zlib.h>
#include <string.h>

unsigned char outbuf[128];

/* Same signature as zlib's inflateInit2_(). */
typedef int (*init_t) (z_streamp strm, int windowBits, const char *version,
                       int stream_size);

int main(void) {
        /* Load libz at run time from the statically linked program. */
        void *c = dlopen("libz.so", RTLD_LAZY);
        init_t sym = (init_t) dlsym(c, "inflateInit2_");

        z_stream strm;
        memset(&strm, 0, sizeof(strm));
        strm.next_out = outbuf;
        strm.avail_out = sizeof(outbuf);
        /* 32 + 15: automatic zlib/gzip header detection, 32 KiB window. */
        int result = sym(&strm, 32 + 15, ZLIB_VERSION, (int) sizeof(z_stream));
        printf("%d\n", result);
        return 0;
}

to be compiled with:

gcc test.c -o test -ldl -static -g

It seems that on Linux the libc also gets dlopened like that, but I
guess initialization happens to get done enough there for it to work.
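
When reproducing this, it may help to check the dlopen and dlsym results,
so that a missing library is not mistaken for the crash described above.
A minimal sketch, assuming a hypothetical helper name (load_symbol) and a
caller-provided error buffer; only dlopen, dlsym, dlerror and dlclose are
real API here:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper, not part of any library: returns the address of
   `name` in `lib`, or NULL with a message written to `err`.  On success
   the handle is deliberately left open so the symbol stays valid. */
void *load_symbol(const char *lib, const char *name, char *err, size_t errlen)
{
    assert(err != NULL && errlen > 0);
    err[0] = '\0';

    void *handle = dlopen(lib, RTLD_LAZY);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(err, errlen, "dlopen(%s): %s", lib,
                 msg != NULL ? msg : "unknown error");
        return NULL;
    }

    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(handle, name);
    const char *msg = dlerror();
    if (msg != NULL) {                  /* a NULL symbol alone is not an error */
        snprintf(err, errlen, "dlsym(%s): %s", name, msg);
        dlclose(handle);
        return NULL;
    }
    return sym;
}
```

In the libz test, load_symbol("libz.so", "inflateInit2_", buf, sizeof buf)
could replace the dlopen/dlsym pair, printing buf when it returns NULL.
On older glibc versions this still needs -ldl at link time.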


