bug-gnulib

Re: done digging around test-dprintf-posix2


From: Bruce Korb
Subject: Re: done digging around test-dprintf-posix2
Date: Sun, 09 Jan 2011 10:46:47 -0800
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20101125 SUSE/3.0.11 Thunderbird/3.0.11

Hi Bruno,

On 01/07/11 21:17, Bruno Haible wrote:
> Hi Bruce,
> 
>> I now believe it completely correct to add this:  free(malloc(0x88))
>> to the program and put it into the main line until the real cause
>> (glibc or kernel) is determined and fixed.
> 
> What is your explanation of why that free(malloc(0x88)) has the effect
> of avoiding the crash?

It causes an initial allocation arena to be allocated.
This allocation arena (if I read the code correctly) should
be about 1MB in size, not 10MB.

> If I understood things correctly from Jakub Jelinek's reply to your
> report <http://sourceware.org/bugzilla/show_bug.cgi?id=12232>, then the
> effect of free(malloc(0x88)) is that it pre-allocates some memory pages,

The "allocation arena" I mentioned above.

> to such an extent that the mallocs inside rpl_fprintf or rpl_dprintf
> succeed. We don't want this, as it only masks a problem that is still
> present inside rpl_fprintf or rpl_dprintf.
> 
> Ulrich and Jakub pointed you to the fact that it's the kernel who decides.
> Have you tracked down in the kernel the source code that refuses memory
> allocations, depending on the RLIMIT_AS value? At that moment when malloc
> fails, what are the memory maps (/proc/<pid>/maps)

The kernel decides, but their code determines how much memory to
ask for.  If it asks for too much, the kernel is being reasonable.
If it does not, then the kernel's behavior is at fault.
I replaced "return 1" with "abort()":

$ size core
   text    data     bss     dec     hex filename
  65536  225280       0  290816   47000 core \
(core file invoked as ./test-dprintf-posix2 1)

That is a bit smaller than 10000000 decimal.

> and what was the system
> call (strace!) that the malloc() call translated into?

"ltrace -S" does both.  I left off the "-S".

> And what is the size of the 'test-dprintf-posix2' program with all its
> dependencies (as shown by 'ldd')? Does it sum up to more than 10 MB?

Not hardly.  I'd have bumped the limit long ago if that were an issue.
The tests are attempting to check for a memory leak; the size limit
is arbitrary (i.e. not important, except that it must be large enough
for the program to get started).

$ for f in  test-fprintf-posix3 test-dprintf-posix2;do echo $f;ldd $f;done
test-fprintf-posix3
        linux-vdso.so.1 =>  (0x00007fffa6fff000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f41de2b7000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f41de060000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f41ddd00000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f41ddae3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f41de4c0000)
test-dprintf-posix2
        linux-vdso.so.1 =>  (0x00007fff995f9000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f9b1dc99000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f9b1da42000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f9b1d6e2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9b1d4c5000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9b1dea2000)
$ size test-fprintf-posix3 test-dprintf-posix2
   text    data     bss     dec     hex filename
  15338     648      24   16010    3e8a test-fprintf-posix3
  15540     648      16   16204    3f4c test-dprintf-posix2
(statically linked to libposix)
..............
__libc_start_main(0x4009c0, 2, 0x7fff27c151e8, 0x4036d0, 0x403760 <unfinished ...>
getrlimit(2, 0x7fff27c150e0, 0x7fff27c15200, 0x7f2965bfe4a8, 0x7f2965bff320 <unfinished ...>
SYS_getrlimit(2, 0x7fff27c150e0)                 = 0
<... getrlimit resumed> )                        = 0
setrlimit(2, 0x7fff27c150e0, 0x7fff27c15200, -1, 0x7f2965bff320 <unfinished ...>
SYS_setrlimit(2, 0x7fff27c150e0)                 = 0
<... setrlimit resumed> )                        = 0
getrlimit(9, 0x7fff27c150e0, 0x7fff27c15200, -1, 0x7f2965bff320 <unfinished ...>
SYS_getrlimit(9, 0x7fff27c150e0)                 = 0
<... getrlimit resumed> )                        = 0
setrlimit(9, 0x7fff27c150e0, 0x7fff27c15200, -1, 0x7f2965bff320 <unfinished ...>
SYS_setrlimit(9, 0x7fff27c150e0)                 = 0
<... setrlimit resumed> )                        = 0
strtol(0x7fff27c17263, 0, 10, -1, 0x7f2965bff320) = 1
malloc(88 <unfinished ...>
SYS_brk(NULL)                                    = 0x00606000
SYS_brk(0x00627000)                              = 0x00606000
SYS_mmap(0, 0x100000, 3, 34, 0xffffffff)         = -12
SYS_mmap(0, 0x8000000, 0, 16418, 0xffffffff)     = -12
SYS_mmap(0, 0x4000000, 0, 16418, 0xffffffff)     = -12
SYS_mmap(0, 0x8000000, 0, 16418, 0xffffffff)     = -12
SYS_mmap(0, 0x4000000, 0, 16418, 0xffffffff)     = -12
<... malloc resumed> )                           = NULL
__errno_location()                               = 0x7f29662506a8
__errno_location()                               = 0x7f29662506a8
SYS_exit_group(1 <no return ...>

RE: SYS_mmap()
I have no idea what "-12" means.   It doesn't mean "a-ok" and it
isn't "-1".  ltrace does not seem to recognize it as an error
result, so it isn't printing the location of the error code
(which wouldn't help anyway).

Anyhow, I *think* the first mmap ought to succeed because the
process size is about 300K and it is only asking for 1M more.
That fits my recollection of the malloc code.  The remaining
mmap calls are rather over the top and I'd expect them to be rejected.
I would not expect malloc to try to allocate so much space.
But maybe there is special magic there since the request is for "no access".
It is mapped PRIVATE/ANONYMOUS like the first map. I don't know what
the 0x4000 bit in the flags argument means.


SUMMARY: there is brokenness somewhere between glibc and the kernel.
From the perspective of gnulib tests, it doesn't matter where the
fault lies, it matters that there is a problem.  What precise confluence
of circumstances triggers the problem doesn't seem crucial to me,
either, except to the glibc/kernel folks who do need to chase down
the exact cause.  Therefore, I think the test code should evade the
problem rather than continuing to fail, leaving the fix to others.
This program, built stand-alone and not linked to libposix, does not
fail (well, I've not seen it fail):

#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_ROUNDS 1000
#define MAX_ALLOC_ROUND 10000
#define MAX_ALLOC_TOTAL (NUM_ROUNDS * MAX_ALLOC_ROUND)

int
main (int argc, char ** argv)
{
    struct rlimit limit;

    /* Cap the data segment size at MAX_ALLOC_TOTAL bytes.
       Exit code 77 tells the test harness to skip the test. */
    if (getrlimit (RLIMIT_DATA, &limit) < 0)
        return 77;
    if (limit.rlim_max == RLIM_INFINITY || limit.rlim_max > MAX_ALLOC_TOTAL)
        limit.rlim_max = MAX_ALLOC_TOTAL;
    limit.rlim_cur = limit.rlim_max;
    if (setrlimit (RLIMIT_DATA, &limit) < 0)
        return 77;

    /* Cap the total address space the same way. */
    if (getrlimit (RLIMIT_AS, &limit) < 0)
        return 77;
    if (limit.rlim_max == RLIM_INFINITY || limit.rlim_max > MAX_ALLOC_TOTAL)
        limit.rlim_max = MAX_ALLOC_TOTAL;
    limit.rlim_cur = limit.rlim_max;
    if (setrlimit (RLIMIT_AS, &limit) < 0)
        return 77;

    /* Print 17 zero-padded to 11000 digits; fail only on ENOMEM. */
    if (dprintf (STDOUT_FILENO, "%011000d\n", 17) == -1
        && errno == ENOMEM)
        return 1;

    return 0;
}


__libc_start_main(0x400640, 1, 0x7fff280db718, 0x400730, 0x4007c0 <unfinished ...>
getrlimit(2, 0x7fff280db610, 0x7fff280db728, 0x7fc9aa0784a8, 0x7fc9aa079320 <unfinished ...>
SYS_getrlimit(2, 0x7fff280db610)                 = 0
<... getrlimit resumed> )                        = 0
setrlimit(2, 0x7fff280db610, 0x7fff280db728, -1, 0x7fc9aa079320 <unfinished ...>
SYS_setrlimit(2, 0x7fff280db610)                 = 0
<... setrlimit resumed> )                        = 0
getrlimit(9, 0x7fff280db610, 0x7fff280db728, -1, 0x7fc9aa079320 <unfinished ...>
SYS_getrlimit(9, 0x7fff280db610)                 = 0
<... getrlimit resumed> )                        = 0
setrlimit(9, 0x7fff280db610, 0x7fff280db728, -1, 0x7fc9aa079320 <unfinished ...>
SYS_setrlimit(9, 0x7fff280db610)                 = 0
<... setrlimit resumed> )                        = 0
dprintf(1, 0x40081c, 17, -1, 0x7fc9aa079320 <unfinished ...>
SYS_fstat(1, 0x7fff280db110)                     = 0
SYS_mmap(0, 4096, 3, 34, 0xffffffff)             = 0x7fc9aa29a000
SYS_lseek(1, 0, 1)                               = -29
SYS_write(1, "00000000000000000000000000000000"..., 1024) = 1024
[..................]
SYS_munmap(0x7fc9aa29a000, 4096)                 = 0
<... dprintf resumed> )                          = 11001
SYS_exit_group(0 <no return ...>
+++ exited (status 0) +++


