bug-bash

Hang in bgp_delete


From: Graham Northup
Subject: Hang in bgp_delete
Date: Sat, 11 Feb 2017 17:04:19 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

Configuration Information:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu'
-DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash'
-DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib
-D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe
-fstack-protector-strong -g -fvar-tracking-assignments -g
-fvar-tracking-assignments
-DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin'
-DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc'
-DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -Wno-parentheses
-Wno-format-security
uname output: Linux gmx 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34
CET 2016 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 11
Release Status: release

Description:

I'm getting a mysterious hang on one of our Arch Linux machines while it
runs a particular, rather simple script. After building debugging symbols
and attaching a debugger to the process, I tracked the hang down to this
loop in bgp_delete (with some minor formatting):

for (
   psi = *(pshash_getbucket (pid));
   psi != NO_PIDSTAT;
   psi = bgpids.storage[psi].bucket_next
)
    if (bgpids.storage[psi].pid == pid)
        break;

...the problem is, according to my debugger:

(gdb) p psi
$1 = 11506
(gdb) p bgpids.storage[psi].bucket_next
$2 = 11506

...and so this just sits there wedging a core :)
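To illustrate why a bucket_next that points back at its own index can never
terminate the loop above, here is a minimal sketch of a bounded chain walk
that reports corruption instead of spinning. Everything in it is an
assumption made for illustration: the struct layout, the names
pidstat_sketch, chain_find and CHAIN_CORRUPT, and the integer index type are
hypothetical stand-ins, not bash's actual definitions and not a proposed
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the index-linked table in the report.
   These names and this layout are assumptions, not bash's real types. */
typedef int ps_index_t;
#define NO_PIDSTAT    ((ps_index_t)-1)  /* end-of-chain marker */
#define CHAIN_CORRUPT ((ps_index_t)-2)  /* sketch-only error value */

struct pidstat_sketch {
    int pid;
    ps_index_t bucket_next;  /* index of the next entry in the bucket */
};

/* Walk one bucket chain starting at `head`, looking for `pid`.
   A well-formed chain visits at most `nslots` distinct entries, so taking
   more steps than that means the chain loops back on itself. */
ps_index_t chain_find(const struct pidstat_sketch *storage, size_t nslots,
                      ps_index_t head, int pid)
{
    size_t steps = 0;
    ps_index_t psi;

    assert(storage != NULL);

    for (psi = head; psi != NO_PIDSTAT; psi = storage[psi].bucket_next) {
        /* An out-of-range index is treated as corruption too. */
        if (psi < 0 || (size_t)psi >= nslots)
            return CHAIN_CORRUPT;
        if (storage[psi].pid == pid)
            return psi;
        /* Self-loop or longer cycle: give up instead of wedging a core. */
        if (++steps > nslots)
            return CHAIN_CORRUPT;
    }
    return NO_PIDSTAT;
}
```

With an entry whose bucket_next equals its own index, as in the gdb output
above, and a pid that is not stored in that entry, this walk returns
CHAIN_CORRUPT after nslots + 1 steps, whereas the unbounded loop keeps
revisiting the same slot forever.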

I'm not entirely sure what circumstances cause this, but it feels pretty
racy; on average, it takes a couple of days of running before this machine
reproduces the issue. I'll leave this process alive for now in case
you'd like me to gather more forensics. (I do have a core dump, but it's
~5.5MB. :)

For posterity, and reference below, here's a backtrace--sorry that my UA
tries to word wrap it:

#0  0x000000000043ff0e in bgp_delete (pid=pid@entry=15980) at jobs.c:868
#1  0x0000000000443f89 in make_child (command=0xb6b930
"/manage/totaldisk.sh", async_p=async_p@entry=1) at jobs.c:2093
#2  0x000000000042ff9c in execute_simple_command
(simple_command=0x93bd10, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, async=async@entry=1,
fds_to_close=fds_to_close@entry=0xbbc4c0) at execute_cmd.c:4088
#3  0x0000000000431e5c in execute_command_internal (command=0x93bce0,
asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0) at
execute_cmd.c:802
#4  0x0000000000433176 in execute_connection (fds_to_close=0xbbc4c0,
pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93bd60) at
execute_cmd.c:2576
#5  execute_command_internal (command=command@entry=0x93bd60,
asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0) at
execute_cmd.c:971
#6  0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93be70) at
execute_cmd.c:2564
#7  execute_command_internal (command=command@entry=0x93be70,
asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0) at
execute_cmd.c:971
#8  0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93bfe0) at
execute_cmd.c:2564
#9  execute_command_internal (command=command@entry=0x93bfe0,
asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0) at
execute_cmd.c:971
#10 0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93c0f0) at
execute_cmd.c:2564
#11 execute_command_internal (command=command@entry=0x93c0f0,
asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0) at
execute_cmd.c:971
#12 0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93c200) at
execute_cmd.c:2564
#13 execute_command_internal (command=command@entry=0x93c200,
asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0) at
execute_cmd.c:971
#14 0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x93c370) at
execute_cmd.c:2564
#15 execute_command_internal (command=command@entry=0x93c370,
asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0) at
execute_cmd.c:971
#16 0x0000000000433a7e in execute_command (command=0x93c370) at
execute_cmd.c:405
#17 0x0000000000433b2f in execute_while_or_until
(while_command=0x93c3a0, type=type@entry=0) at execute_cmd.c:3509
#18 0x0000000000431cad in execute_while_command
(while_command=<optimized out>) at execute_cmd.c:3450
#19 execute_command_internal (command=command@entry=0x93c3c0,
asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1,
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x93c3f0) at
execute_cmd.c:911
#20 0x0000000000433a7e in execute_command (command=0x93c3c0) at
execute_cmd.c:405
#21 0x000000000041c4a2 in reader_loop () at eval.c:180
#22 0x000000000041b1c2 in main (argc=2, argv=0x7fff79e96c78,
env=0x7fff79e96c90) at shell.c:792

...and here is the output of xxd /proc/3127/cmdline, where 3127 is the
hanging process. When I dumped core, gdb asked me to note that there are
embedded NULs in here:

00000000: 2f62 696e 2f62 6173 6800 2f6d 616e 6167  /bin/bash./manag
00000010: 652f 7275 6e2e 7368 00                   e/run.sh.
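For readers decoding the dump by hand: the 25 bytes above are two
NUL-terminated strings laid end to end, which is how Linux presents a
process's arguments in /proc/PID/cmdline. A minimal sketch of splitting such
a buffer follows; the function name split_cmdline and its signature are
hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Split a NUL-separated buffer (the layout of /proc/PID/cmdline) into
   argument pointers. Only arguments terminated by a NUL inside `len`
   bytes are reported. Returns the number of pointers stored in `argv`,
   which is never more than `max_args`. */
size_t split_cmdline(const char *buf, size_t len,
                     const char **argv, size_t max_args)
{
    size_t n = 0, start = 0, i;

    assert(buf != NULL && argv != NULL);

    for (i = 0; i < len && n < max_args; i++) {
        if (buf[i] == '\0') {
            argv[n++] = buf + start;  /* argument spans [start, i) */
            start = i + 1;            /* next argument begins after the NUL */
        }
    }
    return n;
}
```

Applied to the bytes in the dump, this yields two arguments, "/bin/bash"
and "/manage/run.sh".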

Repeat-By:

Attached is a tarball with the current master of (1) the script 3127 was
running, (2) all of the scripts and program sources it calls, and (3) a
systemd service which invoked 3127. Assuming your build is susceptible
to the bug--exact conditions are quite unclear--extract this tarball's
contents, rename the directory in its root to "/manage", and execute
"/bin/bash /manage/run.sh"--or, more faithfully, install manage.service
into a systemd unit path of your choosing and start that service.

Let me know if there's anything else I can provide :)

Thanks,
Graham

Attachment: manage3client-master-5e9652e7eb2f53545244aa180427f019ca3e92d6.tar.bz2
Description: Binary data

Attachment: signature.asc
Description: OpenPGP digital signature

