Terrific Dead Lock
From: Ludovic Courtès
Subject: Terrific Dead Lock
Date: Thu, 13 Mar 2008 23:29:56 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)
Hello,
I'm experiencing a deadlock while running the test suite (in a NixOS
build), and I don't remember ever seeing it before. Sorry for the long
copy/paste, but it helped me understand the problem as I was writing
this message.
Here we go:
(gdb) info threads
* 3 Thread 0x40b70b90 (LWP 6675) 0xffffe410 in ?? ()
2 Thread 0x416d3b90 (LWP 6853) 0xffffe410 in ?? ()
1 Thread 0x402da8d0 (LWP 5049) 0xffffe410 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 0x402da8d0 (LWP 5049))]#0 0xffffe410 in ?? ()
(gdb) bt
#0 0xffffe410 in ?? ()
#1 0xbfbc3e58 in ?? ()
#2 0x00000002 in ?? ()
#3 0x00000080 in ?? ()
#4 0x401912b9 in __lll_lock_wait () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#5 0x4018c9d6 in _L_lock_95 () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#6 0x4018c3ba in pthread_mutex_lock () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#7 0x400bb6fb in scm_i_thread_put_to_sleep () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#8 0x40069159 in scm_i_gc () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#9 0x4006afbe in increase_mtrigger () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#10 0x4009d8be in scm_make_srcprops () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#11 0x400977d9 in scm_read_sexp () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#12 0x4009672f in scm_read_expression () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#13 0x40097622 in scm_read_sexp () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#14 0x4009672f in scm_read_expression () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#15 0x4009769e in scm_read_sexp () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#16 0x4009672f in scm_read_expression () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#17 0x4009769e in scm_read_sexp () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#18 0x4009672f in scm_read_expression () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#19 0x4007d8da in scm_primitive_load () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#20 0x40062ed3 in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#21 0x4004dc2b in scm_start_stack () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#22 0x4004e3a1 in scm_m_start_stack () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#23 0x4005cb71 in scm_apply () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#24 0x40061a15 in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#25 0x400617bd in scm_call_0 () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#26 0x400664ad in apply_thunk () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#27 0x4006668e in scm_c_with_fluid () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#28 0x400666e5 in scm_with_fluid () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#29 0x40062093 in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#30 0x400617bd in scm_call_0 () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#31 0x40051e98 in scm_dynamic_wind () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#32 0x40062093 in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#33 0x400617bd in scm_call_0 () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#34 0x400664ad in apply_thunk () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#35 0x4006668e in scm_c_with_fluid () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#36 0x400666e5 in scm_with_fluid () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#37 0x40062093 in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#38 0x40064bb6 in call_closure_1 () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#39 0x4005d48e in scm_for_each () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#40 0x40062eba in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#41 0x40063156 in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#42 0x40063a79 in ceval () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#43 0x400648da in scm_primitive_eval_x () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#44 0x40064935 in scm_eval_x () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#45 0x4009a021 in scm_shell () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#46 0x4007a546 in invoke_main_func () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#47 0x4004c492 in c_body () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#48 0x400bdbd9 in scm_c_catch () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#49 0x4004ca02 in scm_i_with_continuation_barrier () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#50 0x4004cae3 in scm_c_with_continuation_barrier () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#51 0x400bcd79 in scm_i_with_guile_and_parent () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#52 0x400bce6e in scm_with_guile () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#53 0x4007a4df in scm_boot_guile () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#54 0x08048a06 in main ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x416d3b90 (LWP 6853))]#0 0xffffe410 in ?? ()
(gdb) bt
#0 0xffffe410 in ?? ()
#1 0x416d31a8 in ?? ()
#2 0x00000002 in ?? ()
#3 0x00000080 in ?? ()
#4 0x401912b9 in __lll_lock_wait () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#5 0x4018c9e4 in _L_lock_236 () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#6 0x4018c43b in pthread_mutex_lock () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#7 0x400bdbed in scm_c_catch () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#8 0x4004ca02 in scm_i_with_continuation_barrier () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#9 0x4004cae3 in scm_c_with_continuation_barrier () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#10 0x400bcd79 in scm_i_with_guile_and_parent () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#11 0x400bce6e in scm_with_guile () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#12 0x400bcec3 in on_thread_exit () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#13 0x40189dc0 in __nptl_deallocate_tsd () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#14 0x4018a189 in start_thread () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#15 0x40264dae in clone () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x40b70b90 (LWP 6675))]#0 0xffffe410 in ?? ()
(gdb) bt
#0 0xffffe410 in ?? ()
#1 0x40b6ff78 in ?? ()
#2 0x00000001 in ?? ()
#3 0x40b7005b in ?? ()
#4 0x401916cb in read () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#5 0x400988f3 in do_read_without_guile () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#6 0x400bb7cc in scm_without_guile () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#7 0x40098855 in signal_delivery_thread () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#8 0x400bdbd9 in scm_c_catch () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#9 0x400bdde9 in scm_internal_catch () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#10 0x400bca4d in really_spawn () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#11 0x4004c492 in c_body () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#12 0x400bdbd9 in scm_c_catch () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#13 0x4004ca02 in scm_i_with_continuation_barrier () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#14 0x4004cae3 in scm_c_with_continuation_barrier () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#15 0x400bcd79 in scm_i_with_guile_and_parent () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#16 0x400bcddf in spawn_thread () from
/tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#17 0x4018a17b in start_thread () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#18 0x40264dae in clone () from
/nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libc.so.6
All this apparently happens while reading `unif.test' (which comes right
after `time.test'):
$ sudo tail -n 3 /tmp/nix-5221-14/guile-1.8.4/check-guile.log
PASS: time.test: strptime: in another thread after error
PASS: time.test: strptime: GNU %s format: gmtoff on GMT
PASS: time.test: strptime: GNU %s format: gmtoff on EST+5
To summarize:
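  * Thread 2 is exiting. It holds THREAD_ADMIN_MUTEX (it acquired it at
    the beginning of `do_thread_exit ()') and is waiting on
    SCM_I_CRITICAL_SECTION_MUTEX in `scm_c_catch ()'.
  * Thread 1 is reading and, as part of that, running the GC. It's
    trying to acquire THREAD_ADMIN_MUTEX in
    `scm_i_thread_put_to_sleep ()'. It holds
    SCM_I_CRITICAL_SECTION_MUTEX from `scm_make_srcprops ()'.
  * Thread 3 is the signal delivery thread, blocked in `read ()'; it
    is not involved.
Each of threads 1 and 2 holds the mutex the other one is waiting for,
hence the deadlock.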
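The situation summarized above is a lock-order inversion. Here is a
minimal sketch of the pattern, using two plain pthread mutexes as
stand-ins for the real ones. The names `admin_mutex', `critical_mutex',
`reader_thread' and `probe_inversion' are made up for illustration; this
is not Guile code. It uses `pthread_mutex_trylock ()' for the second
acquisition so that it reports the conflict instead of hanging:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for THREAD_ADMIN_MUTEX and
   SCM_I_CRITICAL_SECTION_MUTEX.  */
static pthread_mutex_t admin_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t critical_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of thread 1: take the critical-section mutex first,
   then probe the admin mutex.  A blocking lock here would wait for as
   long as the other thread keeps the admin mutex.  */
static void *
reader_thread (void *arg)
{
  int *result = arg;

  pthread_mutex_lock (&critical_mutex);
  *result = pthread_mutex_trylock (&admin_mutex);
  if (*result == 0)
    pthread_mutex_unlock (&admin_mutex);
  pthread_mutex_unlock (&critical_mutex);
  return NULL;
}

/* Plays the role of thread 2: hold the admin mutex while the reader
   runs.  Returns the result of the reader's probe, or -1 if the
   helper thread could not be created.  */
int
probe_inversion (void)
{
  pthread_t reader;
  int result = -1;
  int rc = pthread_mutex_lock (&admin_mutex);

  assert (rc == 0);
  (void) rc;

  if (pthread_create (&reader, NULL, reader_thread, &result) != 0)
    {
      pthread_mutex_unlock (&admin_mutex);
      return -1;
    }

  /* If this thread now blocked on critical_mutex while the reader
     blocked on admin_mutex, neither could make progress.  */
  pthread_join (reader, NULL);
  pthread_mutex_unlock (&admin_mutex);
  return result;
}
```

Under these assumptions `probe_inversion ()' returns EBUSY: the reader
finds the admin mutex taken while it holds the critical-section one,
which is the state thread 1 is stuck in above.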
One might wonder: why the heck does `scm_make_srcprops ()' enter a
critical section? Could it just use a private mutex to protect accesses
to `srcprops_freelist'?
Han-Wen's reimplementation of it in HEAD (2007-01-19) doesn't use a
critical section, nor a mutex, but is thread-safe AFAIUI.
There are two possible fixes:
  1. Copy `srcprop.[ch]' and `eval.c' bits from HEAD to 1.8. After all,
     it's probably solid enough (I use HEAD almost exclusively). See [0]
     for an overview of the initial patch. It doesn't break the public
     API nor the ABI, but it (re)moves stuff from `srcprop.h'.
  2. Remove the critical section from 1.8 and synchronize accesses to
     `srcprops_freelist' with a private mutex, assuming that's a correct
     fix.
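To make the second option concrete, here is a rough sketch of a free
list guarded by its own mutex. It is only an illustration of the idea,
not the actual `srcprop.c' code: `struct srcprops', `freelist_mutex',
`srcprops_alloc' and `srcprops_release' are hypothetical names, and
plain `malloc ()' stands in for Guile's allocator. The point is that the
private mutex is released before anything that could allocate (and thus
trigger a GC) runs:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical record; not Guile's real source-property layout.  */
struct srcprops
{
  struct srcprops *next;
  int line;
  int column;
};

/* The free list and the private mutex that guards it and nothing
   else.  */
static struct srcprops *freelist = NULL;
static pthread_mutex_t freelist_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Return a record, reusing one from the free list when possible.
   Returns NULL if memory is exhausted.  */
struct srcprops *
srcprops_alloc (int line, int column)
{
  struct srcprops *p;

  /* Hold the mutex only while touching the list itself.  */
  pthread_mutex_lock (&freelist_mutex);
  p = freelist;
  if (p != NULL)
    freelist = p->next;
  pthread_mutex_unlock (&freelist_mutex);

  /* Allocate with the mutex released, so whatever the allocator
     does never runs while the list is locked.  */
  if (p == NULL)
    {
      p = malloc (sizeof *p);
      if (p == NULL)
        return NULL;
    }

  p->next = NULL;
  p->line = line;
  p->column = column;
  return p;
}

/* Put a record back on the free list for later reuse.  */
void
srcprops_release (struct srcprops *p)
{
  assert (p != NULL);

  pthread_mutex_lock (&freelist_mutex);
  p->next = freelist;
  freelist = p;
  pthread_mutex_unlock (&freelist_mutex);
}
```

Since no other lock is ever taken while `freelist_mutex' is held, this
mutex cannot take part in an inversion like the one described above.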
I'd be in favor of the first approach.
Comments?
Thanks,
Ludovic.
[0] http://thread.gmane.org/gmane.lisp.guile.devel/6439