guile-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Now crashing [was Re: guile-2.9.2 and threading


From: Linas Vepstas
Subject: Re: Now crashing [was Re: guile-2.9.2 and threading
Date: Sun, 14 Jul 2019 22:03:55 -0500

Exactly the same crash, same stack trace (slightly different line numbers),
with a fresh pull today:
    commit 89e28df1c9069dcb65188fe7b3973c333d87d7e2
    Author: Andy Wingo <address@hidden>
    Date:   Thu Jun 20 14:02:05 2019 +0200
which is the current HEAD on master.

FWIW, 60-odd guile threads waiting here:

#0  __lll_lock_wait () at
../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f45ff4fbdbd in __GI___pthread_mutex_lock (mutex=mutex@entry
=0x16e1f68)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f45ff7e085f in scm_c_weak_set_lookup (set=<optimized out>,
    raw_hash=raw_hash@entry=551753256168943069,
    pred=pred@entry=0x7f45ff7d1660 <string_lookup_predicate_fn>,
    closure=closure@entry=0x7f3e44ff7ac0, dflt=dflt@entry=0x4)
    at ../../libguile/weak-set.c:760
#3  0x00007f45ff7d11e9 in lookup_interned_symbol
(raw_hash=551753256168943069,
    name=0x2965ca0) at ../../libguile/symbols.c:112
#4  scm_i_str2symbol (str=0x2965ca0) at ../../libguile/symbols.c:244

The parallelism is low because of this one lock.  This appears to be the
primary bottleneck for my workload.

-- Linas


On Sun, Jul 14, 2019 at 5:03 PM Linas Vepstas <address@hidden>
wrote:

> Below was for
> guile (GNU Guile) 2.9.2.14-1fb399
>
> --linas
>
> On Sun, Jul 14, 2019 at 4:59 PM Linas Vepstas <address@hidden>
> wrote:
>
>>
>> So, here's my next installment on using guile-2.9.2. The first
>> installment said that I'd piled up CPU-months of guile 2.9.2 experience
>> without any crashes. Well, now, a different workload crashes in minutes.
>> Below is a highly simplified, edited gdb session -- it crashes because it
>> unexpectedly aborts, during an abort(!) because `get_callee_vcode()`
>> failed. Harrumpf.
>>
>> Background: there are 140 threads, half in guile, the other half waiting
>> for guile to finish. Yes, that's too many, but anyways ... 70 threads in
>> guile and one crashed:
>>
>> #2  0x00007f85d3f6ecdb in capture_delimited_continuation (
>>     current_registers=<optimized out>, dynstack=<optimized out>,
>>     saved_registers=<optimized out>, saved_mra=<optimized out>,
>>     saved_fp=<optimized out>, vp=<optimized out>) at
>> ../../libguile/vm.c:1327
>> #3  abort_to_prompt (thread=0x15f692dc0, saved_mra=<optimized out>)
>>     at ../../libguile/vm.c:1454
>>
>> Both frames are interesting, because libguile/vm.c:1327 shows
>>   if (SCM_FRAME_DYNAMIC_LINK (base_fp) != saved_fp)
>>      abort();
>> hey!? who called this? line 1454 is in the middle of abort_to_prompt ()
>> Yow! an unexpected abort during an abort...
>>
>> How did we get here?
>> #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420,
>>     subr=<optimized out>, message=message@entry=0x1607c9380,
>>     args=args@entry=0x15af130e0, data=data@entry=0x15af130f0)
>>     at ../../libguile/error.c:90
>> #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0,
>>     message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S",
>> args=0x15af130e0,
>>     rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62
>> #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0)
>>     at ../../libguile/vm.c:1527
>>
>> and libguile/vm.c:1527 tells me that get_callee_vcode () is very unhappy.
>> But why? I cannot tell .. after that, things peter out in boring stack
>> frames that started with my call scm_c_catch() ... the same seemingly
>> harmless call that is pending in 70 other threads. (the same call that has
>> survived several CPU month of pounding with a different collection of
>> scheme code)
>>
>> My best guess is that the current workload, by unintentionally launching
>> gobs of threads is exposing a race condition that has been hithertho
>> hidden.  I don't know how to debug any further.  I will try a slightly
>> newer guile shortly, to see if I get lucky.
>>
>> -- Linas
>>
>> p.s. here's the whole stack trace. But really, its boring, except for the
>> above highlights.
>>
>>
>>
>> (gdb) bt
>> #0  0x00007f85d38ef428 in __GI_raise (sig=sig@entry=6)
>>     at ../sysdeps/unix/sysv/linux/raise.c:54
>> #1  0x00007f85d38f102a in __GI_abort () at abort.c:89
>> #2  0x00007f85d3f6ecdb in capture_delimited_continuation (
>>     current_registers=<optimized out>, dynstack=<optimized out>,
>>     saved_registers=<optimized out>, saved_mra=<optimized out>,
>>     saved_fp=<optimized out>, vp=<optimized out>) at
>> ../../libguile/vm.c:1327
>> #3  abort_to_prompt (thread=0x15f692dc0, saved_mra=<optimized out>)
>>     at ../../libguile/vm.c:1454
>> #4  0x00007f85ac539041 in ?? ()
>> #5  0x00007f85ac37f040 in ?? ()
>> #6  0x00007f85d41e10c0 in jump_table_ () from
>> /usr/local/lib/libguile-3.0.so.0
>> #7  0x000000015f692dc0 in ?? ()
>> #8  0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #9  0x00007f85d3f70600 in vm_debug_engine (thread=0x7f85ac539000)
>>     at ../../libguile/vm-engine.c:370
>> #10 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0xe45a20,
>> argv=<optimized out>,
>>     nargs=5) at ../../libguile/vm.c:1605
>> #11 0x00007f85d3eefdcb in scm_apply_0 (proc=0xe45a20, args=0x304)
>>     at ../../libguile/eval.c:603
>> #12 0x00007f85d3ef0a0d in scm_apply_1 (proc=<optimized out>,
>>     arg1=arg1@entry=0xdc5420, args=args@entry=0x15af130a0)
>>     at ../../libguile/eval.c:609
>> #13 0x00007f85d3f6c546 in scm_throw (key=key@entry=0xdc5420,
>> args=0x15af130a0)
>>     at ../../libguile/throw.c:272
>> #14 0x00007f85d3f6caf9 in scm_ithrow (key=key@entry=0xdc5420,
>> args=<optimized out>,
>>     no_return=no_return@entry=1) at ../../libguile/throw.c:619
>> #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420,
>>     subr=<optimized out>, message=message@entry=0x1607c9380,
>>     args=args@entry=0x15af130e0, data=data@entry=0x15af130f0)
>>     at ../../libguile/error.c:90
>> ---Type <return> to continue, or q <return> to quit---
>> #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0,
>>     message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S",
>> args=0x15af130e0,
>>     rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62
>> #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0)
>>     at ../../libguile/vm.c:1527
>> #18 0x00007f85b4314805 in ?? ()
>> #19 0x00007f85b428a000 in ?? ()
>> #20 0x00007f85d41e10c0 in jump_table_ () from
>> /usr/local/lib/libguile-3.0.so.0
>> #21 0x000000015f692dc0 in ?? ()
>> #22 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #23 0x00007f85d3f70600 in vm_debug_engine (thread=0x2)
>>     at ../../libguile/vm-engine.c:370
>> #24 0x00007f85d3f76db2 in scm_call_n (proc=<optimized out>,
>>     argv=argv@entry=0x7f7e85fe2600, nargs=nargs@entry=3) at
>> ../../libguile/vm.c:1605
>> #25 0x00007f85d3eef97f in scm_call_3 (proc=<optimized out>,
>> arg1=<optimized out>,
>>     arg2=<optimized out>, arg3=<optimized out>) at
>> ../../libguile/eval.c:510
>> #26 0x00007f85d4262b6f in ?? ()
>> #27 0x00007f85d4262a80 in ?? ()
>> #28 0x00007f85d41e10c0 in jump_table_ () from
>> /usr/local/lib/libguile-3.0.so.0
>> #29 0x000000015f692dc0 in ?? ()
>> #30 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #31 0x00007f85d3f70600 in vm_debug_engine (thread=0x304)
>>     at ../../libguile/vm-engine.c:370
>> #32 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341ee0,
>>     argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
>> #33 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341ee0)
>>     at ../../libguile/eval.c:490
>> #34 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341ee0,
>>     handler=0x15b341ec0, pre_unwind_handler=0x15b341ea0) at
>> ../../libguile/throw.c:146
>> #35 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler 
>> (key=key@entry=0x404,
>>
>> ---Type <return> to continue, or q <return> to quit---
>>     thunk=<optimized out>, handler=<optimized out>,
>>     pre_unwind_handler=<optimized out>) at ../../libguile/throw.c:260
>> #36 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404,
>> body=<optimized out>,
>>     body_data=<optimized out>,
>>     handler=handler@entry=0x7f85c95d0f00
>> <opencog::SchemeEval::catch_handler_wrapper(void*, scm_unused_struct*,
>> scm_unused_struct*)>,
>>     handler_data=handler_data@entry=0x7f7e60000980,
>>     pre_unwind_handler=pre_unwind_handler@entry=0x7f85c95d0c40
>> <opencog::SchemeEval::preunwind_handler_wrapper(void*, scm_unused_struct*,
>> scm_unused_struct*)>,
>>     pre_unwind_handler_data=0x7f7e60000980) at ../../libguile/throw.c:385
>> #37 0x00007f85c95d122a in opencog::SchemeEval::do_eval
>> (this=0x7f7e60000980,
>>     expr="(observe-mpg \"The countess, with her loving heart, felt that
>> her children were being ruined, that it was not the count's fault for he
>> could not help being what he was -- that (though he tried to hide "...)
>>     at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:590
>> #38 0x00007f85c95d12aa in opencog::SchemeEval::c_wrap_eval
>> (p=0x7f7e60000980)
>>     at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:507
>> #39 0x00007f85d3eeb47a in c_body (d=0x7f7e85fe2d40)
>>     at ../../libguile/continuations.c:430
>> #40 0x00007f85d4262b6f in ?? ()
>> #41 0x00007f85d4262a80 in ?? ()
>> #42 0x00007f85d41e10c0 in jump_table_ () from
>> /usr/local/lib/libguile-3.0.so.0
>> #43 0x000000015f692dc0 in ?? ()
>> #44 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #45 0x00007f85d3f70600 in vm_debug_engine (thread=0x304)
>>     at ../../libguile/vm-engine.c:370
>> #46 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341fe0,
>>     argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
>> #47 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341fe0)
>>     at ../../libguile/eval.c:490
>> #48 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341fe0,
>> ---Type <return> to continue, or q <return> to quit---
>>     handler=0x15b341fc0, pre_unwind_handler=0x15b341fa0) at
>> ../../libguile/throw.c:146
>> #49 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler 
>> (key=key@entry=0x404,
>>
>>     thunk=<optimized out>, handler=<optimized out>,
>>     pre_unwind_handler=<optimized out>) at ../../libguile/throw.c:260
>> #50 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404,
>>     body=body@entry=0x7f85d3eeb470 <c_body>,
>>     body_data=body_data@entry=0x7f7e85fe2d40,
>>     handler=handler@entry=0x7f85d3eeb720 <c_handler>,
>>     handler_data=handler_data@entry=0x7f7e85fe2d40,
>>     pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580
>> <pre_unwind_handler>,
>>     pre_unwind_handler_data=0xe174a0) at ../../libguile/throw.c:385
>> #51 0x00007f85d3eeb9e3 in scm_i_with_continuation_barrier (
>>     body=body@entry=0x7f85d3eeb470 <c_body>,
>>     body_data=body_data@entry=0x7f7e85fe2d40,
>>     handler=handler@entry=0x7f85d3eeb720 <c_handler>,
>>     handler_data=handler_data@entry=0x7f7e85fe2d40,
>>     pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580
>> <pre_unwind_handler>,
>>     pre_unwind_handler_data=0xe174a0) at
>> ../../libguile/continuations.c:368
>> #52 0x00007f85d3eebac5 in scm_c_with_continuation_barrier
>> (func=<optimized out>,
>>     data=<optimized out>) at ../../libguile/continuations.c:464
>> #53 0x00007f85d3575127 in GC_call_with_gc_active (
>>     fn=fn@entry=0x7f85d3f6a070 <with_guile_trampoline>,
>>     client_data=client_data@entry=0x7f7e85fe2e20) at
>> ../pthread_support.c:1343
>> #54 0x00007f85d3f6ac4f in with_guile (base=base@entry=0x7f7e85fe2df0,
>>     data=data@entry=0x7f7e85fe2e20) at ../../libguile/threads.c:683
>> #55 0x00007f85d356f132 in GC_call_with_stack_base (
>>     fn=fn@entry=0x7f85d3f6abb0 <with_guile>, arg=arg@entry
>> =0x7f7e85fe2e20)
>>     at ../misc.c:1941
>> #56 0x00007f85d3f6aff8 in scm_i_with_guile (dynamic_state=<optimized
>> out>,
>>     data=0x7f7e60000980,
>>     func=0x7f85c95d1290 <opencog::SchemeEval::c_wrap_eval(void*)>)
>>     at ../../libguile/threads.c:698
>> ---Type <return> to continue, or q <return> to quit---
>> #57 scm_with_guile (
>>     func=func@entry=0x7f85c95d1290
>> <opencog::SchemeEval::c_wrap_eval(void*)>,
>>     data=data@entry=0x7f7e60000980) at ../../libguile/threads.c:704
>> #58 0x00007f85c95d126e in opencog::SchemeEval::eval_expr
>> (this=0x7f7e60000980,
>>     expr=...) at
>> /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:479
>> #59 0x00007f85bc783439 in opencog::GenericShell::eval_loop
>> (this=0x7f7ef0001e90)
>>     at
>> /home/ubuntu/src/opencog/opencog/cogserver/shell/GenericShell.cc:588
>> #60 0x00007f85c6e5ac80 in ?? () from
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #61 0x00007f85d3c916ba in start_thread (arg=0x7f7e85fe3700) at
>> pthread_create.c:333
>> #62 0x00007f85d39c141d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>> (gdb)
>>
>>
>>
>> --
>> cassette tapes - analog TV - film cameras - you
>>
>
>
> --
> cassette tapes - analog TV - film cameras - you
>


-- 
cassette tapes - analog TV - film cameras - you


reply via email to

[Prev in Thread] Current Thread [Next in Thread]