guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: srfi-18 requirements


From: Neil Jerram
Subject: Re: srfi-18 requirements
Date: Sun, 30 Dec 2007 11:04:40 +0000
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)

"Julian Graham" <address@hidden> writes:

> Hi everyone,
>
> Thanks for your comments and patience.  I've attached a new version of
> my SRFI-18 patch which I hope addresses the stuff that Ludovic raised.

I have some general comments.  I'm sorry for not coming up with these
earlier.

1. Some of your changes are bug fixes for the existing thread code,
   and not dependent on SRFI-18, so we should apply these to 1.8.x as
   well as to HEAD.  Can you pull these out into a separate patch?

2. I don't think it's clear what the overall effect of the SRFI-18
   enhancement will be.  Is it your intention that Guile will then
   implement SRFI-18 behaviour by default?  Or that it will be an
   option, alongside the existing thread behaviour?

   Based on the current patch, I think you intend the latter, and that
   the choice of existing/SRFI-18 behaviour is made by what procedures
   the calling code uses, which in turn depends on whether that code
   has done (use-modules (srfi srfi-18)).  Right?

   What then happens if a SRFI-18 procedure is called on a
   non-SRFI-18-created thread, or vice versa?

   (Is there already a list somewhere of differences between existing
   and SRFI-18 behaviour?  That would help my understanding, at
   least.)

3. I think it's important that existing thread behaviour continues to
   work exactly as it does before your enhancement (modulo bug fixes,
   of course), so I'd prefer if the code changes could be structured
   in such as way as to make it obvious that this will be the case.
   Perhaps this is impossible, in which case we have to rely on
   review, but if there is anything that could be done here, that
   would be good.

I haven't commented on the patch in detail, because it may change
depending on discussion of the above points.  I have a few points on
the text below though.

>> I don't quite get this one.  Could you illustrate the problem
>> step-by-step with a simple scenario?
>>
>> The value of HANDLER in `scm_spawn_thread ()' doesn't seem to affect
>> critical-section locking.
>
> Maybe I should have said the problem lies in really_spawn() -- if
> data->handler is non-NULL, then really_spawn() enters the thread's
> body via scm_internal_catch() instead of directly.
> scm_internal_catch() calls scm_c_catch(), which calls make_jmpbuf(),
> which enters a critical section, leading to the deadlock. Here's the
> sequence of events that I was experiencing (surprisingly often):
>
> Thread 1, in guile-mode (heap mutex locked) launches Thread 2
> Thread 2 enters a critical section, locking the critical section mutex
> Thread 1, as part of expression evaluation in eval.i.c, attempts to
> lock the critical section and blocks
> Thread 2, as part of make_jmpbuf, calls SCM_NEWSMOB2, leading to a
> call to scm_double_cell, which causes a GC. Thread 2 attempts to lock
> the heap mutex of all executing threads, but Thread 1's heap mutex
> will never be unlocked

Why is Thread 2 still in the critical section when it calls
make_jmpbuf?  That strikes me as the problem here.

> I've subsequently discovered and fixed a separate (quasi-) deadlock,
> also-preexisting, this time related to a lack of thread safety
> surrounding the usage of scm_thread_go_to_sleep -- in short, there can
> be a race on the value of scm_thread_go_to_sleep such that a thread
> can continue to enter and leave guile-mode even while
> scm_i_thread_put_to_sleep is trying to put it to sleep for GC.  The
> fix is to require that threads hold the thread_admin_mutex while
> locking their heap_mutexes in scm_enter_guile, so that they don't
> inadvertantly re-enter guile-mode during a GC attempt.  I can give you
> an example if you'd like.

Nice fix.

>> +SCM_DEFINE (scm_join_thread_timed, "join-thread", 1, 2, 0,
>>
>> Scheme/C name mismatch.  I believe it effectively hides the first
>> `join-thread', right?
>
> Yes, I put it there to override the binding of join-thread.  Is that a
> problem? I do it several other places, too.  I wasn't sure how else to
> maintain binary compatibility for users of scm_join_thread while
> extending the functionality of the Scheme 'join-thread' function.
> Would it be better if I removed the first SCM_DEFINE (the one that
> refers to scm_join_thread()) and replaced it with a normal function
> declaration for scm_join_thread()?

See e.g. handling of scm_srfi1_map in srfi/srfi-1.c.  Would that work
for the SRFI-18 extensions?

Note that this would imply a separate SRFI-18 library, which gets
loaded when the srfi-18 module is first used.  So clearly this is
related to the points above code structure and separating existing and
SRFI-18 behaviour.

>>> +"Suspend execution of the calling thread until the target @var{thread} "
>>
>> Indenting would be nicer.
>
> You mean indenting the doc strings so that they line up with the
> parentheses?  None of the other functions do this.  (And is this
> documentation actually used to generate anything?)

In theory, yes: see doc/maint/docstring.el.  I haven't actually done
this for years, but I believe it should still work.

>> Type-checking overhead may become a concern if we are to convert values
>> often, e.g., once after every timed-join trial.  OTOH, it's good to have
>> good error reporting.
>
> So... do you have a ruling, one way or the other?  For what it's
> worth, my opinion is that API exposed to the user must always feature
> input-checking, but I defer to your maintainer wisdom.

I think the check-arg-type calls are messy, but I'm not that much
bothered.

     Neil





reply via email to

[Prev in Thread] Current Thread [Next in Thread]