guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: srfi-18 requirements


From: Julian Graham
Subject: Re: srfi-18 requirements
Date: Mon, 17 Dec 2007 23:30:46 -0500

Hi everyone,

Thanks for your comments and patience.  I've attached a new version of
my SRFI-18 patch which I hope addresses the stuff that Ludovic raised.
 Sorry it's taken so long!  In real life, I've been hung up on freeing
myself from the entanglements of one full-time job and getting wrapped
up in those of another.  At any rate:


> My comments below are mostly cosmetic.  Once you're done with them,
> could you please provide a ChangeLog entry?

Of course.  Please find relevant updates included in the revised patch.


> Then we'll need to update the doc of the core API, and also add
> documentation of the SRFI module itself (the custom has been to somewhat
> duplicate SRFIs in the manual so that we have all documentation in one
> place).

Sure, I'll do that -- I just wanted to make sure the scope of the
patch was well-established before I wrote anything.


> `threads.test' doesn't need to be updated because threads don't throw
> exceptions, right?

Right, sort of: doing 'join-thread' on a thread that has been canceled
via the SRFI-18 'terminate-thread!' function will throw a
thread-terminated-exception (terminate-thread! installs a special
cleanup handler right before exiting that does this). But everything
related to cancellation in threads.test uses cancel-thread, which does
not cause an exception to be thrown, and none of the other tests
involve exception handling.


> I don't quite get this one.  Could you illustrate the problem
> step-by-step with a simple scenario?
>
> The value of HANDLER in `scm_spawn_thread ()' doesn't seem to affect
> critical-section locking.

Maybe I should have said the problem lies in really_spawn() -- if
data->handler is non-NULL, then really_spawn() enters the thread's
body via scm_internal_catch() instead of directly.
scm_internal_catch() calls scm_c_catch(), which calls make_jmpbuf(),
which enters a critical section, leading to the deadlock. Here's the
sequence of events that I was experiencing (surprisingly often):

Thread 1, in guile-mode (heap mutex locked) launches Thread 2
Thread 2 enters a critical section, locking the critical section mutex
Thread 1, as part of expression evaluation in eval.i.c, attempts to
lock the critical section and blocks
Thread 2, as part of make_jmpbuf, calls SCM_NEWSMOB2, leading to a
call to scm_double_cell, which causes a GC. Thread 2 attempts to lock
the heap mutex of all executing threads, but Thread 1's heap mutex
will never be unlocked


I've subsequently discovered and fixed a separate (quasi-) deadlock,
also-preexisting, this time related to a lack of thread safety
surrounding the usage of scm_thread_go_to_sleep -- in short, there can
be a race on the value of scm_thread_go_to_sleep such that a thread
can continue to enter and leave guile-mode even while
scm_i_thread_put_to_sleep is trying to put it to sleep for GC.  The
fix is to require that threads hold the thread_admin_mutex while
locking their heap_mutexes in scm_enter_guile, so that they don't
inadvertantly re-enter guile-mode during a GC attempt.  I can give you
an example if you'd like.


> Don't do that, we must keep all years.

Fixed.


> In `block_self ()':
>
>> @@ -239,6 +265,7 @@
>>      err = EINTR;
>>        t->block_asyncs--;
>>        scm_i_reset_sleep (t);
>> +      scm_i_pthread_cleanup_pop (0);
>>      }
>
> Why is it needed?

I'm not sure what you mean. In a literal sense, I pop the cleanup
handler because I installed it a few lines earlier. More generally,
the reason I added the handler was because scm_i_pthread_cond_wait is
a cancellation point, and scm_i_setup_sleep has just been called -- if
the thread is canceled while in cond_wait, the sleep state will never
be reset. This may not be a critical issue (or it may be -- I added
this code while I was chasing down an unrelated deadlock), but it
makes for better thread bookkeeping.


>> +extern scm_i_thread *scm_i_signal_delivery_thread;
>
> Could it go in one of `{threads,scmsigs}.h'?

Yes -- I'd meant to include that in the original patch.  Sorry!  It's
in scmsigs.h now.


>> +typedef struct {
>
> Make sure to follow the GCS, i.e., put the opening brace on a line of
> its own.

Fixed for all occurences.


>> -  /* Ensure the signal handling thread has been launched, because
we might be
>> -     shutting it down.  */
>> -  scm_i_ensure_signal_delivery_thread ();
>> -
>>    /* Unblocking the joining threads needs to happen in guile mode
>>       since the queue is a SCM data structure.  */
>>    scm_with_guile (do_thread_exit, v);
>>
>> +  /* Ensure the signal handling thread has been launched, because
we might be
>> +     shutting it down.  */
>> +  scm_i_ensure_signal_delivery_thread ();
>> +
>
> Why does that need to be moved?

It doesn't -- thanks for catching that.  I was trying to resolve a
deadlock related to the one described above that I ultimately fixed by
removing the catch handler for the signal delivery thread.


> +SCM_DEFINE (scm_join_thread_timed, "join-thread", 1, 2, 0,
>
> Scheme/C name mismatch.  I believe it effectively hides the first
> `join-thread', right?

Yes, I put it there to override the binding of join-thread.  Is that a
problem? I do it several other places, too.  I wasn't sure how else to
maintain binary compatibility for users of scm_join_thread while
extending the functionality of the Scheme 'join-thread' function.
Would it be better if I removed the first SCM_DEFINE (the one that
refers to scm_join_thread()) and replaced it with a normal function
declaration for scm_join_thread()?


>> +"Suspend execution of the calling thread until the target @var{thread} "
>
> Indenting would be nicer.

You mean indenting the doc strings so that they line up with the
parentheses?  None of the other functions do this.  (And is this
documentation actually used to generate anything?)


> Remember spaces after periods.  :-)

Fixed for all occurences.


> Open bracket right after `block_self'.

Fixed for all occurences.


> Too bad we have yet another time API...
> ... and another exception API, too.

Yeah... such is the nature of SRFIs, though.  Anything you want me to
change here -- other than the exception handling described below?


> We have `*unspecified*' in core Guile, which is better because it
> doesn't have any side-effect, so better use it.

Ah, excellent -- didn't know that was there!  Fixed.  (Can we put that
in the manual index?)


> I'd use pairs or records for exception objects rather than just symbols
> since symbols can always be forged.  So we'd have, e.g.:

I hadn't thought of that!  Done, though it took a while.  SRFI-18
exception throwing and handling are now pass-thrus to SRFI-34.  There
was a bit of difficulty in that Guile's implementation of SRFI-34
causes exceptions to get rethrown at the end of the handler, whereas
SRFI-18 requires that the handler re-enter the continuation of the
called primitive afterwards.  I've resolved that by wrapping any
installed user handlers such that they store and then apply their
caller's continuation.


> Type-checking overhead may become a concern if we are to convert values
> often, e.g., once after every timed-join trial.  OTOH, it's good to have
> good error reporting.

So... do you have a ruling, one way or the other?  For what it's
worth, my opinion is that API exposed to the user must always feature
input-checking, but I defer to your maintainer wisdom.


Regards,
Julian

Attachment: srfi-18.HEAD.patch-20071217
Description: Binary data

Attachment: srfi-18.test
Description: Binary data

Attachment: srfi-18.scm
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]