Re: GC thread performance
Sat, 2 Dec 2017 14:30:50 +0000
On Sat, 02 Dec 2017 10:50:29 +0200
Marko Rauhamaa <address@hidden> wrote:
> Linas Vepstas <address@hidden>:
> > I cannot speak to GC, but I frequently encounter situations in guile
> > where using the parallel constructs, e.g. par-for-each, ends up
> > running slower than the single-threaded version. For example, using
> > 2 or 3 threads, I get a 2x and 3x speedup, great, but using 4 threads
> > gives a slowdown, often 10x slower than single-threaded. I try to
> > make sure that the insides of the loop are large and long-running,
> > so that the cost of creating and scheduling threads is
> > inconsequential.
> > I have not attempted to determine the cause of this, but basically,
> > that entire subsystem needs a careful review and measurement.
> I'll have to speculate, too.
> Guile guarantees the consistency of the data model under all
> circumstances. Bad synchronization between threads is allowed to cause
> unspecified behavior, but it should never trigger a SIGSEGV. In
> practice, that means excessive locking: all data access needs to take
> place in a critical section.
If you mean that you believe guile carries out significant and excessive
locking to maintain the invariants of its containers and other SCM
objects, I do not think that is right. I don't think guile generally
does use locking for that: instead it uses the fact that SCM objects
are stored in a type of size uintptr_t and aligned on 8-byte boundaries
to enable pointer tagging (see the file tags.h in the source).
Guile can therefore leverage the fact that, on any platform supported by
guile threads, loads and stores of native pointer/integer types aligned on
their natural size boundary are atomic, with relaxed (i.e. unsynchronized)
memory ordering in C11 terms. Guile is not going to carry out CAS-style or
acquire/release synchronisation for you: an SCM object will be in a
valid state, but that state may be completely different from the one you
expect if you don't synchronize in your own code.
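To make the two points above concrete, here is a minimal C11 sketch. It is not Guile's actual representation: the tag values, the `word` and `cell` types and the helpers `make_fixnum`, `publish` and `consume` are all hypothetical, chosen only to show (a) how 8-byte alignment frees the low 3 bits of a word for a type tag, and (b) the difference between a relaxed atomic store, which guarantees that a reader sees a whole, valid word, and a release/acquire pair, which the caller needs in order to publish data safely to another thread. The sketch assumes an arithmetic right shift for negative values, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag layout (NOT Guile's tags.h): cells are 8-byte aligned,
   so the low 3 bits of a pointer to one are always zero and can hold a tag. */
#define TAG_BITS   3u
#define TAG_MASK   ((uintptr_t)((1u << TAG_BITS) - 1u))
#define TAG_HEAP   ((uintptr_t)0) /* untagged pointer to a heap cell */
#define TAG_FIXNUM ((uintptr_t)2) /* small integer kept in the upper bits */

typedef uintptr_t word;                          /* hypothetical stand-in for SCM */
typedef struct { alignas(8) intptr_t payload; } cell;

static word make_fixnum(intptr_t n) { return ((uintptr_t)n << TAG_BITS) | TAG_FIXNUM; }
static word tag_of(word w) { return w & TAG_MASK; }
static bool is_fixnum(word w) { return tag_of(w) == TAG_FIXNUM; }
/* Assumes arithmetic right shift of negative values (GCC/Clang behaviour). */
static intptr_t fixnum_value(word w) { return (intptr_t)w >> TAG_BITS; }

static word make_heap_ref(cell *c) {
  assert(((uintptr_t)c & TAG_MASK) == 0); /* alignment leaves the tag bits free */
  return (uintptr_t)c | TAG_HEAP;
}
static cell *heap_ref(word w) { return (cell *)(w & ~TAG_MASK); }

/* A shared slot. Static atomics are zero-initialized to a valid state. */
static atomic_uintptr_t slot;

/* Relaxed: the whole word is read or written at once (no torn values),
   but nothing orders it against other memory accesses. */
static void slot_set_relaxed(word w) { atomic_store_explicit(&slot, w, memory_order_relaxed); }
static word slot_get_relaxed(void) { return atomic_load_explicit(&slot, memory_order_relaxed); }

/* Publishing a freshly initialised cell needs release/acquire in the
   caller's own code: the release store orders the payload write before
   the pointer store, and the acquire load pairs with it on the reader. */
static void publish(cell *c, intptr_t v) {
  c->payload = v;
  atomic_store_explicit(&slot, make_heap_ref(c), memory_order_release);
}

static bool consume(intptr_t *out) {
  word w = atomic_load_explicit(&slot, memory_order_acquire);
  if (w == 0 || tag_of(w) != TAG_HEAP)
    return false; /* empty slot, or not a heap reference */
  *out = heap_ref(w)->payload;
  return true;
}
```

As written the helpers can be exercised on one thread; in real use `publish` and `consume` would run on different threads. If `publish` used `memory_order_relaxed` instead, a consumer could observe the new pointer before the payload write, which is exactly the kind of unsynchronized but still "valid" state described above.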
Guile may of course lock its own internal global non-fluid data, but
that is a different point from the one I think you are making.
As I recall, Andy Wingo reported in relation to his fibers library
that it started to slow down with more than about 8 native threads running:
beyond 8 native threads, no significant further speed-up occurred even on
machines with more than 8 processors. You are also likely
to get a slowdown if you run more native threads than you have