
vm ops and threading discussion for guile-log


From: Stefan Israelsson Tampe
Subject: vm ops and threading discussion for guile-log
Date: Tue, 24 Apr 2012 22:09:09 +0200

Hi,

I hope I am not boring you by continuing to talk about guile-log.

One of the things Mark brought up, and rightly so, is that basing the fundamentals
of logic programming on a stack, rather than on assq lists or similar more
scalable functional data structures, makes it complicated to take advantage of multiple
cores and threads.

To answer this I have now enabled the possibility of using multiple stacks, blending the two
techniques together. The reasons for taking such a complicated approach are threefold.

1. Stack based techniques are faster, and considerably so if we can take advantage of special
VM operations or are able to compile the code. I find that the assq based approach saturates, and
in the uniprocessor case the difference in speed is around a factor of 10.

2. A stack based approach can exploit the fact that it is a stack to implement
variables that behave like global variables but also like stack variables, and that can be reconstructed
from a stored state. These data structures look non-functional, but they have similarities
to variables living on the stack; e.g. for reentrancy to work they need to reference functional
data structures. In any case, I find that with them one can formulate many problems in a natural way
that a purely functional approach cannot match. At the same time the framework
preserves functional behavior: running stored code that does not reference global variables always returns
the same result, all code can be put in tail position, and user space code can yield a return
value simply by returning a value from a user space function.

3. We are able to trace rebuilding and unwinding. This is nice for debugging, but it is also an algorithmic
enabler, because you can design functionality that takes advantage of this fact.

There are some drawbacks, though.
1. Things are a little more complicated.
2. There may be some overhead in some situations, although selecting the right mix of
the stack based approach and the assq list based one will give good performance in most cases.
-----------------------------------------------------------------------------------------------------------------------------------------

How the simple multithreading works:
Each thread has a stack of its own, and each created variable has an id indicating which stack it belongs to.
If a thread tries to set a variable belonging to another thread, it does so by consing onto an assq list instead
of modifying the other stack, simply mimicking Kanren's strategy.
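To make the rule above concrete, here is a minimal sketch in plain Scheme of how such a lookup and binding discipline could work. This is not the actual guile-log internals; var-stack-id, stack-bind! and stack-ref are hypothetical helpers standing in for the real stack machinery (acons and assq are standard Guile).

(use-modules (ice-9 format))

;; Bind VAR to VALUE. If VAR lives on this thread's own stack we may bind
;; destructively; if it belongs to another thread's stack we cons onto a
;; functional assq list instead, kanren-style.
(define (var-set! var value thread-id bindings)
  (if (eqv? (var-stack-id var) thread-id)
      (begin (stack-bind! var value)   ; own stack: cheap destructive bind
             bindings)
      (acons var value bindings)))     ; foreign stack: functional cons

;; Look VAR up: the assq list shadows the home stack.
(define (var-ref var bindings)
  (cond ((assq var bindings) => cdr)   ; functional bindings first
        (else (stack-ref var))))       ; else read from its home stack

This keeps the fast destructive path for the common single-thread case while foreign-stack writes stay purely functional and thread safe.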
------------------------------------------------------------------------------------------------

VM OPS.
--------------
Here one can work at different levels, but I chose to look at just reducing the overhead of function calls
out into C land. One could write a direct op-code for each function, but that means a lot of VM ops.
I therefore settled on a smaller set of op-codes that use function pointers stored in an array reachable
from the VM, with op-codes like

(fast-call-0 index)
(fast-call-1 index a1)
(fast-call-2 index a1 a2)

Then when we load the C functions like gp-car from the shared library, I add a stub like

(fast-call-set! gp-car 2 5)
(define-inlinable (gp-car x s)
  (fast-call-2 5 x s))

The first form is a VM op that transfers the gp-car function pointer into the function pointer array, with arity 2, at index 5.
Then I overwrite gp-car with a suitable inlinable function.
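Since every fast-called function needs this registration/wrapper pair, the two steps could be folded into one macro. A sketch, assuming the fast-call-set! and fast-call-2 ops described above; define-fast-call-2 and %gp-car are names I am inventing for illustration:

;; Register C-PROC (arity 2) at slot INDEX in the VM's fast-call table
;; and define NAME as an inlinable wrapper that dispatches through it.
(define-syntax define-fast-call-2
  (syntax-rules ()
    ((_ name c-proc index)
     (begin
       (fast-call-set! c-proc 2 index)
       (define-inlinable (name x s)
         (fast-call-2 index x s))))))

;; Usage, mirroring the gp-car example:
;; (define-fast-call-2 gp-car %gp-car 5)

That way adding a new fast-called primitive is a one-liner and the index bookkeeping stays in one place.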

With this technique I can increase the speed of the stack based logic code by a factor of 2-3. The assq based version is only marginally
faster.

I think I could win a factor of 2 in the number of VM ops if I could design into this framework the possibility of returning several
values, e.g.

Consider destructuring a cons cell into a car and a cdr. Currently this is done like

(let ((s (gp-pair!? x s)))
  (if s
    (let ((x-car  (gp-car x s))
          (x-cdr  (gp-cdr x s)))
       ...

I would like to be able to write

(let-values (((s x-car x-cdr) (gp-pair-fast!? x s)))
   (if s ...))

The op-codes would align quite nicely, and this would reduce the op count.
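In plain Scheme the combined operation can already be expressed in terms of the existing gp-pair!?, gp-car and gp-cdr (a single fused VM op would just avoid the three separate calls). A sketch, assuming gp-pair!? returns #f on failure and an updated state on success, as in the snippet above:

;; Test whether X destructures as a pair under state S; on success return
;; the new state plus car and cdr as multiple values, else three #f's.
(define-inlinable (gp-pair-fast!? x s)
  (let ((s (gp-pair!? x s)))
    (if s
        (values s (gp-car x s) (gp-cdr x s))
        (values #f #f #f))))

Returning three #f's on failure keeps the let-values receiver's shape uniform, so the (if s ...) test works unchanged.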

There are further improvements that could be made, but then everything becomes special
and special VM op-codes would be needed; that kind of bloat is better handled
by implementing a JIT engine or native compilation instead.

My conclusion is that it is possible to implement a tool in Guile that would make it quite
easy to improve the speed of VM based Guile code by a factor of 2-5 using C based elements,
without the need to be a VM wizard. It would be nice to have something along the lines
discussed above, but with better integration into Guile.

I also redesigned the code; currently I'm at 80 ms (for stack based guile-log) for the Einstein case, which basically tests
the speed of destructuring a pair into its elements.

Cheers
Stefan
