lilypond-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Blockers for Guile 2.2


From: Luca Fascione
Subject: Re: Blockers for Guile 2.2
Date: Sun, 27 Feb 2022 16:26:54 +0100

On Sun, Feb 27, 2022 at 12:13 PM Han-Wen Nienhuys <hanwenn@gmail.com> wrote:

> On Sun, Feb 27, 2022 at 10:39 AM Luca Fascione <l.fascione@gmail.com>
> wrote:
> > is it true that if you double the source size you double the compilation
> time?
>
> it should be, but we have rather complicated page breaking code that
> is so hairy that nobody understands it fully. I'm not sure there is
> NP-complete snake in the proverbial grass there.
>

Understood. As a use of both lilypond and LaTeX I have been idly wondering
for years
whether modern computers can afford to use global line/page breaking
algorithms that
would avoid some of the shortcomings of TeX's approach.
A discussion for a different thread, of course.

accessing data (eg. offsets):
> * on first access, the callback gets executed. This is just evaluating
> (<function> <grob-object>).
> * on second access, the value is cached in an alist, and looking it up
> is extremely cheap.
>

This is cool. :-)
I don't know enough about this program to even begin to have a gut feeling,
however I guess I'm thinking it seems there would be tons of these reads,
and I'm hearing you say
that in an eventual sense, all data access is an alist access.
I don't know how alists are actually implemented under the hood, but they
feel like they would be a linear scan
with symbol-typed keys on the surface. So to pull out one float you're
doing what 5-10 64bit compares
(checking the keys) and just as many pointer jumps, right? (I'm thinking
the alist is a list of symbol/value
pairs in the implementation also).

This cost strongly dominates the float dereference itself, and there is
this question of how much extra stuff
is happening around it (I guess in my mind I'm comparing it to element
access in C++ which is one pointer (this),
one offset for the member (compiled into an immediate), one load of
this+offset (which the hardware helps you with)).

I feel for the moment I can't provide any concrete insight into any of
this, because I don't know the specifics enough.

> Code following a simple pattern like this, once compiled, will largely be
> dominated by the
> > scripting language runtime overhead
>
> From the outside this may seem plausible, but I doubt your intuition here:
>
> * the callback tables are also alists. They are cheap (they aren't
> sped up if you swap them for hash tables)
>

Not presuming to know your program better than you, but I'd just bring up
that
this is saying that your lists are short (likely length 20ish on average):
the observation
you report is that hashing so you can do a direct access into an array is
not faster than
several pointer-pointer comparisons and pointer chases. The hash you'd use
here be
something like FNV or so, it'll break even somewhere in the 10-20
comparisons, I'd expect.



> * Scheme has no marshaling: objects are either direct (scheme -> C++
> is bit twiddling), or they are indirect (a tagged pair with a C++
> pointer in the cdr)
>

Half of that I expected (more specifically, for various reasons, a number
of which not accurate,
I expected the Scheme APIs to be similar to the TCL APIs, and there as well
you just get handed
straight what TCL has in hand, not marshalling involved). One thing I
didn't know is that the
client calls to extract the machine representation of the value would be
super cheap.
But still, if the guile compiler translates scheme values into native ones
and is able to leave them there
for "long" stretches of code in some cases, and our use case instead
prevents that, it seems it could
eventually add up. Again I do need to learn the source better before you
give these thoughts any real weight.



> IMO The real problem is that we don't have good tooling to see what is
> going on in the Scheme side of things: C++ has decent perf analysis,
> but the Guile side of things just looks like a lot of time spent in
> scm_eval(). Some of it is overhead, some if it might be a poorly
> implemented Scheme function (which is often easier to fix than
> reducing overhead.)
>

Very agreed that poorly conceived code is the first thing to address, no
doubt.
I'd think that the way to gain insight as to what's going on is to inspect
the
bytecode actually, and gain familiarity with the APIs that execute it.
Is it that the bytecode is then translated to executable, or is it running
on a VM?
I would assume they don't provide a decompiler of any sort, do they?

Thanks for a most interesting discussion

L


reply via email to

[Prev in Thread] Current Thread [Next in Thread]