bug-binutils



From: steve.gargolinski at gmail dot com
Subject: [Bug ld/27695] New: ld has poor performance characteristics when loading large quantities of .so files
Date: Mon, 05 Apr 2021 17:58:32 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=27695

            Bug ID: 27695
           Summary: ld has poor performance characteristics when loading
                    large quantities of .so files
           Product: binutils
           Version: 2.28
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: ld
          Assignee: unassigned at sourceware dot org
          Reporter: steve.gargolinski at gmail dot com
  Target Milestone: ---

As our application grows, its startup time is increasing significantly on
Linux while remaining fairly consistent on Windows. A typical startup workflow
that we've been measuring takes about 10 seconds on Windows and over 60 seconds
on Linux on comparable hardware.

Profiling attributes the startup time difference between the two platforms
entirely to ld.so. After a good deal of experimentation and investigation, we
realized that our growing number of dynamic libraries is a major contributor
to this slowdown.

To reproduce this outside our product, we generated a small sample application
that measures the time to load 100,000 small generated classes (each with a
constructor and a virtual destructor) spread across a varying number of dynamic
libraries. Loading these 100,000 classes from one dynamic library takes about
0.3 seconds. Loading the same 100,000 classes spread across 1,000 libraries
takes over 9 seconds!

Back to our real-world use case. In our product we generally load libraries
with RTLD_GLOBAL. One of the main performance bottlenecks we identified is in
_dl_lookup_symbol_x(). When searching the global scope (symbol_scope[0]), the
lookup found nothing more than 50% of the time, and each search took time
linear in the number of libraries in that scope.

    return _dl_lookup_symbol_x(undef_name, undef_map, ref, symbol_scope,
                               version, type_class, flags, skip_map);
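To make the cost concrete, here is a toy model in C of searching a scope in load order. It is not glibc's code: `toy_object`, `toy_scope` and `toy_lookup_linear` are hypothetical names, and the per-object string scan stands in for the ELF hash-table probe that ld.so actually performs on each object. What it shows is the shape of the cost: a symbol that no object defines is only reported missing after every object in the scope has been examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a loaded object: only the names it defines.
   Real link maps carry ELF hash tables, symbol versions, and more. */
struct toy_object {
    const char *name;
    const char *const *symbols; /* NULL-terminated list of defined names */
};

/* Hypothetical toy model of a lookup scope: objects in load order. */
struct toy_scope {
    const struct toy_object *const *objects;
    size_t count;
};

/* Return the first object (in load order) that defines `symbol`, or NULL.
   *visited receives the number of objects examined, so a miss always
   reports scope->count: the cost grows linearly with the number of objects. */
static const struct toy_object *
toy_lookup_linear(const struct toy_scope *scope, const char *symbol,
                  size_t *visited)
{
    assert(scope != NULL && symbol != NULL && visited != NULL);
    *visited = 0;
    for (size_t i = 0; i < scope->count; i++) {
        const struct toy_object *obj = scope->objects[i];
        ++*visited;
        /* Stand-in for the per-object hash-table probe done by ld.so. */
        for (const char *const *s = obj->symbols; *s != NULL; s++)
            if (strcmp(*s, symbol) == 0)
                return obj;
    }
    return NULL; /* not defined anywhere in this scope */
}
```

With 1,000 libraries in the global scope, each miss in this model costs 1,000 probes before the search gives up, and misses were the majority case in the profile described above.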

A major portion of our 60-second startup time is spent here. We experimented
with adding a hashset of the symbols already loaded into the global scope
(updated in add_to_global()) so that this check becomes a constant-time lookup
instead of a linear search. This was a major improvement for both our test
application and our real product.

The test application mentioned above, which previously took 9 seconds to load
1,000 libraries, now performs the same operation in 1 second.

We've prototyped a way to dynamically patch ld.so at application startup, and
our workflow time improved from 60 seconds to 30 seconds. That is still not
nearly as fast as Windows, but it is a major improvement. We've tested this on
many versions of several distributions and were able to improve all of them.

This change adds some memory overhead. Also, applications that load only a
small number of dynamic libraries will not see the timing improvement (and may
even see a regression from the time spent populating the hashset), but it's a
huge improvement for our use case.

I'm happy to share any of the fixes or investigations in more detail. Improving
ld.so performance as the number of dynamic libraries grows is really important
for our use case, and we're looking for input on whether this could be a useful
addition to the glibc codebase.


