Re: OpenBLAS and performance
From: Ludovic Courtès
Subject: Re: OpenBLAS and performance
Date: Fri, 22 Dec 2017 16:10:39 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
Hi,
Dave Love <address@hidden> writes:
> Ludovic Courtès <address@hidden> writes:
>
>> Hello,
>>
>> Dave Love <address@hidden> writes:
>>
>>> Fedora sensibly builds separately-named libraries for different flavours
>>> <https://apps.fedoraproject.org/packages/openblas/sources/>, but I'd
>>> argue also for threaded versions being available with the generic soname
>>> in library sub-directories. There's some discussion and measurements
>>> (apologies if I've referenced it before) at
>>> <https://loveshack.fedorapeople.org/blas-subversion.html>
>>
>> I like the idea of an ‘update-alternative’ kind of approach for
>> interchangeable implementations.
>
> /etc/ld.so.conf.d normally provides a clean way to flip the default,
> but that isn't available in Guix as far as I remember.
Right.
>> Unfortunately my understanding is that implementations aren’t entirely
>> interchangeable, especially for LAPACK (not sure about BLAS), because
>> BLIS, OpenBLAS, etc. implement slightly different subsets of netlib
>> LAPACK, AIUI.
>
> LAPACK may add new routines, but you can always link with the vanilla
> Netlib version, and openblas is currently only one release behind. The
> LAPACK release notes I've seen aren't very helpful for following that.
> The important requirement is fast GEMM from the optimized BLAS. I
> thought BLIS just provided the BLAS layer, which is quite stable, isn't
> it?
A while back I tried, IIRC, to link PaSTiX (a sparse matrix direct
solver developed by colleagues of mine) against BLIS, and it was missing
a couple of functions that Netlib LAPACK provides.
>> Packages also often check for specific implementations in
>> their configure/CMakeLists.txt rather than just for “BLAS” or “LAPACK”.
>
> It doesn't matter what they're built against when you dynamically load a
> compatible version.
Right, but they do that precisely because all these implementations
provide different subsets of the Netlib APIs, AIUI.
>> FlexiBLAS, which Eric mentioned, looks interesting because it’s designed
>> specifically for that purpose. Perhaps worth giving it a try.
>
> I see it works by wrapping everything, which I wanted to avoid. Also
> it's GPL, which restricts its use. What's the advantage over just
> having implementations which are directly interchangeable at load time?
Dunno, I haven’t dug into it.
>> Besides, it would be good to have a BLAS/LAPACK policy in Guix. We
>> should at least agree (1) on default BLAS/LAPACK implementations, (2)
>> possibly on a naming scheme for variants based on a different
>> implementation.
>
> Yes, but the issue is wider than just linear algebra. It seems to
> reflect tension between Guix' approach (as I understand it) and the late
> binding I expect to use. There are potentially other libraries with
> similar micro-architecture-specific issues, and the related one of
> profiling/debugging versions. I don't know how much of a real problem
> there really is, and it would be good to know if someone has addressed
> this.
Guix’ approach is to use static binding a lot, and late binding
sometimes. For all things plugin-like we use late binding. For shared
libraries (not dlopened) we use static binding.
Static binding has a cost, as you write, but it gives us control over
the environment, and the ability to capture and replicate the software
environment. As a user, that’s something I value a lot.
I’d also argue that this is something computational scientists should
value: first because results they publish should not depend on the phase
of the moon, second because they should be able to provide peers with a
self-contained recipe to reproduce them.
> Yes, but even with dynamic dispatch you need to account for situations
> like we currently have on x86_64 with OB not supporting the latest
> micro-architecture, and it only works on x86 with OB. You may also want
> to avoid overhead -- see FFTW's advice for packaging. Oh for SIMD
> hwcaps...
I’m not sure what you mean. That OB does not support the latest
micro-architecture is not something the package manager can solve.
As for overhead, it should be limited to load time, as illustrated by
IFUNC and similar designs.
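To make the overhead point concrete, here is a minimal sketch in C of one-time dispatch through a function pointer. It is not IFUNC itself, which needs resolver support from the GNU toolchain and dynamic linker; it only imitates the idea that the variant is selected once and later calls go straight to it. All names (`dot`, `dot_generic`, `dot_unrolled`, `cpu_has_wide_simd`) are hypothetical, and the "optimized" variant is just an unrolled loop standing in for a micro-architecture-specific kernel.

```c
#include <stddef.h>

/* Hypothetical kernel signature: dot product of two vectors. */
typedef double (*dot_fn)(const double *, const double *, size_t);

/* Portable baseline variant. */
static double dot_generic(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Stand-in for a tuned variant: same result, loop unrolled by four. */
static double dot_unrolled(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* remaining 0..3 elements */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}

/* Assumption: AVX2 is used as the selection criterion purely for
   illustration; on non-x86 targets we fall back to the baseline. */
static int cpu_has_wide_simd(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

static double dot_resolve(const double *, const double *, size_t);

/* The pointer starts at the resolver and is overwritten on first use. */
static dot_fn dot_impl = dot_resolve;
static int resolve_count = 0;

/* Runs once: picks a variant, rebinds the pointer, forwards the call. */
static double dot_resolve(const double *x, const double *y, size_t n)
{
    resolve_count++;
    dot_impl = cpu_has_wide_simd() ? dot_unrolled : dot_generic;
    return dot_impl(x, y, n);
}

/* Public entry point: one indirect call after the first invocation. */
double dot(const double *x, const double *y, size_t n)
{
    return dot_impl(x, y, n);
}

/* Exposed only so the one-time behaviour can be observed. */
int dot_resolve_count(void)
{
    return resolve_count;
}
```

Callers only ever see `dot`; the CPU probe runs on the first call and never again. With real IFUNC the selection happens even earlier, when the dynamic linker binds the symbol, so the steady-state cost is that of an ordinary call through the PLT.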
Thanks,
Ludo’.