Re: af_alg benchmarks and performance

From: Bruno Haible
Subject: Re: af_alg benchmarks and performance
Date: Tue, 08 May 2018 02:20:07 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-119-generic; KDE/5.18.0; x86_64; ; )

What can we do to get the speedups but avoid the slowdowns, in the two
hairy cases
  - the afalg_buffer case,
  - the afalg_stream case with non-regular files?

The following approaches come to mind:

  * A "tuning" framework like the one from GMP. This is a set of benchmark
    programs that the developers use to determine the break-even points
    on a platform and write them into a platform-specific gmp-param.h file.

    Drawback: Who will have the time (and resources) to do this for the
    hundreds of x86_64 CPUs and the dozens of ARM CPUs on the market?
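
    To illustrate what such a tuned parameter could look like in use, here
    is a minimal sketch. The macro AFALG_BUFFER_THRESHOLD, its placeholder
    value, the enum and the function choose_backend are all hypothetical
    names, not part of gnulib or GMP, and the sketch assumes that the
    break-even point can be expressed as a single buffer length above which
    the kernel API wins.

```c
#include <stddef.h>

/* Hypothetical tuned value, of the kind a tuning program might write into
   a platform-specific header.  4096 is a placeholder, not a measurement.  */
#ifndef AFALG_BUFFER_THRESHOLD
# define AFALG_BUFFER_THRESHOLD 4096
#endif

/* Hypothetical names for the two implementations being compared.  */
enum hash_backend { BACKEND_GNULIB_C, BACKEND_KERNEL };

/* Pick the implementation for a buffer of LEN bytes, assuming the kernel
   API only pays off at or above the break-even length.  */
enum hash_backend
choose_backend (size_t len)
{
  return (len >= AFALG_BUFFER_THRESHOLD ? BACKEND_KERNEL : BACKEND_GNULIB_C);
}
```

    The drawback stays the same: somebody has to measure the right value
    for each platform.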

  * A configure test that compares the speed of the two implementations
    and sets a flag in config.h accordingly.

    Drawbacks:
    - This does not solve the issue for programs distributed through a Linux
      distribution.
    - The outcome of this configure test may depend on the load of the machine
      at the moment 'configure' runs.
    - Changes in the kernel (which are likely to arrive due to Meltdown,
      Spectre, and Spectre-NG fixes) will affect these comparisons.
    - It goes against the goals of "reproducible builds".

  * Use the kernel-provided meta-info about the algorithms to decide whether
    to use the kernel API.

    In detail: Read /proc/crypto at run time. It consists of records
    with fields (name, driver, module, priority, internal, type).
    - Consider only the records for the names we are interested in.
    - Eliminate records with module != "kernel" (since we are not ready to
      handle the situation that the module gets unloaded while af_alg.c
      iterates on the data).
    - No need to eliminate records with internal = "yes", since these have a
      name that starts with '__', thus are already eliminated.
    - Eliminate records with priority <= 100, because these are the generic
      implementations, which provide no significant speedup compared to the
      generic C implementation in gnulib (assuming the gnulib code was compiled
      with -O2).

    Drawback: Does not work if /proc is not mounted.
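
    The filtering steps above could be sketched roughly as follows. This is
    not code from af_alg.c; the function names has_fast_builtin_impl and
    kernel_impl_looks_worthwhile are made up for illustration. The sketch
    assumes that /proc/crypto consists of "key : value" lines, with records
    separated by blank lines, and that no line exceeds the local buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if STREAM (in /proc/crypto format) has a
   record whose name is exactly NAME, whose module is "kernel", and whose
   priority is > 100; return 0 otherwise.  Internal algorithms have names
   starting with "__", so the exact name match already skips them.  */
int
has_fast_builtin_impl (FILE *stream, const char *name)
{
  char line[256];
  char rec_name[128] = "";
  char rec_module[128] = "";
  long rec_priority = 0;
  int in_record = 0;

  for (;;)
    {
      char *got = fgets (line, sizeof line, stream);

      /* A blank line or end of file terminates the current record.  */
      if (got == NULL || line[0] == '\n')
        {
          if (in_record
              && strcmp (rec_name, name) == 0
              && strcmp (rec_module, "kernel") == 0
              && rec_priority > 100)
            return 1;
          if (got == NULL)
            return 0;
          rec_name[0] = rec_module[0] = '\0';
          rec_priority = 0;
          in_record = 0;
          continue;
        }

      /* Split "key   : value\n" at the first colon.  */
      char *colon = strchr (line, ':');
      if (colon == NULL)
        continue;
      char *value = colon + 1;
      value += strspn (value, " \t");
      value[strcspn (value, "\n")] = '\0';
      while (colon > line && (colon[-1] == ' ' || colon[-1] == '\t'))
        colon--;
      *colon = '\0';
      in_record = 1;

      if (strcmp (line, "name") == 0)
        snprintf (rec_name, sizeof rec_name, "%s", value);
      else if (strcmp (line, "module") == 0)
        snprintf (rec_module, sizeof rec_module, "%s", value);
      else if (strcmp (line, "priority") == 0)
        rec_priority = strtol (value, NULL, 10);
    }
}

/* Hypothetical wrapper: if /proc/crypto cannot be opened (for example
   because /proc is not mounted), answer 0, i.e. stay with the C code.  */
int
kernel_impl_looks_worthwhile (const char *name)
{
  FILE *fp = fopen ("/proc/crypto", "r");
  if (fp == NULL)
    return 0;
  int result = has_fast_builtin_impl (fp, name);
  fclose (fp);
  return result;
}
```

    A caller would consult kernel_impl_looks_worthwhile ("sha1") or similar
    once per algorithm and cache the answer, falling back to the generic C
    implementation whenever it returns 0.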

I would favour the third approach.

