bug-gnulib

Re: af_alg: Add ability to use Linux kernel crypto API on data in memory


From: Matteo Croce
Subject: Re: af_alg: Add ability to use Linux kernel crypto API on data in memory
Date: Mon, 7 May 2018 11:55:29 +0200

On Mon, May 7, 2018 at 4:07 AM, Paul Eggert <address@hidden> wrote:
> Bruno Haible wrote:
>>
>> Oops, I goofed with "git diff". Here's the correct patch to test.
>
>
> I tried those bench-md5 benchmarks on two platforms, with somewhat more
> disappointing results.
>
> I observed a real-time slowdown ranging from 11% (large buffers) to 22x
> (small buffers) on Intel Xeon E3-1225 V2 (circa 2012 CPU), Ubuntu 16.04,
> Linux 4.4.0, glibc 2.23. See attached file ubuntu1604.txt.
>
> I observed a real-time slowdown ranging from 8% (large buffers) to 43x
> (small buffers) on AMD Phenom II X4 910e (circa 2010 CPU), Fedora 28, Linux
> 4.16.5, glibc 2.27. See attached file fedora28.txt.
>
> These numbers compare somewhat unfavorably with your report, where the
> real-time slowdown ranged from 1.5% (large buffers) to 25x (small buffers),
> as reported in <https://lists.gnu.org/r/bug-gnulib/2018-05/msg00035.html>.

Hi all,

I tried all of the above and can confirm the disappointing results with
md5 or with small buffers.
This is what happens on my machine, a Lenovo laptop with an Intel(R)
Core(TM) i7-6820HQ CPU @ 2.70GHz running Fedora 27.

With large buffers, every algorithm except md5 is faster with af_alg:

$ without/gltests/bench-md5 1000000000 1
real   1.520719
user   1.520
sys    0.000
$ with/gltests/bench-md5 1000000000 1
real   1.684162
user   0.000
sys    1.684

$ without/gltests/bench-sha1 1000000000 1
real   1.696258
user   1.696
sys    0.000
$ with/gltests/bench-sha1 1000000000 1
real   1.072500
user   0.000
sys    1.072

$ without/gltests/bench-sha256 1000000000 1
real   4.467676
user   4.468
sys    0.000
$ with/gltests/bench-sha256 1000000000 1
real   2.527936
user   0.009
sys    2.519

$ without/gltests/bench-sha512 1000000000 1
real   2.684985
user   2.685
sys    0.000
$ with/gltests/bench-sha256 1000000000 1
real   2.546133
user   0.004
sys    2.542
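
To put the large-buffer numbers on a common scale, here is a minimal sketch that turns the "real" times above into speedup factors. The names LARGE_BUFFER_RUNS, speedup and SPEEDUPS are made up for this illustration and are not part of gnulib. sha512 is left out because the "with" command shown for it names bench-sha256, so the pairing is uncertain.

```python
# Wall-clock ("real") seconds copied from the 1000000000-byte runs above:
# algorithm -> (without af_alg, with af_alg).
# sha512 is omitted: the "with" command shown for it names bench-sha256.
LARGE_BUFFER_RUNS = {
    "md5": (1.520719, 1.684162),
    "sha1": (1.696258, 1.072500),
    "sha256": (4.467676, 2.527936),
}


def speedup(without_s, with_s):
    """Return how many times faster the af_alg build is (below 1.0 means slower)."""
    return without_s / with_s


# Hypothetical helper table: algorithm -> speedup factor.
SPEEDUPS = {name: speedup(*times) for name, times in LARGE_BUFFER_RUNS.items()}

for name, factor in SPEEDUPS.items():
    print(f"{name}: {factor:.2f}x")
```

On these numbers md5 comes out at about 0.90x (slower with af_alg), sha1 at about 1.58x and sha256 at about 1.77x.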


For sha1, af_alg becomes faster with buffers of 100k or larger:

$ without/gltests/bench-sha1 100 1000000
real   0.292869
user   0.293
sys    0.000
$ with/gltests/bench-sha1 100 1000000
real   9.153545
user   0.698
sys    8.421

$ without/gltests/bench-sha1 1000 100000
real   0.190652
user   0.191
sys    0.000
$ with/gltests/bench-sha1 1000 100000
real   1.033346
user   0.071
sys    0.963

$ without/gltests/bench-sha1 10000 10000
real   0.183897
user   0.184
sys    0.000
$ with/gltests/bench-sha1 10000 10000
real   0.214090
user   0.003
sys    0.212

$ without/gltests/bench-sha1 100000 1000
real   0.181184
user   0.181
sys    0.000
$ with/gltests/bench-sha1 100000 1000
real   0.131482
user   0.002
sys    0.130

$ without/gltests/bench-sha1 1000000 100
real   0.178751
user   0.179
sys    0.000
$ with/gltests/bench-sha1 1000000 100
real   0.122498
user   0.000


sha256, by contrast, becomes faster with af_alg with buffers of 10k or larger:

$ without/gltests/bench-sha256 100 1000000
real   0.617181
user   0.617
sys    0.000
$ with/gltests/bench-sha256 100 1000000
real   9.655386
user   0.703
sys    8.950

$ without/gltests/bench-sha256 1000 100000
real   0.470694
user   0.471
sys    0.000
$ with/gltests/bench-sha256 1000 100000
real   1.203199
user   0.091
sys    1.112

$ without/gltests/bench-sha256 10000 10000
real   0.459542
user   0.460
sys    0.000
$ with/gltests/bench-sha256 10000 10000
real   0.360933
user   0.003
sys    0.358

$ without/gltests/bench-sha256 100000 1000
real   0.454326
user   0.454
sys    0.000
$ with/gltests/bench-sha256 100000 1000
real   0.279998
user   0.000
sys    0.280

$ without/gltests/bench-sha256 1000000 100
real   0.451635
user   0.452
sys    0.000
$ with/gltests/bench-sha256 1000000 100
real   0.266343
user   0.001
sys    0.265

$ without/gltests/bench-sha256 10000000 10
real   0.443723
user   0.444
sys    0.000
$ with/gltests/bench-sha256 10000000 10
real   0.260270
user   0.000
sys    0.260
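
The crossover points can be read off mechanically from the two sweeps. The sketch below is only an illustration: it assumes the first benchmark argument is the buffer size in bytes, uses the "real" times as posted, and the names SHA1_RUNS, SHA256_RUNS and crossover are hypothetical.

```python
# (buffer size, real seconds without af_alg, real seconds with af_alg),
# copied from the sha1 and sha256 sweeps above.
SHA1_RUNS = [
    (100, 0.292869, 9.153545),
    (1000, 0.190652, 1.033346),
    (10000, 0.183897, 0.214090),
    (100000, 0.181184, 0.131482),
    (1000000, 0.178751, 0.122498),
]
SHA256_RUNS = [
    (100, 0.617181, 9.655386),
    (1000, 0.470694, 1.203199),
    (10000, 0.459542, 0.360933),
    (100000, 0.454326, 0.279998),
    (1000000, 0.451635, 0.266343),
    (10000000, 0.443723, 0.260270),
]


def crossover(runs):
    """Return the smallest measured buffer size at which af_alg wins, or None."""
    for size, without_s, with_s in sorted(runs):
        if with_s < without_s:
            return size
    return None


print("sha1 crossover:", crossover(SHA1_RUNS))
print("sha256 crossover:", crossover(SHA256_RUNS))
```

The sweep only samples powers of ten, so the real break-even size lies somewhere between the last losing size and the first winning one.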

Keep in mind that my kernel has the infamous patch that mitigates the
Intel CPU bug (kernel page table isolation, for Meltdown). It adds a big
overhead to every syscall, though hopefully it will no longer be needed
on future CPUs:

$ dmesg |grep isolation
[    0.000000] Kernel/User page tables isolation: enabled
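
For a rough sense of how much each call costs on this machine, the 100-byte runs above can be turned into an extra cost per call. This is a back-of-the-envelope sketch: it assumes the second benchmark argument is the number of hash calls, it cannot separate the page table isolation overhead from the rest of the af_alg path, and the function name is made up for the example.

```python
def extra_cost_per_call_us(real_without_s, real_with_s, calls):
    """Extra wall-clock time per hash call with af_alg, in microseconds."""
    return (real_with_s - real_without_s) / calls * 1e6


# "real" times from the 100-byte runs above, 1000000 calls each.
SHA1_EXTRA_US = extra_cost_per_call_us(0.292869, 9.153545, 1_000_000)
SHA256_EXTRA_US = extra_cost_per_call_us(0.617181, 9.655386, 1_000_000)

print(f"sha1:   {SHA1_EXTRA_US:.2f} us extra per call")
print(f"sha256: {SHA256_EXTRA_US:.2f} us extra per call")
```

Both come out at roughly 9 microseconds per call, which is why the kernel path only pays off once each call carries enough data to amortize that cost.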

-- 
Matteo Croce
per aspera ad upstream


