[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 1/4] sha1sum: use AF_ALG when available
From: |
Matteo Croce |
Subject: |
Re: [PATCH v2 1/4] sha1sum: use AF_ALG when available |
Date: |
Thu, 26 Apr 2018 03:24:25 +0200 |
On Wed, Apr 25, 2018 at 7:34 PM, Assaf Gordon <address@hidden> wrote:
> Hello Matteo,
>
> On Wed, Apr 25, 2018 at 01:26:08PM +0200, Matteo Croce wrote:
>> Linux supports accessing kernel crypto API via AF_ALG since
>> version 2.6.38. Coreutils uses libcrypto when available and fallbacks to
>> generic C implementation of various hashing functions.
>
> Not exactly: coreutils (and every project which uses gnulib's md5/sha*
> modules)
> defaults to using the generic C implementation.
> gnulib automatically adds the option "--with-openssl" to "./configure",
> which can be set explicitly to 'yes' or 'auto'.
>
> To be conservative, I would suggest following the same method:
> Add a "configure" flag instead of enabling a completely new interface
> by default for every downstream project,
> especially that it is tied to external components (the kernel / modules /
> etc).
>
> If a distribution wants to enable by default, they can do it
> for their package (and ensure it works well with their kernel).
>
Hi Assaf,
I will have a look at the autotools to add such option to configure
and check for something like HAVE_LINUX_CRYPTO
>> Use afalg_stream() only in sha1sum for now, but other hashes are possible.
>> The speed gain really depends on the CPU type, on systems which doesn't use
>> libcrypto ranges from ~10% to 320%.
>
> It would be beneficial to know the improvements against coreutils+libcrypto
> on the same CPUs, because if users build with "--openssl=yes" (which is
> currently
> considered the "fast" implementation), they would not benefit from your patch.
> So they'll need to decide which one is preferable (and OpenSSL/libreSSL is
> faster
> on more CPUs and OSs, not just linux on a subset of CPUs).
>
I did comparations against libcrypto too, the results depends on the
CPU and file size but generally libcrypto and af_alg are paragonable.
Of course they are way faster than the generic C implementation.
on x86_64 and file smaller than 2 GB af_alg is slightly faster than
libcrypto, probably because of the sendfile usage:
$ grep -m1 '^model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
$ uname -a
Linux turbo 4.16.2 #71 SMP Sat Apr 21 20:14:08 CEST 2018 x86_64 x86_64
x86_64 GNU/Linux
$ time ./sha1sum-plain 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
real 0m2,718s
user 0m2,548s
sys 0m0,170s
$ time ./sha1sum-libcrypto 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
real 0m1,707s
user 0m1,538s
sys 0m0,170s
$ time ./sha1sum-afalg 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
real 0m1,680s
user 0m0,000s
sys 0m1,668s
on x86_64 and huge files, af_alg is slightly slower than libcrypto,
probably because two syscalls are needed per block:
$ time ./sha1sum-plain 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin
real 0m22,410s
user 0m20,159s
sys 0m2,250s
$ time ./sha1sum-libcrypto 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin
real 0m13,873s
user 0m12,263s
sys 0m1,610s
$ time ./sha1sum-afalg 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin
real 0m14,124s
user 0m0,080s
sys 0m14,044s
I have similar results on an arm64 board running a quad core Cortex
A72, small file:
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 50.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 1
$ uname -a
Linux macchiatobin 4.16.0 #8 SMP Sat Apr 21 17:55:12 UTC 2018 aarch64
aarch64 aarch64 GNU/Linux
$ time ./sha1sum-plain 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
real 0m9.348s
user 0m8.176s
sys 0m1.171s
$ time ./sha1sum-libcrypto 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
real 0m2.232s
user 0m1.721s
sys 0m0.510s
$ time ./sha1sum-afalg 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
real 0m2.219s
user 0m0.000s
sys 0m2.219s
and a bigger one:
$ time ./sha1sum-plain 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin
real 1m14.968s
user 1m5.860s
sys 0m9.106s
$ time ./sha1sum-libcrypto 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin
real 0m20.706s
user 0m14.674s
sys 0m6.001s
$ time ./sha1sum-afalg 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin
real 0m21.608s
user 0m0.142s
sys 0m21.462s
Unfortunately I don't have a board with a crypto HW engine available,
but given the speed of such chips, the results will vary a lot in
favour of af_alg.
Also, keep in mind that some distribution don't want or can't use
libssl for some reasons (licensing, firmware footprint, etc.)
>> This is a test on a Intel(R) Xeon(R) CPU E3-1265L V2 and Debian stretch:
>>
>> $ truncate -s 2GB 2g.bin
>> $ time sha1sum 2g.bin
>> 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
>>
>> real 0m4.829s
>> user 0m4.437s
>> sys 0m0.391s
>> $ time ./sha1sum-afalg 2g.bin
>> 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin
>>
>> real 0m3.164s
>> user 0m0.000s
>> sys 0m3.162s
>
> Regarding the above measurement:
> On most modern filesystems "truncate -s" would create a sparse file,
> requiring very little I/O.
> And if the measurements above are indeed mostly CPU-bound
> and not IO-bound, it indicates that hasing in the kernel
> is only somewhat faster then in user-mode, but not significantly so
> if you exclude I/O (not as dramatic difference as measured in
> your previous email).
>
That was intended behaviour. Reading from a slow disk would have
reported similar times between versions.
> This also brings the question: in the previous emails you report
> the improved *-afalg versions after the default versions - did
> you clear the cache to exclude I/O effects?
> (e.g. "sync; echo 3 > /proc/sys/vm/drop_caches")
>
as before, that would clear the block cache and the I/O could be a
bottleneck, probably not what we want.
>> + {
>> + /* sendfile() not possible, do a classic read-write loop */
>> + while ((size = fread (buf, 1, sizeof (buf), stream)))
>> + {
>> + if (send (ofd, buf, size, size == sizeof (buf) ? MSG_MORE : 0) ==
>> -1)
>> + {
>
> Few comments:
> 1. In the above loop, wouldn't a file that's an exact multiple of BLOCKSIZE
> never be sent 0 as the 4th paramater?
> And if so, assuming hashing still works because "MSG_MORE" is just a hint,
> why not always send MSG_MORE ?
>
I tried it before posting the patch, it seems that doing the last
write with MSG_MORE still generates a valid hash. As you guessed, it's
just an hint:
$ dd status=none if=/dev/zero bs=32k count=4 |sha1sum
67dfd19f3eb3649d6f3f6631e44d0bd36b8d8d19 -
$ dd status=none if=/dev/zero bs=32k count=4 |strace -e trace=%network
./sha1sum-afalg
socket(AF_ALG, SOCK_SEQPACKET, 0) = 3
bind(3, {sa_family=AF_ALG,
sa_data="hash\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0sha1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"},
88) = 0
accept(3, NULL, NULL) = 4
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
67dfd19f3eb3649d6f3f6631e44d0bd36b8d8d19 -
+++ exited with 0 +++
Yes, we can always pass MSG_MORE to the kernel, the hash will
correctly generated when calling read()
> 2. The code for sha*_stream() functions is a bit more elaborate,
> and checks ferror/feof to accomodate EAGAIN - it might be useful
> to replicate it here.
>
>> + if (size != hashlen)
>> + {
>> + fprintf (stderr, "Error from read (%zd vs %zd bytes): %s\n",
>> + size, hashlen, strerror (errno));
>> + ret = -EIO;
>
> The fprintf might be problematic:
> First, this is a cryptic error message, in the sense that
> it indicate an I/O error (hinted by the word "read"),
> but is actually a kernel crypto/driver issue.
> If a user sees this message, there's no chance they'll know
> what the problem is. It's better to be more explicit that
> this relates to kernel crypto API problem.
>
Indeed, it should be reworded better.
> Second, all other sha*_stream functions never output anything
> to STDERR. Their interface is returning 0 for success, 1 for failure,
> and setting errno for failure. Not sure if this is an official API
> agreement or just de-facto, but it means that users of sha*_stream
> functions should now be aware there might be output.
>
> Lastly, if the decision is to keep the error message,
> it's likely better to use gnulib's error() function, like so:
>
> error (0, errno, _("some error message %zd"), size);
>
> Which will take care of strerror, and also enable translation.
>
I wasn't sure if put the error message or not. I think that a library
should not clobber the output with text, probably I did a mistake,
returning 1 and set errno seems a better idea.
> ----
>
> On more general note, while the Crypto API is available since version 2.6.38,
> There are still some issues popping here and there.
> for example:
> https://security-tracker.debian.org/tracker/CVE-2016-8646
> https://bugzilla.redhat.com/show_bug.cgi?id=1395896
>
That's bad. I'm not a security expert at all, but I did a quick search
and I've found that also libcrypt had some CVEs.
Nobody's perfect :)
> Another example, the libkcapi [1] library (referenced from the kernel crypto
> api documentation)
> contains a work-around for a bug in kernels pre-4.9 [2].
> I don't know if this is relevant for your implementation or not,
> but if it is, it's worth checking for this and other issues.
> [1] http://www.chronox.de/libkcapi.html
> [2]
> https://github.com/smuellerDD/libkcapi/commit/b8d5941addb15fe5a716eef24060fbd306c06ec9
>
That's bad too. In this case probably the distribution kernel
maintainers should backport such fix anyway, regardless of af_alg
support in coreutils.
> It also seems OpenSSL version 1.1.0 added support for AF_ALG.
> Perhaps it's better to leave it to them (since gnulib can already
> use openssl easily)?
>
I've just checked the openssl version on my two reference systems
which are an arm64 Ubuntu 18.04:
$ dpkg -l libssl1.1 |grep ^ii
ii libssl1.1:arm64 1.1.0g-2ubuntu4 arm64
and an x86_64 Fedora 27:
$ rpm -q openssl-libs
openssl-libs-1.1.0h-3.fc27.x86_64
strace revealed that none of them seems to use af_alg, at least in my
tests. Probably the package maintainer had to enable it?
Regards,
--
Matteo Croce
per aspera ad upstream