bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: memchr2 speed, gcc


From: Eric Blake
Subject: Re: memchr2 speed, gcc
Date: Fri, 25 Apr 2008 21:44:17 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080213 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Bruno Haible on 4/25/2008 9:14 PM:
| Hi Eric,
|
|> I've got an even more efficient implementation, inspired by
|> http://www.cl.cam.ac.uk/~am21/progtricks.html
|
| Very nice! And it has no "false positive" case, like the current memchr.c
| implementation.

I discovered the web page from this glibc bug report (I only looked at the
link in the bug report, not at the proposed patch):
http://sourceware.org/bugzilla/show_bug.cgi?id=5807.  Then after
implementing it, I also noticed that newlib uses the same algorithm for
some of its string functions.

|
|> Also, is anyone interested in making gnulib's memchr and strchrnul more
|> efficient by copying the optimizations learned in memchr2?
|
| That will definitely be useful, yes. But first, some polishing of the code.
| Here is a proposed patch to fix a few things on the surface:

Looks good to me, except for:

|
| !   /* Compute auxiliary longword values:
| !      repeated_one is a value which has a 1 in every byte.
| !      repeated_c1 has a c1 in every byte.
| !      repeated_c2 has a c2 in every byte.  */
| !   repeated_one = 0x01010101;
| !   repeated_c1 = c1 | (c1 << 8);
| !   repeated_c2 = c2 | (c2 << 8);
| !   repeated_c1 |= repeated_c1 << 16;
| !   repeated_c2 |= repeated_c2 << 16;
|     if (0xffffffffU < TYPE_MAXIMUM (longword))
|       {
| !       repeated_one |= repeated_one << 31 << 1;
| !       repeated_c1 |= repeated_c1 << 31 << 1;
| !       repeated_c2 |= repeated_c2 << 31 << 1;
| !       if (8 < sizeof (longword))
| !     {
| !       int i;
| !
| !       for (i = 64; i < sizeof (longword) * 8; i *= 2)

Since we are already computing repeated_one, I'm wondering if it would be
more efficient to compute repeated_c1 = repeated_one * c1 once at the end,
rather than computing repeated_c1 in parallel with the repeated
modifications to repeated_one.

| !      machines.  Choose a code that works in both cases.  */

s/a //

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgSpREACgkQ84KuGfSFAYB5agCdH7JH93LAIdK13zCy7qBr6z5O
Bh8An3JRkg/4PUw3hnUzM0/HkLMNu1ol
=rvE2
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]