bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: test-fdutimensat racy?


From: Bernhard Voelker
Subject: Re: test-fdutimensat racy?
Date: Wed, 05 Jun 2013 09:37:45 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 06/05/2013 01:33 AM, Pádraig Brady wrote:
> On 06/04/2013 09:20 PM, Bernhard Voelker wrote:
>> Subject: [PATCH] tests/nap.h: avoid race
> 
> That's a bit terse. I'd instead say:
> tests/nap.h: use an adaptive delay to avoid ctime update issues

Fixed - see complete patch below.

> It might be worth referencing some of the threads describing
> the possible kernel issues, like:
> https://lists.gnu.org/archive/html/bug-gnulib/2011-11/msg00226.html

Done.
Interestingly, these problems showed up quite regularly.

> It might be better to:
> 
> if (0 <= nap_fd)
>   {
>     ASSERT (close (nap_fd) != -1);
>     ASSERT (unlink (TEMPFILE) != -1);
>   }
> 
> which would allow you to setup the atexit handler
> before the file is created [...]

Great idea, thanks, done.

Thanks for the review!

Have a nice day,
Berny


>From af2ecd258ee76da92ab8ac832a6e1d153424ba16 Mon Sep 17 00:00:00 2001
From: Bernhard Voelker <address@hidden>
Date: Wed, 5 Jun 2013 09:20:15 +0200
Subject: [PATCH] tests/nap.h: use an adaptive delay to avoid ctime update
 issues

The recent change in nap.h (5191133e) decreased the probability of lost
races to about a third, however such problems could still be observed
in virtual machines and openSUSE's OBS.

Instead of calulating the nap() time once and using it (together with
a small correction multiplier), avoid the race alltogether by verifying
on a reference file whether a timestamp difference has happened.

Before, nap() detected the needed time once empirically and then used
that delay (together with a small correction multiplier) in further
calls.  This problem has been reported and discussed several times,
including guesses about possible kernel issues:
  https://lists.gnu.org/archive/html/bug-gnulib/2013-04/msg00071.html
  http://lists.gnu.org/archive/html/coreutils/2012-03/msg00088.html
  https://lists.gnu.org/archive/html/bug-gnulib/2011-11/msg00226.html
  http://bugs.gnu.org/12820
  https://lists.gnu.org/archive/html/bug-gnulib/2010-11/msg00113.html
  https://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00007.html

Now, nap() avoids the race alltogether by verifying on a reference
file whether a timestamp difference has happened.

* tests/nap.h (nap_fd): Define file descriptor variable for the
witness file.
(nap_works): Change return value to bool.  Change passing
the old file's status by value instead of by reference as this function
does no longer update that timestamp; rename the function argument from
st to old_st.  Remove the local variables cdiff and mdiff because that
function now returns true/false instead of the precise delay.
(guess_delay): Remove function.
(clear_tmp_file): Add new function to close and unlink the witness file.
(nap): Instead of re-using the delay which has been calculated during
the first call, avoid the race by actually verifying that a timestamp
difference can be observed on the current file system.  Use an adaptive
approach for the delay to minimize execution time.  Assert that the
maximum delay is <= ~2 seconds, more precisely sum(2^n) from 0 to 30
= 2^31 - 1 = 2.1s.
Use atexit to call clear_tmp_file when the process terminates.
---
 ChangeLog   |  35 +++++++++++++++++++
 tests/nap.h | 110 ++++++++++++++++++++++++++++--------------------------------
 2 files changed, 86 insertions(+), 59 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index e8b9dfe..2c2ac44 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,38 @@
+2013-06-04  Bernhard Voelker  <address@hidden>
+
+       tests/nap.h: use an adaptive delay to avoid ctime update issues
+       The recent change in nap.h (5191133e) decreased the probability of lost
+       races to about a third, however such problems could still be observed
+       in virtual machines and openSUSE's OBS.
+       Before, nap() detected the needed time once empirically and then used
+       that delay (together with a small correction multiplier) in further
+       calls.  This problem has been reported and discussed several times,
+       including guesses about possible kernel issues:
+       https://lists.gnu.org/archive/html/bug-gnulib/2013-04/msg00071.html
+       http://lists.gnu.org/archive/html/coreutils/2012-03/msg00088.html
+       https://lists.gnu.org/archive/html/bug-gnulib/2011-11/msg00226.html
+       http://bugs.gnu.org/12820
+       https://lists.gnu.org/archive/html/bug-gnulib/2010-11/msg00113.html
+       https://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00007.html
+       Now, nap() avoids the race alltogether by verifying on a reference
+       file whether a timestamp difference has happened.
+       * tests/nap.h (nap_fd): Define file descriptor variable for the
+       witness file.
+       (nap_works): Change return value to bool.  Change passing
+       the old file's status by value instead of by reference as this function
+       does no longer update that timestamp; rename the function argument from
+       st to old_st.  Remove the local variables cdiff and mdiff because that
+       function now returns true/false instead of the precise delay.
+       (guess_delay): Remove function.
+       (clear_tmp_file): Add new function to close and unlink the witness file.
+       (nap): Instead of re-using the delay which has been calculated during
+       the first call, avoid the race by actually verifying that a timestamp
+       difference can be observed on the current file system.  Use an adaptive
+       approach for the delay to minimize execution time.  Assert that the
+       maximum delay is <= ~2 seconds, more precisely sum(2^n) from 0 to 30
+       = 2^31 - 1 = 2.1s.
+       Use atexit to call clear_tmp_file when the process terminates.
+
 2013-06-02  Paul Eggert  <address@hidden>

        sig2str: port to C++
diff --git a/tests/nap.h b/tests/nap.h
index d8dbad2..b7144a4 100644
--- a/tests/nap.h
+++ b/tests/nap.h
@@ -20,6 +20,10 @@
 # define GLTEST_NAP_H

 # include <limits.h>
+# include <stdbool.h>
+
+/* File descriptor used for the witness file.  */
+static int nap_fd = -1;

 /* Return A - B, in ns.
    Return 0 if the true result would be negative.
@@ -53,86 +57,74 @@ get_stat (int fd, struct stat *st, int do_write)
 }

 /* Given a file whose descriptor is FD, see whether delaying by DELAY
-   nanoseconds causes a change in a file's time stamp.  *ST is the
-   file's status, recently gotten.  Update *ST to reflect the latest
-   status gotten.  If successful, return the needed delay, in
-   nanoseconds as determined by the observed time stamps; this may be
-   greater than DELAY if we crossed a quantization boundary.  If
-   unsuccessful, return 0.  */
-static int
-nap_works (int fd, int delay, struct stat *st)
+   nanoseconds causes a change in a file's time stamp.  OLD_ST is the
+   file's status, recently gotten.  */
+static bool
+nap_works (int fd, int delay, struct stat old_st)
 {
-  struct stat old_st = *st;
+  struct stat st;
   struct timespec delay_spec;
-  int cdiff, mdiff;
   delay_spec.tv_sec = delay / 1000000000;
   delay_spec.tv_nsec = delay % 1000000000;
   ASSERT (nanosleep (&delay_spec, 0) == 0);
-  get_stat (fd, st, 1);
+  get_stat (fd, &st, 1);

   /* Return the greater of the ctime and the mtime differences, or
      zero if it cannot be determined, or INT_MAX if either overflows.  */
-  cdiff = diff_timespec (get_stat_ctime (st), get_stat_ctime (&old_st));
-  if (cdiff != 0)
-    {
-      mdiff = diff_timespec (get_stat_mtime (st), get_stat_mtime (&old_st));
-      if (mdiff != 0)
-        return cdiff < mdiff ? mdiff : cdiff;
-    }
-  return 0;
+  if (   diff_timespec (get_stat_ctime (&st), get_stat_ctime (&old_st))
+      && diff_timespec (get_stat_mtime (&st), get_stat_mtime (&old_st)))
+    return true;
+
+  return false;
 }

-static int
-guess_delay (void)
+#define TEMPFILE BASE "nap.tmp"
+
+static void
+clear_temp_file (void)
 {
-  /* Try a 1-ns sleep first, for speed.  If that doesn't work, try 100
-     ns, 1 microsecond, 1 ms, etc.  xfs has a quantization of about 10
-     milliseconds, even though it has a granularity of 1 nanosecond,
-     and NTFS has a default quantization of 15.25 milliseconds, even
-     though it has a granularity of 100 nanoseconds, so 15.25 ms is a
-     good quantization to try.  If that doesn't work, try 1 second.
-     The worst case is 2 seconds, needed for FAT.  */
-  static int const delaytab[] = {1, 1000, 1000000, 15250000, 1000000000 };
-  int fd = creat (BASE "tmp", 0600);
-  int i;
-  int delay = 2000000000;
-  struct stat st;
-  ASSERT (0 <= fd);
-  get_stat (fd, &st, 0);
-  for (i = 0; i < sizeof delaytab / sizeof delaytab[0]; i++)
+  if (0 <= nap_fd)
     {
-      int d = nap_works (fd, delaytab[i], &st);
-      if (d != 0)
-        {
-          delay = d;
-          break;
-        }
+      ASSERT (close (nap_fd) != -1);
+      ASSERT (unlink (TEMPFILE) != -1);
     }
-  ASSERT (close (fd) == 0);
-  ASSERT (unlink (BASE "tmp") == 0);
-  return delay;
 }

 /* Sleep long enough to notice a timestamp difference on the file
-   system in the current directory.  Assumes that BASE is defined,
-   and requires that the test module depends on nanosleep.  */
+   system in the current directory.  Use an adaptive approach, trying
+   to find the smallest delay which works on the current file system
+   to make the timestamp difference appear.  Assert a maximum delay of
+   ~2 seconds, more precisely sum(2^n) from 0 to 30 = 2^31 - 1 = 2.1s.
+   Assumes that BASE is defined, and requires that the test module
+   depends on nanosleep.  */
 static void
 nap (void)
 {
-  static struct timespec delay;
-  if (!delay.tv_sec && !delay.tv_nsec)
+  struct stat old_st;
+  static int delay = 1;
+
+  if (-1 == nap_fd)
+    {
+      atexit (clear_temp_file);
+      ASSERT ((nap_fd = creat (TEMPFILE, 0600)) != -1);
+      get_stat (nap_fd, &old_st, 0);
+    }
+  else
     {
-      int d = guess_delay ();
-
-      /* Multiply by 1.125 (rounding up), to avoid problems if the
-         file system's clock is a bit slower than nanosleep's.
-         Ceiling it at INT_MAX, though.  */
-      int delta = (d >> 3) + ((d & 7) != 0);
-      d = delta < INT_MAX - d ? d + delta : INT_MAX;
-      delay.tv_sec = d / 1000000000;
-      delay.tv_nsec = d % 1000000000;
+      ASSERT (0 <= nap_fd);
+      get_stat (nap_fd, &old_st, 1);
     }
-  ASSERT (nanosleep (&delay, 0) == 0);
+
+  if (1 < delay)
+    delay = delay / 2;  /* Try half of the previous delay.  */
+  ASSERT (0 < delay);
+
+  for ( ; delay <= 2147483647; delay = delay * 2)
+    if (nap_works (nap_fd, delay, old_st))
+      return;
+
+  /* Bummer: even the highest nap delay didn't work. */
+  ASSERT (0);
 }

 #endif /* GLTEST_NAP_H */
-- 
1.8.2.2




reply via email to

[Prev in Thread] Current Thread [Next in Thread]