bug-coreutils

Re: Bug#239288: coreutils: cp --preserve=timestamps -u copies repeatedly


From: Paul Eggert
Subject: Re: Bug#239288: coreutils: cp --preserve=timestamps -u copies repeatedly
Date: Sat, 27 Mar 2004 23:37:45 -0800
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Jim Meyering <address@hidden> writes:

>   http://article.gmane.org/gmane.comp.gnu.coreutils.bugs/1464
>
> but no one has begun work on that, as far as I know.

Here's a proposed implementation for that idea.  It was a bit trickier
than I thought it would be, mostly because I thought of more
optimizations.

2004-03-27  Paul Eggert  <address@hidden>

        * NEWS: cp -pu and mv -u (when copying) now take the destination
        file system time stamp resolution into account.
        * doc/coreutils.texi (mv invocation): Document this.
        (cp invocation): Document -u (it was missing!) with new behavior.

        * lib/utimecmp.c, lib/utimecmp.h, m4/utimecmp.m4: New files.
        * lib/Makefile.am (libfetish_a_SOURCES): Add utimecmp.c, utimecmp.h.
        * m4/prereq.m4 (jm_PREREQ): Require gl_UTIMECMP.
        * src/copy.c: Include "utimecmp.h".
        (copy_internal): Compare time stamps using utimecmp rather than
        MTIME_CMP.

Index: NEWS
===================================================================
RCS file: /home/meyering/coreutils/cu/NEWS,v
retrieving revision 1.194
diff -p -u -r1.194 NEWS
--- NEWS        24 Mar 2004 17:38:58 -0000      1.194
+++ NEWS        24 Mar 2004 23:15:42 -0000
@@ -4,6 +4,12 @@ GNU coreutils NEWS                      
 
 ** New features
 
+  cp -pu and mv -u (when copying) now don't bother to update the
+  destination if the resulting time stamp would be no newer than the
+  preexisting time stamp.  This saves work in the common case when
+  copying or moving multiple times to the same destination in a file
+  system with a coarse time stamp resolution.
+
    'df', 'du', and 'ls' now take the default block size from the
    BLOCKSIZE environment variable if the BLOCK_SIZE, DF_BLOCK_SIZE,
    DU_BLOCK_SIZE, and LS_BLOCK_SIZE environment variables are not set.
Index: doc/coreutils.texi
===================================================================
RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.173
diff -p -u -r1.173 coreutils.texi
--- doc/coreutils.texi  24 Mar 2004 17:38:17 -0000      1.173
+++ doc/coreutils.texi  24 Mar 2004 23:11:13 -0000
@@ -6402,6 +6402,19 @@ results in an error message on systems t
 
 @optTargetDirectory
 
+@item -u
+@itemx --update
+@opindex -u
+@opindex --update
+@cindex newer files, copying only
+Do not copy a non-directory that has an existing destination with the
+same or newer modification time.  If time stamps are being preserved,
+the comparison is to the source time stamp truncated to the
+resolutions of the destination file system and of the system calls
+used to update time stamps; this avoids duplicate work if several
+@samp{cp -pu} commands are executed with the same source and
+destination.
+
 @item -v
 @itemx --verbose
 @opindex -v
@@ -6798,6 +6811,11 @@ about each existing destination file.
 @cindex newer files, moving only
 Do not move a non-directory that has an existing destination with the
 same or newer modification time.
+If the move is across file system boundaries, the comparison is to the
+source time stamp truncated to the resolutions of the destination file
+system and of the system calls used to update time stamps; this avoids
+duplicate work if several @samp{mv -u} commands are executed with the
+same source and destination.
 
 @item -v
 @itemx --verbose
Index: lib/Makefile.am
===================================================================
RCS file: /home/meyering/coreutils/cu/lib/Makefile.am,v
retrieving revision 1.182
diff -p -u -r1.182 Makefile.am
--- lib/Makefile.am     23 Mar 2004 17:34:05 -0000      1.182
+++ lib/Makefile.am     24 Mar 2004 20:20:57 -0000
@@ -115,6 +115,7 @@ libfetish_a_SOURCES = \
   unistd-safer.h \
   unlocked-io.h \
   userspec.c userspec.h \
+  utimecmp.c utimecmp.h \
   utimens.c utimens.h \
   version-etc.c version-etc.h \
   xalloc.h \
Index: m4/prereq.m4
===================================================================
RCS file: /home/meyering/coreutils/cu/m4/prereq.m4,v
retrieving revision 1.83
diff -p -u -r1.83 prereq.m4
--- m4/prereq.m4        18 Dec 2003 10:33:39 -0000      1.83
+++ m4/prereq.m4        24 Mar 2004 22:21:02 -0000
@@ -103,6 +103,7 @@ AC_DEFUN([jm_PREREQ],
   AC_REQUIRE([gl_UNICODEIO])
   AC_REQUIRE([gl_UNISTD_SAFER])
   AC_REQUIRE([gl_USERSPEC])
+  AC_REQUIRE([gl_UTIMECMP])
   AC_REQUIRE([gl_UTIMENS])
   AC_REQUIRE([gl_XALLOC])
   AC_REQUIRE([gl_XGETCWD])
Index: src/copy.c
===================================================================
RCS file: /home/meyering/coreutils/cu/src/copy.c,v
retrieving revision 1.159
diff -p -u -r1.159 copy.c
--- src/copy.c  12 Mar 2004 11:53:18 -0000      1.159
+++ src/copy.c  28 Mar 2004 00:09:07 -0000
@@ -39,6 +39,7 @@
 #include "quote.h"
 #include "same.h"
 #include "savedir.h"
+#include "utimecmp.h"
 #include "utimens.h"
 #include "xreadlink.h"
 
@@ -945,16 +946,28 @@ copy_internal (const char *src_path, con
                  return 1;
                }
 
-             if (x->update && MTIME_CMP (src_sb, dst_sb) <= 0)
+             if (x->update)
                {
-                 /* We're using --update and the source file is older
-                    than the destination file, so there is no need to
-                    copy or move.  */
-                 /* Pretend the rename succeeded, so the caller (mv)
-                    doesn't end up removing the source file.  */
-                 if (rename_succeeded)
-                   *rename_succeeded = 1;
-                 return 0;
+                 /* When preserving time stamps (but not moving within a file
+                    system), don't worry if the destination time stamp is
+                    less than the source merely because of time stamp
+                    truncation.  */
+                 int options = ((x->preserve_timestamps
+                                 && ! (x->move_mode
+                                       && dst_sb.st_dev == src_sb.st_dev))
+                                ? UTIMECMP_TRUNCATE_SOURCE
+                                : 0);
+
+                 if (0 <= utimecmp (dst_path, &dst_sb, &src_sb, options))
+                   {
+                     /* We're using --update and the destination is not older
+                        than the source, so do not copy or move.  Pretend the
+                        rename succeeded, so the caller (if it's mv) doesn't
+                        end up removing the source file.  */
+                     if (rename_succeeded)
+                       *rename_succeeded = 1;
+                     return 0;
+                   }
                }
            }
 
--- /dev/null   Tue Mar 18 13:55:57 2003
+++ lib/utimecmp.c      Sat Mar 27 16:22:09 2004
@@ -0,0 +1,340 @@
+/* utimecmp.c -- compare file time stamps
+
+   Copyright (C) 2004 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+/* Written by Paul Eggert.  */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include "utimecmp.h"
+
+#if HAVE_INTTYPES_H
+# include <inttypes.h>
+#endif
+#if HAVE_STDINT_H
+# include <stdint.h>
+#endif
+
+#include <limits.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include "hash.h"
+#include "timespec.h"
+#include "utimens.h"
+#include "xalloc.h"
+
+/* Verify a requirement at compile-time (unlike assert, which is runtime).  */
+#define verify(name, assertion) struct name { char a[(assertion) ? 1 : -1]; }
+
+#ifndef MAX
+# define MAX(a, b) ((a) > (b) ? (a) : (b))
+#endif
+
+#ifndef SIZE_MAX
+# define SIZE_MAX ((size_t) -1)
+#endif
+
+/* The extra casts work around common compiler bugs.  */
+#define TYPE_SIGNED(t) (! ((t) 0 < (t) -1))
+/* The outer cast is needed to work around a bug in Cray C 5.0.3.0.
+   It is necessary at least when t == time_t.  */
+#define TYPE_MINIMUM(t) ((t) (TYPE_SIGNED (t) \
+                             ? ~ (t) 0 << (sizeof (t) * CHAR_BIT - 1) : (t) 0))
+#define TYPE_MAXIMUM(t) ((t) (~ (t) 0 - TYPE_MINIMUM (t)))
+
+enum { BILLION = 1000 * 1000 * 1000 };
+
+/* Best possible resolution that utimens can set and stat can return,
+   due to system-call limitations.  It must be a power of 10 that is
+   no greater than 1 billion.  */
+#if HAVE_WORKING_UTIMES && defined ST_MTIM_NSEC
+enum { SYSCALL_RESOLUTION = 1000 };
+#else
+enum { SYSCALL_RESOLUTION = BILLION };
+#endif
+
+/* Describe a file system and its time stamp resolution in nanoseconds.  */
+struct fs_res
+{
+  /* Device number of file system.  */
+  dev_t dev;
+
+  /* An upper bound on the time stamp resolution of this file system,
+     ignoring any resolution that cannot be set via utimens.  It is
+     represented by an integer count of nanoseconds.  It must be
+     either 2 billion, or a power of 10 that is no greater than a
+     billion and is no less than SYSCALL_RESOLUTION.  */
+  int resolution;
+
+  /* True if RESOLUTION is known to be exact, and is not merely an
+     upper bound on the true resolution.  */
+  bool exact;
+};
+
+/* Hash some device info.  */
+static size_t
+dev_info_hash (void const *x, size_t table_size)
+{
+  struct fs_res const *p = x;
+
+  /* Beware signed arithmetic gotchas.  */
+  if (TYPE_SIGNED (dev_t) && SIZE_MAX < MAX (INT_MAX, TYPE_MAXIMUM (dev_t)))
+    {
+      uintmax_t dev = p->dev;
+      return dev % table_size;
+    }
+
+  return p->dev % table_size;
+}
+
+/* Compare two fs_res structs; they are equal if their devices match.  */
+static bool
+dev_info_compare (void const *x, void const *y)
+{
+  struct fs_res const *a = x;
+  struct fs_res const *b = y;
+  return a->dev == b->dev;
+}
+
+/* Return -1, 0, 1 based on whether the destination file (with name
+   DST_NAME and status DST_STAT) is older than SRC_STAT, the same age
+   as SRC_STAT, or newer than SRC_STAT, respectively.
+
+   If OPTIONS & UTIMECMP_TRUNCATE_SOURCE, do the comparison after SRC is
+   converted to the destination's timestamp resolution as filtered through
+   utimens.  In this case, return -2 if the exact answer cannot be
+   determined; this can happen only if the time stamps are very close and
+   there is some trouble accessing the file system (e.g., the user does not
+   have permission to futz with the destination's time stamps).  */
+
+int
+utimecmp (char const *dst_name,
+         struct stat const *dst_stat,
+         struct stat const *src_stat,
+         int options)
+{
+  /* Things to watch out for:
+
+     The code uses a static hash table internally and is not safe in the
+     presence of signals, multiple threads, etc.
+
+     int and long int might be 32 bits.  Many of the calculations store
+     numbers up to 2 billion, and multiply by 10; they have to avoid
+     multiplying 2 billion by 10, as this exceeds 32-bit capabilities.
+
+     time_t might be unsigned.  */
+
+  verify (time_t_is_integer, (time_t) 0.5 == 0);
+  verify (twos_complement_arithmetic, -1 == ~1 + 1);
+
+  /* Destination and source time stamps.  */
+  time_t dst_s = dst_stat->st_mtime;
+  time_t src_s = src_stat->st_mtime;
+  int dst_ns = TIMESPEC_NS (dst_stat->st_mtim);
+  int src_ns = TIMESPEC_NS (src_stat->st_mtim);
+
+  if (options & UTIMECMP_TRUNCATE_SOURCE)
+    {
+      /* Look up the time stamp resolution for the destination device.  */
+
+      /* Hash table for devices.  */
+      static Hash_table *ht;
+
+      /* Information about the destination file system.  */
+      static struct fs_res *new_dst_res;
+      struct fs_res *dst_res;
+
+      /* Time stamp resolution in nanoseconds.  */
+      int res;
+
+      if (! ht)
+       ht = hash_initialize (16, NULL, dev_info_hash, dev_info_compare, free);
+      if (! new_dst_res)
+       {
+         new_dst_res = xmalloc (sizeof *new_dst_res);
+         new_dst_res->resolution = 2 * BILLION;
+         new_dst_res->exact = false;
+       }
+      new_dst_res->dev = dst_stat->st_dev;
+      dst_res = hash_insert (ht, new_dst_res);
+      if (! dst_res)
+       xalloc_die ();
+
+      if (dst_res == new_dst_res)
+       {
+         /* NEW_DST_RES is now in use in the hash table, so allocate a
+            new entry next time.  */
+         new_dst_res = NULL;
+       }
+
+      res = dst_res->resolution;
+
+      if (! dst_res->exact)
+       {
+         /* This file system's resolution is not known exactly.
+            Deduce it, and store the result in the hash table.  */
+
+         time_t dst_a_s = dst_stat->st_atime;
+         time_t dst_c_s = dst_stat->st_ctime;
+         time_t dst_m_s = dst_s;
+         int dst_a_ns = TIMESPEC_NS (dst_stat->st_atim);
+         int dst_c_ns = TIMESPEC_NS (dst_stat->st_ctim);
+         int dst_m_ns = dst_ns;
+
+         /* Set RES to an upper bound on the file system resolution
+            (after truncation due to SYSCALL_RESOLUTION) by inspecting
+            the atime, ctime and mtime of the existing destination.
+            We don't know of any file system that stores atime or
+            ctime with a higher precision than mtime, so it's valid to
+            look at them too.  */
+         {
+           bool odd_second = (dst_a_s | dst_c_s | dst_m_s) & 1;
+
+           if (SYSCALL_RESOLUTION == BILLION)
+             {
+               if (odd_second | dst_a_ns | dst_c_ns | dst_m_ns)
+                 res = BILLION;
+             }
+           else
+             {
+               int a = dst_a_ns;
+               int c = dst_c_ns;
+               int m = dst_m_ns;
+
+               /* Write it this way to avoid mistaken GCC warning
+                  about integer overflow in constant expression.  */
+               int SR10 = SYSCALL_RESOLUTION;  SR10 *= 10;
+
+               if ((a % SR10 | c % SR10 | m % SR10) != 0)
+                 res = SYSCALL_RESOLUTION;
+               else
+                 for (res = SR10, a /= SR10, c /= SR10, m /= SR10;
+                      (res < dst_res->resolution
+                       && (a % 10 | c % 10 | m % 10) == 0);
+                      res *= 10, a /= 10, c /= 10, m /= 10)
+                   if (res == BILLION)
+                     {
+                       if (! odd_second)
+                         res *= 2;
+                       break;
+                     }
+             }
+
+           dst_res->resolution = res;
+         }
+
+         if (SYSCALL_RESOLUTION < res)
+           {
+             struct timespec timespec[2];
+             struct stat dst_status;
+
+             /* Ignore source time stamp information that must necessarily
+                be lost when filtered through utimens.  */
+             src_ns -= src_ns % SYSCALL_RESOLUTION;
+
+             /* If the time stamps disagree widely enough, there's no need
+                to interrogate the file system to deduce the exact time
+                stamp resolution; return the answer directly.  */
+             {
+               time_t s = src_s & ~ (res == 2 * BILLION);
+               if (src_s < dst_s || (src_s == dst_s && src_ns <= dst_ns))
+                 return 1;
+               if (dst_s < s
+                   || (dst_s == s && dst_ns < src_ns - src_ns % res))
+                 return -1;
+             }
+
+             /* Determine the actual time stamp resolution for the
+                destination file system (after truncation due to
+                SYSCALL_RESOLUTION) by setting the modification time
+                stamp of the destination to the existing modification
+                time, except with trailing nonzero digits.  */
+
+             timespec[0].tv_sec = dst_a_s;
+             timespec[0].tv_nsec = dst_a_ns;
+             timespec[1].tv_sec = dst_m_s | (res == 2 * BILLION);
+             timespec[1].tv_nsec = dst_m_ns + res / 9;
+
+             /* Set the modification time.  But don't try to set the
+                modification time of symbolic links; on many hosts this sets
+                the time of the pointed-to file.  */
+             if (S_ISLNK (dst_stat->st_mode)
+                 || utimens (dst_name, timespec) != 0)
+               return -2;
+
+             /* Read the modification time that was set.  It's safe to call
+                'stat' here instead of worrying about 'lstat'; either the
+                caller used 'stat', or the caller used 'lstat' and found
+                something other than a symbolic link.  */
+             {
+               int stat_result = stat (dst_name, &dst_status);
+
+               if (stat_result
+                   | (dst_status.st_mtime ^ dst_m_s)
+                   | (TIMESPEC_NS (dst_status.st_mtim) ^ dst_m_ns))
+                 {
+                   /* The modification time changed, or we can't tell whether
+                      it changed.  Change it back as best we can.  */
+                   timespec[1].tv_sec = dst_m_s;
+                   timespec[1].tv_nsec = dst_m_ns;
+                   utimens (dst_name, timespec);
+                 }
+
+               if (stat_result != 0)
+                 return -2;
+             }
+
+             /* Determine the exact resolution from the modification time
+                that was read back.  */
+             {
+               int old_res = res;
+               int a = (BILLION * (dst_status.st_mtime & 1)
+                        + TIMESPEC_NS (dst_status.st_mtim));
+
+               res = SYSCALL_RESOLUTION;
+
+               for (a /= res; a % 10 != 0; a /= 10)
+                 {
+                   if (res == BILLION)
+                     {
+                       res *= 2;
+                       break;
+                     }
+                   res *= 10;
+                   if (res == old_res)
+                     break;
+                 }
+             }
+           }
+
+         dst_res->resolution = res;
+         dst_res->exact = true;
+       }
+
+      /* Truncate the source's time stamp according to the resolution.  */
+      src_s &= ~ (res == 2 * BILLION);
+      src_ns -= src_ns % res;
+    }
+
+  /* Compare the time stamps and return -1, 0, 1 accordingly.  */
+  return (dst_s < src_s ? -1
+         : dst_s > src_s ? 1
+         : dst_ns < src_ns ? -1
+         : dst_ns > src_ns);
+}
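
The first deduction step in lib/utimecmp.c (inspecting the time stamps the
destination already has) can be sketched in isolation.  This is a
simplified illustration and not the patch's code: resolution_bound is a
hypothetical name, it looks at a single time stamp rather than at atime,
ctime and mtime together, and it ignores SYSCALL_RESOLUTION.  The idea is
that a nonzero trailing digit proves the file system can store that digit,
so the largest power of 10 dividing the nanosecond count is an upper bound
on the resolution.

```c
/* Simplified, hypothetical helper (not from the patch).  Return an
   upper bound, in nanoseconds, on the time stamp resolution implied
   by one observed time stamp with seconds SEC and nanoseconds NS,
   where 0 <= NS < 1 billion.  The result is a power of 10 no greater
   than 1 billion, or 2 billion if nothing rules out a file system
   that stores only even seconds.  */
static long
resolution_bound (long long sec, long ns)
{
  long res = 1;

  /* No subsecond digits at all: the resolution could be a whole
     second, or two seconds if the second count is even.  */
  if (ns == 0)
    return (sec & 1) ? 1000000000L : 2000000000L;

  /* Strip trailing zero digits; each one makes the bound 10 times
     coarser.  */
  while (ns % 10 == 0)
    {
      ns /= 10;
      res *= 10;
    }
  return res;
}
```

For example, resolution_bound (100, 123456000) is 1000, since only
microsecond digits are visible.  When several time stamps from the same
file system are available, the smallest of their bounds is the one to keep.
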
--- /dev/null   Tue Mar 18 13:55:57 2003
+++ lib/utimecmp.h      Sat Mar 27 16:08:36 2004
@@ -0,0 +1,38 @@
+/* utimecmp.h -- compare file time stamps
+
+   Copyright (C) 2004 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+/* Written by Paul Eggert.  */
+
+#ifndef UTIMECMP_H
+#define UTIMECMP_H 1
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+/* Options for utimecmp.  */
+enum
+{
+  /* Before comparing, truncate the source time stamp to the
+     resolution of the destination file system and to the resolution
+     of utimens.  */
+  UTIMECMP_TRUNCATE_SOURCE = 1
+};
+
+int utimecmp (char const *, struct stat const *, struct stat const *, int);
+
+#endif
--- /dev/null   Tue Mar 18 13:55:57 2003
+++ m4/utimecmp.m4      Wed Mar 24 14:22:15 2004
@@ -0,0 +1,14 @@
+dnl Copyright (C) 2004 Free Software Foundation, Inc.
+dnl This file is free software, distributed under the terms of the GNU
+dnl General Public License.  As a special exception to the GNU General
+dnl Public License, this file may be distributed as part of a program
+dnl that contains a configuration script generated by Autoconf, under
+dnl the same distribution terms as the rest of that program.
+
+AC_DEFUN([gl_UTIMECMP],
+[
+  dnl Prerequisites of lib/utimecmp.c.
+  AC_REQUIRE([gl_TIMESPEC])
+  AC_REQUIRE([gl_FUNC_UTIMES])
+  :
+])
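
As a closing illustration, the probing step in lib/utimecmp.c (write a
time stamp with nonzero trailing digits, read it back, and see which
digits survived) can be sketched as a standalone function.  This sketch
makes assumptions the patch does not: it calls POSIX.1-2008 utimensat
directly instead of the utimens wrapper, it probes from scratch on every
call rather than caching one result per device, and
probe_mtime_resolution is a hypothetical name.  It follows symbolic links,
whereas the patch refuses to probe them.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical sketch (not from the patch).  Return the modification
   time resolution, in nanoseconds, of the file system holding PATH:
   a power of 10 no greater than 1 billion, or 2 billion if only even
   seconds are stored.  Return -1 if the probe fails.  PATH's mtime is
   changed briefly and then restored as well as possible.  */
static long
probe_mtime_resolution (const char *path)
{
  struct stat before, after;
  struct timespec ts[2];
  long res = 1;
  long ns;
  int failed;

  if (stat (path, &before) != 0)
    return -1;

  /* Keep the access time; ask for an odd second and a nanosecond
     count in which every digit is nonzero.  */
  ts[0] = before.st_atim;
  ts[1].tv_sec = before.st_mtim.tv_sec | 1;
  ts[1].tv_nsec = 111111111L;
  if (utimensat (AT_FDCWD, path, ts, 0) != 0)
    return -1;

  /* Read back what the file system actually stored.  */
  failed = stat (path, &after) != 0;

  /* Put the original modification time back, even on failure.  */
  ts[1] = before.st_mtim;
  utimensat (AT_FDCWD, path, ts, 0);
  if (failed)
    return -1;

  /* Digits that came back as zero were truncated away.  */
  ns = after.st_mtim.tv_nsec;
  if (ns == 0)
    return (after.st_mtim.tv_sec & 1) ? 1000000000L : 2000000000L;
  while (ns % 10 == 0)
    {
      ns /= 10;
      res *= 10;
    }
  return res;
}
```

The result depends on the file system holding PATH, so no particular
value should be expected.  A caller doing many comparisons would want to
remember the answer per st_dev, which is what the hash table in the patch
is for.
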



