bug-fileutils

Re: fileutils `snapshot' extension


From: Bob Proulx
Subject: Re: fileutils `snapshot' extension
Date: Tue, 15 May 2001 22:38:44 -0600

> I have run into some issues about what to do in certain cases:
> 
>   a) what should be the test for "sameness"?  It would be nice to rely
>      on stat() information, like: mode, owner, group, size, mtime.
>      What is a reasonably robust set of things to check?  I am
>      planning to provide a --pedantic option that will also compare
>      data before assuming "sameness", but for performance reasons I'd
>      rather that not be the default mode.  Advice?

The struct stat contains the following two fields, among others.

          dev_t    st_dev;       /* ID of device containing a */
                                 /* directory entry for this file */
          ino_t    st_ino;       /* Inode number */

I believe comparing only st_dev and st_ino is sufficient to tell if
two files are the same.  I would start there since that is a common
method.  However, it is still possible to fool this by accessing the
same files through multiple different paths: by multiple NFS mounts of
the same location, by both NFS and local file access, by multiple
mounts of a SAN array on the local machine, and so on.  I doubt these
scenarios are useful, but they are possible, and I doubt these obscure
false results can be avoided completely.  It is hard to make things
foolproof because fools can be so clever.

To expand on st_dev and st_ino: on any given device all inode numbers
are unique, but different devices will reuse the same inode numbers.
[For example, the root directory of a traditional Unix filesystem such
as UFS or ext2 is always inode number 2.  This is one way the root of
a filesystem is found.]  Therefore a file is identified by the pair
(st_dev, st_ino), and if either field differs the files are different.
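A minimal sketch of the st_dev/st_ino comparison in C.  same_file() is
a hypothetical helper name, not an existing fileutils function.  It
uses stat(), so symlinks are followed, and it inherits the
multiple-mount blind spots mentioned above.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if paths a and b name the same file
 * (same st_dev and same st_ino), 0 if they name different files, and
 * -1 if either stat() call fails (errno is left as stat() set it). */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    /* stat() follows symlinks, so a link and its target compare equal. */
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    /* Inode numbers are only unique within one device, so both fields
     * must match for the two paths to be the same file. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Note that two paths reaching one file through different mounts may
report different st_dev values, so a result of 0 is not proof that the
underlying data differs.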

>   b) what to do about symlinks?  I was thinking it would be useful to
>      munge symlinks that originally point into the srcdir to point
>      into the equivalent spot in targetdir instead, and leave others
>      alone.  Advice?

My own personal opinion, after thinking about this for less than a
minute, would be to ignore the fact that it is a symlink: use stat()
instead of lstat() and make the links as if it were a regular file.
For symlinks in the srcdir you will end up with symlinks pointing to
symlinks.  But that is functionally correct and still small in space,
perhaps minimal.  It also preserves logical correctness in more cases
than unwinding the symlinks would.  However, you still want to detect
symlinks in your target so that you don't recreate them when they
don't need to be recreated.
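A sketch of that target-side check, assuming the snapshot tool already
knows what text each target link should contain.  link_is_current() is
a hypothetical helper: it uses lstat() so the link itself is examined
rather than what it points to, then readlink() to compare the stored
path.  If it returns 1, the tool could skip recreating the link.

```c
#define _XOPEN_SOURCE 700   /* expose lstat(), readlink(), symlink() */
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if linkpath is already a symlink whose
 * stored contents are exactly `want`, 0 if it is missing, is not a
 * symlink, or points elsewhere, and -1 on a readlink() error or a
 * possibly truncated result. */
int link_is_current(const char *linkpath, const char *want)
{
    struct stat st;
    char buf[4096];
    ssize_t n;

    /* lstat() reports on the link itself, not on its target. */
    if (lstat(linkpath, &st) != 0 || !S_ISLNK(st.st_mode))
        return 0;

    /* readlink() does not NUL-terminate; leave room for one and treat
     * a completely filled buffer as possibly truncated. */
    n = readlink(linkpath, buf, sizeof buf - 1);
    if (n < 0 || (size_t)n >= sizeof buf - 1)
        return -1;
    buf[n] = '\0';

    return strcmp(buf, want) == 0;
}
```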

You also need to decide how to write the path to the original srcdir
target, which depends on whether the dstdir files will be accessed
only on the local machine or over NFS as well.  In either case, all
symlinks should be relative so that both remote and local accesses
will resolve.  For example, consider:

    ln -s /etc/rc.d/init.d/apache /etc/rc.d/rc3.d/S90apache

Accessing that link remotely over NFS resolves the absolute path on
the client machine doing the access, not the file on the server that
you generally expect.  For correct operation both locally and
remotely, IMNHO symlinks should be relative where possible, such as:

    ln -s ../init.d/crond /etc/rc.d/rc3.d/S90crond
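Computing such a relative path can be done lexically.  The sketch
below assumes both paths are absolute and already normalized (no "."
or ".." components, no repeated or trailing slashes) and that no
directory along the way is itself a symlink; relative_target() is a
hypothetical name.  It drops the leading directories the link and the
target share, then adds one "../" per remaining directory of the link.

```c
#include <string.h>

/* Hypothetical helper: given an absolute link path and an absolute target
 * path, both already normalized, write into out the relative path the
 * link should contain.  Returns 0 on success, -1 on bad input or if out
 * is too small. */
int relative_target(const char *linkpath, const char *target,
                    char *out, size_t outsz)
{
    const char *slash = strrchr(linkpath, '/');
    size_t dirlen, common = 0, ups = 0, used = 0, rest, i;

    if (slash == NULL || linkpath[0] != '/' || target[0] != '/')
        return -1;
    dirlen = (size_t)(slash - linkpath);  /* link's dir is linkpath[0, dirlen) */

    /* Walk the shared prefix; `common` ends just past the last '/'
     * that the link's directory and the target have in common. */
    for (i = 0; i < dirlen && linkpath[i] == target[i]; i++)
        if (linkpath[i] == '/')
            common = i + 1;
    /* The whole directory matched and the target continues below it. */
    if (i == dirlen && target[i] == '/')
        common = dirlen + 1;

    /* One "../" for each directory component left over in the link path. */
    if (common < dirlen) {
        ups = 1;
        for (i = common; i < dirlen; i++)
            if (linkpath[i] == '/')
                ups++;
    }
    for (i = 0; i < ups; i++) {
        if (used + 3 >= outsz)
            return -1;
        memcpy(out + used, "../", 3);
        used += 3;
    }

    /* Append the part of the target below the common prefix, plus NUL. */
    rest = strlen(target + common);
    if (used + rest >= outsz)
        return -1;
    memcpy(out + used, target + common, rest + 1);
    return 0;
}
```

For the crond example, relative_target("/etc/rc.d/rc3.d/S90crond",
"/etc/rc.d/init.d/crond", buf, sizeof buf) yields "../init.d/crond",
which could then be passed to symlink().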

Hope that helps.

Bob


