bug-fileutils

Re: fileutils `snapshot' extension


From: Russell Senior
Subject: Re: fileutils `snapshot' extension
Date: 27 May 2001 12:13:59 -0700

>>>>> "Jim" == Jim Meyering <address@hidden> writes:

Russell> I have deployed a partially complete version of the
Russell> `snapshot' program I described earlier in this thread.  I am
Russell> interested to know if the fileutils maintainers are
Russell> interested in this program as a contribution to the package.

Jim> Have you looked at rsync?

I have used it in the past and I just looked again, but it seems it
does something different than what I am proposing.  Certainly its
primary focus is different.

Jim> Although I'm not sure precisely what you're proposing, I suspect
Jim> it can already do some of what you want.

Maybe that's true, but it's not my impression.  If so, could you
perhaps give me some hints as to how, so that I might enhance my
enlightenment?

Jim> If you're still convinced your new functionality belongs in the
Jim> fileutils, please give us some more information:

I'll try again (sorry if I haven't been clear up to now).  Here
is the problem I am trying to solve: There is a working tree of files,
call it /working.  Periodically, I want to take a snapshot of the
state of this tree.  Because I (or some user) might accidentally
delete or modify some file and discover this only days or weeks later,
I want to maintain multiple parallel snapshots of the state of
/working somewhere that is normally accessed in a read-only way, 
say: /snapshots/2001-05-19, 
     /snapshots/2001-05-20, 
     /snapshots/2001-05-21, etc.

Since usually only a small part of /working changes from day to day,
I'd like to take advantage of any unchanged versions of files from a
previous snapshot, using hardlinks.  Now I can't make hard links
between /working and /snapshots versions because the /working versions
are still mutating, but since by convention the /snapshots versions
are _not_ mutating in place, I can use hardlinks between parallel
snapshots if the /working file hasn't changed since the last snapshot.
Any individual snapshot can be blown away without affecting any other.
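
A minimal sketch of the scheme described above, written in Python rather
than the C of the actual program, which is not shown in this message.  The
function name take_snapshot and its arguments are hypothetical, and the
test for "unchanged" (a byte-for-byte comparison against the previous
snapshot) is an assumption.  It handles regular files and directories only.

```python
import filecmp
import os
import shutil


def take_snapshot(working, target, previous=None):
    """Copy the tree `working` to `target`, hard-linking unchanged files.

    A file is hard-linked to its counterpart in `previous` (an older
    snapshot) when the two have identical contents; otherwise it is
    copied.  Sketch only: symlinks and special files are not treated.
    """
    for dirpath, _dirnames, filenames in os.walk(working):
        rel = os.path.relpath(dirpath, working)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            new = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            # Assumption: "unchanged" means byte-for-byte identical contents.
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # share storage with the older snapshot
            else:
                shutil.copy2(src, new)  # new or modified: take a fresh copy
```

Calling take_snapshot("/working", "/snapshots/2001-05-20",
previous="/snapshots/2001-05-19") would link every unchanged file to the
previous day's copy.  Links are never made to /working itself, and since
each snapshot holds its own directory entries, removing one snapshot leaves
the others intact.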

Depending (of course) on how rapidly /working is mutating, _many_
parallel snapshots can be maintained for only a little more disk space
than the original /working.  In the situation where I have deployed this, I
currently have 15 parallel snapshots over about two weeks that occupy
only about 10% more disk space total than the current /working
version.  For example:

  $ du --summ /working
  1844157 /working

  $ du --max-depth=1 /snapshots
  1793674 /snapshots/20010513
  13790   /snapshots/20010514
  4790    /snapshots/20010516
  8342    /snapshots/20010517
  48993   /snapshots/20010518
  1305    /snapshots/20010521-1200
  16875   /snapshots/20010521-2300
  10361   /snapshots/20010522-1200
  3634    /snapshots/20010522-2300
  33773   /snapshots/20010523-1200
  9157    /snapshots/20010523-2300
  80334   /snapshots/20010524-1200
  2903    /snapshots/20010524-2300
  7349    /snapshots/20010525-1200
  8161    /snapshots/20010525-2300
  2043439 /snapshots
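
For reference, the "about 10%" figure can be checked from the two totals
above.  GNU du counts a file with several hard links only once per run,
which is why the first snapshot appears to hold nearly everything and the
later ones show only what changed.  The variable names below are
illustrative; the numbers are the du totals quoted above.

```python
# Totals reported by du above, in du's block units.
working_blocks = 1844157    # du --summ /working
snapshots_blocks = 2043439  # total for all 15 snapshots under /snapshots

# Extra space used by the whole snapshot set, relative to /working.
overhead_pct = 100.0 * (snapshots_blocks - working_blocks) / working_blocks
print(f"{overhead_pct:.1f}% more than /working")  # prints 10.8% more than /working
```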

There are other ways of capturing these changes, such as incremental
tar or maybe even xdeltas (I am only vaguely familiar with them).  One nice
feature of my solution is that the snapshots are directly readable by
users by way of the regular filesystem interface.

I am aware of a thing called SnapFS, which inserts a layer into the
operating system (Linux in this case) to provide copy-on-write
semantics.  That is close to, and arguably nicer than, what I am doing.
However, my code works without modifications to the operating system,
on essentially any file system that allows hardlinking.

If you think I got my idea across effectively this time, I'll try
forwarding it to address@hidden

Jim> -------------- Here are some guidelines for contributing code to
Jim> the fileutils, textutils, and sh-utils packages.

Jim> Send patches. [...] 

I can send a patch for just the new program (about 90% complete) and
minor (possibly unacceptable) changes to other code, namely src/copy.h
and src/copy.c (to expose the internal copy_reg() function).  I'd be
happy to discuss how to do it more acceptably.  I haven't done much on
the ancillary front (texi, etc.).

Jim> On the other hand, if you're adding new features, please follow
Jim> the guidelines below: 

[...]

Jim>   - follow the guidelines in the GNU Coding Standards
Jim> (standards.info) which is distributed as part of the autoconf
Jim> package.

I patterned the code as closely as I could on what I found in src/cp.c
and friends.

Jim>   - include changes to the texinfo documentation, [...]

Not done yet.

Jim> and be sure to update the --help output.

Done.

Jim>   - finally, if the change is `significant' you'll have to send
Jim> signed copyright assignment papers to the FSF

I am happy to do this.  I've already discussed getting a disclaimer of
interest from my employer (who didn't pay me to do this, but who I
understand might have some claim on it under the law), and I expect no
problem there.

Jim> And you'll have to be patient and expect delays on my part.  It
Jim> is unusual that I spend more than a few hours per week on the
Jim> packages I maintain.

Certainly.


-- 
Russell Senior         ``The two chiefs turned to each other.        
address@hidden      Bellison uncorked a flood of horrible       
                         profanity, which, translated meant, `This is
                         extremely unusual.' ''                      


