rdiff-backup-users
Re: change timestamps of backups?


From: Patrik Dufresne
Subject: Re: change timestamps of backups?
Date: Thu, 22 Apr 2021 08:05:16 -0400

Hello griffin,

I think rdiff-backup could be a good fit for you.

1. If you want rdiff-backup to store increments efficiently, make sure your
data is not compressed. Compression scrambles the file contents, so a small
change in the data becomes a large binary difference and the increments are
not very efficient.

2. If you are using ZFS, you can configure the compression type you want for
a particular dataset (gzip, LZ4). You can probably do something similar
with Btrfs.

3. I'm wondering what the "dump" file format is. If it's a single file,
it's not optimal for rdiff-backup, since an increment will get computed on
this big file every day. Ideally, rdiff-backup works well with many smaller
files, because it can quickly detect whether each file has changed and
simply skip the increment for the unchanged ones.

4. Finally, if you want to force a particular timestamp to match your dump
file numbering, you can enforce a date when running the backup. Take a look
at `--current-time`. This way you can make the backup appear to have run
in the past or in the future, according to your needs.
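
As a rough illustration of point 4, here is a minimal Python sketch that
turns dump directory names such as 20210501 into the epoch-seconds value
that `--current-time` expects, and builds one rdiff-backup command per dump,
oldest first. The helper names (`dump_epoch`, `backup_commands`), the paths,
and the choice of midnight UTC are my assumptions, not something
rdiff-backup mandates. It also assumes the classic `rdiff-backup [options]
source destination` form; newer releases may want a `backup` action before
the paths. As far as I know the backups have to be run in chronological
order, which is why the names are sorted.

```python
import calendar
import time


def dump_epoch(dump_name: str) -> int:
    """Convert a dump name like '20210501' to epoch seconds (midnight UTC, an assumption)."""
    parsed = time.strptime(dump_name, "%Y%m%d")
    return calendar.timegm(parsed)


def backup_commands(dump_names, source_root, repo):
    """Build one rdiff-backup command per dump, oldest dump first.

    source_root and repo are hypothetical paths; adjust them to your layout.
    """
    commands = []
    for name in sorted(dump_names):  # YYYYMMDD names sort chronologically
        commands.append([
            "rdiff-backup",
            "--current-time", str(dump_epoch(name)),
            f"{source_root}/{name}",
            repo,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in backup_commands(["20210501", "20210301", "20210401"],
                               "/data/dumps/somewiki", "/backup/somewiki"):
        print(" ".join(cmd))
```

Each printed command could then be run by hand, or passed to
`subprocess.run(cmd, check=True)` once the output looks right.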





On Thu, Apr 22, 2021 at 3:45 AM griffin tucker <rdiffabkcuapbup9384@griffintucker.id.au> wrote:

> On Thu, 22 Apr 2021 at 17:38, Dominic Raferd <dominic@timedicer.co.uk> wrote:
> >
> >
> > On 22/04/2021 08:31, griffin tucker wrote:
> > > On Thu, 22 Apr 2021 at 17:17, Dominic Raferd <dominic@timedicer.co.uk> wrote:
> > >>
> > >> On 22/04/2021 08:07, Dominic Raferd wrote:
> > >>> On 22/04/2021 08:01, griffin tucker wrote:
> > >>>> I've tried using deduplication, but only get about 6gb savings per
> > >>>> 30gb.
> > >>>>
> > >>>> I intend on using squashfs on top of rdiff-backup, btrfs is just
> > >>>> being used temporarily.
> > >>>>
> > >>>> On Thu, 22 Apr 2021 at 16:41, Dominic Raferd
> > >>>> <dominic@timedicer.co.uk> wrote:
> > >>>>> On 22/04/2021 07:03, griffin tucker wrote:
> > >>>>>> i have a collection of the last 5 monthly dumps of various wikis
> > >>>>>> from dumps.wikimedia.org
> > >>>>>>
> > >>>>>> each dump has numbered directories in the format 20210501,
> > >>>>>> 20210401, 20210301, etc.
> > >>>>>>
> > >>>>>> all the filenames in these directories remain the same with each
> > >>>>>> wiki's dump, with the exception of enwiki
> > >>>>>>
> > >>>>>> other than enwiki, these range from about 30gb to about 370gb
> > >>>>>> uncompressed with each successive dump
> > >>>>>>
> > >>>>>> enwiki, the main english wikipedia, has mostly the same named
> > >>>>>> files, but has the pages-meta-history.xml file split up into
> > >>>>>> various 1-55gb compressed files (mostly 1-2gb) making a total of
> > >>>>>> about 700gb compressed (disregarding redundant files)
> > >>>>>>
> > >>>>>> i'm not sure how big enwiki is uncompressed, but could be close to
> > >>>>>> 25tb. i haven't figured out how i could make rdiff-backup more
> > >>>>>> efficient with these files, aside from a script to merge each
> > >>>>>> metahistory file into a single huge >100gb file and then running
> > >>>>>> rdiff-backup, and then splitting the file back into their separate
> > >>>>>> files with an index after restoring
> > >>>>>>
> > >>>>>> i'm using btrfs zstd:15 to store the files uncompressed, however i
> > >>>>>> don't have enough storage to store enwiki uncompressed, zstd
> > >>>>>> compression just isn't that good, even at maximum - i've used xz
> > >>>>>> compression which attains much better rates of compression for
> > >>>>>> other wikis but that isn't exactly seamless (experiments with fuse
> > >>>>>> failed)
> > >>>>>>
> > >>>>>> so, to save space, i thought i would use rdiff-backup so that it
> > >>>>>> would only store the differences between dumps, and it works very
> > >>>>>> well in initial tests, however, if i run the reverse incremental
> > >>>>>> backups one after the other today, they would be dated today,
> > >>>>>> rather than 20210501, 20210401, etc. which isn't informative
> > >>>>>>
> > >>>>>> if i could add a comment next to each datetime stamp, this would
> > >>>>>> be useful, otherwise i'll have to keep a separate index, which
> > >>>>>> isn't a huge problem, i just thought i'd ask if i could change the
> > >>>>>> datetime stamps before i write such a script
> > >>>>>>
> > >>>>>> On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <Eric@lavar.de> wrote:
> > >>>>>>> Hi Griffin,
> > >>>>>>>
> > >>>>>>> On 22/04/2021 06:39, griffin tucker wrote:
> > >>>>>>>> is there a way to change the timestamps of the backups?
> > >>>>>>> no
> > >>>>>>>
> > >>>>>>>> or perhaps replace the timestamps with a unique name?
> > >>>>>>> no
> > >>>>>>>
> > >>>>>>>> would this cause a faulty restore or a damaged backup?
> > >>>>>>> yes, rdiff-backup makes a lot of date/time comparisons so the
> > >>>>>>> timestamp is meaningful.
> > >>>>>>>
> > >>>>>>> What are you trying to do?
> > >>>>>>>
> > >>>>>>> KR, Eric
> > >>>>> Since you are already using btrfs, have you considered using
> > >>>>> deduplication? Likely to work better if you store uncompressed.
> > >>>>>
> > >>> In your scenario I would expect deduplication to give big savings if
> > >>> you store uncompressed. If not, YMMV. (I tried with rdiff-backup on
> > >>> btrfs + deduplication a few years ago but found it all a bit scary
> > >>> and retreated to ext4.)
> > >> To clarify, I mean turning off compression within rdiff-backup, and
> > >> instead using compression (+deduplication) at fs level.
> > > well, i suppose i was using windows server's dedupe in that 6gb per
> > > 30gb savings, maybe i should try again with btrfs' dedupe
> > >
> > > come to think of it, dedupe seems to be already enabled which would
> > > explain <5 second copies for hundreds of gigabytes, but i can't get
> > > the dedupe status when i run:
> > >
> > > btrfs dedupe status <mountpoint>
> > >
> > > with an error
> > >
> > > btrfs: unknown token 'dedupe'
> > >
> > > i'll investigate this further
> > Another option is to use ZFS, Patrik wrote about it here:
> >
> > https://www.ikus-soft.com/en/blog/2020-07-22-configure-zfs-for-rdiff-backup/
> i'm reluctant to use zfs because linus torvalds said not to
>
>

-- 
IKUS Software inc.
https://www.ikus-soft.com/
514-971-6442
130 rue Doris
St-Colomban, QC J5K 1T9

