info-cvs

Re: Severe speed problems with binary files


From: Larry Jones
Subject: Re: Severe speed problems with binary files
Date: Wed, 13 Apr 2005 14:40:48 -0400 (EDT)

Russ Sherk writes:
> 
> How many revisions of the example file are there?  cvs speed may be
> affected adversely by a large number of revisions of a binary file.

There doesn't need to be a large number of revisions; there just has to
be a large number of differences, and that's almost certainly the
problem.  Although CVS can store binary files, it wasn't designed to,
and it handles them poorly because its line-oriented diff algorithm
usually produces very large deltas on binary data.  In your case,
you've got five distinct revisions of the file (1.1 and 1.1.1.1 should
be identical) and the repository file is nearly five times the size of
the working file, which indicates that the diff algorithm is saving
almost nothing and that lots of work will be required to regenerate
"old" revisions.
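
To make the size argument concrete, here is a rough Python sketch that
measures how many bytes a line-based diff would have to store verbatim.
It is only an illustration: difflib is not the diff that CVS/RCS uses,
the helper names are made up, and the random bytes are an assumption
standing in for a binary file whose encoding changes throughout from
one revision to the next.

```python
import difflib
import random


def split_lines(data: bytes) -> list:
    """Split on b'\\n' only, keeping the terminator on each line."""
    parts = data.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])  # trailing data without a newline
    return lines


def line_delta_size(old: bytes, new: bytes) -> int:
    """Bytes of `new` that a line-based delta must store verbatim."""
    old_lines = split_lines(old)
    new_lines = split_lines(new)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines,
                                      autojunk=False)
    stored = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            stored += sum(len(line) for line in new_lines[j1:j2])
    return stored


# Text file: 1000 lines, one of them edited.
text_old = b"".join(b"line %d\n" % i for i in range(1000))
text_new = text_old.replace(b"line 500\n", b"line 500 changed\n")

# Stand-in for a binary file (assumption: every byte differs between
# revisions, as tends to happen with compressed formats).
rng_old, rng_new = random.Random(1), random.Random(2)
bin_old = bytes(rng_old.getrandbits(8) for _ in range(20000))
bin_new = bytes(rng_new.getrandbits(8) for _ in range(20000))

text_delta = line_delta_size(text_old, text_new)
bin_delta = line_delta_size(bin_old, bin_new)
print("text:  ", text_delta, "of", len(text_new), "bytes stored")
print("binary:", bin_delta, "of", len(bin_new), "bytes stored")
```

When each delta is about as large as the file itself, five revisions
cost roughly five times the working file, which matches what you are
seeing in the repository.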

For those who don't know, the RCS file format (which is what CVS uses)
stores the "most recent" revision intact; all other revisions are
stored as sets of differences from their base revisions.  To recreate a
particular revision, you retrieve the most recent revision and then
apply the sets of differences to create each successive intermediate
revision until you finally get the revision you want.  The theory is
that the most recent revision is the one you want most often, so
retrieving it should be as fast as possible, while retrieving other
revisions can take longer.  This generally works well as long as you're
working on the trunk (since the "most recent" revision is defined as the
head of the trunk), but breaks down as soon as you start working on
branches, since the head of the branch can be far removed from the head
of the trunk.
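
A toy model of that scheme may help.  The Python sketch below is not
the real RCS file format; the Archive class, make_delta, and
apply_delta are hypothetical names used only for illustration.  It
keeps one revision intact, stores every other revision as a delta
against its base, and reports how many deltas a checkout has to apply.

```python
import difflib


def make_delta(base, target):
    """Describe `target` (list of lines) as operations against `base`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse lines from base
        elif tag in ("replace", "insert"):
            ops.append(("add", target[j1:j2]))  # store new lines verbatim
        # "delete" needs no operation: the base lines are simply skipped
    return ops


def apply_delta(base, ops):
    """Rebuild a revision from its base and a delta from make_delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class Archive:
    """Toy archive: the head is stored intact, the rest as deltas."""

    def __init__(self, head_rev, head_lines):
        self.head_rev = head_rev
        self.head_lines = list(head_lines)
        self.deltas = {}  # rev -> (base_rev, ops)

    def add_delta(self, rev, base_rev, lines):
        base, _ = self.checkout(base_rev)
        self.deltas[rev] = (base_rev, make_delta(base, lines))

    def checkout(self, rev):
        """Return (lines, number of deltas applied)."""
        chain = []
        while rev != self.head_rev:
            base_rev, ops = self.deltas[rev]
            chain.append(ops)
            rev = base_rev
        lines = self.head_lines
        for ops in reversed(chain):  # start at the head, walk to `rev`
            lines = apply_delta(lines, ops)
        return list(lines), len(chain)


# Revision numbers from this thread: trunk head 1.1, vendor branch
# 1.1.1.1 (identical to 1.1) through 1.1.1.5.  Contents are made up.
archive = Archive("1.1", ["v1\n"])
prev = "1.1"
for n in range(1, 6):
    rev = "1.1.1.%d" % n
    content = ["v1\n"] if n == 1 else ["release %d\n" % n]
    archive.add_delta(rev, prev, content)
    prev = rev

head_lines, head_cost = archive.checkout("1.1")
lines, applied = archive.checkout("1.1.1.5")
print("1.1:", head_cost, "deltas;", "1.1.1.5:", applied, "deltas")
```

In this model the trunk head costs nothing to retrieve, while the tip
of the branch costs one delta per branch revision.  With large binary
deltas, each of those steps is expensive.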

That is the situation you are in.  You're working on a branch (albeit a
vendor branch), so your head revision (1.1.1.5) is four (large) sets of
changes away from the head of the trunk.  If you try checking out
revisions 1.1.1.1, 1.1.1.2, 1.1.1.3, 1.1.1.4, and 1.1.1.5, you'll
undoubtedly find that each successive revision takes proportionally
longer to check out.  There are a few things you can do to improve
the situation:

1) Don't store large binary files in CVS.

2) If you insist on storing large binary files in CVS, keep them on the
   trunk rather than in branches.  For files on a vendor branch, you can
   force a commit to the trunk at the cost of making the repository file
   even larger and making old vendor releases more expensive to retrieve.

3) Rewrite CVS to better handle binary files.

-Larry Jones

Can I take an ax to school tomorrow for ... um ... show and tell? -- Calvin



