Re: cvs server/client protocol inefficiency and privacy violations

From: Bruno Haible
Subject: Re: cvs server/client protocol inefficiency and privacy violations
Date: Tue, 30 Jan 2007 15:14:24 +0100
User-agent: KMail/1.9.1

Hello Mark,

Thanks for the answers.

> >   2. what are the possible values for the variable, and their respective
> >      effects,
> >   3. what is the default value.)
>     --verify[=(off | warn | fatal)] | --no-verify

Thanks. It was this info that I expected to find in the doc.

> > Look at this too: ChangeLog.modified is not in the CVS, but the cvs program
> > needs 29 seconds to detect this:
> >
> > $ time cvs --no-verify update ChangeLog.modified
> > cvs update: use `cvs add' to create an entry for `ChangeLog.modified'
> >
> > real    0m29.219s
> > ...
> > $ time cvs --no-verify status ChangeLog.modified
> > cvs status: use `cvs add' to create an entry for `ChangeLog.modified'
> > ===================================================================
> > File: ChangeLog.modified        Status: Unknown
> >
> >    Working revision:    No entry for ChangeLog.modified
> >    Repository revision: No revision control file
> >
> >
> > real    0m29.202s
> > ...
> >
> > This means that before the cvs client asks the cvs server whether the file
> > exists at all on the server side, it sends the complete locally created
> > file to the server!!
> Yes, that does appear to be the case. To be honest, I was not aware of
> that side-effect of the 'cvs status' command.

Probably the effect on network bandwidth is not so big when people update
an entire directory, because of the '.#*' cached files. But the effect
on privacy is there in any case.

> If you wish to see what is actually being transmitted to the server, use
> the CVS_CLIENT_LOG environment variable set to a name like
> /tmp/my-cvs-test and after the cvs command is executed the files
> /tmp/my-cvs-test.in (data sent to the server) and /tmp/my-cvs-test.out
> (data returned from the server) will have been created.

Nice! It confirms my guesses: Modified files are sent entirely to the server
during "cvs update", if they are either "cvs add"ed or explicitly mentioned
on the command line. The code that does this is send_modified() in client.c.
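
For anyone who wants to audit such a log, here is a minimal sketch in Python
that lists the files uploaded through "Modified" requests in a
CVS_CLIENT_LOG ".in" file. It is not part of cvs, and the function name
list_uploaded_files is made up for illustration. It assumes the file
transmission layout as I understand it from the protocol documentation: a
"Modified <name>" line, a mode line, a length line (prefixed with "z" when
the contents are gzip-compressed), then that many bytes of data. It also
assumes the log itself was written without stream compression.

```python
import gzip
import io


def list_uploaded_files(log_bytes):
    """Return (name, size, contents) for each 'Modified' request in the log.

    Assumed layout per request (an assumption, see the text above):
        Modified <name>\n
        <mode>\n
        [z]<length>\n
        <length bytes of file data>
    """
    stream = io.BytesIO(log_bytes)
    uploads = []
    while True:
        line = stream.readline()
        if not line:
            break  # end of log
        if not line.startswith(b"Modified "):
            continue  # some other request, e.g. "Entry ..." or "Directory ..."
        name = line[len(b"Modified "):].rstrip(b"\n").decode("utf-8", "replace")
        stream.readline()  # mode line such as "u=rw,g=r,o=r"; not needed here
        size_line = stream.readline().strip()
        compressed = size_line.startswith(b"z")
        size = int(size_line[1:] if compressed else size_line)
        data = stream.read(size)  # skip over the contents so they are not parsed as requests
        if compressed:
            data = gzip.decompress(data)
        uploads.append((name, len(data), data))
    return uploads


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for name, size, _ in list_uploaded_files(f.read()):
            print(f"{name}: {size} bytes sent to the server")
```

Saved under a name of your choice and run with the path of the ".in" file
as its only argument, it prints one line per uploaded file.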

> > And what about privacy? Did I give "cvs" permissions to send files from
> > my hard disk over the network?
> This seems to be what the 'cvs status' command assumes. The 'cvs update'
> command does not send hello.txt to the server

Sure it does: In the .in file I find

Entry /ChangeLog/1.1093///
Modified ChangeLog
<and then comes the modified file's contents>

> ... unless you have done a 'cvs add ChangeLog.modified' first

It doesn't do it when I do a "cvs update" without specifying filenames, but
it does when I specify the filename on the command line:

$ echo my-creditcard-number > hello.txt
$ CVS_CLIENT_LOG=/tmp/cvsdata5 cvs update hello.txt 
cvs update: use `cvs add' to create an entry for `hello.txt'
$ grep credit /tmp/cvsdata5.in 

> > Sadly, my conclusion for today is that the cvs client/server protocol
> > needs a complete redesign if it wants to
> >   - match today's expectations regarding privacy,
> Yes, changes to the 'cvs status' command are desirable.

It's also "cvs update", see above.

> >   - minimize network bandwidth, esp. uploads,
> There may be those who might like to specify which direction needs to be
> minimized for those folks operating cvs servers on the end of an ADSL
> pipe from their home. However, CVS is still a lot less bandwidth than
> some of the 'newer' protocols out there. It was designed at the time
> when a 56K link was a big deal and a T1 was ruinously expensive.

However, CVS was designed at a time when disk space was even more expensive
than network bandwidth. Nowadays it's customary to cache not only the
original unmodified versions of the files locally on disk, but the entire
repository with the full history log.

> In fact, more than one opensource SCM system/project exists to replace
> CVS (and/or CVSNT).
>    - GNU arch
>    - OpenCVS (the OpenBSD project to avoid the security concerns
>      and GPL issues they do not like when using CVS).
>    - git
>    - monotone
>    - svn

It seems that git makes better choices regarding privacy, network upload
bandwidth, and server load.

> For now, there are a lot of legacy users using CVS and it is desirable
> to do what we can to improve it. Sadly, the nature of maintaining a
> large legacy installed base in a compatible and portable manner means
> that a complete redesign is not as likely as trying to introduce gradual
> incremental improvements.

Yes, that's unfortunately true. However, you could implement "cvs update"
in a way that pulls the unmodified contents from the server, rather than
pushing the modified files to the server. (I am even told that some
proprietary CVS client implementation does this.) This does not require
changes on the server, only in the client.
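
A smaller client-only step in the same direction would be to consult
CVS/Entries before uploading a file that was named on the command line.
The following is a minimal sketch in Python, not taken from the cvs
sources; the function names parse_entries and split_named_files are
hypothetical. It assumes the usual CVS/Entries layout, where file lines
look like "/name/revision/timestamp/options/tagdate" and directory lines
start with "D". It covers only the decision of which files have an entry
at all, not the pull-based merge itself.

```python
def parse_entries(text):
    """Return {filename: revision} for the file lines of a CVS/Entries file."""
    entries = {}
    for line in text.splitlines():
        if not line.startswith("/"):
            continue  # skip "D/dir////" directory lines and the bare "D" marker
        fields = line.split("/")
        if len(fields) >= 3 and fields[1]:
            entries[fields[1]] = fields[2]
    return entries


def split_named_files(entries, named_files):
    """Split files named on the command line into (known, unknown).

    'known' files have an entry, so the server has something to compare with.
    'unknown' files have no entry; the client could report them locally
    ("use `cvs add' to create an entry") without sending their contents.
    """
    known = [name for name in named_files if name in entries]
    unknown = [name for name in named_files if name not in entries]
    return known, unknown
```

With the example from above, ChangeLog would land in the first list and
hello.txt in the second, so hello.txt would never leave the local disk.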

