Subject: more cvs performance questions (I think they are at least interesting though!)
Date: Tue, 28 Oct 2003 18:36:09 -0800 (PST)
I have a few more questions related to performance. Some may be a bit out-of-the-box, but please bear with me!
We are running cvs-1.11. I did migrate us to 1.11.9, but it turned out that it does not mesh with Eclipse, which is what our developers use. The latest version Eclipse can work with is 1.11.6. From what I read, that one has its own problems, so 1.11.5 would be the latest we could use.
Our server machine is a Solaris 8, 2 processor box, 2GB RAM, 28GB disk,
900 MHz. This machine is dedicated to cvs. The only other things on it or hitting it are an LDAP server, bugzilla and viewcvs.
Our repository sits on a NetApp slice (just a big, beefy disk) that is NFS mounted to our server. This is a production-level NFS mount and there are NO other mounts. We originally did this in the interest of speed: we had 4-minute checkouts on a local repository, 36 seconds on the NFS mount.
I know there are NFS/CVS issues, but I have spoken to this list about them, and the conclusion was that with a production-level NFS server we will almost certainly not have any problems. And we haven't. We've been running like this for over a year now. Our problem, now that so many projects and users have been added, is performance.
We now can have as many as 77 concurrent cvs processes going. That is excessive and very rare, but it did happen when an 8MB XML file was checked in as ASCII, which causes a diff to be made for each and every update command on it. It was then re-checked in as binary and that took care of that.
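For anyone hitting the same thing: the usual recipe for a file that was mistakenly added as text is to flip its keyword-expansion mode to binary and re-commit. A hedged sketch (the filename is hypothetical; run it from a working copy):

```shell
# Mark the file binary (-kb) so CVS stops diffing and keyword-expanding it
cvs admin -kb big.xml
# Refresh the working file under the new mode, then commit a clean binary copy
cvs update -A big.xml
cvs commit -m "treat big.xml as binary" big.xml
```

After this, updates transfer the file whole instead of computing an 8MB text diff per client.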
But normally, we can have 3 branching processes going at once on one project, along with numerous updates, checkouts, etc. against the same project, while various other projects are doing the same against their own. I'd say 36 cvs processes going at once isn't a stretch. So, given this scenario:
Should cvs even be able to handle this kind of load? To some of us, it's amazing and a credit to cvs that this thing hasn't crashed already. But, to avoid a crash, when we did the metrics and saw our percentages on CPU, context switching, kernel time, etc., and especially the load average (46), we shut down inetd, waited for some cvs processes to complete and the load to drop to 10, and then started inetd back up.
a) should we be splitting up our repository and giving each project their own?
b) is there a way to limit the number of pserver calls made at any one time?
c) Should we be going to a 4x4 machine rather than our current 2x2?
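On (b): as far as I know CVS itself has no built-in cap on pserver sessions, but since inetd spawns each one, a small gate script in front of the cvs binary could refuse connections past a threshold. This is only a sketch under assumptions: the limit, the paths, and the ps pattern are all made up, not CVS features.

```shell
#!/bin/sh
# Hypothetical gate script: point the inetd.conf entry at this instead of
# /usr/local/bin/cvs. Everything here is an assumption, not a CVS feature.
MAX=40                                          # cap on concurrent pserver sessions
running=$(ps -ef | grep -c '[c]vs .*pserver')   # [c] keeps grep from matching itself
if [ "$running" -ge "$MAX" ]; then
    echo "cvs server busy, try again shortly" >&2
    exit 1                                      # drop this connection; the client retries
fi
# Under the limit: hand the connection to the real server, e.g.
#   exec /usr/local/bin/cvs -f --allow-root=/usr/cvsroot pserver
echo "under limit: $running of $MAX sessions in use"
```

Clients turned away would see a connection that closes immediately, which is abrupt, but it beats a load average of 46 taking everyone down.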
Context switching seems to be excessive, especially when we have more than 2 or 3 cvs ops running together. In the mornings, it's hitting as much as 12K per second, which is definitely a killer on a 2-processor system.
a) Is this normal?
b) Is cvs set up with a ping parameter or some kind of "am I alive" setting that fires every 1, 2 or 5 seconds? If so, can it be reset?
Is there any kind of performance bug where just a few processes take up a lot of CPU, especially branch commands? We were getting CPU time readings of 41 on one sub-branch process.
In the doc, I read about setting LockDir=directory in CVSROOT/config, where I assume I create my own dir in the repository (LockDir=TempLockFiles).
We DO NOT have this set yet, but I think I might like to try it for speed's sake. All our developers need write access to the repository, but the doc states:
"It can also be used to put the locks on a very fast in-memory file system to speed up locking and unlocking the repository."
a) Just what is an in-memory file system?
b) Is speed garnered because all the lock files are in one directory and cvs does not need to traverse the project repository?
c) Is the speed increase significant?
d) Will there be any problems with having lock files from multiple different projects in the repository flooding this same directory?
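On (a): an in-memory file system is one backed by RAM rather than disk; on Solaris that is tmpfs. A hedged sketch of wiring LockDir to one (the mount point, size, and lock path are assumptions, not anything the doc prescribes):

```shell
# Create a RAM-backed mount point for the locks (Solaris tmpfs; "swap"
# is the pseudo-device tmpfs mounts use).
mkdir /var/cvslocks
mount -F tmpfs -o size=64m swap /var/cvslocks

# Then, in $CVSROOT/CVSROOT/config:
#   LockDir=/var/cvslocks
```

Note that tmpfs contents vanish on reboot, which is fine for lock files but means the directory itself must be recreated (or the mount put in /etc/vfstab) before cvs runs again.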
The way we are currently set up, if I need to search for errant locks, I can go to the project where I know they exist and do a find for them. In the LockDir case, we are going to have lock files from multiple different projects all in one dir. It appears from the statement "You need to create directory, but CVS will create subdirectories of directory as it needs them" that the full path is still used, correct? (So it would still be an easy search?)
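On the searching point, a quick demo with throwaway paths (/tmp/demo-locks and the project names are made up) showing that one find over a lock tree still yields full per-project paths, since every CVS lock artifact is named `#cvs.*':

```shell
# Stand-in for a LockDir tree: CVS mirrors the repository's directory
# layout underneath it, so each lock's path still names its project.
mkdir -p /tmp/demo-locks/PROJ1/src /tmp/demo-locks/PROJ2/lib
touch '/tmp/demo-locks/PROJ1/src/#cvs.rfl.host.1234' \
      '/tmp/demo-locks/PROJ2/lib/#cvs.lock'
# One pass finds every stray lock, regardless of project:
find /tmp/demo-locks -name '#cvs.*' -print
```

So the search is arguably easier with LockDir, since a single find over one tree covers all projects at once.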
I then read the linked section, 10.5 "Several developers simultaneously attempting to run CVS", that goes along with LockDir.
It begins by stating that cvs will check every 30 seconds to see if it still needs to wait for a lock.
e) Any chance this is a parameter that can be decreased or would its checking more often just create more overhead and slow things down?
In the end, it states that if someone runs `cvs ci a/two.c b/three.c' and someone else runs `cvs update' at the same time, the person running update might get only the change to `b/three.c' and not the change to `a/two.c'.
f) I assume this does not relate only to when LockDir is set. This is the case period, correct?
The developers do have to communicate a bit. But I guess that's also why we have 77 developers running updates all the time.
Is it possible/feasible to have multiple pserver sessions, each with its own port, each going to the same repository but one level past that, so that each serves its own project? (It wouldn't be two repositories, though it might look like it, because only one init was ever done.) Would having each project on its own port help performance?
2401 stream tcp nowait root /usr/local/bin/cvs cvs -f --allow-root=/usr/cvsroot/PROJ1 pserver
2402 stream tcp nowait root /usr/local/bin/cvs cvs -f --allow-root=/usr/cvsroot/PROJ2 pserver
Or, switching that around, would there be any benefit to having two repositories and connecting both of them to one pserver?
Finally, but not related to performance:
If a cvs command is killed uncleanly, by a crash or by a kill -9, it can leave errant locks. I know how to search for and remove errant locks to get going again.
a) But, does this also corrupt the project repository you were working on?
b) If so, how can one find what was corrupted and are there steps one can take to refresh or update to the last uncorrupted file?
c) Or, do you just have to revert to the last backup taken?
If you revert to the last backup, and that backup ran at noon, you might have a developer who was mid-checkin when it occurred. You would again have errant locks; again, easy enough to remove. But what, if anything, needs to be done to get back to the last valid check-in of that file? Does deleting the lock keep any half-written changes from registering in the ,v files or elsewhere, and automatically give us back the uncorrupted revision from before the ci halted? Or are there steps one can or must take to find such file(s) and recover the last uncorrupted revision?
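My own (hedged) understanding is that a stale lock does not by itself damage a ,v file: CVS writes the new revision to a temporary file and renames it into place, so a killed commit either fully landed or left the old file intact. To double-check RCS files touched around a crash, each can be parsed with rlog, assuming the RCS tools are installed on the server; the project path and time window below are just this site's examples:

```shell
# Parse the header of every ,v file modified in the last day; rlog exits
# nonzero on a file it cannot parse, flagging it for a closer look.
find /usr/cvsroot/PROJ1 -name '*,v' -mtime -1 | while read f; do
    rlog -h "$f" > /dev/null || echo "possibly damaged: $f"
done
```

I would still welcome correction from the list if there are crash cases where the rename trick does not hold.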
Thank you very much for your time. I hope these questions are at least somewhat interesting. I tried to do some extensive research first. Also, I wasn't sure whether to ask them in separate emails or not; this just seemed more efficient.