Re: findin sloc changes between two tags

From: yeti
Subject: Re: findin sloc changes between two tags
Date: Tue, 19 Feb 2008 00:35:16 -0800 (PST)
User-agent: G2/1.0

On Feb 19, 1:06 pm, Paul Sander <address@hidden> wrote:
> On Feb 18, 2008, at 8:40 PM, yeti wrote:
> > On Feb 19, 4:38 am, Paul Sander <address@hidden> wrote:
> >> For this particular metric, I usually run the two versions through a
> >> beautifier with standard settings, then diff the output of that.
> >> On Feb 18, 2008, at 10:17 AM, Rick Genter wrote:
> >>>> From: address@hidden
> >>>> [mailto:address@hidden
> >>>> On Behalf Of Ted Stern
> >>>> But that regexp handles only C++ comments.  I don't know of a way to
> >>>> recognize /* ... [text containing newlines] ... */.  Possibly another
> >>>> diff utility has that option (xxdiff, tkdiff?).
> >>> You could write an awk or perl script to filter the multiline comments
> >>> out, save the output to a file, then diff those files. I, however,
> >>> consider comments to be equally (or even more) important to
> >>> non-comments in source code, and don't understand the use case.
> > Hi guys,
> > Thanks for all those answers. I however thought that this would be a
> > fairly common problem and there might be a standard solution. Keeping
> > your suggestions in mind I did
> >   cvs diff -wlcbBC20 -r rev1 -r rev2 my_file.c \
> >     | perl -0777 -pe 's{/\*.*?\*/}{}gs' \
> >     | diffstat >> FileToHoldInfo.txt
> > The idea is to get enough context lines, then eliminate the comments
> > from the diff output, and finally use diffstat to gather stats. Do you
> > think this is the correct way?
> I think that this method will work only if the comments are  
> completely enclosed within the context displayed by the diff  
> program.  It will fail (i.e., produce incorrect output), for example,  
> if a short sentence is added to the end of a 50-line comment.  Or to  
> the beginning of one.  Or to the middle of a 100-line comment.  It  
> also fails if someone arbitrarily inserts or removes newlines in the  
> code itself.
> This is where beautifiers such as the "indent" program come in.  It
> normalizes the format of the source code based on the syntax of the
> programming language and policies specified on its command line.  It
> leaves comments in place, so additional filtering (like your Perl
> one-liner above) might be necessary.
> After the two versions have been reduced to standard formats, you can
> apply the diff program with minimal arguments.  Its output can be
> used to count the number of lines inserted, deleted, and changed.

Yes, you are right. I'm assuming that most comments span at most 20
lines, though one can just as well use -C50 to handle comments up to 50
lines long, and so on. The regexp can also be modified to remove blank
lines. But now I have detected another problem :-(
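
For comparison, the whole-file approach suggested in the quoted message (filter each version first, then diff the filtered text) could be sketched roughly as below. This is only an illustration in Python, not what was actually run here: the function names are made up for the example, the comment patterns are naive (they do not understand string literals, so a "/*" inside a quoted string would be treated as a comment), and indentation is not normalized the way a beautifier such as "indent" would do it.

```python
import difflib
import re

# Naive patterns, assumed good enough for illustration only: they do not
# parse string literals, so "/*" or "//" inside a string is misread.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def strip_comments_and_blanks(source):
    """Return the non-blank, comment-free lines of C-like source text."""
    source = BLOCK_COMMENT.sub("", source)
    source = LINE_COMMENT.sub("", source)
    # Drop blank lines and trailing whitespace; indentation is kept as-is.
    return [line.rstrip() for line in source.splitlines() if line.strip()]


def count_changes(old_source, new_source):
    """Count added and removed code lines between two whole-file versions."""
    old_lines = strip_comments_and_blanks(old_source)
    new_lines = strip_comments_and_blanks(new_source)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed
```

Because the comments are removed from the complete files before any diffing, a line added in the middle of a long comment no longer depends on how many context lines the diff happens to show.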

If I check out two different versions of the file and run Unix diff on
them, the results are very different from those obtained by running cvs
diff on the two revisions. cvs diff shows 256 modifications (!) in
the code when there are no modifications at all. There are about 700
additions (+), but cvs diff shows only 424 (+). I think cvs diff
is confusing some additions with modifications. Unix diff on the
files, however, gives correct results.

Why is cvs diff showing incorrect results? Is this a known
problem? If so, are there any workarounds for it?
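
One possible contributor, not confirmed anywhere in this thread, is the output format rather than cvs itself. The cvs command above asks for context format (-c / -C20), where lines in a block that both removes and inserts text are marked "!", while a plain diff reports the same block as removed lines plus added lines. The 424 (+) and 256 (!) add up to 680, which is at least close to the "about 700" additions. The sketch below only illustrates how one and the same change can be tallied both ways. The function name and the counting rule (new-side lines of a replaced block count as "modified") are assumptions for the example and are not claimed to match diffstat exactly.

```python
import difflib


def tally(old_lines, new_lines):
    """Tally one comparison in two reporting styles.

    context_style: blocks that both remove and insert lines go into a
                   separate "modified" bucket (hypothetical counting rule).
    plain_style:   the same blocks count as removals plus additions.
    """
    context_style = {"added": 0, "removed": 0, "modified": 0}
    plain_style = {"added": 0, "removed": 0}
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            context_style["added"] += j2 - j1
            plain_style["added"] += j2 - j1
        elif tag == "delete":
            context_style["removed"] += i2 - i1
            plain_style["removed"] += i2 - i1
        elif tag == "replace":
            # Assumption: count the new-side lines of the block as modified.
            context_style["modified"] += j2 - j1
            plain_style["removed"] += i2 - i1
            plain_style["added"] += j2 - j1
    return context_style, plain_style
```

For example, turning the lines a, b, c into a, x, y, c, d gives one added and two modified lines in the first style, but three added and one removed line in the second. Comparing both tools with the same format and the same whitespace options (-w, -b, -B) would show whether the difference is only one of presentation.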


