Re: dependency caching
From: Steven Knight
Subject: Re: dependency caching
Date: Tue, 31 Jul 2001 21:51:19 -0500 (CDT)
> I mean that when dependencies are computed for a file, the results are
> stored in either .consign or a new .depend file. When another instance
> of cons needs them, it first checks whether the file has changed since
> dependencies were computed,
How would you want to determine this? Timestamp + MD5 signature of
contents, or some other way?
> and uses the saved versions if possible.
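One possible answer to the timestamp-vs-MD5 question, sketched in Python rather than Cons' own Perl: use the timestamp as a cheap first test and fall back to an MD5 signature of the contents only when the timestamp differs. The function names and the "equal timestamp means unchanged" policy are assumptions for illustration, not existing Cons code.

```python
import hashlib
import os

def content_signature(path):
    """Return the hex MD5 digest of a file's contents (hypothetical helper)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded in one piece.
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return md5.hexdigest()

def is_unchanged(path, saved_mtime, saved_md5):
    """Assumed policy: an identical timestamp is trusted as 'unchanged';
    otherwise the content signature decides, so a file that was merely
    touched (new timestamp, same bytes) still counts as unchanged."""
    if int(os.path.getmtime(path)) == saved_mtime:
        return True
    return content_signature(path) == saved_md5
```

The MD5 fallback means that touching a file without editing it does not force a rescan, at the cost of reading the file once more in that case.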
Someone did actually profile Cons to determine whether recalculating
dependencies was a bottleneck, but that was long ago. At the time, the
results reportedly showed that the time spent scanning files for
dependencies was insignificant relative to the total processing time,
so there was no compelling reason to complicate Cons' code by adding
dependency caching.
The Cons code has changed significantly since then, though, so it would
be useful if someone profiled the current code to find out, with some
hard data, where the bottlenecks are today. If it looks like dependency
scanning *is* a bottleneck, it would also be good to break down whether
this bottleneck is due to the scan itself (the loop-and-regex to find
all of the #include lines), or due to opening-reading-closing the file a
second time.
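The breakdown suggested above can be measured by timing the file read and the #include scan separately. This is a Python sketch of the measurement idea only; the regex is an assumed approximation of an #include scanner, not the pattern Cons actually uses.

```python
import re
import time

# Assumed approximation of an #include scanner; captures the raw token
# including its delimiters, e.g. '"foo.h"' or '<stdio.h>'.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]+([<"][^>"\n]+[>"])',
                        re.MULTILINE)

def profile_scan(paths):
    """Return (seconds spent reading files, seconds spent regex-scanning)."""
    read_time = scan_time = 0.0
    for path in paths:
        t0 = time.perf_counter()
        with open(path, "r", errors="replace") as f:
            text = f.read()                 # open-read-close cost
        t1 = time.perf_counter()
        INCLUDE_RE.findall(text)            # loop-and-regex cost (result unused)
        t2 = time.perf_counter()
        read_time += t1 - t0
        scan_time += t2 - t1
    return read_time, scan_time
```

Comparing the two totals over a real source tree shows which half of the work dominates.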
If it's the latter, a less complicated performance improvement might be
to cache the contents of the file in memory, so that we could calculate
the MD5 checksum of the file *and* scan the contents for dependencies
without going out to disk twice.
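A minimal sketch of that in-memory approach, assuming a hypothetical helper `signature_and_includes()`: the file is read once, and both the MD5 signature and the raw #include targets come from the same buffer. The regex is again an assumed approximation of an #include scanner.

```python
import hashlib
import re

# Assumed #include pattern, applied to raw bytes.
INCLUDE_RE = re.compile(rb'^[ \t]*#[ \t]*include[ \t]+([<"][^>"\n]+[>"])',
                        re.MULTILINE)

def signature_and_includes(path):
    """Read the file once; derive both the content signature and the raw
    #include tokens from the same in-memory buffer."""
    with open(path, "rb") as f:
        data = f.read()                      # the only disk access
    signature = hashlib.md5(data).hexdigest()
    includes = [m.decode("latin-1") for m in INCLUDE_RE.findall(data)]
    return signature, includes
```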
If, however, it turns out that caching dependencies really is a win,
keep in mind that dependencies can change for an important reason
other than file contents: CPPPATH. You want to recompile if CPPPATH
changes out from under you such that you #include a different copy of a
.h file...
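To see why unchanged file contents are not enough, consider the resolved header paths: the same raw include can map to a different file when CPPPATH is reordered or extended. The sketch below (hypothetical `resolve()` and `needs_rebuild()`, searching only along CPPPATH) compares freshly resolved paths against previously saved ones.

```python
import os

def resolve(header, cpppath):
    """Return the path of header in the first cpppath directory that has it."""
    for d in cpppath:
        candidate = os.path.join(d, header)
        if os.path.isfile(candidate):
            return candidate
    return None

def needs_rebuild(raw_headers, cpppath, previous_resolved):
    """The raw #include list may be unchanged, yet a different CPPPATH can
    select a different copy of a header, so compare resolved paths."""
    current = [resolve(h, cpppath) for h in raw_headers]
    return current != previous_resolved
```

Because resolution depends on the current CPPPATH, it has to be redone on each run even when the raw list comes from a cache.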
Consequently, I might start with something like a separate .depend file
that lists timestamp, content signature, and the *raw* dependencies it
found:
foo.c: 1234567890 fedcba0987654321 "foo.h" <stdio.h>
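Reading and writing one entry in that format could look like this. The helper names are hypothetical, and the sketch assumes file names contain no whitespace and no ": " sequence.

```python
def format_depend_line(source, mtime, signature, raw_includes):
    """Serialize one entry as 'file: mtime signature dep1 dep2 ...'."""
    return " ".join([f"{source}:", str(mtime), signature, *raw_includes])

def parse_depend_line(line):
    """Inverse of format_depend_line; raw include tokens keep their
    quote or bracket delimiters."""
    source, rest = line.rstrip("\n").split(": ", 1)
    fields = rest.split()
    return source, int(fields[0]), fields[1], fields[2:]
```

For example, parsing the line above yields `("foo.c", 1234567890, "fedcba0987654321", ['"foo.h"', '<stdio.h>'])`. The signature field is treated as an opaque string, so it works for a full 32-character MD5 hex digest as well as the shorter value in the example.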
I think you'll need these rather than the resolved path names because
the CPPPATH semantics of quote-includes vs. bracket-includes are
different. But that's after thinking about it for a whole five minutes,
so take it with a grain of salt.
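The quote-vs-bracket difference can be illustrated with the convention most C preprocessors follow: a quoted include is looked up first in the including file's directory and then along the search path, while a bracketed include uses the search path only. Whether Cons applies exactly these rules to CPPPATH is an assumption here, and `resolve_include()` is a hypothetical name.

```python
import os

def resolve_include(raw, including_file, cpppath):
    """Resolve a raw token such as '"foo.h"' or '<stdio.h>'.

    Assumed convention: quoted -> including file's directory, then cpppath;
    bracketed -> cpppath only.
    """
    name = raw[1:-1]                      # strip the delimiters
    if raw.startswith('"'):
        search = [os.path.dirname(including_file) or "."] + list(cpppath)
    else:
        search = list(cpppath)
    for d in search:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # not found, e.g. a system header outside cpppath
```

With `foo.h` present both next to `foo.c` and in an include directory, `"foo.h"` and `<foo.h>` resolve to different files, which is why the cached entry keeps the raw token with its delimiters.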
Don't let these caveats dissuade you from finding out if this is a win
and trying it out. I'd just suggest doing enough investigation first to
make sure that the problem you solve really is the bottleneck that will
give you the significant performance improvement you want in exchange
for the time you'll spend implementing it.
--SK