[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug #40639] GNU Make with profiling information
Re: [bug #40639] GNU Make with profiling information
Tue, 14 Jan 2014 15:49:45 +0000
To some, using a spreadsheet might not seem like the most worthwhile
way to visualise timing information.
If it was me, I'd be far more concerned about whether I could write a
script that could easily cope with all this information. Builds with
hundreds of thousands of targets were common for me at one point and
nowadays I do android stuff - how much is that? I think it's
somewhere around 36,000.
This scale makes spreadsheets relatively unimportant as an analysis
tool and makes it necessary to pass information through a script to
first extract or summarise the information to a level where humans can
deal with it.
a) an absolute start time and
b) a duration
...are easy to process in scripts to reconstruct whatever form one
needs - a spreadsheet for you and a different kind of special graph
for me. Both examples of a profiling feature for make that I'm aware
of already use this format to good effect.
It's also worth trying to produce these figures as each job finishes
and then throw them away because then the build doesn't have to finish
before one is able to process the data. You might use it, for example
to provide progress information. e.g. if you keep information from a
previous build and combine it with profiling information coming out of
the new build you can guess how long is left.
On 14 January 2014 13:48, Eddy Petrișor <address@hidden> wrote:
> 2014/1/12 Paul Smith <address@hidden>:
>> On Wed, 2013-12-18 at 13:28 +0200, Eddy Petrișor wrote:
>>> Could you please confirm if the general direction of the the is OK in
>>> the latest patch I sent?
>> Conceptually it seems OK. I'm still not jazzed about having any more
>> than one output format, and I'd prefer that format to be in a
>> more-or-less readable form, more like the "long" form than the others.
> If that will be the only format, then it would mean always imposing a lot
> more more work on the information processing stage due to necessary
> filtering and transformations to fit a format accepted or easily parsed in a
> tool such as Oocalc, Gnumeric or Excel.
> Although human readable seems nice and understandable, it is the least
> machine parseable, hence my choice for the 'simple' format to be default.
>> I think the output should go in the standard make output format, so
>> something like:
>> make[<LVL>]: <target>: <details...>
>> Or, alternatively:
>> <target>[<LVL>]: <details>
>> Also I think it's enough to show the start offset and the elapsed time.
>> End offset is not necessary IMO.
> Unfortunately, depending on the used tools, when processing the information
> the end time is necessary. For instance, in Microsoft Excel the only way to
> display graphs for intervals is via a graph designed for visualising stocks'
> variation (and it even forces the insertion of an extra field).
> Oocalc is fine with start and stop, but for the graphs look awful and are
> unusable with absolute values (it scales so the entire 0-timestamp interval
> is visible, so a difference of a few seconds is invisible on the graph), so
> relative values are better here.
> OTOH, relative time stamps or just durations are useful for a human eye
> examination, since is easy to spot offenders that way.
> These are three different scenarios which I myself encountered and had in
> mind when designing the code the way I did.
>> I'm unsure about the PID. This is the pid of the make process so I'm
>> not sure what the goal is. Is it to be able to collect all the times
>> together maybe?
> The goal is to be able to:
> - spot targets evaluated redundantly in a recursive makefile in different
> processes but on the same recursion level (these targets could be candidates
> to be moved in a parent make invocation)
> - spot wasteful/needless recursive calls (e.g. several targets called in
> different make processes when they could be grouped in a single call)
> - be able to analyse a single make invocation or a call sub-tree
>> Is it necessary to dump all the output times at the end? Doing so
>> requires that we increase the size of the file structure to hold the
>> information, and this is already large AND the most common structure in
>> memory; there's one for every single target which for non-recursive
>> builds can get really big. I'm trying to keep memory usage under
>> If instead of that we print the information after each target is
>> complete we can shift the storage of this information out of the file
>> structure and into the commands structure or similar. To me it seems
>> more useful to keep the elapsed time info right next to the command
>> output rather than dumping it all at the end.
> I'm afraid I am missing some details of the implementation so I can't answer
> that question in any meaningful way.
> I will have to look into the code, but if a single target does NOT have
> multiple commands structures, it should work.
> Any pointers to the appropriate code area or suggestions would be welcome.
>> Some other comments:
>> 1. In general remember that GNU make code must conform to ANSI
>> C89 / ISO C90. We shouldn't be using newer features of the
>> language or runtime library unless we need to, and most of those
>> require some kind of autoconf test.
> I'm sure you had some specific code in mind when you wrote this. I assumed
> the build system would have the appropriate compiler options for the desired
> compliance level. Should I compile with '-std=c99 -pedantic-errors' to
> check, or do you have other options in mind?
>> 2. Let's avoid float and double (and struct timeval). There's no
>> reason why we can't fit enough precision in a uint32 to count
>> elapsed time in milliseconds for a build: that gives 50 days or
>> so. GNU make still supports running on systems where there is
>> no floating point support (see the NO_FLOAT #define). Although
>> I haven't tested it in a while.
> I was aware of this since your first email. I wanted to know of the general
> idea is OK.
> I will change this, too.
>> 3. The use of "$" tokens in printf() statements is likely
>> problematic from a portability standpoint. It seems like this
>> should be relatively easy to avoid.
> I'll change them into simple format specifiers, in spite of the repetitions.
>> 4. If the printed string contains text then it needs to be marked
>> for translation (with the "_(...)" macro).
> Since the profiling info should be machine parsable, I think the only
> translatable string would be the long format. I will change this, although
> the inconsistency rubs me the wrong way.
>> 5. We don't want to be using fprintf() here. All output needs to
>> go through the output.c module, so that it's properly managed
>> via output sync.
> I was aware of the output sync issue, I will look into it.
>> 6. gettimeofday is not portable. Also, it's not really the best
>> option for timing things because (due to NTP etc.) it can change
>> (by that I mean that if it reports 10s elapsed it might not be
> Hmm, didn't thought of that.
>> that 10s really elapsed). Using a monotonic clock is better,
>> although that's also not portable. But if we have to be
>> non-portable maybe we should try to get an accurate accounting.
>> On the gripping hand maybe it's not that important to be
>> absolutely accurate.
> Is there a portable timing API that has us resolution? Not sure if ms
> resolution is enough given the speed of current system and Moore's Law
> predicting even faster systems.
>> You mentioned something about trying to send the start time through the
>> environment but I don't see any code to that effect here; how were you
>> doing that?
> As said in my previous mail, I wanted to avoid confusion regarding what code
> to review.
> In reply to your newest mail:
>> Sorry, I was not clear: I wasn't suggesting that it would be better to
>> display the absolute time for the start time. I was wondering why we
>> display the start time at all. Why not just show the elapsed time, and
>> nothing else? That would avoid all of these issues.
>> However Tim makes a reasonable point in his response so if it can be
>> done without too much difficulty it would be good to show a relative
>> start time.
> Also, the start time is necessary to be able to see which target was waiting
> for which in a graph. This way one can see a target starting later than it
> should logically start, making it a good avenue for performance improvement
> Eddy Petrișor
> Bug-make mailing list
You could help some brave and decent people to have access to
uncensored news by making a donation at: