Re: Request for comments: CONS specification


From: H. S. Teoh
Subject: Re: Request for comments: CONS specification
Date: Thu, 27 May 2004 20:59:44 -0700
User-agent: Mutt/1.5.6i

On Fri, May 28, 2004 at 02:19:01AM +0200, Pierre THIERRY wrote:
[...]
> > AFAIK, SCons is essentially the same as Cons, with a few additional
> > features.
> 
> OK. Could someone using SCons list those features?

I'll let somebody else answer that, but AFAIK, one of the features SCons
has that Cons doesn't is parallel building. The main barrier to adding
this to the current version of Cons is that its monolithic design makes
it very difficult to do without a major overhaul of the code. (I believe
this is one of the big reasons we wanted to rewrite Cons, isn't it? :-))


> > Does that mean that the build tool should automatically iterate over
> > such a process until a fixed point is reached? This may or may not be
> > desirable [...] Alternatively this could be made configurable, so that
> > it would do this iteration only for "complex steps" like LaTeX.
> 
> Yes, it would be nonsense to trigger every build, assembly and link
> twice. I think iteration should have four modes: none, fixed, limited
> and infinite. The latter is dangerous since, as you said, some TeX files
> can change endlessly, and that is already the case when there is
> timestamping. Fixed mode should iterate a given number of times, and
> limited would iterate until stability or a maximum number of times
> (maybe issuing a warning if the limit is reached).

Perhaps a better way would be to only run everything once, unless the user
says to make sure the target is at a fixed point, in which case Cons would
re-run troublesome operations like TeX multiple times until a fixed point
is reached (or a given limit is reached, like you suggested). The reason
for this is that when you're doing development, you don't really care that
all the cross-references in the PostScript are consistent; you just want
Cons to run TeX one more time to update the file relative to the previous
run. This is the reason TeX behaves the way it does in the first place.

Only when you want to do a final shipping build, you want to make sure
everything is consistent. So it should be OK to just run TeX once during
development, and only run it in "full" mode when you want a final build
where everything must be consistent.
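
To make the four proposed modes concrete, here is a minimal Python sketch
of how such an iteration policy could look. It is not taken from Cons or
SCons: the names run_until_stable and fake_tex, and the idea that each run
returns a "signature" of its outputs, are assumptions made purely for
illustration.

```python
from typing import Callable, Iterable, Tuple

MODES = ("none", "fixed", "limited", "infinite")


def run_until_stable(step: Callable[[], str], mode: str = "none",
                     limit: int = 5) -> Tuple[int, bool]:
    """Run `step` under one of the four modes discussed in the thread.

    `step` performs one run (e.g. one TeX pass) and returns a signature of
    its outputs. Returns (number of runs, whether the last two runs agreed).
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    previous = step()          # every mode runs the step at least once
    runs, stable = 1, False
    while True:
        if mode == "none":
            break              # development build: one run is enough
        if mode in ("fixed", "limited") and runs >= limit:
            break              # run budget exhausted
        if mode in ("limited", "infinite") and stable:
            break              # fixed point reached
        current = step()
        runs += 1
        stable = current == previous
        previous = current
    return runs, stable


def fake_tex(signatures: Iterable[str]) -> Callable[[], str]:
    """Hypothetical stand-in for TeX: returns each signature in turn,
    then keeps returning the last one (i.e. the output has settled)."""
    sigs = list(signatures)
    state = {"i": 0}

    def step() -> str:
        sig = sigs[min(state["i"], len(sigs) - 1)]
        state["i"] += 1
        return sig

    return step
```

In "limited" mode a caller would issue the suggested warning when the
second element of the result is False. Note that "infinite" really can
loop forever if the outputs never settle, which is the danger mentioned
above.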


[...]
> > How deep should the source file guessing algorithm go?
> 
> In my understanding, each file type should have attached ancestor
> definitions (these could be simple strings, regex substitutions,
> functions or even external software calls). CONS would recursively
> build a graph of the possible ancestors of the files it has been given.

This could possibly be cached for speed in the .consign file (or some
other startup file), so that Cons doesn't have to calculate these graphs
every single time.


> The first present ancestor found in this graph would be used as the
> source file.

I guess my point is, there could be multiple sources of a particular file
type. For example, Flex generates a .c and .h from a .l file, and Bison
generates a .c and .h from a .y file. They can also be configured to
produce C++ source files, too. Additionally, a .o file might be produced
from a .cc file instead of a .c file. So given a target of, say, main.o,
Cons would have the following graph: 

              main.o
             /      \
        main.c       main.cc
         / \          /    \
    main.l  main.y  main.l  main.y

The problem now is that Cons wouldn't know whether main.c would be
produced or main.cc should be produced, unless the user specified it
explicitly.

Another problem is that *many* possible source file types are compiled
into .o files, and there may be a large graph of possible sources for each
.o file specified by the user. This might slow down Cons, if it has to
check through a long list of possibilities each time, plus resolve
ambiguities in the graph each time.

Or perhaps I'm just expecting too much... maybe we can force each file
type to only ever produce a fixed set of outputs, so Flex .l files will
be assumed to only produce .c unless the user explicitly overrides this.

[...]
> CONS goes through the graph until it finds a way to build each given
> file. This would give the following dependency graph:
[...]
> Then, CONS would also scan the existing files for dependencies. The .l
> scanner could tell that a hello.h is needed to build main.s:

Probably we don't need to worry about the .s file? It's probably safe to
assume that for the common case, we'll always compile a .c file into .o
"directly".

I think maybe we shouldn't worry too much about the complicated cases; the
guessing model after all is just a syntactic shortcut for the most common
build requirements. If the user needs to do something special, he should
just explicitly specify what he wants.

[...]
> In this guessing model, there remains a question that occurred to me
> with this ".l" format: it could, with some preprocessors, be necessary
> to scan result files for dependencies, thus having a dependency graph
> that changes during the build process.

Good point. I think it might make sense to refine the dependency graph
*during* the build process. I.e., from your example, Cons would know that
it needs to build scanner.o from scanner.c, and scanner.c from scanner.l.
After it runs Flex on scanner.l, it scans scanner.c and finds out that it
depends on helper.h and scanner.h, so it refines the dependency graph to
build helper.h (if needed) and scanner.h (but scanner.h has already been
built by this time). Then it can build scanner.o and the rest of the
targets.

The other possible solution is for the .l file type definition to state
explicitly that the resulting .c file will always have a dependency on
some corresponding .h file, so that if Cons finds out that it needs to
generate scanner.c from scanner.l, it will know to add scanner.h to the
dependency list for scanner.c. This way it doesn't have to wait until
build time to figure it out.
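
As a rough illustration of the first option (refining the graph while
building), here is a small Python simulation. The RULES and SCAN tables,
the helper.tmpl source, and the build function are all made up for the
example; nothing is actually executed, the function only records what it
would run. It does no cycle detection.

```python
# Hypothetical rules: (products, sources). One rule may yield several files.
RULES = [
    (("scanner.c", "scanner.h"), ("scanner.l",)),
    (("scanner.o",), ("scanner.c",)),
    (("helper.h",), ("helper.tmpl",)),
]

# What a scanner would find in a file once that file exists on disk.
SCAN = {"scanner.c": ["helper.h", "scanner.h"]}


def build(target, rules, scan, done, log):
    """Build `target`, scanning each source as soon as it exists and
    pulling in any dependencies that only become visible at that point."""
    if target in done:
        return
    rule = next((r for r in rules if target in r[0]), None)
    if rule is None:
        done.add(target)          # plain source file, nothing to run
        return
    products, sources = rule
    for src in sources:
        build(src, rules, scan, done, log)
        # Refinement step: src now exists, so it can be scanned.
        for dep in scan.get(src, []):
            if dep not in done:
                log.append(f"refine: {target} also needs {dep}")
                build(dep, rules, scan, done, log)
    log.append("run: " + " ".join(sources) + " -> " + " ".join(products))
    done.update(products)
```

Calling build("scanner.o", RULES, SCAN, set(), log) with an empty list for
log records the Flex step first, then discovers helper.h only after
scanner.c exists; scanner.h needs no extra work because the Flex rule
already produced it.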


> It has two drawbacks: the analysis of the dependencies can be inaccurate
> without building, and so displaying the actions without actually
> building can also be inaccurate. But the situations where this problem
> can arise seem to me to be fairly rare, and are easy to bypass by
> giving the hidden dependencies explicitly (you just lose the benefit of
> CONS guessing everything for you).
>
> CONS should also warn the user if it finds new dependencies while
> building. (In fact, it should not search for them in languages where
> the problem is impossible, like C/C++ without preprocessors.)

Would it be worthwhile for Cons to do .c scanning during the build phase,
though? Maybe it's good enough to have the file type definition state any
special dependencies needed, and if something unexpected happens, the user
is responsible. I don't think there is a general solution to this unless
you do build-time dependency guessing. For instance, the user could define
a special file type called .rnd which is processed by a script that
randomly generates different dependencies each time; then there would be
no way to know what the "real" dependencies are until you actually start
the build.


[...]
> > I'm not so sure about automatically deducing class/function
> > information from source files. That could make the initial scanning
> > phase really slow.
> 
> This will be the user's choice. If he's lazy and considers the scanning
> time bearable (and it's not certain it would be really slow...), he lets
> CONS do its job; otherwise he can just provide dependencies, like he
> used to do...

The problem is that for many programming languages, deducing
class/function dependencies amounts to parsing the source file(s) and
building a symbol table --- things which should be done by the compiler,
not by Cons. Scanning for dependencies is acceptable because you don't
have to actually parse everything, just parse enough to know what files
the current file depends on.
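
To show how little parsing a dependency scan needs, here is a sketch of a
C include scanner in Python. The function name scan_includes is made up,
and the regex is deliberately naive: it only looks at quoted #include
lines and knows nothing about comments, conditionals, or macros.

```python
import re

# Matches lines such as:   #include "scanner.h"   or   # include "x.h"
# Angle-bracket (system) includes are intentionally ignored.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*"([^"]+)"', re.MULTILINE)


def scan_includes(text):
    """Return the quoted include names found in C source `text`, in order."""
    return INCLUDE_RE.findall(text)


SAMPLE = (
    '#include <stdio.h>\n'
    '#include "scanner.h"\n'
    '  # include "helper.h"\n'
    'int main(void) { return 0; }\n'
)
```

Here scan_includes(SAMPLE) yields scanner.h and helper.h without building
any symbol table, which is the distinction being drawn above.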

[...]
> > How about the interface for specifying new file types?
> 
> There should be multiple ways, from the simplest to the most flexible.
> A filetype could be defined with a filename list (makefiles), a suffix
> (most of the cases) or even a regex, function or program call (like the
> 'file' utility in UNIX).
> 
> Separately, steps would be defined with the types of their result and
> source files. e.g.:
> 
> - ( xml + xslt => xml )      = xslt processor
> - ( source     => assembly ) = assembler
> - ( object     => program )  = linker

I like this approach much better. You could even specify which file
extensions to look for, e.g.:

        FileType "XML"
                sources  => '%<.xslt %<.xml',
                products => '%>.xml',
                command  => '$XSLT_PROC %< %>';
        FileType "Flex scanner"
                sources  => '%<.l',
                products => '%>.c %>.h',
                command  => 'flex %<.l -o%>.c';

etc.

Which brings up an interesting point about filetype guessing: both the
input and output formats of an XSLT process are .xml; it's probably
impossible to automatically guess what inputs are needed to produce, say,
output.xml, unless there's a general rule for Cons to deduce that
"output.xml" is created by running Xalan on "input.xml" and "input.xslt".
For that matter, you might have a single source.xml that produces
different outputs through different stylesheets, e.g. source.xml +
export.xslt => dbdump.xml, and source.xml + report.xslt => summary.html,
etc.

Probably one of those cases where we need the user to specify what he
wants explicitly.
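
A small Python sketch of where suffix-based guessing works and where it
has to give up. The FileType class below only mirrors the sources and
products fields of the example declarations; the same-stem guessing rule,
the sources_for function, and the "explicit" table are assumptions for
illustration, not a proposed Cons interface.

```python
import os
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FileType:
    name: str
    sources: Tuple[str, ...]    # source suffixes
    products: Tuple[str, ...]   # product suffixes


FILETYPES = [
    FileType("XML", (".xslt", ".xml"), (".xml",)),
    FileType("Flex scanner", (".l",), (".c", ".h")),
]


def sources_for(target, filetypes, explicit=None):
    """Guess the sources of `target` by suffix, assuming the same stem.

    Explicit user-supplied sources always win. A rule whose product suffix
    is also one of its source suffixes cannot be guessed this way, because
    output.xml would appear to depend on itself.
    """
    explicit = explicit or {}
    if target in explicit:
        return list(explicit[target])
    stem, suffix = os.path.splitext(target)
    for ft in filetypes:
        if suffix not in ft.products:
            continue
        if suffix in ft.sources:
            raise LookupError(
                f"cannot guess sources of {target} ({ft.name}); "
                "please specify them explicitly")
        return [stem + s for s in ft.sources]
    return []
```

So sources_for("scanner.c", FILETYPES) can answer scanner.l on its own,
while dbdump.xml needs an entry such as
{"dbdump.xml": ("source.xml", "export.xslt")} in the explicit table.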


[...]
> The goal is always the same: the less information the user has to give
> CONS, the better. Because when the computer can do it, the user
> should not have to...

Unless it's not possible for the computer to guess the user's intention,
in which case the user *should* be able to specify the missing information
without needing to specify *everything* explicitly from scratch. E.g.,
just because Cons can't figure out if scanner.c depends on scanner.h,
doesn't mean that the user now needs to specify how to create scanner
(executable) from scanner.o and scanner.o from scanner.c and scanner.h.
The user should be able to just supply the missing info (scanner.c has
extra dependency on scanner.h), and Cons should be able to figure out the
rest.
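
In code, "supply only the missing info" could amount to overlaying a small
user table on top of whatever was inferred. A minimal Python sketch, with
the function name and the dictionary layout invented for the example:

```python
def merge_dependencies(inferred, extra):
    """Return inferred dependencies plus the user's additions.

    Both arguments map a target to a list of dependencies. The inputs are
    left untouched; duplicates are not added twice.
    """
    merged = {target: list(deps) for target, deps in inferred.items()}
    for target, deps in extra.items():
        current = merged.setdefault(target, [])
        for dep in deps:
            if dep not in current:
                current.append(dep)
    return merged


# What the tool guessed by itself (hypothetical).
INFERRED = {
    "scanner": ["scanner.o"],
    "scanner.o": ["scanner.c"],
    "scanner.c": ["scanner.l"],
}

# The one fact the user has to state.
EXTRA = {"scanner.c": ["scanner.h"]}
```

merge_dependencies(INFERRED, EXTRA) keeps the guessed chain from scanner
down to scanner.l and only adds scanner.h to the entry for scanner.c.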


> In fact, with this spec, we could, IMHO, say CONS will be something like
> an expert agent more than a simple build tool.
[...]

Sure, we can try. :-)  There are some things that just can't be solved
without the user telling us what to do, though. E.g. the XML example
above.


T

-- 
It's amazing how careful choice of punctuation can leave you hanging:
