Re: [Gm2] On the gm2 build process and dependencies


From: Gaius Mulley
Subject: Re: [Gm2] On the gm2 build process and dependencies
Date: Wed, 20 Jun 2012 18:28:25 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Martin Hofmann <address@hidden> writes:

> I've been trying to get my head around what gm2 actually does to build
> a program, what gets linked in and why ...
>
> Apart from understanding what goes on, my interest is to tailor the
> building of executables - from "all statically linked in" to "as much
> dynamically linked in as possible".
>
> I've written up what I've collected so far, as a reminder to
> myself and maybe a reference for discussion. Some questions are
> included in the last section below.
>
> There are for sure errors and omissions in the following description, so
> I'd be glad if someone "in the know" would point them out ...
>
> Thank you for any comments!
>
> Regards
> Martin

Hi Martin,

a good description - would be great to place this into the gm2 internals
section of the documentation.

>
> *************
>
>
> Build process of gm2 Modula-2
> =============================
>
> What follows is a step-by-step description of the insides of building
> a simple "hello world" type program. This is mostly derived from the
> output of the `-v` option during a run of gm2.
>
> 1  Compile modules to assembly files
> ------------------------------------
>
>     cc1gm2: hello.mod --> hello.s
>
> This is the "real" compilation step:
>
> Apart from the symbols for defined Modula-2 variables and procedures of
> the module, the `hello.s` will also contain function entry points
>
>     _M2_Hello_init
>     _M2_Hello_finish
>
> which make the module's initialization and termination code callable
> from outside.
>
> Furthermore, a reference to
>
>     __gxx_personality_v0
>
> is included because a pointer to this function is placed into a table
> which is used for exception handling.

all true.

>
> 2  Assemble modules into object files
> -------------------------------------
>
>     as: hello.s --> helloprog.o
>
> Nothing special here, this is done by all GCC compilers.
>
> But note that the object file for the main program module is named
> `helloprog.o` instead of `hello.o`. (This is of course only relevant if
> non-temporary files are created via the `-save-temps` option to gm2.)

yes, this is because hello.o is the C++ scaffold object (generated by
compiling hello.cpp, see below) and helloprog.o is the main program
module.

> 3  Collect a list of all modules to be initialized
> --------------------------------------------------
>
>     gm2l: hello.mod --> hello.l
>
> This starts from the main program module and collects all modules which
> are directly or indirectly imported - including the standard library
> modules.
>
> It does scan `.def` and `.mod` files, as long as it finds them.
>
> QUESTION: What if a module `A` imports a module `B` in its
> implementation module, but the source `A.mod` is not available?

if it cannot find A.mod then its imports are unknown.  Each import will
refer to a module.o - it is likely that the remaining .def and .mod
files would pull in A's imports.  But not necessarily so.  In which case
the programmer wishing to ship partial source solutions and object files
could either provide a modulelist (see linking options) or provide a
pseudo A.mod which just has the imports.

> The resulting text file `hello.l` basically contains one name per line
> (without the `.mod` or `.def` extension).

indeed this is the machine generated modulelist.

> Source is `gm2build/gcc/gm2/gm2l.mod`.

gm2l will topologically order the module imports - but it cannot
solve cyclic imports obviously.
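
For readers who want to see the idea spelled out, here is a minimal
sketch of a topological ordering over an import graph, with cyclic
imports reported as a failure.  It is not taken from gm2l.mod; the
names `ImportGraph`, `visit` and `initOrder` are made up for this
illustration, and the real tool may well use a different algorithm.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical import graph: module name -> modules it imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

enum class Mark { InProgress, Done };

// Depth-first visit: a module is appended only after everything it
// imports. Returns false as soon as a cyclic import is found.
static bool visit(const ImportGraph& g, const std::string& m,
                  std::map<std::string, Mark>& marks,
                  std::vector<std::string>& order) {
    auto seen = marks.find(m);
    if (seen != marks.end())
        return seen->second == Mark::Done;  // InProgress here means a cycle
    marks[m] = Mark::InProgress;
    auto deps = g.find(m);
    if (deps != g.end())
        for (const auto& d : deps->second)
            if (!visit(g, d, marks, order)) return false;
    marks[m] = Mark::Done;
    order.push_back(m);
    return true;
}

// Appends an initialisation order to `order`, starting from the main
// program module. Returns false if the imports are cyclic.
bool initOrder(const ImportGraph& g, const std::string& mainModule,
               std::vector<std::string>& order) {
    std::map<std::string, Mark> marks;
    return visit(g, mainModule, marks, order);
}
```

With a graph where hello imports StrIO and M2RTS, and StrIO imports
M2RTS, the resulting order is M2RTS, StrIO, hello.  Two modules which
import each other make `initOrder` return false.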

>
> 3  Determine a possible initialization order
> --------------------------------------------
>
>     gm2lorder: hello.l --> hello.lst
>
> This reorders the list of modules so that the most "basic" module, which
> does not depend on any other module, comes first.

indeed.  This file hello.lst can be replaced by a hand written version
if necessary.

> QUESTION: Or does it only care about run-time system modules to be
> first, and the other modules are ordered by the search algorithm of
> `gm2l`?

it ensures that the core system modules are initialised first.
See -fruntime-modules=

which specifies, using a comma separated list, the runtime modules and
their order.  These modules will be initialised first, before any other
modules in the application dependency.  By default the runtime modules
list is set to Storage,SYSTEM,M2RTS,RTExceptions,IOLink.  Note that
these modules will only be linked into your executable if they are
required.  So adding a long list of dependent modules will not affect
the size of the executable; it merely states the initialisation order
should they be required.
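
The effect of the runtime module list can be sketched as follows.
This is only an illustration of the behaviour described above, not
code from gm2lorder.mod; the function name `runtimeFirst` is
hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Moves the modules named in `runtime` to the front, in the order given
// by `runtime`, but only those which actually occur in `modules`.
// All other modules keep their relative order.
std::vector<std::string> runtimeFirst(const std::vector<std::string>& modules,
                                      const std::vector<std::string>& runtime) {
    std::vector<std::string> out;
    for (const auto& r : runtime)
        if (std::find(modules.begin(), modules.end(), r) != modules.end())
            out.push_back(r);  // required runtime module: goes first
    for (const auto& m : modules)
        if (std::find(runtime.begin(), runtime.end(), m) == runtime.end())
            out.push_back(m);  // ordinary module: keeps its place
    return out;
}
```

Given the module list StrIO, M2RTS, SYSTEM, hello and the default
runtime list, this yields SYSTEM, M2RTS, StrIO, hello.  Storage,
RTExceptions and IOLink are left out because nothing required them.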

> Source is `gm2-compiler/gm2lorder.mod`.
>
>
> 4  Generate a scaffolding main program
> --------------------------------------
>
>     gm2lgen: hello.lst --> hello.cpp
>
> This generates a short C++ program which:
>
> a) provides the entry point function `main()` for C++-like program
>    startup (all following actions take place during execution of this
>    main function),
>
> b) calls all module initialization functions in the given order,
>
> c) calls the program module initialization function last - this starts
>    the Modula-2 program itself,
>
> d) calls all module finalization functions (in reverse order) after the
>    Modula-2 program module returns,
>
> e) catches any exceptions thrown (or "raised") during all this
>    initialization and finalization business, and prints an appropriate
>    final message if so (via `RTExceptions_DefaultErrorCatch()`),
>
> f) or else returns 0 in case of normal termination (via `exit(0)`).
>
> The sequence of function calls to the outside of `hello.cpp` is thus
> like this:
>
>     _M2_Storage_init (argc, argv);
>     _M2_SYSTEM_init (argc, argv);
>     _M2_M2RTS_init (argc, argv);
>     _M2_RTExceptions_init (argc, argv);
>     _M2_IOLink_init (argc, argv);
>     // ... other init functions ...
>     M2RTS_ExecuteInitialProcedures (); /* sic, no '_' prefix? */

yes this is because M2RTS_ExecuteInitialProcedures is a real procedure
exported from M2RTS rather than a runtime support procedure such as
_M2_foobar_init.  A user could call ExecuteInitialProcedures if desired.


>     _M2_hello_init (argc, argv);
>
>     _M2RTS_ExecuteTerminationProcedures ();

(without the _)
  M2RTS_ExecuteTerminationProcedures ();

>     _M2_hello_finish ();
>     // ... other finish functions, reverse order ...
>     _M2_Storage_finish ();
>
> Source is `gm2-compiler/gm2lgen.mod`.

yes indeed.
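
To make the call order easy to check, here is a small stand-alone
sketch of the control flow of the scaffold.  It does not call the real
entry points; it only records the symbol names in the order the
generated hello.cpp would call them.  `trace`, `call` and
`scaffoldMain` are hypothetical names, and the exit status 1 in the
catch branch is an assumption (the thread does not say what the real
scaffold returns there).

```cpp
#include <string>
#include <vector>

// Records which entry point would be called. In the generated hello.cpp
// these are real calls to external symbols such as _M2_Storage_init.
static std::vector<std::string> trace;
static void call(const std::string& symbol) { trace.push_back(symbol); }

// Hypothetical stand-in for the generated main().
// `imported` is the content of hello.lst without the program module.
int scaffoldMain(const std::vector<std::string>& imported,
                 const std::string& program) {
    try {
        for (const auto& m : imported)
            call("_M2_" + m + "_init");
        call("M2RTS_ExecuteInitialProcedures");
        call("_M2_" + program + "_init");  // runs the Modula-2 program body
        call("M2RTS_ExecuteTerminationProcedures");
        call("_M2_" + program + "_finish");
        for (auto it = imported.rbegin(); it != imported.rend(); ++it)
            call("_M2_" + *it + "_finish");  // reverse order
    } catch (...) {
        call("RTExceptions_DefaultErrorCatch");
        return 1;  // assumed status, not taken from gm2lgen
    }
    return 0;
}
```

Calling `scaffoldMain({"Storage", "SYSTEM", "M2RTS"}, "hello")` leaves
ten entries in `trace`, starting with `_M2_Storage_init` and ending
with `_M2_Storage_finish`.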

> 5  Compile the scaffolding main program into an object file
> -----------------------------------------------------------
>
>     gm2cc: hello.cpp --> hellostart.o
>
> This uses the C++ compiler `cc1plus` (the `gm2cc` generates the command
> line). The resulting object is again not named `hello.o` but
> `hellostart.o`.
>
> Of course, this is done again in two steps, compilation and assembly.
>
> QUESTION: Where does `gm2cc` come from?

it is just a copy of 'gcc' from the 4.1.2 tree.  Called gm2cc to avoid
any name clash.

>
> 6  Packing the object files into a library
> ------------------------------------------
>
>     gm2lcc: hello.lst helloprog.o hellostart.o --> hello.a
>
> I'm not quite sure why this is done, but this seems to collect all the
> imported modules and the two modules generated from the program module
> into one static library.

yes this is done primarily to work with lang-specs.h so we can tell the
final linker to link one hello.o and hello.a.  It also allows gm2 to
easily produce a .so from the same method that produces the hello.a.

> Source is `gm2-compiler/gm2lcc.mod`.
>
>
> 7  Linking it all together
> --------------------------
>
> This uses `collect2` (as a disguised `ld` command?) to link the stuff in
> the static library with the run-time support objects and libraries, and
> also the required Modula-2 libraries.
>
> Here is the command line for `collect2`, with comments interspersed:
>
>
> /usr/home/mh/opt/bin/../libexec/gcc/i386-unknown-freebsd9.0/4.1.2/collect2
>     -V -dynamic-linker /libexec/ld-elf.so.1
>     -o hello
>
> - Now the objects and libs to include:
>
>       /usr/lib/crt1.o
>
> - `crt1.o` provides the real entry point into the executable, sets up
>   `argc`/`argv`, calls `main()` function (and `_init`, `_fini`?)
>
>       /usr/lib/crti.o
>
> - `crti.o` defines sections `.init` and `.fini`, which contain the
>   prologue (initial part) of the `_init` and `_fini` functions
>   respectively. QUESTION: Where are `_init` and `_fini` called? I think
>   in `crt1.o`?
>
>   NOTE: These two objects correspond to the system's `libc`, that's why
>   they come from `/usr/lib`.
>
>       /usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2/crtbegin.o
>
> - `crtbegin.o` starts lists of constructors/destructors for global C++
>   objects (`__CTOR_LIST__` and `__DTOR_LIST__`), starts sections
>   `.ctors` and `.dtors`. (`collect2` arranges for a list of ctors and
>   dtors to be placed in these sections.)
>
>   NOTE: This is concerned with C++, and thus taken from gm2's
>   installation.
>
>       -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2/gm2/iso
>       -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2/gm2/pim
>       -L/usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2
>       -L/usr/home/mh/opt/bin/../lib/gcc
>       -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2
>
>       -L/usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2/../../..
>       -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2/../../..
>
> - Lib paths for gm2 and C++ libs.
>
>       hello.a
>
> - Objects for the program's modules, except libraries.
>
>       -lgm2iso
>       -lgm2
>
> - Modula-2 libraries - the ISO library needs the basic library.
>
>       -lm
>
> - Math C lib - Modula-2 numerics are implemented on top of it.
>
>       -lstdc++
>
> - For the sake of `hello.cpp`, the scaffolding program, the C++ library
>   is needed ...
>
>       -lgcc_eh
>
> - Provides `_Unwind_RaiseException` and other ABI EH functions, also
>   `__gcc_personality_v0`, the personality function used for C code
>   (the C++ one, `__gxx_personality_v0`, comes from `libstdc++`).
>
>       -lgcc_s
>
> - Also provides `_Unwind_RaiseException` and other EH stuff, plus low-
>   level arithmetic functions like in `libgcc`, plus threading ...
>
>       -lgcc
>
> - Low-level arithmetic functions to emulate architecture's missing
>   capabilities. (Also provides a `__main` function which is called at
>   the start of a C++ `main()` function, this function calls all the
>   constructors listed in `__CTOR_LIST__`.)
>
>       -lc
>
> - The C lib (from the system, eg `/usr/lib`).
>
>       -lgcc_s
>       -lgcc
>
> - Don't know why they are mentioned twice.
>
>       /usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2/crtend.o
>
> - Counterpart to `crtbegin.o`, finishes the `.ctors` and `.dtors`
>   sections.
>
>       /usr/lib/crtn.o
>
> - Counterpart to `crti.o`, finishes the `_init` and `_fini` functions
>   and their sections.
>
>
> 8  Discussion (and questions ...)
> ---------------------------------
>
> Apart from compiling source code into object files, the whole build
> process is concerned with three (or four?) issues. I wonder if some of
> this could be simplified.
>
> 1. Program startup (and termination).
>
>    Wouldn't it be nicer and easier to generate a Modula-2 scaffolding
>    program containing a `main()` function?

ok yes - this could be done.  The main program module could contain the
main function.  It is not clear that there would be huge benefits
though.

>    Could this also avoid the need to link `libstdc++` in?

ah well, possibly this route could be taken - but I'd rather stay with
libstdc++ as it provides gm2 with mixed language support: the ability
to link with C++ and Ada (probably Java as well) and to throw and catch
other languages' exceptions.  Currently gm2 can work with swig, and
languages such as Python can catch gm2 exceptions.  I raised (no pun
intended) this subject at a gcc conference in 2009 and the strong advice
was to use libstdc++.  I know it adds compiler build time, but the end
user should get many benefits.

>    Since this scaffolding program would only vary in the list of
>    functions to be called (init, main, finish), we could even re-use a
>    fixed object module which references just a list of function pointers
>    outside in a build-time generated (assembly?) object ...?

an interesting approach I grant.  I'm keen to keep an eye on the
embedded system platform - where data size sometimes needs to be kept
to a bare minimum.  For a native *nix machine your approach would work
well.
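
A rough sketch of the table-driven idea, to make the trade-off
concrete.  `ModuleEntry`, `runModules` and the demo stubs are
hypothetical names; in the scheme proposed above the table would live
in a small object generated at build time, and the calls to
M2RTS_ExecuteInitialProcedures and friends are left out here for
brevity.

```cpp
#include <cstddef>
#include <string>

// One entry per module, in initialisation order.
struct ModuleEntry {
    void (*init)(int argc, char** argv);
    void (*finish)();
};

// The fixed part: walks whatever table it is handed, initialising
// forwards and finalising backwards.
void runModules(const ModuleEntry* table, std::size_t count,
                int argc, char** argv) {
    for (std::size_t i = 0; i < count; ++i) table[i].init(argc, argv);
    for (std::size_t i = count; i > 0; --i) table[i - 1].finish();
}

// Demo stubs standing in for two Modula-2 modules; they log their calls.
static std::string callLog;
static void aInit(int, char**) { callLog += "A+"; }
static void aFinish() { callLog += "A-"; }
static void bInit(int, char**) { callLog += "B+"; }
static void bFinish() { callLog += "B-"; }

// The part which would be generated per program.
static const ModuleEntry demoTable[] = {{aInit, aFinish}, {bInit, bFinish}};
```

Running `runModules(demoTable, 2, 0, nullptr)` leaves "A+B+B-A-" in
`callLog`.  The table costs two pointers of data per module, which is
the kind of overhead that matters on the embedded targets mentioned
above.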

> 2. Initialization and finalization of modules (this is not a problem in
>    C, but it is related to C++ global object construction/destruction
>    and very similar to Ada's elaboration order issues).
>
>    Could the chasing of imports for an initialization order be deferred
>    to runtime, thus making the build process simpler (and the use of
>    modules in shared libraries effortless)? I think of a scheme like
>    this:
>
>    Two procedures (and two variables) along the following lines are
>    generated into every module (here a module A which imports B and
>    C); the procedures need to be exported. They keep a reference
>    count, initialize the module the first time it is needed, and
>    finalize it once it is no longer needed.
>
>        VAR _M2_importCount : CARDINAL;   (* Assume BSS init to 0 *)
>            _M2_isInitializing : BOOLEAN; (* Assume init to FALSE *)
>
>        PROCEDURE _M2_A_import;
>        BEGIN
>          INC(_M2_importCount);
>          IF _M2_importCount = 1 THEN
>            (* First time import - initialize now *)
>            _M2_isInitializing := TRUE; (* Protect from cyclic init. *)
>            _M2_B_import;               (* Need B initialized *)
>            _M2_C_import;               (* Need C initialized *)
>            _M2_A_init;                 (* Initialize A itself *)
>            _M2_isInitializing := FALSE;
>          ELSIF _M2_isInitializing THEN
>            (* Cyclic dependency - bad thing! *)
>            HALT
>          END
>        END _M2_A_import;
>
>        PROCEDURE _M2_A_release;
>        BEGIN
>          (* Assert _M2_importCount > 0 *)
>          DEC(_M2_importCount);
>          IF _M2_importCount = 0 THEN
>            (* Last release - finalize now, in reverse order *)
>            _M2_A_finish;  (* Finalize A itself *)
>            _M2_C_release; (* Don't need C any more *)
>            _M2_B_release; (* Don't need B any more *)
>          END
>        END _M2_A_release;
>
>    This way every module would itself arrange for the initialization of
>    its imported modules. (A more elaborate variant would distinguish
>    between imports of procedures and variables - which need the
>    provider module to be initialized - and imports of types and
>    constants only - which don't.)

ok, this could be added and implemented with a command line link option
if required?
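
For experimenting with the proposed scheme outside the compiler, here
is a small C++ model of it.  It is a sketch under assumptions: the
`Module` struct, `importModule`, `releaseModule` and the `events` log
are invented for this illustration, and a cyclic import makes
`importModule` return false where the Modula-2 version would HALT.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> events;  // records init/finish order

struct Module {
    std::string name;
    std::vector<Module*> imports;
    unsigned importCount;
    bool isInitializing;
    explicit Module(const std::string& n)
        : name(n), importCount(0), isInitializing(false) {}
};

// Models _M2_A_import: initialises imports first, then the module itself.
// Returns false on a cyclic dependency.
bool importModule(Module& m) {
    ++m.importCount;
    if (m.importCount == 1) {
        m.isInitializing = true;  // protect against cyclic initialisation
        for (Module* dep : m.imports)
            if (!importModule(*dep)) return false;
        events.push_back(m.name + " init");
        m.isInitializing = false;
    } else if (m.isInitializing) {
        return false;  // cyclic dependency
    }
    return true;
}

// Models _M2_A_release: finalises on the last release, then releases
// the imports in reverse order.
void releaseModule(Module& m) {
    assert(m.importCount > 0);
    --m.importCount;
    if (m.importCount == 0) {
        events.push_back(m.name + " finish");
        for (auto it = m.imports.rbegin(); it != m.imports.rend(); ++it)
            releaseModule(**it);
    }
}
```

If A imports B and C, and C also imports B, then `importModule(a)`
logs B init, C init, A init, and a matching `releaseModule(a)` logs
A finish, C finish, B finish: B is finalised only once its last
importer has let go of it.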

> 3. Exception handling setup.
>
>    As far as I can tell now (this is rather new stuff to me), gm2 uses
>    gcc's "zero exception cost" model throughout. There are roughly three
>    components to it:
>
>    a) Tables of frame info in the compiled objects;
>
>    b) language-independent runtime functions like
>       `_Unwind_RaiseException`, these reside in `libgcc_eh` (and/or
>       other places?);
>
>    c) a language-dependent "personality" function, this gets called
>       during stack unwinding and is responsible to find an appropriate
>       exception handler in a given frame.
>
>    I'm not sure how much of `libgcc`, `libgcc_eh`, `libgcc_s`,
>    `libstdc++` is actually needed to implement this kind of exception
>    handling - assuming a "pure" Modula-2 program, not a mixed-language
>    beast.
>
>    I'd very much like to get rid of the dependency on both the C++
>    compiler and the C++ library ...

I hear this - but the exception handling code is quite complex and has
been solved by the gcc developers.  Having to maintain a Modula-2
version seems to be creating unnecessary work.  It also occasionally
changes and I'd rather just use the default one provided by gcc.  The
Ada group had a setjmp/longjmp mechanism - but seem to be deprecating
this in favour of libstdc++.

>
> 4. Threading setup.
>
>    Here gm2 uses the GNU `libpth` library. Can't say much about it now.
>
>
> *********************
>
> So far I have twiddled with linking a sample program against shared
>
>     libstdc++
>     libgm2
>     libgm2iso
>
> (the gm2 shared libs were cobbled together from the object files in the
> `SO` dirs ...).
>
> This kind of worked, except that
>
> - lots of complex number functions (`ccos` and friends) are not in my
>   `libm` and generate undefined references - it would be a labor of love
>   to reimplement them based on a C90 (not C99) library ...
>
> - the path and name for the `libpth` had to be given explicitly to
>   satisfy references to it (from the Modula-2 libraries).
>
> It also seems that exception handling is somewhat brittle in these
> circumstances, but this can well be my fault :-)

thanks for taking the time to write the document - I'm sure it will be
of use.

I was considering whether to modify gm2 to compile a whole project at
once, i.e. read in all source from all modules and compile it, producing
a single .o.  This would allow for intermodule optimisations which could
hopefully reduce the overall footprint of an executable.

regards,
Gaius


