libtool-patches
Re: Fortran libraries on the Blue Gene with mpi


From: Ralf Wildenhues
Subject: Re: Fortran libraries on the Blue Gene with mpi
Date: Mon, 27 Apr 2009 23:40:19 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

* Christian Rössel wrote on Mon, Apr 27, 2009 at 05:33:33PM CEST:
> Ralf Wildenhues wrote:
> > However, as a minor note, the logs all show:
> > 
> > | checking dependency style of xlc... none
> > [...]
> > | checking dependency style of xlC... none
> > 
> > which is kind of weird.  IIRC the XL compilers have working dependency
> > extraction mechanisms, which are detected by the Automake code.  There
> > has been one bug fix in Automake's depcomp script, but it was limited to
> > the --disable-static case.  It would be worthwhile to investigate this.

> To create dependency output, use the option -M or -qmakedep. Here is the
> relevant part of the man page:
> 
>          -M     Creates an output file that contains information to be
>                 included in a "make" description file. This is
>                 equivalent to specifying -qmakedep without a suboption.
> 
>          -qmakedep[=gcc]
>                 Creates an output file that contains targets suitable
>                 for inclusion in a description file for the make command
>                 that describes the dependencies of the main source file
>                 in the compilation.
>                 Specifying 'gcc' changes the format of the generated
>                 dependency file.
>                 Specifying -qmakedep without 'gcc' is equivalent to
>                 specifying -M.
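
For anyone who wants to try the option by hand, here is a minimal sketch. It assumes the compiler is reachable as xlc (on the Blue Gene it may only exist as bgxlc), and it works in a scratch directory on a throwaway hello.c. The man page excerpt does not name the output file, so the sketch simply lists whatever appears next to the object file.

```shell
# Sketch only: try -qmakedep=gcc on a throwaway source file.
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF

if command -v xlc >/dev/null 2>&1; then
  # -qmakedep=gcc selects the alternative format mentioned in the man page;
  # plain -M (or -qmakedep without a suboption) gives the default format.
  (cd "$workdir" && xlc -qmakedep=gcc -c hello.c && ls)
  dep_status=ran
else
  echo "xlc not found in PATH; nothing to do" >&2
  dep_status=skipped
fi
rm -rf "$workdir"
```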

Thanks.  I will address this issue in a separate message, later.
Let's finish the Libtool bits first.


> >>>   # XL, BG
> >>>   cd build-bgxl
> >>>   ../configure CC=bgxlc CXX=bgxlC F77=bgfort FC=bgxlf95 GCJ=no \
> >>>                LDFLAGS=-qnostaticlink
> >>>   make
> >>>   make -k check VERBOSE=yes 2>&1 | tee checklog-bgxl-1
> >>>   cd ..
> > 
> > This is where things start to get interesting.
> 
> With the bg* compilers we build programs that are supposed to be run on
> the compute nodes. They may also run on the login-nodes, but you can't
> take that for granted (AFAIR the error "Illegal instruction" appears if
> you try to run a compute node program on a login node).

Ah!  I completely misunderstood that.  That means that all those builds
should run with cross-compiling enabled.  Cross compilation mode is
enabled when --host is passed (and differs from either the passed
--build flag, or whatever configure computes as the build name);
you can also force cross compilation mode through the hack of passing
  cross_compiling=yes

to configure.  The --host argument will also cause configure to look for
all tool chain programs with a $host- prefix, in this case, with
--host=powerpc-bgp-linux that would be powerpc-bgp-linux-gcc etc.
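
As a rough illustration of the two points above, here is a small shell sketch. It is a simplification, not configure's actual logic: the build triplet is an assumed value for the login node, and host_tool is a hypothetical helper that only shows how the $host- prefix is formed.

```shell
host=powerpc-bgp-linux              # from the --host example above
build=powerpc64-unknown-linux-gnu   # assumption: triplet of the login node

# Simplified rule: differing host and build names mean a cross build.
if test "$host" != "$build"; then
  cross_compiling=yes
else
  cross_compiling=no
fi
echo "cross_compiling=$cross_compiling"

# Hypothetical helper: the name configure would try first for a tool.
host_tool() {
  echo "$host-$1"
}

for tool in gcc ar ranlib strip; do
  host_tool "$tool"
done
```

Whether the bg* compiler wrappers are also installed under such prefixed names is something I can't tell from here; if they are not, CC, CXX, F77 and FC would still have to be passed explicitly alongside --host.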

> As all tests run
> on the login-nodes, we should expect failures. Also, a test that
> succeeds on the login node may not succeed on the compute node. IMHO all
> test programs built with bg* and mpi* compilers should be run on the
> compute nodes, not on the login nodes.

Well, in this case they should not be run at all, at least not those
that are run as part of the configure script.

> To run a program on the compute nodes you write a batch script and
> submit it to a queue. This process unfortunately differs from machine to
> machine. It is also not sensible to submit many small jobs to the queue
> as one job allocates at least 128 nodes.

:-)

> Maybe there is a way of calling
> all tests from a single batch script so that one has to submit only one job.

Not really.  For those that failed on the login nodes, you can try to
submit one or two to the queue; if they then pass, I'd be pretty
confident that the others will work, too.
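
To pick the one or two candidates, the failed test names can be pulled out of the saved check logs. A minimal sketch, assuming the old testsuite's log lines start with "FAIL: " as Automake's test driver prints them. failed_tests is a hypothetical helper name, and the new (Autotest) testsuite formats its results differently, so this does not cover it.

```shell
# Print the unique names of failed tests from a check log read on stdin.
failed_tests() {
  sed -n 's/^FAIL: *//p' | sort -u
}

# Show the first two failures from the log saved by "tee" earlier, if present.
log=checklog-bgxl-1
if test -f "$log"; then
  failed_tests < "$log" | head -n 2
fi
```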

> > Test failures:
> > 
> > - f77demo-* in the old testsuite
> >   This is because the bgfort command does not exist.
> >   It was a typo, should have been F77=bgfort77 or F77=bgf77 or F77=bgxlf
> >   I guess.  If you have energy left, here's how you can rerun those
> >   tests:
> > 
> >    cd build-bgxl
> >    ../configure CC=bgxlc CXX=bgxlC F77=bgfort77 FC=bgxlf95 GCJ=no \
> >                 LDFLAGS=-qnostaticlink
> >    gmake
> >    gmake -k check VERBOSE=yes TESTSUITEFLAGS='-k F77' TESTS="\
> >         tests/f77demo-static.test \
> >         tests/f77demo-make.test \
> >         tests/f77demo-exec.test \
> >         tests/f77demo-conf.test \
> >         tests/f77demo-make.test \
> >         tests/f77demo-exec.test \
> >         tests/f77demo-shared.test \
> >         tests/f77demo-make.test \
> >         tests/f77demo-exec.test"
> 
> Please find the results attached (checklog-bgxl-2).

Thanks.  f77demo-exec.test fails after f77demo-static.test, and
f77demo-make.test fails after f77demo-{conf,shared}.test.  The first
failure is an "Illegal instruction" again, for which we have an
explanation now; the other two are again:

  /bgsys/drivers/ppcfloor/gnu-linux/powerpc-bgp-linux/bin/ld: attempted static link of dynamic object `./.libs/libfoo.so'

I still don't know the cause for this; but at least the F77 cases look
just like the FC cases.

Can you post the output of the following?

  cd build-bgxl/tests/fcdemo
  /bin/sh ./libtool --mode=link bgxlf95 -Wc,-v -g -qnostaticlink \
      -o fprogram fprogram.o libfoo.la libfoo3.la -ldl

Thanks.

> > Right after that:
> > 
> > | configure:11502: $? = 0
> > | /lib/: cannot read file data: Is a directory
> > | configure:11520: result: no
> > | configure:11559: checking whether stripping libraries is possible
> > | configure:11564: result: yes
> > 
> > The /lib/ looks pretty weird.  I don't yet understand where it comes from,
> > but it could be a bug in _LT_TRY_DLOPEN_SELF or LT_SYS_DLOPEN_SELF.
> > We should try to analyse and fix it.
> > 
> > Can you do something like this and post the configure standard output
> > and standard error?
> > 
> >    cd build-bgxl
> >    sed '/checking whether a statically linked program can/a\
> >         set -x
> >         /result.*lt_cv_dlopen_self_static/a\
> >         set +x' < ../configure > ../configure-debug
> >    ../configure-debug CC=bgxlc CXX=bgxlC F77=bgfort77 FC=bgxlf95 GCJ=no \
> >                 LDFLAGS=-qnostaticlink
> 
> Please find the results attached (configure-debug.log).

Hmm.  No trace of the error in this log file.
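
One thing to keep in mind with that sed edit: the shell writes its set -x trace to standard error, so the log needs both streams (I don't know whether that is what happened here). The sketch below wraps the same two sed commands in a function so they can be tried on a small input before patching configure. add_trace is a hypothetical name, and I dropped the indentation in front of the inserted text because sed implementations differ in whether they keep it.

```shell
# Append "set -x" after the line announcing the static dlopen check and
# "set +x" after the line reporting its result; reads stdin, writes stdout.
add_trace() {
  sed '/checking whether a statically linked program can/a\
set -x
/result.*lt_cv_dlopen_self_static/a\
set +x'
}

# Intended use, capturing stdout and stderr together:
#   add_trace < ../configure > ../configure-debug
#   sh ../configure-debug CC=bgxlc ... 2>&1 | tee configure-debug.log
```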


> >>>   # GCC, MPI
> >>>   cd build-mpigcc
> >>>   ../configure CC=mpicc CXX=mpicxx F77=mpif77 FC=mpif90 GCJ=no \
> >>>                LDFLAGS=-dynamic
> >>>   make
> >>>   make -k check VERBOSE=yes 2>&1 | tee checklog-mpigcc-1
> >>>   cd ..
> > 
> > More failures here:

> > - f77demo-*.test: Fortran compiler mpif77 doesn't work, due to:
> > 
> > | /bgsys/drivers/V1R3M0_460_2008-081112P/ppc/gnu-linux/libexec/gcc/powerpc-bgp-linux/4.1.2/f951: error while loading shared libraries: libmpfr.so.1: cannot open shared object file: No such file or directory
> > 
> > - fcdemo-*.test: Fortran compiler name mpif90 doesn't work, due to:
> > 
> > | /bgsys/drivers/V1R3M0_460_2008-081112P/ppc/gnu-linux/libexec/gcc/powerpc-bgp-linux/4.1.2/f951: error while loading shared libraries: libmpfr.so.1: cannot open shared object file: No such file or directory
> > 
> > - In the new testsuite, all C++, Fortran 77/90 tests failed too,
> >   consequently.
> > 
> > Can you do the following to rerun those tests?
> > Find the directory where that libmpfr.so.1 is installed.  Say, it is
> > in $foodir.  Then
> 
> Hm, in contrast to John, I did not find a library with this name,
> neither in the compute node directories under /bgsys nor in the usual
> lib directories.

Weird.  I suppose that would be something to raise with your software
administration then, too.  Only fairly recent GCC versions require
libmpfr; maybe there is an older one installed that doesn't.
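
In case it helps to double-check, here is a minimal sketch of a search for the library. find_lib is a hypothetical helper, and the directory list is only a guess at where such a library might live on your system; adjust it as needed.

```shell
# Print every file whose name starts with $1 below the remaining arguments,
# silently skipping directories that do not exist or cannot be read.
find_lib() {
  name=$1
  shift
  for dir in "$@"; do
    if test -d "$dir"; then
      find "$dir" -name "$name*" 2>/dev/null
    fi
  done
}

# Assumed search locations; no output means nothing was found.
find_lib libmpfr.so /bgsys /usr/lib /usr/lib64 /lib /lib64 /usr/local/lib
```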

> > For a nicer user experience, it would be helpful if those compilers were
> > rebuilt with -Wl,-rpath,$foodir in their LDFLAGS (maybe you can ask your
> > software providers).


> >>>   # XL, MPI
> >>>   cd build-mpixl
> >>>   ../configure CC=mpixlc CXX=mpixlC F77=mpixlf FC=mpixlf95 GCJ=no \
> >>>                LDFLAGS=-qnostaticlink
> >>>   make
> >>>   make -k check VERBOSE=yes 2>&1 | tee checklog-mpixl-1
> >>>   cd ..
> > 
> > The CXX=mpixlC was wrong, causing all the tagdemo tests to fail.
> > Dunno what the right name would have been.
> > 
> > The F77=mpixlf was wrong, too, causing all the f77demo tests to fail.

> I reran the tests with correct CXX, F77. See checklog-mpixl-2

Thanks.  The new logs show that, again, the F77 and FC compilers now fail
in the same way, and the C++ failures are similar to those seen with the
settings above.

Cheers,
Ralf



