bug-binutils


Need help on reducing linker issue


From: Roger Jeurninck
Subject: Need help on reducing linker issue
Date: Mon, 22 Nov 2010 13:54:58 +0100

Hi,

I'm running into build behaviour that might be caused by a non-deterministic GNU linker?! As this is quite unlikely, I hope you can help me narrow down the issue so I can find the real root cause.
Thanks,

Roger


---------------------------------------------- 

What is the issue?

At the company I work for we have an automated (Solaris) build environment based on IBM's clearmake. One of the steps in the build process is cross-compiling and linking for PowerPC.
As our products are critical in our customers' production processes, we do not want to bother customers with unneeded patches. To prevent this we built a tool that compares the build's libraries/binaries against the latest baseline.

Lately a lot of 'unexpected' patches have been popping up, which I'm analysing in more detail. What we see is that simply rebuilding such an unexpected target makes the difference go away?!

----------------------------------------------

So what is happening here?
I analysed one such unexpected patch (a ppc binary) and saw a small delta in the objdump -x output.

In the binary that we qualify as 'ok', the symbol is undefined (UND):
UND.objdump:0000000000000000       O *UND*      000000000000000c              OOXA_interface_info

In the wrong binary the same symbol is placed in the .sbss section:
sbss.objdump:0000000010019560 g     O .sbss     000000000000000c              OOXA_interface_info

And the library that is implementing the symbol :
lib.objdump:0000000000012c60 g     O .bss       000000000000000c              OOXA_interface_info

So this library looks fine, as it provides the symbol OOXA_interface_info. The good 'UND' file looks fine too, as it should not have this symbol allocated (extern const struct).
Now why does the wrong 'sbss' file put this symbol in the .sbss section? It looks as if the struct is really allocated there?!
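
To classify a binary as 'UND' or 'sbss' automatically, the symbol-table lines shown above can be parsed with a few lines of Python. This is only a sketch: it assumes the whitespace-separated layout seen in the three lines above (address, flags, section, size, name), and the names `Symbol`, `parse_symbol_line` and `section_of` are made up for illustration.

```python
from typing import NamedTuple, Optional


class Symbol(NamedTuple):
    address: int
    flags: str
    section: str
    size: int
    name: str


def parse_symbol_line(line: str) -> Optional[Symbol]:
    """Parse one symbol-table line in the layout shown in this mail.

    Assumed layout: <address> <flags...> <section> <size> <name>.
    Lines that do not fit (too few fields, non-hex address or size)
    yield None.
    """
    tokens = line.split()
    if len(tokens) < 4:
        return None
    try:
        address = int(tokens[0], 16)
        size = int(tokens[-2], 16)
    except ValueError:
        return None
    flags = "".join(tokens[1:-3])  # e.g. "gO" for a global object
    return Symbol(address, flags, tokens[-3], size, tokens[-1])


def section_of(dump_text: str, symbol: str) -> Optional[str]:
    """Return the section of the first line whose last field is `symbol`."""
    for line in dump_text.splitlines():
        sym = parse_symbol_line(line)
        if sym is not None and sym.name == symbol:
            return sym.section
    return None
```

Calling `section_of(text, "OOXA_interface_info")` on the saved objdump output then gives `*UND*` for the good file and `.sbss` for the wrong one. Other lines of the dump may also happen to parse, but they are filtered out by the name comparison.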

---------------

Next I made a simple script that links the binary many times in a row (>100).
This is done on exactly the same buildhost with exactly the same input files (objects, libs), tooling, etc. (note that I have the clearmake config records to confirm this). To get a stable reproduction scenario, I repeat only the linking step.
Depending on the load (other builds) on this buildhost, the output file differs (UND or .sbss?). With no other load on the buildhost I do not get the unwanted (sbss) binary, but at certain moments something happens on the buildhost that influences the cross-linker...
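
The repeated-link experiment can be sketched roughly as follows. This is not the actual script, only an illustration in Python: the link command is passed in by the caller, and the names `count_link_variants` and `make_link_runner` are hypothetical. It hashes the output after every run and counts how many distinct results appear, assuming the link is otherwise byte-for-byte reproducible.

```python
import hashlib
import subprocess
from collections import Counter
from pathlib import Path
from typing import Callable, Sequence


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to tell output variants apart."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def count_link_variants(run_link: Callable[[], None],
                        output: Path, runs: int) -> Counter:
    """Run the link step `runs` times and count distinct output files."""
    variants: Counter = Counter()
    for _ in range(runs):
        if output.exists():
            output.unlink()  # never count a stale result from a previous run
        run_link()
        variants[file_digest(output)] += 1
    return variants


def make_link_runner(command: Sequence[str]) -> Callable[[], None]:
    """Wrap a link command line (supplied by the caller) as a callable."""
    def run() -> None:
        subprocess.run(list(command), check=True)
    return run
```

With a deterministic linker the returned counter has exactly one entry; more than one entry means that identical inputs produced different outputs. If the outputs are not expected to be byte-identical for other reasons, the runs could instead be classified by the section in which the symbol ends up.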

---------------

So next I tried to reproduce the wrong/sbss situation..
- Is it the load on the buildhost causing this behaviour?
   a. I created an application that simulates load on this machine. Running multiple instances at approx. 100% load does not influence the linker, and the result is still ok.
   b. I ran some random parallel builds, but still no results.
- Is it memory swapping?
   Here I created a small app that really fills the memory, but the result is still ok.

But again, at moments when more builds (by other developers) are running on this buildhost, I can reproduce it. Note that even then not every link reproduces the '.sbss' situation. Depending on the moment, the rate varies between 0% and 25%, and sometimes it is even about 50%.
But what is causing this?!

---------------

Getting a little lost here, I'm dumping as much info as possible. For both the 'ok' and 'not ok' situations I created the following files:
- truss output
- ps info
- objdump -x
- cleartool config record

The first difference I notice in the truss output is:
------------------------------------------
-------------- sbss (wrong) --------------
------------------------------------------
open("/vobs/litho/.caddata/lxfs_glp/lxfs_glp/devel/sysroots/wrs_sbc8548-glibc_cgl/sysroot/te500v2/usr/lib//libc.so", O_RDONLY) = 10
ioctl(10, TCGETA, 0xFFBE6C84) Err#25 ENOTTY
fstat64(10, 0xFFBE6150) = 0
   d=0x04550167 i=6151272 m=0100644 l=1  u=14302 g=2010  sz=235
at = Nov 18 08:44:09 MET 2010  [ 1290066249 ]
mt = Dec 17 02:08:28 MET 2008  [ 1229476108 ]
ct = Feb 19 08:57:07 MET 2010  [ 1266566227 ]
   bsz=8192  blks=8     fs=nfs
ioctl(10, TCGETA, 0xFFBE60DC) Err#25 ENOTTY
read(10, " / *   G N U   l d   s c".., 8192) = 235
read(10, 0x0024D02C, 8192) = 0
------------------------------------------
-------------- UND (Ok) ------------------
------------------------------------------
open("/vobs/litho/.caddata/lxfs_glp/lxfs_glp/devel/sysroots/wrs_sbc8548-glibc_cgl/sysroot/te500v2/usr/lib//libc.so", O_RDONLY) = 10
ioctl(10, TCGETA, 0xFFBE6C84) Err#25 ENOTTY
fstat64(10, 0xFFBE6150) = 0
   d=0x04550167 i=6151272 m=0100644 l=1  u=14302 g=2010  sz=235
at = Nov 18 14:00:36 MET 2010  [ 1290085236 ]
mt = Dec 17 02:08:28 MET 2008  [ 1229476108 ]
ct = Feb 19 08:57:07 MET 2010  [ 1266566227 ]
   bsz=8192  blks=8     fs=nfs
ioctl(10, TCGETA, 0xFFBE60DC) Err#25 ENOTTY
read(10, " / *   G N U   l d   s c".., 8192) = 235
read(10, 0x0024D114, 8192) = 0
------------------------------------------

Here you see a shift in the buffer address passed to the last read() call (0x0024D02C vs. 0x0024D114). Of course there are more differences towards the end of the truss output, as the file created is different.

-------------------

Next I see that the .sbss section differs in both size and alignment between the two objdump -x outputs (in addition to the symbol difference mentioned above):

Wrong/sbss file:
 24 .sbss         00000014  0000000010019560  0000000010019560  00009560  2**3
 
Good/UND file:
 24 .sbss         00000008  0000000010019560  0000000010019560  00009560  2**2
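
As a quick cross-check, the two section-header lines above can be compared numerically. The sketch below assumes the seven-column layout shown (index, name, size, VMA, LMA, file offset, alignment); `SectionHeader` and `parse_section_header` are illustrative names only.

```python
from typing import NamedTuple


class SectionHeader(NamedTuple):
    index: int
    name: str
    size: int
    vma: int
    alignment: int


def parse_section_header(line: str) -> SectionHeader:
    """Parse one section-header line in the seven-column layout shown above."""
    idx, name, size, vma, _lma, _file_off, algn = line.split()
    base, exponent = algn.split("**")  # "2**3" means 8-byte alignment
    return SectionHeader(int(idx), name, int(size, 16), int(vma, 16),
                         int(base) ** int(exponent))


wrong = parse_section_header(
    " 24 .sbss         00000014  0000000010019560  0000000010019560  00009560  2**3")
good = parse_section_header(
    " 24 .sbss         00000008  0000000010019560  0000000010019560  00009560  2**2")

size_delta = wrong.size - good.size
print(hex(size_delta), wrong.alignment, good.alignment)  # 0xc 8 4
```

The size difference of 0xc matches the size of OOXA_interface_info in the symbol table, and the section's start address 0x10019560 is the address the wrong binary reports for the symbol. Both are consistent with the struct really being allocated in .sbss in the wrong binary.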

-------------------

Any help is welcome, and please let me know if I should provide more info.
Thanks!

Roger

 










 

