bug-guix

bug#51536: openblas builds not reproducible on different x86_64 machines


From: Ludovic Courtès
Subject: bug#51536: openblas builds not reproducible on different x86_64 machines
Date: Thu, 03 Feb 2022 00:13:33 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi!

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Our OpenBLAS package uses DYNAMIC_ARCH=1 to provide optimizations for
> all supported targets, at least of x86 and x86_64.  In theory that seems
> OK, but in practice the builds differ depending on the host CPU.

What follows is the log of an investigation that didn’t find the root
cause, but perhaps it’ll give us ideas…

Right now the build results of ci.guix and bordeaux.guix differ:

--8<---------------cut here---------------start------------->8---
$ guix describe
Generation 202   Jan 30 2022 23:57:03    (current)
  guix 43dd34c
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 43dd34c7777a212c99a97da7a2c237158faa9a1b
ludo@ribbon ~/src/guix$ guix challenge openblas
/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18 contents differ:
  no local build for '/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18'
  https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 0m1jlc26yrwxn8gxwpj8452kw4g84ywclh0hnab93873ifz87s5c
  https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 1d0m9v3kpsqzplpl1law2lfhm6rrbhkkqsvh19dlg9wx45vbbvjb
  differing file:
    /lib/libopenblasp-r0.3.18.so

1 store items were analyzed:
  - 0 (0.0%) were identical
  - 1 (100.0%) differed
  - 0 (0.0%) were inconclusive
--8<---------------cut here---------------end--------------->8---

To get an idea, I thought we could compare the two build logs:

  https://ci.guix.gnu.org/log/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18
  https://bordeaux.guix.gnu.org/build/3fab433c-e7d3-498d-86f8-4bcd5da9c4db

(Protip: I found the second one via
<http://data.guix.gnu.org/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18>.)

The “ar  -ru ../libopenblasp-r0.3.18.a …” invocations are apparently the
same in both logs, which rules out the simple case of unsorted .o files.

The .so on ci.guix is slightly bigger:

--8<---------------cut here---------------start------------->8---
$ wget -qO - https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18 | lzip -d | guix archive -x /tmp/o1
$ wget -qO - https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18 | lzip -d | guix archive -x /tmp/o2
$ ls -l /tmp/{o1,o2}/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40538768 Jan  1  1970 /tmp/o1/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40436368 Jan  1  1970 /tmp/o2/lib/libopenblasp-r0.3.18.so
--8<---------------cut here---------------end--------------->8---

Both have the same symbols though, and in the same order:

--8<---------------cut here---------------start------------->8---
$ diff -u <(objdump -T  /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60-  ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so |cut -c60- )
$ echo $?
0
--8<---------------cut here---------------end--------------->8---

… which suggests they include code optimized for the same
micro-architectures because symbols include the name of the
micro-architecture:

--8<---------------cut here---------------start------------->8---
$ objdump -T  /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60-|tail -10
  csymm3m_RU
  cgemv_c_BARCELONA
  csymv_U_HASWELL
  dtrmm_iltncopy_CORE2
  LAPACKE_dsytrs2
  openblas_num_threads_env
  csycon_rook_
  csytri_rook_


--8<---------------cut here---------------end--------------->8---
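
One way to check that claim more directly would be to tally the
micro-architecture suffixes in each dynamic symbol table and compare the
tallies.  What follows is only a minimal sketch: “arch_tally” is a
hypothetical helper name (not part of Guix or OpenBLAS), and it assumes
that kernel symbols end in an underscore followed by an all-caps token of
at least three characters, as in csymv_U_HASWELL.  Other symbols that
happen to match that pattern would be counted too.

```shell
# arch_tally: read `objdump -T` output on stdin and print one line per
# trailing _UPPERCASE suffix (e.g. HASWELL) with the number of symbols
# that carry it.  Assumption: symbol lines have at least 6 fields and
# the symbol name is the last field.
arch_tally () {
  awk 'NF >= 6 { print $NF }' |
    sed -n 's/.*_\([A-Z][A-Z0-9]\{2,\}\)$/\1/p' |
    LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Possible usage (bash, paths as in the transcript above):
#   diff <(objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so | arch_tally) \
#        <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so | arch_tally)
```

An empty diff would mean both libraries carry the same number of kernels
per micro-architecture, which is a slightly stronger statement than
“the symbol lists are identical” is easy to eyeball.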

Some of the offsets differ though:

--8<---------------cut here---------------start------------->8---
$ diff -u <(objdump -T  /tmp/o1/lib/libopenblasp-r0.3.18.so  ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so )
--- /dev/fd/63  2022-02-03 00:10:17.308357982 +0100
+++ /dev/fd/62  2022-02-03 00:10:17.276357923 +0100
@@ -1,5 +1,5 @@
 
-/tmp/o1/lib/libopenblasp-r0.3.18.so:     file format elf64-x86-64
+/tmp/o2/lib/libopenblasp-r0.3.18.so:     file format elf64-x86-64
 
 DYNAMIC SYMBOL TABLE:
 0000000000000000      DF *UND* 0000000000000000  GLIBC_2.3.2 pthread_cond_signal
@@ -91,57 +91,57 @@
 00000000013edb70 g    DF .text 00000000000001be  Base        zgemm3m_incopyb_BULLDOZER
 0000000000e6d200 g    DF .text 0000000000002b06  Base        strsm_kernel_RT_BOBCAT
 0000000000512c00 g    DF .text 0000000000000a0a  Base        zsymv_U_PRESCOTT
-00000000023c7530 g    DF .text 0000000000000201  Base        LAPACKE_dpttrs_work
+00000000023ae930 g    DF .text 0000000000000201  Base        LAPACKE_dpttrs_work
 0000000000692000 g    DF .text 0000000000000b89  Base        srot_k_PENRYN
 000000000179caa0 g    DF .text 0000000000000200  Base        dgemm_beta_HASWELL
 0000000000a44690 g    DF .text 00000000000004b4  Base        dtrsm_iutucopy_OPTERON
-000000000231cfc0 g    DF .text 000000000000021d  Base        LAPACKE_sstein_work
-0000000002327800 g    DF .text 000000000000014b  Base        LAPACKE_ssytrd
-0000000001ad9100 g    DF .text 00000000000002aa  Base        chemm_outcopy_SKYLAKEX
+00000000023043c0 g    DF .text 000000000000021d  Base        LAPACKE_sstein_work
+000000000230ec00 g    DF .text 000000000000014b  Base        LAPACKE_ssytrd
+0000000001acc900 g    DF .text 00000000000002aa  Base        chemm_outcopy_SKYLAKEX
 00000000017d6c10 g    DF .text 0000000000000c38  Base        cgemv_n_HASWELL
-0000000002327b70 g    DF .text 0000000000000143  Base        LAPACKE_ssytrf
+000000000230ef70 g    DF .text 0000000000000143  Base        LAPACKE_ssytrf
 000000000018f010 g    DF .text 000000000000025c  Base        cblas_stbmv
 0000000000195a20 g    DF .text 000000000000003b  Base        cblas_idamin
-0000000002328d40 g    DF .text 0000000000000101  Base        LAPACKE_ssytri
+0000000002310140 g    DF .text 0000000000000101  Base        LAPACKE_ssytri
 000000000077be00 g    DF .text 0000000000000e65  Base        ztrsm_kernel_RN_PENRYN
 0000000001583f20 g    DF .text 0000000000001c22  Base        dtrmm_iltucopy_STEAMROLLER
-00000000021bf830 g    DF .text 0000000000000527  Base        ztbcon_
-0000000001a70630 g    DF .text 00000000000001c7  Base        dsymm_oltcopy_SKYLAKEX
-000000000245a910 g    DF .text 000000000000001b  Base        LAPACKE_zpp_nancheck
+00000000021a6c30 g    DF .text 0000000000000527  Base        ztbcon_
+0000000001a640c0 g    DF .text 000000000000066d  Base        dsymm_oltcopy_SKYLAKEX
+0000000002441d10 g    DF .text 000000000000001b  Base        LAPACKE_zpp_nancheck
 000000000108ee20 g    DF .text 000000000000014d  Base        zgemm3m_oncopyb_ATOM
-0000000002409df0 g    DF .text 000000000000035c  Base        LAPACKE_zgtsvx_work
-0000000001e7d120 g    DF .text 0000000000001743  Base        dlatrs_
-0000000001e948a0 g    DF .text 00000000000001d1  Base        drscl_
+00000000023f11f0 g    DF .text 000000000000035c  Base        LAPACKE_zgtsvx_work
+0000000001e64520 g    DF .text 0000000000001743  Base        dlatrs_
+0000000001e7bca0 g    DF .text 00000000000001d1  Base        drscl_
 00000000019ac700 g    DF .text 00000000000004bd  Base        zhemm3m_iucopyb_ZEN
 00000000003c0f30 g    DF .text 000000000000001e  Base        support_avx512_bf16
-0000000002329ac0 g    DF .text 0000000000000107  Base        LAPACKE_ssytrs
+0000000002310ec0 g    DF .text 0000000000000107  Base        LAPACKE_ssytrs
 0000000000f94890 g    DF .text 00000000000002d3  Base        ztrmm_oltncopy_BOBCAT
--8<---------------cut here---------------end--------------->8---

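
Most of the differing lines in the excerpt above keep the same size and
only move to another address, but dsymm_oltcopy_SKYLAKEX is 0x1c7 bytes
in one build and 0x66d in the other.  To separate symbols that merely
moved from symbols whose code changed size, something along these lines
might help.  It is a minimal sketch: “size_diff” is a hypothetical
helper, and it assumes that defined symbols show up as 7-field lines in
`objdump -T` output and that each name appears once per table.

```shell
# size_diff A B: A and B are files (or bash process substitutions)
# holding `objdump -T` output.  Prints "name size_in_A size_in_B" for
# every symbol present in both tables whose size field differs.
size_diff () {
  awk '
    # Assumed layout: addr, binding, type, section, size, version, name.
    NF != 7   { next }                       # skip headers and *UND* entries
    FNR == NR { size[$7] = $5; next }        # first input: remember sizes
    ($7 in size) && size[$7] != $5 {         # second input: report mismatches
      print $7, size[$7], $5
    }
  ' "$1" "$2"
}

# Possible usage (bash, paths as in the transcripts above):
#   size_diff <(objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so) \
#             <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so)
```

If the resulting list is short, the symbols on it would be natural
candidates for disassembling side by side.
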
On #guix-hpc Ricardo mentioned encountering this reproducibility issue
earlier.

Ludo’.
