guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bootstrappable] Re: wip-full-source-bootstrap: from a 357-byte `hex


From: Danny Milosavljevic
Subject: Re: [bootstrappable] Re: wip-full-source-bootstrap: from a 357-byte `hex0' to 'hello'
Date: Thu, 7 Jan 2021 21:10:58 +0100

Hi Paul,
Hi Janneke,

On Thu, 07 Jan 2021 11:43:39 +0100
Jan Nieuwenhuizen <janneke@gnu.org> wrote:

> > This is really exciting, great job Jan! Do you need any help, e.g. on
> > the ARM side, or with optimising the integration?

> Thanks!  We really could use help with this (Danny?).  

@Paul:

Yeah, help would be nice!

We are now pretty far along in bootstrapping--which means you'd ideally set up
an armhf machine, install guix on it and then debug gawk-mesboot0 using that in
order to help.  Can you do that?

A writeup of how to debug a current problem we are facing follows.  If you can
help with fixing that problem, please do :)

We currently are at mescc -> tinycc -> tinycc -> tinycc -> gawk-mesboot0 or so.

The next steps for us are:

* Debug why gawk-mesboot0 doesn't work correctly (see below--and Janneke's
E-Mail)

  * That will entail enabling gdb to work on tcc (TinyCC) executables and/or
    reading ARM assembly.  Since the error does not cause a failure
    immediately, it will be difficult to pinpoint exactly where the error was
    introduced.

To reproduce the bug:

  ./gawk 'BEGIN { i = 5; ++i; print(i) }'
  5

(yes, it prints 5.  Same happens with "--".  But i += 1 works fine.
So does i *= 2).

You will need Guix on armhf.  I do NOT recommend using an aarch64 machine
with an aarch64 kernel for the time being (I'll fix it eventually--but right
now that's a really bad idea with Guix).

> To paint the
> picture for people listening in: Next to ARM it may need some Guix
> skills, or even more work to reproduce our experimental ARM bootstrap
> outside of Guix.

> A very short summary of where we are, on wip-arm-bootstrap, building
> gawk-mesboot0
> 
> --8<---------------cut here---------------start------------->8---
> ./pre-inst-env guix build -e '(@@ (gnu packages commencement) gawk-mesboot0)'
> --8<---------------cut here---------------end--------------->8---
> 
> produces a gawk binary that fails to increment a variable
> 
> --8<---------------cut here---------------start------------->8---
> 11:35:59 janneke@novena:~/src/wip-arm-bootstrap [env]
> $ /gnu/store/f756xxxqj3mabaax5r4ldrxh01a9q54r-gawk-mesboot0-3.0.0/bin/gawk -f 
> add.awk add.awk 
> i=1
> i=2
> 11:36:14 janneke@novena:~/src/wip-arm-bootstrap [env]
> $ /gnu/store/f756xxxqj3mabaax5r4ldrxh01a9q54r-gawk-mesboot0-3.0.0/bin/gawk -f 
> inc.awk inc.awk 
> i=     0
> i=     0
> 11:36:27 janneke@novena:~/src/wip-arm-bootstrap [env]
> --8<---------------cut here---------------end--------------->8---
> 
> So could be anything, could bin in tcc-boot or in gawk-mesboot0...

Next steps:

* find function in gawk 3.0.0 that interprets "++" (yacc grammar),
  * find out why it doesn't work
    * find out why tcc miscompiles it

In order to read the source of the gawk used, invoke

  guix build -S -e '(@@ (gnu packages commencement) gawk-mesboot0)'

.

In there, the yacc grammar is in awk.y (and the generated parser is in
awktab.c).

The AST node types generated for "++" and "--" are Node_preincrement and
Node_predecrement, respectively.

The evaluator is in eval.c (@janneke: and it uses setjmp.  At least
longjmp seems not to be entered for our gawk problem here... phiew).

The place where Node_preincrement is finally handled is in op_assign in the
latter file, which does:

        case Node_preincrement:
        case Node_predecrement:
                unref(*lhs);
                *lhs = make_number(lval +
                               (tree->type == Node_preincrement ? 1.0 : -1.0));
                tree->lnode->flags |= SCALAR;
                if (after_assign)
                        (*after_assign)();
                return *lhs;

The case that does work (+= 1) is handled in the same function:

        lval = force_number(*lhs);
        rval = force_number(tmp);
[...]
        case Node_assign_plus:
                *lhs = make_number(lval + rval);
                break;

Debugging gawk on armhf via

   gdb --args 'BEGIN { ++i; print(i) }'

I get:

eval.c 

│  >857                         *lhs = make_number(lval +                       
                                                                              │
│   858                                        (tree->type == Node_preincrement 
? 1.0 : -1.0));                                                               │

And in asm, after it ascertained that tree->type == Node_preincrement, it does:

│   0x15598 <op_assign+320> ldr     lr, [pc]        ; 0x155a0 <op_assign+328>   
                                                                              │
│   0x1559c <op_assign+324> b       0x155a4 <op_assign+332>                     
                                                                              │
│   0x155a0 <op_assign+328>                CONSTANT 0
│   0x155a4 <op_assign+332> vldr    d0, [lr]                                    
                                                                              │
│   0x155a8 <op_assign+336> vldr    d1, [r11, #-16]                             
                                                                              │
│   >0x155ac <op_assign+340> vadd.f64        d0, d1, d0                         
                                                                              │
│   0x155b0 <op_assign+344> vpush   {d0}                                        
                                                                              │
│   0x155b4 <op_assign+348> mov     r2, #97 ; 0x61                              
                                                                              │
│   0x155b8 <op_assign+352> pop     {r0, r1}                                    
                                                                              │
│   0x155bc <op_assign+356> bl      0x23208 <mk_number>                         
                                                                              │

Registers before the vadd are:

d0             {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, u16 = {0x0, 0x0, 
0x0, 0x0}, u32 = {0x0, 0x0}, u64 = 0x0, f32 = {0x0, 0x0}, f64 = 0x0}
d1             {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, u16 = {0x0, 0x0, 
0x0, 0x0}, u32 = {0x0, 0x0}, u64 = 0x0, f32 = {0x0, 0x0}, f64 = 0x0}

And [r11, #-16] seems to be used more often in the assembly, so is probably 
lval.  Unsurprisingly, it's 0.

But so is the other constant added to it.  WTF!

And maybe more useful:

 gdb --args ../gawk 'BEGIN { i = 5; ++i ; print(i) }':

right before it does the ++:

│   0x1555c <op_assign+260> ldr     r1, [r11, #12]                              
                                                                              │
│   0x15560 <op_assign+264> add     r1, r1, #24                                 
                                                                              │
│   0x15564 <op_assign+268> ldr     r2, [r1]                                    
                                                                              │
│   0x15568 <op_assign+272> cmp     r2, #10                                     
                                                                              │
│   0x1556c <op_assign+276> moveq   r1, #1                                      
                                                                              │
│   0x15570 <op_assign+280> movne   r1, #0                                      
                                                                              │
│   0x15574 <op_assign+284> str     r0, [r11, #-60] ; 0xffffffc4                
                                                                              │
│   0x15578 <op_assign+288> cmp     r1, #0                                      
                                                                              │
│   0x1557c <op_assign+292> beq     0x15584 <op_assign+300>                     
                                                                              │
│   0x15580 <op_assign+296> b       0x15598 <op_assign+320>                     
                                                                              │
│   0x15584 <op_assign+300> ldr     lr, [pc]        ; 0x1558c <op_assign+308>   
                                                                              │
│   0x15588 <op_assign+304> b       0x15590 <op_assign+312>                     
                                                                              │
│   0x1558c <op_assign+308> CONSTANT 0x000508ec (little endian)
│   0x15590 <op_assign+312> vldr    d0, [lr]                                    
                                                                              │
│   0x15594 <op_assign+316> b       0x155a8 <op_assign+336>                     
                                                                              │
│   0x15598 <op_assign+320> ldr     lr, [pc]        ; 0x155a0 <op_assign+328>   
                                                                              │
│   0x1559c <op_assign+324> b       0x155a4 <op_assign+332>                     
                                                                              │
│   0x155a0 <op_assign+328> CONSTANT 0x000508f4 (little endian)
│   0x155a4 <op_assign+332> vldr    d0, [lr]          ; instruction: 
0xed9e0b00; so sign=1; note: double
│   0x155a8 <op_assign+336> vldr    d1, [r11, #-16]   ; instruction: 
0xed1b1b04; so sign=0; note: double
│   >0x155ac <op_assign+340> vadd.f64        d0, d1, d0                         
                                                                              │
│   0x155b0 <op_assign+344> vpush   {d0}                                        
                                                                              │
│   0x155b4 <op_assign+348> mov     r2, #97 ; 0x61                              
                                                                              │
│   0x155b8 <op_assign+352> pop     {r0, r1}                                    
                                                                              │
│   0x155bc <op_assign+356> bl      0x23208 <mk_number>                         
                                                                              │

At address 0x000508ec there is: 0 (64 bit).  That is incorrect.
At address 0x000508f4 there is: 0 (64 bit).  That is incorrect.

Registers before the vadd are:

d0             {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, u16 = {0x0, 0x0, 
0x0, 0x0}, u32 = {0x0, 0x0}, u64 = 0x0, f32 = {0x0, 0x0}, f64 = 0x0}
d1             {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x14, 0x40}, u16 = {0x0, 
0x0, 0x0, 0x4014}, u32 = {0x0, 0x40140000}, u64 = 0x4014000000000000, f32 = 
{0x0,
0x2}, f64 = 0x5}

So now we have to find out why tcc put those 0s in those constant literal slots.

My wild guess is because it's trying to use 64 bit ints and we only gave it a
32 bit int at some point.  But that should have been fixed by compiling tcc
using tcc.  Weird...

tcc does the following (in arm-gen.c):

  int v, ft, fc, fr, sign;
  uint32_t op;
  SValue v1;

  fr = sv->r;
  ft = sv->type.t;
  fc = sv->c.i;

      if(is_float(ft)) {
        calcaddr(&base,&fc,&sign,1020,2);
#ifdef TCC_ARM_VFP
        op=0xED100A00; /* flds */
        if(!sign)
          op|=0x800000;
        if ((ft & VT_BTYPE) != VT_FLOAT)
          op|=0x100;   /* flds -> fldd */
        o(op|(vfpr(r)<<12)|(fc>>2)|(base<<16));

Uhhh.... how does a 64 bit double fit into an ARM (32 bit) int (variable "fc")?

calcaddr seems to take the second argument, stuff it into a const
pool (if too big) and return the address to the item in the second argument
again.  If it's not too big, then it's just used directly without pool entry.

So it might make sense to add printfs to these locations in tcc and
see what it does with the constants.

@Paul: Could you decode what calcaddr is supposed to do, exactly (see
git@gitlab.com:janneke/tinycc.git arm-gen.c)?  That would help a lot.

Something maybe unrelated: trying to run gawk using

  qemu-arm -singlestep -d nochain,in_asm,cpu ./gawk

I get

  0x0004e760:  e3511000  cmp      r1, #0

  R00=00000000 R01=00000020 R02=407ffc58 R03=00000000
  R04=00000000 R05=00000000 R06=00000000 R07=00000000
  R08=00000000 R09=00000000 R10=0004fb78 R11=407ffc40
  R12=407ffc58 R13=407ffc30 R14=0004e9b8 R15=0004e760
  PSR=20000010 --C- A usr32
  qemu: uncaught target signal 4 (Illegal instruction) - core dumped
  Illegal instruction

The correct encoding of CMP is e3510000.
The "illegal" encoding of CMP has "set condition codes" (bit 20) set.

@Paul:

Is setting the "set condition codes" flag on CMP only deprecated or
actually forbidden?  (I.e. am I chasing a qemu misemulation (again...) or
is this actually a problem?)

Attached the buggy gawk executable.

I've also searched for other double literals (/= 0) in gawk's source code,
and there's one in io.c (in do_getline), and one in builtin.c (rlength = -1.0
if a regexp does not match in do_match).
That's it O_o.
So there are very few double literals in there apparently.

And indeed:

$ echo | ./gawk 'BEGIN { print(getline); }'
0    <--- wrong

No idea how to test the rlength thing, though.

@Janneke: 

I'm not sure we have automated it far enough for Paul to reproduce this
bug from scratch yet, right?

Maybe we should already do a Mes for ARM release after all ?
(provided the x86 bootstrap still works with that version, of course)

I mean release early, release often and all.  Otherwise it's really
difficult for new people to get started.

Attachment: gawk
Description: Binary data

Attachment: pgpE2Z9mjCliC.pgp
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]