Re: [Axiom-developer] 2.7 build

From:	Waldek Hebisch
Subject:	Re: [Axiom-developer] 2.7 build
Date:	Thu, 12 Jul 2007 19:54:10 +0200 (CEST)

> Greetings!
> 
> Waldek Hebisch <address@hidden> writes:
> 
> > I am now testing yesterday gcl-cvs.  I have set up safety to 3.  I hit
> > two problems.  One is segfault -- it went away when I tried to debug
> > it:
> > 
> > 
> > )compile "RADCAT.spad"
> > 
> > Segmentation violation: c stack ok:signalling error
> >    >> System error:
> >    Condition in MAKE-INPUT-FILENAME [or a callee]: INTERNAL-SIMPLE-ERROR: 
> > Caught fatal error [memory may be damaged]: Segmentation violation.
> > 
> > 
> > I am somewhat worried: this is the first time I see a segfault in safety 3
> > build. 
> > 
> 
> As I mentioned a while ago, I'm trying to implement a more reasonable
> (e.g. faster) safety 3, partially in response to your earlier request.
> The older mode is still available at (the non-standard) safety 4.  It
> would not surprise me if there are not a few issues to work out, but
> the ansi-test suite, which is compiled at safety 3, is working fine.
> 
> For quick debugging of segfaults, run with )set break break and )lisp
> (si::use-fast-links nil), then :bt at the break.  For robust
> debugging, build gcl with --enable-debug and run under gdb.
>

Yes, I did

)set break break
)lisp (si::use-fast-links nil)
)read "boo1.input"

and the segfault went away -- this stage finished without problems.

> I think the work-around you posted in your patch earlier regarding
> universal-error-handler effectively disables error reporting, but am
> not completely sure.  In your tree, I've tried simply removing the GCL
> from #-(or gcl ccl) above and commenting out the
> universal-error-handler embedding with (apparently) successful
> results.  
>

Axiom tries to catch all errors and provide its own error reporting
-- Axiom messages may lack some information compared to native Lisp
error reporting.  To do this Axiom uses either an ANSI condition handler
or the GCL-specific universal-error-handler embedding.  It is OK to
disable the Axiom error catcher during the build.  "Production" Axiom has
to catch numerical errors, but otherwise the Axiom error handler just
pretends that things are under control...
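
As a rough illustration of the ANSI condition handler route, here is a
minimal sketch in portable Common Lisp.  The function name
call-with-error-report and the keywords it returns are hypothetical; this
is not Axiom's actual error catcher, only the general shape of one that
treats numerical errors separately from everything else.

```lisp
;; Hypothetical helper, not taken from the Axiom sources.
(defun call-with-error-report (thunk)
  "Call THUNK.  On an error, return a keyword and a message string
instead of entering the Lisp debugger."
  (handler-case (funcall thunk)
    ;; Numerical errors (e.g. division by zero) get their own branch.
    (arithmetic-error (c)
      (values :numeric-error (format nil "~A" (type-of c))))
    ;; Every other error is reported in a generic way.
    (error (c)
      (values :failed (princ-to-string c)))))
```

For example, (call-with-error-report (lambda () (error "boom"))) returns
:FAILED as its first value rather than dropping into the debugger.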

> This said, I have seen a segfault in your tree at the interpsys stage
> which I would love to chase down.  Stephen's success has led me to
> believe this is something specific to your tree, but GCL should
> ideally never segfault.  Would it be possible for you and Stephen to
> isolate what differences in your trees allow him to proceed well into
> the algebra stage, but stops your tree shortly after interpsys? 
> 

There are two possible reasons:

1) Differences in the Spad compiler.  wh-sandbox fixed several bugs and
   removed a lot of code.  It is possible that the removal introduced
   a new bug.  It is also possible that Stephen fixed a bug that remains
   unfixed in wh-sandbox.

2) wh-sandbox compiles the algebra in quite a different way.  In particular,
   in the first stage wh-sandbox compiles about 200 files in a single
   Lisp image.  This puts much more stress on the garbage collector than
   the old way.  Another thing is that some files which silver
   compiles at the very end are compiled right at the beginning.
   Yet another thing: wh-sandbox uses extra flags, so the code takes
   a different path.

> To be most efficient, it would be absolutely fantastic if you could
> boil the error down to a simple example in lisp akin to Stephen's very
> helpful reports of late.  My problem is that I do not know the axiom
> sources nearly as well as the GCL sources -- it will therefore take
> quite a while simply to find the relevant function leading to the
> error.  

In the case of a segfault I doubt that a small example is possible. 

> 
> For example, in the report below ...
> 
> > The second problem is that the function |ICformat| apparently is
> > miscompiled, Lisp code differs only trivially from code in 2.6.8
> > build.  In particular at the beginning we have:
> > 
> > (DEFUN |ICformat| (|u|)
> >   (PROG (|v| |l'| |l1| |l|)
> >     (RETURN
> >       (SEQ (COND
> >              ((ATOM |u|) |u|)
> >              ((AND (PAIRP |u|) (EQ (QCAR |u|) '|has|))
> >               (|compHasFormat| |u|))
> >              ((OR (AND (PAIRP |u|) (EQ (QCAR |u|) 'AND)
> >                        (PROGN (SPADLET |l| (QCDR |u|)) 'T))
> >                   (AND (PAIRP |u|) (EQ (QCAR |u|) '|and|)
> >                        (PROGN (SPADLET |l| (QCDR |u|)) 'T)))
> >               (SPADLET |l|
> >                        (REMDUP (PROG (#1=#:G7955)
> 
> I take it this source is generated by some other function in axiom
> which is malfunctioning, (akin to the writing of #<compiled function
> ...> in the generated lisp sources I reported earlier).  What is this
> function?  How do I run it to produce the above output?  I take it the
> difference lies in the gensym above -- this appears clearly wrong, and
> indicates an error either in GCL's compilation of the function that
> generates this source, or a mistake in one of the patches applied in
> your tree.  Stephen, can you reproduce this?
> 

Let me repeat:  I believe that the Lisp code above is correct.  This
code is the output of depsys, which translates Boot to Common Lisp.
The translator extensively uses gensym-ed symbols, and AFAICS the gensym-ed
symbols work fine.  In fact, what I am saying is: modulo the names of gensym-ed
symbols (which are irrelevant to the meaning) the Lisp code for this
function produced during the 2.7.0 build is identical to the Lisp code from
the build using 2.6.8.
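
One way to check "identical modulo names of gensym-ed symbols"
mechanically is sketched below.  The function equal-modulo-gensyms is a
hypothetical helper, not part of Axiom or GCL; it assumes both forms have
already been read in as Lisp data and treats any uninterned symbol as a
gensym.

```lisp
;; Hypothetical helper: compare two forms up to a consistent,
;; one-to-one renaming of uninterned (gensym-ed) symbols.
(defun equal-modulo-gensyms (a b)
  (let ((a->b (make-hash-table :test #'eq))
        (b->a (make-hash-table :test #'eq)))
    (labels ((gensymp (x)
               (and (symbolp x) (null (symbol-package x))))
             (walk (x y)
               (cond ((and (consp x) (consp y))
                      (and (walk (car x) (car y))
                           (walk (cdr x) (cdr y))))
                     ((and (gensymp x) (gensymp y))
                      (multiple-value-bind (mx foundx) (gethash x a->b)
                        (multiple-value-bind (my foundy) (gethash y b->a)
                          (cond ((and foundx foundy)
                                 (and (eq mx y) (eq my x)))
                                ;; Only one side was seen before:
                                ;; the renaming is inconsistent.
                                ((or foundx foundy) nil)
                                (t (setf (gethash x a->b) y
                                         (gethash y b->a) x)
                                   t)))))
                     ;; Everything else must match exactly.
                     (t (equal x y)))))
      (walk a b))))
```

Two DEFUN forms read from the 2.6.8 and 2.7.0 outputs could then be
compared with a single call, without caring which #:G numbers each
build happened to pick.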

This function is contained in the file 'functor.boot.pamphlet' and
is a part of the Spad compiler.  'functor.boot.pamphlet' is translated
to Lisp giving 'functor.clip'.  'functor.clip' is compiled to
'functor.o'.  'functor.o' is used during Spad compilation.
'functor.o' produced by gcl-2.7.0 works incorrectly.

> > 
> > 
> > however, the call:
> > 
> > (|ICformat| '(|and| |foo| |bar| |foo|))
> > 
> > in 2.7.0 falls to the end and signals error, while in 2.6.8 compiled
> > Axiom it works OK.  It looks that Steve had similar problem (but later
> > wrote that it works OK).
> > 
> 
> I don't see how this is related to Stephen's earlier report regarding
> logical expression evaluation -- am I missing something?  It appears
> we have a printing problem in your tree somewhere.
>

Well, I believe that the logical expression in the code above is
evaluated incorrectly.  Basically, |ICformat| performs a
series of pattern matches and, depending on the outcome, chooses
the code to execute.  If none of the patterns matches, |ICformat| signals
an error.  The input above should be matched by the expression:

> >              ((OR (AND (PAIRP |u|) (EQ (QCAR |u|) 'AND)
> >                        (PROGN (SPADLET |l| (QCDR |u|)) 'T))
> >                   (AND (PAIRP |u|) (EQ (QCAR |u|) '|and|)
> >                        (PROGN (SPADLET |l| (QCDR |u|)) 'T)))

but apparently, the expression above compiled by 2.7.0 does not
match (the code takes the error branch).  I have modified the
Lisp file by hand, changing the expression above to:

             ((OR (AND (PAIRP |u|) (OR (EQ (QCAR |u|) 'AND) (EQ (QCAR |u|) '|and|))
                       (PROGN (SPADLET |l| (QCDR |u|)) 'T))
                  (AND (PAIRP |u|) (EQ (QCAR |u|) 'AND)
                       (PROGN (SPADLET |l| (QCDR |u|)) 'T)))

After this change I was able to successfully continue the build
(currently I am running the test suite).

So, I believe that the expression above is miscompiled by gcl.  I will
try to reduce the problem to a smaller example.
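
A possible starting point for such a reduction is sketched below.  The
macros PAIRP, QCAR, QCDR and SPADLET are simple stand-ins written for
this sketch (assumptions, not copied from the Axiom sources), and
MATCH-AND-FORM is a hypothetical name.  Whether this cut-down form
actually reproduces the problem under gcl 2.7.0 is not known; it only
isolates the original OR/AND pattern so that compiled and interpreted
results can be compared.

```lisp
;; Stand-in definitions: assumptions for this sketch only.
(DEFMACRO PAIRP (X) `(CONSP ,X))
(DEFMACRO QCAR (X) `(CAR ,X))
(DEFMACRO QCDR (X) `(CDR ,X))
(DEFMACRO SPADLET (VAR VAL) `(SETQ ,VAR ,VAL))

;; Hypothetical reduced function: only the pattern in question.
;; Returns the operand list on a match, :NO-MATCH otherwise.
(DEFUN MATCH-AND-FORM (|u|)
  (PROG (|l|)
    (RETURN
      (COND
        ((OR (AND (PAIRP |u|) (EQ (QCAR |u|) 'AND)
                  (PROGN (SPADLET |l| (QCDR |u|)) 'T))
             (AND (PAIRP |u|) (EQ (QCAR |u|) '|and|)
                  (PROGN (SPADLET |l| (QCDR |u|)) 'T)))
         |l|)
        ('T :NO-MATCH)))))
```

In a correct Lisp, (MATCH-AND-FORM '(|and| |foo| |bar| |foo|)) returns
(|foo| |bar| |foo|).  Running the same call before and after
(COMPILE 'MATCH-AND-FORM) would show whether the compiled version takes
the wrong branch.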

> > One extra remark: the extra PROG which I saw previously seem to be
> > gone -- I am not sure if it is safety 3 or something in gcl changed.
> > But now I see:
> > 
> > (DEFUN |ALIST;dictionary;$;1| ($) (SPADCALL NIL (QREFELT $ 11)))
> > 
> > (the same as in other dialects), while earlier 2.7 gave me:
> > 
> > (DEFUN |ALIST;dictionary;$;1| ($)
> >   (PROG () (RETURN (SPADCALL NIL (QREFELT $ 11)))))
> > 
> 
> These are equivalent in lisp, but it still indicates some instability
> in the axiom code generator.  There is nothing in GCL  that I can see
> at the moment that would automatically make this conversion.
>

Axiom tries to "optimize" generated Lisp code -- removal of PROG is
almost surely one such optimization.  However, Axiom makes various
unportable assumptions, and probably one of these assumptions is
(was) broken by the new gcl.
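
To make the PROG removal concrete, here is a minimal sketch of that kind
of rewrite.  The function strip-trivial-prog is hypothetical and is not
Axiom's optimizer; it only handles the exact shape seen in the
|ALIST;dictionary;$;1| example.

```lisp
;; Hypothetical sketch of a "remove trivial PROG" rewrite.
(defun strip-trivial-prog (form)
  "Rewrite (PROG () (RETURN x)) to x; return any other FORM unchanged."
  (if (and (consp form)
           (eq (first form) 'prog)
           (null (second form))          ; no local variables
           (= (length form) 3)           ; exactly one body form
           (consp (third form))
           (eq (first (third form)) 'return)
           (= (length (third form)) 2))  ; (RETURN x)
      (second (third form))
      form))
```

Note that the rewrite is only safe when x does not itself contain a
RETURN aimed at the removed PROG, which this sketch does not check.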

> Please let me know if my suggestions above are unworkable.
> 
> BTW, am assuming we are still working with the source produced by svn
> update in wh-sandbox as modified by the patches your posted.  If the
> source tree is different, a complete set of instructions describing
> how to reproduce this would be most helpful.
> 

I am working with the same tree as in my first trial of the gcl-2.7 build, namely
revision 636 of wh-sandbox + the patch that I posted.  If you ran an update,
you probably got revision 646, which is slightly different.

-- 
                              Waldek Hebisch
address@hidden
