axiom-developer
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Axiom-developer] Lazyness in lexing Boot and Spad


From: Gabriel Dos Reis
Subject: Re: [Axiom-developer] Lazyness in lexing Boot and Spad
Date: 25 Feb 2007 18:01:50 -0600

Waldek Hebisch <address@hidden> writes:

| > 
| > Hello,
| > 
| >   Axiom's implementation of Boot's and Spad source includer
| > (src/boot/btincl2.boot, src/interp/cstream.boot) uses lazyness (faked
| > through function "Delay").  This is also variously known as zipper.
| > My quesion is how much of the complexity introduced by faking
| > lazyness actually outweight the benefits?  Is the lazyness actually
| > beneficial? 
| > 
| 
| AFAICS the main potential of benefit of lazyness is that it allows
| writing algorithm as if it operated on the whole file, but still
| have "interactive" parser -- that is parser which stops reading
| its input when it sees end of a syntactic construct.

I'm familiar with Haskell -- and have written slightly more than "toy"
programs in Haskell, so I appreciate the benefits of lazyness in
general.  I apologize I did not make make it clearer that I'm talking
of the specific case of Axiom's way of consuming the token stream.

I posit that if we had strictness analysis in Axiom, then the analysis
would find out that actually the parts that effectively consumes the
token stream are strict in the token stream argument.  So, lazyness
buys nothing.  I base that conjecture over "debugging" under GCL.
That is why I raised the question of the lazyness benefits in the
specific case of Axiom's current infrastructure.

| This potential benefit is completly lost in Axiom: the command
| line parser requires that the whole construct is on single logical
| line (that is single line after joining all continuated lines).

Yes.  Which a strictness analysis will find out.

| I can guess that command line parser is non-interactive due to
| backtracking and also because whitespace syntax lacks explicit
| end-of-construct markes (for example parser sees that a procedure
| ended only when seeing next procedure or end of file).

Yes.  But, I don't think that makes a huge difference.

| I must say that this whole pipeline:
| 
| scan.boot.pamphlet pile.boot.pamphlet pf2sex.boot.pamphlet
| pf2atree.boot.pamphlet parini.boot.pamphlet packtran.boot.pamphlet
| macex.boot.pamphlet intfile.boot.pamphlet incl.boot.pamphlet
| dq.boot.pamphlet cstream.boot.pamphlet cparse.boot.pamphlet
| astr.boot.pamphlet
| 
| is on my list of candidates for removal.

There are many files on that list that I would disagree on removing.
This statement is based on spending a non-negligible part of the last
10 months studying from different angles.  No doubt they need
improvement.  But, I'm deeply skeptical about their removal.

|  I would like to reuse
| parts of this code, but I belive that what we need can be done
| simpler.  However, rewriting this part for me is of low priority,
| and I probably will not get to this before summer.

I've done some work on it in several local tress, but their
combination is not a coherent structure at this time I'm afraid.

While working on those parts of the compiler (and interpreter), I've
considered a suggestion made by Bill Page some time ago: maybe rewrite
Axiom in Haskell?  On several occasions, I've resisted that idea.
But, if we get the improvements in both Boot and the compiler
structures, it will look a lot like something written in "simple
Haskell".  Funny.

| My vision of future compiler is:  
| - classical lexer which generates basic syntax tree nodes (including
|   line number info)
| - first stage parser (build surface syntax tree) accepting very
|   loose grammar
| - macroexpander
| - second stage parser (flattening trees and checking syntactic
|   constraints which cna be checked only after macro expansion)
| - typechecker, probably multipass.  In first few passes I would only
|   consider interface parts, the final pass would also check bodys
|   (and resolve overloading)
| - final code generation
| 
| Some remarks why such structure:  I want macro expansion rather
| early, because before macro expansion is it impossible to check
| for syntax correctness.  Also, full syntactic information helps
| with typechecking (current parser has it backwards, it uses
| type information for parsing, leading sometimes to weird errors).

I'm unclear about your issue with macro expansion.  From my
understanding, they happen very early in the compilation process --
albeit not like C macro expansion (a horror I would like to see
disappear the surface of the planet).

   --% WHERE
   compWhere([.,form,:exprList],m,eInit) ==
     $insideExpressionIfTrue: local:= false
     $insideWhereIfTrue: local:= true
     e:= eInit
     u:=
       for item in exprList repeat
         [.,.,e]:= comp(item,$EmptyMode,e) or return "failed"
     u="failed" => return nil
     $insideWhereIfTrue:= false
     [x,m,eAfter]:= comp(macroExpand(form,eBefore:= e),m,e) or return nil
     eFinal:=
       del:= deltaContour(eAfter,eBefore) => addContour(del,eInit)
       eInit
     [x,m,eFinal]


I would like Axiom to have the ability to store the following:
  (1) the "parse form"
  (2) the "fully typed ast" before code generation

I have an intermediate representation of Spad in Spad.  It lacks
a couple of feature, but I can handle almost all algebra files.

| Also, current compiler couples code generation with type checking,
| which is potentialy wastefull.  Namely, in some cases the compiler
| has to do backtracking search on possible type assignments, so
| it generates code and then discard it during backtracking (I am
| not sure how serious this problem is -- most of the time backtracking
| is very limited).  

Yes, my guestimate was that the waste generated by backtracking is
small enough.

| During compilation in some cases (IIRC in chaseInferences) the compiler
| uses recursive traversal on a dag (ignoring identity of nodes)
| which leads to about 3m spent on a single file (samewhat surprisingly
| the particular procedure I found shows pathological behaviour only one
| file, on other taking only little time).
| 
| I belive that having typechecking as a separate phase operationg
| essentially on the whole program would allow beter search technique
| and would avoid many time sinks (currently many global data structure
| are invalidated after compilation of any constructors, which is
| probably responsible for nonlinear compile times for multifile
| compilations).

That idea is already present in some form:

intloopSpadProcess(stepNo,lines,ptree,interactive?)==
    $stepNo:local := stepNo
    $currentCarrier := cc := ['carrier]
    ncPutQ(cc, 'stepNumber, stepNo)
    ncPutQ(cc, 'messages, $ncMsgList)
    ncPutQ(cc, 'lines, lines)
    $ncMsgList := nil
    result := CatchAsCan(flung, Catch("SpadCompileItem",
     CATCH($intCoerceFailure, CATCH($intSpadReader,
       interp(cc, ptree, interactive?))))) where
 
        interp(cc, ptree, interactive?) ==
            ncConversationPhase(function phParse,            [cc, ptree])
            ncConversationPhase(function phMacro,            [cc])
            ncConversationPhase(function phIntReportMsgs,[cc, interactive?])
            ncConversationPhase(function phInterpret,        [cc])
 
            #ncEltQ(cc, 'messages) ^= 0 => ncError()
 
    intSetNeedToSignalSessionManager()
    $prevCarrier := $currentCarrier
    result = 'ncEnd     => stepNo
    result = 'ncError   => stepNo
    result = 'ncEndItem => stepNo
    stepNo+1
 
-- Gaby




reply via email to

[Prev in Thread] Current Thread [Next in Thread]