Re: [Axiom-developer] build-improvements and more-rules.mk


From:	root
Subject:	Re: [Axiom-developer] build-improvements and more-rules.mk
Date:	Mon, 16 Oct 2006 02:45:34 -0400
This is the documentation of daase.lisp.pamphlet.
Some of it is new and I'll add it to the file.






1) KAF FILE FORMAT ==============================================

This documentation refers to KAF files, which are random access files.
Each NRLIB directory contains one (look for NRLIB/index.KAF).
The format of a random access file is:

byte-offset-of-key-table
first-entry
second-entry
...
last-entry
((key1 . first-entry-byte-address)
 (key2 . second-entry-byte-address)
 ...
 (keyN . last-entry-byte-address))

The key table is a standard lisp alist.

To open a database, read the first number, seek to that location,
and (read), which returns the key-data alist. To look up data, find
the key in the key-data alist, take its entry-byte-address, seek to
that address, and (read).
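A minimal Common Lisp sketch of writing and reading a file in the layout above, assuming plain byte addresses only (no immediate data, no compression). The names sketch-write-kaf, sketch-open-kaf and sketch-kaf-lookup are hypothetical and are not the routines in daase.lisp; the 20-character padded header is also an assumption.

```lisp
;;; Hypothetical sketch of the KAF layout; not the daase.lisp routines.
;;; Layout: padded header holding the key-table offset, one printed
;;; entry per key, then the key table ((key . byte-address) ...).

(defun sketch-write-kaf (path entries)
  "Write ENTRIES, an alist of (key . data), to PATH in KAF layout."
  (with-open-file (out path :direction :output :if-exists :supersede)
    (let ((*print-pretty* nil)
          (table '()))
      ;; Reserve 20 spaces for the offset; they are overwritten at the end.
      (write-string (make-string 20 :initial-element #\Space) out)
      (dolist (entry entries)
        ;; Remember where this entry starts, then print it.
        (push (cons (car entry) (file-position out)) table)
        (print (cdr entry) out))
      (let ((table-pos (file-position out)))
        (print (nreverse table) out)
        ;; Go back to the top and fill in the key-table offset.
        (file-position out 0)
        (format out "~20d" table-pos)))))

(defun sketch-open-kaf (stream)
  "Read the header offset, seek there, and return the key-table alist."
  (file-position stream 0)
  (file-position stream (read stream))
  (read stream))

(defun sketch-kaf-lookup (stream table key)
  "Seek to KEY's entry and READ it; return NIL if KEY is not in TABLE."
  (let ((pair (assoc key table)))
    (when pair
      (file-position stream (cdr pair))
      (read stream))))
```

Opening reads the key table once; each lookup is then one seek and one (read), which is why the table can be kept in memory while the entries stay on disk.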

For instance, see src/share/algebra/USERS.DAASE/index.KAF



One existing optimization is that if the data is a simple thing like a
symbol, the entry-byte-address is replaced by the data itself (immediate data).
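A sketch of how a reader can tell the two cases apart, assuming (as the design notes below state) that a byte address is always an integer and immediate data never is. The helper name sketch-resolve is hypothetical.

```lisp
;;; Hypothetical helper: resolve a key-table value that may be immediate.
;;; An integer is a byte address into STREAM; anything else (symbol,
;;; list, NIL) is already the data and is returned unchanged.

(defun sketch-resolve (value stream)
  "Return the data for key-table VALUE, reading from STREAM if needed."
  (if (integerp value)
      (progn (file-position stream value)
             (read stream))
      value))
```

This only works as long as immediate data is never itself an integer.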

Another existing optimization is a compression algorithm applied to the
data so that the very long names don't take up so much space.
We could probably remove the compression algorithm, as 64k is no
longer considered 'huge'. The database-abbreviation routine
handles this on read and write-compress handles it on write.
The squeeze routine compresses the keys and the unsqueeze
routine uncompresses them. Removing these two routines
should remove all of the compression.
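To illustrate the idea only, here is a hypothetical symbol-table compression in the spirit of squeeze/unsqueeze. It is NOT the encoding daase.lisp uses: it assumes a shared vector of long symbols and replaces each one by -(index+1), which in turn assumes the data never contains negative integers.

```lisp
;;; Hypothetical compression sketch; the real squeeze/unsqueeze may differ.
;;; Symbols found in TABLE (a vector) are replaced by -(index+1).

(defun sketch-squeeze (expr table)
  "Replace every non-NIL symbol in EXPR that occurs in TABLE by a negative index."
  (cond ((and expr (symbolp expr))
         (let ((pos (position expr table)))
           (if pos (- (1+ pos)) expr)))
        ((consp expr)
         (cons (sketch-squeeze (car expr) table)
               (sketch-squeeze (cdr expr) table)))
        (t expr)))

(defun sketch-unsqueeze (expr table)
  "Undo SKETCH-SQUEEZE: map negative integers back to TABLE's symbols."
  (cond ((and (integerp expr) (minusp expr))
         (aref table (1- (- expr))))
        ((consp expr)
         (cons (sketch-unsqueeze (car expr) table)
               (sketch-unsqueeze (cdr expr) table)))
        (t expr)))
```

With the table #(|DenavitHartenbergMatrix| |Polynomial|), the form (|Polynomial| (|DenavitHartenbergMatrix| x) 7) squeezes to (-2 (-1 X) 7) and unsqueezes back to the original.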


A faster optimization would be to simply read the whole database
into the image before it is saved. The system would be easier to
understand and the interpreter would be faster.
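A sketch of what "read the whole database into the image" could look like for a file in the KAF layout: every entry is read once into an in-memory hash table, so later lookups never touch the disk. sketch-preload is a hypothetical name, and the sketch ignores immediate data and compression.

```lisp
;;; Hypothetical eager loader for a KAF-layout file open on STREAM.

(defun sketch-preload (stream)
  "Return a hash table mapping every key in STREAM's key table to its data."
  (file-position stream 0)
  ;; The first item is the byte offset of the key table.
  (file-position stream (read stream))
  (let ((cache (make-hash-table :test #'eq)))
    ;; The key table is read once, before the loop body seeks around.
    (dolist (pair (read stream) cache)
      (file-position stream (cdr pair))
      (setf (gethash (car pair) cache) (read stream)))))
```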


The fastest optimization is to fix the time stamp mechanism,
which is currently broken. Making this work requires a small
bit of coordination at 'make' time, which I forgot to implement.





2) *miss*  ===============================================


Note that if you do

)lisp (setq *miss* t)

you can see when an interpreter lookup has "missed" the in-core
tables and needs to fetch information from the databases.
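A sketch of a cached lookup that reports misses the way *miss* does. *sketch-miss* and sketch-lookup are hypothetical names; the FETCH argument stands in for whatever actually seeks into the database and reads the value.

```lisp
;;; Hypothetical cached lookup with optional miss reporting.

(defvar *sketch-miss* nil
  "When non-NIL, report every lookup that has to go to the database.")

(defun sketch-lookup (key cache fetch)
  "Return KEY's value from CACHE, calling FETCH and caching on a miss."
  (multiple-value-bind (value found) (gethash key cache)
    (if found
        value
        (progn
          (when *sketch-miss*
            (format *trace-output* "~&miss: ~s~%" key))
          (setf (gethash key cache) (funcall fetch key))))))
```

Only the first lookup of a key prints a miss; the second is answered from the in-core table.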






3) DATABASE FILES ===============================================

Database files are very similar to KAF files except that there
is an optimization (currently broken) which makes the first
item a pair of two numbers. The first number in the pair is
the offset of the key-value table, the second is a time stamp.
If the time stamp in the database matches the time stamp in
the image, the database is not needed (since the internal hash
tables already contain all of the information). When the database
is built, the time stamp is saved in both the gcl image and the
database.
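A sketch of the startup check this describes, assuming the first item is printed as a (key-table-offset . time-stamp) cons. *sketch-image-stamp* stands in for the stamp saved in the image (such as *interp-stream-stamp*); both it and sketch-database-needed-p are hypothetical names.

```lisp
;;; Hypothetical header check for a database with a time-stamped header.

(defvar *sketch-image-stamp* nil
  "Stand-in for the time stamp saved in the image at build time.")

(defun sketch-database-needed-p (stream)
  "Return whether STREAM's database must be consulted, and its key-table offset."
  (file-position stream 0)
  (destructuring-bind (offset . stamp) (read stream)
    (values (not (eql stamp *sketch-image-stamp*)) offset)))
```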
 



Here I'll try to outline the interp database write procedure, write-interpdb:

(defun write-interpdb ()
 "build interp.daase from hash tables"
 (declare (special $spadroot) (special *ancestors-hash*))
 (let (opalistpos modemapspos cmodemappos master masterpos obj *print-pretty*
        concategory categorypos kind niladic cosig abbrev defaultdomain
        ancestors ancestorspos out)
  (declare (special *print-pretty*))
  (print "building interp.daase")

; 1. We open the file we're going to create

  (setq out (open "interp.build" :direction :output))

; 2. We reserve some space at the top of the file for the key-time pair
;    We will overwrite these spaces just before we close the file.

  (princ "                              " out)

; 3. Make sure we write it out
  (finish-output out)

; 4. For every constructor in the system we write the parts:

  (dolist (constructor (|allConstructors|))
   (let (struct)

; 4a. Each constructor has a property list. A property list is a flat
;     list of alternating indicator and value entries. The property we
;     want is called 'database, so (get constructor 'database) returns it

    (setq struct (get constructor 'database))

; 5. We write the "operationalist"
; 5a. We remember the current file position before we write
;     We need this information so we can seek to this position on read

    (setq opalistpos (file-position out))

; 5b. We get the "operationalist", compress it, and write it out

    (print (squeeze (database-operationalist struct)) out)

; 5c. We make sure it was written

    (finish-output out)

; 6. We write the "constructormodemap"
; 6a. We remember the current file position before we write

    (setq cmodemappos (file-position out))

; 6b. We get the "constructormodemap", compress it, and write it out

    (print (squeeze (database-constructormodemap struct)) out)

; 6c. We make sure it was written

    (finish-output out)

; 7. We write the "modemaps"
; 7a. We remember the current file position before we write

    (setq modemapspos (file-position out))

; 7b. We get the "modemaps", compress it, and write it out

    (print (squeeze (database-modemaps struct)) out)

; 7c. We make sure it was written

    (finish-output out)

; 8. We remember the object file (or NRLIB directory) name in the obj variable

    (if (consp (database-object struct)) ; if asharp code ...
     (setq obj
      (cons (pathname-name (car (database-object struct)))
            (cdr (database-object struct))))
     (setq obj
      (pathname-name
        (first (last (pathname-directory (database-object struct)))))))

; 9. We write the "constructorcategory", if it is a category, else nil
; 9a. Get the constructorcategory and compress it

    (setq concategory (squeeze (database-constructorcategory struct)))

; 9b. If we have any data we write it out, else we don't write it.
;     Note that if there is no data then the byte index for the
;     constructorcategory will not be a number but will be nil.

    (if concategory  ; if category then write data else write nil
     (progn
      (setq categorypos (file-position out))
      (print concategory out)
      (finish-output out))
     (setq categorypos nil))

; 10. We get a set of properties which are kept as "immediate" data
;     This means that the key table will hold this data directly
;     rather than as a byte index into the file.
; 10a. niladic data

    (setq niladic (database-niladic struct))

; 10b. abbreviation data (e.g. POLY for polynomial)

    (setq abbrev (database-abbreviation struct))

; 10c. cosig data

    (setq cosig (database-cosig struct))

; 10d. kind data

    (setq kind (database-constructorkind struct))

; 10e. defaultdomain data

    (setq defaultdomain (database-defaultdomain struct))

; 11. The ancestor data might exist. If it does, we fetch it,
;     compress it, and write it out. If it does not, we place
;     an immediate value of nil in the key-value table

    (setq ancestors (squeeze (gethash constructor *ancestors-hash*))) ;cattable.boot
    (if ancestors
     (progn
      (setq ancestorspos (file-position out))
      (print ancestors out)
      (finish-output out))
     (setq ancestorspos nil))

; 12. "master" is an alist. Each element of the alist has the name of
;     the constructor and all of the above attributes. When the loop
;     finishes we will have constructed all of the data for the key-value
;     table

    (push (list constructor opalistpos cmodemappos modemapspos
      obj categorypos niladic abbrev cosig kind defaultdomain
      ancestorspos) master)))

; 13. The loop is done, we make sure all of the data is written

  (finish-output out)

; 14. We remember where the key-value table will be written in the file

  (setq masterpos (file-position out))

; 15. We compress and print the key-value table

  (print (mapcar #'squeeze master) out)

; 16. We make sure we write the table

  (finish-output out)

; 17. We go to the top of the file

  (file-position out 0)

; 18. We write out the (master-byte-position . universal-time) pair
;     Note that if the universal-time value matches the value of
;     *interp-stream-stamp* then there is no reason to read the
;     interp database because all of the data is already cached in
;     the image. This happens if you build a database and immediately
;     save the image. The saved image already has the data since we
;     just wrote it out. If the *interp-stream-stamp* and the database
;     time stamp differ we "reread" the database on startup. Actually
;     we just open the database and fetch as needed. You can see fetches
;     by setting the *miss* variable non-nil.

  (print (cons masterpos (get-universal-time)) out)

; 19. We make sure we write it.

  (finish-output out)

; 20. And we are done

  (close out)))
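The key-value table written in step 15 holds one 12-element list per constructor, in the order it is pushed onto master. This sketch simply names those fields for an already unsqueezed entry; the function and keyword names are hypothetical, and the sample values in the test are made up. The three positions for operationalist, constructormodemap and modemaps are always byte addresses, the category and ancestors positions are byte addresses or nil, and the rest are immediate data.

```lisp
;;; Hypothetical decoder for one (unsqueezed) entry of the master table.

(defun sketch-decode-master-entry (entry)
  "Return ENTRY, a list in the order pushed onto MASTER, as a plist."
  (destructuring-bind (constructor opalistpos cmodemappos modemapspos
                       obj categorypos niladic abbrev cosig kind
                       defaultdomain ancestorspos)
      entry
    (list :constructor constructor
          :operationalist opalistpos          ; byte address
          :constructormodemap cmodemappos     ; byte address
          :modemaps modemapspos               ; byte address
          :object obj                         ; immediate
          :constructorcategory categorypos    ; byte address or NIL
          :niladic niladic :abbreviation abbrev
          :cosig cosig :kind kind
          :defaultdomain defaultdomain        ; immediate
          :ancestors ancestorspos)))          ; byte address or NIL
```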






4) DAASE.LISP DOCUMENTATION =========================================

;;TTT 7/2/97
; Regarding the 'ancestors field for a category: At database build
; time there exists a *ancestors-hash* hash table that gets filled
; with CATEGORY (not domain) ancestor information. This later provides
; the information that goes into interp.daase. This *ancestors-hash*
; does not exist at normal runtime (it can be made by a call to
; genCategoryTable). Note that the ancestor information in
; *ancestors-hash* (and hence interp.daase) involves #1, #2, etc
; instead of R, Coef, etc. The latter thingies appear in all
; .NRLIB/index.KAF files. So we need to be careful when we )lib
; categories and update the ancestor info.
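A sketch of the care this calls for: before ancestor information from a )lib'd category is merged, its formal parameter names (R, Coef, ...) would need rewriting into the positional #1, #2, ... form. sketch-positionalize is a hypothetical helper, not part of daase.lisp; it uses the standard sublis on the whole form.

```lisp
;;; Hypothetical helper: rename formal parameters to #1, #2, ...

(defun sketch-positionalize (form params)
  "Replace the Nth symbol of PARAMS in FORM by the symbol named #N."
  (let ((bindings
          (loop for p in params
                for i from 1
                collect (cons p (intern (format nil "#~d" i))))))
    (sublis bindings form)))
```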


; This file contains the code to build, open and access the .DAASE
; files. This file also contains the code to )library NRLIBs and asy files.

; There is a major issue about the data that resides in these
; databases.  the fundamental problem is that the system requires more
; information to build the databases than it needs to run the
; interpreter.  in particular, MODEMAP.DAASE is constructed using
; properties like "modemaps" but the interpreter will never ask for
; this information.

; So, the design is as follows:
;  first, the MODEMAP.DAASE needs to be built. this is done by doing
; a )library on ALL of the NRLIB files that are going into the system.
; this will bring in "modemap" information and add it to the
; *modemaps-hash* hashtable.
;  next, database build proceeds, accessing the "modemap" property
; from the hashtables. once this completes this information is never
; used again.
;  next, the interp.daase database is built. this contains only the
; information necessary to run the interpreter. note that during the
; running of the interpreter users can extend the system by doing a
; )library on a new NRLIB file. this will cause fields such as "modemap"
; to be read and hashed.

; In the old system each constructor (e.g. LIST) had one library directory
; (e.g. LIST.NRLIB). this directory contained a random access file called
; the index.KAF file. the interpreter needed this KAF file at runtime for
; two entries, the operationAlist and the ConstructorModemap.
; during the redesign for the new compiler we decided to merge all of
; these .NRLIB/index.KAF files into one database, INTERP.DAASE.
; requests to get information from this database are intended to be
; cached so that multiple references do not cause additional disk i/o.
; this database is left open at all times as it is used frequently by
; the interpreter. one minor complication is that newly compiled files
; need to override information that exists in this database.
;   The design calls for constructing a random read (KAF format) file
; that is accessed by functions that cache their results. when the
; database is opened the list of constructor-index pairs is hashed
; by constructor name. a request for information about a constructor
; causes the information to replace the index in the hash table. since
; the index is a number and the data is a non-numeric sexpr there is
; no source of confusion about when the data needs to be read.
;
; The format of this new database is as follows:
;
;first entry:
; an integer giving the byte offset to the constructor alist
; at the bottom of the file
;second and subsequent entries (one per constructor)
; (operationAlist)
; (constructorModemap)
; ....
;last entry: (pointed at by the first entry)
; an alist of (constructor . index) e.g.
;  ( (PI offset-of-operationAlist offset-of-constructorModemap)
;   (NNI offset-of-operationAlist offset-of-constructorModemap)
;    ....)
; This list is read at open time and hashed by the car of each item.

; the system has been changed to use the property list of the
; symbols rather than hash tables. since we already hashed once
; to get the symbol we need only an offset to get the property
; list. this also has the advantage that eq hash tables no longer
; need to be moved during garbage collection.
;  there are 3 potential speedups that could be done. the best
; would be to use the value cell of the symbol rather than the
; property list but i'm unable to determine all uses of the
; value cell at the present time.
;  a second speedup is to guarantee that the property list is
; a single item, namely the database structure. this removes
; an assoc but leaves one open to breaking the system if someone
; adds something to the property list. this was not done because
; of the danger mentioned.
;  a third speedup is to make the getdatabase call go away, either
; by making it a macro or eliding it entirely. this was not done
; because we want to keep the flexibility of changing the database
; forms.

; the new design does not use hash tables. the database structure
; contains an entry for each item that used to be in a hash table.
; initially the structure contains file-position pointers and
; these are replaced by real data when they are first looked up.
; the database structure is kept on the property list of the
; constructor, thus, (get '|DenavitHartenbergMatrix| 'database)
; will return the database structure object.
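A sketch of this lazy scheme for a single field: the slot starts out holding a file position and is replaced by the real data the first time it is asked for. The structure sketch-database and the accessor sketch-operationalist are hypothetical; FETCH stands in for seeking to the position and calling (read).

```lisp
;;; Hypothetical lazy database structure kept on the constructor's plist.

(defstruct sketch-database
  operationalist        ; file position (integer) until first access
  constructormodemap)

(defun sketch-operationalist (constructor fetch)
  "Return CONSTRUCTOR's operationalist, replacing a file position by data."
  (let ((db (get constructor 'database)))
    (when (integerp (sketch-database-operationalist db))
      (setf (sketch-database-operationalist db)
            (funcall fetch (sketch-database-operationalist db))))
    (sketch-database-operationalist db)))
```

Like the immediate-data trick, this relies on the fetched data never being an integer.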

; each operation has a property on its symbol name called 'operation
; which is a list of all of the signatures of operations with that name.
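A small sketch of maintaining such a property with the standard get and pushnew. The helper name and the signature shapes are hypothetical; skipping duplicates is a choice made here, not something the notes specify.

```lisp
;;; Hypothetical helper for the 'operation property of an operation name.

(defun sketch-add-signature (op signature)
  "Record SIGNATURE under OP's 'operation property unless already present."
  (pushnew signature (get op 'operation) :test #'equal))
```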

; -- tim daly






5) CREATING DATABASES =============================================

; making new databases consists of:
;  1) reset all of the system hash tables
;  *) set up Union, Record and Mapping
;  2) map )library across all of the system files (fills the databases)
;  3) loading some normally autoloaded files
;  4) making some database entries that are computed (like ancestors)
;  5) writing out the databases
;  6) write out 'warm' data to be loaded into the image at build time
; note that this process should be done in a clean image
; followed by a rebuild of the system image to include
; the new index pointers (e.g. *interp-stream-stamp*)
; the system will work without a rebuild but it needs to
; re-read the databases on startup. rebuilding the system
; will cache the information into the image and the databases
; are opened but not read, saving considerable startup time.
; also note that the order the databases are written out is
; critical. interp.daase depends on prior computations and has
; to be written out last.


t