Re: Internal storage costs...

On 25 Jul 2013, at 02:35, Sean Charles <address@hidden> wrote:

Hello venerable list…

Is it more efficient to store a list of lexemes as character codes or as single-character atoms?

Without knowing the C code, other than what I know of the FFI: is it more compact to store a list of integers, which presumably represent themselves, or to use single-character atoms?

I am guessing that the Flyweight pattern or something similar is used, which means that single-character atoms are actually pointers to the atom, so a list of one hundred 'a's is in fact a list of one hundred pointers into the atom store. But is the pointer size bigger than the character code size?

I ask because my lexer is working and producing output like this:

| ?- feltlex('small.felt',X).

X = [comment(block,pos(1,1),[' ','S',t,r,i,n,g,' ',t,e,s,t,i,n,g,'.','\n','\n',' ',' ',' ','A',l,l,o,w,' ',b,a,c,k,s,l,a,s,h,e,d,' ',d,e,l,i,m,t,e,r,' ',i,n,' ',t,h,e,' ',s,e,q,u,e,n,c,e,'.','.','.','.','\n']),chr(/),comment(single,pos(6,1),[' ','D',o,u,b,l,e,' ',q,u,o,t,e,d,' ',s,t,r,i,n,g,s,'.','.','.']),string(double,pos(7,1),[c,h,e,e,s,\,'"',e,b,u,r,g,e,r]),string(double,pos(8,1),[c,h,e,e,s,\,'''',e,b,u,r,g,e,r]),comment(single,pos(10,1),[' ','S',i,n,g,l,e,' ',q,u,o,t,e,d,' ',s,t,r,i,n,g,s,'.','.','.']),string(single,pos(11,1),[c,h,e,e,s,e,\,'"',b,u,r,g,e,r]),string(single,pos(12,1),[c,h,e,e,s,e,\,'''',b,u,r,g,e,r])]

That's from a source file:

/* String testing.

Allow backslashed delimter in the sequence....
*/

; Double quoted strings...
"chees\"eburger"
"chees\'eburger"

; Single quoted strings...
'cheese\"burger'
'cheese\'burger'

Not a brilliant example, but it was for testing the comment handling and the string consumption, which allows a backslashed single or double quote to be part of the string. The lexer parses using get_char/peek_char with LA(1), and that lets me cope well enough for now. The language is s-expression based.
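
As a rough illustration of the string consumption described above, here is a minimal sketch. It is not the lexer from this post: the predicate names lex_string/2, string_rest/3, string_char/4 and delimiter/2 are hypothetical, and position tracking (the pos/2 terms) is left out. It assumes only the standard peek_char/2 and get_char/2.

```prolog
% lex_string(+Stream, -Token): succeeds if the next character opens a string.
% All predicate names here are hypothetical, chosen for illustration only.
lex_string(Stream, string(Kind, Chars)) :-
    peek_char(Stream, Delim),          % LA(1): look at the next char, do not consume
    delimiter(Delim, Kind),
    get_char(Stream, _),               % now consume the opening delimiter
    string_rest(Stream, Delim, Chars).

delimiter('"', double).
delimiter('''', single).

% string_rest(+Stream, +Delim, -Chars): characters up to the closing delimiter.
string_rest(Stream, Delim, Chars) :-
    get_char(Stream, C),
    string_char(C, Stream, Delim, Chars).

string_char(end_of_file, _, _, []) :- !.        % unterminated string: stop quietly
string_char(Delim, _, Delim, []) :- !.          % closing delimiter reached
string_char('\\', Stream, Delim, Chars) :- !,   % backslash: keep it and the next char
    get_char(Stream, Next),
    (   Next == end_of_file
    ->  Chars = ['\\']
    ;   Chars = ['\\', Next|Rest],
        string_rest(Stream, Delim, Rest)
    ).
string_char(C, Stream, Delim, [C|Rest]) :-      % any other character
    string_rest(Stream, Delim, Rest).
```

Keeping the backslash in the character list matches the output shown earlier, where "chees\"eburger" yields [c,h,e,e,s,\,'"',e,b,u,r,g,e,r].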

For a really large source file, I want to make sure that I am being as efficient with internal storage as possible. Once I have completed the lexer, I have to create an AST from its output and then translate that into something else, and I have recently found out that GNU Prolog seg-faults under OS X when dealing with large amounts of in-memory data.

So, does anybody know which is the more space-compact representation: atoms or character codes?

It is the same: an atom (even a character atom) or an integer needs one cell, i.e. a machine word (e.g. 32 or 64 bits). An integer is encoded in the cell itself, while for an atom the cell holds the index of the corresponding entry in the atom table (a hash table). Each list element needs 2 cells (head and tail), so a list of one hundred characters takes 200 cells in either representation.

NB: a structure with N arguments needs 1+N cells: 1 to encode the functor/arity and N cells for the arguments (each of these N cells can contain an atom, an integer, or a reference to a list or another compound term).
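
The cell counts above can be turned into a small estimator. This is only a sketch of the simplified model described in this reply, not a measurement of what GNU Prolog actually allocates: heap_cells/2 and args_cells/3 are hypothetical names, sharing of subterms is ignored, unbound variables are counted as zero, and the one cell that refers to the top-level term is not counted.

```prolog
% heap_cells(+Term, -Cells): estimated cells for Term under the model above.
%   atom or integer : 0 extra cells (it lives in the cell that holds it)
%   [H|T]           : 2 cells, plus whatever H and T need
%   f(A1,...,AN)    : 1+N cells, plus whatever the arguments need
heap_cells(Term, 0) :- var(Term), !.        % assumption: variables not counted
heap_cells(Term, 0) :- atomic(Term), !.     % includes [] and numbers
heap_cells([H|T], Cells) :- !,
    heap_cells(H, CH),
    heap_cells(T, CT),
    Cells is 2 + CH + CT.
heap_cells(Term, Cells) :-
    Term =.. [_|Args],
    length(Args, N),
    args_cells(Args, 0, Sub),
    Cells is 1 + N + Sub.

% args_cells(+Args, +Acc0, -Acc): sum of heap_cells/2 over the arguments.
args_cells([], Acc, Acc).
args_cells([A|As], Acc0, Acc) :-
    heap_cells(A, C),
    Acc1 is Acc0 + C,
    args_cells(As, Acc1, Acc).
```

Under this model the two representations come out equal, for example:

| ?- heap_cells([a,b,c], N).
N = 6

| ?- heap_cells([97,98,99], N).
N = 6

| ?- heap_cells(string(double,pos(7,1),[c,h]), N).
N = 11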

BTW, note that there is a Prolog lexer you can use via the built-in predicates read_token/1-2.
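
A minimal sketch of how the built-in lexer might be driven, collecting every token of a file into a list. The names tokens_in_file/2 and read_tokens/2 are hypothetical; the sketch assumes that read_token/2 returns punct(end_of_file) at the end of the input, as described in the GNU Prolog manual. Tokens follow Prolog syntax, so the comment and string conventions of another language may not be recognised as intended.

```prolog
% tokens_in_file(+File, -Tokens): all tokens of File, as read by read_token/2.
% Hypothetical helper names; read_token/2 is the GNU Prolog built-in.
tokens_in_file(File, Tokens) :-
    open(File, read, Stream),
    read_tokens(Stream, Tokens),
    close(Stream).

% read_tokens(+Stream, -Tokens): read until the end-of-file token.
read_tokens(Stream, Tokens) :-
    read_token(Stream, Token),
    (   Token = punct(end_of_file)     % assumed end-of-input marker
    ->  Tokens = []
    ;   Tokens = [Token|Rest],
        read_tokens(Stream, Rest)
    ).
```

Each element of the resulting list is a term such as atom(A), var(A), integer(N) or punct(P), so it can be post-processed with ordinary list predicates.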

Daniel

Thanks,
Sean.


From:	Daniel Diaz
Subject:	Re: Internal storage costs...
Date:	Sat, 27 Jul 2013 14:52:37 +0400