texmacs-dev

Re: [Texmacs-dev] Questions regarding conversion between strings and tre


From: Henri Lesourd
Subject: Re: [Texmacs-dev] Questions regarding conversion between strings and trees
Date: Sat, 04 Mar 2006 15:43:34 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02

David MENTRE wrote:

Hello,

I'm pursuing the idea of a literate programming mode for texmacs. My
current running code is able to parse (and produce in the reverse
direction) a source file into a Scheme structure like the following one
(" and \ are backslash-escaped in the Scheme way). TM and CODE are two
Scheme symbols representing respectively texmacs document and literate code.

 '((tm . "<TeXmacs|1.0.6>")
   (tm . "")
   (tm . "<style|generic>")
   (tm . "")
   (tm . "<\\body>")
   (tm . "  Sample texmacs document.")
   (tm . "")
   (code . "(define (hello-world) (display \"Hello world!\"))")
   (code . "")
   (tm . "  \\;")
   (tm . "</body>")
   (tm . "")
   (tm . "<\\initial>")
   (tm . "  <\\collection>")
   (tm . "    <associate|language|french>")
   (tm . "  </collection>")
   (tm . "</initial>"))


I need to transform this data structure into a texmacs document (and in
the reverse way for saving). I would like to keep the CODE blocks into a
specific node type of texmacs document tree.

So my questions:

1. Is it possible to add a new node type (something like
   <\lp-code></lp-code>) to texmacs tree? How? If it is possible, I
   suppose I need to define a style file for rendering?

Just define a new macro, for example :
[[
  <assign|lp-code|<macro|x|<with|font-shape|italic|<arg|x>>>
]]

, and then :
[[
  <lp-code|hello>
]]

is displayed as 'hello' in italics. Of course it's
up to you to decide what kind of display you really
want for <lp-code|...>.

To summarize, as far as I know, the only way to define
new markup / new node types in TeXmacs is by defining
a new macro using <assign|name|<macro|...>>.

Is this the answer to your question, or are you in
fact asking a broader question ?


2. How can I convert serialized part of texmacs document into tree data
   structure suitable for inclusion in the current document buffer?
   Apparently, string->tree could be used for this but, from my
   attempts, tags are not interpreted correctly. E.g.:

 (display* (string->tree "<TeXmacs|1.0.6>"))
gives
 <tree \<TeXmacs\|1.0.6\>>
and not an expected
 <tree <TeXmacs|1.0.6>>

Two points :

1. The header and suffix parts of a TeXmacs document
  are currently *not* part of what you can access
  with the TeXmacs tree API (i.e. (path->tree),
  (path-assign), etc.) : the part you can access
  with the API is only the part located inside
  the <body|...> part of a TeXmacs document ;

2. You can *not* build composite TeXmacs trees
  with (string->tree). All (string->tree) does
  is build *atomic* TeXmacs trees (leaves).
  This is why, in any case, a command like :
  [[
     (string->tree "everything <you> want")
  ]]

  will always build an atomic tree like :
  [[
     <tree "everything \<you\> want">
  ]]

  If you want to build a composite TeXmacs tree,
  you must use the function (stree->tree). For
  example, if you do :
  [[
     (stree->tree '(with "font-shape" "italic" (underline "Hello")))
  ]]

  *then* you get the following composite TeXmacs tree :
  [[
     <with|font-shape|italic|<underline|Hello>>
  ]]


   Or maybe I don't interpret display* output correctly? In a previous
   email Henri said that (string->tree "<gtr>") produces the expected
   ">" character in TeXmacs but a display* still prints "<tree
   \<gtr\>>" on console.

This is because "<gtr>" is a valid string representation of ">"
when you want the symbol ">" itself to appear in a TeXmacs document.
Note that (display (string->tree ">")) also works: you get
<tree "\>">.

So there is an ambiguity here: TeXmacs should accept either
">" or "<gtr>" as the representation of the symbol ">" in
<tree "..."> leaves, but not both...

Anyway, what matters is being able to generate a ">" symbol
when you need one: when you write to a file you must use
"\<gtr\>", and inside TeXmacs (string->tree "<gtr>")
in fact yields ">".


3. Moreover, I'm wondering how to handle the issue that opening and
   closing tags (for example <\body> and </body>) are not in the same
   string. One solution would be to:

   a. first convert (code . "toto") lines into
      (tm . "<\\lp-code>toto</lp-code>");

   b. and then concat all the strings to do a big string->tree on the
      final string.

   Is there a better way to do this?

The solution to your problem is either :

a. You generate the content of a TeXmacs file. In this case,
  a <\body> that opens on one line and closes on another is
  no problem, because you generate everything. Moreover, given
  that you know exactly what you want, what you must generate
  is fully defined by the syntax of the TeXmacs file markup ;

b. You want to change a TeXmacs document 'on the fly'. Then
  the right thing to do is (taking your own example) :
  [[
     (stree->tree
       '(document
           " Sample texmacs document."
           ""
           (lp-code "(define (hello-world) (display \"Hello world!\"))")
           (lp-code "")
           ""
        ))
  ]]
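
  The mapping from the parsed line list to such an stree can be
  sketched in plain Scheme. This is only a sketch under assumptions:
  lp-line->stree and lp-lines->stree are hypothetical names (not part
  of TeXmacs), the input holds only the lines found between <\body>
  and </body>, and every TM line is treated as plain text (a TM line
  containing markup would need real parsing, which is not attempted
  here).

```scheme
;; Hypothetical helpers, not part of the TeXmacs API.
;; Assumption: each line is a pair (tag . text), tag being tm or code.
(define (lp-line->stree line)
  (if (eq? (car line) 'code)
      (list 'lp-code (cdr line))   ; (code . "x") becomes (lp-code "x")
      (cdr line)))                 ; (tm . "x") becomes the string "x"

;; Wrap the converted lines in a document node, one child per line.
(define (lp-lines->stree lines)
  (cons 'document (map lp-line->stree lines)))

;; Body lines only: header, <\body>, </body> and <\initial> are left out.
(define sample-body
  '((tm . "Sample texmacs document.")
    (tm . "")
    (code . "(define (hello-world) (display \"Hello world!\"))")
    (code . "")
    (tm . "")))

(define sample-stree (lp-lines->stree sample-body))
;; Inside TeXmacs one would then call (stree->tree sample-stree).
```

  The reverse direction (for saving) would walk the children of the
  document node and map (lp-code "x") back to (code . "x").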

  Some explanation is needed here about the use of (document ...)
  and why "\;" is not needed : inside TeXmacs, the markup
  (document ...) is the one actually used to split the text
  into several paragraphs ; namely :
  [[
     (document "A" "B" "C")
  ]]

  is displayed as :
  [[
     A
     B
     C
  ]]

  inside TeXmacs.

  The other important point about (document ...) is that
  you usually never see it in the Edit source tree mode,
  nor in the markup, because it is always *combined* with
  other markups : for example with <body|...>. If you input
  a <body|...> markup in TeXmacs and switch to Source mode
  just afterwards, you will see :
  [[
     <body|
        >
  ]]

  , instead of the more expected :
  [[
     <body|>
  ]]

  This is because **IN FACT**, the **REAL** markup that
  has been inserted is :
  [[
     <body|<document|>>
  ]]

  This thing is extremely important, because in doing your
  path calculations, you must **OF COURSE** take into account
  the intermediary <document|> tag ! For example, the (local)
  path to get the "A" inside the markup below :
  [[
     <body|
        A>
  ]]

  is '(0 0) and not simply '(0), because the markup you
  have in hand is in fact :
  [[
     <body|<document|A>>
  ]]
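
  The path arithmetic can be checked outside TeXmacs on the
  corresponding stree. A minimal sketch, assuming the tree is written
  as a nested list whose first element is the tag; stree-ref is a
  hypothetical helper, not a TeXmacs function :

```scheme
;; Hypothetical helper: follow a list of child indices into an stree.
;; Index 0 is the first child, i.e. the element right after the tag.
(define (stree-ref t path)
  (if (null? path)
      t
      (stree-ref (list-ref (cdr t) (car path)) (cdr path))))

(define body '(body (document "A")))

(stree-ref body '(0))    ; the intermediate node (document "A")
(stree-ref body '(0 0))  ; the leaf "A"
```

  The path '(0) stops at the <document|...> node ; only '(0 0)
  reaches the text.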

  The presence (or the absence) of an intermediary <document>
  markup is also the reason why the "\" and "/" symbols appear
  in the .tm files. The markup :
  [[
     <body|
        A>
  ]]

  is written as :
  [[
     <\body>
         A
     </body>
  ]]

  in a .tm file, while simpler markups, for
  example <underline|A>, are written the same
  way when serialized in a .tm file.


I've read with great interest recent discussion between Lionel and Henri
but I must admit I'm a bit lost in the string successive escapes. ;)

The conclusion is simple : if the symbols "<" and ">" appear in
the text of your document (namely, like in "a<b", for example),
then you must translate them either to "<less>" / "<gtr>" if
you use the TeXmacs tree API, or directly to "\<less\>" / "\<gtr\>"
if you are generating a TeXmacs file.
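
The two translations can be sketched in plain Scheme as follows. The
names escape-angles, escape-for-tree and escape-for-file are
hypothetical, and the sketch only handles "<" and ">" ; any other
character that the file format may require escaping is not covered.

```scheme
;; Hypothetical helper: replace each "<" and ">" in s by the given strings.
(define (escape-angles s less gtr)
  (apply string-append
         (map (lambda (c)
                (cond ((char=? c #\<) less)
                      ((char=? c #\>) gtr)
                      (else (string c))))
              (string->list s))))

;; For text handed to the TeXmacs tree API.
(define (escape-for-tree s)
  (escape-angles s "<less>" "<gtr>"))

;; For text written directly into a .tm file (the Scheme literal "\\<"
;; is the two characters backslash and "<").
(define (escape-for-file s)
  (escape-angles s "\\<less\\>" "\\<gtr\\>"))

(escape-for-tree "a<b")   ; the string a<less>b
(escape-for-file "a<b")   ; the string a\<less\>b
```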

In practice this is tricky, because it is not very clear
where TeXmacs itself performs additional translations.

Currently, this is perhaps not a big problem for you : you
probably don't need to consider these particular cases
immediately if you want to implement the first version of
your literate programming tool.


Best, Henri




