gnu-arch-users

[Gnu-arch-users] Re: doc formats (Miles' Awiki overview request)


From: Thomas Lord
Subject: [Gnu-arch-users] Re: doc formats (Miles' Awiki overview request)
Date: Sun, 22 Jan 2006 12:01:34 -0800

Miles:
> Do you have a short overview of "awiki" somewhere?

No.  So here's a crude one for an audience of hackers (i.e., 
not the way you'd teach it to a non-hacker).

* Docs are Trees of Typed, Attributed Nodes

  Like DOM.

  XML has a fully general but heavy syntax for such nodes:

       <TYPE ATTRIBUTE=VALUE ...> ...subtrees... </TYPE>

  A given Awiki grammar is a more baroque, equally general
  syntax for such nodes.
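
  As a rough illustration of the node model (a Python sketch; the
  `Node' class and `to_xml' helper are made-up names, not part of
  Awiki, and no XML escaping is attempted):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A typed, attributed tree node (hypothetical, for illustration)."""
    type: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node instances or plain strings


def to_xml(node):
    """Serialize a node in the general-but-heavy XML syntax."""
    if isinstance(node, str):
        return node
    attrs = "".join(f' {key}="{value}"' for key, value in node.attributes.items())
    inner = "".join(to_xml(child) for child in node.children)
    return f"<{node.type}{attrs}>{inner}</{node.type}>"


doc = Node("section", {"id": "intro"}, [
    Node("title", children=["Overview"]),
    Node("para", children=["Like DOM."]),
])
print(to_xml(doc))
```

  Any concrete syntax, XML or an Awiki grammar, is just another way
  of writing down a tree like `doc'.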

* Recursive Decomposition

  The Awiki parser engine makes multiple passes.   The first
  pass over a document fragment determines the type of the
  root node for that fragment, the attributes for that root
  node, the *sources* for the subtrees of the fragment, and
  the parsing rule to apply to that source to generate 
  subtrees.

  For example, this section would be parsed in the first pass
  to produce:

     .example

        type: section
        attributes: none specified
        title source: "Recursive Decomposition"
        body source: "The Awiki parser [...] `section' node."

  Appropriate parsing rules are then applied to the sources
  to generate the subtrees of the resulting `section' node.
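
  The two steps above might be sketched like this in Python.  This
  is only a guess at the shape of the thing: the rule names
  (`section', `text'), the dictionary layout, and the functions are
  all assumptions made for the example, not Awiki's engine.

```python
import textwrap


def section_rule(source):
    """First pass: decide root type, attributes, and the sub-sources.

    Each sub-source is (name, text, rule-to-apply-next).
    """
    first_line, _, rest = source.partition("\n")
    return {
        "type": "section",
        "attributes": {},
        "sources": [
            ("title", first_line.lstrip("* ").strip(), "text"),
            ("body", textwrap.dedent(rest).strip(), "text"),
        ],
    }


def text_rule(source):
    """Leaf rule: keep the source as plain text, with no sub-sources."""
    return {"type": "text", "attributes": {}, "sources": [], "text": source}


RULES = {"section": section_rule, "text": text_rule}


def parse(source, rule):
    """Apply one rule, then recurse into each sub-source with its own rule."""
    first_pass = RULES[rule](source)
    node = {"type": first_pass["type"], "attributes": first_pass["attributes"]}
    if "text" in first_pass:
        node["text"] = first_pass["text"]
    node["children"] = {
        name: parse(sub_source, sub_rule)
        for name, sub_source, sub_rule in first_pass["sources"]
    }
    return node


source = "* Recursive Decomposition\n\n  The Awiki parser engine makes multiple passes."
tree = parse(source, "section")
```

  A real body rule would split paragraphs and subsections rather
  than stopping at plain text; the recursion is the point here.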

* Grammar Abstraction

  Awiki recognizes that one thing is a parsing rule like:

      Tree nodes are divided by [a certain number of] asterisks
      in column 0 [....]

  and another thing is what kind of tree we mean:

      The "divide by column 0 asterisks rule" is used to divide
      (sub)sections.

  With one exception (see below) there is a fixed number of built-in
  parsing rules and those rules are reusable for many different 
  mappings to trees.   In one case, the asterisks rule might divide
  stuff up into `<section>' nodes.   In another case, the asterisks
  might divide stuff up in a completely different way (e.g., `<element>'
  nodes in a grammar specialized for documenting the periodic table.)

  The association between a parsing rule and which rules to apply to
  subtree sources is variable.   Normally, the asterisks rule would
  next parse the source of the body text with, say, the "paragraphs and
  subsections" rule but a grammar could say otherwise.
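
  To make the split between rule and mapping concrete, here is a
  small Python sketch.  The grammar tables, the rule names
  (`paragraphs', `properties'), and the functions are invented for
  the example; Awiki's real grammar format is not shown here.

```python
import re


def split_on_asterisks(source, depth=1):
    """Built-in rule: divide source at lines that start with exactly
    `depth` asterisks in column 0.  Returns (heading, body source) pairs."""
    pattern = re.compile(r"^\*{%d} +(.*)$" % depth, re.MULTILINE)
    matches = list(pattern.finditer(source))
    pieces = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        pieces.append((match.group(1).strip(), source[match.end():end].strip()))
    return pieces


# Two grammars reuse the same rule but map it to different trees, and
# each names the rule to apply next to the body source.
DOC_GRAMMAR = {"asterisks": {"node_type": "section", "body_rule": "paragraphs"}}
PERIODIC_GRAMMAR = {"asterisks": {"node_type": "element", "body_rule": "properties"}}


def apply_asterisks(source, grammar):
    mapping = grammar["asterisks"]
    return [
        {
            "type": mapping["node_type"],
            "title": title,
            "body_source": body,
            "body_rule": mapping["body_rule"],
        }
        for title, body in split_on_asterisks(source)
    ]


source = "* Hydrogen\n  Z = 1\n* Helium\n  Z = 2\n"
as_sections = apply_asterisks(source, DOC_GRAMMAR)
as_elements = apply_asterisks(source, PERIODIC_GRAMMAR)
```

  The same text yields `section' nodes under one grammar and
  `element' nodes under the other; only the mapping table changed.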

* Specific Parsing Rules

  What you might want in an overview is the more conventional kind
  of table:

        *bold*  => *bold*
        _italics_ => _italics_

  I don't have such a table for you and, in truth, where I left off I
  was still playing around to find a nice mix of defaults.

  More interesting are the handful of general principles that apply
  across all rules and, alas, I don't have those written up either.
  The rules I was working on do things like avoiding a need for
  quoting, allowing arbitrary nesting, avoiding gratuitous whitespace
  dependencies, etc.   The general thrust is to make simple things 
  look completely natural and to make complex things easy to get 
  right.   One example:

  Let's suppose that `/foo/' means `<emphasize>foo</emphasize>'.
  What if I want to emphasize the phrase `and/or'?  Well, ok,
  that's handled by whitespace rules so `/and/or/' parses just
  fine if surrounded by whitespace or punctuation.  What if you
  want to emphasize part of an already emphasized text (nesting)?
  I think repetition is a good disambiguator, at least to a 
  certain depth:  `//the key thing is to be /really/ careful//'.
  Of course that gets ugly if taken too far but, mostly, one
  never needs to take it that far.
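
  One way to read the emphasis example, as a Python sketch.  The
  exact boundary conditions are my assumption, not Awiki's rule: a
  run of slashes opens emphasis only when it is not preceded by a
  word character and is followed by a non-space, and the same
  length of run closes it when preceded by a non-space and not
  followed by a word character.

```python
import re

# (/+) captures the opening run; \1 requires a closing run of the same length.
EMPHASIS = re.compile(r"(?<![\w/])(/+)(?=[^\s/])(.+?)(?<=[^\s/])\1(?![\w/])")


def render(text):
    """Replace emphasis markup with <emphasize> tags, recursing so that
    a shorter run of slashes can nest inside a longer one."""
    def replace(match):
        return "<emphasize>" + render(match.group(2)) + "</emphasize>"
    return EMPHASIS.sub(replace, text)


print(render("say /and/or/ here."))
print(render("//the key thing is to be /really/ careful//"))
```

  The slash inside `and/or' never opens or closes anything because
  it sits between two word characters, so no quoting is needed.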

  My little (nascent) set of rules like that add up to something
  that (a) casual users don't really have to know in depth and,
  anyway, (b) is simple enough you could teach it in a class on
  touch typing as an extension to general rules for technical/business
  typing.

* Error Propagation

  If some source (sub)text just doesn't parse then the source itself
  becomes the contents of a simple `<error>' node.

  Grammars can say, of a given node type, whether the type does or
  does not tolerate `<error>' nodes as subtrees.   So, for example,
  an error-tolerant `<section>' node might contain an unparsable
  `<error>' subtree in lieu of a paragraph -- rendering could display
  the section mostly normally but show the errant non-paragraph in
  raw-source form.   On the other hand, if a node can't tolerate error
  subtrees, then the outer parse reverts and the whole thing becomes
  an error -- instead of a `<bibliography-entry>' node in which some
  element of the entry is unparsable, the whole thing would be replaced
  with an `<error>' node.
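
  A minimal Python sketch of that behavior.  The tolerance table,
  the function names, and the toy "unbalanced brackets" failure
  condition are all assumptions made for the example:

```python
def error_node(source):
    """An <error> node simply carries the raw source that failed to parse."""
    return {"type": "error", "source": source}


def parse_leaf(source):
    """Hypothetical leaf rule: fails on unbalanced square brackets."""
    if source.count("[") != source.count("]"):
        return error_node(source)
    return {"type": "text", "text": source}


# Per-type setting a grammar might declare (illustrative values).
TOLERATES_ERRORS = {"section": True, "bibliography-entry": False}


def build(node_type, source, child_sources):
    """Parse the children; keep or revert depending on error tolerance."""
    children = [parse_leaf(child) for child in child_sources]
    has_error = any(child["type"] == "error" for child in children)
    if has_error and not TOLERATES_ERRORS[node_type]:
        # The outer parse reverts: the whole source becomes one error node.
        return error_node(source)
    return {"type": node_type, "children": children}


parts = ["A fine paragraph.", "A broken [one."]
whole = "\n\n".join(parts)
section = build("section", whole, parts)
entry = build("bibliography-entry", whole, parts)
```

  Here `section' keeps its good paragraph and carries the bad one as
  an error child, while `entry' collapses into a single error node.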


That's about it.  The art is in two places:  the grammar abstraction
and the handful of principles for writing new parsing rules.  Oh and:

* The Escape

  At the "leaves" of the grammar you can escape into entirely 
  different parsing techniques.   For example, if an Awiki grammar
  were used as the input language for a Mathematica-like system
  then certain sub-sources might be parsed by a conventional LALR
  expression parser for mathematical expressions.
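
  A sketch of such an escape in Python.  Note the substitution:
  instead of an LALR parser, this hands the leaf source to Python's
  own expression parser via the standard `ast' module, purely as a
  stand-in.  The dispatch table and function names are hypothetical.

```python
import ast


def parse_math(source):
    """Escape: hand the leaf's source to a separate expression parser."""
    try:
        tree = ast.parse(source.strip(), mode="eval")
    except SyntaxError:
        # Unparsable leaves become error nodes holding the raw source.
        return {"type": "error", "source": source}
    return {"type": "math", "ast": tree.body}


def parse_text(source):
    return {"type": "text", "text": source}


# Which parser handles which kind of leaf is decided by the grammar.
LEAF_PARSERS = {"math": parse_math, "text": parse_text}


def parse_leaf(kind, source):
    return LEAF_PARSERS[kind](source)


good = parse_leaf("math", "x**2 + 3*x")
bad = parse_leaf("math", "(1 + 2")
```

  The surrounding grammar only needs to know where the leaf's source
  starts and ends; what happens inside is the other parser's job.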

-t





