[Gnu-arch-users] Re: doc formats


From: Thomas Lord
Subject: [Gnu-arch-users] Re: doc formats
Date: Sat, 21 Jan 2006 08:53:40 -0800

    Thomas> Before replying to Miles and Matthew: If you want a good
    Thomas> example of wiki-style markup gone bad, [see Wikipedia]

  Stephen> The claim of the XML proponents on python-dev is that this is
  Stephen> inevitable.  I'm sympathetic to that claim, having seen the 
  Stephen> devices proposed for reST to address attaching semantic 
  Stephen> information to plain text.

(Incidentally, I have turned into a (reluctant) XML proponent.  That is,
I am a proponent of the abstract data structure, the typing systems,
the query languages, the transform languages, and at least some parts of
the namespace management stuff.  I am not a proponent of the syntax,
except as an exchange format in cases where humans deal with the
streamed form only for debugging purposes.  I say reluctant because the
abstract data structure has too many arbitrary quirks for my tastes.)

The craziness exemplified by the Thomas Jefferson entry on Wikipedia
is not inevitable, although when your DTDs grow very large you
will wind up with things that look more like conventional markup (albeit
still nicer than XML).   We can see this non-inevitability in a few
different ways:

* First: is there a nicer (closer to plain-text, more directly legible)
  simple syntax for XML in general?   

  Of course there are many.  It is odd to hear people in the Python
  community argue otherwise, Python being a language that eschews
  braces in favor of indentation.

  My Awiki syntax degenerates to a couple of fully general XML syntaxes
  for constructs that the grammar tables don't treat specially.  One of
  those syntaxes uses indentation much as Python does, as in this excerpt
  from an XSLT script:

    .xsl:template ~match: /page

      .xsl:param ~name: depth
                  select: 0

      .xsl:apply-templates
        .xsl:with-param ~name: depth
                         select: $depth

  One wouldn't want to regularly type bibliographies that way but it's
  actually rather refreshing to write XSLT that way instead of standard
  XML syntax.   Specialized fragments like that look fine when mixed
  with more _familiar_ kinds of markup.

* Second: is there a smooth path from common-case mark-ups to the 
  fully general ones?  Are there intermediate stages that feel
  more like technical typing, based on just a few rules, instead of
  an ad-hoc mishmash or a nightmare of angle brackets?

  There is a display box on the Thomas Jefferson page for which
  the Wiki source _mostly_ fails to provide useful semantic markup.
  Their Wiki source begins:
 
   {{start box}}
      {{succession box| title=[[List of Governors of Virginia|Governor of Virginia]]
        | before=[[Patrick Henry]] | after=[[William Fleming]]
        | years=[[1779]] – [[1781]]}}
      {{succession box| title=United States Minister Plenipotentiary to France
        | before=[[Benjamin Franklin]] | after=[[William Short]]
        | years=[[1785]] – [[1789]]}}

  In Awiki, this can come out:

      .successions

        Governor of Virginia

          before: Patrick Henry
          after: William Fleming
          dates: 1779 - 1781

        United States Minister Plenipotentiary to France

          before: Benjamin Franklin
          after: William Short
          dates: 1785 - 1789

  with the grammars transforming that into something isomorphic
  to:

       <successions>
         <position>
           <title>Governor of Virginia</title>
           <before>Patrick Henry</before>
           <after>William Fleming</after>
           <dates><from>1779</from>
                  <to>1781</to></dates>
        [...]


One key thing about these examples is that they can be handled by the
same declarative (e.g. table-driven) grammar definition that handles
more *conventional styles of _Wiki-style_ markup*.
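
To make the table-driven idea concrete, here is a minimal Python sketch
of the successions example.  It is not Awiki's grammar engine: the rule
table FIELD_RULES, the function names, and the choice to ignore
indentation depth are all assumptions made to keep the example short.

```python
import re
import xml.etree.ElementTree as ET

def plain(elem, value):
    """Rule: store the value as the element's text."""
    elem.text = value

def date_range(elem, value):
    """Rule: split 'A - B' into <from>A</from><to>B</to>."""
    start, end = [part.strip() for part in value.split("-", 1)]
    ET.SubElement(elem, "from").text = start
    ET.SubElement(elem, "to").text = end

# Hypothetical rule table: field name -> how to build its element.
FIELD_RULES = {"before": plain, "after": plain, "dates": date_range}
FIELD_RE = re.compile(r"^(\w+):\s*(.*)$")

def parse_successions(source):
    """Turn a '.successions' block into an ElementTree element.

    Simplification: indentation depth is ignored; a non-field line
    starts a new <position> and becomes its <title>.
    """
    root = None
    position = None
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("."):
            root = ET.Element(line[1:])
            continue
        if root is None:
            raise ValueError("expected a '.name' line before any content")
        match = FIELD_RE.match(line)
        if match and match.group(1) in FIELD_RULES:
            if position is None:
                raise ValueError("field line before any title line")
            name, value = match.groups()
            FIELD_RULES[name](ET.SubElement(position, name), value)
        else:
            position = ET.SubElement(root, "position")
            ET.SubElement(position, "title").text = line
    return root

SAMPLE = """\
.successions

  Governor of Virginia

    before: Patrick Henry
    after: William Fleming
    dates: 1779 - 1781

  United States Minister Plenipotentiary to France

    before: Benjamin Franklin
    after: William Short
    dates: 1785 - 1789
"""

if __name__ == "__main__":
    print(ET.tostring(parse_successions(SAMPLE), encoding="unicode"))
```

Supporting a new field in this sketch means adding one entry to
FIELD_RULES rather than touching the parsing loop, which is the property
the paragraph above is after.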

Another key thing is that although for some tasks, such as writing
a succession box, the writer has to learn some new rules -- they
are simple rules.   Hollywood screenwriters and paralegal assistants
learn special formatting rules for plain text for screenplays and
court filings -- the Awiki approach to wikis can formalize that.

If the barrier to writing a Wikipedia article were "You must learn
XML syntax and study the DTDs for the kind of article you are writing"
there would likely be fewer articles.   Evidently, though, it is ok to
ask people writing particular kinds of articles to learn the specialized
sub-syntaxes that apply to that class of articles.   Wouldn't it be nice
if (a) the source for the Jefferson article were that legible and (b) it
were straightforward to XQuery for all succession boxes mentioning 
"founding fathers" of the US and cross check them for internal
consistency?
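
A rough sketch of such a cross check, written in Python with
xml.etree.ElementTree rather than XQuery.  The ARTICLES data is invented
for illustration (the William Fleming entry is deliberately wrong), and
the function name and the assumption that every article carries a
<successions> element like the one above are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: article name -> its <successions> XML.
ARTICLES = {
    "Thomas Jefferson": (
        "<successions><position><title>Governor of Virginia</title>"
        "<before>Patrick Henry</before><after>William Fleming</after>"
        "</position></successions>"
    ),
    "Patrick Henry": (
        "<successions><position><title>Governor of Virginia</title>"
        "<after>Thomas Jefferson</after></position></successions>"
    ),
    # Deliberately inconsistent entry, to give the check something to find.
    "William Fleming": (
        "<successions><position><title>Governor of Virginia</title>"
        "<before>Somebody Else</before></position></successions>"
    ),
}

def check_successions(articles):
    """Return (person, title, field, other) for each one-sided link."""
    index = {}
    for person, xml in articles.items():
        for pos in ET.fromstring(xml).iter("position"):
            index[(person, pos.findtext("title"))] = {
                "before": pos.findtext("before"),
                "after": pos.findtext("after"),
            }
    problems = []
    for (person, title), links in index.items():
        for field, inverse in (("before", "after"), ("after", "before")):
            other = links[field]
            entry = index.get((other, title))
            if entry is None:
                continue  # nothing to compare against
            if entry[inverse] != person:
                problems.append((person, title, field, other))
    return problems

if __name__ == "__main__":
    for problem in check_successions(ARTICLES):
        print("inconsistent:", problem)
```

The check is only as easy as it is because the boxes are already
structured data; against the raw {{succession box}} source it would
first need a template parser.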

About the ReST:

  Stephen> Don't ASSume.  The "re" in reStructured Text means "again", 
  Stephen> as in Structured Text, version 2. [it doesn't mean "regular
  Stephen> expression"]

You are mistaken.  It is a pun and means both.  See the "History"
section at:

  http://mail.python.org/pipermail/doc-sig/2000-November/001239.html

  Stephen> [hints about ReST's parsing techniques]

Thanks.  That's helpful.  I'll try to get in touch with the ReST author.

  Thomas> [Awiki emulates a hierarchical decomposition hypothesized
  Thomas> to reflect how humans parse plain-text "by eye"]

  Stephen> [No way.  The human visual system processes too much
  Stephen> information to simulate in such a simple-minded way.]

My goal is not to parse *arbitrary* plain text as a human would.
Also, the amount of visual information in plain text is very low compared
to what the human visual system can process.   So, your objection is
misplaced (but understandable).

My goals do *not* include the kind of error tolerance that by-eye
parsing has.  For example, if a script contains some indenting errors,
the average reader might not notice and will certainly know what is
meant.   Nevertheless, I draw the line this way:  Awiki (when being
used to render as XHTML, say) is just fine if it can say: "These 50
lines in the middle of the script confuse me.   I've parsed the stuff
before and after just fine but to render these -- here is a warning
box wrapped around the raw source."
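
A minimal Python sketch of that fallback behavior.  Everything here is a
stand-in: the ParseError class, the toy rule that odd indentation counts
as "confusing", and the class="warning" markup are assumptions, not
Awiki's actual renderer.

```python
import html

class ParseError(Exception):
    """Raised by the toy parser when a block cannot be understood."""

def render_block(block):
    """Toy renderer: accept a block only if every indent is even."""
    for line in block.splitlines():
        indent = len(line) - len(line.lstrip(" "))
        if indent % 2:
            raise ParseError("odd indentation of %d spaces" % indent)
    text = " ".join(line.strip() for line in block.splitlines())
    return "<p>%s</p>" % html.escape(text)

def render_document(source):
    """Render block by block; wrap unparseable blocks in a warning box."""
    pieces = []
    for block in source.split("\n\n"):
        if not block.strip():
            continue
        try:
            pieces.append(render_block(block))
        except ParseError as err:
            # Keep going: show the raw source instead of failing the page.
            pieces.append(
                '<div class="warning"><p>Could not parse: %s</p>'
                "<pre>%s</pre></div>"
                % (html.escape(str(err)), html.escape(block))
            )
    return pieces

if __name__ == "__main__":
    doc = "Good paragraph.\n\n   bad <line>\n\nAnother good one."
    print("\n".join(render_document(doc)))
```

The point of the design is that a failure stays local: the blocks before
and after the confusing one still render normally.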

  Stephen> Furthermore, at the block level the "eye" we're talking
  Stephen> about is a two-dimensional pattern recognizer.  AFAIK most
  Stephen> parsing theory, and all tools easily available to GNU 
  Stephen> developers, are based on parsing unidimensional streams. 

Right.  That's one reason why I hack Awiki rather than use, say,
lex and yacc.   The "recursive decomposition" I've mentioned, along
with keeping track of both column and line positions, gives my parsing
technique the ability to do two-dimensional pattern recognition well
enough for plain-text documents.   One interesting question is how to
adapt the technique to other scripts besides ASCII.
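
For readers who want a picture of what "recursive decomposition with
line and column positions" could look like, here is a small Python
sketch.  It is a guess at the general shape, not Awiki's algorithm: the
function names and the rule "a region is a line plus everything indented
further beneath it" are assumptions.

```python
def indent_of(text):
    """Column of the first non-space character."""
    return len(text) - len(text.lstrip(" "))

def decompose(rows):
    """Recursively split (line_number, text) rows into nested regions.

    Each region records the line and column where it starts, so later
    passes can match on position as well as on content.
    """
    nodes = []
    i = 0
    while i < len(rows):
        line_no, text = rows[i]
        if not text.strip():
            i += 1  # blank lines only separate regions
            continue
        col = indent_of(text)
        # The region extends over blank rows and more-indented rows.
        j = i + 1
        while j < len(rows) and (
            not rows[j][1].strip() or indent_of(rows[j][1]) > col
        ):
            j += 1
        nodes.append({
            "line": line_no,
            "col": col,
            "text": text.strip(),
            "children": decompose(rows[i + 1:j]),
        })
        i = j
    return nodes

SAMPLE = """\
.xsl:template ~match: /page

  .xsl:param ~name: depth

  .xsl:apply-templates
    .xsl:with-param ~name: depth
"""

if __name__ == "__main__":
    tree = decompose(list(enumerate(SAMPLE.splitlines(), start=1)))
    print(tree)
```

A lex/yacc-style tool would see the same input as a single token stream;
here each region carries its own (line, column) origin.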

-t






