[Gnu-arch-users] Re: doc formats
From: Thomas Lord
Subject: [Gnu-arch-users] Re: doc formats
Date: Sat, 21 Jan 2006 08:53:40 -0800
Thomas> Before replying to Miles and Matthew: If you want a good
Thomas> example of wiki-style markup gone bad, [see Wikipedia]
Stephen> The claim of the XML proponents on python-dev is that this is
Stephen> inevitable. I'm sympathetic to that claim, having seen the
Stephen> devices proposed for reST to address attaching semantic
Stephen> information to plain text.
(Incidentally, I have turned into a (reluctant) XML proponent. I am a
proponent of the abstract data structure, the typing systems, the
query languages, the transform languages, and at least some parts of
the namespace-management machinery. I am not a proponent of the
syntax, except as an exchange format in cases where humans deal with
the streamed form only for debugging purposes. I say "reluctant"
because the abstract data structure has too many arbitrary quirks for
my taste.)
The craziness exemplified by the Thomas Jefferson entry on Wikipedia
is not inevitable, although when your DTDs grow very large you
will wind up with things that look more like conventional markup (albeit
still nicer than XML). We can see this non-inevitability in a few
different ways:
* First: is there a nicer (closer to plain-text, more directly legible)
simple syntax for XML in general?
Of course there are many. It is odd to hear people in the Python
community, Python being a language that eschews braces in favor of
indentation, argue otherwise.
My Awiki syntax degenerates to a couple of fully general XML syntaxes
for constructs that the grammar table doesn't treat specially. One of
those syntaxes uses indentation similarly to Python as in this excerpt
from an XSLT script:
.xsl:template ~match: /page
    .xsl:param ~name: depth
        select: 0
    .xsl:apply-templates
        .xsl:with-param ~name: depth
            select: $depth
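For comparison, here is what I take that excerpt to mean in standard
XSLT syntax -- the mapping of "~match:"/"~name:" to attributes and of
the nested "select:" lines to select attributes is my reading of the
fragment above:

```xml
<xsl:template match="/page">
  <xsl:param name="depth" select="0"/>
  <xsl:apply-templates>
    <xsl:with-param name="depth" select="$depth"/>
  </xsl:apply-templates>
</xsl:template>
```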
One wouldn't want to regularly type bibliographies that way, but it's
actually rather refreshing to write XSLT that way instead of in standard
XML syntax. Specialized fragments like that look fine when mixed
with more _familiar_ kinds of markup.
* Second: is there a smooth path from common-case mark-ups to the
fully general ones? Are there intermediate stages that feel
more like technical typing, based on just a few rules, instead of
an ad-hoc mishmash or a nightmare of angle brackets?
There is a display box on the Thomas Jefferson page for which
the Wiki source _mostly_ fails to provide useful semantic markup.
Their Wiki source begins:
{{start box}}
{{succession box| title=[[List of Governors of Virginia|Governor
of Virginia]] | before=[[Patrick Henry]] | after=[[William Fleming]] |
years=[[1779]] – [[1781]]}}
{{succession box| title=United States Minister Plenipotentiary to
France | before=[[Benjamin Franklin]] | after=[[William Short]] | years=
[[1785]] – [[1789]]}}
In Awiki, this can come out:
.successions
    Governor of Virginia
        before: Patrick Henry
        after: William Fleming
        dates: 1779 - 1781
    United States Minister Plenipotentiary to France
        before: Benjamin Franklin
        after: William Short
        dates: 1785 - 1789
with the grammars transforming that into something isomorphic
to:
<successions>
<position>
<title>Governor of Virginia</title>
<before>Patrick Henry</before>
<after>William Fleming</after>
<dates><from>1779</from>
<to>1781</to></dates>
[...]
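To make the transformation concrete, here is a hedged sketch of how an
indentation-driven grammar might turn the succession source above into
that XML. The element names (successions, position, title, before,
after, dates, from, to) come from the examples; the parsing rules
themselves are invented for illustration, since the real Awiki grammar
tables aren't shown here:

```python
# Hypothetical sketch: parse the indentation-based succession markup
# into XML roughly isomorphic to the fragment above. Real Awiki is
# table-driven; this hard-codes just enough rules for the example.
import xml.etree.ElementTree as ET

def parse_successions(text):
    root = ET.Element("successions")
    pos = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("."):
            continue                       # skip blanks and the ".successions" head
        if ":" not in line:                # a bare line opens a new position
            pos = ET.SubElement(root, "position")
            ET.SubElement(pos, "title").text = line
        else:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "dates":             # split "1779 - 1781" into from/to
                dates = ET.SubElement(pos, "dates")
                frm, _, to = value.partition("-")
                ET.SubElement(dates, "from").text = frm.strip()
                ET.SubElement(dates, "to").text = to.strip()
            else:
                ET.SubElement(pos, key).text = value
    return root

source = """\
.successions
    Governor of Virginia
        before: Patrick Henry
        after: William Fleming
        dates: 1779 - 1781
    United States Minister Plenipotentiary to France
        before: Benjamin Franklin
        after: William Short
        dates: 1785 - 1789
"""

tree = parse_successions(source)
print(ET.tostring(tree, encoding="unicode"))
```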
One key thing about these examples is that they can be handled by the
same declarative (e.g. table-driven) grammar definition that handles
more *conventional* styles of Wiki-style markup.
Another key thing is that, although for some tasks, such as writing
a succession box, the writer has to learn some new rules -- they
are simple rules. Hollywood screenwriters and paralegal assistants
learn special plain-text formatting rules for screenplays and
court filings -- the Awiki approach to wikis can formalize that.
If the barrier to writing a Wikipedia article were "You must learn
XML syntax and study the DTDs for the kind of article you are writing"
there would likely be fewer articles. Evidently, though, it is ok to
ask people writing particular kinds of articles to learn the specialized
sub-syntaxes that apply to that class of articles. Wouldn't it be nice
if (a) the source for the Jefferson article were that legible and (b) it
were straightforward to XQuery for all succession boxes mentioning
"founding fathers" of the US and cross check them for internal
consistency?
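As a sketch of point (b): once the succession boxes are real XML, the
cross-check is a few lines of querying. This uses Python's ElementTree
rather than XQuery, and the list of "founding fathers" is an
illustrative stand-in:

```python
# Hedged sketch: given succession XML like the fragment above, find
# every box that mentions one of a (hypothetical) founder list, and
# flag any box whose dates run backwards.
import xml.etree.ElementTree as ET

FOUNDERS = {"Benjamin Franklin", "Patrick Henry"}   # illustrative only

xml_src = """\
<successions>
  <position>
    <title>Governor of Virginia</title>
    <before>Patrick Henry</before>
    <after>William Fleming</after>
    <dates><from>1779</from><to>1781</to></dates>
  </position>
  <position>
    <title>United States Minister Plenipotentiary to France</title>
    <before>Benjamin Franklin</before>
    <after>William Short</after>
    <dates><from>1785</from><to>1789</to></dates>
  </position>
</successions>
"""

root = ET.fromstring(xml_src)
hits, problems = [], []
for pos in root.findall("position"):
    if {pos.findtext("before"), pos.findtext("after")} & FOUNDERS:
        hits.append(pos.findtext("title"))
    frm = int(pos.findtext("dates/from"))
    to = int(pos.findtext("dates/to"))
    if frm > to:                            # internal inconsistency
        problems.append(pos.findtext("title"))

print(hits)
print(problems)
```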
About ReST:
Stephen> Don't ASSume. The "re" in reStructured Text means "again",
Stephen> as in Structured Text, version 2. [it doesn't mean "regular
Stephen> expression"]
You are mistaken. It is a pun and means both. See the "History"
section at:
http://mail.python.org/pipermail/doc-sig/2000-November/001239.html
Stephen> [hints about ReST's parsing techniques]
Thanks. That's helpful. I'll try to get in touch with the ReST author.
Thomas> [Awiki emulates a hierarchical decomposition hypothesized
Thomas> to reflect how humans parse plain-text "by eye"]
Stephen> [No way. The human visual system processes too much
Stephen> information to simulate in such a simple-minded way.]
My goal is not to parse *arbitrary* plain text as a human would.
Also, the amount of visual information in plain text is very low compared
to what the human visual system can process. So your objection is
misplaced (but understandable).
My goals do *not* include the kind of error tolerance that by-eye
parsing has. For example, if a script contains some indentation errors,
the average reader might not notice them and will certainly know what
was meant. Nevertheless, I draw the line this way: Awiki (when being
used to render as XHTML, say) is just fine if it can say: "These 50
lines in the middle of the script confuse me. I've parsed the stuff
before and after just fine but to render these -- here is a warning
box wrapped around the raw source."
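That fallback is easy to sketch: render what parses, and wrap each run
of unparseable lines in a warning box instead of failing outright.
The `renderable` predicate here is a stand-in for the real Awiki
grammar (it just rejects lines starting with "??" for illustration),
and the output markup is invented:

```python
# Minimal sketch of the "warning box" fallback described above.
def renderable(line):
    # Stand-in for the real grammar: reject "??" lines for the demo.
    return not line.startswith("??")

def render(lines):
    out, bad = [], []
    def flush():
        if bad:
            out.append('<div class="awiki-warning">'
                       "<p>These lines confused the parser:</p><pre>"
                       + "\n".join(bad) + "</pre></div>")
            bad.clear()
    for line in lines:
        if renderable(line):
            flush()                       # emit any pending warning box
            out.append("<p>" + line + "</p>")
        else:
            bad.append(line)              # accumulate the confusing run
    flush()
    return "\n".join(out)

html = render(["fine text", "?? confusing", "?? also confusing", "more fine text"])
print(html)
```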
Stephen> Furthermore, at the block level the "eye" we're talking
Stephen> about is a two-dimensional pattern recognizer. AFAIK most
Stephen> parsing theory, and all tools easily available to GNU
Stephen> developers, are based on parsing unidimensional streams.
Right. That's one reason why I hack on Awiki rather than use, say,
lex and yacc. The "recursive decomposition" I've mentioned, along
with keeping track of both column and line positions, gives my parsing
technique the ability to do two-dimensional pattern recognition well
enough for plain-text texts. One interesting question is how to
adapt the technique to scripts other than ASCII.
-t