Re: [Gnu-arch-users] Re: doc formats
From: Stephen J. Turnbull
Subject: Re: [Gnu-arch-users] Re: doc formats
Date: Sat, 21 Jan 2006 15:11:47 +0900
User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b24 (dandelion, linux)
>>>>> "Thomas" == Thomas Lord <address@hidden> writes:
Thomas> Before replying to Miles and Matthew: If you want a good
Thomas> example of wiki-style markup gone bad,
The claim of the XML proponents on python-dev is that this is
inevitable. I'm sympathetic to that claim, having seen the devices
proposed for reST to address attaching semantic information to
plain text.
Thomas> Can anyone either point to a document that concisely
Thomas> explains the parsing theory (or in other terms, the state
Thomas> machine) the ReST uses? The name ("Regular expression
Thomas> ...")
Don't ASSume. The "re" in reStructured Text means "again", as in
Structured Text, version 2.
There is no parsing theory, as far as I know. There is a set of
commonly used heuristics which are endorsed and formalized: the use of
underscore to mark links (analogous to the use by gettext), asterisks
for emphasis (as often used in email), underlining with hyphens or
equals signs to mark headers, ASCII art tables, and indentation for
block quotes. Email-style quoting is also recognized, as well as
prompt> input / (no-prompt) output. Finally there are some
new heuristics such as the use of a doubled colon :: which turns a
following block quote into the equivalent of texinfo @example, and the
use of |id| to create abbreviations. It's a pretty mixed bag; I think
you'd really have to strain to find a "theory" in there. Certainly
nothing to compare to SGML or LISP syntactic theory. The developers
do recognize the problems that come from using the same character to
mark the beginning and the end of a marked text, but that's about as
far as it goes.
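
To make the "bag of heuristics" point concrete, here is a toy Python
sketch of three of the inline conventions listed above: asterisks for
emphasis, a trailing underscore for links, and |id| for substitutions.
The regexes and the name scan_inline are hypothetical and purely
illustrative; this is not how docutils tokenizes, and real reST has
stricter rules about what may surround the markers.

```python
import re

# Hypothetical, simplified patterns (an assumption for illustration only).
# Each one guards the marker with lookarounds so that, e.g., "2 * 3 * 4"
# and "snake_case" are not mistaken for markup.
INLINE_PATTERNS = [
    ("emphasis", re.compile(r"(?<![\w*])\*(?!\s)([^*]+?)(?<!\s)\*(?![\w*])")),
    ("substitution", re.compile(r"(?<!\w)\|(?!\s)([^|]+?)(?<!\s)\|(?!\w)")),
    ("reference", re.compile(r"(?<!\w)(\w[\w.-]*)_(?!\w)")),
]


def scan_inline(text):
    """Return (kind, content) pairs in order of appearance in text."""
    hits = []
    for kind, pattern in INLINE_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group(1)))
    # Sort by start offset so results follow reading order.
    return [(kind, content) for _, kind, content in sorted(hits)]


if __name__ == "__main__":
    print(scan_inline("See docutils_ for *details* on |version|."))
```

Note that each convention needs its own ad hoc guard conditions, and
none of them follows from a common grammar.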
Then there is a comment / explicit markup / extension syntax
introduced by a leading "..". This also admits a "field syntax" which
is essentially RFC 822 headers, with tags marked by trailing and
leading colons. Eg,
    .. image:: dont-panic.png
       :height: 480
       :width: 640
       :background: transparent
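
A minimal Python sketch of how such a block could be picked apart,
assuming the option lines are indented under the ".." line as reST
requires. The function parse_directive and both regexes are
hypothetical names for illustration, not the docutils implementation.

```python
import re

# Hypothetical patterns: ".. name:: argument" followed by indented
# ":key: value" lines, which is the RFC 822-like "field syntax".
DIRECTIVE_RE = re.compile(r"^\.\.\s+(?P<name>[\w-]+)::\s*(?P<arg>.*)$")
FIELD_RE = re.compile(r"^\s+:(?P<key>[\w-]+):\s*(?P<value>.*)$")


def parse_directive(block):
    """Split a directive block into (name, argument, options dict)."""
    lines = block.strip("\n").splitlines()
    head = DIRECTIVE_RE.match(lines[0])
    if head is None:
        raise ValueError("not a directive: %r" % lines[0])
    options = {}
    for line in lines[1:]:
        field = FIELD_RE.match(line)
        if field is None:
            break  # the first non-field line ends the option block
        options[field.group("key")] = field.group("value").strip()
    return head.group("name"), head.group("arg").strip(), options


if __name__ == "__main__":
    example = (
        ".. image:: dont-panic.png\n"
        "   :height: 480\n"
        "   :width: 640\n"
        "   :background: transparent\n"
    )
    print(parse_directive(example))
```

The sketch accepts any option name; deciding which options a given
directive actually allows is left to whatever handles that directive.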
Thomas> Speaking of XML: I understand that ReST is table driven in
Thomas> some way and therefore presumably highly reconfigurable.
Thomas> I'd like to see that more aggressively developed.
reST is divided into a single front end allowing plugins for extension
syntax based on the field syntax, and back-ends driving markup engines
such as LaTeX, HTML, and I believe DocBook. ISTR that the product of
the front end is indeed a syntax tree, but it's not as powerful as
XML, though it can produce XML. As a consequence, the HTML and LaTeX
produced are far more human-readable than the horrors that, say,
latex2html produces.
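
The "one front end, several back ends" split can be sketched in Python
as a single tree handed to two small renderers. Everything here (the
nested-tuple tree, the tag tables, to_html and to_latex) is a
hypothetical toy, not the docutils data model or its writer API.

```python
from html import escape

# Hypothetical mapping tables, one per back end.
HTML_TAGS = {"section": "div", "title": "h1", "paragraph": "p", "emphasis": "em"}
LATEX_FORMS = {
    "section": "%s",
    "title": "\\section{%s}\n",
    "paragraph": "%s\n",
    "emphasis": "\\emph{%s}",
}

# Toy tree as a front end might produce it: (kind, children), where a
# child is either another node or a plain string.
TREE = ("section", [
    ("title", ["Don't Panic"]),
    ("paragraph", ["Towels are ", ("emphasis", ["useful"]), "."]),
])


def render(node, leaf, wrap):
    """Walk the tree; leaf formats text, wrap formats an element."""
    if isinstance(node, str):
        return leaf(node)
    kind, children = node
    inner = "".join(render(child, leaf, wrap) for child in children)
    return wrap(kind, inner)


def to_html(tree):
    def wrap(kind, inner):
        tag = HTML_TAGS[kind]
        return "<%s>%s</%s>" % (tag, inner, tag)
    return render(tree, lambda s: escape(s, quote=False), wrap)


def to_latex(tree):
    # Text is passed through unescaped; a real back end would have to
    # escape LaTeX special characters.
    return render(tree, lambda s: s, lambda kind, inner: LATEX_FORMS[kind] % inner)


if __name__ == "__main__":
    print(to_html(TREE))
    print(to_latex(TREE))
```

Because each back end maps a small set of node types directly onto
target constructs, the output stays close to what one would write by
hand.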
Thomas> I'd like to see a single parser generic parser engine
Thomas> driven by a declarative language spec so that, for
Thomas> example, in one case it's interpreting markup as
Thomas> short-hand for XHTML but, in other case, it's interpreting
Thomas> a similar markup syntax as short-hand for some other,
Thomas> quite different DTD.
Attempts to get reST to do this are precisely what the XML advocates
on python-dev are terrified by.
Thomas> (The issue here is that there is an extreme scarcity
Thomas> natural-looking plain-text mark-ups but many DTDs we wish
Thomas> to generate. "By eye" people have no trouble
Thomas> disambiguating some forms of overloading -- the doc parser
Thomas> should accomodate that.)
That's right out, man. The human eye, as in the phrase "by eye",
includes the 1.5 kilos or so of neural matter located at the other end
of the optic nerve. The brain is a shitty parser; it can't even lex
very well. "By eye" is 1% parsing, 99% semantic filtering. In your
case, I'd be willing to go as high as 33%-66% (with 1% left over for
Edisonian inspiration), but no way would I concede that your ability
to recognize inline constructs "by eye" is as much as 50% based on
parsing.
Furthermore, at the block level the "eye" we're talking about is a
two-dimensional pattern recognizer. AFAIK most parsing theory, and
all tools easily available to GNU developers, are based on parsing
unidimensional streams. (Consider the progress of proprietary OCR vs
free OCR programs as symptomatic.)
Thomas> Awiki also has some tricks
Urk! Once you let in a trick or two, there goes the neighborhood!
Thomas> to beat out ambiguities and to rely less on gratuitous
Thomas> whitespace restrictions, afaict from the ReST docs.
The reST whitespace restrictions are artificial; this is not
surprising given it comes from Python developers, and one of its
initial target use cases was Python docstrings.
It's also amusing (interesting but probably not relevant to this
discussion) to note that with an Xft-enabled Emacs I find a reST
source buffer nearly as attractive as a formatted HTML buffer for
slide presentations to general audiences (such as undergraduate
classes). HTML does a better job of placing images, and its tables are
distinctly more attractive, but it's actually possible to do lectures
in an Emacs buffer the way I would do them on the blackboard, line by
line (of course I use filladapt to handle the indentation, which it
does correctly for reST out of the box).
OTOH, there is no *ML that is satisfactory for technical
presentations yet; it's gotta be TeX.
--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.