
[Gnu-arch-users] Re: doc formats


From: Thomas Lord
Subject: [Gnu-arch-users] Re: doc formats
Date: Fri, 20 Jan 2006 08:30:13 -0800

Before replying to Miles and Matthew:

If you want a good example of wiki-style markup gone bad, take
a look at the source for the Wikipedia (en) page on Thomas Jefferson.
Much of it is basically illegible without poking around quite
far in Wikipedia documentation and comparing it to the source page.
Tragically, various tables and other structured information are not
marked up in a way that allows semantic queries.  Otherwise, for
example, it would be *trivial* (of course anything is approximately
possible) to write an XQuery over Wikipedia entries for US presidents
that would produce all kinds of useful tables....
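
To make "semantic queries" concrete, here is a minimal sketch in
Python rather than XQuery.  The element and attribute names
(pages, president, born, party) are invented for illustration;
Wikipedia does not publish its articles in this form.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup; every element and attribute name is invented.
PAGES = """
<pages>
  <president name="Thomas Jefferson">
    <born>1743</born><party>Democratic-Republican</party>
  </president>
  <president name="John Adams">
    <born>1735</born><party>Federalist</party>
  </president>
</pages>
"""

def presidents_table(xml_text):
    """Return (name, birth year, party) rows sorted by birth year."""
    root = ET.fromstring(xml_text)
    rows = [
        (p.get("name"), int(p.findtext("born")), p.findtext("party"))
        for p in root.iter("president")
    ]
    return sorted(rows, key=lambda row: row[1])
```

With markup like that, a table of presidents ordered by birth year
is a few lines; with presentation-only markup the same table means
scraping each page by hand.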

Join me in feeling a little sad that Wikipedia has accumulated so
many articles before taking the issue more seriously.   Note how
it plays out as, for example, "Let's spend a lot of work *now* to 
try to make nice (enough, mostly) printed forms since we didn't
spend a little more work *back then* that would have made the problem
trivial."   Andrew said something about lots of programmers basically
working from muscle memory too often and that leading to results
commonly seen -- here is yet another example.

On Fri, Jan 20, 2006 at 01:56:55PM +0900, Miles Bader wrote:
> matthew hannigan <address@hidden> writes:
>>> Realistically, ReST is probably as close as we'll come.

>> Er, I don't know about that.  There are about 1,783 different
>> "structured text" formats out there, and ReST doesn't seem to be a
>> particularly great one.

> With the requirement that the plain text / source text is usable/
> presentable on its own, i.e. that it requires almost nothing
> in the way of tags; and structure is inferred from common
> conventions in the text.

Can anyone point to a document that concisely explains
the parsing theory (or, in other terms, the state machine) that
ReST uses?  The name (if it really means "Regular expression ...")
makes me skeptical because, having tried this kind of thing both
ways, I'm fairly well convinced that regular expressions have only
a very minor role in such a syntax (so small that, for the few you
need, you may as well hand-code them unless you are working in a
language that would make that too slow).

And, speaking of implementation language, what about that?  I think
it's fine to prototype a doc processor like this in Python (indeed,
the version of Awiki I currently use is written in Python) but
I'd like to see something that is easy to translate into a small
amount of a lower level language, ideally C.   The markup language
should be useful in a wide range of circumstances where people
currently use XML (either directly or via a GUI editor that emits
XML) -- minimizing the cost of the extra layer of translation seems
to me to be important.

Speaking of XML: I understand that ReST is table driven in some way
and therefore presumably highly reconfigurable.   I'd like to see
that more aggressively developed.   I'd like to see
a single generic parser engine driven by a declarative
language spec so that, for example, in one case it's interpreting
markup as short-hand for XHTML but, in another case, it's interpreting
a similar markup syntax as short-hand for some other, quite different
DTD.   Those shifts should even be available in a context-sensitive 
way within a single document.  For example, 

                `Power'

might indicate a variable named "Power" in the body of function
documentation or the title of a book in the bibliography of the
same document.   Similarly the syntax for an item in a list means
one thing in the body of the introduction but something different
in the end-notes section of the same introduction.   (The issue here
is that there is an extreme scarcity of natural-looking plain-text
mark-ups but many DTDs we wish to generate.   "By eye" people have
no trouble disambiguating some forms of overloading -- the doc
parser should accommodate that.)
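
A minimal sketch of that kind of context-sensitive overloading, in
Python.  The SPEC table, the context names and the target element
names (var, cite) are all assumptions made for illustration; this is
neither Awiki's nor ReST's actual mechanism.  The one pattern it needs
is hand-coded rather than written as a regular expression.

```python
from xml.sax.saxutils import escape

# Hypothetical declarative spec: context name -> element for `text' spans.
SPEC = {
    "function-doc": "var",    # `Power' is a variable name
    "bibliography": "cite",   # `Power' is a book title
}

def render(text, context):
    """Rewrite `text' spans as the element SPEC assigns to this context."""
    tag = SPEC[context]
    out = []
    pos = 0
    while True:
        start = text.find("`", pos)
        end = text.find("'", start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete span: emit the rest as plain text.
            out.append(escape(text[pos:]))
            return "".join(out)
        out.append(escape(text[pos:start]))
        out.append("<%s>%s</%s>" % (tag, escape(text[start + 1:end]), tag))
        pos = end + 1
```

The same source text then yields different output depending only on
where it appears: render("`Power'", "function-doc") gives
<var>Power</var>, while render("`Power'", "bibliography") gives
<cite>Power</cite>.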

A good rule-of-thumb test might be:  you'll want (one hopes) XSLT
programs for downstream processing of the parsed text.   Is your
parser engine flexible enough that you can write those XSLT scripts
in a wiki-ish style and have the result be easier to read and maintain
than writing them in raw XML syntax?

(Awiki is based on a recursive decomposition that is hypothesized
to mirror how a human parses a plain-text document: by recognizing
coarse structure to establish context for smaller fragments then
recursively doing the same thing for smaller fragments.   There are,
for the most part, only a few clean ways to recognize coarse structure
and so the Awiki engine is concerned more with how those are stacked
and made context sensitive and less with how to add new ones.  Awiki
also has some tricks to beat out ambiguities and to rely less on
gratuitous whitespace restrictions than ReST does, afaict from the
ReST docs.)
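
The general shape of such a recursive decomposition can be sketched
in a few lines of Python.  This is an illustration of the idea only,
not Awiki's code: the three coarse structures it knows about
(indented blocks, "* " lists, paragraphs) and the node names are
assumptions, and real context sensitivity is left out.

```python
def split_blocks(lines):
    """Coarse pass: group consecutive non-blank lines into blocks."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def parse(lines):
    """Recognize coarse structure, then recurse on the smaller fragments."""
    nodes = []
    for block in split_blocks(lines):
        if all(line.startswith("    ") for line in block):
            # Indented block: strip one level and parse what is inside.
            inner = [line[4:] for line in block]
            nodes.append(("indented", parse(inner)))
        elif all(line.startswith("* ") for line in block):
            nodes.append(("list", [line[2:] for line in block]))
        else:
            text = " ".join(line.strip() for line in block)
            nodes.append(("para", text))
    return nodes
```

Each level only has to recognize a handful of coarse shapes; the
finer structure falls out of applying the same recognizers again
to the fragment inside.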


>> For instance, it seems to randomly, and confusingly, use the same 
>> syntax for different purposes in ways that don't make much apparent 
>> sense,

Can't speak to how well ReST uses overloading but, in general, it 
seems necessary to use overloading somehow.

>> and the table syntax is completely awful (tables are actually ascii-
>> art tables, which are a big pain in the butt to maintain).

> The requirement above pretty much says it has to be ascii art tables.
> Emacs can help with this sort of thing anyway, can't it?

Yes.  I like ascii art tables because they make the source directly
legible.   Ordinary design common sense applies:  there's no reason to
draw elaborate boxes around everything -- very minimal and easy to
type/align mark-up is just fine (and often visually preferable).  The
ReST author has some design notes on-line that indicate he's thought
a lot about this too.

And, yes, Emacs can help.   It shouldn't be that hard to do directly
with VI, either.   If a table is really complicated, use an "outline"
format (as per the ReST author's notes) or keep the table data in
another file in whatever format is most natural and go from there.
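
As an illustration of how little mark-up a table needs, here is a
sketch in Python that reads a table consisting of a header line, a
rule of dashes marking the column positions, and data rows.  The
format is an assumption chosen for the example, not a description
of ReST's table grammar.

```python
def parse_table(lines):
    """Parse a minimal table whose second line is a rule like '----  ---'."""
    rule = lines[1]
    spans, start = [], None
    # Each run of dashes in the rule marks one column's character span.
    for i, ch in enumerate(rule + " "):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    if spans:
        # Let the last column run to end of line so long cells are kept.
        spans[-1] = (spans[-1][0], None)

    def cells(line):
        return [line[a:b].strip() for a, b in spans]

    header = cells(lines[0])
    return [dict(zip(header, cells(line))) for line in lines[2:] if line.strip()]

TABLE = [
    "Name       Born",
    "---------  ----",
    "Jefferson  1743",
    "Adams      1735",
]
```

Calling parse_table(TABLE) gives one dictionary per data row, keyed
by the header cells, and the source stays legible as it is.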

Another goal:  for some DTDs that use the wiki-engine it should be
possible to pretty-print the XML and get a very nice looking wiki
version.
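
A rough sketch of that reverse direction, in Python, for a tiny
made-up vocabulary (doc, p, ul, li, var).  The element names and
the plain-text conventions they map to are assumptions for the
example, not any existing DTD.

```python
import xml.etree.ElementTree as ET

def inline(elem):
    """Render an element's text with its children rendered in place."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(to_wiki(child))
        parts.append(child.tail or "")
    return "".join(parts)

def to_wiki(elem):
    """Render a tiny hypothetical vocabulary back to plain-text mark-up."""
    if elem.tag == "var":
        return "`" + (elem.text or "") + "'"
    if elem.tag == "li":
        return "* " + inline(elem) + "\n"
    if elem.tag == "p":
        return inline(elem) + "\n\n"
    # Container elements: concatenate the rendering of the children.
    return "".join(to_wiki(child) for child in elem)

SAMPLE = "<doc><p>Set <var>Power</var> first.</p><ul><li>one</li><li>two</li></ul></doc>"
```

Here to_wiki(ET.fromstring(SAMPLE)) produces a paragraph containing
`Power' followed by a two-item "* " list.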

-t





