emacs-orgmode
Re: [O] Citations, continued


From: Richard Lawrence
Subject: Re: [O] Citations, continued
Date: Mon, 02 Feb 2015 10:02:41 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Hi all,

Here is the citation syntax proposal I have mentioned in a couple of
posts now.  I have attached it as an Org document for better
readability, and also reproduced the text below.  Let me know what you
think!

Best,
Richard


#+TITLE: A Proposal for Org citation syntax
#+AUTHOR: Richard Lawrence

* Introduction
In brief, the proposal is:

1. Use the Pandoc syntax for basic, inline citations.
2. Extend the Pandoc syntax modestly to accommodate backend-agnostic
   formatting of inline citations.
3. Also allow non-inline citation definitions, with a syntax
   comparable to non-inline footnotes, to accommodate
   backend-specific formatting.

Basing this proposal on the Pandoc syntax is a `merely practical'
choice.  It might not be the most Org-like, and it might produce too
much conceptual divergence between citations and links.  But it is a
syntax that is already well-tested and known to work elsewhere, and
which has easily-scriptable tools for processing it (namely, Pandoc's
own), which Org users could rely on in the meantime, while Org's own
implementation of this syntax catches up.

Beyond the features provided by the basic Pandoc syntax, I have tried
to ensure that the other features are as Org-like as possible, are
already in use in Org documents, and (I hope) could be implemented
with minimal work.

* Citation syntax
(I repeat the list of requirements I posted earlier here, for easy
reference; so far, I don't think anyone has suggested we need any
others.)

A citation is a textual reference to one or more individual works,
together with other information about those works, grouped together in
a single place.

Within a citation, each reference to an individual work needs to be
capable of containing:
1. a database key that references the cited work
2. prefix / pre-text
3. suffix / post-text
4. references to page/chapter/section/whatever numbers and ranges.
   This is likely part of the prefix or suffix, but might be worth
   parsing separately for localization or link-following behavior.
5. a way of indicating backend-agnostic formatting properties.
   Examples of some properties users might want to specify are:
   - displaying only some fields (or suppressing some fields) from a
     reference record (e.g., journal, date, author)
   - indicating that the referenced works should *only* appear in
     the bibliography of the exported document (equivalent of LaTeX
     \nocite)

Citations as a whole also need:
6. [@6] a way of indicating formatting properties for specific export
   backends.  Examples of some properties that users might want to
   specify are:
   - a citation command to use for each individual reference (LaTeX;
     others?)
   - a multi-cite command to apply to all references together
     (LaTeX)
   - CSS or other styling class (HTML and derived backends; also
     ODT?)
   - properties describing how to treat emphasis and other
     formatting that cannot appear in plain text (ASCII and other
     plain text backends)

** Starting point
I assume, to start, the basic Pandoc [ ... @key1 ...; ... @key2 ...]
syntax for inline citations, documented 
[[http://pandoc.org/README.html#citations][here]].  This defines a syntax
for inline citations that allows grouping multiple individual
references together between brackets, with semicolons as separators.
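
As a rough illustration of that grouping, here is a minimal Python
sketch that splits a bracketed citation into (prefix, key, suffix)
triples.  The key pattern and the function name `split_citation' are
assumptions made for this example; Pandoc's real key grammar is more
permissive, and this is not how Pandoc itself parses citations.

```python
import re

# Assumed, simplified key pattern: '@' followed by letters, digits,
# '_' or '-'.  Pandoc accepts more characters than this.
KEY_RE = re.compile(r"@([A-Za-z0-9_-]+)")


def split_citation(text):
    """Split '[pre @key post; ...]' into (prefix, key, suffix) triples."""
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("citation must be enclosed in square brackets")
    refs = []
    # Semicolons separate the individual references.
    for part in text[1:-1].split(";"):
        m = KEY_RE.search(part)
        if m is None:
            raise ValueError(f"no citation key in {part!r}")
        prefix = part[:m.start()].strip()
        # Drop the comma that conventionally follows the key.
        suffix = part[m.end():].strip(" ,")
        refs.append((prefix, m.group(1), suffix))
    return refs


refs = split_citation("[See @Doe99, pp. 34--45; also @Doe00, section 6]")
```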

Previous discussions have suggested beginning citation definitions
with a tag, like [cite: ...] or [citation: ...], by analogy with
footnotes and links.  As far as I can see, the tag doesn't really
provide any advantages for inline citations, and is just unnecessary
markup.  This is because the syntax of citations is (or should be)
more constrained than footnotes or links; a citation is already
recognizable, and parseable as such, by the required presence of a
reference key.  The tag would also immediately break compatibility
with the basic Pandoc syntax if it were required for inline citation
definitions, a result which I am trying to avoid in this proposal.

A syntax for /non-inline/ citation definitions, however, comparable to
the syntax for footnotes, would make good use of such a tag.  This is
what I propose below.

** Backend-agnostic formatting properties
*** Selecting specific fields
Selecting specific fields to display could be done by appending field
names to cite keys after colons, much like Org tags:
#+BEGIN_QUOTE
[See @Doe99, pp. 34--45; also @Doe00:year, section 6] 

[See their article in @Doe99:journal:year.] 
#+END_QUOTE
Note that this would make for an extension of Pandoc syntax.  This
extension is not a strict superset, since Pandoc allows internal `:'
characters in cite keys, and thus would treat `@Doe99:journal:year' as
a single key, rather than treating the key as ending at the first
colon, with other data afterward.  (More compatible but uglier
alternatives for the field selector include `!', `{', `}', and `^'.
If an alternative is desired, I suggest `@Doe99{journal,year}'.)
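
To make the colon convention concrete, here is a small Python sketch
that treats the key as ending at the first colon and reads everything
after it as field selectors.  The pattern and the name
`parse_reference' are hypothetical, chosen only for this example.

```python
import re

# Hypothetical pattern: the key stops at the first ':'; each
# following ':name' is read as a field selector.
REF_RE = re.compile(r"@([A-Za-z0-9_-]+)((?::[A-Za-z]+)*)")


def parse_reference(token):
    """Return (key, [field, ...]) for a token like '@Doe99:journal:year'."""
    m = REF_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a citation key: {token!r}")
    key, selectors = m.groups()
    # ':journal:year' splits into ['', 'journal', 'year']; drop the
    # empty leading element.
    fields = selectors.split(":")[1:] if selectors else []
    return key, fields


key, fields = parse_reference("@Doe99:journal:year")
```

A parser following Pandoc's rules would instead read the whole token
as one key, which is the incompatibility noted above.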

When specific fields are requested, ONLY data from those fields should
appear in the exported document.  Backends would choose how to export
these citations based on the selected fields.

I would think the default behavior during export should be something
like: get the reference record from the database, then pass it and a
list of the requested fields to a user-customizable function which is
expected to return a string to insert in the output.  (The default
function could, say, intersperse the requested field data with
whitespace and add parentheses.  More sophisticated functions could
rely on external tools to format the citation using the Citation Style
Language.)  Of course, this assumes that the exporter has a way of
querying the reference database, which would be fine for bibtex and
org-bibtex databases, but may not be a good assumption in general.
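
A minimal Python sketch of that default behavior, assuming the
reference database has already been loaded into a dictionary of
records.  The names `format_fields' and `export_citation', and the
sample record, are invented for illustration.

```python
def format_fields(record, fields):
    """Default formatter: join the requested values with spaces and
    wrap the result in parentheses."""
    values = [str(record[f]) for f in fields if f in record]
    return "(" + " ".join(values) + ")"


def export_citation(database, key, fields, formatter=format_fields):
    """Look up the record for KEY and hand it, together with the
    requested fields, to a user-replaceable formatter."""
    record = database[key]
    return formatter(record, fields)


# Hypothetical in-memory database with a single made-up record.
database = {
    "Doe99": {"author": "Doe", "journal": "Journal of Examples", "year": 1999},
}
text = export_citation(database, "Doe99", ["journal", "year"])
```

A user could pass a different `formatter' to get, say, CSL-based
output without changing the lookup step.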

Specific backends could also do something different with field
selectors when it makes sense to do so.  For example, the LaTeX
backend could choose \citeyear as the command to place in the exported
document when just `:year' is requested in the citation.
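
For instance, a LaTeX backend might map field selections to commands
roughly as follows.  This is only a sketch: the mapping table is an
assumption, and apart from \citeyear (mentioned above) the command
choices are mine, taken from what natbib provides.

```python
# Assumed mapping from an exact field selection to a LaTeX command.
# Anything not listed falls back to plain \cite.
FIELD_COMMANDS = {
    ("year",): "citeyear",
    ("author",): "citeauthor",
}


def latex_command(fields):
    """Pick a citation command name for the selected fields."""
    return FIELD_COMMANDS.get(tuple(fields), "cite")


def latex_cite(key, fields):
    """Render a single reference as a LaTeX citation command."""
    return "\\%s{%s}" % (latex_command(fields), key)


year_only = latex_cite("Doe00", ["year"])
```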

*** Non-cited works that should appear in the bibliography
A special field selector `:nocite' would be one way to achieve
citations that, for whatever reason, should appear in the Org source
and in the exported bibliography, but should not appear in the
exported text where they are placed.  This would allow referencing
them at relevant places in the document, like:
#+BEGIN_QUOTE
Smith said a lot of things, but no one can remember what they
were. address@hidden:nocite]
#+END_QUOTE

One drawback of this syntax is that it does not provide an easy way to
list all the nocite references, since the user would have to add
`:nocite' to each one individually.  This is not a huge problem for
small numbers of references, but it would also be nice to have some
equivalent of LaTeX's \nocite{*}.  On this point, see the proposal for
non-inline citation definitions below.

** Non-inline citation definitions and backend-specific formatting
The syntax proposed above assumes citations are defined inline.  A
complementary alternative would be to treat citations like
(non-inline) footnotes, with an inline reference and a definition
elsewhere in the document.  This could be convenient for citations
that have lots of pre- or post-text.

In that case, a citation could look like:
#+BEGIN_QUOTE
    Doe provides an interesting analysis. [cite:1]

    ...

    * Citations

    [cite:1] See @Doe99, pp. 34--45; also @Doe2000:year, ch. 1.
#+END_QUOTE
That is, a citation /pointer/ would occur inline in the document text,
which refers (via a number or a label) to a citation /definition/ in a
specially-named subtree.  The definition begins by repeating the
pointer, and has the same syntax as proposed above, minus the
enclosing square brackets.

This approach could peacefully coexist with the above proposal for
inline citations, in the same way that inline and non-inline footnote
definitions now peacefully coexist.  
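
Resolving pointers to definitions could work along these lines.  The
Python sketch below assumes the definitions live under a top-level
headline named "Citations" and that each definition fits on one line;
both are simplifications, and the function names are hypothetical.

```python
import re

# A pointer such as [cite:1] or [cite:some-label].
POINTER_RE = re.compile(r"\[cite:([^\]\s]+)\]")


def collect_definitions(lines, heading="* Citations"):
    """Map each label to its definition text from the named subtree."""
    defs = {}
    in_subtree = False
    for line in lines:
        if line.startswith("* "):
            # A new top-level headline opens or closes the subtree.
            in_subtree = line.rstrip() == heading
            continue
        if in_subtree:
            stripped = line.strip()
            m = POINTER_RE.match(stripped)
            if m:
                defs[m.group(1)] = stripped[m.end():].strip()
    return defs


def pointers_in(text):
    """Return the labels of all citation pointers found in TEXT."""
    return POINTER_RE.findall(text)


doc = [
    "Doe provides an interesting analysis. [cite:1]",
    "",
    "* Citations",
    "",
    "[cite:1] See @Doe99, pp. 34--45; also @Doe2000:year, ch. 1.",
]
defs = collect_definitions(doc)
```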

*** Backend-specific formatting
In general, it would be nice to avoid formatting properties which are
specific to a particular export backend when a backend-agnostic
solution is available, but some backend-specific formatting needs are
probably inevitable, so we need a syntax for specifying them.

Another advantage of the non-inline citation syntax is that it would
allow using the existing #+ATTR_BACKEND syntax to specify
backend-specific formatting properties, since the citation definitions
would be block-level elements:
#+BEGIN_QUOTE
    * Citations

    #+ATTR_LATEX: :command citet
    #+ATTR_HTML: :class my-citation
    [cite:1] See @Doe99, pp. 34--45; @Foobar2000, ch.1.
#+END_QUOTE
This automatically makes the syntax readily extensible as new needs
come up and target formats evolve.
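
To show how little machinery this needs, here is a Python sketch that
gathers the #+ATTR_BACKEND lines preceding a definition into
per-backend property tables.  It assumes each property is a
`:name value' pair whose value contains no spaces, which is narrower
than what Org's real attribute parsing accepts.

```python
import re

ATTR_RE = re.compile(r"#\+ATTR_(\w+):\s*(.*)")


def parse_attr_lines(lines):
    """Collect '#+ATTR_X: :name value ...' lines into
    {backend: {name: value}}; other lines are ignored."""
    attrs = {}
    for line in lines:
        m = ATTR_RE.match(line.strip())
        if m is None:
            continue
        backend = m.group(1).lower()
        tokens = m.group(2).split()
        props = attrs.setdefault(backend, {})
        # Pair each ':name' token with the token that follows it.
        for name, value in zip(tokens[0::2], tokens[1::2]):
            props[name.lstrip(":")] = value
    return attrs


attrs = parse_attr_lines([
    "#+ATTR_LATEX: :command citet",
    "#+ATTR_HTML: :class my-citation",
    "[cite:1] See @Doe99, pp. 34--45; @Foobar2000, ch.1.",
])
```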

(Originally, I had thought about how to extend the inline citation
definition syntax above to include backend-specific formatting
information.  But everything I came up with seemed pretty ugly, and
not worth the extra syntax it would require.  When I realized that
non-inline definitions could leverage the existing syntax for
backend-specific properties, I tossed that part of the proposal,
though I'm happy to share it if anyone wants to see it.)

Thus, I propose that, for authors who /need/ backend-specific
formatting, this should be the way to do it.  The above inline
citation syntax should remain limited to uses where no
backend-specific behavior is required.

Note however that there is a tension here with the proposal above for
backend-agnostic field selectors.  I am not sure what should happen
if, say, the user selects individual fields in the citation but also
requests an incompatible citation command for a particular backend.

*** Bibliography-only entries
Non-inline definitions would also provide a convenient place to list
non-cited references that should appear in the bibliography.  For
example:
#+BEGIN_QUOTE
    * Citations
    ...
    [nocite:] @Doe99; @Foobar2000; @Baz98.
#+END_QUOTE
As a special case,
#+BEGIN_QUOTE
    * Citations

    [nocite:*] 
#+END_QUOTE
could introduce bibliography entries for everything in the reference
database.
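
A sketch of how an exporter might expand these two forms, in Python.
The function name and the set of database keys are assumptions for
the example.

```python
def nocite_keys(definition, database_keys):
    """Return the keys a [nocite:...] definition adds to the
    bibliography.  '[nocite:*]' selects every key in the database."""
    body = definition.strip()
    if body == "[nocite:*]":
        return sorted(database_keys)
    prefix = "[nocite:]"
    if not body.startswith(prefix):
        raise ValueError(f"not a nocite definition: {definition!r}")
    # Drop the tag and the closing period, then split on semicolons.
    refs = body[len(prefix):].strip().rstrip(".")
    return [r.strip().lstrip("@") for r in refs.split(";") if r.strip()]


# Hypothetical set of keys present in the reference database.
all_keys = {"Doe99", "Foobar2000", "Baz98", "Quux01"}
listed = nocite_keys("[nocite:] @Doe99; @Foobar2000; @Baz98.", all_keys)
everything = nocite_keys("[nocite:*] ", all_keys)
```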

* Document metadata
In addition to the syntax of citations themselves, the Org document
would also need to represent the following metadata to support
citations:
7. [@7] a pointer to one or more backend reference databases,
   including in-document databases in org-bibtex format
8. a reference to a citation style or style file
9. a reference to a locale file
10. an indication of where the bibliography should be found in the
    exported document (equivalent to \printbibliography, etc. in
    LaTeX)

** #+BIBLIOGRAPHY: reference database, style, locale
The #+BIBLIOGRAPHY keyword already exists, in ox-bibtex.el (in
contrib), though its current syntax does not quite meet all the above
needs.  I suggest changing the syntax to support in-file databases and
a locale file.

The point of specifying the style and locale as part of
the #+BIBLIOGRAPHY definition is for compatibility with both LaTeX and
Citation Style Language bibliography and citation formatting.

In keeping with other metadata keyword lines (like #+OPTIONS), I
suggest a key:value syntax for the arguments to #+BIBLIOGRAPHY, like
so:
#+BEGIN_QUOTE
#+BIBLIOGRAPHY: db:/path/to/some/file.bib style:chicago

#+BIBLIOGRAPHY: db:/path/to/some/file.bib style:plain locale:en_GB

#+BIBLIOGRAPHY: db:"*Reference DB"
#+END_QUOTE
In the last example, the leading "*" is meant to indicate that the
reference database is a subtree with headline "Reference DB", whose
branches are in org-bibtex format.
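
Parsing such a line is straightforward.  The Python sketch below
uses the standard library's shlex module so that a quoted value like
"*Reference DB" survives as one token; the function name and the
returned dictionary layout are assumptions of mine.

```python
import shlex

KEYWORD = "#+BIBLIOGRAPHY:"


def parse_bibliography(line):
    """Parse '#+BIBLIOGRAPHY: key:value ...' into a dict of options."""
    if not line.startswith(KEYWORD):
        raise ValueError(f"not a bibliography line: {line!r}")
    options = {}
    # shlex.split keeps a double-quoted value together as one token.
    for token in shlex.split(line[len(KEYWORD):]):
        name, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {token!r}")
        options[name] = value
    return options


external = parse_bibliography(
    "#+BIBLIOGRAPHY: db:/path/to/some/file.bib style:plain locale:en_GB"
)
in_file = parse_bibliography('#+BIBLIOGRAPHY: db:"*Reference DB"')
```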

By specifying where the reference data is (and implicitly what format
it is in, e.g., via the file extension), link-following and export
behavior for citations can differ depending on the format of this
database.  For example, if the database is a .bib file, `following' a
citation key could mean finding the corresponding entry in this file.
If the database is an in-document tree in org-bibtex format, following
a key could mean jumping to the headline whose :CUSTOM_ID: property
agrees with that key.
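
The dispatch could be as simple as the following Python sketch.  The
kind names and the shape of the returned targets are invented for
illustration; a real implementation would open the file or jump to
the headline rather than return a description.

```python
def database_kind(db):
    """Guess the database format from the db: value."""
    if db.startswith("*"):
        return "subtree"          # in-document org-bibtex tree
    if db.lower().endswith(".bib"):
        return "bibtex-file"
    return "unknown"


def follow_target(db, key):
    """Describe where following KEY should take the user."""
    kind = database_kind(db)
    if kind == "subtree":
        # Jump to the headline whose CUSTOM_ID matches the key.
        return ("headline", {"CUSTOM_ID": key})
    if kind == "bibtex-file":
        # Find the entry for the key in the .bib file.
        return ("file-entry", {"file": db, "key": key})
    raise ValueError(f"don't know how to follow keys in {db!r}")


target = follow_target("*Reference DB", "Doe99")
```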

Likewise, if the database is in a format that the exporter knows how
to read, then export backends could potentially look up information
from it to create bibliography entries and citations in the exported
document, possibly relying on an external tool (like citeproc-*) to
transform them into the requested style.  This would be particularly
useful for non-LaTeX backends (which is what ox-bibtex.el focuses on
at the moment).

** Bibliography placement
The other issue is that Org documents must say where the bibliography
should appear in exported documents.

A reasonable default would be placing the bibliography at the end of
the document.  But some documents, in particular long ones, may need
more flexibility in specifying where to place the bibliography.

The simplest solution seems to be just allowing the #+BIBLIOGRAPHY
keyword to appear anywhere in the document, to be replaced on export
with the formatted bibliography.  I think this is what ox-bibtex now
does.  

Attachment: proposal.org
Description: Text document

