emacs-orgmode

[O] Citation syntax: a revised proposal


From: Richard Lawrence
Subject: [O] Citation syntax: a revised proposal
Date: Sat, 14 Feb 2015 18:29:05 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Hi everyone,

Since discussion seems to have petered out on the previous thread (see:
http://thread.gmane.org/gmane.emacs.orgmode/94524), I took some time to
go back over the discussion and write up a concrete proposal for
citation syntax.

This proposal represents my attempt to formulate a syntax that is easy
to read, easy to parse, and covers all the use-cases that people
mentioned as being important.  It is surely not perfect, but I learned a
lot from the previous thread, and I hope something like this will serve
the community's needs.

The proposal is below, both inline (for easy quoting) and attached (for
easy reading).  To keep it relatively short, I have mostly not explained
my reasoning for the choices I made, but I am happy to do so here if
anyone has questions.

I welcome feedback, comments, criticisms, and objections on any point.
However, since we've already had a long discussion about this, I
respectfully request that we try to keep this thread focused.  To that
end, I suggest:

  1) If you have criticisms or objections, please try to indicate
     whether you think they are `substantive' (e.g., you see a problem
     that would prevent you from using this syntax, or prevent Org from
     implementing it) or not (e.g., you would prefer a slightly
     different but equivalent way of expressing something).

  2) If you wish to express an opinion about the proposal without
     offering further comments, let us know by just replying with +1
     (meaning you'd like to see this syntax, or something reasonably
     similar to it, be adopted), 0, or -1 (meaning you'd prefer not to
     see this syntax or anything similar to it adopted).

I guess this is my Valentine to the Org community. :) Thanks for reading!

Best,
Richard

#+TITLE: Citation syntax, a revised proposal
#+DATE: <2015-02-14 Sat>
#+AUTHOR: Richard Lawrence
#+EMAIL: address@hidden
#+LANGUAGE: en
#+SELECT_TAGS: export
#+EXCLUDE_TAGS: noexport

* Citation syntax
** Requirements
A citation is a textual reference to one or more individual works,
together with other information about those works, grouped together in
a single place.  

Within a citation, each reference to an individual work needs to be
capable of containing:
  1) a database key that references the cited work
  2) prefix / pre-note
  3) suffix / post-note
     
Whole citations also need:
  4) a way of specifying whether the citation is in-text or
     parenthetical
  5) a way of representing a common prefix and suffix, if the citation
     is a multi-cite
  6) a way of specifying whether the citation should produce a
     complete bibliography entry in-place
  7) an extensible way of specifying formatting properties to export
     filters and/or specific export backends 
     
** Citation definitions
*** Citation keys; bibliography references vs. complete entries 
A citation key consists of a unique label preceded by a flag, which is
optionally preceded by a hyphen.

The flag is either `@' or `&'.  `@' indicates that the citation should
produce a normal reference to the bibliography entry for the cited
work (in whatever style the document uses), located elsewhere.

The `&' flag indicates that the citation should produce a complete
bibliography entry for the cited work in the place where the citation
appears.

The optional hyphen (`-') indicates that the author's name should be
suppressed from the rendered citation.  (Note that this is only useful
in author-X citation styles; it should have no effect in numeric
styles.)
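
The key rules above are regular enough to capture in one regular expression. What follows is a minimal Python sketch, not part of the proposal. The names `KEY_RE', `Key' and `parse_key' are hypothetical, and the allowed label characters come from the KEY rule in the grammar section. Reading "internal" punctuation as "the label must end in a letter, digit or underscore" is an interpretation.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical regex for one citation KEY: an optional `-', the flag `@'
# or `&', then a label that starts with a letter or `_'.  Punctuation from
# the grammar's list (:.#$%&-+?<>~/) may appear inside the label, but the
# label must end in a letter, digit or `_' (an interpretation of
# "internal"), so "@Doe99." stops before the period.
KEY_RE = re.compile(
    r"(?P<hyphen>-)?(?P<flag>[@&])"
    r"(?P<label>[A-Za-z_](?:[A-Za-z0-9_:.#$%&\-+?<>~/]*[A-Za-z0-9_])?)"
)


class Key(NamedTuple):
    label: str             # e.g. "Doe99"
    suppress_author: bool  # key began with `-'
    is_full: bool          # `&' flag: complete entry in place


def parse_key(text: str) -> Optional[Key]:
    """Parse a whole key such as "-&Doe99"; return None if it is not one."""
    m = KEY_RE.fullmatch(text)
    if m is None:
        return None
    return Key(m["label"], m["hyphen"] is not None, m["flag"] == "&")
```

For example, `parse_key("-&Doe99")' yields a key labelled `Doe99' with both the author-suppression and full-entry flags set. `parse_key("@Doe99.")' returns `None' because the trailing period is not part of the key.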

*** Basic citations: Parenthetical vs. in-text 
There are two basic types of citation: /parenthetical/ and /in-text/.
Each of these may contain references to one or more individual works.

The difference between parenthetical and in-text citations is
expressed using parentheses around the /first/ citation key.  A
parenthetical citation has such parentheses around the first citation
key; an in-text citation lacks them.  (Parentheses around non-initial
keys are permitted for visual consistency and to keep the grammar
simple, but have no meaning.)

A citation thus consists in general of a bracketed list, beginning
with `cite:', of one or more individual references, each of which:
  - may contain a prefix,
  - must contain a citation key, which may or may not be surrounded by `(...)'
  - and may contain a suffix
Individual references are separated by semi-colons.
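
Only the first key's parentheses carry meaning, so classifying a citation list takes little more than splitting it at semicolons. The rough Python sketch below assumes that the text between `[cite:' and `]' has already been extracted and that the list has no common prefix or suffix. `parse_citation_list' is a hypothetical name, and the dictionary keys simply mirror the property names in the grammar section.

```python
import re
from typing import Any, Dict

LABEL = r"[A-Za-z_](?:[A-Za-z0-9_:.#$%&\-+?<>~/]*[A-Za-z0-9_])?"
KEY = rf"-?[@&]{LABEL}"
# Either `(KEY)' or a KEY preceded by whitespace (or the start of the chunk).
REF_RE = re.compile(rf"\((?P<pkey>{KEY})\)|(?<!\S)(?P<bkey>{KEY})")


def parse_citation_list(body: str) -> Dict[str, Any]:
    """Parse the body of [cite: ...], assuming no common prefix/suffix."""
    refs = []
    for chunk in body.split(";"):
        m = REF_RE.search(chunk)
        if m is None:
            raise ValueError(f"reference without a key: {chunk!r}")
        refs.append({
            "prefix": chunk[:m.start()].strip(),
            "key": m["pkey"] or m["bkey"],
            "suffix": chunk[m.end():].strip(),
            "is-parenthesized": m["pkey"] is not None,
        })
    # Only the first reference decides the type of the whole citation.
    return {"is-parenthetical": refs[0]["is-parenthesized"], "references": refs}
```

Applied to the body of the `see (@Doe99) p. 44; @Smith2000 has a review' example, it reports a parenthetical citation whose first reference has prefix `see' and suffix `p. 44'.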

There are also two special cases to make simple-but-common uses very
easy to type and read:
  1) a parenthetical citation for a single work with no prefix or
     suffix may be written by just surrounding the key with brackets,
     like: [@Doe99].
  2) an in-text citation for a single work with no prefix or suffix
     may be written as a /bare/ key, without brackets, like: @Doe99.
(Thus, in both of the `simple' cases, one less level of bracketing is
required.)
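
Because the two special cases drop a level of bracketing, a processor has to spot them in running text. Here is one possible approach, sketched in Python. It assumes full `[cite: ...]' lists are handled separately, so they are skipped here. A bracketed lone key counts as parenthetical, and a key preceded by whitespace counts as in-text. The pattern and function names are hypothetical.

```python
import re
from typing import List, Tuple

LABEL = r"[A-Za-z_](?:[A-Za-z0-9_:.#$%&\-+?<>~/]*[A-Za-z0-9_])?"
KEY = rf"-?[@&]{LABEL}"
SIMPLE_RE = re.compile(
    r"(?P<list>\[cite:[^\]]*\])"  # full [cite: ...] lists: skipped here
    rf"|\[(?P<paren>{KEY})\]"     # [@Doe99]: simple parenthetical
    rf"|(?<!\S)(?P<bare>{KEY})"   # @Doe99 after whitespace: simple in-text
)


def find_simple_citations(text: str) -> List[Tuple[str, str]]:
    """Return (kind, key) pairs for the special-case citations in text."""
    found = []
    for m in SIMPLE_RE.finditer(text):
        if m["paren"]:
            found.append(("parenthetical", m["paren"]))
        elif m["bare"]:
            found.append(("in-text", m["bare"]))
    return found
```

In this sketch, requiring whitespace before a bare key is what stops an address like `foo@bar.com' from being read as a citation.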

Prefix and suffix text is regular Org text, and may contain certain
kinds of Org markup (see the grammar below for a complete list).

*** Multi-cite citations 
Multi-cite citations are distinguished from basic parenthetical and
in-text citations by the presence of an optional common prefix or
common suffix (which may not contain keys).  If present, the common
prefix must occur before the first individual reference, and the
common suffix must occur after the last individual reference.  The
common prefix and suffix are separated from the individual references
by semi-colons.
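
The position rules for the common prefix and suffix can be checked mechanically. At most one keyless segment may precede the individual references, and at most one may follow them. The minimal Python sketch below assumes the bracket contents have already been extracted, and it uses a deliberately rough test for "contains a key". `split_multicite' is a hypothetical name.

```python
import re
from typing import Any, Dict

# Rough key detector (an assumption): `@' or `&', optionally after `-',
# followed by a letter or `_', and preceded by whitespace, `(' or the
# start of the segment, so that text like "AT&T" is not taken for a key.
HAS_KEY = re.compile(r"(?<![^\s(])-?[@&][A-Za-z_]")


def split_multicite(body: str) -> Dict[str, Any]:
    """Split the body of [cite: ...] into common prefix, references, common suffix."""
    parts = [p.strip() for p in body.split(";")]
    has_key = [bool(HAS_KEY.search(p)) for p in parts]
    if not any(has_key):
        raise ValueError("a citation needs at least one individual reference")
    first = has_key.index(True)
    last = len(parts) - 1 - has_key[::-1].index(True)
    # At most one keyless segment before the references and one after,
    # and every segment in between must contain a key.
    if first > 1 or last < len(parts) - 2 or not all(has_key[first:last + 1]):
        raise ValueError("keyless text is only allowed as common prefix/suffix")
    return {
        "common-prefix": parts[0] if first == 1 else "",
        "references": parts[first:last + 1],
        "common-suffix": parts[-1] if last == len(parts) - 2 else "",
    }
```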

*** Examples of main citation syntax
Basic parenthetical citation:
#+BEGIN_QUOTE
The nineteenth century was very interesting. [cite: (@Doe99)]
#+END_QUOTE

Basic parenthetical citation using special-case syntax:
#+BEGIN_QUOTE
The nineteenth century was very interesting. [@Doe99]
#+END_QUOTE

Parenthetical citation with multiple works and prefix and suffix:
#+BEGIN_QUOTE
The nineteenth century was in fact lovely [cite: see (@Doe99) p. 44;
@Smith2000 has a review].
#+END_QUOTE

Basic in-text citation with a suffix:
#+BEGIN_QUOTE
As [cite: @Doe99 p. 44] says, the nineteenth century was very interesting.
#+END_QUOTE

In-text citation using special-case syntax:
#+BEGIN_QUOTE
@Doe2000 explains that the twentieth century was even more interesting. 
#+END_QUOTE

In-text citation with author suppressed:
#+BEGIN_QUOTE
As Doe explained in his address@hidden, the twentieth century was somewhat
less interesting than previously thought.
#+END_QUOTE

Parenthetical citation with full-entry key:
#+BEGIN_QUOTE
A complete bibliography entry follows in parentheses. [cite: (&Doe99)]
A complete bibliography entry follows in parentheses. [&Doe99]
#+END_QUOTE

In-text citation with full-entry key:
#+BEGIN_QUOTE
A complete bibliography entry follows: [cite: &Doe99].
A complete bibliography entry follows: &Doe99.
#+END_QUOTE

Full-entry in-text citation, in a footnote:
#+BEGIN_QUOTE
Doe exhibits unusual scholarship.[fn:: &Doe99.]
#+END_QUOTE

In-text citation, with a complete bibliography entry minus the author
in a footnote, plus a suffix:
#+BEGIN_QUOTE
@Doe99 exhibits unusual scholarship.[fn:1]

[fn:1] [cite: -&Doe99 Cf. especially section 4.]
#+END_QUOTE

In-text multi-cite:
#+BEGIN_QUOTE
Speculation abounds about what the twenty-first century will
bring. [cite: For an overview of this topic, see; @Smith1998;
@Jones1999; @Miller2001; and references therein.]
#+END_QUOTE

Parenthetical multi-cite:
#+BEGIN_QUOTE
Speculation abounds about what the twenty-first century will
bring. [cite: For an overview of this topic, see; (@Smith1998);
@Jones1999; @Miller2001; and references therein.]
#+END_QUOTE

*** Syntax for extensions 
Additional information can be supplied in a citation that may affect
how export filters or particular backends format it.

This additional information may be supplied following the brackets of
a citation between the following delimiters: `%%( ... )'.

(Note: I am proposing that this expression go /after/ the main
citation brackets both because it visually separates this extra
information from the main citation, and in order to avoid imposing any
further syntactic restriction on suffixes.)

At least for now, any information supplied this way is /strictly the
user's responsibility/ to interpret (e.g., using an export filter).
This means that citations that have information like this are not
portable and might not be exported correctly:
  - in other users' setups
  - by particular backends
  - by future versions of Org

I will not deal with the details of how this additional information
should be syntactically represented, since this has not really been
discussed.  But I think that handling additional information in full
generality requires something like a complete Lisp list.  Thus, I
suggest that this additional information simply be represented as a
Lisp list.  (Besides
generality, this has the benefit of making the syntax easy to parse:
the parser can just call Elisp's read function with a marker after the
`%%'.)
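
Inside Emacs, the parser could rely on `read' as described above, but a tool outside Emacs would need its own reader. As a rough illustration only, here is a tiny Python reader that understands parentheses, double-quoted strings and bare atoms. Everything else (keywords, numbers, quote syntax, string escapes) is kept as plain text. This makes it an assumption-laden approximation of the Elisp reader, not a replacement for it. `read_extra_info' is a hypothetical name.

```python
import re
from typing import Any, List, Tuple

# Tokens for a deliberately tiny Lisp reader: parentheses, double-quoted
# strings (backslash escapes are kept verbatim), and bare atoms.  Keywords,
# symbols, numbers and quote syntax such as 's all stay plain strings.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<open>\()|(?P<close>\))|(?P<str>"(?:\\.|[^"\\])*")|(?P<atom>[^\s()"]+))'
)


def read_extra_info(text: str, start: int) -> Tuple[List[Any], int]:
    """Read the `%%( ... )' clause at text[start:]; return (value, end index)."""
    if not text.startswith("%%(", start):
        raise ValueError("no %%( clause at this position")
    pos = start + 2
    stack: List[List[Any]] = []
    while True:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unterminated %%( ... ) clause")
        pos = m.end()
        if m["open"]:
            stack.append([])
        elif m["close"]:
            done = stack.pop()
            if not stack:  # outermost list closed: stop reading here
                return done, pos
            stack[-1].append(done)
        elif m["str"]:
            stack[-1].append(m["str"][1:-1])  # strip the surrounding quotes
        else:
            stack[-1].append(m["atom"])
```

Like reading from a marker in Emacs, the returned end index tells the caller where ordinary text resumes after the clause.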

I provide these examples merely to illustrate the possibilities here:
#+BEGIN_QUOTE
@vonNeumann1930 %%(:type genitive :capitalize t) model can only handle
a limited range of observed cases.

@McCarthy1950 %%('s) clever use of Lisp syntax was also used to
express the Saxon genitive.

For more, see Ref. @Doe99 %%(:type refnum :follow-to "some.pdf").

Even more complicated examples occur after Doe's famous article from
[cite: @Doe99] %%(:type date-only).

And in [cite: @Doe2000] %%(:attr_latex (:format-string
"\citeyear{%KEY}") :attr_html (:only-fields (month year))), Doe
finally realized that arbitrary complexity was a powerful but
double-edged sword.

@_aParticularlyUGLYkey:is-this-one %%(:overlay "Nice Display")
#+END_QUOTE 

** Grammar
This section formally documents the syntax of citations discussed
above.  

To represent the syntax of citations, we need a category of /citation/
objects, which require the following properties (the names here are not
important and could be changed):
  - is-parenthetical (boolean; nil means is in-text)
  - common-prefix (text)
  - common-suffix (text)
  - references (list)
  - extra-info (list)

Each reference in the list of references should be a plist with the
following properties:
  - prefix (text)
  - suffix (text)
  - key (string)
  - is-parenthesized (boolean; t means key was parenthesized; only
    significant for the first reference in a citation)
  - suppress-author (boolean; t means author name should not be output)
  - is-full (boolean; t means a full bibliography entry should be
    output in-place) 
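
One way to model these objects outside Emacs is with two small record types. The Python sketch below is illustrative only, and its field names are adapted to Python identifiers. `is_parenthetical' is computed from the first reference rather than stored. That is a design choice (it keeps the two properties from disagreeing), not something the proposal specifies.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Reference:
    key: str                        # label without flag, e.g. "Doe99"
    prefix: str = ""
    suffix: str = ""
    is_parenthesized: bool = False  # only significant for the first reference
    suppress_author: bool = False   # key was written with a leading `-'
    is_full: bool = False           # key used the `&' flag


@dataclass
class Citation:
    references: List[Reference]
    common_prefix: str = ""
    common_suffix: str = ""
    extra_info: List[Any] = field(default_factory=list)  # raw %%( ... ) data

    @property
    def is_parenthetical(self) -> bool:
        # Derived, not stored: the first reference decides.
        return bool(self.references) and self.references[0].is_parenthesized
```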

The category of citations has the following grammar:
  - A CITATION is a PARENTHETICAL-CITATION or an IN-TEXT-CITATION.
  - A PARENTHETICAL-CITATION is either a SIMPLE-PARENTHETICAL or a
    CITATION-LIST whose first INDIVIDUAL-REFERENCE contains a
    PARENTHESIZED-KEY.
  - An IN-TEXT-CITATION is either a SIMPLE-IN-TEXT or a CITATION-LIST
    whose first INDIVIDUAL-REFERENCE contains a BARE-KEY.
  - A SIMPLE-PARENTHETICAL is a KEY immediately surrounded by square
    brackets, optionally followed by an EXTRA-INFO clause.
  - A SIMPLE-IN-TEXT is a BARE-KEY, optionally followed by an
    EXTRA-INFO clause.
  - A CITATION-LIST has the format
       [cite: PREFIX; INDIVIDUAL-REFERENCE; ... INDIVIDUAL-REFERENCE; SUFFIX] EXTRA-INFO
    where the initial PREFIX, final SUFFIX, and EXTRA-INFO clause are
    optional.  At least one INDIVIDUAL-REFERENCE must be present.
  - An INDIVIDUAL-REFERENCE has the format: 
       PREFIX KEY-MAYBE-PARENS SUFFIX
    The KEY-MAYBE-PARENS is obligatory, and the prefix and suffix
    are optional.
  - A KEY-MAYBE-PARENS is either a BARE-KEY or a PARENTHESIZED-KEY.
  - A BARE-KEY is a KEY immediately preceded by whitespace.
  - A PARENTHESIZED-KEY is a KEY immediately surrounded by `(' and `)'.
  - A KEY optionally begins with `-', and obligatorily contains `@' or
    `&' followed by a string of characters which begins with a letter
    or `_', and may contain alphanumeric characters and the following
    internal punctuation characters:
       :.#$%&-+?<>~/
  - A PREFIX or SUFFIX is arbitrary text (except `;', `]', and
    KEY-MAYBE-PARENS) which may contain only the following Org
    objects:
    - bold
    - code
    - entity
    - italic
    - latex-fragment
    - line-break
    - strike-through
    - subscript
    - superscript
    - underline
    (Note that this list could be extended somewhat if necessary.)
  - An EXTRA-INFO clause consists of `%%(', followed by data whose
    format is not specified by this grammar, followed by a matching `)'.
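
The restriction on PREFIX and SUFFIX text can be checked in a few lines. The minimal Python sketch below assumes the KEY rule above, plus a whitespace-or-`(' rule for where a key may start. It does not check which Org objects appear. `check_affix' is a hypothetical name.

```python
import re
from typing import List

LABEL = r"[A-Za-z_](?:[A-Za-z0-9_:.#$%&\-+?<>~/]*[A-Za-z0-9_])?"
# A key may start after whitespace, `(' or the start of the text.
KEY_IN_TEXT = re.compile(rf"(?<![^\s(])-?[@&]{LABEL}")


def check_affix(text: str) -> List[str]:
    """Return the problems found in a PREFIX or SUFFIX (empty list if none)."""
    problems = []
    for ch in ";]":
        if ch in text:
            problems.append(f"contains {ch!r}")
    m = KEY_IN_TEXT.search(text)
    if m:
        problems.append(f"contains key {m.group(0)!r}")
    return problems
```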

** Outstanding issues
It seems to me that there are potential problems with the above
proposal in a number of areas, but I cannot tell how serious they are,
or what changes (if any) should be made to solve them.  I don't
pretend that this is an exhaustive list:
  1) *Nesting.*  I have favored LaTeX compatibility for in-text
     citations with multiple references; but this means there is no
     way to `nest' citations.  Thus, there is no way to express (in
     the main syntax) what Pandoc expresses as:
        @Doe99 [p. 34; see also @DoeRoe2000]
     which renders like:
        Doe (1999, p. 34; see also Doe and Roe 2000)
     Instead, since a citation is in-text or parenthetical as a whole,
     the equivalent in the above syntax
        [cite: @Doe99 p. 34; see also @DoeRoe2000]
     should render like:
        Doe (1999, p. 34), see also Doe and Roe (2000).
     I am not certain if Pandoc-like output is important in this case.
     The few people who commented on this said that it was not. 
  2) *Limitations on prefixes and suffixes.*  There may be legitimate
     uses of `@', `;', `]', etc. inside prefix or suffix text that the
     above syntax does not allow.  Examples might include:
     - use of semi-colons as part of the prefix/suffix text
     - footnotes, links, or timestamps inside a prefix/suffix
     I am not certain how important these cases are.  If they are
     important, some of them could perhaps be worked around using
     entities.
  3) *Edge cases.* The above syntax may make it possible to express
     things that don't make sense, or would be too difficult to
     export.  The only one I can think of is that it is possible to
     mix `@'-style and `&'-style keys in the same citation.  I am not
     sure if this should be forbidden; it may sometimes make sense.
     It may also be possible to express things that external tools,
     such as citeproc-js, don't know how to process.  I do not have a
     good sense of what, if anything, falls into that category, and
     what should be done about it.
  4) *Citation commands.*  Rather than introduce an explicit
     representation for different citation commands/types, I have used
     different parts of the syntax to express the common distinctions
     that people mentioned.  I suggest that, for now, anything beyond
     these basic distinctions be left to the user-extension syntax.
     However, if it becomes clear in the future that there is a need
     to add a representation for a command to the main syntax, there
     is a natural place to do so: immediately after the `cite:' tag
     (as Nicolas suggested).
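
To make the contrast in item 1 concrete, here is a toy author-year renderer in Python. The bibliography data, the reference tuples and the punctuation are all assumptions made for illustration. Real output would come from a citation processor such as citeproc-js. The sketch renders the same references both as an in-text and as a parenthetical citation.

```python
from typing import Dict, List, Tuple

# Hypothetical bibliography database: key -> (authors, year).
BIB: Dict[str, Tuple[str, str]] = {
    "Doe99": ("Doe", "1999"),
    "DoeRoe2000": ("Doe and Roe", "2000"),
}

# Each reference is (prefix, key, suffix); keys are given without flags.
Ref = Tuple[str, str, str]


def render_in_text(refs: List[Ref]) -> str:
    """Every reference becomes `Author (Year, suffix)', joined by commas."""
    parts = []
    for prefix, key, suffix in refs:
        author, year = BIB[key]
        inner = f"{year}, {suffix}" if suffix else year
        parts.append(" ".join(filter(None, [prefix, f"{author} ({inner})"])))
    return ", ".join(parts)


def render_parenthetical(refs: List[Ref]) -> str:
    """All references share one pair of parentheses, separated by `;'."""
    parts = []
    for prefix, key, suffix in refs:
        author, year = BIB[key]
        body = f"{author} {year}" + (f", {suffix}" if suffix else "")
        parts.append(" ".join(filter(None, [prefix, body])))
    return "(" + "; ".join(parts) + ")"
```

With the keys from item 1, `render_in_text' produces `Doe (1999, p. 34), see also Doe and Roe (2000)'. `render_parenthetical' produces `(Doe 1999, p. 34; see also Doe and Roe 2000)'.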

Also, I have not said anything in this proposal to address how other
document metadata should be represented, which has not been discussed
much on the list.  I think this should be discussed separately.
      

Attachment: citation-syntax.org
Description: Text document

