From: Mathias Bauer
Subject: Re: [O] Bug: text export and multi-word link descriptions with line breaks
Date: Thu, 3 Apr 2014 18:30:24 +0200

Hello Nicolas,
* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):
> Mathias Bauer <address@hidden> writes:
>
> > I just stumbled over Org's plain text export and how it works on
> > links with descriptions consisting of multiple words and line
> > breaks between them. I'm running Org stable version 8.2.5h.
> >
> > Org source (spaces at the end of line 1 and 2 don't matter):
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> > 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> > 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> > ...
> > foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> > baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> > bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> > --------------------snip--------------------
> >
> > Text export result:
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> > 2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
> >
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> >
> > [RFC 2440] https://tools.ietf.org/html/rfc2440
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> > --------------------snip--------------------
> >
> > These multiple references look quite bad. Is it possible to
> > "normalize" the descriptions in some way *before* checking
> > them for uniqueness and output them thereafter?
>
> Could you be more explicit? What does look quite bad? What did
> you expect instead? How is related to line breaks in the
> descriptions?

OK, let's go into more detail. Look at the Org source text:

1. There are three links, and each of them appears twice. Both
occurrences of a link have the same target.
2. Each of the two "[...][RFC 2440]" links sits on a single line; the
links "[...][RFC 4880]" and "[...][RFC 1991]" each have one
occurrence with a newline in the description. They are in fact
"[...][RFC\n4880]" and "[...][RFC 4880]" and, respectively,
"[...][RFC\n1991]" and "[...][RFC 1991]".

Now let's examine the Org text export.

The final reference part (the five links below the paragraph)
lists two links, [RFC 4880] and [RFC 1991], twice each,
but the link [RFC 2440] only once.
This is, at the very least, inconsistent.

The point is that Org evidently treats "[...][RFC 4880]" and
"[...][RFC\n4880]" as two different links internally and
lists both of them in the reference part. Only for this listing is
the \n removed. This is the step I called "normalization" in my
first post; it happens after the uniqueness check rather than
before it.

Human eyes, however, won't see any difference between these two
forms, so the duplicate entries come as a surprise.

I expect Org to do the following steps while parsing the source
text:

1. "Normalize" or clean the link description, i.e. turn any
newlines into spaces, remove leading and trailing spaces, and
replace any occurrence of "[ \t]+" in the interior by a single
space. (To be done.)
2. Check the tuple (description, target) for duplicates and drop
them. (Seems OK to me.)
3. Below the paragraph, list the tuples as "[description] target"
in the order of occurrence in the original text. (Also seems
OK to me.)
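
To make the three steps concrete, here is a minimal sketch in Python. It is an illustration only and not Org's code (Org is written in Emacs Lisp); the names normalize_description and collect_references are made up for this example, and it uses "\s+" so that newlines are folded into spaces together with blanks and tabs.

```python
import re


def normalize_description(description: str) -> str:
    """Step 1: collapse each run of whitespace (newlines included)
    into a single space and trim both ends."""
    return re.sub(r"\s+", " ", description).strip()


def collect_references(links):
    """Steps 2 and 3: drop duplicate (description, target) tuples,
    keeping the order of first occurrence."""
    seen = set()
    references = []
    for target, description in links:
        entry = (normalize_description(description), target)
        if entry not in seen:
            seen.add(entry)
            references.append(entry)
    return references


# The six links from the Org source above, as (target, description).
links = [
    ("https://tools.ietf.org/html/rfc4880", "RFC\n4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC\n1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
    ("https://tools.ietf.org/html/rfc4880", "RFC 4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC 1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
]

for description, target in collect_references(links):
    print(f"[{description}] {target}")
```

With the descriptions normalized before the duplicate check, the reference part comes out with three entries instead of five.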

I hope this makes the issue a little clearer now.

Kind regards,
Mathias