emacs-orgmode

[O] Smart Quotes Exporting (Was: Re: (no subject))


From: Mark E. Shoulson
Subject: [O] Smart Quotes Exporting (Was: Re: (no subject))
Date: Thu, 31 May 2012 19:26:36 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

Sorry for messing up the thread subject header; I think I misused gmane's posting.

On 05/31/2012 09:38 AM, Nicolas Goaziou wrote:
Hello,

Mark Shoulson <address@hidden> writes:

+(defvar org-e-html-quote-replacements
+  '(("fr" "« " " »" "‘" "’" "’")
+    ("en" "“" "”" "‘" "’" "’")
+    ("de" "„" "“" "‚" "‘" "’"))
A docstring will be required for this variable. It should be
a defcustom.

Oh, certainly; they're all a disaster. I think I said that in the writeup at the top. This is just proof of concept, nothing is in the right place, nothing is properly documented. They have to be defcustoms, there needs to be a good :type in the defcustom as well as a proper docstring. You'll get no argument from me about the lack (or inaccuracy) of docstrings and such. I hadn't gotten that far yet. I said the patch was only if you wanted to tinker with the development as this progresses.
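
For what it's worth, here is a minimal sketch of what the defcustom form might eventually look like. The group and variable names are made up for illustration (they are not the names in the patch), the parent group is just a placeholder, and the :type is only one plausible way to describe "language code plus five strings".

```elisp
;; Hypothetical names throughout; this is not the variable from the patch.
(defgroup my-smart-quotes nil
  "Illustrative group for smart-quote export settings."
  :group 'convenience)

(defcustom my-smart-quotes-replacements
  '(("fr" "« " " »" "‘" "’" "’")
    ("en" "“" "”" "‘" "’" "’")
    ("de" "„" "“" "‚" "‘" "’"))
  "Alist of quote replacements, keyed by language code.
Each element has the form (LANGUAGE OPEN-DOUBLE CLOSE-DOUBLE
OPEN-SINGLE CLOSE-SINGLE APOSTROPHE), where every item is a string."
  :group 'my-smart-quotes
  :type '(alist :key-type (string :tag "Language code")
                :value-type (list (string :tag "Opening double quote")
                                  (string :tag "Closing double quote")
                                  (string :tag "Opening single quote")
                                  (string :tag "Closing single quote")
                                  (string :tag "Apostrophe"))))
```

With a :type like this, M-x customize-variable can offer one labelled field per quote mark instead of a raw Lisp expression.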

+(defun org-e-latex--quotation-marks (text info)
+  (org-export-quotation-marks text info org-e-latex-quote-replacements))
+  ;; (mapc (lambda(l)
+  ;;     (let ((start 0))
+  ;;       (while (setq start (string-match (car l) text start))
+  ;;         (let ((new-quote (concat (match-string 1 text) (cdr l))))
+  ;;           (setq text (replace-match new-quote  t t text))))))
+  ;;   (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
+  ;;            ;; Falls back on English.
+  ;;            (assoc "en" org-e-latex-quotes))))
+  ;; text)
Use directly `org-e-latex-quote-replacements' in code then.

Not sure I understand this comment.

+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Probably a defcustom eventually.
+
+;; Each element of this consists of: car=language code, cdr=list of
+;; double-quote-open-regexp, double-quote-close-regexp,
+;; single-quote-open-regexp, single-quote-close-regexp,&optional
+;; single-apostrophe regexp?
+;; Just about all will be the same anyway, so mostly language DEFAULT.
+
+;; For testing purposes, poorly-designed at first.
+(defvar org-export-quotes-regexps
+  '((DEFAULT
+      "\\(?:\\s-\\|[[(]\\|^\\)\\(\"\\)\\w"
+      "\\(?:\\S-\\)\\(\"\\)\\s-"
+      "\\(?:\\s-\\|(\\|^\\)\\('\\)\\w"
+      "\\w\\('\\)\\(?:\\s-\\|\\s.\\|$\\)"
+      "\\w\\('\\)\\w")))
I'm not sure this variable can be used for both the buffer and the
export engine. Export back-ends will only see chunks of the paragraph.

For example, in the following text,

   He crossed the Rubicon and said: "/Alea jacta est./"

Plain text translators will see three strings:

   1. "He crossed the Rubicon and said: \""
   2. "Alea jacta est."
   3. "\""

In case 1, you have an opening quote with nothing after it. In case 3,
you have a closing quote with nothing before or after it. Plain regexps
can't help here.

The only solution I can think of is to do quote substitutions in
paragraphs within the parse tree before they reach the translators (i.e.
with `org-export-filter-parse-tree-functions').

That's the only way to know if "\"" is an opening or a closing quote,
for example. The current approach won't work.

Hm. OK, this may indeed be (a) a problem and (b) an indication that I don't understand the process as well as I thought I did... Ah. So when the "plain" text is being exported, the exporter passes the text along in chunks, divided up by the formatting, and string #2 is broken out from the others because it is in italics. That is indeed an issue. Moreover, I never properly considered the effect of formatting characters (as opposed to punctuation) right next to the quote marks, which would matter even if the chunking weren't a problem.

So... the filter-parse-tree-functions hook gets applied to the parse tree, so a back-end can add a function to that list which walks the parse tree and watches for these border cases (and also the ones within ordinary strings). It looks like it's going to be tough to work in any flexibility for further per-language or per-backend cleverness that handles anything beyond the "canonical set" of open-double, close-double, open-single, close-single, and mid-word.
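
To make the border cases concrete, here is a rough sketch that works on a plain list of strings standing in for the chunks a translator would see. It does not touch the real parse tree or any org-export function; the function name is hypothetical, and the "previous character" rule is a simplification I am assuming for illustration, not the regexps from the patch. The point is only that the decision needs context carried across chunk boundaries.

```elisp
;; Hypothetical helper: CHUNKS is a list of strings, in document order.
(defun my-smart-quote-chunks (chunks open close)
  "Return CHUNKS with each straight double quote replaced by OPEN or CLOSE.
A quote counts as opening when it starts the text or follows
whitespace or an opening bracket, even if that preceding
character belongs to an earlier chunk.  Otherwise it is closing."
  (let ((prev nil))                     ; last character seen so far, nil at start
    (mapcar
     (lambda (chunk)
       (mapconcat
        (lambda (ch)
          (prog1
              (if (eq ch ?\")
                  (if (or (null prev)
                          (memq prev '(?\s ?\t ?\n ?\( ?\[)))
                      open
                    close)
                (char-to-string ch))
            ;; Remember this character for the next one, across chunks.
            (setq prev ch)))
        chunk ""))
     chunks)))

;; The Rubicon example, split the way the translators would see it.
(my-smart-quote-chunks
 '("He crossed the Rubicon and said: \"" "Alea jacta est." "\"")
 "“" "”")
```

The lone "\"" in the third chunk comes out as a closing quote only because the period at the end of the second chunk is remembered.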

To be sure, anything we do will most assuredly fail even on some fairly reasonable input, in which case the users are pretty much on their own and will have to do things the hard way. I could use that as the answer here ("well, it'll work only within plain-text strings"), and I might possibly still have to, but I would rather include the situations you bring up in the supported set and not throw up my hands at them. So, yes, I will look at that.

+  (let* ((start 0)
+        (regexps
+         (cdr
+          (or
+           (assoc (plist-get info :language)
+                  org-export-quotes-regexps)
+           (assoc 'DEFAULT org-export-quotes-regexps))))
Use `assq' instead of `assoc' in the second case.

Good call.
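
A small sketch of the distinction, with placeholder strings instead of the real regexps and hypothetical names: the language keys are strings, so they need `assoc' (which compares with `equal'), while the DEFAULT key is a symbol, so `assq' (which compares with `eq') is enough.

```elisp
;; Hypothetical table; the strings are placeholders, not real regexps.
(defvar my-quotes-regexps
  '((DEFAULT "default-open" "default-close")
    ("fr" "fr-open" "fr-close")))

(defun my-quotes-regexps-for (language)
  "Return the entry for LANGUAGE, falling back on the DEFAULT entry."
  (cdr (or (assoc language my-quotes-regexps)     ; string key: needs `equal'
           (assq 'DEFAULT my-quotes-regexps))))   ; symbol key: `eq' suffices

(my-quotes-regexps-for "fr")   ; the French entry
(my-quotes-regexps-for "de")   ; no German entry, so the DEFAULT one
```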

+        (subs (cdr (or (assoc (plist-get info :language)
+                              replacements)
+                       (assoc "en" replacements))))
+        (quotes (pairlis regexps subs)))
+    (mapc (lambda (p)
+           (let ((re (car p))
+                 (su (cdr p)))
+             (while (setq start (string-match re text start))
+               (setq text (replace-match su t t text 1)))))
Use `replace-regexp-in-string' instead.

   (replace-regexp-in-string (car p) (cdr p) text t t 1)

I'd been looking at other functions that didn't have that available; thanks for pointing me at it.
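
As a sketch of how the loop might look with that change (hypothetical names, a cut-down list of three of the regexps from the patch paired with the English replacements): each pair is applied over the whole string in turn, so there is no `start' index to keep track of between regexps.

```elisp
;; Hypothetical pairing of three regexps from the patch with "en" replacements.
(defvar my-quote-pairs
  '(("\\(?:\\s-\\|[[(]\\|^\\)\\(\"\\)\\w" . "“")   ; opening double quote
    ("\\(?:\\S-\\)\\(\"\\)\\s-" . "”")              ; closing double quote
    ("\\w\\('\\)\\w" . "’")))                       ; mid-word apostrophe

(defun my-apply-quote-pairs (text pairs)
  "Return TEXT with subgroup 1 of each regexp in PAIRS replaced.
PAIRS is a list of (REGEXP . REPLACEMENT) cons cells."
  (dolist (p pairs text)
    ;; FIXEDCASE and LITERAL are t; SUBEXP 1 replaces only the quote itself.
    (setq text (replace-regexp-in-string (car p) (cdr p) text t t 1))))

(my-apply-quote-pairs "He said \"hi\" and didn't stop" my-quote-pairs)
```

Note that the closing regexp still wants whitespace after the quote, so a quote at the very end of the string is left alone by this cut-down list.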

~mark



