groff-commit

[groff] 01/01: Sanitize text for use in PDF document outlines.


From: Keith Marshall
Subject: [groff] 01/01: Sanitize text for use in PDF document outlines.
Date: Sat, 4 Sep 2021 07:37:03 -0400 (EDT)

keithmarshall pushed a commit to branch master
in repository groff.

commit 058b63ce3d614479a64d65d9272cbaa3e2f4b4d1
Author: Keith Marshall <keith.d.marshall@ntlworld.com>
AuthorDate: Sat Sep 4 12:35:26 2021 +0100

    Sanitize text for use in PDF document outlines.
---
 contrib/pdfmark/ChangeLog     |  30 ++++++++
 contrib/pdfmark/pdfmark.am    |   3 +-
 contrib/pdfmark/pdfmark.ms    |  12 +--
 contrib/pdfmark/sanitize.tmac | 170 ++++++++++++++++++++++++++++++++++++++++++
 contrib/pdfmark/spdf.tmac     |  33 +++++---
 5 files changed, 230 insertions(+), 18 deletions(-)
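
Note on the scan in sanitize.tmac: the character-by-character reduction
described in the ChangeLog entry can be modelled outside troff. The sketch
below is a Python approximation, not part of the commit. The function name
`sanitize` and the `discard` parameter are illustrative assumptions, and the
handling of an escape truncated at the very end of the text is a guess at
reasonable behaviour rather than a statement of what groff does.

```python
def sanitize(text, discard=("F",)):
    # Model of the scan: copy characters to the result, but drop any
    # escape sequence whose identifier is listed in `discard`.
    # (Assumption: only "F" is discardable, as in this commit.)
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        i += 1
        # An ordinary character, or a lone trailing backslash, is copied.
        if ch != "\\" or i >= n:
            out.append(ch)
            continue
        esc = text[i]
        i += 1
        # Unrecognized escape: reinstate the backslash and its identifier.
        if esc not in discard:
            out.append(ch + esc)
            continue
        if i >= n:
            break
        form = text[i]
        i += 1
        if form == "(":
            # Two-character name, as in \F(cc: skip both characters.
            i += 2
        elif form == "[":
            # Arbitrary-length name, as in \F[...]: skip to the matching
            # "]", counting nested brackets.
            depth = 1
            while depth and i < n:
                c = text[i]
                i += 1
                if c == "[":
                    depth += 1
                elif c == "]":
                    depth -= 1
        # Otherwise the single-character name (\Fc) is already consumed.
    return "".join(out)


print(sanitize(r"The \F[C]pdfmark\F[] Operator"))
```

Escapes other than \F, such as \fB, pass through unchanged in this model,
which mirrors the "unrecognized escape sequence" branch of the macro.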

diff --git a/contrib/pdfmark/ChangeLog b/contrib/pdfmark/ChangeLog
index 65ab4a0..ab034fe 100644
--- a/contrib/pdfmark/ChangeLog
+++ b/contrib/pdfmark/ChangeLog
@@ -1,3 +1,33 @@
+2021-09-03  Keith Marshall  <keith.d.marshall@ntlworld.com>
+
+       Sanitize text for use in PDF document outlines.
+
+       * sanitize.tmac: New file; it implements...
+       (sanitize): ...this new macro; interprets its first argument as a
+       string name, and copies its remaining arguments to the named string,
+       discarding specific embedded troff escape sequences; currently...
+       (\F): ...only this is identified as "specifically discardable".
+
+       * pdfmark.am (TMACFILES): Add sanitize.tmac
+
+       * spdf.tmac (mso): Include sanitize.tmac
+       (xn*ref, xn*argc): Rename all occurrences...
+       (spdf:refname, spdf:argc): ...to these, respectively.
+       (XN): Stop inserting $* directly into PDF outlines; instead, use...
+       (spdf:bm.text): ...this new string; this is locally defined by...
+       (spdf:bm.define): ...this new macro; passed the original $* from
+       XN, this itself, is locally defined as a redirectable alias for...
+       (spdf:bm.basic): ...this new local macro; it simply copies $*,
+       passed from XN, to the string named by its first argument, (which is
+       always spdf:bm.text), so reproducing previous behaviour.
+       (opt*XN-S): New macro; defined for internal use only, it adds a "-S"
+       option to XN, such that, when specified, it temporarily redirects...
+       (spdf:bm.define): ...this macro mapping alias to...
+       (sanitize): ...this.
+
+       * pdfmark.ms (XN): Add "-S" option for all headings which include...
+       (\F[C]...\F[]): ...this escape sequence.
+
 2021-08-21  Keith Marshall  <keith.d.marshall@ntlworld.com>
 
        Define, and use registered trade mark strings.
diff --git a/contrib/pdfmark/pdfmark.am b/contrib/pdfmark/pdfmark.am
index d56dd9b..9a2d030 100644
--- a/contrib/pdfmark/pdfmark.am
+++ b/contrib/pdfmark/pdfmark.am
@@ -1,4 +1,4 @@
-# Copyright (C) 2005-2020 Free Software Foundation, Inc.
+# Copyright (C) 2005-2021 Free Software Foundation, Inc.
 #      Written by Keith Marshall (keith.d.marshall@ntlworld.com)
 #      Automake migration by Bertrand Garrigues
 #
@@ -27,6 +27,7 @@ bin_SCRIPTS += pdfroff
 # Files installed in $(tmacdir)
 TMACFILES = \
   contrib/pdfmark/pdfmark.tmac \
+  contrib/pdfmark/sanitize.tmac \
   contrib/pdfmark/spdf.tmac
 pdfmarktmacdir = $(tmacdir)
 dist_pdfmarktmac_DATA = $(TMACFILES)
diff --git a/contrib/pdfmark/pdfmark.ms b/contrib/pdfmark/pdfmark.ms
index fdd3e44..2abe022 100644
--- a/contrib/pdfmark/pdfmark.ms
+++ b/contrib/pdfmark/pdfmark.ms
@@ -349,7 +349,7 @@ of their choice, to format their documents, while also using the
 macros to add PDF features.
 .
 .NH 2
-.XN -N pdfmark-operator -- The \F[C]pdfmark\F[] Operator
+.XN -S -N pdfmark-operator -- The \F[C]pdfmark\F[] Operator
 .LP
 All PDF features are implemented by embedding instances of the
 .B \F[C]pdfmark\F[]
@@ -1178,7 +1178,7 @@ which extend through a page transition;
 .QE
 .
 .NH 3
-.XN Optional Features of the \F[C]pdfhref\F[] Macro
+.XN -S -- Optional Features of the \F[C]pdfhref\F[] Macro
 .LP
 The behaviour of a number of the
 .CW pdfhref
@@ -2340,7 +2340,7 @@ illustrates how this may be accomplished:\(en
 .XN -N add-note -- Annotating a PDF Document using Pop-Up Notes
 .
 .NH 2
-.XN -N pdfsync -- Synchronizing Output and \F[C]pdfmark\F[] Contexts
+.XN -S -N pdfsync -- Synchronizing Output and \F[C]pdfmark\F[] Contexts
 .LP
 It has been noted previously, that the
 .CW pdfview
@@ -2493,7 +2493,7 @@ as to how the
 macros may be employed with their chosen primary macro package.
 .
 .NH 2
-.XN -N using-spdf -- Using \F[C]pdfmark\F[] Macros with the \F[C]ms\F[] Macro Package
+.XN -S -N using-spdf -- Using \F[C]pdfmark\F[] Macros with the \F[C]ms\F[] Macro Package
 .LP
 The use of the binding macro package,
 .CW spdf.tmac ,
@@ -2544,7 +2544,7 @@ and the issues they are intended to address,
 are described below.
 .
 .NH 3
-.XN \F[C]ms\F[] Section Headings in PDF Documents
+.XN -S -- \F[C]ms\F[] Section Headings in PDF Documents
 .LP
 Traditionally,
 .CW ms
@@ -2572,7 +2572,7 @@ to be used in conjunction with the
 macro.
 .
 .NH 4
-.XN -N xn-macro -- The \F[C]XN\F[] Macro
+.XN -S -N xn-macro -- The \F[C]XN\F[] Macro
 .
 .NH 1
 .XN The PDF Publishing Process
diff --git a/contrib/pdfmark/sanitize.tmac b/contrib/pdfmark/sanitize.tmac
new file mode 100644
index 0000000..4efa785
--- /dev/null
+++ b/contrib/pdfmark/sanitize.tmac
@@ -0,0 +1,170 @@
+.ig
+
+sanitize.tmac
+
+Copyright (C) 2021 Free Software Foundation, Inc.
+     Written by Keith Marshall (keith.d.marshall@ntlworld.com)
+
+This file is part of groff.
+
+groff is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+groff is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+..
+.eo
+.de sanitize
+.\" Usage: .sanitize name text ...
+.\"
+.\" Remove designated formatting escape sequences from "text ..."; return
+.\" the sanitized text in a string register, identified by "name".
+.\"
+.\" Begin by initializing the named result as an empty string, bind it to
+.\" an internal reference name, and discard the "name" argument, to leave
+.\" only the text which is to be sanitized, as residual arguments.
+.\"
+.   ds \$1
+.   als sanitize:result \$1
+.   shift
+.
+.\" Initialize a working string register, which we will cyclically reduce
+.\" until it becomes empty, after starting with all of the text passed as
+.\" the residual arguments, and establish its initial length.
+.\"
+.   ds sanitize:residual "\$*\"
+.   length sanitize:residual.length "\$*\"
+.
+.\" Begin the cyclic reduction loop...
+.\"
+.   while \n[sanitize:residual.length] \{\
+.   \"
+.   \" ...assuming, at the start of each cycle, that the next character
+.   \" will not be skipped, and that it will be moved from the residual,
+.   \" to the result, as the character-by-character scan proceeds.
+.   \"
+.      nr sanitize:skip.count 0
+.      sanitize:scan.execute
+.
+.   \" For each character scanned, we need to check if it matches the
+.   \" normal escape character; the check is most readily performed, if
+.   \" an alternative escape character is introduced, and when a match
+.   \" is found, we prepare to skip an escape sequence.
+.   \"
+.      ec !
+.      if '!*[sanitize:scan.char]'\' .nr sanitize:skip.count 1
+.      ec
+.      ie \n[sanitize:skip.count] \{\
+.      \"
+.      \" When a possible escape sequence has been detected, we back it
+.      \" up, (in case it isn't recognized, and we need to reinstate its
+.      \" content into the result string), then scan ahead to check for
+.      \" an identifiable escape sequence...
+.      \"
+.         rn sanitize:scan.char sanitize:hold
+.         sanitize:scan.execute
+.         ie d sanitize:esc-\*[sanitize:scan.char] \
+.         \"
+.         \" ...which we delegate to its appropriate handler, to skip...
+.         \"
+.            sanitize:esc-\*[sanitize:scan.char]
+.
+.      \" ...but, in the case of an unrecognized escape sequence, we copy
+.      \" its backed-up content, followed by the character retrieved from
+.      \" the current scan cycle, to the result string.
+.      \"
+.         el .as sanitize:result "\*[sanitize:hold]\*[sanitize:scan.char]\"
+.      \}
+.
+.   \" When the current scan cycle has retrieved a character, which isn't
+.   \" part of any possible escape sequence, we simply copy that character
+.   \" to the result string.
+.   \"
+.      el .as sanitize:result "\*[sanitize:scan.char]\"
+.   \}
+.
+.\" Clean up the register space, by deleting all of the string registers,
+.\" and numeric registers, which are designated as temporary, for private
+.\" use within this macro only.
+.\"
+.   rm sanitize:hold sanitize:scan.char sanitize:residual sanitize:result
+.   rr sanitize:residual.length sanitize:skip.count
+..
+.de sanitize:scan.execute
+.\" Usage (internal): .sanitize:scan.execute
+.\"
+.\" Perform a single-character reduction of sanitize:residual, by copying
+.\" its initial character to sanitize:scan.char, and then deleting it from
+.\" sanitize:residual itself.  (Note that we use arithmetic decrementation
+.\" of sanitize:residual.length, rather than repeating the length request
+.\" on sanitize:residual, because reduction WILL fail when there is only
+.\" one character remaining).
+.\"
+.   nr sanitize:residual.length -1
+.   ds sanitize:scan.char "\*[sanitize:residual]\"
+.   substring sanitize:scan.char 0 0
+.   substring sanitize:residual 1
+..
+.de sanitize:skip-(
+.\" Usage (internal): .sanitize:skip-(
+.\"
+.\" For any identified escape sequence, with a two-character property name,
+.\" simply skip over the next two characters in the residual string.
+.\"
+.   nr sanitize:residual.length -2
+.   substring sanitize:residual 2
+..
+.de sanitize:skip-[
+.\" Usage (internal): .sanitize:skip-[
+.\"
+.\" For any identified escape sequence, with an arbitrary-length property
+.\" name, skip following characters in the residual string, until we find
+.\" a terminal "]" character, or we exhaust the residual.
+.\"
+.   while \n[sanitize:skip.count] \{\
+.      sanitize:scan.execute
+.      ie \n[sanitize:residual.length] \{\
+.      \" We haven't yet exhausted the residual; if we find a nested "["
+.      \" character, increment the nesting level, otherwise decrement it
+.      \" for each "]"; it will become zero at the terminal "]".
+.      \"
+.         ie '\*[sanitize:scan.char]'[' .nr sanitize:skip.count +1
+.         el .if '\*[sanitize:scan.char]']' .nr sanitize:skip.count -1
+.      \}
+.      \" Stop unconditionally, if we do exhaust the residual.
+.      \"
+.      el .nr sanitize:skip.count 0
+.   \}
+..
+.de sanitize:esc-generic
+.\" Usage: .sanitize:esc-X
+.\"
+.\" (X represents any legitimate single-character escape sequence id).
+.\"
+.\" Handler for skipping "\X" sequences, in text which is to be sanitized;
+.\" this will automatically detect sequences conforming to any of the forms
+.\" "\Xc", "\X(cc", or "\X[...]", and will handle each appropriately.  The
+.\" implementation is generic, and may be aliased to handle any specific
+.\" escape sequences, which exhibit similar semantics.
+.\"
+.   sanitize:scan.execute
+.   if d sanitize:skip-\*[sanitize:scan.char] \
+.      sanitize:skip-\*[sanitize:scan.char]
+..
+.\" Map the generic handler to specific escape sequences, as required.
+.\"
+.als sanitize:esc-F sanitize:esc-generic
+.ec
+.\" Local Variables:
+.\" mode: nroff
+.\" End:
+.\" vim: filetype=groff:
+.\" sanitize.tmac: end of file
diff --git a/contrib/pdfmark/spdf.tmac b/contrib/pdfmark/spdf.tmac
index 767f5ee..33591d0 100644
--- a/contrib/pdfmark/spdf.tmac
+++ b/contrib/pdfmark/spdf.tmac
@@ -2,7 +2,7 @@
 
 spdf.tmac
 
-Copyright (C) 2004-2020 Free Software Foundation, Inc.
+Copyright (C) 2004-2021 Free Software Foundation, Inc.
      Written by Keith Marshall (keith.d.marshall@ntlworld.com)
 
 This file is part of groff.
@@ -25,6 +25,7 @@ along with this program.  If not, see <http://www.gnu.org/licenses/>.
 .if !rOPMODE .nr OPMODE 1
 .\"
 .mso s.tmac
+.mso sanitize.tmac
 .mso pdfmark.tmac
 .\"
 .\" Omitted Sections
@@ -82,16 +83,18 @@ along with this program.  If not, see <http://www.gnu.org/licenses/>.
 .\"  additional spacing parameters may be set relative to the current
 .\"  document line spacing, as set by \n[VS]).
 .\"
-.rm xn*ref
+.rm spdf:refname
+.als spdf:bm.define spdf:bm.basic
 .while dopt*XN\\$1 \{\
 .   opt*XN\\$1 \\$*
-.   shift \\n[xn*argc]
+.   shift \\n[spdf:argc]
 .   \}
-.rr xn*argc
+.rr spdf:argc
 .if '\\$1'--' .shift
-.if dxn*ref .XM -N \\*[xn*ref] -- \\$@
-.rm xn*ref
-.pdfhref O \\n[nh*hl] "\\*(SN \\$*"
+.if dspdf:refname .XM -N \\*[spdf:refname] -- \\$@
+.rm spdf:refname
+.spdf:bm.define spdf:bm.text "\\$*"
+.pdfhref O \\n[nh*hl] "\\*(SN \\*[spdf:bm.text]"
 .XS
 .if rtc*hl \{\
 .   if !dXNVS1 .ds XNVS1 1.0v  \" default leading for top level
@@ -119,12 +122,20 @@ along with this program.  If not, see <http://www.gnu.org/licenses/>.
 \&\\$*
 ..
 .de opt*XN-N
-.nr xn*argc 2
-.ds xn*ref \\$2
+.ds spdf:refname \\$2
+.nr spdf:argc 2
+..
+.de opt*XN-S
+.als spdf:bm.define sanitize
+.nr spdf:argc 1
 ..
 .de opt*XN-X
-.nr xn*argc 1
-.if !dxn*ref .ds xn*ref \\\\$1
+.if !dspdf:refname .ds spdf:refname \\\\$1
+.nr spdf:argc 1
+..
+.de spdf:bm.basic
+.shift
+.ds spdf:bm.text "\\$*\"
 ..
 .de LU
 .LP
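
Note on the XN option handling: the spdf.tmac change makes the outline text
pass through a redirectable alias, which defaults to a plain copy and is
switched to the sanitizer only when "-S" is given. The Python sketch below
models that dispatch under stated assumptions. The names `parse_xn` and
`strip_font_family` are hypothetical, the "-X" option is left out, and the
stand-in sanitizer removes only the bracketed \F[...] form. The real macro
also prefixes the section number to the outline entry, which is not modelled
here.

```python
import re


def strip_font_family(text):
    # Stand-in for the sanitize macro: removes only \F[...] sequences.
    return re.sub(r"\\F\[[^\]]*\]", "", text)


def parse_xn(args):
    # Return (refname, outline_text) for a list of XN arguments.
    args = list(args)
    refname = None
    define = lambda text: text          # default: behaves like a plain copy
    # Consume leading options; "-N" takes one argument, "-S" takes none.
    while args and args[0] in ("-N", "-S"):
        opt = args.pop(0)
        if opt == "-N":
            refname = args.pop(0) if args else None
        else:
            define = strip_font_family  # redirect the alias for this call
    # An optional "--" marks the end of the options.
    if args and args[0] == "--":
        args.pop(0)
    return refname, define(" ".join(args))


print(parse_xn(["-S", "-N", "pdfmark-operator", "--",
                "The", r"\F[C]pdfmark\F[]", "Operator"]))
```

Because the alias is reset at the start of every call, a heading written
without "-S" keeps its text exactly as given, which matches the stated aim
of reproducing the previous behaviour.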


