[PATCH 4/4] digest: support windows format checksum files


From: Pádraig Brady
Subject: [PATCH 4/4] digest: support windows format checksum files
Date: Wed, 15 Sep 2021 13:40:36 +0100

Support checksum files with CRLF line endings,
a common gotcha when using --check on Windows
or with checksum files generated on Windows.
Note that \r is now escaped in file names on output.  This keeps
the original coreutils format (with the file name at EOL)
unambiguous for file names whose last character is a literal \r.

* src/digest.c (filename_unescape): Convert \\r -> \r.
(print_filename): Escape \r -> \\r.
(output_file): Detect \r chars in file names.
(digest_check): Ignore literal \r char at EOL.
* tests/misc/md5sum.pl: Add a test case.
* tests/misc/sha1sum.pl: Likewise.
* NEWS: Mention the improvement.
---
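
The interplay between output escaping and --check line trimming
described above can be sketched in isolation.  This is only a minimal
illustration, not the code in src/digest.c: the helper names
escape_name, unescape_name and trim_line_end are hypothetical, and
the buffer handling is simplified for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy NAME into DST, writing '\n', '\r' and '\\'
   as "\\n", "\\r" and "\\\\".  Return the escaped length, or
   (size_t) -1 if DST (of DST_SIZE bytes) is too small.  */
static size_t
escape_name (char const *name, char *dst, size_t dst_size)
{
  size_t n = 0;
  assert (dst_size > 0);
  for (; *name; name++)
    {
      char second = 0;
      switch (*name)
        {
        case '\n': second = 'n'; break;
        case '\r': second = 'r'; break;
        case '\\': second = '\\'; break;
        }
      size_t need = second ? 2 : 1;
      if (dst_size - n <= need)   /* keep room for the trailing NUL */
        return (size_t) -1;
      if (second)
        {
          dst[n++] = '\\';
          dst[n++] = second;
        }
      else
        dst[n++] = *name;
    }
  dst[n] = '\0';
  return n;
}

/* Hypothetical helper: undo escape_name in place.  Return S, or NULL if
   a backslash is followed by anything other than 'n', 'r' or '\\'.  */
static char *
unescape_name (char *s)
{
  char *dst = s;
  for (char const *p = s; *p; p++)
    {
      if (*p != '\\')
        {
          *dst++ = *p;
          continue;
        }
      switch (*++p)
        {
        case 'n': *dst++ = '\n'; break;
        case 'r': *dst++ = '\r'; break;
        case '\\': *dst++ = '\\'; break;
        default: return NULL;   /* also rejects a trailing lone backslash */
        }
    }
  *dst = '\0';
  return s;
}

/* Hypothetical helper: drop one trailing '\n' and then one trailing '\r'
   from LINE (of length LEN), and return the new length.  */
static size_t
trim_line_end (char *line, size_t len)
{
  if (len > 0 && line[len - 1] == '\n')
    line[--len] = '\0';
  if (len > 0 && line[len - 1] == '\r')
    line[--len] = '\0';
  return len;
}
```

With helpers like these, a file name ending in a literal \r is written
with a trailing "\\r", so trimming a CRLF line ending on --check never
removes a byte that belongs to the name.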
 NEWS                  |  3 +++
 doc/coreutils.texi    | 10 ++++++----
 src/digest.c          | 45 ++++++++++++++++++++++++-------------------
 tests/misc/md5sum.pl  |  5 +++++
 tests/misc/sha1sum.pl |  2 ++
 5 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/NEWS b/NEWS
index b7d770bc3..74970a475 100644
--- a/NEWS
+++ b/NEWS
@@ -124,6 +124,9 @@ GNU coreutils NEWS                                    -*- outline -*-
   and at least 8 times faster where pclmul instructions are supported.
   A new --debug option will indicate if pclmul is being used.
 
+  md5sum --check now supports checksum files with CRLF line endings.
+  This also applies to cksum, sha*sum, and b2sum.
+
   df now recognizes these file systems as remote:
   acfs, coda, fhgfs, gpfs, ibrix, ocfs2, and vxfs.
 
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 68c146ec9..17933f4fb 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4066,10 +4066,12 @@ Binary mode is indicated with @samp{*}, text mode with @samp{ } (space).
 Binary mode is the default on systems where it's significant,
 otherwise text mode is the default.  The @command{cksum} command always
 uses binary mode and a @samp{ } (space) flag.
-Without @option{--zero}, if @var{file} contains a backslash or newline,
-the line is started with a backslash, and each problematic character in
-the file name is escaped with a backslash, making the output
-unambiguous even in the presence of arbitrary file names.
+
+Without @option{--zero}, if @var{file} contains a backslash, newline,
+or carriage return, the line is started with a backslash, and each
+problematic character in the file name is escaped with a backslash,
+making the output unambiguous even in the presence of arbitrary file names.
+
 If @var{file} is omitted or specified as @samp{-}, standard input is read.
 
 The program accepts the following options.  Also see @ref{Common options}.
diff --git a/src/digest.c b/src/digest.c
index 5175e9c19..749974844 100644
--- a/src/digest.c
+++ b/src/digest.c
@@ -537,7 +537,8 @@ or equivalent standalone program.\
 
 /* Given a file name, S of length S_LEN, that is not NUL-terminated,
    modify it in place, performing the equivalent of this sed substitution:
-   's/\\n/\n/g;s/\\\\/\\/g' i.e., replacing each "\\n" string with a newline
+   's/\\n/\n/g;s/\\r/\r/g;s/\\\\/\\/g' i.e., replacing each "\\n" string
+   with a newline, each "\\r" string with a carriage return,
    and each "\\\\" with a single backslash, NUL-terminate it and return S.
    If S is not a valid escaped file name, i.e., if it ends with an odd number
    of backslashes or if it contains a backslash followed by anything other
@@ -564,11 +565,14 @@ filename_unescape (char *s, size_t s_len)
             case 'n':
               *dst++ = '\n';
               break;
+            case 'r':
+              *dst++ = '\r';
+              break;
             case '\\':
               *dst++ = '\\';
               break;
             default:
-              /* Only '\' or 'n' may follow a backslash.  */
+              /* Only '\', 'n' or 'r' may follow a backslash.  */
               return NULL;
             }
           break;
@@ -837,7 +841,9 @@ split_3 (char *s, size_t s_len,
   return true;
 }
 
-/* If ESCAPE is true, then translate each NEWLINE byte to the string, "\\n",
+/* If ESCAPE is true, then translate each:
+   NEWLINE byte to the string, "\\n",
+   CARRIAGE RETURN byte to the string, "\\r",
    and each backslash to "\\\\".  */
 static void
 print_filename (char const *file, bool escape)
@@ -856,6 +862,10 @@ print_filename (char const *file, bool escape)
           fputs ("\\n", stdout);
           break;
 
+        case '\r':
+          fputs ("\\r", stdout);
+          break;
+
         case '\\':
           fputs ("\\\\", stdout);
           break;
@@ -952,21 +962,18 @@ output_file (char const *file, int binary_file, void const *digest,
              uintmax_t length _GL_UNUSED)
 {
   unsigned char const *bin_buffer = digest;
-  /* We don't really need to escape, and hence detect, the '\\'
-      char, and not doing so should be both forwards and backwards
-      compatible, since only escaped lines would have a '\\' char at
-      the start.  However just in case users are directly comparing
-      against old (hashed) outputs, in the presence of files
-      containing '\\' characters, we decided to not simplify the
-      output in this case.  */
-  bool needs_escape = (strchr (file, '\\') || strchr (file, '\n'))
-                      && delim == '\n';
+
+  /* Output a leading backslash if the file name contains problematic chars.
+     Note we escape '\' itself to provide some forward compat to introduce
+     escaping of other characters.  */
+  bool needs_escape = delim == '\n' && (strchr (file, '\\')
+                                        || strchr (file, '\n')
+                                        || strchr (file, '\r'));
+  if (needs_escape)
+    putchar ('\\');
 
   if (tagged)
     {
-      if (needs_escape)
-        putchar ('\\');
-
       fputs (DIGEST_TYPE_STRING, stdout);
 # if HASH_ALGO_BLAKE2
       if (digest_length < BLAKE2B_MAX_LEN * 8)
@@ -983,11 +990,6 @@ output_file (char const *file, int binary_file, void const *digest,
       fputs (") = ", stdout);
     }
 
-  /* Output a leading backslash if the file name contains
-      a newline or backslash.  */
-  if (!tagged && needs_escape)
-    putchar ('\\');
-
   for (size_t i = 0; i < (digest_hex_bytes / 2); ++i)
     printf ("%02x", bin_buffer[i]);
 
@@ -1069,6 +1071,9 @@ digest_check (char const *checkfile_name)
       /* Remove any trailing newline.  */
       if (line[line_length - 1] == '\n')
         line[--line_length] = '\0';
+      /* Remove any trailing carriage return.  */
+      if (0 < line_length && line[line_length - 1] == '\r')
+        line[--line_length] = '\0';
 
       if (! (split_3 (line, line_length, &hex_digest, &binary, &filename)
              && ! (is_stdin && STREQ (filename, "-"))))
diff --git a/tests/misc/md5sum.pl b/tests/misc/md5sum.pl
index c32dac0e1..09c2174f5 100755
--- a/tests/misc/md5sum.pl
+++ b/tests/misc/md5sum.pl
@@ -44,9 +44,14 @@ my @Tests =
                                 {OUT=>"\\$degenerate  .\\nfoo\n"}],
      ['backslash-2', {IN=> {".\\foo"=> ''}},
                                 {OUT=>"\\$degenerate  .\\\\foo\n"}],
+     ['backslash-3', {IN=> {".\rfoo"=> ''}},
+                                {OUT=>"\\$degenerate  .\\rfoo\n"}],
      ['check-1', '--check', {AUX=> {f=> ''}},
                                 {IN=> {'f.md5' => "$degenerate  f\n"}},
                                 {OUT=>"f: OK\n"}],
+     ['check-windows', '--check', {AUX=> {f=> ''}},
+                                {IN=> {'f.md5' => "$degenerate  f\r\n"}},
+                                {OUT=>"f: OK\n"}],
 
      # Same as above, but with an added empty line, to provoke --strict.
      ['ck-strict-1', '--check --strict', {AUX=> {f=> ''}},
diff --git a/tests/misc/sha1sum.pl b/tests/misc/sha1sum.pl
index abbda1c49..118708764 100755
--- a/tests/misc/sha1sum.pl
+++ b/tests/misc/sha1sum.pl
@@ -48,6 +48,8 @@ my @Tests =
                         {OUT=>"\\$sha_degenerate  .\\nfoo\n"}],
      ['bs-sha-2', {IN=> {".\\foo"=> ''}},
                         {OUT=>"\\$sha_degenerate  .\\\\foo\n"}],
+     ['bs-sha-3', {IN=> {".\rfoo"=> ''}},
+                        {OUT=>"\\$sha_degenerate  .\\rfoo\n"}],
      # The sha1sum and md5sum drivers share a lot of code.
      # Ensure that sha1sum does *not* share the part that makes
      # md5sum accept BSD format.
-- 
2.26.2



