bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#26948: ‘write-file’ output should not be locale-dependent


From: Ludovic Courtès
Subject: bug#26948: ‘write-file’ output should not be locale-dependent
Date: Tue, 30 May 2017 13:57:24 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

Hi Maxim,

Maxim Cournoyer <address@hidden> skribis:

> address@hidden (Ludovic Courtès) writes:

[...]

>> But wait!  “guix build nss-certs --check -K” fails, and the diff is:
>>
>> $ LANGUAGE= diff -ur 
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2{,-check}
>> Only in 
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs:
>>  
>> AC_Raíz_Certicámara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem
>> Only in 
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs: 
>> AC_Ra?z_Certic?mara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem
>> diff -ur 
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0
>>  
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0
>> --- 
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0
>>     1970-01-01 01:00:01.000000000 +0100
>> +++ 
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0
>>       1970-01-01 01:00:01.000000000 +0100
>> @@ -3,10 +3,10 @@
>>  # distrust=
>>  # openssl-trust=codeSigning emailProtection serverAuth
>>  -----BEGIN CERTIFICATE-----
>> -MIIHyTCCBbGgAwIBAgIBATANBgkqhkiG9w0BAQUFADB9MQswCQYDVQQGEwJJTDEW
>> +MIIHhzCCBW+gAwIBAgIBLTANBgkqhkiG9w0BAQsFADB9MQswCQYDVQQGEwJJTDEW
>
> Can this be explained by locale alone? That is troubling.

Yes it’s troubling, it deserves more investigation.

>> There are two ways to create nars.  One is via the ‘export-paths’ RPC
>> (implemented in the daemon in C++), which does not interpret file names
>> and thus leaves them untouched.  The other one is via ‘write-file’ from
>> (guix serialization), which is written in Scheme and thus converts file
>> names from locale encoding (specifically, ‘scandir’ does that.)
>>
>> ‘guix publish’ uses the latter, so ‘guix publish’ is sensitive to locale
>> settings, which is pretty bad.
>>
>> Guile currently does not allow us to specify whether/how file names
>> should be decoded, but possible solutions have been discussed for 2.2.
>>
>> In the meantime, solutions are:
>>
>>   1. To run ‘guix publish’ in a UTF-8 locale, which apparently was not
>>      the case.
>
> I'm surprised by that. Wouldn't a utf8 locale be the default?

Users are free to choose their favorite locale.  Also, on a foreign
distro where locales are not properly set up, you end up in the C locale
with US-ASCII encoding (as was the case here).

>>   2. Add to (guix build syscalls) a separate locale-independent
>>      ‘scandir’ implementation and use that.
>
> If the general solution is to fix it in Guile, the workaround proposed
> in 1. seems preferable.

I implemented ‘scandir/utf-8’ and used that in ‘write-file’ (patches
attached).  Unfortunately that’s not enough since libguile procedures
like ‘open-file’ still do locale-dependent conversion, so we’d need to
duplicate those as well, which is not great.

But on second thought, I think the problem is not in the ‘write-file’
call that ‘guix publish’ makes: if it were, ‘scandir’ would return bogus
file names (with question marks), but then trying to open them would
fail, so ‘write-file’ wouldn’t produce a bogus nar.

So I think the culprit is ‘restore-file-set’ (used in ‘guix offload’
when retrieving store items from a build machine): this one reads file
names as UTF-8, via ‘read-store-path’, but then when it tries to create
those files using Guile’s primitives, their names can be be converted,
possibly with those question marks popping up.  Here ‘restore-file-set’
can’t notice that Guile changed the file names behind its back.

The short-term fix is to ensure guix-daemon itself runs in a UTF-8
locale.  :-/

I’ve restarted guix-daemon and ‘guix publish’ in a UTF-8 locale on
hydra.gnu.org.

Thanks,
Ludo’.


>From e7f464bac58e1f09de5ceb194c4a30f6d899b29a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <address@hidden>
Date: Tue, 30 May 2017 12:03:54 +0200
Subject: [PATCH] syscalls: Add 'scandir/utf-8'.

* guix/build/syscalls.scm (%struct-dirent-header): New C struct.
(opendir/utf-8, closedir/utf-8, readdir/utf-8, scandir/utf-8): New
procedures.
* tests/syscalls.scm ("scandir/utf-8, ENOENT")
("scandir/utf-8, ASCII file names")
("scandir/utf8, UTF-8 file names"): New tests.
---
 guix/build/syscalls.scm | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/syscalls.scm      | 39 ++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 52439afd4..cfb43e93b 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -67,6 +67,7 @@
             mkdtemp!
             fdatasync
             pivot-root
+            scandir/utf-8
             fcntl-flock
 
             set-thread-name
@@ -812,6 +813,78 @@ system to PUT-OLD."
 

 ;;;
+;;; Opendir & co.
+;;;
+
+(define-c-struct %struct-dirent-header
+  sizeof-dirent-header
+  list
+  read-dirent-header
+  write-dirent-header!
+  (inode  int64)
+  (offset int64)
+  (length unsigned-short)
+  (type   uint8)
+  (name   uint8))                                 ;first byte of 'd_name'
+
+(define opendir/utf-8
+  (let ((proc (syscall->procedure '* "opendir" '(*))))
+    (lambda (name)
+      (let-values (((ptr err)
+                    (proc (string->pointer name "UTF-8"))))
+        (if (null-pointer? ptr)
+            (throw 'system-error "opendir/utf-8"
+                   "opendir/utf-8: ~A"
+                   (list (strerror err))
+                   (list err))
+            ptr)))))
+
+(define closedir/utf-8
+  (let ((proc (syscall->procedure int "closedir" '(*))))
+    (lambda (directory)
+      (let-values (((ret err)
+                    (proc directory)))
+        (unless (zero? ret)
+          (throw 'system-error "closedir"
+                 "closedir: ~A" (list (strerror err))
+                 (list err)))))))
+
+(define readdir/utf-8
+  (let ((proc (syscall->procedure '* "readdir64" '(*))))
+    (lambda (directory)
+      (let ((ptr (proc directory)))
+        (and (not (null-pointer? ptr))
+             (pointer->string
+              (make-pointer (+ (pointer-address ptr)
+                               (c-struct-field-offset %struct-dirent-header 
name)))
+              -1
+              "UTF-8"))))))
+
+(define* (scandir/utf-8 name #:optional
+                        (select? (const #t))
+                        (string<? string<?))
+  "Like 'scandir', but (1) systematically decode file names as UTF-8, and (2)
+raise to 'system-error' when NAME cannot be opened.
+
+This works around a defect in Guile 2.0/2.2 where 'scandir' decodes file names
+according to the current locale, which is not always desirable."
+  (let ((directory (opendir/utf-8 name)))
+    (dynamic-wind
+      (const #t)
+      (lambda ()
+        (let loop ((result '()))
+          (match (readdir/utf-8 directory)
+            (#f
+             (sort result string<?))
+            (entry
+             (loop (if (select? entry)
+                       (cons entry result)
+                       result))))))
+      (lambda ()
+        (closedir/utf-8 directory)))))
+
+
+;;;
 ;;; Advisory file locking.
 ;;;
 
diff --git a/tests/syscalls.scm b/tests/syscalls.scm
index e20f0600b..02062ec9d 100644
--- a/tests/syscalls.scm
+++ b/tests/syscalls.scm
@@ -24,6 +24,8 @@
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:use-module (srfi srfi-64)
+  #:use-module (system foreign)
+  #:use-module ((ice-9 ftw) #:select (scandir))
   #:use-module (ice-9 match))
 
 ;; Test the (guix build syscalls) module, although there's not much that can
@@ -184,6 +186,43 @@
                          (status:exit-val status))))
                (eq? #t result))))))))
 
+(test-equal "scandir/utf-8, ENOENT"
+  ENOENT
+  (catch 'system-error
+    (lambda ()
+      (scandir/utf-8 "/does/not/exist"))
+    (lambda args
+      (system-error-errno args))))
+
+(test-equal "scandir/utf-8, ASCII file names"
+  (scandir (dirname (search-path %load-path "guix/base32.scm")))
+  (scandir/utf-8 (dirname (search-path %load-path "guix/base32.scm"))))
+
+(test-equal "scandir/utf8, UTF-8 file names"
+  '("." ".." "α" "λ")
+  (call-with-temporary-directory
+   (lambda (directory)
+     ;; Wrap 'creat' to make sure that we really pass a UTF-8-encoded file
+     ;; name to the system call.
+     (let ((creat (pointer->procedure int
+                                      (dynamic-func "creat" (dynamic-link))
+                                      (list '* int))))
+       (creat (string->pointer (string-append directory "/α")
+                               "UTF-8")
+              #o644)
+       (creat (string->pointer (string-append directory "/λ")
+                               "UTF-8")
+              #o644)
+       (let ((locale (setlocale LC_ALL)))
+         (dynamic-wind
+           (lambda ()
+             ;; Make sure that even in a C locale we get the right result.
+             (setlocale LC_ALL "C"))
+           (lambda ()
+             (scandir/utf-8 directory))
+           (lambda ()
+             (setlocale LC_ALL locale))))))))
+
 (false-if-exception (delete-file temp-file))
 (test-equal "fcntl-flock wait"
   42                                              ; the child's exit status
-- 
2.13.0

diff --git a/guix/serialization.scm b/guix/serialization.scm
index e6ae2fc30..77a54f904 100644
--- a/guix/serialization.scm
+++ b/guix/serialization.scm
@@ -18,6 +18,8 @@
 
 (define-module (guix serialization)
   #:use-module (guix combinators)
+  #:use-module ((guix build syscalls)
+                #:select (scandir/utf-8))
   #:use-module (rnrs bytevectors)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
@@ -285,8 +287,11 @@ result of 'lstat'; exclude entries for which SELECT? does 
not return true."
               ;; 'scandir' defaults to 'string-locale<?' to sort files, but
               ;; this happens to be case-insensitive (at least in 'en_US'
               ;; locale on libc 2.18.)  Conversely, we want files to be
-              ;; sorted in a case-sensitive fashion.
-              (scandir f (negate (cut member <> '("." ".."))) string<?)))
+              ;; sorted in a case-sensitive fashion.  Also, always decode file
+              ;; names as UTF-8.
+              (scandir/utf-8
+               f (negate (cut member <> '("." "..")))
+                             string<?)))
          (for-each (lambda (e)
                      (let* ((f (string-append f "/" e))
                             (s (lstat f)))
diff --git a/tests/nar.scm b/tests/nar.scm
index 61646db96..d2eae65c4 100644
--- a/tests/nar.scm
+++ b/tests/nar.scm
@@ -25,6 +25,8 @@
                 #:select (open-sha256-port open-sha256-input-port))
   #:use-module ((guix packages)
                 #:select (base32))
+  #:use-module ((guix utils)
+                #:select (call-with-temporary-directory))
   #:use-module (rnrs bytevectors)
   #:use-module (rnrs io ports)
   #:use-module (srfi srfi-1)
@@ -272,6 +274,36 @@
       (lambda ()
         (rmdir input)))))
 
+(unless (equal? "UTF-8" (fluid-ref %default-port-encoding))
+  (test-skip 1))
+(test-assert "write-file + restore-file, UTF-8 file names"
+  (let* ((output %test-dir)
+         (nar    (string-append output ".nar"))
+         (locale (setlocale LC_ALL)))
+    (dynamic-wind
+      (lambda () #t)
+      (lambda ()
+        (call-with-temporary-directory
+         (lambda (input)
+           (call-with-output-file (string-append input "/α")
+             (const #t))
+           (call-with-output-file (string-append input "/λ")
+             (const #t))
+           (dynamic-wind
+             (const #f)
+             (lambda ()
+               (setlocale LC_ALL "C")
+               (call-with-output-file nar
+                 (cut write-file input <>)))
+             (lambda ()
+               (setlocale LC_ALL locale)))
+           (call-with-input-file nar
+             (cut restore-file <> output))
+           (file-tree-equal? input output))))
+      (lambda ()
+        (false-if-exception (delete-file nar))
+        (false-if-exception (rm-rf output))))))
+
 ;; 'restore-file-set' depends on 'open-sha256-input-port', which in turn
 ;; relies on a Guile 2.0.10+ feature.
 (test-skip (if (false-if-exception

reply via email to

[Prev in Thread] Current Thread [Next in Thread]