[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#26948: ‘write-file’ output should not be locale-dependent
From: |
Ludovic Courtès |
Subject: |
bug#26948: ‘write-file’ output should not be locale-dependent |
Date: |
Tue, 30 May 2017 13:57:24 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) |
Hi Maxim,
Maxim Cournoyer <address@hidden> skribis:
> address@hidden (Ludovic Courtès) writes:
[...]
>> But wait! “guix build nss-certs --check -K” fails, and the diff is:
>>
>> $ LANGUAGE= diff -ur
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2{,-check}
>> Only in
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs:
>>
>> AC_Raíz_Certicámara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem
>> Only in
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs:
>> AC_Ra?z_Certic?mara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem
>> diff -ur
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0
>>
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0
>> ---
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0
>> 1970-01-01 01:00:01.000000000 +0100
>> +++
>> /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0
>> 1970-01-01 01:00:01.000000000 +0100
>> @@ -3,10 +3,10 @@
>> # distrust=
>> # openssl-trust=codeSigning emailProtection serverAuth
>> -----BEGIN CERTIFICATE-----
>> -MIIHyTCCBbGgAwIBAgIBATANBgkqhkiG9w0BAQUFADB9MQswCQYDVQQGEwJJTDEW
>> +MIIHhzCCBW+gAwIBAgIBLTANBgkqhkiG9w0BAQsFADB9MQswCQYDVQQGEwJJTDEW
>
> Can this be explained by locale alone? That is troubling.
Yes it’s troubling, it deserves more investigation.
>> There are two ways to create nars. One is via the ‘export-paths’ RPC
>> (implemented in the daemon in C++), which does not interpret file names
>> and thus leaves them untouched. The other one is via ‘write-file’ from
>> (guix serialization), which is written in Scheme and thus converts file
>> names from locale encoding (specifically, ‘scandir’ does that.)
>>
>> ‘guix publish’ uses the latter, so ‘guix publish’ is sensitive to locale
>> settings, which is pretty bad.
>>
>> Guile currently does not allow us to specify whether/how file names
>> should be decoded, but possible solutions have been discussed for 2.2.
>>
>> In the meantime, solutions are:
>>
>> 1. To run ‘guix publish’ in a UTF-8 locale, which apparently was not
>> the case.
>
> I'm surprised by that. Wouldn't a utf8 locale be the default?
Users are free to choose their favorite locale. Also, on a foreign
distro where locales are not properly set up, you end up in the C locale
with US-ASCII encoding (as was the case here).
>> 2. Add to (guix build syscalls) a separate locale-independent
>> ‘scandir’ implementation and use that.
>
> If the general solution is to fix it in Guile, the workaround proposed
> in 1. seems preferable.
I implemented ‘scandir/utf-8’ and used that in ‘write-file’ (patches
attached). Unfortunately that’s not enough since libguile procedures
like ‘open-file’ still do locale-dependent conversion, so we’d need to
duplicate those as well, which is not great.
But on second thought, I think the problem is not in the ‘write-file’
call that ‘guix publish’ makes: if it were, ‘scandir’ would return bogus
file names (with question marks), but then trying to open them would
fail, so ‘write-file’ wouldn’t produce a bogus nar.
So I think the culprit is ‘restore-file-set’ (used in ‘guix offload’
when retrieving store items from a build machine): this one reads file
names as UTF-8, via ‘read-store-path’, but then when it tries to create
those files using Guile’s primitives, their names can be be converted,
possibly with those question marks popping up. Here ‘restore-file-set’
can’t notice that Guile changed the file names behind its back.
The short-term fix is to ensure guix-daemon itself runs in a UTF-8
locale. :-/
I’ve restarted guix-daemon and ‘guix publish’ in a UTF-8 locale on
hydra.gnu.org.
Thanks,
Ludo’.
>From e7f464bac58e1f09de5ceb194c4a30f6d899b29a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <address@hidden>
Date: Tue, 30 May 2017 12:03:54 +0200
Subject: [PATCH] syscalls: Add 'scandir/utf-8'.
* guix/build/syscalls.scm (%struct-dirent-header): New C struct.
(opendir/utf-8, closedir/utf-8, readdir/utf-8, scandir/utf-8): New
procedures.
* tests/syscalls.scm ("scandir/utf-8, ENOENT")
("scandir/utf-8, ASCII file names")
("scandir/utf8, UTF-8 file names"): New tests.
---
guix/build/syscalls.scm | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
tests/syscalls.scm | 39 ++++++++++++++++++++++++++
2 files changed, 112 insertions(+)
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 52439afd4..cfb43e93b 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -67,6 +67,7 @@
mkdtemp!
fdatasync
pivot-root
+ scandir/utf-8
fcntl-flock
set-thread-name
@@ -812,6 +813,78 @@ system to PUT-OLD."
;;;
+;;; Opendir & co.
+;;;
+
+(define-c-struct %struct-dirent-header
+ sizeof-dirent-header
+ list
+ read-dirent-header
+ write-dirent-header!
+ (inode int64)
+ (offset int64)
+ (length unsigned-short)
+ (type uint8)
+ (name uint8)) ;first byte of 'd_name'
+
+(define opendir/utf-8
+ (let ((proc (syscall->procedure '* "opendir" '(*))))
+ (lambda (name)
+ (let-values (((ptr err)
+ (proc (string->pointer name "UTF-8"))))
+ (if (null-pointer? ptr)
+ (throw 'system-error "opendir/utf-8"
+ "opendir/utf-8: ~A"
+ (list (strerror err))
+ (list err))
+ ptr)))))
+
+(define closedir/utf-8
+ (let ((proc (syscall->procedure int "closedir" '(*))))
+ (lambda (directory)
+ (let-values (((ret err)
+ (proc directory)))
+ (unless (zero? ret)
+ (throw 'system-error "closedir"
+ "closedir: ~A" (list (strerror err))
+ (list err)))))))
+
+(define readdir/utf-8
+ (let ((proc (syscall->procedure '* "readdir64" '(*))))
+ (lambda (directory)
+ (let ((ptr (proc directory)))
+ (and (not (null-pointer? ptr))
+ (pointer->string
+ (make-pointer (+ (pointer-address ptr)
+ (c-struct-field-offset %struct-dirent-header
name)))
+ -1
+ "UTF-8"))))))
+
+(define* (scandir/utf-8 name #:optional
+ (select? (const #t))
+ (string<? string<?))
+ "Like 'scandir', but (1) systematically decode file names as UTF-8, and (2)
+raise to 'system-error' when NAME cannot be opened.
+
+This works around a defect in Guile 2.0/2.2 where 'scandir' decodes file names
+according to the current locale, which is not always desirable."
+ (let ((directory (opendir/utf-8 name)))
+ (dynamic-wind
+ (const #t)
+ (lambda ()
+ (let loop ((result '()))
+ (match (readdir/utf-8 directory)
+ (#f
+ (sort result string<?))
+ (entry
+ (loop (if (select? entry)
+ (cons entry result)
+ result))))))
+ (lambda ()
+ (closedir/utf-8 directory)))))
+
+
+;;;
;;; Advisory file locking.
;;;
diff --git a/tests/syscalls.scm b/tests/syscalls.scm
index e20f0600b..02062ec9d 100644
--- a/tests/syscalls.scm
+++ b/tests/syscalls.scm
@@ -24,6 +24,8 @@
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-26)
#:use-module (srfi srfi-64)
+ #:use-module (system foreign)
+ #:use-module ((ice-9 ftw) #:select (scandir))
#:use-module (ice-9 match))
;; Test the (guix build syscalls) module, although there's not much that can
@@ -184,6 +186,43 @@
(status:exit-val status))))
(eq? #t result))))))))
+(test-equal "scandir/utf-8, ENOENT"
+ ENOENT
+ (catch 'system-error
+ (lambda ()
+ (scandir/utf-8 "/does/not/exist"))
+ (lambda args
+ (system-error-errno args))))
+
+(test-equal "scandir/utf-8, ASCII file names"
+ (scandir (dirname (search-path %load-path "guix/base32.scm")))
+ (scandir/utf-8 (dirname (search-path %load-path "guix/base32.scm"))))
+
+(test-equal "scandir/utf8, UTF-8 file names"
+ '("." ".." "α" "λ")
+ (call-with-temporary-directory
+ (lambda (directory)
+ ;; Wrap 'creat' to make sure that we really pass a UTF-8-encoded file
+ ;; name to the system call.
+ (let ((creat (pointer->procedure int
+ (dynamic-func "creat" (dynamic-link))
+ (list '* int))))
+ (creat (string->pointer (string-append directory "/α")
+ "UTF-8")
+ #o644)
+ (creat (string->pointer (string-append directory "/λ")
+ "UTF-8")
+ #o644)
+ (let ((locale (setlocale LC_ALL)))
+ (dynamic-wind
+ (lambda ()
+ ;; Make sure that even in a C locale we get the right result.
+ (setlocale LC_ALL "C"))
+ (lambda ()
+ (scandir/utf-8 directory))
+ (lambda ()
+ (setlocale LC_ALL locale))))))))
+
(false-if-exception (delete-file temp-file))
(test-equal "fcntl-flock wait"
42 ; the child's exit status
--
2.13.0
diff --git a/guix/serialization.scm b/guix/serialization.scm
index e6ae2fc30..77a54f904 100644
--- a/guix/serialization.scm
+++ b/guix/serialization.scm
@@ -18,6 +18,8 @@
(define-module (guix serialization)
#:use-module (guix combinators)
+ #:use-module ((guix build syscalls)
+ #:select (scandir/utf-8))
#:use-module (rnrs bytevectors)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-26)
@@ -285,8 +287,11 @@ result of 'lstat'; exclude entries for which SELECT? does
not return true."
;; 'scandir' defaults to 'string-locale<?' to sort files, but
;; this happens to be case-insensitive (at least in 'en_US'
;; locale on libc 2.18.) Conversely, we want files to be
- ;; sorted in a case-sensitive fashion.
- (scandir f (negate (cut member <> '("." ".."))) string<?)))
+ ;; sorted in a case-sensitive fashion. Also, always decode file
+ ;; names as UTF-8.
+ (scandir/utf-8
+ f (negate (cut member <> '("." "..")))
+ string<?)))
(for-each (lambda (e)
(let* ((f (string-append f "/" e))
(s (lstat f)))
diff --git a/tests/nar.scm b/tests/nar.scm
index 61646db96..d2eae65c4 100644
--- a/tests/nar.scm
+++ b/tests/nar.scm
@@ -25,6 +25,8 @@
#:select (open-sha256-port open-sha256-input-port))
#:use-module ((guix packages)
#:select (base32))
+ #:use-module ((guix utils)
+ #:select (call-with-temporary-directory))
#:use-module (rnrs bytevectors)
#:use-module (rnrs io ports)
#:use-module (srfi srfi-1)
@@ -272,6 +274,36 @@
(lambda ()
(rmdir input)))))
+(unless (equal? "UTF-8" (fluid-ref %default-port-encoding))
+ (test-skip 1))
+(test-assert "write-file + restore-file, UTF-8 file names"
+ (let* ((output %test-dir)
+ (nar (string-append output ".nar"))
+ (locale (setlocale LC_ALL)))
+ (dynamic-wind
+ (lambda () #t)
+ (lambda ()
+ (call-with-temporary-directory
+ (lambda (input)
+ (call-with-output-file (string-append input "/α")
+ (const #t))
+ (call-with-output-file (string-append input "/λ")
+ (const #t))
+ (dynamic-wind
+ (const #f)
+ (lambda ()
+ (setlocale LC_ALL "C")
+ (call-with-output-file nar
+ (cut write-file input <>)))
+ (lambda ()
+ (setlocale LC_ALL locale)))
+ (call-with-input-file nar
+ (cut restore-file <> output))
+ (file-tree-equal? input output))))
+ (lambda ()
+ (false-if-exception (delete-file nar))
+ (false-if-exception (rm-rf output))))))
+
;; 'restore-file-set' depends on 'open-sha256-input-port', which in turn
;; relies on a Guile 2.0.10+ feature.
(test-skip (if (false-if-exception
- bug#26948: gnutls errors on multiple guix commands, (continued)
- bug#26948: gnutls errors on multiple guix commands, Maxim Cournoyer, 2017/05/25
- bug#26948: gnutls errors on multiple guix commands, Ludovic Courtès, 2017/05/26
- bug#26948: gnutls errors on multiple guix commands, Mark H Weaver, 2017/05/28
- bug#26948: gnutls errors on multiple guix commands, Maxim Cournoyer, 2017/05/29
- bug#26948: gnutls errors on multiple guix commands, Ludovic Courtès, 2017/05/29
- bug#26948: gnutls errors on multiple guix commands, Mark H Weaver, 2017/05/29
- bug#26948: gnutls errors on multiple guix commands, Ludovic Courtès, 2017/05/30
- bug#26948: gnutls errors on multiple guix commands, Maxim Cournoyer, 2017/05/28
- bug#26948: ‘write-file’ output should not be locale-dependent, Ludovic Courtès, 2017/05/29
- bug#26948: ‘write-file’ output should not be locale-dependent, Maxim Cournoyer, 2017/05/29
- bug#26948: ‘write-file’ output should not be locale-dependent,
Ludovic Courtès <=