06/14: gpce-2017: Tweak some more.
From: Ludovic Courtès
Subject: 06/14: gpce-2017: Tweak some more.
Date: Fri, 1 Sep 2017 11:57:54 -0400 (EDT)
civodul pushed a commit to branch master
in repository maintenance.
commit ee3a74c6e2a25956342157d555826626c00de3b0
Author: Ludovic Courtès <address@hidden>
Date: Fri Jul 7 11:48:54 2017 +0200
gpce-2017: Tweak some more.
---
doc/gpce-2017/code/system-test.scm | 2 +-
doc/gpce-2017/gpce.skb | 154 ++++++++++++++++++-------------------
doc/gpce-2017/staging.sbib | 36 +++++++++
3 files changed, 113 insertions(+), 79 deletions(-)
diff --git a/doc/gpce-2017/code/system-test.scm b/doc/gpce-2017/code/system-test.scm
index 6a086b7..4ea879b 100644
--- a/doc/gpce-2017/code/system-test.scm
+++ b/doc/gpce-2017/code/system-test.scm
@@ -3,7 +3,7 @@
(srfi srfi-64) (ice-9 match))
;; Spawn the VM that runs the declared OS.
- (define marionette (make-marionette (list #$run)))
+ (define marionette (make-marionette (list #$vm)))
(test-begin "basic")
(test-assert "uname"
diff --git a/doc/gpce-2017/gpce.skb b/doc/gpce-2017/gpce.skb
index bbbbb14..1cc9ba4 100644
--- a/doc/gpce-2017/gpce.skb
+++ b/doc/gpce-2017/gpce.skb
@@ -292,7 +292,7 @@ homoiconicity—the fact that code has a direct representation as a data
structure using the same syntax. “S-expressions” or “sexps”, Lisp’s
parenthecal expressions, thus look like they lend themselves to code
staging.
-In this section we show how we this early experience made it clear that
+In this section we show how our early experience made it clear that
we needed an ,(emph [augmented]) version of sexps.])
(section :title [Staging Build Expressions]
@@ -312,7 +312,7 @@ which relied solely on Lisp quotation ,(ref :bib
'bawden1999:quasiquotation). Figure ,(ref :figure "fig-build-sexp")
shows an example that creates a derivation that, when built, converts
the input image to JPEG, using the ,(tt [convert]) program from the
-ImageMagick package—this is equivalent to a three-line makefile, but
+ImageMagick package—this is equivalent to a three-line makefile rule, but
referentially transparent. In this example, variable ,(tt [store])
represents the connection to the build daemon. The ,(tt
[package-derivation]) function takes the ,(tt [imagemagick]) package
@@ -367,7 +367,7 @@ file name.])))
(chapter :title [G-Expressions]
:ident "gexps"
- (p [We devised “G-expressions” as a mechanism to address
+ (p [We devised “G-expressions” to address
these shortcomings. This section describes the design and implementation of
G-expressions, as well as extensions we added to address new use
cases.])
@@ -383,7 +383,8 @@ cases.])
:start ";!begin-imagemagick-gexp"
:stop ";!end-imagemagick-gexp")))
- (p [In essence, a gexp bundles an sexp and its inputs
+ (p [G-expressions ,(emph [bind software deployment to staging]).
+A gexp bundles an sexp and its inputs
and outputs, and it can be serialized with ,(tt [/gnu/store]) file
names substituted as needed. We first define two operators:
@@ -513,17 +514,7 @@ as illustrated by Figure ,(ref :figure "fig-gexp-hygiene"). The
implementation is similar to MetaScheme ,(ref :bib
'kiselyov2008:metascheme) and to that described by Rhiger ,(ref :bib
'rhiger2012:hygienic), with caveats discussed in ,(numref :text
-[Section] :ident "limitations"). Unlike the examples usually given in
-the literature, identifiers must be generated in a ,(emph
-[deterministic]) fashion: if they were not, we would produce different
-derivations at each run, which in turn would trigger full rebuilds of
-the package graph. Thus, instead of relying on ,(tt [gensym]) and
-,(tt [generate-temporaries]), we generate identifiers as a function of
-the hash of
-the input expression and of the lexical nesting level of
-the identifier—these are the two components we can see in the generated
-identifiers of Figure ,(ref
-:figure "fig-gexp-hygiene").])
+[Section] :ident "limitations").])
(item [The second pass ,(emph [collects the escape forms]) (,(tt
[ungexp]) variants) in the input source. The list of escape forms is
needed to construct the list of inputs stored in the gexp
@@ -533,7 +524,19 @@ generation function shown in Figure ,(ref :figure
(item [The third pass ,(emph [substitutes escape forms]) with
references to the corresponding formal arguments of the code
generation function. This leads to the sexp-construction expression
-shown in Figure ,(ref :figure "fig-gexp-expansion").]))])
+shown in Figure ,(ref :figure "fig-gexp-expansion").]))
+
+Unlike the examples usually given in
+the literature, our renaming pass must generate identifiers in a ,(emph
+[deterministic]) fashion: if they were not, we would produce different
+derivations at each run, which in turn would trigger full rebuilds of
+the package graph. Thus, instead of relying on ,(tt [gensym]) and
+,(tt [generate-temporaries]), we generate identifiers as a function of
+the hash of
+the input expression and of the lexical nesting level of
+the identifier—these are the two components we can see in the generated
+identifiers of Figure ,(ref
+:figure "fig-gexp-hygiene").])
(figure
:legend [The gexp compilers for package objects and for
@@ -638,6 +641,7 @@ run anyway. Thus, we write ,(tt [#+imagemagick]) rather than ,(tt
(p [Guix and GuixSD are used in production by individuals and
organizations to deploy software on laptops, servers, and clusters.
+Deploying GuixSD involves staging hundreds of gexps.
Introducing a new core mechanism in such a project can be both fruitful
and challenging. This section reports on our experience using gexps in
Guix.])
@@ -719,13 +723,16 @@ into the Shepherd configuration file.])
build linux-container)]) module to create Linux ,(emph [containers])
(isolated execution environments), we were able to reuse this
container module within the Shepherd ,(ref :bib
-'courtes2017:servicecontainers). Essentially, the only thing we had
+'courtes2017:servicecontainers). The only thing we had
to do to achieve this was to (1) wrap our ,(tt [start]) gexp in ,(tt
[with-imported-modules]) so that it has access to the container
functionality, and (2) use our start-process-in-container function
lieu of the Shepherd’s own start-process function. This is a good
example of cross-stage code sharing, where the second stage in this
-case is the operating system’s run-time environment.]))
+case is the operating system’s run-time environment.])
+ (p [Another system service implemented in Scheme is GNU mcron,
+which handles scheduled job execution. Its configuration also consists
+of Scheme snippets, which GuixSD OS definitions can include as gexps.]))
(section :title [System Tests]
@@ -742,11 +749,12 @@ verifies that the system running in the VM matches some of the settings. The
guest OS is instrumented with a Scheme interpreter that evaluates
expressions sent by the host OS—we call it “marionette”.])
(p [Whole-system tests are derivations whose build programs are
-gexps that resemble that of Figure ,(ref :figure "fig-system-test").
-The build program passes ,(tt [run]), the script to spawn the VM, to the
+gexps like that of Figure ,(ref :figure "fig-system-test").
+The build program passes ,(tt [vm]), the script to spawn the VM, to the
instrumentation tool. The test then uses ,(tt [marionette-eval]) to
-call the ,(tt [uname]) function: an ,(emph [additional code stage]) is
-introduced here, this time using ,(tt [quote]). The test matches the
+call the ,(tt [uname]) function in the guest: an ,(emph [additional code stage]) is
+introduced here, this time using ,(tt [quote]) since gexps are currently
+limited to contexts with a connection to the build daemon. The test matches the
return value of ,(tt [uname]) against the expected vector, and makes
sure the information corresponds to the various bits declared in ,(tt
[os]), our OS definition.])))
@@ -762,8 +770,7 @@ well-documented approach to the problem ,(ref :bib
implementation handles a single binding construct (,(tt [lambda])) and
MetaScheme handles a couple more constructs, but ours has
to deal with more binding constructs: R6RS defines around ten
-binding constructs (including binding constructs for syntactic
-keywords such as ,(tt [let-syntax])), and Guile adds a couple more.])
+binding constructs, and Guile adds a couple more.])
(p [Hygiene in multi-stage programs relies on identifying binding
constructs. This turns out to be hard to achieve in Scheme because
macros can define ,(emph [new]) bindings constructs.
@@ -773,7 +780,7 @@ macro expander, of course, does this and more already, so it would be
tempting to reuse it rather than duplicate part of its work. However,
we do not want to macro-expand staged code; instead, macro expansion
should be performed “the normal way”, by the Guile program that
-compiles or evaluate the staged code. Again, this ensures
+compiles or evaluates the staged code. Again, this ensures
reproducibility across Guix installations since we control precisely
the Guile variant used in derivations whereas we do not control the
Guile variant used to evaluate “host-side” code. How we could hook
@@ -807,18 +814,17 @@ in scope at the macro definition point. How to achieve something
similar with gexp, which lack the big picture that a macro expander has,
remains an open question.])
(p [,(bold [Cross-stage debugging.]) ,(tt [gexp->derivation])
-emits build programs as sexps in a file in ,(tt [/gnu/store]), using
-Scheme ,(tt [write]), which writes the whole sexp as one line. When
+emits build programs as sexps in a file in ,(tt [/gnu/store]). When
an error occurs during the execution of these programs, Guile prints a
backtrace that refers to source code locations ,(emph [inside the
generated code]). What we would like, instead, is for the backtrace
to refer to the location ,(emph [of the gexp itself]). C has ,(tt
[#line]) directives, which code generators insert in generated code to
-,(emph [map]) generated code to its source. Assuming a similar
+,(emph [map]) generated code to its source. If a similar
feature was available in Scheme, it would be unsuitable: moving the
source code where a gexp appears would lead to a different derivation,
-in turn triggering a rebuild of everything that depends on it, which
-is undesirable. Instead we would need a way to pass source code
+in turn triggering a rebuild of everything that depends on it.
+Instead we would need a way to pass source code
mapping information ,(emph [out-of-band]), in a way that does not affect
the derivation that is produced. We are investigating ways to
achieve that.]))
@@ -826,73 +832,65 @@ achieve that.]))
(chapter :title [Related Work]
:ident "related"
- (p [Nix shares the same concerns as Guix: its language must be
+ (p [Like Guix, Nix must be
able to include references to store items (derivation results) in
-generated code while not keeping track of derivations this generated
-code depends on. However, Nix is a single-stage language, only used
-on the “host side”, which describes package derivations and their
-composition, while the “build side” is left to other languages such as
-Bash or Perl. Nix provides a ,(emph [string interpolation]) mechanism
-that allows users to splice arbitrary Nix expressions in strings ,(ref
-:bib 'dolstra2010:nixos); when such an expression refers to a
-derivation, the Nix interpreter records this dependency in the string
+generated code while keeping track of derivations this generated
+code depends on. However, Nix is a single-stage language:
+the “build side” is left to other languages such as
+Bash or Perl. Users can splice arbitrary Nix expressions in
+strings thanks to ,(emph [string interpolation])
+,(ref :bib 'dolstra2010:nixos); when such an expression refers to a
+derivation, the interpreter records this dependency in the string
context and substitutes the reference with the output file name of the
derivation.])
- (p [Because Nix views this generated code as mere strings, it
-does not provide any guarantee on the generated code (notably syntactic
-correctness). The string interpolation syntax (,(tt [${])…,(tt [}])
-sequences), often clashes with the target’s language syntax (e.g.,
+ (p [Nix views staged code as mere strings and thus
+does not provide any guarantee on the generated code.
+The string interpolation syntax (,(tt [${])…,(tt [}])
+sequences) often clashes with the target’s language syntax (e.g.,
Bash uses dollar-brace syntax to reference variables), which can lead
to subtle errors and constrain developers to resort to non-trivial
escaping syntax. The “code-as-string” paradigm also has other side
effects: comments and whitespace in those strings is preserved, and
changing those triggers a rebuild of the derivation, which is
inconvenient.])
- (p [Code staging in Scheme has been studied in the context of
-,(emph [hygienic macros])—i.e., macros that generate
-well-scoped code, without unintended capture of variables ,(ref :bib
'(kohlbecker1986:hygienic dybvig1992:syntax-case))—which later
-made it into the Sixth Report on Scheme (R6RS). MacroML achieves
-something similar in the context of ML, which is statically-typed
-,(ref :bib 'ganz2001:macroml). Both tools allow users to define new
-binding constructs; the macro expander recognizes those bindings
-constructs, which allows it to track bindings and preserve hygiene,
-notably by ,(symbol "alpha")-renaming introduced bindings.])
+ (p [Code staging is often studied in the context of optimized code
+generation ,(ref :bib '(rompf2012:lms wang2002:s2 aktemur2013:shonan)),
+or that of hygienic macros ,(ref :bib '(kohlbecker1986:hygienic
+dybvig1992:syntax-case ganz2001:macroml)). Gexps appear to be the first
+use of staging in the context of software deployment. Apart from LMS,
+which relies on types ,(ref :bib 'rompf2012:lms), most approaches to
+staging rely on syntactic annotations similar to ,(tt [bracket]) or ,(tt
+[gexp]). Scheme’s ,(emph [hygienic macros]), now part of the R5RS and
+R6RS standards, as well as MacroML ,(ref :bib 'ganz2001:macroml) support
+user-defined binding constructs; the macro expander recognizes those
+bindings constructs, which allows it to track bindings and preserve
+hygiene, notably by ,(symbol "alpha")-renaming introduced bindings.])
(p [MetaScheme is a translation of MetaOCaml’s staging
primitives, ,(tt [bracket]), ,(tt [escape]), and ,(tt [lift]) ,(ref
-:bib 'kiselyov2008:metascheme). The beauty of MetaScheme is that it
-extends Scheme through a set of macros and does not necessitate any
-modification to the host Scheme implementation. MetaScheme inspired
-the ,(symbol "alpha")-renaming pass described in ,(numref :text
-[Section] :ident "implementation"). However, it only considers a few
+:bib 'kiselyov2008:metascheme) implemented as a macro that expands to an
+sexp. It considers only a few
core binding constructs and does not address hygiene in the presence
-of user-defined binding constructs (macros). This strategy is
-appropriate in a macro-less language with a fixed set of binding
-constructs like OCaml, but we have seen that languages such as Scheme
-that support user-defined binding constructs create additional
-challenges.
+of user-defined binding constructs introduced by macros.
Rhiger’s work ,(ref :bib 'rhiger2012:hygienic) follows a similar
-approach but chooses to redefine Scheme’s quasiquotation rather than
-introduce new constructs.])
- (p [Staged Scheme, or S,(sup [2]), also improved on Lisp
-quasiquotations by providing bracket, escape, and lift forms separate
+approach but redefines Scheme’s quasiquotation instead of
+introducing new constructs.])
+ (p [Staged Scheme, or S,(sup [2]), provides bracket, escape, and lift forms separate
from ,(tt [quasiquote]) and ,(tt [unquote]) ,(ref :bib 'wang2002:s2).
-Therefore, as with ,(tt [syntax-case]) and gexps, quoted code has a
-disjoint type as opposed to being a list.
+As with ,(tt [syntax-case]) ,(ref :bib 'dybvig1992:syntax-case) and ,(tt
+[gexp]), staged code has a disjoint type, as opposed to being a list.
S,(sup [2])’s focus is on programs with
-possibly more than two stages, whereas gexp are, in practice, used for
+possibly more than two stages, whereas gexps are, in practice, used for
two-stage programs. The article discusses ,(emph [code regeneration])
at run time; gexps have a similar requirement here: at run time a
-given gexp may be instantiated for different system types, for
+given gexp may be instantiated for different systems, for
instance ,(tt [x86_64-linux]) and ,(tt [i686-linux]).])
- (p [While Guix uses ,(emph [homogeneous]) staging, where the
-source and staged language are the same, Hop instead performs ,(emph
+ (p [Hop performs ,(emph
[heterogenous staging]): the source language is Scheme, but the
-generated code is JavaScript ,(ref :bib 'serrano2010:multitier). Hop
-has a ,(tt [~]) (tilde) form to introduce staged expressions, and a
-,(tt [$]) (dollar) form to escape to unstaged code. Hop involves two
-code stages: server-side code and client-side code. Unlike
-G-expressions, support for tilde forms is built in the Hop compiler,
-and tilde forms are not first-class objects. Hop comes with useful
+generated code is JavaScript ,(ref :bib 'serrano2010:multitier). In Hop
+,(tt [~]) introduces staged client-side expressions and
+,(tt [$]) escapes to unstaged server-side code. Unlike
+gexps, support for ,(tt [~]) forms is built in the Hop compiler,
+and ,(tt [~]) forms are not first-class objects. Hop comes with useful
multi-stage debugging facilities not found in Guix, such as the
ability to display cross-stage stack traces with correct source
location information. It also has a way to express modules in scope for
diff --git a/doc/gpce-2017/staging.sbib b/doc/gpce-2017/staging.sbib
index d4e0fd0..33b277d 100644
--- a/doc/gpce-2017/staging.sbib
+++ b/doc/gpce-2017/staging.sbib
@@ -111,6 +111,42 @@ Evaluation and Semantics-Based Program Manipulation (PEPM 1999)")
(address "New York, NY, USA")
(keywords "hygiene, lexical scope, program generation, quasiquotation, types"))
+(article rompf2012:lms
+ (author "Tiark Rompf and Martin Odersky")
+ (title "Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs")
+ (journal "Commun. ACM")
+ (issue_date "June 2012")
+ (volume "55")
+ (number "6")
+ (month "June")
+ (year "2012")
+ (issn "0001-0782")
+ (pages "121--130")
+ (numpages "10")
+ (url "http://doi.acm.org/10.1145/2184319.2184345")
+ (doi "10.1145/2184319.2184345")
+ (acmid "2184345")
+ (publisher "ACM")
+ (address "New York, NY, USA"))
+
+(inproceedings aktemur2013:shonan
+ (author "Baris Aktemur, Yukiyoshi Kameyama, Oleg Kiselyov, and Chung-chieh Shan")
+ (title "Shonan Challenge for Generative Programming: Short Position Paper")
+ (booktitle "Proceedings of the ACM SIGPLAN 2013 Workshop on Partial Evaluation and Program Manipulation")
+ (series "PEPM '13")
+ (year "2013")
+ (isbn "978-1-4503-1842-6")
+ (location "Rome, Italy")
+ (pages "147--154")
+ (numpages "8")
+ (url "http://doi.acm.org/10.1145/2426890.2426917")
+ (doi "10.1145/2426890.2426917")
+ (acmid "2426917")
+ (publisher "ACM")
+ (address "New York, NY, USA")
+ (keywords "code generation, domain-specific languages, generative programming, high-performance computing, staging"))
+
+
#|
(defun skr-from-bibtex ()
"Vaguely convert the BibTeX snippets after POINT to SBibTeX."