From: Ludovic Courtès
Subject: branch master updated: website: Add draft about the reproducible video pipeline.
Date: Tue, 08 Jun 2021 16:31:07 -0400
This is an automated email from the git hooks/post-receive script.
civodul pushed a commit to branch master
in repository guix-artwork.
The following commit(s) were added to refs/heads/master by this push:
new cc08fed website: Add draft about the reproducible video pipeline.
cc08fed is described below
commit cc08fedce0d2fc542a26ae2719ea8304997427e4
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Tue Jun 8 22:28:29 2021 +0200
website: Add draft about the reproducible video pipeline.
* website/drafts/video-pipeline.md,
website/static/blog/img/2021-video-tv-screen.png: New file.
---
website/drafts/video-pipeline.md | 360 +++++++++++++++++++++++
website/static/blog/img/2021-video-tv-screen.png | Bin 0 -> 620280 bytes
2 files changed, 360 insertions(+)
diff --git a/website/drafts/video-pipeline.md b/website/drafts/video-pipeline.md
new file mode 100644
index 0000000..4061cfd
--- /dev/null
+++ b/website/drafts/video-pipeline.md
@@ -0,0 +1,360 @@
+title: Automated and reproducible video pipelines
+author: Ludovic Courtès
+tags: Scheme API, Reproducibility, Talks
+date: 2021-06-10 12:00:00
+---
+
+Last week, [we at Guix-HPC](https://hpc.guix.info) published [videos of
+a workshop on reproducible software
+environments](https://hpc.guix.info/events/2021/atelier-reproductibilit%C3%A9-environnements/)
+we organized on-line. The videos are well worth watching, especially if
+you’re into reproducible research, and all the more so if you speak French
+or want to practice. This post, though, is more of a meta-post: it’s about
+how we processed these videos. “A workshop on reproducibility _ought to
+have_ a reproducible video pipeline”, we thought. So this is what we
+[did](https://gitlab.inria.fr/guix-hpc/website/-/blob/master/doc/atelier-reproductibilit%C3%A9/render-videos.scm)!
+
+# From BigBlueButton to WebM
+
+Over the last year and a half, perhaps you had the “opportunity” to
+participate in an on-line conference, or even to organize one. If so,
+chances are that you already know
+[BigBlueButton](https://bigbluebutton.org/) (BBB), the free software
+video conferencing suite initially designed for on-line teaching. In a
+nutshell, it allows participants to chat (audio, video, and keyboard),
+and speakers can share their screen or a PDF slide deck. Organizers can
+also record the session.
+
+BBB then creates a link to recorded sessions with a custom JavaScript
+player that replays everything: typed chat, audio and video (webcams),
+shared screens, and slide decks. This BBB replay is a bit too rough, though,
+and often not what you’d like to publish after the conference.
+Instead, you’d rather do a bit of editing: adjusting the start and end
+time of each talk, removing live chat from what’s displayed (which
+allows you to remove info that personally identifies participants,
+too!), and so forth. It turns out this kind of post-processing is a bit of
+work, primarily because BBB does “the right thing” by recording each
+stream separately, in the most appropriate form: webcam and screen
+shares are recorded as separate videos, chat is recorded as text with
+timings, slide decks are recorded as a bunch of PNGs plus timings, and
+then there’s a bunch of XML files with metadata putting it all together.
+
+Anyway, with a bit of searching, we quickly found the handy
+[bbb-render](https://github.com/plugorgau/bbb-render) tool, which can
+first
+[download](https://github.com/plugorgau/bbb-render/blob/master/download.py)
+all these files and then
+[assemble](https://github.com/plugorgau/bbb-render/blob/master/make-xges.py)
+them using the Python interface to the [GStreamer Editing Services
+(GES)](https://gstreamer.freedesktop.org/documentation/gst-editing-services/index.html).
+Good thing: we don’t have to figure out all these things; we “just” have
+to run these two scripts in an environment with the right dependencies.
+And guess what: we know of a great tool to control execution
+environments!
+
+# A “deployment-aware Makefile”
+
+So we have a process that takes input files—those PNGs, videos, and XML
+files—and produces output files—WebM video files. As developers we
+immediately recognize a pattern and the timeless tool to deal with it:
+[`make`](https://www.gnu.org/software/make). The web already seems to
+contain countless BBB post-processing makefiles (and shell scripts,
+too). We were about to add one more to the pile when we suddenly realized
+that we know of _another_ great tool to express such processes: Guix!
+Bonus: while a makefile would address just the tip of the
+iceberg—running bbb-render—Guix can also take care of the tedious task
+of deploying the _right_ environment to run bbb-render in.
+
+What we did was to write some sort of a _deployment-aware makefile_.
+It’s still a relatively unconventional way to use Guix, but one that’s
+very convenient. We’re talking about videos, but really, you could use
+the same approach for any kind of processing graph where you’d be
+tempted to just use `make`.
+
+The end result here is a [Guix
+file](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm)
+that returns a _manifest_—a list of videos to “build”. You can build
+the videos with:
+
+```
+guix build -m render-videos.scm
+```
+
+Overall, the file defines a bunch of functions (_procedures_ in
+traditional Scheme parlance), each of which takes input files and
+produces output files. More accurately, these functions return objects
+that _describe_ how to build their output from the input files—similar
+to how a [makefile
+rule](https://www.gnu.org/software/make/manual/html_node/Rule-Introduction.html)
+describes how to build its target(s) from its prerequisite(s). (The
+reader familiar with functional programming may recognize a monad here,
+and indeed, those build descriptions can be thought of as monadic values
+in a hypothetical “Guix build” monad; technically though, they’re
+regular Scheme values.)
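+
+To make the makefile analogy concrete, here is a toy example that is not
+taken from `render-videos.scm`: two “rules” written as procedures that
+return `computed-file` objects, the second one consuming the output of the
+first, much like a makefile target that depends on another target. The
+names `greeting`, `shout`, and `pipeline` are made up for illustration.
+Saved as, say, `toy.scm`, it should build with `guix build -f toy.scm`,
+which builds the file-like object the file evaluates to.
+
+```scheme
+(use-modules (guix gexp))
+
+;; Hypothetical "rule": produce a text file greeting NAME.
+(define (greeting name)
+  (computed-file "greeting.txt"
+                 #~(call-with-output-file #$output
+                     (lambda (port)
+                       (display (string-append "Hello, " #$name "!\n")
+                                port)))))
+
+;; Hypothetical "rule" whose prerequisite is INPUT, another file-like
+;; object: it produces an upper-case copy of INPUT.
+(define (shout input)
+  (computed-file "shout.txt"
+                 #~(begin
+                     (use-modules (ice-9 textual-ports))
+                     (call-with-output-file #$output
+                       (lambda (port)
+                         (display (string-upcase
+                                   (call-with-input-file #$input
+                                     get-string-all))
+                                  port))))))
+
+;; Nothing is built here: this merely describes the build graph.
+(define pipeline
+  (shout (greeting "Guix")))
+
+pipeline
+```
+
+As with `make`, changing `shout` does not cause `greeting.txt` to be
+rebuilt if it is already in the store.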
+
+Let’s take a guided tour of this 300-line file.
+
+# Rendering
+
+The [first
+step](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L23-75)
+in this file describes where bbb-render can be found and how to run it
+to produce a GES “project” file, which we’ll use later to render the
+video:
+
+```scheme
+(define bbb-render
+  (origin
+    (method git-fetch)
+    (uri (git-reference (url "https://github.com/plugorgau/bbb-render")
+                        (commit "a3c10518aedc1bd9e2b71a4af54903adf1d972e5")))
+    (file-name "bbb-render-checkout")
+    (sha256
+     (base32 "1sf99xp334aa0qgp99byvh8k39kc88al8l2wy77zx7fyvknxjy98"))))
+
+(define rendering-profile
+  (profile
+   (content (specifications->manifest
+             '("gstreamer" "gst-editing-services" "gobject-introspection"
+               "gst-plugins-base" "gst-plugins-good"
+               "python-wrapper" "python-pygobject" "python-intervaltree")))))
+
+(define* (video-ges-project bbb-data start end
+                            #:key (webcam-size 25))
+  "Return a GStreamer Editing Services (GES) project for the video,
+starting at START seconds and ending at END seconds. BBB-DATA is the raw
+BigBlueButton directory as fetched by bbb-render's 'download.py' script.
+WEBCAM-SIZE is the percentage of the screen occupied by the webcam."
+  (computed-file "video.ges"
+                 (with-extensions (list (specification->package
+                                         "guile-gcrypt"))
+                   (with-imported-modules (source-module-closure
+                                           '((guix build utils)
+                                             (guix profiles)))
+                     #~(begin
+                         (use-modules (guix build utils) (guix profiles)
+                                      (guix search-paths) (ice-9 match))
+
+                         (define search-paths
+                           (profile-search-paths #+rendering-profile))
+
+                         (for-each (match-lambda
+                                     ((spec . value)
+                                      (setenv
+                                       (search-path-specification-variable
+                                        spec)
+                                       value)))
+                                   search-paths)
+
+                         (invoke "python"
+                                 #+(file-append bbb-render "/make-xges.py")
+                                 #+bbb-data #$output
+                                 "--start" #$(number->string start)
+                                 "--end" #$(number->string end)
+                                 "--webcam-size"
+                                 #$(number->string webcam-size)))))))
+```
+
+First it defines the source code location of bbb-render as an
+[“origin”](https://guix.gnu.org/manual/en/html_node/origin-Reference.html).
+Second, it defines `rendering-profile` as a
+[“profile”](https://guix.gnu.org/manual/en/html_node/Getting-Started.html#index-profile)
+containing all the packages needed to run bbb-render’s `make-xges.py`
+script.
+
+Last, it defines `video-ges-project` as a function that takes the BBB
+raw data, a start and end time, and produces a `video.ges` file. There
+are three key elements here:
+
+  1. [`computed-file`](https://guix.gnu.org/manual/en/html_node/G_002dExpressions.html#index-computed_002dfile)
+     is a function to produce a file, `video.ges` in this case, by
+     running the code you give it as its second argument: the *recipe*,
+     in makefile terms.
+  2. The recipe passed to `computed-file` is a
+     [_G-expression_](https://guix.gnu.org/manual/en/html_node/G_002dExpressions.html)
+     (or “gexp”), introduced by this fancy `#~` (hash tilde) notation.
+     G-expressions are a way to _stage_ code, to mark it for eventual
+     execution. Indeed, that code will only be executed if and when we
+     run `guix build` (without `--dry-run`), and only if the result is
+     not already in [the
+     store](https://guix.gnu.org/manual/en/html_node/The-Store.html).
+  3. The gexp refers to `rendering-profile`, to `bbb-render`, to
+     `bbb-data` and so on by _escaping_ with the `#+` or `#$` syntax
+     (they’re equivalent, unless doing cross-compilation). During the
+     build, these escapes are replaced by the corresponding items in the
+     store, such as `/gnu/store/…-bbb-render`, which is itself the result
+     of “building” the origin we’ve seen above. The `#$output` reference
+     corresponds to the build result of this `computed-file`: the
+     complete file name of `video.ges` under `/gnu/store`.
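+
+As a smaller illustration of these three elements, here is a sketch, again
+not from `render-videos.scm`, that runs GNU Hello (the `hello` package from
+`(gnu packages base)`) and stores what it prints. The `#+` escape refers to
+the program as a native input, since it runs on the build machine, and
+`#$output` is where the result must be written. The name `hello-output` is
+made up for illustration.
+
+```scheme
+(use-modules (guix gexp)
+             (gnu packages base))             ;for 'hello'
+
+(define hello-output
+  (computed-file "hello-output.txt"
+                 #~(begin
+                     (use-modules (ice-9 popen) (ice-9 textual-ports))
+                     ;; #+ expands to the store file name of 'hello', such
+                     ;; as /gnu/store/…-hello-…/bin/hello, built if needed.
+                     (let* ((pipe (open-pipe* OPEN_READ
+                                              #+(file-append hello
+                                                             "/bin/hello")))
+                            (text (get-string-all pipe)))
+                       (close-pipe pipe)
+                       ;; #$output is the store file name of this very file.
+                       (call-with-output-file #$output
+                         (lambda (port)
+                           (display text port)))))))
+
+hello-output
+```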
+
+Wow, that’s quite a lot already! Of course, this real-world example is
+more intimidating than the toy examples you’d find in the manual, but
+really, pretty much everything’s there. Let’s look in more detail at
+what’s inside this gexp.
+
+The gexp first imports a bunch of helper modules with [build
+utilities](https://guix.gnu.org/manual/en/html_node/Build-Utilities.html)
+and tools to manipulate profiles and search path environment variables.
+The `for-each` call iterates over the search path environment
+variables (`PATH`, `PYTHONPATH`, and so on) and sets each of them, so
+that the `python` command and the needed Python modules are found.
+
+The `with-imported-modules` form above indicates that the `(guix build
+utils)` and `(guix profiles)` modules, which are part of Guix, along
+with their dependencies (their _closure_), need to be imported in the
+build environment. What about `with-extensions`? Those `(guix …)`
+modules indirectly depend on additional modules, provided by the
+`guile-gcrypt` package, hence this spec.
+
+Next comes the
+[`ges->webm`](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L77-106)
+function which, as the name implies, takes a `.ges` file and produces a
+WebM video file by invoking `ges-launch-1.0`. The end result is a video
+containing the recording’s audio, the webcam and screen share (or slide
+deck), but not the chat.
+
+# Opening and closing
+
+We have a WebM video, so we’re pretty much done, right? But… we’d also
+like to have an opening, showing the talk title and the speaker’s name,
+as well as a closing. How do we get that done?
+
+Perhaps it’s a bit of a sledgehammer, but we chose to
+produce those still images with LaTeX/Beamer, from
+[these](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/opening.tex)
+[templates](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/closing.tex).
+
+Again, we need several processing steps:
+
+  1. We first define the
+     [`latex->pdf`](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L140-166)
+     function that takes a template `.tex` file, a speaker name and
+     title. It copies the template, replaces placeholders with the
+     speaker name and title, and runs `pdflatex` to produce the PDF.
+  2. The
+     [`pdf->bitmap`](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L168-175)
+     function takes a PDF and returns a suitably-sized JPEG.
+  3. [`image->webm`](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L177-200)
+     takes that JPEG and invokes `ffmpeg` to render it as WebM, with the
+     right resolution, frame rate, and audio track.
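+
+The first of these steps, filling in the template, might look roughly like
+the following sketch. The `@TITLE@` and `@SPEAKER@` placeholders and the
+`fill-template` name are assumptions made for illustration; the actual
+templates and `latex->pdf` may well differ, and the `pdflatex` invocation
+is left out. It relies on `substitute*` and `make-file-writable` from
+`(guix build utils)`.
+
+```scheme
+(use-modules (guix gexp))
+
+;; Hypothetical helper: return a copy of TEMPLATE, a file-like object, with
+;; the (assumed) @TITLE@ and @SPEAKER@ placeholders replaced.
+(define* (fill-template template name #:key title speaker)
+  (computed-file name
+                 (with-imported-modules '((guix build utils))
+                   #~(begin
+                       (use-modules (guix build utils))
+                       ;; Work on a copy in the writable build directory:
+                       ;; files copied from the store are read-only.
+                       (copy-file #$template "doc.tex")
+                       (make-file-writable "doc.tex")
+                       (substitute* "doc.tex"
+                         (("@TITLE@") #$title)
+                         (("@SPEAKER@") #$speaker))
+                       (copy-file "doc.tex" #$output)))))
+
+;; Placeholder template and values, for illustration only.
+(define filled
+  (fill-template (plain-file "opening.tex"
+                             "\\title{@TITLE@}\n\\author{@SPEAKER@}\n")
+                 "opening-filled.tex"
+                 #:title "A hypothetical talk"
+                 #:speaker "Jane Doe"))
+
+filled
+```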
+
+With that in place, we define a sweet and small function that produces
+the opening WebM file for a given talk:
+
+```scheme
+(define (opening title speaker)
+  (image->webm
+   (pdf->bitmap (latex->pdf (local-file "opening.tex") "opening.pdf"
+                            #:title title #:speaker speaker)
+                "opening.jpg")
+   "opening.webm" #:duration 5))
+```
+
+We need one last function,
+[`video-with-opening/closing`](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L216-236),
+which, given a talk, an opening, and a closing, concatenates them by
+invoking `ffmpeg`.
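+
+For the curious, here is one plausible way to concatenate videos with
+`ffmpeg` inside a `computed-file`, using its concat demuxer with stream
+copying. It is a sketch, not the actual `video-with-opening/closing`: the
+`concatenate-videos` name is made up, and `-c copy` only works if all the
+inputs share the same codecs, resolution, and audio parameters; otherwise
+re-encoding, for instance with ffmpeg’s `concat` filter, would be needed.
+
+```scheme
+(use-modules (guix gexp)
+             (gnu packages video))            ;for 'ffmpeg'
+
+;; Hypothetical helper: return a video called NAME made of VIDEOS, a list
+;; of file-like objects, played one after the other.
+(define (concatenate-videos name . videos)
+  (computed-file name
+                 (with-imported-modules '((guix build utils))
+                   #~(begin
+                       (use-modules (guix build utils))
+                       ;; The concat demuxer reads a list of "file '…'" lines.
+                       (call-with-output-file "inputs.txt"
+                         (lambda (port)
+                           (for-each (lambda (file)
+                                       (display (string-append "file '" file
+                                                               "'\n")
+                                                port))
+                                     '#$videos)))
+                       ;; '-safe 0' allows absolute file names such as
+                       ;; /gnu/store/…; '-c copy' avoids re-encoding.
+                       (invoke #+(file-append ffmpeg "/bin/ffmpeg")
+                               "-f" "concat" "-safe" "0" "-i" "inputs.txt"
+                               "-c" "copy" #$output)))))
+```
+
+The quote in `'#$videos` matters: the gexp substitutes the list of store
+file names, which must then be treated as data rather than as a call.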
+
+# Putting it all together
+
+Now we have all the building blocks!
+
+We use
+[`local-file`](https://guix.gnu.org/manual/en/html_node/G_002dExpressions.html#index-local_002dfile)
+to refer to the raw BBB data, taken from disk:
+
+```scheme
+(define raw-bbb-data/monday
+  ;; The raw BigBlueButton data as returned by './download.py URL', where
+  ;; 'download.py' is part of bbb-render.
+  (local-file "bbb-video-data.monday" "bbb-video-data"
+              #:recursive? #t))
+
+(define raw-bbb-data/tuesday
+  (local-file "bbb-video-data.tuesday" "bbb-video-data"
+              #:recursive? #t))
+```
+
+No, the raw data is not in the Git repository (it’s too big and contains
+personally-identifying information about participants), so this assumes
+that there’s a `bbb-video-data.monday` and a `bbb-video-data.tuesday` in
+the same directory as `render-videos.scm`.
+
+For good measure, we define a
+[`<talk>`](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L243-251)
+data type:
+
+```scheme
+(define-record-type <talk>
+  (talk title speaker start end cam-size data)
+  talk?
+  (title    talk-title)
+  (speaker  talk-speaker)
+  (start    talk-start)                 ;start time in seconds
+  (end      talk-end)                   ;end time
+  (cam-size talk-webcam-size)           ;percentage used for the webcam
+  (data     talk-bbb-data))             ;BigBlueButton data
+```
+
+… such that we can easily [define
+talks](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L263-288),
+along with
+[`talk->video`](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L297-311),
+which takes a talk and returns a complete, final video:
+
+```scheme
+(define (talk->video talk)
+  "Given a talk, return a complete video, with opening and closing."
+  (define file-name
+    (string-append (canonicalize-string (talk-speaker talk))
+                   ".webm"))
+
+  (let ((raw     (ges->webm (video-ges-project (talk-bbb-data talk)
+                                               (talk-start talk)
+                                               (talk-end talk)
+                                               #:webcam-size
+                                               (talk-webcam-size talk))
+                            file-name))
+        (opening (opening (talk-title talk) (talk-speaker talk))))
+    (video-with-opening/closing file-name raw
+                                opening closing.webm)))
+```
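+
+`canonicalize-string` is defined elsewhere in `render-videos.scm` and is
+not shown here. Judging from the file names in the build output, it turns
+a speaker name into a lower-case, hyphenated string without accents. A
+possible stand-alone version, written here as a hypothetical sketch using
+only Guile’s Unicode support, could be:
+
+```scheme
+(use-modules (srfi srfi-1))                   ;for 'filter'
+
+;; Hypothetical sketch of 'canonicalize-string'; the actual definition in
+;; render-videos.scm may differ.
+(define (canonicalize-string str)
+  "Return STR in lower case, with accents removed and spaces replaced by
+hyphens, as in \"Élise Müller\" => \"elise-muller\"."
+  (list->string
+   (map (lambda (chr)
+          (if (char=? chr #\space) #\- chr))
+        ;; NFD splits "é" into "e" followed by a combining accent, whose
+        ;; Unicode general category is Mn ("nonspacing mark"): drop those.
+        (filter (lambda (chr)
+                  (not (eq? (char-general-category chr) 'Mn)))
+                (string->list
+                 (string-downcase (string-normalize-nfd str)))))))
+
+(display (canonicalize-string "Élise Müller"))  ;prints elise-muller
+(newline)
+```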
+
+The [very last
+bit](https://gitlab.inria.fr/guix-hpc/website/-/blob/6977da4618814c790e767618da5cf9ec2cab0742/doc/atelier-reproductibilit%C3%A9/render-videos.scm#L313-319)
+iterates over the talks and returns a manifest containing all the final
+videos. Now we can build the ready-to-be-published videos, all at once:
+
+```
+$ guix build -m render-videos.scm
+[… time passes…]
+/gnu/store/…-emmanuel-agullo.webm
+/gnu/store/…-francois-rue.webm
+…
+```
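+
+The code that turns the talks into a manifest is linked above rather than
+shown. A minimal sketch of the idea, with made-up names and `plain-file`
+placeholders standing in for the videos returned by `talk->video`, wraps
+each file-like object in a `manifest-entry`; this is not necessarily how
+`render-videos.scm` does it.
+
+```scheme
+(use-modules (guix gexp) (guix profiles) (ice-9 match))
+
+;; Hypothetical stand-ins for the (NAME . VIDEO) pairs one would get by
+;; mapping 'talk->video' over the list of talks.
+(define videos
+  (list (cons "first-speaker"
+              (plain-file "first-speaker.webm" "placeholder"))
+        (cons "second-speaker"
+              (plain-file "second-speaker.webm" "placeholder"))))
+
+(define (videos->manifest videos)
+  "Return a manifest with one entry per (NAME . FILE-LIKE) pair in VIDEOS."
+  (manifest
+   (map (match-lambda
+          ((label . video)
+           (manifest-entry
+             (name label)
+             (version "0")                    ;required field, unused here
+             (item video))))
+        videos)))
+
+;; 'guix build -m' builds every item of the manifest this file returns.
+(videos->manifest videos)
+```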
+
+[Voilà !](https://hpc.guix.info/events/2021/atelier-reproductibilité-environnements/)
+
+![Image of an old TV screen showing a video opening.](/static/blog/img/2021-video-tv-screen.png)
+
+# Why all the fuss?
+
+OK, maybe you’re thinking “this is just another hackish script to fiddle
+with videos”, and that’s right! But look, this one’s different: it’s
+self-contained, it’s reproducible, and it has the right abstraction
+level. Self-contained is a big thing; it means you can run it and it
+knows what software to deploy, what environment variables to set, and so
+on, for each step of the pipeline. Granted, it could be simplified with
+appropriate high-level interfaces in Guix. But remember: the
+alternative is a makefile (“deployment-unaware”) supplemented by a `README`
+file giving a vague idea of the dependencies needed. The reproducible
+bit is pretty nice too (especially for a workshop _on_ reproducibility).
+It also means there’s caching: videos or intermediate byproducts already
+in the store don’t need to be recomputed. Last, we have access to a
+general-purpose programming language where we can _build abstractions_,
+such as the `<talk>` data type, that make the whole thing more pleasant
+to work with and more maintainable.
+
+Hopefully that’ll inspire you to have a reproducible video pipeline for
+your next on-line event, or maybe that’ll inspire you to replace your
+old makefile and shelly habits!
diff --git a/website/static/blog/img/2021-video-tv-screen.png b/website/static/blog/img/2021-video-tv-screen.png
new file mode 100644
index 0000000..affd637
Binary files /dev/null and b/website/static/blog/img/2021-video-tv-screen.png differ