[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] ob-shell-test, test-ob-shell and introduction
From: |
Matt |
Subject: |
Re: [PATCH] ob-shell-test, test-ob-shell and introduction |
Date: |
Sat, 01 Jan 2022 23:32:00 -0500 |
User-agent: |
Zoho Mail |
Apologies for the book. I've been sitting on this stuff for two months and am
wondering how to proceed.
IANAL but AFAIK/CT, my contract contains an exception for making contributions
to projects like Org. I've gotten confirmation from my manager and by HR.
However, until the CEO signs the FSF disclaimer, I can't officially contribute.
I'm confident that I can publish changes (e.g. to a personal website); the FSF
just can't accept my changes (yet).
I could start working on ob-shell.el separately now and publish that myself.
It's not clear how I would then contribute those changes back once I'm cleared
by the FSF. I'm inclined towards some refactoring and I'm not sure how I could
break that down in such a way that if it took two more months to get the
copyright stuff worked out that anyone could make sense of the changes. I
would much prefer to gradually submit patches, discuss them, and then
eventually have them merged in turn. If I have a heap of changes elsewhere, I
feel like it would be harder to have a conversion about them.
Regardless, I think I should define test cases first. If those are considered
valid, then any refactoring would be moot if they pass, right? With (agreed
upon) test cases, it shouldn't matter if I refactor as long as functionality
remains the same?
Overall, what advice do you have?
It looks to me like ob-shell.el could use some love in other respects besides
async evaluation. I've detailed where below, partly for my own organization
and partly for posterity, but mainly because this isn't my house, so to speak,
and I don't want to barge in and start rearranging the furniture and eating
whatever's in the fridge. Or, is it like Worg in that once I have the keys I
can drive where I like, so long as there're no crashes?
I'm interested in people's thoughts on these notes on ob-shell.el:
* Tests
There are several code paths, like shebang, cmdline, and basic execution, which
don't always have something to do with one another in the code. Having tests
would be really helpful to make sure everything jives. Before doing anything
with the code base, I intend to create tests for all the functionality.
* 2D-array
I documented two options for the =:var= header[fn:1]. The ob-shell.el code
actually defines a third option for 2D-arrays. I couldn't get it to work. It
was one of several things not documented anywhere, at least as far as I could
find, and which I had to figure out straight from the code. Between not being
great at shell scripting and having a hard time deciphering that ob-shell.el
code, I'm not sure 2D-arrays are actually fully or correctly implemented.
* M-up behavior <<M-up>>
The =org-babel-load-session:shell= code path only works when M-up is used on a
code block[fn:2]. Is M-up actually documented anywhere? Furthermore, the
=org-babel-load-session:shell= only works for the "shell" language, which is
not actually a "proper" shell (i.e. it's not listed in
=org-babel-shell-names=). The M-up defaults to using $ESHELL or shell-file-name
through the =shell= command.
For example, try calling M-up on these:
#+comment: (opaquely) calls the system shell
#+begin_src shell :session my-shell
echo "hello, world!" #+end_src
#+comment: fails
#+begin_src sh :session my-sh
echo "hello, world!" #+end_src
#+comment: fails
#+begin_src bash :session my-bash
echo "hello, world!" #+end_src
To fix this, there needs to be an =org-babel-load-session:<lang>= for each
language in =org-babel-shell-names=. This would probably make the most sense
in =org-babel-shell-initialize=. However, that function
[[org-babel-shell-initialize][needs attention]].
* Refactoring <<refactoring>>
The ob-shell.el code appears inconsistent to me. Clearly, this is somewhat
subjective. I've tried to give a rationale for each point to make it less so.
My goal is to be maintainer of ob-shell.el, but that's not forever. These
things were an obstacle for me and my aim is to remove them for the next person.
** =org-babel-shell-initialize= <<org-babel-shell-initialize>>
As alluded to elsewhere, =org-babel-shell-initialize= doesn't appear to adhere
to the (elisp) Coding Conventions,
#+begin_quote
• Constructs that define a function or variable should be macros, not
functions, and their names should start with ‘define-’. The macro should
receive the name to be defined as the first argument. That will help various
tools find the definition automatically. Avoid constructing the names in the
macro itself, since that would confuse these tools. #+end_quote
The =org-babel-shell-initialize= function defines =org-babel-execute:<lang>=,
=org-babel-variable-assignments:<lang>=, and
=org-babel-default-header-args:<lang>= for the "languages" in
=org-babel-shell-names=. As it stands, that =org-babel-shell-initialize= is a
function does no harm (aside from being confusing by way of straying from
convention). However, if the [[M-up][M-up]] issue is to be resolved, it seems
to me that =org-babel-shell-initialize= should be updated to match the
convention (i.e. be a macro).
** Organization
Definitions are introduced in different orders. For example, the
=org-babel-shell-initialize= function whose sole purpose is to work with
=org-babel-shell-names= is defined before the reader knows what
=org-babel-shell-names= is. Later, this pattern of defining the component
pieces after they're used is reversed. For example,
=org-babel-load-session:shell= relies on =org-babel-prep-session:shell= which
is defined first. I find defining terms before they're used makes a document
more easy to comprehend than otherwise. I want to reorder the definitions.
Similarly, some functions could use breaking out. The most important is
probably =org-babel-sh-evaluate= which handles the various header arguments.
There are various paths of execution each requiring separate logic, yet all
live in the one large function. Breaking these out would allow them to have
separate docstrings and, I expect, be easier to understand since the logic of
one would be (lexically) separated from the rest.
Other functionality might be better served by consolidating functions. There's
a lot of fiddly code dedicated to variable assignment. Actually, much of the
ob-shell.el file is related to variable assignment. The assignments are done
using separate functions, yet all apply to the same task. They'll never be used
for anything else. Do they need to be split out? Is there a technical reason?
I don't see one. Does it help comprehension? When split out as they are, I
found it hard to make sense of the context they're used in since they're
defined apart from the logic that uses them (i.e. what header uses them, what
form does the header take, etc.). I think it's worth seeing if there's a
better way to present that part of the code base.
** Naming
The following apply to all shells, not just "sh" and should be updated to be
"shell". This looks like cruft from when ob-shell.el was called ob-sh.el AFAICT.
- =org-babel-sh-evaluate=
- =org-babel-sh-eoe-indicator=
- =org-babel-sh-eoe-output=
- =org-babel--variable-assignments:sh-generic=
- =org-babel-sh-var-to-sh=
- =org-babel-sh-var-to-string=
- =org-babel-sh-initiate-session=
- =org-babel-sh-evaluate=
- =org-babel-sh-strip-weird-long-prompt=
Generally speaking, I find the Org Babel code base tricky to read (especially
at first). I spent a good deal of time untangling what lived where and who did
what. I can play along fine now that I'm familiar. However, since understanding
took longer than I think was necessary, I want to detail the pain points as
they have made contributing to Babel harder.
Overall, Babel somewhat breaks the (elisp) Coding Conventions for naming,
#+begin_quote
You should choose a short word to distinguish your program from other Lisp
programs. #+end_quote
I understand the variable/function name prefix to be the file name, typically.
The file name is often the package name, or more precisely the feature provided
by the file[fn:3]. For Org Babel, there's not a solid file-to-prefix relation.
We say "Org Babel", but the main functionality is in ob-core and the various
"ob-" files either extend or implement implied behavior (e.g.
=org-babel-<lang>-execute=). Is the "program" ob-core, ob-lang, or the whole
suite of files? This is a subjective question which the Org Babel "program"
answers with, "the whole suite of files". All components, across all "ob-"
files, bear the name "org-babel-". This is still something that trips me up: is
the current symbol core or not? Who is responsible for what?
I would expect the core API to have its own prefix. The extensions would then
define their code and have a different prefix, "ob-<lang>-". This way,
readers/contributors could open the pertinent ob-* file, see the expected
symbol prefix (e.g. "ob-shell-") and another prefix (e.g. "org-babel-") and be
able to distinguish which is which. As it stands, ob-core.el could be renamed
to org-babel.el or the "org-babel-" prefix could be changed to "ob-core-".
Another possible solution, or a stopgap, would be to have a document detailing
the Org Babel API[fn:4].
* Process interaction
Emacs has several different ways of interacting with processes. The ob-shell.el
code uses a few of them. Since async is another way to interact with a process,
a single process pattern could be used. The goal would be to make each of the
different functionalities provided by ob-shell.el have a similar
implementation. The expectation is that this would benefit maintenance.
* Footnotes
[fn:1]
https://orgmode.org/worg/org-contrib/babel/languages/ob-doc-shell.html#orgfa6b7c5
[fn:2] M-up is bound to =org-metaup-hook= and
=ob-core:org-babel-load-in-session= by default.
[fn:3] It's not clear to me if there's a technical definition for an
Emacs package.
[fn:4] I may extend my personal notes into a document detailing the
Org API. http://excalamus.com/2021-11-03-org_babel.html
- Re: [PATCH] ob-shell-test, test-ob-shell and introduction,
Matt <=