[PATCH] parser handling of \^A
From: Grisha Levit
Subject: [PATCH] parser handling of \^A
Date: Thu, 12 Oct 2023 21:36:48 -0700
There are some issues with parser output when the input has an unquoted
backslash followed by a raw ^A character:
$ bash -c $'echo ${_+\\\1}' |& cat -v
bash: line 1: bad substitution: no closing `}' in ${_+\^A^A}
$ bash -c $'[[ \1 =~ (\\\1) ]]' |& cat -v
bash: line 1: [[: invalid regular expression `(^A\)': parentheses not balanced
$ bash -c $'echo $\'\\\1\'' | cat -v
\^A^A
$ bash -c $'echo "\\\1\177"' | cat -v
\^A^A^?
The main loop in read_token_word usually ^A-escapes ^A and ^?, but not
when they are escaped by a backslash -- the char pairs \^A and \^? are
stored as is. OTOH, the loop in parse_matched_pair special-cases \^A,
outputting \^A^A.
However, when expand_word_internal subsequently encounters this \^A^A,
the backslash escapes the first ^A, and the second ^A escapes whatever
character happens to follow.
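To make the pairing problem concrete, here is a toy model in C. It is only a sketch and not bash's expand_word_internal: classify() and check_classify() are hypothetical names, and the only assumption taken from the description above is that both a backslash and a ^A (CTLESC) protect the single byte that follows them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CTLESC '\001'

/* Hypothetical toy model, not bash code.  For each byte of s, record its
   role in roles[]:
     'e'  the byte is acting as an escape for the next byte
     'B'  the byte is protected by a preceding backslash
     'C'  the byte is protected by a preceding CTLESC
     '-'  ordinary byte
   roles must have room for strlen(s) + 1 bytes. */
void classify(const char *s, char *roles)
{
    size_t i = 0;

    while (s[i]) {
        if ((s[i] == '\\' || s[i] == CTLESC) && s[i + 1]) {
            roles[i] = 'e';
            roles[i + 1] = (s[i] == '\\') ? 'B' : 'C';
            i += 2;
        } else {
            roles[i] = '-';
            i++;
        }
    }
    roles[i] = '\0';
}

/* Compare the two parser outputs discussed above, each followed by 'x'. */
void check_classify(void)
{
    char roles[8];

    /* \^A^A x : the second ^A ends up escaping the unrelated 'x'. */
    classify("\\\001\001x", roles);
    assert(strcmp(roles, "eBeC") == 0);

    /* \^A x : stored as is, 'x' stays an ordinary byte. */
    classify("\\\001x", roles);
    assert(strcmp(roles, "eB-") == 0);
}
```

In this model the extra ^A never becomes payload; it is consumed as an escape for the next byte, which matches the misparsed output in the examples above.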
AFAICT, everything works fine if parse_matched_pair just stores \^A as
is (as long as dequote_string doesn't drop trailing ^A's). This seems
a lot easier than the alternative of teaching the subst.c functions to
handle \^A^A and \^A^? specially but maybe there's some other approach.
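The dequote_string part of the change can be sketched in isolation. The two functions below are simplified single-byte versions of the old and new loops from the subst.c hunk; they skip the multibyte handling that COPY_CHAR_P provides and the strchr shortcut, and the names dequote_old, dequote_new and check_dequote are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define CTLESC '\001'

/* Simplified model of the loop being removed: a CTLESC at the very end
   of the string is dropped, so a lone CTLESC needs its own special case.
   t must have room for strlen(s) + 1 bytes. */
void dequote_old(const char *s, char *t)
{
    if (s[0] == CTLESC && s[1] == '\0') {
        t[0] = CTLESC;
        t[1] = '\0';
        return;
    }
    while (*s) {
        if (*s == CTLESC) {
            s++;
            if (*s == '\0')
                break;          /* trailing CTLESC is lost here */
        }
        *t++ = *s++;
    }
    *t = '\0';
}

/* Simplified model of the replacement loop: CTLESC is skipped only when
   another byte follows it, so a trailing CTLESC is copied through. */
void dequote_new(const char *s, char *t)
{
    while (*s) {
        if (*s == CTLESC && s[1])
            s++;
        *t++ = *s++;
    }
    *t = '\0';
}

void check_dequote(void)
{
    char out[8];

    /* \^A : the old loop drops the ^A, the new one keeps it. */
    dequote_old("\\\001", out);
    assert(strcmp(out, "\\") == 0);
    dequote_new("\\\001", out);
    assert(strcmp(out, "\\\001") == 0);

    /* A lone ^A survives in both; the new loop needs no special case. */
    dequote_old("\001", out);
    assert(strcmp(out, "\001") == 0);
    dequote_new("\001", out);
    assert(strcmp(out, "\001") == 0);

    /* ^A^A still dequotes to a single ^A. */
    dequote_new("\001\001", out);
    assert(strcmp(out, "\001") == 0);
}
```

Under these simplifications the only behavioral difference is a string ending in an unpaired ^A, which is the case that arises once parse_matched_pair stores \^A as is.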
---
diff --git a/parse.y b/parse.y
index 3e5b814f..dd35ea76 100644
--- a/parse.y
+++ b/parse.y
@@ -3834,9 +3834,7 @@ parse_matched_pair (int qc, int open, int close, size_t *lenp, int flags)
continue;
}
- RESIZE_MALLOCED_BUFFER (ret, retind, 2, retsize, 64);
- if MBTEST(ch == CTLESC)
- ret[retind++] = CTLESC;
+ RESIZE_MALLOCED_BUFFER (ret, retind, 1, retsize, 64);
ret[retind++] = ch;
continue;
}
diff --git a/subst.c b/subst.c
index 89ec6eb7..f075380c 100644
--- a/subst.c
+++ b/subst.c
@@ -4810,14 +4810,6 @@ dequote_string (const char *string)
return (result);
}
- /* A string consisting of only a single CTLESC should pass through unchanged */
- if (string[0] == CTLESC && string[1] == 0)
- {
- result[0] = CTLESC;
- result[1] = '\0';
- return (result);
- }
-
/* If no character in the string can be quoted, don't bother examining
each character. Just return a copy of the string passed to us. */
if (strchr (string, CTLESC) == NULL)
@@ -4827,12 +4819,8 @@ dequote_string (const char *string)
s = (char *)string;
while (*s)
{
- if (*s == CTLESC)
- {
- s++;
- if (*s == '\0')
- break;
- }
+ if (*s == CTLESC && s[1])
+ s++;
COPY_CHAR_P (t, s, send);
}