bug-bash

Extending Bash brace expansion


From: McGuinness, Brian
Subject: Extending Bash brace expansion
Date: Mon, 09 Feb 2015 23:14:50 -0500

Extended Brace Expansions

The enclosed code implements an extended version of the Bash brace expansion.
The main objectives are:

* Allow plain text and sequences to be used together in a brace expression,
  e.g. file.{7,14,22..30,56,80..84}

* Generalize sequences so they are not limited to integers and single letters.

Unix utilities such as split generate files with multi-letter suffixes. It would be useful to be able to represent these with brace expansions. Moreover, there are many other useful applications of extended sequences. Extending sequences to arbitrary strings is not difficult.

Incrementing Arbitrary Strings

Consider the process of incrementing an integer. The last digit is incremented until it reaches 9. At that point it wraps around to 0 and we carry the 1: if there is only one digit in the integer, we prefix a 1; otherwise we increment the next digit to the left, following the same procedure until there are no more
carries left to perform.

Letters of the alphabet have a definite sequence just as digits do, so we can increment them in much the same way: we increment the last letter until we reach "z", then we wrap around to "a" and carry the 1. The difference is that letters have no equivalent of zero. When a carry requires prefixing a new character, we prefix the first letter ("a"), so "z" is followed by "aa". With digits we skip the first digit (0) and prefix the second (1), since leading zeros have no value, so "9" is followed by "10".
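To make the letter rule concrete, here is a minimal sketch in Python. It is not the enclosed C code; the function name increment_letters is made up for illustration, and it assumes the string contains only lowercase ASCII letters.

```python
def increment_letters(s):
    """Increment a string of lowercase ASCII letters, e.g. 'az' -> 'ba'."""
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            # No wrap needed: bump this letter and stop.
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        # Wrap around to "a" and carry into the next column to the left.
        chars[i] = "a"
        i -= 1
    # The carry ran off the left edge: prefix the first letter, "a".
    return "a" + "".join(chars)
```

With this sketch, increment_letters("z") gives "aa" and increment_letters("az") gives "ba", which is the behaviour split-style suffixes need.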

I choose to distinguish between uppercase and lowercase letters, since brace expansions are often used to generate file names and Unix file names are case-sensitive. Apart from that, uppercase letters are incremented in the same way as lowercase letters.

Characters other than letters and digits have no unique ordering, so we don't increment them. Instead, we leave them unchanged and always carry the 1. If we need to prefix a character, we just copy the current character. So "-" would be followed by "--", and so on.

When we expand a range, the lower and upper bounds are specified. Normally, the upper bound will be a string as long as, or longer than, the lower bound. So when we need to perform a carry by prefixing a character, we determine what position it is in (how many columns from the right it is), and then look at the upper bound and see what character it has in that position. If the upper bound has a digit in that position we prefix a 1, if the upper bound has a letter in that position we prefix an "a", and if neither case is true we copy the character from the upper bound. But if the upper bound has no character in that position, we look at the leftmost character in the current value instead.

For example, suppose that we want to expand {8..b-5}. We generate "8" and "9", and then we have to carry. Since there is only one character in our current value, we have to prefix a character, which will be placed in the second column from the right. Looking at the upper bound, we see that it has a "-" in that position. This is neither a digit nor a letter, so we copy it and continue on, generating "-0", "-1", "-2", "-3", "-4", "-5", "-6", "-7", "-8", and "-9". Now we have to prefix another character, this time in the third column from the right. The upper bound has a "b" in this position, which is a letter, so we prefix an "a". We generate "a-0", "a-1", "a-2", "a-3", "a-4", "a-5", "a-6", "a-7", "a-8", and "a-9". This time, when we carry we can increment the leftmost character. So we generate "b-0", "b-1", "b-2", "b-3", "b-4", and "b-5". Now
we've hit the upper bound, so we're finished.
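The carry rules above can be sketched in Python as follows. This is only an illustration, not the enclosed C code: the names alphabet_of, carry_prefix, increment and expand_range are hypothetical, the sketch assumes ASCII input, it assumes an uppercase letter in the upper bound leads to prefixing "A", and it stops on simple equality with the upper bound (with an arbitrary safety cap) rather than using the comparison described in the next section.

```python
import string

# The three ordered alphabets; any other character is never incremented.
ALPHABETS = (string.digits, string.ascii_lowercase, string.ascii_uppercase)


def alphabet_of(ch):
    """Return the alphabet containing ch, or None for other characters."""
    for alphabet in ALPHABETS:
        if ch in alphabet:
            return alphabet
    return None


def carry_prefix(current, upper, column):
    """Choose the character to prefix at `column` (1-based, from the right)."""
    # Look at the upper bound first; fall back to the leftmost current char.
    model = upper[-column] if column <= len(upper) else current[0]
    alphabet = alphabet_of(model)
    if alphabet is None:
        return model        # neither digit nor letter: copy it
    if alphabet == string.digits:
        return "1"          # skip the leading zero
    return alphabet[0]      # "a" (or "A": an assumption for uppercase)


def increment(current, upper):
    """Return the value that follows `current` in a range ending at `upper`."""
    chars = list(current)
    for i in range(len(chars) - 1, -1, -1):
        alphabet = alphabet_of(chars[i])
        if alphabet is None:
            continue                    # unordered character: always carry
        if chars[i] != alphabet[-1]:
            chars[i] = alphabet[alphabet.index(chars[i]) + 1]
            return "".join(chars)
        chars[i] = alphabet[0]          # wrap around and carry
    return carry_prefix(current, upper, len(chars) + 1) + "".join(chars)


def expand_range(lower, upper, limit=1000):
    """Generate lower..upper; `limit` is a safety cap, not part of the design."""
    values = [lower]
    while values[-1] != upper and len(values) < limit:
        values.append(increment(values[-1], upper))
    return values


print(" ".join(expand_range("8", "b-5")))
```

Running it prints the same 28 values as the worked example, from "8" through "b-5".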

Comparing Strings

Leading zeros have no value. So when we compare two strings we start out by scanning past any leading zeros, then compare the remaining strings. That way, numeric sequences will work as expected, e.g. "00" is less than "9" even though
"00" is a longer string.

Once leading zeros have been skipped, if one string is longer than the other it is deemed to be greater. If the two strings are of equal length, we compare them lexically, e.g. via strcmp(), strcoll(), or a similar function.
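A minimal Python sketch of this comparison, for illustration only. The function name compare is hypothetical, and Python's built-in string ordering (by code point) stands in for strcmp(); a locale-aware comparison like strcoll() could give a different order.

```python
def compare(a, b):
    """Return -1, 0 or 1 as a is less than, equal to or greater than b."""
    # Leading zeros have no value, so scan past them first.
    a = a.lstrip("0")
    b = b.lstrip("0")
    # A longer string is deemed greater.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    # Equal length: fall back to a lexical comparison.
    return (a > b) - (a < b)
```

Here compare("00", "9") is -1 and compare("10", "9") is 1, so numeric sequences order as expected, while compare("af", "q") is 1 because "af" is the longer string.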

Examples

{0.1..2.3} generates the sequence

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3

{q..af} generates the sequence

q r s t u v w x y z aa ab ac ad ae af

/usr/{ucb/{ex,edit},lib/{ex?.?*,how_ex}} still generates the expected result:

/usr/ucb/ex /usr/ucb/edit /usr/lib/ex?.?* /usr/lib/how_ex

data.{4,7..10,22,23,35..37} generates the sequence

data.4 data.7 data.8 data.9 data.10 data.22 data.23 data.35 data.36 data.37
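The mixing of plain text and sequences in one brace expression can be sketched in Python like this. It is a deliberately reduced illustration, not the enclosed code: the names expand_items and expand are hypothetical, only ascending non-negative integer sequences are handled, and nested braces are not supported.

```python
def expand_items(body):
    """Expand the comma-separated body of a single brace expression."""
    results = []
    for item in body.split(","):
        low, sep, high = item.partition("..")
        if sep and low.isdigit() and high.isdigit():
            # A sequence such as 7..10: emit every integer in the range.
            results.extend(str(n) for n in range(int(low), int(high) + 1))
        else:
            # Plain text: keep it as it is.
            results.append(item)
    return results


def expand(prefix, body):
    """Attach the text before the brace to every expanded item."""
    return [prefix + item for item in expand_items(body)]


print(" ".join(expand("data.", "4,7..10,22,23,35..37")))
```

This prints the same ten names as the data.{4,7..10,22,23,35..37} example above.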

--- Brian

Attachment: brace_expander.c
Description: Text Data

Attachment: brace_expander.h
Description: Text Data

Attachment: build
Description: Binary data

Attachment: errors.c
Description: Text Data

Attachment: errors.h
Description: Text Data

Attachment: expansion_data.c
Description: Text Data

Attachment: expansion_data.h
Description: Text Data

Attachment: run_test
Description: Binary data

Attachment: stack.c
Description: Text Data

Attachment: stack.h
Description: Text Data

Attachment: string_list.c
Description: Text Data

Attachment: string_list.h
Description: Text Data

Attachment: strings.c
Description: Text Data

Attachment: strings.h
Description: Text Data

Attachment: symbols.h
Description: Text Data

Attachment: test.c
Description: Text Data

Attachment: types.h
Description: Text Data

