coreutils

Re: [PATCH 4/4] cut: Optionally treat multiple consecutive delimiters as one


From: Dragan Simic
Subject: Re: [PATCH 4/4] cut: Optionally treat multiple consecutive delimiters as one
Date: Tue, 15 Aug 2023 12:22:57 +0200

On 2023-08-10 17:05, Dragan Simic wrote:
On 2023-08-01 20:37, Dragan Simic wrote:
On 2023-08-01 16:42, Pádraig Brady wrote:
On 01/08/2023 10:07, Dragan Simic wrote:
Add a new command-line option and the required logic that allow
multiple consecutive delimiters to be treated as a single delimiter.
Of course, this option is valid only in cut's field mode.

This new feature should make cut much more usable in various real-world
applications, some of which are already mentioned in the gotchas.  For
example, merging the consecutive delimiters is very useful when cut is
used to process the outputs of various commands.
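
To make the difference concrete, here is a minimal Python sketch of field selection with and without delimiter merging. It is only an illustration of the idea, not the C code from the patch. The function name cut_fields and its merge parameter are hypothetical, and the handling of leading or trailing delimiters (they still yield an empty field here) is an assumption rather than something the patch description confirms.

```python
import re


def cut_fields(line, delim, fields, merge=False):
    """Select 1-based fields from a line, roughly like `cut -d DELIM -f LIST`.

    merge=True models the proposed behaviour (hypothetical name): a run of
    consecutive delimiters is treated as a single delimiter.
    """
    if delim not in line:
        # cut prints lines without any delimiter unchanged (unless -s is given).
        return line
    if merge:
        # Assumption: only runs *between* fields are merged; a leading or
        # trailing run still produces an empty first or last field.
        parts = re.split(re.escape(delim) + "+", line)
    else:
        parts = line.split(delim)
    # cut emits the selected fields in input order, each at most once.
    selected = [parts[i - 1] for i in sorted(set(fields)) if 1 <= i <= len(parts)]
    return delim.join(selected)


line = "a  b   c"
print(repr(cut_fields(line, " ", [2])))              # second field is empty
print(repr(cut_fields(line, " ", [2], merge=True)))  # second field is "b"
```

Without merging, every extra space opens another empty field, which is why selecting the second field of column-aligned command output so often returns nothing.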

Add a whole battery of new cut tests, which cover this new feature, and
add more tests for the related already existing features, to make sure
no regressions are introduced.

While there, clean up the comments and the whitespace in the cut tests
a bit, to make them slightly more readable.

Thanks for the patch.
I wonder whether a --empty-fields={ignore,suppress} is a more general interface.

I wonder whether that would be a more complex approach and, more
importantly, a less intuitive one?  Quite frankly, I think it's easier
to visualize the empty space, or the delimiters as a more general
approach, becoming "squeezed".  I think that visualizing the empty
fields is harder, especially when the delimiter is a whitespace
character.

This overlaps somewhat with the -w option in FreeBSD's cut,
which merges runs of whitespace, and which I was also considering adding.

After thinking a bit about it, how about having both "-m", from the
patch I submitted, and "-w", which would behave differently from
FreeBSD's "-w"?  Please, allow me to explain.

More specifically, our "-w" would simply "squeeze" all the whitespace
in the input without forcing the delimiter to be whitespace.  The
"squeezing" would produce a whitespace character in the input, instead
of whatever got "squeezed" there.  That would be either the whitespace
character specified as an optional value for the "-w" option, or it
may by default produce a space wherever only spaces were "squeezed",
or a tab wherever the "squeezed" whitespace contained at least one
tab.
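
As a rough illustration of the "squeezing" rule described above, here is a minimal Python sketch. It is not an implementation proposal for cut itself. The function name squeeze_whitespace and its replacement parameter are hypothetical, and treating only spaces and tabs as whitespace is an assumption.

```python
import re


def squeeze_whitespace(text, replacement=None):
    """Collapse each run of spaces and tabs into a single character.

    replacement models the optional value of the proposed "-w" option
    (hypothetical parameter).  When it is None, a run made only of spaces
    becomes one space, and a run containing at least one tab becomes one tab.
    """
    def pick(match):
        if replacement is not None:
            return replacement
        return "\t" if "\t" in match.group(0) else " "

    # Assumption: "whitespace" means spaces and tabs only, never newlines.
    return re.sub(r"[ \t]+", pick, text)


print(repr(squeeze_whitespace("a   b \t c\td")))
print(repr(squeeze_whitespace("a   b \t c\td", replacement=",")))
```

The squeezed text could then be passed to ordinary field selection with whatever delimiter the user chose, which is the sense in which this "-w" would not force the delimiter to be whitespace.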

With both "-m" and "-w" options in place we'd end up with a quite
versatile cut, which would cover what FreeBSD's cut does, and be able
to do more.  I'd be willing to implement the "-w" option as well.

Just checking, any further thoughts on this approach?

This feature has been requested for cut more than a few times.  Here are a few examples:

- https://stackoverflow.com/questions/21322968/does-cut-support-multiple-spaces-as-the-delimiter
- https://stackoverflow.com/questions/7142735/how-to-specify-more-spaces-for-the-delimiter-using-cut
- https://unix.stackexchange.com/questions/109835/how-do-i-use-cut-to-separate-by-multiple-whitespace
- https://unix.stackexchange.com/questions/606639/why-does-cut-d-not-work-with-space-in-this-case
- https://unix.stackexchange.com/questions/387544/cut-with-2-character-delimiter
- https://stackoverflow.com/questions/25447324/how-to-use-cut-with-multiple-character-delimiter-in-unix

I'd really appreciate it if we could discuss this further.


