Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"
From: Eric W. Biederman
Subject: Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"
Date: 05 Jan 2004 22:09:45 -0700
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.2

Tom Lord <address@hidden> writes:
> > From: Charles Duffy <address@hidden>
>
> > ...resulting in null-delimited output, suitable for piping into
> > xargs -0 or the like, and thus causing The Right Thing to happen
> > in cases involving filenames with spaces.
>
> > Thoughts?
>
> Perhaps.
>
> What I would most like to avoid longer-term is a half-hearted
> accumulation of features, each intended to make filenames-with-spaces
> support closer, but in actuality not adding up to anything coherent.
>
> The null-character convention used by GNU xargs (and GNU tar as I
> recall) is one strategy for dealing with such filenames -- but I think
> it is a problematic one. For example, other textutils don't
> understand that convention, it looks horrible in a text editor,
> although fine for filenames it can't handle fields that contain the
> null character, etc.
>
> We have other needs within arch for lists (in some cases multi-field
> lists) which can include odd filenames. I'd find it easier to say yes
> to incrementally adding features to arch if we first had an overall
> strategy for fields that can contain non-graphical characters.
>
> So far as I know, the choices basically come down to:
>
> ~ use 0 specially
>
> losses: not terminal or editor friendly,
> can't handle 0 in fields
>
> wins: GNU xargs and GNU tar support it
>
> ~ use a quotation syntax (which also then has to include escapes)
> to delimit fields with some kind of quote mark
>
> losses: whitespace-based field separation fails,
> tools need to translate fields for many operations
>
> wins: pick the string syntax of your favorite scripting language
> terminal/editor-friendly
>
> ~ use an escape syntax without delimiters to map all strings into
> strings of graphical characters
>
> losses: tools need to translate fields for many operations
>
> wins: whitespace-based field separation works,
> terminal/editor-friendly
>
>
> Of these, I think I'm mostly inclined towards the last one (but see
> below).
Then let me suggest the C convention for representing Unicode characters:
\u hex-quad
\U hex-quad hex-quad
This is generally useful: it is clearly an escape sequence, and it is
trivial to verify that a given sequence is complete.
Given existing Unix conventions, it is probably worth implementing the
rest of the standard escapes as well.
The command line option -e could be used to go into escape processing
mode, just like it is in echo. The only real problem I can see is if
multiple tools in a chain attempted escape processing, but there is
really no solution to that problem.
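
To make the suggestion concrete, here is a minimal sketch in Python (not arch code). The names escape and unescape are hypothetical. The set of "standard escapes" is assumed to be \\, \n, \t and \r. Writing a space as \u0020, so that whitespace-based field separation keeps working, is also an assumption.

```python
import re

# Assumed subset of the "standard escapes"; the exact set is not specified.
_SIMPLE = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
_NAMED = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}

# A backslash followed by u + 4 hex digits, U + 8 hex digits,
# or (as a fallback) at most one arbitrary character.
_TOKEN = re.compile(r"\\(?:u([0-9a-fA-F]{4})|U([0-9a-fA-F]{8})|(.?))", re.DOTALL)


def escape(name: str) -> str:
    """Map an arbitrary string to graphical ASCII characters only."""
    out = []
    for ch in name:
        cp = ord(ch)
        if ch in _SIMPLE:
            out.append(_SIMPLE[ch])
        elif 0x21 <= cp <= 0x7E:       # graphical ASCII passes through
            out.append(ch)
        elif cp <= 0xFFFF:             # includes space and control characters
            out.append("\\u%04x" % cp)
        else:
            out.append("\\U%08x" % cp)
    return "".join(out)


def unescape(text: str) -> str:
    """Invert escape(); raise ValueError on an incomplete or unknown escape."""
    def repl(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        if m.group(2) is not None:
            return chr(int(m.group(2), 16))
        if m.group(3) not in _NAMED:
            raise ValueError("bad escape sequence: %r" % m.group(0))
        return _NAMED[m.group(3)]
    return _TOKEN.sub(repl, text)
```

With these definitions, escape("my file.txt") gives my\u0020file.txt, and unescape returns the original name. The chaining problem shows up with a name that itself contains a backslash: for the three-character name a\n (a, backslash, n), one unescape of the escaped form restores the name, but a second unescape by a later tool turns it into an "a" followed by a newline.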
> If you look at my full devo tree (as opposed to devo.tla) you can see
> that there's a lonely directory there containing just `unfold.c'.
>
> One direction I think is worth exploring:
>
> ~ making a full plan for arch (changeset format, log file format,
> cached inventory file format ....)
>
> ~ make a coding standards spec for tools in general to handle
> the new conventions
>
> ~ incrementally add stuff to arch according to the plan.
> also incrementally add utils to src/text-utils according
> to the plan
>
> One difficulty is that it's probably worth thinking about Unicode
> issues in the same plan.
Generally things should be exchanged in UTF-8, but the above lets
you stick to pure ASCII, which is a subset of most character sets.
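
A small Python sketch of that point, under the same assumptions as the escape convention above (ascii_escape is a hypothetical helper, and escaping the space as \u0020 is an assumption): once a name is escaped, its bytes read the same under any ASCII-compatible character set.

```python
def ascii_escape(name: str) -> str:
    """Escape everything outside graphical ASCII as \\uXXXX or \\UXXXXXXXX."""
    out = []
    for ch in name:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif 0x21 <= cp <= 0x7E:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            out.append("\\U%08x" % cp)
    return "".join(out)


escaped = ascii_escape("r\u00e9sum\u00e9 draft.txt")
raw = escaped.encode("ascii")  # succeeds only because the text is pure ASCII

# The same bytes decode to the same text under several character sets.
views = {codec: raw.decode(codec)
         for codec in ("ascii", "utf-8", "latin-1", "cp1252")}
```

Every entry in views is the same string, so a tool that knows nothing about the sender's locale still sees the identical escaped field.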
> Another difficulty is that it's probably worth thinking about
> alternative record syntaxes at the same time -- e.g., a generic syntax
> for multi-line records.
At least until there is a need, I don't see the point.
Eric