Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"


From: Eric W. Biederman
Subject: Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"
Date: 05 Jan 2004 22:09:45 -0700
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.2

Tom Lord <address@hidden> writes:

>     > From: Charles Duffy <address@hidden>
> 
>     > ...resulting in null-delimited output, suitable for piping into
>     > xargs -0 or the like, and thus causing The Right Thing to happen
>     > in cases involving filenames with spaces.
> 
>     > Thoughts?
> 
> Perhaps.
> 
> What I would most like to avoid longer-term is a half-hearted
> accumulation of features, each intended to make filenames-with-spaces
> support closer, but in actuality not adding up to anything coherent.
> 
> The null-character convention used by GNU xargs (and GNU tar as I
> recall) is one strategy for dealing with such filenames -- but I think
> it is a problematic one.   For example, other textutils don't
> understand that convention; it looks horrible in a text editor; and,
> although fine for filenames, it can't handle fields that contain the
> null character.
> 
> We have other needs within arch for lists (in some cases multi-field
> lists) which can include odd filenames.  I'd find it easier to say yes
> to incrementally adding features to arch if we first had an overall
> strategy for fields that can contain non-graphical characters.
> 
> So far as I know, the choices basically come down to:
> 
> ~ use 0 specially 
> 
>   losses: not terminal or editor friendly,
>           can't handle 0 in fields
> 
>   wins: GNU xargs and GNU tar support it
> 
> ~ use a quotation syntax (which also then has to include escapes)
>   to delimit fields with some kind of quote mark
> 
>   losses: whitespace-based field separation fails,
>           tools need to translate fields for many operations
> 
>   wins: pick the string syntax of your favorite scripting language
>         terminal/editor-friendly
> 
> ~ use an escape syntax without delimiters to map all strings into
>   strings of graphical characters
> 
>   losses: tools need to translate fields for many operations
> 
>   wins: whitespace-based field separation works,
>         terminal/editor-friendly
> 
> 
> Of these, I think I'm mostly inclined towards the last one (but see
> below).

Then let me suggest the C convention for representing Unicode characters:
\u hex-quad
\U hex-quad hex-quad

This is generally useful, it is clear that it is an escape sequence,
and it is trivial to verify that it is a complete escape sequence.
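
A rough sketch of that convention in Python, for illustration only (it is
not anything arch implements). The names escape_field and is_well_formed
are made up for this example, and writing a literal backslash as \\ is an
assumption borrowed from C rather than something settled in this thread.

```python
import re

# A complete escape is \\ , \u + 4 hex digits, or \U + 8 hex digits.
_ESCAPE = re.compile(r"\\(?:\\|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})")


def escape_field(text: str) -> str:
    """Map an arbitrary string onto graphical ASCII characters only."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")           # assumption: backslash escapes itself
        elif 0x21 <= cp <= 0x7E:
            out.append(ch)               # graphical ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)   # \u hex-quad
        else:
            out.append("\\U%08x" % cp)   # \U hex-quad hex-quad
    return "".join(out)


def is_well_formed(field: str) -> bool:
    """True if the field is graphical ASCII and every escape is complete."""
    if any(not (0x21 <= ord(ch) <= 0x7E) for ch in field):
        return False
    # Strip every complete escape; a leftover backslash is a broken one.
    return "\\" not in _ESCAPE.sub("", field)
```

Because a space becomes \u0020, an escaped filename never contains
whitespace, so whitespace-based field splitting keeps working.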

Given existing Unix conventions, it is probably worth implementing the
rest of the standard escapes as well.

The command line option -e could be used to go into escape processing
mode, just like it is in echo.  The only real problem I can see is if
multiple tools in a chain attempted escape processing, but there is
really no solution to that problem.
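
To make the chaining problem concrete, here is a small Python sketch of
what an escape-processing mode might do to each field. The function name
unescape_field is hypothetical, and it handles only \\, \u and \U (not the
other standard escapes); it is an illustration, not a proposed
implementation.

```python
import re

# A complete escape is \\ , \u + 4 hex digits, or \U + 8 hex digits.
_ESCAPE = re.compile(r"\\(\\|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})")


def unescape_field(field: str) -> str:
    """Undo one level of escaping (hypothetical helper)."""
    def replace(match):
        body = match.group(1)
        if body == "\\":
            return "\\"                  # \\ stands for one backslash
        return chr(int(body[1:], 16))    # skip the u/U, parse the hex digits
    return _ESCAPE.sub(replace, field)


# A file whose real name is the six characters  \u0041  is written out
# in escaped form as  \\u0041 .
escaped = "\\\\u0041"
once = unescape_field(escaped)    # the real name again
twice = unescape_field(once)      # a second tool in the chain decodes again
```

Here once is the original six-character name, but twice is the single
letter A: the second pass cannot tell that its input was already decoded.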

> If you look at my full devo tree (as opposed to devo.tla) you can see
> that there's a lonely directory there containing just `unfold.c'.
> 
> One direction I think is worth exploring:
> 
> ~ making a full plan for arch (changeset format, log file format,
>   cached inventory file format ....)
> 
> ~ make a coding standards spec for tools in general to handle 
>   the new conventions
> 
> ~ incrementally add stuff to arch according to the plan.
>   also incrementally add utils to src/text-utils according
>   to the plan
> 
> One difficulty is that it's probably worth thinking about Unicode
> issues in the same plan.

Generally things should be exchanged in UTF-8, but the above lets
you stick to pure ASCII, which is a subset of most character sets.
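
A quick Python illustration of that point, under the same assumptions as
before (escape_non_ascii is a made-up helper, and the list of encodings is
just a sample of ASCII-compatible ones): the escaped form of a name encodes
to the same bytes in each of them, while the raw name does not.

```python
def escape_non_ascii(text: str) -> str:
    """Escape everything outside graphical ASCII (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")           # assumption: backslash escapes itself
        elif 0x21 <= cp <= 0x7E:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)   # \u hex-quad
        else:
            out.append("\\U%08x" % cp)   # \U hex-quad hex-quad
    return "".join(out)


name = "r\u00e9sum\u00e9.txt"
escaped = escape_non_ascii(name)

# The escaped form is byte-identical in every ASCII-compatible encoding.
encodings = ["ascii", "utf-8", "latin-1", "cp1252", "koi8_r"]
escaped_bytes = {escaped.encode(enc) for enc in encodings}

# The raw name already differs between just two of them.
raw_bytes = {name.encode(enc) for enc in ["utf-8", "latin-1"]}
```

escaped_bytes ends up with a single member and raw_bytes with two.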
 
> Another difficulty is that it's probably worth thinking about
> alternative record syntaxes at the same time -- e.g., a generic syntax
> for multi-line records.

At least until there is a need I don't see the point.

Eric

 



