
Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"


From: Jan Hudec
Subject: Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"
Date: Tue, 6 Jan 2004 12:11:30 +0100
User-agent: Mutt/1.5.4i

On Mon, Jan 05, 2004 at 22:09:45 -0700, Eric W. Biederman wrote:
> Tom Lord <address@hidden> writes:
> 
> >     > From: Charles Duffy <address@hidden>
> > 
> >     > ...resulting in null-delimited output, suitable for piping into
> >     > xargs -0 or the like, and thus causing The Right Thing to happen
> >     > in cases involving filenames with spaces.
> > 
> >     > Thoughts?
> > 
> > Perhaps.
> > 
> > What I would most like to avoid longer-term is a half-hearted
> > accumulation of features, each intended to make filenames-with-spaces
> > support closer, but in actuality not adding up to anything coherent.
> > 
> > The null-character convention used by GNU xargs (and GNU tar as I
> > recall) is one strategy for dealing with such filenames -- but I think
> > it is a problematic one.   For example, other textutils don't
> > understand that convention, it looks horrible in a text editor,
> > and although fine for filenames it can't handle fields that contain
> > the null character, etc.
> > 
> > We have other needs within arch for lists (in some cases multi-field
> > lists) which can include odd filenames.  I'd find it easier to say yes
> > to incrementally adding features to arch if we first had an overall
> > strategy for fields that can contain non-graphical characters.
> > 
> > So far as I know, the choices basically come down to:
> > 
> > ~ use 0 specially 
> > 
> >   losses: not terminal or editor friendly,
> >           can't handle 0 in fields
> > 
> >   wins: GNU xargs and GNU tar support it
> > 
> > ~ use a quotation syntax (which also then has to include escapes)
> >   to delimit fields with some kind of quote mark
> > 
> >   losses: whitespace-based field separation fails,
> >           tools need to translate fields for many operations
> > 
> >   wins: pick the string syntax of your favorite scripting language
> >         terminal/editor-friendly
> > 
> > ~ use an escape syntax without delimiters to map all strings into
> >   strings of graphical characters
> > 
> >   losses: tools need to translate fields for many operations
> > 
> >   wins: whitespace-based field separation works,
> >         terminal/editor-friendly
> > 
> > 
> > Of these, I think I'm mostly inclined towards the last one (but see
> > below).
> 
> Then let me suggest the C convention for representing Unicode characters.
> \u hex-quad
> \U hex-quad hex-quad

For ASCII characters, however, the old octal syntax (\octal-triplet)
would be preferable, since most tools understand it.
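
A minimal sketch of the octal-triplet escaping described above, in
Python. The function name escape_octal is hypothetical, and the choice
to escape backslash and DEL in addition to bytes 0-32 is an assumption
made so the output stays unambiguous; nothing here is an existing arch
interface.

```python
def escape_octal(name: bytes) -> bytes:
    """Map a raw filename to a string with no control characters or spaces.

    Assumption: bytes 0-32, backslash (so the escape stays reversible)
    and DEL are written as a backslash plus three octal digits.
    All other bytes, including 128-255, are copied unchanged.
    """
    out = bytearray()
    for b in name:
        if b <= 0x20 or b in (0x5C, 0x7F):
            out += ("\\%03o" % b).encode("ascii")
        else:
            out.append(b)
    return bytes(out)


print(escape_octal(b"my file\n"))  # prints b'my\\040file\\012'
```

Because the result contains no whitespace, it can be used directly as
a field in whitespace-separated output.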

> This is generally useful, it is clear that it is an escape sequence,
> and it is trivial to verify that it is a complete escape sequence.
> 
> Given existing unix conventions it is probably worth implementing the
> rest of the standard escapes as well.
> 
> The command line option -e could be used to go into escape processing
> mode, just like it is in echo.  The only real problem I can see is if
> multiple tools in a chain attempted escape processing, but there is
> really no solution to that problem.
> 
> > If you look at my full devo tree (as opposed to devo.tla) you can see
> > that there's a lonely directory there containing just `unfold.c'.
> > 
> > One direction I think is worth exploring:
> > 
> > ~ making a full plan for arch (changeset format, log file format,
> >   cached inventory file format ....)
> > 
> > ~ make a coding standards spec for tools in general to handle 
> >   the new conventions
> > 
> > ~ incrementally add stuff to arch according to the plan.
> >   also incrementally add utils to src/text-utils according
> >   to the plan
> > 
> > One difficulty is that it's probably worth thinking about Unicode
> > issues in the same plan.
> 
> Generally things should be exchanged in utf8, but the above lets
> you stick to pure ascii which is a subset of most character sets.

I don't know of any tool that would have trouble accepting characters
128-255, and thus any properly UTF-8-encoded non-ASCII Unicode
character (though it probably won't be able to convert it to the current
locale). A much bigger problem is characters 0-32 (control chars
+ space).
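
To illustrate the point, here is a small Python sketch that encodes a
made-up filename as UTF-8 and separates the bytes in the 128-255 range
from those in the 0-32 range. The filename is invented for the
example.

```python
# Hypothetical filename: one non-ASCII letter, one space, one tab.
name = "na\u00efve file\t.txt"
raw = name.encode("utf-8")

# Bytes a byte-transparent tool passes through untouched.
high = [b for b in raw if b >= 0x80]

# Bytes that break whitespace-separated or line-oriented output.
troublesome = [b for b in raw if b <= 0x20]

print([hex(b) for b in high])         # prints ['0xc3', '0xaf']
print([hex(b) for b in troublesome])  # prints ['0x20', '0x9']
```

The non-ASCII letter only ever produces bytes of 0x80 and above, so it
cannot be mistaken for a separator; the space and the tab are the
bytes that need escaping.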

> > Another difficulty is that it's probably worth thinking about
> > alternative record syntaxes at the same time -- e.g., a generic syntax
> > for multi-line records.
> 
> At least until there is a need I don't see the point.

Newline is perfectly encodable as \012. That should be sufficient.
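
As a sketch of why this is enough for one-record-per-line output, the
Python below splits a record on whitespace and then decodes the octal
triplets in each field. The name unescape_octal and the sample record
are hypothetical, and it assumes every backslash in the input begins a
three-digit octal escape.

```python
import re

# A backslash followed by an octal triplet in the range 000-377.
_OCTAL = re.compile(rb"\\([0-3][0-7]{2})")


def unescape_octal(field: bytes) -> bytes:
    """Turn each \\ooo triplet back into the single byte it encodes."""
    return _OCTAL.sub(
        lambda m: bytes([int(m.group(1).decode("ascii"), 8)]), field
    )


# One record, two whitespace-separated fields; the first field is a
# filename containing a newline and a space.
record = b"log\\012file\\040one second-field"
fields = [unescape_octal(f) for f in record.split()]
print(fields)  # prints [b'log\nfile one', b'second-field']
```

The record stays on a single line and splits cleanly on whitespace,
and the newline only reappears after decoding.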

-------------------------------------------------------------------------------
                                 Jan 'Bulb' Hudec <address@hidden>
