Re: [Gnu-arch-users] Encoding handling proposal
From: John Meinel
Subject: Re: [Gnu-arch-users] Encoding handling proposal
Date: Sun, 29 Aug 2004 13:13:09 -0500
User-agent: Mozilla Thunderbird 0.7 (Windows/20040616)
Marcus Sundman wrote:
> D) There should be a filter/plugin architecture to enable a transcoding of
> files on input and output based on their content-types and user settings
> and user-provided parameters.
> E) Utilities such as "diff", "merge" and "annotate" (aka "blame") should be
> provided by plugins mapped to content-types.
You definitely have some interesting proposals here. One thing to watch
out for, though: once we stop having a single type of diff (say, an xdelta
diff for binary files, another type for XML files, etc.), how do we
make sure (or at least help) everyone has all of these programs?
Maybe it's something that happens outside of tla, but one of the nice
things is that tla uses diff, patch, and tar, which are reasonably
simple programs that everyone is likely to have.
If I *don't* have the xmldiff/xmlpatch program, then it is likely that I
won't be able to check out a project that used them, since I doubt the
format of the .patch file will be the same as that of diff/patch. Also, what
about versions: is xmldiff 1.0 compatible with xmlpatch 2.0? (I checked
the file in a year ago, but am only getting it back now.)
Will there be "blessed" diff/transcode programs? Will it only be the
ones that are bundled inside of tla?
I'm not sure about your statement that files are typically stored in the
"local" encoding. The editors I use (gvim, scintilla) allow me to
specify the encoding. (Admittedly it's mostly latin-1, or utf-8, or utf-16).
So in that situation, when I write out a file, if I try to check it into
arch, then I have to worry about telling arch *not* to use the local
encoding.
I know one of your reasons for wanting encoding to be included is so you
can keep the "official" repository in the official encoding. One way to
do that is to put a person in the loop: people are allowed to work on
any repository they want, but only a few people commit to the "official"
one, and they are all knowledgeable about watching out for file encoding
issues.
> F) Commit comments and other string attributes should use UTF-8.
> G) Filenames and paths should use UTF-8 in the repository, and be transcoded
> to the proper encoding when a client accesses the local file system.
This I do agree with. But I seem to recall that Tom's position is that people
will probably want the files in the local encoding, so that
    cat <patch-log>
is readable on that system.
I remember a big discussion about this in the past, but I don't think it
was thoroughly resolved.
I think Tom designed hackerlab such that you deal with characters and
never need to know how many bytes, code points, etc. are used underneath.
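To make the filename rule in (G) concrete, here is a minimal sketch of mapping between a UTF-8 repository name and a local file-system name. The function names and the explicit fs_encoding parameter are assumptions made for illustration; nothing here is part of tla or hackerlab.

```python
def to_local_name(repo_name: bytes, fs_encoding: str) -> bytes:
    """Map a UTF-8 repository name to bytes in the local file-system encoding."""
    name = repo_name.decode("utf-8")
    try:
        return name.encode(fs_encoding)
    except UnicodeEncodeError as exc:
        # The local encoding cannot represent this name at all.
        raise ValueError(f"{name!r} cannot be represented in {fs_encoding}") from exc


def to_repo_name(local_name: bytes, fs_encoding: str) -> bytes:
    """Map a local file-system name back to its UTF-8 repository form."""
    return local_name.decode(fs_encoding).encode("utf-8")
```

The failure branch is the interesting part: a name that is fine in the repository may have no representation on a given client, which is one more thing such a scheme would have to decide how to handle.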
[...]
> D) Since editors and other programmers' tools tend to use whatever the local
> system encoding happens to be and a project might include people with
> different systems there needs to be some transcoding of most text files.
> The contents of files whose "Auto-Filter" attribute is set to "true" will be
> stored UTF-8 encoded with U+2028 newlines in the repository and transcoded
> from/to the local encoding and local newlines on input/output. The contents
> of files whose "Auto-Filter" attribute is set to "false" will not be
> transcoded on input/output.
> Often the proper local encoding and line breaks can be detected
> automatically, but the user should be able to override the auto-detection
> in his settings and/or by a parameter to the cm client.
This is where I feel "use the local system encoding" may not be
perfectly true. But it is possible that "Auto-Filter" will handle this.
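As a rough illustration of the "Auto-Filter" behaviour in the quoted proposal, the sketch below transcodes file contents to UTF-8 with U+2028 line separators on the way in and back to a local encoding and newline on the way out. All names are hypothetical, and the encoding is passed explicitly instead of being read from the locale, which is the override case raised above.

```python
LINE_SEPARATOR = "\u2028"  # repository newline, as in the quoted proposal


def to_repository(raw: bytes, local_encoding: str, auto_filter: bool = True) -> bytes:
    """Convert working-copy bytes to the repository form."""
    if not auto_filter:
        return raw  # Auto-Filter "false": store the bytes untouched
    text = raw.decode(local_encoding)
    # Fold CRLF and lone CR into LF first, then switch to U+2028.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", LINE_SEPARATOR).encode("utf-8")


def from_repository(stored: bytes, local_encoding: str,
                    local_newline: str = "\n", auto_filter: bool = True) -> bytes:
    """Convert repository bytes back to the local encoding and newline."""
    if not auto_filter:
        return stored
    text = stored.decode("utf-8")
    return text.replace(LINE_SEPARATOR, local_newline).encode(local_encoding)
```

Note that the round trip is not byte-exact for a file with mixed line endings, and that from_repository raises UnicodeEncodeError when the local encoding cannot represent a stored character.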
> E) E.g. if two files with the content-type "application/vnd.sun.xml.writer"
> are diffed the system should use a diff plugin that knows how to interpret
> OpenOffice.org Writer documents. If no such plugin is found it defaults to
> the standard diff which regards the files as byte blobs.
This is where the problem with plugins exists. On *my* machine, I have
the application/vnd.sun.xml.writer diff program. You don't have it on
*your* machine. You can no longer read my archive.
If you just treat everything as blobs, at least you can get version 1
and version 10, and create your own diff, and manually patch so that you
get nice context-sensitive diffs.
My personal feeling is that we could do this in two ways. The first is to
have tla generate both the standard diff and the special one. Clients that
understand the special format use it; all others fall back on the standard
one. (This was proposed for xdelta use with pure binary files.)
The other way is to have tla start to incorporate more diff/patch
programs. Keep in mind that adding a new diff/patch effectively changes
the archive format, which is not something to do lightly.
I favor the former, though it doesn't allow for compact archive size.
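A minimal sketch of the first option, assuming a changeset is just a dictionary and that special differs are registered per content-type. The registry, the changeset layout and the function names are invented for illustration, and Python's difflib stands in for diff(1).

```python
import difflib

# Hypothetical registry: content-type -> function(old_text, new_text) -> patch data
SPECIAL_DIFFERS = {}


def standard_diff(old: str, new: str) -> str:
    """Line-based unified diff that every client is assumed to understand."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        "old", "new"))


def make_changeset(old: str, new: str, content_type: str) -> dict:
    """Always store the standard diff; add a special one when a plugin exists."""
    changeset = {"standard": standard_diff(old, new)}
    differ = SPECIAL_DIFFERS.get(content_type)
    if differ is not None:
        changeset["special"] = {"content_type": content_type,
                                "data": differ(old, new)}
    return changeset


def pick_patch(changeset: dict, supported_types: set) -> tuple:
    """Use the special patch only if this client has a matching plugin."""
    special = changeset.get("special")
    if special is not None and special["content_type"] in supported_types:
        return "special", special["data"]
    return "standard", changeset["standard"]
```

A client without the plugin can still apply every revision, at the cost of storing two patches per change, which is the archive-size trade-off mentioned above.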
[...]
> Notice that there is no distinction between "text files" and "binary files".
> The same system that converts between different text encodings might just
> as well be used to convert between different "raw" audio formats. Just add
> the appropriate plugin/filter and you're set.
Interesting idea, but I have to wonder if it is what you would really want.
> - Marcus Sundman
Overall, I think you raise some good points. It just takes a lot of care
to add something that could potentially fragment repositories.
John
=:->