Re: [Gnu-arch-users] [BUG] pika escaping corrupts taglines containing "


From: chth
Subject: Re: [Gnu-arch-users] [BUG] pika escaping corrupts taglines containing "
Date: Thu, 16 Sep 2004 07:00:27 +0200

> 
>     > From: address@hidden
> 
>     > Bugfix idea:
> 
>     > make an arch_cmp_ids (alloc_limits, t_uchar*, t_uchar*) function
>     > and use that everywhere to compare ids.
> 
> Doesn't that violate the abstractions you introduced?
> 
> Escaping/unescaping is supposed to happen at write/read time and,
> internally, str_cmp should be fine.

Exactly, that seems to be the bug: tla sees escaped strings at runtime
where they should be unescaped. I haven't investigated it further yet, but
you are right, and it should be fixed where the taglines are read in.


> If there are historic revisions that contain unescaped " characters,
> then perhaps that could be fixed in the unescaping engine?

Certainly not. The escaping engine should do purely its own work; if
some application (tla) has to deal with historic compatibility issues,
then the fix clearly falls into the domain of that application.
Adding a kludge to the escaping engine would be worse. The arch_cmp_ids
function would be just the place to handle tla's own compatibility
issues.
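
A minimal sketch of that idea, in Python rather than tla's C. The names
unescape and cmp_ids are hypothetical stand-ins (cmp_ids for the proposed
arch_cmp_ids), and the escaping scheme is a toy one where a backslash makes
the next character literal; it is not the actual pika syntax.

```python
def unescape(s: str) -> str:
    """Toy unescaper: a backslash makes the following character literal."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # drop the backslash, keep the escaped char
            i += 2
        else:
            out.append(s[i])      # ordinary char (or a lone trailing backslash)
            i += 1
    return "".join(out)


def cmp_ids(a: str, b: str) -> bool:
    """Hypothetical stand-in for arch_cmp_ids: compare ids in unescaped form,
    so an escaped id and a historic unescaped id are treated as equal."""
    return unescape(a) == unescape(b)
```

With this, cmp_ids(r'foo\"bar', 'foo"bar') is true, and the compatibility
handling stays in the application rather than in the escaping engine.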


> After all: if the input string to unescaping is improperly escaped in
> some historic cases -- that's terrific!   That can be detected and
> treated as a special case for unescaping.

The following constraints always hold:

modern_escaped_string = 'foo\"bar'
historic_unescaped_string = 'foo"bar'
assert(escape(historic_unescaped_string) == modern_escaped_string)
assert(unescape(historic_unescaped_string) == historic_unescaped_string)

There is only *ONE* exception: when the historic_unescaped_string
contains backslashes. This cannot be fixed in a sensible way, since
changesets have no protocol versions and we can't determine whether the \ is
an old verbatim character or an escape character.

> (A tla-specific wrapper around the hackerlab unescaping code would be
> fine for this, so long as we are no longer generating improperly
> escaped data.   That will help keep the hackerlab code clean and it is
> only tla that needs this special case.)

If I am right, this is only a runtime problem; the archive format is
always consistent (escaped now). Be careful, though: I haven't checked this
in depth yet.

Please consider my idea to drop smash_non_graphical in favor of
escaping. Replacing non-graphical characters with _ is horribly ambiguous;
I really wonder why there has been no problem with that so far.
That feature should be included in the next archive/changeset protocol
version, IMO.
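
To illustrate the ambiguity, here is a rough Python sketch. Both functions are
assumptions for illustration: smash_non_graphical only mimics the described
behavior (non-graphical characters become _), and escape_non_graphical uses a
made-up \xNN notation for ASCII input, not the real pika syntax.

```python
def is_graphical(c: str) -> bool:
    """Approximation of C's isgraph(): printable and not whitespace."""
    return c.isprintable() and not c.isspace()


def smash_non_graphical(s: str) -> str:
    """Mimics the described behavior: every non-graphical char becomes '_'."""
    return "".join(c if is_graphical(c) else "_" for c in s)


def escape_non_graphical(s: str) -> str:
    """Hypothetical alternative: keep the information by escaping instead."""
    out = []
    for c in s:
        if c == "\\":
            out.append("\\\\")                # keep the escape char unambiguous
        elif is_graphical(c):
            out.append(c)
        else:
            out.append(f"\\x{ord(c):02x}")    # made-up notation, ASCII input
    return "".join(out)


ids = ["my id", "my_id", "my\tid"]
smashed = {smash_non_graphical(i) for i in ids}
escaped = {escape_non_graphical(i) for i in ids}
```

Here smashed collapses to a single value, while escaped keeps three distinct
ones, so only the escaped form can be mapped back to the original id.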


        Christian



