bug#23701: Decoding broken by sequence ESC comma
From: Taylan Ulrich Bayırlı/Kammer
Subject: bug#23701: Decoding broken by sequence ESC comma
Date: Mon, 06 Jun 2016 01:35:26 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Andreas Schwab <schwab@linux-m68k.org> writes:
> taylanbayirli@gmail.com (Taylan Ulrich "Bayırlı/Kammer") writes:
>
>> The occurrence of the sequence of the bytes 1B 2C (ASCII ESC and comma)
>> messes up Emacs's decoding of an ASCII file from that point on.
>
> This is one of the ISO 2022 escape sequences.
>
>> This doesn't happen in any other text-displaying application I tested,
>> including a terminal emulator (given it's an escape sequence and all).
>
> None of them know about ISO 2022, apparently.
>
> Andreas.
Hmm, OK. I figure it's an obscure use case, but perhaps so is its
accidental(?) occurrence in a text file.
In the meantime I found out that C-x RET r us-ascii RET fixes my issue.
The file in which I encountered this (mailing list archives of R6RS)
actually contains the sequence ESC, comma, capital A, and in places
where it seems intentionally positioned, such as between sentences.
I wonder what this is about. Whatever it means, if this is more common
than uses of that ISO 2022 sequence, that would be a problem, I
suppose. Here's the relevant snippet from the file, with literal ESC
characters changed to ^[:
> | On Fri, Sep 11, 2009 at 10:46 PM, Aubrey Jaffer<agj at alum.mit.edu> wrote:
> | > ^[,A | Date: Wed, 9 Sep 2009 00:30:18 -0400
> | > ^[,A | From: Lynn Winebarger <owinebar at gmail.com>
> | > ^[,A |
> | > ^[,A | ...
> | > ^[,A | The advent of hygeinic macros marked the end of the era in which
> | > ^[,A | symbols could be equated with identifiers. ^[,A Identifiers have a lot
> | > ^[,A | more information in them.
> | >
> | > The SLIB implementations of syntactic-closures, syntax-case,
I just grepped all the files and the archives seem to contain a few more
files in which the ESC , sequence appears, such as:
G^[,Avdel vs Godel vs Goedel
^[,Hylem vs ^[,Hylen vs the same with proper vowel symbols
... I know that there is a single bit sequence that specifies
strings, and it's not ^[,A+;^[(Bs; I know that there's another
single sequence that specifies ellipsis, and it's not ^[$,1s&^[(B
...
These aren't ISO-8859-1 either. I don't know what encoding they're
supposed to be in. Could also be a mail server breaking things.
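In case anyone wants to repeat the search on their own archives, here is
a minimal sketch in Python of the kind of scan I mean. It is not what I
actually ran (I used grep); the names find_esc_comma and scan_file are
made up for illustration. It works on raw bytes so that no decoding,
ISO 2022 or otherwise, gets in the way.

```python
# Sketch only: look for the byte pair 1B 2C (ASCII ESC followed by a comma)
# and report where it occurs and which byte follows it.
# The function names are hypothetical, chosen for this example.

ESC_COMMA = b"\x1b,"


def find_esc_comma(data: bytes):
    """Return a list of (offset, following_byte) for each ESC , in data.

    following_byte is a bytes object of length 1, or b"" when the
    sequence sits at the very end of the data.
    """
    hits = []
    start = 0
    while True:
        i = data.find(ESC_COMMA, start)
        if i == -1:
            return hits
        hits.append((i, data[i + 2:i + 3]))
        start = i + 1


def scan_file(path):
    """Read a file in binary mode and scan it without decoding."""
    with open(path, "rb") as f:
        return find_esc_comma(f.read())
```

Calling find_esc_comma on the bytes of "G^[,Avdel" (with a real ESC)
reports one hit at offset 1, followed by "A". Collecting the following
bytes over a whole archive would at least show which final bytes
("A", "H", ...) actually turn up after ESC comma.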
All in all, I'm just throwing this out there; I have no idea how
commonly used ISO 2022 is, but handling it by default certainly breaks
some files that contain ESC , either by accident or with some other
purpose. Maybe it should not be handled by default.
Taylan