bug-gnu-emacs

bug#18610: 24.4.50; Specific file causing emacs to segfault upon opening


From: Eli Zaretskii
Subject: bug#18610: 24.4.50; Specific file causing emacs to segfault upon opening
Date: Sun, 05 Oct 2014 19:09:26 +0300

> From: handa@gnu.org (K. Handa)
> Date: Sun, 05 Oct 2014 17:59:45 +0900
> Cc: dmantipov@yandex.ru, maden.ldm@gmail.com, 18610@debbugs.gnu.org
> 
> > However, detect_coding_iso_2022 returns with the 'found' member of its
> > second argument having zero value, which I interpret as meaning that
> > it didn't really find any ISO-2022 sequences.  So the simple patch
> > below fixes this for me.  Kenichi, is this patch OK?
> 
> No.  Even if there's no special ISO-2022 escape sequence, we
> should not reject iso-2022 as a detected coding system.

Can you explain why?  AFAICT, all the other detectors are required to
set some flag in the 'found' field, so why is ISO-2022 special in this
regard?

> And, even if that detection was incorrect, the decoder
> should not produce an invalid byte sequence in a
> buffer/string which leads to Emacs crash.

No argument here.

> The bug is in detect_coding_iso_2022 which doesn't set
> CATEGORY_MASK_ISO_7_ELSE in coding->rejected in this case.

Btw, it would be nice if these masks could be documented so that their
meaning was clear.  I considered the possibility that the flags are
not set correctly, but couldn't test that hypothesis given my
insufficient knowledge of ISO-2022 details and variants.
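For readers unfamiliar with the detector bookkeeping, here is a toy model
of how 'found' and 'rejected' masks could interact.  It is only a sketch
under stated assumptions, not the code in coding.c: the bit values, the
second category, and the helper names are invented for illustration; only
the name CATEGORY_MASK_ISO_7_ELSE comes from this thread.  It assumes
that a 7-bit ISO-2022 category cannot match data containing bytes >= 0x80.

```python
from dataclasses import dataclass

# Name taken from the thread; the bit value is invented for this sketch.
CATEGORY_MASK_ISO_7_ELSE = 1 << 0
# Hypothetical stand-in for "some other category"; not a real Emacs mask.
CATEGORY_MASK_OTHER = 1 << 1
ALL_CATEGORIES = CATEGORY_MASK_ISO_7_ELSE | CATEGORY_MASK_OTHER


@dataclass
class DetectInfo:
    """Toy version of the detector state: two bit masks."""
    found: int = 0      # categories with positive evidence
    rejected: int = 0   # categories ruled out by the data


def detect_7bit_sketch(data: bytes, info: DetectInfo) -> None:
    """Hypothetical 7-bit detector: rule the category out on any 8-bit byte."""
    if any(b >= 0x80 for b in data):
        info.rejected |= CATEGORY_MASK_ISO_7_ELSE
    elif b"\x1b" in data:
        # An ESC byte is treated here as positive evidence (a simplification).
        info.found |= CATEGORY_MASK_ISO_7_ELSE


def candidates(info: DetectInfo) -> int:
    """Categories still allowed: everything that was not rejected."""
    return ALL_CATEGORIES & ~info.rejected


plain = DetectInfo()
detect_7bit_sketch(b"plain ASCII text", plain)

eight_bit = DetectInfo()
detect_7bit_sketch(b"text with \x96 byte", eight_bit)
```

In this model, plain ASCII leaves both masks empty, so the 7-bit category
stays a candidate even though nothing was "found"; a \226 byte puts it
into 'rejected'.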

> I've just installed a fix to trunk.  Could you please try
> the latest version?

It fixes the crash, but I'm not sure the results are what we want.
Emacs 24.3, which also did not crash, set the
buffer-file-coding-system of the buffer visiting the file to
'undecided', and treated the \226 characters as raw 8-bit bytes:

   character: \226 (displayed as \226) (codepoint 4194198, #o17777626, #x3fff96)
   ...
   general-category: Cn (Other, Not Assigned)

By contrast, the current trunk sets buffer-file-coding-system to
'latin-1' and thinks this character is a Latin-1 character:

   character: \226 (displayed as \226) (codepoint 150, #o226, #x96)
   preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
   ...
   old-name: START OF GUARDED AREA
   general-category: Cc (Other, Control)

That doesn't sound right to me.
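
The two codepoints above differ by a fixed offset.  The sketch below
reproduces the arithmetic; it assumes the raw-byte mapping used by Emacs
adds 0x3FFF00 to bytes in the range 0x80..0xFF (as the BYTE8_TO_CHAR
macro in character.h does), and the Python names are purely illustrative.

```python
import unicodedata

# Offset assumed from Emacs's BYTE8_TO_CHAR; the name is ours.
BYTE8_OFFSET = 0x3FFF00


def byte8_to_char(byte: int) -> int:
    """Map a raw 8-bit byte (0x80..0xFF) to its raw-byte codepoint."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF have a raw-byte codepoint")
    return byte + BYTE8_OFFSET


raw = 0o226  # the \226 byte from the file, i.e. 0x96

# Interpretation 1: an undecoded raw byte (what Emacs 24.3 reported).
as_raw_byte = byte8_to_char(raw)

# Interpretation 2: decoded as Latin-1 (what the trunk reported).
as_latin1 = ord(bytes([raw]).decode("latin-1"))

print(as_raw_byte, oct(as_raw_byte), hex(as_raw_byte))
print(as_latin1, oct(as_latin1), hex(as_latin1))
print(unicodedata.category(chr(as_latin1)))
```

The first interpretation gives 4194198 (#x3fff96) and the second gives
150 (#x96), a C1 control character in category Cc, matching the two
descriptions quoted above.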

If I force some specific coding system, e.g.

   C-x RET c utf-8 RET C-x C-f FILE RET

then the \226 characters are correctly recognized as 8-bit bytes by
the current trunk (as was the case before your changes).

Could it be that the current trunk fails to recognize the 8-bit bytes
in the file?




