bug#17343: 24.2; Exponential growth of files using raw-mode
From: Eli Zaretskii
Subject: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Fri, 25 Apr 2014 10:13:29 +0300

> Date: Thu, 24 Apr 2014 15:58:41 -0300
> From: Jeremy Barbay <jbarbay@dcc.uchile.cl>
>
> Following the short recipe below shows how a user saving files in "raw
> mode" could end up with files doubling their size each time saved, if
> following emacs' suggestion to save it in raw mode:
>
> * Recipe:
>
> 1. Save the following line in a file "testAccentsMinimal.txt"
>
> Nà¥\206à¤\206\206à¥\206
>
> 2. Repeatedly,
>
> 0) measure the size of the file (wc -c testAccentsMinimal.txt);
> 1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
> 2) insert and delete a character in it (manually);
> 3) save it selecting the suggested raw encoding (manually);
> 4) quit emacs (or force the reload of the file).
>
> * Result:
>
> This should give something akin to the following, where one can see
> the size of the file growing exponentially with the number of savings.
>
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 11 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 19 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 35 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 67 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 131 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 259 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 515 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 1027 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
> 2051 testAccentsMinimal.txt
>
> * (Tentative) Explanation:
>
> - Even though the file is saved in "raw" mode, it is read in another
> mode which prefixes the "special" characters with a Unicode code.
> - Due to symbols from incompatible encodings, emacs is confused about
> which encoding to use for saving and asks the user about it.
>
> * Why it matters:
>
> - The faulty sequence above occurred naturally from copy-pasting from
> various webpages (containing accented characters) into the same
> document, and was identified when some files grew too large.
> - Files (e.g. of notes) end up doubling in size at each edit, until
> they fill the memory and/or hard drive, slow down the system and
> make Emacs complain about the size of the file.
>
> * (Potential) Solutions:
>
> - when saving a file with conflicting encodings, instead of merely
> suggesting the raw encoding, add an option to "clean" the file
> rather than save it in raw mode, for instance by projecting
> the file to an encoding by deleting all symbols which are
> incompatible with it.
>
> I think that I reported this bug 1 year ago in Emacs 23 and was answered
> at the time that this would be solved by the next version (24), but it
> occurred to me recently that this undesirable behavior was still there :(

It's not a bug. When you modify a file, its size can grow, sometimes
a lot, due to a change in encoding. This is intended behavior.

To avoid the problem in the first place, once you discover that the
file was visited with raw-text encoding, use "C-x RET r" to re-visit
the buffer in the encoding you think is correct, and then manually fix
the bad sequences. Then the growth will not happen.
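
For illustration only, here is a small Python sketch of the arithmetic
behind the sizes quoted above. The reported `wc -c` values satisfy
next = 2 * previous - 3, which is what one would see if 8 bytes doubled
on every save while 3 bytes stayed fixed. The second half is a toy
model, not Emacs code: it assumes the bytes are read as Latin-1 and
written back as UTF-8, which doubles every byte >= 0x80. The names
`misread_and_resave`, `simulate` and `SAMPLE`, and the sample content
itself, are hypothetical.

```python
# Sizes reported by `wc -c` in the quoted recipe, in order.
REPORTED = [11, 19, 35, 67, 131, 259, 515, 1027, 2051]


def follows_doubling(sizes):
    """Return True if each size equals twice the previous one minus 3."""
    return all(b == 2 * a - 3 for a, b in zip(sizes, sizes[1:]))


def misread_and_resave(data: bytes) -> bytes:
    """Toy model (an assumption, not what Emacs does internally):
    decode the bytes as Latin-1, then encode the text as UTF-8.
    Every byte >= 0x80 becomes two bytes; ASCII bytes are unchanged."""
    return data.decode("latin-1").encode("utf-8")


def simulate(data: bytes, saves: int):
    """Return the file size before the first save and after each save."""
    sizes = [len(data)]
    for _ in range(saves):
        data = misread_and_resave(data)
        sizes.append(len(data))
    return sizes


# Hypothetical stand-in for the test file: 3 ASCII bytes plus 8 bytes
# >= 0x80. It is not the exact content of testAccentsMinimal.txt.
SAMPLE = b"N\n\n" + bytes([0xE0, 0xA5, 0x86, 0xE0, 0xA4, 0x86, 0x86, 0x86])

if __name__ == "__main__":
    print(follows_doubling(REPORTED))
    print(simulate(SAMPLE, 8) == REPORTED)
```

Under this model the size after k saves is 8 * 2**k + 3, so correcting
the encoding once stops the growth, while every further save under the
wrong encoding doubles the non-ASCII part again.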