emacs-devel

Re: docs for insert-file-contents use 'bytes'


From: Ted Zlatanov
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Tue, 30 Sep 2008 08:48:28 -0500
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.60 (gnu/linux)

On Tue, 30 Sep 2008 10:19:26 +0300 Eli Zaretskii <address@hidden> wrote: 

>> From: Ted Zlatanov <address@hidden>
>> Date: Mon, 29 Sep 2008 16:04:13 -0500
>> 
>> This is not a safe operation mode with multibyte sequences; is there a
>> way to DTRT?  I'm specifically thinking about a paged buffer mode where
>> you only see a small portion of the file (for editing large files, as we
>> discussed in another newsgroup a while ago).

EZ> How about this idea: read a bit more than you want, then find a safe
EZ> place to end this page-full?

How do I find the next safe position in the byte flow?
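For UTF-8 specifically, one possible answer is that the encoding is self-synchronizing. Every continuation byte matches the bit pattern 10xxxxxx, so a safe start is found by skipping at most three bytes forward, and a safe end by dropping a trailing incomplete sequence (Eli's "read a bit more, then find a safe place to end"). The Python sketch below assumes the file is UTF-8. This trick does not carry over to encodings such as Shift-JIS or ISO-2022, which need more context to resynchronize. All function names are hypothetical.

```python
def is_continuation(byte: int) -> bool:
    """True if BYTE is a UTF-8 continuation byte (bit pattern 10xxxxxx)."""
    return byte & 0xC0 == 0x80


def sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence introduced by LEAD."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    return 1  # malformed lead byte: treat it as a single byte


def next_safe_start(data: bytes, pos: int) -> int:
    """Smallest offset >= POS that starts a character (len(data) if none)."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


def last_safe_end(data: bytes) -> int:
    """Length of the longest prefix of DATA that ends on a whole character.

    A trailing, partially read multibyte sequence is dropped."""
    i = len(data) - 1
    # A UTF-8 character has at most three continuation bytes.
    while i >= 0 and len(data) - i <= 3 and is_continuation(data[i]):
        i -= 1
    if i < 0 or is_continuation(data[i]):
        return len(data)  # no lead byte nearby: malformed, leave as is
    if len(data) - i >= sequence_length(data[i]):
        return len(data)  # the last character is complete
    return i  # cut just before the incomplete character
```

A pager could then read a few extra bytes past the page end and keep only `data[next_safe_start(data, 0):last_safe_end(data)]`.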

>> I don't know if this is the right wording, but it's a pretty essential
>> operation so it should give some warning about this common (nowadays)
>> case.

EZ> Is it really a common case that insert-file-contents is used to read a
EZ> portion of a file?  Where is this used?

I want to use it to implement a paged view of large files.  We discussed
this in emacs-help and you suggested using insert-file-contents IIRC.

Anyhow, the point is that the docs don't mention this issue; let's fix
that first.  I suggest one possible way to do it below.

On Tue, 30 Sep 2008 15:06:17 +0900 Miles Bader <address@hidden> wrote: 

MB> Ted Zlatanov <address@hidden> writes:
EZ> Like with any other random bytes, I think: it will produce eight-bit-*
EZ> characters in the buffer.  IOW, you get garbled text.
>> 
>> This is not a safe operation mode with multibyte sequences; is there a
>> way to DTRT?  I'm specifically thinking about a paged buffer mode where
>> you only see a small portion of the file (for editing large files, as we
>> discussed in another newsgroup a while ago).

MB> Why is it "not safe"?  

Because the text will be corrupted if you seek in the middle of a
multibyte sequence, and there's no way to know in advance if a position
is safe without at least some scanning.
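To make the corruption concrete, here is a small Python illustration (an analogy, not Emacs code) that decodes byte ranges cut at arbitrary offsets. Python substitutes U+FFFD for undecodable bytes; per Eli's description above, Emacs would instead insert eight-bit raw-byte characters, but either way the characters split by the cut come out garbled. The helper name is hypothetical.

```python
def decode_slice(data: bytes, start: int, end: int) -> str:
    """Decode DATA[START:END] as UTF-8 without looking for character boundaries.

    Python substitutes U+FFFD for each undecodable piece, so any character
    split by START or END shows up as a replacement character."""
    return data[start:end].decode("utf-8", errors="replace")


data = "Größe".encode("utf-8")  # b'Gr\xc3\xb6\xc3\x9fe'
print(repr(decode_slice(data, 0, 3)))  # ends inside 'ö'
print(repr(decode_slice(data, 3, 7)))  # starts inside 'ö'
print(repr(decode_slice(data, 2, 7)))  # starts on a boundary: decodes cleanly
```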

MB> How would you do things differently?

I don't know, I'm just saying the docs don't mention the possibility of
corrupted text.  Can we fix that, if possible?  The docs just need to
warn, not solve the issue.

MB> In conjunction with _file_ contents, a byte offset seems certainly the
MB> most natural thing.  An "encoded character offset", for instance, would
MB> be far less efficient, much more complex to implement (and thus
MB> buggier), and harder to use in general.

Agreed.  Still, encoding schemes like UTF-8 are so popular today that
the docs should at least warn about careless seeking to a byte offset.
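Miles's efficiency point can be seen in a short sketch: a byte offset maps directly to a seek, while turning a character offset into a byte offset in a variable-width encoding such as UTF-8 requires examining every byte before the target. The Python function below is a hypothetical illustration of that linear scan, assuming UTF-8 input.

```python
def char_to_byte_offset(data: bytes, char_offset: int) -> int:
    """Byte offset at which character number CHAR_OFFSET starts in UTF-8 DATA.

    Characters have variable width, so there is no arithmetic shortcut:
    every byte before the target is examined, making the cost linear in
    the position.  A byte offset, by contrast, is a single seek."""
    chars = 0
    for i, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # a non-continuation byte starts a character
            if chars == char_offset:
                return i
            chars += 1
    if chars == char_offset:
        return len(data)  # offset just past the last character
    raise ValueError("character offset is past the end of DATA")
```

For a multi-gigabyte file, jumping to "character N" this way means reading everything before it, which is why a byte offset plus a local boundary fix-up is the cheaper design.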

There could be an insert-file-decoded-contents that seeks to a byte
position and starts decoding at the first character boundary at or after
that position.  That's not too hard to implement, and it's fast.
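A minimal sketch of how such a function might behave, written in Python rather than Emacs Lisp and assuming UTF-8 input. `insert-file-decoded-contents` is only a proposal here, and `read_decoded_page` below is a hypothetical stand-in. It reads bytes [BEG, END) plus three bytes of slack, moves the start forward to the next character boundary, and extends the end so the character straddling END is kept whole (a UTF-8 character has at most three continuation bytes, so three extra bytes always suffice).

```python
import io
from typing import BinaryIO, Tuple


def read_decoded_page(f: BinaryIO, beg: int, end: int) -> Tuple[str, int, int]:
    """Decode the UTF-8 characters that start in bytes [BEG, END) of F.

    Hypothetical helper modeled on the proposed insert-file-decoded-contents.
    Returns (text, actual_beg, actual_end) as byte offsets into F."""
    if end < beg:
        raise ValueError("END must not precede BEG")

    def is_continuation(byte: int) -> bool:
        return byte & 0xC0 == 0x80

    # Three extra bytes are enough to finish the last character.
    f.seek(beg)
    chunk = f.read(end - beg + 3)

    # Skip forward to the first character boundary at or after BEG.
    start = 0
    while start < len(chunk) and is_continuation(chunk[start]):
        start += 1
    # Extend past END over the continuation bytes of the last character.
    stop = min(end - beg, len(chunk))
    while stop < len(chunk) and is_continuation(chunk[stop]):
        stop += 1
    text = chunk[start:stop].decode("utf-8")  # strict: malformed input raises
    return text, beg + start, beg + stop


if __name__ == "__main__":
    page = io.BytesIO("Größe".encode("utf-8"))
    print(read_decoded_page(page, 3, 5))
```

Returning the adjusted offsets lets a pager start the next page exactly where the previous one stopped, so no bytes are lost or duplicated between pages.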

Ted




