Re: auto-detection

From: Kenichi Handa
Subject: Re: auto-detection
Date: Fri, 21 Nov 2003 11:06:34 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvhe11d1s7.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:

>>>  I think it would be good when saving a file to automatically verify that
>>>  the coding-system chosen will be correctly auto-detected if read by
>>>  a similarly-configured Emacs.  This is already done w.r.t the
>>>  coding-cookie but not with the auto-detection.
>>  The easy but slow way to implement it is to insert the file
>>  again in a temporary buffer with (let
>>  ((coding-system-for-read 'undecided)) ..), and check which
>>  coding system is detected.  And I think any other methods
>>  are quite difficult to implement.

> That's indeed the problem: there doesn't seem to be any easy way to make
> the test robust and lightweight.
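
For reference, the "easy but slow way" quoted above might look roughly like the sketch below. The helper name `my-detected-coding-system` and the use of a temporary file are assumptions made for illustration; neither is part of Emacs or of this thread. The sketch only relies on `make-temp-file`, `write-region`, `insert-file-contents`, and `last-coding-system-used`.

```elisp
;; Hypothetical helper (name and temp-file approach are assumptions):
;; write TEXT with CODING-SYSTEM, read it back with auto-detection,
;; and report which coding system the detection picked.
(defun my-detected-coding-system (text coding-system)
  "Return the coding system auto-detected for TEXT encoded with CODING-SYSTEM."
  (let ((file (make-temp-file "detect-")))
    (unwind-protect
        (progn
          ;; Encode TEXT into the temporary file with CODING-SYSTEM.
          (let ((coding-system-for-write coding-system))
            (write-region text nil file nil 'silent))
          ;; Read it back, forcing content-based detection.
          (with-temp-buffer
            (let ((coding-system-for-read 'undecided))
              (insert-file-contents file))
            ;; `insert-file-contents' records the coding system it used.
            last-coding-system-used))
      (delete-file file))))
```

A caller would compare the result with the coding system chosen for saving. The detected value usually carries an end-of-line suffix such as `-unix`, so comparing through `coding-system-base` is probably what one wants. Because `coding-system-for-read` is bound, this sketch does not consult `auto-coding-alist` or `file-coding-system-alist`.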

Something like this function is mostly accurate and
lightweight.  It would be better if it also accepted a FILE
argument to check auto-coding-alist and
file-coding-system-alist.  But, for the moment, I don't have
time to work on it further.

(defun coding-system-round-trip-safe-p (coding-system from to &optional string)
  "Check if CODING-SYSTEM is round-trip safe for the region FROM and TO.

The value is non-nil if and only if we can recover the same text
by encoding a text in the region between FROM and TO with
CODING-SYSTEM and decoding the result back with auto-detection.

If the value is nil, you can check which coding system was
actually detected by the value of `last-coding-system-used'.

If the optional 4th argument STRING is a string, FROM and TO are
indices into STRING, defaulting to 0 and the length of STRING.

The check is done only for the first 10 non-ASCII characters."
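  (let ((str "")
        (count 10))
    (if (stringp string)
        (progn
          (or from (setq from 0))
          (or to (setq to (length string)))
          ;; Collect up to COUNT non-ASCII characters from STRING.
          (while (and (> count 0)
                      (setq from (string-match "[^\000-\177]" string from))
                      (< from to))
            (setq str (concat str (string (aref string from)))
                  from (1+ from)
                  count (1- count))))
      ;; Otherwise collect them from the current buffer, keeping point.
      (save-excursion
        (goto-char from)
        (while (and (> count 0)
                    (re-search-forward "[^\000-\177]" to t))
          (setq str (concat str (string (preceding-char)))
                count (1- count)))))
    ;; Pure-ASCII text is trivially safe; otherwise encode the sample,
    ;; decode it with auto-detection, and compare with the original.
    (or (= (length str) 0)
        (string= (decode-coding-string
                  (encode-coding-string str coding-system) 'undecided)
                 str))))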

Ken'ichi HANDA
