Re: how to scan file for non-ascii chars (eg cut-n-paste from ms-word)
From: harven
Subject: Re: how to scan file for non-ascii chars (eg cut-n-paste from ms-word)
Date: Tue, 18 Jan 2011 22:32:23 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
dkcombs@panix.com (David Combs) writes:
> FURTHER, and more importantly, how do I *search* for
> one of these funny things, a left-double-quote, say?
> It's so *easy* to just hit C-s "!
You can go to the next non-ASCII character using
    C-M-s [^[:ascii:]] RET
Repeating C-s after that will step through the remaining non-ASCII characters.
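The same regexp can be used from Lisp, for instance to bind the search to a key. A minimal sketch; the name my-next-non-ascii is made up for illustration and is not part of Emacs:

```elisp
(defun my-next-non-ascii ()
  "Move point just after the next non-ASCII character.
Return the new position, or nil if there is none after point."
  (interactive)
  ;; With a non-nil third argument, `re-search-forward' returns nil
  ;; instead of signalling an error when nothing matches.
  (re-search-forward "[^[:ascii:]]" nil t))
```

Calling it repeatedly with M-x (or from a key binding) walks through the buffer one non-ASCII character at a time.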
> You mean do a query-replace on each non-ascii char? How do I
> even know which ones are even *in* some buffer of text?
You can use the following command to list all characters in the buffer together
with their frequencies. The list is sorted by decreasing frequency, so the
non-ASCII ones, which are usually rare, should appear near the end.
(defun frequency ()
  "Compute the frequencies for each character in the buffer.
The result appears in another buffer called *frequency*."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((freq (make-hash-table :test 'equal)))
      ;; Count every character except newline ("." does not match it).
      (while (re-search-forward "." nil t)
        (puthash (match-string 0)
                 (1+ (gethash (match-string 0) freq 0))
                 freq))
      (pop-to-buffer "*frequency*")
      (erase-buffer)
      ;; One line per character: the character, a space, its count.
      (maphash
       (lambda (key value)
         (insert key " " (number-to-string value) "\n"))
       freq))
    ;; Sort on the last field (the count), then reverse: most frequent first.
    (sort-numeric-fields -1 (point-min) (point-max))
    (reverse-region (point-min) (point-max))
    (other-window 1)))
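If only the non-ASCII characters are of interest, the counting can be restricted to them. A sketch under the same idea; the name my-non-ascii-counts is hypothetical, and it returns a list instead of filling a buffer:

```elisp
(defun my-non-ascii-counts ()
  "Return an alist of (CHAR . COUNT) for non-ASCII characters in the buffer.
The most frequent characters come first."
  (let ((freq (make-hash-table :test 'eql))
        result)
    (save-excursion
      (goto-char (point-min))
      ;; Each match is a single character, so it is the one before point.
      (while (re-search-forward "[^[:ascii:]]" nil t)
        (let ((ch (char-before)))
          (puthash ch (1+ (gethash ch freq 0)) freq))))
    ;; Turn the hash table into an alist, then sort by decreasing count.
    (maphash (lambda (ch count) (push (cons ch count) result)) freq)
    (sort result (lambda (a b) (> (cdr a) (cdr b))))))
```

Evaluate (my-non-ascii-counts) with M-: in the buffer to check. Characters come back as integer codes; (string CHAR) turns one back into a printable string.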
>
> What'd be nice is something that went through the whole
> buffer *once*, doing the "right thing" with each
> non-ascii char.
>
> Do I make any sense? Or do I not really understand?
Yes it makes sense.
Have a look at iso-cvt.el. This package provides commands to handle ISO 8859-1
characters. It contains a function called iso-translate-conventions, which
translates characters according to a translation table. I am not aware of a
table giving an ASCII translation for all UTF-8 characters, so you will have
to make your own, along the lines of
(defvar my-iso-trans-tab
  '(("à" "a")
    ("é" "e")
    ("ß" "s")
    ("ñ" "~n"))
  "Translation table for translating some characters to ASCII.
This table is not exhaustive.")
Then, assuming iso-cvt.el is loaded so that iso-translate-conventions is
defined, use the following command to translate the selected region.
(defun my-iso-all2ascii (from to &optional buffer)
  "Translate to ascii characters.
Translate the region between FROM and TO using the table
`my-iso-trans-tab'.
Optional arg BUFFER is ignored (for use in `format-alist')."
  (interactive "*r")
  (iso-translate-conventions from to my-iso-trans-tab))
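For text pasted from a word processor, the same approach might look like the sketch below. The table my-word-trans-tab and the command my-word-punct2ascii are made-up names, and the choice of replacements is only an assumption about what is wanted. It also assumes that iso-cvt can be loaded with require and that iso-translate-conventions treats the first element of each entry as a regexp.

```elisp
(require 'iso-cvt)  ; provides `iso-translate-conventions'

(defvar my-word-trans-tab
  '(("[\u201c\u201d]" "\"")   ; curly double quotes
    ("[\u2018\u2019]" "'")    ; curly single quotes
    ("[\u2013\u2014]" "-")    ; en and em dashes
    ("\u2026" "..."))         ; horizontal ellipsis
  "Hypothetical table mapping common word-processor punctuation to ASCII.
Each entry is (REGEXP REPLACEMENT).  This table is not exhaustive.")

(defun my-word-punct2ascii (from to)
  "Translate the region between FROM and TO using `my-word-trans-tab'."
  (interactive "*r")
  (iso-translate-conventions from to my-word-trans-tab))
```

Select a region (or the whole buffer with C-x h) and run M-x my-word-punct2ascii. Characters missing from the table are left untouched, so the non-ASCII search can be used afterwards to find what remains.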
Hope that helps