Re: Problem with Boyer Moore and Greek characters
From: Kenichi Handa
Subject: Re: Problem with Boyer Moore and Greek characters
Date: Tue, 7 May 2002 22:35:29 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
Sorry for the late reply on this matter.
Although I don't fully understand this part of the code, it
seems that your fix is correct. Richard, what do you think?
Shall I install it (both in HEAD and RC)?
---
Ken'ichi HANDA
handa@etl.go.jp
Thomas Morgan <tlm@pocketmail.com> writes:
> I ran GNU Emacs 21.1.1 (i686-pc-linux-gnu, X toolkit) with the options
> `--q --no-site-file', then typed the following into `*scratch*':
> (search-forward "ί")
> ύ
> (The first Greek character is an accented iota represented in Emacs by
> the character number 342199, and the second is an accented upsilon
> represented by 342203. I entered them with the input method
> `greek-ibycus4'.)
> Then I pressed `C-p' and `C-e' to move point to the end of the first
> line, and `C-x C-e' to evaluate the expression.
> Here is the exact input for all of that:
> ( s e a r c h - f o r w a r d SPC " C-x <return> C-\
> g r e e k - i b y c u s 4 <return> i ' C-\ " ) <return>
> C-\ u ' C-\ C-p C-e C-x C-e
> This moved the cursor to the end of the second line, and displayed
> `214', the new position of point, in the echo area. So searching for
> the iota found the upsilon. This must be a bug.
> Boyer Moore searching compares only the last bytes of the characters,
> and this leads to the problem. If you capitalize the accented iota,
> the last byte is the same as the last byte of the upsilon, although
> their second-to-last bytes are different.
> Capital accented iota \234\364\362\273
> Small accented upsilon \234\364\361\273
> So before doing a Boyer Moore search, `search_buffer' needs to check
> that the character and its inversion have the same first three bytes.
> Here is the patch I made to do that. Please forgive my mistakes; I am
> not a programmer.
> cd ~/emacs-21.1/src/
> diff -c /home/tlm/emacs-21.1/src/search.c.\~1\~ /home/tlm/emacs-21.1/src/search.c
> *** /home/tlm/emacs-21.1/src/search.c.~1~ Mon Oct 1 02:08:20 2001
> --- /home/tlm/emacs-21.1/src/search.c Wed Apr 3 07:53:39 2002
> ***************
> *** 1237,1243 ****
>   /* Keep track of which character set row
>      contains the characters that need translation. */
>   int charset_base_code = c & ~CHAR_FIELD3_MASK;
> ! if (charset_base == -1)
>     charset_base = charset_base_code;
>   else if (charset_base != charset_base_code)
>     /* If two different rows appear, needing translation,
> --- 1237,1246 ----
>   /* Keep track of which character set row
>      contains the characters that need translation. */
>   int charset_base_code = c & ~CHAR_FIELD3_MASK;
> ! int inverse_charset_base = inverse & ~CHAR_FIELD3_MASK;
> ! if (charset_base_code != inverse_charset_base)
> !   boyer_moore_ok = 0;
> ! else if (charset_base == -1)
>     charset_base = charset_base_code;
>   else if (charset_base != charset_base_code)
>     /* If two different rows appear, needing translation,
> Diff finished at Wed Apr 3 08:00:10
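For readers following the byte-level argument in the report, here is a small standalone sketch in C. It is not code from search.c; the helper names and the fixed length of four bytes are assumptions made only for illustration. It uses the two byte sequences quoted above and shows that a comparison restricted to the last byte cannot tell the capital accented iota from the small accented upsilon, while a comparison of the leading bytes can. The patch applies the same idea by comparing `c` and `inverse` with the last field masked off.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: each character is the four-byte
   sequence quoted in the report. */
enum { ENCODED_LEN = 4 };

/* Byte values copied from the report (octal escapes). */
const unsigned char capital_iota[ENCODED_LEN]  = { 0234, 0364, 0362, 0273 };
const unsigned char small_upsilon[ENCODED_LEN] = { 0234, 0364, 0361, 0273 };

/* Hypothetical helper: nonzero when A and B end in the same byte.
   This is all that a last-byte-only comparison can see. */
int same_last_byte(const unsigned char *a, const unsigned char *b, size_t len)
{
    assert(len > 0);
    return a[len - 1] == b[len - 1];
}

/* Hypothetical helper: nonzero when A and B agree on every byte
   except the last one. */
int same_leading_bytes(const unsigned char *a, const unsigned char *b, size_t len)
{
    assert(len > 0);
    return memcmp(a, b, len - 1) == 0;
}
```

With the bytes above, `same_last_byte(capital_iota, small_upsilon, ENCODED_LEN)` is nonzero and `same_leading_bytes(capital_iota, small_upsilon, ENCODED_LEN)` is zero, which is the situation the report describes: the final bytes match even though the second-to-last bytes differ.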