From: Gitanjali Bhatia
Subject: [Aspell-user] byte offsets vs. character offsets
Date: Tue, 19 Sep 2006 09:45:53 -0700
Hi,

I am using the C interface to the Aspell library to parse the incoming text that I need to spell check. This is what I am doing:

    aspell_document_checker_process(checker, pText, -1);
    while (token = aspell_document_checker_next_misspelling(checker),
           token.len != 0)

The text I am passing in is UTF-8 encoded Vietnamese. What I am seeing is that the token.offset I get is in bytes, not characters. This means that if a character before the misspelled word is 3 bytes long, the offset of the misspelled word is off by 2 (3 bytes are counted for what is 1 character). This makes it hard for me to highlight or replace the word.

I looked in the Aspell manual and saw that there is a byte-offsets option. I tried setting it to both true and false, but the offsets are the same each time. I set it through a config file, which I load as follows:

    AspellConfig *config = new_aspell_config();
    aspell_config_replace(config, "conf", "aspell.conf");

Is there any other way to get the character offset?

Thanks,
-Gitanjali
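One possible workaround, sketched under assumptions: if token.offset and token.len really are byte positions into the same UTF-8 buffer that was passed to aspell_document_checker_process (as the message reports), they can be converted to character counts by counting the bytes that start a code point. The helper name utf8_byte_to_char_offset is hypothetical and not part of the Aspell API. It also assumes valid UTF-8, and it counts Unicode code points, not user-perceived characters, so Vietnamese text stored with combining diacritics (decomposed form) would count each mark separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Aspell: returns the number of Unicode
 * code points in the first byte_offset bytes of a UTF-8 buffer.
 * Assumes the text is valid UTF-8 and byte_offset lies on a character
 * boundary inside the buffer. */
size_t utf8_byte_to_char_offset(const char *text, size_t byte_offset)
{
    size_t chars = 0;

    assert(text != NULL);
    for (size_t i = 0; i < byte_offset; i++) {
        /* UTF-8 continuation bytes have the bit pattern 10xxxxxx;
         * every other byte begins a new code point. */
        if (((unsigned char)text[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

Usage, with the same assumptions: the character offset of a misspelling would be utf8_byte_to_char_offset(pText, token.offset), and its length in characters would be utf8_byte_to_char_offset(pText + token.offset, token.len). For "Vi\xe1\xbb\x87t nam" ("Việt nam"), the word "nam" starts at byte 7 but at character 5.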