[Aspell-user] byte offsets vs. character offsets

Hi,

I am using the C interface to the aspell library to parse the incoming text that I need to spell check. This is what I am doing:

aspell_document_checker_process(checker, pText, -1);

while (token = aspell_document_checker_next_misspelling(checker), token.len != 0)

The text I am passing in is UTF-8 encoded Vietnamese. What I am seeing is that token.offset is measured in bytes, not characters. So if a character before the misspelled word is 3 bytes long, the offset of the misspelled word is off by 2 (3 bytes, but only 1 character), and the error grows with every multi-byte character that comes before the word. This makes it hard for me to highlight or replace the word.

I looked in the Aspell manual and saw that there is a byte-offsets option. I tried setting it to both true and false, but the offset is the same either way. I set it through a config file, which I load like this:

AspellConfig* config = new_aspell_config();

aspell_config_replace( config, "conf", "aspell.conf" );
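A minimal sketch that makes the mismatch concrete. It assumes the buffer is UTF-8 and that the Vietnamese letters are precomposed (NFC), so each letter is one code point. The function name `utf8_chars_before` is hypothetical and not part of Aspell. It counts the code points in the first N bytes by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Aspell function): return the number of
 * UTF-8 code points in the first byte_off bytes of text. */
size_t utf8_chars_before(const char *text, size_t byte_off)
{
    size_t chars = 0;
    assert(text != NULL);
    for (size_t i = 0; i < byte_off && text[i] != '\0'; i++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
         * starts a new code point, so only those are counted. */
        if (((unsigned char)text[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

For "Tiếng Việt sai" (ế = U+1EBF = E1 BA BF, ệ = U+1EC7 = E1 BB 87), the word "sai" starts at byte 15 but at character 11. Each of the two 3-byte letters adds 2 extra bytes, so the gap is 4. If the text uses decomposed accents (combining marks), code points no longer match what a user sees as one letter, and this count would need to account for that.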

Is there any other way to get the character offset?
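One possible workaround, sketched under stated assumptions: keep the same UTF-8 buffer that was passed to aspell_document_checker_process and convert each token's byte offset and byte length into character units yourself. The names `Span`, `Utf8Cursor`, `utf8_cursor_seek`, and `to_char_span` are hypothetical. With the real library you would fill a `Span` from `token.offset` and `token.len`. The cursor assumes misspellings are reported in increasing offset order, so it counts forward from the previous word instead of rescanning from the start of the text each time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offset/length pair: in bytes on input (copied from
 * token.offset / token.len), in code points on output. */
typedef struct {
    size_t offset;
    size_t len;
} Span;

/* Remembers how far into the UTF-8 buffer we have already counted. */
typedef struct {
    const char *text;
    size_t byte_pos;
    size_t char_pos;
} Utf8Cursor;

static int is_lead_byte(unsigned char b)
{
    return (b & 0xC0) != 0x80; /* not a 10xxxxxx continuation byte */
}

/* Advance the cursor to byte offset target and return the code-point
 * offset there. Targets must never move backwards. */
size_t utf8_cursor_seek(Utf8Cursor *cur, size_t target)
{
    assert(target >= cur->byte_pos);
    while (cur->byte_pos < target && cur->text[cur->byte_pos] != '\0') {
        if (is_lead_byte((unsigned char)cur->text[cur->byte_pos]))
            cur->char_pos++;
        cur->byte_pos++;
    }
    return cur->char_pos;
}

/* Convert a byte span to a code-point span by converting its start and
 * end; the character length is the difference between the two. */
Span to_char_span(Utf8Cursor *cur, Span bytes)
{
    Span chars;
    chars.offset = utf8_cursor_seek(cur, bytes.offset);
    chars.len = utf8_cursor_seek(cur, bytes.offset + bytes.len) - chars.offset;
    return chars;
}
```

Inside the while loop above, this would look like `Span c = to_char_span(&cur, (Span){token.offset, token.len});`, with `cur` initialized once as `{pText, 0, 0}` before the loop. If the widget doing the highlighting counts UTF-16 code units instead of code points, characters outside the Basic Multilingual Plane would count as 2. Vietnamese letters are all inside it, so the two counts agree for this text.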

Thanks

-Gitanjali

From:	Gitanjali Bhatia
Subject:	[Aspell-user] byte offsets vs. character offsets
Date:	Tue, 19 Sep 2006 09:45:53 -0700