Re: [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...
From: Martin Swift
Subject: Re: [Aspell-user] aspell-<LANG>: Invalid UTF-8 sequence at position...
Date: Sat, 3 Mar 2007 21:16:08 +0900
User-agent: Mutt/1.5.13 (2006-08-11)
On Sat, Mar 03, 2007 at 04:29:15AM -0700, Kevin Atkinson wrote:
> The word list is likely in iso-8859-1 but Aspell expects it in utf-8.
Indeed:
# file de*
de_affix.dat: ISO-8859 text
de_AT.multi: ASCII text
de_AT-only.cwl: data
de_CH.multi: ASCII text
de_CH-only.cwl: data
de-common.cwl: data
de.dat: ASCII text
de_DE.multi: ASCII text
de_DE-only.cwl: data
de.multi: ASCII text
de_phonet.dat: ISO-8859 English text
deutsch.alias: ASCII text
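Because `file` can only report "data" for the prezip-compressed .cwl files, one way to check whether a decompressed word list is valid UTF-8 is to pass it through iconv, which exits with an error at the first invalid byte sequence. This is a minimal sketch, assuming a POSIX shell and an iconv that accepts the UTF-8 encoding name. `check_utf8` is a hypothetical helper name:

```shell
#!/bin/sh
# check_utf8 (hypothetical helper): read text on stdin and report whether
# it is valid UTF-8. iconv exits non-zero when it meets an invalid sequence.
check_utf8() {
    if iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        echo "valid UTF-8"
    else
        echo "not valid UTF-8"
    fi
}

# Usage on a compressed word list:
#   /usr/bin/prezip-bin -d < de-common.cwl | check_utf8
```

If you drop the `2>&1` redirection, GNU iconv's own error message reaches the terminal. At least in the GNU version, that message includes the byte position of the offending sequence.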
> Your locale settings _should_ not have an effect here. What does have an
> effect is the setting in the language data file "de.dat", in particular
> "data-encoding". See
> http://aspell.net/man-html/The-Language-Data-File.html
From that page:
data-encoding
The encoding the language data files are expected to be in as well
as the default encoding to use when saving the personal
dictionaries. It can be either `utf-8' or any of the 8-bit
encoding that Aspell supports. If not set, then it defaults to
charset.
I hope not to offend, but I found that paragraph a little terse:
* Should it be: "The encoding *of* the language data files"?
* "are expected to be in as well as..." Expected to be in what?
* Should it be: "as well as the default encoding *used* when saving"
Does this mean that aspell expects the word lists to have the same
charset as the machine? Isn't that a little odd?
de.dat sets 'charset' as iso-8859-1:
# cat de.dat
# Generated with Aspell Dicts "proc" script version 0.50.1
name de
charset iso-8859-1
soundslike de
affix de
Does aspell not use this to determine the charset? If not, /shouldn't/
it?
I just tried
/usr/bin/prezip-bin -d < de-common.cwl | /usr/bin/aspell --lang=de create \
  --encoding=iso8859-1 master ./de-common.rws
which completed without any errors, producing de-common.rws. As it is
quite late here in Japan, I don't have any more time tonight to work
on this.
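There may be another route if the locale is UTF-8, though I have not tested it. Instead of telling aspell the input encoding with `--encoding`, you could convert the word list to UTF-8 before it reaches `aspell create`. This sketch assumes that aspell reads the piped word list in the encoding set by its `encoding` option, and that this option follows the locale when it is not set explicitly. `latin1_to_utf8` is a hypothetical helper name:

```shell
#!/bin/sh
# latin1_to_utf8 (hypothetical helper): re-encode ISO-8859-1 text on stdin
# as UTF-8 on stdout.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Sketch, assuming a UTF-8 locale so that aspell expects UTF-8 input
# without an explicit --encoding option (untested assumption):
#   /usr/bin/prezip-bin -d < de-common.cwl | latin1_to_utf8 \
#       | /usr/bin/aspell --lang=de create master ./de-common.rws
```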
A couple of questions:
Is this going to conflict with my machine's character encoding, or
has aspell created an rws file for a utf-8 system?
Is the machine character encoding check a feature? Since one might
install the same word list on machines with different character
encodings, this check seems prone to failure.
--
✌