Re: [Nano-devel] adding a word-completion feature to nano


From: Benno Schulenberg
Subject: Re: [Nano-devel] adding a word-completion feature to nano
Date: Sat, 29 Oct 2016 10:30:08 +0200

Hello Sumedh,

On Fri, Oct 28, 2016, at 23:58, Sumedh Pendurkar wrote:
> On Saturday 29 October 2016 01:02 AM, Benno Schulenberg wrote:
> > ŝampu
> > chère
> >
> > Then type <Enter> ŝa ^]
> > It says: ŝadds.
> > Typing ^] again, it says: ŝallowing
> >
> > Not good.  It doesn't see the ŝ as forming part of a word.
> > But it should.  Selecting those two words, and running the
> > internal spell checker on them says it cannot find the word
> > "ampu".  Hmmm...  There seems to be something wrong with
> > the word-forming determination.  That will have to be fixed
> > first.
>
> Hmm..

Sorry, I was confused.  It is 'spell' (which is called by the
internal spell checker) that does not recognize non-ASCII letters
as alphabetic: it thinks that "ŝ" is whitespace or punctuation or
whatever, but not a letter, so it sees the word "ampu" and reports
it as misspelled.  But nano correctly sees "ŝ" as a letter and
therefore cannot find the word "ampu" as a separate word because
it only sees "ŝampu".  So the word-forming detection in nano
works okay.

> > Typing <Enter> ch ^]
> > it says: ch�
> > It produces an invalid byte.
> >
> > Your code is not entirely UTF-8 compatible.
> 
> I am new to UTF-8, so I haven't read much about it yet.
> Please correct me if I make any mistakes.
> I just looked at the code and traced through it on paper.
> 1) Is "è" a single byte?

No.  UTF-8 is a multibyte encoding.  Anything that is not ASCII
takes up two or three or four bytes.
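
To illustrate the byte counts above: in UTF-8 the first byte of a
character tells how many bytes the whole character occupies.  The
following is only a minimal sketch; utf8_seq_len() is a hypothetical
helper written for this explanation, not a function from nano.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nano): return the number of bytes
 * in the UTF-8 sequence that begins with the given lead byte, or 0
 * when the byte cannot start a sequence (for example because it is a
 * continuation byte of the form 10xxxxxx). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;       /* 0xxxxxxx: plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;       /* 110xxxxx, e.g. 0xC3 0xA8 for "è" */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;       /* 1110xxxx */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;       /* 11110xxx */
    return 0;           /* cannot begin a character */
}
```

So "è" (bytes 0xC3 0xA8) is two bytes long, and its second byte on
its own is not a character at all.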

> Then it checks whether the next byte is a mb_char or not (which,
> surprisingly, returns true).

is_word_mbchar() does not check a byte; it checks whether the
string that starts at the given position begins with a valid
/multibyte/ character -- mb = multibyte.  (But a valid single-byte
character is good too, of course.)

> (Note: if it is two bytes, the second byte was not a word-forming
> character; that's why it signaled the end of the word.)

You cannot check bytes for being word forming, you need to
check characters, which means that now and then you have
to skip a byte, or two, or three.
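
A rough sketch of what stepping by characters instead of bytes could
look like.  None of this is nano's actual code: char_len(),
is_word_char() and word_length_bytes() are made-up names, the validity
check is simplified (it does not reject every overlong form), and
treating every valid multibyte character as word forming is an
assumption made only to keep the example short.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: length in bytes of the valid UTF-8 character at the
 * start of s, or 0 at the end of the string or on an invalid sequence. */
static size_t char_len(const char *s)
{
    unsigned char c = (unsigned char)s[0];
    size_t len, i;

    if (c == '\0')
        return 0;
    if (c < 0x80)
        return 1;
    else if (c >= 0xC2 && c <= 0xDF)
        len = 2;
    else if (c >= 0xE0 && c <= 0xEF)
        len = 3;
    else if (c >= 0xF0 && c <= 0xF4)
        len = 4;
    else
        return 0;

    /* Every following byte must look like 10xxxxxx.  A terminating
     * NUL fails this test, so we never read past the string's end. */
    for (i = 1; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return 0;

    return len;
}

/* Assumption for this sketch only: ASCII letters and digits, plus any
 * valid multibyte character, count as word forming. */
static int is_word_char(const char *s, size_t len)
{
    return len > 1 || isalnum((unsigned char)s[0]);
}

/* Return the length in bytes of the word at the start of s, advancing
 * one whole character at a time so that the result never ends in the
 * middle of a multibyte sequence. */
size_t word_length_bytes(const char *s)
{
    size_t pos = 0, len;

    while ((len = char_len(s + pos)) > 0 && is_word_char(s + pos, len))
        pos += len;

    return pos;
}
```

With this approach the word in "chère" is six bytes long (five
characters), and copying that many bytes cannot cut the "è" in half.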

> Also, after the git pull,
> if I do ./autogen.sh; ./configure; make
> it says that using_utf8 is not declared.  I think it should be as I have
> attached in the patch, though I may be mistaken.

Good catch.  Apparently the locally trimmed regression script needs
to have the --disable-utf8 option added back in.  Thanks for the
patch; I have installed something slightly different: 33bc848.

> Also,
> ./configure --enable-utf8
> printed this on the terminal:
> *** UTF-8 support was requested, but insufficient UTF-8 support was
> *** detected [...]

You need to have libncurses5w or libncurses6w (note the "w")
and the corresponding libncurses5w-dev or libncurses6w-dev
packages installed.  Please report what you needed to install
for configure to pass without error, so I can update README.GIT.

Benno

-- 
http://www.fastmail.com - The way an email service should be