Re: [HELP] a search engine in GNU Guile
From: Amirouche Boubekki
Subject: Re: [HELP] a search engine in GNU Guile
Date: Fri, 09 Sep 2016 20:10:30 +0200
User-agent: Roundcube Webmail/1.1.2
On 2016-09-09 16:05, Ralf Mattes wrote:
> On Fri, Sep 09, 2016 at 09:39:24AM -0500, Christopher Allan Webber wrote:
>> Amirouche Boubekki writes:
>>> - port whoosh/lucene to guile to improve text search
> Sorry, but I don't see the point of this.
I meant "to improve the text search of my previous attempt at writing a
search engine". The previous iteration of this project does not support
boolean search.
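To make "boolean search" concrete, here is a minimal sketch in Python (the language of Whoosh). It is not wsh.scm's code; `build_index`, `search_and`, `search_or`, `search_not` and the sample `DOCS` are hypothetical names invented for the illustration. The idea is simply an inverted index whose posting sets are combined with set operations:

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term (boolean AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents containing at least one term (boolean OR)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.union(*postings) if postings else set()


def search_not(index, all_ids, term):
    """Documents that do not contain the term (boolean NOT)."""
    return set(all_ids) - index.get(term.lower(), set())


# Hypothetical sample corpus, only for the illustration.
DOCS = {
    1: "guile is a scheme implementation",
    2: "whoosh is a python search engine",
    3: "lucene is a java search engine",
}
INDEX = build_index(DOCS)

# "search AND NOT java": intersect the AND result with the NOT result.
RESULT = search_and(INDEX, "search") & search_not(INDEX, DOCS, "java")
```

A real engine would also need query parsing and ranking; the sketch only shows the set algebra behind AND, OR and NOT.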
> At least Lucene has an HTTP-based interface that can be accessed by any
> kind of client language.
That is trivial to do with Guile too.
> Why reinvent the wheel
Because it's a hobby.
> (and, in the case of Lucene, a rather well working,
It's not possible to use a custom storage engine with Lucene.
> extremely mature
My theory is that some search engine businesses, like Algolia, forked
Lucene to build it on top of something similar to wiredtiger and can now
claim impressive performance.
What I mean, basically, is that wsh.scm is innovative. I have read here
and there that big players are actually using storage engines similar to
wiredtiger to build search engines...
So it's not a bad idea, just an uncommon one.
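As a rough illustration of what building a search index on top of an ordered key-value store can look like, here is a Python sketch. It is neither wsh.scm's code nor wiredtiger's API: `OrderedStore`, `index_document` and `postings` are hypothetical names, and a sorted in-memory list stands in for the storage engine. The assumption is only that the store keeps keys sorted, so all `(term, doc_id)` keys for one term are adjacent and can be read with a single range scan:

```python
import bisect


class OrderedStore:
    """Toy stand-in for an ordered key-value store: keys are kept sorted."""

    def __init__(self):
        self._keys = []

    def put(self, key):
        # Insert the key at its sorted position, ignoring duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_prefix(self, term):
        """Yield every (term, doc_id) key for `term`, in key order."""
        # (term,) sorts before any (term, doc_id), so this finds the first one.
        i = bisect.bisect_left(self._keys, (term,))
        while i < len(self._keys) and self._keys[i][0] == term:
            yield self._keys[i]
            i += 1


def index_document(store, doc_id, text):
    """Store one (term, doc_id) key per distinct term of the document."""
    for term in set(text.lower().split()):
        store.put((term, doc_id))


def postings(store, term):
    """Return the ids of documents containing `term`, via a range scan."""
    return [doc_id for _, doc_id in store.scan_prefix(term.lower())]


store = OrderedStore()
index_document(store, 2, "whoosh is a python search engine")
index_document(store, 1, "guile is a scheme implementation")
index_document(store, 3, "lucene is a java search engine")
```

With this layout the posting list of a term comes back already sorted by document id, whatever order the documents were indexed in.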
> and complex wheel)?
How complex? That's what I am trying to understand. AFAIK it's not as
complex as opencog, since I can rewrite more of its features.
>> This is something I'd love to see generally. It would be nice to have
>> an indexing library, either by writing bindings to Xapian (which
>> unfortunately couldn't use the FFI since it's C++),
> But almost all of Xapian's bindings are Swig-generated (and that seems
> to be the preferred way of generating bindings). IIRC I used the Swig
> Guile bindings years ago (I'm pretty sure that code got lost in a
> hard disk crash, but I'm too lazy to google it up ...).
>> or natively porting something like Whoosh, for Guile.
> I've seen similar approaches for Common Lisp (search for montezuma) but
> in the end it seems to be way too much work - remember that not a small
> part of Lucene's success is based on the existing ecosystem (Solr,
> excellent language parsers et al.)
If you are thinking about stemming, wsh does not support it at all yet.
It's an area I'd like to improve.
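For readers unfamiliar with the term, stemming reduces word variants to a common form before indexing, so that a query for one variant matches the others. The Python sketch below is a deliberately naive suffix stripper, not the Porter algorithm and not anything wsh implements; `naive_stem` and `SUFFIXES` are hypothetical names chosen for the illustration:

```python
# Suffixes tried in order; the first one that matches is stripped.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_terms(text):
    """Tokenize on whitespace and stem each term."""
    return [naive_stem(term) for term in text.split()]
```

Applied to both documents and queries, this makes "indexing", "indexed" and "indexes" all match "index". It also makes mistakes that a real stemmer avoids: for instance it turns "engines" into "engin" while leaving "engine" alone, so those two no longer match.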
I agree that if someone wants to create a business using Guile, they
would be up and running faster using ES or Solr. Still, wsh will be a
good contribution to the Guile ecosystem. I am not building a business,
I'm studying the free software zoo. wsh is basically notes in the form
of code on the road to what I actually want to reach, which is concept
search, cf. https://en.wikipedia.org/wiki/Concept_search