guile-user
Re: What's next with culturia search engine? (and guile-wiredtiger)


From: Catonano
Subject: Re: What's next with culturia search engine? (and guile-wiredtiger)
Date: Sun, 14 Jan 2018 15:12:23 +0100

2018-01-14 11:05 GMT+01:00 Amirouche Boubekki <address@hidden>:

> On 2018-01-14 09:12, Catonano wrote:
>
>> 2017-11-26 23:33 GMT+01:00 Amirouche Boubekki
>> <address@hidden>:
>>
>>>
>>> The querying engine will first compute the frequency of both
>>> keywords and then look up the inverted index for the least
>>> frequent keyword.
>>>
>>
>> The least frequent keyword ?
>>
>> Not the most frequent keyword ?
>>
>
> Yes. Imagine you search for serif+font: the most common word,
> and therefore the least discriminating one, is "font", because
> there are (I think) more pages containing "font".
>
> The result of the inverted index lookup above is used as the seed
> for the rest of the algorithm, which is O(n), so I need to
> minimize 'n', i.e. the count of initial documents.


I see now. Thanks
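
A minimal sketch of the seed selection described above, assuming a toy
in-memory inverted index. The names `index`, `postings`, `frequency`,
`least-frequent` and `seed-documents` are hypothetical and are not taken
from culturia or guile-wiredtiger, where the index lives in the database:

```scheme
(use-modules (srfi srfi-1))   ; fold

;; Hypothetical inverted index: keyword -> list of document identifiers.
;; In culturia this would be stored in wiredtiger, not in an alist.
(define index
  '(("font"  . (1 2 3 4 5))
    ("serif" . (2 5))))

;; Document identifiers that contain KEYWORD, or the empty list.
(define (postings keyword)
  (or (assoc-ref index keyword) '()))

;; Number of documents that contain KEYWORD.
(define (frequency keyword)
  (length (postings keyword)))

;; Keyword of KEYWORDS with the fewest documents (first one wins ties).
(define (least-frequent keywords)
  (fold (lambda (keyword best)
          (if (< (frequency keyword) (frequency best)) keyword best))
        (car keywords)
        (cdr keywords)))

;; The seed: documents that contain the least frequent keyword.
(define (seed-documents keywords)
  (postings (least-frequent keywords)))

(seed-documents '("serif" "font"))   ; => (2 5)
```

Starting from "serif" leaves 2 documents to check instead of the 5 that
"font" would give, which is the point of minimizing 'n'.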


>
>
>
>>> That way, there is a 'seed' set of documents
>>> that we can filter with a small VM that will interpret the
>>> rest of the query, for instance. Something like:
>>>
>>> (filter (hit? (cdr query)) seed)
>>>
>>> Sort of. I can't make it simpler right now, but you can
>>> have a look at the code. The public procedure at the bottom,
>>> called 'search' [4], is where the code starts.
>>>
>>
> This is badly explained. At this point SEED contains the unique
> identifiers of the documents that contain the least frequent word.
> We remove that word from the query, hence the (cdr query), and filter
> the SEED with the rest of the query. This is a small optimization:
> we know that the least frequent word is already in the
> documents found in the SEED, so we do not need to check its
> presence in the SEED documents. 'hit?' will return some kind
> of state machine that checks that a given document matches
> the QUERY passed as argument.
>
> That is what I mean to do; the (cdr query) step that removes the most
> discriminating query term is not implemented yet.
>

Ok
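
The filtering step quoted above can be sketched as follows, assuming the
query is ordered with the least frequent keyword first and that `hit?`
returns a plain predicate rather than a state machine. The `documents`
alist and this definition of `hit?` are hypothetical stand-ins, not the
code from ix.scm:

```scheme
(use-modules (srfi srfi-1))   ; every

;; Hypothetical document store: document identifier -> list of words.
(define documents
  '((2 . ("serif" "font" "typography"))
    (5 . ("serif" "letter"))))

;; Return a predicate that is true when the document identified by UID
;; contains every keyword in KEYWORDS.
(define (hit? keywords)
  (lambda (uid)
    (let ((words (or (assv-ref documents uid) '())))
      (every (lambda (keyword) (member keyword words)) keywords))))

;; QUERY has the least frequent keyword first; SEED is the list of
;; documents that contain that keyword, so it is not checked again.
(define query '("serif" "font"))
(define seed '(2 5))

(filter (hit? (cdr query)) seed)   ; => (2)
```

Only "font" is checked against the seed documents, since every document
in the seed already contains "serif".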


>
>>> [4]
>>> https://github.com/a-guile-mind/culturia.one/blob/master/src/wiredtiger/ix.scm#L455
>>
>>> [8]
>>>
>>
>> File not found
>>
>>
> It's here: https://github.com/a-guile-mind/culturia.one/blob/master/src/ix.scm#L439
>
> I reworked the thing to use the grf3 graph abstraction to store
> the documents.
>
> Also, guile-wiredtiger 0.6.4 is in Guix.


I know ;-)

At the moment I don't feel confident with Guile code calling C code.

There's guile-squee that needs some love too; that could be a starting
point for me.

Also, G-Golf is very important. GUIs are not optional, they are fundamental;
we should absolutely have a decent integration between Guile and GNOME.

When and if I know more, I'll take a look at Culturia too :-)


>
>
>
>
>> All this looks pretty interesting but I have to say that I prefer the
>> work you're doing on GNUNet ;-)
>>
>
> Thanks for your interest!
>

No, thank you!
GNUNet is also very important, probably more than G-Golf, though I'm not sure.

I can't wait to test drive it.

