guix-devel

Re: File search progress: database review and question on triggers


From: zimoun
Subject: Re: File search progress: database review and question on triggers
Date: Mon, 12 Oct 2020 13:23:13 +0200

On Mon, 12 Oct 2020 at 12:20, Ludovic Courtès <ludo@gnu.org> wrote:

>> - Textual database: slow and not lighter than SQLite.  Not worth it I 
>> believe.
>>
>> - SQLite without full-text search: fast, supports classic patterns
>>   (e.g. "foo*bar") but does not support word permutations.
>>
>> - SQLite with full-text search: fast, supports word permutations but
>>   does not support suffix-matching (e.g. "bar" won't match "foobar").
>>   Size is about the same as without full-text search.
>>
>> - Include synopsis and descriptions.  Maybe we should include all fields
>>   that are searched by `guix search`.  This incurs a cost on the
>>   database size but it would fix the `guix search` speed issue.  Size
>>   increases by some 10 MiB.
>
> Oh so this is going beyond file search, right?
>
> Perhaps it would make sense to focus on file search only as a first
> step, and see what can be done with synopses/descriptions (like Arun and
> zimoun did before) later, separately?
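
The matching trade-offs in the quoted list can be reproduced with a
small in-memory database.  This is only a sketch in Python's standard
sqlite3 module, not the Guile code from the WIP branch: the table
layout and the sample rows are assumptions made for illustration, and
the full-text part runs only if the local SQLite build ships FTS5.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Simplified stand-in for the Info table; columns are an assumption.
db.execute("create table Info (name text, synopsis text)")
rows = [("feh", "fast image viewer"),
        ("foobar", "example foobar package")]
db.executemany("insert into Info values (?, ?)", rows)

def glob_names(pattern):
    """Names whose synopsis matches a classic GLOB pattern."""
    cur = db.execute(
        "select name from Info where synopsis glob ? order by name",
        (pattern,))
    return [name for (name,) in cur]

in_order = glob_names("*image*viewer*")  # words in the stored order
permuted = glob_names("*viewer*image*")  # same words, permuted
suffix = glob_names("*bar*")             # "bar" inside "foobar"

# Full-text search: word order is irrelevant, but a query token must
# match a whole word (or a prefix of one), never a suffix.
try:
    db.execute("create virtual table InfoFts using fts5(name, synopsis)")
    fts_available = True
except sqlite3.OperationalError:
    fts_available = False

def fts_names(query):
    """Names of rows matching a full-text query."""
    cur = db.execute(
        "select name from InfoFts where InfoFts match ? order by name",
        (query,))
    return [name for (name,) in cur]

if fts_available:
    db.executemany("insert into InfoFts values (?, ?)", rows)
    fts_permuted = fts_names("viewer image")  # permutation is found
    fts_suffix = fts_names("bar")             # "foobar" is not found
```

With GLOB, the permuted pattern finds nothing while the suffix pattern
finds "foobar"; with full-text search it is the other way around.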

Well, the first patch set that Arun sent to improve “guix search”
introduced a SQLite database to replace the current ‘package.cache’.
And I quote your wise advice:

        I would rather keep the current package cache as-is instead of
        inserting sqlite in here.  I don’t expect it to bring much
        compared performance-wise to the current simple cache
        (especially if we look at load time), and it does increase
        complexity quite a bit.

        However, using sqlite for keyword search as you initially
        proposed on guix-devel does sound like a great idea to me.

Message-ID: <87sgjhx92g.fsf@gnu.org>


Therefore, if Pierre is going to introduce a SQL database where adding
the synopses/descriptions is cheap, it seems a good idea to use it,
doesn’t it?  The ‘package.cache’ would be kept as-is.  And in parallel,
“we” can try to use this WIP branch to improve the speed of “guix
search” (by “we”, I mean that I plan to work on it).

BTW, it would be really easy to remove these 2 extra fields if they
turn out not to be conclusive for search, since they appear only in the
function ‘add-files’:

--8<---------------cut here---------------start------------->8---
    (with-statement
        db
        (string-append "insert into Info (name, synopsis, description, package)"
                       " values (:name, :synopsis, :description, :id)")
        stmt
      (sqlite-bind-arguments stmt
                             #:name name
                             #:synopsis synopsis
                             #:description description
                             #:id id)        
--8<---------------cut here---------------end--------------->8---

and this function is used only once, by ‘persist-package-files’.
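
To show how small that change is, here is a sketch in Python's standard
sqlite3 module (not the Guile code of the branch).  The first statement
is the one from ‘add-files’ above; the table definition, the second
statement and the sample values are assumptions for illustration only.

```python
import sqlite3

# Assumed schema: the excerpt shows the insert statement, not the table.
db = sqlite3.connect(":memory:")
db.execute("create table Info (name text, synopsis text,"
           " description text, package integer)")

# Same statement as in the 'add-files' excerpt.
WITH_TEXT = ("insert into Info (name, synopsis, description, package)"
             " values (:name, :synopsis, :description, :id)")
# Hypothetical variant with the two extra fields removed.
WITHOUT_TEXT = "insert into Info (name, package) values (:name, :id)"

db.execute(WITH_TEXT, {"name": "hello",
                       "synopsis": "Hello, GNU world",
                       "description": "Prints a friendly greeting.",
                       "id": 1})
db.execute(WITHOUT_TEXT, {"name": "sed", "id": 2})

# Rows inserted without the extra fields simply leave them NULL.
stored = db.execute("select name, synopsis, description, package"
                    " from Info order by package").fetchall()
```

Only the statement text and the bound arguments differ between the two
variants, which is why dropping the fields later would touch a single
function.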



> It would be nice to see whether/how this could be integrated with
> third-party channels.  Of course it’s not a priority, but while
> designing this feature, we should keep in mind that we might want
> third-party channel authors to be able to offer such a database for
> their packages.

If a third-party channel also provides substitutes, then the database
would be part of the substitutes, or easy to build from the substitute
metadata.



>> - Find a way to garbage-collect the database(s).  My intuition is that
>>   we should have 1 database per Guix checkout and when we `guix gc` a
>>   Guix checkout we collect the corresponding database.
>
> If we download a fresh database every time, we might as well simply
> overwrite the one we have?

But you do not want to download it again if you roll back, for example.
From my point of view, it should use the same mechanism as
‘package.cache’.
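
The roll-back argument can be sketched as follows, in Python and under
loud assumptions: one database file per Guix checkout, named after a
commit identifier.  The names ‘database_for’ and ‘fake_fetch’ and the
file layout are hypothetical and do not describe how ‘package.cache’
is actually stored.

```python
import tempfile
from pathlib import Path

def database_for(commit, cache_dir, fetch):
    """Return the database path for COMMIT, fetching it only if absent."""
    path = Path(cache_dir) / f"{commit}.db"   # hypothetical layout
    if not path.exists():
        fetch(commit, path)
    return path

downloads = []

def fake_fetch(commit, path):
    """Stand-in for a real download: record the call, create the file."""
    downloads.append(commit)
    path.write_bytes(b"")

with tempfile.TemporaryDirectory() as cache_dir:
    # Upgrade from one checkout to another, then roll back.
    for commit in ["aaaa111", "bbbb222", "aaaa111"]:
        database_for(commit, cache_dir, fake_fetch)
```

The roll-back to the first checkout reuses the file that is already
there, so only two downloads happen; overwriting a single shared
database would need three.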


Cheers,
simon


