guix-devel

Re: File search progress: database review and question on triggers


From: Pierre Neidhardt
Subject: Re: File search progress: database review and question on triggers
Date: Mon, 05 Oct 2020 20:53:01 +0200

Hi Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

> Nice!

Thanks!

> Could you post a summary of what you have done, what’s left to do, and
> how you’d like to integrate it?  (If you’ve already done it, my
> apologies, but you can resend a link.  :-))

What I've done: mostly a database benchmark.

- Textual database: slow and not lighter than SQLite.  Not worth it, I believe.

- SQLite without full-text search: fast, supports classic patterns
  (e.g. "foo*bar") but does not support word permutations.

- SQLite with full-text search: fast, supports word permutations but
  does not support suffix-matching (e.g. "bar" won't match "foobar").
  Size is about the same as without full-text search.

- Including synopses and descriptions: maybe we should include all
  fields that are searched by `guix search`.  This increases the
  database size by some 10 MiB, but it would fix the `guix search`
  speed issue.

I say we go with SQLite full-text search for now, with all package
details.  Dropping full-text search later would only take a minor
adjustment, which we can decide on when merging the final patch.  The
same goes for whether to include the description, synopsis, etc.
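
To make the trade-off between the two SQLite variants concrete, here is
a minimal sketch using Python's `sqlite3` module.  It is not the code
from my benchmark: the table names, columns, sample rows and helper
functions are all made up for illustration, and it assumes an SQLite
build with the FTS5 extension enabled.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding the same rows in two layouts."""
    conn = sqlite3.connect(":memory:")
    # Plain table, searched with GLOB patterns (no full-text search).
    conn.execute("CREATE TABLE files (package TEXT, path TEXT)")
    # FTS5 virtual table, searched with MATCH (assumes FTS5 is compiled in).
    conn.execute("CREATE VIRTUAL TABLE files_fts USING fts5(package, path)")
    rows = [
        ("foobar", "/bin/foobar"),
        ("hello", "/share/doc/hello/README"),
        ("libpng", "/lib/libpng.so"),
    ]
    conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO files_fts VALUES (?, ?)", rows)
    return conn


def glob_search(conn, pattern):
    """Classic pattern search: the order of the pattern's parts matters."""
    cur = conn.execute(
        "SELECT package FROM files WHERE path GLOB ? ORDER BY package",
        (pattern,),
    )
    return [row[0] for row in cur]


def fts_search(conn, query):
    """Full-text search: words are matched as whole tokens, in any order."""
    cur = conn.execute(
        "SELECT package FROM files_fts WHERE files_fts MATCH ? ORDER BY package",
        (query,),
    )
    return [row[0] for row in cur]
```

With this toy data, `glob_search(conn, "*foo*bar*")` finds `foobar`,
while `glob_search(conn, "*hello*doc*")` finds nothing because the words
appear in the other order in the path.  `fts_search(conn, "hello doc")`
and `fts_search(conn, "doc hello")` both find `hello`, but
`fts_search(conn, "bar")` does not find `foobar`; only a prefix query
such as `foo*` does.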

What's left to do:

- Populate the database on demand, either after a `guix build` or from a
  `guix filesearch...`.  This is important so that `guix filesearch`
  works on packages built locally.  If we hook into `guix build`, I
  need help finding where to plug it in.

- Adapt Cuirass so that it builds its file database.
  I need pointers to get started here.

- Sync the databases from the substitute server to the client when
  running `guix filesearch`.  For this I suggest we send the compressed
  database corresponding to a guix generation over the network (around
  10 MiB).  Not sure sending just the delta is worth it.

- Find a way to garbage-collect the database(s).  My intuition is that
  we should have 1 database per Guix checkout and when we `guix gc` a
  Guix checkout we collect the corresponding database.

  I would store the databases in /var/guix/...
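
  The "one database per Guix checkout" idea could look roughly like the
  sketch below.  Everything in it is an assumption for illustration: the
  `<commit>.db` file naming, the `collect_stale_databases` helper, and
  the way the set of live checkouts is passed in.  It is written in
  Python only to show the logic, and it runs against a temporary
  directory rather than any real Guix location.

```python
import tempfile
from pathlib import Path


def collect_stale_databases(db_dir, live_commits):
    """Delete '<commit>.db' files whose commit is no longer live.

    `live_commits` is the set of Guix checkouts that are still around.
    Returns the names of the removed files, sorted.
    """
    removed = []
    for db in sorted(Path(db_dir).glob("*.db")):
        if db.stem not in live_commits:
            db.unlink()
            removed.append(db.name)
    return removed


def demo():
    """Create three fake databases, keep one checkout live, collect the rest."""
    with tempfile.TemporaryDirectory() as tmp:
        for commit in ("aaa111", "bbb222", "ccc333"):
            (Path(tmp) / f"{commit}.db").touch()
        removed = collect_stale_databases(tmp, {"bbb222"})
        remaining = sorted(p.name for p in Path(tmp).glob("*.db"))
    return removed, remaining
```

  In the real thing, the set of live checkouts would have to come from
  whatever `guix gc` considers reachable, which is one of the parts I
  need pointers on.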

Comments and help welcome! :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/


