Re: File search progress: database review and question on triggers
From: Pierre Neidhardt
Subject: Re: File search progress: database review and question on triggers
Date: Sun, 11 Oct 2020 16:25:59 +0200
Hi Zimoun,
Maybe you misunderstood a point: the filesearch database is not a
database of _all store items_, but only of the items that correspond to
the packages of a given Guix generation.
This should answer many of your comments.
>> I don't know the size of your store nor your hardware. Could you
>> benchmark against my filesearch implementation?
>
> 30G as I reported in my previous email. ;-)
Sorry, I was unclear: I meant to benchmark the runtime of the Guile code
I wrote in the patch, i.e.
--8<---------------cut here---------------start------------->8---
,time (persist-all-local-packages)
--8<---------------cut here---------------end--------------->8---
>> I should have benchmarked with Lzip, it would have been more useful. I
>> think we can get it down to approximately 8 MiB in Lzip.
>
> Well, I think it will be more with all the items of all the packages.
No, the 8 MiB include _all the packages_ of a Guix generation.
We never include the complete store; it would not make sense for filesearch.
> This means to setup server side, right? So implement the "diff" in
> "guix publish", right? Hum? I feel it is overcomplicated.
I don't think it's too complicated: the client sends a request along with
the commit of its Guix generation and the commit of the closest Guix
generation for which it has a database; the server diffs the two SQLite
databases, compresses the result and sends it back.
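To make the server side concrete, here is a minimal sketch in Python (not
the Guile code from the patch). It assumes a hypothetical one-table schema,
files(package, path), made-up package names, and uses zlib and JSON only as
stand-ins for whatever compression and wire format would actually be chosen.

```python
import json
import sqlite3
import zlib


def make_db(rows):
    """Build an in-memory database using a hypothetical one-table schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (package TEXT, path TEXT, PRIMARY KEY (package, path))"
    )
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    db.commit()
    return db


def diff_databases(old, new):
    """Return (removed, added): the rows that turn `old` into `new`."""
    old_rows = set(old.execute("SELECT package, path FROM files"))
    new_rows = set(new.execute("SELECT package, path FROM files"))
    return sorted(old_rows - new_rows), sorted(new_rows - old_rows)


# Two made-up generations: one package was upgraded, the other is unchanged.
old = make_db([("hello-2.10", "bin/hello"), ("sed-4.8", "bin/sed")])
new = make_db([("hello-2.10", "bin/hello"), ("sed-4.9", "bin/sed")])

removed, added = diff_databases(old, new)

# zlib stands in for the real compressor (zstd or Lzip in this thread).
payload = zlib.compress(json.dumps({"removed": removed, "added": added}).encode())
```

Only the rows that changed between the two generations end up in the
payload, which is why the diff should stay small when the generations are
close to each other.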
> Well, what is the size of for a full /gnu/store/ containing all the
> packages of one specific revision? Sorry if you already provided this
> information, I have missed it.
The size of a /gnu/store does not matter. The size of the database does,
however. From the email of the 26th of September:
--8<---------------cut here---------------start------------->8---
The database with all package descriptions and synopsis is 46 MiB and
compresses down to 11 MiB in zstd.
--8<---------------cut here---------------end--------------->8---
>> "manually" is not good in my opinion. The end-user will inevitably
>> forget. An out-of-sync database would return bad results which is a
>> big no-no for search. On-demand database updates are ideals I think.
>
> The tradeoff is:
> - when is "on-demand"? When updates the database?
"guix build" and "guix pull".
> - still fast when I search
Sorry, what is your question?
> - do not slow down other guix subcommands
"guix pull" is not called by other commands.
I don't think that "guix build" would be impacted much because the
database update for a single store item is very fast.
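As a rough illustration of why a per-item update is cheap, here is a sketch
in Python (not the Guile code from the patch). The files(package, path)
schema, the function name index_store_item and the fake store item are all
assumptions made for the example.

```python
import os
import sqlite3
import tempfile


def index_store_item(db, package, item_dir):
    """Record every file under `item_dir` for `package`; return the file count."""
    rows = []
    for root, _dirs, files in os.walk(item_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), item_dir)
            rows.append((package, rel.replace(os.sep, "/")))
    with db:  # one transaction: commit on success, roll back on error
        # Drop stale rows first so that re-indexing the same item is idempotent.
        db.execute("DELETE FROM files WHERE package = ?", (package,))
        db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (package TEXT, path TEXT, PRIMARY KEY (package, path))")

# A fake store item with two files, standing in for a freshly built package.
with tempfile.TemporaryDirectory() as item:
    os.makedirs(os.path.join(item, "bin"))
    os.makedirs(os.path.join(item, "share", "doc"))
    open(os.path.join(item, "bin", "hello"), "w").close()
    open(os.path.join(item, "share", "doc", "README"), "w").close()
    count = index_store_item(db, "hello", item)

indexed = sorted(db.execute("SELECT package, path FROM files"))
```

The work is proportional to the number of files in the one item that was
just built, not to the size of the store.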
> What you are proposing is:
>
> - when "guix search --file":
> + if the database does not exist: fetch it
> + otherwise: use it
No, do it in "guix pull" since it requires networking already.
> - after each "guix build", update the database
Yes.
> I am still missing the other update mechanism for updating the database.
Why?
> (Note that the "fetch it" could be done at "guix pull" time which is
> more meaningful since pull requires network access as you said. And
> the real computations for updating could be done at the first "guix
> search --file" after the pull.)
Maybe this is the misunderstanding: "fetch it" and "update it" are the
same thing.
You fetch the diff from the substitute server and you apply it onto your
local database.
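On the client side, applying such a diff could look like the following
Python sketch (again not the Guile code from the patch). The
files(package, path) schema, the removed/added diff format and the package
names are assumptions made for the example.

```python
import sqlite3


def apply_diff(db, removed, added):
    """Apply a (removed, added) row diff to the local database atomically."""
    with db:  # one transaction: commit on success, roll back on error
        db.executemany(
            "DELETE FROM files WHERE package = ? AND path = ?", removed
        )
        db.executemany("INSERT OR IGNORE INTO files VALUES (?, ?)", added)


# Local database for the generation the user already has (made-up contents).
local = sqlite3.connect(":memory:")
local.execute(
    "CREATE TABLE files (package TEXT, path TEXT, PRIMARY KEY (package, path))"
)
local.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("foo-1.0", "bin/foo"), ("bar-2.0", "bin/bar")],
)
local.commit()

# Diff as it might arrive from the substitute server after decompression.
apply_diff(
    local,
    removed=[("bar-2.0", "bin/bar")],
    added=[("bar-2.1", "bin/bar")],
)

rows = sorted(local.execute("SELECT package, path FROM files"))
```

Running the whole update in one transaction means an interrupted "guix
pull" would leave the previous database intact rather than a half-applied
one.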
> Note that since the same code is used on build farms and their store
> is several TB (see recent discussion about "guix gc" on Berlin that
> takes hours), the build and update of the database need some care. :-)
There is no difference between the build farm and my computer, since I
generated the database over all 15000+ packages. That the store holds
several TB is irrelevant, since only the items of those 15000+ packages
are browsed.
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/