guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: File search


From: Antoine R. Dumont (@ardumont)
Subject: Re: File search
Date: Fri, 02 Dec 2022 19:22:27 +0100

Hello again,

As mentioned previously, I have iterated on the work @civodul started.
After discussing it a bit over irc, I proposed to try and push a bit
forward the discussion and the implementation [2] to see where this
goes.

After toying a bit with the initial code, I took the liberty to make it
a guix extension (we discussed it a bit with @zimoun). It was mostly to
get started with Guile (I know some lisp implems but not this one so i
had to familiarize myself with tools and whatnot ;). Anyway, that can be
reverted if you feel like it can be integrated as a Guix cli directly.

Currently, the implementation scans and indexes whatever package is
present in the local store of the machine's user. From nix/guix's
design, it makes sense to do it that way as it's likely that even though
you don't have all the tools locally, it may be already present as a
dependency of some high level tools you already use (it's just not
exposed because not declared in config.scm or home-configuration.scm).

You will find inlines (at the bottom) some cli usage calls [1] and the
current implementation [2].

Thanks in advance for any feedback ;)

Cheers,
--
tony / Antoine R. Dumont (@ardumont)

-----------------------------------------------------------------
gpg fingerprint BF00 203D 741A C9D5 46A8 BE07 52E2 E984 0D10 C3B8


[1] Usage sample:
--8<---------------cut here---------------start------------->8---
$ env | grep GUIX_EXTENSION
GUIX_EXTENSIONS_PATH=$HOME/repo/public/guix/guix/guix/extensions

$ guix index
;;; (package 1 "acl")
;;; (package 595 "shepherd")
;;; (package 596 "guile2.2-shepherd")
;;; (package 2 "htop")
;;; (package 7 "shadow")
;;; (package 6 "shepherd")
;;; (package 5 "autojump")
...
^C

$ guix index search shepherd
guile2.2-shepherd@0.9.3 
/gnu/store/cq8r2vzg56ax0iidgs4biz3sv0b9jxp3-guile2.2-shepherd-0.9.3/bin/shepherd
shepherd@0.9.3       
/gnu/store/a9jdd8kgckwlq97yw3pjqs6sy4lqgrfq-shepherd-0.9.3/bin/shepherd
shepherd@0.8.1       
/gnu/store/vza48khbaq0fdmcsrn27xj5y5yy76z6l-shepherd-0.8.1/bin/shepherd
shepherd@0.9.1       
/gnu/store/gxz67p4gx9g6rpxxpsgmhsybczimdlx5-shepherd-0.9.1/bin/shepherd

guix help | grep -C3 extension
    repl       read-eval-print loop (REPL) for interactive programming

  extension commands
    index  Index packages to allow searching package for a given filename

Report bugs to: bug-guix@gnu.org.
$ guix help index  # or: guix index [--help|-h]
Usage: guix index [OPTIONS...] [search FILE...]
Without FILE, index (package, file) relationships in the local store.
With 'search FILE', search for packages installing FILE.

Note: The internal cache is located at ~/.config/guix/locate-db.sqlite.
See --db-path for customization.

The valid values for OPTIONS are:

  -h, --help      Display this help and exit
  -V, --version   Display version information and exit
  --db-path=DIR   Change default location of the cache db

The valid values for ARGS are:

  search FILE     Search for packages installing the FILE (from cache db)

Report bugs to: bug-guix@gnu.org.
GNU Guix home page: <https://guix.gnu.org>
General help using Guix and GNU software: <https://guix.gnu.org/en/help/>
$ guix index --version # or guix index -V
guix locate (GNU Guix) 5ccb5837ccfb39af4e3e6399a0124997a187beb1
Copyright (C) 2022 the Guix authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
--8<---------------cut here---------------start------------->8---

[2] The code:
--8<---------------cut here---------------start------------->8---
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2022 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
;;; GNU Guix is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; GNU Guix is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.

(define-module (guix extensions index)
  #:use-module (guix config)  ;; %guix-package-name, ...
  #:use-module (guix ui)      ;; display G_
  #:use-module (guix scripts)
  #:use-module (sqlite3)
  #:use-module (ice-9 match)
  #:use-module (guix describe)
  #:use-module (guix store)
  #:use-module (guix monads)
  #:autoload   (guix grafts) (%graft?)
  #:use-module (guix derivations)
  #:use-module (guix packages)
  #:autoload   (guix build utils) (find-files)
  #:autoload   (gnu packages) (fold-packages)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-9)
  #:export     (guix-index))

(define debug #f)

(define schema
  "
create table if not exists Packages (
  id        integer primary key autoincrement not null,
  name      text not null,
  version   text not null,
  unique    (name, version) -- add uniqueness constraint
);

create table if not exists Directories (
  id        integer primary key autoincrement not null,
  name      text not null,
  package   integer not null,
  foreign key (package) references Packages(id) on delete cascade,
  unique (name, package) -- add uniqueness constraint
);

create table if not exists Files (
  name      text not null,
  basename  text not null,
  directory integer not null,
  foreign key (directory) references Directories(id) on delete cascade
  unique (name, basename, directory) -- add uniqueness constraint
);

create index if not exists IndexFiles on Files(basename);")

(define (call-with-database file proc)
  (let ((db (sqlite-open file)))
    (dynamic-wind
      (lambda () #t)
      (lambda ()
        (sqlite-exec db schema)
        (proc db))
      (lambda ()
        (sqlite-close db)))))

(define (insert-files db package version directories)
  "Insert files from DIRECTORIES as belonging to PACKAGE at VERSION."
  (define stmt-select-package
    (sqlite-prepare db "\
SELECT id FROM Packages WHERE name = :name AND version = :version;"
                    #:cache? #t))

  (define stmt-insert-package
    (sqlite-prepare db "\
INSERT OR IGNORE INTO Packages(name, version) -- to avoid spurious writes
VALUES (:name, :version);"
                    #:cache? #t))

  (define stmt-select-directory
    (sqlite-prepare db "\
SELECT id FROM Directories WHERE name = :name AND package = :package;"
                    #:cache? #t))

  (define stmt-insert-directory
    (sqlite-prepare db "\
INSERT OR IGNORE INTO Directories(name, package) -- to avoid spurious writes
VALUES (:name, :package);"
                    #:cache? #t))

  (define stmt-insert-file
    (sqlite-prepare db "\
INSERT OR IGNORE INTO Files(name, basename, directory)
VALUES (:name, :basename, :directory);"
                    #:cache? #t))

  (sqlite-exec db "begin immediate;")
  (sqlite-bind-arguments stmt-insert-package
                         #:name package
                         #:version version)
  (sqlite-fold (const #t) #t stmt-insert-package)

  (sqlite-bind-arguments stmt-select-package
                         #:name package
                         #:version version)
  (match (sqlite-fold cons '() stmt-select-package)
    ((#(package-id))
     (when debug
       (format #t "(pkg, version, pkg-id): (~a, ~a, ~a)"
               package version package-id))
     (pk 'package package-id package)
     (for-each (lambda (directory)
                 (define (strip file)
                   (string-drop file (+ (string-length directory) 1)))

                 (sqlite-reset stmt-insert-directory)
                 (sqlite-bind-arguments stmt-insert-directory
                                        #:name directory
                                        #:package package-id)
                 (sqlite-fold (const #t) #t stmt-insert-directory)

                 (sqlite-reset stmt-select-directory)
                 (sqlite-bind-arguments stmt-select-directory
                                        #:name directory
                                        #:package package-id)
                 (match (sqlite-fold cons '() stmt-select-directory)
                   ((#(directory-id))
                    (when debug
                      (format #t "(name, package, dir-id): (~a, ~a, ~a)\n"
                              directory package-id directory-id))
                    (for-each (lambda (file)
                                ;; If DIRECTORY is a symlink, (find-files
                                ;; DIRECTORY) returns the DIRECTORY singleton.
                                (unless (string=? file directory)
                                  (sqlite-reset stmt-insert-file)
                                  (sqlite-bind-arguments stmt-insert-file
                                                         #:name (strip file)
                                                         #:basename
                                                         (basename file)
                                                         #:directory
                                                         directory-id)
                                  (sqlite-fold (const #t) #t stmt-insert-file)))
                              (find-files directory)))))
               directories)))
  (sqlite-exec db "commit;"))

(define (insert-package db package)
  "Insert all the files of PACKAGE into DB."
  (mlet %store-monad ((drv (package->derivation package #:graft? #f)))
    (match (derivation->output-paths drv)
      (((labels . directories) ...)
       (when (every file-exists? directories)
         (insert-files db (package-name package) (package-version package)
                       directories))
       (return #t)))))

(define (filter-public-current-supported package)
  "Filter supported, not hidden (public) and not superseded (current) package."
  (and (not (hidden-package? package))
       (not (package-superseded package))
       (supported-package? package)))

(define (filter-supported-package package)
  "Filter supported package (package might be hidden or superseded)."
  (and (supported-package? package)))

(define (no-filter package) "No filtering on package" #t)

(define* (insert-packages db #:optional (filter-policy 
filter-public-current-supported))
  "Insert all current packages matching `filter-package-policy` into DB."
  (with-store store
    (parameterize ((%graft? #f))
      (fold-packages (lambda (package _)
                       (run-with-store store
                         (insert-package db package)))
                     #t
                     #:select? filter-policy))))

(define-record-type <package-match>
  (package-match name version file)
  package-match?
  (name      package-match-name)
  (version   package-match-version)
  (file      package-match-file))

(define (matching-packages db file)
  "Return unique <package-match> corresponding to packages containing FILE."
  (define lookup-stmt
    (sqlite-prepare db "\
SELECT Packages.name, Packages.version, Directories.name, Files.name
FROM Packages
INNER JOIN Files, Directories
ON files.basename = :file
  AND directories.id = files.directory
  AND packages.id = directories.package;"))

  (sqlite-bind-arguments lookup-stmt #:file file)
  (sqlite-fold (lambda (result lst)
                 (match result
                   (#(package version directory file)
                    (cons (package-match package version
                                         (string-append directory "/" file))
                          lst))))
               '() lookup-stmt))



(define (index-packages-with-db db-pathname)
  "Index packages using db at location DB-PATHNAME."
  (call-with-database db-pathname
    (lambda (db)
      (insert-packages db no-filter))))

(define (matching-packages-with-db db-pathname file)
  "Compute list of packages referencing FILE using db at DB-PATHNAME."
  (call-with-database db-pathname
    (lambda (db)
      (matching-packages db file))))

(define (print-matching-results matches)
  "Print the MATCHES matching results."
  (for-each (lambda (result)
              (format #t "~20a ~a~%"
                      (string-append (package-match-name result)
                                     "@" (package-match-version result))
                      (package-match-file result)))
            matches))

(define default-db-path
  (let ((local-config-path
         (and=> (getenv "HOME")
                (lambda (home)
                  (string-append home "/.config/guix/")))))
    (string-append local-config-path "locate-db.sqlite")))

(define (show-bug-report-information)
  ;; TRANSLATORS: The placeholder indicates the bug-reporting address for this
  ;; package.  Please add another line saying "Report translation bugs to
  ;; ...\n" with the address for translation bugs (typically your translation
  ;; team's web or email address).
  (format #t (G_ "
Report bugs to: ~a.") %guix-bug-report-address)
  (format #t (G_ "
~a home page: <~a>") %guix-package-name %guix-home-page-url)
  (format #t (G_ "
General help using Guix and GNU software: <~a>")
          ;; TRANSLATORS: Change the "/en" bit of this URL appropriately if
          ;; the web site is translated in your language.
          (G_ "https://guix.gnu.org/en/help/";))
  (newline))

(define (show-help)
  (display (G_ "Usage: guix index [OPTIONS...] [search FILE...]
Without FILE, index (package, file) relationships in the local store.
With 'search FILE', search for packages installing FILEx;x.\n
Note: The internal cache is located at ~/.config/guix/locate-db.sqlite.
See --db-path for customization.\n"))
  (newline)
  (display (G_ "The valid values for OPTIONS are:"))
  (newline)
  (display (G_ "
  -h, --help      Display this help and exit"))
  (display (G_ "
  -V, --version   Display version information and exit"))
  (display (G_ "
  --db-path=DIR   Change default location of the cache db"))
  (newline)
  (newline)
  (display (G_ "The valid values for ARGS are:"))
  (newline)
  (display (G_ "
  search FILE     Search for packages installing the FILE (from cache db)"))
  (newline)
  (show-bug-report-information))

(define-command (guix-index . args)
  (category extension)
  (synopsis "Index packages to allow searching package for a given filename")

  (define (parse-db-args args)
    "Parsing of string key=value where we are only interested in  'value'"
    (match (string-split args #\=)
      ((unused db-path)
       db-path)
      (_ #f)))

  (define (display-help-and-exit)
    (show-help)
    (exit 0))

  (match args
    ((or ("-h") ("--help"))
     (display-help-and-exit))
    ((or ("-V") ("--version"))
     (show-version-and-exit "guix locate"))
    ((db-path-args)
     (let ((db-path (parse-db-args db-path-args)))
       (if db-path
           (index-packages-with-db db-path)
           (display-help-and-exit))))
    (("search" file)
     (let ((matches (matching-packages-with-db default-db-path file)))
       (print-matching-results matches)
       (exit (pair? matches))))
    ((db-path-args "search" file)
     (let ((db-path (parse-db-args db-path-args)))
       (if db-path
           (let ((matches (matching-packages-with-db db-path file)))
             (print-matching-results matches)
             (exit (pair? matches)))
           (display-help-and-exit))))
    (_  ;; index by default
     (index-packages-with-db default-db-path))))
--8<---------------cut here---------------start------------->8---

antoine.romain.dumont@gmail.com writes:

> Hello Guix!
>
> Guix is top so thanks for the awesome work!
>
> Just to give some feedback on this thread. That's a good news that the
> file search functionality in the radar.
>
>> Lately I found myself going several times to
>> <https://packages.debian.org> to look for packages providing a given
>> file and I thought it’s time to do something about it.
>
> I've finally started to set up my machine with Guix system (and
> Guix Home). Finding out where such program or cli is packaged is
> definitely something that I need to port my existing use (from mainly
> nixified debian or nixos machines) to Guix.
>
> And to answer such question, I used existing "offline" programs in my
> machines. I've bounced back and forth between `nix-locate` and `apt-file
> search` to determine approximately the packages in Guix (names aren't
> usually that different).
>
> Hence, as a user, it's one of my expectation that the Guix cli provides
> some equivalent program to lookup from file to package ;).
>
>> The script below creates an SQLite database for the current set of
>> packages, but only for those already in the store:
>>
>>   Guix repl file-database.scm populate
>>
>> That creates /tmp/db; it took about 25mn on berlin, for 18K packages.
>> Then you can run, say:
>>
>>   Guix repl file-database.scm search boot-9.scm
>>
>> to find which packages provide a file named ‘boot-9.scm’.  That part is
>> instantaneous.
>>
>> The database for 18K packages is quite big:
>>
>> --8<---------------cut here---------------start------------->8---
>> $ du -h /tmp/db*
>> 389M    /tmp/db
>> 82M     /tmp/db.gz
>> 61M     /tmp/db.zst
>> --8<---------------cut here---------------end--------------->8---
>
> For information, in a most recent implementation (@civodul provided me
> in #guix-devel), I noticed multiple calls to the indexation step would
> duplicate information (at all levels packages, files, directories). So
> that might have had an impact in the extracted values above (if ludo had
> triggered multiple times the script at the time).
>
> Jsyk, I have started iterating a bit over that provided implementation
> (and fixed the current caveat mentioned), added some help message...
> I'll follow up with it in a bit (same thread) to have some more feedback
> on it.
>
>> How do we expose that information?  There are several criteria I can
>> think of: accuracy, freshness, privacy, responsiveness, off-line
>> operation.
>>
>> I think accuracy (making sure you get results that correspond precisely
>> to, say, your current channel revisions and your current system) is not
>> a high priority: some result is better than no result.
>
> I definitely agree with this. At least from the offline use perspective.
> I did not focus at all on the second part of the problematic ("online"
> and distribution use).
>
>> Likewise for freshness: results for an older version of a given
>> package may still be valid now.
>
> Indeed.
>
> Cheers,
> --
> tony / Antoine R. Dumont (@ardumont)
>
> -----------------------------------------------------------------
> gpg fingerprint BF00 203D 741A C9D5 46A8 BE07 52E2 E984 0D10 C3B8

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]