Re: [ELPA] New package: find-dups

emacs-devel

From:	Eli Zaretskii
Subject:	Re: [ELPA] New package: find-dups
Date:	Wed, 11 Oct 2017 21:56:43 +0300

> From: Michael Heerdegen <address@hidden>
> Date: Wed, 11 Oct 2017 19:56:26 +0200
> Cc: address@hidden, Emacs Development <address@hidden>
> 
> #+begin_src emacs-lisp
> (find-dups my-sequence-of-file-names
>            (list (list (lambda (file)
>                          (file-attribute-size (file-attributes file)))
>                        #'eq)
>                  (list (lambda (file)
>                          (shell-command-to-string
>                           (format "head %s"
>                                   (shell-quote-argument file))))
>                        #'equal)
>                  (list (lambda (file)
>                          (shell-command-to-string
>                           (format "md5sum %s | awk '{print $1;}'"
>                                   (shell-quote-argument file))))
>                        #'equal)))
> #+end_src

Apologies for barging into the middle of a discussion, but starting
processes and making strings out of their output to process just a
portion of a file is sub-optimal, because process creation is not
cheap.  It is easier to simply read a predefined number of bytes into
a buffer; insert-file-contents-literally supports that.  Likewise with
md5sum: we have the md5 primitive for that.
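
As a rough illustration of the point about reading a fixed number of
bytes, here is a minimal sketch.  The helper name my-file-head-bytes
and the 1024-byte default are assumptions made up for this example;
they are not part of find-dups.  Only the BEG and END arguments of
insert-file-contents-literally are relied upon.

```emacs-lisp
;; Hypothetical helper, not part of find-dups: return the leading
;; bytes of FILE without starting an external process.
(defun my-file-head-bytes (file &optional nbytes)
  "Return the first NBYTES (default 1024) raw bytes of FILE.
The result is a unibyte string; no decoding takes place."
  (with-temp-buffer
    ;; Keep the buffer unibyte so that it holds plain bytes.
    (set-buffer-multibyte nil)
    ;; BEG = 0 and END = NBYTES restrict the read to that byte range.
    ;; A file shorter than NBYTES is simply read in full.
    (insert-file-contents-literally file nil 0 (or nbytes 1024))
    (buffer-string)))
```

In the quoted call, such a helper could presumably stand in for the
"head" step, e.g. (list #'my-file-head-bytes #'equal).  The string it
returns is bounded by NBYTES, so it stays small.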

In general, working with buffers is much more efficient in Emacs than
working with strings, so avoid strings, let alone large strings, as
much as you can.

One other comment is that shell-command-to-string decodes the output
from the shell command, which is not something you want here, because
AFAIU you are looking for files whose contents are identical on the
byte-stream level, i.e. two files which have the same characters, but
are encoded differently on disk (like one UTF-8, the other Latin-1)
should be considered different in this context, whereas
shell-command-to-string will/might produce identical strings for them.
(Decoding is also expensive run-time wise.)
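
To make the byte-level comparison concrete, here is a minimal sketch
of hashing a file with the md5 primitive instead of md5sum.  The name
my-file-md5 is hypothetical and not part of find-dups.  Because the
file is inserted literally into a unibyte buffer, nothing is decoded,
so the same text saved as UTF-8 and as Latin-1 should yield different
digests.

```emacs-lisp
;; Hypothetical helper, not part of find-dups: MD5 of the bytes on disk.
(defun my-file-md5 (file)
  "Return the MD5 hex digest of FILE's raw bytes as a string."
  (with-temp-buffer
    ;; A unibyte buffer holds the file's bytes exactly as stored.
    (set-buffer-multibyte nil)
    ;; Literal insertion skips decoding and format conversion.
    (insert-file-contents-literally file)
    ;; `md5' accepts a buffer, so no intermediate string of the whole
    ;; file is created in Lisp.
    (md5 (current-buffer))))
```

This reads the whole file into a temporary buffer, so it is best kept
as the last test in the list, after the cheaper size and prefix checks
have already narrowed down the candidates.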
