[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
What to do for faster `remove-duplicates'?
From: |
Oleh Krehel |
Subject: |
What to do for faster `remove-duplicates'? |
Date: |
Wed, 06 May 2015 09:33:02 +0200 |
Hi all,
Should `cl-remove-duplicates' switch to using a hash table if the
candidates list is large?
I see here a speedup factor of >500 for removing the duplicates of
`load-library' candidates. Helm has a function for this, which looks
very simple (much more simple that the current `cl-remove-duplicates'):
(cl-defun helm-fast-remove-dups (seq &key (test 'eq))
"Remove duplicates elements in list SEQ.
This is same as `remove-duplicates' but with memoisation.
It is much faster, especially in large lists.
A test function can be provided with TEST argument key.
Default is `eq'."
(cl-loop with cont = (make-hash-table :test test)
for elm in seq
unless (gethash elm cont)
do (puthash elm elm cont)
finally return
(cl-loop for i being the hash-values in cont collect i)))
And here are the benchmark tests:
(setq cands (locate-file-completion-table
load-path (get-load-suffixes) "" nil t))
(length cands)
5357
(length (cl-remove-duplicates cands :test 'equal))
2481
(benchmark-run (cl-remove-duplicates cands :test 'equal))
(0.67873101 0 0.0)
(benchmark-run (helm-fast-remove-dups cands :test 'equal))
(0.001350054 0 0.0)
(benchmark-run (seq-uniq cands 'equal))
(5.270219822 27 2.396615401000002)
This is also a request for seq.el to revise its optimization strategy.
So should `cl-remove-duplicates' be a 2-in-1 (lists for small input,
hash table for large input), or should there be
`cl-remove-duplicates-hash', or an optional argument to
`cl-remove-duplicates' to use a hash?
Oleh
- What to do for faster `remove-duplicates'?,
Oleh Krehel <=
- Re: What to do for faster `remove-duplicates'?, Stefan Monnier, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Stefan Monnier, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Thierry Volpiatto, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Artur Malabarba, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Artur Malabarba, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Artur Malabarba, 2015/05/06