Re: [Bug-datamash] Is Datamash parallelizable?
From: Assaf Gordon
Subject: Re: [Bug-datamash] Is Datamash parallelizable?
Date: Fri, 08 Aug 2014 07:40:37 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.7.0
Hello Maximilian,
On 08/07/2014 09:09 PM, Maximilian E. Schüle wrote:
> thanks for maintaining datamash. For my thesis I want to do some speed
> tests with data over different databases. For this purpose I was happy
> to find the very interesting tool "datamash", that makes it easier to
> compare the processing of data in a database to the processing of data
> with normal shell-scripts. For this reason I want to know if Datamash is
> parallelizable or does it work on parallel threads. Is it like this?
I'm glad to hear you find "datamash" useful.
Currently, "datamash" does not use multiple threads.
I'm always interested in improving performance, and if there's a good case
for multi-threading I'll be glad to try it out.
I'm working on an I/O speed-up (roughly up to 2.5x faster), which will
be available on the GNU website soon.
It's available here (including some new operations), if you feel comfortable
trying a non-stable version:
http://files.housegordon.org/datamash/src/datamash-1.0.6.30-1ee5.tar.gz
http://git.savannah.gnu.org/cgit/datamash.git/?h=devel1 ( 'devel1' branch ).
One thing I'd consider trying, if you can split your input files,
is to run multiple 'datamash' instances in parallel, then combine the results.
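
As a rough sketch of that split-and-combine idea (an illustration only, not something datamash provides itself): the Python script below assumes the input has already been split into tab-separated chunk files (for example with `split -l`), that column 1 is the group key and column 2 is numeric. It starts one 'datamash' process per chunk and then adds up the partial sums. The function names and the choice of `sum` are hypothetical. The approach works for operations whose partial results can be merged (sum, count, min, max); something like median cannot be combined this way.

```python
import subprocess
import sys
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_datamash_sum(path):
    """Run one datamash instance on one chunk file and return its output text."""
    with open(path, "rb") as chunk:
        result = subprocess.run(
            # -s sorts the input, -g 1 groups by column 1, 'sum 2' sums column 2
            ["datamash", "-s", "-g", "1", "sum", "2"],
            stdin=chunk, capture_output=True, check=True,
        )
    return result.stdout.decode()


def combine_partial_sums(partial_outputs):
    """Merge 'key<TAB>sum' lines from several partial outputs into one total per key."""
    totals = defaultdict(float)
    for text in partial_outputs:
        for line in text.splitlines():
            if not line:
                continue
            key, value = line.split("\t")
            totals[key] += float(value)
    return dict(totals)


def parallel_group_sum(chunk_paths, workers=4):
    """Process chunks concurrently; each thread just waits on its own datamash process."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_datamash_sum, chunk_paths))
    return combine_partial_sums(partials)


if __name__ == "__main__":
    # Chunk file names are taken from the command line.
    for key, total in sorted(parallel_group_sum(sys.argv[1:]).items()):
        print(f"{key}\t{total:g}")
```

Threads are enough here because the actual work happens in the separate 'datamash' processes. Whether this is faster in practice depends on whether the job is limited by CPU or by disk I/O.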
I'll be happy to discuss further,
regards,
- Assaf.