bug-bash

Re: gnu parallel in the bash manual


From: Linda Walsh
Subject: Re: gnu parallel in the bash manual
Date: Tue, 05 Mar 2013 16:03:34 -0800
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666


John Kearney wrote:
> The example is bad anyway, as you normally don't want to parallelize disk
> I/O, due to seek overhead and I/O bottleneck congestion. This example
> will be slower, and more likely to damage your disk, than simply using mv
> on its own. But that's another discussion.
---
        That depends on how many IOPS your disk subsystem can
handle and how much CPU work sits between the I/O calls.
Generally, unless you have a really old, non-queuing disk,
more than one process will help.  With a RAID, the benefit can
scale up to roughly the number of data spindles (as a maximum,
though not so much if they are all reading from the same
area... ;-)).
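
        (A quick way to sanity-check that on your own hardware, as a
minimal sketch assuming GNU parallel is installed and a directory of
large files to read: time one reader against several, say nine, and
see which wins on a cold cache.)

  time cat *.rpm > /dev/null                           # one sequential reader
  time parallel -j9 'cat {} > /dev/null' ::: *.rpm     # nine readers via GNU parallel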


        Case in point: I wanted to compare the rpm versions of files
in a directory to see if there were duplicate versions and, if so,
keep only the newest (highest-numbered) version, with the rest going
into a per-disk recycling bin (a fall-out of sharing those disks to
Windows and implementing undo abilities on the shares via samba's
vfs_recycle).
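
        The duplicate hunt itself can be roughed out in shell -- a
minimal sketch, not my perl script; it leans on the
name-version-release.arch.rpm filename convention and GNU sort -V:

  # version-sort the names, then print each file that is an older
  # duplicate of the same package name (i.e. a recycle candidate)
  printf '%s\n' *.rpm | sort -V |
      awk -F'-[0-9]' '{ n=$1; if (n==prev) print keep; prev=n; keep=$0 }'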

        I was working with directories containing thousands of files
(one directory, after pruning, has 10,312 entries).  Sequential
reading of those files was dog slow.

        I parallelized it (using perl): first sort all the names,
then break them into 'N' lists, process those in parallel, then merge
the results (comparing the end-points, since the end of one list might
hold a different version of the same package as the start of the
next).  Rather than fixing 'N', I sized it dynamically from CPU load
versus disk (i.e. no matter how many processes I threw at it, it still
used only about 75% CPU).
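
        In shell terms, the split/work/merge shape looks roughly like
this (a sketch only -- 'score_chunk' stands in for whatever per-chunk
version comparison you run; GNU split and sort are assumed):

  N=9
  printf '%s\n' *.rpm | sort > all.lst
  split -n l/"$N" -d all.lst chunk.      # N chunks, lines kept whole
  for f in chunk.??; do
      score_chunk "$f" > "$f.out" &      # hypothetical per-chunk worker
  done
  wait
  sort -m chunk.??.out > merged.out      # merge the sorted per-chunk results
  # adjacent chunk boundaries still need a cross-check, since two
  # versions of one package can straddle two chunks.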

So I chose 9:

Hot cache:
Read 12161 rpm names.
Use 1 procs w/12162 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
 Cumulative      This Phase      ID
 0.000s          0.000s          Init
 0.000s          0.000s          start_program
 0.038s          0.038s          starting_children
 0.038s          0.001s          end_starting_children
 8.653s          8.615s          endRdFrmChldrn_n_start_re_sort
 10.733s         2.079s          afterFinalSort
17.94sec 3.71usr 6.21sys (55.29% cpu)
---------------
Read 12161 rpm names.
Use 9 procs w/1353 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
 Cumulative      This Phase      ID
 0.000s          0.000s          Init
 0.000s          0.000s          start_program
 0.032s          0.032s          starting_children
 0.036s          0.004s          end_starting_children
 1.535s          1.500s          endRdFrmChldrn_n_start_re_sort
 3.722s          2.187s          afterFinalSort
10.36sec 3.31usr 4.47sys (75.09% cpu)

Cold Cache:
============
Read 12161 rpm names.
Use 1 procs w/12162 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
 Cumulative      This Phase      ID
 0.000s          0.000s          Init
 0.000s          0.000s          start_program
 0.095s          0.095s          starting_children
 0.096s          0.001s          end_starting_children
 75.067s         74.971s         endRdFrmChldrn_n_start_re_sort
 77.140s         2.073s          afterFinalSort
84.52sec 3.62usr 6.26sys (11.70% cpu)
----
Read 12161 rpm names.
Use 9 procs w/1353 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
 Cumulative      This Phase      ID
 0.000s          0.000s          Init
 0.000s          0.000s          start_program
 0.107s          0.107s          starting_children
 0.112s          0.005s          end_starting_children
 29.350s         29.238s         endRdFrmChldrn_n_start_re_sort
 31.497s         2.147s          afterFinalSort
38.27sec 3.35usr 4.47sys (20.47% cpu)

---
hot cache savings:  42%  (17.94s -> 10.36s)
cold cache savings: 55%  (84.52s -> 38.27s)






