
Re: [Fab-user] Launch many parallel tasks against a single (local) host


From: Tyler Pirtle
Subject: Re: [Fab-user] Launch many parallel tasks against a single (local) host
Date: Tue, 27 Mar 2012 17:05:33 -0700



On Tue, Mar 27, 2012 at 4:08 PM, Morgan Goose <address@hidden> wrote:
Nah, Fabric's meant to make remote stuff easier. If you're doing local
stuff, it's encouraged to shell out or use the native Python you have,
so that every single command in the Fabric API doesn't have to have a
local mode.


That's exactly what I'm trying to do: I'm trying to make remote stuff easier. ;)

Also if you used Fabric to make multiple runs of the same task with
different args to the same host, you'd be making x number of
concurrent forks on your local machine, all with their own stdin/stdout,
using execute, vs. one process thread on the local machine and
multiple forks on the remote machine using parallel.


Yes. This is what I'd be doing anyway. Say I have a file I want to copy to X hosts in
parallel; I'd be making X forks locally, all with their own everything, yes. Fabric already
does this. I'm not quite sure you understood what I'm trying to do: I don't want multiple
processes on the remote machines, I want each of them to do a single task.
 
If you wanted to do both (multiple machines, each with multiple calls),
this method would still work, and it would be more concise, as each
machine would only have one stdin/stdout for Fabric to manage.

also, when you write it as I did, you can call it from execute:

def parallel_command(cmd, list_of_args, procs=4):
    run("echo '%s' | xargs -P %d -n1 %s" % (
        " ".join(list_of_args), procs, cmd))

@task
def foo():
    execute(parallel_command, command, list_of_args, hosts=host_list,
            parallel=True)

So with this you can run it on a list of hosts, including localhost.



Again, while I'm a huge fan of xargs, and while I agree this is clever, I have issues
with the flexibility of this approach. For one, as I mentioned earlier, what happens when
my input becomes more complex than simple strings? Then I need to serialize and
de-serialize Python objects across a shell, which seems rather silly. Also, xargs varies
in its arguments between BSD and Linux and is not as portable as I would like.
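(For comparison, a pure-Python route keeps everything in one process tree, so arbitrary Python objects can be handed to workers with no shell quoting or serialization at all. This is only a minimal sketch of that point, not Fabric code; `copy_shard` and the job tuples are made up for illustration:)

```python
from multiprocessing import Pool

def copy_shard(job):
    # job is a plain Python object -- no shell quoting or serialization needed
    src, dest, retries = job
    return "%s -> %s (retries=%d)" % (src, dest, retries)

if __name__ == "__main__":
    # Structured inputs (tuples here, but any picklable object works)
    jobs = [("shard-%02d" % i, "remote:/data", 3) for i in range(4)]
    with Pool(processes=4) as pool:
        results = pool.map(copy_shard, jobs)
    print(results)
```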


Check out this example from IPython's parallel documentation, where they want to do some calculation of millions of digits of Pi, but they want each worker to fetch a file when they start:

http://ipython.org/ipython-doc/rel-0.12/parallel/parallel_demos.html#million-digits-of-pi
In [2]: c = Client(profile='mycluster')
In [3]: v = c[:]

In [3]: c.ids
Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

In [4]: run pidigits.py

In [5]: filestring = 'pi200m.ascii.%(i)02dof20'

# Create the list of files to process.
In [6]: files = [filestring % {'i':i} for i in range(1,16)]
In [7]: files
Out[7]:
['pi200m.ascii.01of20',
 'pi200m.ascii.02of20',
 'pi200m.ascii.03of20',
 'pi200m.ascii.04of20',
 'pi200m.ascii.05of20',
 'pi200m.ascii.06of20',
 'pi200m.ascii.07of20',
 'pi200m.ascii.08of20',
 'pi200m.ascii.09of20',
 'pi200m.ascii.10of20',
 'pi200m.ascii.11of20',
 'pi200m.ascii.12of20',
 'pi200m.ascii.13of20',
 'pi200m.ascii.14of20',
 'pi200m.ascii.15of20']

# download the data files if they don't already exist:
In [8]: v.map(fetch_pi_file, files)
In IPython, here are 15 files that need to be placed on some number of remote hosts. I have no way of doing this very trivial example in a straightforward way in Fabric without resorting to invoking fab in a reentrant fashion via xargs, using the silly hack I discovered below, or rolling my own parallelism (again, just so that I can pass some kind of invariant to each invocation of the function I want to parallelize).

So, again, and I'll leave it at this: it seems to me that this relatively simple task is a use case that Fabric should support. There are lots of examples requiring this type of operation, notably any kind of MapReduce setup.

I'm happy to propose and provide modifications to execute() (or perhaps a sibling to execute()), but as you say perhaps it isn't a good fit for Fabric.


T


On Tue, Mar 27, 2012 at 3:44 PM, Tyler Pirtle <address@hidden> wrote:
>
>
> On Tue, Mar 27, 2012 at 3:24 PM, Morgan Goose <address@hidden>
> wrote:
>>
>> Use xargs:
>> def parallel_local(cmd, list_of_args, procs=4):
>>    local("echo '%s' | xargs -P %d -n1 %s" % (" ".join(list_of_args),
>> procs, cmd))
>>
>>
>> I find that most everything you wanna do has a gnu command or flag for
>> it already. thankfully.
>>
>> -goose
>>
>
> While I agree this is a solution... ;) it still seems like a bit of a
> kludge. Also, I'd typically be calling fab again because I want to write the
> command with fabric APIs. So make it "...-n1 fab %s ", sure.
>
> But still, my point is that while Fabric is good at executing commands in
> parallel for some unique set of hosts, it seems like execute() should have
> some mode of operation to do the complement, to execute commands in parallel
> for some set of hosts as well as some set of inputs.
>
> Rather than shell out to xargs to get this done, I'd just as soon call
> multiprocessing.imap() to do what I need to do directly - my observation
> here is that this is very close to what execute() does already, perhaps
> there is room for it within execute().
>
>
> T
>
>
>
>>
>>
>> On Tue, Mar 27, 2012 at 2:57 PM, Tyler Pirtle <address@hidden> wrote:
>> >
>> >
>> > On Tue, Mar 27, 2012 at 1:13 PM, Morgan Goose <address@hidden>
>> > wrote:
>> >>
>> What is it you even mean by staging? You might be overlooking GNU
>> >> commands that will do exactly what you need. If you give us some more
>> >> information, we might be able to assist.
>> >>
>> >> -goose
>> >>
>> >
>> > Hi goose - let me try and clarify.
>> >
>> > My use case is something like invoking a function over one host many
>> > times with different argument values (or many hosts with many values,
>> > for extra credit).
>> >
>> > So I've got some function I want to execute on localhost,
>> >
>> > def foo(i): ...
>> >
>> > What I'd like to do is imap(foo, xrange(1, 10)) in parallel.
>> >
>> > The execute() function doesn't give me a way to express this (the way
>> > I've done this below is to generate fake host lists). The host list
>> > seems to be the unit controlling the parallelism; I'd like to be able
>> > to control it via input arguments as well.
>> >
>> > To illustrate:
>> >
>> > (with env.parallel = True)
>> > hosts = ["localhost", "localhost", "localhost"]
>> > execute(foo, hosts, ... )
>> >
>> > Would execute "foo" 3 times in parallel (except it wouldn't, because
>> > host lists are uniq'd).
>> >
>> > What I'm looking for:
>> >
>> > hosts = ["localhost"]
>> > input = xrange(10)
>> > execute(foo, hosts, input)
>> >
>> > Should be equivalent to imap(foo, input) over hosts.
>> >
>> >
>> >
>> >
>> >>
>> >> On Fri, Mar 23, 2012 at 7:42 PM, Tyler Pirtle <address@hidden> wrote:
>> >> > I realize this use case in particular may sound a little strange, but
>> >> > bear
>> >> > with me.
>> >> >
>> >> > I've got 10 files, let's say, locally, and would like to launch 10
>> >> > tasks to stage them to some other target. I'd like to do this in
>> >> > parallel, since the files are named appropriately (they're sharded),
>> >> > and what I'd like to do is to construct a call in Fabric that
>> >> > executes some function over the same host with varying input.
>> >> >
>> >> > At a first pass, I thought I would just try to get one task to
>> >> > execute N
>> >> > times in parallel at localhost:
>> >> >
>> >> > @parallel
>> >> > def Stuff():
>> >> >   print "Sleep!"
>> >> >   time.sleep(2)
>> >> >
>> >> >
>> >> > $ fab -H localhost,localhost,localhost Stuff
>> >> > [localhost] Executing task 'Stuff'
>> >> > Sleep!
>> >> >
>> >> > Done.
>> >> >
>> >> > This doesn't work because the host list gets merged (merge in
>> >> > task_utils.py).
>> >> >
>> >> > But, merge() isn't that smart.
>> >> >
>> >> > $ fab -P -H address@hidden,address@hidden,address@hidden Stuff
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > Sleep!
>> >> > Sleep!
>> >> > Sleep!
>> >> >
>> >> > So this is a workaround and a pretty silly hack - does anyone have a
>> >> > better
>> >> > way to do this? As a further hack, I wrap it in an easy-to-use
>> >> > command:
>> >> >
>> >> >
>> >> > @parallel
>> >> > def Stuff():
>> >> >   print "Sleep - %s" % env.host
>> >> >
>> >> > @hosts("localhost")
>> >> > @parallel
>> >> > def PLaunch(target="", times=1):
>> >> >   h = []
>> >> >   for i in xrange(int(times)):
>> >> >     h.append("address@hidden" % i)
>> >> >   execute(target, hosts=h)
>> >> >
>> >> > $ time fab -f fab2.py -P PLaunch:Stuff,10
>> >> > [localhost] Executing task 'PLaunch'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > address@hidden Executing task 'Stuff'
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> > Sleep - address@hidden
>> >> >
>> >> > Done.
>> >> >
>> >> > real 0m5.522s
>> >> > user 0m0.352s
>> >> > sys 0m0.251s
>> >> >
>> >> >
>> >> > So I can strip off everything up to the @ and use that as the arg.
>> >> > Any better ideas? ;)
>> >> >
>> >> > Thanks,
>> >> >
>> >> >
>> >> > Tyler
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Fab-user mailing list
>> >> > address@hidden
>> >> > https://lists.nongnu.org/mailman/listinfo/fab-user
>> >> >
>> >
>> >
>
>

