Spreading parallel across nodes on HPC system
From: Ken Mankoff
Subject: Spreading parallel across nodes on HPC system
Date: Thu, 10 Nov 2022 20:49:00 +0100
User-agent: mu4e 1.8.10; emacs 27.1
Hello,
I'm trying to run parallel on multiple nodes. Each node may have a different
number of CPUs. The best syntax for this appears to be the one shown in the
--slf section of the man page:
8/my-8-cpu-server.example.com
2/my_other_username@my-dualcore.example.net
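If I understand the man page correctly, that file is then passed to parallel
like so (mycmd and the arguments are placeholders):

parallel --slf my.slf mycmd {} ::: arg1 arg2 arg3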
My problem is that I'm running in the SLURM environment. I can get the
hostnames with
scontrol show hostnames $SLURM_JOB_NODELIST > nodelist.0
But I cannot easily get the CPUs-per-node. From the SLURM docs:

SLURM_JOB_CPUS_PER_NODE: Count of CPUs available to the job on the nodes in
the allocation, using the format
CPU_count[(xnumber_of_nodes)][,CPU_count[(xnumber_of_nodes)] ...].
For example: SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the first
and second nodes (as listed by SLURM_JOB_NODELIST) the allocation has 72
CPUs, while the third node has 36 CPUs.
So, parsing '72(x2),36' seems complicated.
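Here is a rough, untested sketch of how I could expand that string and pair
each count with the matching hostname to build an --slf file (the filenames
are just placeholders):

#!/usr/bin/env bash
# Untested sketch: expand SLURM_JOB_CPUS_PER_NODE (e.g. '72(x2),36') and
# pair each CPU count with the matching hostname to build an --slf file.
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodelist.0
mapfile -t hosts < nodelist.0
i=0
for spec in ${SLURM_JOB_CPUS_PER_NODE//,/ }; do
  cpus=${spec%%(*}                  # '72(x2)' -> '72'
  reps=1
  if [[ $spec == *'(x'* ]]; then    # '(x2)' means the count repeats twice
    reps=${spec#*x}; reps=${reps%)}
  fi
  for ((r = 0; r < reps; r++)); do
    echo "${cpus}/${hosts[i]}"
    i=$((i + 1))
  done
done > sshloginfile.0

But writing a parser for this feels fragile, so I'd rather avoid it if
parallel can do the right thing on its own.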
If I requested a total of 1000 tasks but have no control over how many nodes,
can I just call parallel with -j1000 and pass it a hostfile without the
"CPUs/" prefix on each hostname? Would parallel then start however many jobs
it can per node, so that 1000 CPUs on 1 node would work just as well as 1 CPU
on each of 1000 different nodes?
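Concretely, I have something like this in mind (untested; do_work and the
input files are placeholders):

scontrol show hostnames "$SLURM_JOB_NODELIST" > nodelist.0
parallel -j1000 --slf nodelist.0 do_work {} ::: input.*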
Thanks,
-k.