parsl.providers.SlurmProvider
- class parsl.providers.SlurmProvider(partition: str | None = None, account: str | None = None, qos: str | None = None, constraint: str | None = None, clusters: str | None = None, nodes_per_block: int = 1, cores_per_node: int | None = None, mem_per_node: int | None = None, init_blocks: int = 1, min_blocks: int = 0, max_blocks: int = 1, parallelism: float = 1, walltime: str = '00:10:00', scheduler_options: str = '', regex_job_id: str = 'Submitted batch job (?P<id>\\S*)', worker_init: str = '', cmd_timeout: int = 10, exclusive: bool = True, launcher: Launcher = SingleNodeLauncher(debug=True, fail_on_any=False))[source]
Slurm Execution Provider
This provider uses sbatch to submit, sacct for status and scancel to cancel jobs. The sbatch script to be used is created from a template file in this same module.
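A minimal configuration sketch, assuming Parsl is installed and the provider is paired with a HighThroughputExecutor. The partition name, account name, executor label, and mail directive are hypothetical placeholders; the keyword arguments are the ones listed in the class signature above.

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

# Hypothetical site values: replace partition, account and worker_init
# with whatever your cluster actually uses.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="slurm_htex",  # hypothetical label
            provider=SlurmProvider(
                partition="debug",             # hypothetical partition name
                account="my_allocation",       # hypothetical account name
                nodes_per_block=2,             # each block is one 2-node Slurm job
                init_blocks=1,
                min_blocks=0,
                max_blocks=4,
                walltime="00:30:00",           # HH:MM:SS per block
                scheduler_options="#SBATCH --mail-type=END",  # extra directive, illustrative
                worker_init="module load Anaconda; source activate env",
                exclusive=True,
                launcher=SrunLauncher(),       # start workers on every node of the block
            ),
        )
    ]
)
```

Building the configuration does not submit anything; jobs are only submitted once the configuration is loaded and blocks are requested.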
- Parameters:
partition (str) – Slurm partition to request blocks from. If unspecified or None, no partition slurm directive will be specified.
account (str) – Slurm account to which to charge resources used by the job. If unspecified or None, the job will use the user’s default account.
qos (str) – Slurm queue to place job in. If unspecified or None, no queue slurm directive will be specified.
constraint (str) – Slurm job constraint, often used to choose cpu or gpu type. If unspecified or None, no constraint slurm directive will be added.
clusters (str) – Slurm cluster name, or comma separated cluster list, used to choose between different clusters in a federated Slurm instance. If unspecified or None, no slurm directive for clusters will be added.
nodes_per_block (int) – Nodes to provision per block.
cores_per_node (int) – Specify the number of cores to provision per node. If set to None, executors will assume all cores on the node are available for computation. Default is None.
mem_per_node (int) – Specify the real memory to provision per node in GB. If set to None, no explicit request to the scheduler will be made. Default is None.
init_blocks (int) – Number of blocks to provision at the start of the run. Default is 1.
min_blocks (int) – Minimum number of blocks to maintain.
max_blocks (int) – Maximum number of blocks to maintain.
parallelism (float) – Ratio of provisioned task slots to active tasks. A parallelism value of 1 represents aggressive scaling where as many resources as possible are used; parallelism close to 0 represents the opposite situation in which as few resources as possible (i.e., min_blocks) are used.
walltime (str) – Walltime requested per block in HH:MM:SS.
scheduler_options (str) – String to prepend to the #SBATCH blocks in the submit script to the scheduler.
regex_job_id (str) – The regular expression used to extract the job ID from the sbatch standard output. The default is r"Submitted batch job (?P<id>\S*)", where id is the regular expression symbolic group for the job ID.
worker_init (str) – Command to be run before starting a worker, such as ‘module load Anaconda; source activate env’.
exclusive (bool (Default = True)) – Requests nodes which are not shared with other running jobs.
launcher (Launcher) – Launcher for this provider. Possible launchers include SingleNodeLauncher (the default), SrunLauncher, or AprunLauncher.
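To illustrate how the regex_job_id pattern works, the sketch below applies the default pattern to some sample sbatch output using only the standard re module. The helper name extract_job_id and the sample output strings are assumptions made for illustration, and the provider's own matching code may differ in detail.

```python
import re

# Default pattern as given in the regex_job_id parameter description.
DEFAULT_REGEX_JOB_ID = r"Submitted batch job (?P<id>\S*)"


def extract_job_id(sbatch_stdout, pattern=DEFAULT_REGEX_JOB_ID):
    """Return the text captured by the 'id' group, or None if nothing matches.

    Hypothetical helper for illustration; not part of the Parsl API.
    """
    match = re.search(pattern, sbatch_stdout)
    return match.group("id") if match else None


# Illustrative sbatch output lines (assumed, not captured from a real cluster).
plain = extract_job_id("Submitted batch job 123456")
federated = extract_job_id("Submitted batch job 123456 on cluster alpha")
missing = extract_job_id("sbatch: error: invalid partition specified")
```

A site whose sbatch wrapper prints a different message would pass its own pattern, keeping the symbolic group named id so the job ID can still be picked out.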
- __init__(partition: str | None = None, account: str | None = None, qos: str | None = None, constraint: str | None = None, clusters: str | None = None, nodes_per_block: int = 1, cores_per_node: int | None = None, mem_per_node: int | None = None, init_blocks: int = 1, min_blocks: int = 0, max_blocks: int = 1, parallelism: float = 1, walltime: str = '00:10:00', scheduler_options: str = '', regex_job_id: str = 'Submitted batch job (?P<id>\\S*)', worker_init: str = '', cmd_timeout: int = 10, exclusive: bool = True, launcher: Launcher = SingleNodeLauncher(debug=True, fail_on_any=False))[source]
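To make the interaction between parallelism, min_blocks, and max_blocks concrete, here is a simplified arithmetic sketch. It is not Parsl's scaling code: the function name target_blocks and the formula are assumptions that only illustrate the ratio described for the parallelism parameter, and tasks_per_node is assumed to come from the executor rather than from this provider.

```python
import math


def target_blocks(active_tasks, parallelism, tasks_per_node,
                  nodes_per_block, min_blocks, max_blocks):
    """Estimate how many blocks would be wanted for a given task load.

    Simplified illustration only: wanted task slots are
    active_tasks * parallelism, one block supplies
    tasks_per_node * nodes_per_block slots, and the result is
    clamped to the range [min_blocks, max_blocks].
    """
    slots_per_block = tasks_per_node * nodes_per_block
    wanted_slots = active_tasks * parallelism
    blocks = math.ceil(wanted_slots / slots_per_block)
    return max(min_blocks, min(blocks, max_blocks))


# 100 active tasks, 4 task slots per node, 2 nodes per block (8 slots per block).
aggressive = target_blocks(100, 1.0, 4, 2, min_blocks=0, max_blocks=10)
moderate = target_blocks(100, 0.5, 4, 2, min_blocks=0, max_blocks=10)
idle = target_blocks(0, 1.0, 4, 2, min_blocks=1, max_blocks=10)
```

With parallelism set to 1 the estimate hits the max_blocks cap, while halving parallelism asks for roughly half as many slots; with no active tasks the estimate falls back to min_blocks.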
Methods
__init__([partition, account, qos, ...])
cancel(job_ids) – Cancels the jobs specified by a list of job ids.
execute_wait(cmd[, timeout])
status(job_ids) – Get the status of a list of jobs identified by the job identifiers returned from the submit request.
submit(command, tasks_per_node[, job_name]) – Submit the command as a slurm job.
Attributes
cores_per_node
Number of cores to provision per node.
label
Provides the label for this provider
mem_per_node
Real memory to provision per node in GB.
status_polling_interval
Returns the interval, in seconds, at which the status method should be called.
- cancel(job_ids)[source]
Cancels the jobs specified by a list of job ids.
Args: job_ids: list of job ids to cancel, [<job_id> …]
Returns: [True/False…], one entry per job id. If the cancel operation fails, the entire list will be False.
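The return convention of cancel can be handled as sketched below. FakeProvider is a hypothetical stand-in that only mimics the documented return shape (a list of booleans, all False on failure) so the snippet runs without a Slurm cluster; cancel_and_report is likewise an illustrative helper, not part of Parsl.

```python
class FakeProvider:
    """Hypothetical stand-in mimicking the documented return shape of cancel()."""

    def __init__(self, succeed):
        self.succeed = succeed

    def cancel(self, job_ids):
        # One boolean per job id; all False when the cancel operation fails.
        return [self.succeed for _ in job_ids]


def cancel_and_report(provider, job_ids):
    """Call provider.cancel() and pair each job id with its result."""
    results = provider.cancel(job_ids)
    return dict(zip(job_ids, results))


job_ids = ["1001", "1002"]  # illustrative job ids
ok_report = cancel_and_report(FakeProvider(succeed=True), job_ids)
failed_report = cancel_and_report(FakeProvider(succeed=False), job_ids)
```

Because a failure marks the whole list as False, the result says whether the cancel request as a whole went through rather than which individual job caused the problem.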
- property status_polling_interval[source]
Returns the interval, in seconds, at which the status method should be called.
- Returns:
the number of seconds to wait between calls to status()