Structuring Parsl programs
While it is convenient to build simple Parsl programs as a single Python file, splitting a Parsl program into multiple files and a Python module has significant benefits, including:
Better readability
Logical separation of components (e.g., apps, config, and control logic)
Ease of reuse of components
Large applications that use Parsl often divide into several core components:
Core application logic, written without reference to Parsl
Workflow functions that combine the core logic into units suitable for parallel execution
Parsl configuration functions that describe the available compute resources
Orchestration scripts that load a configuration, then launch tasks and retrieve results
The following sections use an example where each component is in a separate file:
examples/logic.py
examples/app.py
examples/config.py
examples/__init__.py
run.py
pyproject.toml
Run the application by first installing the Python library and then executing the “run.py” script.
pip install . # Install module so it can be imported by workers
python run.py
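The contents of pyproject.toml are not shown here; a minimal sketch could look like the following (the project name and the setuptools backend are assumptions, with the name chosen to match the imports in the run script below):

```toml
[project]
name = "library"
version = "0.1.0"

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```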
Core application logic
The core application logic should be developed without any reference to Parsl. Implement capabilities, write unit tests, and prepare documentation in whichever way works best for the problem at hand.
Parallelization with Parsl will be easy if the software already follows best practices.
The example defines a function to convert a single integer into binary.
from typing import Tuple


def convert_to_binary(x: int) -> Tuple[bool, ...]:
    """Convert a nonnegative integer into binary

    Args:
        x: Number to be converted
    Returns:
        The binary number represented as a tuple of booleans
    """
    if x < 0:
        raise ValueError('`x` must be nonnegative')
    bin_as_string = bin(x)
    return tuple(i == '1' for i in bin_as_string[2:])
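Because the core logic is Parsl-free, it can be tested as ordinary Python. The following sketch (which repeats the function definition so it is self-contained) shows the kind of unit check that requires no Parsl installation:

```python
from typing import Tuple


def convert_to_binary(x: int) -> Tuple[bool, ...]:
    """Convert a nonnegative integer into binary"""
    if x < 0:
        raise ValueError('`x` must be nonnegative')
    # bin(5) == '0b101'; strip the '0b' prefix and map digits to booleans
    return tuple(i == '1' for i in bin(x)[2:])


# Plain-Python checks, runnable without Parsl
assert convert_to_binary(5) == (True, False, True)
assert convert_to_binary(0) == (False,)
```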
Workflow functions
Tasks within a workflow may require unique combinations of core functions. Functions to be run in parallel must also meet specific requirements that may complicate writing the core logic effectively. As such, separating functions to be used as Apps is often beneficial.
The example includes a function to convert many integers into binary.
Key points to note:
It is not necessary to have import statements inside the function. Parsl will serialize this function by reference, as described in Functions from Modules.
The function is not yet marked as a Parsl PythonApp. Keeping Parsl out of the function definitions simplifies testing because you will not need to run Parsl when testing the code.
Advanced: Consider including Parsl decorators in the library if using complex workflow patterns, such as join apps or functions which take special arguments.
"""Functions used as part of the workflow"""
from typing import List, Tuple

from .logic import convert_to_binary


def convert_many_to_binary(xs: List[int]) -> List[Tuple[bool, ...]]:
    """Convert a list of nonnegative integers to binary"""
    return [convert_to_binary(x) for x in xs]
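Because the workflow function is not yet decorated, it too can be exercised without starting Parsl. A self-contained sketch (inlining both functions for illustration):

```python
from typing import List, Tuple


def convert_to_binary(x: int) -> Tuple[bool, ...]:
    """Convert a nonnegative integer into binary (inlined from logic.py)"""
    if x < 0:
        raise ValueError('`x` must be nonnegative')
    return tuple(i == '1' for i in bin(x)[2:])


def convert_many_to_binary(xs: List[int]) -> List[Tuple[bool, ...]]:
    """Convert a list of nonnegative integers to binary"""
    return [convert_to_binary(x) for x in xs]


# No parsl.load() or PythonApp wrapper is needed during development
assert convert_many_to_binary([1, 2]) == [(True,), (True, False)]
```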
Parsl configuration functions
Create Parsl configurations specific to your application needs as functions. While not necessary, including the Parsl configuration functions inside the module ensures they can be imported into other scripts easily.
Generating Parsl Config objects from a function makes it possible to change the configuration without editing the module.
The example function provides a configuration suited for a single node.
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider


def make_local_config(cores_per_worker: int = 1) -> Config:
    """Generate a configuration which runs all tasks on the local system

    Args:
        cores_per_worker: Number of cores to dedicate for each task
    Returns:
        Configuration object with the requested settings
    """
    return Config(
        executors=[
            HighThroughputExecutor(
                label="htex_local",
                cores_per_worker=cores_per_worker,
                cpu_affinity='block',
                provider=LocalProvider(),
            )
        ],
    )
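Configuration functions for other resources follow the same pattern. As an illustrative sketch only (the partition name, node count, and walltime below are assumptions, not part of the example module), a function targeting a Slurm cluster might look like:

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider


def make_slurm_config(partition: str,
                      nodes: int = 1,
                      walltime: str = '01:00:00') -> Config:
    """Generate a configuration which runs tasks on a Slurm cluster

    Args:
        partition: Slurm partition to submit to
        nodes: Number of nodes to request per block
        walltime: Maximum runtime of each block
    Returns:
        Configuration object with the requested settings
    """
    return Config(
        executors=[
            HighThroughputExecutor(
                label='htex_slurm',
                provider=SlurmProvider(
                    partition=partition,
                    nodes_per_block=nodes,
                    walltime=walltime,
                ),
            )
        ],
    )
```

A run script could then select between resources without editing the module, e.g., by calling parsl.load(make_slurm_config('debug')) instead of parsl.load(make_local_config()).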
Orchestration Scripts
The last file defines the workflow itself.
Such orchestration scripts perform at least four tasks:
1. Load execution options using a tool like argparse.
2. Prepare workflow functions for execution by creating PythonApp wrappers over each function.
3. Create a configuration, then start Parsl with the parsl.load() function.
4. Launch tasks and retrieve results depending on the needs of the application.
An example run script is as follows:
from argparse import ArgumentParser

import parsl
from parsl.app.python import PythonApp

from library.config import make_local_config
from library.app import convert_many_to_binary

# Protect the script from running twice.
# See "Safe importing of main module" in Python multiprocessing docs
# https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming
if __name__ == "__main__":
    # Get user instructions
    parser = ArgumentParser()
    parser.add_argument('--numbers-per-batch', default=8, type=int)
    parser.add_argument('numbers', nargs='+', type=int)
    args = parser.parse_args()

    # Prepare the workflow functions
    convert_app = PythonApp(convert_many_to_binary, cache=False)

    # Load the configuration as a context manager
    # so resources are shut down on exit
    with parsl.load(make_local_config()):
        # Spawn tasks
        futures = [
            convert_app(args.numbers[start:start + args.numbers_per_batch])
            for start in range(0, len(args.numbers), args.numbers_per_batch)
        ]

        # Retrieve task results
        for future in futures:
            for x, b in zip(future.task_record['args'][0], future.result()):
                print(f'{x} -> {"".join("1" if i else "0" for i in b)}')