Structuring Parsl programs

While it is convenient to build simple Parsl programs as a single Python file, splitting a Parsl program into multiple files and a Python module has significant benefits, including:

  1. Better readability

  2. Logical separation of components (e.g., apps, config, and control logic)

  3. Ease of reuse of components

Large applications that use Parsl often divide into several core components: the core application logic, workflow functions, Parsl configuration, and an orchestration script.

The following sections use an example where each component is in a separate file:

library/logic.py
library/app.py
library/config.py
library/__init__.py
run.py
pyproject.toml

Run the application by first installing the Python module and then executing the “run.py” script.

pip install .  # Install module so it can be imported by workers
python run.py

Core application logic

The core application logic should be developed without any reference to Parsl. Implement capabilities, write unit tests, and prepare documentation in whichever way works best for the problem at hand.

Parallelization with Parsl will be easy if the software already follows best practices.

The example defines a function to convert a single integer into binary.

library/logic.py
from typing import Tuple


def convert_to_binary(x: int) -> Tuple[bool, ...]:
    """Convert a nonnegative integer into a binary

    Args:
        x: Number to be converted
    Returns:
        The binary number represented as a tuple of booleans
    """
    if x < 0:
        raise ValueError('`x` must be nonnegative')
    bin_as_string = bin(x)
    return tuple(i == '1' for i in bin_as_string[2:])
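The function can be exercised directly as a quick sanity check. The definition is repeated inline below so the snippet runs on its own, without installing the module:

```python
from typing import Tuple


def convert_to_binary(x: int) -> Tuple[bool, ...]:
    # Repeated from library/logic.py for a self-contained example
    if x < 0:
        raise ValueError('`x` must be nonnegative')
    return tuple(i == '1' for i in bin(x)[2:])


# 5 is 0b101, so the result is (True, False, True)
assert convert_to_binary(5) == (True, False, True)
# bin(0) is '0b0', so zero maps to a single False
assert convert_to_binary(0) == (False,)
```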

Workflow functions

Tasks within a workflow may require unique combinations of core functions. Functions to be run in parallel must also meet specific requirements that may complicate writing the core logic effectively. As such, it is often beneficial to define the functions to be used as Apps separately.

The example includes a function to convert many integers into binary.

Key points to note:

  • It is not necessary to have import statements inside the function. Parsl will serialize this function by reference, as described in Functions from Modules.

  • The function is not yet marked as a Parsl PythonApp. Keeping Parsl out of the function definitions simplifies testing because you will not need to run Parsl when testing the code.

  • Advanced: Consider including Parsl decorators in the library if using complex workflow patterns, such as join apps or functions which take special arguments.

library/app.py
"""Functions used as part of the workflow"""
from typing import List, Tuple

from .logic import convert_to_binary


def convert_many_to_binary(xs: List[int]) -> List[Tuple[bool, ...]]:
    """Convert a list of nonnegative integers to binary"""
    return [convert_to_binary(x) for x in xs]
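Because the workflow function is plain Python, it can be unit tested without starting Parsl. A minimal sketch (both definitions are repeated inline so the snippet is self-contained):

```python
from typing import List, Tuple


def convert_to_binary(x: int) -> Tuple[bool, ...]:
    # Repeated from library/logic.py for a self-contained example
    if x < 0:
        raise ValueError('`x` must be nonnegative')
    return tuple(i == '1' for i in bin(x)[2:])


def convert_many_to_binary(xs: List[int]) -> List[Tuple[bool, ...]]:
    # Repeated from library/app.py
    return [convert_to_binary(x) for x in xs]


def test_convert_many():
    # No parsl.load() needed: the app is still an ordinary function here
    assert convert_many_to_binary([1, 2]) == [(True,), (True, False)]
    assert convert_many_to_binary([]) == []


test_convert_many()
```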

Parsl configuration functions

Create Parsl configurations specific to your application needs as functions. While not necessary, including the Parsl configuration functions inside the module ensures they can be imported into other scripts easily.

Generating Parsl Config objects from a function makes it possible to change the configuration without editing the module.

The example function provides a configuration suited for a single node.

library/config.py
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider


def make_local_config(cores_per_worker: int = 1) -> Config:
    """Generate a configuration which runs all tasks on the local system

    Args:
        cores_per_worker: Number of cores to dedicate for each task
    Returns:
        Configuration object with the requested settings
    """
    return Config(
        executors=[
            HighThroughputExecutor(
                label="htex_local",
                cores_per_worker=cores_per_worker,
                cpu_affinity='block',
                provider=LocalProvider(),
            )
        ],
    )

Orchestration scripts

The last file defines the workflow itself.

Such orchestration scripts perform at least four tasks:

  1. Load execution options using a tool like argparse.

  2. Prepare workflow functions for execution by creating PythonApp wrappers over each function.

  3. Create a configuration, then start Parsl with the parsl.load() function.

  4. Launch tasks and retrieve results depending on the needs of the application.

An example run script is as follows:

run.py
from argparse import ArgumentParser

import parsl

from library.config import make_local_config
from library.app import convert_many_to_binary
from parsl.app.python import PythonApp

# Protect the script from running twice.
#  See "Safe importing of main module" in Python multiprocessing docs
#  https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming
if __name__ == "__main__":
    # Get user instructions
    parser = ArgumentParser()
    parser.add_argument('--numbers-per-batch', default=8, type=int)
    parser.add_argument('numbers', nargs='+', type=int)
    args = parser.parse_args()

    # Prepare the workflow functions
    convert_app = PythonApp(convert_many_to_binary, cache=False)

    # Load the configuration
    #  As a context manager so resources are shut down on exit
    with parsl.load(make_local_config()):

        # Spawn tasks
        futures = [
            convert_app(args.numbers[start:start + args.numbers_per_batch])
            for start in range(0, len(args.numbers), args.numbers_per_batch)
        ]

        # Retrieve task results
        for future in futures:
            for x, b in zip(future.task_record['args'][0], future.result()):
                print(f'{x} -> {"".join("1" if i else "0" for i in b)}')
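The batch-slicing pattern in the run script can be checked on its own. This sketch (with a hypothetical `batches` helper name) extracts the same slicing logic so it runs without a Parsl configuration:

```python
from typing import List


def batches(items: List[int], size: int) -> List[List[int]]:
    # Same slicing as run.py: fixed-size chunks, the last may be shorter
    return [items[start:start + size] for start in range(0, len(items), size)]


numbers = [1, 2, 3, 4, 5]
assert batches(numbers, 2) == [[1, 2], [3, 4], [5]]
```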