parsl.dataflow.memoization.Memoizer

class parsl.dataflow.memoization.Memoizer(dfk: DataFlowKernel, memoize: bool = True, checkpoint: Dict[str, Future[Any]] = {})

Memoizer is responsible for ensuring that identical work is not repeated.

When a task is repeated, i.e., the same function is called with exactly the same arguments, the result from a previous execution is reused.

The memoizer implementation here does not collapse duplicate calls at call time, but works only when the result of a previous call is available at the time the duplicate call is made.

For instance:

No advantage from                 Memoization helps
memoization here:                 here:

 TaskA                            TaskB
   |   TaskA                        |
   |     |   TaskA                done  (TaskB)
   |     |     |                                (TaskB)
 done    |     |
       done    |
             done
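The timing constraint in the diagram can be illustrated with a toy lookup table. This is a minimal sketch, not Parsl's implementation: `memo_table`, `lookup`, and the hash strings are hypothetical names chosen for illustration, and the table is assumed to be a plain dict mapping a hash string to a completed Future.

```python
from concurrent.futures import Future

# Hypothetical stand-in for the memoization table: hash string -> Future.
memo_table = {}

def lookup(hashsum):
    """Return the stored Future for hashsum, or None if nothing is stored yet."""
    return memo_table.get(hashsum)

# Left-hand case: three identical calls are made before any result exists,
# so every lookup misses and all three calls would have to run.
early_hits = [lookup("taskA-hash") for _ in range(3)]

# Right-hand case: the first call completes and its result is stored...
done = Future()
done.set_result(42)
memo_table["taskB-hash"] = done

# ...so later identical calls find the stored Future and can reuse it.
late_hits = [lookup("taskB-hash") for _ in range(2)]
```

In the left-hand case nothing is ever found because no result was stored when the duplicate calls arrived; in the right-hand case both later calls receive the same stored Future.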

The memoizer creates a lookup table by hashing the function name and its inputs, and storing the results of the function.

When a task is ready for launch, i.e., all of its arguments have resolved, its hash is added to the task data structure.
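The idea of hashing a function name together with its inputs can be sketched as follows. This is a simplified illustration under stated assumptions, not the hashing scheme Parsl necessarily uses: `sketch_hash` is a hypothetical helper, and it assumes all inputs can be pickled and that their pickled form is stable for equal values.

```python
import hashlib
import pickle

def sketch_hash(func_name, args, kwargs):
    """Hypothetical helper: derive a hash string from a function name and its inputs."""
    # Sort keyword arguments so that their order does not change the hash.
    payload = pickle.dumps((func_name, args, sorted(kwargs.items())))
    return hashlib.md5(payload).hexdigest()

# The same function name and inputs give the same hash, whatever the keyword order.
h1 = sketch_hash("add", (2, 3), {"scale": 1, "offset": 0})
h2 = sketch_hash("add", (2, 3), {"offset": 0, "scale": 1})

# Changing any input gives a different hash.
h3 = sketch_hash("add", (2, 4), {"scale": 1, "offset": 0})
```

A scheme like this only works for inputs whose serialized form is deterministic; objects that serialize differently from run to run would need special handling.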

__init__(dfk: DataFlowKernel, memoize: bool = True, checkpoint: Dict[str, Future[Any]] = {})

Initialize the memoizer.

Parameters:

dfk (DataFlowKernel) – The DFK object

KWargs:
  • memoize (bool): whether to enable memoization.

  • checkpoint (Dict): A checkpoint loaded as a dict.

Methods

__init__(dfk[, memoize, checkpoint])

Initialize the memoizer.

check_memo(task)

Create a hash of the task and its inputs and check the lookup table for this hash.

hash_lookup(hashsum)

Lookup a hash in the memoization table.

make_hash(task)

Create a hash of the task inputs.

update_memo(task, r)

Updates the memoization lookup table with the result from a task.

check_memo(task: TaskRecord) → Future[Any] | None

Create a hash of the task and its inputs and check the lookup table for this hash.

If the hash is present, the stored result is returned.

Parameters:

task (TaskRecord) – Task from the dfk.tasks table

Returns:

  • Result of the function, wrapped in a Future, if present in the table; otherwise None

This call also sets task['hashsum'] to the unique hashsum for the function and its inputs.
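The documented behaviour of check_memo (compute the hash, record it on the task, return a stored Future or None) can be sketched with plain dictionaries. Everything here is an assumption made for illustration: `sketch_check_memo`, `memo_table`, and the task keys `func_name`, `args`, and `kwargs` stand in for the real TaskRecord and Memoizer internals, and the hashing is a simplified placeholder.

```python
import hashlib
import pickle
from concurrent.futures import Future
from typing import Any, Dict, Optional

# Hypothetical lookup table: hash string -> Future holding a previous result.
memo_table: Dict[str, Future] = {}

def sketch_check_memo(task: Dict[str, Any]) -> Optional[Future]:
    """Hash the task, record the hash on the task, and return any stored Future."""
    payload = pickle.dumps(
        (task["func_name"], task["args"], sorted(task["kwargs"].items()))
    )
    hashsum = hashlib.md5(payload).hexdigest()
    task["hashsum"] = hashsum  # side effect: the hash is stored on the task
    return memo_table.get(hashsum)

# First call: nothing is stored yet, so the lookup misses.
task = {"func_name": "add", "args": (2, 3), "kwargs": {}}
first = sketch_check_memo(task)

# Store a completed result under the hash recorded on the task.
stored = Future()
stored.set_result(5)
memo_table[task["hashsum"]] = stored

# A repeated task with the same function name and inputs now hits the table.
repeat = {"func_name": "add", "args": (2, 3), "kwargs": {}}
second = sketch_check_memo(repeat)
```

Note that the hash is recorded on the task even when the lookup misses, so it can be used later as the key when the result is stored.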

hash_lookup(hashsum: str) → Future[Any]

Lookup a hash in the memoization table.

Parameters:

hashsum (str) – The hash that uniquely identifies an app and its inputs

Returns:

  • Lookup result

Raises:

KeyError – if the hash is not in the table
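Because a missing hash raises KeyError rather than returning None, callers of a lookup like this need to handle the exception. The sketch below shows that pattern on a toy table; `sketch_hash_lookup`, `lookup_or_none`, and `memo_table` are hypothetical names, not part of the Parsl API.

```python
from concurrent.futures import Future

# Hypothetical lookup table: hash string -> Future.
memo_table = {}

def sketch_hash_lookup(hashsum):
    """Return the stored Future; raises KeyError if the hash is not in the table."""
    return memo_table[hashsum]

def lookup_or_none(hashsum):
    """Wrap the lookup so that a missing hash becomes None instead of an exception."""
    try:
        return sketch_hash_lookup(hashsum)
    except KeyError:
        return None

# Store one completed result, then look up a known and an unknown hash.
result = Future()
result.set_result("cached")
memo_table["known-hash"] = result

found = lookup_or_none("known-hash")
missing = lookup_or_none("unknown-hash")
```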

make_hash(task: TaskRecord) → str

Create a hash of the task inputs.

Parameters:

task (TaskRecord) – Task dictionary from dfk.tasks

Returns:

A unique hash string

Return type:

str

update_memo(task: TaskRecord, r: Future[Any]) → None

Updates the memoization lookup table with the result from a task.

Parameters:
  • task (TaskRecord) – A task dict from dfk.tasks

  • r (Future[Any]) – Result future