parsl.dataflow.memoization.Memoizer

class parsl.dataflow.memoization.Memoizer(dfk, memoize=True, checkpoint={})

Memoizer is responsible for ensuring that identical work is not repeated.

When a task is repeated, i.e., the same function is called with exactly the same arguments, the result from a previous execution is reused.

This implementation does not collapse duplicate calls at call time; it helps only when the result of a previous call is already available at the time the duplicate call is made.

For instance:

No advantage from                 Memoization helps
memoization here:                 here:

 TaskA                            TaskB
   |   TaskA                        |
   |     |   TaskA                done  (TaskB)
   |     |     |                                (TaskB)
 done    |     |
       done    |
             done
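The two cases in the diagram can be imitated with a small standard-library sketch. This is not Parsl code: `ToyMemo`, `launch`, and `complete` are hypothetical names invented for illustration, and the table is assumed to be filled only when a call completes.

```python
class ToyMemo:
    """Hypothetical stand-in for a memoizer; not the Parsl class."""

    def __init__(self):
        self.table = {}       # key -> result of a *completed* call
        self.executions = 0   # how many calls were actually scheduled to run

    def launch(self, func, *args):
        key = (func.__name__, args)
        if key in self.table:
            # A previous result is already available: reuse it.
            return ("memoized", self.table[key])
        self.executions += 1
        return ("pending", key, func, args)

    def complete(self, pending):
        # Run the pending call and record its result in the table.
        _, key, func, args = pending
        result = func(*args)
        self.table[key] = result
        return result


def square(x):
    return x * x


# Left-hand case: three duplicates are launched before any has finished.
overlapping = ToyMemo()
pending_calls = [overlapping.launch(square, 4) for _ in range(3)]
for call in pending_calls:
    overlapping.complete(call)
overlapping_runs = overlapping.executions

# Right-hand case: the first call finishes before the duplicates are launched.
sequential = ToyMemo()
sequential.complete(sequential.launch(square, 4))
second = sequential.launch(square, 4)
third = sequential.launch(square, 4)
sequential_runs = sequential.executions
```

In the overlapping case every duplicate runs, because no result exists when each is launched; in the sequential case only the first call runs and the later ones are served from the table.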

The memoizer creates a lookup table by hashing the function name and its inputs, and storing the results of the function.

When a task is ready for launch, i.e., all of its arguments have resolved, we add its hash to the task data structure.
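The lookup-table idea described above can be sketched with the standard library. The function names mirror the methods listed on this page, but the signatures, the task dictionary layout, and the use of `pickle` with SHA-256 are assumptions made for illustration; Parsl's actual hashing and task record may differ.

```python
import hashlib
import pickle

memo_table = {}  # hash -> stored result (assumed layout)


def make_hash(func, args, kwargs):
    # Hash the function name together with its inputs.
    # Sorting kwargs makes the hash independent of keyword order.
    payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


def check_memo(task):
    # Attach the hash to the task record, then consult the table.
    task["hashsum"] = make_hash(task["func"], task["args"], task["kwargs"])
    if task["hashsum"] in memo_table:
        return True, memo_table[task["hashsum"]]
    return False, None


def update_memo(task, result):
    # Store the result under the hash computed by check_memo.
    memo_table[task["hashsum"]] = result


def add(a, b=0):
    return a + b


task1 = {"func": add, "args": (1,), "kwargs": {"b": 2}}
hit1, _ = check_memo(task1)            # table is empty, so this is a miss
update_memo(task1, add(1, b=2))

task2 = {"func": add, "args": (1,), "kwargs": {"b": 2}}
hit2, cached = check_memo(task2)       # same function and inputs: a hit
```

Because the key is derived from the function name rather than its body, two different functions that share a name and inputs would collide in this sketch.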

__init__(dfk, memoize=True, checkpoint={})

Initialize the memoizer.

Parameters: dfk – The DFK (DataFlowKernel) object

KWargs:

memoize (bool): Whether to enable memoization.
checkpoint (dict): A checkpoint loaded as a dict.
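As a rough illustration of how the two keyword arguments might interact, the sketch below seeds a lookup table from a checkpoint dict and bypasses it when memoization is disabled. `ToyMemoizer` is a hypothetical class, the `dfk` argument is omitted, and the checkpoint is assumed to map hashes directly to results; none of this is confirmed by the documentation above.

```python
class ToyMemoizer:
    """Hypothetical illustration of the constructor arguments; not the Parsl class."""

    def __init__(self, memoize=True, checkpoint=None):
        self.memoize = memoize
        # Copy the checkpoint so the caller's dict is never mutated.
        # Assumed layout: hash string -> previously computed result.
        self.memo_lookup_table = dict(checkpoint or {})

    def hash_lookup(self, hashsum):
        # With memoization disabled, every lookup is reported as a miss.
        if not self.memoize:
            return False, None
        if hashsum in self.memo_lookup_table:
            return True, self.memo_lookup_table[hashsum]
        return False, None


saved = {"abc123": 42}  # made-up checkpoint contents

restored = ToyMemoizer(checkpoint=saved)
disabled = ToyMemoizer(memoize=False, checkpoint=saved)

hit = restored.hash_lookup("abc123")
miss = disabled.hash_lookup("abc123")
unknown = restored.hash_lookup("zzz")
```

The sketch uses `checkpoint=None` rather than `checkpoint={}` as the default to avoid sharing one mutable default dict across instances, which is a general Python precaution and not a statement about Parsl's behaviour.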

Methods

`__init__`(dfk[, memoize, checkpoint])	Initialize the memoizer.
`check_memo`(task_id, task)	Create a hash of the task and its inputs and check the lookup table for this hash.
`hash_lookup`(hashsum)	Look up a hash in the memoization table.
`make_hash`(task)	Create a hash of the task inputs.
`update_memo`(task_id, task, r)	Update the memoization lookup table with the result from a task.