Changelog

Parsl 0.8.0

Released on June 13th, 2019

Parsl v0.8.0 includes 58 closed issues and pull requests with contributions (code, tests, reviews, and reports) from: Andrew Litteken @AndrewLitteken, Anna Woodard @annawoodard, Antonio Villarreal @villarrealas, Ben Clifford @benc, Daniel S. Katz @danielskatz, Eric Tatara @etatara, Juan David Garrido @garri1105, Kyle Chard @kylechard, Lindsey Gray @lgray, Tim Armstrong @timarmstrong, Tom Glanzman @TomGlanzman, Yadu Nand Babuji @yadudoc, and Zhuozhao Li @ZhuozhaoLi

New Functionality

  • Monitoring is now integrated into parsl as default functionality.

  • parsl.AUTO_LOGNAME: Support for a special AUTO_LOGNAME option to auto-generate stdout and stderr file paths.
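    A minimal usage sketch (the app name and command here are illustrative only; this is not a verbatim excerpt from the Parsl docs):

    ```python
    import parsl
    from parsl import bash_app

    # Passing parsl.AUTO_LOGNAME instead of a filename asks Parsl to
    # auto-generate a per-invocation stdout/stderr path under the run directory.
    @bash_app
    def run_step(stdout=parsl.AUTO_LOGNAME, stderr=parsl.AUTO_LOGNAME):
        return 'echo running'
    ```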

  • parsl.File objects no longer behave as strings. Operations in apps that treated them as strings will break; for example, the following snippet must be updated:

    # Old style: " ".join(inputs) is legal since inputs will behave like a list of strings
    @bash_app
    def concat(inputs=[], outputs=[], stdout="stdout.txt", stderr='stderr.txt'):
        return "cat {0} > {1}".format(" ".join(inputs), outputs[0])
    
    # New style:
    @bash_app
    def concat(inputs=[], outputs=[], stdout="stdout.txt", stderr='stderr.txt'):
        return "cat {0} > {1}".format(" ".join(list(map(str,inputs))), outputs[0])
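
    The change can be illustrated without Parsl, using a hypothetical minimal stand-in for the File class (for illustration only, not the real implementation):

    ```python
    class File:
        """Minimal stand-in for parsl.File: wraps a path and is no longer a str."""
        def __init__(self, path):
            self.path = path

        def __str__(self):
            return self.path


    inputs = [File("a.txt"), File("b.txt")]

    # Old style fails: str.join() requires every item to already be a str.
    old_style_failed = False
    try:
        " ".join(inputs)
    except TypeError:
        old_style_failed = True

    # New style: convert each File to its path string first.
    joined = " ".join(map(str, inputs))
    print(joined)  # a.txt b.txt
    ```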
    
  • Cleaner user app file log management.

  • The configuration section of the userguide has been updated with configurations using the HighThroughputExecutor.

  • Support for OAuth based SSH with OAuthSSHChannel.

Bug Fixes

  • Monitoring resource usage bug issue#975
  • Bash apps fail due to missing dir paths issue#1001
  • Viz server explicit binding fix issue#1023
  • Fix sqlalchemy version warning issue#997
  • All workflows are called typeguard issue#973
  • Fix ModuleNotFoundError: No module named 'monitoring' issue#971
  • Fix sqlite3 integrity error issue#920
  • HTEX interchange now checks Python version mismatch down to the micro level issue#857
  • Clarify warning message when a manager goes missing issue#698
  • Apps without a specified DFK should use the global DFK in scope at call time, not at other times. issue#697

Parsl 0.7.2

Released on Mar 14th, 2019

New Functionality

  • Monitoring: Support for reporting monitoring data to a local sqlite database is now available.
  • Parsl is switching to an opt-in model for anonymous usage tracking. Read more here: Usage statistics collection.
  • bash_app now supports specification of write modes for stdout and stderr.
  • Persistent volume support added to Kubernetes provider.
  • Scaling recommendations from a study on Blue Waters are now available in the userguide.
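A sketch of the stdout/stderr write-mode feature noted above, assuming the (path, mode) tuple form for the stdout/stderr keywords; treat the exact spelling as an assumption rather than a verbatim API reference:

```python
from parsl import bash_app

# Assumed API: stdout/stderr given as (path, mode) tuples,
# where 'a' appends across invocations and 'w' truncates.
@bash_app
def log_step(stdout=('steps.log', 'a'), stderr=('steps.err', 'a')):
    return 'echo step done'
```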

Parsl 0.7.1

Released on Jan 18th, 2019

New Functionality

  • LowLatencyExecutor: a new executor designed for use cases with tight latency requirements, such as model serving (machine learning), function serving, and interactive analyses, is now available.
  • New options in HighThroughputExecutor:
    • suppress_failure: Enable suppression of worker rejoin errors.
    • max_workers: Limit the number of workers spawned per manager.
  • Late binding of DFK allows apps to pick a DFK dynamically at call time. This adds safety to cases where a new config is loaded and a new DFK is created.

Parsl 0.7.0

Released on Dec 20th, 2018

Parsl v0.7.0 includes 110 closed issues with contributions (code, tests, reviews and reports) from: Alex Hays @ahayschi, Anna Woodard @annawoodard, Ben Clifford @benc, Connor Pigg @ConnorPigg, David Heise @daheise, Daniel S. Katz @danielskatz, Dominic Fitzgerald @djf604, Francois Lanusse @EiffL, Juan David Garrido @garri1105, Gordon Watts @gordonwatts, Justin Wozniak @jmjwozniak, Joseph Moon @jmoon1506, Kenyi Hurtado @khurtado, Kyle Chard @kylechard, Lukasz Lacinski @lukaszlacinski, Ravi Madduri @madduri, Marco Govoni @mgovoni-devel, Reid McIlroy-Young @reidmcy, Ryan Chard @ryanchard, @sdustrud, Yadu Nand Babuji @yadudoc, and Zhuozhao Li @ZhuozhaoLi

New functionality

  • HighThroughputExecutor: a new executor intended to replace the IPyParallelExecutor is now available. This new executor addresses several limitations of IPyParallelExecutor:

    • Scales beyond the ~300-worker limit of IPP.
    • A multi-processing manager supports execution on all cores of a single node.
    • Improved worker-side reporting of version, system, and status info.
    • Failure detection and cleaner manager shutdown.

    Here’s a sample configuration for using this executor locally:

    from parsl.providers import LocalProvider
    from parsl.channels import LocalChannel
    
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    
    config = Config(
        executors=[
            HighThroughputExecutor(
                label="htex_local",
                cores_per_worker=1,
                provider=LocalProvider(
                    channel=LocalChannel(),
                    init_blocks=1,
                    max_blocks=1,
                ),
            )
        ],
    )
    

    More information on configuring is available in the Configuration section.

  • ExtremeScaleExecutor a new executor targeting supercomputer scale (>1000 nodes) workflows is now available.

    Here’s a sample configuration for using this executor locally:

    from parsl.providers import LocalProvider
    from parsl.channels import LocalChannel
    from parsl.launchers import SimpleLauncher
    
    from parsl.config import Config
    from parsl.executors import ExtremeScaleExecutor
    
    config = Config(
        executors=[
            ExtremeScaleExecutor(
                label="extreme_local",
                ranks_per_node=4,
                provider=LocalProvider(
                    channel=LocalChannel(),
                    init_blocks=0,
                    max_blocks=1,
                    launcher=SimpleLauncher(),
                )
            )
        ],
        strategy=None,
    )
    

    More information on configuring is available in the Configuration section.

  • The libsubmit repository has been merged with Parsl to reduce overheads on maintenance with respect to documentation, testing, and release synchronization. Since the merge, the API has undergone several updates to support the growing collection of executors, and as a result Parsl 0.7.0+ will not be backwards compatible with the standalone libsubmit repos. The major components of libsubmit are now available through Parsl, and require the following changes to import lines to migrate scripts to 0.7.0:

    • from libsubmit.providers import <ProviderName> is now from parsl.providers import <ProviderName>
    • from libsubmit.channels import <ChannelName> is now from parsl.channels import <ChannelName>
    • from libsubmit.launchers import <LauncherName> is now from parsl.launchers import <LauncherName>

    Warning

    This is a breaking change from Parsl v0.6.0

  • To support resource-based requests for workers and to maintain uniformity across interfaces, tasks_per_node is no longer a provider option. Instead, the notion of tasks_per_node is expressed via executor-specific options (for example, max_workers on HighThroughputExecutor).

    Warning

    This is a breaking change from Parsl v0.6.0
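    A sketch of the migration, assuming a max_workers-style executor option as described elsewhere in this changelog (option names on other executors may differ):

    ```python
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.providers import LocalProvider

    config = Config(
        executors=[
            HighThroughputExecutor(
                label="htex",
                # The old style set tasks_per_node=4 on the provider;
                # the equivalent is now an executor-specific option:
                max_workers=4,
                provider=LocalProvider(),
            )
        ],
    )
    ```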

  • Major upgrades to the monitoring infrastructure.
    • Monitoring information can now be written to a SQLite database, created on the fly by Parsl.
    • Web-based monitoring to track workflow progress.
  • Determining the correct IP address/interface given network firewall rules is often a nuisance. To simplify this, three new methods are now supported:

    • parsl.addresses.address_by_route
    • parsl.addresses.address_by_query
    • parsl.addresses.address_by_hostname
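
    A sketch of how one of these helpers might be wired into an executor; the address keyword on HighThroughputExecutor is an assumption here, not a verbatim API reference:

    ```python
    from parsl.addresses import address_by_hostname
    from parsl.executors import HighThroughputExecutor

    # Resolve the address from the hostname rather than hard-coding an IP.
    htex = HighThroughputExecutor(
        label="htex",
        address=address_by_hostname(),
    )
    ```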
  • AprunLauncher now supports an overrides option that allows arbitrary strings to be added to the aprun launcher call.

  • DataFlowKernel has a new method, wait_for_current_tasks().

  • DataFlowKernel now uses per-task locks and an improved mechanism to handle task completions, improving performance for workflows with a large number of tasks.
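    A minimal usage sketch for wait_for_current_tasks(), assuming parsl.dfk() returns the active DataFlowKernel and a local-threads config module is available (both assumptions; adjust the import to a real config):

    ```python
    import parsl
    from parsl.configs.local_threads import config  # assumed config module

    parsl.load(config)

    # ... submit apps here ...

    # Block until every task submitted so far has completed,
    # then shut the DFK down cleanly.
    parsl.dfk().wait_for_current_tasks()
    parsl.dfk().cleanup()
    ```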

Bug fixes (highlights)

  • Ctrl+C should cause fast DFK cleanup issue#641
  • Fix to avoid padding in wtime_to_minutes() issue#522
  • Updates to block semantics issue#557
  • Updates public_ip to address for clarity issue#557
  • Improvements to launcher docs issue#424
  • Fixes for inconsistencies between stream_logger and file_logger issue#629
  • Fixes to DFK discarding some un-executed tasks at end of workflow issue#222
  • Implement per-task locks to avoid deadlocks issue#591
  • Fixes to internal consistency errors issue#604
  • Removed unnecessary provider labels issue#440
  • Fixes to TorqueProvider to work on NSCC issue#489
  • Several fixes and updates to monitoring subsystem issue#471
  • DataManager calls wrong DFK issue#412
  • Config isn’t reloading properly in notebooks issue#549
  • Cobalt provider partition should be queue issue#353
  • bash AppFailure exceptions contain useful but un-displayed information issue#384
  • Do not CD to engine_dir issue#543
  • Parsl install fails without kubernetes config file issue#527
  • Fix import error issue#533
  • Change Local Database Strategy from Many Writers to a Single Writer issue#472
  • All run-related working files should go in the rundir unless otherwise configured issue#457
  • Fix concurrency issue with many engines accessing the same IPP config issue#469
  • Ensure we are not caching failed tasks issue#368
  • File staging of unknown schemes fails silently issue#382
  • Inform user checkpointed results are being used issue#494
  • Fix IPP + python 3.5 failure issue#490
  • File creation fails if no executor has been loaded issue#482
  • Make sure tasks in dep_fail state are retried issue#473
  • Hard requirement for CMRESHandler issue#422
  • Log error Globus events to stderr issue#436
  • Take ‘slots’ out of logging issue#411
  • Remove redundant logging issue#267
  • Zombie ipcontroller processes - Process cleanup in case of interruption issue#460
  • IPyparallel failure when submitting several apps in parallel threads issue#451
  • SlurmProvider + SingleNodeLauncher starts all engines on a single core issue#454
  • IPP engine_dir has no effect if indicated dir does not exist issue#446
  • Clarify AppBadFormatting error issue#433
  • confusing error message with simple configs issue#379
  • Error due to missing kubernetes config file issue#432
  • parsl.configs and parsl.tests.configs missing init files issue#409
  • Error when Python versions differ issue#62
  • Fixing ManagerLost error in HTEX/EXEX issue#577
  • Write all debug logs to rundir by default in HTEX/EXEX issue#574
  • Write one log per HTEX worker issue#572

Parsl 0.6.1

Released on July 23rd, 2018.

This point release contains fixes for issue#409

Parsl 0.6.0

Released July 23rd, 2018.

New functionality

  • Switch to class based configuration issue#133

    Here’s the config for using threads for local execution:

    from parsl.config import Config
    from parsl.executors.threads import ThreadPoolExecutor
    
    config = Config(executors=[ThreadPoolExecutor()])
    

    Here’s a more complex config that uses SSH to run on a Slurm-based cluster:

    from libsubmit.channels import SSHChannel
    from libsubmit.providers import SlurmProvider
    
    from parsl.config import Config
    from parsl.executors.ipp import IPyParallelExecutor
    from parsl.executors.ipp_controller import Controller
    
    config = Config(
        executors=[
            IPyParallelExecutor(
                provider=SlurmProvider(
                    'westmere',
                    channel=SSHChannel(
                        hostname='swift.rcc.uchicago.edu',
                        username=<USERNAME>,
                        script_dir=<SCRIPTDIR>
                    ),
                    init_blocks=1,
                    min_blocks=1,
                    max_blocks=2,
                    nodes_per_block=1,
                    tasks_per_node=4,
                    parallelism=0.5,
                    overrides=<SPECIFY_INSTRUCTIONS_TO_LOAD_PYTHON3>
                ),
                label='midway_ipp',
                controller=Controller(public_ip=<PUBLIC_IP>),
            )
        ]
    )
    
  • Implicit Data Staging issue#281

  • Support for application profiling issue#5

  • Real-time usage tracking via external systems issue#248, issue#251

  • Several fixes and upgrades to tests and testing infrastructure issue#157, issue#159, issue#128, issue#192, issue#196

  • Better state reporting in logs issue#242

  • Hide DFK issue#50

    • Instead of passing a config dictionary to the DataFlowKernel, now you can call parsl.load(Config)

    • Instead of having to specify the DFK at the time of app declaration, the DFK is a singleton loaded at call time:

      import parsl
      from parsl.tests.configs.local_ipp import config
      parsl.load(config)
      
      @App('python')
      def double(x):
          return x * 2
      
      fut = double(5)
      fut.result()
      
  • Support for better reporting of remote side exceptions issue#110

Bug Fixes

  • Making naming conventions consistent issue#109
  • Globus staging returns unclear error bug issue#178
  • Duplicate log-lines when using IPP issue#204
  • Usage tracking with certain missing network causes 20s startup delay. issue#220
  • task_exit checkpointing repeatedly truncates checkpoint file during run bug issue#230
  • Checkpoints will not reload from a run that was Ctrl-C’ed issue#232
  • Race condition in task checkpointing issue#234
  • Failures not to be checkpointed issue#239
  • Naming inconsistencies with maxThreads, max_threads, max_workers are now resolved issue#303
  • Fatal not a git repository alerts issue#326
  • Default kwargs in bash apps unavailable at command-line string format time issue#349
  • Fix launcher class inconsistencies issue#360
  • Several fixes to AWS provider issue#362
    • Fixes faulty status updates
    • Faulty termination of instance at cleanup, leaving zombie nodes.

Parsl 0.5.2

Released. June 21st, 2018. This is an emergency release addressing issue#347

Bug Fixes

  • Parsl version conflict with libsubmit 0.4.1 issue#347

Parsl 0.5.1

Released. May 15th, 2018.

Parsl 0.5.0

Released. Apr 16th, 2018.

New functionality

  • Support for Globus file transfers issue#71

    Caution

    This feature is available from Parsl v0.5.0 in an experimental state.

  • PathLike behavior for Files issue#174
    • Files behave like strings here:
    myfile = File("hello.txt")
    f = open(myfile, 'r')
    
  • Automatic checkpointing modes issue#106

    config = {
        "globals": {
            "lazyErrors": True,
            "memoize": True,
            "checkpointMode": "dfk_exit"
        }
    }
    
  • Support for containers with docker issue#45

        localDockerIPP = {
             "sites": [
                 {"site": "Local_IPP",
                  "auth": {"channel": None},
                  "execution": {
                      "executor": "ipp",
                      "container": {
                          "type": "docker",     # <----- Specify Docker
                          "image": "app1_v0.1", # <------Specify docker image
                      },
                      "provider": "local",
                      "block": {
                          "initBlocks": 2,  # Start with 2 blocks
                      },
                  }
                  }],
             "globals": {"lazyErrors": True}        }
    
    Caution

    This feature is available from Parsl v0.5.0 in an experimental state.
    
  • Cleaner logging issue#85
    • Logs are now written by default to runinfo/RUN_ID/parsl.log.
    • INFO log lines are more readable and compact
  • Local configs are now packaged issue#96

    from parsl.configs.local import localThreads
    from parsl.configs.local import localIPP
    

Bug Fixes

  • Passing Files over IPP broken issue#200
  • Fix DataFuture.__repr__ for default instantiation issue#164
  • Results added to appCache before retries exhausted issue#130
  • Missing documentation added for Multisite and Error handling issue#116
  • TypeError raised when a bad stdout/stderr path is provided. issue#104
  • Race condition in DFK issue#102
  • Cobalt provider broken on Cooley.alcf issue#101
  • No blocks provisioned if parallelism/blocks = 0 issue#97
  • Checkpoint restart assumes rundir issue#95
  • Logger continues after cleanup is called issue#93

Parsl 0.4.1

Released. Feb 23rd, 2018.

New functionality

  • GoogleCloud provider support via libsubmit
  • GridEngine provider support via libsubmit

Bug Fixes

  • Cobalt provider issues with job state issue#101
  • Parsl updates config inadvertently issue#98
  • No blocks provisioned if parallelism/blocks = 0 issue#97
  • Checkpoint restart assumes rundir bug issue#95
  • Logger continues after cleanup called enhancement issue#93
  • Error checkpointing when no cache enabled issue#92
  • Several fixes to libsubmit.

Parsl 0.4.0

Here are the major changes included in the Parsl 0.4.0 release.

New functionality

  • Elastic scaling in response to workflow pressure. issue#46 Options minBlocks, maxBlocks, and parallelism now work and control workflow execution.

    Documented in: Elasticity

  • Multisite support enables targeting apps within a single workflow to different sites issue#48

    @App('python', dfk, sites=['SITE1', 'SITE2'])
    def my_app(...):
       ...
    
  • Anonymized usage tracking added. issue#34

    Documented in: Usage statistics collection

  • AppCaching and Checkpointing issue#43

    # Set cache=True to enable appCaching
    @App('python', dfk, cache=True)
    def my_app(...):
        ...
    
    
    # To checkpoint a workflow:
    dfk.checkpoint()
    

    Documented in: Checkpointing, App caching

  • Parsl now creates a new directory under ./runinfo/ with an incrementing number per workflow invocation

  • Troubleshooting guide and more documentation

  • PEP8 conformance tests added to travis testing issue#72

Bug Fixes

  • Missing documentation from libsubmit was added back issue#41
  • Fixes for script_dir | scriptDir inconsistencies issue#64
    • We now use scriptDir exclusively.
  • Fix for caching not working on jupyter notebooks issue#90
  • Config defaults module failure when part of the option set is provided issue#74
  • Fixes for network errors with usage_tracking issue#70
  • PEP8 conformance of code and tests with limited exclusions issue#72
  • Doc bug in recommending max_workers instead of maxThreads issue#73

Parsl 0.3.1

This is a point release with mostly minor features and several bug fixes.

  • Fixes for remote side handling
  • Support for specifying IPythonDir for IPP controllers
  • Several tests added that test provider launcher functionality from libsubmit
  • This upgrade will also push the libsubmit requirement from 0.2.4 -> 0.2.5.

Several critical fixes from libsubmit are brought in:

  • Several fixes and improvements to Condor from @annawoodard.
  • Support for Torque scheduler
  • Provider script output paths are fixed
  • Increased walltimes to deal with slow scheduler system
  • Srun launcher for slurm systems
  • SSH channels now support a file_pull() method.
    While files are not automatically staged, the channels provide support for bi-directional file transport.

Parsl 0.3.0

Here are the major changes that are included in the Parsl 0.3.0 release.

New functionality

  • Arguments to the DFK have changed:

    # Old style
    dfk(executor_obj)

    # New style: pass a list of executors
    dfk(executors=[list_of_executors])

    # Alternatively, pass the config from which the DFK will
    # instantiate resources
    dfk(config=config_dict)

  • Execution providers have been restructured to a separate repo: libsubmit

  • Bash apps have changed to return the command-line string rather than assigning it to the special keyword cmd_line. Please refer to RFC #37 for more details. This is a non-backward compatible change.

  • Output files from apps are now made available as an attribute of the AppFuture. Please refer to #26 for more details. This is a non-backward compatible change.

    # This is the pre 0.3.0 style
    app_fu, [file1, file2] = make_files(x, y, outputs=['f1.txt', 'f2.txt'])
    
    #This is the style that will be followed going forward.
    app_fu = make_files(x, y, outputs=['f1.txt', 'f2.txt'])
    [file1, file2] = app_fu.outputs
    
  • DFK init now supports auto-start of IPP controllers

  • Support for channels via libsubmit. Channels enable execution of commands from execution providers either locally, or remotely via ssh.

  • Bash apps now support timeouts.

  • Support for cobalt execution provider.

Bug fixes

  • Futures have inconsistent behavior in bash app fn body #35
  • Parsl dflow structure missing dependency information #30

Parsl 0.2.0

Here are the major changes that are included in the Parsl 0.2.0 release.

New functionality

  • Support for execution via the IPythonParallel executor, enabling distributed execution.
  • Generic executors

Parsl 0.1.0

Here are the major changes that are included in the Parsl 0.1.0 release.

New functionality

  • Support for Bash and Python apps
  • Support for chaining of apps via futures handled by the DataFlowKernel.
  • Support for execution over threads.
  • Arbitrary DAGs can be constructed and executed asynchronously.

Bug Fixes

  • Initial release, no listed bugs.

Libsubmit Changelog

As of Parsl 0.7.0 the libsubmit repository has been merged into Parsl.

Libsubmit 0.4.1

Released. June 18th, 2018. This release folds in massive contributions from @annawoodard.

New functionality

  • Several code cleanups, doc improvements, and consistent naming
  • All providers have the initialization and actual start of resources decoupled.

Libsubmit 0.4.0

Released. May 15th, 2018. This release folds in contributions from @ahayschi, @annawoodard, @yadudoc

New functionality

  • Several enhancements and fixes to the AWS cloud provider (#44, #45, #50)
  • Added support for Python 3.4.

Bug Fixes

  • Condor jobs left in queue with X state at end of completion issue#26
  • Worker launches on Cori seem to fail from broken ENV issue#27
  • EC2 provider throwing an exception at initial run issue#46