Suites I

Rose User Guide: Suites I

Introduction

What is this?

  • This is guide material for developing and using Rose suites under cylc.
  • A Rose suite is a collection of scientific task configurations (or 'applications') with a common purpose.
  • cylc is a suite engine that drives task submission and monitoring.

That's too abstract - what's cylc again?

  • cylc makes your Rose stuff happen.

What does this cover?

  • cylc background and philosophy
  • cylc basic usage
  • basic suite design

Disclaimer

The best reference for cylc is the cylc User Guide - some of the following images are taken from there.

Background

Background - NIWA

Originally called cylon, cylc was started by Hilary Oliver at NIWA. It runs their EcoConnect operational suite.

cylc was developed to provide a fast, automated way of scheduling tasks so that a suite could catch up after outages.

Background - Current

cylc is now a collaborative effort, involving the Met Office.

cylc is written in Python and uses PyGTK for its GUIs. It is open source and licensed under GPL v3.

cylc lives on GitHub.

Scheduling Algorithm

Scheduling - making sure tasks run in the correct order, at the correct time - is the core purpose of cylc. The cylc scheduling algorithm is very simple, and very powerful:

  • Every task has input and output dependencies.
  • Each task checks the others to see if its inputs are satisfied - if so, it runs.

Single Cycle

[Figure: task dependencies within a single cycle]

Scheduling Cycling Tasks

Most meteorological suites repeat over time (cycling):

[Figure: multi-cycle dependencies]

A Cycle isn't A Prison

A cycle isn't a prison - if tasks can run ahead of their cycle time, they should. The image shows how a traditional cycle-fixed scheduler operates (bottom) vs cylc (top).

[Figure: optimal task scheduling]
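
As a rough sketch of how this looks in a suite.rc file (the format is introduced in the sections that follow), the cycling graph below lets get_data run ahead while run_model waits for its own previous cycle. The task names get_data and run_model are hypothetical and are not part of this guide's examples.

```ini
[scheduling]
    initial cycle point = 20130105T00Z
    final cycle point = 20130106T00Z
    [[dependencies]]
        [[[T00, T12]]]
            # Hypothetical tasks, for illustration only.
            # run_model needs this cycle's get_data and its own previous cycle.
            graph = """
                get_data => run_model
                run_model[-PT12H] => run_model
            """
```

Here get_data has no dependency on earlier cycles, so it can run for later cycles (up to cylc's runahead limit) rather than being held until the previous cycle has finished.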

The suite.rc File

cylc has a single file to configure the suite, the suite.rc file. It configures:

  • Task dependencies, including times and dates
  • Task environment
  • Task scripts

Format

cylc uses a nested INI-based configuration format that looks like this:

[scheduling]                     # Scheduling section
    [[dependencies]]             # Dependencies sub-section
        graph = my_task          # Graph Setting, option = value
[runtime]                        # Runtime Section
    [[my_task]]                  # User-specified task sub-section
        [[[environment]]]        # Environment sub-sub-section
            FOO = bar            # User-specified, option = value

Hello World cylc suite

A simple cylc suite has a single suite.rc file that looks like this:

[scheduling]
    [[dependencies]]
        graph = hello_world
[runtime]
    [[hello_world]]
        script = echo 'Hello World'

This will run a cylc task called hello_world that prints Hello World to standard output.

Demoing the Hello World cylc suite

Demo this by creating and running the simple cylc suite:

mkdir $TMPDIR/simplesuite  # Make the suite directory
touch $TMPDIR/simplesuite/rose-suite.conf  # Rose needs a rose-suite.conf
cat >$TMPDIR/simplesuite/suite.rc <<__SUITE_RC__
[scheduling]
    [[dependencies]]
        graph = hello_world
[runtime]
    [[hello_world]]
        script = echo 'Hello World'
__SUITE_RC__
rose suite-run -C $TMPDIR/simplesuite   # Run the suite!

Demo Output

As we saw before, the suite will run and shut down when all tasks are successful - in this case, the only task is hello_world.

The suite output will be in our cylc-run directory, which you can also access by running rose suite-log --name=simplesuite.

Hello World Rose Suite

Tasks such as our hello_world task above are generic - they can be Rose applications, or commands or executables as above.

If we wanted to invoke a Rose application, we'd make an app called hello_world, by making an app/hello_world/ directory and creating a rose-app.conf file in there with this content:

[command]
default=echo 'Hello World'

Hello World rose task-run Example

We would then have to change the [runtime] section to:

[runtime]
    [[hello_world]]
        script = rose task-run

rose task-run uses environment variables passed in by cylc (like CYLC_TASK_NAME) to figure out which Rose application to run. In this case, it'll be app/hello_world.

Hello World rose task-run

rose task-run is reasonably generic, so we can make it the suite default ([[root]]) script by writing:

[runtime]
    [[root]]
        script = rose task-run
    [[hello_world]]

[[root]] contains default settings that all cylc tasks inherit from. It is a special case of a cylc family. Families contain shared settings for one or more tasks.

Families

  • Families define shared configuration.
  • Families can be used to reduce duplication between tasks.
  • Family settings can be overridden by individual tasks.
  • Tasks can inherit from more than one family (multiple inheritance).

Families (2)

Families can also be used to help write the dependencies - e.g. to set up a task so that it runs when all the tasks in a family succeed (more later). They can also be used in queues.
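
A minimal sketch of both uses, assuming a family called HELLO_FAMILY (as in the example that follows) and a hypothetical follow-on task called goodbye. The queue name hello_queue is also made up for illustration.

```ini
[scheduling]
    [[queues]]
        [[[hello_queue]]]
            # At most one member of the family is active at a time.
            limit = 1
            members = HELLO_FAMILY
    [[dependencies]]
        # goodbye runs once all members of HELLO_FAMILY have succeeded.
        graph = HELLO_FAMILY:succeed-all => goodbye
```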

Families Example

Here is a suite.rc [runtime] section snippet for a suite with two similar applications, hello_eris/ and hello_pluto/:

[runtime]
    [[root]]
        script = rose task-run
    [[HELLO_FAMILY]]
        [[[environment]]]
            IS_WORLD_A_PLANET = false # Shared env variable
    [[hello_eris]]
        inherit = HELLO_FAMILY
    [[hello_pluto]]
        inherit = HELLO_FAMILY

Families and Overrides

Overriding family settings can be done under a task section:

[runtime]
    [[root]]
        script = rose task-run
    [[HELLO_FAMILY]]
        [[[environment]]]
            IS_WORLD_A_PLANET = false  # Shared env variable
    [[hello_eris]]
        inherit = HELLO_FAMILY
    [[hello_pluto]]
        inherit = HELLO_FAMILY
        [[[environment]]]
            IS_WORLD_A_PLANET = true  # Override.

Shared Applications

In the last example, if we don't really need two Rose applications (app/hello_eris/ and app/hello_pluto/), we can create two tasks that share a single application, with some override settings - in this example, a WORLD environment variable.

We can reference our single application, app/hello_world/ by using ROSE_TASK_APP. rose task-run will read this environment variable value and use it to run the Rose app. We will put the override setting (WORLD) in the suite.rc file.

Shared Applications Example

[runtime]
    [[root]]
        script = rose task-run
    [[HELLO_FAMILY]]
        [[[environment]]]
            IS_WORLD_A_PLANET = false  # Shared env variable
            ROSE_TASK_APP = hello_world
    [[hello_eris]]
        inherit = HELLO_FAMILY
        [[[environment]]]
            WORLD = eris
    [[hello_pluto]]
        inherit = HELLO_FAMILY
        [[[environment]]]
            WORLD = pluto

Multiple Family Inheritance

cylc supports multiple inheritance, so tasks can combine useful configuration from more than one separate family. If you set up families like this:

    [[HELLO_FAMILY]]
        [[[environment]]]
            ROSE_TASK_APP = hello_world
    [[GAS_GIANT_FAMILY]]
        [[[environment]]]
            ATMOSPHERE_ONLY = true
    [[ROCKY_FAMILY]]
        [[[environment]]]
            ATMOSPHERE_ONLY = false

Multiple Family Inheritance (2)

You can inherit them like this:

    [[hello_neptune]]
        inherit = HELLO_FAMILY, GAS_GIANT_FAMILY
        [[[environment]]]
            WORLD = neptune

Families listed first take precedence over those listed later - e.g. HELLO_FAMILY will override any shared setting in GAS_GIANT_FAMILY.
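
To illustrate the precedence, here is a sketch in which the two parents disagree about a setting. The task name hello_ceres is hypothetical; the families are the ones defined above.

```ini
    [[hello_ceres]]
        # ROCKY_FAMILY is listed first, so its value should win:
        # ATMOSPHERE_ONLY is expected to end up as false for this task.
        inherit = ROCKY_FAMILY, GAS_GIANT_FAMILY
        [[[environment]]]
            WORLD = ceres
```

Swapping the order of the two families in the inherit line would be expected to give ATMOSPHERE_ONLY = true instead.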

Remote Hosts

So far, the example tasks run on localhost - it is usually better to farm out tasks to a remote host such as a compute server or a cluster/supercomputer.

We could set hello_eris to run on a given host by setting [[[remote]]] section settings:

  [[hello_eris]]
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = eris
      [[[remote]]]
          host = voyager_1
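
If several tasks share a host, the same setting can be placed in a family instead of being repeated for each task. A sketch, assuming a hypothetical family named VOYAGER_FAMILY:

```ini
    [[VOYAGER_FAMILY]]
        # Hypothetical family holding the shared remote host setting.
        [[[remote]]]
            host = voyager_1
    [[hello_eris]]
        inherit = HELLO_FAMILY, VOYAGER_FAMILY
        [[[environment]]]
            WORLD = eris
```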

Dependencies

We haven't looked at the [scheduling] part of the suite.rc yet.

Let's say hello_pluto must run and succeed before hello_eris. We can put this in our suite.rc:

[scheduling]
  [[dependencies]]
      graph = hello_pluto => hello_eris
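
A graph can span several lines by using a triple-quoted string, and triggers can be combined with &. A sketch with two hypothetical extra tasks, hello_haumea and goodbye:

```ini
[scheduling]
    [[dependencies]]
        # hello_haumea and goodbye are hypothetical tasks.
        # goodbye waits for both hello_eris and hello_haumea to succeed.
        graph = """
            hello_pluto => hello_eris
            hello_pluto => hello_haumea
            hello_eris & hello_haumea => goodbye
        """
```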

Cycling Dependencies

We can make this run as a cycling suite, repeating every 12 hours:

[scheduling]
  initial cycle point = 20130105T00Z  # 00:00 UTC, 5 January 2013
  final cycle point = 20130106T00Z    # 00:00 UTC, 6 January 2013
  [[dependencies]]
      [[[T00, T12]]]  # run each day at 00:00 and 12:00
          graph = hello_pluto => hello_eris

N.B. cylc will not work with the various cycle points in real time unless you ask it to. For details see the clock triggered tasks tutorial.
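
As a sketch of what asking for real time can look like (the tutorial has the details), a task can be listed as clock-triggered so that it waits for the wall-clock time of its cycle point. The zero offset PT0H is an illustrative choice.

```ini
[scheduling]
    initial cycle point = 20130105T00Z
    final cycle point = 20130106T00Z
    [[special tasks]]
        # hello_pluto waits until real time reaches its cycle point.
        clock-trigger = hello_pluto(PT0H)
    [[dependencies]]
        [[[T00, T12]]]
            graph = hello_pluto => hello_eris
```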

Parameterization

cylc provides an inbuilt templating language that you can use to generate repeated graphing and runtime entries by working through a list of parameter values.

This lets you condense repetitive sections of the suite.rc file into compact parameterized entries that cylc expands at runtime.

Parameterized graphing

This can be particularly useful when you have large amounts of repetition - for example, in an ensemble context.

Sets of parameters are configured in the [cylc][[parameters]] section. For example:

[cylc]
   [[parameters]]
         world = eris, pluto, makemake, haumea

You can then get cylc to work through a set of parameters using <parameter> tags.

[scheduling]
   [[dependencies]]
         graph = hello_<world>

Parameterized runtime

Similarly, you can work with parameterized items in the runtime section too, further shortening your suite.rc:

  [[hello_<world>]]
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = $CYLC_TASK_PARAM_world

Notice that the task name uses <world> to access the parameter value, telling cylc that it is a repeated item, while access to its value at runtime is provided by a $CYLC_TASK_PARAM_ variable with a suffix of the parameter name (in this case world).

Parameterization Result

This gets processed internally by the running suite process, so cylc sees it in the same way as if you had explicitly written:

  [[hello_eris]]
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = eris
  [[hello_pluto]]
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = pluto
  [[hello_makemake]]
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = makemake
  [[hello_haumea]]
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = haumea

Parameterization and Specific Tasks

If you want to provide a slight variant for a particular parameterized task, you can do so like this:

  [[hello_<world>]] # General case
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = $CYLC_TASK_PARAM_world
  [[hello_<world=pluto>]] # Specific case
      inherit = HELLO_FAMILY
      [[[environment]]]
          WORLD = $CYLC_TASK_PARAM_world
          MOON = True

Make sure that any specific cases follow the generic one, as the last config entry in the suite.rc will override any earlier ones.

Further Parameterization

Beyond the examples given, parameterization can be used to automatically generate more complex graphing sections between several sets of parameters. For more details see the cylc User Guide.
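
For instance, here is a sketch in which each world gets its own follow-on task and a single task then waits for all of them. The tasks archive_<world> and goodbye are hypothetical.

```ini
[cylc]
    [[parameters]]
        world = eris, pluto
[scheduling]
    [[dependencies]]
        # Expands pairwise: hello_eris => archive_eris, hello_pluto => archive_pluto
        # then many-to-one: archive_eris => goodbye, archive_pluto => goodbye
        graph = hello_<world> => archive_<world> => goodbye
```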

Jinja2

cylc uses a templating language called Jinja2 that you can use to embed special instruction code in the suite.rc file to generate or insert text that will be expanded at runtime.

Jinja2 Switch

This can be especially useful when you have some task or graphing that you want to be able to easily swap between two modes - for example, turning an archiving task on or off. You can use if and for blocks, amongst other things:

{% set ARCHIVE_RESULTS=true %}
[scheduling]
    [[dependencies]]
        graph = """
                HELLO_FAMILY
{% if ARCHIVE_RESULTS %}
                HELLO_FAMILY:succeed-all => archive_results
{% endif %}
                """

Jinja2 Result

This gets processed at suite install time. So, if ARCHIVE_RESULTS is true, the file as read in by cylc looks like:

[scheduling]
    [[dependencies]]
        graph = """
                HELLO_FAMILY
                HELLO_FAMILY:succeed-all => archive_results
                """

Jinja2 Result (2)

Conversely, if ARCHIVE_RESULTS is false, the file as read in by cylc will look like:

[scheduling]
    [[dependencies]]
        graph = """
                HELLO_FAMILY
                """

Jinja2 rose-suite.conf

You can use Jinja2 to centralise commonly-used settings. Rose supports storing these in the rose-suite.conf file - e.g.

[jinja2:suite.rc]
ARCHIVE_RESULTS=true

Rose passes variables in this section into the Jinja2 template at runtime so your user doesn't have to look for the line to change in the suite.rc file.

Furthermore, you can also add proper metadata for them such as help, so they have a nice interface in the config editor.

Jinja2 rose-suite.conf Setting Example

For example, if we had a suite.rc setting that looked like this:

      [[[environment]]]
          INCLUDE_MOONS = {{ HELLO_TO_MOONS }}

We could have a rose-suite.conf file that looked like this:

[jinja2:suite.rc]
HELLO_TO_MOONS=false
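
Values in this section are handed to Jinja2 as written, so (as far as we understand the Rose behaviour) string values need their own quotes, while booleans and numbers do not. A sketch with two hypothetical extra settings, MOON_NAME and N_MOONS:

```ini
[jinja2:suite.rc]
HELLO_TO_MOONS=false
MOON_NAME="dysnomia"
N_MOONS=1
```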

Jinja2 rose-suite.conf Metadata Example

We could then write some nice metadata (similar to app metadata) for the rose-suite.conf file, such as:

[jinja2:suite.rc=HELLO_TO_MOONS]
help=Decide whether to say hello to the moons of the world
  =(if any).
  =
  =If true, include moons like
  =http://en.wikipedia.org/wiki/Dysnomia_%28moon%29
  =If false, ignore them.
  =
  =Moons of moons are not supported.
title=Include Moons when saying hello?
type=boolean

UTC mode

We recommend running all suites in UTC mode. E.g.:

[cylc]
  UTC mode = True # Ignore DST

Event Hooks

Suites can have event handlers to report events or shut down on failure.

[cylc]
  UTC mode = True # Ignore DST
  # abort if any task fails = True
  [[events]]
      # abort on timeout = True
      # mail events = timeout
      # timeout = P3D

Event Hooks Explained

Normally, when a task fails, the suite will continue to run as much as possible and wait for user input to fix (and perhaps retry) the failure so it can continue. If abort if any task fails = True is not commented out, the suite will abort as soon as a task fails.

Hello World Event Hooks (Continued)

Suites can also be configured to email you on specific events. For example:

[runtime]
  [[root]]
      [[[events]]]
          mail events = submission retry, retry, \
                        submission failed, failed, \
                        submission timeout, execution timeout

The events should be adjusted if necessary - for example, increasing the timeout lengths or altering the events that require email notification.
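
A sketch of setting the timeouts and retries that these events refer to. The durations are placeholders, to be adjusted for your own tasks.

```ini
[runtime]
    [[root]]
        [[[job]]]
            # Retry a failed job twice, and a failed submission once.
            execution retry delays = PT5M, PT15M
            submission retry delays = PT5M
        [[[events]]]
            # Placeholder durations; these raise the timeout events.
            submission timeout = PT12H
            execution timeout = PT3H
```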

Job Submission

We may want to configure a job submission method, such as using PBS:

  [[hello_eris]]
...
      [[[job]]]
          batch system = pbs

Directives

The job submission method may need configuration via directives:

  [[hello_eris]]
...
      [[[job]]]
          batch system = pbs
          execution time limit = PT5M # generates directives walltime value
      [[[directives]]]
          -l select = 1
          -q = normal

Job Submission Methods

cylc supports the following job submission methods (and more):

  • at
  • background (default)
  • loadleveler
  • pbs (Portable Batch System)
  • sge (Oracle Grid Engine, formerly Sun Grid Engine)
  • slurm (not the fictional soft drink)
  • lsf
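
For example, here is a sketch of the same kind of task using slurm instead of PBS. The directive values are placeholders; check your site's documentation for the real ones.

```ini
    [[hello_eris]]
        [[[job]]]
            batch system = slurm
            execution time limit = PT5M
        [[[directives]]]
            # Placeholder values; the partition name is hypothetical.
            --ntasks = 1
            --partition = normal
```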

Independent Learning

Next Steps: