Runtime Configuration

In the last section we associated tasks with scripts and ran a simple suite. In this section we will look at how we can configure these tasks.

Environment Variables

We can specify environment variables in a task’s [environment] section. These environment variables are then provided to jobs when they run.

[runtime]
    [[countdown]]
        script = seq $START_NUMBER
        [[[environment]]]
            START_NUMBER = 5

Each job is also provided with some standard environment variables e.g:

CYLC_SUITE_RUN_DIR
The path to the suite’s run directory (e.g. ~/cylc-run/suite).
CYLC_TASK_WORK_DIR
The path to the associated task’s work directory (e.g. run-directory/work/cycle/task).
CYLC_TASK_CYCLE_POINT
The cycle point for the associated task (e.g. 20171009T0950).

There are many more environment variables - see the Cylc User Guide for more information.
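
As a rough illustration, a job script might use these variables as shown below. This is only a sketch: the fallback values after ":-" are placeholders borrowed from the examples above so that the snippet can also be tried outside of a Cylc job, and the output_file name is hypothetical.

```shell
#!/bin/sh
# Sketch of a job script that uses the standard Cylc variables.
# The ":-" fallbacks are illustrative placeholders, used only when
# the snippet is run outside of a Cylc job.
cycle_point="${CYLC_TASK_CYCLE_POINT:-20171009T0950}"
work_dir="${CYLC_TASK_WORK_DIR:-$PWD}"

# Hypothetical output file named after the cycle point.
output_file="${work_dir}/result-${cycle_point}.txt"
echo "Cycle point ${cycle_point}: would write to ${output_file}"
```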

Job Submission

By default Cylc runs jobs on the machine where the suite is running. We can tell Cylc to run jobs on other machines by setting the [remote]host setting to the name of the host, e.g. to run a task on the host computehost you might write:

[runtime]
    [[hello_computehost]]
        script = echo "Hello Compute Host"
        [[[remote]]]
            host = computehost

Cylc also executes jobs as background processes by default. When we are running jobs on other compute hosts we will often want to use a batch system (job scheduler) to submit our job. Cylc supports the following batch systems:

  • at
  • loadleveler
  • lsf
  • pbs
  • sge
  • slurm
  • moab

Batch systems typically require directives in some form. Directives inform the batch system of the requirements of a job, for example how much memory a given job requires or how many CPUs the job will run on. For example:

[runtime]
    [[big_task]]
        script = big-executable

        # Submit to the host "big-computer".
        [[[remote]]]
            host = big-computer

        # Submit the job using the "slurm" batch system.
        [[[job]]]
            batch system = slurm

        # Inform "slurm" that this job requires 500MB of RAM and 4 CPUs.
        [[[directives]]]
            --mem = 500
            --ntasks = 4

Timeouts

We can specify a time limit after which a job will be terminated using the [job]execution time limit setting. The value of the setting is an ISO8601 duration. Cylc automatically inserts this into a job’s directives as appropriate.

[runtime]
    [[some_task]]
        script = some-executable
        [[[job]]]
            execution time limit = PT15M  # 15 minutes.

Retries

Sometimes jobs fail. This can be caused by two factors:

  • Something going wrong with the job’s execution e.g:
    • A bug;
    • A system error;
    • The job hitting the execution time limit.
  • Something going wrong with the job submission e.g:
    • A network problem;
    • The job host becoming unavailable or overloaded;
    • An issue with the directives.

In the event of failure Cylc can automatically re-submit (retry) jobs. We configure retries using the [job]execution retry delays and [job]submission retry delays settings. Each setting takes a comma-separated list of ISO8601 durations, one per retry, e.g. setting execution retry delays to PT10M would cause the job to retry once, ten minutes after an execution failure.

We can repeat a delay, and so set the number of retries, by writing a multiple in front of the duration, e.g:

[runtime]
    [[some-task]]
        script = some-script
        [[[job]]]
            # In the event of execution failure, retry a maximum
            # of three times every 15 minutes.
            execution retry delays = 3*PT15M
            # In the event of submission failure, retry a maximum
            # of two times every ten minutes and then once more
            # after a further 30 minutes.
            submission retry delays = 2*PT10M, PT30M
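
The multiplier syntax is shorthand for repeating a duration in the list. The sketch below makes that expansion explicit; expand_delays is a hypothetical helper written for illustration and is not a Cylc command.

```shell
#!/bin/sh
# expand_delays is a hypothetical helper, not part of Cylc.
# It expands "N*DURATION" items in a comma-separated list and
# prints one delay per line.
expand_delays() {
    echo "$1" | tr ',' '\n' | while read -r item; do
        case "$item" in
            *\**)
                # Split "N*DURATION" at the asterisk.
                count="${item%%\**}"
                duration="${item#*\*}"
                i=0
                while [ "$i" -lt "$count" ]; do
                    printf '%s\n' "$duration"
                    i=$((i + 1))
                done
                ;;
            *)
                printf '%s\n' "$item"
                ;;
        esac
    done
}

expand_delays "2*PT10M, PT30M"
```

Here 2*PT10M, PT30M expands to the three delays PT10M, PT10M and PT30M.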

Start, Stop, Restart

We have seen how to start and stop Cylc suites with cylc run and cylc stop respectively. The cylc stop command causes Cylc to wait for all running jobs to finish before it stops the suite. There are two options which change this behaviour:

cylc stop --kill
When the --kill option is used Cylc will kill all running jobs before stopping. Cylc can kill jobs on remote hosts and uses the appropriate command when a batch system is used.
cylc stop --now --now
When the --now option is used twice Cylc stops straight away, leaving any jobs running.

Once a suite has stopped it is possible to restart it using the cylc restart command. When the suite restarts it picks up where it left off and carries on as normal.

# Run the suite "name".
cylc run <name>
# Stop the suite "name", killing any running tasks.
cylc stop <name> --kill
# Restart the suite "name", picking up where it left off.
cylc restart <name>

Practical

In this practical we will add runtime configuration to the weather-forecasting suite from the scheduling tutorial.

  1. Create A New Suite.

    Create a new suite by running the command:

    rose tutorial runtime-tutorial
    cd ~/cylc-run/runtime-tutorial
    

    You will now have a copy of the weather-forecasting suite along with some executables and Python modules.

  1. Set The Initial And Final Cycle Points.

    First we will set the initial and final cycle points (see the datetime tutorial for help with writing ISO8601 datetimes):

    • The final cycle point should be one hour before the current time, rounded down to the hour (minutes and seconds ignored), e.g. if the current time is 9:45 UTC then the final cycle point should be at 8:00 UTC.
    • The initial cycle point should be the final cycle point minus six hours.

    Reminder

    Remember that we are working in UTC mode (the +00 time zone). Datetimes should end with a Z character to reflect this.

    Solution

    You can check your answers by running the following commands (hyphens and colons optional but can’t be mixed):

    For the initial cycle point:

    rose date --utc --offset -PT7H --format CCYY-MM-DDThh:00Z

    For the final cycle point:

    rose date --utc --offset -PT1H --format CCYY-MM-DDThh:00Z
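
    If rose date is not to hand, roughly the same values can be computed with the date command. This is a sketch that assumes GNU date (the -d option is not available in BSD or macOS date); the variable names are illustrative.

```shell
#!/bin/sh
# Assumes GNU date; "-d @SECONDS" is a GNU extension.
# Take the current time once so both points share the same reference.
now="$(date -u +%s)"

# Final cycle point: one hour ago, rounded down to the hour.
final_point="$(date -u -d "@$((now - 3600))" +%Y-%m-%dT%H:00Z)"
# Initial cycle point: seven hours ago, i.e. six hours before the final point.
initial_point="$(date -u -d "@$((now - 7 * 3600))" +%Y-%m-%dT%H:00Z)"

echo "initial cycle point = ${initial_point}"
echo "final cycle point = ${final_point}"
```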

    Run cylc validate to check for any errors:

    cylc validate .
    
  2. Add Runtime Configuration For The get_observations Tasks.

    In the bin directory is a script called get-observations. This script gets weather data from the Met Office DataPoint service. It requires two environment variables:

    SITE_ID:

    A four-digit numerical code which is used to identify a weather station, e.g. 3772 is Heathrow Airport.

    API_KEY:

    An authentication key required for access to the service.

    Generate a DataPoint API key:

    rose tutorial api-key
    

    Add the following lines to the bottom of the suite.rc file replacing xxx... with your API key:

    [runtime]
        [[get_observations_heathrow]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3772
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    

    Add three more get_observations tasks, one for each of the remaining weather stations.

    You will need the codes for the other three weather stations, which are:

    • Camborne - 3808
    • Shetland - 3005
    • Aldergrove - 3917

    Solution

    [runtime]
        [[get_observations_heathrow]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3772
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_camborne]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3808
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_shetland]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3005
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        [[get_observations_aldergrove]]
            script = get-observations
            [[[environment]]]
                SITE_ID = 3917
                API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    

    Check the suite.rc file is valid by running the command:

    cylc validate .
    
  3. Test The get_observations Tasks.

    Next we will test the get_observations tasks.

    Open the Cylc GUI by running the following command:

    cylc gui runtime-tutorial &
    

    Run the suite either by pressing the play button in the Cylc GUI or by running the command:

    cylc run runtime-tutorial
    

    If all goes well the suite will start up and the tasks will run and succeed. Note that the tasks which do not have a [runtime] section will still run, though they will not do anything as they do not call any scripts.

    Once the suite has reached the final cycle point and all tasks have succeeded the suite will automatically shut down.

    The get-observations script produces a file called wind.csv which specifies the wind speed and direction. This file is written in the task’s work directory.

    Try opening one of the wind.csv files. Note that the path to the work directory is:

    work/<cycle-point>/<task-name>
    

    You should find a file containing four numbers:

    • The longitude of the weather station;
    • The latitude of the weather station;
    • The wind direction (the direction the wind is blowing towards) in degrees;
    • The wind speed in miles per hour.

    Hint

    If you run ls work you should see a list of cycles. Pick one of them and open the file:

    work/<cycle-point>/get_observations_heathrow/wind.csv
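
    Once you have found a file, its values can be split into named variables in the shell. The sketch below assumes that wind.csv holds a single line of four comma-separated numbers in the order listed above; check this against your own file. The sample values are made-up placeholders so that the snippet runs on its own.

```shell
#!/bin/sh
# Assumption: wind.csv is one line of four comma-separated numbers in
# the order longitude, latitude, direction, speed.
# The values below are made-up placeholders, not real observations.
sample_file="$(mktemp)"
echo "-0.45,51.48,270,12" > "$sample_file"

# Split the line on commas into four named variables.
IFS=, read -r longitude latitude direction speed < "$sample_file"
echo "Station at longitude ${longitude}, latitude ${latitude}"
echo "Wind blowing towards ${direction} degrees at ${speed} mph"
rm -f "$sample_file"
```

    To read a real file, replace "$sample_file" with the path to one of your wind.csv files.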
    
  4. Add Runtime Configuration For The Other Tasks.

    The runtime configuration for the remaining tasks has been written out for you in the runtime file which you will find in the suite directory. Copy the code in the runtime file to the bottom of the suite.rc file.

    Check the suite.rc file is valid by running the command:

    cylc validate .
    
  5. Run The Suite.

    Open the Cylc GUI (if not already open) and run the suite.

    Hint

    cylc gui runtime-tutorial &
    

    Run the suite either by:

    • Pressing the play button in the Cylc GUI. Then, ensuring that “Cold Start” is selected within the dialogue window, pressing the “Start” button.
    • Running the command cylc run runtime-tutorial.
  6. View The Forecast Summary.

    The post_process_exeter task will produce a one-line summary of the weather in Exeter, as forecast two hours ahead of time. This summary can be found in the summary.txt file in the work directory.

    Try opening the summary file - it will be in the last cycle. The path to the work directory is:

    work/<cycle-point>/<task-name>
    

    Hint

    • cycle-point - this will be the last cycle of the suite, i.e. the final cycle point.
    • task-name - set this to “post_process_exeter”.
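
    One way to locate the last cycle without typing the cycle point by hand is sketched below. latest_cycle is a hypothetical helper, and the snippet assumes it is run from the suite's run directory and that every entry in work is a cycle point.

```shell
#!/bin/sh
# latest_cycle is a hypothetical helper: it prints the last entry of a
# directory listing. Cycle points written in the same format sort
# chronologically as plain text, so the last entry is the latest cycle.
latest_cycle() {
    ls "$1" | sort | tail -n 1
}

# Assumes the current directory is the suite's run directory.
if [ -d work ]; then
    cat "work/$(latest_cycle work)/post_process_exeter/summary.txt"
fi
```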
  7. View The Rainfall Data.

    The forecast task will produce an HTML page where the rainfall data is rendered on a map. This HTML file is called job-map.html and is saved alongside the job log.

    Try opening this file in a web browser, e.g. via:

    firefox <filename> &
    

    The path to the job log directory is:

    log/job/<cycle-point>/<task-name>/<submission-number>
    

    Hint

    • cycle-point - this will be the last cycle of the suite, i.e. the final cycle point.
    • task-name - set this to “forecast”.
    • submission-number - set this to “01”.