Runtime Configuration
In the last section we associated tasks with scripts and ran a simple suite. In this section we will look at how we can configure these tasks.
Environment Variables
We can specify environment variables in a task’s [environment] section. These environment variables are then provided to jobs when they run.
[runtime]
    [[countdown]]
        script = seq $START_NUMBER
        [[[environment]]]
            START_NUMBER = 5
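As a rough illustration of what this means for the job, the settings above behave much like the following shell sketch. The countdown function name and the ${...:?} guard are additions for this example and are not part of the suite. Note that seq counts upwards from 1.

```shell
#!/usr/bin/env bash
# Illustrative stand-in for the "countdown" task's job.

# Equivalent of the [[[environment]]] section.
export START_NUMBER=5

# Equivalent of "script = seq $START_NUMBER".
# The ${...:?} guard (an addition for this sketch) aborts with a
# message if the variable has not been provided.
countdown() {
    seq "${START_NUMBER:?START_NUMBER is not set}"
}

countdown
```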
Each job is also provided with some standard environment variables, for example:
CYLC_SUITE_RUN_DIR
- The path to the suite’s run directory (e.g. ~/cylc-run/suite).
CYLC_TASK_WORK_DIR
- The path to the associated task’s work directory (e.g. run-directory/work/cycle/task).
CYLC_TASK_CYCLE_POINT
- The cycle point for the associated task (e.g. 20171009T0950).
There are many more environment variables - see the Cylc User Guide for more information.
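A task script can read these variables like any other environment variable. The sketch below is a minimal example, assuming a hypothetical describe_job function and a hypothetical output.txt file name. The ":-" fallbacks are an addition so that the script also runs outside Cylc.

```shell
#!/usr/bin/env bash
# Sketch of a task script that uses the standard job environment.
# The ":-" fallbacks (added for this sketch) let it run outside Cylc.
describe_job() {
    local cycle_point=${CYLC_TASK_CYCLE_POINT:-unknown-cycle}
    local work_dir=${CYLC_TASK_WORK_DIR:-$PWD}
    # output.txt is a made-up file name used only for illustration.
    echo "Cycle point ${cycle_point}: writing output to ${work_dir}/output.txt"
}

describe_job
```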
Job Submission
By default Cylc runs jobs on the machine where the suite is running. We can tell Cylc to run jobs on other machines by setting the [remote]host setting to the name of the host, e.g. to run a task on the host computehost you might write:
[runtime]
    [[hello_computehost]]
        script = echo "Hello Compute Host"
        [[[remote]]]
            host = computehost
Cylc also executes jobs as background processes by default. When we are running jobs on other compute hosts we will often want to use a batch system (job scheduler) to submit our job. Cylc supports the following batch systems:
- at
- loadleveler
- lsf
- pbs
- sge
- slurm
- moab
Batch systems typically require directives in some form. Directives inform the batch system of the requirements of a job, for example how much memory a given job requires or how many CPUs the job will run on. For example:
[runtime]
    [[big_task]]
        script = big-executable
        # Submit to the host "big-computer".
        [[[remote]]]
            host = big-computer
        # Submit the job using the "slurm" batch system.
        [[[job]]]
            batch system = slurm
        # Inform "slurm" that this job requires 500MB of RAM and 4 CPUs.
        [[[directives]]]
            --mem = 500
            --ntasks = 4
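Slurm normally reads its directives from "#SBATCH" comment lines at the top of a job script. As a rough sketch of how the two directives above might look in that form, the hypothetical render_slurm_directives helper below prints one header line per name and value pair. Cylc writes the real job script itself, so this is illustration only.

```shell
#!/usr/bin/env bash
# Render "name value" directive pairs as Slurm header lines.
# Hypothetical helper for illustration; Cylc generates the real job script.
render_slurm_directives() {
    local name value
    while [ "$#" -ge 2 ]; do
        name=$1
        value=$2
        shift 2
        printf '#SBATCH %s=%s\n' "$name" "$value"
    done
}

render_slurm_directives --mem 500 --ntasks 4
```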
Timeouts
We can specify a time limit after which a job will be terminated using the [job]execution time limit setting. The value of the setting is an ISO8601 duration. Cylc automatically inserts this into a job’s directives as appropriate.
[runtime]
    [[some_task]]
        script = some-executable
        [[[job]]]
            execution time limit = PT15M  # 15 minutes.
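To make the duration syntax concrete, the sketch below converts a simple ISO8601 time duration into seconds. The duration_to_seconds helper is hypothetical and only handles the hours, minutes and seconds components (PT...H...M...S). Cylc parses durations itself and accepts more forms than this.

```shell
#!/usr/bin/env bash
# Convert a simple ISO8601 time duration (hours, minutes, seconds only)
# into seconds. Hypothetical helper for illustration; not part of Cylc.
duration_to_seconds() {
    local duration=$1
    local pattern='^PT(([0-9]+)H)?(([0-9]+)M)?(([0-9]+)S)?$'
    # Reject a bare "PT" and anything the pattern does not cover.
    if [[ $duration == PT || ! $duration =~ $pattern ]]; then
        echo "unsupported duration: $duration" >&2
        return 1
    fi
    local hours=${BASH_REMATCH[2]:-0}
    local minutes=${BASH_REMATCH[4]:-0}
    local seconds=${BASH_REMATCH[6]:-0}
    # 10# forces base 10 so values with leading zeros are not read as octal.
    echo $(( 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}

duration_to_seconds PT15M    # prints 900
```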
Retries
Sometimes jobs fail. This can be caused by two factors:
- Something going wrong with the job’s execution, e.g.:
  - A bug;
  - A system error;
  - The job hitting the execution time limit.
- Something going wrong with the job submission, e.g.:
  - A network problem;
  - The job host becoming unavailable or overloaded;
  - An issue with the directives.
In the event of failure Cylc can automatically re-submit (retry) jobs. We configure retries using the [job]execution retry delays and [job]submission retry delays settings. These settings are both set to a list of ISO8601 durations, each giving the delay before one retry, e.g. setting execution retry delays to PT10M would cause the job to retry once, 10 minutes after an execution failure.
We can configure several retries with the same delay by writing a multiple in front of the duration, e.g.:
[runtime]
    [[some-task]]
        script = some-script
        [[[job]]]
            # In the event of execution failure, retry a maximum
            # of three times, waiting 15 minutes before each retry.
            execution retry delays = 3*PT15M
            # In the event of submission failure, retry a maximum
            # of three times: twice after ten minutes each and
            # then once more after 30 minutes.
            submission retry delays = 2*PT10M, PT30M
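The multiplier syntax is shorthand for repeating a duration. The sketch below expands a retry-delay list into one delay per retry, assuming each entry in the expanded list is used for exactly one retry. The expand_retry_delays helper is hypothetical; Cylc does this parsing itself.

```shell
#!/usr/bin/env bash
# Expand a retry-delay list such as "2*PT10M, PT30M" into one delay per retry.
# Hypothetical helper for illustration; Cylc does this parsing itself.
expand_retry_delays() {
    local spec=$1 item count delay i
    local -a items expanded=()
    IFS=',' read -r -a items <<< "$spec"
    for item in "${items[@]}"; do
        item=${item// /}                 # drop any spaces
        if [[ $item == *'*'* ]]; then
            count=${item%%\**}           # text before the "*"
            delay=${item#*\*}            # text after the "*"
        else
            count=1
            delay=$item
        fi
        for (( i = 0; i < count; i++ )); do
            expanded+=("$delay")
        done
    done
    echo "${expanded[*]}"
}

expand_retry_delays "3*PT15M"           # prints PT15M PT15M PT15M
expand_retry_delays "2*PT10M, PT30M"    # prints PT10M PT10M PT30M
```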
Start, Stop, Restart
We have seen how to start and stop Cylc suites with cylc run and cylc stop respectively. The cylc stop command causes Cylc to wait for all running jobs to finish before it stops the suite. There are two options which change this behaviour:
cylc stop --kill
- When the --kill option is used Cylc will kill all running jobs before stopping. Cylc can kill jobs on remote hosts and uses the appropriate command when a batch system is used.
cylc stop --now --now
- When the --now option is used twice Cylc stops straight away, leaving any jobs running.
Once a suite has stopped it is possible to restart it using the cylc restart command. When the suite restarts it picks up where it left off and carries on as normal.
# Run the suite "name".
cylc run <name>
# Stop the suite "name", killing any running tasks.
cylc stop <name> --kill
# Restart the suite "name", picking up where it left off.
cylc restart <name>
Practical
In this practical we will add runtime configuration to the weather-forecasting suite from the scheduling tutorial.
Create A New Suite.
Create a new suite by running the command:
rose tutorial runtime-tutorial
cd ~/cylc-run/runtime-tutorial
You will now have a copy of the weather-forecasting suite along with some executables and Python modules.
Set The Initial And Final Cycle Points.
First we will set the initial and final cycle points (see the datetime tutorial for help with writing ISO8601 datetimes):
- The final cycle point should be set to the time one hour ago from the present time (with minutes and seconds ignored), e.g. if the current time is 9:45 UTC then the final cycle point should be at 8:00 UTC.
- The initial cycle point should be the final cycle point minus six hours.
Reminder
Remember that we are working in UTC mode (the +00 time zone). Datetimes should end with a Z character to reflect this.
Solution
You can check your answers by running the following commands (hyphens and colons optional but can’t be mixed):
- For the initial cycle point:
rose date --utc --offset -PT7H --format CCYY-MM-DDThh:00Z
- For the final cycle point:
rose date --utc --offset -PT1H --format CCYY-MM-DDThh:00Z
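If rose date is not to hand, the same values can be approximated with GNU date. This is a sketch only: the cycle_point helper is hypothetical, and it relies on the GNU-specific -d option, so it will not work with BSD date. The optional second argument exists so the function can be checked against a known time.

```shell
#!/usr/bin/env bash
# Print a cycle point on the hour, a whole number of hours before "now".
# Usage: cycle_point HOURS_AGO [EPOCH_SECONDS]
# EPOCH_SECONDS defaults to the current time. Requires GNU date.
cycle_point() {
    local hours_ago=$1
    local now=${2:-$(date -u +%s)}
    # Minutes are replaced by a literal "00", as in the rose date format.
    date -u -d "@$(( now - hours_ago * 3600 ))" +%Y-%m-%dT%H:00Z
}

echo "initial cycle point: $(cycle_point 7)"
echo "final cycle point:   $(cycle_point 1)"
```

For example, at 09:45 UTC on 9 October 2017 (epoch 1507542300) this gives 2017-10-09T02:00Z and 2017-10-09T08:00Z.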
Run cylc validate to check for any errors:
cylc validate .
Add Runtime Configuration For The get_observations Tasks.
In the bin directory is a script called get-observations. This script gets weather data from the Met Office DataPoint service. It requires two environment variables:
SITE_ID
- A four digit numerical code which is used to identify a weather station, e.g. 3772 is Heathrow Airport.
API_KEY
- An authentication key required for access to the service.
Generate a DataPoint API key:
rose tutorial api-key
Add the following lines to the bottom of the suite.rc file, replacing xxx... with your API key:
[runtime]
    [[get_observations_heathrow]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3772
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Add three more get_observations tasks, one for each of the remaining weather stations.
You will need the codes for the other three weather stations, which are:
- Camborne - 3808
- Shetland - 3005
- Aldergrove - 3917
Solution
[runtime]
    [[get_observations_heathrow]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3772
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_camborne]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3808
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_shetland]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3005
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    [[get_observations_aldergrove]]
        script = get-observations
        [[[environment]]]
            SITE_ID = 3917
            API_KEY = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Check the suite.rc file is valid by running the command:
cylc validate .
Test The get_observations Tasks.
Next we will test the get_observations tasks.
Open the Cylc GUI by running the following command:
cylc gui runtime-tutorial &
Run the suite either by pressing the play button in the Cylc GUI or by running the command:
cylc run runtime-tutorial
If all goes well the suite will start up and the tasks will run and succeed. Note that the tasks which do not have a [runtime] section will still run, though they will not do anything as they do not call any scripts.
Once the suite has reached the final cycle point and all tasks have succeeded the suite will automatically shut down.
The get-observations script produces a file called wind.csv which specifies the wind speed and direction. This file is written in the task’s work directory.
Try to open one of the wind.csv files. Note that the path to the work directory is:
work/<cycle-point>/<task-name>
You should find a file containing four numbers:
- The longitude of the weather station;
- The latitude of the weather station;
- The wind direction (the direction the wind is blowing towards) in degrees;
- The wind speed in miles per hour.
Hint
If you run ls work you should see a list of cycles. Pick one of them and open the file:
work/<cycle-point>/get_observations_heathrow/wind.csv
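To see every wind.csv file at once, rather than picking a cycle by hand, something like the following sketch can be run from the suite's run directory. The list_wind_files helper is hypothetical, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do).

```shell
#!/usr/bin/env bash
# List wind.csv files laid out as <work-root>/<cycle-point>/<task-name>/wind.csv.
# Hypothetical helper for illustration.
list_wind_files() {
    local work_root=${1:-work}
    find "$work_root" -mindepth 3 -maxdepth 3 -type f -name wind.csv | sort
}

# Only list files if we are somewhere that has a work directory.
if [ -d work ]; then
    list_wind_files work
fi
```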
Add Runtime Configuration For The Other Tasks.
The runtime configuration for the remaining tasks has been written out for you in the runtime file which you will find in the suite directory. Copy the code in the runtime file to the bottom of the suite.rc file.
Check the suite.rc file is valid by running the command:
cylc validate .
Run The Suite.
Open the Cylc GUI (if not already open) and run the suite.
Hint
cylc gui runtime-tutorial &
Run the suite either by:
- Pressing the play button in the Cylc GUI. Then, ensuring that “Cold Start” is selected within the dialogue window, pressing the “Start” button.
- Running the command cylc run runtime-tutorial.
View The Forecast Summary.
The post_process_exeter task will produce a one-line summary of the weather in Exeter, as forecast two hours ahead of time. This summary can be found in the summary.txt file in the work directory.
Try opening the summary file - it will be in the last cycle. The path to the work directory is:
work/<cycle-point>/<task-name>
Hint
cycle-point
- this will be the last cycle of the suite, i.e. the final cycle point.
task-name
- set this to “post_process_exeter”.
View The Rainfall Data.
The forecast task will produce an HTML page where the rainfall data is rendered on a map. This HTML file is called job-map.html and is saved alongside the job log.
Try opening this file in a web browser, e.g. via:
firefox <filename> &
The path to the job log directory is:
log/job/<cycle-point>/<task-name>/<submission-number>
Hint
cycle-point
- this will be the last cycle of the suite, i.e. the final cycle point.
task-name
- set this to “forecast”.
submission-number
- set this to “01”.
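Putting the three parts of the hint together, the path can be assembled as in the sketch below. The job_log_file helper is hypothetical and the cycle point shown is a made-up example; substitute your own final cycle point. The result is relative to the suite's run directory.

```shell
#!/usr/bin/env bash
# Build the path to job-map.html from the pieces described in the hint.
# Usage: job_log_file CYCLE_POINT TASK_NAME [SUBMISSION_NUMBER]
# Hypothetical helper for illustration.
job_log_file() {
    local cycle_point=$1 task_name=$2 submit_number=${3:-1}
    # 10# forces base 10 so "08" is not rejected as invalid octal;
    # %02d pads the submission number to two digits, e.g. "01".
    printf 'log/job/%s/%s/%02d/job-map.html\n' \
        "$cycle_point" "$task_name" "$(( 10#$submit_number ))"
}

# Example cycle point only.
job_log_file 20171009T0800Z forecast
```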