Input data#
This page explains how to provide input data and how to specify it in the configuration.
There are three general source directories for input data:
- the ICON (-ART) repository
- a separate (experiment-specific) input directory
- pool directories on the clusters or the web
Links to the respective files are created in the working directory. auto-icon reads the DIRECTORIES.LINK_FILES section of the configuration (<expname>/experiment.yml or conf/art/experiments/<expname>.yml) to determine which files to link.
General linked files#
The following subsections are present in the DIRECTORIES.LINK_FILES section:
| Subsection | Purpose |
|---|---|
| FILES | General input files and directories |
| ICON_DATA | Parametrizations below |
| ART | Data in |
Domain specific files#
Information that is specific to the grid (e.g. grid and initial-condition files) can be provided conveniently in a separate section of the experiment config file. Standard grids, i.e. grids that are officially distributed by DWD/MPIM and listed on the grid file server, can be specified and used directly. These grids are also available directly on several HPC systems.
This information is contained in the global GRID section. Each domain gets its own subsection, numbered consecutively (DOM01, DOM02, etc.). For each domain, the grid type (e.g. G for global), the R and B values, and the official grid number need to be provided. In addition, information on external parameter files and initial-condition files can be supplied. An example looks as follows:
    GRID:
      #-- The FILELIST section provides a list of all input files that shall be used
      #-- for the current run.
      FILELIST:
        - GRID
        - EXTPAR
        - DWDFG
        - DWDANA
        - BCF #-- ART file
        - IAE #-- ART file
        - STY #-- ART file
      #-- Turn on, if you want to use a radiation grid (the grid has to be provided on the server as well).
      RADGRID: False
      DOM1:
        #-- Type (G: global, R: radiation grid, O: ocean, Nxx: nested grid numbered xx, L: LAM grid, L*: radgrid or nest for LAM, ...)
        #-- The type letter(s) have to correspond to the suffix letters of the grids in the list (see above).
        TYPE: G
        #-- R and B values of the grid. Leading zeroes are ignored.
        R: 2
        B: 4
        #-- Grid number of the official grid to use (see above).
        GRID_NUMBER: 12
        #-- Date of the corresponding extpar dataset. The dataset has to be provided by the user or be present in the public repository.
        EXTPAR_DATE: 20131001
A detailed overview of all possible options is provided in the template file conf/art/experiments/template.yml. Specifying different file names for input files is also described there.
Tip
With multiple domains, several values can be inferred: e.g. if DOM01 is an R2B4 grid with grid number 12, DOM02 is expected to be R2B5 with grid number 13. Such values can be omitted unless they deviate from the default.
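The inference described in the tip can be sketched in a few lines of Python. This is only an illustration: the `DomainSpec` class and the `infer_child` function are hypothetical names, and the rule (same R, B and grid number each increased by one) is generalized from the single R2B4/R2B5 example above, not taken from the auto-icon sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainSpec:
    """Hypothetical container for the per-domain keys R, B and GRID_NUMBER."""
    r: int
    b: int
    grid_number: int


def infer_child(parent: DomainSpec) -> DomainSpec:
    """Guess the defaults of the next nested domain from its parent.

    Assumption: R stays the same, while B and the grid number
    each increase by one.
    """
    return DomainSpec(r=parent.r, b=parent.b + 1, grid_number=parent.grid_number + 1)


dom01 = DomainSpec(r=2, b=4, grid_number=12)
dom02 = infer_child(dom01)
print(f"DOM02: R{dom02.r}B{dom02.b}, grid number {dom02.grid_number}")
# prints: DOM02: R2B5, grid number 13
```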
Domains#
Each domain corresponds to one ICON domain and thus to one grid. The radiation grid is treated in basically the same way, with the limitation that the only file it can have associated is a grid file. Furthermore, all parameters of a radiation grid can be inferred when adhering to the official grids (or at least to their nomenclature).
Hint
If inferring works, you can use a radiation grid by just setting RADGRID: True in the GRID section. If it is not sufficient, you can create a full domain specification also for the radiation grid.
File names#
| File type | File tag | Default file name (1) |
|---|---|---|
| Grid | GRID | |
| Extpar | EXTPAR | |
| DWD first guess | DWDFG | |
| DWD analysis | DWDANA | |
| ART input | TYP (2) | |
(1) Placeholders used here are:
- ZZZZ: grid number padded to 4 digits (for ART files, set to ART_IO_SUFFIX if present)
- x|xx, yy: R and B values padded to 1 or 2 digits each
- ii: domain id
- T: grid type (for GRID and EXTPAR)
- DATE: date string (EXTPAR: YYYYMMDD; IFS,INC: YYYYMMDDHH (start date, see below))
- _tiles: added if EXTPAR_TILES is set to true

(2) The tag is the 3-letter type as specified for ART, see the ART User guide for details.
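How the padded placeholders combine into a file name can be illustrated with a short Python sketch. The template `icon_grid_ZZZZ_RxxByy_T.nc` is an assumption modeled on the grid file name that appears in the remapping example further down this page; the function name `grid_file_name` is hypothetical and not part of auto-icon.

```python
def grid_file_name(grid_number: int, r: int, b: int, grid_type: str) -> str:
    """Build a grid file name from zero-padded placeholders.

    Assumed template: icon_grid_ZZZZ_RxxByy_T.nc
      ZZZZ: grid number padded to 4 digits
      xx, yy: R and B values padded to 2 digits each
      T: grid type letter(s)
    """
    return f"icon_grid_{grid_number:04d}_R{r:02d}B{b:02d}_{grid_type}.nc"


print(grid_file_name(15, 2, 9, "G"))
# prints: icon_grid_0015_R02B09_G.nc
```

Because the input values are plain integers here, leading zeroes in the configuration do not matter, which matches the note in the GRID example.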
With the FILENAMES section, file names can be set explicitly. In this section, a specific file name (relative or absolute, see also the syntax for link specification) can be set for each file tag.
    FILENAMES:
      GRID: my_test_domain.nc
      EXTPAR: my_test_domain_extpar_data.nc
Grid generation and remapping#
With the pre-create-grid job, one can create a grid from scratch or from an existing grid. To use an existing grid, it has to be included as a separate domain with the additional key UNUSED: True, which tells auto-icon that it is only a dummy domain. For such dummy domains, other domain numbers might be useful; the parent-child relationship can then be set explicitly with the PARENT: <id> entry, e.g. PARENT: 10.
A GEN_PARAMS section then defines the grid to be created. Its individual keys are passed on to the namelist of the DWD ICON tool icongridgen, e.g.:
    GEN_PARAMS:
      region_type: 3
      hwidth_lon: 5.0
      hwidth_lat: 5.0
      center_lon: 18.0
      center_lat: 12.0
      # min_refin_c_ctrl: 1
      # max_refin_c_ctrl: 14
Remapping of (nearly arbitrary) input data can be done with CDO via the pre-remap job. An additional SOURCE section has to be provided in which, for each file tag to be remapped, the source file and the grid file of the source are given as a list, e.g.:
    SOURCE:
      STY:
        - '/path/to/source/ART_STY_iconR2B09-grid_0015.nc'
        - '/path/to/source/icon_grid_0015_R02B09_G.nc'
Note
If you have a LAM grid, the pre-create-grid job also creates the lateral_boundary.grid.nc for you and can remap ERA5 data to that grid.
Hint
Take a look at the template LAM_DUST for a full running example.
Syntax for link specification#
There is a special syntax for specifying the link names to allow for sophisticated location of input files.
The simplest way of specifying an input name is to provide its file name (the TARGET of the link) as a plain string. Optionally, you can specify the LINK_NAME (the name of the symlink in the working directory) in the same string, separated by a vertical bar, i.e. TARGET|LINK_NAME. TARGET can be an absolute file name, a relative file name (which is looked up in several places, see File location order) or even a URL. However, there are a few special characters that you should not use in your file names.
In addition to specifying a full file name as the TARGET (and optionally a LINK_NAME), you can supply patterns to match multiple files or use placeholders for convenient substitution of configuration parameters.
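The TARGET|LINK_NAME syntax can be illustrated with a small Python sketch. It is a simplified reading of the rules on this page, not the parser auto-icon actually uses; `LinkSpec`, `parse_link_spec` and the example file names are hypothetical. The leading exclamation mark for patterns is described in the Patterns section.

```python
from typing import NamedTuple, Optional


class LinkSpec(NamedTuple):
    target: str               # file name, path, URL or pattern
    link_name: Optional[str]  # name of the symlink, None if not given
    is_pattern: bool          # True if TARGET started with "!"


def parse_link_spec(spec: str) -> LinkSpec:
    """Split a 'TARGET|LINK_NAME' string into its parts."""
    target, sep, link_name = spec.partition("|")
    is_pattern = target.startswith("!")
    if is_pattern:
        target = target[1:]  # drop the leading exclamation mark
    return LinkSpec(target, link_name if sep else None, is_pattern)


print(parse_link_spec("input/aerosol.nc|aero.nc"))
# prints: LinkSpec(target='input/aerosol.nc', link_name='aero.nc', is_pattern=False)
print(parse_link_spec("!FJX_scat-*.dat"))
# prints: LinkSpec(target='FJX_scat-*.dat', link_name=None, is_pattern=True)
```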
Patterns#
The pattern syntax is used if TARGET starts with an exclamation mark (!), i.e. you specify !PATTERN. The pattern is then matched against all files in the respective input directories (including pool directories). The following patterns are evaluated (see doc for details):
| Pattern | Meaning |
|---|---|
| * | matches everything |
| ? | matches any single character |
| [seq] | matches any character in seq |
| [!seq] | matches any character not in seq |
The following serves as an example:
    DIRECTORIES:
      LINK_FILES:
        ART:
          - '!FJX_scat-*.dat'
          # This pattern shall link the following files:
          # - 'FJX_scat-aer.dat'
          # - 'FJX_scat-cld.dat'
          # - 'FJX_scat-ssa.dat'
          # - 'FJX_scat-UMa.dat'
Caution
If the pattern is quite general, many files in the pool directories might match!
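The wildcards in the table behave like the shell-style patterns of Python's standard `fnmatch` module. Whether auto-icon uses exactly this module is an assumption; the sketch below only demonstrates the matching rules, applied to the pattern from the example above and a hypothetical list of available files.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing; only the first four names fit the pattern.
available = [
    "FJX_scat-aer.dat",
    "FJX_scat-cld.dat",
    "FJX_scat-ssa.dat",
    "FJX_scat-UMa.dat",
    "FJX_spec.dat",
    "FJX_scat-aer.txt",
]

pattern = "FJX_scat-*.dat"  # the TARGET without its leading "!"
matches = [name for name in available if fnmatchcase(name, pattern)]
print(matches)
# prints: ['FJX_scat-aer.dat', 'FJX_scat-cld.dat', 'FJX_scat-ssa.dat', 'FJX_scat-UMa.dat']

# [!seq] excludes characters: skip files whose suffix starts with "a".
not_a = [name for name in available if fnmatchcase(name, "FJX_scat-[!a]*.dat")]
print(not_a)
# prints: ['FJX_scat-cld.dat', 'FJX_scat-ssa.dat', 'FJX_scat-UMa.dat']
```

Testing a pattern against a directory listing in this way before a run is a cheap way to see how many pool files it would pick up.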
Placeholders#
Autosubmit placeholders can be used in all config options, and thus also in the TARGET|LINK_NAME field. In addition, specific placeholders for the start date (and time) can be used to specify a file (e.g. a time-dependent parametrization) generically. These additional placeholders are written in angle brackets (<...>). The following table provides an overview of the available replacements (the example starts on 2004-08-27 at 18:00).
| Placeholder | Value | Example |
|---|---|---|
| | YYYYMMDD | 20040827 |
| | YYYY | 2004 |
| | MM | 08 |
| | DD | 27 |
| | HH | 18 |
| | YYYYMMDDHH | 2004082718 |
| | YYYYMMDD | 20040827 |
| | YYYY | 2004 |
| | MM | 08 |
| | DD | 27 |
| | HH | 18 |
Caution
Only the start date of the member is substituted, not that of each chunk. If you need e.g. monthly files and the simulation runs longer, you should use patterns.
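The values in the table follow directly from the start date, as the Python sketch below shows for the example date. The substitution function is a generic illustration of angle-bracket replacement: the placeholder name `MY_DATE` and the file name are hypothetical and are not auto-icon placeholder names.

```python
import re
from datetime import datetime

start = datetime(2004, 8, 27, 18)  # start date of the example

# Value formats from the table, derived from the start date.
values = {
    "YYYYMMDD": start.strftime("%Y%m%d"),
    "YYYY": start.strftime("%Y"),
    "MM": start.strftime("%m"),
    "DD": start.strftime("%d"),
    "HH": start.strftime("%H"),
    "YYYYMMDDHH": start.strftime("%Y%m%d%H"),
}


def substitute(text: str, replacements: dict) -> str:
    """Replace every <NAME> whose NAME is a key of replacements.

    Unknown names are left untouched.
    """
    return re.sub(
        r"<([^<>]+)>",
        lambda m: replacements.get(m.group(1), m.group(0)),
        text,
    )


# MY_DATE is a hypothetical placeholder name used only for this illustration.
print(substitute("emissions_<MY_DATE>.nc", {"MY_DATE": values["YYYYMMDD"]}))
# prints: emissions_20040827.nc
```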
Special characters#
There are several special characters that usually cannot be escaped, so avoid using them in file names:
- %: a pair of percent signs with characters in between is substituted by Autosubmit with a placeholder. This can be escaped with a single %, i.e. %% gives a literal %, but its use is highly discouraged.
- |: the vertical bar separates TARGET and LINK_NAME (see Syntax for link specification)
- <>: angle brackets introduce placeholder substitution
- !: only at the beginning of TARGET, this introduces lookup for patterns, which also makes all wildcards special characters.
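A quick check for these characters could look like the following Python sketch. The function `problematic_chars` is a hypothetical helper, not part of auto-icon, and it only covers plain file names: wildcards, which become special in patterns, are not checked.

```python
def problematic_chars(file_name: str) -> list:
    """Return the special characters found in a plain file name.

    "%", "|", "<" and ">" are flagged anywhere in the name,
    "!" only at the very beginning.
    """
    found = [ch for ch in "%|<>" if ch in file_name]
    if file_name.startswith("!"):
        found.append("!")
    return found


print(problematic_chars("aerosol_2004.nc"))  # prints: []
print(problematic_chars("run_50%_test.nc"))  # prints: ['%']
print(problematic_chars("!data<v2>.nc"))     # prints: ['<', '>', '!']
```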
File location order#
Depending on the file type (general or domain specific) and the specified name (link TARGET), there are multiple options where to look for the file. The job PRE_FIND_FILES searches for all these files. If one of the files cannot be found and the file cannot be created (as ERA5 input data could be), this job fails.
The search order is as follows:
1. Absolute path#
If the TARGET starts with a slash (/), it is treated as an absolute path and searched for directly. If this file is not present, file location will fail.
2. URL#
If the TARGET seems to be a URL, the URL is downloaded to DIRECTORIES.INDIR and subsequently linked. If downloading fails, file location fails.
3. Search input and pool directories#
Next, several directories are searched. For each directory (DIR), it is first checked whether the file is present directly in DIR (DIR/FILE) and, if not, whether it exists somewhere in a subdirectory of DIR.
First INDIR is searched, then the pool directories (see below). Which directories are searched depends on the type of file to look for.
For pool directories that are URLs, a download of POOLURL/TARGET is attempted. If the download succeeds, the file is linked; otherwise file location fails.
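The search order can be sketched in Python as follows. This is a simplified model under stated assumptions, not the code of the PRE_FIND_FILES job: the function names are hypothetical, a URL is recognized only by an http, https or ftp scheme, no download is performed, and pool directories that are URLs are not handled.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple
from urllib.parse import urlparse


def looks_like_url(target: str) -> bool:
    """Heuristic (an assumption): an http, https or ftp scheme marks a URL."""
    return urlparse(target).scheme in {"http", "https", "ftp"}


def find_in_dir(directory: Path, name: str) -> Optional[Path]:
    """Look for DIR/FILE first, then for FILE in any subdirectory of DIR."""
    if not directory.is_dir():
        return None
    direct = directory / name
    if direct.exists():
        return direct
    # rglob walks all subdirectories; sorted() makes the choice deterministic.
    matches = sorted(directory.rglob(name))
    return matches[0] if matches else None


def locate(target: str, search_dirs: Iterable[Path]) -> Tuple[str, str]:
    """Return (kind, location) following the search order described above."""
    if target.startswith("/"):  # 1. absolute path
        if Path(target).exists():
            return ("absolute", target)
        raise FileNotFoundError(target)
    if looks_like_url(target):  # 2. URL
        # A real implementation would download the file here.
        return ("url", target)
    for directory in search_dirs:  # 3. INDIR first, then the pool directories
        found = find_in_dir(directory, target)
        if found is not None:
            return ("search", str(found))
    raise FileNotFoundError(target)
```

A call such as `locate("my_grid.nc", [indir, pool])` then returns the first hit, or raises `FileNotFoundError`, which mirrors the job failing when a file cannot be found.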
Pool directories#
The following table gives an overview of all pool directories. If multiple are given, they are searched from top to bottom. The config keys are specified in conf/common/platforms/<platform>.yml. You can add further pool directories to the list.
| Pool | Default (Levante) | Default (HoreKa) |
|---|---|---|
| GRID | | |
| GENERAL | | |
| EXP (a) | — | |
(a): These pools get the experiment name appended, i.e. the pool directory on HoreKa that is actually searched is /lsdf/kit/imk/projects/icon/TESTSUITE/<EXPNAME>.
| Pool | Default (all platforms) |
|---|---|
| GRID | |
| ICON | |
| ART | |
Grid creation#
Grid files can automatically be created with auto-icon. To do so, the pre-create_grid job should be activated, e.g. with the create grid option of the init script. Details on the grid to be created can be supplied in the GRID section. For details, please refer to the conf/art/experiments/template.yml file.
ERA5 or IFS input data#
If you use ERA5 or IFS data as initial conditions, the raw data can be remapped to the ICON grid if required. For this, the IFS raw data file (if applicable, e.g. ifs_r1279+O_<DATE>.grb) needs to be present in one of the above pool directories or in the input directory. The grid file is located with the routines described above. ERA5 raw data (if applicable) is retrieved automatically on Levante. On other machines, this is currently not implemented (see issue #132).
The remapping can be done with the DWD_ICON_TOOLS or CDO. The method to be used can be set in the init script.
If you create and run your experiment again, or rerun at least the job, the remapping is skipped and the existing file is used. To force the remapping anyway, set MISC.REQUIRE_REMAPPING to True in the file conf/art/simulation.yml.
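The skip logic amounts to a simple decision, sketched here in Python. The function `needs_remapping` and its arguments are hypothetical; `require_remapping` stands in for the MISC.REQUIRE_REMAPPING setting, and how auto-icon identifies the existing file is not covered.

```python
from pathlib import Path


def needs_remapping(remapped_file: Path, require_remapping: bool = False) -> bool:
    """Decide whether the remapping step has to run.

    Remapping runs if it is explicitly required or if no
    remapped file exists yet; otherwise the existing file is reused.
    """
    return require_remapping or not remapped_file.exists()
```

With `require_remapping=True` the function always returns `True`, regardless of whether a remapped file is already present.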
Online archive of input data#
If (parts of) the input data is available as a downloadable repository, the URL can be specified in DIRECTORIES.ARCHIVE in the <expname>/experiment.yml file. It will be automatically downloaded and extracted into INDIR.