Input data#

This page describes how to provide input data and how to specify it in the configuration.

There are three general source directories for input data:

  • the ICON (-ART) repository

  • a separate (experiment-specific) input directory

  • pool directories on the clusters or the web.

Links to the respective files are created in the working directory. auto-icon reads the DIRECTORIES.LINK_FILES section of the configuration (<expname>/experiment.yml or conf/art/experiments/<expname>.yml) to determine which files to link.

General linked files#

The following subsections are present in the DIRECTORIES.LINK_FILES section:

| Subsection | Purpose |
|---|---|
| FILES | General input files and directories |
| ICON_DATA | Parametrizations below <icon_rootdir> |
| ART | Data in <icon_rootdir>/externals/art |

Domain specific files#

Information that is specific to the grid (e.g. grid or initial-condition files) can be provided in a separate section of the experiment config file. Standard grids, i.e. grids that are officially distributed by DWD/MPIM and listed on the grid file server, can be specified and used conveniently. These grids are also directly available on several HPC systems.

This information is contained in the global GRID section. Each domain gets its own subsection, numbered consecutively (DOM01, DOM02, etc.). For each domain, the grid type (e.g. G for global), the R and B values, and the official grid number need to be provided. In addition, information on external parameter files and initial-condition files can be supplied. An example looks as follows:

GRID:
  #-- The FILELIST section provides a list of all input files that shall be used
  #-- for the current run.
  FILELIST:
    - GRID
    - EXTPAR
    - DWDFG
    - DWDANA
    - BCF   #-- ART file
    - IAE   #-- ART file
    - STY   #-- ART file
  #-- Turn on, if you want to use a radiation grid (the grid has to be provided on the server as well).
  RADGRID: False
  DOM1:
    #-- Type (G: global, R: radiation grid, O: ocean, Nxx: nested grid numbered xx, L: LAM grid, L*: radgrid or nest for LAM, ...)
    #-- The type letter(s) have to correspond to the suffix letters of the grids in the list (see above).
    TYPE: G
    #-- R and B values of the grid. Leading zeroes are ignored.
    R: 2
    B: 4
    #-- Grid number of the official grid to use (see above).
    GRID_NUMBER: 12
    #-- Date of the corresponding extpar dataset. The dataset has to be provided by the user or be present in the public repository.
    EXTPAR_DATE: 20131001

A detailed overview of all possible options is provided in the template file conf/art/experiments/template.yml. Specifying different file names for input files is also described there.

Tip

With multiple domains, several values can be inferred. For example, if DOM01 is an R2B4 grid with grid number 12, DOM02 is expected to be R2B5 with grid number 13. Such values can be omitted unless they deviate from the default.
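
The inference described in this tip can be pictured with a short Python sketch. It is not auto-icon code: the `Domain` class and `infer_nested` function are hypothetical, and the rule used here (B and the grid number each increase by one per nesting level, R stays the same) is extrapolated from the single R2B4/R2B5 example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical container for the per-domain values of the GRID section."""
    r: int
    b: int
    grid_number: int


def infer_nested(parent: Domain, levels: int = 1) -> Domain:
    """Guess the defaults of a domain nested `levels` below `parent`.

    Assumption: each nesting level raises B and the grid number by one.
    """
    return Domain(r=parent.r, b=parent.b + levels,
                  grid_number=parent.grid_number + levels)


dom01 = Domain(r=2, b=4, grid_number=12)
dom02 = infer_nested(dom01)
print(dom02)  # Domain(r=2, b=5, grid_number=13)
```

Values that do not follow this pattern would have to be given explicitly in the corresponding domain subsection.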

Domains#

Each domain corresponds to one ICON domain and thus one grid. The radiation grid is treated in essentially the same way, except that the only file that can be associated with it is a grid file. In addition, all parameters of a radiation grid can be inferred as long as it adheres to the official grids (or at least their nomenclature).

Hint

If inferring works, you can use a radiation grid by just setting RADGRID: True in the GRID section. If that is not sufficient, you can also create a full domain specification for the radiation grid.

File names#

| File type | File tag | Default file name (1) |
|---|---|---|
| Grid | GRID | icon_grid_ZZZZ_RxxByy[_T].nc |
| Extpar | EXTPAR | icon_extpar_ZZZZ_RxxByy[_T][_<DATE>][_tiles].nc |
| DWD first guess | DWDFG | dwdFG_RxByy_DOMii.nc |
| DWD analysis | DWDANA | dwdana_RxByy_DOMii.nc |
| ART input | TYP (2) | ART_TYP_iconRxByy-grid_ZZZZ.nc |

(1) Placeholders used here are:

ZZZZ:     grid number padded to 4 digits (for ART file, set to ART_IO_SUFFIX if present)
x|xx, yy: R and B values padded to 1 or 2 digits each
ii:       domain id
T:        Grid type (for GRID and EXTPAR)
DATE:     datestring (EXTPAR: YYYYMMDD; IFS,INC: YYYYMMDDHH (start date, see below))
_tiles:   added if EXTPAR_TILES is set to true

(2) The tag is the 3-letter type as specified for ART; see the ART User guide for details.
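
As an illustration of how the placeholders expand, the following Python sketch builds the default names for the grid, extpar, first guess and ART files. The function names are hypothetical and not part of auto-icon. The two-digit padding of the domain id (`ii`) is an assumption, and the ART suffix is passed in as a ready-made string because it may be either the padded grid number or ART_IO_SUFFIX.

```python
def grid_name(grid_number: int, r: int, b: int, grid_type: str = "") -> str:
    """icon_grid_ZZZZ_RxxByy[_T].nc"""
    suffix = f"_{grid_type}" if grid_type else ""
    return f"icon_grid_{grid_number:04d}_R{r:02d}B{b:02d}{suffix}.nc"


def extpar_name(grid_number: int, r: int, b: int, grid_type: str = "",
                date: str = "", tiles: bool = False) -> str:
    """icon_extpar_ZZZZ_RxxByy[_T][_<DATE>][_tiles].nc"""
    name = f"icon_extpar_{grid_number:04d}_R{r:02d}B{b:02d}"
    if grid_type:
        name += f"_{grid_type}"
    if date:
        name += f"_{date}"
    if tiles:
        name += "_tiles"
    return name + ".nc"


def first_guess_name(r: int, b: int, dom: int) -> str:
    """dwdFG_RxByy_DOMii.nc (two-digit domain id is an assumption)"""
    return f"dwdFG_R{r}B{b:02d}_DOM{dom:02d}.nc"


def art_name(tag: str, r: int, b: int, suffix: str) -> str:
    """ART_TYP_iconRxByy-grid_ZZZZ.nc, with ZZZZ given as a string"""
    return f"ART_{tag}_iconR{r}B{b:02d}-grid_{suffix}.nc"


print(grid_name(12, 2, 4, "G"))                # icon_grid_0012_R02B04_G.nc
print(extpar_name(12, 2, 4, "G", "20131001"))  # icon_extpar_0012_R02B04_G_20131001.nc
print(first_guess_name(2, 4, 1))               # dwdFG_R2B04_DOM01.nc
print(art_name("STY", 2, 9, "0015"))           # ART_STY_iconR2B09-grid_0015.nc
```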

The FILENAMES section overrides these defaults: for each file tag, a specific file name can be set (relative or absolute, see also the syntax for link specification).

FILENAMES:
  GRID: my_test_domain.nc
  EXTPAR: my_test_domain_extpar_data.nc

Grid generation and remapping#

With the pre-create-grid job, a grid can be created from scratch or from an existing grid. To use an existing grid, include it as a separate domain with the additional key UNUSED: True, which tells auto-icon that it is only a dummy domain. It can be useful to give such dummy domains other domain numbers; in that case, the parent-child relationship can be set explicitly with the PARENT: <id> key, e.g. PARENT: 10.

A section GEN_PARAMS then defines the grid to be created, where the individual keys will be transferred to the namelist of the DWD ICON tool icongridgen, e.g.:

GEN_PARAMS:
  region_type: 3
  hwidth_lon: 5.0
  hwidth_lat: 5.0
  center_lon: 18.0
  center_lat: 12.0
  # min_refin_c_ctrl: 1
  # max_refin_c_ctrl: 14

Remapping of (nearly arbitrary) input data can be done with CDO via the pre-remap job. This requires an additional SOURCE section in which, for each file tag to be remapped, the source file and the grid file of the source are given as a list, e.g.:

SOURCE:
  STY:
    - '/path/to/source/ART_STY_iconR2B09-grid_0015.nc'
    - '/path/to/source/icon_grid_0015_R02B09_G.nc'
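
The SOURCE entry pairs each source file with its grid file, presumably because the remapping tool needs the source grid geometry in addition to the data. The Python sketch below assembles one plausible CDO command line for such a pair. It is an assumption, not the command the pre-remap job actually runs: the function name, the target grid and output file names, and the choice of the `remapnn` operator (nearest-neighbour remapping) are illustrative only.

```python
from typing import List


def cdo_remap_command(source_file: str, source_grid: str, target_grid: str,
                      output_file: str, operator: str = "remapnn") -> List[str]:
    """Build a CDO call that attaches the source grid, then remaps.

    CDO applies chained operators from right to left: `-setgrid` assigns the
    source grid to the input data, then `operator` remaps to the target grid.
    """
    return [
        "cdo",
        f"{operator},{target_grid}",
        f"-setgrid,{source_grid}",
        source_file,
        output_file,
    ]


cmd = cdo_remap_command(
    "/path/to/source/ART_STY_iconR2B09-grid_0015.nc",
    "/path/to/source/icon_grid_0015_R02B09_G.nc",
    "icon_grid_0012_R02B04_G.nc",      # hypothetical target grid
    "ART_STY_iconR2B04-grid_0012.nc",  # hypothetical output name
)
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` directly, which avoids shell quoting issues with unusual paths.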

Note

If you have a LAM grid, the pre-create-grid job also creates the lateral_boundary.grid.nc for you and can remap ERA5 data to that grid.

Hint

Take a look at the template LAM_DUST for a full running example.

File location order#

Depending on the file type (general or domain specific) and the specified name (link TARGET), there are multiple options where to look for the file. The job PRE_FIND_FILES searches for all these files. If one of the files cannot be found and the file cannot be created (as ERA5 input data could be), this job fails. The search order is as follows:

1. Absolute path#

If the TARGET starts with a slash (/), it is treated as an absolute path and searched for directly. If this file is not present, file location will fail.

2. URL#

If the TARGET seems to be a URL, the URL is downloaded to DIRECTORIES.INDIR and subsequently linked. If downloading fails, file location fails.

3. Search input and pool directories#

Next, several directories are searched. For each directory (DIR), auto-icon first checks whether the file is present directly in DIR (DIR/FILE) and, if not, whether it exists somewhere in a subdirectory of DIR. INDIR is searched first, followed by the pool directories (see below). Which directories are searched depends on the type of file to look for.

For pool directories that are URLs, a download of POOLURL/TARGET is attempted. If the download succeeds, the file is linked; otherwise, file location fails.
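
The search order above can be summarized in a minimal Python sketch. It only decides where a file would come from and does not download or link anything. All names (`locate`, `find_in_dir`, `looks_like_url`) are hypothetical, and the URL check by scheme is an assumption about what "seems to be a URL" means; the actual PRE_FIND_FILES job may differ in its details.

```python
from pathlib import Path
from typing import Optional, Sequence, Tuple
from urllib.parse import urlparse


def looks_like_url(text: str) -> bool:
    """Treat anything with an http(s) or ftp scheme as a URL (an assumption)."""
    return urlparse(text).scheme in ("http", "https", "ftp")


def find_in_dir(directory: Path, target: str) -> Optional[Path]:
    """Check DIR/FILE first, then any subdirectory of DIR."""
    if not directory.is_dir():
        return None
    direct = directory / target
    if direct.exists():
        return direct
    # Fall back to a recursive search for the bare file name.
    matches = sorted(directory.rglob(Path(target).name))
    return matches[0] if matches else None


def locate(target: str, indir: str, pools: Sequence[str]) -> Tuple[str, str]:
    """Return (kind, location), where kind is 'local' or 'download'."""
    # 1. Absolute path: used as is, no further search.
    if target.startswith("/"):
        if not Path(target).exists():
            raise FileNotFoundError(target)
        return ("local", target)
    # 2. URL: to be downloaded into INDIR and linked from there.
    if looks_like_url(target):
        return ("download", target)
    # 3. INDIR first, then the pool directories in the given order.
    for directory in (indir, *pools):
        if looks_like_url(directory):
            # A URL pool ends the search: the download either works or fails.
            return ("download", directory.rstrip("/") + "/" + target)
        found = find_in_dir(Path(directory), target)
        if found is not None:
            return ("local", str(found))
    raise FileNotFoundError(target)
```

A call such as `locate("icon_grid_0012_R02B04_G.nc", indir, pools)` then returns either a local path to link or a URL to download, and raises `FileNotFoundError` when every option is exhausted.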

Pool directories#

The following table gives an overview of all pool directories. If multiple are given, they are searched from top to bottom. The config keys are specified in conf/common/platforms/<platform>.yml. You can add further pool directories to the list.

| Pool | Default (Levante) | Default (HoreKa) |
|---|---|---|
| GRID | /pool/data/ICON/grids/public | /lsdf/kit/imk/projects/icon/INPUT |
| GENERAL | /pool/data/ICON | /lsdf/kit/imk/projects/icon |
| EXP (a) | | /lsdf/kit/imk/projects/icon/TESTSUITE |

(a): These pools get the experiment name appended, i.e. the pool directory that is actually searched on HoreKa is /lsdf/kit/imk/projects/icon/TESTSUITE/<EXPNAME>.

| Pool | Default (all platforms) |
|---|---|
| GRID | http://icon-downloads.mpimet.mpg.de/grids/public/edzw |
| ICON | %ICON.INSTALLDIR% |
| ART | %ICON.INSTALLDIR%/externals/art/runctrl_examples/xml_ctrl/%EXPNAME% |
| | %ICON.INSTALLDIR%/externals/art/runctrl_examples/photo_ctrl |
| | %ICON.INSTALLDIR%/externals/art/runctrl_examples/init_ctrl |
| | %ICON.INSTALLDIR%/externals/art |
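
The `%KEY%` entries in these defaults are placeholders that are filled from the configuration. The following Python sketch shows one way such a substitution could work. It is not the mechanism auto-icon uses internally: the `expand` function is hypothetical, and the install directory `/opt/icon` and experiment name `a001` are made-up values.

```python
import re
from typing import Mapping

# Matches placeholders such as %EXPNAME% or %ICON.INSTALLDIR%.
PLACEHOLDER = re.compile(r"%([A-Za-z0-9_.]+)%")


def expand(template: str, values: Mapping[str, str]) -> str:
    """Replace each %KEY% with values[KEY]; unknown keys are left untouched."""
    def substitute(match: "re.Match[str]") -> str:
        return str(values.get(match.group(1), match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


values = {"ICON.INSTALLDIR": "/opt/icon", "EXPNAME": "a001"}  # made-up values
pool = "%ICON.INSTALLDIR%/externals/art/runctrl_examples/xml_ctrl/%EXPNAME%"
print(expand(pool, values))
# /opt/icon/externals/art/runctrl_examples/xml_ctrl/a001
```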

Grid creation#

Grid files can be created automatically with auto-icon. To do so, activate the pre-create-grid job, e.g. with the create grid option of the init script. Details on the grid to be created can be supplied in the GRID section; see the conf/art/experiments/template.yml file.

ERA5 or IFS input data#

If you use ERA5 or IFS data as initial conditions, the raw data can be remapped to the ICON grid if required. For IFS, the raw data file (e.g. ifs_r1279+O_<DATE>.grb) must be present in the input directory or in one of the pool directories listed above. The grid file is located with the routines described above. ERA5 raw data is retrieved automatically on Levante; on other machines, this is currently not implemented (see issue #132).

The remapping can be done with the DWD_ICON_TOOLS or CDO. The method to be used can be set in the init script.

If you create and run your experiment again, or rerun just this job, the remapping is skipped and the existing file is used. To force the remapping anyway, set MISC.REQUIRE_REMAPPING to True in the file conf/art/simulation.yml.
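
The skip rule amounts to a simple check, sketched here in Python. The function name and the way the flag is passed are hypothetical; only the behaviour (reuse an existing file unless remapping is required) is taken from the description above.

```python
from pathlib import Path


def needs_remapping(remapped_file: Path, require_remapping: bool = False) -> bool:
    """Remap if forced via the flag, or if no remapped file exists yet."""
    return require_remapping or not remapped_file.exists()
```

Here `require_remapping` stands for the value of MISC.REQUIRE_REMAPPING.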

Online archive of input data#

If (parts of) the input data are available as a downloadable archive, the URL can be specified in DIRECTORIES.ARCHIVE in the <expname>/experiment.yml file. The archive is then downloaded and extracted into INDIR automatically.