Transforming a runscript into an auto-icon configuration

This page is a guide on how to transform your runscript into an auto-icon configuration. Further information can be found in the guide on the configuration and in the step-by-step guide. This guide is essentially an extension of steps 5 (Modify the namelist) and 6 (Modify the configuration) of the step-by-step guide.

As an example, we use a runscript for the NWP_EXT_DATA test case, which is also included as a template in auto-icon.

Configuration files

Three main configuration files

For a simple experiment like this one, the user has to update three configuration files. In the following, %PROJDIR% always refers to the project directory of your experiment, which by default is /pool/experiments/<expid>/proj/git_project.

Include-file: %PROJDIR%/conf/art.yml
Yaml configuration file: %PROJDIR%/conf/art/experiments/<expname>.yml
Namelist file: %PROJDIR%/namelists/art/<expname>.yml or %PROJDIR%/namelists/art/<expname>.nml

Here, <expname> is an experiment name of your own choosing. You enter this name in the include file. It identifies the experiment in your configuration and namelist files and is also used on the HPC system as the name of the output (and input) directory.

Using the init script

You can also use the initialization script auto-icon-init for a convenient setup (call it with -h or see the step-by-step guide for details). The script will

Create a subdirectory <expname> in your experiment directory (i.e. <expid>/<expname>).
Create symlinks in this directory to the relevant config files.
Modify the include file to use the options provided to the script, including the experiment name. If you supplied all the options, you don’t have to modify the include file anymore.
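For orientation, the first two steps can also be reproduced by hand. The bash function below is a hypothetical sketch, not the actual implementation of auto-icon-init: the function name and the link name config.yml are made up for illustration, while the link name namelist.nml follows the <expname>/namelist.nml path used in the namelist section of this page. It does not modify the include file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the directory and symlink steps of auto-icon-init.
# Arguments: experiment directory (<expid>), experiment name, project directory.
setup_expdir() {
  local expdir=$1 expname=$2 projdir=$3
  local target=$expdir/$expname

  # Step 1: create the subdirectory <expid>/<expname>
  mkdir -p "$target"

  # Step 2: symlink the experiment's configuration and namelist files.
  # The link name config.yml is an assumption made for this sketch.
  ln -sfn "$projdir/conf/art/experiments/$expname.yml" "$target/config.yml"
  ln -sfn "$projdir/namelists/art/$expname.nml" "$target/namelist.nml"
}
```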

A note on building the ICON model

auto-icon takes care of compiling the model code, and the default settings are perfectly fine for this guide. If you want to run your own experiments with a specific model version, or need to know the location of the ICON directory, see the page on Building ICON.

Reinitializations

Runs with regular reinitializations (i.e. meteorology, aerosol or chemistry data being initialized again at regular intervals throughout the run) are supported. The runscript example below does not cover them; instead, there is a section on reinitialization on the configuration page.

Runscript sections overview

In the following, each section of the runscript is inspected in turn.
The first line of each code block (starting with -----) only helps the reader assign the code block to its file. It does not appear in either file and must not be included there, as it would break the syntax.

Directories

The runscript starts with a section on the directories it uses, defining variables that are referenced throughout the script. auto-icon has similar definitions.

-----ORIGINAL RUNSCRIPT-----
#!/bin/bash
CENTER=IMK
basedir=/home/hk-project-scs/iv8169/icon/icon-art.intel-19.1-openmpi-4.0/run/checksuite.icon-kit/../..
icon_data_poolFolder=/lsdf/kit/imk/projects/icon/INPUT/AMIP/amip_input
aer_opt=/lsdf/kit/imk/projects/icon/INPUT/AMIP/amip_input
EXPNAME=atm_amip_test_kit
OUTDIR=/hkfs/work/workspace/scratch/iv8169-icon/output/NWP_EXT_DATA
ICONFOLDER=/home/hk-project-scs/iv8169/icon/icon-art.intel-19.1-openmpi-4.0/run/checksuite.icon-kit/../..
ARTFOLDER=/home/hk-project-scs/iv8169/icon/icon-art.intel-19.1-openmpi-4.0/run/checksuite.icon-kit/../../externals/art
#INDIR=/lsdf/kit/imk/projects/icon/TESTSUITE
INDIR=/hkfs/work/workspace/scratch/iv8169-icon/input/TESTSUITE
EXP=NWP_EXT_DATA
lart=.True.

The definitions in auto-icon are set in the yaml configuration file:

----------%PROJDIR%/conf/art/experiments/<expname>.yml----------
DIRECTORIES:
  # INDIR, OUTDIR and REFDIR are treated as full absolute paths.
  INDIR:  "%DIRECTORIES.PREFIX_INDIR%/input/%EXPNAME%"
  OUTDIR: "%DIRECTORIES.PREFIX_OUTDIR%/output/%EXPNAME%"

INDIR specifies the main directory in which input files (e.g. initial conditions and external parameters) are looked for before the HPC pool directories or online repositories are searched.
OUTDIR specifies the output directory. All output will be written to this directory with subdirectories <start date>/<member>.

You can enter absolute paths directly (bash variables defined globally or in ~/.bashrc may be used), or use placeholders such as PREFIX_... (see the configuration page for further details).
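To illustrate how the placeholders in the DIRECTORIES section turn into concrete paths, the bash function below performs a plain text substitution of %KEY% placeholders. This is only an illustration under assumptions: Autosubmit resolves the placeholders itself, the function name is hypothetical, and the PREFIX_INDIR value in the example call is borrowed from the runscript paths above rather than from a real configuration.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: replace %KEY% placeholders in a template string.
# Usage: expand_placeholders TEMPLATE KEY=VALUE [KEY=VALUE ...]
expand_placeholders() {
  local s=$1 kv key val
  shift
  for kv in "$@"; do
    key=${kv%%=*}   # text before the first '='
    val=${kv#*=}    # text after the first '='
    # Quoted pattern and replacement: matched literally, replaced everywhere.
    s=${s//"%$key%"/"$val"}
  done
  printf '%s\n' "$s"
}

expand_placeholders '%DIRECTORIES.PREFIX_INDIR%/input/%EXPNAME%' \
  DIRECTORIES.PREFIX_INDIR=/hkfs/work/workspace/scratch/iv8169-icon \
  EXPNAME=NWP_EXT_DATA
# prints /hkfs/work/workspace/scratch/iv8169-icon/input/NWP_EXT_DATA
```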

A few of the variables defined above are built into the core configuration and are therefore not set here:

CENTER: currently set to IMK by default; can be changed in the include file
basedir: not used
icon_data_poolFolder and aer_opt: built into the HPC configuration (conf/common/platforms/<HPC>.yml)
ICONFOLDER and ARTFOLDER: built in, as auto-icon takes care of building the model (the build location is configurable in conf/common/build.yml)
lart: moved into the namelist

The following lines of wrapper code are handled by the auto-icon core.

-----ORIGINAL RUNSCRIPT (continuation)-----
FILETYPE=4
COMPILER=intel
restart=.False.
read_restart_namelists=.False.

# Remove folder  from OUTDIR for postprocessing output
OUTDIR_PREFIX="$OUTDIR"

# Create output directory and go to this directory
if [ ! -d $OUTDIR ]; then
    mkdir -p $OUTDIR
fi
cd $OUTDIR

Linked files

The following section links files and directories into the output directory.

-----ORIGINAL RUNSCRIPT (continuation)-----
ln -sf ${INDIR}/${EXP}/icon_grid_0011_R02B03_R.nc iconR2B03-grid.nc
ln -sf ${INDIR}/${EXP}/icon_grid_0012_R02B04_G.nc iconR2B04-grid.nc
ln -sf ${INDIR}/${EXP}/icon_extpar_0012_R02B04_G_20131001.nc extpar_iconR2B04-grid.nc
ln -sf ${INDIR}/${EXP}/uc1_ei_t255_nc_remap_rev832_0012_R02B04_2004010100.nc ifs2icon_R2B04_DOM01.nc

ln -sf ${ARTFOLDER}/runctrl_examples/init_ctrl/mozart_coord.nc ${OUTDIR}/mozart_coord.nc
ln -sf ${ARTFOLDER}/runctrl_examples/init_ctrl/Linoz2004Br.dat ${OUTDIR}/Linoz2004Br.dat
ln -sf ${ARTFOLDER}/runctrl_examples/init_ctrl/Simnoy2002.dat ${OUTDIR}/Simnoy2002.dat

ln -sf $ICONFOLDER/data/rrtmg_lw.nc rrtmg_lw.nc
ln -sf $ICONFOLDER/data/ECHAM6_CldOptProps.nc ECHAM6_CldOptProps.nc

...

# this if condition is necessary because otherwise
# a new link in ${INDIR}/${EXP}/emiss_minimal is generated
# linking to itself
if [ ! -L ${OUTDIR}/emissions ]; then
ln -sd ${INDIR}/${EXP}/emiss_minimal       ${OUTDIR}/emissions
fi

...

In auto-icon, all of this is configured in the DIRECTORIES section of the yaml configuration file. The grid(s) are specified in a dedicated GRID section (see here for details).

----------%PROJDIR%/conf/art/experiments/<expname>.yml----------
#-- ----- Grid info ----- --#
GRID:
  RADGRID: True
  DOM1:
    TYPE: G
    R: 2
    B: 4
    GRID_NUMBER: 12
    EXTPAR_DATE: 20131001
    BASENAME:
      IFS: 'uc1_ei_t255_nc_remap_rev832'
DIRECTORIES:
  LINK_FILES:
    FILES:
      - 'emiss_minimal|emissions'
      - 'datasets|datasets'
      - '.|PFT'
      - '.|DMS'
    ICON_DATA:
      - 'rrtmg_lw.nc'
      - 'ECHAM6_CldOptProps.nc'
    ART:
      - 'mozart_coord.nc'
      - 'Linoz2004Br.dat'
      - ...
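The grid-specific files linked in the runscript follow a naming pattern that the GRID entries mirror: grid number, resolution RnnBnn, grid type letter and, for the external parameters, EXTPAR_DATE. The bash functions below reconstruct the runscript's file names from those values. They are a hypothetical sketch for checking your own file names against the GRID entries; whether auto-icon assembles names in exactly this way is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers reproducing the file names used in the runscript.
# Pass plain decimal numbers (12, 2, 4): bash printf reads a leading 0 as octal.
grid_file() {    # GRID_NUMBER R B TYPE
  printf 'icon_grid_%04d_R%02dB%02d_%s.nc\n' "$1" "$2" "$3" "$4"
}
extpar_file() {  # GRID_NUMBER R B TYPE EXTPAR_DATE
  printf 'icon_extpar_%04d_R%02dB%02d_%s_%s.nc\n' "$1" "$2" "$3" "$4" "$5"
}
ic_file() {      # BASENAME GRID_NUMBER R B YYYYMMDDHH
  printf '%s_%04d_R%02dB%02d_%s.nc\n' "$1" "$2" "$3" "$4" "$5"
}

grid_file 12 2 4 G               # icon_grid_0012_R02B04_G.nc
extpar_file 12 2 4 G 20131001    # icon_extpar_0012_R02B04_G_20131001.nc
ic_file uc1_ei_t255_nc_remap_rev832 12 2 4 2004010100
# uc1_ei_t255_nc_remap_rev832_0012_R02B04_2004010100.nc
```

The radiation grid of the runscript, icon_grid_0011_R02B03_R.nc, fits the same pattern (grid_file 11 2 3 R).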

All files and directories that are not grid-specific have to be listed explicitly to be linked. Both absolute and relative paths can be used; relative paths refer to the respective base directory: INDIR, the root directory of the ICON installation, or the root of the ART directory. For relative paths, the file name alone is sufficient, provided it is unique in the pool directory. The files have to be given as a list, as in the example.
Further details on how to specify linked files are given on the page on input data.
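The 'source|link name' syntax of the FILES entries corresponds to the ln commands in the runscript: 'emiss_minimal|emissions' links emiss_minimal under the name emissions. The bash function below is a hypothetical sketch of how such an entry could be resolved, assuming that a relative entry is first tried as a path below the base directory and otherwise searched for by file name, which must then match exactly once. It is not auto-icon's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve one LINK_FILES entry and create the symlink.
# Usage: link_entry ENTRY BASE_DIR OUT_DIR
#   ENTRY is 'source|link_name' or just 'source' (link keeps the base name).
link_entry() {
  local entry=$1 base=$2 outdir=$3 src dst
  src=${entry%%'|'*}
  if [[ $entry == *'|'* ]]; then
    dst=${entry#*'|'}
  else
    dst=$(basename "$src")
  fi

  if [[ $src != /* ]]; then
    if [[ -e $base/$src ]]; then
      src=$base/$src                  # relative path below BASE_DIR
    else
      local matches=()
      mapfile -t matches < <(find "$base" -name "$src")
      if (( ${#matches[@]} != 1 )); then
        echo "cannot resolve '$src' uniquely below $base" >&2
        return 1
      fi
      src=${matches[0]}               # unique match found by file name
    fi
  fi

  # -n: replace an existing link to a directory instead of creating a new
  # link inside that directory (what the if-condition in the runscript avoids).
  ln -sfn "$src" "$outdir/$dst"
}
```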

Experiment timing

A section on the experiment timing follows.

-----ORIGINAL RUNSCRIPT (continuation)-----
# the namelist filename
atmo_namelist=NAMELIST_${EXP}
#
#-----------------------------------------------------------------------------
# global timing
ndays_restart=60
dt_restart=`expr ${ndays_restart} \* 86400`
#
#-----------------------------------------------------------------------------
# model timing
STARTDATE="2004-01-01T00:00:00Z"
dtime=360  # 360 sec for R2B6, 120 sec for R3B7
nsteps=10
LEADTIME="PT1H"

((dt_checkpoint= 10 * 3600 * 24 ))
checkpoint_interval="P10D"

Timing of the experiment is handled by auto-icon. The user has to specify several values in the experiment sections of the configuration:

Start date: DATELIST is a single start date or a list of start dates [1], in the format YYYYMMDD or YYYYMMDDHH
Lead time: given by the number of chunks (NUMCHUNKS) and their length (CHUNKSIZE * CHUNKSIZEUNIT)
Time step: TIMESTEP is passed directly to the namelist. Usually, this is an integer (in seconds) or a time period (“PT6M”), but it can also be a Python expression, which is evaluated. It can also be omitted and set directly in the namelist.
The restart interval is set to the length of one chunk.
The checkpoint interval is set via CHECKPOINT_INTERVAL.

Other ways of specifying the end time or date of a simulation are not supported, as they are incompatible with the way the simulation is split into chunks.
[1] Several start dates result in several independent simulations, which can run in parallel. A run with reinitialization uses only a single start date and multiple chunks.
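As a worked example of translating the runscript's timing: it runs nsteps=10 steps of dtime=360 s, i.e. 3600 s, which matches LEADTIME="PT1H", so a single chunk of one hour reproduces it. The bash sketch below does this arithmetic and converts STARTDATE into the DATELIST format. The function names are hypothetical, and only the units hour and day are handled, since units such as month have no fixed length in seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for converting runscript timing into auto-icon values.

# Seconds per CHUNKSIZEUNIT (only fixed-length units are handled here).
unit_seconds() {
  case $1 in
    hour) echo 3600 ;;
    day)  echo 86400 ;;
    *)    echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}

# Lead time in seconds = NUMCHUNKS * CHUNKSIZE * unit length.
lead_time_seconds() {  # NUMCHUNKS CHUNKSIZE CHUNKSIZEUNIT
  local unit
  unit=$(unit_seconds "$3") || return 1
  echo $(( $1 * $2 * unit ))
}

# "2004-01-01T00:00:00Z" -> "2004010100" (YYYYMMDDHH for DATELIST).
to_datelist() {
  local s=${1//[-:T]/}   # drop '-', ':' and 'T'
  echo "${s:0:10}"
}

to_datelist "2004-01-01T00:00:00Z"                 # 2004010100
lead_time_seconds 1 1 hour                         # 3600
echo $(( $(lead_time_seconds 1 1 hour) / 360 ))    # 10 time steps of 360 s
```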

A miscellaneous part follows, which is covered by the configuration (above) and the namelist (below).

-----ORIGINAL RUNSCRIPT (continuation)-----
#
#-----------------------------------------------------------------------------
# model parameters
model_equations=3             # equation system
#                     1=hydrost. atm. T
#                     1=hydrost. atm. theta dp
#                     3=non-hydrost. atm.,
#                     0=shallow water model
#                    -1=hydrost. ocean
#-----------------------------------------------------------------------------
# the grid parameters
atmo_dyn_grids="iconR2B04-grid.nc"
atmo_rad_grids="iconR2B03-grid.nc"
#-----------------------------------------------------------------------------

Master namelist

The ICON master namelist follows. It is written entirely by auto-icon; the user does not have to specify anything.

-----ORIGINAL RUNSCRIPT (continuation)-----
# create ICON master namelist
# ------------------------
# For a complete list see Namelist_overview and Namelist_overview.pdf

cat > icon_master.namelist << EOF
&master_nml
 lRestart               = .false.
/
&master_time_control_nml
 experimentStartDate = "$STARTDATE"
 forecastLeadTime = "$LEADTIME"
 checkpointTimeIntval = "$checkpoint_interval"
/
&master_model_nml
  model_type=1
  model_name="ATMO"
  model_namelist_filename="$atmo_namelist"
  model_min_rank=1
  model_max_rank=65536
  model_inc_rank=1
/
&time_nml
 ini_datetime_string = "$STARTDATE"
 dt_restart          = $dt_restart
/
EOF

#-----------------------------------------------------------------------------

Grids only have to be specified in the GRID section (see above), which can also hold multiple domains.

-----ORIGINAL RUNSCRIPT (continuation)-----
#-----------------------------------------------------------------------------
#
# write ICON namelist parameters
# ------------------------
# For a complete list see Namelist_overview and Namelist_overview.pdf
#
# ------------------------
# reconstruct the grid parameters in namelist form
dynamics_grid_filename=""
for gridfile in ${atmo_dyn_grids}; do
  dynamics_grid_filename="${dynamics_grid_filename} '${gridfile}',"
done
radiation_grid_filename=""
for gridfile in ${atmo_rad_grids}; do
  radiation_grid_filename="${radiation_grid_filename} '${gridfile}',"
done

Namelist

Next comes the namelist. It is specified in a namelist file (<expname>/namelist.nml), to which auto-icon adds a number of settings. A short overview can be found in the step-by-step guide and a detailed list on the namelists page. You can also convert the f90 namelist into yaml format (see the FAQs for a short how-to), which allows you to create multiple members with different namelist settings.

-----ORIGINAL RUNSCRIPT (continuation)-----
# ------------------------

cat > ${atmo_namelist} << EOF
&parallel_nml
 nproma         = 8  ! optimal setting 8 for CRAY; use 16 or 24 for IBM
 p_test_run     = .false.
 l_test_openmp  = .false.
 l_log_checks   = .false.
 num_io_procs   = 1  ! up to one PE per output stream is possible
 itype_comm     = 1
 iorder_sendrecv = 3  ! best value for CRAY (slightly faster than option 1)
/
&grid_nml
 dynamics_grid_filename  = ${dynamics_grid_filename}    ! substituted by:
 !dynamics_grid_filename  = "%EXPERIMENT.GRID%"
 ...
/
...
&art_nml
 ...
 cart_cheminit_file = '${INDIR}/${EXP}/ART_EMAC_H2SO4_iconR2B04-grid-2004-01-01-08_0012.nc'
 ! -> cart_cheminit_file = '%DIRECTORIES.INDIR%/ART_EMAC_H2SO4_iconR2B04-grid-2004-01-01-08_0012.nc'
 cart_cheminit_type = 'EMAC'
 cart_ext_data_xml = '${ARTFOLDER}/runctrl_examples/xml_ctrl/ext_dataset_minimal.xml'
 ! -> cart_ext_data_xml = '%ICON.INSTALLDIR%/externals/art/runctrl_examples/xml_ctrl/ext_dataset_minimal.xml'
/
EOF
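The "! ->" comments above show the substitution pattern: ${INDIR}/${EXP} becomes %DIRECTORIES.INDIR% and ${ARTFOLDER} becomes %ICON.INSTALLDIR%/externals/art. When porting a longer namelist, these two rewrites can be applied in one pass with sed. The function below is a hypothetical helper covering only these two patterns; other runscript variables still have to be replaced by hand. Note that it has to be applied to the heredoc text copied out of the runscript, not to the generated NAMELIST_ file, in which the shell has already expanded the variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite runscript variables into auto-icon placeholders.
# Reads namelist text on stdin and writes the converted text to stdout.
to_placeholders() {
  # In sed basic regular expressions '\$' is a literal dollar sign and
  # '{' '}' are literal braces; '|' serves as the s-command delimiter.
  sed -e 's|\${INDIR}/\${EXP}|%DIRECTORIES.INDIR%|g' \
      -e 's|\${ARTFOLDER}|%ICON.INSTALLDIR%/externals/art|g'
}

# Example (file names are hypothetical):
# to_placeholders < namelist_from_runscript.txt > namelist.nml
```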

Batch script

The binary is not copied; instead, auto-icon executes it directly.

-----ORIGINAL RUNSCRIPT (continuation)-----
cp /path/to/icon/run/checksuite.icon-kit/../../bin/icon ./icon.exe

Finally, the runscript writes the batch script. auto-icon does this as well, including both the SBATCH directives and the module loads.

-----ORIGINAL RUNSCRIPT (continuation)-----
cat > job_ICON << ENDFILE
#!/bin/bash -x
#SBATCH --nodes=4
#SBATCH --time=00:15:00
#SBATCH --ntasks-per-node=76
#SBATCH --partition=cpuonly


module load compiler/intel/19.1 mpi/openmpi/4.0 lib/netcdf/4.7_serial lib/hdf5/1.10_serial lib/netcdf-fortran/4.5_serial lib/eccodes/2.22.0 numlib/mkl/2020 

mpirun --bind-to core --map-by core --report-bindings ./icon.exe

ENDFILE

chmod +x job_ICON
sbatch job_ICON

The user only needs to enter the number of nodes for running ICON and the corresponding wall time in the JOBS section of the yaml configuration file:

----------%PROJDIR%/conf/art/experiments/<expname>.yml----------
JOBS:
  RUN_ICON:
    NODES: 4
    WALLCLOCK: '00:15'

auto-icon adds the SBATCH directives, such as the number of nodes, the partition and the account name. If the user needs to add further directives, they can do so here, following the official Autosubmit documentation.
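For comparison with the original job script: with NODES: 4 and the runscript's 76 tasks per node, the job starts 4 * 76 = 304 MPI tasks, and WALLCLOCK '00:15' (HH:MM) corresponds to --time=00:15:00. The bash sketch below prints the matching SBATCH lines. It only illustrates the mapping: the function name is hypothetical, and the tasks-per-node value is taken from the runscript, whereas auto-icon sets such directives itself.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of how JOBS.RUN_ICON values map to SBATCH lines.
sbatch_lines() {  # NODES WALLCLOCK(HH:MM) TASKS_PER_NODE
  local nodes=$1 wallclock=$2 tasks_per_node=$3
  echo "#SBATCH --nodes=$nodes"
  echo "#SBATCH --time=$wallclock:00"           # HH:MM -> HH:MM:SS
  echo "#SBATCH --ntasks-per-node=$tasks_per_node"
  echo "# total MPI tasks: $(( nodes * tasks_per_node ))"
}

sbatch_lines 4 00:15 76
```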