############################################
Migrating from 4.0.x to 4.2.x
############################################

.. |nbsp| unicode:: 0xA0 .. non-breaking space

Modification of input and output files
======================================

Up to NEMO version 4.0.x (inclusive), if the domain configuration includes one of the possible periodicity
conditions (East-West (EW), North-South (SN) or North-Pole (NP) folding), all input and output files include
extra columns and/or rows used to implement the periodicity, see Figure 1:

.. list-table::

   * - In EW periodicity, the first and last columns are duplicated.
   * - In SN periodicity, the first and last rows are duplicated.
   * - The NP folding requires the duplication of (at least) the last row.
   * - |nbsp| |nbsp| |nbsp| Note that NP folding also includes the EW periodicity (jperio = 1).
   * - The bi-periodicity (jperio = 7) is the combination of jperio = 1 and 2.

.. figure:: _static/NEMO_periodicity.png
   :width: 95%
   :align: center

   Figure 1: Columns and/or rows duplication included in input/output files according to the chosen
   periodicity. Thick black rows delimit the part of the domain included in all input/output files.

**From NEMO 4.2.0, the columns and rows that are duplicated because of periodicity (EW, SN and NP) are
excluded from all input and output files.**
In practice, when preparing old configurations for use with 4.2.0, this means:

.. list-table::

   * - In EW periodicity, the first and last columns are removed.
   * - In SN periodicity, the top and bottom rows are removed.
   * - The last row is also removed when using the NP folding.
   * - Closed boundary conditions are unchanged and still require a column or a row of land points.

These changes are illustrated in Figure 2.
If we consider for example the ORCA2 grid, up to version 4.0.x (included) its size is (182/149).
From version 4.2.0, its size becomes (180/148): the first and last columns are removed because of the EW
periodicity, and the last row is removed because of the NP folding.
The southern boundary is untouched as it is closed by a row of land points.

Note that, in the NP folding case, there is often more than just the last row that is duplicated.
For NP folding around the "T" point, 1.5 (2) rows are replicated for the T and U (V and F) grids.
Folding around the "F" point requires the duplication of 1 (1.5) rows for the T and U (V and F) grids.
We however decided to remove only the last row in all cases to keep things simple.
As a consequence, even with the top row removed, the input and output files may still contain duplicated
points depending on the grid type and the NP folding case.

The simplest way to use input files from version <= 4.0.x with version >= 4.2.0 is to cut the extra columns
and rows of the input files with a command like ``ncks -F -d`` or ``cdo selindexbox``.
Note that the sides of the domain which correspond to closed boundaries must not be changed for version
4.2.0.
Here is an example of how to rewrite the domcfg file of ORCA2 from 4.0.x to 4.2.0:

.. code-block::

   ncks -F -d x,2,181 -d y,1,148 ORCA2_r4.0.x_domcfg.nc ORCA2_r4.2.0_domcfg.nc

This simple cut of the columns and rows can also be applied to the weights files used for the on-the-fly
interpolation.
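The same cut could presumably also be done with CDO. The command below is a sketch only, assuming 1-based
inclusive index ranges equivalent to the ``ncks -F`` call above; since ``cdo`` relies on the grid description
found in the file, the result should be checked against the ``ncks`` output:

.. code-block::

   cdo selindexbox,2,181,1,148 ORCA2_r4.0.x_domcfg.nc ORCA2_r4.2.0_domcfg.nc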
.. figure:: _static/NEMO_periodicity2.png
   :width: 95%
   :align: center

   Figure 2: Exclusion of the duplicated columns and rows according to the chosen periodicity.
   Thick black rows delimit the part of the domain included in all input/output files.
   Light colour cells are no longer required.

Even when the extra columns and rows used for periodicity (light colours in Figure 2) are excluded from the
input and output files, they are still needed during the code integration.
They are therefore recreated during code execution, so the grid size in the input and output files differs
from the grid size inside the code.
The values of the additional columns and/or rows are automatically defined at run-time by applying the
proper boundary conditions (by means of a call to the lbc_lnk routine).

From 4.2.0, the grid size inside the code depends on the number of MPI halos (``nn_hls`` in the namelist):
we add ``nn_hls`` columns/rows in each direction, even if the boundary is closed.
This last point differs from what was done before version 4.2.0.
If the domain contains one or more closed boundaries, its grid size inside the code will therefore differ
before and after version 4.2.0, even if ``nn_hls = 1``.
For example, with ORCA2, the grid size inside the code was (182/149); from 4.2.0 it becomes (182/150) if
``nn_hls = 1`` or (184/152) if ``nn_hls = 2``.

Note that, as the MPI domain decomposition depends on the grid size inside the code, it changes according to
the value of ``nn_hls``.
If the domain contains a closed boundary, the MPI domain decomposition won't be the same before and after
version 4.2.0, even if ``nn_hls = 1`` and even if we fix the number of MPI subdomains in the i and j
directions (with ``jpni`` and ``jpnj`` in the namelist).
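For reference, these parameters could be set as in the minimal sketch below; it assumes they sit in the
``&nammpp`` block of the reference namelist, and the values shown are examples only:

.. code-block::

   &nammpp
      jpni   = 0    ! number of MPI subdomains in the i direction (0 here assumes the automatic choice)
      jpnj   = 0    ! number of MPI subdomains in the j direction (0 here assumes the automatic choice)
      nn_hls = 2    ! halo width in grid points (1 or 2); also changes the grid size inside the code
   /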
The way we define the type of periodicity has also been reviewed in version 4.2.0: we replaced ``jperio`` by
several, more explicit variables.
The table below details the equivalent of the different values of jperio from version 4.2.0.

=========== ===============================================================
<= 4.0.x    >= 4.2.0
=========== ===============================================================
jperio = 0  l_Iperio = .F., l_Jperio = .F., l_NFold = .F.
jperio = 1  l_Iperio = .T., l_Jperio = .F., l_NFold = .F.
jperio = 2  l_Iperio = .F., l_Jperio = .T., l_NFold = .F.
jperio = 3  l_Iperio = .F., l_Jperio = .F., l_NFold = .T., c_NFtype = "T"
jperio = 4  l_Iperio = .T., l_Jperio = .F., l_NFold = .T., c_NFtype = "T"
jperio = 5  l_Iperio = .F., l_Jperio = .F., l_NFold = .T., c_NFtype = "F"
jperio = 6  l_Iperio = .T., l_Jperio = .F., l_NFold = .T., c_NFtype = "F"
jperio = 7  l_Iperio = .T., l_Jperio = .T., l_NFold = .F.
=========== ===============================================================

Following these modifications, we also changed the way options are defined in domcfg files.
All options are now defined with NetCDF global attributes instead of scalar variables.
This is more readable and easier to manipulate with commands like ncatted.
The following table details which attributes replace which scalar variables.
Note that the 4.2.0 code is still able to read old domcfg files with the options defined as scalar variables
(including the automatic translation from jperio to the new variables listed in the above table).

======================= ===============================================================
<= 4.0.x                >= 4.2.0
NetCDF scalar variables NetCDF global attributes
======================= ===============================================================
jpiglo                  Automatically defined with the 1st dimension of the variable e3t_0
jpjglo                  Automatically defined with the 2nd dimension of the variable e3t_0
jpkglo                  Automatically defined with the 3rd dimension of the variable e3t_0
jperio                  Iperio = 0 or 1, 0 by default
                        Jperio = 0 or 1, 0 by default
                        NFold = 0 or 1, 0 by default
                        NFtype = "T", "F", "-" by default
ORCA                    CfgName = any character string, 'UNKNOWN' by default
ORCA_index              CfgIndex = any integer, -999 by default
ln_zco = 1.             VertCoord = "zco", "-" by default
ln_zps = 1.             VertCoord = "zps", "-" by default
ln_sco = 1.             VertCoord = "sco", "-" by default
ln_isfcav               IsfCav = 0 or 1, 0 by default
======================= ===============================================================

Here is an example of how to create global attributes with the command ncatted:

.. code-block::

   ncatted -a Iperio,global,c,l,1 -a NFtype,global,c,c,'T' domcfg.nc

Speed up associated with reduced memory requirements
====================================================

In order to reduce the memory footprint in NEMO 4.2, an additional way of dealing with vertical scale
factors has been implemented using cpp keys (``key_linssh``, ``key_qco``).
These keys reduce the memory footprint by removing 26 x 3D arrays at the cost of 10 x 2D arrays.
In tests without I/O, this implementation provides a speed up of 7 to 15% depending on the average memory
load among processors.

There are 4 distinct vertical coordinates, namely: fixed coordinates (ln_linssh=T), variable zstar
coordinates (ln_linssh=F & ln_vvl_zstar=T), variable ztilde coordinates (ln_linssh=F & ln_vvl_ztilde=T) and
variable zlayer coordinates (ln_linssh=F & ln_vvl_layer=T).
The optimization has only been implemented for the fixed and zstar vertical coordinates but it will be
generalized to all coordinates in future releases.
Users are encouraged to use it for their applications.

============================= ===============================================================
<= 4.0.x                      >= 4.2.0
============================= ===============================================================
ln_linssh=T                   ln_linssh=T & compile with ``key_linssh``
ln_linssh=F , ln_vvl_zstar=T  ln_linssh=F , ln_vvl_zstar=T & compile with ``key_qco``
ln_linssh=F , ln_vvl_zlayer=T ln_linssh=F , ln_vvl_zlayer=T
ln_linssh=F , ln_vvl_ztilde=T ln_linssh=F , ln_vvl_ztilde=T
============================= ===============================================================

.. note::

   When you use ``key_linssh`` or ``key_qco``, you must also set the namelist parameter ``ln_linssh``
   accordingly.

**In the case of fixed vertical scale factors**, in addition to setting the namelist parameter
ln_linssh = .true., compiling the code with ``key_linssh`` is recommended.
The use of the key avoids the duplication of variables such as scale factors and water depth: instead, it
substitutes them with time-invariant arrays. For instance:

.. code-block::

   # define gdept(i,j,k,t) gdept_0(i,j,k)
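For reference, such a key could be activated at build time as sketched below, assuming the standard
``makenemo`` ``add_key`` mechanism; the architecture name ``my_arch`` and the new configuration name
``MY_ORCA2`` are placeholders:

.. code-block::

   ./makenemo -m my_arch -r ORCA2_ICE_PISCES -n MY_ORCA2 add_key "key_linssh"

The same mechanism would apply to ``key_qco`` for the zstar case described below.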
**In the case of varying vertical scale factors with zstar coordinates**, in addition to setting the
namelist parameter ln_linssh = .false., compile the code with ``key_qco``.
QCO stands for Quasi-Eulerian COordinates and it replaces VVL (Vertical Varying Layer).
In practice, each vertical level varies linearly with respect to the ssh.
Therefore, the time evolution of the vertical scale factors can be expressed as a function of the ssh to
depth ratio (r3 = ssh/h), and 4D arrays can be substituted by simple expressions. For instance:

.. code-block::

   # define e3t(i,j,k,t) (e3t_0(i,j,k)*(1._wp+r3t(i,j,t)*tmask(i,j,k)))

For ice-shelf simulations, a ``key_isf`` cpp key is needed in addition to ``key_qco``, so that the water
depth expressions gdept and gdepw take the ice-shelf depth into account:

.. code-block::

   # define gdept(i,j,k,t) ((gdept_0(i,j,k)-risfdep(i,j))*(1._wp+r3t(i,j,t))+risfdep(i,j))

**In the case of varying vertical scale factors with other coordinates**, do not use the ``key_linssh`` or
``key_qco`` keys.

.. note::

   This optimisation is currently not working with the ztilde or layer options for the vertical scale
   factors.

Shallow Water
=============

The SWE (Shallow Water Equations) module (./src/SWE) is a demonstrator for the shallow water equations newly
included in NEMO v4.2.
This module is an early demonstrator of the changes to the time-stepping scheme that are planned for the
next release.
It contains shallow water time-stepping routines for both the Modified Leap Frog (stpmlf.F90) and the 3rd
order Runge-Kutta (stprk3.F90) time schemes.
Note that in the MLF case, the ``key_qco`` compilation pre-processing key is needed.
RK3 requires both ``key_qco`` and either the ``key_RK3`` or the ``key_RK3all`` compilation pre-processing
key.

RK3 time-stepping is a 3-stage algorithm: the 1st and 2nd stages build estimates of intermediate fields, and
the 3rd stage integrates the model from n to n+1 using these intermediate estimates.
With ``key_RK3all``, all the stages are integrated with the full suite of ocean dynamics, including
advection, vorticity, lateral diffusion, pressure gradient, and forcing.
With ``key_RK3``, lateral diffusion and forcing are instead removed from stages 1 and 2.
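To make the structure of such a scheme concrete, a generic 3-stage sketch is given below.
The stage coefficients (one third, one half, then the full time step) are the classical choice for this kind
of algorithm and are an assumption here, not taken from stprk3.F90; RHS stands for the sum of the tendency
terms computed at a given stage (with ``key_RK3``, lateral diffusion and forcing would thus only enter the
RHS of the final stage):

.. math::

   x^{(1)} &= x^{n} + \frac{\Delta t}{3}\,\mathrm{RHS}\left(x^{n}\right) \\
   x^{(2)} &= x^{n} + \frac{\Delta t}{2}\,\mathrm{RHS}\left(x^{(1)}\right) \\
   x^{n+1} &= x^{n} + \Delta t\,\mathrm{RHS}\left(x^{(2)}\right)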