Using PSyclone

Overview

This section contains step-by-step instructions that demonstrate an application of the PSyclone-based source-code processing option available in the NEMO build system. It assumes that PSyclone has been correctly installed as detailed in the NEMO installation guide and that an appropriate arch file has been generated (hereafter referred to as arch-auto.fcm).

This initial application is simply a PSyclone passthrough of a NEMO configuration. That is, the source code is processed through PSyclone but the resultant code is functionally equivalent to the original source (although standardisation of some F90 constructs will be carried out). Later additions to this guide will illustrate how to add transformation scripts to this process to perform complex tasks such as identifying computational kernels and inserting OpenACC directives for GPU offloading.

PSyclone processing of the NEMO source code is available as an option in the NEMO build system, and PSyclone transformations can be enabled via the option

-p <PSyclone processing option>

of the ./makenemo command (-p all lists the available PSyclone transformations). The options correspond to transformation scripts in directory sct/, with the exception of passthrough, which is supported internally. This initial part of the PSyclone-processing guide demonstrates the use of the passthrough option from scratch: compiling a model configuration with and without PSyclone passthrough, running the model, and verifying the passthrough.

Compilation and running of the BENCH test case

Step 1 - Compile the BENCH test case with and without PSyclone passthrough

At the top level of the NEMO repository,

$ cd nemo

a reference configuration of the BENCH test case can be compiled:

$ ./makenemo -m auto -a BENCH -n BENCH_0 -j 8 -v 1

Next, a corresponding configuration with PSyclone passthrough can be built:

$ ./makenemo -m auto -a BENCH -n BENCH_PT -j 8 -v 1 -p passthrough

For demonstration purposes, a simple, non-invasive PSyclone transformation script can alternatively be enabled:

$ ./makenemo -m auto -a BENCH -n BENCH_INFO -j 8 -v 1 -p list_symbols

This variant produces the same source code as the passthrough, but during the build process it outputs the names of all variables defined in the majority of the Fortran modules and in the associated module procedures.

Step 2 - Prepare a suitable configuration

For testing, a BENCH configuration with a small domain (ORCA2 equivalent) and a low number of time steps (80) can be generated with:

$ sed -e 's/nn_itend.*/nn_itend=80/' -e 's/nn_isize.*/nn_isize=180/' -e 's/nn_jsize.*/nn_jsize=148/' -e 's/nn_ksize.*/nn_ksize=31/' -e 's/ln_timing.*/ln_timing=.true./' -e '/\&namctl/asn_cfctl%l_runstat=.true.' tests/BENCH/EXPREF/namelist_cfg_orca1_like > ./namelist_cfg
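
Before copying the namelist into the experiment directories, it can be worth confirming that every substitution took effect. The following is a minimal sketch rather than part of the NEMO tooling: the function name show_bench_settings is hypothetical, and it only assumes that the generated file contains the six entries set by the sed command above.

```shell
# Hypothetical helper (not part of NEMO): print the namelist entries
# that the sed command above is expected to have set.
show_bench_settings() {
    # $1: path to the generated namelist (./namelist_cfg in this guide)
    grep -E '^[[:space:]]*(nn_itend|nn_isize|nn_jsize|nn_ksize|ln_timing|sn_cfctl%l_runstat)[[:space:]]*=' "$1"
}
```

After defining the function in the current shell, it can be applied to the generated file:

$ show_bench_settings ./namelist_cfg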

Step 3 - Prepare a submission script (if required)

In principle, it would suffice to run the model inside the experiment directories tests/BENCH_{0,PT,INFO}/EXP00/ with mpirun -n 4 ./nemo or similar. However, since NEMO runs are typically submitted on an HPC system via a job scheduler, the next step assumes that a script ./submit.sh contains the necessary system-specific settings and commands (in effect starting the executable ./nemo in an MPI environment):

$ vi submit.sh; chmod u+x ./submit.sh

Step 4 - Run NEMO BENCH_0 and BENCH_PT

Next, the two model runs can be started:

$ cp namelist_cfg submit.sh tests/BENCH_0/EXP00/
$ cd tests/BENCH_0/EXP00/
$ <job submission command> ./submit.sh
$ cd -
$ cp namelist_cfg submit.sh tests/BENCH_PT/EXP00/
$ cd tests/BENCH_PT/EXP00/
$ <job submission command> ./submit.sh
$ cd -
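
The same copy-and-submit sequence can be written as a loop, which is convenient when further configurations (for example BENCH_INFO) are added. This is only a sketch under stated assumptions: the function name submit_bench_runs is hypothetical, the job submission command is passed in as the first argument because it is site specific, and the function is assumed to be called from the top level of the NEMO repository.

```shell
# Hypothetical helper (not part of NEMO): copy the inputs into each
# experiment directory and submit the job there.
# $1      : site-specific job submission command (an assumption; replace
#           with whatever the local scheduler provides)
# $2 ...  : configuration names, e.g. BENCH_0 BENCH_PT
submit_bench_runs() {
    submit_cmd=$1
    shift
    for cfg in "$@"; do
        exp_dir="tests/${cfg}/EXP00"
        cp namelist_cfg submit.sh "${exp_dir}/" || return 1
        # Subshell: the caller's working directory is left unchanged.
        # submit_cmd is deliberately unquoted so that a command with
        # options is split into separate words.
        ( cd "${exp_dir}" && ${submit_cmd} ./submit.sh ) || return 1
    done
}
```

With the function defined in the current shell, both runs are started by:

$ submit_bench_runs "<job submission command>" BENCH_0 BENCH_PT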

Verification and source-code inspection

Step 5 - Verify the PSyclone passthrough

Once the runs have finished, comparison of model output from the model builds with and without PSyclone passthrough,

$ vimdiff tests/BENCH_{0,PT}/EXP00/run.stat

should reveal identical results, since the passthrough is intended to leave the behaviour of the model unchanged.
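
For scripted or unattended checks, a non-interactive comparison may be preferable to vimdiff. The sketch below is an illustration, not part of the NEMO tooling: the function name compare_run_stat is hypothetical, and it relies only on the standard cmp and diff utilities to report whether two run.stat files are byte-for-byte identical.

```shell
# Hypothetical helper (not part of NEMO): compare two run.stat files.
# Returns 0 if identical, 1 if they differ, 2 if a file is missing.
compare_run_stat() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "missing file: $1 or $2" >&2
        return 2
    fi
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "DIFFERENT: $1 $2"
        # Show only the first differences to keep the output short
        diff "$1" "$2" | head -n 20
        return 1
    fi
}
```

With the function defined in the current shell, the two runs of this guide are compared by:

$ compare_run_stat tests/BENCH_{0,PT}/EXP00/run.stat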

Step 6 - Inspect the source code for the effect of the PSyclone passthrough

With PSyclone processing, the build system processes the NEMO source code in three stages (four with AGRIF): CPP preprocessing, PSyclone processing, and the actual compilation. For the BENCH_PT configuration, the original source-code files are linked in directory tests/BENCH_PT/WORK, the CPP-preprocessed versions can be found in directory tests/BENCH_PT/BLD_SCT_PSYCLONE/ppsrc/nemo/, and the PSyclone-processed files supplied to the Fortran compiler are in tests/BENCH_PT/BLD_SCT_PSYCLONE/obj/. For example, differences between the three source-code variants of module usrdef_sbc can be visualised with:

$ vimdiff tests/BENCH_PT/WORK/usrdef_sbc.F90 tests/BENCH_PT/BLD_SCT_PSYCLONE/{ppsrc/nemo,obj}/usrdef_sbc.f90

The substantial transformation of the WHERE construct that starts at line 178 of the original file demonstrates the normalisation aspect of the PSyclone processing stage. The build stages take this code block from its original form:

DO jl = 1, jpl
   WHERE    ( phs(A2D(0),jl) <= 0._wp .AND. phi(A2D(0),jl) <  0.1_wp )     ! linear decrease from hi=0 to 10cm
      qtr_ice_top(:,:,jl) = qsr_ice(:,:,jl) * ( ztri(:,:) + ( 1._wp - ztri(:,:) ) * ( 1._wp - phi(A2D(0),jl) * 10._wp ) )
   ELSEWHERE( phs(A2D(0),jl) <= 0._wp .AND. phi(A2D(0),jl) >= 0.1_wp )     ! constant (ztri) when hi>10cm
      qtr_ice_top(:,:,jl) = qsr_ice(:,:,jl) * ztri(:,:)
   ELSEWHERE                                                         ! zero when hs>0
      qtr_ice_top(:,:,jl) = 0._wp
   END WHERE
ENDDO

through normal CPP macro expansion, to:

DO jl = 1, jpl
   WHERE    ( phs(Nis0-(0):Nie0+(0),Njs0-(0):Nje0+(0),jl) <= 0._wp .AND. phi(Nis0-(0):Nie0+(0),Njs0-(0):Nje0+(0),jl) <  0.1_wp )     ! linear decrease from hi=0 to 10cm
      qtr_ice_top(:,:,jl) = qsr_ice(:,:,jl) * ( ztri(:,:) + ( 1._wp - ztri(:,:) ) * ( 1._wp - phi(Nis0-(0):Nie0+(0),Njs0-(0):Nje0+(0),jl) * 10._wp ) )
   ELSEWHERE( phs(Nis0-(0):Nie0+(0),Njs0-(0):Nje0+(0),jl) <= 0._wp .AND. phi(Nis0-(0):Nie0+(0),Njs0-(0):Nje0+(0),jl) >= 0.1_wp )     ! constant (ztri) when hi>10cm
      qtr_ice_top(:,:,jl) = qsr_ice(:,:,jl) * ztri(:,:)
   ELSEWHERE                                                         ! zero when hs>0
      qtr_ice_top(:,:,jl) = 0._wp
   END WHERE
ENDDO

and PSyclone transformation to:

do jl = 1, jpl, 1
  do widx2 = 1, Nje0 + 0 - (Njs0 - 0) + 1, 1
    do widx1 = 1, Nie0 + 0 - (Nis0 - 0) + 1, 1
      if (phs(Nis0 - 0 + widx1 - 1,Njs0 - 0 + widx2 - 1,jl) <= 0._wp .AND. &
         &phi(Nis0 - 0 + widx1 - 1,Njs0 - 0 + widx2 - 1,jl) <  0.1_wp) then
          qtr_ice_top(LBOUND(qtr_ice_top, dim=1) + widx1 - 1,LBOUND(qtr_ice_top, dim=2) + widx2 - 1,jl) = &
         &    qsr_ice(LBOUND(qsr_ice,     dim=1) + widx1 - 1,LBOUND(qsr_ice,     dim=2) + widx2 - 1,jl) * &
         &      (ztri(LBOUND(ztri,        dim=1) + widx1 - 1,LBOUND(ztri,        dim=2) + widx2 - 1) + &
         & (1._wp - ztri(LBOUND(ztri, dim=1) + widx1 - 1,LBOUND(ztri, dim=2) + widx2 - 1)) * &
         & (1._wp - phi(Nis0 - 0 + widx1 - 1,Njs0 - 0 + widx2 - 1,jl) * 10._wp))
      else
        if (phs(Nis0 - 0 + widx1 - 1,Njs0 - 0 + widx2 - 1,jl) <= 0._wp .AND. &
           &phi(Nis0 - 0 + widx1 - 1,Njs0 - 0 + widx2 - 1,jl) >= 0.1_wp) then
            qtr_ice_top(LBOUND(qtr_ice_top, dim=1) + widx1 - 1,LBOUND(qtr_ice_top, dim=2) + widx2 - 1,jl) = &
           &    qsr_ice(LBOUND(qsr_ice,     dim=1) + widx1 - 1,LBOUND(qsr_ice,     dim=2) + widx2 - 1,jl) * &
           &       ztri(LBOUND(ztri,        dim=1) + widx1 - 1,LBOUND(ztri,        dim=2) + widx2 - 1)
        else
          qtr_ice_top(LBOUND(qtr_ice_top, dim=1) + widx1 - 1,LBOUND(qtr_ice_top, dim=2) + widx2 - 1,jl) = 0._wp
        end if
      end if
    enddo
  enddo
enddo

where the latter has been manually reformatted for readability.
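
To get a rough impression of how strongly the PSyclone stage rewrites each file, the preprocessed and PSyclone-processed versions can be compared across the whole build directory. The sketch below is an illustration only: the function name count_psyclone_changes is hypothetical, and it assumes that files in ppsrc/nemo/ and obj/ share the same base name, as in the usrdef_sbc.f90 example above. Since PSyclone regenerates the code, most files can be expected to differ textually even for a passthrough, so the line counts are a measure of reformatting, not of functional change.

```shell
# Hypothetical helper (not part of NEMO): for every preprocessed file,
# print the number of diff lines against its PSyclone-processed
# counterpart, followed by the file name.
count_psyclone_changes() {
    # $1: build directory, e.g. tests/BENCH_PT/BLD_SCT_PSYCLONE
    bld_dir=$1
    for pp in "${bld_dir}"/ppsrc/nemo/*.f90; do
        [ -e "$pp" ] || continue          # no match for the glob
        name=$(basename "$pp")
        out="${bld_dir}/obj/${name}"
        [ -f "$out" ] || continue          # no processed counterpart
        # Count added and removed lines; grep -c exits non-zero on a
        # count of 0, hence the "|| true"
        n=$(diff "$pp" "$out" | grep -c '^[<>]' || true)
        printf '%6d %s\n' "$n" "$name"
    done
}
```

With the function defined in the current shell, the most heavily rewritten files are listed first by:

$ count_psyclone_changes tests/BENCH_PT/BLD_SCT_PSYCLONE | sort -rn | head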