Hi all,

I tried to run OpenIFS 43r3 with TCO95, and I'm using PEXTRA fields for the first time. The run crashed with a 'Floating point exception (core dumped)'.

Is there anything else I need to know about PEXTRA, or about a transformation, that should be taken into account here?

Has anyone encountered a similar problem?

Many thanks for any help!

Best regards, 

Julian Krüger

9 Comments

  1. Hi Julian,

    The 'PEXTRA' facility in IFS has to be set up carefully; it's a tool that the research department here use but it's not integrated into the namelists the same way as the normal output fields are.

    There are several steps involved, they are described in more detail on this page: How to control OpenIFS output, section "Physical tendencies and fluxes (budget) output (PEXTRA)"

    The steps are:

    1. Set LBUD23=.true. in the namelist (this turns on the extra diagnostics output but DOES NOT define which fields NOR allocate storage)

    2. Set NVEXTR & NCEXTR in namelist NAMDPHY. This creates the array space in the model for the extra fields.

    3. Set the required GRIB codes to be output, by setting NVEXTRAGB. You MUST make sure that the number of grib codes matches the value used in NVEXTR.

    4. None of the above steps actually causes the fields to be written. The last step is setting MFP3DFS in the NAMFPC namelist, which uses the GRIB codes defined above.

    If you've checked all these steps against the link above and it looks correct, attach the model logfile and the crash report to this forum post and I'll take a look.


    Best regards,  Glenn

  2. Hi Glenn

    Julian and I spoke this morning and I agreed to have a look as well.

    I tried all your suggestions above, but the error remains. I should say that we are using OpenIFS + XIOS, so not the normal GRIB output.

    I have set the following in my fort.4:


    fort.4


    &NAMFPC
        CFPFMT = 'MODEL'
        NFP3DFS = 4
        NFP3DFT = 0
        NFP3DFV = 0
        NFP2DF = 2
        MFP2DF = 129, 152
        NFP3DFP = 4
        MFP3DFP = 131, 132, 130, 133
        MFP3DFS = 91, 92, 93, 94
        NFPPHY = 34
        MFPPHY(:) = 167, 235, 151, 134, 165, 166, 168, 228, 144, 143, 182, 44,
                    180, 181, 146, 147, 175, 176, 169, 177, 210, 211, 164, 212,
                    178, 179, 208, 209, 78, 79, 205, 206, 151131, 151132
        RFP3P(:) = 100000.0, 92500.0, 85000.0, 50000.0, 30000.0, 10000.0, 5000.0,
                   1000.0, 500.0, 100.0, 50.0, 10.0
        NRFP3S = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
                 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
                 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
                 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,
                 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
                 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91
        NFPCLI = 0
        LFPQ = .false.
        LTRACEFP = .false.
        RFPCORR = 60000.0
    /
    &NAMDPHY
        NVEXTR = 4
        NCEXTR = 91
    /
    &NAMPHYDS
        NVEXTRAGB(1:4) = 91, 92, 93, 94
    /

    And I've activated pextra_91, pextra_92, pextra_93 and pextra_94 in the file_def_ifs.xml file.

    This should give dynamics tendencies of u,v,t,q on all 91 model levels, but the error is "floating-point overflow"  with the traceback:

     79: forrtl: error (78): process killed (SIGTERM)
     79: Image              PC                Routine            Line        Source
     79: oifs               00000000027A6994  Unknown               Unknown  Unknown
     79: libpthread-2.17.s  00002AAAB423D630  Unknown               Unknown  Unknown
     79: libpsm2.so.2.2     00002AAB7D6F3D59  Unknown               Unknown  Unknown
     79: libpsm2.so.2.2     00002AAB7D6EDC8D  psm2_mq_ipeek         Unknown  Unknown
     79: libpsmx2-fi.so     00002AAB7D473A53  Unknown               Unknown  Unknown
     79: libpsmx2-fi.so     00002AAB7D475966  Unknown               Unknown  Unknown
     79: libmpi.so.12.0.0   00002AAAB36D727C  Unknown               Unknown  Unknown
     79: libmpi.so.12.0.0   00002AAAB2FF4777  Unknown               Unknown  Unknown
     79: libmpi.so.12.0.0   00002AAAB37181AE  PMPI_Wait             Unknown  Unknown
     79: libmpifort.so.12.  00002AAAB2B91C3D  mpi_wait              Unknown  Unknown
     79: oifs               0000000000A32470  mpl_wait_mod_mp_m         193  mpl_wait_mod.F90
     79: oifs               0000000001808BA8  trltog_mod_mp_trl         786  trltog_mod.F90
     79: oifs               0000000001801D8B  trltog_mod_mp_trl          88  trltog_mod.F90
     79: oifs               0000000001800538  ftinv_ctl_mod_mp_         258  ftinv_ctl_mod.F90
     79: oifs               0000000000A55902  inv_trans_ctl_mod         279  inv_trans_ctl_mod.F90
     79: oifs               0000000000871B52  inv_trans_                600  inv_trans.F90
     79: oifs               0000000000B642EB  transinv_mdl_             286  transinv_mdl.F90
     79: oifs               0000000000B5A5C3  transinvh_                114  transinvh.F90
     79: oifs               000000000046507D  stepo_                    265  stepo.F90
     79: oifs               000000000046124E  cnt4_                    1140  cnt4.F90
     79: oifs               00000000004369DD  cnt3_                     278  cnt3.F90
     79: oifs               0000000000435902  cnt2_                      88  cnt2.F90
     79: oifs               00000000004357F8  cnt1_                      92  cnt1.F90
     79: oifs               0000000000434A8F  cnt0_                     146  cnt0.F90
     79: oifs               00000000004202D9  MAIN__                    129  master.F90
     79: oifs               000000000041FFE2  Unknown               Unknown  Unknown
     79: libc-2.17.so       00002AAAB446C555  __libc_start_main     Unknown  Unknown
     79: oifs               000000000041FEE9  Unknown               Unknown  Unknown

    The error seems to be in the MPI communications associated with inverse spectral transforms. But that might just be where the FPE is caught and the real error is somewhere else.

    We are using the famously problematic Intel MPI 2019, but it always worked for uncoupled runs before so it should be ok with PEXTRA turned on too.

    Full log file and namelist are attached.

    Cheers
    Joakim




  3. Hi Joakim, Julian,

    I did wonder if you had already done the required steps correctly but it's always the first thing to ask and check.

    At first look I can't see anything wrong in what you are doing. I'm not used to seeing the model blow up with an overflow in the transforms though.

    How many steps has the model run?

    I'll take your namelist file and try to reproduce your crash in a low-res standalone test I have. If that doesn't work then I'll have something to investigate. I'll try to do it today but I'm in a meeting all afternoon.

    Perhaps you could try just setting a single pextra variable instead of 4 and see if that makes any difference?

      Cheers,  Glenn

  4. Hi Julian,

    I can reproduce an abort from the model using your namelist settings. Can you please try defining all the GRIB codes instead of just 4? I vaguely remember they all need defining if you enable LBUD23, as the code writes to the arrays even if you don't ask for them to be output. So if not all are defined, you get memory overwrites. I've had a busy week of meetings so I've not had a chance to check the code, but try that and let me know?

    Cheers,  Glenn

  5. Hi Glenn

    Thanks for checking! I will run this test tomorrow or Thursday. We've got some big HPC maintenance running for a few days now.

    Cheers
    Joakim

  6. Hi Julian, Joakim,

    I've had more time to check this. I can confirm that you need to define ALL of the grib codes if you enable LBUD23. If I do this, the model works ok. You can choose what to output but you must set NVEXTR & NVEXTRAGB to define all the budget tendencies.

    Have a look at the code in src/ifs/phys_ec/callpar.F90. This is where the LBUD23 diagnostics are written and it's clear that once LBUD23 is enabled, all of the tendencies will be written.

    One remaining issue is that the code appears to be writing to 25 budget variables, not the 24 listed on the OpenIFS User Guide. This might be a change between OpenIFS 40r1 & 43r3. I've contacted someone internally to double check this and I'll get back to you as soon as possible.

    Cheers,  Glenn


  7. Hi Julian, Joakim,

    Richard Forbes has got back to me about the extra budget diagnostics activated with the LBUD23 namelist switch.   This confirms:

    • you must define all GRIB codes with NVEXTR & NVEXTRAGB even if you don't intend to write all of them out.
    • there are 25 budget variables, not the 24 listed on the OpenIFS user guide (I will update this shortly).

    I attach an updated document but, confusingly, this lists 26 budget variables, though only 25 are defined in the code. Note that the 25th (grib code 115) is actually a 3D field made up of individual 2D fields.
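
    As a rough sketch of what "define all of them" could look like in fort.4: this assumes the 25 budget GRIB codes are the contiguous range 91 to 115 (91 to 94 are the ones used earlier in this thread and 115 is the 25th mentioned above; the codes in between are an assumption here) and keeps NCEXTR at the 91 model levels used in the namelist posted earlier. Please check it against the attached list before relying on it.

    ```fortran
    &NAMDPHY
        ! Sketch only, not a verified configuration.
        NVEXTR = 25   ! allocate space for all 25 budget fields, not just those output
        NCEXTR = 91   ! number of model levels, as in the fort.4 posted earlier
    /
    &NAMPHYDS
        ! Assumption: codes are contiguous from 91 to 115; the count must equal NVEXTR.
        NVEXTRAGB(1:25) = 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
                          101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
                          111, 112, 113, 114, 115
    /
    ```

    The output selection does not need to change: MFP3DFS in NAMFPC (or the pextra entries in file_def_ifs.xml when using XIOS) can still list only the fields you actually want written.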

    If you still have problems, let me know.

    Glenn

    ifs_LBUD23_extra_diagnostics_list.pdf

  8. Hi Glenn

    I think I've got a running configuration now. As you said, I had to define all 25 variables in NVEXTRAGB etc.
    I also found that I had to update yomxios.F90 to support field 115.

    (base) blogin5:~/model_codes/oifs-43r3-v1/oifs-43r3-v1 $ git diff
    diff --git a/src/ifs/module/yomxios.F90 b/src/ifs/module/yomxios.F90
    index 3e60047..578f6c2 100644
    --- a/src/ifs/module/yomxios.F90
    +++ b/src/ifs/module/yomxios.F90
    @@ -107,19 +107,19 @@ INTEGER(KIND=JPIM), PARAMETER :: IGRB3DFLD(N3DFLD)=&
         &                  157,                203,                246,                247,                248/)
     
     ! 3D PEXTRA fields ids and grib codes
    -INTEGER(KIND=JPIM), PARAMETER :: NPEXTRAFLD=24
    +INTEGER(KIND=JPIM), PARAMETER :: NPEXTRAFLD=25 !24
     CHARACTER (LEN=16), PARAMETER :: CPEXTRAFLD(NPEXTRAFLD)=&
         &(/ 'pextra_91       ', 'pextra_92       ', 'pextra_93       ', 'pextra_94       ', 'pextra_95       ', &
         &   'pextra_96       ', 'pextra_97       ', 'pextra_98       ', 'pextra_99       ', 'pextra_100      ', &
         &   'pextra_101      ', 'pextra_102      ', 'pextra_103      ', 'pextra_104      ', 'pextra_105      ', &
         &   'pextra_106      ', 'pextra_107      ', 'pextra_108      ', 'pextra_109      ', 'pextra_110      ', &
    -    &   'pextra_111      ', 'pextra_112      ', 'pextra_113      ', 'pextra_114      '/)
    +    &   'pextra_111      ', 'pextra_112      ', 'pextra_113      ', 'pextra_114      ', 'pextra_115      '/)
     INTEGER(KIND=JPIM), PARAMETER :: IGRBPEXTRAFLD(NPEXTRAFLD)=&
         &(/                 91,                 92,                 93,                 94,                 95, &
         &                   96,                 97,                 98,                 99,                100, &
         &                  101,                102,                103,                104,                105, &
         &                  106,                107,                108,                109,                110, &
    -    &                  111,                112,                113,                114/)
    +    &                  111,                112,                113,                114,                115 /)
     
     ! PEXTRA setup variables
     LOGICAL :: LBUD23=.FALSE.

    I also added the new "pextra_115" to both field_def_ifs.xml and file_def_ifs.xml.

    With those changes, I can get PEXTRA on both reduced and regular grids.

    Thanks again for the help!
    /Joakim

  9. Hi Joakim,

    Happy New Year! Glad it all works, and thanks for those changes to the XIOS interface code and files. I will add those into the next release of OpenIFS 43r3, which should be out in February.

    Cheers,  Glenn