- Created by Glenn Carver, last modified on May 11, 2020
This page has a list of known issues for the Intel compilers (all versions).
OpenIFS 43r3 fails in WAM if compiled with intel/2017.0.3 compiler
OpenIFS 43r3 is known to fail in the wave model (WAM) if compiled with intel/17.0.3. Users may see a failure message like:
forrtl: error (65): floating invalid
and a traceback list that shows the error in:
master.exe 0000000001BDBC03 Unknown Unknown Unknown
master.exe 0000000001A5282A wsigstar_ 86 wsigstar.F
It's possible this is related to the error below noted with intel/2017.4.
We recommend using either intel/2018 or intel/2019 which are known to work (OpenIFS is not tested against older intel compilers 2015 & 2016).
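As a quick guard before building OpenIFS 43r3, the version notes above can be turned into a small check. This is only a sketch: the function name intel_version_status is hypothetical, and the version string has to be supplied by hand (for example from the loaded module name or from the compiler's own version output).

```shell
# Classify an Intel compiler version string against the notes on this page.
# Hypothetical helper; it covers only the versions mentioned for OpenIFS 43r3.
intel_version_status() {
  case "$1" in
    17.*|2017*)
      echo "affected" ;;        # WAM failures reported with intel/2017
    18.*|19.*|2018*|2019*)
      echo "known to work" ;;   # recommended versions
    *)
      echo "not covered" ;;     # older or newer versions are not described here
  esac
}

intel_version_status "17.0.3"
```

Both the short form (17.0.3) and the module-style form (2017.0.3) are matched, because the page uses both.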
OpenIFS 43r3 fails in WAM if compiled with intel/2017.4 and impi/2018.4
OpenIFS 43r3 is known to fail in the wave model (WAM) if compiled with OpenMP enabled when using intel/2017.4 and impi/2018.4. The model works correctly with intel/2018 and intel/2019.
If using intel/2017 is necessary, a workaround is to compile with OpenMP disabled.
Users may see the following error from the wave model in OpenIFS:
longjmp causes uninitialized stack frame ***: ./master.exe terminated
OpenIFS 40r1v2: Intel compiler 16.0.0 fails in sufa.F90
A user has reported that OpenIFS 40r1v2 fails to compile with the Intel compiler version 16.0.0 with the error:
[FAIL] /tmp/oifs40r1v2/src/ifs/setup/sufa.F90(296): error #5521: A derived-type object in an input/output list cannot have inaccessible components unless a suitable user-defined input/output procedure is available.
[FAIL] WRITE(UNIT=NULOUT,FMT='(3(''( '',A16,1X,I3,'') ''))') &
[FAIL] ^
[FAIL] compilation aborted for /tmp/oifs40r1v2/src/ifs/setup/sufa.F90 (code 1)
It's recommended to use a later version of the Intel compiler. Version 16.0.3 has been tested and is known to work.
Use of MKL library can cause irreproducible results
OpenIFS includes a compilation configuration for the Intel compiler with the Intel MKL library (for optimized LAPACK/BLAS). However, please be aware that use of this library can make model results irreproducible, even between successive runs on the same core count. We recommend not using it if reproducibility is a concern.
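One way to find out whether a particular build is affected is to run the same experiment twice and compare the output byte for byte. The sketch below assumes the two runs wrote their output into separate directories and that those directories hold only model output (log files with timings will always differ). The function name compare_runs is hypothetical.

```shell
# Compare every file in the first run directory with the file of the
# same name in the second. Returns 0 only if all files are identical.
compare_runs() {
  dir_a=$1
  dir_b=$2
  status=0
  for file_a in "$dir_a"/*; do
    [ -e "$file_a" ] || continue          # skip if the directory is empty
    name=$(basename "$file_a")
    if ! cmp -s "$file_a" "$dir_b/$name"; then
      echo "DIFFERS: $name"               # also reported if the file is missing in dir_b
      status=1
    fi
  done
  return "$status"
}
```

For example, `compare_runs run1 run2 && echo "runs are identical"`, where run1 and run2 are placeholder directory names.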
OpenIFS only provides an MKL compilation configuration for the Intel compiler. Linking MKL with other compilers is possible, though complicated, and has not been tried or tested with OpenIFS.
For help with linking the MKL library with other compilers, please see: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
OpenIFS can fail with Intel compiler at -O2
There is an issue with OpenIFS when compiling with the Intel compiler at optimization level -O2 or above on chipsets that support SSE4.1 & AVX instructions. Intel compilers are generally more aggressive at optimisations for -O2 than other compilers.
Users will see failure with the T21 test job similar to the following:
signal_harakiri(SIGALRM=14): New handler installed at 0x432110; old preserved at 0x0
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 6.18
myproc#1,tid#1,pid#27600,signal#8(SIGFPE): Received signal :: 123MB (heap), 125MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 6.18
tid#1 starting drhook traceback, time = 6.18
myproc#1,tid#1,pid#27600: MASTER
myproc#1,tid#1,pid#27600: CNT0<1>
myproc#1,tid#1,pid#27600: CNT1
myproc#1,tid#1,pid#27600: CNT2
myproc#1,tid#1,pid#27600: CNT3
myproc#1,tid#1,pid#27600: CNT4
myproc#1,tid#1,pid#27600: STEPO
myproc#1,tid#1,pid#27600: SCAN2H
myproc#1,tid#1,pid#27600: SCAN2M
myproc#1,tid#1,pid#27600: GP_MODEL
myproc#1,tid#1,pid#27600: EC_PHYS_DRV
myproc#1,tid#1,pid#27600: >OMP-PHYSICS CLDPP T/S (1002)
myproc#1,tid#1,pid#27600: EC_PHYS
myproc#1,tid#1,pid#27600: CALLPAR
myproc#1,tid#1,pid#27600: SLTEND
It arises because the compiler uses 2-way vectorization for IF statements and computes both branches. This can raise a floating point exception if a divide by zero is possible in the branch that would otherwise not be executed and the IFS internal signal handler (DRHOOK) is enabled.
There are several possible workarounds:
- Compile the routines that cause the problem with lower optimisation, -O1. The routines affected are: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
- Run with the environment variable DR_HOOK_IGNORE_SIGNALS=8 to disable trapping of floating point exception signals (SIGFPE) by the model. This is not ideal as it will not catch other causes of floating point exceptions.
- Edit the code and insert the line:
!DEC$ OPTIMIZE:1
directly after the SUBROUTINE statement in the routines: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
- Edit the intel-*.cfg configuration files in make/cfg and add lines to change the compile options specifically for these files.
OpenIFS uses a default of -O1 in the configuration files. If you increase the optimisation level, please be aware of this issue.
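The environment-variable workaround listed above can be applied in the shell or job script that launches the model. This is a minimal sketch: the launch line is only a placeholder and is left commented out, since the real command depends on your run script and MPI launcher.

```shell
# Stop DrHook trapping signal 8 (SIGFPE) for everything started from this shell.
export DR_HOOK_IGNORE_SIGNALS=8

# Placeholder launch line; replace with your usual run command.
# ./master.exe

echo "DR_HOOK_IGNORE_SIGNALS=${DR_HOOK_IGNORE_SIGNALS}"
```

Because the variable is exported, it applies to the whole session. Unset it again (`unset DR_HOOK_IGNORE_SIGNALS`) once the affected routines are compiled at -O1, so that genuine floating point exceptions are trapped as before.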
For more help with this issue, please contact openifs-support@ecmwf.int.
OpenIFS fails writing GRIB if grib_api compiled with Intel and -O2
We are aware of a problem in grib_api when using the Intel compiler that seems to affect different versions of grib_api and causes the model to fail with a floating point exception (SIGFPE). This is known to happen in the routine PRESET_GRIB_TEMPLATE or in GRIB_F_SET_REAL8_ARRAY in the grib_api library. The advice is to reduce the optimization level when compiling grib_api to -O1 rather than -O2, or to try a more recent version of the Intel compiler.
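As a rough sketch of lowering the optimisation level, the compiler flags can be set in the environment before configuring grib_api. This assumes an autotools-style grib_api source tree that honours the standard CC, FC, CFLAGS and FCFLAGS variables, and that the Intel compilers are available as icc and ifort. The install prefix in the commented line is a placeholder.

```shell
# Select the Intel compilers (assumed to be on PATH as icc and ifort).
export CC=icc
export FC=ifort

# Build both the C and Fortran parts of grib_api at -O1 instead of -O2.
export CFLAGS="-O1"
export FCFLAGS="-O1"

# Then, from the grib_api source directory (placeholder prefix):
#   ./configure --prefix="$HOME/grib_api" && make && make install
```

After rebuilding, OpenIFS needs to be relinked against the new grib_api library for the change to take effect.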
The error message that typifies this problem is:
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 3.10
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 3.10
[myproc#1,tid#1,pid#14063]: MASTER
[myproc#1,tid#1,pid#14063]: CNT0<1>
[myproc#1,tid#1,pid#14063]: SU0YOMB
[myproc#1,tid#1,pid#14063]: SU_GRIB_API
[myproc#1,tid#1,pid#14063]: PRESET_GRIB_TEMPLATE
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0
or a traceback like this:
[gdb__sigdump] : Received signal#8(SIGFPE), pid=-1
[LinuxTraceBack]: Backtrace(s) for program 'oifs38r1/make/intel_mkl-opt_conv/oifs/bin/master.exe' (pid=38451) :
(pid=38451): oifs38r1/src_conv/ifsaux/utilities/linuxtrbk.c:109 : master.exe() [0xc14a2d]
(pid=38451): oifs38r1/src_conv/ifsaux/support/drhook.c:884 : master.exe() [0xac8ddb]
(pid=38451): <Unknown> : libpthread.so.0(+0xf7e0) [0x7f59e215b7e0]
(pid=38451): <Unknown> : libgrib_api.so.0(log.L+0x23c) [0x7f59e60db98c]
(pid=38451): <Unknown> : libgrib_api.so.0(+0xa7de4) [0x7f59e60a7de4]
(pid=38451): <Unknown> : libgrib_api.so.0(+0x9c6d4) [0x7f59e609c6d4]
(pid=38451): <Unknown> : libgrib_api.so.0(grib_pack_double+0x18) [0x7f59e6079847]
(pid=38451): <Unknown> : libgrib_api.so.0(+0xc4814) [0x7f59e60c4814]
(pid=38451): <Unknown> : libgrib_api.so.0(+0xc4890) [0x7f59e60c4890]
(pid=38451): <Unknown> : libgrib_api.so.0(grib_set_double_array_internal+0x68) [0x7f59e60c4921]
(pid=38451): <Unknown> : libgrib_api.so.0(+0xa3a4a) [0x7f59e60a3a4a]
(pid=38451): <Unknown> : libgrib_api.so.0(grib_pack_double+0x18) [0x7f59e6079847]
(pid=38451): <Unknown> : libgrib_api.so.0(+0xc4814) [0x7f59e60c4814]
(pid=38451): <Unknown> : libgrib_api.so.0(+0xc4890) [0x7f59e60c4890]
(pid=38451): <Unknown> : libgrib_api.so.0(+0xc4b3f) [0x7f59e60c4b3f]
(pid=38451): <Unknown> : libgrib_api_f90.so.0(grib_f_set_real8_array_+0x51) [0x7f59e6380aea]
(pid=38451): <Unknown> : libgrib_api_f90.so.0(grib_api_mp_grib_set_real8_array_+0x8a) [0x7f59e63858af]
(pid=38451): oifs38r1/src_conv/ifsaux/module/grib_api_interface.F90:358 : master.exe() [0xb03bbd]
Note that the grib packing can also fail if the model has produced fields with a very large range of values, such that the grib library can't pack the values into a smaller bit range. For further help, please contact openifs-support@ecmwf.int.
2 Comments
Victoria Sinclair
I think I am encountering the irreproducibility issue. I have compiled with
export OIFS_COMP=intel_mkl
export OIFS_BUILD=opt
I have only seen this twice, both in long term (10 year) aqua-planet simulations. I have run lots of weather forecast type simulations and have not encountered this with the same compilation. The first time I saw this irreproducible behaviour the model climate was statistically the same but the run was not identical, and therefore I wasn't too concerned. However the second time, when I re-ran the simulation, the model climate was completely different - the Hadley cell moves polewards and significantly strengthens! I am fairly sure I haven't changed anything else, but is this possible? What sort of irreproducibility issues have you encountered before? I think the long term solution is to re-compile with a different compiler option. My only other option on the linux cluster I am running on is GNU (gcc / gfortran). Are these a safer option?
Victoria
Glenn Carver
Hi Victoria,
the issue with the MKL library, if I remember correctly, is that it may not always use memory consistently. By that I mean, when using vector instructions, depending on the memory alignment, it may split the vectorizable statements/arrays differently, causing potential for rounding differences. This behaviour was seen in IFS some years ago; I have not tested it recently. Safe compiler options that _should_ give reproducibility for Intel are: -fpe0 -fp-model precise -fp-speculation=safe. You could also try using the LAPACK code supplied with OpenIFS rather than the system one. To do this, set the environment variable OIFS_LAPACK=openifs. I have not tested this for reproducibility though.
Glenn