Hi all
I've been running OpenIFS with nice, even values of NPROC so far, e.g. NPROC=200 or 400, but I have now tried to run with NPROC=199 and got this error:
522 ABOR1 CALLED
522 sumpini: nprtrw (approx square value) > nspecresmin
Digging into the source code, this seems to be OpenIFS trying to distribute zonal waves and vertical levels across MPI tasks and failing to reach a good number. But I'm at a loss as to why NPROC=199 is a bad number and NPROC=200 is good.
Does anyone have a good "rule of thumb" on how NPROC should be set for a given configuration, e.g. T159L91 or T639L137?
Must NPROC always be an even number? Or a product of any two numbers? Or a multiple of the T or L numbers?
Best regards
Joakim
4 Comments
Glenn Carver
Hi Joakim,
For anyone else reading this, let's just clarify that NPROC is not the total number of compute cores used to parallelize the model; it is the number of MPI tasks used. The total number of cores is given by NPROC x number of OpenMP threads, where NPROC is set in the model's namelist (fort.4) and the number of threads is set as an environment variable.
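To make the arithmetic above concrete, here is a minimal sketch. It assumes the thread count comes from the standard OpenMP variable OMP_NUM_THREADS; the helper names and the NPROC value of 200 are only illustrative and are not part of OpenIFS.

```python
import os


def omp_threads_from_env(default=1):
    """Read the OpenMP thread count from OMP_NUM_THREADS (assumed variable).

    OMP_NUM_THREADS may be a comma-separated list for nested parallelism,
    so only the first (outermost) value is used here.
    """
    raw = os.environ.get("OMP_NUM_THREADS") or str(default)
    return int(raw.split(",")[0])


def total_cores(nproc, omp_threads):
    """Total cores = number of MPI tasks (NPROC) x OpenMP threads per task."""
    return nproc * omp_threads


if __name__ == "__main__":
    nproc = 200  # illustrative value, as would be set in the fort.4 namelist
    threads = omp_threads_from_env()
    print(f"NPROC={nproc}, threads={threads}, cores={total_cores(nproc, threads)}")
```

So with NPROC=200 and 4 threads per task, the job would occupy 800 cores.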
I don't have a good answer for what the allowed values of NPROC are, but I think your error is related to the way the model parallelizes the gridpoint and Fourier/spectral space computations. There's a detailed description of this in the IFS CY40R1 technical manual (see section 2.2 onwards). Note that OpenIFS uses the EQ_REGION algorithm by default to decompose gridpoint space into parallel MPI tasks (two methods are described in the manual). In Fourier space, the decomposition has to be over latitudes and zonal waves, and this has to match the number of MPI parallel regions in gridpoint space. Also, FFTs require power of 2 points. So this might be why odd values of NPROC don't work, but I'm not an expert on this part of the code.
I can give you typical values of NPROC for the resolutions that we use, if you like? But typically it's determined by the cost of running on the available machine. Speedup will of course scale differently at different resolutions and as NPROC increases.
The CY40R1 IFS technical manual can be found at: https://www.ecmwf.int/en/elibrary/9206-part-vi-technical-and-computational-procedures
Hope that helps,
Glenn
Jan Streffing
Hello Glenn,
I don't think this can be the reason. I run OpenIFS with odd values of NPROC all the time. The reason is that I have the EC-Earth runoff mapper, which uses only one CPU. It would be a waste to run it on a full node. When I started working with this setup, I ran some tests to see which of the following MPI decompositions would be best:
FESOM2: 144, OIFS: 288, RNFMAP: 1 or
FESOM2: 144, OIFS: 287, RNFMAP: 1 or
FESOM2: 143, OIFS: 288, RNFMAP: 1
I found no difference in performance or results, so I opted for the middle one and have stuck with it ever since. I do the same in the OIFS-AMIP-reader configuration; OIFS gives up one process on the last node for the amip-reader to run in. I'm doing that at a variety of resolutions and NPROC values.
By the way, I did also find NPROC values that don't work. For example, T288L91 with NPROC 287, 575 and 1151 works, but with 359 and 863 it crashes. (I forgot the exact error; that was a year ago.)
Cheers, Jan
Unknown User (joakimkjellsson@gmail.com)
Hi Glenn and Jan
Thanks for the replies. I read the CY40R1 documentation and found it hard to get a grip on exactly what NPROC can and can't be, but I understand now that it has to give a valid decomposition in both gridpoint and spectral space.
I thought that OpenIFS needs all longitudes on one latitude to do the FFT, so an MPI task has to contain a full latitude ring, and therefore using more than 160 tasks for an N80 grid should not work? But my T159 scales beautifully beyond 160 tasks, so that's definitely not the case.
I ended up trying a couple of different values, and found that as long as NPROC was a product of two numbers, OpenIFS ran fine. So odd numbers like 279 (=9x31) worked fine, but any prime number did not.
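This pattern fits the wording of the abort message ("nprtrw (approx square value) > nspecresmin"). Below is a minimal sketch of what such a check could look like. It assumes the model splits NPROC into the factor pair closest to a square, uses the larger factor as nprtrw, and aborts when that factor exceeds a resolution-dependent limit. The function names and the limit of 160 are hypothetical and are not taken from the OpenIFS source.

```python
import math


def near_square_split(nproc):
    """Return the factor pair (small, large) of nproc closest to a square."""
    for small in range(math.isqrt(nproc), 0, -1):
        if nproc % small == 0:
            return small, nproc // small


def split_ok(nproc, limit):
    """True if the larger factor (assumed to play the role of nprtrw) fits."""
    _, large = near_square_split(nproc)
    return large <= limit


if __name__ == "__main__":
    LIMIT = 160  # hypothetical stand-in for nspecresmin at T159
    for nproc in (199, 200, 279):
        small, large = near_square_split(nproc)
        status = "ok" if split_ok(nproc, LIMIT) else "abort"
        print(f"NPROC={nproc}: {small} x {large} -> {status}")
```

Under these assumptions a prime NPROC can only be split as 1 x NPROC, so the larger factor is NPROC itself, which would explain why 199 fails while 200 (10 x 20) and 279 (9 x 31) run.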
I've got the same situation as Jan with EC-Earth: there's a river routing scheme running on 1 CPU, so I'd like OpenIFS to use an NPROC such that NPROC+1 is a multiple of 40 (the number of CPUs per node). I found 279 to be a good choice.
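A small sketch of how one might list such candidates: values of NPROC that leave one core free on the last node and are not prime. The non-prime filter is only the empirical rule from this thread, not a guarantee that the model will accept the value, and the function names are hypothetical.

```python
def is_prime(n):
    """Simple trial-division primality test, fine for small task counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def candidate_nprocs(cores_per_node, max_nodes):
    """NPROC values with NPROC+1 a multiple of cores_per_node, primes skipped."""
    candidates = []
    for nodes in range(1, max_nodes + 1):
        nproc = nodes * cores_per_node - 1  # leave one core for the extra task
        if not is_prime(nproc):
            candidates.append(nproc)
    return candidates


if __name__ == "__main__":
    print(candidate_nprocs(cores_per_node=40, max_nodes=10))
```

For 40 cores per node this keeps 279 and skips 199, in line with what was observed above.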
Thanks again for the replies. I'll probably test a couple more NPROC values if I need to run the model on another machine.
Cheers
/Joakim
Jan Streffing
Hey, you are right. I checked: it was 72*5-1 = 359 and 72*12-1 = 863 that failed. Both are prime.