You may use the ecinteractive tool to open interactive sessions with dedicated resources on the HPCF.
Those sessions run as an interactive job in the batch system, and allow you to avoid the stricter limits on CPUs, CPU time or memory that apply to standard interactive sessions on the login nodes.
However, note that they are time constrained: once the job has reached its time limit, the sessions will be closed.
Main Features
The main features of the ecinteractive tool are the following:
- Only one interactive job is allowed at a time.
- Your job keeps on running after you exit the interactive shell, so you can reattach to it any time or open multiple interactive shells within the same job.
- You may open a basic graphical desktop for X11 applications.
- You may open a Jupyter Lab instance and connect to it through your browser.
- By default it will submit to the local cluster, or AA if run from the Linux VDI, although you can choose which complex (platform) to use.
- You can run ecinteractive from any Atos HPCF complex or from the Red Hat Linux VDI. You may also copy the script to your end user device and use it from there. It should work from Linux, Mac, or WSL under Windows, and requires the Teleport tsh client to be installed and configured.
$ ecinteractive -h

Usage : /usr/local/bin/ecinteractive [options] [--]

    -d|desktop     Submits a vnc job (default is interactive ssh job)
    -j|jupyter     Submits a jupyter job (default is interactive ssh job)
    -J|jupyters    Submits a jupyter job with HTTPS support (default is interactive ssh job)

    More Options:
    -h|help        Display this message
    -v|version     Display script version
    -p|platform    Platform (default aa. Choices: aa, ab, ac, ad, ecs)
    -u|user        ECMWF User (default user)
    -A|account     Project account
    -c|cpus        Number of CPUs (default 2)
    -m|memory      Requested Memory (default 8G)
    -s|tmpdirsize  Requested TMPDIR size (default 3 GB)
    -t|time        Wall clock limit (default 12:00:00)
    -k|kill        Cancel any running interactive job
    -q|query       Check running job
    -Q|quiet       Silent mode
    -o|output      Output file for the interactive job (default /dev/null)
    -x             set -x
Before you start: Set up your SSH key-based authentication
For ecinteractive to work properly, passwordless ssh must be configured between Atos HPCF nodes. See HPC2020: How to connect for more information on how to set it up.
Creating an interactive job
You can get an interactive shell running on an allocated node within the Atos HPCF by simply calling ecinteractive. With no options, it uses the following default settings:
Setting | Default |
---|---|
CPUs | 2 |
Memory | 8 GB |
Time | 12 hours |
TMPDIR size | 3 GB |
If you need more resources, you may use the ecinteractive options when creating the job. For example, to get a shell with 4 CPUs and 16 GB of memory for 12 hours:
[user@aa6-100 ~]$ ecinteractive -c4 -m 16G -t 12:00:00
Submitted batch job 10225018
Waiting 5 seconds for the job to be ready...
Using interactive job:
CLUSTER JOBID STATE EXEC_HOST TIME_LIMIT TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa 10225018 RUNNING aa6-104 12:00:00 11:59:55 4 16G ssdtmp:3G
To cancel the job:
/usr/local/bin/ecinteractive -k
Last login: Mon Dec 13 09:39:09 2021
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on aa6-104 at 20211213_093914.794, PID: 1736962, JOBID: 10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/ec/res4/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/ec/res4/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/ec/res4/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/ec/res4/scratchdir/user/8/10225018
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] Job 10225018 time left: 11:59:54
[user@aa6-104 ~]$
If you log out, the job continues to run until it is explicitly cancelled or reaches its time limit.
The maximum resources you can request for your interactive session are those described for ni (or ei for ecs users) in HPC2020: Batch system.
Reattaching to an existing interactive job
Once you have an interactive job running, you may reattach to it, or open several shells within that job, by calling ecinteractive again.
If you have a job already running, ecinteractive will always attach you to that one regardless of the resource options you pass. If you wish to run a job with different settings, you will have to cancel the existing one first.
[user@aa6-100 ~]$ ecinteractive
Using interactive job:
CLUSTER JOBID STATE EXEC_HOST TIME_LIMIT TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa 10225018 RUNNING aa6-104 12:00:00 11:57:56 4 16G ssdtmp:3G
WARNING: Your existing job 10225018 may have a different setup than requested. Cancel the existing job and rerun if you wish to run with different setup
To cancel the job:
/usr/local/bin/ecinteractive -k
Last login: Mon Dec 13 09:39:14 2021 from aa6-100.bullx
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on aa6-104 at 20211213_094114.197, PID: 1742608, JOBID: 10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/ec/res4/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/ec/res4/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/ec/res4/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/ec/res4/scratchdir/user/8/10225018
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] Job 10225018 time left: 11:57:54
[user@aa6-104 ~]$
Race conditions possible
If you run ecinteractive in several terminals in quick succession, and you did not already have an interactive job running, multiple interactive jobs may be submitted. If that happens, it is best to cancel all of them and rerun just one ecinteractive, waiting for that one to be ready before opening other parallel sessions:
for j in $(ecsqueue -ho "%i" -u $USER -q ni); do ecscancel $j; done
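Before opening further sessions, it can help to confirm that only one interactive job is actually queued. The following is a minimal sketch that reuses the ecsqueue call above; the function names count_job_ids and check_single_interactive_job are hypothetical helpers, not part of ecinteractive.

```bash
# Hypothetical helper: count the job IDs read from standard input,
# one ID per line, ignoring blank lines.
count_job_ids() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Hypothetical helper: read job IDs on standard input and report whether
# more than one interactive job exists. Returns 0 for zero or one job.
check_single_interactive_job() {
    njobs=$(count_job_ids)
    if [ "$njobs" -gt 1 ]; then
        echo "WARNING: $njobs interactive jobs found, cancel the extra ones" >&2
        return 1
    fi
    echo "OK: $njobs interactive job(s)"
}
```

It could be used as: ecsqueue -ho "%i" -u $USER -q ni | check_single_interactive_job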
Checking the status of a running interactive job
You may query ecinteractive for existing interactive jobs, and you can do so from within or outside the job. This is useful, for example, to see how much time is left:
[user@aa6-100 ~]$ ecinteractive -q
CLUSTER JOBID STATE EXEC_HOST TIME_LIMIT TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa 10225018 RUNNING aa6-104 12:00:00 11:55:40 4 16G ssdtmp:3G
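If you want to use the remaining time in a script, the query output can be parsed. This is a minimal sketch, assuming the header and the job line are printed on separate lines as in the listing above; the helper names time_left_field and time_to_seconds are hypothetical, and the optional day prefix (D-HH:MM:SS) is an assumption based on Slurm-style time formats.

```bash
# Hypothetical helper: print the TIME_LEFT value found in `ecinteractive -q`
# output read on standard input. The column is located by its header name.
time_left_field() {
    awk '
        $1 == "CLUSTER" {
            for (i = 1; i <= NF; i++) if ($i == "TIME_LEFT") col = i
            next
        }
        col && NF >= col { print $col; exit }
    '
}

# Hypothetical helper: convert [D-]HH:MM:SS into seconds.
time_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3 }
    '
}
```

For the job shown above, time_to_seconds "$(ecinteractive -q | time_left_field)" would print 42940.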
Killing/Cancelling a running interactive job
Logging out of the interactive shells spawned through ecinteractive will not cancel the job. Once you have finished working with it, you should cancel it with:
[user@aa6-100 ~]$ ecinteractive -k
cancelling job 10225018...
CLUSTER JOBID STATE EXEC_HOST TIME_LIMIT TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa 10225018 RUNNING aa6-104 12:00:00 11:55:34 4 16G ssdtmp:3G
Cancel job_id=10225018 name=user-ecinteractive partition=inter [y/n]? y
Connection to aa-login closed.
Opening graphical applications within your interactive job
If you need to run graphical applications, you can do so through the standard X11 forwarding.
- If running it from an Atos HPCF login node, make sure you have connected there with ssh -X and that you have a working X11 server on your end user device (e.g. XQuartz on Mac, or MobaXterm, Xming or similar on Windows).
- If running it from the Red Hat Linux VDI, it should work out of the box.
- If running it from your end user device, make sure you have a working X11 server on that device (e.g. XQuartz on Mac, or MobaXterm, Xming or similar on Windows).
Alternatively, you may use ecinteractive to open a basic window manager running on the allocated interactive node, which will open a VNC client on your end to connect to the running desktop in the allocated node:
[user@aa6-100 ~]$ ecinteractive -d
Submitted batch job 10225277
Waiting 5 seconds for the job to be ready...
Using interactive job:
CLUSTER JOBID STATE EXEC_HOST TIME_LIMIT TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa 10225277 RUNNING aa6-104 6:00:00 5:59:55 2 8G ssdtmp:3G
To cancel the job:
/usr/local/bin/ecinteractive -k
Attaching to vnc session...
To manually re-attach:
vncviewer -passwd ~/.vnc/passwd aa6-104:9598
Opening a Jupyter Lab instance
You can use ecinteractive to open up a Jupyter Lab instance on the HPCF. The application effectively runs on the node allocated for the job, and you can conveniently interact with it from your browser. When running from VDI or your end user device, ecinteractive will try to open it in a new tab automatically. Alternatively you may manually open the URL provided to connect to your Jupyter Lab session.
[user@aa6-100 ~]$ ecinteractive -j
Using interactive job:
CLUSTER JOBID STATE EXEC_HOST TIME_LIMIT TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa 10225277 RUNNING aa6-104 6:00:00 5:58:07 2 8G ssdtmp:3G
To cancel the job:
/usr/local/bin/ecinteractive -k
Attaching to Jupyterlab session...
To manually re-attach go to http://aa6-104.ecmwf.int:33698/?token=b1624da17308654986b1fd66ef82b9274401ea8982f3b747
To use your own conda environment as a kernel for Jupyter notebooks, you will need to have ipykernel installed in that conda environment before starting the ecinteractive job. ipykernel can be installed with:
[user@aa6-100 ~]$ conda activate {myEnv}
[user@aa6-100 ~]$ conda install ipykernel
[user@aa6-100 ~]$ python3 -m ipykernel install --user --name={myEnv}
The same is true if you want to make your own Python virtual environment visible in Jupyter Lab:
[user@aa6-100 ~]$ source {myEnv}/bin/activate
[user@aa6-100 ~]$ pip3 install ipykernel
[user@aa6-100 ~]$ python3 -m ipykernel install --user --name={myEnv}
To remove your personal kernels from Jupyter Lab once you no longer need them, you can use:
jupyter kernelspec uninstall {myEnv}
HTTPS access
If you wish to run Jupyter Lab on HTTPS instead of plain HTTP, you may use the -J option in ecinteractive. In that case, a personal SSL certificate is created under ~/.ssl the first time, and is used to encrypt the HTTP traffic between your browser and the compute node.
In order to avoid browser security warnings, you may fetch the ~/.ssl/selfCA.crt certificate from the HPCF and import it into your browser as a trusted Certificate Authority. This is only needed once.
Customising your jupyter version and environment
By default, ecinteractive will start the jupyterlab coming from the default version of python 3. If you wish to customise the version of python or jupyterlab, or simply want to tailor its environment in your ecinteractive session, create the following file in your Atos HPCF HOME:
~/.ecinteractive/jupyter_setup.sh
Then add to it the commands needed to set up the environment so that the jupyter and node commands can be found in the path. This would be equivalent to the default behaviour:
module load python3 node
Examples of contents for ~/.ecinteractive/jupyter_setup.sh

module load python3/new node/new

module load conda
conda activate myjupyterenv
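Creating the setup file can itself be scripted. The following is a minimal sketch that writes the given commands, one per line, into ~/.ecinteractive/jupyter_setup.sh; the function name write_jupyter_setup and the .bak backup copy are hypothetical choices, not something ecinteractive provides or requires.

```bash
# Hypothetical helper: write each argument as one line of
# ~/.ecinteractive/jupyter_setup.sh, creating the directory if needed.
write_jupyter_setup() {
    setup_dir="$HOME/.ecinteractive"
    setup_file="$setup_dir/jupyter_setup.sh"
    mkdir -p "$setup_dir" || return 1
    # Keep a copy of any previous version rather than silently losing it.
    if [ -f "$setup_file" ]; then
        cp -p "$setup_file" "$setup_file.bak" || return 1
    fi
    printf '%s\n' "$@" > "$setup_file"
}
```

For instance, write_jupyter_setup "module load conda" "conda activate myjupyterenv" would produce the conda example shown above.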
9 Comments
Siham El Garroussi
If you want to use "Metview", you have to load the module ecmwf-toolbox through jupyterlab before running your kernel (or restart loading the module). It is only valid for this session of jupyter. Next time you open jupyter you would need to do it again. On the vertical left bar click on the "Softwares" tab (second from the bottom), look for ecmwf-toolbox and load it. If an issue arises, you have to restart your kernel.
Daniel Varela Santoalla
Loading modules in Jupyter should only be necessary when using the default (module loaded itself) python installation. If using "conda" all should be self-contained.
Everyone, please make sure that you unload all modules (or run "module load conda", which will do the same thing) before creating conda environments and installing libraries there, to avoid cross-dependencies with any of the standard installations from outside conda.
Matthew Chantry
With jupyter hub, how can we load/save notebooks on PERM/HPCPERM? Only relative paths on HOME seem to work.
Xavier Abellan
The trick is to create softlinks on your home pointing to those locations, and then you will be able to reach them. For example, from a shell on the Atos HPCF:
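A minimal sketch of such links, assuming $PERM and $HPCPERM are set as in the session banner shown earlier; the helper name link_into_home and the link names perm and hpcperm are hypothetical choices.

```bash
# Hypothetical helper: create $HOME/<name> as a soft link to <target>,
# leaving any existing file or link with that name untouched.
link_into_home() {
    link_target="$1"
    link_name="$2"
    if [ -e "$HOME/$link_name" ] || [ -L "$HOME/$link_name" ]; then
        echo "skipping $HOME/$link_name: already exists" >&2
        return 0
    fi
    ln -s "$link_target" "$HOME/$link_name"
}
```

Calling link_into_home "$PERM" perm and link_into_home "$HPCPERM" hpcperm would then make both locations reachable as relative paths under HOME.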
Sandor Kertesz
Is there a way with using the default Python installation to pre-load ecmwf-toolbox/new for an ecinteractive -j session?
Jonathan Day
Is the maximum time limit 12 hours for an ecinteractive job? If I try to specify a longer time it seems to be overridden. It's a bit of a pain to have to reopen editors from the previous day and restart any tasks that did not complete before the session was terminated.
Iain Russell
-t 168:00:00 gives you 7 days, at least it works for me
Jonathan Day
I think it may be that I already have a session running: if I run ecinteractive -t xxx in a new terminal, it attaches to the session that is already running (with a shorter session length). I can see from the page that it is not possible to start a new session while the old one is still running. Would it make sense to have a longer default length?
Iain Russell
Yes, you need to kill your current session with 'ecinteractive -k' before your new settings will take effect (or wait until tomorrow!)