Only available on the EUMETSAT side. Currently in beta testing. This service is provided on a best-effort basis for testing purposes, so keep this in mind when using it.

Introduction

Many problems in the Earth Observation and modelling communities require a common processing algorithm to be applied independently to thousands (or millions!) of pieces of input data. Doing this across many processing nodes is known as "High Throughput Computing" (HTC), as opposed to "High Performance Computing" (HPC), which concentrates on running large jobs that will not fit on a single machine across a pool of processing nodes, typically using MPI.
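As a purely local analogy of this pattern, the sketch below applies one function to many inputs, each processed independently of the others. The names process_item and run_independently are hypothetical, and a thread pool on a single machine only stands in for what HTCondor does across many nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def process_item(value: int) -> int:
    """Stand-in for the common processing algorithm (hypothetical)."""
    return value * value


def run_independently(inputs: Iterable[int], max_workers: int = 4) -> List[int]:
    """Apply process_item to every input; no task depends on another's result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(process_item, inputs))
```

Because the tasks share nothing, the work can be split into as many separate jobs as there are inputs, which is the property an HTC system relies on.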

EWC provides a common solution for HTC batch processing, using HTCondor. The major advantage of this approach is that it provides a centrally-managed system where users can take advantage of a much larger pool of resources than they have themselves. The resources come from tenants contributing their spare resources for the common good, and additional spare resources from EWC that are also made available for anyone to use.

HTCondor is a specialized batch system for managing compute-intensive jobs. It provides a queuing mechanism, scheduling policy, priority scheme, and resource classifications.

Users submit their compute jobs to HTCondor, which places them in a queue, runs them, and then informs the user of the result.

Of course, any tenant can install their own batch processing systems for their own purposes with their own resources, but will not be able to take advantage of other shared resources in a centrally organised way.

General

EWC HTCondor is a managed service. The central manager node is deployed in a tenancy on the EWC. Users can join the existing pool by adding execute (compute) and submit nodes.

Some features of the HTCondor in EWC:

Maintenance	Centrally managed tenancy, easy 'one click' deployment
Deployment	Multi-tenancy
Resource	New nodes join the main HTCondor pool automatically; no password or configuration is needed, only the choice of plan for the machine you want to add
Usage	Easy 'one click' deployment, simple examples for running a job with the docker universe
Network	VPN, which allows processing nodes in a tenancy to communicate with the scheduler / master nodes
Scheduling	A single scheduler in each tenancy; users cannot remove jobs belonging to other tenancies

Execute nodes

Containers have no access to the execute host
Containers have no access to other containers running on the execute node
Isolated environment for containers
No autoscaling
No NFS

Submit nodes

Only docker universe allowed
Only condor_submit command allowed
Private network in the tenancy enabled to allow access to tenancy-internal resources/files
Condor transfer mechanism allowed

Deploy HTCondor nodes

Pre-requisite

Before deploying an HTCondor node, you need to create an HTCondor-specific security group. The page Creating Security Groups in Morpheus explains how to create security groups.

Create a security group named htcondor with the following rules:

Rule name	Direction	Rule Type	Protocol	Port Range	Source Type	Source	Destination Type
	egress	Custom Rule	TCP		All		Instance
	egress	Custom Rule	UDP		All		Instance
9618-tcp	ingress	Custom Rule	TCP	9618	Network	100.64.0.0/10	Instance
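The ingress rule admits TCP traffic on port 9618 (HTCondor's default port) only from sources in 100.64.0.0/10, that is, addresses 100.64.0.0 through 100.127.255.255. The following minimal sketch models that single rule with Python's standard ipaddress module; the function name ingress_allowed is hypothetical, and this illustrates the rule's meaning rather than how Morpheus evaluates security groups.

```python
import ipaddress

# Values taken from the 9618-tcp row of the table above
HTCONDOR_PORT = 9618
ALLOWED_SOURCE = ipaddress.ip_network("100.64.0.0/10")


def ingress_allowed(source_ip: str, port: int) -> bool:
    """Return True if a TCP connection matches the 9618-tcp ingress rule."""
    in_range = ipaddress.ip_address(source_ip) in ALLOWED_SOURCE
    return port == HTCONDOR_PORT and in_range
```

For example, ingress_allowed("100.64.0.1", 9618) is true, while a connection from 100.128.0.0 or one to port 22 does not match this rule.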

Deploy execute or submit node

1. Go to Provisioning → Instances and click on Add+ to add a new instance.
2. Select Htcondor Submit/Execute node.
3. Fill in the required data:
- plan: choose your plan
- network: private
- security group: htcondor, ssh (submit node only)
4. Finalize the provisioning steps.

Once the submit node is up:

SSH into your machine
Create a simple job, for example:

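# dockertest.sub -- example docker job

# Run the job inside a container started from the debian image
universe                = docker
docker_image            = debian

# Command executed in the container: print the container's /etc/hosts
executable              = /bin/cat
arguments               = /etc/hosts

# Transfer output files back to the submit node when the job exits
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# One log, output and error file per job; $(Process) runs from 0 to 9 here.
# The log/, output/ and error/ directories should exist before submitting.
log                     = log/job_$(Process)_sleep.log
output                  = output/job_$(Process)_output.txt
error                   = error/job_$(Process)_errors.txt

# Resources requested for each job
request_cpus   = 1
request_memory = 1024M
request_disk   = 10240K

# Submit 10 instances of this job
queue 10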

Submit it with condor_submit <submit_file>, for example condor_submit dockertest.sub
Verify that the jobs are queued or running with the condor_q command
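The example submit file writes into the log/, output/ and error/ subdirectories, and queue 10 starts ten jobs whose $(Process) value runs from 0 to 9. As a minimal sketch (the helper names prepare_job_dirs and expected_outputs are hypothetical and not part of HTCondor), the following Python code creates those directories before submission and lists the output files to expect afterwards.

```python
from pathlib import Path
from typing import List

# Directory names and output pattern copied from dockertest.sub
JOB_DIRS = ("log", "output", "error")
OUTPUT_PATTERN = "output/job_{process}_output.txt"


def prepare_job_dirs(base: Path) -> List[Path]:
    """Create the directories the submit file writes into, if missing."""
    created = []
    for name in JOB_DIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def expected_outputs(base: Path, queue_count: int) -> List[Path]:
    """Return the output paths for Process = 0 .. queue_count - 1."""
    return [base / OUTPUT_PATTERN.format(process=i) for i in range(queue_count)]
```

Calling prepare_job_dirs(Path.cwd()) in the directory that holds dockertest.sub before running condor_submit, and then checking the paths from expected_outputs(Path.cwd(), 10) once condor_q shows the jobs have left the queue, is one way to confirm that all ten jobs produced output.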

Once the execute node is up:

SSH into your submit node
Check that the node appears in the list of execute nodes by running condor_status
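The condor_status check can also be scripted. The sketch below assumes the command is run as condor_status -autoformat Machine, which prints one machine name per slot, and that the new node's hostname is known; the function names and the example hostnames are hypothetical.

```python
import shutil
import subprocess
from typing import Optional, Set


def machines_from_status(af_output: str) -> Set[str]:
    """Parse the one-name-per-line output of `condor_status -autoformat Machine`."""
    return {line.strip() for line in af_output.splitlines() if line.strip()}


def node_in_pool(af_output: str, hostname: str) -> bool:
    """Return True if hostname is among the machines reported by the pool."""
    return hostname in machines_from_status(af_output)


def query_pool() -> Optional[str]:
    """Run condor_status if it is installed; return None otherwise."""
    if shutil.which("condor_status") is None:
        return None
    result = subprocess.run(
        ["condor_status", "-autoformat", "Machine"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a submit node, node_in_pool(query_pool() or "", "my-execute-node") reports whether a node with that (hypothetical) hostname has joined the pool. A machine with several slots appears on several lines, which is why the names are collected into a set.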

Docker universe job in HTCondor

This tutorial shows how to create a container and push it to a registry using Docker. It also provides an example job that can be submitted to HTCondor.
