Running EasyVVUQ on HPC resources with QCG-PilotJob

Author: Bartosz Bosak, PSNC ([email protected])

If this is your first Jupyter Notebook, you can execute code cells by selecting them and pressing Shift+Enter. Keep in mind that the order of execution matters when later cells depend on results from earlier ones.

As defined in the VECMA glossary, uncertainty quantification (UQ) is a “discipline, which seeks to estimate the uncertainty in the model input and output parameters, to analyse the sources of these uncertainties, and to reduce their quantities.” However, this process can quickly become cumbersome, because even a few uncertain inputs may require hundreds or even thousands of samples. If a single sample is a demanding simulation, such a number of tasks cannot be performed effectively without (1) adequate computational resources, (2) a dedicated approach and (3) specialised programming solutions.
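To make this growth concrete, the sketch below counts model evaluations under the assumption of a full tensor-product grid with the same number of points for every uncertain input. Other sampling schemes scale differently, and the helper name tensor_grid_size is illustrative only.

```python
def tensor_grid_size(n_inputs: int, points_per_input: int) -> int:
    """Number of model evaluations on a full tensor-product grid."""
    return points_per_input ** n_inputs


# Evaluations needed for 2 to 6 uncertain inputs with 2, 3 and 4 points each.
for n_inputs in range(2, 7):
    counts = [tensor_grid_size(n_inputs, p) for p in (2, 3, 4)]
    print(n_inputs, counts)
```

With only six uncertain inputs and four points per input, the count already reaches 4096 evaluations.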

EasyVVUQ has been designed as modular software that can benefit from solutions providing advanced capabilities for executing demanding operations on computational resources. One such solution is QCG-PilotJob, which makes it possible to efficiently run a number of tasks inside a single large allocation on a Slurm cluster.

In this tutorial, based on the scenario presented in the basic tutorial, we demonstrate how EasyVVUQ workflows can be adapted to run with QCG-PilotJob on HPC machines. As will be shown, the adaptation is quite easy.

Prerequisites

You need to have EasyVVUQ installed in your environment. There is no need to install QCG-PilotJob's packages separately, since they are installed as EasyVVUQ's dependencies.

In [2]:
pip install git+https://github.com/UCL-CCS/EasyVVUQ.git@qcgpj-executor
Successfully built easyvvuq qcg-pilotjob qcg-pilotjob-executor-api
Note: you may need to restart the kernel to use updated packages.

Application scenario

Let us recall the basic use case. It is a simulation of the vertical deflection of a round metal tube, suspended at each end, in response to a force applied at a certain point a along its length. Our goal is to determine the influence of the input parameters on the vertical deflection at point a.

The usage of the application is:

beam <input_file>

It outputs calculated displacements to a file called output.json. Its content will look like

{"g1": x, "g2": y, "g3": z}

In order to produce statistically significant results, EasyVVUQ needs to run a number of model evaluations appropriately selecting input arguments from a given sample parameter space. Once selected, input parameters need to be transformed into a format understandable by the application. Our application takes a single file as an input and the transformation may be based on a single template file, called beam.template, with the following content:

{"outfile": "$outfile", "F": $F, "L": $L, "a": $a, "D": $D, "d": $d, "E": $E}

The template will be used to generate files called input.json that will be the input to each run of beam. All placeholders (signified by the $ delimiter) will be replaced by concrete values from the sample parameter space.
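The substitution step can be illustrated with Python's standard string.Template, which uses the same $ delimiter. This is only a sketch of the idea, not EasyVVUQ's encoder itself, and the sample values are chosen by hand for illustration.

```python
import json
from string import Template

# Same content as beam.template in the text above.
template = Template(
    '{"outfile": "$outfile", "F": $F, "L": $L, "a": $a, '
    '"D": $D, "d": $d, "E": $E}'
)

# One hand-picked sample; in a campaign these values come from the sampler.
sample = {"outfile": "output.json", "F": 1.0, "L": 1.5,
          "a": 1.0, "D": 0.8, "d": 0.1, "E": 200000}

# Replace every $placeholder with the corresponding sample value.
input_json = template.substitute(sample)
print(input_json)

# The generated text must be valid JSON for the application to read it.
parsed = json.loads(input_json)
```

Template.substitute raises a KeyError if a placeholder has no value, which makes incomplete samples easy to spot.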

So, for example (commands preceded by an exclamation mark are treated as shell commands):

In [3]:
!pwd
!echo "{\"outfile\": \"output.json\", \"F\": 1.0, \"L\": 1.5, \"a\": 1.0, \"D\": 0.8, \"d\": 0.1, \"E\": 200000}" > input.json
/home/jovyan/tutorials/VECMAtk/BEAM
In [4]:
!./beam input.json
In [5]:
!cat output.json
{"g1": -6.909453505549654e-06, "g2": -1.3818907011099308e-05, "g3": 1.7273633763874136e-05}
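The beam source code is not shown in this tutorial. As an assumption for illustration only, the closed-form expressions below reproduce the three numbers printed above for the same inputs. Both the formulas and the constant 32 in the section term are part of that assumption and should not be read as the application's actual implementation.

```python
import math


def beam_deflections(F, L, a, D, d, E):
    """Assumed closed-form model; not taken from the beam source."""
    b = L - a
    # Section term of the hollow round tube, as assumed in this sketch.
    I = math.pi * (D**4 - d**4) / 32.0
    g1 = -F * a**2 * b**2 / (3.0 * E * L * I)
    g2 = -F * b * (L**2 - b**2) / (6.0 * E * L * I)
    g3 = F * a * (L**2 - a**2) / (6.0 * E * L * I)
    return {"g1": g1, "g2": g2, "g3": g3}


# Same input values as written to input.json above.
result = beam_deflections(F=1.0, L=1.5, a=1.0, D=0.8, d=0.1, E=200000)
print(result)
```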

In this tutorial, in a similar fashion to the basic one, we will demonstrate how to use EasyVVUQ to perform a variance-based sensitivity analysis of the beam application using stochastic collocation.

Nevertheless, the way QCG-PilotJob is used in EasyVVUQ is generic and will look the same for other applications and for other methods supported by EasyVVUQ.

Campaign

In order to use EasyVVUQ, we need to configure the EasyVVUQ Campaign object. We do this in almost the same way as in the basic use case. First, we import the same set of libraries as in the original example:

In [6]:
import os
import easyvvuq as uq
import chaospy as cp
import matplotlib.pyplot as plt
from easyvvuq.actions import CreateRunDirectory, Encode, Decode, CleanUp, ExecuteLocal, Actions

We only extend this set of imports with two classes for QCG-PilotJob:

In [7]:
from easyvvuq.actions import QCGPJPool, ExecuteQCGPJ

Then, we can continue with the code from the basic workflow. For validation purposes, we describe the set of parameters used by the application:

In [8]:
params = {
    "F": {"type": "float", "default": 1.0}, 
    "L": {"type": "float", "default": 1.5}, 
    "a": {"type": "float", "min": 0.7, "max": 1.2, "default": 1.0}, 
    "D": {"type": "float", "min": 0.75, "max": 0.85, "default": 0.8},
    "d": {"type": "float", "default": 0.1},
    "E": {"type": "float", "default": 200000},
    "outfile": {"type": "string", "default": "output.json"}
}

and, by specifying an encoder and a decoder, define how EasyVVUQ should convert data between its internal representation and the application's input and output formats:

In [9]:
encoder = uq.encoders.GenericEncoder(template_fname='beam.template', delimiter='$', target_filename='input.json')
decoder = uq.decoders.JSONDecoder(target_filename='output.json', output_columns=['g1'])

Since our application takes and produces very simple data structures, we use the built-in encoder and decoder classes, but you can provide custom implementations of encoders and decoders that fit your own use case.
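Conceptually, the decoding step reads output.json and keeps only the requested output columns. The sketch below shows that idea with the standard json module. The decode function is a hypothetical stand-in, not the JSONDecoder implementation, and the output text is copied from the run shown earlier.

```python
import json


def decode(output_text, output_columns):
    """Hypothetical stand-in: keep only the requested quantities of interest."""
    data = json.loads(output_text)
    return {column: data[column] for column in output_columns}


# Content of output.json from the manual run of beam shown earlier.
output_text = (
    '{"g1": -6.909453505549654e-06, "g2": -1.3818907011099308e-05, '
    '"g3": 1.7273633763874136e-05}'
)

print(decode(output_text, ["g1"]))
```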

The next step in the original tutorial was the definition of an execute action that is used to run the beam application with a prepared input file. We could use the basic form of this action, but then the potential of QCG-PilotJob wouldn't be fully exploited. Instead, we wrap the original action in an ExecuteQCGPJ decorator that configures the action to be properly executed within a QCG-PilotJob task:

In [10]:
execute = ExecuteQCGPJ(
    ExecuteLocal('{}/beam input.json'.format(os.getcwd()))
)

Now we can put all the actions we want to execute for each sample into the Actions object:

In [11]:
actions = Actions(CreateRunDirectory('/tmp'), 
                  Encode(encoder), execute, Decode(decoder))

and finally create the EasyVVUQ Campaign:

In [12]:
campaign = uq.Campaign(name='beam', params=params, actions=actions)

The remaining steps to configure the campaign object are to define distributions for the input parameters and to initialise a sampler. This code is the same regardless of whether we use QCG-PilotJob or not:

In [13]:
vary = {
    "F": cp.Normal(1, 0.1),
    "L": cp.Normal(1.5, 0.01),
    "a": cp.Uniform(0.7, 1.2),
    "D": cp.Triangle(0.75, 0.8, 0.85)
}
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))
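The sample values that appear later in the collation results are consistent with a two-point Gaussian quadrature rule per input, which for a symmetric distribution places its nodes at the mean minus and plus one standard deviation. The sketch below recomputes those nodes with the standard library only. It is an observation about this particular example, not a description of SCSampler's internals.

```python
import itertools
import math

# (mean, standard deviation) of each input distribution in `vary`.
moments = {
    "F": (1.0, 0.1),                            # Normal(1, 0.1)
    "L": (1.5, 0.01),                           # Normal(1.5, 0.01)
    "a": (0.95, (1.2 - 0.7) / math.sqrt(12)),   # Uniform(0.7, 1.2)
    "D": (0.8, 0.05 / math.sqrt(6)),            # Triangle(0.75, 0.8, 0.85)
}

# Two-point Gauss rule for a symmetric distribution: mean -/+ std.
nodes = {name: (mu - sigma, mu + sigma) for name, (mu, sigma) in moments.items()}

# Full tensor product of the per-input nodes: one tuple per model evaluation.
grid = list(itertools.product(*nodes.values()))

print(len(grid))
for name, points in nodes.items():
    print(name, [round(p, 6) for p in points])
```

Four inputs with two nodes each give 2**4 = 16 evaluations, which is the number of runs in the collation table.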

Now we are able to execute all evaluations. However, in contrast to the basic tutorial, where the code was prepared to run on a local machine, our aim here is to demonstrate how to use QCG-PilotJob to execute evaluations on an HPC cluster.

To this end, we need to create a QCGPJPool object and pass it to the campaign's execute method. In the simplest configuration, we can initialise QCGPJPool within a with statement without providing any arguments to the constructor:

In [39]:
with QCGPJPool() as qcgpj:
    campaign.execute(pool=qcgpj).collate()

An empty parameter list for the QCGPJPool constructor results in default pool settings and in execution of all evaluations with a default task template. This may be sufficient for basic use cases, but to support more advanced execution scenarios, several parameters can be provided to the constructor:

  • qcgpj_executor - allows you to set up specific parameters of the QCG-PilotJob service by creating a custom QCGPJExecutor instance. For example, if we skip this parameter, QCG-PilotJob will automatically set itself up to execute on all available resources. This is perfectly fine when we run the code on HPC resources, since QCG-PilotJob takes care of proper and efficient scheduling of tasks onto the resources available in the allocation. However, if we would like to run some tests on a local machine, it may be better to define virtual resources, which can be done with the resources parameter of the QCGPJExecutor constructor.
  • template and template_params - by default, tasks use a predefined template (EasyVVUQBasicTemplate) that runs QCG-PilotJob tasks in a default mode, each on a single core. This can be altered by providing a custom task template and template_params.
  • polling_interval - changes the default interval between queries to the QCG-PilotJob Manager service about the status of tasks.

Let us show how to modify this example to demonstrate usage of these more advanced options.

First, we change template and template_params to enable execution of tasks on multiple resources. Instead of the default template class, we employ EasyVVUQParallelTemplate, which also allows us to set the numCores and numNodes parameters.

To demonstrate this, we need to start from a clean state, so we initialise a new campaign:

In [14]:
campaign = uq.Campaign(name='beam', params=params, actions=actions)
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))

Now we are able to execute campaign actions once again. This time it looks as follows:

In [14]:
from easyvvuq.actions.execute_qcgpj import EasyVVUQParallelTemplate

with QCGPJPool(
        template=EasyVVUQParallelTemplate(), 
        template_params={'numCores': 4}) as qcgpj:
    campaign.execute(pool=qcgpj).collate()

We have set numCores to 4, which is one of the parameters supported by EasyVVUQParallelTemplate. As a result, 4 cores are assigned to each task. Please note that this setting is not optimal for our example beam code, which is not parallel; for such codes, single-core tasks are perfectly fine. Nevertheless, the ability to define numCores and numNodes is essential for the proper execution of MPI or OpenMP applications.
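A rough calculation shows why reserving 4 cores for a serial task is wasteful. The sketch assumes a hypothetical 8-core allocation, tasks of equal duration and no scheduling overhead. The waves helper is illustrative and not part of QCG-PilotJob.

```python
import math


def waves(n_tasks, total_cores, cores_per_task):
    """Number of sequential batches needed to run all tasks."""
    concurrent = total_cores // cores_per_task
    if concurrent == 0:
        raise ValueError("a task needs more cores than are available")
    return math.ceil(n_tasks / concurrent)


# 16 evaluations on a hypothetical 8-core allocation.
print(waves(16, 8, 1))  # 2 batches with single-core tasks
print(waves(16, 8, 4))  # 8 batches when each task reserves 4 cores
```

For a serial code the runtime of each task does not shrink with extra cores, so under these assumptions the second configuration takes about four times longer overall.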

If you want to find out which parameters a given template supports, you can print its content and check the keys in the ${} expressions.

In [38]:
print(EasyVVUQParallelTemplate().template()[0])
            {
                'name': '${name}',
                'execution': {
                    'exec': '${exec}',
                    'args': ${args},
                    'stdout': '${stdout}',
                    'stderr': '${stderr}',
                    'venv': '${venv}',
                    'model': '${model}',
                    'model_opts': ${model_opts}
                },
                'resources': {
                    'numCores': {
                        'exact': ${numCores}
                    },
                    'numNodes': {
                        'exact': ${numNodes}
                    }
                }
            }
             

You can also easily get information about default values for the keys:

In [40]:
EasyVVUQParallelTemplate().template()[1]
Out[40]:
{'args': [],
 'stdout': 'stdout',
 'stderr': 'stderr',
 'venv': '',
 'model': 'default',
 'model_opts': {},
 'numCores': 1,
 'numNodes': 1}
  • Note 1: If the functionality of the built-in templates is not sufficient, you can always create a new one by extending an existing template.
  • Note 2: The keys name, stdout and stderr are necessary for the code to work properly, so newly created templates must define these keys in the same way as the existing templates do. It is also not possible to set these keys to custom values, because they are substituted automatically by the internal software logic.
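The relationship between the template text, its default values and template_params can be sketched with the standard library. The template below is a shortened stand-in for the resources part printed above, and the merge order (defaults first, then user parameters) is an assumption about the described behaviour rather than code taken from EasyVVUQ.

```python
import re
from string import Template

# Shortened stand-in for the resources part of the template printed above.
template_text = """
'resources': {
    'numCores': {'exact': ${numCores}},
    'numNodes': {'exact': ${numNodes}}
}
"""

# Default values for the keys, as reported by template()[1] above.
defaults = {"numCores": 1, "numNodes": 1}

# Collect the keys used in ${} expressions.
keys = sorted(set(re.findall(r"\$\{(\w+)\}", template_text)))
print(keys)

# User-supplied parameters override the defaults (assumed merge order).
template_params = {"numCores": 4}
merged = {**defaults, **template_params}
rendered = Template(template_text).substitute(merged)
print(rendered)
```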

Now let's get back to EasyVVUQParallelTemplate, which should be sufficient for many scenarios, and try to request a larger number of nodes:

In [38]:
campaign = uq.Campaign(name='beam', params=params, actions=actions)
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))

try:
    with QCGPJPool(
            template=EasyVVUQParallelTemplate(), 
            template_params={'numNodes': 2, 'numCores': 4}) as qcgpj:
        campaign.execute(pool=qcgpj).collate()
except Exception as e:
    print(e)
Request failed - Not enough resources for job 1

If you see that an exception saying there are not enough resources has been caught, it is no surprise. We don't have 2 nodes to use, and QCG-PilotJob reports that our task is too big.

But what if we want to prepare and test the workflow on a local machine before it is transferred to the HPC environment? QCG-PilotJob has a solution for this called Local Mode, which allows virtual resources to be defined. Let's modify our example a bit:

In [26]:
campaign = uq.Campaign(name='beam', params=params, actions=actions)
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))

from qcg.pilotjob.executor_api.qcgpj_executor import QCGPJExecutor

with QCGPJPool(
        qcgpj_executor=QCGPJExecutor(resources="node1:4,node2:4"),
        template=EasyVVUQParallelTemplate(), 
        template_params={'numNodes': 2, 'numCores': 4}) as qcgpj:
    campaign.execute(pool=qcgpj).collate()

As can be seen, we added the qcgpj_executor parameter to the QCGPJPool constructor. The parameter is set to a customised QCGPJExecutor instance, which has been created with the resources parameter set to node1:4,node2:4. In this way we have defined two virtual nodes, each with 4 cores. As a result, this example can be executed successfully.
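The format of the resources string and the reason the request now fits can be sketched as follows. Both helpers are hypothetical and are not QCG-PilotJob's scheduler. The check also assumes that numCores is interpreted as cores per node.

```python
def parse_resources(spec):
    """Parse a string such as 'node1:4,node2:4' into {'node1': 4, 'node2': 4}."""
    nodes = {}
    for item in spec.split(","):
        name, cores = item.split(":")
        nodes[name.strip()] = int(cores)
    return nodes


def fits(nodes, num_nodes, num_cores):
    """True if `num_nodes` nodes each offer at least `num_cores` cores."""
    big_enough = [name for name, cores in nodes.items() if cores >= num_cores]
    return len(big_enough) >= num_nodes


virtual = parse_resources("node1:4,node2:4")
print(virtual)
print(fits(virtual, num_nodes=2, num_cores=4))                      # request fits
print(fits(parse_resources("node1:4"), num_nodes=2, num_cores=4))   # too few nodes
```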

At this point we should have our evaluations completed and their results collated and stored in the campaign's database. Since we used QCGPJPool inside a with statement, it has already been cleaned up and we don't need any other code specific to QCG-PilotJob execution. Thus the remaining part of the tutorial is no different from its basic version. In other words, from now on we have all the data needed to perform the analysis in the way typical for EasyVVUQ.

First, we can display the collation results:

In [27]:
campaign.get_collation_result()
Out[27]:
    run_id  iteration    F     L         a         D    d       E      outfile         g1
         0          0    0     0         0         0    0       0            0          0
 0       1          0  0.9  1.49  0.805662  0.779588  0.1  200000  output.json  -0.000008
 1       2          0  0.9  1.49  0.805662  0.820412  0.1  200000  output.json  -0.000007
 2       3          0  0.9  1.49  1.094338  0.779588  0.1  200000  output.json  -0.000005
 3       4          0  0.9  1.49  1.094338  0.820412  0.1  200000  output.json  -0.000004
 4       5          0  0.9  1.51  0.805662  0.779588  0.1  200000  output.json  -0.000009
 5       6          0  0.9  1.51  0.805662  0.820412  0.1  200000  output.json  -0.000007
 6       7          0  0.9  1.51  1.094338  0.779588  0.1  200000  output.json  -0.000006
 7       8          0  0.9  1.51  1.094338  0.820412  0.1  200000  output.json  -0.000005
 8       9          0  1.1  1.49  0.805662  0.779588  0.1  200000  output.json  -0.000010
 9      10          0  1.1  1.49  0.805662  0.820412  0.1  200000  output.json  -0.000008
10      11          0  1.1  1.49  1.094338  0.779588  0.1  200000  output.json  -0.000006
11      12          0  1.1  1.49  1.094338  0.820412  0.1  200000  output.json  -0.000005
12      13          0  1.1  1.51  0.805662  0.779588  0.1  200000  output.json  -0.000011
13      14          0  1.1  1.51  0.805662  0.820412  0.1  200000  output.json  -0.000009
14      15          0  1.1  1.51  1.094338  0.779588  0.1  200000  output.json  -0.000007
15      16          0  1.1  1.51  1.094338  0.820412  0.1  200000  output.json  -0.000006

We then call the analyse method, whose functionality depends on the sampling method used. It returns an AnalysisResults object, which can be used to retrieve numerical values or plot the results, in this case Sobol indices.

In [28]:
results = campaign.analyse(qoi_cols=['g1'])

We can plot the results in a treemap format. Each square represents the relative influence of a parameter on the variance of the output variable (vertical displacement at point a). A square labelled higher orders represents the influence of the interactions between the input parameters.

In [29]:
results.plot_sobols_treemap('g1', figsize=(10, 10))
plt.axis('off');

Alternatively, you can get the first-order Sobol index values using the method call below.

In [30]:
results.sobols_first('g1')
Out[30]:
{'F': array([0.13515478]),
 'L': array([0.01220653]),
 'a': array([0.69667914]),
 'D': array([0.13994264])}
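First-order indices can be ranked directly, and because Sobol indices of all orders sum to one, the part not covered by the first-order terms can be attributed to interactions between inputs. The sketch below does this arithmetic on the values printed above, copied by hand as plain floats.

```python
# First-order Sobol indices for g1, copied from the output above.
sobols_first = {
    "F": 0.13515478,
    "L": 0.01220653,
    "a": 0.69667914,
    "D": 0.13994264,
}

# Inputs ordered from most to least influential.
ranking = sorted(sobols_first, key=sobols_first.get, reverse=True)
print(ranking)

# Share of the output variance explained by single-input effects.
total_first = sum(sobols_first.values())
# Remainder, attributable to higher-order (interaction) effects.
higher_orders = 1.0 - total_first
print(round(total_first, 4), round(higher_orders, 4))
```

In this example the position a of the applied force dominates, while interactions account for under 2% of the variance of g1.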