#!/usr/bin/env python
# coding: utf-8

# # Running EasyVVUQ on HPC resources with QCG-PilotJob
# **Author**: Bartosz Bosak, PSNC (bbosak@man.poznan.pl)

# If this is your first Jupyter Notebook, you can execute code cells by selecting them and pressing ```Shift+Enter```. Keep in mind that the order of execution may matter (later cells can depend on things done in earlier ones).

# As defined in the [VECMA glossary](https://wiki.vecma.eu/glossary), uncertainty quantification (UQ) is a “discipline, which seeks to estimate the
# uncertainty in the model input and output parameters, to analyse the sources of these uncertainties,
# and to reduce their quantities.” However, this process can quickly become cumbersome because just
# a few uncertain inputs could require hundreds or even thousands of samples. If a single sample is a demanding simulation,
# such a number of tasks cannot be performed effectively without (1) adequate
# computational resources, (2) a dedicated approach and (3) specialised programming solutions.
#
# EasyVVUQ has been designed as modular software that can benefit from solutions providing
# advanced capabilities for executing demanding operations on computational resources.
# One such solution is [QCG-PilotJob](https://qcg-pilotjob.readthedocs.io), which makes it possible to efficiently run a number of tasks inside
# a single large allocation on a Slurm cluster.
#
# In this tutorial, based on the scenario presented in the basic tutorial, we demonstrate how EasyVVUQ workflows
# can be adapted to enable their execution with QCG-PilotJob on HPC machines.
# As will be shown, the adaptation is quite easy.

# ## Prerequisites
# You need to have EasyVVUQ installed in your environment. There is no need to install QCG-PilotJob's packages separately since they are installed as EasyVVUQ's dependencies.

# In[3]:


get_ipython().run_line_magic('pip', 'install easyvvuq')


# ## Application scenario
# Let's recall the basic use case.
# It is a simulation of the vertical deflection of a round metal
# tube suspended at each end in response to a force applied at a certain point ```a``` along its length.
# Our goal is to determine the influence of the input parameters on the vertical deflection at point ```a```.

# The usage of the application is:
#
# ```beam ```
#
# It outputs the calculated displacements to a file called `output.json`. Its content will look like this:
#
# ```{'g1': x, 'g2': y, 'g3': y}```
#
# In order to produce statistically significant results, EasyVVUQ needs to run a number of model evaluations,
# appropriately selecting input arguments from a given sample parameter space.
# Once selected, the input parameters need to be transformed into a format understandable by the application.
# Our application takes a single file as input, so the transformation can be based on a single template file,
# called `beam.template`, with the following content:
#
# ```{"outfile": "$outfile", "F": $F, "L": $L, "a": $a, "D": $D, "d": $d, "E": $E}```
#
# The template will be used to generate files called `input.json` that will be the input to each run of beam.
# All placeholders (signified by the $ delimiter) will be replaced by concrete values from the sample parameter space.
# So, for example (commands preceded by an exclamation mark are treated as shell commands):

# In[3]:


get_ipython().system('pwd')
get_ipython().system('echo "{\\"outfile\\": \\"output.json\\", \\"F\\": 1.0, \\"L\\": 1.5, \\"a\\": 1.0, \\"D\\": 0.8, \\"d\\": 0.1, \\"E\\": 200000}" > input.json')


# In[4]:


get_ipython().system('./beam input.json')


# In[5]:


get_ipython().system('cat output.json')


# In this tutorial, in a similar fashion to the basic one,
# we will demonstrate how to use EasyVVUQ to do variance-based sensitivity analysis of the `beam` application using stochastic collocation.
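
# As an aside on the template mechanism described above, the replacement of `$` placeholders can be sketched with
# Python's standard `string.Template`. This is only a stand-in for illustration, not EasyVVUQ's own encoder;
# `render_input` is a hypothetical helper name, and the sample values are the ones used in the hand-written `input.json` above.

```python
import json
from string import Template

# Content of beam.template as quoted in the text above.
TEMPLATE_TEXT = ('{"outfile": "$outfile", "F": $F, "L": $L, "a": $a, '
                 '"D": $D, "d": $d, "E": $E}')

# One sample: the same values as in the hand-written input.json above.
sample = {"outfile": "output.json", "F": 1.0, "L": 1.5, "a": 1.0,
          "D": 0.8, "d": 0.1, "E": 200000}


def render_input(template_text, values):
    """Replace every $name placeholder with the matching sample value.

    Hypothetical helper for illustration; not part of EasyVVUQ.
    """
    # substitute() raises KeyError if a placeholder has no value,
    # so incomplete samples are detected early.
    return Template(template_text).substitute(values)


rendered = render_input(TEMPLATE_TEXT, sample)
parsed = json.loads(rendered)  # the rendered text parses as JSON
print(rendered)
```

# Each sample drawn from the parameter space would produce one such rendered file, which is then passed to `beam`.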
#
# The way QCG-PilotJob is used in EasyVVUQ is generic, though:
# it looks the same for other applications and for the other methods supported by EasyVVUQ.

# ## Campaign

# In order to use EasyVVUQ, we need to configure the EasyVVUQ Campaign object. We do this in almost the same way as in the basic use case.
# First, we import the same set of libraries as in the original example:

# In[5]:


import os
import easyvvuq as uq
import chaospy as cp
import matplotlib.pyplot as plt
from easyvvuq.actions import CreateRunDirectory, Encode, Decode, CleanUp, ExecuteLocal, Actions


# We only extend this set of imports with a module for the QCG-PilotJob pool:

# In[6]:


from easyvvuq.actions import QCGPJPool


# Then, we can continue the code from the basic workflow. For validation purposes, we describe the set of parameters used by the application:

# In[7]:


params = {
    "F": {"type": "float", "default": 1.0},
    "L": {"type": "float", "default": 1.5},
    "a": {"type": "float", "min": 0.7, "max": 1.2, "default": 1.0},
    "D": {"type": "float", "min": 0.75, "max": 0.85, "default": 0.8},
    "d": {"type": "float", "default": 0.1},
    "E": {"type": "float", "default": 200000},
    "outfile": {"type": "string", "default": "output.json"}
}


# and, by specifying an encoder and a decoder, define how EasyVVUQ should convert data between its internal representation and the application logic:

# In[8]:


encoder = uq.encoders.GenericEncoder(template_fname='beam.template', delimiter='$', target_filename='input.json')
decoder = uq.decoders.JSONDecoder(target_filename='output.json', output_columns=['g1'])


# Since our application takes and produces very simple data structures, we use the built-in Encoder and Decoder classes, but you can provide custom implementations of encoders and decoders that fit your own use case.
#
# The next step is to define an execute action that will be used to run the `beam` application with a prepared input file.

# In[9]:


execute = ExecuteLocal('{}/beam input.json'.format(os.getcwd()))


# Now we can put all the actions we want to execute for each sample into the Actions object:

# In[10]:


actions = Actions(CreateRunDirectory('/tmp'), Encode(encoder), execute, Decode(decoder))


# and finally create the EasyVVUQ Campaign:

# In[11]:


campaign = uq.Campaign(name='beam', params=params, actions=actions)


# The remaining steps to configure the campaign object are to define distributions for the input parameters and to initialise a sampler. This code is the same regardless of whether we use QCG-PilotJob or not:

# In[12]:


vary = {
    "F": cp.Normal(1, 0.1),
    "L": cp.Normal(1.5, 0.01),
    "a": cp.Uniform(0.7, 1.2),
    "D": cp.Triangle(0.75, 0.8, 0.85)
}
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))


# ## QCGPJ-Pool
#
# Now we are able to execute all evaluations. However, in contrast to the basic tutorial, where the code was prepared to be run on a local machine, our target here is to demonstrate how to use QCG-PilotJob to execute evaluations on an HPC cluster.
#
# To this end, we need to create a QCGPJPool object and pass it to the campaign's execute method. In the simplest configuration, we can initialise `QCGPJPool` within a `with` statement without providing any arguments to the constructor:

# In[13]:


with QCGPJPool() as qcgpj:
    campaign.execute(pool=qcgpj).collate()


# An empty parameter list for the QCGPJPool constructor leads to the default settings of the pool and the execution of all evaluations with a default task template. This may be sufficient for basic use cases, but in order to support more advanced execution scenarios,
# several parameters may need to be provided to the constructor:
# * `qcgpj_executor` - allows setting specific parameters of the QCG-PilotJob service by creating a custom QCGPJExecutor instance.
#   For example, if we skip this parameter, QCG-PilotJob will automatically set itself up to execute on all available resources, which is perfectly fine if we run the code on HPC resources, since it will take care of proper and efficient scheduling of tasks to the resources available in the allocation. However, if we would like to run some tests on a local machine, it may be better to define virtual resources, which can be done with the `resources` parameter of the `QCGPJExecutor` constructor.
# * `template` and `template_params` - by default the tasks use a predefined template (`EasyVVUQBasicTemplate`) that leads to execution of QCG-PilotJob's tasks in a default mode, only on a single core. This can be altered by providing a custom task `template` and `template_params`.
# * `polling_interval` - allows changing the default interval between queries sent to the QCG-PilotJob Manager service about the status of tasks.
#
# Let us show how to modify this example to demonstrate the usage of these more advanced options.

# ## Parallel Tasks

# First, we change the `template` and `template_params` to enable execution of tasks on multiple resources. Thus, instead of the default template class, we will employ `EasyVVUQParallelTemplate`, which also allows us to set the `numCores` and `numNodes` parameters.
#
# In order to demonstrate it, we need a clean state and therefore we initialise a new campaign:

# In[14]:


campaign = uq.Campaign(name='beam', params=params, actions=actions)
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))


# Now we are able to execute the campaign actions once again. This time it looks as follows:

# In[15]:


from easyvvuq.actions.execute_qcgpj import EasyVVUQParallelTemplate

with QCGPJPool(
        template=EasyVVUQParallelTemplate(),
        template_params={'numCores': 4}) as qcgpj:
    campaign.execute(pool=qcgpj).collate()


# We have set `numCores` to 4, which is one of the parameters supported by `EasyVVUQParallelTemplate`.
# It results in assigning 4 cores to each task. Please note that this setting is not optimal for our example `beam` code, which is not parallel - for such codes single-core tasks are perfectly fine. Nevertheless, the possibility to define `numCores` and `numNodes` is essential for the proper execution of MPI or OpenMP applications.
#
# **Note:** In order to run parallel code inside a QCG-PilotJob task, the full command for the parallel run should be given to the ExecuteLocal action
# (e.g. `mpirun -n 4 NAME_OF_PROGRAM`).
#
# Now let's try to set a larger number of nodes:

# In[38]:


campaign = uq.Campaign(name='beam', params=params, actions=actions)
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))

try:
    with QCGPJPool(
            template=EasyVVUQParallelTemplate(),
            template_params={'numNodes': 2, 'numCores': 4}) as qcgpj:
        campaign.execute(pool=qcgpj).collate()
except Exception as e:
    # On a machine without 2 nodes, QCG-PilotJob rejects the task as too big.
    print(e)


# If you see that an exception saying that there are not enough resources has been caught, it is no surprise: we don't have 2 nodes to use, so QCG-PilotJob reports that our task is too big.

# ## Local / virtualised mode of execution
#
# What if we want to prepare and test the workflow on a local machine before it is transferred to the HPC environment? QCG-PilotJob has a solution for this, called *Local Mode*, which allows us to define virtual resources. Let's modify our example a bit:

# In[ ]:


campaign = uq.Campaign(name='beam', params=params, actions=actions)
campaign.set_sampler(uq.sampling.SCSampler(vary=vary, polynomial_order=1))

from qcg.pilotjob.executor_api.qcgpj_executor import QCGPJExecutor

with QCGPJPool(
        qcgpj_executor=QCGPJExecutor(resources="node1:4,node2:4"),
        template=EasyVVUQParallelTemplate(),
        template_params={'numNodes': 2, 'numCores': 4}) as qcgpj:
    campaign.execute(pool=qcgpj).collate()


# As can be seen, we added the `qcgpj_executor` parameter to the `QCGPJPool` constructor.
# The parameter is set to a customised `QCGPJExecutor` instance, which has been created
# with the `resources` parameter set to `node1:4,node2:4`. In this way we have defined two virtual nodes, each with 4 cores. As a result, this example can be executed successfully.

# ## Task templates

# You can get more information about the parameters available in the templates in the following way:

# In[38]:


print(EasyVVUQParallelTemplate().template()[0])


# You can also easily get information about the default values for the keys:

# In[40]:


EasyVVUQParallelTemplate().template()[1]


# * **Note 1:** If the functionality of the built-in templates is not sufficient, you can always create a new one by extending an existing one.
# * **Note 2:** The keys `name`, `stdout` and `stderr` are necessary for the code to work properly, so newly created templates must define these keys in the same way as they are defined in the existing templates. It is also not possible to set these keys to custom values, because they are substituted automatically by the internal software logic.

# ## Analysis

# At this point our evaluations should be complete, with their results collated and stored in the campaign's database. Since we have used `QCGPJPool` inside the `with` statement, it has already been cleaned up and we don't need any other code specific to QCG-PilotJob's execution. Thus the remaining part of the tutorial is no different from its basic version. In other words, from now on we have all the data needed to perform the analysis in the way typical for EasyVVUQ.
#
# To begin with, we can display the collation results:

# In[16]:


campaign.get_collation_result()


# We then call the analyse method, whose functionality depends on the sampling method used. It returns an `AnalysisResults` object, which can be used to retrieve numerical values or plot the results. In this case, these are the Sobol indices.

# In[17]:


results = campaign.analyse(qoi_cols=['g1'])


# We can plot the results in a treemap format.
# Each square represents the relative influence of that parameter on the variance of the output variable (vertical displacement at point ```a```). A square labeled ```higher orders``` represents the influence of the interactions between the input parameters.

# In[18]:


results.plot_sobols_treemap('g1', figsize=(10, 10))
plt.axis('off');


# Alternatively, you can get the Sobol index values using the method call below.

# In[30]:


results.sobols_first('g1')
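
# To make the relation between first-order indices and the ```higher orders``` share concrete, here is a minimal sketch.
# It assumes independent inputs (so that the Sobol indices of all orders sum to one) and a plain dict of floats;
# the real return type of `sobols_first` may differ. The numbers are made up for illustration and are not results of this campaign,
# and `higher_order_share` and `rank_parameters` are hypothetical helper names.

```python
# Hypothetical first-order Sobol indices for 'g1'; the numbers are made up
# for illustration and are NOT results produced by this campaign.
first_order = {"F": 0.14, "L": 0.01, "a": 0.70, "D": 0.13}


def higher_order_share(indices):
    """Return the share of variance not explained by first-order effects.

    Assumes independent inputs, so indices of all orders sum to one.
    """
    remainder = 1.0 - sum(indices.values())
    # Guard against tiny negative values caused by floating-point rounding.
    return max(remainder, 0.0)


def rank_parameters(indices):
    """Return parameter names sorted from most to least influential."""
    return sorted(indices, key=indices.get, reverse=True)


ranking = rank_parameters(first_order)
interactions = higher_order_share(first_order)
print("ranking:", ranking)
print("higher orders: {:.2f}".format(interactions))
```

# A small remainder indicates that interactions between the inputs contribute little to the output variance.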