Parallel GST using MPI

The purpose of this tutorial is to demonstrate how to compute GST estimates in parallel (using multiple CPUs or "processors"). The core PyGSTi computational routines are written to take advantage of multiple processors via the MPI communication framework, and so one must have a version of MPI and the mpi4py python package installed in order use run pyGSTi calculations in parallel.

Since mpi4py doesn't play nicely with Jupyter notebooks, this tutorial is a bit more clunky than the others. In it, we will create a standalone Python script that imports mpi4py and execute it.

We will use as an example the same "standard" single-qubit model of the first tutorial. We'll first create a dataset, and then a script to be run in parallel which loads the data. The creation of a simulated data is performed in the same way as the first tutorial. Since random numbers are generated and used as simulated counts within the call to generate_fake_data, it is important that this is not done in a parallel environment, or different CPUs may get different data sets. (This isn't an issue in the typical situation when the data is obtained experimentally.)

In [1]:
#Import pyGSTi and the "stardard 1-qubit quantities for a model with X(pi/2), Y(pi/2), and idle gates"
import pygsti
from pygsti.construction import std1Q_XYI

#Create a data set
target_model = std1Q_XYI.target_model()
fiducials = std1Q_XYI.fiducials
germs = std1Q_XYI.germs
maxLengths = [1,2,4,8,16,32]

mdl_datagen = target_model.depolarize(op_noise=0.1, spam_noise=0.001)
listOfExperiments = pygsti.construction.make_lsgst_experiment_list(target_model.operations.keys(), fiducials, fiducials, germs, maxLengths)
ds = pygsti.construction.generate_fake_data(mdl_datagen, listOfExperiments, nSamples=1000,
                                            sampleError="binomial", seed=1234)
pygsti.io.write_dataset("example_files/mpi_example_dataset.txt", ds)

Next, we'll write a Python script that will load in the just-created DataSet, run GST on it, and write the output to a file. The only major difference between the contents of this script and previous examples is that the script imports mpi4py and passes a MPI comm object (comm) to the do_long_sequence_gst function. Since parallel computing is best used for computationaly intensive GST calculations, we also demonstrate how to set a per-processor memory limit to tell pyGSTi to partition its computations so as to not exceed this memory usage. Lastly, note the use of the gaugeOptParams argument of do_long_sequence_gst, which can be used to weight different model members differently during gauge optimization.

In [2]:
mpiScript = """
import time
import pygsti
from pygsti.construction import std1Q_XYI

#get MPI comm
from mpi4py import MPI
comm = MPI.COMM_WORLD

print("Rank %d started" % comm.Get_rank())

#define target model, fiducials, and germs as before
target_model = std1Q_XYI.target_model()
fiducials = std1Q_XYI.fiducials
germs = std1Q_XYI.germs
maxLengths = [1,2,4,8,16,32]

#tell gauge optimization to weight the operation matrix
# elements 100x more heavily than the SPAM vector elements, and
# to specifically weight the Gx gate twice as heavily as the other
# gates.
goParams = {'itemWeights':{'spam': 0.01, 'gates': 1.0, 'Gx': 2.0} }

#Specify a per-core memory limit (useful for larger GST calculations)
memLim = 2.1*(1024)**3  # 2.1 GB

#Perform TP-constrained GST
target_model.set_all_parameterizations("TP")
    
#load the dataset
ds = pygsti.io.load_dataset("example_files/mpi_example_dataset.txt")

start = time.time()
results = pygsti.do_long_sequence_gst(ds, target_model, fiducials, fiducials,
                                      germs, maxLengths,memLimit=memLim,
                                      gaugeOptParams=goParams, comm=comm,
                                      verbosity=2)
end = time.time()
print("Rank %d finished in %.1fs" % (comm.Get_rank(), end-start))
if comm.Get_rank() == 0:
    import pickle
    pickle.dump(results, open("example_files/mpi_example_results.pkl","wb"))
"""
with open("example_files/mpi_example_script.py","w") as f:
    f.write(mpiScript)

Next, we run the script with 3 processors using mpiexec. The mpiexec executable should have been installed with your MPI distribution -- if it doesn't exist, try replacing mpiexec with mpirun.

In [3]:
! mpiexec -n 3 python3 "example_files/mpi_example_script.py"
Rank 0 started
Rank 1 started
Rank 2 started
--- Circuit Creation ---
   1702 sequences created
   Dataset has 1702 entries: 1702 utilized, 0 requested sequences were missing
--- LGST ---
  Singular values of I_tilde (truncating to first 4 of 6) = 
  4.244089943192679
  1.1594632778409208
  0.9651516670737965
  0.9297628363691268
  0.049256811347238104
  0.025150658372136828
  
  Singular values of target I_tilde (truncating to first 4 of 6) = 
  4.242640687119286
  1.414213562373096
  1.4142135623730956
  1.4142135623730954
  2.5038933168948026e-16
  2.023452063009528e-16
  
--- Iterative MLGST: Iter 1 of 6  92 operation sequences ---: 
  --- Minimum Chi^2 GST ---
  Memory limit = 2.10GB
  Cur, Persist, Gather = 0.13, 0.00, 0.21 GB
  Finding num_nongauge_params is too expensive: using total params.
  Sum of Chi^2 = 95.7297 (92 data params - 43 model params = expected mean of 49; p-value = 7.43161e-05)
  Completed in 0.3s
  2*Delta(log(L)) = 96.1163
  Iteration 1 took 0.3s
  
--- Iterative MLGST: Iter 2 of 6  168 operation sequences ---: 
  --- Minimum Chi^2 GST ---
  Memory limit = 2.10GB
  Cur, Persist, Gather = 0.13, 0.00, 0.21 GB
  Finding num_nongauge_params is too expensive: using total params.
  Sum of Chi^2 = 162.192 (168 data params - 43 model params = expected mean of 125; p-value = 0.0141238)
  Completed in 0.2s
  2*Delta(log(L)) = 162.56
  Iteration 2 took 0.2s
  
--- Iterative MLGST: Iter 3 of 6  450 operation sequences ---: 
  --- Minimum Chi^2 GST ---
  Memory limit = 2.10GB
  Cur, Persist, Gather = 0.13, 0.00, 0.21 GB
  Finding num_nongauge_params is too expensive: using total params.
  Sum of Chi^2 = 484.676 (450 data params - 43 model params = expected mean of 407; p-value = 0.00480708)
  Completed in 0.5s
  2*Delta(log(L)) = 485.572
  Iteration 3 took 0.5s
  
--- Iterative MLGST: Iter 4 of 6  862 operation sequences ---: 
  --- Minimum Chi^2 GST ---
  Memory limit = 2.10GB
  Cur, Persist, Gather = 0.14, 0.00, 0.21 GB
  Finding num_nongauge_params is too expensive: using total params.
  Sum of Chi^2 = 895.303 (862 data params - 43 model params = expected mean of 819; p-value = 0.0324382)
  Completed in 0.9s
  2*Delta(log(L)) = 896.23
  Iteration 4 took 1.0s
  
--- Iterative MLGST: Iter 5 of 6  1282 operation sequences ---: 
  --- Minimum Chi^2 GST ---
  Memory limit = 2.10GB
  Cur, Persist, Gather = 0.14, 0.00, 0.21 GB
  Finding num_nongauge_params is too expensive: using total params.
  Sum of Chi^2 = 1350.86 (1282 data params - 43 model params = expected mean of 1239; p-value = 0.0140423)
  Completed in 1.5s
  2*Delta(log(L)) = 1351.9
  Iteration 5 took 1.6s
  
--- Iterative MLGST: Iter 6 of 6  1702 operation sequences ---: 
  --- Minimum Chi^2 GST ---
  Memory limit = 2.10GB
  Cur, Persist, Gather = 0.15, 0.00, 0.21 GB
  Finding num_nongauge_params is too expensive: using total params.
  Sum of Chi^2 = 1800.55 (1702 data params - 43 model params = expected mean of 1659; p-value = 0.0081481)
  Completed in 2.3s
  2*Delta(log(L)) = 1801.81
  Iteration 6 took 2.4s
  
  Switching to ML objective (last iteration)
  --- MLGST ---
  Memory: limit = 2.10GB(cur, persist, gthr = 0.15, 0.00, 0.21 GB)
  Finding num_nongauge_params is too expensive: using total params.
    Maximum log(L) = 900.852 below upper bound of -2.84686e+06
      2*Delta(log(L)) = 1801.7 (1702 data params - 43 model params = expected mean of 1659; p-value = 0.00773372)
    Completed in 0.7s
  2*Delta(log(L)) = 1801.7
  Final MLGST took 0.7s
  
Iterative MLGST Total Time: 6.6s
  -- Adding Gauge Optimized (go0) --
--- Re-optimizing logl after robust data scaling ---
  --- MLGST ---
  Memory: limit = 2.10GB(cur, persist, gthr = 0.15, 0.00, 0.21 GB)
  Finding num_nongauge_params is too expensive: using total params.
    Maximum log(L) = 900.852 below upper bound of -2.84686e+06
      2*Delta(log(L)) = 1801.7 (1702 data params - 43 model params = expected mean of 1659; p-value = 0.00773372)
    Completed in 0.5s
Rank 2 finished in 9.4s
Rank 1 finished in 9.4s
  -- Adding Gauge Optimized (go0) --
Rank 0 finished in 9.4s

Notice in the above that output within do_long_sequence_gst is not duplicated (only the first processor outputs to stdout) so that the output looks identical to running on a single processor. Finally, we just need to read the pickled Results object from file and proceed with any post-processing analysis. In this case, we'll just create a report.

In [4]:
import pickle
results = pickle.load(open("example_files/mpi_example_results.pkl","rb"))
pygsti.report.create_standard_report(results, "example_files/mpi_example_brief",
                                    title="MPI Example Report", verbosity=2, auto_open=True)
*** Creating workspace ***
*** Generating switchboard ***
Found standard clifford compilation from std1Q_XYI
*** Generating tables ***
/Users/enielse/research/pyGSTi/packages/pygsti/report/factory.py:785: UserWarning:

Idle tomography failed:
Label{layers}

  targetSpamBriefTable                          took 0.741819 seconds
  targetGatesBoxTable                           took 0.171605 seconds
  datasetOverviewTable                          took 0.119944 seconds
  bestGatesetSpamParametersTable                took 0.00061 seconds
  bestGatesetSpamBriefTable                     took 0.160337 seconds
  bestGatesetSpamVsTargetTable                  took 0.10074 seconds
  bestGatesetGaugeOptParamsTable                took 0.000563 seconds
  bestGatesetGatesBoxTable                      took 0.159274 seconds
  bestGatesetChoiEvalTable                      took 0.467779 seconds
  bestGatesetDecompTable                        took 0.217626 seconds
  bestGatesetEvalTable                          took 0.003944 seconds
  bestGermsEvalTable                            took 0.030369 seconds
  bestGatesetVsTargetTable                      took 0.064669 seconds
  bestGatesVsTargetTable_gv                     took 0.272684 seconds
  bestGatesVsTargetTable_gvgerms                took 0.120676 seconds
  bestGatesVsTargetTable_gi                     took 0.011457 seconds
  bestGatesVsTargetTable_gigerms                took 0.045032 seconds
  bestGatesVsTargetTable_sum                    took 0.2168 seconds
  bestGatesetErrGenBoxTable                     took 0.757526 seconds
  metadataTable                                 took 0.08499 seconds
  stdoutBlock                                   took 0.001135 seconds
  profilerTable                                 took 0.001579 seconds
  softwareEnvTable                              took 0.039033 seconds
  exampleTable                                  took 0.055899 seconds
  singleMetricTable_gv                          took 0.212782 seconds
  singleMetricTable_gi                          took 0.013763 seconds
  fiducialListTable                             took 0.00094 seconds
  prepStrListTable                              took 0.000207 seconds
  effectStrListTable                            took 0.000215 seconds
  colorBoxPlotKeyPlot                           took 0.075305 seconds
  germList2ColTable                             took 0.000329 seconds
  progressTable                                 took 4.463347 seconds
*** Generating plots ***
  gramBarPlot                                   took 0.125494 seconds
  progressBarPlot                               took 0.08425 seconds
  progressBarPlot_sum                           took 0.000674 seconds
  finalFitComparePlot                           took 0.072736 seconds
  bestEstimateColorBoxPlot                      took 16.191067 seconds
  bestEstimateTVDColorBoxPlot                   took 14.47499 seconds
  bestEstimateColorScatterPlot                  took 17.164604 seconds
  bestEstimateColorHistogram                    took 14.966767 seconds
  progressTable_scl                             took 3.376393 seconds
  progressBarPlot_scl                           took 0.044108 seconds
  bestEstimateColorBoxPlot_scl                  took 11.72345 seconds
  bestEstimateColorScatterPlot_scl              took 12.955372 seconds
  bestEstimateColorHistogram_scl                took 11.788572 seconds
  dataScalingColorBoxPlot                       took 0.12912 seconds
*** Merging into template file ***
  Rendering topSwitchboard                      took 0.000115 seconds
  Rendering maxLSwitchboard1                    took 7.8e-05 seconds
  Rendering targetSpamBriefTable                took 0.071255 seconds
  Rendering targetGatesBoxTable                 took 0.061378 seconds
  Rendering datasetOverviewTable                took 0.001056 seconds
  Rendering bestGatesetSpamParametersTable      took 0.001418 seconds
  Rendering bestGatesetSpamBriefTable           took 0.135043 seconds
  Rendering bestGatesetSpamVsTargetTable        took 0.001596 seconds
  Rendering bestGatesetGaugeOptParamsTable      took 0.000922 seconds
  Rendering bestGatesetGatesBoxTable            took 0.119739 seconds
  Rendering bestGatesetChoiEvalTable            took 0.115429 seconds
  Rendering bestGatesetDecompTable              took 0.070942 seconds
  Rendering bestGatesetEvalTable                took 0.011932 seconds
  Rendering bestGermsEvalTable                  took 0.0441 seconds
  Rendering bestGatesetVsTargetTable            took 0.001135 seconds
  Rendering bestGatesVsTargetTable_gv           took 0.002487 seconds
  Rendering bestGatesVsTargetTable_gvgerms      took 0.003616 seconds
  Rendering bestGatesVsTargetTable_gi           took 0.00255 seconds
  Rendering bestGatesVsTargetTable_gigerms      took 0.002752 seconds
  Rendering bestGatesVsTargetTable_sum          took 0.002272 seconds
  Rendering bestGatesetErrGenBoxTable           took 0.261762 seconds
  Rendering metadataTable                       took 0.002315 seconds
  Rendering stdoutBlock                         took 0.000737 seconds
  Rendering profilerTable                       took 0.001567 seconds
  Rendering softwareEnvTable                    took 0.002251 seconds
  Rendering exampleTable                        took 0.022291 seconds
  Rendering metricSwitchboard_gv                took 4.1e-05 seconds
  Rendering metricSwitchboard_gi                took 2.6e-05 seconds
  Rendering singleMetricTable_gv                took 0.005312 seconds
  Rendering singleMetricTable_gi                took 0.004573 seconds
  Rendering fiducialListTable                   took 0.002853 seconds
  Rendering prepStrListTable                    took 0.00201 seconds
  Rendering effectStrListTable                  took 0.001991 seconds
  Rendering colorBoxPlotKeyPlot                 took 0.02336 seconds
  Rendering germList2ColTable                   took 0.003847 seconds
  Rendering progressTable                       took 0.00421 seconds
  Rendering gramBarPlot                         took 0.023216 seconds
  Rendering progressBarPlot                     took 0.020221 seconds
  Rendering progressBarPlot_sum                 took 0.020006 seconds
  Rendering finalFitComparePlot                 took 0.020415 seconds
  Rendering bestEstimateColorBoxPlot            took 0.192293 seconds
  Rendering bestEstimateTVDColorBoxPlot         took 0.187625 seconds
  Rendering bestEstimateColorScatterPlot        took 0.35193 seconds
  Rendering bestEstimateColorHistogram          took 0.204445 seconds
  Rendering progressTable_scl                   took 0.004331 seconds
  Rendering progressBarPlot_scl                 took 0.019671 seconds
  Rendering bestEstimateColorBoxPlot_scl        took 0.189318 seconds
  Rendering bestEstimateColorScatterPlot_scl    took 0.348455 seconds
  Rendering bestEstimateColorHistogram_scl      took 0.210257 seconds
  Rendering dataScalingColorBoxPlot             took 0.043448 seconds
Output written to example_files/mpi_example_brief directory
Opening example_files/mpi_example_brief/main.html...
*** Report Generation Complete!  Total time 115.088s ***
Out[4]:
<pygsti.report.workspace.Workspace at 0x108357358>
In [ ]: