Notebook

File Input and Output¶

This tutorial describes how different pyGSTi objects can be saved to and read back from files. There are four main ways of doing this, depending on the type of object you're wanting to save.

Custom text formats A small number of pyGSTi objects can be converted to and from a pyGSTi-specific text format, that can be written to and from files. Currently DataSet, MultiDataSet, Circuit, and Label are the only such objects. Circuit and Label objects are usually not read or written individually; usually it's a list of circuits that are.
JSON format Many of pyGSTi's objects have the ability to be converted to and from the JavaScript object notation (JSON) format. This is a standard text format that is widely supported and portable, making it the preferred choice for reading and writing just about any pyGSTi object except a data set or and circuit list (which have their own text formats).
Experiment design directory Several high-level objects can be read and written from directories rather than individual files. These include ExperimentDesign, Protocol, ProtocolData, ProtocolResults, and ProtocolResultsDir objects. Because these object types hold lots of other objects, storing them to directories makes looking through the contents of the object possible without even using Python, since different parts of the object are stored in different files. This is further facilitated by saving the parts of the high-level object in the custom text or JSON formats described above - the resulting directory contains portable, human-readable files.
Secondly, this directory format allows for multiple of object types to effectively share a single directory. This is achieved by placing the data for an object in a subdirectory beneath the given directory named based on the object type. For instance, writing an ExperimentDesign to the path/to/root_directory will write the object's contents within the path/to/root_directory/edesign directory. A ProtocolData, which contains an ExperimentDesign, when writing to the same path/to/root_directory will write its non-experiment design contents within path/to/root_directory/data directory and re-use the serialized experiment design in the edesign sub-directory. When experiment designs contain other (nested) experiment designs, this leads to an efficient tree structure organizing circuit lists, experimental data, and analysis results.
Pickle, basic JSON and MSGPACK formats Finally, most pyGSTi objects are able to be "pickled" using Python's pickle format. The codec that allows for this can also write JSON and MessagePack formats via functions in pygsti.io.json and pygsti.io.msgpack. These routines create an alternate format to the "nice" JSON format described above: they produce a format that is fragile, non-portable, and not human readable. As such, using pickle, pygsti.io.json or pygsti.io.msgpack should be a last resort and only used for temporary storage.

Text formats¶

All text-based input and output is done via the pygsti.io sub-package. Below we give examples of how to read and write data sets and circuit objects. (Note: Data set objects also have read_binary and write_binary methods which read and write binary formats -- different from the text formats used by the pygsti.io routines -- but these binary formats are less portable and not human readable, making the text formats preferable).

In [ ]:

import pygsti

#DataSets ------------------------------------------------------------
dataset_txt = \
"""## Columns = 0 count, 1 count
{} 0 100
Gx 10 80
GxGy 40 20
Gx^4 20 70
"""
with open("../tutorial_files/TestDataSet.txt","w") as f:
    f.write(dataset_txt)

ds = pygsti.io.read_dataset("../tutorial_files/TestDataSet.txt")
pygsti.io.write_dataset("../tutorial_files/TestDataSet.txt", ds)


#MultiDataSets ------------------------------------------------------------
multidataset_txt = \
"""## Columns = DS0 0 count, DS0 1 count, DS1 0 frequency, DS1 count total                                
{} 0 100 0 100                                                                                                      
Gx 10 90 0.1 100                                                                                                    
GxGy 40 60 0.4 100                                                                                                  
Gx^4 20 80 0.2 100                                                                                                  
"""

with open("../tutorial_files/TestMultiDataSet.txt","w") as f:
    f.write(multidataset_txt)
    
multiDS = pygsti.io.read_multidataset("../tutorial_files/TestMultiDataSet.txt", cache=True)
pygsti.io.write_multidataset("../tutorial_files/TestDataSet.txt", multiDS)


#DataSets w/timestamped data --------------------------------------------
# Note: left of equals sign is letter, right is spam label
tddataset_txt = \
"""## 0 = 0
## 1 = 1
{} 011001
Gx 111000111
Gy 11001100
"""
with open("../tutorial_files/TestTDDataset.txt","w") as f:
    f.write(tddataset_txt)
    
tdds_fromfile = pygsti.io.read_time_dependent_dataset("../tutorial_files/TestTDDataset.txt")
#NOTE: currently there's no way to *write* a DataSet w/timestamped data to a text file yet.


#Circuits ------------------------------------------------------------
from pygsti.modelpacks import smq1Q_XY  
cList = pygsti.circuits.create_lsgst_circuits(
    [('Gxpi2',0), ('Gypi2',0)], smq1Q_XY.prep_fiducials(), smq1Q_XY.meas_fiducials(),
    smq1Q_XY.germs(), [1,2,4,8])    
pygsti.io.write_circuit_list("../tutorial_files/TestCircuitList.txt",cList,"#Test Circuit List")
pygsti.io.write_empty_dataset("../tutorial_files/TestEmptyDataset.txt",cList) 
  #additionally creates columns of zeros where data should go...
cList2 = pygsti.io.read_circuit_list("../tutorial_files/TestCircuitList.txt")

JSON format¶

Any object that derives from pygsti.baseobjs.NicelySerializable can be written to or read from a "nice" JSON format using the object's write and read methods. This is the best way to serialize Model objects, which can be particularly complex. We give an example of this below.

In [ ]:

pspec = pygsti.processors.QubitProcessorSpec(2, ['Gxpi2', 'Gypi2', 'Gcnot'], geometry='line')
mdl = pygsti.models.modelconstruction.create_crosstalk_free_model(pspec)
mdl.write("../tutorial_files/serialized_model.json")

#A base class can read in derived-class files 
loaded_mdl = pygsti.models.Model.read("../tutorial_files/serialized_model.json")
print(type(loaded_mdl))

import numpy as np
assert(np.allclose(mdl.to_vector(), loaded_mdl.to_vector()))

We can also serialize a QubitProcessorSpec this way:

In [ ]:

pspec.write("../tutorial_files/serialized_pspec.json")
print("wrote a", str(type(pspec)))

But we'll get an error if we try to read in the processor specification using pygsti.models.Model, since a class's .read ensures that it constructs an object of that type or a subclass of it:

In [ ]:

try:
    pygsti.models.Model.read("../tutorial_files/serialized_pspec.json")
except ValueError as v:
    print("ERROR: ", str(v))

p1 = pygsti.processors.QubitProcessorSpec.read("../tutorial_files/serialized_pspec.json")  # ok - reads object of same type
p2 = pygsti.processors.ProcessorSpec.read("../tutorial_files/serialized_pspec.json")  # ok - ProcessorSpec is a base class of QubitProcessorSpec
p3 = pygsti.baseobjs.NicelySerializable.read("../tutorial_files/serialized_pspec.json") # also ok - NicelySerializeable is a base class too

assert(type(p1) == pygsti.processors.QubitProcessorSpec)
assert(type(p2) == pygsti.processors.QubitProcessorSpec)
assert(type(p3) == pygsti.processors.QubitProcessorSpec)

Experiment design directory tree¶

ExperimentDesign, Protocol, ProtocolData, ProtocolResults, and ProtocolResultsDir objects can be written to a directory using using their write methods. These objects place their data in subdirectories beneath the given directory as follows:

ExperimentDesign places its contents in an edesign subdirectory.
ProtocolData places its contained experiment design in an edesign subdirectory (it writes its contained ExperimentDesign to the same root directory) and the rest of its contents in a data subdirectory.
ProtocolResults writes its contained ProtocolData object into the same root directory (this uses edesign and data subdirectories) and the rest of its contents into the results/<name> subdirectory, where <name> is given by the name attribute of the ProtocolResults object.
ProtocolResultsDir writes its contained ProtocolData object into the same root directory (this uses edesign and data subdirectories) and the rest of its contents, which consist of multiple ProtocolResults objects beneath a results subdirectory (i.e. it writes all of its contained ProtocolResults objects to the same root directory).
Protocol doesn't conform to the same structure, and can be independently written to a given directory. Contents are written directly into this directory (no subdirectories), and so the destination directory is unrelated to the common "root directory" used by the preceding classes.

This results in a directory structure where common information (e.g. an experiment design) is only written to disk in one location (the edesign subdirectory) and shared by all the objects that need it. Additional subdirectories may exist beneath the root directory to support nested experiment designs and their related data and results. These subdirectory names are listed in the edesign/subdirs.json file. Additional subdirectories not listed here are ignored and can be used by users as they please. To read these objects back from disk, you use the following pygsti.io functions:

pygsti.io.read_edesign_from_dir(root_directory) reads the experiment design saved beneath the given root directory, i.e., from root_directory/edesign. Also loads nested experiment designs.
pygsti.io.read_data_from_dir(root_directory) reads a ProtocolData from the given root directory, including nested data objects.
pygsti.io.read_results_from_dir(root_directory) reads a ProtocolResultsDir from the given root directory, including nested result directory objects.
pygsti.io.read_protocol_from_dir(directory) reads a Protocol object from the given directory. This directory is independent and only used for holding a protocol's contents, i.e. not like the root_directory arguments to the previous functions.

We demonstrate some of this functionality with a simple example:

In [ ]:

import shutil
shutil.rmtree("../tutorial_files/example_root_directory")  # start with a clean slate, in case you rerun this tutorial

from pygsti.modelpacks import smq1Q_XYI
edesign = smq1Q_XYI.create_gst_experiment_design(1)
edesign.write("../tutorial_files/example_root_directory")  # creates .../example_root_directory/edesign

In [ ]:

# load back in the experiment design (just to demo this function):
loaded_edesign = pygsti.io.read_edesign_from_dir("../tutorial_files/example_root_directory")

#Create & write an empty ProtocolData object to the same root directory
# (this is often useful because it creates an empty data set file in .../example_root_directory/data
#  that can be used as a template for the real experimental data)
pygsti.io.write_empty_protocol_data("../tutorial_files/example_root_directory", loaded_edesign, clobber_ok=True)

#Alternatively, we can create an empty ProtocolData object and then write it:
data = pygsti.protocols.ProtocolData(loaded_edesign)
data.write()  # same as .write("../tutorial_files/example_root_directory") because loaded_edesign remembers 
              # where is was loaded from and this is the default destination (see loaded_edesign._loaded_from)

In [ ]:

#Next, let's create a protocol to run on the data.
protocol = pygsti.protocols.GateSetTomography(smq1Q_XYI.target_model("full TP"), name="TP_GST_Analysis")

#We can read/write this separately, though we don't really need to since it will get saved as a part
# of the results later on.  Still, to demonstrate this, here's how to do it:
protocol.write("../tutorial_files/example_gst_protocol")
loaded_protocol = pygsti.io.read_protocol_from_dir("../tutorial_files/example_gst_protocol")

In [ ]:

#Now, generate some simulated data and run the protocol on it to produce some results

# Mimic filling in or replacing the empty "template" dataset file (dataset.txt) with actual data
datagen_model = smq1Q_XYI.target_model().depolarize(op_noise=0.05, spam_noise=0.2)
pygsti.io.fill_in_empty_dataset_with_fake_data("../tutorial_files/example_root_directory/data/dataset.txt",
                                               datagen_model, num_samples=1000, seed=2021)

# Load the just-simulated data as a ProtocolData object and run our GST protocol on it:
loaded_data = pygsti.io.read_data_from_dir("../tutorial_files/example_root_directory")
results = protocol.run(loaded_data)

In [ ]:

# Save the results:
results.write()  # same as .write("../tutorial_files/example_root_directory") because result's edesign remembers 
                 # the root directory is was loaded from and this is the default destination.

At this point we have directory containing an experiment design, a corresponding set of data, and some results from analyzing that data. It may be interesting to poke around in this example_root_directory and take not of where things are. Notice how the subdirectory under results is the name we gave to the protocol above: TP_GST_Analysis.

We can load the results back in, again using the same root directory:

In [ ]:

loaded_results_dir = pygsti.io.read_results_from_dir("../tutorial_files/example_root_directory")

print("Available results = ", list(loaded_results_dir.for_protocol.keys()))
gst_results = loaded_results_dir.for_protocol['TP_GST_Analysis']

gst_estimate = gst_results.estimates['TP_GST_Analysis']  # protocol name is also used as a default estimate name
gst_model = gst_estimate.models['stdgaugeopt']
dataset = gst_results.data.dataset  # ProtocolResults -> ProtocolData -> DataSet

nSigma = pygsti.tools.two_delta_logl_nsigma(gst_model, dataset)
print("Model violation (goodness of fit) = %g sigma" % nSigma)
if nSigma < 2:
    print("  The GST model gives a very good fit to the data!")

Something we want to emphasize is that all of the I/O function calls (except the simulated data generation and unnecessary separate saving of the protocol) are given the same, single root directory ("../tutorial_files/example_root_directory"). This is a key feature and the intended design of pyGSTi's I/O framework - that a given experiment design, data, and analysis all live together in the same "node" (corresponding to a root directory) in a potential tree of experiments. In the example above, we just have a single node. By using nested experiment designs (e.g. CombinedExperimentDesign and SimultaneousExperimentDesign objects) a tree of such "root directories" (perhaps better called "node directories" then) can be built.

Serialization to pickle, JSON and MSGPACK formats¶

PyGSTi contains support for reading and writing most (if not all) of its objects from and to the JSON and MessagePack formats using a pickle-like format. The modules pygsti.io.json and pygsti.io.msgpack mimic the more general Python json and msgpack packages (json is a standard package, msgpack is a separate package, and must be installed if you wish to use pyGSTi's MessagePack functionality). These, in turn, mimic the load/dump interface of the standard pickle module, so it's very easy to serialize data using any of these formats. Here's a brief summary of the mais advantages and disadvantages of each format:

pickle
Advantages: a standard Python package; very easy to use; can serialize almost anything.
Disadvantages: incompatibility between python2 and python3 pickles; can be large on disk (inefficient); not web safe.
json
Advantages: a standard Python package; web-safe character set.; should be the same on python2 or python3
Disadvantages: large on disk (inefficient)
msgpack
Advantages: should be the same on python2 or python3; efficient binary format (small on disk)
Disadvantages: needs external msgpack package; binary non-web-safe format.

As stated above, because this format is fragile (will completely break when pyGSTi internals change) and not human readable, this should be used as a last resort or just as temporary storage. Below we demonstrate how to use the io.json and io.msgpack modules. Using pickle is essentially the same, as pyGSTi objects support being pickled too.

In [ ]:

import pygsti.serialization.json as json
import pygsti.serialization.msgpack as msgpack

#Models
from pygsti.modelpacks import smq1Q_XYI
target_model = smq1Q_XYI.target_model()
json.dump(target_model, open("../tutorial_files/TestModel.json",'w'))
target_model_from_json = json.load(open("../tutorial_files/TestModel.json"))

msgpack.dump(target_model, open("../tutorial_files/TestModel.mp",'wb'))
target_model_from_msgpack = msgpack.load(open("../tutorial_files/TestModel.mp", 'rb'))

#DataSets
json.dump(ds, open("../tutorial_files/TestDataSet.json",'w'))
ds_from_json = json.load(open("../tutorial_files/TestDataSet.json"))

msgpack.dump(ds, open("../tutorial_files/TestDataSet.mp",'wb'))
ds_from_msgpack = msgpack.load(open("../tutorial_files/TestDataSet.mp",'rb'))

#MultiDataSets
json.dump(multiDS, open("../tutorial_files/TestMultiDataSet.json",'w'))
multiDS_from_json = json.load(open("../tutorial_files/TestMultiDataSet.json"))

msgpack.dump(multiDS, open("../tutorial_files/TestMultiDataSet.mp",'wb'))
multiDS_from_msgpack = msgpack.load(open("../tutorial_files/TestMultiDataSet.mp",'rb'))

# Timestamped-data DataSets
json.dump(tdds_fromfile, open("../tutorial_files/TestTDDataset.json",'w'))
tdds_from_json = json.load(open("../tutorial_files/TestTDDataset.json"))

msgpack.dump(tdds_fromfile, open("../tutorial_files/TestTDDataset.mp",'wb'))
tdds_from_msgpack = msgpack.load(open("../tutorial_files/TestTDDataset.mp",'rb'))

#Circuit Lists
json.dump(cList, open("../tutorial_files/TestCircuitList.json",'w'))
cList_from_json = json.load(open("../tutorial_files/TestCircuitList.json"))

msgpack.dump(cList, open("../tutorial_files/TestCircuitList.mp",'wb'))
cList_from_msgpack = msgpack.load(open("../tutorial_files/TestCircuitList.mp",'rb'))

In [ ]: