Time-dependent data in Data Sets

The DataSet tutorial covered the basics of how to use DataSet objects with time-independent counts. When your data is time-stamped, either for each individual count or by groups of counts, there are additional (richer) options for analysis. The DataSet class can also store time-dependent data by holding series of count data rather than binned numbers-of-counts; these series are added via its add_series_data and add_raw_series_data methods. In the raw form, outcome counts are input by giving at least two parallel arrays of 1) outcome labels and 2) time stamps. Optionally, one can provide a third array of repetitions, specifying how many times the corresponding outcome occurred at the time stamp. While in reality no two outcomes are taken at exactly the same time, a DataSet allows for arbitrarily coarse-grained time-dependent data in which multiple outcomes are all tagged with the same time stamp. In fact, the "time-independent" case considered in the aforementioned tutorial is just a special case in which all the data is stamped at time=0.
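
To make the data model concrete, here is a minimal pure-Python sketch of how the parallel arrays of outcome labels, time stamps, and optional repetitions relate to single-shot data. It is only an illustration under stated assumptions, not how pyGSTi stores data internally; the helper name expand_series is hypothetical.

```python
def expand_series(outcomes, times, reps=None):
    """Expand (outcome, time, repetition) entries into single-shot lists.

    Hypothetical helper for illustration only; not part of pyGSTi.
    """
    if reps is None:
        reps = [1] * len(outcomes)  # no repetitions given: each entry is one shot
    if not (len(outcomes) == len(times) == len(reps)):
        raise ValueError("outcomes, times and reps must have equal length")
    exp_outcomes, exp_times = [], []
    for label, t, r in zip(outcomes, times, reps):
        exp_outcomes.extend([label] * r)  # repeat the label r times
        exp_times.extend([t] * r)         # all r shots share one time stamp
    return exp_outcomes, exp_times

# Coarse-grained form: 3 '0' outcomes at t=0.0 and 2 '1' outcomes at t=1.0
coarse = expand_series(['0', '1'], [0.0, 1.0], [3, 2])

# The same data written out shot by shot
single_shot = expand_series(['0', '0', '0', '1', '1'],
                            [0.0, 0.0, 0.0, 1.0, 1.0])
print(coarse == single_shot)

# "Time-independent" data is the special case where every stamp is 0
time_independent = expand_series(['0', '1'], [0.0, 0.0], [10, 90])
print(set(time_independent[1]))
```

The two forms carry the same information; the repetition array simply avoids storing one entry per shot.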

Below we demonstrate how to create and initialize a DataSet using time series data.

In [ ]:
import pygsti

#Create an empty dataset
tdds = pygsti.objects.DataSet(outcome_labels=['0','1'])

#Add a "single-shot" series of outcomes, where each spam label (outcome) has a separate time stamp
tdds.add_raw_series_data( ('Gx',), #gate sequence
            ['0','0','1','0','1','0','1','1','1','0'], #spam labels
            [0.0, 0.2, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.35, 1.5]) #time stamps

#When adding outcome-counts in "chunks" where the counts of each
# chunk occur at nominally the same time, use 'add_series_data' to
# add a list of count dictionaries with a timestamp given for each dict:
tdds.add_series_data( ('Gx','Gx'),  #gate sequence (illustrative choice)
                      [{'0':10, '1':90}, {'0':30, '1':70}], #count dicts
                      [0.0, 1.0]) #time stamps - one per dictionary

#For even more control, you can specify the timestamp of each count
# event or group of identical outcomes that occur at the same time:
#Add 3 '0' outcomes at time 0.0, followed by 2 '1' outcomes at time 1.0
tdds.add_raw_series_data( ('Gy',),  #gate sequence
                          ['0','1'], #spam labels
                          [0.0, 1.0], #time stamps
                          [3,2]) #repeats

#The above coarse-grained addition is logically identical to:
# tdds.add_raw_series_data( ('Gy',),  #gate sequence
#                           ['0','0','0','1','1'], #spam labels
#                           [0.0, 0.0, 0.0, 1.0, 1.0]) #time stamps
# (However, the DataSet will store the coarse-grained addition more efficiently.)


When one is done populating the DataSet with data, one should still call done_adding_data:

In [ ]:
tdds.done_adding_data()


To access the underlying time series data, index the DataSet by gate sequence. This returns a DataSetRow object, just as in the time-independent case, which has various attributes and methods for retrieving its underlying data:

In [ ]:
tdds_row = tdds[('Gx',)]
print("INFO for Gx string:\n")
print( tdds_row )

print( "Raw outcome label indices:", tdds_row.oli )
print( "Raw time stamps:", tdds_row.time )
print( "Raw repetitions:", tdds_row.reps )
print( "Number of entries in raw arrays:", len(tdds_row) )

print( "Outcome Labels:", tdds_row.outcomes )
print( "Repetition-expanded outcome labels:", tdds_row.get_expanded_ol() )
print( "Repetition-expanded outcome label indices:", tdds_row.get_expanded_oli() )
print( "Repetition-expanded time stamps:", tdds_row.get_expanded_times() )
print( "Time-independent-like counts per spam label:", tdds_row.counts )
print( "Time-independent-like total counts:", tdds_row.total )
print( "Time-independent-like spam label fraction:", tdds_row.fractions )

print("\n")

tdds_row = tdds[('Gy',)]
print("INFO for Gy string:\n")
print( tdds_row )

print( "Raw outcome label indices:", tdds_row.oli )
print( "Raw time stamps:", tdds_row.time )
print( "Raw repetitions:", tdds_row.reps )
print( "Number of entries in raw arrays:", len(tdds_row) )

print( "Outcome Labels:", tdds_row.outcomes )
print( "Repetition-expanded outcome labels:", tdds_row.get_expanded_ol() )
print( "Repetition-expanded outcome label indices:", tdds_row.get_expanded_oli() )
print( "Repetition-expanded time stamps:", tdds_row.get_expanded_times() )
print( "Time-independent-like counts per spam label:", tdds_row.counts )
print( "Time-independent-like total counts:", tdds_row.total )
print( "Time-independent-like spam label fraction:", tdds_row.fractions )
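
As a rough guide to what the "time-independent-like" quantities printed above mean, the following sketch computes counts, a total, and fractions from raw arrays. It assumes the raw arrays hold integer indices into the list of outcome labels plus one repetition count per entry, as the print labels suggest; summarize_row is a hypothetical helper, not a pyGSTi function.

```python
def summarize_row(outcome_labels, oli, reps):
    """Collapse raw (label index, repetition) entries into time-independent summaries.

    Hypothetical helper for illustration only; not part of pyGSTi.
    """
    counts = {}
    for idx, r in zip(oli, reps):
        label = outcome_labels[idx]          # translate the index into its label
        counts[label] = counts.get(label, 0) + r
    total = sum(counts.values())
    fractions = {label: n / total for label, n in counts.items()}
    return counts, total, fractions

# Raw arrays matching the coarse-grained Gy example: 3 x '0', then 2 x '1'
counts, total, fractions = summarize_row(['0', '1'], [0, 1], [3, 2])
print("counts:", counts)
print("total:", total)
print("fractions:", fractions)
```

Note that the time stamps play no role here: these summaries are exactly what remains once the time information is discarded.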


Text data-file formats

It is possible to read text-formatted time-dependent data in two ways.

The first way is for the special case when

1. the outcomes are all single-shot
2. the time stamps of the outcomes are the integers (starting at zero) for all of the operation sequences.

This corresponds to the case when each sequence is performed and measured simultaneously at equally spaced intervals. This is a bit fictitious, but it allows for the compact format given below. Currently, the only way to read in this format is using the separate load_tddataset function:

In [ ]:
tddataset_txt = \
"""## 0 = 0
## 1 = 1
{} 011001
Gx 111000111
Gy 11001100
"""
with open("../../tutorial_files/TDDataset.txt","w") as output:
    output.write(tddataset_txt)
tdds_fromfile = pygsti.io.load_tddataset("../../tutorial_files/TDDataset.txt")
print(tdds_fromfile)

print("Some tests:")
print(tdds_fromfile[()].fraction('1'))
print(tdds_fromfile[('Gy',)].fraction('1'))
print(tdds_fromfile[('Gx',)].total)
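
To illustrate how the compact format encodes time stamps, here is a small stand-alone parser sketch. It assumes that each `## symbol = label` header line maps a single character to an outcome label, and that the n-th character on a data line is the outcome at integer time n. The function parse_compact_td_text is hypothetical and is not how load_tddataset is implemented.

```python
def parse_compact_td_text(text):
    """Parse the compact single-shot format into {circuit: (outcomes, times)}.

    Hypothetical helper for illustration only; not part of pyGSTi.
    """
    symbol_to_label = {}
    series = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Header line, e.g. "## 0 = 0": character '0' means outcome label '0'
            symbol, label = (part.strip() for part in line[2:].split("="))
            symbol_to_label[symbol] = label
            continue
        circuit, symbols = line.split()
        outcomes = [symbol_to_label[ch] for ch in symbols]
        times = list(range(len(outcomes)))  # implicit stamps 0, 1, 2, ...
        series[circuit] = (outcomes, times)
    return series

compact_text = """## 0 = 0
## 1 = 1
{} 011001
Gx 111000111
Gy 11001100
"""
series = parse_compact_td_text(compact_text)
gy_outcomes, gy_times = series["Gy"]
print(gy_outcomes.count("1") / len(gy_outcomes))  # fraction of '1' for Gy
print(len(series["Gx"][0]))                       # total shots for Gx
```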


The second way can describe arbitrary timestamped data, and uses a more general format where each circuit is on a line by itself, followed by two or three subsequent lines giving the timestamps, the outcome labels, and (optionally) the repetition counts for that circuit. If the repetition counts are not given, they are all assumed to equal 1. This is the format needed to interact nicely with ProtocolData objects, e.g. for use with load_data_from_dir. Here's an example that creates the same DataSet as the one loaded in above, and then loads it in using the usual load_dataset function:

In [ ]:
general_tddataset_txt = \
"""{}
times: 0  1  2  3  4  5
outcomes: 0  1  1  0  0  1

Gx
times: 0  1  2  3  4  5  6  7  8
outcomes: 1  1  1  0  0  0  1  1  1

Gy
times: 0  1  2  3  4  5  6  7
outcomes: 1  1  0  0  1  1  0  0

"""
with open("../../tutorial_files/DatasetWithTimestamps.txt","w") as output:
    output.write(general_tddataset_txt)

general_tdds_fromfile = pygsti.io.load_dataset("../../tutorial_files/DatasetWithTimestamps.txt")
print(general_tdds_fromfile)


The DatasetWithTimestamps.txt file could also have been created by specifying fixed_column_mode=False to the usual write_dataset function, that is:

In [ ]:
pygsti.io.write_dataset("../../tutorial_files/DatasetWithTimestamps.txt", tdds_fromfile, fixed_column_mode=False)
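
The layout of the general format can be sketched with a small stand-alone parser. It assumes records are separated by blank lines, that each record starts with the circuit on its own line, and that the remaining lines begin with `times:`, `outcomes:`, or `repetitions:`. The function parse_general_td_text is hypothetical and much less robust than load_dataset, which should be used for real files.

```python
def parse_general_td_text(text):
    """Parse the general timestamped format into {circuit: {field: list}}.

    Hypothetical helper for illustration only; not part of pyGSTi.
    """
    data = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            current = None  # a blank line ends the current record
            continue
        if current is None:
            current = line  # first line of a record is the circuit
            data[current] = {}
            continue
        key, values = line.split(":", 1)
        key, tokens = key.strip(), values.split()
        if key == "times":
            data[current][key] = [float(t) for t in tokens]
        elif key == "outcomes":
            data[current][key] = tokens
        elif key == "repetitions":
            data[current][key] = [int(r) for r in tokens]
        else:
            raise ValueError("unexpected field: " + key)
    for record in data.values():
        # Missing repetition counts are all taken to be 1
        record.setdefault("repetitions", [1] * len(record.get("outcomes", [])))
    return data

sample_text = """{}
times: 0  1  2  3  4  5
outcomes: 0  1  1  0  0  1

Gx
times:        1  1  2  2
outcomes:     0  1  0  1
repetitions: 50 50 55 45

"""
parsed = parse_general_td_text(sample_text)
print(parsed["{}"]["repetitions"])
print(parsed["Gx"]["times"], parsed["Gx"]["repetitions"])
```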


If you're recording several passes through a set of circuits, and all the data on each pass is considered to occur at the same time (i.e. a coarse-graining of the time-stamped data), then it may be useful to specify the repetition counts. For example, the following data file describes data that was taken in two passes (at times 1.0 and 2.0), each consisting of 100 repetitions of every circuit:

In [ ]:
general_tddataset_txt = \
"""{}
times:        1  1  2  2
outcomes:     0  1  0  1
repetitions: 20 80 25 75

Gx
times:        1  1  2  2
outcomes:     0  1  0  1
repetitions: 50 50 55 45

Gy
times:        1  1  2  2
outcomes:     0  1  0  1
repetitions: 63 37 52 48

"""
with open("../../tutorial_files/DatasetWith2Passes.txt","w") as output:
    output.write(general_tddataset_txt)
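
One thing multi-pass data makes possible is comparing outcome frequencies between passes. The sketch below groups repetition counts by time stamp and computes per-pass fractions for the Gx record above. It is a plain-Python illustration that works directly on the three arrays; fractions_by_time is a hypothetical helper, not a pyGSTi function.

```python
from collections import defaultdict

def fractions_by_time(times, outcomes, repetitions):
    """Return {time stamp: {outcome label: fraction}} for coarse-grained data.

    Hypothetical helper for illustration only; not part of pyGSTi.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for t, label, r in zip(times, outcomes, repetitions):
        counts[t][label] += r  # accumulate counts within each pass
    result = {}
    for t in sorted(counts):
        total = sum(counts[t].values())
        result[t] = {label: n / total for label, n in counts[t].items()}
    return result

# The Gx record from the two-pass file above
gx_fractions = fractions_by_time([1, 1, 2, 2],
                                 ['0', '1', '0', '1'],
                                 [50, 50, 55, 45])
for t, fracs in gx_fractions.items():
    print("pass at time", t, "->", fracs)
```

A change in these fractions from one pass to the next is the kind of signal that time-resolved analysis can pick up and that a single set of time-independent counts would average away.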