#!/usr/bin/env python
# coding: utf-8

# # Time-dependent data in Data Sets
# The [DataSet tutorial](../DataSet.ipynb) covered the basics of how to use `DataSet` objects with time-independent counts. When your data is time-stamped, either for each individual count or by groups of counts, there are additional (richer) options for analysis.  The `DataSet` class is also capable of storing time-dependent data by holding *series* of count data rather than binned numbers-of-counts, which are added via its `add_series_data` method.  Outcome counts are input by giving at least two parallel arrays of 1) outcome labels and 2) time stamps.  Optionally, one can provide a third array of repetitions, specifying how many times the corresponding outcome occurred at the time stamp.  While in reality no two outcomes are taken at exactly the same time, a `DataSet` allows for arbitrarily *coarse-grained* time-dependent data in which multiple outcomes are all tagged with the *same* time stamp.  In fact, the "time-independent" case considered in the aforementioned tutorial is actually just a special case in which the all data is stamped at *time=0*.
# 
# Below we demonstrate how to create and initialize a `DataSet` using time series data.

# In[1]:


import pygsti

#Create an empty dataset                                                                       
tdds = pygsti.objects.DataSet(outcomeLabels=['0','1'])

#Add a "single-shot" series of outcomes, where each spam label (outcome) has a separate time stamp
tdds.add_raw_series_data( ('Gx',), #gate sequence                                                                 
            ['0','0','1','0','1','0','1','1','1','0'], #spam labels                                                                                                                 
            [0.0, 0.2, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.35, 1.5]) #time stamps                                                                                              

#When adding outcome-counts in "chunks" where the counts of each
# chunk occur at nominally the same time, use 'add_raw_series_data' to
# add a list of count dictionaries with a timestamp given for each dict:
tdds.add_series_data( ('Gx','Gx'),  #gate sequence                                                               
                      [{'0':10, '1':90}, {'0':30, '1':70}], #count dicts                                                         
                      [0.0, 1.0]) #time stamps - one per dictionary                                                               

#For even more control, you can specify the timestamp of each count
# event or group of identical outcomes that occur at the same time:
#Add 3 'plus' outcomes at time 0.0, followed by 2 'minus' outcomes at time 1.0
tdds.add_raw_series_data( ('Gy',),  #gate sequence                                                               
                      ['0','1'], #spam labels                                                         
                      [0.0, 1.0], #time stamps                                                               
                      [3,2]) #repeats  

#The above coarse-grained addition is logically identical to:
# tdds.add_raw_series_data( ('Gy',),  #gate sequence                                                               
#                       ['0','0','0','1','1'], #spam labels                                                         
#                       [0.0, 0.0, 0.0, 1.0, 1.0]) #time stamps                                                               
# (However, the DataSet will store the coase-grained addition more efficiently.) 


# When one is done populating the `DataSet` with data, one should still call `done_adding_data`:

# In[2]:


tdds.done_adding_data()


# Access to the underlying time series data is done by indexing on the gate sequence (to get a `DataSetRow` object, just as in the time-independent case) which has various methods for retrieving its underlying data: 

# In[3]:


tdds_row = tdds[('Gx',)]
print("INFO for Gx string:\n")
print( tdds_row )
      
print( "Raw outcome label indices:", tdds_row.oli )
print( "Raw time stamps:", tdds_row.time )
print( "Raw repetitions:", tdds_row.reps )
print( "Number of entries in raw arrays:", len(tdds_row) )

print( "Outcome Labels:", tdds_row.outcomes )
print( "Repetition-expanded outcome labels:", tdds_row.get_expanded_ol() )
print( "Repetition-expanded outcome label indices:", tdds_row.get_expanded_oli() )
print( "Repetition-expanded time stamps:", tdds_row.get_expanded_times() )
print( "Time-independent-like counts per spam label:", tdds_row.counts )
print( "Time-independent-like total counts:", tdds_row.total )
print( "Time-independent-like spam label fraction:", tdds_row.fractions )

print("\n")

tdds_row = tdds[('Gy',)]
print("INFO for Gy string:\n")
print( tdds_row )
      
print( "Raw outcome label indices:", tdds_row.oli )
print( "Raw time stamps:", tdds_row.time )
print( "Raw repetitions:", tdds_row.reps )
print( "Number of entries in raw arrays:", len(tdds_row) )

print( "Spam Labels:", tdds_row.outcomes )
print( "Repetition-expanded outcome labels:", tdds_row.get_expanded_ol() )
print( "Repetition-expanded outcome label indices:", tdds_row.get_expanded_oli() )
print( "Repetition-expanded time stamps:", tdds_row.get_expanded_times() )
print( "Time-independent-like counts per spam label:", tdds_row.counts )
print( "Time-independent-like total counts:", tdds_row.total )
print( "Time-independent-like spam label fraction:", tdds_row.fractions )


# Finally, it is possible to read text-formatted time-dependent data in the special case when
# 1. the outcomes are all single-shot 
# 2. the time stamps of the outcomes are the integers (starting at zero) for *all* of the operation sequences.
# This corresponds to the case when each sequence is performed and measured simultaneously at equally spaced intervals.  We realize this is a bit fictitous and more text-format input options will be created in the future.

# In[4]:


tddataset_txt = \
"""## 0 = 0                                                                                                                   
## 1 = 1                                                                                                                      
{} 011001                                                                                                                     
Gx 111000111                                                                                                                  
Gy 11001100                                                                                                                   
"""
with open("../../tutorial_files/TDDataset.txt","w") as output:
    output.write(tddataset_txt)
tdds_fromfile = pygsti.io.load_tddataset("../../tutorial_files/TDDataset.txt")
print(tdds_fromfile)

print("Some tests:")
print(tdds_fromfile[()].fraction('1'))
print(tdds_fromfile[('Gy',)].fraction('1'))
print(tdds_fromfile[('Gx',)].total)


# In[ ]: