# Pure Data Analysis

This tutorial covers methods of analysing data without running GST. So far there is only one, called "Data Set Comparison", which checks whether two or more datasets are consistent with each other.

## Data Set Comparison

This method declares two or more DataSets "consistent" if, for every gate string, the observed counts across the data sets are consistent with having been generated by the same underlying gate set. The protocol can be used to test for, among other things, drift and crosstalk. It can also be used to compare an experimental dataset to an "ideal" dataset.
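To make the notion of "consistent" concrete, the sketch below scores a single gate string with a likelihood-ratio (G-test) statistic: it compares a model in which each dataset has its own outcome frequencies against one in which all datasets share pooled frequencies. This is an assumption about the kind of score involved, written in plain Python for illustration; it is not pyGSTi's `DataComparator` code, and `llr_score` and `degrees_of_freedom` are hypothetical helpers.

```python
import math

def llr_score(counts_per_dataset):
    """Likelihood-ratio (G-test) score for one gate string across datasets.

    counts_per_dataset holds one list of outcome counts per dataset, e.g.
    [[n0_A, n1_A], [n0_B, n1_B]] for two datasets with two outcomes.
    Returns 2 * (log-likelihood with separate frequencies per dataset
                 - log-likelihood with pooled frequencies).
    """
    num_outcomes = len(counts_per_dataset[0])
    totals = [sum(counts) for counts in counts_per_dataset]
    grand_total = sum(totals)
    # Pooled frequencies: the best single model shared by all datasets.
    pooled = [sum(counts[k] for counts in counts_per_dataset) / grand_total
              for k in range(num_outcomes)]
    score = 0.0
    for counts, total in zip(counts_per_dataset, totals):
        for k, n in enumerate(counts):
            if n > 0:  # zero counts contribute nothing to the log-likelihood
                score += n * math.log((n / total) / pooled[k])
    return 2.0 * score

def degrees_of_freedom(num_datasets, num_outcomes):
    """Asymptotic chi-squared degrees of freedom of llr_score under consistency."""
    return (num_datasets - 1) * (num_outcomes - 1)

print(llr_score([[50, 50], [50, 50]]))  # identical counts -> 0.0
print(llr_score([[90, 10], [10, 90]]))  # very different counts -> large score
```

If all datasets really share the same outcome probabilities, this statistic is approximately chi-squared distributed with (D - 1)(K - 1) degrees of freedom for D datasets and K outcomes, i.e. 1 degree of freedom for two datasets with two outcomes.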

In :
from __future__ import division, print_function

import pygsti
import numpy as np
import scipy
from scipy import stats
from pygsti.construction import std1Q_XYI

Let's first compare two DataSet objects generated from the same underlying gate set. We'll use GST datasets (which allow some nice visualizations), but arbitrary datasets will work in general, provided they contain the same gate sequences.
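A quick way to confirm the "same gate sequences" requirement before comparing arbitrary datasets is sketched below. Here each dataset is modelled as a plain dict mapping a gate sequence (a tuple of gate labels) to its outcome counts; this stand-in and the helper `check_same_sequences` are hypothetical and not part of pyGSTi.

```python
def check_same_sequences(datasets):
    """Return the shared gate sequences, or raise ValueError if the datasets differ.

    Each dataset is a plain dict {gate_sequence_tuple: [count_0, count_1, ...]}.
    """
    reference = set(datasets[0])
    for i, ds in enumerate(datasets[1:], start=1):
        keys = set(ds)
        if keys != reference:
            # Report how many sequences are missing from / extra in dataset i.
            raise ValueError(
                "dataset %d differs from dataset 0: %d sequence(s) missing, %d extra"
                % (i, len(reference - keys), len(keys - reference)))
    return sorted(reference)

ds_a = {("Gx",): [48, 52], ("Gx", "Gy"): [30, 70]}
ds_b = {("Gx",): [55, 45], ("Gx", "Gy"): [33, 67]}
print(check_same_sequences([ds_a, ds_b]))  # [('Gx',), ('Gx', 'Gy')]
```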

In :
#Let's make our underlying gate set have a little bit of random unitary noise.
gs_exp_0 = std1Q_XYI.gs_target.copy()
gs_exp_0 = gs_exp_0.randomize_with_unitary(.01,seed=0)

In :
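# Standard single-qubit (X, Y, I) germs and fiducials from pyGSTi's std1Q_XYI module
germs = std1Q_XYI.germs
fiducials = std1Q_XYI.fiducials
# Maximum sequence lengths L (powers of 2)
max_lengths = [1,2,4,8,16,32,64,128,256]
# The same fiducial list is used for both state preparation and measurement
gate_sequences = pygsti.construction.make_lsgst_experiment_list(std1Q_XYI.gates,fiducials,fiducials,germs,max_lengths)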
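Conceptually, a long-sequence GST experiment list sandwiches each germ, repeated up to length L, between a preparation fiducial and a measurement fiducial, for every L in `max_lengths`. The sketch below is a simplified, hypothetical reconstruction of that structure using tuples of gate labels with duplicates removed; it ignores pyGSTi-specific details of `make_lsgst_experiment_list`, such as its exact ordering and any extra sequences it may add.

```python
def lsgst_style_sequences(prep_fiducials, meas_fiducials, germs, max_lengths):
    """Simplified long-sequence GST list: prep + germ**p + meas for each L.

    Sequences are tuples of gate labels. For each L the germ is repeated
    p = L // len(germ) times (p may be 0), so germ**p is as long as possible
    without exceeding L. Duplicates are dropped; first-occurrence order is kept.
    """
    seen = set()
    sequences = []
    for L in max_lengths:
        for germ in germs:
            body = germ * (L // len(germ))  # germ repeated up to length L
            for prep in prep_fiducials:
                for meas in meas_fiducials:
                    seq = prep + body + meas
                    if seq not in seen:
                        seen.add(seq)
                        sequences.append(seq)
    return sequences

fids = [(), ("Gx",)]  # hypothetical fiducials; () is the empty sequence
germs = [("Gx",), ("Gx", "Gy")]  # hypothetical germs
seqs = lsgst_style_sequences(fids, fids, germs, [1, 2])
print(len(seqs))  # 9
```

Deduplication matters because short germs at small L regenerate many of the same sequences, as the `("Gx",)` germ does here.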

In :
#Generate the data for the two datasets, using the same gate set, with 100 repetitions of each sequence.
N=100
DS_0 = pygsti.construction.generate_fake_data(gs_exp_0,gate_sequences,N,'binomial',seed=10)
DS_1 = pygsti.construction.generate_fake_data(gs_exp_0,gate_sequences,N,'binomial',seed=20)
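The `'binomial'` option samples each gate sequence's counts from a binomial distribution with N trials, so two datasets generated from the same gate set with different seeds differ only by sampling noise. A stdlib-only sketch of that sampling step follows. It assumes the outcome probabilities are already known; the probabilities below are made up, not computed from `gs_exp_0`, and `fake_binomial_counts` is a hypothetical helper rather than pyGSTi's implementation.

```python
import random

def fake_binomial_counts(probabilities, n_samples, seed=None):
    """Sample [count_0, count_1] for each sequence from Binomial(n_samples, p1).

    `probabilities` maps a gate sequence to the probability of outcome 1.
    """
    rng = random.Random(seed)  # independent, reproducible random stream
    data = {}
    for seq, p1 in probabilities.items():
        # Each of the n_samples repetitions yields outcome 1 with probability p1.
        ones = sum(1 for _ in range(n_samples) if rng.random() < p1)
        data[seq] = [n_samples - ones, ones]
    return data

# Hypothetical outcome-1 probabilities for three sequences.
probs = {(): 0.01, ("Gx",): 0.5, ("Gx", "Gx"): 0.99}
ds_0 = fake_binomial_counts(probs, 100, seed=10)
ds_1 = fake_binomial_counts(probs, 100, seed=20)
```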

In :
#Let's compare the two datasets.
comparator_0_1 = pygsti.objects.DataComparator([DS_0,DS_1])

In :
#Let's get the report from the comparator.
comparator_0_1.report(confidence_level=0.95)

Consistency report- datasets are inconsistent at given confidence level if EITHER of the following scores report inconsistency.

Threshold for individual gatestring scores is 19.835456860117194
As measured by worst-performing gate strings, data sets are CONSISTENT at the 95.0% confidence level.
0 gate string(s) have loglikelihood scores greater than the threshold.

Threshold for sum of gatestring scores is 3114.73885644373.
As measured by sum of gatestring scores, data sets are CONSISTENT at the 95.0% confidence level.
Total loglikelihood is 2720.86373067222
Total number of standard deviations (N_sigma) of model violation is -3.1329611727371303.
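The report applies two tests and flags inconsistency if either fails: no single gate string's score may exceed a per-string threshold, and the sum of scores must stay below a threshold for the total. The sketch below shows how such a summary could be computed from a list of per-string scores, assuming each score is independent and chi-squared with 1 degree of freedom under consistency (two datasets, two outcomes). The Bonferroni correction for the per-string threshold, the Wilson-Hilferty approximation for the sum threshold, and N_sigma = (total - k)/sqrt(2k) for k gate strings are our assumptions; they will not reproduce the exact thresholds printed above. `statistics.NormalDist` requires Python 3.8+.

```python
import math
from statistics import NormalDist

def chi2_quantile_1dof(p):
    """Exact p-quantile of a 1-dof chi-squared variable (the square of a standard normal)."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z

def chi2_quantile_wh(p, dof):
    """Wilson-Hilferty approximation to the p-quantile of chi-squared with `dof` dof."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3

def consistency_summary(scores, confidence_level=0.95):
    """Summarise per-gate-string scores, each assumed chi-squared(1) under consistency."""
    k = len(scores)
    alpha = 1.0 - confidence_level
    # Per-string threshold, Bonferroni-corrected over k strings (an assumption).
    single_threshold = chi2_quantile_1dof(1.0 - alpha / k)
    # The sum of k independent chi-squared(1) scores is chi-squared(k).
    total = sum(scores)
    sum_threshold = chi2_quantile_wh(confidence_level, k)
    # chi-squared(k) has mean k and variance 2k; N_sigma is the standardised excess.
    n_sigma = (total - k) / math.sqrt(2.0 * k)
    n_over = sum(1 for s in scores if s > single_threshold)
    return {
        "single_threshold": single_threshold,
        "n_over_threshold": n_over,
        "sum_threshold": sum_threshold,
        "total": total,
        "n_sigma": n_sigma,
        # Inconsistent if EITHER test fails.
        "consistent": n_over == 0 and total <= sum_threshold,
    }
```

For example, `consistency_summary([1.0] * 100)` reports consistent data with N_sigma = 0, since the total equals its expected value. Under this definition, a negative N_sigma means the total is smaller than expected for consistent data.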

In :
#Create a workspace to show plots
w = pygsti.report.Workspace()
w.init_notebook_mode(connected=False, autodisplay=True)