Notebook

Stack Exchange hosts questions and answers that users can upvote or downvote. Two of its sites, Data Science Stack Exchange (datascience.stackexchange) and Stack Overflow, look useful for our data science goals.

The tables and columns tied to tag counts look the most promising for finding the most popular content. These include tags for machine learning and model building, as well as Posts, Tags, PostTags, AnswerCount, CommentCount, FavoriteCount, and UpVotes.

In [1]:

# SELECT Id,
#        CreationDate,
#        Score,
#        ViewCount,
#        Tags,
#        AnswerCount,
#        FavoriteCount
#   FROM Posts
#  WHERE PostTypeID = 1 AND YEAR(CreationDate) = 2019;

In [2]:

import pandas as pd
questions = pd.read_csv("2019_questions.csv", parse_dates = ["CreationDate"])

In [3]:

questions.head()

Out[3]:

	Id	CreationDate	Score	ViewCount	Tags	AnswerCount	FavoriteCount
0	44419	2019-01-23 09:21:13	1	21	<machine-learning><data-mining>	0	NaN
1	44420	2019-01-23 09:34:01	0	25	<machine-learning><regression><linear-regressi...	0	NaN
2	44423	2019-01-23 09:58:41	2	1651	<python><time-series><forecast><forecasting>	0	NaN
3	44427	2019-01-23 10:57:09	0	55	<machine-learning><scikit-learn><pca>	1	NaN
4	44428	2019-01-23 11:02:15	0	19	<dataset><bigdata><data><speech-to-text>	0	NaN

In [4]:

questions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8839 entries, 0 to 8838
Data columns (total 7 columns):
Id               8839 non-null int64
CreationDate     8839 non-null datetime64[ns]
Score            8839 non-null int64
ViewCount        8839 non-null int64
Tags             8839 non-null object
AnswerCount      8839 non-null int64
FavoriteCount    1407 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(4), object(1)
memory usage: 483.5+ KB

FavoriteCount is missing most of its values: only 1407 of the 8839 rows are non-null, so 7432 are missing. Recovering them would mean checking each of those questions by hand, which is not practical. The column should also be an integer, not a float. The types of the remaining columns are reasonable. In Tags, it would help to strip the angle brackets so that we can study the most common tags.

In [5]:

questions.isnull().sum()

Out[5]:

Id                  0
CreationDate        0
Score               0
ViewCount           0
Tags                0
AnswerCount         0
FavoriteCount    7432
dtype: int64

The above confirms that FavoriteCount has 7432 missing values.

In [6]:

questions = questions.fillna(0)

In [7]:

questions.isnull().sum()

Out[7]:

Id               0
CreationDate     0
Score            0
ViewCount        0
Tags             0
AnswerCount      0
FavoriteCount    0
dtype: int64

We have replaced the missing (NaN) FavoriteCount values with 0. Next, we change this column's data type to integer.

In [8]:

questions['FavoriteCount'] = questions['FavoriteCount'].astype(int)

In [9]:

questions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8839 entries, 0 to 8838
Data columns (total 7 columns):
Id               8839 non-null int64
CreationDate     8839 non-null datetime64[ns]
Score            8839 non-null int64
ViewCount        8839 non-null int64
Tags             8839 non-null object
AnswerCount      8839 non-null int64
FavoriteCount    8839 non-null int64
dtypes: datetime64[ns](1), int64(5), object(1)
memory usage: 483.5+ KB
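
The fill-and-cast steps above act on the whole DataFrame, which works here because FavoriteCount is the only column with gaps. As a minimal sketch, assuming we want to touch only that column, the two steps can be chained. The `sample` frame and its FavoriteCount values are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample standing in for 2019_questions.csv
sample = pd.DataFrame({
    "Id": [44419, 44420, 44423],
    "FavoriteCount": [np.nan, 2.0, np.nan],
})

# Fill only the column that has gaps, then cast it from float to int
sample["FavoriteCount"] = sample["FavoriteCount"].fillna(0).astype(int)

print(sample.dtypes)
```

Restricting the fill to one column avoids silently writing 0 into other columns if a later export happens to contain missing values elsewhere.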

In [10]:

questions['Tags'].head(5)

Out[10]:

0                      <machine-learning><data-mining>
1    <machine-learning><regression><linear-regressi...
2         <python><time-series><forecast><forecasting>
3                <machine-learning><scikit-learn><pca>
4             <dataset><bigdata><data><speech-to-text>
Name: Tags, dtype: object

In [11]:

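# The pattern is a regular expression; pandas 2.0+ treats patterns as
# literal text unless regex=True is passed explicitly
questions['Tags'] = questions['Tags'].str.replace("^<|>$", "", regex=True)
questions['Tags'].head(5)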

Out[11]:

0                        machine-learning><data-mining
1    machine-learning><regression><linear-regressio...
2           python><time-series><forecast><forecasting
3                  machine-learning><scikit-learn><pca
4               dataset><bigdata><data><speech-to-text
Name: Tags, dtype: object

In [12]:

questions['Tags'] = questions['Tags'].str.split("><")
questions['Tags'].head(5)

Out[12]:

0                      [machine-learning, data-mining]
1    [machine-learning, regression, linear-regressi...
2         [python, time-series, forecast, forecasting]
3                [machine-learning, scikit-learn, pca]
4             [dataset, bigdata, data, speech-to-text]
Name: Tags, dtype: object
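
The bracket removal and split above can also be done without a regular expression. The following is a sketch under the assumption that every Tags value has the `<a><b>` shape seen in the output. The `tags` Series holds two values copied from the rows shown earlier, and `tag_lists` is a name introduced here for illustration.

```python
import pandas as pd

# Sample values copied from the first rows of the Tags column
tags = pd.Series([
    "<machine-learning><data-mining>",
    "<python><time-series><forecast><forecasting>",
])

# str.strip removes "<" and ">" characters from both ends of each string,
# then str.split breaks what is left on the "><" separator
tag_lists = tags.str.strip("<>").str.split("><")

print(tag_lists.tolist())
```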

In [13]:

questions.head(3)

Out[13]:

	Id	CreationDate	Score	ViewCount	Tags
0	44419	2019-01-23 09:21:13	1	21	[machine-learning, data-mining]
1	44420	2019-01-23 09:34:01	0	25	[machine-learning, regression, linear-regressi...
2	44423	2019-01-23 09:58:41	2	1651	[python, time-series, forecast, forecasting]

In [14]:

num_tags = {}
for tags in questions['Tags']:
    for tag in tags:
        if tag in num_tags:
            num_tags[tag] += 1
        else:
            num_tags[tag] = 1
print(num_tags)

{'pandas': 354, 'scalability': 4, 'probability': 76, 'chatbot': 14, 'kendalls-tau-coefficient': 1, 'cloud-computing': 9, 'colab': 18, 'refit-model': 1, 'beginner': 27, 'orange': 64, 'bayesian-nonparametric': 2, 'xboost': 1, 'math': 37, 'serialisation': 3, 'ai': 25, 'nltk': 43, 'actor-critic': 21, 'tools': 8, 'estimators': 8, 'rmsle': 1, 'tsne': 15, 'similarity': 72, 'multilabel-classification': 92, 'mean-shift': 2, 'multitask-learning': 7, 'redshift': 1, 'kaggle': 43, 'community': 1, 'time': 5, 'code': 5, 'mathematics': 17, 'gan': 85, 'generalization': 12, 'label-flipping': 1, 'azure-ml': 12, 'dirichlet': 4, 'software-recommendation': 4, 'lbp': 2, 'consumerweb': 1, 'data.table': 4, 'bioinformatics': 4, 'multi-output': 7, 'ipython': 18, 'seaborn': 38, 'management': 2, 'text-generation': 17, 'cloud': 6, 'convnet': 111, 'sensors': 5, 'methodology': 10, 'density-estimation': 3, 'mnist': 23, 'education': 3, 'tesseract': 3, 'siamese-networks': 4, 'pca': 85, 'haar-cascade': 1, 'learning-to-rank': 6, 'summarunner-architecture': 1, 'lstm': 402, 'terminology': 16, 'scraping': 5, 'amazon-ml': 1, 'smotenc': 4, 'nvidia': 7, 'parameter-estimation': 6, 'normalization': 74, 'discounted-reward': 5, 'gmm': 2, 'c': 4, 'weighted-data': 14, 'yolo': 21, 'tfidf': 31, 'anova': 2, 'search': 5, 'automation': 4, 'meta-learning': 3, 'bayesian': 40, 'ann': 2, 'efficiency': 2, 'octave': 4, 'imbalanced-learn': 21, 'ab-test': 6, 'market-basket-analysis': 12, 'topic-model': 31, 'naive-bayes-classifier': 42, 'torch': 4, 'matrix': 22, 'feature-extraction': 87, 'data-leakage': 8, 'generative-models': 46, 'siamese': 1, 'prediction': 128, 'nn': 1, 'bayesian-networks': 12, 'sql': 29, 'q-learning': 37, 'discriminant-analysis': 5, 'word-embeddings': 117, 'neural': 16, 'batch-normalization': 29, 'google-cloud': 1, 'categories': 2, 'inception': 10, 'linear-regression': 175, 'feature-selection': 209, 'counts': 3, 'kernel': 27, 'helmert-coding': 1, 'pipelines': 17, 'etl': 6, 'pac-learning': 6, 
'deep-learning': 1220, 'spyder': 1, 'random-forest': 159, 'bias': 19, 'implementation': 9, 'multivariate-distribution': 1, 'sematic-similarity': 2, 'jaccard-coefficient': 4, 'counter-inference': 1, 'text': 41, 'sequential-pattern-mining': 17, 'javascript': 8, 'gaussian': 20, 'grid-search': 35, 'anonymization': 3, 'impala': 1, 'mutual-information': 5, 'monte-carlo': 15, 'clustering': 257, 'stemming': 2, 'historgram': 7, 'spacy': 20, 'marginal-effects': 1, 'orange3': 20, 'correlation': 80, 'class-imbalance': 73, 'gensim': 36, 'openai-gpt': 2, 'mse': 8, 'pytorch-geometric': 2, 'descriptive-statistics': 21, 'parquet': 1, 'manhattan': 3, 'k-means': 81, 'dummy-variables': 19, 'missing-data': 43, 'activation': 1, 'bayes-error': 1, 'pip': 4, 'dynamic-programming': 3, 'one-shot-learning': 2, 'paperspace': 1, 'gru': 1, 'unseen-data': 1, 'metadata': 2, 'dump': 1, 'matrix-factorisation': 24, 'dbscan': 18, 'policy-gradients': 27, 'geospatial': 27, 'named-entity-recognition': 36, 'adaboost': 1, 'cosine-distance': 21, 'distribution': 57, 'non-convex': 1, 'predictive-modeling': 265, 'classifier': 18, 'forecast': 34, 'noisification': 1, 'keras': 935, 'markov': 4, 'dataframe': 81, 'model-selection': 58, 'processing': 5, 'dialog-flow': 2, 'project-planning': 6, 'opencv': 39, 'recurrent-neural-net': 91, 'infographics': 2, 'transformer': 45, 'google': 17, 'glm': 3, 'hurdle-model': 1, 'genetic-programming': 2, 'data-transfer': 1, 'books': 7, 'competitions': 2, 'distance': 44, 'vector-space-models': 7, 'hyperparameter': 42, 'rbm': 4, 'doc2vec': 3, 'data-augmentation': 24, 'annotation': 12, 'heatmap': 9, 'aws': 20, 'vae': 14, 'probabilistic-programming': 9, 'association-rules': 19, 'reshape': 9, 'numerical': 6, 'relational-dbms': 7, 'dimensionality-reduction': 69, 'parameter': 5, 'fuzzy-classification': 3, 'data-science-model': 186, 'networkx': 2, 'databases': 29, 'open-set': 2, 'movielens': 2, 'twitter': 8, 'recommender-system': 103, 'expectation-maximization': 5, 'sampling': 38, 
'indexing': 6, 'reinforcement-learning': 203, 'experiments': 3, 'survival-analysis': 10, 'r': 268, 'homework': 4, 'computer-vision': 121, 'crawling': 3, 'pattern-recognition': 1, 'csv': 27, 'lasso': 8, 'learning': 10, 'finance': 17, 'lightgbm': 23, 'gbm': 10, 'ml': 7, 'sequence': 25, 'activity-recognition': 5, 'image-preprocessing': 67, 'keras-rl': 6, 'social-network-analysis': 11, 'wolfram-language': 3, 'career': 9, 'optimization': 124, 'boosting': 49, 'gridsearchcv': 28, 'smote': 27, 'epochs': 11, 'sports': 3, 'text-mining': 113, 'encoding': 54, 'data-cleaning': 157, 'word2vec': 88, 'embeddings': 44, 'scipy': 40, 'gradient-descent': 98, 'sparsity': 2, 'unsupervised-learning': 110, 'web-scrapping': 8, 'software-development': 2, 'learning-rate': 8, 'ensemble-modeling': 30, 'regularization': 50, 'xgboost': 165, 'mini-batch-gradient-descent': 10, 'word': 2, 'dqn': 36, 'cnn': 489, 'interpolation': 6, 'google-prediction-api': 2, 'aggregation': 12, 'hardware': 12, 'numpy': 117, 'research': 11, 'ndcg': 5, 'spearmans-rank-correlation': 1, 'softmax': 24, 'outlier': 48, 'arima': 11, 'feature-reduction': 4, 'kitti-dataset': 1, 'variance': 35, 'parallel': 8, 'hyperparameter-tuning': 59, 'anomaly-detection': 92, 'corpus': 1, 'probability-calibration': 11, 'pickle': 9, 'linear-algebra': 24, 'groupby': 2, 'data-product': 3, 'question-answering': 4, 'language-model': 25, 'object-recognition': 14, 'statsmodels': 1, 'pgm': 1, 'objective-function': 4, 'alex-net': 5, 'features': 32, 'reference-request': 18, 'version-control': 1, 'encoder': 1, 'sentiment-analysis': 37, 'library': 2, 'autoencoder': 106, 'cost-function': 25, 'python': 1814, 'least-squares-svm': 1, 'data-stream-mining': 4, 'structured-data': 5, 'confusion-matrix': 27, 'classification': 685, 'data-mining': 217, 'overfitting': 69, 'rnn': 149, 'momentum': 3, 'collinearity': 6, 'nlp': 493, 'audio-recognition': 25, 'vc-theory': 5, 'manifold': 1, 'anaconda': 20, 'machine-learning-model': 224, 'pooling': 4, 'simulation': 11, 
'neural-network': 1055, 'image-size': 6, 'julia': 2, 'image-recognition': 86, 'image-segmentation': 3, 'graphs': 47, 'supervised-learning': 82, 'notation': 4, 'deepmind': 7, 'openai-gym': 17, 'explainable-ai': 10, 'logistic-regression': 154, 'time-series': 466, 'vgg16': 21, 'tableau': 9, 'deep-network': 29, 'image-classification': 211, 'information-retrieval': 32, 'sequence-to-sequence': 35, 'svm': 136, 'graphical-model': 3, 'nlg': 9, 'faster-rcnn': 38, 'one-hot-encoding': 4, 'linux': 5, 'decision-trees': 145, 'data-formats': 9, 'usecase': 2, 'neural-style-transfer': 8, 'weka': 19, 'bigdata': 95, 'regression': 347, 'similar-documents': 20, 'pearsons-correlation-coefficient': 2, 'cause-effect-relations': 1, 'randomized-algorithms': 6, 'convolution': 103, 'evaluation': 66, 'markov-hidden-model': 13, 'scikit-learn': 540, 'history': 1, 'non-parametric': 3, 'hog': 1, 'java': 14, 'fastai': 6, 'gpu': 42, 'multiclass-classification': 131, 'svr': 5, 'noise': 17, 'nl2sql': 1, 'self-driving': 3, 'causalimpact': 2, 'loss-function': 161, 'perceptron': 26, 'weight-initialization': 12, 'marketing': 6, 'open-source': 1, 'stanford-nlp': 9, 'frequentist': 1, 'automl': 2, 'normal-equation': 1, 'rbf': 5, 'representation': 9, 'pytorch': 175, 'active-learning': 4, 'dropout': 15, 'forecasting': 85, '3d-object-detection': 1, 'machine-translation': 28, 'regex': 8, 'matplotlib': 77, 'predict': 3, 'data-analysis': 71, 'markov-process': 14, 'mlp': 34, 'machine-learning': 2693, 'scoring': 12, 'hinge-loss': 7, 'object-detection': 109, 'exploitation': 1, 'data-indexing-techniques': 1, 'privacy': 6, 'attention-mechanism': 26, 'data-wrangling': 15, 'distributed': 7, 'pyspark': 40, 'theano': 4, 'text-classification': 1, 'churn': 15, 'data': 213, 'training': 148, 'knime': 1, 'anomaly': 4, 'score': 14, 'finite-precision': 2, 'theory': 11, 'excel': 24, 'cross-validation': 139, 'apache-nifi': 1, '3d-reconstruction': 9, 'clusters': 10, 'unbalanced-classes': 42, 'algorithms': 68, 'feature-engineering': 
163, 'powerbi': 10, 'convergence': 17, 'categorical-data': 81, 'rmse': 1, 'gaussian-process': 12, 'parsing': 3, 'matlab': 62, 'error-handling': 17, 'plotting': 32, 'ensemble': 7, 'nosql': 3, 'state-of-the-art': 1, 'dataset': 340, 'categorical-encoding': 3, 'libsvm': 1, 'genetic-algorithms': 16, 'ensemble-learning': 11, 'text-filter': 2, 'mongodb': 2, 'label-smoothing': 1, 'hive': 2, 'evolutionary-algorithms': 11, 'predictor-importance': 9, 'speech-to-text': 8, 'self-study': 8, 'apache-spark': 35, 'feature-scaling': 59, 'wikipedia': 1, '.net': 1, 'labels': 28, 'coursera': 3, 'apache-hadoop': 13, 'information-theory': 9, 'game': 7, 'caffe': 7, 'python-3.x': 13, 'lda': 27, 'map-reduce': 3, 'james-stein-encoder': 1, 'sagemaker': 8, 'dplyr': 6, 'backpropagation': 65, 'accuracy': 89, 'statistics': 234, 'ggplot2': 3, 'fuzzy-logic': 13, 'bert': 64, 'ibm-watson': 1, 'visualization': 126, 'tensorflow': 584, 'auc': 3, 'pathfinder': 1, 'feature-construction': 16, 'allennlp': 2, 'ranking': 22, 'programming': 7, 'sas': 6, 'spss': 2, 'metric': 60, 'feature-map': 2, 'definitions': 4, 'lda-classifier': 1, 'aws-lambda': 2, 'image': 32, 'goss': 1, 'natural-language-process': 124, 'difference': 5, 'h2o': 4, 'data-imputation': 16, 'mcmc': 4, 'ridge-regression': 7, 'cs231n': 1, 'ngrams': 7, 'transfer-learning': 69, 'json': 10, 'domain-adaptation': 3, 'multi-instance-learning': 2, 'binary': 26, 'proximal-svm': 1, 'rdkit': 1, 'preprocessing': 120, 'huggingface': 2, 'jupyter': 41, 'pruning': 3, 'performance': 27, 'stacked-lstm': 7, 'rstudio': 15, 'tokenization': 6, 'finetuning': 7, 'c++': 1, 'search-engine': 4, 'activation-function': 44, 'k-nn': 50, 'hierarchical-data-format': 7, 'online-learning': 13, 'scala': 9, 'methods': 4, 'ocr': 26, 'automatic-summarization': 10, 'inceptionresnetv2': 6, 'semi-supervised-learning': 18}
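
The counting loop above can be written more compactly with the standard library. This is a sketch rather than a replacement for the cell: `tag_lists` is a hypothetical stand-in for `questions['Tags']` after splitting, and `tag_counts` is a name introduced here.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for questions['Tags'] after splitting
tag_lists = [
    ["machine-learning", "data-mining"],
    ["machine-learning", "regression"],
    ["python", "time-series"],
]

# Flatten the per-question lists and count each tag in one pass
tag_counts = Counter(chain.from_iterable(tag_lists))

# most_common(n) returns the n highest counts, largest first
print(tag_counts.most_common(1))
```

`Counter.most_common(25)` on the real data would give the top 25 tags directly, without building and sorting a DataFrame first.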

In [15]:

no_of_tags = pd.DataFrame.from_dict(num_tags, orient='index')
print(no_of_tags.head())

                            0
pandas                    354
scalability                 4
probability                76
chatbot                    14
kendalls-tau-coefficient    1

In [16]:

# Sort ascending by count (column 0) so the most used tags end up at the bottom
times_tag_used = no_of_tags.sort_values([0])
print(times_tag_used)

                              0
exploitation                  1
apache-nifi                   1
haar-cascade                  1
nl2sql                        1
cs231n                        1
spyder                        1
multivariate-distribution     1
noisification                 1
counter-inference             1
statsmodels                   1
james-stein-encoder           1
state-of-the-art              1
pathfinder                    1
non-convex                    1
impala                        1
pattern-recognition           1
summarunner-architecture      1
pgm                           1
helmert-coding                1
amazon-ml                     1
data-transfer                 1
proximal-svm                  1
knime                         1
siamese                       1
history                       1
cause-effect-relations        1
nn                            1
adaboost                      1
least-squares-svm             1
hog                           1
...                         ...
feature-engineering         163
xgboost                     165
linear-regression           175
pytorch                     175
data-science-model          186
reinforcement-learning      203
feature-selection           209
image-classification        211
data                        213
data-mining                 217
machine-learning-model      224
statistics                  234
clustering                  257
predictive-modeling         265
r                           268
dataset                     340
regression                  347
pandas                      354
lstm                        402
time-series                 466
cnn                         489
nlp                         493
scikit-learn                540
tensorflow                  584
classification              685
keras                       935
neural-network             1055
deep-learning              1220
python                     1814
machine-learning           2693

[526 rows x 1 columns]

In [17]:

top_times_tags_used = times_tag_used.tail(25)
top_times_tags_used

Out[17]:

	0
reinforcement-learning	203
feature-selection	209
image-classification	211
data	213
data-mining	217
machine-learning-model	224
statistics	234
clustering	257
predictive-modeling	265
r	268
dataset	340
regression	347
pandas	354
lstm	402
time-series	466
cnn	489
nlp	493
scikit-learn	540
tensorflow	584
classification	685
keras	935
neural-network	1055
deep-learning	1220
python	1814
machine-learning	2693

The above are the 25 most used tags. Let us see how this looks graphically.

In [38]:

import matplotlib.pyplot as plt
from numpy import arange
%matplotlib inline

fig, ax = plt.subplots()
# Select the count column (label 0). Using .iloc[0] here returns only the
# first row, and ax.bar then raises
# "ValueError: incompatible sizes: argument 'height' must be length 25 or scalar"
bar_heights = top_times_tags_used[0].values
bar_positions = arange(25) + 0.25
ax.bar(bar_positions, bar_heights, 0.5)
# Label each bar with its tag name
ax.set_xticks(bar_positions)
ax.set_xticklabels(top_times_tags_used.index, rotation=90)

plt.show()

In [ ]:

# Iterating over a DataFrame yields its column labels, not its rows, and there
# is no column 1, so plot the count column (label 0) directly; the index
# supplies the tag names for the y axis
ax = top_times_tags_used[0].plot(kind="barh")
ax.set_title("Times Each Tag Was Used")
ax.set_xlabel("Times Tag Used")

In [18]:

num_views = {}
for views in questions['ViewCount']:
    if views in num_views:
        num_views[views] += 1
    else:
        num_views[views] = 1
print(num_views)

{2: 1, 3: 3, 4: 14, 5: 29, 6: 49, 7: 60, 8: 82, 9: 107, 10: 111, 11: 120, 12: 124, 13: 166, 14: 165, 15: 157, 16: 153, 17: 162, 18: 169, 19: 168, 20: 168, 21: 164, 22: 165, 23: 149, 24: 132, 25: 149, 26: 136, 27: 138, 28: 145, 29: 139, 30: 142, 31: 111, 32: 124, 33: 102, 34: 123, 35: 97, 36: 108, 37: 107, 38: 71, 39: 101, 40: 72, 41: 98, 42: 95, 43: 84, 44: 64, 45: 56, 46: 66, 47: 46, 48: 64, 49: 48, 50: 52, 51: 67, 52: 54, 53: 50, 54: 60, 55: 60, 56: 55, 57: 43, 58: 48, 59: 42, 60: 38, 61: 44, 62: 36, 63: 49, 64: 30, 65: 33, 66: 31, 67: 34, 68: 20, 69: 39, 70: 31, 71: 32, 72: 38, 73: 30, 74: 26, 75: 29, 76: 24, 77: 29, 78: 24, 79: 14, 80: 22, 81: 25, 82: 19, 83: 24, 84: 26, 85: 15, 86: 19, 87: 22, 88: 21, 89: 14, 90: 20, 91: 18, 92: 26, 93: 24, 94: 17, 95: 15, 96: 16, 97: 9, 98: 16, 99: 24, 100: 19, 101: 19, 102: 7, 103: 14, 104: 19, 105: 17, 106: 16, 107: 10, 108: 15, 109: 14, 110: 15, 111: 6, 112: 13, 113: 12, 114: 16, 115: 8, 116: 12, 117: 14, 118: 11, 119: 14, 120: 11, 121: 3, 122: 7, 123: 18, 124: 15, 125: 10, 126: 9, 127: 12, 128: 12, 129: 14, 130: 12, 131: 10, 132: 10, 133: 11, 134: 12, 135: 10, 136: 11, 137: 6, 138: 8, 139: 9, 140: 11, 141: 11, 142: 9, 143: 9, 144: 9, 145: 5, 146: 10, 147: 12, 148: 10, 149: 9, 150: 7, 151: 7, 152: 10, 153: 4, 154: 11, 155: 7, 156: 6, 157: 12, 158: 8, 159: 12, 160: 7, 161: 10, 162: 10, 163: 12, 164: 11, 165: 2, 166: 7, 167: 9, 168: 5, 169: 6, 170: 9, 171: 14, 172: 6, 173: 7, 174: 4, 2077: 1, 176: 6, 177: 8, 178: 6, 179: 7, 180: 3, 181: 6, 182: 3, 183: 6, 184: 5, 185: 3, 186: 4, 187: 8, 188: 3, 189: 6, 190: 8, 191: 6, 192: 6, 193: 10, 194: 5, 195: 7, 196: 7, 197: 3, 198: 5, 4295: 1, 200: 9, 201: 6, 202: 8, 203: 3, 204: 6, 205: 2, 206: 6, 207: 1, 208: 2, 209: 4, 210: 6, 211: 3, 212: 6, 213: 3, 214: 4, 215: 5, 216: 6, 217: 3, 218: 3, 219: 2, 220: 2, 221: 1, 222: 6, 223: 4, 224: 2, 225: 6, 226: 3, 227: 6, 229: 7, 230: 7, 231: 6, 232: 5, 233: 2, 234: 8, 235: 7, 236: 8, 237: 5, 238: 3, 239: 3, 240: 4, 241: 2, 242: 8, 243: 3, 244: 
5, 245: 2, 246: 4, 247: 5, 248: 3, 249: 6, 250: 5, 251: 1, 252: 3, 253: 4, 254: 1, 255: 8, 256: 4, 257: 2, 258: 4, 259: 1, 260: 5, 261: 3, 262: 4, 263: 2, 264: 7, 265: 4, 266: 7, 267: 3, 268: 4, 269: 2, 270: 2, 271: 1, 272: 3, 273: 2, 274: 7, 275: 3, 276: 1, 277: 4, 278: 2, 279: 4, 2328: 1, 281: 2, 282: 3, 283: 9, 284: 3, 285: 2, 286: 3, 287: 4, 288: 6, 2337: 1, 290: 3, 291: 3, 292: 2, 293: 3, 294: 1, 295: 2, 296: 1, 297: 1, 298: 9, 299: 3, 300: 5, 301: 1, 302: 2, 303: 2, 304: 3, 305: 2, 306: 3, 307: 2, 308: 1, 309: 2, 310: 3, 311: 2, 312: 2, 313: 6, 4410: 1, 315: 1, 316: 2, 317: 4, 319: 2, 2369: 1, 322: 2, 324: 4, 327: 1, 328: 3, 329: 1, 330: 1, 331: 1, 332: 2, 334: 2, 335: 2, 336: 2, 337: 5, 339: 2, 341: 2, 342: 3, 344: 2, 345: 2, 346: 4, 347: 1, 348: 2, 349: 1, 350: 1, 351: 2, 352: 2, 353: 2, 354: 3, 355: 1, 356: 1, 10597: 1, 358: 1, 2407: 1, 360: 5, 361: 2, 362: 1, 363: 2, 364: 3, 365: 2, 366: 3, 367: 4, 368: 3, 370: 3, 373: 3, 375: 1, 376: 5, 377: 1, 378: 3, 379: 2, 380: 2, 381: 1, 382: 2, 384: 2, 490: 1, 386: 2, 387: 2, 388: 1, 389: 4, 390: 2, 392: 3, 393: 5, 394: 1, 395: 1, 396: 1, 397: 2, 398: 2, 399: 2, 400: 1, 401: 1, 402: 1, 403: 1, 405: 2, 406: 2, 408: 2, 409: 1, 410: 5, 412: 2, 4509: 1, 414: 2, 416: 2, 417: 2, 418: 2, 2467: 1, 420: 2, 421: 3, 422: 1, 423: 1, 424: 1, 425: 2, 426: 2, 428: 1, 429: 1, 413: 1, 432: 2, 6577: 1, 434: 1, 33203: 1, 437: 1, 438: 2, 2487: 1, 441: 2, 442: 4, 444: 3, 445: 2, 446: 1, 448: 4, 449: 1, 450: 1, 452: 2, 2501: 1, 454: 3, 4551: 1, 456: 4, 457: 4, 458: 1, 460: 1, 461: 3, 462: 1, 463: 3, 464: 2, 465: 1, 466: 1, 467: 1, 469: 4, 471: 1, 472: 1, 473: 1, 474: 1, 475: 1, 476: 2, 478: 1, 479: 1, 480: 1, 481: 3, 482: 2, 483: 1, 484: 1, 485: 2, 4582: 1, 357: 2, 488: 1, 489: 3, 2538: 1, 491: 3, 492: 4, 493: 1, 494: 1, 495: 1, 496: 2, 497: 2, 502: 1, 503: 1, 505: 1, 506: 1, 507: 1, 508: 1, 509: 3, 510: 1, 511: 1, 512: 2, 513: 3, 514: 1, 517: 3, 518: 2, 520: 1, 521: 2, 522: 1, 524: 2, 527: 1, 530: 1, 531: 1, 532: 1, 533: 1, 534: 1, 
4136: 1, 537: 1, 538: 1, 540: 1, 543: 2, 359: 2, 550: 1, 551: 2, 552: 2, 553: 1, 554: 1, 12847: 1, 560: 1, 561: 1, 2610: 1, 563: 1, 564: 1, 565: 3, 567: 3, 574: 1, 576: 1, 577: 1, 2626: 1, 579: 1, 580: 2, 582: 2, 583: 1, 584: 3, 586: 2, 587: 1, 588: 1, 591: 2, 597: 2, 599: 3, 2648: 1, 603: 1, 607: 2, 609: 1, 610: 1, 612: 1, 613: 1, 616: 1, 625: 1, 627: 1, 628: 1, 629: 2, 632: 1, 4729: 1, 635: 2, 637: 1, 638: 1, 640: 1, 642: 2, 2155: 1, 2692: 1, 4745: 1, 650: 1, 651: 1, 653: 1, 658: 2, 659: 1, 660: 4, 661: 2, 664: 1, 665: 1, 667: 1, 1045: 1, 669: 1, 670: 2, 671: 4, 673: 3, 674: 1, 675: 1, 2724: 1, 677: 1, 679: 3, 682: 2, 683: 1, 689: 2, 691: 1, 692: 2, 693: 1, 694: 2, 2505: 1, 697: 2, 698: 1, 2069: 1, 701: 1, 702: 3, 704: 1, 708: 1, 710: 2, 801: 2, 713: 1, 714: 1, 715: 1, 718: 3, 724: 1, 726: 2, 727: 1, 728: 2, 729: 1, 2778: 1, 738: 2, 742: 1, 747: 2, 748: 2, 749: 2, 751: 1, 752: 2, 3210: 1, 2174: 1, 758: 2, 759: 1, 2412: 1, 6910: 1, 770: 1, 774: 1, 778: 1, 783: 1, 784: 1, 8977: 1, 788: 1, 789: 1, 791: 1, 793: 2, 794: 1, 796: 2, 2849: 1, 802: 2, 805: 2, 807: 1, 2857: 1, 815: 2, 2864: 1, 817: 1, 822: 1, 825: 2, 827: 1, 829: 3, 839: 2, 840: 1, 843: 1, 1165: 1, 848: 2, 849: 1, 850: 2, 851: 1, 4950: 1, 856: 1, 857: 1, 858: 1, 859: 2, 860: 2, 861: 1, 865: 1, 867: 1, 2916: 1, 486: 1, 870: 1, 876: 1, 877: 1, 711: 1, 879: 1, 11122: 1, 883: 1, 888: 2, 890: 1, 891: 1, 893: 2, 11136: 1, 898: 1, 439: 2, 903: 1, 907: 1, 908: 1, 910: 1, 916: 1, 918: 1, 924: 1, 925: 1, 927: 1, 928: 1, 930: 1, 931: 1, 932: 1, 933: 1, 937: 1, 939: 1, 5040: 1, 945: 1, 946: 1, 950: 2, 951: 1, 956: 1, 958: 1, 960: 1, 963: 1, 964: 2, 968: 2, 969: 1, 970: 1, 971: 2, 3021: 1, 984: 1, 987: 1, 989: 1, 991: 1, 993: 1, 1000: 1, 1002: 1, 1003: 1, 1005: 2, 9209: 1, 1019: 1, 1195: 1, 1029: 1, 1036: 1, 1038: 1, 3093: 1, 1047: 1, 1049: 1, 1050: 1, 175: 5, 1053: 1, 1058: 1, 1060: 1, 1061: 1, 1066: 1, 1067: 1, 1885: 1, 1073: 1, 1074: 1, 1075: 1, 1078: 1, 1079: 1, 1081: 1, 1082: 1, 8373: 1, 1089: 1, 1092: 1, 1096: 1, 
1101: 1, 1105: 1, 1112: 1, 1113: 4, 1114: 1, 1115: 1, 1117: 1, 1211: 1, 1128: 1, 1129: 1, 1130: 1, 7278: 1, 1135: 1, 2579: 1, 7284: 1, 1141: 1, 1154: 1, 1155: 1, 1156: 1, 3207: 1, 1162: 1, 1164: 1, 3213: 1, 1169: 1, 1170: 1, 1178: 1, 3229: 1, 1182: 1, 1190: 1, 1191: 1, 199: 5, 1197: 2, 4296: 1, 1210: 1, 2086: 1, 1212: 2, 2250: 1, 1218: 1, 1219: 1, 1225: 1, 1228: 1, 1229: 1, 7374: 1, 547: 1, 1235: 1, 1230: 1, 1239: 1, 1251: 1, 1261: 1, 1269: 1, 1274: 2, 1283: 1, 1286: 1, 1288: 2, 1289: 1, 3341: 1, 1295: 1, 3358: 1, 453: 1, 1324: 1, 1330: 1, 1335: 1, 1338: 1, 2271: 1, 1352: 1, 1354: 1, 1358: 1, 1360: 1, 1362: 1, 1364: 2, 1370: 1, 1374: 1, 1377: 1, 1378: 1, 7523: 1, 1397: 1, 1406: 1, 1407: 1, 3458: 1, 1416: 1, 1418: 1, 3470: 1, 1423: 2, 1428: 1, 3478: 1, 28060: 1, 1444: 1, 730: 2, 1450: 1, 28079: 1, 2292: 1, 3519: 1, 7620: 1, 1477: 1, 2095: 1, 1483: 1, 1487: 1, 5587: 1, 1506: 1, 3556: 1, 1516: 1, 1518: 1, 1541: 1, 1544: 1, 1545: 1, 1557: 1, 3610: 1, 3622: 1, 5672: 1, 1577: 1, 1582: 1, 1585: 1, 3636: 1, 1592: 1, 1598: 1, 1602: 1, 6191: 1, 1615: 1, 1617: 1, 1621: 1, 1625: 1, 6328: 1, 4146: 1, 1651: 1, 1659: 1, 1666: 1, 1676: 1, 1986: 1, 280: 2, 1691: 1, 8474: 1, 5795: 1, 1704: 1, 1708: 1, 1715: 1, 2078: 1, 1752: 1, 3820: 1, 1779: 1, 1787: 1, 5894: 1, 5895: 1, 1801: 1, 3854: 1, 1810: 1, 1813: 1, 1826: 1, 3875: 1, 1828: 1, 1833: 1, 3882: 1, 1842: 1, 649: 1, 2356: 1, 8012: 1, 1871: 1, 1878: 1, 314: 2, 2364: 1, 1900: 1, 2450: 1, 1907: 1, 1914: 1, 1916: 1, 3966: 1, 1919: 2, 3970: 1, 1935: 1, 3991: 1, 1960: 1, 1966: 1, 1969: 1, 1977: 1, 1979: 1, 816: 2, 695: 1, 1987: 1, 4045: 1, 2000: 1, 2010: 1, 2044: 1}

In [19]:

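# Index: a ViewCount value; column: how many questions have that view count
# (despite the column name, these are not views per tag)
no_of_views = pd.DataFrame.from_dict(num_views, orient='index')
no_of_views.rename(columns={0: "Times Tag Viewed"}, inplace=True)
print(no_of_views)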

      Times Tag Viewed
2                    1
3                    3
4                   14
5                   29
6                   49
7                   60
8                   82
9                  107
10                 111
11                 120
12                 124
13                 166
14                 165
15                 157
16                 153
17                 162
18                 169
19                 168
20                 168
21                 164
22                 165
23                 149
24                 132
25                 149
26                 136
27                 138
28                 145
29                 139
30                 142
31                 111
...                ...
1842                 1
649                  1
2356                 1
8012                 1
1871                 1
1878                 1
314                  2
2364                 1
1900                 1
2450                 1
1907                 1
1914                 1
1916                 1
3966                 1
1919                 2
3970                 1
1935                 1
3991                 1
1960                 1
1966                 1
1969                 1
1977                 1
1979                 1
816                  2
695                  1
1987                 1
4045                 1
2000                 1
2010                 1
2044                 1

[912 rows x 1 columns]
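
The table above counts how many questions share each ViewCount value. The sketch below reproduces that tally with `value_counts` and then, assuming the goal is the total number of views each tag received, accumulates views per tag. The `sample` frame, `questions_per_view_count`, and `views_per_tag` are hypothetical names and data used only for illustration.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample; the real notebook would use the `questions` DataFrame
sample = pd.DataFrame({
    "ViewCount": [21, 25, 21],
    "Tags": [
        ["machine-learning", "data-mining"],
        ["machine-learning"],
        ["python"],
    ],
})

# Same tally as the loop above: number of questions per distinct ViewCount
questions_per_view_count = sample["ViewCount"].value_counts().sort_index()

# Assumed goal: total views accumulated by each tag across its questions
views_per_tag = defaultdict(int)
for tags, views in zip(sample["Tags"], sample["ViewCount"]):
    for tag in tags:
        views_per_tag[tag] += views

print(questions_per_view_count.to_dict())
print(dict(views_per_tag))
```

Sorting `views_per_tag` by value would then give a most-viewed ranking that can be compared with the most-used ranking computed earlier.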
