!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
import nlu
Installing NLU 3.4.3rc2 with PySpark 3.0.3 and Spark NLP 3.4.2 for Google Colab ...
Successfully installed nlu-tmp-3.4.3rc10
https://www.kaggle.com/kashnitsky/news-about-major-cryptocurrencies-20132018-40k
import pandas as pd
import nlu
!wget http://ckl-it.de/wp-content/uploads/2020/12/small_btc.csv
df = pd.read_csv('/content/small_btc.csv').title
df
2022-04-15 11:41:28 (14.6 MB/s) - ‘small_btc.csv’ saved [22244914/22244914]
0       Bitcoin Price Update: Will China Lead us Down?
1       Key Bitcoin Price Levels for Week 51 (15 – 22 ...
2       National Australia Bank, Citing Highly Flawed ...
3       Chinese Bitcoin Ban Driven by Chinese Banking...
4       Bitcoin Trade Update: Opened Position
                             ...
1995    Bitcoin Bill Pay Company Living Room of Satosh...
1996    NYDFS Extends BitLicense Bitcoin Regulation Co...
1997    Bitfinex Passes Stefan Thomas’s Proof Of Solve...
1998    Cryptocurrency Exchange Platform AlphaPoint Pa...
1999    Want to Buy And Sell Bitcoin Fast and Secure? ...
Name: title, Length: 2000, dtype: object
import nlu
# Predict emotions on the dataset with the NLU emotion classifier
emotion_df = nlu.load('emotion').predict(df)
emotion_df
classifierdl_use_emotion download started this may take some time. Approximate size to download 21.3 MB [OK!]
tfhub_use download started this may take some time. Approximate size to download 923.7 MB [OK!]
sentence_detector_dl download started this may take some time. Approximate size to download 354.6 KB [OK!]
| | emotion | emotion_confidence | sentence | sentence_embedding_use |
---|---|---|---|---|
0 | fear | 0.998173 | Bitcoin Price Update: Will China Lead us Down? | [0.05829371139407158, -0.036904484033584595, -... |
1 | joy | 0.997696 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | [0.038088250905275345, -0.04514157399535179, -... |
2 | fear | 0.999997 | National Australia Bank, Citing Highly Flawed ... | [0.05034318566322327, -0.01303655095398426, -0... |
3 | fear | 0.999135 | Chinese Bitcoin Ban Driven by Chinese Banking ... | [0.055152829736471176, -0.05237917602062225, -... |
4 | joy | 0.998864 | Bitcoin Trade Update: Opened Position | [0.05926975607872009, -0.056463420391082764, -... |
... | ... | ... | ... | ... |
1996 | fear | 0.998281 | NYDFS Extends BitLicense Bitcoin Regulation Co... | [0.0639236643910408, -0.05505230277776718, -0.... |
1997 | fear | 0.772052 | Bitfinex Passes Stefan Thomas’s Proof Of Solve... | [0.059178080409765244, -0.041498005390167236, ... |
1998 | joy | 0.999348 | Cryptocurrency Exchange Platform AlphaPoint Pa... | [0.05369672179222107, -0.023480931296944618, -... |
1999 | fear | 0.998905 | Want to Buy And Sell Bitcoin Fast and Secure? | [0.0626637190580368, -0.05945301055908203, -0.... |
1999 | fear | 0.998905 | Try CoinRNR | [0.02854502573609352, 0.05557611957192421, 0.0... |
2160 rows × 4 columns
emotion_df.emotion.value_counts().plot.bar(figsize=(20,14), title='Emotion Distribution of Bitcoin News Articles')
<matplotlib.axes._subplots.AxesSubplot at 0x7f7e1d797150>
key_df = nlu.load('yake').predict(df)
key_df
| | document | keywords | keywords_confidence |
---|---|---|---|
0 | Bitcoin Price Update: Will China Lead us Down? | update | 0.5798862558280943 |
0 | Bitcoin Price Update: Will China Lead us Down? | china | 0.5798862558280943 |
0 | Bitcoin Price Update: Will China Lead us Down? | china lead | 0.5066323531331214 |
1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | price | 0.5798862558280943 |
1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | levels | 0.5798862558280943 |
... | ... | ... | ... |
1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | growth | 0.26804494089513314 |
1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | support growth | 0.1840422979793308 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | bitcoin fast | 0.3579604335906263 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | try coinrnr | 0.2564243599387429 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin fast | 0.28203029979078753 |
6085 rows × 3 columns
To count each keyword, call `.explode()` on the `keywords` column and then take the value counts of the resulting rows:
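As a minimal sketch of that pattern on a made-up two-row frame (the column name `keywords` mirrors the NLU output; the data itself is invented for illustration):

```python
import pandas as pd

# Toy frame mimicking the YAKE output: one list of keywords per document
toy = pd.DataFrame({
    'document': ['doc a', 'doc b'],
    'keywords': [['bitcoin', 'price'], ['bitcoin', 'china']],
})

# explode() turns each list element into its own row, so value_counts()
# then counts individual keywords rather than whole lists
exploded = toy.explode('keywords')
counts = exploded.keywords.value_counts()
print(counts)
```

The same chained call, followed by `.plot.bar(...)`, produces the keyword-frequency chart below.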
key_df.explode('keywords').keywords.value_counts()[0:100].plot.bar(title='Top 100 Keywords in BTC News Articles', figsize=(20,8))
<matplotlib.axes._subplots.AxesSubplot at 0x7f7e1fb3da10>
To reduce the dimensionality of the data and get better keyword-extraction results, we can apply the built-in stemmer to our dataset, especially to merge occurrences of terms like `bitcoin` and `bitcoins`.

Note that lemmatizing and normalizing could also be applied for further dimensionality reduction, but they would not fix the previously mentioned example.
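To see why stemming merges the two counts, here is a deliberately crude sketch. This is not the Porter-style stemmer behind `nlu.load('stem')`; it only strips a trailing `s`, purely to illustrate the merging effect:

```python
def crude_stem(token: str) -> str:
    # Naive illustration only: strip a trailing 's' from longer tokens.
    # The real Spark NLP stemmer applies many more suffix rules.
    if token.endswith('s') and len(token) > 3:
        return token[:-1]
    return token

tokens = ['bitcoin', 'bitcoins', 'price', 'prices', 'us']
stems = [crude_stem(t) for t in tokens]
# 'bitcoin'/'bitcoins' and 'price'/'prices' collapse to single stems,
# while short tokens like 'us' are left untouched
print(stems)
```

A lemmatizer, by contrast, maps inflected forms to dictionary lemmas and would not necessarily collapse `bitcoins` onto `bitcoin`, since neither is in a standard dictionary.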
stem_df = nlu.load('stem').predict(df, output_level = 'document')
stem_df['stem_string'] = stem_df.stem.str.join(' ')
stem_df
| | document | stem | stem_string |
---|---|---|---|
0 | Bitcoin Price Update: Will China Lead us Down? | [bitcoin, price, updat, :, will, china, lead, ... | bitcoin price updat : will china lead u down ? |
1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | [kei, bitcoin, price, level, for, week, 51, (,... | kei bitcoin price level for week 51 ( 15 – 22 ... |
2 | National Australia Bank, Citing Highly Flawed ... | [nation, australia, bank, ,, cite, highli, fla... | nation australia bank , cite highli flawe data... |
3 | Chinese Bitcoin Ban Driven by Chinese Banking ... | [chines, bitcoin, ban, driven, by, chines, ban... | chines bitcoin ban driven by chines bank crisi ? |
4 | Bitcoin Trade Update: Opened Position | [bitcoin, trade, updat, :, open, posit] | bitcoin trade updat : open posit |
... | ... | ... | ... |
1995 | Bitcoin Bill Pay Company Living Room of Satosh... | [bitcoin, bill, pai, compani, live, room, of, ... | bitcoin bill pai compani live room of satoshi ... |
1996 | NYDFS Extends BitLicense Bitcoin Regulation Co... | [nydf, extend, bitlicens, bitcoin, regul, comm... | nydf extend bitlicens bitcoin regul comment pe... |
1997 | Bitfinex Passes Stefan Thomas’s Proof Of Solve... | [bitfinex, pass, stefan, thomas’, proof, of, s... | bitfinex pass stefan thomas’ proof of solvenc ... |
1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | [cryptocurr, exchang, platform, alphapoint, pa... | cryptocurr exchang platform alphapoint partner... |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | [want, to, bui, and, sell, bitcoin, fast, and,... | want to bui and sell bitcoin fast and secur ? ... |
2000 rows × 3 columns
We can see that `bitcoins` no longer appears as its own keyword; its occurrences have been merged into the `bitcoin` count, together with many other spelling variants of Bitcoin in the dataset.
stem_df = nlu.load('yake').predict(stem_df.stem_string)
stem_df.explode('keywords').keywords.value_counts()[0:100].plot.bar(title='Top 100 Keywords in Stemmed BTC News Articles', figsize=(20,8))
<matplotlib.axes._subplots.AxesSubplot at 0x7f7e1d771cd0>
stem_df.explode('keywords').keywords.value_counts()[1:100].plot.bar(title='Keywords 2-100 in Stemmed BTC News Articles', figsize=(20,8))
<matplotlib.axes._subplots.AxesSubplot at 0x7f7e1e511990>
The YAKE annotator exposes the following configurable parameters:

- `setNKeywords` — increase the number of keywords extracted
- `setMinNGrams` — minimum N-grams a keyword should have
- `setMaxNGrams` — maximum N-grams a keyword should have
- `setWindowSize` — window size for co-occurrence
- `setThreshold` — keyword score threshold
- `setStopWords` — the words to be filtered out; by default English stop words from Spark ML

import nlu
yake_pipe = nlu.load('yake')
yake_pipe.print_info()
The following parameters are configurable for this NLU pipeline (you can copy-paste the examples):

>>> component_list['yake_keyword_extraction'] has settable params:
component_list['yake_keyword_extraction'].setMinNGrams(1) | Info: Minimum N-grams a keyword should have | Currently set to : 1
component_list['yake_keyword_extraction'].setMaxNGrams(3) | Info: Maximum N-grams a keyword should have | Currently set to : 3
component_list['yake_keyword_extraction'].setNKeywords(3) | Info: Number of Keywords to extract | Currently set to : 3
component_list['yake_keyword_extraction'].setWindowSize(3) | Info: Window size for Co-Occurrence | Currently set to : 3
component_list['yake_keyword_extraction'].setThreshold(-1.0) | Info: Keyword Score threshold | Currently set to : -1.0
component_list['yake_keyword_extraction'].setStopWords(['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', "i'll", "you'll", "he'll", "she'll", "we'll", "they'll", "i'd", "you'd", "he'd", "she'd", "we'd", "they'd", "i'm", "you're", "he's", "she's", "it's", "we're", "they're", "i've", "we've", "you've", "they've", "isn't", "aren't", "wasn't", "weren't", "haven't", "hasn't", "hadn't", "don't", "doesn't", "didn't", "won't", "wouldn't", "shan't", "shouldn't", "mustn't", "can't", "couldn't", 'cannot', 'could', "here's", "how's", "let's", 'ought', "that's", "there's", "what's", "when's", "where's", "who's", "why's", 'would']) | Info: the words to be filtered out; by default it's the English stop words from Spark ML

>>> component_list['tokenizer'] has settable params:
component_list['tokenizer'].setTargetPattern('\S+') | Info: pattern to grab from text as token candidates. Defaults \S+ | Currently set to : \S+
component_list['tokenizer'].setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '"', "'"]) | Info: character list used to separate from token boundaries | Currently set to : ['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '"', "'"]
component_list['tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True
component_list['tokenizer'].setMinLength(0) | Info: Set the minimum allowed length for each token | Currently set to : 0
component_list['tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed length for each token | Currently set to : 99999

>>> component_list['document_assembler'] has settable params:
component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink
yake_pipe['yake_keyword_extraction'].setNKeywords(4)
key_df = yake_pipe.predict(df)
key_df
| | document | keywords | keywords_confidence |
---|---|---|---|
0 | Bitcoin Price Update: Will China Lead us Down? | update | 0.5798862558280943 |
0 | Bitcoin Price Update: Will China Lead us Down? | china | 0.5798862558280943 |
0 | Bitcoin Price Update: Will China Lead us Down? | lead | 0.5798862558280943 |
0 | Bitcoin Price Update: Will China Lead us Down? | china lead | 0.5066323531331214 |
1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | price | 0.5798862558280943 |
... | ... | ... | ... |
1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | support growth | 0.1840422979793308 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin | 0.3579604335906263 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | bitcoin fast | 0.3579604335906263 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | try coinrnr | 0.2564243599387429 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin fast | 0.28203029979078753 |
8070 rows × 3 columns
key_df.explode('keywords').keywords.value_counts()[0:100].plot.bar(title='Top 100 Keywords in BTC News Articles', figsize=(20,12))
<matplotlib.axes._subplots.AxesSubplot at 0x7f7e1f5bd350>
yake_pipe['yake_keyword_extraction'].setMinNGrams(2)
yake_pipe['yake_keyword_extraction'].setMaxNGrams(4)
key_df = yake_pipe.predict(df)
key_df
| | document | keywords | keywords_confidence |
---|---|---|---|
0 | Bitcoin Price Update: Will China Lead us Down? | bitcoin price | 0.7475647452220192 |
0 | Bitcoin Price Update: Will China Lead us Down? | china lead | 0.3774989624964526 |
0 | Bitcoin Price Update: Will China Lead us Down? | lead us | 0.5619156399368569 |
0 | Bitcoin Price Update: Will China Lead us Down? | china lead us | 0.49160495247060043 |
1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | key bitcoin | 0.7475647452220192 |
... | ... | ... | ... |
1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | bitfinex to support growth | 0.3685173882155852 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin | 0.2923195563311814 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | bitcoin fast | 0.2923195563311814 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | try coinrnr | 0.15815767906792633 |
1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin fast | 0.20049687371139055 |
7365 rows × 3 columns
key_df.explode('keywords').keywords.value_counts()[0:100].plot.bar(title='Top 100 Keywords in BTC News Articles', figsize=(20,12))
<matplotlib.axes._subplots.AxesSubplot at 0x7f7e1c960c90>