# Open and Closed Book Question Answering with Google's T5

With the latest NLU release and Google's T5, you can answer general knowledge questions given no context, and in addition answer questions about text databases.
These questions can be asked in natural human language and answered in just one line with NLU!
## What is an open book question?

You can think of an open book question like an exam where you are allowed to bring in text documents or cheat sheets that help you answer the questions, much like bringing a history book to a history exam.
In T5's terms, this means the model is given a question and an additional piece of textual information, the so-called context.
This enables the T5 model to answer questions on textual datasets like medical records, news articles, wiki databases, stories, movie scripts, product descriptions, legal documents, and many more.
You can answer open book questions in one line of code, leveraging the latest NLU release and Google's T5.
All it takes is:
nlu.load('answer_question').predict("""
Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards,
and Jebe died on the road back to Samarkand""")
>>> Output: Samarkand
An example of answering a medical question based on medical context:
question ='''
What does increased oxygen concentrations in the patient’s lungs displace?
context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff.
Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin.
Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''
#Predict on text data with T5
nlu.load('answer_question').predict(question)
>>> Output: carbon monoxide
Take a look at this example based on a recent news article snippet:
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'
# from https://www.bbc.com/news/business-55728338
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""
# join each question with the context; this works with a Pandas DataFrame as well!
questions = [
question1+ news_article_snippet,
question2+ news_article_snippet,
question3+ news_article_snippet,
question4+ news_article_snippet,
question5+ news_article_snippet,
question6+ news_article_snippet,]
nlu.load('answer_question').predict(questions)
This will output a Pandas DataFrame similar to this:
| Answer | Question |
| --- | --- |
| Alibaba Group founder | Who is Jack ma? |
| Jack Ma | Who is founder of Alibaba Group? |
| Wednesday | When did Jack Ma re-appear? |
| surged 5% | How did Alibaba stocks react? |
| 100 rural teachers | Whom did Jack Ma meet? |
| Chinese regulators | Who did Jack Ma hide from? |
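The per-question concatenations shown above can also be built with a list comprehension, which scales better when many questions share one context. A minimal plain-Python sketch, independent of NLU:

```python
# Each T5 open book input is the question followed by the shared
# "context:"-prefixed article snippet, exactly as in the snippet above.
questions = [
    'Who is Jack ma?',
    'Who is founder of Alibaba Group?',
    'When did Jack Ma re-appear?',
]
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since
Chinese regulators cracked down on his business empire.
"""

# One input string per question, all ending with the same context
t5_inputs = [q + news_article_snippet for q in questions]
```

The resulting list can be passed straight to `nlu.load('answer_question').predict(t5_inputs)`.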
## What is a closed book question?

A closed book question is the exact opposite of an open book question. In an exam scenario, you are only allowed to use what you have memorized in your brain and nothing else.
In T5's terms, this means that T5 can only use its stored weights to answer a question and is given no additional context.
T5 was pre-trained on the C4 dataset, a massive cleaned corpus built from Common Crawl web data collected over many years, including Wikipedia in every language.
This gives T5 the broad knowledge of the internet stored in its weights to answer various closed book questions.
You can answer closed book questions in one line of code, leveraging the latest NLU release and Google's T5.
You only need to pass the question as a plain string to NLU; no context: tag or context contents are required.
All it takes is:
nlu.load('en.t5').predict('Who is president of Nigeria?')
>>> Muhammadu Buhari
nlu.load('en.t5').predict('What is the most spoken language in India?')
>>> Hindi
nlu.load('en.t5').predict('What is the capital of Germany?')
>>> Berlin
!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
import nlu
Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...
t5_closed_book = nlu.load('en.t5')
google_t5_small_ssm_nq download started this may take some time. Approximate size to download 139 MB [OK!]
t5_closed_book.predict('What is the capital of Germany?')
| | document | t5 |
| --- | --- | --- |
| 0 | What is the capital of Germany? | [Berlin] |
t5_closed_book.predict('Who is president of Nigeria?')
| | document | t5 |
| --- | --- | --- |
| 0 | Who is president of Nigeria? | [Muhammadu Buhari] |
t5_closed_book.predict('What is the most spoken language in India?')
| | document | t5 |
| --- | --- | --- |
| 0 | What is the most spoken language in India? | [Hindi] |
For open book question answering, your context must be prefixed with context:
t5_open_book = nlu.load('answer_question')
t5_base download started this may take some time. Approximate size to download 446 MB [OK!]
t5_open_book.predict("""Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand""" )
| | document | t5 |
| --- | --- | --- |
| 0 | Where did Jebe die? context: Ghenkis Khan reca... | [Samarkand] |
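To avoid forgetting the context: prefix, you can wrap the formatting in a small helper. `make_open_book_input` below is a hypothetical convenience function for illustration, not part of the NLU API:

```python
def make_open_book_input(question: str, context: str) -> str:
    """Join a question and its context in the format T5 expects:
    the raw question followed by the 'context:' tag and the context text."""
    return f"{question} context: {context}"

query = make_open_book_input(
    "Where did Jebe die?",
    "Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, "
    "and Jebe died on the road back to Samarkand",
)
# query can now be passed to t5_open_book.predict(query)
```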
question1 = 'What does Jimmy like to eat for breakfast usually?'
question2 = 'Why was Jimmy surprised?'
story = """ context:
Once upon a time, there was a squirrel named Joey.
Joey loved to go outside and play with his cousin Jimmy.
Joey and Jimmy played silly games together, and were always laughing.
One day, Joey and Jimmy went swimming together at their Aunt Julie’s pond.
Joey woke up early in the morning to eat some food before they left.
He couldn’t find anything to eat except for pie!
Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast.
After he ate, he and Jimmy went to the pond.
On their way there they saw their friend Jack Rabbit.
They dove into the water and swam for several hours.
The sun was out, but the breeze was cold.
Joey and Jimmy got out of the water and started walking home.
Their fur was wet, and the breeze chilled them.
When they got home, they dried off, and Jimmy put on his favorite purple shirt.
Joey put on a blue shirt with red and green dots.
The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed.
"""
questions = [
question1+ story,
question2+ story,]
t5_open_book.predict(questions)
| | document | t5 |
| --- | --- | --- |
| 0 | What does Jimmy like to eat for breakfast usua... | [cereal, fruit (a pear), or oatmeal] |
| 1 | Why was Jimmy surprised? context: Once upon a t... | [He couldn’t find anything to eat except for pie] |
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'
# from https://www.bbc.com/news/business-55728338
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""
questions = [
question1+ news_article_snippet,
question2+ news_article_snippet,
question3+ news_article_snippet,
question4+ news_article_snippet,
question5+ news_article_snippet,
question6+ news_article_snippet,]
t5_open_book.predict(questions)
| | document | t5 |
| --- | --- | --- |
| 0 | Who is Jack ma? context: Alibaba Group founder... | [Alibaba Group founder] |
| 1 | Who is founder of Alibaba Group? context: Alib... | [Jack Ma] |
| 2 | When did Jack Ma re-appear? context: Alibaba G... | [Wednesday] |
| 3 | How did Alibaba stocks react? context: Alibaba... | [surged 5%] |
| 4 | Whom did Jack Ma meet? context: Alibaba Group ... | [100 rural teachers] |
| 5 | Who did Jack Ma hide from? context: Alibaba Gr... | [Chinese regulators] |
# Define the data; insert the context: tag between the question and the context text
question ='''
What does increased oxygen concentrations in the patient’s lungs displace?
context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''
#Predict on text data with T5
t5_open_book.predict(question)
| | document | t5 |
| --- | --- | --- |
| 0 | What does increased oxygen concentrations in t... | [carbon monoxide] |
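As the tables above show, the predictions come back with each answer wrapped in a single-element list in the t5 column. A minimal sketch of flattening those answers into plain strings; plain dicts stand in for the returned DataFrame rows here:

```python
# Mock rows in the shape returned by t5_open_book.predict(...):
# a 'document' column with the input and a 't5' column with the answer.
results = [
    {"document": "What does increased oxygen concentrations in t...",
     "t5": ["carbon monoxide"]},
    {"document": "Where did Jebe die? context: Ghenkis Khan reca...",
     "t5": ["Samarkand"]},
]

# Unwrap each single-element answer list into a plain string
answers = [row["t5"][0] for row in results]
print(answers)  # ['carbon monoxide', 'Samarkand']
```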