NLP with Transformers Using the T5 Model

Introduction

In this assignment you will learn how to apply the pre-trained T5 model to three tasks:

  1. summarization,
  2. translation,
  3. grammar checking.

This requires the PyTorch and transformers packages. You should already have PyTorch installed. T5Tokenizer also depends on the sentencepiece package, so install both with

pip install transformers sentencepiece

Then run the following code cell. The first time it is run, the t5-base model will be downloaded.

In [1]:
import transformers as tr

# initialize the model architecture and weights
model = tr.T5ForConditionalGeneration.from_pretrained("t5-base")
# initialize the model tokenizer
tokenizer = tr.T5Tokenizer.from_pretrained("t5-base")

Now you will use model and tokenizer to perform the above tasks. Here are some examples.

Summarize

First, let's summarize this text in at most 100 tokens. Note that max_length counts tokens, not words.

In [75]:
text = """
Julia was designed from the beginning for high performance. Julia programs compile to efficient 
native code for multiple platforms via LLVM.
Julia is dynamically typed, feels like a scripting language, and has good support for interactive use.
Reproducible environments make it possible to recreate the same Julia environment every time, 
across platforms, with pre-built binaries.
Julia uses multiple dispatch as a paradigm, making it easy to express many object-oriented 
and functional programming patterns. The talk on the Unreasonable Effectiveness of Multiple 
Dispatch explains why it works so well.
Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, 
and more. One can build entire Applications and Microservices in Julia.
Julia is an open source project with over 1,000 contributors. It is made available under the 
MIT license. The source code is available on GitHub.
"""
In [87]:
len(text)
Out[87]:
929
In [76]:
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
inputs
Out[76]:
tensor([[21603,    10, 18618,    47,   876,    45,     8,  1849,    21,   306,
           821,     5, 18618,  1356,  2890,   699,    12,  2918,  4262,  1081,
            21,  1317,  5357,  1009,     3, 10376, 12623,     5, 18618,    19,
          4896,  1427,   686,    26,     6,  4227,   114,     3,     9,  4943,
            53,  1612,     6,    11,    65,   207,   380,    21,  6076,   169,
             5,   419,  1409,  4817,  2317,  8258,   143,    34,   487,    12,
         23952,     8,   337, 18618,  1164,   334,    97,     6,   640,  5357,
             6,    28,   554,    18, 16152,  2701,  5414,     5, 18618,  2284,
          1317, 17648,    38,     3,     9, 20491,     6,   492,    34,   514,
            12,  3980,   186,  3735,    18,  9442,    11,  5014,  6020,  4264,
             5,    37,  1350,    30,     8,   597,   864,   739,   179, 18652,
           655,    13, 16821,     3, 23664, 14547,     3,  9453,   572,    34,
           930,    78,   168,     5, 18618,   795,     3,     9, 30373,    27,
            87,   667,     6, 10531,  7050,    53,     6,    20, 14588,  3896,
             6,     3, 12578,     6,  9639,    53,     6,     3,     9,  2642,
          2743,     6,    11,    72,     5,   555,    54,   918,  1297, 15148,
            11,  5893,  5114,     7,    16, 18618,     5, 18618,    19,    46,
           539,  1391,   516,    28,   147, 11668, 13932,     7,     5,    94,
            19,   263,   347,   365,     8,     3, 12604,  3344,     5,    37,
          1391,  1081,    19,   347,    30,     3, 30516,     5,     1]])
In [77]:
outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)
In [78]:
print(outputs)
print(outputs.shape)
tensor([[    0, 18618,    19,  4896,  1427,   686,    26,     6,  4227,   114,
             3,     9,  4943,    53,  1612,     6,    11,    65,   207,   380,
            21,  6076,   169,     3,     5, 18618,   795,     3,     9, 30373,
            27,    87,   667,     6, 10531,  7050,    53,     6,    20, 14588,
          3896,     6,     3, 12578,     6,  9639,    53,     6,     3,     9,
          2642,  2743,     6,    11,    72,     3,     5,     8,  1391,  1081,
            19,   347,    30,     3, 30516,   365,     8,     3, 12604,  3344,
             3,     5,     1],
        [    0, 18618,    19,  4896,  1427,   686,    26,     6,  4227,   114,
             3,     9,  4943,    53,  1612,     6,    11,    65,   207,   380,
            21,  6076,   169,     3,     5, 18618,   795,     3,     9, 30373,
            27,    87,   667,     6, 10531,  7050,    53,     6,    20, 14588,
          3896,     6,     3, 12578,     6,  9639,    53,     6,     3,     9,
          2642,  2743,     6,    11,    72,     3,     5,    34,    19,    46,
           539,  1391,   516,    28,   147, 11668, 13932,     7,     3,     5,
             1,     0,     0],
        [    0, 18618,    19,  4896,  1427,   686,    26,     6,  4227,   114,
             3,     9,  4943,    53,  1612,     6,    11,    65,   207,   380,
            21,  6076,   169,     3,     5, 18618,   795,     3,     9, 30373,
            27,    87,   667,     6, 10531,  7050,    53,     6,    20, 14588,
          3896,     6,     3, 12578,     6,  9639,    53,     6,     3,     9,
          2642,  2743,     6,    11,    72,     3,     5, 18618,    19,    46,
           539,  1391,   516,    28,   147, 11668, 13932,     7,     3,     5,
             1,     0,     0]])
torch.Size([3, 73])
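The three rows of outputs share one length because shorter sequences are padded at the end. The following is a minimal sketch for counting the tokens that carry content in each row. It assumes the standard T5 vocabulary, in which id 0 is the padding token (also used as the decoder start token) and id 1 is the end-of-sequence token. The helper name content_length and the two shortened rows are made up for illustration.

```python
PAD_ID = 0  # assumed: T5 <pad>, which also starts every decoded sequence
EOS_ID = 1  # assumed: T5 </s>, the end-of-sequence token


def content_length(token_ids):
    """Count the ids that are neither padding nor end-of-sequence."""
    return sum(1 for t in token_ids if t not in (PAD_ID, EOS_ID))


# Shortened stand-ins for two rows of `outputs`, not the full tensors above.
row_a = [0, 18618, 19, 4896, 3, 5, 1]
row_b = [0, 18618, 19, 3, 5, 1, 0, 0]

print(content_length(row_a))
print(content_length(row_b))
```

With the real tensor, something like [content_length(row) for row in outputs.tolist()] should give the per-summary token counts.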
In [79]:
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i], skip_special_tokens=True))
Result 1
Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. the source code is available on GitHub under the MIT license.

Result 2
Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. it is an open source project with over 1,000 contributors.

Result 3
Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. Julia is an open source project with over 1,000 contributors.

Translation

Now let's translate the text into German.

In [91]:
inputs = tokenizer.encode('translate English to German: ' + text, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=1500, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=3)

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i], skip_special_tokens=True))
Result 1
Julia wurde von Anfang an für hohe Leistung entwickelt. Julia-Programme kompilieren zu effizientem nativen Code für mehrere Plattformen über LLVM. Julia ist dynamisch getippt, fühlt sich wie eine Skriptsprache und hat gute Unterstützung für interaktive Verwendung. Reproduzierbare Umgebungen ermöglichen es, die gleiche Julia-Umgebung jedes Mal, über Plattformen hinweg, mit vorgefertigten Binärdateien zu erschaffen. Julia

Result 2
Julia wurde von Anfang an für hohe Performance entwickelt. Julia-Programme kompilieren zu effizientem nativen Code für mehrere Plattformen über LLVM. Julia ist dynamisch getippt, fühlt sich wie eine Skriptsprache und hat gute Unterstützung für interaktive Verwendung. Reproduzierbare Umgebungen ermöglichen es, die gleiche Julia-Umgebung jedes Mal, über Plattformen hinweg, mit vorgefertigten Binärdateien zu erschaffen. Julia verwendet

Result 3
Julia wurde von Anfang an für hohe Leistung entwickelt. Julia-Programme kompilieren zu effizientem nativem Code für mehrere Plattformen über LLVM. Julia ist dynamisch getippt, fühlt sich wie eine Skriptsprache und hat gute Unterstützung für interaktive Verwendung. Reproduzierbare Umgebungen ermöglichen es, die gleiche Julia-Umgebung jedes Mal, über Plattformen hinweg, mit vorgefertigten Binärdateien zu erschaffen. Julia

Grammar Checker

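Now let's check some grammar. The 'cola sentence: ' prefix selects the CoLA (Corpus of Linguistic Acceptability) task, for which the model answers either acceptable or unacceptable.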

In [81]:
sentence = 'This sentence do not be grammatical.'
inputs = tokenizer.encode('cola sentence: ' + sentence, return_tensors='pt')
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
unacceptable

Requirements

Summarization

Copy and paste a news story of at least five paragraphs that describes the recent news about the COVID-19 vaccine developed by the University of Oxford and AstraZeneca. Try at least three values for each of the parameters:

  • max_length,
  • min_length,
  • length_penalty, and
  • num_beams.

Copy and paste into a markdown cell what you consider to be the best summary of the news article. Also, describe the effects of these four parameters on the results in at least four sentences.

This article will help you understand a bit more about these parameters.
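One possible way to organize the parameter experiments above is to generate every combination of the values you want to try and record the summary each one produces. This is only a sketch: the helper names parameter_grid and sweep_summaries are made up here, and model and tokenizer are assumed to be the objects created in the first code cell.

```python
from itertools import product


def parameter_grid(**options):
    """Yield one dict per combination of the given lists of values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def sweep_summaries(model, tokenizer, text, grid):
    """Summarize `text` once per parameter combination in `grid`.

    Returns a list of (params, summary) pairs. Only the top-ranked
    sequence of each run is kept.
    """
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt",
                              max_length=512, truncation=True)
    results = []
    for params in grid:
        # Skip contradictory settings such as min_length > max_length.
        if params.get("min_length", 0) > params.get("max_length", float("inf")):
            continue
        outputs = model.generate(inputs, **params)
        summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
        results.append((params, summary))
    return results


# Example use in the notebook (hypothetical values; `article` is your news story):
# grid = parameter_grid(max_length=[50, 100, 200], min_length=[10, 30, 60],
#                       length_penalty=[0.5, 1.0, 2.0], num_beams=[2, 4, 8])
# for params, summary in sweep_summaries(model, tokenizer, article, grid):
#     print(params, summary, sep="\n", end="\n\n")
```

A full grid of three values for each of four parameters means 81 calls to model.generate, which may be slow on a CPU. Varying one parameter at a time while holding the others fixed is a cheaper alternative and makes each parameter's effect easier to isolate.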

Translation

Try translating the first paragraph of your news story into German. Use num_return_sequences=5, which with beam search requires num_beams to be at least 5, and translate the German back into English using translate.google.com. Experiment with at least three values for each of the above four parameters. Using the Google translations, describe which German translation is best and which parameter values led to its generation.
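The translation step could be wrapped in a small helper like the one below. It is a sketch under the assumption that sampling is not enabled, in which case generate rejects a num_return_sequences larger than num_beams; the helper checks this up front so the mistake is caught before any computation. The name translate_candidates is made up for illustration, and model and tokenizer are the objects from the first code cell.

```python
def translate_candidates(model, tokenizer, text, num_return_sequences=5,
                         **generate_kwargs):
    """Return `num_return_sequences` German translations of `text`."""
    num_beams = generate_kwargs.get("num_beams", 1)
    if num_beams < num_return_sequences:
        raise ValueError(
            f"num_beams ({num_beams}) must be at least "
            f"num_return_sequences ({num_return_sequences})"
        )
    inputs = tokenizer.encode("translate English to German: " + text,
                              return_tensors="pt", max_length=512,
                              truncation=True)
    outputs = model.generate(inputs,
                             num_return_sequences=num_return_sequences,
                             **generate_kwargs)
    return [tokenizer.decode(seq, skip_special_tokens=True) for seq in outputs]


# Example use in the notebook (hypothetical parameter values):
# for i, german in enumerate(translate_candidates(
#         model, tokenizer, first_paragraph, num_beams=10, max_length=300)):
#     print('\nResult', i + 1)
#     print(german)
```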

Grammar Checker

Write a for loop that checks the grammatical correctness of each sentence in a list of sentences. Apply it to the first paragraph of your news article. Describe the results.

Now modify at least three of the sentences in your paragraph to make the sentences grammatically incorrect and repeat the analysis of all sentences. Describe the results. Are your grammatically incorrect sentences correctly identified?
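As a starting point for the loop described above, the paragraph first has to be split into sentences. The sketch below uses a deliberately naive regular expression that breaks after '.', '!' or '?' followed by whitespace, so abbreviations such as "Dr." will be split incorrectly and may need manual cleanup. The function names split_sentences and check_sentences are made up for illustration, and model and tokenizer are assumed to be the objects from the first code cell.

```python
import re


def split_sentences(paragraph):
    """Naively split a paragraph after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def check_sentences(sentences, model, tokenizer):
    """Return (sentence, label) pairs using T5's 'cola sentence:' task prefix."""
    results = []
    for sentence in sentences:
        inputs = tokenizer.encode("cola sentence: " + sentence,
                                  return_tensors="pt")
        outputs = model.generate(inputs)
        label = tokenizer.decode(outputs[0], skip_special_tokens=True)
        results.append((sentence, label))
    return results


# Example use in the notebook (`first_paragraph` is your own text):
# for sentence, label in check_sentences(split_sentences(first_paragraph),
#                                        model, tokenizer):
#     print(f"{label:>12}: {sentence}")
```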

Extra Credit

Read some of the online documentation and examples that describe how to fine-tune the T5 model to do better English-to-German and German-to-English translations. Try fine-tuning the T5 model we use here on example translations. Does it perform better?

Warning: This will take a lot of time to figure out. First try to find online examples of fine-tuning the model.
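To give a rough idea of what fine-tuning involves, here is a minimal sketch of a single gradient step on one English/German sentence pair. It is not a complete training recipe: it assumes model is in training mode (model.train()), that optimizer is a PyTorch optimizer over model.parameters() such as torch.optim.AdamW, and that batching, shuffling, multiple epochs and evaluation are handled elsewhere. The function name fine_tune_step is made up for illustration.

```python
def fine_tune_step(model, tokenizer, optimizer, english, german):
    """Take one gradient step on a single translation pair; return the loss."""
    # Encode the source with the same task prefix used at inference time.
    inputs = tokenizer("translate English to German: " + english,
                       return_tensors="pt", max_length=512, truncation=True)
    # The target token ids serve as labels for the decoder.
    labels = tokenizer(german, return_tensors="pt",
                       max_length=512, truncation=True).input_ids

    optimizer.zero_grad()
    # Passing `labels` makes the model compute the cross-entropy loss itself.
    loss = model(input_ids=inputs.input_ids,
                 attention_mask=inputs.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    return loss.item()


# Example use in the notebook (hypothetical learning rate and data):
# import torch
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# model.train()
# for english, german in pairs:  # `pairs` is your own list of sentence pairs
#     print(fine_tune_step(model, tokenizer, optimizer, english, german))
```

For German-to-English, the same step could be reused with the prefix and the two texts swapped, although the pre-trained t5-base model was not trained on that direction, so expect it to need more data.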