After executing the code cell below, you can see further details for your devices in the Jupyter Console.
from tensorflow.python.client import device_lib

def get_available_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]

get_available_devices()
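On newer TensorFlow releases, the same information can also be read through the tf.config API. The snippet below is an optional sketch, not something this tutorial requires; it assumes a build where tf.config (or tf.config.experimental on 1.x) is available.

import tensorflow as tf

# Optional alternative: list GPUs through the tf.config API
# (tf.config.list_physical_devices exists on TF 2.1+; older builds expose it under tf.config.experimental)
try:
    gpus = tf.config.list_physical_devices('GPU')
except AttributeError:
    gpus = tf.config.experimental.list_physical_devices('GPU')
print('GPUs visible to TensorFlow:', gpus)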
from __future__ import print_function
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from keras_text_summarization.library.utility.plot_utils import plot_and_save_history
from keras_text_summarization.library.seq2seq import Seq2SeqSummarizer
from keras_text_summarization.library.applications.fake_news_loader import fit_text
Using TensorFlow backend.
LOAD_EXISTING_WEIGHTS = True
np.random.seed(42)
data_dir_path = './L4_data/data'
report_dir_path = './L4_data/reports'
model_dir_path = './L4_data/models'
We will use a provided news dataset that contains articles and titles from various news sources.
This data is pre-processed by the custom functions in the 'keras_text_summarization' folder.
# Load CSV into DataFrame
print('Loading CSV . . .')
df = pd.read_csv(data_dir_path + "/news.csv")
# Extract text for configuration
print('Extracting for config . . . ')
Y = df.title
X = df['text']
config = fit_text(X, Y)
print('-> Complete')
Loading CSV . . .
Extracting for config . . .
-> Complete
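Before building the model, it can be helpful to peek at what fit_text produced. The sketch below assumes config is a plain Python dict; the exact keys depend on the keras_text_summarization implementation, so treat this as an optional inspection step rather than part of the tutorial.

# Inspect the configuration produced by fit_text
# (assumed to be a plain dict; the exact keys depend on the library)
for key, value in config.items():
    if isinstance(value, dict):
        print(key, ': <dict with', len(value), 'entries>')
    else:
        print(key, ':', value)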
...there are two different approaches for automatic summarization currently:
Extraction and Abstraction.
Extractive summarization methods work by identifying important sections of the text and generating them verbatim;
...Abstractive summarization methods aim at producing important material in a new way. In other words, they interpret and examine the text using advanced natural language techniques in order to generate a new shorter text that conveys the most critical information from the original text.
- [Text Summarization Techniques: A Brief Survey, 2017](https://arxiv.org/abs/1707.02268)
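To make the distinction concrete, here is a minimal, hypothetical sketch of the extractive approach; it is not part of this tutorial's library. It simply scores sentences by word frequency and returns the top-scoring ones verbatim, whereas the seq2seq model we train below is abstractive and generates new wording.

from collections import Counter

def naive_extractive_summary(text, num_sentences=2):
    # Split into rough sentences and count word frequencies across the whole text
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    word_counts = Counter(w.lower() for s in sentences for w in s.split())
    # Score each sentence by the frequency of the words it contains
    ranked = sorted(sentences, key=lambda s: sum(word_counts[w.lower()] for w in s.split()), reverse=True)
    top = set(ranked[:num_sentences])
    # Return the selected sentences verbatim, in their original order
    return '. '.join(s for s in sentences if s in top) + '.'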
summarizer = Seq2SeqSummarizer(config)
# Set LOAD_EXISTING_WEIGHTS to False above to start fresh!
if LOAD_EXISTING_WEIGHTS:
    summarizer.load_weights(weight_file_path=Seq2SeqSummarizer.get_weight_file_path(model_dir_path=model_dir_path))
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=42)
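As a quick, optional sanity check on the split: with test_size=0.2 roughly 80% of the rows should land in the training set.

# Optional sanity check on the 80/20 split
print('Training samples:', len(Xtrain))
print('Test samples:', len(Xtest))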
With the data split into training and test sets - let's start training our model!
# Optional TF Device Selection (the code below must be indented under the with-block)
with tf.device('/GPU:0'):
    history = summarizer.fit(Xtrain, Ytrain, Xtest, Ytest, epochs=100, batch_size=5, model_dir_path=model_dir_path)
    history_plot_file_path = report_dir_path + '/' + Seq2SeqSummarizer.model_name + '-history.png'
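If you are not sure a GPU is present, you can pick the device string dynamically instead of hard-coding '/GPU:0'. This is a sketch that reuses the get_available_devices() helper defined earlier; it is an alternative to the cell above, not an addition to it.

# Optional: fall back to the CPU when no GPU is visible
device_name = '/GPU:0' if any('GPU' in name for name in get_available_devices()) else '/CPU:0'
print('Training on', device_name)
with tf.device(device_name):
    history = summarizer.fit(Xtrain, Ytrain, Xtest, Ytest, epochs=100, batch_size=5, model_dir_path=model_dir_path)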
AI/Hub Team Members can also use 'The Beast' to run this training code at a faster rate!
An informational document on using The Beast is being created; it will be available on the ORSIE AI/Hub Internal Site once it has been completed!
Please ask your Lead Researcher for more information.
However, you can still test the current model locally, even with limited training!
(. . . Mind the results)
- 'history' is created on completion of the summarizer.fit() function
- If you manually stop the training, you will not be able to run this cell!
if LOAD_EXISTING_WEIGHTS:
    history_plot_file_path = report_dir_path + '/' + Seq2SeqSummarizer.model_name + '-history-v' + str(summarizer.version) + '.png'

# Plot and Save History
plot_and_save_history(history, summarizer.model_name, history_plot_file_path, metrics={'loss', 'acc'})
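If you prefer not to use the helper, the value returned by summarizer.fit() can also be plotted directly with matplotlib. A minimal sketch, assuming history is a standard Keras History object whose history dict contains 'loss' and 'val_loss':

import matplotlib.pyplot as plt

# Plot training vs. validation loss directly from the Keras History object
# (assumes the 'loss' and 'val_loss' keys are present)
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.savefig(history_plot_file_path)
plt.show()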
# Set the random seed for reproducibility
np.random.seed(42)

# Define Directory Paths
data_dir_path = './L4_data/data'      # refers to the L4_data/data folder
model_dir_path = './L4_data/models'   # refers to the L4_data/models folder
# Load CSV from Directory
print('Loading CSV . . .')
df = pd.read_csv(data_dir_path + "/news.csv")
# Assign dataframe text and title to X and Y values
print('Extracting features . . .')
X = df['text']
Y = df.title
print('-> Complete')
Loading CSV . . .
Extracting features . . .
-> Complete
# Load the stored model configuration using np.load()
config = np.load(Seq2SeqSummarizer.get_config_file_path(model_dir_path=model_dir_path)).item()
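Note: on NumPy 1.16.3 and later, np.load() refuses to load pickled object arrays by default. If the cell above raises a ValueError about allow_pickle, the variant below should work:

# NumPy >= 1.16.3 requires allow_pickle=True when loading a pickled object array
config = np.load(Seq2SeqSummarizer.get_config_file_path(model_dir_path=model_dir_path),
                 allow_pickle=True).item()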
# Re-Initialize the model using the stored configuration
summarizer = Seq2SeqSummarizer(config)
# Load the stored weights into the model
summarizer.load_weights(weight_file_path=Seq2SeqSummarizer.get_weight_file_path(model_dir_path=model_dir_path))
# Print predicted headlines along with their original title
print('Predicting Headlines . . .')
for i in range(10):
    x = X[i]
    actual_headline = Y[i]
    headline = summarizer.summarize(x)
    print('\n', 'Original: ', actual_headline)
    # print('Article: ', x)
    print('Generated: ', headline)
print('\n', '-> Complete')
Predicting Headlines . . .

Original:  You Can Smell Hillary’s Fear
Generated:  clinton campaign biggest national are - the onion - america's finest news source

Original:  Watch The Exact Moment Paul Ryan Committed Political Suicide At A Trump Rally (VIDEO)
Generated:  the trump is what trump's rick of gop debate

Original:  Kerry to go to Paris in gesture of sympathy
Generated:  not to back to back at least time

Original:  Bernie supporters on Twitter erupt in anger against the DNC: 'We tried to warn you!'
Generated:  the gop debate on the party is in against trump is a bit to twitter

Original:  The Battle of New York: Why This Primary Matters
Generated:  the battle of new why why many could go to win

Original:  Tehran, USA
Generated:  john obama: political top to daily

Original:  Girl Horrified At What She Watches Boyfriend Do After He Left FaceTime On
Generated:  of be hillary’s why trump’s campaign in 2016

Original:  ‘Britain’s Schindler’ Dies at 106
Generated:  re: clinton’s email and coming

Original:  Fact check: Trump and Clinton at the 'commander-in-chief' forum
Generated:  is republicans the jeb bush director up the gop debate in the

Original:  Iran reportedly makes new push for uranium concessions in nuclear talks
Generated:  election is coming in the world war iii - the onion - america's finest news source

-> Complete
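You can also feed the model any text of your own. A tiny sketch (the article string below is a placeholder; expect rough output unless the model has been trained for a while):

# Summarize an arbitrary article (placeholder text; replace with your own)
custom_article = "Paste the body of any news article here to see what headline the model generates."
print('Generated: ', summarizer.summarize(custom_article))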
This tutorial showed how to generate headlines for news articles of various lengths using a Keras sequence-to-sequence (seq2seq) text summarizer.
These are a few suggestions for exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience with TensorFlow in order to learn how to use it properly.
You may want to back up this Notebook before making any changes.
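For example, one simple exercise is to change the training schedule and compare the resulting history plots. The values below are hypothetical starting points, not recommendations:

# Hypothetical exercise: retrain with a different schedule and compare the loss curves
history = summarizer.fit(Xtrain, Ytrain, Xtest, Ytest, epochs=20, batch_size=16, model_dir_path=model_dir_path)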