Gathering web pages

This utility script gathers the text of a collection of web sites. It assumes you have a CSV containing a list of URLs, and it writes the gathered text back into that CSV.

Opening the CSV

This opens a CSV and extracts the URLs, putting them into a list. Alternatively you can use a
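
As a rough illustration of this step, here is a minimal sketch using Python's standard csv module. The function name read_urls and the column name "url" are assumptions made for the example, not names taken from the original script; adjust them to match your own CSV.

```python
import csv


def read_urls(csv_path, url_column="url"):
    """Return the non-empty URLs found in one column of a CSV file.

    Both the function name and the default column name "url" are
    hypothetical; the original script may use different ones.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # uses the first row as column headers
        urls = [(row.get(url_column) or "").strip() for row in reader]
    # Drop rows where the URL cell was missing or blank.
    return [u for u in urls if u]
```

Reading with DictReader keeps the lookup tied to a header name rather than a column position, which tends to survive reordering of the spreadsheet.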

Getting the HTML

This function gets the HTML given a URL.
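
One way such a function might look, sketched with the standard library's urllib so that nothing extra needs installing. The name fetch_html, the timeout, and the User-Agent string are all assumptions for illustration; the original script may well use a different library or different settings.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page at `url` as text, or None if it cannot be fetched.

    The function name, timeout and User-Agent value are hypothetical.
    """
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "page-gatherer-example"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return None
```

Returning None on failure lets a loop over many URLs continue past dead links instead of stopping at the first error.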

Cleaning the HTML

This function cleans the HTML.
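
The text does not say what cleaning involves. The sketch below assumes it means dropping tags, script and style content, and collapsing whitespace so that only readable text remains. The names clean_html and _TextExtractor are hypothetical, and the approach (the standard library's html.parser) is a stand-in rather than the original method.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with spaces, then squeeze runs of whitespace to one space.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

Joining fragments with a space keeps text from adjacent blocks such as headings and paragraphs apart, at the cost of occasionally inserting a space where an inline tag splits a word.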