nltk.PunktSentenceTokenizer -> cleaned, parsed, tokenized strings
sklearn.TfidfVectorizer -> TF-IDF 2D matrix
sklearn.TruncatedSVD -> SVD N-D matrix

Now, we can:
sklearn.MiniBatchKMeans (ridiculously fast, coupled with grid search for hyperparameters)
sklearn.AffinityPropagation
sklearn.RadiusNeighborsClassifier
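
The steps listed here (TF-IDF, then SVD, then clustering) can be sketched roughly as follows. This is a minimal illustration, not the original notebook's code. The toy corpus, the helper name best_kmeans, the candidate values for n_clusters, and the use of the silhouette score as the selection criterion are all assumptions made for the example. The "grid search" is a plain loop over candidate values rather than any particular scikit-learn search class.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus; the original documents are not shown in the source.
docs = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "the cat chased mice around the house",
    "stock markets fell sharply today",
    "investors sold stock as markets dropped",
    "the market rallied and stock prices rose",
]

# TF-IDF: sparse 2D matrix of shape (n_documents, n_terms).
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(docs)

# Truncated SVD: dense matrix of shape (n_documents, n_components).
# n_components=2 is an arbitrary choice for this tiny corpus.
svd = TruncatedSVD(n_components=2, random_state=0)
X_svd = svd.fit_transform(X_tfidf)


def best_kmeans(X, k_values, seed=0):
    """Try each candidate n_clusters and keep the best silhouette score.

    Hypothetical helper: a manual stand-in for a hyperparameter grid search.
    Returns (score, k, labels), or None if no candidate produced >= 2 clusters.
    """
    best = None
    for k in k_values:
        model = MiniBatchKMeans(n_clusters=k, random_state=seed, n_init=3)
        labels = model.fit_predict(X)
        if len(set(labels)) < 2:
            # The silhouette score is undefined for a single cluster.
            continue
        score = silhouette_score(X, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best


score, k, labels = best_kmeans(X_svd, k_values=[2, 3])
print(f"chosen n_clusters={k}, silhouette={score:.3f}")
print("cluster labels:", list(labels))
```

On a real corpus the number of SVD components and the candidate cluster counts would be much larger, and the same loop could be extended to other MiniBatchKMeans parameters such as batch_size.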

nltk and scikit-learn allowed rapid development and testing.