Author: Adam Rivers
The landscape of ML was discussed briefly in the module Framing ML problems. In this section we will go over a few ways that ML methods are classified, give the major classes for each, and place common methods you may come across into context.
This is how we initially broke down ML models. Since we covered this in Lesson 2, we will not go over it further here.
One way to group learning methods is by learning type. These types include supervised learning, unsupervised learning, semi-supervised learning, and active learning.
Supervised learning is what we have focused on in this course. In this approach the data are divided into training and testing sets, and labeled training data are used to train the classifier. Supervised learning is the most common type of machine learning. Both classification and regression methods fall into this category.
Examples:
Regression
Classification
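Since both regression and classification learn from labeled pairs, a tiny sketch may help. Here is ordinary least-squares regression fit on labeled data; the numbers are illustrative, not from the course:

```python
# A minimal supervised-learning sketch: least-squares regression fit on a
# tiny labeled training set (toy data, assumed for illustration).
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope from the covariance/variance ratio; intercept from the means.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Labeled training data following y = 2x + 1 exactly.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

A classifier works the same way conceptually: labeled examples in, a decision rule out.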
Unsupervised learning attempts to learn the structure of data without labels. This problem differs from supervised learning because there is no single correct answer, which also means that the metrics applied to supervised learning are not relevant; other metrics such as "lift" or "compactness" are sometimes applied instead. Unsupervised learning attempts to cluster data, reduce the dimension of data, or find associations, as in association rule mining.
Clustering
Dimensionality reduction
Association rule mining
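Clustering is the most familiar unsupervised task. A minimal sketch of one-dimensional k-means with two clusters, on made-up points, looks like this:

```python
# A minimal clustering sketch: 1-D k-means with k=2. No labels are given;
# the algorithm discovers group structure on its own (toy data assumed).
def kmeans_1d(points, iters=10):
    c1, c2 = min(points), max(points)  # simple initialization
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return c1, c2

print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5]))  # centers near 1 and 9
```

Note there is no "accuracy" here; a compactness-style metric (how tight each cluster is) would be used to judge the result.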
Semi-supervised learning takes a small amount of labeled training data and a large amount of unlabeled training data and learns from both. Why would we want to do this? Labeling data (often manually) is expensive. We may have lots of images we can scrape from the internet, but turning them into training data requires a human to identify what's in them (CAPTCHA, anyone?). If we assume that similar images, as identified by clustering, have the same label, we now have a much larger dataset to learn from.
Examples:
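The cluster-then-propagate idea above can be sketched as nearest-neighbor pseudo-labeling; the points and labels below are hypothetical:

```python
# Semi-supervised sketch: propagate labels from a small labeled set to a
# larger unlabeled set by similarity (here, 1-D distance). Toy data assumed.
labeled = {0.0: "cat", 10.0: "dog"}   # small labeled set
unlabeled = [0.5, 1.0, 9.0, 9.8]      # larger unlabeled set

def nearest_label(x, labeled):
    # Find the labeled point most similar to x.
    return min(labeled, key=lambda l: abs(l - x))

# Assume similar points share a label; the pseudo-labeled set can now be
# added to the training data.
pseudo = {x: labeled[nearest_label(x, labeled)] for x in unlabeled}
print(pseudo)  # points near 0 become "cat", points near 10 become "dog"
```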
In active learning we start with a large training set that is mostly unlabeled and train with the labeled data. We then find the unlabeled points that are most needed to improve the classifier, for example those points closest to a decision boundary. The classifier asks you to get labels for those points, then adds them to improve the classifier incrementally. This is often useful when getting the training data is expensive.
For example, suppose we are constructing a classifier to predict sanitizer resistance in Salmonella from its genome sequence. We have 200,000 genomes and isolates but no sanitizer-resistance data. Measuring sanitizer resistance takes about one week per 200 strains, so screening all strains in our lab would take about 20 years! We start building the classifier by selecting 200 of the most distantly related strains and screening them. Then we run the classifier at the end of the week to select the strains for the next week. After several rounds, performance improvements plateau and we stop screening; hopefully this takes only a month or two.
Active learning often bolts onto existing methods but includes special decision functions to select the next round of samples to label.
Examples:
For examples, see the Google active learning repo
(with ideas from This site)
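The "points closest to a decision boundary" strategy above is often called uncertainty sampling. A minimal sketch, with an assumed one-dimensional decision threshold, looks like this:

```python
# Active-learning query sketch: among unlabeled points, request a label for
# the one closest to the current decision boundary (all values assumed).
boundary = 5.0                    # hypothetical current decision threshold
unlabeled = [1.0, 4.8, 9.2, 5.3]

# Points far from the boundary are classified confidently; the nearest one
# is the most informative to label next.
query = min(unlabeled, key=lambda x: abs(x - boundary))
print(query)  # 4.8
```

In the Salmonella example, this selection function is what would pick next week's 200 strains.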
The other major way of grouping methods is by how they operate; terms like neural network, random forest, or logistic regression refer to types of learning methods.
Regression attempts to learn the relationship between a response variable and one or more predictor variables by iteratively adjusting the model parameters to reduce a cost or loss function.
Examples include:
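The iterative parameter-adjustment loop described above can be sketched as gradient descent on mean squared error for a one-parameter model (toy data assumed):

```python
# Regression sketch: gradient descent on mean squared error for y = w * x.
# The data follow w = 2 exactly (toy values assumed).
xs, ys = [1, 2, 3], [2, 4, 6]
w, lr = 0.0, 0.05
for _ in range(200):
    # Gradient of MSE with respect to w, averaged over the data.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step downhill to reduce the loss

print(round(w, 3))  # converges to 2.0
```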
Instance- or neighbor-based methods use the label of the most similar representative to classify an item, or return a list of the most similar items.
Examples include:
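A minimal sketch of instance-based prediction is 1-nearest-neighbor, which simply returns the label of the most similar stored example (toy data assumed):

```python
# Instance-based sketch: 1-nearest-neighbor classification. There is no
# training step; the stored examples ARE the model (toy data assumed).
train = [(1.0, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b")]

def predict(x):
    # Return the label of the most similar (closest) stored instance.
    return min(train, key=lambda t: abs(t[0] - x))[1]

print(predict(1.5), predict(8.5))  # a b
```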
These are not really learning methods in their own right but rather ways of improving other methods by finding the optimal point in the bias-variance tradeoff.
Examples are:
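As a sketch of trading variance for bias, ridge regression shrinks the fitted slope toward zero as the penalty strength grows. The closed form below is for a one-feature, no-intercept model, and the data are assumed toy values:

```python
# Regularization sketch: ridge regression for y = w * x. A penalty term lam
# is added to the denominator, shrinking w toward 0 (more bias, less variance).
def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs, ys = [1, 2, 3], [2, 4, 6]
print(ridge_slope(xs, ys, 0.0))   # 2.0 — unregularized least squares
print(ridge_slope(xs, ys, 14.0))  # 1.0 — heavy penalty, slope shrunk toward 0
```

Tuning lam (e.g. by cross-validation) is how the optimal bias-variance point is found in practice.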
In decision tree methods, a tree (a directed acyclic graph) is created in which each node represents a feature used in the model. For each feature a decision value is set, which determines which branch is followed to the next node (feature).
Decision trees are good at handling nonlinear data and smaller training sets, and they are easy to train.
Examples include:
Often multiple decision trees are used together in ensemble models like Random Forests.
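A tree with the node-and-threshold structure described above can be sketched directly as nested comparisons. The feature name and thresholds here are hypothetical (loosely inspired by the iris dataset):

```python
# Decision-tree sketch: each node compares one feature against a decision
# value and follows the matching branch (feature and thresholds assumed).
def tree_predict(petal_len):
    if petal_len < 2.5:        # decision value at the root node
        return "setosa"
    elif petal_len < 4.9:      # second node on the right branch
        return "versicolor"
    return "virginica"

print(tree_predict(1.4), tree_predict(4.0), tree_predict(6.0))
# setosa versicolor virginica
```

Training a tree means choosing these features and decision values automatically from the data.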
These are methods that aggregate many weak classifiers to create a single strong classifier. For example, you might make one classifier for each feature that guesses the class based on one value. Each one by itself has poor accuracy, but when they are combined the overall accuracy is good.
Examples include:
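The one-classifier-per-feature voting scheme described above can be sketched as a simple majority vote (toy classifiers assumed):

```python
# Ensemble sketch: each weak classifier looks at a single feature and votes;
# the majority vote is usually better than any single voter (toy example).
def vote(classifiers, x):
    votes = [c(x) for c in classifiers]
    return max(set(votes), key=votes.count)  # majority class

# One weak classifier per feature, each thresholding a single value.
weak = [lambda x: x[0] > 0, lambda x: x[1] > 0, lambda x: x[2] > 0]
print(vote(weak, (1, 1, -1)))  # True — two of three weak voters agree
```

Methods like boosting refine this idea by weighting voters and examples instead of counting votes equally.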
This is a broad label, and many methods employ some aspect of Bayesian reasoning. The key distinguishing feature is that they apply a Bayesian framework: prior beliefs are combined with observed data via Bayes' rule to produce updated (posterior) probabilities.
Examples include:
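A worked instance of Bayes' rule makes the framework concrete. The probabilities below are assumed toy numbers for a spam-filter-style example:

```python
# Bayesian sketch: combine a prior with a likelihood via Bayes' rule to get
# a posterior probability (all numbers are toy values, assumed).
prior_spam = 0.2                 # P(spam) before seeing any evidence
p_word_given_spam = 0.9          # P("free" appears | spam)
p_word_given_ham = 0.1           # P("free" appears | not spam)

# Total probability of seeing the word, then Bayes' rule.
evidence = p_word_given_spam * prior_spam + p_word_given_ham * (1 - prior_spam)
posterior = p_word_given_spam * prior_spam / evidence
print(round(posterior, 3))  # 0.692 — P(spam | "free")
```

A naive Bayes classifier repeats this update once per feature, assuming the features are independent.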
Neural networks are a class of learning algorithms that pass input values through multiple functions represented by neurons. Two parameters in each neuron, weights and biases, are iteratively modified to minimize loss on the training data. This modification is typically done through a method called backpropagation.
Examples include:
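A single neuron with its two parameter types, weights and a bias, can be sketched along with one backpropagation-style update (toy values assumed):

```python
# Neural-network sketch: one sigmoid neuron and one gradient step on its
# weights and bias to reduce squared loss (all values toy, assumed).
import math

def neuron(w, b, xs):
    # Weighted sum of inputs plus bias, passed through a sigmoid activation.
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, xs)) + b)))

w, b, xs, target, lr = [0.5, -0.5], 0.0, [1.0, 2.0], 1.0, 0.1
out = neuron(w, b, xs)
# Backpropagation: chain rule through the squared loss and the sigmoid.
grad = (out - target) * out * (1 - out)
w = [wi - lr * grad * xi for wi, xi in zip(w, xs)]
b -= lr * grad
print(neuron(w, b, xs) > out)  # True — the update moved output toward target
```

A full network stacks many such neurons in layers and applies the same chain-rule update to every weight and bias.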
Deep neural networks are really just neural networks with multiple layers. They have been around since the 1980s but initially performed poorly because they were hard to train and required lots of data. Advances in methods, computing, and data converged about 10 years ago to allow the development of new configurations of neural networks that performed extremely well on specific tasks. And the hype was born.
Examples include:
Feed forward neural networks (simple multilayer neural networks)
Convolutional neural networks (networks that take advantage of spatial structure in images)
LSTMs (long short-term memory networks, which operate on sequential data)