It is important for Data Scientists to understand the basics of Machine Learning, abbreviated as ML, so that they know what is happening under the hood. One of the first questions students ask is: "Why is Machine Learning used synonymously with Data Science?" We shall try to answer this question in this brief course on ML.
Machine Learning can be described as a sub-area of Artificial Intelligence (AI) in which computers learn from data.
Artificial Intelligence, Pattern Recognition and Statistical Learning are other terms that Data Scientists commonly use. There are important differences between them that are worth knowing as we progress in Data Science.
The field of AI research was founded at a conference on the campus of Dartmouth College in the summer of 1956. Those who attended would become the leaders of AI research for decades. Many of them predicted that a machine as intelligent as a human being would exist in no more than a generation, and they were given millions of dollars to make this vision come true [1]. Today, AI describes algorithms or mathematical techniques that imitate intelligence in activities such as learning, inference, prediction and decision making.
Pattern Recognition is an area of AI that involves inferring patterns in complex data sets.
Statistical Learning combines the pattern-finding aspect of Pattern Recognition with statistics. It covers the statistical analysis of the data set and of the results, grounded in statistical theory.
<img src="../images/supervised_unsupervised_learning.pdf" style="width: 700px;">
The types of problems in ML can be categorized into Supervised Learning, Unsupervised Learning, Reinforcement Learning and Recommendation.
Supervised Learning refers to a class of learning in which we have data together with the known outputs for each scenario. The ML algorithms learn from these known data sets and their results. They are termed supervised because we can evaluate how good they are by how well they reproduce outputs that are already known. There are two main categories of Supervised Learning problems:
Prediction - These ML algorithms predict a numeric value, such as tomorrow's weather or a stock price. This category is also commonly called regression.
Classification - These ML algorithms assign each input to one of a fixed set of categories. Image classification and character recognition usually fall into this category.
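A 1-nearest-neighbour classifier is one simple way to illustrate supervised classification. It is our own choice of technique, not one named in the text. The training and test points below are hypothetical. Because the test labels are known, we can measure how often the predictions match them, which is exactly what makes this learning supervised.

```python
import numpy as np

# Hypothetical labelled training data: two numeric features per example
# and a known class label (0 or 1) for each one.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.5], [7.0, 6.0], [6.5, 7.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def predict_1nn(X_train, y_train, X_new):
    """Label each new point with the label of its nearest training point."""
    # Pairwise Euclidean distances, shape (n_new, n_train)
    dists = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    return y_train[nearest]

# Hypothetical held-out points whose true labels are also known,
# so the predictions can be checked against them.
X_test = np.array([[1.2, 1.8], [6.8, 6.9], [2.5, 2.0]])
y_test = np.array([0, 1, 0])

y_pred = predict_1nn(X_train, y_train, X_test)
accuracy = np.mean(y_pred == y_test)
print(y_pred, accuracy)  # [0 1 0] 1.0
```

For the Prediction category, the same idea applies with a numeric output instead of a label. The line-fitting example later in this lesson is one such case.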
Problems that involve extracting meaningful information from data without known outputs, such as clustering, are called Unsupervised Learning. They are termed unsupervised because no target output is available to guide or evaluate the ML algorithm.
<img src="../images/supervised_unsupervised_learning.png" style="width: 700px;">
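Clustering can be sketched with k-means, a common clustering algorithm. The points below are hypothetical and carry no labels, so the algorithm groups them purely by distance. The deterministic farthest-point initialization is a simplification chosen here so the example is reproducible. Standard k-means usually starts from random or k-means++ centroids.

```python
import numpy as np

def kmeans(X, k, n_iter=10):
    """Minimal k-means: alternate between assigning points and moving centroids."""
    # Farthest-point initialization: start at X[0], then repeatedly add the
    # point farthest from the centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2)
        centroids.append(X[np.argmax(d.min(axis=1))])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Hypothetical unlabelled data: features only, no target column.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = kmeans(X, k=2)
print(labels)  # [0 0 0 1 1 1]
```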
Recommendation systems, such as those that suggest movies or items for a shopping list, are examples of ML algorithms in the Recommendation category.
Reinforcement Learning algorithms learn to make a sequence of decisions by taking actions in an environment and adjusting their behaviour to maximize a reward. Robotics is one field that makes extensive use of Reinforcement Learning.
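A simple recommender can be sketched with user-based collaborative filtering, which is one common approach. The text does not prescribe a method, so this choice is our assumption. The ratings matrix and item indices below are hypothetical, and 0 marks an unrated item. Each unrated item is scored by the ratings of other users, weighted by how similar those users are to the target user.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 0, 1],   # user 0: has not rated movies 2 and 3
    [5, 5, 5, 1, 1],
    [1, 1, 1, 5, 5],
    [4, 5, 4, 2, 1],
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def recommend(ratings, user):
    """Score each unrated item by a similarity-weighted average of others' ratings."""
    others = [u for u in range(len(ratings)) if u != user]
    sims = np.array([cosine_similarity(ratings[user], ratings[u]) for u in others])
    scores = {}
    for item in np.flatnonzero(ratings[user] == 0):
        item_ratings = ratings[others, item]
        rated = item_ratings > 0          # only users who rated this item
        if rated.any():
            scores[int(item)] = float(sims[rated] @ item_ratings[rated] / sims[rated].sum())
    best = max(scores, key=scores.get)
    return best, scores

best, scores = recommend(ratings, user=0)
print(best, scores)  # recommends movie 2
```

User 0 rates movies like users 1 and 3 do, and they both liked movie 2. That is why it outscores movie 3.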
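A tiny tabular Q-learning agent illustrates this reward-maximizing loop. Q-learning is one standard Reinforcement Learning algorithm, not the only one. The corridor environment, its reward, and the hyperparameters `GAMMA`, `ALPHA` and `EPSILON` are all hypothetical choices for illustration. The agent starts at cell 0 and earns a reward of 1 only when it reaches cell 4.

```python
import numpy as np

N_STATES = 5                  # hypothetical corridor: cells 0..4, cell 4 is the goal
LEFT, RIGHT = 0, 1
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1   # discount, learning rate, exploration rate

def step(state, action):
    """Move one cell; reward 1.0 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, 2))       # estimated value of each (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(100):          # cap the episode length
            if rng.random() < EPSILON:
                action = int(rng.integers(2))          # explore
            else:
                best = np.flatnonzero(q[state] == q[state].max())
                action = int(rng.choice(best))         # exploit, ties at random
            next_state, reward = step(state, action)
            done = next_state == N_STATES - 1
            # Q-learning update towards reward + discounted best future value
            target = reward if done else reward + GAMMA * q[next_state].max()
            q[state, action] += ALPHA * (target - q[state, action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
policy = q[:N_STATES - 1].argmax(axis=1)   # greedy action in each non-goal cell
print(policy)  # expected: [1 1 1 1], i.e. always move RIGHT
```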
Given the following data set of college majors, recording who secured a job at graduation and who didn't, what class of ML problem is this?

| Major | Grade | Internship | Sports | Job at Graduation |
|---|---|---|---|---|
| Engineering | A | No | No | No |
| Arts | B | Yes | Yes | Yes |
| Mathematics | B | B | No | Yes |
supervised = 'Supervised Learning, Classification'
unsupervised = 'Unsupervised Learning'
recommendation = 'Recommendation'
reinforcement = 'Reinforcement Learning'
# Assign to ml_class the variable above that represents the class of this ML problem.
ml_class = supervised
In this lesson, we shall try to understand supervised learning and define it in mathematical terms. Suppose we have data points (x, y) generated by some process, which look like this when we plot them:
As humans, we can infer that the points follow a straight line. How do we explain the process mathematically? We can do so by solving for the parameters m and c of the straight-line equation:
y = mx + c
Let us solve this equation by using one of the points, (2, 11), which lies on the line:
11 = 2m + c ...(1)
Solving for c,
c = 11 - 2m
Since there are two unknowns, m and c, we need a second point, which gives a second equation, to solve for both.
Let us now consider another point, say (7, 26):
26 = 7m + c ...(2)
Subtracting equation (1) from equation (2) gives 15 = 5m, so m = 3. Substituting m = 3 into c = 11 - 2m gives c = 5. Hence, we can write the equation for the line as:
y = 3x + 5
We can also verify that this equation generates the correct y for the known points x = {0, 1, 2, ..., 10}. For example, at x = 2, y = 3*2 + 5 = 11, which matches the point (2, 11).
The equation also gives y for any other x. For example, at x = 13, y = 3*13 + 5 = 44.
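Equations (1) and (2) form a 2x2 linear system, which NumPy can solve directly as a check on the algebra. Writing each point as a matrix row [x, 1] is our own framing of the problem. `np.linalg.solve` is NumPy's routine for square linear systems.

```python
import numpy as np

# Each known point (x, y) gives one equation m*x + c = y.
# Rows are [x, 1] so that A @ [m, c] = b.
A = np.array([[2.0, 1.0],    # equation (1): 2m + c = 11
              [7.0, 1.0]])   # equation (2): 7m + c = 26
b = np.array([11.0, 26.0])

m, c = np.linalg.solve(A, b)
print(m, c)                  # approximately 3.0 5.0

# Evaluate the recovered line on the known x values and at x = 13.
x_known = np.arange(0, 11)
print(m * x_known + c)
print(m * 13 + c)            # approximately 44.0
```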
Once we know the equation of the line, we can predict y for any point on the x-axis. We have now learnt the process that generated the points. A mathematical technique that discovers the line equation from the given points is what we call learning. With the learnt parameters m and c, we can predict y for any x, including values outside the known data set.
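Ordinary least squares is one standard technique for discovering m and c from all the known points at once, rather than from just two of them. The sketch below uses it as an example. It may not be the method this course uses later, and it assumes the known points are x = 0, 1, ..., 10 lying exactly on y = 3x + 5.

```python
import numpy as np

# Assumed data: the known points x = 0..10, generated by y = 3x + 5.
x = np.arange(0, 11, dtype=float)
y = 3 * x + 5

def fit_line(x, y):
    """Closed-form ordinary least squares for y = m*x + c."""
    x_mean, y_mean = x.mean(), y.mean()
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    c = y_mean - m * x_mean
    return m, c

m, c = fit_line(x, y)
print(m, c)  # 3.0 5.0

# NumPy's degree-1 polynomial fit solves the same least-squares problem;
# it returns the coefficients highest power first, i.e. [m, c].
print(np.polyfit(x, y, 1))

# Predict y for x values outside the known data set.
print(m * np.array([13.0, 20.0]) + c)  # [44. 65.]
```

With noisy real data, least squares returns the line that minimizes the sum of squared errors, so m and c would only approximate the true values.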
<img src="../images/SupervisedLearning.png" style="width: 700px;">
The terminology of ML is important in Data Science because these terms are used frequently.
Given the line y = 5*x + 3,
compute predictions for x = {1, 5, 10, 12} and assign them to the variable y.
# Compute y for x, define x below.
x = 'array'
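Hint: use NumPy arrays.
import numpy as np
# Points at which to evaluate the line
x = np.array([1, 5, 10, 12])
# NumPy applies the arithmetic element-wise, giving one prediction per x
y = 5*x + 3
print(y)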
[ 8 28 53 63]