Regression analysis is a statistical tool that attempts to identify the relationship (correlation) between one or more independent variables and a single dependent variable.
Correlation is the degree to which two variables change together.
Linear Functions and Models
A linear equation describes any pattern of numbers that increases or decreases by the same amount at every step.
The slope-intercept form of a linear equation is y = mx + b, where m is the slope and b is the y-intercept.
https://www.zweigmedia.com/RealWorld/tutorialsf0/framesLA.html
https://study.com/academy/lesson/what-is-a-linear-equation.html
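As a quick illustration of "increasing by the same amount every step" (a minimal sketch using only base R; the values of m and b are arbitrary):

```r
# A linear function y = mx + b evaluated at several points
m <- 2   # slope: change in y per unit change in x
b <- 5   # y-intercept: value of y when x = 0
x <- 0:5
y <- m * x + b
print(y)        # 5  7  9 11 13 15
print(diff(y))  # constant first differences (all equal to m) confirm linearity
```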
Sum of squares error (SSE)
The regression line (the best-fit line)
https://www.zweigmedia.com/tuts/tutRegressionb.html
Coefficient of Determination (R-squared) & Adjusted R-squared
Regression assumptions & Residual analysis
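The ideas above can be sketched in a few lines of R (with a small made-up dataset, not one from these notes): lm() finds the best-fit line by minimizing the sum of squared errors, summary() reports R-squared and adjusted R-squared, and the residuals are the vertical distances between observed points and the fitted line:

```r
# Toy data with an approximately linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)                  # least-squares best-fit line
sse <- sum(residuals(fit)^2)      # sum of squares error (SSE) that the fit minimizes
r2  <- summary(fit)$r.squared     # proportion of variance in y explained by x
adj <- summary(fit)$adj.r.squared # R-squared penalized for the number of predictors

cat("SSE:", sse, " R-squared:", r2, " Adjusted R-squared:", adj, "\n")
plot(residuals(fit))              # residual analysis: look for patterns, not random scatter
```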
In simple linear regression, a single independent variable is used to predict the value of a dependent variable.
# Importing the dataset
dataset = read.csv('data/income.csv')
# Splitting the dataset into the Training set and Test set
# install.packages('caTools')  # run once if not already installed
library(caTools)
set.seed(123)
split = sample.split(dataset$income, SplitRatio = 2/3)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)
# Fitting Simple Linear Regression to the Training set
regressor = lm(formula = income ~ Exp,
               data = training_set)
# Predicting the Test set results
y_pred = predict(regressor, newdata = test_set)
# Visualising the Training set results
library(ggplot2)
ggplot() +
  geom_point(aes(x = training_set$Exp, y = training_set$income),
             colour = 'red') +
  geom_line(aes(x = training_set$Exp, y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('income vs Experience (Training set)') +
  xlab('Years of experience') +
  ylab('income')
# Visualising the Test set results
library(ggplot2)
ggplot() +
  geom_point(aes(x = test_set$Exp, y = test_set$income),
             colour = 'red') +
  # The line is drawn from the training fit: the fitted regression line
  # is the same no matter which set it is plotted against
  geom_line(aes(x = training_set$Exp, y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('income vs Experience (Test set)') +
  xlab('Years of experience') +
  ylab('income')
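Beyond eyeballing the plots, the fit on held-out data can be quantified by comparing the predictions to the actual test values. A short sketch continuing the script above (it assumes the test_set and y_pred objects already created there):

```r
# Evaluate the model on the test set with root mean squared error (RMSE)
errors <- test_set$income - y_pred   # residuals on unseen data
rmse   <- sqrt(mean(errors^2))       # typical size of a prediction error
cat("Test RMSE:", rmse, "\n")
```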
In multiple linear regression, two or more independent variables are used to predict the value of a dependent variable.
# Importing the dataset
dataset = read.csv('data/companies.csv')
# Encoding categorical data
dataset$State = factor(dataset$State,
                       levels = c('New York', 'California', 'Florida'),
                       labels = c(1, 2, 3))
# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Profit, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)
# Fitting Multiple Linear Regression to the Training set
regressor = lm(formula = Profit ~ .,
               data = training_set)
# Predicting the Test set results
y_pred = predict(regressor, newdata = test_set)
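With several predictors, the adjusted R-squared and coefficient p-values mentioned earlier help judge which variables actually contribute to the model. Continuing the script above with the fitted regressor:

```r
# Inspect the fitted multiple regression model:
# coefficients, their p-values, R-squared and adjusted R-squared
summary(regressor)
summary(regressor)$adj.r.squared  # penalizes adding predictors that do not help
```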
# install.packages('neuralnet')  # run once if not already installed
library(neuralnet)
# Going to create a neural network to approximate the square root function
# Type ?neuralnet for more information on the neuralnet library
#Generate 50 random numbers uniformly distributed between 0 and 100
#And store them as a dataframe
traininginput <- as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)
#Column bind the data into one variable
trainingdata <- cbind(traininginput,trainingoutput)
colnames(trainingdata) <- c("Input","Output")
# Train the neural network
# One hidden layer with 10 neurons (hidden = 10)
# Threshold is a numeric value specifying the threshold for the partial
# derivatives of the error function as a stopping criterion
net.sqrt <- neuralnet(Output ~ Input, trainingdata, hidden = 10, threshold = 0.01)
print(net.sqrt)
#Plot the neural network
plot(net.sqrt)
# Test the neural network on some new data
testdata <- as.data.frame((1:10)^2)        # squares of 1..10, so the true outputs are 1..10
net.results <- compute(net.sqrt, testdata) # run them through the neural network
# Let's see what properties net.results has
ls(net.results)
# Let's see the results
print(net.results$net.result)
# Let's display a better version of the results
cleanoutput <- cbind(testdata, sqrt(testdata),
                     as.data.frame(net.results$net.result))
colnames(cleanoutput) <- c("Input","Expected Output","Neural Net Output")
print(cleanoutput)
R Code pulled from:
http://gekkoquant.com/2012/05/26/neural-networks-with-r-simple-example/
Sources & References
https://gl4l.greatlearning.in/building-artificial-neural-networks-using-r/
Further Resources:
http://www.michaeljgrogan.com/neural-network-modelling-neuralnet-r/
https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6