Regression analysis is a statistical tool that attempts to identify the relationship (correlation) between one or more independent variables and a single dependent variable.
Correlation is the degree to which two variables change together.
Linear Functions and Models
A linear equation describes any pattern of numbers that increases or decreases by the same amount at every step.
The slope-intercept form of a linear equation is y = mx + b, where m is the slope and b is the y-intercept.
https://www.zweigmedia.com/RealWorld/tutorialsf0/framesLA.html
https://study.com/academy/lesson/what-is-a-linear-equation.html
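As a quick illustration of "increasing by the same amount every step" (a minimal sketch using only base R; the values of m and b are arbitrary):

```r
# A linear function y = mx + b evaluated at several points
m <- 2   # slope: change in y per unit change in x
b <- 5   # y-intercept: value of y when x = 0
x <- 0:5
y <- m * x + b
print(y)        # 5  7  9 11 13 15
print(diff(y))  # constant first differences (all equal to m) confirm linearity
```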
Sum of squares error (SSE)
The regression line (the best-fit line)
https://www.zweigmedia.com/tuts/tutRegressionb.html
Coefficient of Determination (R-squared) & Adjusted R-squared
Regression assumptions & Residual analysis
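The ideas above can be sketched in a few lines of R (with a small made-up dataset, not one from these notes): lm() finds the best-fit line by minimizing the sum of squared errors, summary() reports R-squared and adjusted R-squared, and the residuals are the vertical distances between observed points and the fitted line:

```r
# Toy data with an approximately linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)                  # least-squares best-fit line
sse <- sum(residuals(fit)^2)      # sum of squares error (SSE) that the fit minimizes
r2  <- summary(fit)$r.squared     # proportion of variance in y explained by x
adj <- summary(fit)$adj.r.squared # R-squared penalized for the number of predictors

cat("SSE:", sse, " R-squared:", r2, " Adjusted R-squared:", adj, "\n")
plot(residuals(fit))              # residual analysis: look for patterns, not random scatter
```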
In simple linear regression, a single independent variable is used to predict the value of a dependent variable.
# Importing the dataset
dataset = read.csv('data/income.csv')
# Splitting the dataset into the Training set and Test set
# install.packages('caTools')  # run once if not already installed
library(caTools)
set.seed(123)
split = sample.split(dataset$income, SplitRatio = 2/3)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)
# Fitting Simple Linear Regression to the Training set
regressor = lm(formula = income ~ Exp,
               data = training_set)
# Predicting the Test set results
y_pred = predict(regressor, newdata = test_set)
# Visualising the Training set results
library(ggplot2)
ggplot() +
  geom_point(aes(x = training_set$Exp, y = training_set$income),
             colour = 'red') +
  geom_line(aes(x = training_set$Exp, y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('income vs Experience (Training set)') +
  xlab('Years of experience') +
  ylab('income')
# Visualising the Test set results
library(ggplot2)
ggplot() +
  geom_point(aes(x = test_set$Exp, y = test_set$income),
             colour = 'red') +
  # The line is drawn from the training fit: the fitted regression line
  # is the same no matter which set it is plotted against
  geom_line(aes(x = training_set$Exp, y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('income vs Experience (Test set)') +
  xlab('Years of experience') +
  ylab('income')
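Beyond eyeballing the plots, the fit on held-out data can be quantified by comparing the predictions to the actual test values. A short sketch continuing the script above (it assumes the test_set and y_pred objects already created there):

```r
# Evaluate the model on the test set with root mean squared error (RMSE)
errors <- test_set$income - y_pred   # residuals on unseen data
rmse   <- sqrt(mean(errors^2))       # typical size of a prediction error
cat("Test RMSE:", rmse, "\n")
```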
In multiple linear regression, two or more independent variables are used to predict the value of a dependent variable.
# Importing the dataset
dataset = read.csv('data/companies.csv')
# Encoding categorical data
dataset$State = factor(dataset$State,
                       levels = c('New York', 'California', 'Florida'),
                       labels = c(1, 2, 3))
# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Profit, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)
# Fitting Multiple Linear Regression to the Training set
regressor = lm(formula = Profit ~ .,
               data = training_set)
# Predicting the Test set results
y_pred = predict(regressor, newdata = test_set)
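With several predictors, the adjusted R-squared and coefficient p-values mentioned earlier help judge which variables actually contribute to the model. Continuing the script above with the fitted regressor:

```r
# Inspect the fitted multiple regression model:
# coefficients, their p-values, R-squared and adjusted R-squared
summary(regressor)
summary(regressor)$adj.r.squared  # penalizes adding predictors that do not help
```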
# install.packages('neuralnet')  # run once if not already installed
library(neuralnet)
# Going to create a neural network to approximate the square root function
# Type ?neuralnet for more information on the neuralnet library
#Generate 50 random numbers uniformly distributed between 0 and 100
#And store them as a dataframe
traininginput <- as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)
#Column bind the data into one variable
trainingdata <- cbind(traininginput,trainingoutput)
colnames(trainingdata) <- c("Input","Output")
# Train the neural network
# One hidden layer with 10 neurons (hidden = 10)
# Threshold is a numeric value specifying the threshold for the partial
# derivatives of the error function as a stopping criterion
net.sqrt <- neuralnet(Output ~ Input, trainingdata, hidden = 10, threshold = 0.01)
print(net.sqrt)
#Plot the neural network
plot(net.sqrt)
# Test the neural network on some new data
testdata <- as.data.frame((1:10)^2)        # squares of 1..10, so the true outputs are 1..10
net.results <- compute(net.sqrt, testdata) # run them through the neural network
# Let's see what properties net.results has
ls(net.results)
# Let's see the results
print(net.results$net.result)
# Let's display a better version of the results
cleanoutput <- cbind(testdata, sqrt(testdata),
                     as.data.frame(net.results$net.result))
colnames(cleanoutput) <- c("Input","Expected Output","Neural Net Output")
print(cleanoutput)
R Code pulled from:
http://gekkoquant.com/2012/05/26/neural-networks-with-r-simple-example/
Sources & References
https://gl4l.greatlearning.in/building-artificial-neural-networks-using-r/
Further Resources:
http://www.michaeljgrogan.com/neural-network-modelling-neuralnet-r/
https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6