#!/usr/bin/env python
# coding: utf-8

# ## Exercise 05
#
# # Neural networks
#
# ## 4.1 Little Red Riding Hood Network
#
# Train a neural network to solve the Little Red Riding Hood problem in sklearn and Keras. Try the neural network with different inputs and report the results.
#
# ________________
#
# ## 4.2 Boston House Price Prediction
#
# In the next questions we are going to work with the *Boston* dataset. This dataset measures the influence of socioeconomic factors on the price of several estates in the city of Boston. It has 506 instances, each one characterized by 13 features:
#
# * CRIM - per capita crime rate by town
# * ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
# * INDUS - proportion of non-retail business acres per town
# * CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
# * NOX - nitric oxides concentration (parts per 10 million)
# * RM - average number of rooms per dwelling
# * AGE - proportion of owner-occupied units built prior to 1940
# * DIS - weighted distances to five Boston employment centres
# * RAD - index of accessibility to radial highways
# * TAX - full-value property-tax rate per 10,000 USD
# * PTRATIO - pupil-teacher ratio by town
# * B - $1000(Bk - 0.63)^2$ where $Bk$ is the proportion of blacks by town
# * LSTAT - % lower status of the population
#
# Output variable:
# * MEDV - Median value of owner-occupied homes in 1000's USD
#
# **Note:** In this exercise we are going to predict the price of each estate, which is represented by the `MEDV` variable. It is important to remember that we are always aiming to predict `MEDV`, no matter which explanatory variables we are using. That is, in some cases we will use a subset of the 13 previously mentioned variables, while in other cases we will use all 13 variables. But in no case will we change the dependent variable $y$.
#
# 1. Load the dataset using `from sklearn.datasets import load_boston`.
# 2. Create a DataFrame using the `.data` attribute returned by the Scikit-learn loading function.
# 3. Assign the columns of the DataFrame so they match the `.feature_names` attribute of the Scikit-learn loading function.
# 4. Assign a new column to the DataFrame which holds the value to predict, that is, the `.target` attribute of the Scikit-learn loading function. The name of this column must be `MEDV`.
# 5. Use the Pandas method `.describe()` to obtain statistics about each column.
#
# ## 4.3 Feature analysis
#
# Using the DataFrame generated in the previous section:
# * Filter the dataset to just these features:
#   * Explanatory: 'LSTAT', 'INDUS', 'NOX', 'RM', 'AGE'
#   * Dependent: 'MEDV'
# * Generate a scatter matrix among the features mentioned above using Pandas (`scatter_matrix`) or Seaborn (`pairplot`). A sketch of these steps follows below.
# * Do you find any relationship between the features?
# * Generate the correlation matrix between these variables using `numpy.corrcoef`. Also include `MEDV`.
# * Which characteristics are most correlated?
# * BONUS: Visualize this matrix as a heat map using Pandas, Matplotlib or Seaborn.
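
# A minimal sketch of steps 1-5 and the feature analysis could look as follows. Note that `load_boston` is deprecated and was removed in scikit-learn 1.2, so this assumes an older scikit-learn version; column and variable names other than those in the exercise are illustrative.

# In[ ]:


import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_boston  # assumes scikit-learn < 1.2

# 4.2: build the DataFrame and attach the target column
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['MEDV'] = boston.target

print(df.describe())  # per-column statistics

# 4.3: restrict to the requested features and inspect pairwise relationships
cols = ['LSTAT', 'INDUS', 'NOX', 'RM', 'AGE', 'MEDV']
sns.pairplot(df[cols])

# correlation matrix (rowvar=False treats each column as a variable)
corr = np.corrcoef(df[cols].values, rowvar=False)
print(corr)

# BONUS: heat map of the correlation matrix
sns.heatmap(corr, xticklabels=cols, yticklabels=cols, annot=True, cmap='coolwarm')
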
# ## 4.4 Modeling linear and non-linear relationships
#
# * Generate two new subsets filtering these characteristics:
#   * $D_1$: $X = \textit{'RM'}$, $y = \textit{'MEDV'}$
#   * $D_2$: $X = \textit{'LSTAT'}$, $y = \textit{'MEDV'}$
# * For each subset, generate a training partition and a test partition using a $70\%-30\%$ ratio.
# * Train a linear regression model on both subsets of data:
#   * Report the mean squared error on the test set
#   * Print the values of $w$ and $w_0$ of the regression equation
#   * Generate a graph visualizing the line obtained by the regression model together with the training data and the test data
# * How does the model perform on $D_1$ and $D_2$? Why?
#
# ## 4.5 Training a regression model
#
# * Generate a 70-30 partition of the data **using all the features**. (Do not include the dependent variable `MEDV` among the features.)
# * Train a linear regression model with the objective of predicting the output variable `MEDV`.
#   * Report the mean squared error on the test set
# * Train a regression model using `MLPRegressor` in order to predict the output variable `MEDV`.
#   * Report the mean squared error on the test set
# * Scale the data so that each feature has zero mean and unit variance (only $X$). You can use the following piece of code (note that the scaler is fitted on the training partition only, to avoid leaking test-set statistics):
#
# ```python
# from sklearn.preprocessing import StandardScaler
#
# sc_x = StandardScaler()
# sc_x.fit(X_train)
# X_train_s = sc_x.transform(X_train)
# X_test_s = sc_x.transform(X_test)
# ```
#
# Check more information about `StandardScaler` [here](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).
#
# * Train the following models (see the sketch in the cell below):
#   1. A linear regression model using the scaled data.
#      * Report the mean squared error on the test set
#   2. A regression model using a 2-layer MultiLayer Perceptron (128 neurons in the first layer and 512 in the second) with the **scaled data**.
#      * Report the mean squared error on the test set
#   3. Which model has better performance? Why?

# In[ ]:
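

# A minimal sketch for 4.4 and 4.5, reusing the DataFrame `df` built above.
# Hedged: random_state, max_iter and the plotting details are illustrative
# choices, not prescribed by the exercise.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# --- 4.4: one explanatory variable at a time (D1 = RM, D2 = LSTAT) ---
for feature in ['RM', 'LSTAT']:
    X = df[[feature]].values
    y = df['MEDV'].values
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    lr = LinearRegression().fit(X_train, y_train)
    print(feature,
          '| test MSE:', mean_squared_error(y_test, lr.predict(X_test)),
          '| w:', lr.coef_[0], '| w0:', lr.intercept_)

    # regression line together with the training and test points
    xs = np.sort(X, axis=0)
    plt.scatter(X_train, y_train, label='train')
    plt.scatter(X_test, y_test, label='test')
    plt.plot(xs, lr.predict(xs), color='k', label='fit')
    plt.xlabel(feature)
    plt.ylabel('MEDV')
    plt.legend()
    plt.show()

# --- 4.5: all 13 features ---
X = df.drop(columns='MEDV').values
y = df['MEDV'].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

lr = LinearRegression().fit(X_train, y_train)
print('Linear regression MSE:', mean_squared_error(y_test, lr.predict(X_test)))

mlp = MLPRegressor(max_iter=2000, random_state=0).fit(X_train, y_train)
print('MLP MSE:', mean_squared_error(y_test, mlp.predict(X_test)))

# scale X only, fitting the scaler on the training partition
sc_x = StandardScaler().fit(X_train)
X_train_s = sc_x.transform(X_train)
X_test_s = sc_x.transform(X_test)

lr_s = LinearRegression().fit(X_train_s, y_train)
print('Linear regression (scaled) MSE:',
      mean_squared_error(y_test, lr_s.predict(X_test_s)))

mlp_s = MLPRegressor(hidden_layer_sizes=(128, 512), max_iter=2000,
                     random_state=0).fit(X_train_s, y_train)
print('MLP 128x512 (scaled) MSE:',
      mean_squared_error(y_test, mlp_s.predict(X_test_s)))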