In machine learning models, features are mapped into an n-dimensional space. Say there are two variables (x, y) mapped into a 2D coordinate system. If one variable, say y, spans a much larger range than the other, x, then the Euclidean distance between points is dominated by the larger one and the smaller one is effectively ignored. Valuable information is lost, and feature scaling is used to solve this problem.
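To see the effect, here is a minimal sketch with made-up age/salary values (the numbers are purely illustrative):

import numpy as np

# Two people described by (age, salary): ages differ a lot, salaries only slightly.
a = np.array([25.0, 50000.0])
b = np.array([60.0, 50500.0])

# Unscaled: the Euclidean distance is driven almost entirely by the salary column.
print(np.linalg.norm(a - b))  # ~501 -- the large age difference barely matters

# After dividing each column by a rough range, the age difference dominates instead.
scale = np.array([100.0, 100000.0])  # assumed column ranges, for illustration only
print(np.linalg.norm(a / scale - b / scale))  # ~0.35 -- now driven by age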
Additional reasons for transformation: gradient-based optimizers converge faster when features are on comparable scales, and distance-based algorithms (e.g. K-Means, k-NN) weight every feature equally only after scaling. The common transformations are listed below; a plain NumPy sketch of each follows the list.
Min-Max Scaling: scales the input to a minimum of 0 and a maximum of 1, i.e. into the range [0, 1]. This is useful when the parameters have to be on the same positive scale, but it is sensitive to outliers: a single extreme value compresses the rest of the data into a narrow band. $$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
Standardization: scales the input to have a mean of 0 and a variance of 1. $$X_{stand} = \frac{X - \mu}{\sigma}$$
Normalization: scales each sample (row) to have a norm of 1. For instance, with 3 features every sample lies on the unit sphere.
Log transformation: taking the log of the data after any of the above transformations.
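Below is a plain NumPy sketch of the three formulas above, using the first few Age/Salary rows from the dataset used later in this post (Normalizer works per row, the other two per column):

import numpy as np

X = np.array([[44.0, 72000.0],
              [27.0, 48000.0],
              [30.0, 54000.0]])

# Min-max scaling: (X - min) / (max - min) per column -> values in [0, 1]
x_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: (X - mean) / std per column -> mean 0, variance 1
x_stand = (X - X.mean(axis=0)) / X.std(axis=0)

# Normalization: divide each row by its Euclidean norm -> unit-norm samples
x_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

print(x_norm)
print(x_stand)
print(x_unit)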
Scaling inputs to unit norms is a common operation for text classification or clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
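As a small sketch of that point, scikit-learn's TfidfVectorizer l2-normalizes each row by default, so the dot product of two rows equals their cosine similarity (the two sentences here are made up for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["feature scaling for machine learning",
        "scaling features before machine learning"]

# Rows of the TF-IDF matrix are l2-normalized (norm='l2' is the default)
tfidf = TfidfVectorizer().fit_transform(docs)

# Dot product of two unit-norm vectors equals their cosine similarity
dot = tfidf[0].dot(tfidf[1].T).toarray()[0, 0]
cos = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(dot, cos)  # the two numbers match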
For most applications, standardization is recommended. Min-max scaling is recommended for neural networks, and normalization is recommended for clustering, e.g. K-Means.
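In practice the chosen scaler is usually fit on the training data only, typically inside a pipeline, so the test set is transformed with the training statistics and no information leaks. A minimal sketch (the estimator choice here is arbitrary):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# The scaler's mean and std are learned from the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression())
# model.fit(X_train, y_train)
# model.predict(X_test)

The code below compares the three scikit-learn scalers directly on a small Age/Salary dataset.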
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler
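# Load the dataset, drop rows with missing values, and keep the numeric Age/Salary columns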
df = pd.read_csv('Data.csv').dropna()
print(df)
X = df[["Age", "Salary"]].values.astype(np.float64)
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
5   France  35.0  58000.0       Yes
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes
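# Fit each scaler on the Age/Salary matrix and compare the transformed values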
standard_scaler = StandardScaler()
normalizer = Normalizer()
min_max_scaler = MinMaxScaler()
print("Standardization")
print(standard_scaler.fit_transform(X))
print("Normalizing")
print(normalizer.fit_transform(X))
print("MinMax Scaling")
print(min_max_scaler.fit_transform(X))
Standardization
[[ 0.69985807  0.58989097]
 [-1.51364653 -1.50749915]
 [-1.12302807 -0.98315162]
 [-0.08137885 -0.37141284]
 [-0.47199731 -0.6335866 ]
 [ 1.22068269  1.20162976]
 [ 1.48109499  1.55119478]
 [-0.211585    0.1529347 ]]
Normalizing
[[ 6.11110997e-04  9.99999813e-01]
 [ 5.62499911e-04  9.99999842e-01]
 [ 5.55555470e-04  9.99999846e-01]
 [ 6.22950699e-04  9.99999806e-01]
 [ 6.03448166e-04  9.99999818e-01]
 [ 6.07594825e-04  9.99999815e-01]
 [ 6.02409529e-04  9.99999819e-01]
 [ 5.52238722e-04  9.99999848e-01]]
MinMax Scaling
[[ 0.73913043  0.68571429]
 [ 0.          0.        ]
 [ 0.13043478  0.17142857]
 [ 0.47826087  0.37142857]
 [ 0.34782609  0.28571429]
 [ 0.91304348  0.88571429]
 [ 1.          1.        ]
 [ 0.43478261  0.54285714]]
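As a quick sanity check on the output (a minimal sketch reusing the objects defined above), the standardized columns have mean ~0 and standard deviation 1, and the min-max scaled columns span exactly [0, 1]:

scaled = standard_scaler.fit_transform(X)
print(scaled.mean(axis=0))  # ~[0, 0] up to floating-point error
print(scaled.std(axis=0))   # [1, 1]

ranged = min_max_scaler.fit_transform(X)
print(ranged.min(axis=0), ranged.max(axis=0))  # [0, 0] and [1, 1]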