- Linear: $\hat{y} = wx$
- Bayesian (MLE & MAP): $y \sim N(wx, \sigma^2)$; fit by $\textrm{argmax}_w\, p(D \mid w)$

*Review slides on Linear Regression*

In regression, we are always given X. The question is: given X, what is Y?

- MAP: $\textrm{argmax}_w \prod_{i=1}^n p(y_i | w, x_i)\, p(w)$
- MLE: $\textrm{argmax}_w \prod_{i=1}^n p(y_i | w, x_i)$

Estimating means for normal distribution:

We have a model: $y_i \sim N(\mu, \sigma^2)$

We add a prior: $w \sim N(0, \gamma^2)$
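Maximizing $\prod_i p(y_i | w, x_i)\, p(w)$ with this Gaussian prior shrinks $w$ toward zero; for scalar $w$ the closed forms are $\hat{w}_{\textrm{MLE}} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$ and $\hat{w}_{\textrm{MAP}} = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + \sigma^2/\gamma^2}$. A minimal NumPy sketch (the data and the values of $\sigma$, $\gamma$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
w_true = 2.0
sigma, gamma = 0.5, 10.0          # noise std-dev and prior std-dev (assumed values)
y = w_true * x + rng.normal(scale=sigma, size=100)

# MLE: ordinary least squares through the origin
w_mle = np.sum(x * y) / np.sum(x * x)

# MAP with prior w ~ N(0, gamma^2): the extra term in the
# denominator shrinks the estimate toward 0
w_map = np.sum(x * y) / (np.sum(x * x) + sigma**2 / gamma**2)

print(w_mle, w_map)
```

With a wide prior ($\gamma$ large) the MAP estimate is barely shrunk; as $\gamma \to 0$ the prior dominates and $\hat{w}_{\textrm{MAP}} \to 0$.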

See the slides for how to use these priors

Constant Term in Linear Regression

Coding things up in Matlab, you generally need to add in a constant term explicitly... something to watch for
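The notes mention Matlab; here is the same idea sketched in NumPy (an assumption, with made-up data): without a column of ones the fit is forced through the origin, so we prepend one to the design matrix.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 4.0 + 2.0 * x                      # data with a nonzero intercept

# Without a constant column, the model is y = w*x, forced through the origin:
w_no_const, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

# Prepend a column of ones so the model becomes y = w0 + w1*x:
X = np.column_stack([np.ones_like(x), x])
w_const, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_no_const)   # a poor fit: the intercept is missing
print(w_const)      # recovers [4, 2]
```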

Different noise at each observation: Heteroscedasticity

With every observation comes different noise; in the real world, the noise on more extreme measurements is often greater.

$y_i \sim N(wx_i, \sigma_i^2)$ <- note how sigma changes with $i$

Sometimes we know something about the noise; then we can use a different sigma at each point, assume the noise terms are independent, plug in the equation for the Gaussian, and simplify.

This is called Weighted Regression:

$\textrm{argmin}_w \sum_{i = 1}^n \frac{(y_i - wx_i)^2}{\sigma_i^2}$

i.e., you weight noisy measurements less
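For scalar $w$ this weighted objective has the closed form $\hat{w} = \frac{\sum_i x_i y_i / \sigma_i^2}{\sum_i x_i^2 / \sigma_i^2}$. A NumPy sketch, with an assumed noise model where $\sigma_i$ grows with $x_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 5.0, 50)
sigmas = 0.1 + 0.3 * x              # noise grows with x (assumed, for illustration)
w_true = 1.5
y = w_true * x + rng.normal(scale=sigmas)

# argmin_w sum_i (y_i - w*x_i)^2 / sigma_i^2  -- each point is
# down-weighted by its noise variance
weights = 1.0 / sigmas**2
w_hat = np.sum(weights * x * y) / np.sum(weights * x * x)
print(w_hat)
```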

Suppose you know that y is related to a function of x in such a way
that the predicted values [lost slide]...

$y_i \sim N(\sqrt{w + x_i}, \sigma^2)$

MLE: $\textrm{argmin}_w \sum_i (y_i - \sqrt{w + x_i})^2$

Then use non-linear optimization techniques, of which many are
available
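As one concrete (hypothetical) choice of optimizer, plain gradient descent on the MLE objective works here, since $w$ is a single scalar. The gradient of $\sum_i (y_i - \sqrt{w + x_i})^2$ is $-\sum_i \frac{y_i - \sqrt{w + x_i}}{\sqrt{w + x_i}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 4.0, size=200)
w_true = 3.0
y = np.sqrt(w_true + x) + rng.normal(scale=0.1, size=200)

# Gradient descent on sum_i (y_i - sqrt(w + x_i))^2
w = 1.0                                  # initial guess; must keep w + x > 0
for _ in range(500):
    grad = -np.sum((y - np.sqrt(w + x)) / np.sqrt(w + x))
    w -= 0.01 * grad
print(w)
```

A library optimizer (e.g. a quasi-Newton method) would do the same job with less hand-tuning of the step size.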

$y = a + bx^2$

Is this linear or nonlinear regression?

It is *linear*

We make a new variable:

```
z = [ 1  x_1^2
      1  x_2^2
      ...
      1  x_n^2 ]
```

Now: $\hat{y} = zw$ and it is linear (linear in weights)
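A quick NumPy sketch of this trick (made-up data): build the transformed design matrix $z = [1, x^2]$ and solve an ordinary linear least-squares problem.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 0.5 * x**2                    # y = a + b*x^2 with a=1, b=0.5

# Transformed design matrix z = [1, x^2]; the model is linear in w
Z = np.column_stack([np.ones_like(x), x**2])
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(w)                                # recovers [a, b] = [1, 0.5]
```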

- $y = w \sin(x)$ <- linear estimation
  - $\sin(x)$ is a transformed feature, but still a feature
  - $w$ still enters linearly
- $y = \sin(wx)$ <- nonlinear estimation

Often you have some really non-linear relationship between X and Y. Can you do some transformation on these to make the relationship linear?

Let us choose a set of centers on x: $\mu_1 \dots \mu_k$. For each center we will create a Gaussian feature $z_j = e^{-\frac{||x - \mu_j||^2}{\sigma^2}}$

For every $x$, generate a bunch of $Z$s where the $Z$s near $X$ will be weighted heavily, and the $Z$s far from $X$ will be zero

One adjustable parameter in this situation: the kernel width, or $\sigma$. If the kernel width is really big, every feature responds to every point. If it is really narrow, then only very close points have an effect

Now the features (the $Z$s) are correlated, so we generally use a Ridge Regression (MAP)
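Putting the pieces together, here is a sketch in NumPy (the centers, kernel width, ridge penalty, and target function are all made up for illustration): Gaussian features at a grid of centers, then ridge regression on the transformed matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=80)

centers = np.linspace(0.0, 1.0, 10)     # the mu_j (chosen by hand here)
width = 0.15                            # kernel width sigma (assumed)

# Gaussian features: z_ij = exp(-(x_i - mu_j)^2 / sigma^2)
Z = np.exp(-((x[:, None] - centers[None, :]) ** 2) / width**2)

# Ridge regression: the MAP estimate under a Gaussian prior on w
lam = 1e-3
w = np.linalg.solve(Z.T @ Z + lam * np.eye(len(centers)), Z.T @ y)
y_hat = Z @ w
print(np.mean((y - y_hat) ** 2))        # residual error near the noise level
```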

This method is LOESS

Later: the use of kernels in regression
