Under the usual notation,
$$ Y = X\beta + \epsilon $$
where $X$ is an $n \times (p+1)$ matrix: essentially, there are $n$ rows of datapoints and $(p+1)$ variables (including the intercept column).
$$ \hat Y = X\hat\beta $$
$$ \hat\beta = (X'X)^{-1}X'Y \implies \hat\beta' = Y'X(X'X)^{-1} $$
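As a quick numerical sanity check (illustrative only, not part of the derivation), the closed-form estimate $(X'X)^{-1}X'Y$ can be compared against NumPy's least-squares solver on simulated data; the dimensions $n$, $p$, and the coefficients below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                                   # n datapoints, p predictors
# X is n x (p+1): a column of ones (intercept) plus p random predictors
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -3.0, 0.5])          # true coefficients (arbitrary)
eps = rng.normal(scale=0.5, size=n)             # noise
Y = X @ beta + eps

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y     # (X'X)^{-1} X'Y
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
assert np.allclose(beta_hat, beta_lstsq)        # both routes agree
```

(In practice one would use `lstsq` or a QR decomposition rather than forming $(X'X)^{-1}$ explicitly, which is numerically less stable.)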
\begin{align} \sum_i (Y_i - \hat Y_i)^2 & = (Y - \hat Y)'(Y - \hat Y) \\ & = (X(\beta - \hat \beta) + \epsilon)' (X(\beta - \hat \beta) + \epsilon) \quad \text{since } Y - \hat Y = X\beta + \epsilon - X\hat\beta \\ & = \underbrace {(\beta - \hat \beta)'X'X(\beta - \hat \beta)}_{\text{term 1}} + \underbrace {\epsilon'X (\beta - \hat \beta)}_{\text{term 2}}\\ & \quad + \underbrace {(\beta - \hat \beta)'X'\epsilon}_{\text{term 3}} + \epsilon'\epsilon \\ \end{align}

Simplifying the individual terms:
Term 1: \begin{align} (\beta - \hat \beta)'X'X(\beta - \hat \beta) &= (\beta - (X'X)^{-1}X'Y)'X'X(\beta - (X'X)^{-1}X'Y)\\ & = (\beta' - Y'X(X'X)^{-1})X'X(\beta - (X'X)^{-1}X'Y) \\ & = \beta'X'X\beta - Y'X\beta - \beta'X'Y + Y'X(X'X)^{-1}X'Y \\ & = \beta'X'X\beta - (\beta'X' + \epsilon')X\beta \\ & \quad -\beta'X'Y \\ & \quad + (\beta'X' + \epsilon')X(X'X)^{-1}X'Y \quad \text{substituting } Y' = \beta'X' + \epsilon' \\ & = - \epsilon'X\beta + \epsilon'X(X'X)^{-1}X'Y \quad \text{the } \beta'X'X\beta \text{ and } \beta'X'Y \text{ terms cancel} \\ & = - \epsilon'X\beta + \epsilon'X(X'X)^{-1}X'(X\beta + \epsilon) \quad \text{substituting } Y = X\beta + \epsilon \\ & = \epsilon'X(X'X)^{-1}X'\epsilon \quad \text{since } (X'X)^{-1}X'X = I \\ \end{align}
Term 2: \begin{align} \epsilon'X (\beta - \hat \beta) &= \epsilon'X(\beta - (X'X)^{-1}X'Y)\\ & = \epsilon'X(\beta - (X'X)^{-1}X'(X\beta+\epsilon))\quad \text{substituting } Y = X\beta + \epsilon \\ & = \epsilon'X(\beta - \beta - (X'X)^{-1}X'\epsilon) \\ & = -\epsilon'X(X'X)^{-1}X'\epsilon \end{align}
Term 3 is the transpose of Term 2; since both are scalars, Term 3 equals Term 2, i.e. $-\epsilon'X(X'X)^{-1}X'\epsilon$.
$$ \sum_i (Y_i - \hat Y_i)^2 = \epsilon'X(X'X)^{-1}X'\epsilon - \epsilon'X(X'X)^{-1}X'\epsilon - \epsilon'X(X'X)^{-1}X'\epsilon + \epsilon'\epsilon \\ = \epsilon'\epsilon - \epsilon'X(X'X)^{-1}X'\epsilon \\ = \epsilon'\epsilon - \epsilon'P\epsilon $$
Here $P = X(X'X)^{-1}X'$ is the projection (hat) matrix, which is symmetric and idempotent.
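The identity $\text{RSS} = \epsilon'\epsilon - \epsilon'P\epsilon$, and the symmetry and idempotence of $P$, can be checked numerically. This is just an illustrative sketch on simulated data (the dimensions and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])  # n x (p+1)
beta = np.array([0.5, 1.0, -1.0])                          # arbitrary true coefficients
eps = rng.normal(size=n)
Y = X @ beta + eps

P = X @ np.linalg.inv(X.T @ X) @ X.T     # projection (hat) matrix
Y_hat = P @ Y                            # fitted values X beta_hat
rss = np.sum((Y - Y_hat) ** 2)

assert np.allclose(P, P.T)               # P is symmetric
assert np.allclose(P @ P, P)             # P is idempotent
assert np.isclose(rss, eps @ eps - eps @ P @ eps)   # RSS = e'e - e'Pe
```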
Let's calculate the expectation now,
$$ E\Big[\sum_i (Y_i - \hat Y_i)^2\Big] = E(\epsilon'\epsilon - \epsilon'P\epsilon) \\ = E(\epsilon'\epsilon) - E(\epsilon'P\epsilon) \\ = n\sigma^2 - \sigma^2\,\mathrm{trace}(P) \\ = (n-(p+1))\sigma^2 $$
Here we use $E(\epsilon'A\epsilon) = \sigma^2\,\mathrm{trace}(A)$ for a fixed matrix $A$, which holds since $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I$ (the proof that $\mathrm{trace}(P) = p+1$ is given below).
$$ \therefore \quad \frac{\sum_i (Y_i - \hat Y_i)^2}{n-(p+1)} \text{ is an unbiased estimator of } \sigma^2 $$
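Unbiasedness can also be seen empirically: averaging $\text{RSS}/(n-(p+1))$ over many simulated datasets should recover $\sigma^2$, and $\mathrm{trace}(P)$ should equal $p+1$. A Monte Carlo sketch (all parameter values below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 40, 2, 1.5
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])  # fixed design, n x (p+1)
beta = np.array([1.0, -2.0, 0.5])                          # arbitrary true coefficients
P = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.isclose(np.trace(P), p + 1)    # trace(P) = p + 1

est = []
for _ in range(5000):                    # many simulated datasets
    eps = rng.normal(scale=sigma, size=n)
    Y = X @ beta + eps
    rss = np.sum((Y - P @ Y) ** 2)       # residual sum of squares
    est.append(rss / (n - (p + 1)))      # the unbiased estimator

# the average estimate should be close to sigma^2 = 2.25
assert abs(np.mean(est) - sigma**2) < 0.1
```

Dividing by $n$ instead of $n-(p+1)$ would make the average come out systematically below $\sigma^2$, which is exactly the bias the derivation above accounts for.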