Let $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, $i = 1, \dots, l$, be a training set for a linear model of the form $y = w^T x$ for some $w \in \mathbb{R}^n$.
We use the mean squared error (MSE) as the loss function:
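$$L(w) = \frac{1}{l} \sum_{i=1}^{l} (w^T x_i - y_i)^2 = \frac{1}{l} \|Xw - y\|^2,$$
where $X = \begin{bmatrix} x_1^T \\ \vdots \\ x_l^T \end{bmatrix}$ and, inside the norm, $y = (y_1, \dots, y_l)^T \in \mathbb{R}^l$ is the vector of targets.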
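To make the notation concrete, here is a small NumPy sketch that checks the sum form and the matrix form of the loss agree. The sizes, the random data, and the variable names are all made up for illustration; the only thing taken from the setup above is that the rows of `X` are the $x_i^T$.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary fixed seed, for reproducibility only
l, n = 50, 3                     # hypothetical sample count and dimension
X = rng.normal(size=(l, n))      # row i is x_i^T
y = rng.normal(size=l)           # vector of targets y_i
w = rng.normal(size=n)           # an arbitrary weight vector

# Sum form: (1/l) * sum_i (w^T x_i - y_i)^2
loss_sum = sum((w @ X[i] - y[i]) ** 2 for i in range(l)) / l

# Matrix form: (1/l) * ||Xw - y||^2
loss_mat = np.linalg.norm(X @ w - y) ** 2 / l

print(np.isclose(loss_sum, loss_mat))
```

The two values should agree up to floating-point rounding, since entry $i$ of $Xw - y$ is exactly $w^T x_i - y_i$.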
So, can someone explain to me why setting $L'(w) = 0$ gives $w = (X^T X)^{-1} X^T y$?
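As a numerical sanity check of the claimed formula (not a derivation), the sketch below evaluates the gradient at $w = (X^T X)^{-1} X^T y$ and compares the result with NumPy's least-squares solver. It assumes the standard gradient of this loss, $\nabla L(w) = \frac{2}{l} X^T (Xw - y)$, and that $X^T X$ is invertible, which holds when $X$ has full column rank. The data and the helper names `mse` and `mse_grad` are hypothetical.

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error (1/l) * ||Xw - y||^2."""
    r = X @ w - y
    return r @ r / len(y)

def mse_grad(w, X, y):
    """Gradient of the MSE, assumed to be (2/l) * X^T (Xw - y)."""
    return 2.0 / len(y) * X.T @ (X @ w - y)

rng = np.random.default_rng(0)   # arbitrary fixed seed
l, n = 50, 3                     # hypothetical sizes with l > n
X = rng.normal(size=(l, n))      # random X has full column rank almost surely
y = rng.normal(size=l)

# Solve the normal equations X^T X w = X^T y instead of forming the inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

grad_at_w_hat = mse_grad(w_hat, X, y)            # should be numerically zero
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # reference least-squares solution

print(np.allclose(grad_at_w_hat, 0.0, atol=1e-10))
print(np.allclose(w_hat, w_lstsq))
```

Using `np.linalg.solve` on the normal equations avoids computing $(X^T X)^{-1}$ explicitly, which is usually the more numerically stable choice; the result is the same $w$ whenever the inverse exists.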