General Linear Regression
This is a work in progress. It is meant to capture the mathematical proof of how general linear regression works. It is math-heavy.
Introduction
Assume you have a data set with $$N$$ independent values $$x_k$$ and dependent values $$y_k$$. You also have some reasonable scientific model that relates the dependent variable to the independent variable. If that model can be written as a general linear fit, that means you can represent the fit function $$\hat{y}(x)$$ as:

$$\hat{y}(x)=\sum_{m=0}^{M-1}a_m\phi_m(x)$$
where $$\phi_m(x)$$ is the $$m$$th basis function in your model and $$a_m$$ is its constant coefficient. For instance, if you end up having a model such as (to pick a hypothetical example):

$$\hat{y}(x)=a_0\cos(x)+a_1$$

then you could map these to the summation with $$M=2$$ basis functions in total and:

$$\phi_0(x)=\cos(x)\qquad\phi_1(x)=x^0$$

Note for the second term that $$\phi(x)$$ must be a function of $$x$$ -- constants are thus the coefficients on an implied $$x^0$$.
The goal, once we have established a scientifically valid model, is to determine the "best" set of coefficients for that model. We are going to define the "best" set of coefficients as the values of $$a_m$$ that minimize the sum of the squares of the estimate residuals, $$S_r$$, for that particular model. Recall that:

$$S_r=\sum_{k=1}^{N}\left(y_k-\hat{y}(x_k)\right)^2=\sum_{k=1}^{N}\left(\hat{y}(x_k)-y_k\right)^2$$
Finding the coefficients for the "constant" model
The simplest model you might come up with is a simple constant, $$\hat{y}(x)=a_0x^0$$. This means that the $$S_r$$ value, using the second version above, will be:

$$S_r=\sum_{k=1}^{N}\left(a_0-y_k\right)^2$$
Keep in mind that the only variable right now is $$a_0$$; all the $$x$$ and $$y$$ values are constant independent or dependent values from your data set. The only parameter you can adjust is $$a_0$$. This means that to minimize the $$S_r$$ value, you need to solve:

$$\frac{dS_r}{da_0}=\frac{d}{da_0}\sum_{k=1}^{N}\left(a_0-y_k\right)^2=0$$
Here goes!
The derivative of a sum is the same as the sum of derivatives, so put the derivative operator inside:

$$\sum_{k=1}^{N}\frac{d}{da_0}\left(a_0-y_k\right)^2=0$$
Use the power rule to get that $$d(u^2)=2u~du$$ and note that $$u=(a_0-y_k)$$ so $$\frac{du}{da_0}=1$$ here:

$$\sum_{k=1}^{N}2\left(a_0-y_k\right)=0$$
Since we are setting the left side to 0, the 2 is irrelevant. Also, the summand can be split into two parts...

$$\sum_{k=1}^{N}\left(a_0-y_k\right)=0$$
...and then the parts can be separated.

$$\sum_{k=1}^{N}a_0-\sum_{k=1}^{N}y_k=0$$
Recognize that $$a_0$$ is a constant; since you are adding that constant to itself for each of the $$N$$ data points, you can replace the summation with:

$$Na_0-\sum_{k=1}^{N}y_k=0$$
Dividing by $$N$$ reveals the answer:

$$a_0=\frac{1}{N}\sum_{k=1}^{N}y_k=\bar{y}$$
The best constant with which to model a data set is its own average! Admittedly, this will lead to an $$r^2$$ value of 0, which is not great, but it is as good as you can get with a model containing nothing more than a constant.
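This result is easy to sanity-check numerically. The sketch below (using NumPy and a made-up data set) confirms that the mean of the $$y$$ values gives an $$S_r$$ at least as small as any nearby candidate for $$a_0$$:

```python
import numpy as np

# Made-up data set; any values illustrate the point.
y = np.array([2.0, 3.5, 1.0, 4.0, 2.5])

def s_r(a0):
    """Sum of squared residuals for the constant model y_hat(x) = a0."""
    return np.sum((a0 - y) ** 2)

a0_best = y.mean()

# The mean should do at least as well as every nearby candidate.
candidates = np.linspace(a0_best - 1.0, a0_best + 1.0, 201)
assert all(s_r(a0_best) <= s_r(c) for c in candidates)
```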
Finding the coefficients for a "straight line" model
So, that was relatively painless (except for the typesetting). Next up, let's look at a slightly more complex model: a straight line. That is to say, $$\hat{y}(x)=a_0x^1+a_1x^0$$. Yes, the indexing is a little unfortunate, but that's the way it goes. This means that the $$S_r$$ value, using the second version above, will be:

$$S_r=\sum_{k=1}^{N}\left(a_0x_k+a_1-y_k\right)^2$$
There are now two variables: $$a_0$$ and $$a_1$$. This means that to minimize the $$S_r$$ value, you need to solve:

$$\frac{\partial S_r}{\partial a_0}=0\qquad\frac{\partial S_r}{\partial a_1}=0$$
where the $$\partial$$ symbol indicates a partial derivative. A partial derivative simply means that you are looking at how something changes with respect to changes in only one of its variables - all the other variables are assumed constant. For example, the volume of a cylinder can be given by $$V=\pi r^2h$$ where $$r$$ is the radius of the base and $$h$$ is the height. Using partial derivatives, you can calculate how the volume changes either as a function of changing the radius of the base or as a function of changing the height:

$$\frac{\partial V}{\partial r}=2\pi rh\qquad\frac{\partial V}{\partial h}=\pi r^2$$
On the left, the $$h$$ is taken as a constant; on the right, the $$r$$ is taken as a constant.
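Partial derivatives like these can be verified numerically with finite differences. The snippet below (Python, with hypothetical values for $$r$$ and $$h$$) checks both results for the cylinder:

```python
import math

def V(r, h):
    """Volume of a cylinder with base radius r and height h."""
    return math.pi * r**2 * h

r, h, eps = 2.0, 5.0, 1e-6

# Central finite differences, varying one variable at a time.
dV_dr = (V(r + eps, h) - V(r - eps, h)) / (2 * eps)  # should be ~2*pi*r*h
dV_dh = (V(r, h + eps) - V(r, h - eps)) / (2 * eps)  # should be ~pi*r**2

assert abs(dV_dr - 2 * math.pi * r * h) < 1e-4
assert abs(dV_dh - math.pi * r**2) < 1e-4
```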
Here goes round two! First, let's look at $$a_0$$:

$$\frac{\partial S_r}{\partial a_0}=\frac{\partial}{\partial a_0}\sum_{k=1}^{N}\left(a_0x_k+a_1-y_k\right)^2=0$$
The derivative of a sum is the same as the sum of derivatives, so put the derivative operator inside:

$$\sum_{k=1}^{N}\frac{\partial}{\partial a_0}\left(a_0x_k+a_1-y_k\right)^2=0$$
Use the power rule to get that $$d(u^2)=2u~du$$ and note that $$u=(a_0x_k+a_1-y_k)$$ so $$\frac{\partial u}{\partial a_0}=x_k$$ here (this is different from what it was above):

$$\sum_{k=1}^{N}2\left(a_0x_k+a_1-y_k\right)x_k=0$$
Since we are setting the left side to 0, the 2 is irrelevant. Also, the summand can be split into three parts...

$$\sum_{k=1}^{N}\left(a_0x_k^2+a_1x_k-x_ky_k\right)=0$$
...and then the parts can be separated.

$$\sum_{k=1}^{N}a_0x_k^2+\sum_{k=1}^{N}a_1x_k-\sum_{k=1}^{N}x_ky_k=0$$
None of these terms is simple enough to do anything with, other than to recognize that $$a_0$$ and $$a_1$$ are not functions of $$k$$ and can thus be brought out of the summations. Moving the $$y$$ term to the right side gives:

$$a_0\sum_{k=1}^{N}x_k^2+a_1\sum_{k=1}^{N}x_k=\sum_{k=1}^{N}x_ky_k\qquad(1)$$
Now let's look at $$a_1$$:

$$\frac{\partial S_r}{\partial a_1}=\frac{\partial}{\partial a_1}\sum_{k=1}^{N}\left(a_0x_k+a_1-y_k\right)^2=0$$
The derivative of a sum is the same as the sum of derivatives, so put the derivative operator inside:

$$\sum_{k=1}^{N}\frac{\partial}{\partial a_1}\left(a_0x_k+a_1-y_k\right)^2=0$$
Use the power rule to get that $$d(u^2)=2u~du$$ and note that $$u=(a_0x_k+a_1-y_k)$$ so $$\frac{\partial u}{\partial a_1}=1$$ here:

$$\sum_{k=1}^{N}2\left(a_0x_k+a_1-y_k\right)=0$$
Since we are setting the left side to 0, the 2 is irrelevant. Also, the summand can be split into three parts...

$$\sum_{k=1}^{N}\left(a_0x_k+a_1-y_k\right)=0$$
...and then the parts can be separated.

$$\sum_{k=1}^{N}a_0x_k+\sum_{k=1}^{N}a_1-\sum_{k=1}^{N}y_k=0$$
While the second term is actually simple enough to do something with (it is just adding up $$a_1$$ $$N$$ times and thus could be $$Na_1$$), we are simply going to recognize that $$a_0$$ and $$a_1$$ are not functions of $$k$$ and can thus be brought out of the summations. Moving the $$y$$ term to the right side gives:

$$a_0\sum_{k=1}^{N}x_k+a_1\sum_{k=1}^{N}1=\sum_{k=1}^{N}y_k\qquad(2)$$
If we wanted to be explicit about it, we could note that $$\phi_1(x)$$ here is $$x^0$$ and write:

$$a_0\sum_{k=1}^{N}x_kx_k^0+a_1\sum_{k=1}^{N}x_k^0x_k^0=\sum_{k=1}^{N}x_k^0y_k\qquad(2e)$$
In fact, we could do the same with equation (1) above, noting that $$\phi_0(x)$$ is $$x^1$$, and write it as:

$$a_0\sum_{k=1}^{N}x_k^1x_k^1+a_1\sum_{k=1}^{N}x_k^0x_k^1=\sum_{k=1}^{N}x_k^1y_k\qquad(1e)$$
Equations (1e) and (2e) give two equations with two unknowns; putting them in matrix form yields:

$$\begin{bmatrix}\sum x_k^2 & \sum x_k\\ \sum x_k & N\end{bmatrix}\begin{bmatrix}a_0\\ a_1\end{bmatrix}=\begin{bmatrix}\sum x_ky_k\\ \sum y_k\end{bmatrix}$$
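A system like this can be solved with any linear-algebra routine. The sketch below (NumPy, with a made-up data set) builds the matrix from the summations above, solves for $$a_0$$ and $$a_1$$, and checks the result against `np.polyfit`:

```python
import numpy as np

# Made-up data set for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
N = len(x)

# Normal equations in matrix form: A @ [a0, a1] = b.
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    N]])
b = np.array([np.sum(x * y), np.sum(y)])
a0, a1 = np.linalg.solve(A, b)

# np.polyfit(x, y, 1) returns [slope, intercept]; it should agree.
slope, intercept = np.polyfit(x, y, 1)
assert np.isclose(a0, slope) and np.isclose(a1, intercept)
```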