General Linear Regression
(Created page with "This is a work in progress. It is meant to capture the mathematical proof of how general linear regression works. It is math-heavy. == Introduction == Assume you have some...") |
(→Finding the coefficients for the "constant" model) |
||
Line 31: | Line 31: | ||
== Finding the coefficients for the "constant" model == | == Finding the coefficients for the "constant" model == | ||
− | The simplest model you might come up with is a | + | The simplest model you might come up with is a simple constant, $$\hat{y}(x)=a_0x^0$$. This means that the $$S_r$$ value, using the second version above, will be: |
+ | <center>$$\begin{align*} | ||
+ | S_r&=\sum_k\left(\hat{y}_k-y_k\right)^2=\sum_k\left(a_0-y_k\right)^2 | ||
+ | \end{align*}$$</center> | ||
+ | Keep in mind that the only ''variable'' right now is $$a_0$$; all the $$x$$ and $$y$$ values are constant independent or dependent values from your data set. The ''only parameter'' you can adjust is $$a_0$$. This means that to minimize the $$S_r$$ value, you need to solve: | ||
+ | <center>$$ | ||
+ | \begin{align*} | ||
+ | \frac{dS_r}{da_0}&=0 | ||
+ | \end{align*}$$ | ||
+ | </center> | ||
+ | Here goes! | ||
+ | <center>$$ | ||
+ | \begin{align*} | ||
+ | \frac{dS_r}{da_0}=\frac{d}{da_0}\left(\sum_k\left(a_0-y_k\right)^2 \right)&=0 | ||
+ | \end{align*}$$ | ||
+ | </center> | ||
+ | The derivative of a sum is the same as the sum of derivatives, so put the derivative operator inside: | ||
+ | <center>$$ | ||
+ | \begin{align*}\sum_k\frac{d}{da_0}\left(a_0-y_k\right)^2&=0 | ||
+ | \end{align*}$$ | ||
+ | </center> | ||
+ | Use the power rule to get that $$d(u^2)=2u~du$$ and note that $$\frac{du}{da_0}=1$$ here: | ||
+ | <center>$$ | ||
+ | \begin{align*} | ||
+ | 2\sum_k\left(a_0-y_k\right)&=0\end{align*}$$ | ||
+ | </center> | ||
+ | Since we are setting the left side to 0, the 2 is irrelevant. Also, the summand can be split into two parts... | ||
+ | <center>$$ | ||
+ | \begin{align*} | ||
+ | \sum_k\left(a_0\right)-\sum_k\left(y_k\right)&=0 | ||
+ | \end{align*}$$ | ||
+ | </center> | ||
+ | ...and then the parts can be separated. | ||
+ | <center>$$ | ||
+ | \begin{align*} | ||
+ | \sum_k\left(a_0\right)&=\sum_k\left(y_k\right) | ||
+ | \end{align*}$$ | ||
+ | </center> | ||
+ | Recognize the $$a_0$$ is a constant; since you are adding that constant to itself for each of the $$N$$ data points, you can replace the summation with: | ||
+ | <center>$$ | ||
+ | \begin{align*} | ||
+ | Na_0&=\sum_k\left(y_k\right)\end{align*}$$ | ||
+ | </center> | ||
+ | Dividing by $$N$$ reveals the answer: | ||
+ | <center>$$ | ||
+ | \begin{align*} | ||
+ | a_0&=\frac{1}{N}\sum_k\left(y_k\right)=\bar{y} | ||
+ | \end{align*}$$ | ||
+ | </center> | ||
+ | The best constant with which to model a data set is its own average! |
Revision as of 23:53, 27 October 2019
This is a work in progress. It is meant to capture the mathematical proof of how general linear regression works. It is math-heavy.
Introduction
Assume you have a data set with $$N$$ independent values $$x_k$$ and corresponding dependent values $$y_k$$. You also have some reasonable scientific model that relates the dependent variable to the independent variable. If that model can be written as a general linear fit, that means you can represent the fit function $$\hat{y}(x)$$ as:
<center>$$\begin{align*}
\hat{y}(x)&=\sum_{m=0}^{M-1}a_m\phi_m(x)
\end{align*}$$</center>
where $$\phi_m(x)$$ is the $$m$$th basis function in your model and $$a_m$$ is its constant coefficient. For instance, if you end up having a model such as:
<center>$$\begin{align*}
\hat{y}(x)&=c\cos(x)+d
\end{align*}$$</center>
then you could map these to the summation with $$M=2$$ basis functions total and:
<center>$$\begin{align*}
\phi_0(x)&=x^0, & a_0&=d,\\
\phi_1(x)&=\cos(x), & a_1&=c
\end{align*}$$</center>
Note for the second term that $$\phi_m(x)$$ must be a function of $$x$$; constants are thus the coefficients on an implied $$x^0$$.
The goal, once we have established a scientifically valid model, is to determine the "best" set of coefficients for that model. We are going to define the "best" set of coefficients as the values of $$a_m$$ that minimize the sum of the squares of the estimate residuals, $$S_r$$, for that particular model. Recall that:
<center>$$\begin{align*}
S_r&=\sum_k\left(y_k-\hat{y}_k\right)^2=\sum_k\left(\hat{y}_k-y_k\right)^2
\end{align*}$$</center>
where $$\hat{y}_k=\hat{y}(x_k)$$ is the model's estimate at the $$k$$th data point.
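To make the notation concrete, here is a minimal numerical sketch of evaluating a general linear fit and its $$S_r$$. It assumes NumPy is available; the data values, the trial coefficients, and the helper names (<code>y_hat</code>, <code>S_r</code>) are illustrative choices for this sketch, not part of the original derivation. The basis functions match the cosine-plus-constant example above.
<syntaxhighlight lang="python">
import numpy as np

# Illustrative data set of N points (made up for this sketch)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.8, 1.4, 0.1, -0.4])

# Basis functions for the example model y_hat(x) = c*cos(x) + d:
# phi_0(x) = x^0 (the implied constant) and phi_1(x) = cos(x)
phi = [lambda x: x**0, lambda x: np.cos(x)]

def y_hat(a, x):
    """Evaluate the general linear fit: the sum over m of a_m * phi_m(x)."""
    return sum(a_m * phi_m(x) for a_m, phi_m in zip(a, phi))

def S_r(a, x, y):
    """Sum of the squares of the estimate residuals."""
    return np.sum((y_hat(a, x) - y) ** 2)

print(S_r([0.5, 1.2], x, y))  # S_r for the trial coefficients a_0=0.5, a_1=1.2
</syntaxhighlight>
Every fit in this framework differs only in the list of basis functions; the coefficients $$a_m$$ are what the minimization below solves for.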
Finding the coefficients for the "constant" model
The simplest model you might come up with is a constant, $$\hat{y}(x)=a_0x^0$$. This means that the $$S_r$$ value, using the second version above, will be:
<center>$$\begin{align*}
S_r&=\sum_k\left(\hat{y}_k-y_k\right)^2=\sum_k\left(a_0-y_k\right)^2
\end{align*}$$</center>
Keep in mind that the only variable right now is $$a_0$$; all the $$x$$ and $$y$$ values are constant independent or dependent values from your data set. The only parameter you can adjust is $$a_0$$. This means that to minimize the $$S_r$$ value, you need to solve:
<center>$$\begin{align*}
\frac{dS_r}{da_0}&=0
\end{align*}$$</center>
Here goes!
<center>$$\begin{align*}
\frac{dS_r}{da_0}=\frac{d}{da_0}\left(\sum_k\left(a_0-y_k\right)^2\right)&=0
\end{align*}$$</center>
The derivative of a sum is the same as the sum of derivatives, so put the derivative operator inside:
<center>$$\begin{align*}
\sum_k\frac{d}{da_0}\left(a_0-y_k\right)^2&=0
\end{align*}$$</center>
Use the power rule to get that $$d(u^2)=2u~du$$ and note that $$\frac{du}{da_0}=1$$ here:
<center>$$\begin{align*}
2\sum_k\left(a_0-y_k\right)&=0
\end{align*}$$</center>
Since we are setting the left side to 0, the 2 is irrelevant. Also, the summand can be split into two parts...
<center>$$\begin{align*}
\sum_k\left(a_0\right)-\sum_k\left(y_k\right)&=0
\end{align*}$$</center>
...and then the parts can be separated.
<center>$$\begin{align*}
\sum_k\left(a_0\right)&=\sum_k\left(y_k\right)
\end{align*}$$</center>
Recognize that $$a_0$$ is a constant; since you are adding that constant to itself once for each of the $$N$$ data points, you can replace the summation with:
<center>$$\begin{align*}
Na_0&=\sum_k\left(y_k\right)
\end{align*}$$</center>
Dividing by $$N$$ reveals the answer:
<center>$$\begin{align*}
a_0&=\frac{1}{N}\sum_k\left(y_k\right)=\bar{y}
\end{align*}$$</center>
The best constant with which to model a data set is its own average!
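As a quick numerical sanity check (again a sketch assuming NumPy; the sample values are made up), a least-squares fit of a degree-zero polynomial does indeed return the sample mean:
<syntaxhighlight lang="python">
import numpy as np

y = np.array([2.0, 4.0, 9.0])                # made-up sample data
a0 = np.polyfit(np.arange(len(y)), y, 0)[0]  # least-squares fit of a constant
print(a0, np.mean(y))                        # both print 5.0
</syntaxhighlight>
The same check works for any data set, which makes it a handy way to validate a regression routine.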