25.4. Multiple Regression#
Regression provides one way of predicting a numerical variable, called a response, based on other variables called predictor variables. The multiple regression model says in essence that

$$
\text{observed response} ~ = ~ \text{linear function of the predictor variables} ~ + ~ \text{normal noise}
$$
You can think of the first term on the right hand side as a signal. The problem is that we don’t get to observe the signal. The observed response is the sum of the signal and the noise. The data scientist’s task is to use the observations to extract the signal as accurately as possible.
It is worth looking more closely at exactly what is linear in linear regression, now that we are allowing more than one predictor variable. For example, notice that you can fit a quadratic function of $x$ by taking the two predictor variables to be $x$ and $x^2$. The function

$$
\beta_0 + \beta_1 x + \beta_2 x^2
$$

is a quadratic function of $x$, but it is a linear function of the two predictor variables $x$ and $x^2$, and it is linear in the coefficients $\beta_0$, $\beta_1$, and $\beta_2$. That is the sense in which multiple regression is linear.
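To make this concrete, here is a minimal sketch (with made-up data and illustrative variable names) showing that fitting a quadratic in $x$ is just a linear model with the two predictor columns $x$ and $x^2$.

```python
import numpy as np

# Made-up data from a noisy quadratic in x (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 0.3, size=x.size)

# Design matrix with columns 1, x, x^2: the model is linear in the coefficients
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares fit of the three coefficients
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)   # close to [1, 2, -0.5]
```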
25.4.1. The Model#
As in all of statistical inference, properties of estimates depend on the assumptions under which they are calculated. The multiple regression model is a commonly used set of assumptions that describes a particular kind of linear relation between a numerical response variable and a set of predictor variables. You should use it only if you believe that it makes sense for your data.
The model assumes that there are $p-1$ predictor variables, and that the response of individual $i$ is given by

$$
Y_i ~ = ~ \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i, ~~~~ 1 \le i \le n
$$

in the notation described below.
- $x_{i,1}, x_{i,2}, \ldots, x_{i,p-1}$ are the observed constant values of the $p-1$ predictor variables for individual $i$. They are not random variables. If you prefer to think of the predictor variables as random, this model assumes that you have conditioned on them.
- The intercept $\beta_0$ and slopes $\beta_1, \beta_2, \ldots, \beta_{p-1}$ are unobservable constants and are parameters of the model. There are $p$ of them, hence the notation $p$ for "parameters".
- $\epsilon_i$ is an unobservable random error that has the normal $(0, \sigma^2)$ distribution for some unobservable $\sigma^2 > 0$, and $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are i.i.d.
- $Y_i$ is the observable response of individual $i$. It is random because $\epsilon_i$ is one of its components.
We will assume that $n > p$; typically, $n$ is quite a bit larger than $p$.
Two special cases are already familiar.
When $p = 1$ there are no predictor variables, and the model simply says that $Y_1, Y_2, \ldots, Y_n$ are i.i.d. normal $(\beta_0, \sigma^2)$ random variables.

When $p = 2$ there is a single predictor variable, and the model is that of simple linear regression. The two parameters are the intercept $\beta_0$ and a slope $\beta_1$. The model says that for each individual $i$,

$$
Y_i ~ = ~ \beta_0 + \beta_1 x_i + \epsilon_i
$$
25.4.2. Signal and Noise: Matrix Representation#
For any $n$ and $p$, the $n$ equations of the model (one for each individual) can be expressed compactly as the single matrix equation

$$
Y ~ = ~ X\beta + \epsilon
$$

in the matrix notation described below.
- The design matrix $X$ is an $n \times p$ matrix of real numbers, not random variables. Column $0$ of $X$ is a vector of 1's, and for $1 \le j \le p-1$, Column $j$ consists of the $n$ observations on the $j$th predictor variable. For each $i$ in the range 1 through $n$, Row $i$ contains the values of all the predictor variables for individual $i$.
- The parameter vector $\beta$ is a $p \times 1$ vector of the $p$ coefficients $\beta_0, \beta_1, \ldots, \beta_{p-1}$.
- The error vector $\epsilon$ is an $n \times 1$ multivariate normal random vector. Its mean vector is an $n \times 1$ vector of 0's and its covariance matrix is $\sigma^2 I_n$, where $I_n$ is the $n \times n$ identity matrix.
- The response vector $Y$ is an $n \times 1$ random vector that is the sum of the linear signal $X\beta$ and the normal noise $\epsilon$.
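As a sketch of the matrix form of the model, the following code (with arbitrary made-up values for $n$, $p$, $X$, $\beta$, and $\sigma$) generates a response vector $Y = X\beta + \epsilon$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3          # illustrative sizes: n observations, p coefficients
sigma = 2.0            # true error SD, unknown in practice

# Design matrix: column of 1's followed by p-1 columns of observed predictor values
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])

beta = np.array([5.0, 2.0, -1.0])        # true (unobservable) parameter vector
epsilon = rng.normal(0, sigma, size=n)   # i.i.d. normal (0, sigma^2) errors

Y = X @ beta + epsilon                   # response = signal + noise
print(X.shape, Y.shape)                  # (100, 3) (100,)
```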
25.4.3. Ordinary Least Squares#
Based on the observations of the predictor variables and the response, the goal is to find the best estimates of the intercept and slopes in the model.
These estimates can then be used to predict the response of a new individual, assuming that the model holds for the new individual as well.
We must select a criterion by which we will decide whether one estimate is better than another. To develop one such criterion, start by noting that any linear function of the predictor variables can be written as $X\gamma$ for some $p \times 1$ vector of coefficients $\gamma$.
The goal of ordinary least squares (OLS) is to find the vector $\gamma$ that minimizes the mean squared error

$$
MSE(\gamma) ~ = ~ \frac{1}{n}\sum_{i=1}^n \big(Y_i - (X\gamma)_i\big)^2
$$

This is the same as the $\gamma$ that minimizes the sum of squared errors $\sum_{i=1}^n \big(Y_i - (X\gamma)_i\big)^2$, since the two quantities differ only by the constant factor $n$.
Again for compactness it will help to use matrix notation. For an $n \times 1$ vector $w$, define

$$
\|w\|^2 ~ = ~ \sum_{i=1}^n w_i^2 ~ = ~ w^Tw
$$

which is sometimes called the squared norm of $w$. In this notation, the goal of OLS is to find the $p \times 1$ vector $\gamma$ that minimizes $\|Y - X\gamma\|^2$.
Typically you will also have to estimate the unknown error variance $\sigma^2$; we will address that later in the section.
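As a small illustration of the criterion (using simulated data and made-up candidate coefficient vectors), the squared norm $\|Y - X\gamma\|^2$ can be computed directly; OLS looks for the $\gamma$ that makes it as small as possible.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
Y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0, 2.0, size=n)

def sum_squared_errors(gamma):
    """Squared norm of the error vector Y - X @ gamma."""
    error = Y - X @ gamma
    return error @ error

# Two made-up candidate coefficient vectors; the smaller value is the better fit
print(sum_squared_errors(np.array([0.0, 0.0, 0.0])))
print(sum_squared_errors(np.array([5.0, 2.0, -1.0])))
```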
25.4.4. Guessing the Best Estimate of $\beta$#
Remember that we have assumed that $n > p$. We will also assume that the columns of the design matrix $X$ are linearly independent, so that the $p \times p$ matrix $X^TX$ is invertible.

The claim is that the OLS estimate of $\beta$ is

$$
\hat{\beta} ~ = ~ (X^TX)^{-1}X^TY
$$
The claim is motivated by our earlier formula

$$
\Sigma_{\mathbf{X}}^{-1}\Sigma_{\mathbf{X}Y}
$$

for the coefficients of the least squares linear predictor of a random variable $Y$ based on a random vector $\mathbf{X}$. In the regression setting, $X^TX$ plays the role of the covariance matrix of the predictors, and $X^TY$ plays the role of the vector of covariances between the predictors and the response.
The key idea is that of projection: the best estimate of the signal is the projection of the response vector $Y$ onto the subspace spanned by the columns of $X$, namely the vector of fitted values $X\hat{\beta}$.

The error in the best estimate is $Y - X\hat{\beta}$, which should be orthogonal to every vector in that subspace.
We have assumed that $X$ is a matrix of constants, not random variables. So $(X^TX)^{-1}X^T$ is also a matrix of constants.

Before we go further, notice that this makes $\hat{\beta} = (X^TX)^{-1}X^TY$ a linear function of $Y$.

Also note that the estimate of the signal is

$$
\hat{Y} ~ = ~ X\hat{\beta} ~ = ~ X(X^TX)^{-1}X^TY
$$

which is also a linear function of $Y$.
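Here is a sketch of the formula in code, on simulated data with made-up true coefficients. Solving the system $(X^TX)\hat{\beta} = X^TY$ with `np.linalg.solve` is numerically preferable to forming the inverse explicitly, but it corresponds to the same formula.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
beta = np.array([5.0, 2.0, -1.0])      # true parameters, unknown in practice
Y = X @ beta + rng.normal(0, 2.0, size=n)

# OLS estimate: beta_hat = (X'X)^{-1} X'Y, computed via the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat                   # estimated signal, also a linear function of Y

print(beta_hat)                        # close to [5, 2, -1]
```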
25.4.5. Projection#
Define the residual vector

$$
e ~ = ~ Y - \hat{Y} ~ = ~ Y - X\hat{\beta}
$$
As we have seen repeatedly, the key to least squares is that the prediction error is orthogonal to the space of allowed functions. Our space of allowed functions is all linear functions of the columns of $X$, that is, all vectors of the form $X\gamma$.
The residual vector is orthogonal to each column of $X$:

$$
X^Te ~ = ~ \mathbf{0}
$$

This is essentially true by construction. Formally, calculate the $p \times 1$ vector $X^Te$:

$$
X^Te ~ = ~ X^T(Y - X\hat{\beta}) ~ = ~ X^TY - X^TX(X^TX)^{-1}X^TY ~ = ~ X^TY - X^TY ~ = ~ \mathbf{0}
$$

It follows that $e$ is orthogonal to every vector of the form $X\gamma$, because $(X\gamma)^Te = \gamma^TX^Te = 0$.
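A quick numerical check of the orthogonality, on the same kind of simulated setup as before (all names are illustrative): the vector $X^Te$ should be zero up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
Y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0, 2.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residuals = Y - X @ beta_hat

# Each column of X is orthogonal to the residual vector (zero up to rounding error)
print(X.T @ residuals)
```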
25.4.6. Least Squares#
Let $\gamma$ be any $p \times 1$ vector. We will show that $\|Y - X\gamma\|^2 \ge \|Y - X\hat{\beta}\|^2$, which establishes that $\hat{\beta}$ minimizes the mean squared error.

Write

$$
Y - X\gamma ~ = ~ (Y - X\hat{\beta}) + (X\hat{\beta} - X\gamma) ~ = ~ e + X(\hat{\beta} - \gamma)
$$

The second term is a linear combination of the columns of $X$ and hence is orthogonal to $e$. So the cross-product term vanishes when the squared norm is expanded, and

$$
\|Y - X\gamma\|^2 ~ = ~ \|e\|^2 + \|X(\hat{\beta} - \gamma)\|^2 ~ \ge ~ \|e\|^2 ~ = ~ \|Y - X\hat{\beta}\|^2
$$

Thus $\hat{\beta}$ is the least squares estimate of $\beta$: no other choice of coefficients produces a smaller mean squared error.
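The inequality can also be checked numerically on simulated data: perturbing $\hat{\beta}$ in any direction (here a made-up random perturbation) can only increase the sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
Y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0, 2.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

def sse(gamma):
    error = Y - X @ gamma
    return error @ error

# Any other coefficient vector gives at least as large a sum of squared errors
gamma = beta_hat + rng.normal(0, 0.5, size=p)   # arbitrary perturbation of beta_hat
print(sse(beta_hat) <= sse(gamma))              # True
```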
25.4.7. Signal and Noise, Revisited#
Our regression model is

$$
Y ~ = ~ X\beta + \epsilon
$$
Here:

- $X\beta$ is the unobservable but non-random true signal.
- $\epsilon$ is an unobservable random vector consisting of the deviations of $Y$ from the true plane $X\beta$. Elements of $\epsilon$ are mutually independent.
Once we have carried out the regression, our estimate of the response vector $Y$ is the vector of fitted values

$$
\hat{Y} ~ = ~ X\hat{\beta}
$$
The residual vector is

$$
e ~ = ~ Y - \hat{Y}
$$
Therefore we have another expression for the response vector:

$$
Y ~ = ~ \hat{Y} + e
$$
It is important to note the distinction between this identity and the model.
- $\hat{Y} = X\hat{\beta}$ is the observable random estimated signal.
- $e$ is the observable random vector consisting of the deviations of $Y$ from the estimated plane $X\hat{\beta}$. Elements of $e$ are not independent of each other, because they add up to 0 (recall that $e$ is orthogonal to the column of 1's in $X$).
In exercises you will show that $\hat{\beta}$ is an unbiased estimator of $\beta$, that is, $E(\hat{\beta}) = \beta$, and that the covariance matrix of $\hat{\beta}$ is $\sigma^2(X^TX)^{-1}$.
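A simulation sketch of the unbiasedness claim (with made-up true parameters): holding the design matrix fixed and generating fresh errors many times, the average of the estimates $\hat{\beta}$ should be close to $\beta$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
beta = np.array([5.0, 2.0, -1.0])
sigma = 2.0

# Repeat the experiment: same X, fresh noise each time, estimate beta each time
estimates = []
for _ in range(10_000):
    Y = X @ beta + rng.normal(0, sigma, size=n)
    estimates.append(np.linalg.solve(X.T @ X, X.T @ Y))

print(np.mean(estimates, axis=0))   # close to [5, 2, -1]
```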
25.4.8. Estimate of $\sigma^2$#
It should come as no surprise that under the multiple regression model, there is an unbiased estimator of $\sigma^2$ based on the residual vector $e$.
Some more work establishes that

$$
S^2 ~ = ~ \frac{\|e\|^2}{n - p}
$$

is an unbiased estimator of $\sigma^2$.

We'll leave that work for another course. For now, just notice that if the number of data points $n$ is large compared to the number of parameters $p$, then $S^2$ is close to

$$
\frac{\|e\|^2}{n} ~ = ~ \frac{1}{n}\sum_{i=1}^n e_i^2
$$

which is the natural mean squared error. If you have a lot of data, you don't have to worry about fine points like dividing by $n - p$ instead of $n$.
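In code, the unbiased estimate is just the squared norm of the residual vector divided by $n - p$ (simulated data with made-up true values; here the true $\sigma$ is 2).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
Y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0, 2.0, size=n)   # true sigma^2 = 4

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residuals = Y - X @ beta_hat

S2 = residuals @ residuals / (n - p)     # unbiased estimate of sigma^2
print(S2, residuals @ residuals / n)     # the two are close when n is much larger than p
```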
Special Case
As noted earlier, in the case $p = 1$ there are no predictor variables, and the model simply says that $Y_1, Y_2, \ldots, Y_n$ are i.i.d. normal $(\beta_0, \sigma^2)$.

You know that the least squares constant is the sample mean $\bar{Y}$, so the residuals are the deviations $Y_i - \bar{Y}$. In this case the unbiasedness of $S^2$ says that

$$
S^2 ~ = ~ \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2
$$

is an unbiased estimate of $\sigma^2$.
25.4.9. Confidence Intervals#
The upshot of this discussion is that if the multiple regression model is a good approximation for your data and $n$ is large compared to $p$, then the estimates $\hat{\beta}$ and $S^2$ can be used to construct approximate confidence intervals for the parameters.

For example, a 95% confidence interval for the parameter $\beta_j$ is approximately

$$
\hat{\beta}_j ~ \pm ~ 2\widehat{SD}(\hat{\beta}_j)
$$

Here $\widehat{SD}(\hat{\beta}_j)$ is the estimated standard deviation of the estimator $\hat{\beta}_j$.

The variance of $\hat{\beta}_j$ is the $j$th diagonal element of the covariance matrix $\sigma^2(X^TX)^{-1}$, so it can be estimated by the $j$th diagonal element of $S^2(X^TX)^{-1}$. The estimated SD $\widehat{SD}(\hat{\beta}_j)$ is the square root of that estimate.
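Here is a sketch of the whole calculation on simulated data (made-up true parameters): estimate the coefficients, estimate $\sigma^2$, pull the estimated variances of the $\hat{\beta}_j$ off the diagonal of $S^2(X^TX)^{-1}$, and form the intervals $\hat{\beta}_j \pm 2\,\widehat{SD}(\hat{\beta}_j)$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
beta = np.array([5.0, 2.0, -1.0])              # true parameters, used only to simulate
Y = X @ beta + rng.normal(0, 2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
residuals = Y - X @ beta_hat
S2 = residuals @ residuals / (n - p)           # estimate of sigma^2

sd_hat = np.sqrt(S2 * np.diag(XtX_inv))        # estimated SDs of the beta_hat_j
lower, upper = beta_hat - 2 * sd_hat, beta_hat + 2 * sd_hat

for j in range(p):
    print(f"beta_{j}: approximate 95% CI ({lower[j]:.3f}, {upper[j]:.3f})")
```

Each interval should cover the corresponding made-up true coefficient in roughly 95% of repetitions of the simulation.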