Problem 23

Question

(Calculus needed.) Consider the multiple regression model: $$Y_{i}=\beta_{1} X_{i 1}+\beta_{2} X_{i 2}+\varepsilon_{i} \quad i=1, \ldots, n$$ where the \(\varepsilon_{i}\) are uncorrelated, with \(E\left[\varepsilon_{i}\right]=0\) and \(\sigma^{2}\left\{\varepsilon_{i}\right\}=\sigma^{2}\).

a. State the least squares criterion and derive the least squares estimators of \(\beta_{1}\) and \(\beta_{2}\).

b. Assuming that the \(\varepsilon_{i}\) are independent normal random variables, state the likelihood function and obtain the maximum likelihood estimators of \(\beta_{1}\) and \(\beta_{2}\). Are these the same as the least squares estimators?

Step-by-Step Solution

Answer
The least squares estimators (LSE) and the maximum likelihood estimators (MLE) of β₁ and β₂ are the same under normality.
Step 1: Define the Least Squares Criterion
The least squares criterion chooses \(\beta_{1}\) and \(\beta_{2}\) to minimize the sum of the squared residuals, where the residual for each observation is the difference between the observed value \(Y_{i}\) and the model value \(\beta_{1}X_{i1} + \beta_{2}X_{i2}\): \[ RSS = \sum_{i=1}^{n} \left[ Y_{i} - (\beta_{1}X_{i1} + \beta_{2}X_{i2}) \right]^2 \]
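As a minimal sketch of the criterion, the Python function below evaluates RSS for trial values of \(\beta_{1}\) and \(\beta_{2}\). It assumes NumPy, and the arrays `x1`, `x2`, `y` are small hypothetical data chosen only for illustration, not data from the exercise.

```python
import numpy as np


def rss(b1, b2, x1, x2, y):
    """Least squares criterion for the no-intercept model Y = b1*X1 + b2*X2 + error."""
    residuals = y - (b1 * x1 + b2 * x2)  # observed minus model value
    return float(np.sum(residuals ** 2))


# Hypothetical illustrative data (not from the exercise)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0])
y = np.array([3.0, 4.0, 3.0, 5.0])

print(rss(1.0, 1.0, x1, x2, y))  # prints 1.0
```

The least squares estimates are the pair `(b1, b2)` that makes this function as small as possible.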
Step 2: Form the Normal Equations
To find the least squares estimators, take the partial derivatives of RSS with respect to \(\beta_{1}\) and \(\beta_{2}\) and set them to zero:
\[ \frac{\partial RSS}{\partial \beta_{1}} = -2 \sum_{i=1}^{n} X_{i1} \left( Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2} \right) = 0 \]
\[ \frac{\partial RSS}{\partial \beta_{2}} = -2 \sum_{i=1}^{n} X_{i2} \left( Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2} \right) = 0 \]
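One way to sanity-check these derivatives is to compare them with central finite differences of RSS. The sketch below does this in Python with NumPy at a single trial point. The data and the trial values of \(\beta_{1}\) and \(\beta_{2}\) are hypothetical. Because RSS is quadratic in the parameters, the two gradients should agree up to floating-point rounding.

```python
import numpy as np


def rss(beta, x1, x2, y):
    """RSS as a function of the parameter vector beta = (b1, b2)."""
    r = y - beta[0] * x1 - beta[1] * x2
    return float(np.sum(r ** 2))


def rss_gradient(beta, x1, x2, y):
    """Analytic partial derivatives -2*sum(X_ij * residual_i), j = 1, 2."""
    r = y - beta[0] * x1 - beta[1] * x2
    return np.array([-2.0 * np.sum(x1 * r), -2.0 * np.sum(x2 * r)])


def numeric_gradient(f, beta, h=1e-6):
    """Central finite-difference approximation of the gradient of f at beta."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        grad[j] = (f(beta + step) - f(beta - step)) / (2.0 * h)
    return grad


# Hypothetical data and trial point
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0])
y = np.array([3.0, 4.0, 3.0, 5.0])
beta = np.array([0.5, -0.3])

analytic = rss_gradient(beta, x1, x2, y)
numeric = numeric_gradient(lambda b: rss(b, x1, x2, y), beta)
print(analytic, numeric)
```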
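Step 3: Solve the Normal Equations
Dividing each equation from Step 2 by \(-2\) and rearranging gives two linear equations in \(\beta_{1}\) and \(\beta_{2}\), which can be written in matrix form:
\[ \begin{pmatrix} \sum X_{i1}^2 & \sum X_{i1}X_{i2} \\ \sum X_{i1}X_{i2} & \sum X_{i2}^2 \end{pmatrix} \begin{pmatrix} \beta_{1} \\ \beta_{2} \end{pmatrix} = \begin{pmatrix} \sum X_{i1}Y_{i} \\ \sum X_{i2}Y_{i} \end{pmatrix} \]
Provided the determinant \(\sum X_{i1}^2 \sum X_{i2}^2 - \left(\sum X_{i1}X_{i2}\right)^2\) is nonzero, solving this system gives the least squares estimators:
\[ \hat\beta_{1} = \frac{\sum X_{i2}^2 \sum X_{i1}Y_{i} - \sum X_{i1}X_{i2} \sum X_{i2}Y_{i}}{\sum X_{i1}^2 \sum X_{i2}^2 - \left(\sum X_{i1}X_{i2}\right)^2}, \qquad \hat\beta_{2} = \frac{\sum X_{i1}^2 \sum X_{i2}Y_{i} - \sum X_{i1}X_{i2} \sum X_{i1}Y_{i}}{\sum X_{i1}^2 \sum X_{i2}^2 - \left(\sum X_{i1}X_{i2}\right)^2} \]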
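The sketch below solves the 2×2 normal-equation system numerically in Python with NumPy and cross-checks the result in two ways. The first check compares it with the closed form from Cramer's rule, \(\hat\beta_1 = (S_{22}S_{1Y} - S_{12}S_{2Y})/D\) and \(\hat\beta_2 = (S_{11}S_{2Y} - S_{12}S_{1Y})/D\) with \(D = S_{11}S_{22} - S_{12}^2\). The second applies `numpy.linalg.lstsq` to the design matrix without an intercept column. Here \(S_{11} = \sum X_{i1}^2\), \(S_{12} = \sum X_{i1}X_{i2}\), \(S_{22} = \sum X_{i2}^2\), \(S_{1Y} = \sum X_{i1}Y_{i}\) and \(S_{2Y} = \sum X_{i2}Y_{i}\). This shorthand and the data are hypothetical and introduced only for this example.

```python
import numpy as np

# Hypothetical illustrative data (not from the exercise)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0])
y = np.array([3.0, 4.0, 3.0, 5.0])

# Sums that make up the normal-equation system
s11, s12, s22 = np.sum(x1 * x1), np.sum(x1 * x2), np.sum(x2 * x2)
s1y, s2y = np.sum(x1 * y), np.sum(x2 * y)

# 1) Solve the 2x2 linear system directly
A = np.array([[s11, s12], [s12, s22]])
rhs = np.array([s1y, s2y])
b_solve = np.linalg.solve(A, rhs)

# 2) Closed form from Cramer's rule (requires a nonzero determinant)
det = s11 * s22 - s12 ** 2
b1_hat = (s22 * s1y - s12 * s2y) / det
b2_hat = (s11 * s2y - s12 * s1y) / det

# 3) NumPy's least squares solver on the n x 2 design matrix (no intercept column)
X = np.column_stack([x1, x2])
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b_solve, (b1_hat, b2_hat), b_lstsq)
```

All three approaches should give the same pair of estimates for these data.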
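Step 4: State the Likelihood Function
Given that the \(\varepsilon_{i}\) are independent \(N(0, \sigma^2)\), each \(Y_{i}\) is normal with mean \(\beta_{1}X_{i1} + \beta_{2}X_{i2}\) and variance \(\sigma^2\), so the likelihood function is
\[ L(\beta_{1}, \beta_{2}, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2})^2}{2\sigma^2} \right) \]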
Step 5: Obtain the Log-Likelihood Function
Take the natural log of the likelihood function:
\[ \ln L = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ Y_{i} - (\beta_{1}X_{i1} + \beta_{2}X_{i2}) \right]^2 \]
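To check Steps 4 and 5 numerically, the Python sketch below uses NumPy to evaluate the likelihood as a product of normal densities and to evaluate the closed-form log-likelihood. It then confirms that the second equals the log of the first. The data and the trial values \(\beta_1 = \beta_2 = 1\), \(\sigma^2 = 0.5\) are hypothetical.

```python
import math

import numpy as np


def likelihood(b1, b2, sigma2, x1, x2, y):
    """Product of N(b1*X1 + b2*X2, sigma2) densities evaluated at the observed Y."""
    r = y - b1 * x1 - b2 * x2
    densities = np.exp(-r ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * math.pi * sigma2)
    return float(np.prod(densities))


def log_likelihood(b1, b2, sigma2, x1, x2, y):
    """Closed-form ln L: -n/2 ln(2 pi) - n/2 ln(sigma2) - RSS / (2 sigma2)."""
    n = y.size
    r = y - b1 * x1 - b2 * x2
    return float(-n / 2 * math.log(2 * math.pi) - n / 2 * math.log(sigma2)
                 - np.sum(r ** 2) / (2.0 * sigma2))


# Hypothetical data and trial parameter values
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0])
y = np.array([3.0, 4.0, 3.0, 5.0])

ll = log_likelihood(1.0, 1.0, 0.5, x1, x2, y)
print(ll, math.log(likelihood(1.0, 1.0, 0.5, x1, x2, y)))
```

With only four observations the product stays well within floating-point range. For large \(n\), however, a product of many densities can underflow to zero. That is a practical reason, besides easier differentiation, to work with \(\ln L\).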
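Step 6: Differentiate and Set to Zero
Only the last term of \(\ln L\) depends on \(\beta_{1}\) and \(\beta_{2}\). Differentiating with respect to each parameter and setting the results to zero gives
\[ \frac{\partial \ln L}{\partial \beta_{1}} = \frac{1}{\sigma^2} \sum_{i=1}^{n} X_{i1} \left( Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2} \right) = 0 \]
\[ \frac{\partial \ln L}{\partial \beta_{2}} = \frac{1}{\sigma^2} \sum_{i=1}^{n} X_{i2} \left( Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2} \right) = 0 \]
Multiplying each equation by \(-2\sigma^2\) recovers the normal equations of Step 2.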
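Step 7: Solve for Maximum Likelihood Estimators
Since \(\partial \ln L / \partial \beta_{j} = -\frac{1}{2\sigma^2}\, \partial RSS / \partial \beta_{j}\) for \(j = 1, 2\), these equations are equivalent to the normal equations and therefore have the same solution. Let \(\mathbf{X}\) be the \(n \times 2\) matrix whose \(i\)th row is \((X_{i1}, X_{i2})\), and let \(\mathbf{Y}\) be the vector of responses. The maximum likelihood estimators are then
\[ \begin{pmatrix} \hat\beta_{1} \\ \hat\beta_{2} \end{pmatrix} = \left( \mathbf{X}^{T}\mathbf{X} \right)^{-1} \mathbf{X}^{T}\mathbf{Y} \]
where \(\mathbf{X}^{T}\mathbf{X}\) is the matrix of sums \(\sum X_{i1}^2\), \(\sum X_{i1}X_{i2}\), \(\sum X_{i2}^2\) and \(\mathbf{X}^{T}\mathbf{Y}\) is the vector \(\left(\sum X_{i1}Y_{i}, \sum X_{i2}Y_{i}\right)^{T}\).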
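The matrix notation packs the sums from the normal equations into \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{Y}\). The Python sketch below uses NumPy and hypothetical data. It builds the \(n \times 2\) design matrix from the two predictor columns and checks that these products match the summation form.

```python
import numpy as np

# Hypothetical illustrative data (not from the exercise)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0])
y = np.array([3.0, 4.0, 3.0, 5.0])

X = np.column_stack([x1, x2])  # row i is (X_i1, X_i2); no intercept column
XtX = X.T @ X
XtY = X.T @ y

# The same quantities written as sums, as in the normal equations
sums_matrix = np.array([[np.sum(x1 ** 2), np.sum(x1 * x2)],
                        [np.sum(x1 * x2), np.sum(x2 ** 2)]])
sums_vector = np.array([np.sum(x1 * y), np.sum(x2 * y)])
print(np.allclose(XtX, sums_matrix), np.allclose(XtY, sums_vector))  # prints True True

# Solve (X^T X) beta = X^T Y rather than forming the inverse explicitly
beta_hat = np.linalg.solve(XtX, XtY)
print(beta_hat)
```

Calling `np.linalg.solve` instead of computing `np.linalg.inv(XtX) @ XtY` gives the same estimates here. It is also the usual numerically preferable choice.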
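Step 8: Compare Both Estimators
Yes: the MLEs of \(\beta_{1}\) and \(\beta_{2}\) are the same as the least squares estimators (LSE). Under the normality assumption for the errors \(\varepsilon_{i}\), \(\ln L\) depends on \(\beta_{1}\) and \(\beta_{2}\) only through the term \(-RSS/(2\sigma^2)\). Since \(\sigma^2 > 0\), maximizing \(\ln L\) over \(\beta_{1}\) and \(\beta_{2}\) is the same as minimizing RSS, whatever the value of \(\sigma^2\).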
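The comparison can also be checked numerically. For any fixed \(\sigma^2 > 0\), the gradient of \(\ln L\) with respect to \((\beta_1, \beta_2)\) should vanish at the least squares estimates, and randomly perturbed parameter values should never give a higher log-likelihood. The Python sketch below performs both checks using NumPy, hypothetical data, and three arbitrary values of \(\sigma^2\).

```python
import numpy as np


def log_likelihood(beta, sigma2, X, y):
    """ln L for the normal model; depends on beta only through -RSS / (2 sigma2)."""
    n = y.size
    r = y - X @ beta
    return -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma2) - (r @ r) / (2 * sigma2)


# Hypothetical illustrative data (not from the exercise)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0])
y = np.array([3.0, 4.0, 3.0, 5.0])
X = np.column_stack([x1, x2])

b_ls = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimates

rng = np.random.default_rng(0)
perturbations = rng.normal(size=(1000, 2))  # random offsets from the LSE

results = []
for sigma2 in (0.1, 1.0, 10.0):  # arbitrary trial variances
    grad = X.T @ (y - X @ b_ls) / sigma2  # d lnL / d beta evaluated at the LSE
    best = log_likelihood(b_ls, sigma2, X, y)
    others = [log_likelihood(b_ls + d, sigma2, X, y) for d in perturbations]
    results.append((sigma2, bool(np.allclose(grad, 0.0)), bool(best >= max(others))))

for row in results:
    print(row)
```

Each row reports the trial \(\sigma^2\), whether the gradient is numerically zero at the LSE, and whether no perturbed point beat it.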

Key Concepts

Least Squares Estimation
Least Squares Estimation is a fundamental method used to estimate the parameters of a regression model. The goal is to minimize the sum of the squared differences (residuals) between the observed values and the values predicted by the model. For models that are linear in the parameters, such as the one in this exercise, this minimization leads to a system of linear equations (the normal equations) that can usually be solved in closed form.

In the given exercise, the residual for each observation is defined as the difference between the observed value, \(Y_{i}\), and the predicted value, \(\beta_{1}X_{i1} + \beta_{2}X_{i2}\). The least squares criterion is then expressed as the Residual Sum of Squares (RSS):
\(RSS = \sum_{i=1}^{n} \left(Y_{i} - (\beta_{1}X_{i1} + \beta_{2}X_{i2})\right)^2\).

To find the least squares estimators (\(\hat{\beta}_{1}\) and \(\hat{\beta}_{2}\)), we minimize the RSS. This involves taking the partial derivatives of the RSS with respect to each parameter, setting them to zero, and solving the resulting equations for \(\beta_{1}\) and \(\beta_{2}\). The resulting estimates make the sum of the squared residuals as small as possible, giving the best fit of the model to the data in the least squares sense.
Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) is another crucial method for estimating the parameters of a statistical model. The idea is to find the parameter values that maximize the likelihood function, which measures how likely it is to observe the given data under different parameter values.

In this exercise, we assume the errors \(\varepsilon_{i}\) are independent and normally distributed with mean 0 and variance \(\sigma^2\). Under this assumption, the likelihood function \(L\) for the observed data is:
\[ L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( \frac{-(Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2})^2}{2\sigma^2} \right) \]

To simplify the optimization, we usually work with the natural log of the likelihood function, called the log-likelihood function \(\ln L\):
\[ \ln L = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( Y_{i} - (\beta_{1}X_{i1} + \beta_{2}X_{i2}) \right)^2 \]

We then differentiate the log-likelihood function with respect to the parameters, set these derivatives to zero, and solve for \(\beta_{1}\) and \(\beta_{2}\). This yields the maximum likelihood estimators, which, under the normality assumption, turn out to be the same as the least squares estimators.
Normal Equations
Normal Equations are a set of simultaneous linear equations derived from the method of least squares. These equations are used to find the estimators of the parameters in a linear regression model.

To derive these equations, we take the partial derivatives of the Residual Sum of Squares (RSS) with respect to each parameter and set them to zero. For the given model, this gives:
\[\frac{\partial RSS}{\partial \beta_{1}} = -2 \sum_{i=1}^{n} X_{i1} (Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2}) = 0 \]
\[\frac{\partial RSS}{\partial \beta_{2}} = -2 \sum_{i=1}^{n} X_{i2} (Y_{i} - \beta_{1}X_{i1} - \beta_{2}X_{i2}) = 0 \]

These partial derivatives, when set to zero, result in a system of linear equations known as the normal equations. In matrix form, these can be written as:
\[\begin{pmatrix} \sum X_{i1}^2 & \sum X_{i1}X_{i2} \\ \sum X_{i1}X_{i2} & \sum X_{i2}^2 \end{pmatrix} \begin{pmatrix} \beta_{1} \\ \beta_{2} \end{pmatrix} = \begin{pmatrix} \sum X_{i1}Y_{i} \\ \sum X_{i2}Y_{i} \end{pmatrix} \]
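By solving this system, we obtain the least squares estimators for \(\beta_{1}\) and \(\beta_{2}\). When the coefficient matrix is invertible, the solution is obtained by multiplying the vector of summed cross-products on the right-hand side by the inverse of the matrix, which gives the best-fit parameters. The matrix is invertible when the columns \((X_{11}, \ldots, X_{n1})\) and \((X_{12}, \ldots, X_{n2})\) are not constant multiples of each other.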
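Inverting the matrix is only possible when it is nonsingular, and that fails when one predictor column is a constant multiple of the other. As a hypothetical illustration in Python with NumPy, the sketch below makes the second predictor exactly twice the first. The 2×2 matrix then has determinant zero and rank 1. Different pairs \((\beta_1, \beta_2)\) with the same value of \(\beta_1 + 2\beta_2\) give identical fitted values, so the least squares estimates are not unique.

```python
import numpy as np

# Hypothetical data in which the second predictor is proportional to the first
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1
y = np.array([3.0, 4.0, 3.0, 5.0])

s11, s12, s22 = np.sum(x1 ** 2), np.sum(x1 * x2), np.sum(x2 ** 2)
det = s11 * s22 - s12 ** 2  # determinant of the normal-equation matrix
A = np.array([[s11, s12], [s12, s22]])
rank = np.linalg.matrix_rank(A)


def rss(b1, b2):
    """RSS for the hypothetical data above."""
    return float(np.sum((y - b1 * x1 - b2 * x2) ** 2))


# (1, 0) and (0, 0.5) share b1 + 2*b2 = 1, so they give the same fitted values
print(det, rank, rss(1.0, 0.0), rss(0.0, 0.5))
```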
Log-Likelihood Function
The log-likelihood function is a transformation of the likelihood function that simplifies maximization. The likelihood is a product of \(n\) normal densities, which is cumbersome to differentiate, and taking the natural logarithm turns the product into a sum. Because the logarithm is strictly increasing, \(L\) and \(\ln L\) are maximized by the same parameter values.
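Because \(\ln\) is strictly increasing, taking logs does not move the maximizer. The Python sketch below uses NumPy, hypothetical data, and a hypothetical fixed \(\sigma^2 = 1\). It evaluates both \(L\) and \(\ln L\) on the same grid of \((\beta_1, \beta_2)\) values and checks that they peak at the same grid point.

```python
import numpy as np

# Hypothetical illustrative data (not from the exercise)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0])
y = np.array([3.0, 4.0, 3.0, 5.0])
n = y.size
sigma2 = 1.0  # hypothetical fixed error variance

grid = np.linspace(0.0, 2.0, 201)  # candidate values for both b1 and b2
B1, B2 = np.meshgrid(grid, grid, indexing="ij")

# Residuals at every grid point, shape (201, 201, n), then RSS over the last axis
R = y - B1[..., None] * x1 - B2[..., None] * x2
rss = np.sum(R ** 2, axis=-1)

log_lik = -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma2) - rss / (2 * sigma2)
lik = np.exp(log_lik)

# Grid indices of the largest log-likelihood and the largest likelihood
i_log = np.unravel_index(np.argmax(log_lik), log_lik.shape)
i_lik = np.unravel_index(np.argmax(lik), lik.shape)
print(i_log == i_lik, grid[i_log[0]], grid[i_log[1]])
```

The maximizing grid point should lie close to the exact least squares solution. Its accuracy is limited only by the grid spacing.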

In this exercise, the log-likelihood function \(\ln L\) for our multiple regression model, given the normality assumption for the errors, is:
\[\ln L = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(Y_{i} - (\beta_{1} X_{i1} + \beta_{2} X_{i2})\right)^2 \]

By differentiating this log-likelihood function with respect to each parameter (\(\beta_{1}\) and \(\beta_{2}\)), setting the derivatives to zero, and solving, we find the maximum likelihood estimators. Under the normality assumption these estimators coincide with the least squares estimators. In the normal linear regression model, least squares can therefore be justified both as a fitting criterion and as maximum likelihood estimation.