Problem 8

Question

The point of this exercise is to show that tests for functional form cannot be relied on as a general test for omitted variables. Suppose that, conditional on the explanatory variables $x_{1}$ and $x_{2}$, a linear model relating $y$ to $x_{1}$ and $x_{2}$ satisfies the Gauss-Markov assumptions:
$$\begin{aligned}y &=\beta_{0}+\beta_{1} x_{1}+\beta_{2} x_{2}+u \\ \mathrm{E}\left(u | x_{1}, x_{2}\right) &=0 \\ \operatorname{Var}\left(u | x_{1}, x_{2}\right) &=\sigma^{2}\end{aligned}$$
To make the question interesting, assume $\beta_{2} \neq 0$. Suppose further that $x_{2}$ has a simple linear relationship with $x_{1}$:
$$\begin{aligned}x_{2} &=\delta_{0}+\delta_{1} x_{1}+r \\ \mathrm{E}\left(r | x_{1}\right) &=0 \\ \operatorname{Var}\left(r | x_{1}\right) &=\tau^{2}\end{aligned}$$

i. Show that
$$\mathrm{E}\left(y | x_{1}\right)=\left(\beta_{0}+\beta_{2} \delta_{0}\right)+\left(\beta_{1}+\beta_{2} \delta_{1}\right) x_{1}.$$
Under random sampling, what is the probability limit of the OLS estimator from the simple regression of $y$ on $x_{1}$? Is the simple regression estimator generally consistent for $\beta_{1}$?

ii. If you run the regression of $y$ on $x_{1}, x_{1}^{2}$, what will be the probability limit of the OLS estimator of the coefficient on $x_{1}^{2}$? Explain.

iii. Using substitution, show that we can write
$$y=\left(\beta_{0}+\beta_{2} \delta_{0}\right)+\left(\beta_{1}+\beta_{2} \delta_{1}\right) x_{1}+u+\beta_{2} r.$$
It can be shown that, if we define $v=u+\beta_{2} r$, then $\mathrm{E}\left(v | x_{1}\right)=0$ and $\operatorname{Var}\left(v | x_{1}\right)=\sigma^{2}+\beta_{2}^{2} \tau^{2}$. What consequences does this have for the $t$ statistic on $x_{1}^{2}$ from the regression in part (ii)?

iv. What do you conclude about adding a nonlinear function of $x_{1}$ (in particular, $x_{1}^{2}$) in an attempt to detect omission of $x_{2}$?

Step-by-Step Solution


Answer

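Tests for functional form cannot be relied on to detect omitted variables. Here the coefficient on $x_{1}^{2}$ has probability limit zero and its $t$ statistic has its usual null distribution, even though omitting $x_{2}$ makes the simple regression estimator inconsistent for $\beta_{1}$.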

Step 1: Derive the Conditional Expectation

Substituting $x_{2} = \delta_{0} + \delta_{1} x_{1} + r$ into $y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + u$ gives
\[ y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} (\delta_{0} + \delta_{1} x_{1} + r) + u , \]
and grouping terms,
\[ y = (\beta_{0} + \beta_{2} \delta_{0}) + (\beta_{1} + \beta_{2} \delta_{1}) x_{1} + \beta_{2} r + u . \]
Now take expectations conditional on $x_{1}$. By assumption $\mathrm{E}(r | x_{1}) = 0$, and by the law of iterated expectations $\mathrm{E}(u | x_{1}) = \mathrm{E}[\mathrm{E}(u | x_{1}, x_{2}) | x_{1}] = 0$. Hence
\[ \mathrm{E}(y | x_{1}) = (\beta_{0} + \beta_{2} \delta_{0}) + (\beta_{1} + \beta_{2} \delta_{1}) x_{1} , \]
so the conditional expectation of $y$ given $x_{1}$ alone is linear in $x_{1}$.
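As a numerical sanity check of Step 1, the sketch below simulates many draws of $(r, u)$ at a few fixed values of $x_1$ and compares the average of $y$ with the formula. The parameter values, the normal distributions for $r$ and $u$, and the function names are hypothetical choices for illustration; the derivation itself does not require normality.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
B0, B1, B2 = 1.0, 0.5, 2.0   # beta_0, beta_1, beta_2 (beta_2 != 0)
D0, D1 = 0.0, 0.8            # delta_0, delta_1
SIGMA, TAU = 1.0, 1.0        # standard deviations of u and r


def draw_y(x1, rng):
    """Draw one y at a fixed x1 using the two structural equations."""
    r = rng.gauss(0.0, TAU)              # E(r | x1) = 0, Var(r | x1) = tau^2
    x2 = D0 + D1 * x1 + r                # x2 = delta_0 + delta_1 x1 + r
    u = rng.gauss(0.0, SIGMA)            # E(u | x1, x2) = 0, Var = sigma^2
    return B0 + B1 * x1 + B2 * x2 + u


def mc_mean_y_given_x1(x1, reps=100_000, seed=1):
    """Monte Carlo estimate of E(y | x1) at one value of x1."""
    rng = random.Random(seed)
    return sum(draw_y(x1, rng) for _ in range(reps)) / reps


def formula_mean_y_given_x1(x1):
    """E(y | x1) = (beta_0 + beta_2 delta_0) + (beta_1 + beta_2 delta_1) x1."""
    return (B0 + B2 * D0) + (B1 + B2 * D1) * x1


for x1 in (-1.0, 0.0, 2.0):
    print(f"x1={x1:+.1f}  simulated={mc_mean_y_given_x1(x1):.3f}  "
          f"formula={formula_mean_y_given_x1(x1):.3f}")
```

The simulated averages should agree with the formula up to Monte Carlo error, which shrinks as `reps` grows.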

Step 2: Check Consistency of the OLS Estimator

Because $\mathrm{E}(y | x_{1})$ is linear in $x_{1}$, the simple regression of $y$ on $x_{1}$ consistently estimates this conditional expectation under random sampling: the slope estimator, call it $\tilde{\beta}_{1}$, satisfies $\operatorname{plim} \tilde{\beta}_{1} = \beta_{1} + \beta_{2} \delta_{1}$. Since $\beta_{2} \neq 0$ by assumption, $\tilde{\beta}_{1}$ is inconsistent for $\beta_{1}$ unless $\delta_{1} = 0$, that is, unless $x_{1}$ and $x_{2}$ are uncorrelated. The inconsistency $\beta_{2} \delta_{1}$ is the usual omitted variable bias.
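The probability limit can be illustrated by simulation. The sketch below draws one large random sample from the model, using hypothetical parameter values and normal draws for $x_1$, $r$, and $u$, and computes the simple regression slope as the sample covariance of $x_1$ and $y$ divided by the sample variance of $x_1$. With these values, $\beta_1 + \beta_2\delta_1 = 2.1$ while $\beta_1 = 0.5$.

```python
import random

# Hypothetical parameter values (illustration only).
B0, B1, B2 = 1.0, 0.5, 2.0
D0, D1 = 0.0, 0.8
SIGMA, TAU = 1.0, 1.0


def simulate(n, seed):
    """Random sample of (x1, y); x2 is generated but then omitted."""
    rng = random.Random(seed)
    x1s, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = D0 + D1 * x1 + rng.gauss(0.0, TAU)
        ys.append(B0 + B1 * x1 + B2 * x2 + rng.gauss(0.0, SIGMA))
        x1s.append(x1)
    return x1s, ys


def simple_ols_slope(x, y):
    """Slope from the OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


x1s, ys = simulate(n=100_000, seed=2)
slope = simple_ols_slope(x1s, ys)
print(f"simple regression slope: {slope:.3f}")
print(f"beta_1 + beta_2*delta_1: {B1 + B2 * D1:.3f}   beta_1: {B1:.3f}")
```

Increasing `n` pulls the slope toward $\beta_1 + \beta_2\delta_1$, not toward $\beta_1$.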

Step 3: Analyze the OLS Estimator with a Quadratic Term

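In the regression of $y$ on $x_{1}$ and $x_{1}^{2}$, write $y$ in terms of $x_{1}$ alone, as in part (iii):
\[ y = (\beta_{0} + \beta_{2} \delta_{0}) + (\beta_{1} + \beta_{2} \delta_{1}) x_{1} + v, \qquad v = u + \beta_{2} r . \]
From Step 1, $\mathrm{E}(y | x_{1})$ is linear in $x_{1}$, so $\mathrm{E}(v | x_{1}) = 0$ and $v$ is uncorrelated with every function of $x_{1}$, including $x_{1}^{2}$. The population regression of $y$ on $1, x_{1}, x_{1}^{2}$ therefore has a zero coefficient on $x_{1}^{2}$ (and a coefficient of $\beta_{1} + \beta_{2} \delta_{1}$ on $x_{1}$). Under random sampling, the OLS estimator of the coefficient on $x_{1}^{2}$ has probability limit zero.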
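A simulation sketch of part (ii), under hypothetical parameter values and normal draws: it computes the OLS coefficient on $x_1^2$ in the regression of $y$ on $1, x_1, x_1^2$ using the Frisch-Waugh-Lovell theorem (residualize $x_1^2$ on a constant and $x_1$, then regress $y$ on those residuals). The helper names are illustrative. In a large sample the coefficient should be close to zero.

```python
import random

# Hypothetical parameter values (illustration only).
B0, B1, B2 = 1.0, 0.5, 2.0
D0, D1 = 0.0, 0.8
SIGMA, TAU = 1.0, 1.0


def simulate(n, seed):
    """Random sample of (x1, y) from the model; x2 is omitted."""
    rng = random.Random(seed)
    x1s, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = D0 + D1 * x1 + rng.gauss(0.0, TAU)
        ys.append(B0 + B1 * x1 + B2 * x2 + rng.gauss(0.0, SIGMA))
        x1s.append(x1)
    return x1s, ys


def residualize(z, x):
    """Residuals from the OLS regression of z on a constant and x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = (sum((a - mx) * (c - mz) for a, c in zip(x, z))
         / sum((a - mx) ** 2 for a in x))
    return [c - mz - b * (a - mx) for a, c in zip(x, z)]


def coef_on_square(x1s, ys):
    """OLS coefficient on x1^2 in y ~ 1 + x1 + x1^2, via Frisch-Waugh-Lovell."""
    e = residualize([a * a for a in x1s], x1s)   # part of x1^2 orthogonal to (1, x1)
    return sum(ei * yi for ei, yi in zip(e, ys)) / sum(ei * ei for ei in e)


x1s, ys = simulate(n=100_000, seed=3)
gamma_hat = coef_on_square(x1s, ys)
print(f"coefficient on x1^2: {gamma_hat:.4f}")
```

The Frisch-Waugh-Lovell route avoids a matrix library: because the residuals `e` are orthogonal to the constant and $x_1$, regressing $y$ on `e` reproduces the multiple regression coefficient exactly.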

Step 4: Understand the t-statistic Implication

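Define $v = u + \beta_{2} r$. Because $\mathrm{E}(v | x_{1}) = 0$ and $\operatorname{Var}(v | x_{1}) = \sigma^{2} + \beta_{2}^{2} \tau^{2}$ does not depend on $x_{1}$, the equation $y = (\beta_{0} + \beta_{2} \delta_{0}) + (\beta_{1} + \beta_{2} \delta_{1}) x_{1} + v$ satisfies the Gauss-Markov assumptions with $x_{1}$ as the only explanatory variable. Adding $x_{1}^{2}$ adds an irrelevant regressor whose population coefficient is zero, and since $x_{1}^{2}$ is a function of $x_{1}$, the error remains homoskedastic given $x_{1}$ and $x_{1}^{2}$. The usual $t$ statistic on $x_{1}^{2}$ is therefore asymptotically standard normal: in large samples it rejects at roughly the nominal significance level (about 5% of the time at the 5% level), no matter how large $\beta_{2}$ is. The test has no power to detect the omission of $x_{2}$.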
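The claim about the $t$ statistic can be checked by Monte Carlo. The sketch below repeatedly draws samples of size 200, runs the regression of $y$ on $1, x_1, x_1^2$, and records how often the usual $t$ statistic on $x_1^2$ exceeds 1.96 in absolute value. The parameter values, sample size, number of replications, and normal draws are hypothetical; the standard error is the usual homoskedasticity-only one, computed through the Frisch-Waugh-Lovell theorem. If the reasoning above holds, the rejection rate should be near 5%.

```python
import math
import random

# Hypothetical parameter values (illustration only).
B0, B1, B2 = 1.0, 0.5, 2.0
D0, D1 = 0.0, 0.8
SIGMA, TAU = 1.0, 1.0


def simulate(n, rng):
    """One random sample of (x1, y); x2 is generated and then omitted."""
    x1s, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = D0 + D1 * x1 + rng.gauss(0.0, TAU)
        ys.append(B0 + B1 * x1 + B2 * x2 + rng.gauss(0.0, SIGMA))
        x1s.append(x1)
    return x1s, ys


def residualize(z, x):
    """Residuals from the OLS regression of z on a constant and x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = (sum((a - mx) * (c - mz) for a, c in zip(x, z))
         / sum((a - mx) ** 2 for a in x))
    return [c - mz - b * (a - mx) for a, c in zip(x, z)]


def t_stat_on_square(x1s, ys):
    """Usual OLS t statistic on x1^2 in y ~ 1 + x1 + x1^2."""
    e = residualize([a * a for a in x1s], x1s)
    see = sum(ei * ei for ei in e)
    gamma = sum(ei * yi for ei, yi in zip(e, ys)) / see
    ry = residualize(ys, x1s)
    # By Frisch-Waugh-Lovell, the full-regression residuals equal ry - gamma * e.
    ssr = sum((a - gamma * b) ** 2 for a, b in zip(ry, e))
    s2 = ssr / (len(ys) - 3)          # degrees of freedom: n minus 3 parameters
    return gamma / math.sqrt(s2 / see)


def rejection_rate(n=200, reps=1000, crit=1.96, seed=4):
    """Share of replications in which |t| on x1^2 exceeds crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x1s, ys = simulate(n, rng)
        if abs(t_stat_on_square(x1s, ys)) > crit:
            hits += 1
    return hits / reps


rate = rejection_rate()
print(f"rejection rate of the x1^2 t test at the 5% level: {rate:.3f}")
```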

Step 5: Conclusion on the Use of Nonlinear Functions

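Adding $x_{1}^{2}$ cannot detect the omission of $x_{2}$ in this setting. Because $x_{2}$ is linearly related to $x_{1}$, omitting $x_{2}$ leaves $\mathrm{E}(y | x_{1})$ linear in $x_{1}$: the coefficient on $x_{1}^{2}$ has probability limit zero, and its $t$ statistic behaves as it would in a correctly specified model. A functional form test only checks whether $\mathrm{E}(y | x_{1})$ is correctly specified as a function of $x_{1}$; here it is, even though the simple regression slope converges to $\beta_{1} + \beta_{2} \delta_{1}$ rather than $\beta_{1}$. Passing a functional form test therefore does not rule out omitted variable bias.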
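One way to see the conclusion numerically: if the $x_1^2$ test could detect the omission of $x_2$, its rejection rate should rise as $\beta_2$ (the importance of the omitted variable) grows. The sketch below, again with hypothetical parameter values and normal draws, estimates the rejection rate of the 5% $t$ test on $x_1^2$ for several values of $\beta_2$, alongside the average simple regression slope. The slope moves further from $\beta_1$ as $\beta_2$ grows, while the rejection rate should stay near 5%.

```python
import math
import random

# Hypothetical parameter values (illustration only); beta_2 is varied below.
B0, B1 = 1.0, 0.5
D0, D1 = 0.0, 0.8
SIGMA, TAU = 1.0, 1.0


def simulate(n, b2, rng):
    """One random sample of (x1, y) with x2 omitted, for a given beta_2."""
    x1s, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = D0 + D1 * x1 + rng.gauss(0.0, TAU)
        ys.append(B0 + B1 * x1 + b2 * x2 + rng.gauss(0.0, SIGMA))
        x1s.append(x1)
    return x1s, ys


def residualize(z, x):
    """Residuals from the OLS regression of z on a constant and x, plus the slope."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = (sum((a - mx) * (c - mz) for a, c in zip(x, z))
         / sum((a - mx) ** 2 for a in x))
    return [c - mz - b * (a - mx) for a, c in zip(x, z)], b


def study(b2, n=200, reps=500, seed=5):
    """Return (rejection rate of |t| > 1.96 on x1^2, mean simple regression slope)."""
    rng = random.Random(seed)
    hits, slope_sum = 0, 0.0
    for _ in range(reps):
        x1s, ys = simulate(n, b2, rng)
        e, _ = residualize([a * a for a in x1s], x1s)
        ry, slope = residualize(ys, x1s)   # slope of the simple regression of y on x1
        see = sum(ei * ei for ei in e)
        gamma = sum(ei * yi for ei, yi in zip(e, ys)) / see
        ssr = sum((a - gamma * b) ** 2 for a, b in zip(ry, e))
        t = gamma / math.sqrt(ssr / (n - 3) / see)
        if abs(t) > 1.96:
            hits += 1
        slope_sum += slope
    return hits / reps, slope_sum / reps


results = {b2: study(b2) for b2 in (0.5, 2.0, 5.0)}
for b2, (rate, slope) in results.items():
    print(f"beta_2={b2}: rejection rate={rate:.3f}, mean slope={slope:.3f} "
          f"(beta_1={B1}, beta_1+beta_2*delta_1={B1 + b2 * D1:.2f})")
```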

Key Concepts

Gauss-Markov Assumptions, Conditional Expectation, OLS Estimator, Nonlinear Function

Gauss-Markov Assumptions

The Gauss-Markov assumptions are fundamental to Ordinary Least Squares (OLS) estimation. Under these assumptions, the OLS estimator is the Best Linear Unbiased Estimator (BLUE). The three assumptions stated in this problem are listed below; the full set also requires random sampling and no perfect collinearity among the explanatory variables.

1. **Linearity**: The model is linear in the parameters: \[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \] Here, $y$ is the dependent variable, $x_1$ and $x_2$ are the independent (explanatory) variables, and $u$ is the error term.

2. **Exogeneity**: The error term has expected value zero given the explanatory variables, \[ \mathrm{E}(u | x_1, x_2) = 0 , \] which implies that $u$ is uncorrelated with $x_1$, $x_2$, and any function of them. This assumption is crucial for unbiasedness.

3. **Homoscedasticity**: The variance of the error does not depend on the values of the explanatory variables: \[ \operatorname{Var}(u | x_1, x_2) = \sigma^2 \]

When these assumptions hold, OLS is BLUE: among all linear unbiased estimators, it has the smallest variance. Zero conditional mean delivers unbiasedness, while homoscedasticity is what makes OLS efficient and justifies the usual OLS standard errors.
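To illustrate these assumptions at work in this problem, the sketch below draws one large sample in which all three hold (hypothetical parameter values, normal draws) and runs OLS of $y$ on $1, x_1, x_2$. Each slope is computed by the Frisch-Waugh-Lovell theorem: residualize one regressor on a constant and the other, then regress $y$ on those residuals. With $x_2$ included, the estimates should be close to the true $\beta_1$ and $\beta_2$.

```python
import random

# Hypothetical parameter values (illustration only).
B0, B1, B2 = 1.0, 0.5, 2.0
D0, D1 = 0.0, 0.8
SIGMA, TAU = 1.0, 1.0


def simulate(n, seed):
    """Random sample of (x1, x2, y); u is drawn independently of x1 and x2,
    so linearity, zero conditional mean, and homoscedasticity all hold."""
    rng = random.Random(seed)
    x1s, x2s, ys = [], [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = D0 + D1 * x1 + rng.gauss(0.0, TAU)
        u = rng.gauss(0.0, SIGMA)
        x1s.append(x1)
        x2s.append(x2)
        ys.append(B0 + B1 * x1 + B2 * x2 + u)
    return x1s, x2s, ys


def residualize(z, x):
    """Residuals from the OLS regression of z on a constant and x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = (sum((a - mx) * (c - mz) for a, c in zip(x, z))
         / sum((a - mx) ** 2 for a in x))
    return [c - mz - b * (a - mx) for a, c in zip(x, z)]


def partial_slope(y, target, other):
    """OLS coefficient on `target` in y ~ 1 + target + other (Frisch-Waugh-Lovell)."""
    f = residualize(target, other)
    return sum(fi * yi for fi, yi in zip(f, y)) / sum(fi * fi for fi in f)


x1s, x2s, ys = simulate(100_000, seed=6)
b1_hat = partial_slope(ys, x1s, x2s)
b2_hat = partial_slope(ys, x2s, x1s)
print(f"beta_1 hat = {b1_hat:.3f} (true {B1}), beta_2 hat = {b2_hat:.3f} (true {B2})")
```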

Conditional Expectation

Conditional expectation is a crucial concept in statistical models, as it helps in understanding how the expected value of a random variable changes when conditioned on another variable. For example, in the context of our problem, we are interested in the conditional expectation of $y$ given $x_1$.

In the given scenario:

1. **Model relations**: We have the expressions:
   - $ x_2 = \delta_0 + \delta_1 x_1 + r $
   - $ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u $

2. **Substituting $x_2$ into $y$**: By substituting the expression for $x_2$ into the equation for $y$, we can derive $ y = (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1)x_1 + \beta_2 r + u $.

3. **Taking expectation**: Since $\mathrm{E}(r | x_1) = 0$ by assumption and $\mathrm{E}(u | x_1) = \mathrm{E}[\mathrm{E}(u | x_1, x_2) | x_1] = 0$ by the law of iterated expectations, the conditional expectation simplifies to: \[ \mathbf{E}(y | x_1) = (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1)x_1 \]

This shows that even when $x_2$ is omitted from the regression, it still affects the relationship between $y$ and $x_1$: its effect enters the slope on $x_1$ through the term $\beta_2 \delta_1$. Understanding conditional expectations helps in identifying potential biases in models.

OLS Estimator

The Ordinary Least Squares (OLS) estimator is a method for estimating the parameters of a linear regression model. The goal of OLS is to minimize the sum of the squared differences between the observed dependent variable values and those predicted by the model.

1. **OLS Objective**: OLS chooses the coefficients ($\beta_0, \beta_1, \beta_2$) that minimize the sum of squared residuals: \[ \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})\right)^2 \]

2. **Probability Limit**: In the simple regression of $y$ on $x_1$, the OLS slope estimator $\tilde{\beta}_1$ converges to $\beta_1 + \beta_2\delta_1$, not to $\beta_1$, because $x_1$ partly stands in for the omitted $x_2$: \[ \operatorname{plim} \tilde{\beta}_1 = \beta_1 + \beta_2 \delta_1 \]

3. **Consistency**: An estimator is consistent if it converges in probability to the true parameter value as the sample size grows. The simple regression of $y$ on $x_1$ is therefore not consistent for $\beta_1$ unless $\beta_2 \delta_1 = 0$; with $\beta_2 \neq 0$, this requires $\delta_1 = 0$. This demonstrates how including or excluding variables affects OLS estimates.
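A short derivation of the probability limit, writing $\tilde{\beta}_1$ for the simple regression slope and assuming $x_1$ has finite, positive variance. Under random sampling, the sample covariance and variance converge to their population counterparts; $\operatorname{Cov}(x_1, u) = 0$ follows from $\mathrm{E}(u | x_1, x_2) = 0$, and $\operatorname{Cov}(x_1, x_2) = \delta_1 \operatorname{Var}(x_1)$ follows from $\mathrm{E}(r | x_1) = 0$:

\[
\begin{aligned}
\operatorname{plim} \tilde{\beta}_1
&= \frac{\operatorname{Cov}(x_1, y)}{\operatorname{Var}(x_1)}
 = \frac{\beta_1 \operatorname{Var}(x_1) + \beta_2 \operatorname{Cov}(x_1, x_2) + \operatorname{Cov}(x_1, u)}{\operatorname{Var}(x_1)} \\
&= \beta_1 + \beta_2 \frac{\delta_1 \operatorname{Var}(x_1)}{\operatorname{Var}(x_1)}
 = \beta_1 + \beta_2 \delta_1 .
\end{aligned}
\]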

Nonlinear Function

Nonlinear functions in regression are used to capture patterns in data that linear models fail to capture. In our exercise, $x_1^2$ is considered as a nonlinear function to determine whether it helps identify the omission of $x_2$.

1. **Transformation to Nonlinear**: By adding $x_1^2$ to the regression model, the aim is to capture any nonlinear relationship between $x_1$ and $y$.

2. **Impact on Omitted Variables**: Because $x_2$ depends on $x_1$ only linearly, omitting $x_2$ leaves $\mathrm{E}(y | x_1)$ linear in $x_1$. The term $x_1^2$ then has nothing to pick up, so an insignificant coefficient on $x_1^2$ says nothing about whether $x_2$ has been omitted. Adding nonlinear functions of $x_1$ is no substitute for including $x_2$ itself.

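3. **T-Statistic and Significance**: The population coefficient on $x_1^2$ is zero, and the composite error $v = u + \beta_2 r$ has mean zero and constant variance given $x_1$: \[ \operatorname{Var}(v | x_1) = \sigma^{2} + \beta_{2}^{2} \tau^{2} \] The $t$ statistic on $x_1^2$ is therefore asymptotically standard normal, so the test rejects only at the nominal rate, regardless of the size of $\beta_2$.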
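The moments of $v$ used here follow from the law of iterated expectations. Because $r = x_2 - \delta_0 - \delta_1 x_1$ is a function of $(x_1, x_2)$, we have $\mathrm{E}(u | x_1) = \mathrm{E}[\mathrm{E}(u | x_1, x_2) | x_1] = 0$, $\mathrm{E}(u r | x_1) = \mathrm{E}[r \, \mathrm{E}(u | x_1, x_2) | x_1] = 0$, and $\mathrm{E}(u^2 | x_1) = \mathrm{E}[\mathrm{E}(u^2 | x_1, x_2) | x_1] = \sigma^2$. Hence

\[
\begin{aligned}
\mathrm{E}(v | x_1) &= \mathrm{E}(u | x_1) + \beta_2 \mathrm{E}(r | x_1) = 0, \\
\operatorname{Var}(v | x_1) &= \mathrm{E}(u^2 | x_1) + 2\beta_2 \mathrm{E}(u r | x_1) + \beta_2^2 \mathrm{E}(r^2 | x_1) = \sigma^2 + \beta_2^2 \tau^2 .
\end{aligned}
\]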

This exercise shows that nonlinear functions of included variables cannot be relied on to detect omitted variables: a model can pass a functional form test while the coefficient on $x_1$ remains inconsistent for $\beta_1$.
