Problem 20
Question
The Times-Observer is a daily newspaper in Metro City. Like many city newspapers, the Times-Observer is suffering through difficult financial times. The circulation manager is studying other papers in similar cities in the United States and Canada. She is particularly interested in what variables relate to the number of subscriptions to the paper. She is able to obtain the following sample information on 25 newspapers in similar cities. The following notation is used:
- Sub = Number of subscriptions (in thousands).
- Popul = The metropolitan population (in thousands).
- Adv = The advertising budget of the paper (in \$ hundreds).
- Income = The median family income in the metropolitan area (in \$ thousands).

a. Determine the regression equation.
b. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not equal to zero.
c. Conduct a test for the individual coefficients. Would you consider deleting any coefficients?
d. Determine the residuals and plot them against the fitted values. Do you see any problems?
e. Develop a histogram of the residuals. Do you see any problems with the normality assumption?
Step-by-Step Solution
Key Concepts
Hypothesis Testing
In a regression model, we commonly perform two types of tests: the global F-test and the individual t-tests.
The **global F-test** examines whether at least one predictor variable is associated with the outcome variable. It tests the null hypothesis \( H_0: \beta_1 = \beta_2 = \beta_3 = 0 \) against the alternative hypothesis \( H_a: \text{At least one } \beta_i \neq 0 \).
If the F-statistic is larger than the critical value from the F-distribution at the chosen significance level, we reject the null hypothesis and conclude that at least one of the coefficients is different from zero. In other words, the model contains at least one predictor that has a statistically significant relationship with the response variable.
On the other hand, **individual t-tests** aim to evaluate each predictor variable's contribution to the model by testing hypotheses such as \( H_0: \beta_i = 0 \) vs. \( H_a: \beta_i \neq 0 \).
Here, a significant t-test implies that the corresponding variable contributes to predicting the outcome even after the other predictors in the model are taken into account.
Residual Analysis
When **plotting residuals against fitted values**, it's important to analyze the pattern. Ideally, residuals should appear randomly scattered around zero without forming any discernible pattern. A clear pattern could indicate non-linearity, an incorrect model form, non-constant variance (for example, a funnel shape), or the presence of outliers.
A **random dispersion** of residuals generally suggests that the model fits well and the assumptions are met.
Moreover, a histogram of the residuals can be created to **check the normality assumption**. If the model's errors are normally distributed, the histogram of the residuals should look roughly like a bell-shaped curve centered at zero.
Deviation from this shape, such as pronounced skewness or heavy tails, could suggest that the model needs to be reconsidered or that a data transformation might be necessary.
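As a minimal sketch of the residual checks described above, the following Python snippet fits a model, computes fitted values and residuals, and prints a text histogram of the residuals. The data are synthetic and purely hypothetical (the problem's actual sample of 25 newspapers is not reproduced here), and the use of NumPy is an assumption rather than part of the original solution.

```python
import numpy as np

# Hypothetical synthetic data: 25 observations and 3 predictors.
# These are NOT the newspaper data from the problem.
rng = np.random.default_rng(0)
n, k = 25, 3
X = rng.normal(size=(n, k))
y = 5.0 + X @ np.array([2.0, 0.5, -1.0]) + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

fitted = A @ coef
residuals = y - fitted

# With an intercept in the model, least-squares residuals average to zero
# and are uncorrelated with the fitted values, so any visible trend in a
# residual-vs-fitted plot is a curved or funnel-shaped pattern, not a slope.
print("mean residual:", round(float(residuals.mean()), 10))
print("corr(residual, fitted):",
      round(float(np.corrcoef(residuals, fitted)[0, 1]), 10))

# Text histogram of the residuals: look for a roughly symmetric,
# single-peaked shape centered near zero.
counts, edges = np.histogram(residuals, bins=6)
for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"[{lo:6.2f}, {hi:6.2f}) {'#' * int(c)}")
```

To draw an actual scatter plot, the `fitted` and `residuals` arrays can be passed to any plotting tool. With only 25 observations the histogram is coarse, so only marked departures from a bell shape should be read as a problem.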
F-test
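In an F-test, you calculate the F-statistic, which compares the variance explained by the model with the variance left unexplained. The steps are:
- Determine the total sum of squares (SST), regression sum of squares (SSR), and residual sum of squares (SSE), where \( SST = SSR + SSE \).
- Calculate the mean square for the regression, \( MSR = \frac{SSR}{k} \), and the mean square for the error, \( MSE = \frac{SSE}{n - k - 1} \), where \( k \) is the number of predictors and \( n \) is the sample size.
- Compute the F-statistic: \( F = \frac{MSR}{MSE} \)
Compare the computed F-value with the critical value of the F-distribution with \( k \) and \( n - k - 1 \) degrees of freedom at the chosen significance level. In this problem \( k = 3 \) and \( n = 25 \), so the degrees of freedom are 3 and 21. If the computed F-value exceeds the critical value, the null hypothesis is rejected. This result suggests that at least one predictor variable has a statistically significant relationship with the dependent variable.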
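The steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the data are synthetic and hypothetical (not the problem's sample), NumPy is assumed to be available, and the critical value 3.07 is the standard table value for a 0.05 significance level with 3 and 21 degrees of freedom.

```python
import numpy as np

# Hypothetical synthetic data with n = 25 observations and k = 3 predictors.
# These are NOT the newspaper data from the problem.
rng = np.random.default_rng(1)
n, k = 25, 3
X = rng.normal(size=(n, k))
y = 5.0 + X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

# Least-squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ b

# Sums of squares: SST = SSR + SSE when the model has an intercept.
sst = float(np.sum((y - y.mean()) ** 2))
ssr = float(np.sum((fitted - y.mean()) ** 2))
sse = float(np.sum((y - fitted) ** 2))

# Mean squares and the F-statistic.
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

# Table value for alpha = 0.05 with (3, 21) degrees of freedom (assumed level).
F_CRIT = 3.07
decision = "reject H0" if f_stat > F_CRIT else "fail to reject H0"
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}")
print(f"MSR={msr:.2f}  MSE={mse:.2f}  F={f_stat:.2f}  ->  {decision}")
```

The critical value is hard-coded here to keep the sketch dependency-light. A statistics library could supply it for other significance levels or degrees of freedom.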
t-test
For each coefficient, the hypotheses are \( H_0: \beta_i = 0 \) and \( H_a: \beta_i \neq 0 \).
The test statistic is calculated as follows: \[ t = \frac{b_i}{\text{SE}(b_i)} \] where:
- \( b_i \) is the estimated regression coefficient.
- \( \text{SE}(b_i) \) is the standard error of the coefficient.
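If the absolute t-value is greater than the critical value of the t-distribution with \( n - k - 1 \) degrees of freedom (21 in this problem), the null hypothesis is rejected, indicating that the variable contributes significantly to predicting the dependent variable. A variable whose t-test is not significant is a candidate for deletion, which helps refine the model by retaining only the predictors that matter.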
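A minimal sketch of the individual t-tests is shown below. It assumes the usual least-squares standard errors, \( \text{SE}(b_i) = \sqrt{MSE \cdot [(A^{T}A)^{-1}]_{ii}} \), where \( A \) is the design matrix with an intercept column. The data are synthetic and hypothetical, with the second predictor deliberately given a true coefficient of zero, and 2.080 is the standard two-tailed table value for a 0.05 significance level with 21 degrees of freedom.

```python
import numpy as np

# Hypothetical synthetic data (NOT the newspaper data from the problem).
# The second predictor has a true coefficient of 0.
rng = np.random.default_rng(2)
n, k = 25, 3
X = rng.normal(size=(n, k))
y = 5.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

# Least-squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)

# MSE and the estimated covariance matrix of the coefficients.
residuals = y - A @ b
mse = float(residuals @ residuals) / (n - k - 1)
cov_b = mse * np.linalg.inv(A.T @ A)

# Standard errors and t-statistics: t = b_i / SE(b_i).
se = np.sqrt(np.diag(cov_b))
t_stats = b / se

# Two-tailed table value for alpha = 0.05 with 21 degrees of freedom.
T_CRIT = 2.080
for name, bi, sei, ti in zip(["Intercept", "x1", "x2", "x3"], b, se, t_stats):
    verdict = "reject H0" if abs(ti) > T_CRIT else "fail to reject H0"
    print(f"{name:9s} b={bi:7.3f}  SE={sei:6.3f}  t={ti:7.2f}  {verdict}")
```

Predictors flagged "fail to reject H0" are the ones to consider deleting. After removing one, the model should be refit, because the remaining coefficients and their t-values change.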