Problem 2

Question

For \(n=21\) men of normal weight and approximately the same age, both height \(x_{i}\) (in \(\mathrm{cm}\)) and weight \(y_{i}\) (in \(\mathrm{kg}\)) were measured, \(i=1, \ldots, n\). A linear relationship of the form \(y=a x+b\) is assumed between the \(x\)-values and the \(y\)-values. The values \(y_{1}, \ldots, y_{n}\) are assumed to be a realization of independent, normally distributed random variables with equal (unknown) variance \(\sigma^{2}\). The measurements yielded: $$ \begin{aligned} &\sum_{i=1}^{n} x_{i}=3738, \quad \sum_{i=1}^{n} y_{i}=1743, \\ &\sum_{i=1}^{n} x_{i}^{2}=666134, \quad \sum_{i=1}^{n} y_{i}^{2}=146979 \quad \text{and} \quad \sum_{i=1}^{n} x_{i} y_{i}=310947 \end{aligned} $$
a) Determine the regression line.
b) Using a confidence estimation procedure at level \(1-\alpha\) that you derive yourself, compute a concrete interval estimate for \(\sigma^{2}\) in the case \(1-\alpha=0.95\).
c) Physicians often use the rule of thumb "normal weight equals height minus 100". Check whether this rule is appropriate by testing both the null hypothesis \(H_{0}: a=1\) and the null hypothesis \(\tilde{H}_{0}: b=-100\), each at level \(\alpha=0.05\).

Step-by-Step Solution

Answer
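The regression line is \( y = 0.9x - 77.2 \). With \( RSS = 1686.3 \) and \( n - 2 = 19 \) degrees of freedom, the 95% confidence interval for \( \sigma^2 \) is approximately \( [51.3, 189.3] \). Neither \( H_0: a = 1 \) (\( t \approx -0.29 \)) nor \( \tilde{H}_0: b = -100 \) (\( t \approx 0.38 \)) is rejected at \( \alpha = 0.05 \), since both statistics lie well inside the critical value \( t_{0.975, 19} \approx 2.093 \). The data therefore do not contradict the rule of thumb.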
Step 1 - Determine the Regression Line Parameters
To find the regression line of the form \( y = ax + b \), first calculate the slope \( a \) and the intercept \( b \) using the formulas: \[ a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2} \] \[ b = \frac{\sum y_i - a \sum x_i}{n} \] Plugging in the given values: \( n = 21 \), \( \sum x_i = 3738 \), \( \sum y_i = 1743 \), \( \sum x_i^2 = 666134 \), and \( \sum x_i y_i = 310947 \), we get: \[ a = \frac{21 \cdot 310947 - 3738 \cdot 1743}{21 \cdot 666134 - 3738^2} \] \[ b = \frac{1743 - a \cdot 3738}{21} \]
Step 2 - Calculate the Slope \(a\)
Substitute the given values into the formula for \(a\): \[ a = \frac{21 \cdot 310947 - 3738 \cdot 1743}{21 \cdot 666134 - 3738^2} \] Simplify to get: \[ a = \frac{6529887 - 6515334}{13988814 - 13972644} = \frac{14553}{16170} = 0.9 \]
Step 3 - Calculate the Intercept \(b\)
Now, use the value of \(a\) to find \(b\): \[ b = \frac{1743 - 0.9 \cdot 3738}{21} \] Simplify to get: \[ b = \frac{1743 - 3364.2}{21} = \frac{-1621.2}{21} = -77.2 \] Thus, the regression line is: \[ y = 0.9x - 77.2 \]
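The arithmetic in Steps 1 to 3 can be checked with a few lines of Python. This is a minimal sketch that uses only the sums given in the problem; the variable names are chosen here for illustration, and exact fractions are used so that no rounding enters before the final result.

```python
from fractions import Fraction

# Summary statistics taken from the problem statement.
n = 21
sum_x, sum_y = 3738, 1743
sum_xx, sum_xy = 666134, 310947

# Least-squares slope and intercept as exact fractions.
a = Fraction(n * sum_xy - sum_x * sum_y, n * sum_xx - sum_x**2)
b = (sum_y - a * sum_x) / n

print(float(a), float(b))
```

Working with `Fraction` makes it easy to see that the slope reduces to exactly 9/10, so the intercept is not affected by a rounded slope.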
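Step 4 - Calculate Confidence Interval for \(\sigma^2\)
To find the confidence interval for \( \sigma^2 \) at a 95% confidence level, use the chi-squared distribution. First, calculate the residual sum of squares (RSS): \[ RSS = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - (0.9x_i - 77.2))^2 \] Because only sums are given, not the individual measurements, evaluate it through the identity \[ RSS = \left( \sum y_i^2 - \frac{(\sum y_i)^2}{n} \right) - a \left( \sum x_i y_i - \frac{\sum x_i \sum y_i}{n} \right) = 2310 - 0.9 \cdot 693 = 1686.3 \] Under the model assumptions, \( RSS / \sigma^2 \) follows a chi-squared distribution with \( n - 2 = 19 \) degrees of freedom, so the required quantiles are \[ \chi^2_{0.975, 19} \approx 32.852 \] and \[ \chi^2_{0.025, 19} \approx 8.907 \]. The interval for \( \sigma^2 \) runs from \[ \frac{RSS}{\chi^2_{0.975, 19}} = \frac{1686.3}{32.852} \approx 51.3 \] to \[ \frac{RSS}{\chi^2_{0.025, 19}} = \frac{1686.3}{8.907} \approx 189.3 \]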
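Since the individual measurements are not available, the RSS has to be obtained from the given sums. The following sketch shows one way to do that and to form the interval. The two chi-squared quantiles for 19 degrees of freedom are hard-coded from a standard table, which is an assumption of this example; the variable names are illustrative.

```python
# Summary statistics taken from the problem statement.
n = 21
sum_x, sum_y = 3738, 1743
sum_xx, sum_yy, sum_xy = 666134, 146979, 310947

# Centered sums of squares and cross-products.
s_xx = sum_xx - sum_x**2 / n
s_yy = sum_yy - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# RSS of the least-squares fit, computed without the raw data.
rss = s_yy - s_xy**2 / s_xx

# Assumption: chi-squared quantiles for 19 degrees of freedom,
# copied from a standard table rather than computed here.
CHI2_LOWER = 8.907   # 0.025 quantile
CHI2_UPPER = 32.852  # 0.975 quantile

# Dividing by the larger quantile gives the lower bound.
ci_low = rss / CHI2_UPPER
ci_high = rss / CHI2_LOWER

print(f"RSS = {rss:.1f}, interval = [{ci_low:.1f}, {ci_high:.1f}]")
```

If a statistics library is available, the table values could be replaced by its quantile function; they are kept as constants here so that the sketch runs with the standard library alone.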
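Step 5 - Test Hypotheses for \(a\) and \(b\)
To test the hypotheses \( H_0: a = 1 \) and \( \tilde{H}_0: b = -100 \), compute the t-statistics \[ t_a = \frac{a - 1}{SE(a)} \] and \[ t_b = \frac{b + 100}{SE(b)} \] with \( SE(a) \) and \( SE(b) \) as the standard errors of the estimates of \( a \) and \( b \), respectively. With \( \hat{\sigma}^2 = RSS/(n-2) = 1686.3/19 \approx 88.75 \) and \( S_{xx} = \sum x_i^2 - (\sum x_i)^2/n = 770 \), the standard errors are \[ SE(a) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \approx 0.340 \] and \[ SE(b) = \sqrt{\frac{\hat{\sigma}^2 \sum x_i^2}{n S_{xx}}} \approx 60.47 \], which gives \( t_a = (0.9 - 1)/0.340 \approx -0.29 \) and \( t_b = (-77.2 + 100)/60.47 \approx 0.38 \). Compare the absolute values of these t-statistics to the two-sided critical value of Student's t-distribution at \( \alpha = 0.05 \) with \( n - 2 = 19 \) degrees of freedom, \( t_{0.975, 19} \approx 2.093 \). Both are far below it, so neither null hypothesis is rejected and the data do not contradict the rule of thumb.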
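The two tests can be sketched in Python from the given sums alone. The standard-error formulas are the usual ones for simple linear regression; the critical value 2.093 for 19 degrees of freedom is taken from a t-table and hard-coded as an assumption, and all variable names are illustrative.

```python
import math

# Summary statistics taken from the problem statement.
n = 21
sum_x, sum_y = 3738, 1743
sum_xx, sum_yy, sum_xy = 666134, 146979, 310947

# Centered sums and least-squares estimates.
s_xx = sum_xx - sum_x**2 / n
s_yy = sum_yy - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n
a_hat = s_xy / s_xx
b_hat = (sum_y - a_hat * sum_x) / n

# Unbiased variance estimate and standard errors.
sigma2_hat = (s_yy - s_xy**2 / s_xx) / (n - 2)
se_a = math.sqrt(sigma2_hat / s_xx)
se_b = math.sqrt(sigma2_hat * sum_xx / (n * s_xx))

# Test statistics for H0: a = 1 and H0: b = -100.
t_a = (a_hat - 1) / se_a
t_b = (b_hat + 100) / se_b

# Assumption: two-sided critical value t_{0.975, 19} from a table.
T_CRIT = 2.093

for name, t in (("a = 1", t_a), ("b = -100", t_b)):
    verdict = "reject" if abs(t) > T_CRIT else "do not reject"
    print(f"H0: {name}: t = {t:.3f}, {verdict}")
```

Note that the two tests are carried out separately, as the exercise asks; this sketch does not perform a joint test of both parameters.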

Key Concepts

linear regression, confidence interval, hypothesis testing, chi-squared distribution, t-test
linear regression
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. In this exercise, we explore how the weight (in kg, denoted as \(y\)) of normal-weight men can be predicted based on their height (in cm, denoted as \(x\)). We assume the relationship can be represented by a linear equation of the form \(y = a x + b\), where \(a\) is the slope and \(b\) is the intercept. The goal is to determine the best-fitting line through the given data points. The slope \(a\) indicates how much \(y\) changes for a unit change in \(x\), while the intercept \(b\) is the value of \(y\) when \(x\) is zero. Given the data, we can calculate \(a\) and \(b\) using the formulas:
  • \[ a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2} \]
  • \[ b = \frac{\sum y_i - a \sum x_i}{n} \]
In our scenario, plugging the given values into the formulas yields the regression line: \[ y = 0.9 x - 77.2 \].
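As a worked check, the same estimates can be obtained from centered sums, which keeps the numbers small. The symbols \(S_{xx}\) and \(S_{xy}\) are notation introduced here for illustration and do not appear in the exercise:

\[ \bar{x} = \frac{3738}{21} = 178, \quad \bar{y} = \frac{1743}{21} = 83 \]
\[ S_{xx} = \sum x_i^2 - n \bar{x}^2 = 666134 - 665364 = 770, \quad S_{xy} = \sum x_i y_i - n \bar{x} \bar{y} = 310947 - 310254 = 693 \]
\[ a = \frac{S_{xy}}{S_{xx}} = \frac{693}{770} = 0.9, \quad b = \bar{y} - a \bar{x} = 83 - 0.9 \cdot 178 = -77.2 \]

This form is algebraically equivalent to the two formulas listed above; dividing their numerator and denominator by \(n\) gives \(S_{xy}\) and \(S_{xx}\).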
confidence interval
A confidence interval provides a range of values that is likely to contain the true population parameter. In the context of linear regression, we often calculate confidence intervals for the error variance \(\sigma^2\). To do this, we use the chi-squared distribution. First, we need to find the residual sum of squares (RSS), which tells us how much the observed values deviate from the predicted values: \[ RSS = \sum (y_i - (0.9 x_i - 77.2))^2 \] Next, we use the chi-squared values for our desired confidence level. For a 95% confidence interval with \(n - 2\) degrees of freedom, we look up the chi-squared values:
  • \[ \chi^2_{0.975, n-2} \]
  • \[ \chi^2_{0.025, n-2} \]
The confidence interval for \(\sigma^2\) runs from \[ \frac{RSS}{\chi^2_{0.975, n-2}} \] to \[ \frac{RSS}{\chi^2_{0.025, n-2}} \]. This interval gives us a range wherein the true variance is likely to fall.
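One way to derive this interval, sketched under the model assumptions of the exercise (independent, normally distributed observations with common variance), starts from the fact that \(RSS / \sigma^2\) follows a chi-squared distribution with \(n - 2\) degrees of freedom. Writing \(\chi^2_{p, n-2}\) for the \(p\)-quantile of that distribution:

\[ P\left( \chi^2_{\alpha/2, n-2} \le \frac{RSS}{\sigma^2} \le \chi^2_{1-\alpha/2, n-2} \right) = 1 - \alpha \]
\[ \Longleftrightarrow \quad P\left( \frac{RSS}{\chi^2_{1-\alpha/2, n-2}} \le \sigma^2 \le \frac{RSS}{\chi^2_{\alpha/2, n-2}} \right) = 1 - \alpha \]

For \(1 - \alpha = 0.95\) the two quantiles are \(\chi^2_{0.025, n-2}\) and \(\chi^2_{0.975, n-2}\). The larger quantile ends up in the lower bound because solving the inequality for \(\sigma^2\) inverts the ratio.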
hypothesis testing
Hypothesis testing allows us to determine whether certain assumptions about a population parameter are plausible. In this exercise, we test if the slope \(a\) of our regression line equals 1 and if the intercept \(b\) equals -100. These hypotheses are expressed as:
  • Null Hypothesis (\(H_0\)): \(a = 1\)
  • Null Hypothesis (\(\tilde{H}_0\)): \(b = -100\)
To test these hypotheses, we use the t-test statistic for each parameter. The t-test formula for the slope is: \[ t = \frac{a - 1}{SE(a)} \] and for the intercept, it is: \[ t = \frac{b + 100}{SE(b)} \] Here, \(SE(a)\) and \(SE(b)\) represent the standard errors of the estimated slope and intercept, respectively. We compare the absolute value of each t-statistic to the two-sided critical value \(t_{0.975, n-2}\) of Student's t-distribution with \(n - 2\) degrees of freedom. If the absolute value exceeds this critical value, we reject the null hypothesis at the 0.05 significance level.
chi-squared distribution
The chi-squared distribution is used in hypothesis testing and constructing confidence intervals for the variance of a normally distributed population. It is particularly useful in tests of goodness of fit and in comparing observed data with expected data. For a 95% confidence interval, we need two critical values from the chi-squared distribution:
  • Lower critical value: \(\chi^2_{0.025, n-2}\)
  • Upper critical value: \(\chi^2_{0.975, n-2}\)
These values help us create a range within which the true variance is likely to fall. In our example, using the RSS and the degrees of freedom, the confidence interval for \(\sigma^2\) is given by: \[ \frac{RSS}{\chi^2_{0.975, n-2}} \] to \[ \frac{RSS}{\chi^2_{0.025, n-2}} \]. This range provides an estimate of the variability of the data points around the regression line.
t-test
The t-test is used to determine whether there is a significant difference between the means of two groups, or to test if a regression coefficient (e.g., slope or intercept) is significantly different from a hypothesized value. In the context of our regression analysis, we use the t-test to check if:
  • The slope \(a\) significantly differs from 1 (\(H_0: a = 1\))
  • The intercept \(b\) significantly differs from -100 (\(\tilde{H}_0: b = -100\))
The test statistic for the slope is calculated as: \[ t = \frac{a - 1}{SE(a)} \] and for the intercept as: \[ t = \frac{b + 100}{SE(b)} \]. We then compare the absolute value of each t-statistic to the two-sided critical value of Student's t-distribution with \(n - 2\) degrees of freedom at a 0.05 significance level. If the absolute value exceeds the critical value, we reject the null hypothesis in favor of the alternative; otherwise the data are compatible with the hypothesized value. This tells us whether the rule of thumb is consistent with the fitted regression line.