Problem 5

Question

In a hospital, the body length \(x\) (in \(\mathrm{cm}\)) and the head circumference \(y\) (in \(\mathrm{cm}\)) of 20 newborn children were measured. This yielded the following series of measurements \(\left(x_{1}, y_{1}\right), \ldots,\left(x_{20}, y_{20}\right)\), ordered by body length:

\(\begin{array}{lllll}(48.2,34.8) & (48.5,33.4) & (48.6,35.1) & (48.9,34.0) & (49.2,34.9) \\ (49.4,36.0) & (49.5,34.1) & (49.8,35.5) & (50.3,35.3) & (50.3,36.1) \\ (50.7,36.8) & (50.9,35.4) & (51.0,35.9) & (51.1,35.7) & (51.3,35.2) \\ (51.4,36.2) & (51.6,36.9) & (52.1,37.4) & (52.4,36.3) & (52.8,37.8)\end{array}\)

with the sums

$$ \begin{aligned} &\sum_{i=1}^{20} x_{i}=1008, \quad \sum_{i=1}^{20} x_{i}^{2}=50837.86, \\ &\sum_{i=1}^{20} y_{i}=712.8, \quad \sum_{i=1}^{20} y_{i}^{2}=25428.05, \quad \sum_{i=1}^{20} x_{i} y_{i}=35948.24. \end{aligned} $$

a) Display the observed data graphically in a scatter plot.
b) Calculate the empirical covariance and the empirical correlation coefficient of this two-dimensional series of measurements.
c) The calculated value of the empirical correlation coefficient justifies assuming an approximately linear relationship between \(x\) and \(y\). Therefore, calculate the regression line \(y=\hat{a} x+\hat{b}\) for the given measurements and draw it in the scatter plot.
d) Use the regression line from c) to determine a predicted value for the head circumference at a body length of \(50 \mathrm{~cm}\).

Step-by-Step Solution

Answer
1. Create the scatter plot. 2. Empirical covariance: 1.2168. 3. Empirical correlation coefficient: 0.804. 4. Regression line: y = 0.667x + 2.02. 5. Predicted head circumference at 50 cm: 35.37 cm.
Step 1: Plot the Data Points
Create a scatter plot of the data points \((x_i, y_i)\), where \(x_i\) is the body length and \(y_i\) is the head circumference of the \(i\)-th newborn. The x-axis shows the body length and the y-axis shows the head circumference, both in cm.
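One way to produce the plot is sketched below in Python. This is only an illustration: it assumes the third-party library matplotlib is installed, and the names `DATA`, `split_columns`, and `save_scatter_plot` as well as the output file name are hypothetical choices, not part of the original exercise.

```python
# Measurements (body length x in cm, head circumference y in cm),
# copied from the problem statement.
DATA = [
    (48.2, 34.8), (48.5, 33.4), (48.6, 35.1), (48.9, 34.0), (49.2, 34.9),
    (49.4, 36.0), (49.5, 34.1), (49.8, 35.5), (50.3, 35.3), (50.3, 36.1),
    (50.7, 36.8), (50.9, 35.4), (51.0, 35.9), (51.1, 35.7), (51.3, 35.2),
    (51.4, 36.2), (51.6, 36.9), (52.1, 37.4), (52.4, 36.3), (52.8, 37.8),
]


def split_columns(data):
    """Split a list of (x, y) pairs into two lists xs and ys."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    return xs, ys


def save_scatter_plot(data, path="scatter.png"):
    """Draw the scatter plot and write it to an image file."""
    # matplotlib is a third-party dependency; it is imported here so that
    # the data helpers above stay usable without it.
    import matplotlib.pyplot as plt

    xs, ys = split_columns(data)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Body length x (cm)")
    ax.set_ylabel("Head circumference y (cm)")
    ax.set_title("Newborn measurements (n = 20)")
    fig.savefig(path)
    plt.close(fig)
```

Calling `save_scatter_plot(DATA)` writes the figure to `scatter.png` in the current directory.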
Step 2: Calculate Empirical Covariance
The empirical covariance is given by \(Cov(x,y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})\) with \(n = 20\), where \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}\) and \(\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i}\). First, calculate the means: \(\bar{x} = \frac{1008}{20} = 50.4\) and \(\bar{y} = \frac{712.8}{20} = 35.64\). Then use the identity \(\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y}) = \sum_{i=1}^{n} x_{i} y_{i} - n \bar{x} \bar{y}\) to find: \(Cov(x,y) = \frac{35948.24 - 20 \cdot 50.4 \cdot 35.64}{19} = \frac{23.12}{19} \approx 1.2168\).
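The step from the definition to the sums given in the problem relies on a standard identity. As a short worked derivation (writing \(\sum\) for \(\sum_{i=1}^{n}\) and using \(\sum x_i = n\bar{x}\), \(\sum y_i = n\bar{y}\)):

```latex
\begin{aligned}
\sum (x_i - \bar{x})(y_i - \bar{y})
  &= \sum x_i y_i - \bar{y} \sum x_i - \bar{x} \sum y_i + n \bar{x} \bar{y} \\
  &= \sum x_i y_i - n \bar{x} \bar{y} - n \bar{x} \bar{y} + n \bar{x} \bar{y} \\
  &= \sum x_i y_i - n \bar{x} \bar{y}
\end{aligned}
```

Setting \(y_i = x_i\) gives the analogous identity \(\sum (x_i - \bar{x})^2 = \sum x_i^2 - n \bar{x}^2\), which is the form used for the standard deviations.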
Step 3: Calculate Empirical Correlation Coefficient
The empirical correlation coefficient is given by \(r_{xy} = \frac{Cov(x,y)}{s_x s_y}\), where \(s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_{i}-\bar{x})^2}\) and \(s_y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_{i}-\bar{y})^2}\). Calculate \(s_x\) and \(s_y\): \(s_x = \sqrt{\frac{1}{19} (50837.86 - 20 \cdot 50.4^2)} = \sqrt{\frac{34.66}{19}} \approx 1.3506\) and \(s_y = \sqrt{\frac{1}{19} (25428.05 - 20 \cdot 35.64^2)} = \sqrt{\frac{23.858}{19}} \approx 1.1206\). Finally, with \(Cov(x,y) = \frac{23.12}{19} \approx 1.2168\), substitute to get \(r_{xy} = \frac{1.2168}{1.3506 \cdot 1.1206} \approx 0.804\).
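The arithmetic in Steps 2 and 3 can be cross-checked numerically. The following is a minimal Python sketch that uses only the summary sums from the problem statement; the helper name `centered_sum` is an illustrative choice and not part of the original solution.

```python
import math

# Summary sums as given in the problem statement (n = 20 newborns).
n = 20
sum_x, sum_y = 1008.0, 712.8
sum_xx, sum_yy, sum_xy = 50837.86, 25428.05, 35948.24


def centered_sum(sum_uv, sum_u, sum_v, n):
    """Return sum((u_i - mean_u) * (v_i - mean_v)) via sum(u*v) - sum(u)*sum(v)/n."""
    return sum_uv - sum_u * sum_v / n


s_xy = centered_sum(sum_xy, sum_x, sum_y, n)  # approx. 23.12
s_xx = centered_sum(sum_xx, sum_x, sum_x, n)  # approx. 34.66
s_yy = centered_sum(sum_yy, sum_y, sum_y, n)  # approx. 23.858

cov = s_xy / (n - 1)             # empirical covariance
s_x = math.sqrt(s_xx / (n - 1))  # empirical standard deviation of x
s_y = math.sqrt(s_yy / (n - 1))  # empirical standard deviation of y
r = cov / (s_x * s_y)            # empirical correlation coefficient

print(f"cov = {cov:.4f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, r = {r:.4f}")
```

Since the factor \(\frac{1}{n-1}\) appears in the numerator and in both standard deviations, it cancels in \(r_{xy}\), so `s_xy / math.sqrt(s_xx * s_yy)` gives the same value.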
Step 4: Calculate Regression Line
The regression line is given by \(y = \hat{a} x + \hat{b}\), where \(\hat{a} = \frac{Cov(x, y)}{s_x^2} = \frac{23.12 / 19}{34.66 / 19} = \frac{23.12}{34.66} \approx 0.667\) and \(\hat{b} = \bar{y} - \hat{a} \cdot \bar{x} = 35.64 - 0.667 \cdot 50.4 \approx 2.02\). Therefore, the regression line is \(y = 0.667 x + 2.02\). Plot this line on the previously created scatter plot; it passes through the point of means \((\bar{x}, \bar{y}) = (50.4, 35.64)\).
Step 5: Predict Head Circumference for 50 cm Body Length
Substitute \(x = 50\) into the regression equation \(y = 0.667 x + 2.02\) to predict the head circumference for a body length of 50 cm: \(y = 0.667 \cdot 50 + 2.02 \approx 35.37\) cm.
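The regression coefficients and the prediction can be recomputed directly from the given sums. The sketch below is one possible check in Python; the function name `predict` is a hypothetical helper introduced here for illustration.

```python
# Summary sums as given in the problem statement (n = 20 newborns).
n = 20
sum_x, sum_y = 1008.0, 712.8
sum_xx, sum_xy = 50837.86, 35948.24

mean_x = sum_x / n  # 50.4
mean_y = sum_y / n  # 35.64

# Slope = Cov(x, y) / s_x^2; the common factor 1/(n - 1) cancels,
# so the centered sums can be divided directly.
slope = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x ** 2)

# The least-squares line passes through (mean_x, mean_y).
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted head circumference (cm) for body length x (cm)."""
    return slope * x + intercept


print(f"y = {slope:.3f} x + {intercept:.2f}")
print(f"prediction at 50 cm: {predict(50.0):.2f} cm")
```

Because 50 cm lies well inside the observed range of body lengths (48.2 cm to 52.8 cm), the prediction is an interpolation rather than an extrapolation.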

Key Concepts

scatter plot, empirical covariance, correlation coefficient, regression line
scatter plot
A scatter plot is a type of graph used to visualize the relationship between two numerical variables. In this exercise, the variables are body length (\(x\)) and head circumference (\(y\)) of newborns. Each pair (\(x_i, y_i\)) represents a single observation, plotted as a point on the graph. The x-axis represents body length, while the y-axis represents head circumference.

This visual representation helps us see if there is any noticeable trend or pattern. For example, if the points tend to rise together, it suggests a positive relationship. To create your scatter plot, plot each (\(x_i, y_i\)) point on the graph. Once all points are plotted, you can begin to interpret the data and look for any linear relationships or trends.
empirical covariance
Empirical covariance measures how much two variables change together. It's a way to quantify the degree to which the body length and head circumference deviate from their means in sync. To compute it, you use the formula:
\( Cov(x,y) = \frac{ \sum_{i=1}^{20}(x_i - \bar{x})(y_i - \bar{y})}{n-1} \)

Here, \( \bar{x} \) and \( \bar{y} \) are the averages of body length and head circumference, respectively. The results can be positive, negative, or zero.

A positive covariance indicates that the two variables tend to increase together; a negative covariance indicates that one tends to decrease as the other increases. Its magnitude depends on the units of measurement (here \(\mathrm{cm}^2\)), so it is hard to interpret on its own. In this exercise the covariance is positive: longer newborns tend to have larger head circumferences.
correlation coefficient
The correlation coefficient, denoted \( r_{xy} \), quantifies the strength and direction of the linear relationship between two variables. It is calculated using the formula:
\( r_{xy} = \frac{Cov(x,y)}{s_x s_y} \),
where \( s_x \) and \( s_y \) are the standard deviations of body length and head circumference.

It ranges between -1 and 1 and, unlike the covariance, does not depend on the units of measurement. A value close to 1 indicates a strong positive linear relationship, while a value close to -1 indicates a strong negative linear relationship. A value near 0 suggests little or no linear relationship. The correlation coefficient of about 0.804 obtained in this exercise indicates a fairly strong positive linear relationship between body length and head circumference.
regression line
A regression line, or line of best fit, is the straight line that minimizes the sum of squared vertical distances between the data points and the line. It is used to predict the value of the dependent variable (head circumference) from the independent variable (body length). The equation of the regression line is:
\( y = \hat{a} x + \hat{b} \).

The slope \( \hat{a} \) is calculated as:
\( \hat{a} = \frac{Cov(x, y)}{s_x^2} \),
while the intercept \( \hat{b} \) is:
\( \hat{b} = \bar{y} - \hat{a} \cdot \bar{x} \).

In our example, the regression line is \( y = 0.667x + 2.02 \). Plotting this line on the scatter plot helps visualize the trend and make predictions. For instance, for a body length of 50 cm, the regression equation predicts a head circumference of approximately 35.37 cm.
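The formulas for \( \hat{a} \) and \( \hat{b} \) can be motivated by the least-squares criterion. As a brief sketch of the standard derivation (the symbol \(S(a,b)\) for the sum of squared residuals is introduced here for illustration, and \(\sum\) stands for \(\sum_{i=1}^{n}\)):

```latex
\begin{aligned}
S(a, b) &= \sum (y_i - a x_i - b)^2 \\
\frac{\partial S}{\partial b} = -2 \sum (y_i - a x_i - b) = 0
  &\;\Rightarrow\; \hat{b} = \bar{y} - \hat{a}\,\bar{x} \\
\frac{\partial S}{\partial a} = -2 \sum x_i (y_i - a x_i - b) = 0
  &\;\Rightarrow\; \sum x_i y_i - n \bar{x} \bar{y}
     = \hat{a} \left( \sum x_i^2 - n \bar{x}^2 \right) \\
  &\;\Rightarrow\; \hat{a}
     = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}
     = \frac{Cov(x, y)}{s_x^2}
\end{aligned}
```

The last equality holds because dividing numerator and denominator by \(n-1\) turns them into the empirical covariance and the empirical variance of \(x\).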