Problem 1

Question

The data in Table \(28.3\) represent salaries (in pounds sterling) in 72 randomly selected advertisements in The Guardian (April 6, 1992). When a range was given in the advertisement, the midpoint of the range is reproduced in the table. The data are salaries corresponding to two kinds of occupations (\(n=m=72\)): (1) creative, media, and marketing and (2) education. The sample mean and sample variance of the two datasets are, respectively: (1) \(\bar{x}_{72}=17410\) and \(s_{x}^{2}=41258741\); (2) \(\bar{y}_{72}=19818\) and \(s_{y}^{2}=50744521\). Suppose that the datasets are modeled as realizations of normal distributions with expectations \(\mu_{1}\) and \(\mu_{2}\), which represent the salaries for occupations (1) and (2).
a. Test the null hypothesis that the salary for both occupations is the same at level \(\alpha=0.05\) under the assumption of equal variances. Formulate the proper null and alternative hypotheses, compute the value of the test statistic, and report your conclusion.
b. Do the same without the assumption of equal variances.
c. As a comparison, one carries out an empirical bootstrap simulation for the nonpooled studentized mean difference. The bootstrap approximations for the critical values are \(c_{l}^{*}=-2.004\) and \(c_{u}^{*}=2.133\). Report your conclusion about the salaries on the basis of the bootstrap results.

Step-by-Step Solution

Answer
The null hypothesis is rejected in all cases, indicating a significant salary difference between the two occupations.
Step 1: Formulate Hypotheses for Equal Variances
Under the model with equal variances, the null hypothesis is that the expected salaries for the two occupations are equal, \( H_0: \mu_1 = \mu_2 \). The alternative hypothesis is two-sided: \( H_a: \mu_1 \neq \mu_2 \).
Step 2: Calculate the Pooled Standard Deviation
The pooled standard deviation \( s_p \) is calculated using the formula: \[ s_p = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}} \] Plug in the values: \[ s_p = \sqrt{\frac{(72-1) \times 41258741 + (72-1) \times 50744521}{72 + 72 - 2}} = \sqrt{46001631} \] This yields \( s_p \approx 6782.45 \). Because \( n = m \), the pooled variance is simply the average \( (s_x^2 + s_y^2)/2 \).
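As an arithmetic check, the pooled standard deviation can be computed from the summary statistics alone. This is a minimal Python sketch; the function name `pooled_sd` is illustrative and not part of the original solution.

```python
import math

# Summary statistics from the question (the raw data are not needed)
n, m = 72, 72
var_x, var_y = 41258741, 50744521  # s_x^2, s_y^2


def pooled_sd(var_x: float, var_y: float, n: int, m: int) -> float:
    """Pooled standard deviation: degrees-of-freedom-weighted average of the variances."""
    pooled_var = ((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2)
    return math.sqrt(pooled_var)


s_p = pooled_sd(var_x, var_y, n, m)
print(f"s_p = {s_p:.2f}")  # s_p = 6782.45
```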
Step 3: Compute the Test Statistic for Equal Variances
The test statistic \( t \) is calculated using the formula: \[ t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n} + \frac{1}{m}}} \] Since \( \sqrt{\frac{1}{72} + \frac{1}{72}} = \frac{1}{6} \), the denominator is \( 6782.45/6 \approx 1130.41 \), so: \[ t = \frac{17410 - 19818}{6782.45 \sqrt{\frac{1}{72} + \frac{1}{72}}} = \frac{-2408}{1130.41} \approx -2.130 \]
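The pooled statistic can be reproduced the same way. A minimal sketch, again using only the reported means and variances; `pooled_t` is a hypothetical helper name.

```python
import math

n, m = 72, 72
mean_x, mean_y = 17410, 19818
var_x, var_y = 41258741, 50744521


def pooled_t(mean_x, mean_y, var_x, var_y, n, m):
    """Pooled studentized mean difference (equal-variance two-sample t statistic)."""
    s_p = math.sqrt(((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2))
    return (mean_x - mean_y) / (s_p * math.sqrt(1 / n + 1 / m))


t_p = pooled_t(mean_x, mean_y, var_x, var_y, n, m)
print(f"t_p = {t_p:.3f}")  # t_p = -2.130
```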
Step 4: Compare t Statistic with Critical Value for Equal Variances
Under \( H_0 \), the pooled statistic has a \( t \) distribution with \( n + m - 2 = 142 \) degrees of freedom. At \( \alpha = 0.05 \) (two-sided), the critical values are approximately \( \pm 1.977 \). Since \( -2.130 < -1.977 \), we reject \( H_0 \) and conclude that the expected salaries for the two occupations differ.
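The critical value \( t_{142,\,0.975} \) can be approximated without tables. The sketch below assumes only the Python standard library and uses the standard asymptotic (Cornish-Fisher type) expansion of the Student-\(t\) quantile around the normal quantile. This is a swapped-in approximation rather than the method behind the solution's number, but it is very accurate for degrees of freedom this large. The helper `t_quantile` is hypothetical.

```python
from statistics import NormalDist


def t_quantile(p: float, df: float) -> float:
    """Approximate Student-t quantile via the asymptotic expansion
    t_p ~ z + g1(z)/df + g2(z)/df^2 + g3(z)/df^3 around the normal quantile z."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


alpha = 0.05
df = 72 + 72 - 2          # n + m - 2 for the pooled test
c = t_quantile(1 - alpha / 2, df)
t_p = -2.130              # pooled statistic from Step 3
print(f"critical value = ±{c:.3f}")  # critical value = ±1.977
print("reject H0" if abs(t_p) > c else "fail to reject H0")  # reject H0
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, 142)` returns the quantile directly.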
Step 5: Hypotheses for Unequal Variances
Without the equal-variance assumption, the hypotheses are unchanged: \( H_0: \mu_1 = \mu_2 \) against \( H_a: \mu_1 \neq \mu_2 \).
Step 6: Compute the Test Statistic for Unequal Variances
The nonpooled test statistic is calculated using the formula: \[ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}} \] Plug in the values: \[ t = \frac{17410 - 19818}{\sqrt{\frac{41258741}{72} + \frac{50744521}{72}}} = \frac{-2408}{1130.41} \approx -2.130 \] Because \( n = m \), this equals the pooled statistic exactly.
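A minimal sketch of the nonpooled statistic from the same summary statistics; `nonpooled_t` is an illustrative name, not from the source.

```python
import math

n, m = 72, 72
mean_x, mean_y = 17410, 19818
var_x, var_y = 41258741, 50744521


def nonpooled_t(mean_x, mean_y, var_x, var_y, n, m):
    """Nonpooled studentized mean difference (Welch statistic)."""
    se = math.sqrt(var_x / n + var_y / m)
    return (mean_x - mean_y) / se


t_d = nonpooled_t(mean_x, mean_y, var_x, var_y, n, m)
print(f"t_d = {t_d:.3f}")  # t_d = -2.130
```

With equal sample sizes the two standard errors coincide, so this matches the pooled value.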
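Step 7: Compare t Statistic with Critical Value for Unequal Variances
The Welch-Satterthwaite approximation to the degrees of freedom is \[ \nu = \frac{\left(\frac{s_x^2}{n} + \frac{s_y^2}{m}\right)^2}{\frac{(s_x^2/n)^2}{n-1} + \frac{(s_y^2/m)^2}{m-1}} \approx 140.5, \] for which the two-sided critical values at \( \alpha = 0.05 \) are still approximately \( \pm 1.977 \). (The large-sample normal approximation gives \( \pm 1.960 \); the conclusion is the same.) Since \( -2.130 < -1.977 \), we reject \( H_0 \).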
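The Welch-Satterthwaite degrees of freedom can be checked with a few lines of Python. This is a sketch; `welch_df` is a hypothetical helper implementing the formula directly.

```python
def welch_df(var_x: float, var_y: float, n: int, m: int) -> float:
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    a, b = var_x / n, var_y / m
    return (a + b) ** 2 / (a**2 / (n - 1) + b**2 / (m - 1))


nu = welch_df(41258741, 50744521, 72, 72)
print(f"df ≈ {nu:.1f}")  # df ≈ 140.5
```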
Step 8: Bootstrap Results Interpretation
The bootstrap critical values are \( c_{l}^{*} = -2.004 \) and \( c_{u}^{*} = 2.133 \). The observed nonpooled statistic \( t = -2.130 \) lies below \( c_{l}^{*} \), so it falls outside \( [c_{l}^{*}, c_{u}^{*}] \). Based on the bootstrap results, we again reject the null hypothesis at level 0.05.
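The bootstrap decision rule is a simple interval check. The sketch below (with the hypothetical helper `bootstrap_decision`) also illustrates that the bootstrap critical values need not be symmetric.

```python
def bootstrap_decision(t_obs: float, c_lower: float, c_upper: float) -> str:
    """Reject H0 when the observed statistic falls outside [c_lower, c_upper]."""
    if t_obs < c_lower or t_obs > c_upper:
        return "reject H0"
    return "fail to reject H0"


c_l, c_u = -2.004, 2.133  # bootstrap critical values from the question
t_d = -2.130              # observed nonpooled studentized mean difference
print(bootstrap_decision(t_d, c_l, c_u))  # reject H0
```

Because the interval is asymmetric, a hypothetical positive statistic of 2.1 would not lead to rejection, even though its magnitude exceeds \( |c_l^*| \).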

Key Concepts

Understanding the Null Hypothesis
In hypothesis testing, the null hypothesis is a fundamental concept. It represents a statement that is assumed to be true until evidence suggests otherwise. In the context of the salary comparison between two occupations, the null hypothesis is that the average salaries for both groups are the same. This can be mathematically expressed as:
  • Null Hypothesis (\( H_0\) ): \(\mu_1 = \mu_2\)
  • Alternative Hypothesis (\( H_a\) ): \(\mu_1 \neq \mu_2\)
This setup lets us test whether the observed data provide enough evidence to conclude that the two expected salaries differ. The significance level \( \alpha \) (usually 0.05, or 5%) is the probability we are willing to accept of rejecting \( H_0 \) when it is in fact true. If the evidence against \( H_0 \) is not strong enough at this level, we fail to reject it; this does not prove that \( H_0 \) is true, only that the data are consistent with it. If the evidence strongly contradicts \( H_0 \), we reject it in favor of \( H_a \). This decision rule separates differences that could plausibly arise from random sampling variation from differences that point to a real difference in expectations.
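The same decision can be phrased with a p-value: reject \( H_0 \) when the p-value is below \( \alpha \). The sketch below is an illustration under an assumption: it uses the large-sample standard normal reference distribution instead of the exact \( t \) distribution, so its p-value is slightly smaller than the \( t \)-based one. The helper `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(t_obs: float) -> float:
    """Two-sided p-value under a standard normal reference distribution
    (large-sample approximation to the t distribution)."""
    return 2 * (1 - NormalDist().cdf(abs(t_obs)))


alpha = 0.05
p = two_sided_p_value(-2.130)  # statistic for the salary data
print(f"p ≈ {p:.3f}")  # p ≈ 0.033
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```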
Exploring Equal Variances
Equal variances, also known as homogeneity of variance, is an assumption of certain statistical tests, such as the pooled two-sample t-test. It assumes that the distributions from which the samples are drawn have the same variance. In this exercise, part (a) adopts this assumption, which allows a single estimate of the common standard deviation to be used.
The pooled standard deviation does not check this assumption; it presupposes it. It combines the two sample variances, weighted by their degrees of freedom, into one estimate of the common standard deviation, and under normality and \( H_0 \) the resulting pooled statistic has an exact \( t \) distribution with \( n + m - 2 \) degrees of freedom. If the variances may differ, the nonpooled (Welch) statistic is used instead.
Whether the assumption is reasonable can be judged informally by comparing the sample variances (here \( s_y^2 / s_x^2 \approx 1.23 \)) or formally with a test for equality of variances. In this exercise the choice matters little: because \( n = m \), the pooled and nonpooled standard errors are identical, so the two test statistics coincide.
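A short sketch makes both points concrete: the informal variance-ratio check, and the fact that with equal sample sizes the pooled and nonpooled standard errors agree. It uses only the summary statistics from the question.

```python
import math

n, m = 72, 72
var_x, var_y = 41258741, 50744521

# Informal look at the equal-variance assumption: ratio of sample variances
ratio = var_y / var_x
print(f"s_y^2 / s_x^2 = {ratio:.2f}")  # s_y^2 / s_x^2 = 1.23

# Standard error of the mean difference under each approach
s_p2 = ((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2)  # pooled variance
se_pooled = math.sqrt(s_p2 * (1 / n + 1 / m))
se_nonpooled = math.sqrt(var_x / n + var_y / m)
print(math.isclose(se_pooled, se_nonpooled))  # True, because n == m
```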
Bootstrap Simulation for Hypothesis Testing
Bootstrap simulation is a powerful statistical technique that helps us understand the variability of an estimator by resampling a dataset with replacement. In hypothesis testing, it allows us to estimate the sampling distribution of a statistic when theoretical assumptions (like normality or equal variances) are difficult to justify or test.
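For the salary data, the empirical bootstrap approximates the distribution of the nonpooled studentized mean difference from the data themselves, instead of relying on the normal or \( t \) approximation. Bootstrap samples are drawn with replacement from each dataset separately, and for each pair of bootstrap samples the statistic \[ T^{*} = \frac{(\bar{x}^{*} - \bar{y}^{*}) - (\bar{x} - \bar{y})}{\sqrt{\frac{s_x^{*2}}{n} + \frac{s_y^{*2}}{m}}} \] is computed. The 0.025 and 0.975 quantiles of the simulated values serve as the critical values \( c_l^* \) and \( c_u^* \).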
  • Critical Values: \( c_{l}^{*} = -2.004 \) and \( c_{u}^{*} = 2.133 \).
  • Test Statistic: \( t = -2.130 \)
The observed test statistic is then compared with these bootstrap critical values. If it falls outside \( [c_l^*, c_u^*] \), as in this exercise, we reject the null hypothesis. Because the critical values are estimated from the data rather than read from a normal or \( t \) table, the method remains usable when those distributional approximations are doubtful; the resulting critical values also need not be symmetric about zero.
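The resampling scheme described above can be sketched in Python with only the standard library. This is an illustrative implementation, not the procedure that produced \( c_l^* = -2.004 \) and \( c_u^* = 2.133 \). Because Table 28.3 is not reproduced here, the demo uses hypothetical stand-in data drawn from normal distributions with the reported means and standard deviations, so its critical values will differ from the exercise's. Function names, `B`, and the seeds are assumptions.

```python
import math
import random
import statistics


def studentized_diff(x, y, shift=0.0):
    """Nonpooled studentized mean difference, centered at `shift`."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y) - shift) / se


def bootstrap_critical_values(x, y, B=1000, alpha=0.05, seed=0):
    """Empirical bootstrap critical values for the nonpooled studentized mean difference."""
    rng = random.Random(seed)
    observed_diff = statistics.mean(x) - statistics.mean(y)
    t_star = []
    for _ in range(B):
        # Resample each dataset separately, with replacement
        x_star = rng.choices(x, k=len(x))
        y_star = rng.choices(y, k=len(y))
        # Center at the observed difference (bootstrap counterpart of mu1 - mu2)
        t_star.append(studentized_diff(x_star, y_star, shift=observed_diff))
    t_star.sort()
    k = round(B * alpha / 2)          # number of values in each tail
    return t_star[k - 1], t_star[B - k]  # k-th smallest and k-th largest


# Hypothetical stand-in data (NOT Table 28.3): normal samples with the reported moments
gen = random.Random(1)
x = [gen.gauss(17410, math.sqrt(41258741)) for _ in range(72)]
y = [gen.gauss(19818, math.sqrt(50744521)) for _ in range(72)]

c_l, c_u = bootstrap_critical_values(x, y)
t_obs = studentized_diff(x, y)
print(f"c_l* = {c_l:.3f}, c_u* = {c_u:.3f}, t = {t_obs:.3f}")
print("reject H0" if t_obs < c_l or t_obs > c_u else "fail to reject H0")
```

Resampling the two groups separately keeps the group sizes fixed and does not force a common variance, which matches the nonpooled statistic being bootstrapped.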