Problem 10
Question
A method to investigate the sensitivity of the sample mean and the sample median to extreme outliers is to replace one or more elements in a given dataset by a number \(y\) and investigate the effect when \(y\) goes to infinity. To illustrate this, consider the dataset from Quick Exercise 16.1:
$$ \begin{array}{lllll} 4.6 & 3.0 & 3.2 & 4.2 & 5.0 \end{array} $$
with sample mean 4 and sample median \(4.2\).
a. We replace the element \(3.2\) by some real number \(y\). What happens with the sample mean and the sample median of this new dataset as \(y \rightarrow \infty\)?
b. We replace a number of elements by some real number \(y\). How many elements do we need to replace so that the sample median of the new dataset goes to infinity as \(y \rightarrow \infty\)?
c. Suppose we have another dataset of size \(n\). How many elements do we need to replace by some real number \(y\), so that the sample mean of the new dataset goes to infinity as \(y \rightarrow \infty\)? And how many elements do we need to replace, so that the sample median of the new dataset goes to infinity?
Step-by-Step Solution
Key Concepts
Outliers in Statistics
An outlier is a value that lies far above or far below the other values in a dataset. Identifying outliers matters because a single extreme value can change some summary statistics substantially while leaving others almost untouched, so analysts need to know how each statistic reacts before deciding which data and which measures to use.
Effect of Outliers on Mean
Consider the dataset \( \{4.6, 3.0, 3.2, 4.2, 5.0\} \). Replacing \( 3.2 \) by \( y \) gives the sample mean \( (4.6 + 3.0 + y + 4.2 + 5.0)/5 = (16.8 + y)/5 \), which goes to infinity as \( y \rightarrow \infty \). The mean is sensitive to extreme values because every observation enters the sum with the same weight \( 1/n \), so a single unbounded value makes the whole average unbounded.
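As a rough numerical illustration of the paragraph above, the following Python sketch replaces the element at index 2 (the value 3.2) by increasingly large values of \(y\) and recomputes the sample mean. The function name `mean_with_replacement` and the particular values of `y` are assumptions chosen for illustration only.

```python
DATA = [4.6, 3.0, 3.2, 4.2, 5.0]

def mean_with_replacement(y, index=2):
    """Sample mean of DATA after the element at `index` is replaced by y."""
    modified = list(DATA)      # copy, so DATA itself is left unchanged
    modified[index] = y        # index 2 holds the value 3.2
    return sum(modified) / len(modified)

if __name__ == "__main__":
    # The mean grows without bound: it equals (16.8 + y) / 5.
    for y in (3.2, 10, 100, 1_000, 1_000_000):
        print(y, mean_with_replacement(y))
```

Every increase in `y` by 5 raises the mean by 1, so no finite bound holds for all `y`.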
Effect of Outliers on Median
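For the example dataset \(\{4.6, 3.0, 3.2, 4.2, 5.0\}\), replacing \( 3.2 \) by a very large number \( y \) changes the ordered data to \( 3.0, 4.2, 4.6, 5.0, y \), so the sample median moves from \( 4.2 \) to \( 4.6 \) and then stays at \( 4.6 \) no matter how large \( y \) becomes. The median depends only on the middle value of the ordered data, so a single outlier can shift it by at most one position. To make the median of these five values go to infinity, at least three of them must be replaced by \( y \), so that \( y \) itself becomes the middle value. This is why the median is a more stable measure of central tendency when outliers are present.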
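The behaviour of the median for this dataset can be checked with a short Python sketch. It uses `statistics.median` from the standard library; the helper name `median_with_replacement` is an assumption introduced here for illustration.

```python
from statistics import median

DATA = [4.6, 3.0, 3.2, 4.2, 5.0]

def median_with_replacement(y, index=2):
    """Sample median of DATA after the element at `index` is replaced by y."""
    modified = list(DATA)      # copy, so DATA itself is left unchanged
    modified[index] = y        # index 2 holds the value 3.2
    return median(modified)

if __name__ == "__main__":
    # Once y is at least 4.6, the median no longer depends on y.
    for y in (3.2, 10, 1_000, 1_000_000):
        print(y, median_with_replacement(y))
```

With the original value 3.2 the median is 4.2; for any `y` of at least 4.6 the ordered data are 3.0, 4.2, 4.6, 5.0, `y`, and the median is 4.6.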
Dataset Modification
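Take a dataset of size \( n \) and replace some of its elements by \( y \). For the sample mean, replacing a single element is always enough: the mean then contains the term \( y/n \), which goes to infinity with \( y \). For the sample median, \( y \) must reach the middle of the ordered data. If \( n \) is odd, at least \( (n+1)/2 \) elements must be replaced. If \( n \) is even, at least \( n/2 \) elements must be replaced, because the median is then the average of the two middle values and it is enough for one of them to equal \( y \). In both cases this is \( \lfloor (n+1)/2 \rfloor \) elements, about half the dataset; replacing fewer leaves the middle value or values among the original data.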
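The counts for a dataset of size \(n\) can be explored numerically with the sketch below. It is only a proxy for the limit \(y \rightarrow \infty\): a single huge value `y = 1e12` stands in for "very large", and a statistic is said to have been "moved" once it exceeds the largest original value. The function name `smallest_k_that_moves` and this threshold rule are assumptions made for illustration.

```python
from statistics import mean, median

def smallest_k_that_moves(stat, data, y=1e12):
    """Smallest number k of elements replaced by y for which stat(data)
    exceeds the largest original value (a numerical stand-in for the
    statistic going to infinity with y)."""
    threshold = max(data)
    for k in range(1, len(data) + 1):
        # Replace the first k elements by y; which k elements are
        # replaced does not matter for this check when y is huge.
        modified = [y] * k + list(data[k:])
        if stat(modified) > threshold:
            return k
    return None

if __name__ == "__main__":
    data = [4.6, 3.0, 3.2, 4.2, 5.0]
    print("mean:", smallest_k_that_moves(mean, data))
    print("median:", smallest_k_that_moves(median, data))
```

For the five-element example this reports one replacement for the mean and three for the median, and for datasets of other sizes the median count agrees with `(n + 1) // 2`.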
Tips for modifying datasets:
- Identify which values significantly affect your statistical results.
- Understand which statistical measure suits your analysis better.
- To test how the median reacts, replace at least half of the values; a few outliers will not move it far.