Problem 10
Question
The following data represents the yearly salaries, in thousands of dollars, of 10 basketball players. $$ \begin{array}{rrrrrrrrrr}{533} & {427} & {800} & {687} & {264} & {264} & {125} & {602} & {249} & {19{,}014}\end{array} $$
a. Find the mean and median salaries of the 10 players.
b. Which measure of central tendency is more representative of the data? Explain.
c. Find the outlier for the set of data.
d. Remove the outlier from the set of data and recalculate the mean and median salaries.
e. After removing the outlier from the set of data, is the mean more or less representative of the data?
Step-by-Step Solution
Verified Answer
Before the outlier is removed, the mean is 2296.5 and the median is 480 (in thousands of dollars), so the median is more representative. The outlier is 19,014. After removing it, the mean is 439 and the median is 427; the two are now close, so the mean is more representative than before.
Step 1: Organize the data
Begin by arranging the salary data from smallest to largest. The salaries given are: \(533, 427, 800, 687, 264, 264, 125, 602, 249, 19{,}014\). The comma in \(19{,}014\) is a thousands separator, so this value is nineteen thousand fourteen. In order: \(125, 249, 264, 264, 427, 533, 602, 687, 800, 19{,}014\).
Step 2: Calculate the mean
The mean salary is the sum of all salaries divided by the number of salaries. Calculate this by: \[\text{Mean} = \frac{125 + 249 + 264 + 264 + 427 + 533 + 602 + 687 + 800 + 19{,}014}{10} = \frac{22{,}965}{10} = 2296.5\]
Step 3: Calculate the median
To find the median in a sorted list of 10 numbers, take the average of the 5th and 6th terms. Here these terms are 427 and 533. So: \[\text{Median} = \frac{427 + 533}{2} = \frac{960}{2} = 480\]
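As a check on the arithmetic, the following is a minimal Python sketch that computes both measures from the ten salaries exactly as listed in the question. The variable names are illustrative, and the standard-library `statistics` module is assumed to be an acceptable stand-in for the hand calculation.

```python
from statistics import mean, median

# Yearly salaries in thousands of dollars, as listed in the question.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]

ordered = sorted(salaries)
full_mean = mean(salaries)      # sum of all values divided by their count
full_median = median(salaries)  # average of the 5th and 6th ordered values

print(ordered)
print(full_mean, full_median)
```

With an even number of values, `median` averages the two middle entries of the sorted list, which matches the rule used in this step.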
Step 4: Compare the mean and median
The mean of \(2296.5\) is far greater than the median of \(480\) because the outlier pulls it upward; the mean exceeds nine of the ten salaries. In this case, the median is more representative because it is less affected by the extreme salary value.
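One informal way to judge representativeness is to count how many salaries fall below each measure. The sketch below does this for the ten salaries in the question; the counting approach is an illustration chosen here, not a method prescribed by the exercise.

```python
from statistics import mean, median

# Yearly salaries in thousands of dollars, as listed in the question.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]

full_mean = mean(salaries)
full_median = median(salaries)

# Count the salaries that lie strictly below each measure.
below_mean = sum(1 for s in salaries if s < full_mean)
below_median = sum(1 for s in salaries if s < full_median)

print(below_mean, below_median)
```

A measure that sits above almost every data point describes the extreme value more than the typical one, while the median splits the list evenly.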
Step 5: Identify outliers
Outliers are values significantly different from others in a data set. Here, the value \(19,014\) is the outlier because it is much larger than the other salaries.
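The exercise identifies the outlier by inspection. A common formal check, assumed here because the source does not name one, is the 1.5 × IQR rule: flag any value more than 1.5 interquartile ranges below the first quartile or above the third. A minimal sketch:

```python
from statistics import quantiles

# Yearly salaries in thousands of dollars, as listed in the question.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]

# Quartiles via the standard library's default ("exclusive") method;
# other quartile conventions give slightly different fences.
q1, _, q3 = quantiles(salaries, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any salary outside the fences is flagged as an outlier.
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]
print(outliers)
```

Textbooks differ on how quartiles are computed for small data sets, so the fences may shift slightly under another convention, but 19,014 lies far beyond the upper fence in this case.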
Step 6: Remove outlier and recalculate mean
After eliminating the outlier, \(19{,}014\), we recalculate the mean using the remaining 9 salaries: \[\text{Mean} = \frac{125 + 249 + 264 + 264 + 427 + 533 + 602 + 687 + 800}{9} = \frac{3{,}951}{9} = 439\]
Step 7: Recalculate the median without outlier
With the outlier removed, the list of salaries is now \(125, 249, 264, 264, 427, 533, 602, 687, 800\). With an odd number of values (9), the median is the middle (5th) value: 427.
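The recalculation can be checked with a short Python sketch. It assumes the outlier is the single largest value, as identified in Step 5, and drops it by slicing the sorted list; the variable names are illustrative.

```python
from statistics import mean, median

# Yearly salaries in thousands of dollars, as listed in the question.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]

# Drop the single largest value, which Step 5 identified as the outlier.
trimmed = sorted(salaries)[:-1]

trimmed_mean = mean(trimmed)      # sum of the 9 remaining values divided by 9
trimmed_median = median(trimmed)  # the 5th of the 9 ordered values

print(trimmed_mean, trimmed_median)
```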
Step 8: Evaluate representativeness after outlier removal
After removing the outlier, the mean \(439.0\) is closer to the median \(427\), making the mean more representative. Both measures now align more closely, providing a better reflection of the typical salary.
Key Concepts
Understanding Mean and Median
In the world of statistics, the terms 'mean' and 'median' are measures of central tendency. They help summarize a set of values through typical or center values.
Mean, often referred to as 'average', is calculated by adding all numerical values together and dividing by the count of those values. In our example, the single extremely high salary ($19,014$) pulls the mean of the ten basketball players' salaries far above the other nine values.
On the other hand, the median represents the middle value in a data set when sorted in ascending order. It divides the dataset into two equal halves. Unlike the mean, the median is more robust against outliers, as it considers only the middle value(s) and ignores extreme ones. Thus, understanding both mean and median offers deeper insight into the data.
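The difference in robustness can be seen with a small sketch. The numbers below are hypothetical values chosen only for illustration and are not taken from the exercise: inflating the largest value changes the mean sharply but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical values chosen only for illustration.
base = [10, 20, 30, 40, 50]
stretched = [10, 20, 30, 40, 5000]  # same list with the largest value inflated

base_mean, base_median = mean(base), median(base)
stretched_mean, stretched_median = mean(stretched), median(stretched)

print(base_mean, base_median)
print(stretched_mean, stretched_median)
```

The median depends only on which value sits in the middle of the ordered list, so changing how extreme the largest value is has no effect on it.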
Outliers and Their Impact
An outlier is a number in the dataset that is significantly higher or lower than most of the data points. In statistical analysis, outliers can either be a result of variability in the data or an indication of measurement errors.
- The primary effect of an outlier is distortion of statistical results.
- In salary data, such as in this exercise, an outlier like $19,014$ can greatly increase the mean, making it an unreliable measure of average salary.
- Identifying outliers helps in initiating further investigative procedures to determine their cause and decide whether to include or exclude them from analysis.
Central Tendency Criteria
Central tendency is a statistical measurement to identify the center of a data distribution. The most common measures are mean and median.
- The mean is excellent for symmetric distributions without outliers but can be heavily influenced by them.
- The median is often more representative for skewed distributions or datasets with outliers since it is much less affected by extreme values.
Salary Data Analysis
Salary data analysis often involves various statistical methods to represent the data effectively. Here, we aim to provide a clear representation of typical salaries without being misled by anomalies.
- Salaries can be deeply impacted by outliers, as seen with the basketball players' data.
- Removing an outlier often results in a more accurate mean that aligns closely with the median, making both statistics more representative of the data's central tendency.
- In the refined analysis of our player salaries without the extreme value, the mean dropped from $2296.5$ to $439$, and the median moved from $480$ to $427$. These values are now more indicative of the general salary trend within the players, providing a clearer snapshot of their earnings.