Problem 10

Question

The following data represents the yearly salaries, in thousands of dollars, of 10 basketball players. $$ \begin{array}{rrrrrrrrrr}{533} & {427} & {800} & {687} & {264} & {264} & {125} & {602} & {249} & {19{,}014}\end{array} $$
a. Find the mean and median salaries of the 10 players.
b. Which measure of central tendency is more representative of the data? Explain.
c. Find the outlier for the set of data.
d. Remove the outlier from the set of data and recalculate the mean and median salaries.
e. After removing the outlier from the set of data, is the mean more or less representative of the data?

Step-by-Step Solution

Answer

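The mean and median of all 10 salaries are 2,296.5 and 480, respectively (in thousands of dollars). The median is more representative because the outlier, 19,014, pulls the mean far above every other salary. After removing the outlier, the mean is 439 and the median is 427, so the mean is more representative of the data than it was before.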

Step 1: Organize the data

Begin by arranging the salary data from smallest to largest. The salaries given are: $533, 427, 800, 687, 264, 264, 125, 602, 249, 19{,}014$. In order: $125, 249, 264, 264, 427, 533, 602, 687, 800, 19{,}014$. Note that the last value is 19,014 (about 19 million dollars), not 19.014.

Step 2: Calculate the mean

The mean salary is the sum of all salaries divided by the number of salaries. Calculate this by: \[\text{Mean} = \frac{125 + 249 + 264 + 264 + 427 + 533 + 602 + 687 + 800 + 19{,}014}{10} = \frac{22{,}965}{10} = 2{,}296.5\]
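As a quick check of this step, the mean can be computed with Python's standard `statistics` module. This is a minimal sketch: the variable names are illustrative, and the data are the ten salaries (in thousands of dollars) from the problem statement, with the last value read as 19,014.

```python
from statistics import mean

# Yearly salaries in thousands of dollars, as listed in the problem statement.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]

total = sum(salaries)          # sum of all ten salaries
mean_salary = mean(salaries)   # total divided by the number of salaries

print(f"sum = {total}, count = {len(salaries)}, mean = {mean_salary}")
```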

Step 3: Calculate the median

To find the median in a sorted list of 10 numbers, take the average of the 5th and 6th terms. Here these terms are 427 and 533. So: \[\text{Median} = \frac{427 + 533}{2} = \frac{960}{2} = 480\]
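The same procedure can be sketched in Python: sort the values, pick the two middle terms, and average them. The variable names are illustrative, and the result is compared against `statistics.median` as a sanity check.

```python
from statistics import median

# Yearly salaries in thousands of dollars, as listed in the problem statement.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]

ordered = sorted(salaries)
n = len(ordered)

# With an even count, the median is the average of the two middle terms
# (the 5th and 6th values when n = 10).
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
median_by_hand = sum(middle_pair) / 2

print(middle_pair, median_by_hand, median(salaries))
```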

Step 4: Compare the mean and median

The mean of $2{,}296.5$ is far greater than the median of $480$ because of the outlier. In fact, the mean exceeds nine of the ten salaries. In this case, the median is more representative because it is barely affected by the extreme salary value.

Step 5: Identify outliers

Outliers are values significantly different from others in a data set. Here, the value $19{,}014$ is the outlier because it is much larger than the other salaries: it is nearly 24 times the next-largest salary, $800$.
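The solution identifies the outlier by inspection. One common convention, which is not stated in the original problem and is used here only as an assumption, flags any value more than 1.5 interquartile ranges below the first quartile or above the third quartile. Textbooks compute quartiles in slightly different ways, so the fences below (from `statistics.quantiles` with its default method) may differ a little from a hand calculation, though the flagged value should be the same for this data set.

```python
from statistics import quantiles

# Yearly salaries in thousands of dollars, as listed in the problem statement.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]

# Quartiles via the standard library's default ("exclusive") method.
q1, _, q3 = quantiles(salaries, n=4)
iqr = q3 - q1

# 1.5 * IQR fences: an assumed convention, not part of the original solution.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in salaries if x < lower_fence or x > upper_fence]
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```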

Step 6: Remove outlier and recalculate mean

After eliminating the outlier, $19{,}014$, we recalculate the mean using the remaining 9 salaries: \[\text{Mean} = \frac{125 + 249 + 264 + 264 + 427 + 533 + 602 + 687 + 800}{9} = \frac{3{,}951}{9} = 439\]

Step 7: Recalculate the median without outlier

With the outlier removed, the list of salaries is now $125, 249, 264, 264, 427, 533, 602, 687, 800$. The median of an odd list size (9 numbers) is the middle number: 427.
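The recalculation without the outlier can be checked with a short Python sketch. The names `outlier` and `trimmed` are illustrative.

```python
from statistics import mean, median

# Yearly salaries in thousands of dollars, as listed in the problem statement.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]
outlier = 19014

# Drop the outlier and recompute both measures on the remaining 9 salaries.
trimmed = [x for x in salaries if x != outlier]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(f"n = {len(trimmed)}, mean = {trimmed_mean}, median = {trimmed_median}")
```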

Step 8: Evaluate representativeness after outlier removal

After removing the outlier, the mean ($439$) is close to the median ($427$), whereas with the outlier included the mean ($2{,}296.5$) was nearly five times the median ($480$). The mean is therefore more representative than it was with the outlier included, and both measures now give a reasonable picture of the typical salary.
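One informal way to see the change in representativeness is to count how many salaries fall below the mean with and without the outlier. This is an illustrative check rather than a standard statistical test, and the helper `count_below_mean` is a hypothetical name introduced for this sketch.

```python
from statistics import mean

# Yearly salaries in thousands of dollars, as listed in the problem statement.
salaries = [533, 427, 800, 687, 264, 264, 125, 602, 249, 19014]
trimmed = [x for x in salaries if x != 19014]


def count_below_mean(values):
    """Return how many values are strictly less than the mean of the list."""
    m = mean(values)
    return sum(1 for v in values if v < m)


below_before = count_below_mean(salaries)  # mean is pulled above most salaries
below_after = count_below_mean(trimmed)    # mean sits near the middle

print(f"with outlier: {below_before} of {len(salaries)} below the mean")
print(f"without outlier: {below_after} of {len(trimmed)} below the mean")
```

A mean that sits above almost every value describes few of the players, while a mean with roughly half the values on each side behaves much like the median.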

Key Concepts

Understanding Mean and Median

In statistics, the mean and the median are measures of central tendency: each summarizes a set of values with a single typical or central value. The mean, often referred to as the 'average', is calculated by adding all numerical values together and dividing by the count of those values. In our example, the single extremely high salary ($19,014$) pulls the mean of the basketball players' salaries far above what nine of the ten players earn. The median, on the other hand, is the middle value of a data set sorted in ascending order (or the average of the two middle values when the count is even), so half of the values lie on each side of it. Unlike the mean, the median is robust against outliers, because it depends only on the middle value(s) and not on how extreme the largest or smallest values are. Looking at both the mean and the median therefore gives deeper insight into the data.
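To illustrate this robustness, the sketch below keeps the nine ordinary salaries fixed, swaps in progressively larger values for the top salary, and recomputes both measures. The replacement values 1000 and 5000 are hypothetical and chosen only for illustration; 19014 is the value from the problem.

```python
from statistics import mean, median

# The nine non-outlier salaries (thousands of dollars) from the problem.
base = [533, 427, 800, 687, 264, 264, 125, 602, 249]

results = []
for top in (1000, 5000, 19014):  # 1000 and 5000 are made-up values
    data = base + [top]
    results.append((top, mean(data), median(data)))

for top, m, med in results:
    print(f"top salary = {top:>6}: mean = {m:>7}, median = {med}")
```

The mean climbs with every increase in the top salary, while the median stays the same because the two middle values do not change.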

Outliers and Their Impact

An outlier is a number in the dataset that is significantly higher or lower than most of the data points. In statistical analysis, outliers can either be a result of variability in the data or an indication of measurement errors.

The primary effect of an outlier is distortion of statistical results.
In salary data, such as in this exercise, an outlier like $19,014$ can greatly increase the mean, making it an unreliable measure of average salary.
Identifying outliers helps in initiating further investigative procedures to determine their cause and decide whether to include or exclude them from analysis.

Recognizing and dealing with outliers appropriately can refine your statistical interpretations and lead to more accurate conclusions.

Central Tendency Criteria

Central tendency is a statistical measurement to identify the center of a data distribution. The most common measures are mean and median.

The mean is excellent for symmetric distributions without outliers but can be heavily influenced by them.
The median is often more representative for skewed distributions or datasets with outliers since it is not affected by extreme values.

In this exercise, before removing the outlier, the median ($480$) was a more trustworthy representation of central tendency than the mean ($2{,}296.5$). With the outlier removed, the mean ($439$) and median ($427$) lie close together, so either one describes this dataset reasonably well.

Salary Data Analysis

Salary data analysis often involves various statistical methods to represent the data effectively. Here, we aim to provide a clear representation of typical salaries without being misled by anomalies.

Salaries can be deeply impacted by outliers, as seen with the basketball players' data.
Removing an outlier often results in a more accurate mean that aligns closely with the median, making both statistics more representative of the data's central tendency.
In the refined analysis of our player salaries without the extreme value, the mean dropped from $2{,}296.5$ to $439$, and the median moved from $480$ to $427$. These values are now more indicative of the general salary level among the players, providing a clearer snapshot of their earnings.

Analyzing salary data this way helps prevent misjudgments and ensures decisions made from the data are based on more dependable metrics.
