Problem 1

Question

In [33] Stephen Stigler discusses data from the Edinburgh Medical and Surgical Journal (1817). These concern the chest circumference of 5732 Scottish soldiers, measured in inches. The following information is given about the histogram with bin width 1, the first bin starting at \(32.5\).

$$
\begin{array}{cccc}
\hline \hline
\text { Bin } & \text { Count } & \text { Bin } & \text { Count } \\
\hline
(32.5,33.5] & 3 & (40.5,41.5] & 935 \\
(33.5,34.5] & 19 & (41.5,42.5] & 646 \\
(34.5,35.5] & 81 & (42.5,43.5] & 313 \\
(35.5,36.5] & 189 & (43.5,44.5] & 168 \\
(36.5,37.5] & 409 & (44.5,45.5] & 50 \\
(37.5,38.5] & 753 & (45.5,46.5] & 18 \\
(38.5,39.5] & 1062 & (46.5,47.5] & 3 \\
(39.5,40.5] & 1082 & (47.5,48.5] & 1 \\
\hline \hline
\end{array}
$$

a. Compute the height of the histogram on each bin.
b. Make a sketch of the histogram. Would you view the dataset as being symmetric or skewed?

Step-by-Step Solution

Answer
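The dataset is approximately symmetric: the counts peak in the bin (39.5, 40.5] and fall off at similar rates on both sides, with no long tail in either direction.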
Step 1: Understand Histogram Height
The height of a histogram bar is the density of the data in that bin: the count (frequency) of observations divided by the product of the total number of observations \(n\) and the bin width, \( \text{height} = \frac{\text{count}}{n \times \text{bin width}} \). With this scaling the areas of the bars sum to 1. Since the bin width here is 1, the formula simplifies to \( \text{height} = \frac{\text{count}}{\text{total count}} \).
Step 2: Calculate Histogram Heights
Using the simplified formula from Step 1, calculate the height for each bin by dividing the count by the total number of observations (5732). For instance, the height for the bin (32.5, 33.5] is \( \frac{3}{5732} \approx 0.0005 \). Repeat this for each bin.
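Step 3: Compute Heights for Each Bin
Heights are rounded to four decimal places.
- (32.5, 33.5]: \( \frac{3}{5732} \approx 0.0005 \)
- (33.5, 34.5]: \( \frac{19}{5732} \approx 0.0033 \)
- (34.5, 35.5]: \( \frac{81}{5732} \approx 0.0141 \)
- (35.5, 36.5]: \( \frac{189}{5732} \approx 0.0330 \)
- (36.5, 37.5]: \( \frac{409}{5732} \approx 0.0714 \)
- (37.5, 38.5]: \( \frac{753}{5732} \approx 0.1314 \)
- (38.5, 39.5]: \( \frac{1062}{5732} \approx 0.1853 \)
- (39.5, 40.5]: \( \frac{1082}{5732} \approx 0.1888 \)
- (40.5, 41.5]: \( \frac{935}{5732} \approx 0.1631 \)
- (41.5, 42.5]: \( \frac{646}{5732} \approx 0.1127 \)
- (42.5, 43.5]: \( \frac{313}{5732} \approx 0.0546 \)
- (43.5, 44.5]: \( \frac{168}{5732} \approx 0.0293 \)
- (44.5, 45.5]: \( \frac{50}{5732} \approx 0.0087 \)
- (45.5, 46.5]: \( \frac{18}{5732} \approx 0.0031 \)
- (46.5, 47.5]: \( \frac{3}{5732} \approx 0.0005 \)
- (47.5, 48.5]: \( \frac{1}{5732} \approx 0.0002 \)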
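The per-bin arithmetic can be checked with a short script. This is a minimal sketch: the counts are copied from the table in the question, while the function name `histogram_heights` and the variable names are illustrative choices, not part of the original exercise.

```python
# Counts per bin, copied from the table in the question
# (bins (32.5, 33.5], (33.5, 34.5], ..., (47.5, 48.5]).
counts = [3, 19, 81, 189, 409, 753, 1062, 1082,
          935, 646, 313, 168, 50, 18, 3, 1]
bin_width = 1.0
first_left_edge = 32.5

n = sum(counts)  # total number of observations


def histogram_heights(counts, bin_width):
    """Return count / (total * bin_width) for every bin (hypothetical helper)."""
    total = sum(counts)
    return [c / (total * bin_width) for c in counts]


heights = histogram_heights(counts, bin_width)

# Print one line per bin, rounded to four decimal places.
for i, h in enumerate(heights):
    left = first_left_edge + i * bin_width
    print(f"({left:.1f}, {left + bin_width:.1f}]: {h:.4f}")

# The bar areas (height * width) should add up to 1.
total_area = sum(h * bin_width for h in heights)
```

Checking that `total_area` equals 1 (up to floating-point error) is a quick way to catch a mistyped count or a wrong divisor.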
Step 4: Sketch The Histogram
Put chest circumference on the x-axis, running from 32.5 to 48.5, and density on the y-axis. Over each bin, draw a rectangle whose height is the value calculated in Step 3. Because the bins are adjacent intervals, neighbouring bars touch with no gaps between them.
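For a quick look at the shape without a plotting library, the bars can be printed as text. The sketch below is one possible approach, not the method of the original solution: it draws the histogram rotated by 90 degrees (one row per bin), and the helper name `text_histogram` and the `max_chars` scaling parameter are assumptions made for illustration.

```python
# Counts per bin, copied from the table in the question.
counts = [3, 19, 81, 189, 409, 753, 1062, 1082,
          935, 646, 313, 168, 50, 18, 3, 1]
first_left_edge = 32.5
bin_width = 1.0

total = sum(counts)
heights = [c / (total * bin_width) for c in counts]


def text_histogram(heights, first_left_edge, bin_width, max_chars=50):
    """Return one text row per bin; the tallest bar gets max_chars '#' marks."""
    tallest = max(heights)
    rows = []
    for i, h in enumerate(heights):
        left = first_left_edge + i * bin_width
        bar = "#" * round(h / tallest * max_chars)
        rows.append(f"({left:4.1f}, {left + bin_width:4.1f}] {bar}")
    return rows


sketch = text_histogram(heights, first_left_edge, bin_width)
print("\n".join(sketch))
```

Bins whose height is very small relative to the tallest bar round to zero marks, so the extreme tails appear as empty rows in this rough rendering.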
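Step 5: Analyze Histogram Shape
Observe the plotted histogram's shape. If the bars on either side of the center mirror each other, the dataset is symmetric. If the plot has a long tail on one side, it is skewed in that direction. Here the heights rise to a peak in the bin (39.5, 40.5] and fall off at a similar rate on both sides (for example, 0.1853 and 0.1631 in the neighbouring bins, and 0.0005 in both (32.5, 33.5] and (46.5, 47.5]), so the dataset is best viewed as approximately symmetric.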
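The visual impression can be backed by a rough numerical check. The sketch below estimates the sample skewness from the binned counts under the assumption that every observation sits at the midpoint of its bin (the raw measurements are not given, so this is only an approximation). A value close to zero is consistent with a symmetric shape; a clearly positive or negative value would indicate a right or left tail.

```python
# Counts per bin, copied from the table in the question.
counts = [3, 19, 81, 189, 409, 753, 1062, 1082,
          935, 646, 313, 168, 50, 18, 3, 1]
first_left_edge = 32.5
bin_width = 1.0

# Assumption: each observation is placed at the midpoint of its bin.
midpoints = [first_left_edge + (i + 0.5) * bin_width
             for i in range(len(counts))]
n = sum(counts)

mean = sum(c * x for x, c in zip(midpoints, counts)) / n


def central_moment(k):
    """k-th central moment of the binned data (hypothetical helper)."""
    return sum(c * (x - mean) ** k for x, c in zip(midpoints, counts)) / n


variance = central_moment(2)
# Moment coefficient of skewness: third central moment over variance^(3/2).
skewness = central_moment(3) / variance ** 1.5

print(f"mean     = {mean:.3f}")
print(f"variance = {variance:.3f}")
print(f"skewness = {skewness:.3f}")
```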

Key Concepts

Data Visualization
Data visualization transforms raw data into a visual context, making it easier to identify patterns, trends, and outliers within large datasets. In the exercise, we visualize the chest circumference data of Scottish soldiers using a histogram. A histogram itself is a form of data visualization that depicts the frequency distribution of a dataset. Each bin in a histogram represents a range of data, and its height illustrates the frequency of data points within that range.

When creating a histogram, it's essential to choose the right bin width. The bin width affects the level of detail in your data visual. A smaller bin width can offer more detail but might complicate the visualization by adding noise, while a larger bin width can smooth out the details but might miss subtle patterns.
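To see the effect of a wider bin on this dataset, adjacent bins can be merged in pairs to give a bin width of 2. This is a minimal sketch with illustrative variable names; note that the height must now be divided by the new width so that the total area stays equal to 1.

```python
# Counts per width-1 bin, copied from the table in the question.
counts = [3, 19, 81, 189, 409, 753, 1062, 1082,
          935, 646, 313, 168, 50, 18, 3, 1]
first_left_edge = 32.5
n = sum(counts)

# Merge neighbouring bins pairwise: (32.5, 34.5], (34.5, 36.5], ...
wide_width = 2.0
wide_counts = [counts[i] + counts[i + 1] for i in range(0, len(counts), 2)]

# Height = count / (n * bin width); the width no longer cancels out.
wide_heights = [c / (n * wide_width) for c in wide_counts]

for i, (c, h) in enumerate(zip(wide_counts, wide_heights)):
    left = first_left_edge + i * wide_width
    print(f"({left:.1f}, {left + wide_width:.1f}]: count={c}, height={h:.4f}")

wide_area = sum(h * wide_width for h in wide_heights)
```

With eight bins instead of sixteen the overall bell shape is still visible, but the finer differences between neighbouring one-inch bins are averaged away.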

Effective data visualization like histograms can help you quickly grasp insights, identify any potential anomalies, and understand the underlying patterns of the dataset.
Statistical Distribution
Understanding statistical distribution is crucial when analyzing data such as the chest circumference of the soldiers. A distribution in statistics refers to how the values in a dataset are spread or how the frequencies of these data points are allocated.

A normal distribution, also known as a Gaussian distribution, is symmetric and often called a bell curve due to its shape. In contrast, a skewed distribution is asymmetrical and can either have a long tail on the right (positive skew) or on the left (negative skew).

In the histogram of the Scottish soldiers' chest circumferences, the aim is to determine whether the data approximate a symmetric distribution or are skewed. By analyzing the shape of the histogram, you can infer the distribution's skewness. An evident tail on one side indicates the direction of the skewness, which can influence the choice of statistical methods used for further data analysis.
Histogram Height Calculation
Calculating the height of each bin in a histogram is key to understanding the density of data in each range. This height, or density, is calculated by dividing the frequency (or count) of data points in the bin by the product of the bin width and the total number of observations.

For example, with a bin width of 1, the formula simplifies to:
\[ \text{height} = \frac{\text{count}}{\text{total count}} \]
This gives a normalized value that represents the relative frequency of data points within that bin, allowing you to compare different bins.

In the exercise, the heights show how crowded each bin is with data points. A taller bar signifies a higher concentration of data in that range. Carrying out this calculation for every bin gives a complete picture of how the data are spread and how each range's density compares with the others.