Problem 1
Question
In [33] Stephen Stigler discusses data from the Edinburgh Medical and Surgical Journal (1817). These concern the chest circumference of 5732 Scottish soldiers, measured in inches. The following information is given about the histogram with bin width 1, the first bin starting at \(32.5\).
$$
\begin{array}{cccc}
\hline \hline
\text{Bin} & \text{Count} & \text{Bin} & \text{Count} \\
\hline
(32.5,33.5] & 3 & (40.5,41.5] & 935 \\
(33.5,34.5] & 19 & (41.5,42.5] & 646 \\
(34.5,35.5] & 81 & (42.5,43.5] & 313 \\
(35.5,36.5] & 189 & (43.5,44.5] & 168 \\
(36.5,37.5] & 409 & (44.5,45.5] & 50 \\
(37.5,38.5] & 753 & (45.5,46.5] & 18 \\
(38.5,39.5] & 1062 & (46.5,47.5] & 3 \\
(39.5,40.5] & 1082 & (47.5,48.5] & 1 \\
\hline \hline
\end{array}
$$
a. Compute the height of the histogram on each bin.
b. Make a sketch of the histogram. Would you view the dataset as being symmetric or skewed?
Step-by-Step Solution
Key Concepts
Data Visualization
When creating a histogram, it's essential to choose the right bin width, because the bin width controls how much detail the histogram shows. A smaller bin width offers more detail but can make the picture noisy, while a larger bin width smooths out the noise but can hide subtle patterns.
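To see the effect of bin width on this dataset, the following minimal sketch merges adjacent bins of the table into bins of width 2 and recomputes the heights. It assumes the standard density-scale definition, height = count / (total count × bin width); the helper names `rebin` and `heights` are made up for illustration.

```python
# Counts from the problem's table, bins (32.5, 33.5], ..., (47.5, 48.5].
counts = [3, 19, 81, 189, 409, 753, 1062, 1082,
          935, 646, 313, 168, 50, 18, 3, 1]
n = sum(counts)  # total number of soldiers


def rebin(counts, factor):
    """Merge each run of `factor` adjacent bins into one wider bin."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def heights(counts, width, n):
    """Density-scale height of each bin: count / (n * bin width)."""
    return [c / (n * width) for c in counts]


wide_counts = rebin(counts, 2)            # bins (32.5, 34.5], (34.5, 36.5], ...
wide_heights = heights(wide_counts, 2, n)

for k, (c, h) in enumerate(zip(wide_counts, wide_heights)):
    left = 32.5 + 2 * k
    print(f"({left}, {left + 2}]  count={c:5d}  height={h:.4f}")
```

With the wider bins the histogram has 8 bars instead of 16, so it is smoother but shows less of the shape near the peak. On the density scale the total area of the bars remains 1 for either bin width.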
Effective data visualization like histograms can help you quickly grasp insights, identify any potential anomalies, and understand the underlying patterns of the dataset.
Statistical Distribution
A normal distribution, also known as a Gaussian distribution, is symmetric and often called a bell curve due to its shape. In contrast, a skewed distribution is asymmetrical and can either have a long tail on the right (positive skew) or on the left (negative skew).
In the histogram of the Scottish soldiers' chest circumferences, the aim is to determine whether the data look approximately symmetric or skewed. The shape of the histogram shows this: a long tail on one side indicates skewness in that direction, which can influence the choice of statistical methods for further analysis. Here the counts rise to a peak in the bin \((39.5, 40.5]\) and fall off at similar rates on both sides, so the histogram looks roughly symmetric.
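As a rough numerical check on the visual impression, one could estimate the skewness from the binned counts. The sketch below is an approximation rather than part of the original solution: it assumes every measurement in a bin sits at the bin midpoint, and it uses the moment coefficient of skewness (third central moment divided by the cubed standard deviation).

```python
# Counts from the problem's table, bins (32.5, 33.5], ..., (47.5, 48.5].
counts = [3, 19, 81, 189, 409, 753, 1062, 1082,
          935, 646, 313, 168, 50, 18, 3, 1]

# Assumption: each observation is placed at its bin midpoint (33, 34, ..., 48).
midpoints = [33 + k for k in range(len(counts))]

n = sum(counts)
mean = sum(m * c for m, c in zip(midpoints, counts)) / n
variance = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / n
third_moment = sum(c * (m - mean) ** 3 for m, c in zip(midpoints, counts)) / n

# Moment coefficient of skewness: 0 for a perfectly symmetric distribution,
# positive for a longer right tail, negative for a longer left tail.
skewness = third_moment / variance ** 1.5

print(f"mean     = {mean:.3f}")
print(f"variance = {variance:.3f}")
print(f"skewness = {skewness:.3f}")
```

A skewness close to zero is consistent with a roughly symmetric histogram, while a clearly positive or negative value points to a right or left tail.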
Histogram Height Calculation
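The height of a histogram on a bin is the count in that bin divided by the product of the total count and the bin width: \[ \text{height} = \frac{\text{count}}{\text{total count} \times \text{bin width}} \]With a bin width of 1, as in this exercise, the formula simplifies to: \[ \text{height} = \frac{\text{count}}{\text{total count}} \]This gives a normalized value that equals the relative frequency of data points within that bin, allowing you to compare different bins.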
In the exercise, the heights show how crowded each bin is with data points. A greater height signifies a higher concentration of data in that range. Repeating this calculation for every bin gives a complete picture of how the data are spread and how the density of each range compares with the others.
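The calculation for part a, together with a crude text sketch for part b, can be written out as follows. This is a minimal illustration using the counts from the table; the scale factor of 200 for the bar lengths is an arbitrary choice.

```python
# Counts from the problem's table, bins (32.5, 33.5], ..., (47.5, 48.5].
counts = [3, 19, 81, 189, 409, 753, 1062, 1082,
          935, 646, 313, 168, 50, 18, 3, 1]

width = 1.0                      # bin width given in the problem
n = sum(counts)                  # total number of soldiers
left_edges = [32.5 + k * width for k in range(len(counts))]

# Height on each bin: count / (n * bin width).
heights = [c / (n * width) for c in counts]

for left, c, h in zip(left_edges, counts, heights):
    bar = "#" * round(h * 200)   # arbitrary scaling for the text sketch
    print(f"({left:.1f}, {left + width:.1f}]  {c:5d}  {h:.4f}  {bar}")
```

The tallest bar is on \((39.5, 40.5]\), with height \(1082/5732 \approx 0.1888\), and because the bin width is 1 the heights sum to 1.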