Problem 3

Question

In an article in Biometrika, an example is discussed about mine disasters during the period from March 15, 1851, to March 22, 1962. A dataset has been obtained of 190 recorded time intervals (in days) between successive coal mine disasters involving ten or more men killed. The ordered data are listed in Table 15.6.
$$ \begin{array}{rrrrrrrrrr} \hline \hline
0 & 1 & 1 & 2 & 2 & 3 & 4 & 4 & 4 & 6 \\
7 & 10 & 11 & 12 & 12 & 12 & 13 & 15 & 15 & 16 \\
16 & 16 & 17 & 17 & 18 & 19 & 19 & 19 & 20 & 20 \\
22 & 23 & 24 & 25 & 27 & 28 & 29 & 29 & 29 & 31 \\
31 & 32 & 33 & 34 & 34 & 36 & 36 & 37 & 40 & 41 \\
41 & 42 & 43 & 45 & 47 & 48 & 49 & 50 & 53 & 54 \\
54 & 55 & 56 & 59 & 59 & 61 & 61 & 65 & 66 & 66 \\
70 & 72 & 75 & 78 & 78 & 78 & 80 & 80 & 81 & 88 \\
91 & 92 & 93 & 93 & 95 & 95 & 96 & 96 & 97 & 99 \\
101 & 108 & 110 & 112 & 113 & 114 & 120 & 120 & 123 & 123 \\
124 & 124 & 125 & 127 & 129 & 131 & 134 & 137 & 139 & 143 \\
144 & 145 & 151 & 154 & 156 & 157 & 176 & 182 & 186 & 187 \\
188 & 189 & 190 & 193 & 194 & 197 & 202 & 203 & 208 & 215 \\
216 & 217 & 217 & 217 & 218 & 224 & 225 & 228 & 232 & 233 \\
250 & 255 & 275 & 275 & 275 & 276 & 286 & 292 & 307 & 307 \\
312 & 312 & 315 & 324 & 326 & 326 & 329 & 330 & 336 & 345 \\
348 & 354 & 361 & 364 & 368 & 378 & 388 & 420 & 431 & 456 \\
462 & 467 & 498 & 517 & 536 & 538 & 566 & 632 & 644 & 745 \\
806 & 826 & 871 & 952 & 1205 & 1312 & 1358 & 1630 & 1643 & 2366 \\
\hline \hline \end{array} $$
a. Compute the height on each bin of the histogram with bins \([0,250]\), \((250,500], \ldots, (2250,2500]\).
b. Make a sketch of the histogram. Would you view the dataset as being symmetric or skewed?

Step-by-Step Solution

Answer
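a. The bin counts are 141, 32, 7, 4, 1, 2, 2, 0, 0, 1 (total 190). Dividing by 190 gives bar heights of about 0.742, 0.168, 0.037, 0.021, 0.005, 0.011, 0.011, 0, 0, 0.005.
b. The dataset is right-skewed, with most values in the lower bins.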
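Step 1: Determine the Range of Each Bin
Identify the range of values each bin covers based on the given intervals. The question specifies ten bins of width 250; the first is closed on both sides and every later bin is open on the left and closed on the right: \( [0, 250] \), \( (250, 500] \), \( (500, 750] \), \( (750, 1000] \), \( (1000, 1250] \), \( (1250, 1500] \), \( (1500, 1750] \), \( (1750, 2000] \), \( (2000, 2250] \), and \( (2250, 2500] \). A value that falls exactly on a bin edge, such as 250, therefore belongs to the bin on its left.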
Step 2: Count the Data Points in Each Bin
Go through the dataset and count the number of data points that fall into each bin.
  • Bin \( [0, 250] \): 141 data points
  • Bin \( (250, 500] \): 32 data points
  • Bin \( (500, 750] \): 7 data points
  • Bin \( (750, 1000] \): 4 data points
  • Bin \( (1000, 1250] \): 1 data point
  • Bin \( (1250, 1500] \): 2 data points
  • Bin \( (1500, 1750] \): 2 data points
  • Bin \( (1750, 2000] \): 0 data points
  • Bin \( (2000, 2250] \): 0 data points
  • Bin \( (2250, 2500] \): 1 data point
As a check, the counts sum to 190, the size of the dataset.
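The tally can be reproduced with a short script. The following is a minimal sketch, assuming the right-closed bins stated in the question; the names `intervals`, `right_edges`, and `bin_counts` are illustrative and not part of the original solution.

```python
from bisect import bisect_left

# Ordered intervals (in days), copied row by row from the table in the question.
intervals = [
    0, 1, 1, 2, 2, 3, 4, 4, 4, 6,
    7, 10, 11, 12, 12, 12, 13, 15, 15, 16,
    16, 16, 17, 17, 18, 19, 19, 19, 20, 20,
    22, 23, 24, 25, 27, 28, 29, 29, 29, 31,
    31, 32, 33, 34, 34, 36, 36, 37, 40, 41,
    41, 42, 43, 45, 47, 48, 49, 50, 53, 54,
    54, 55, 56, 59, 59, 61, 61, 65, 66, 66,
    70, 72, 75, 78, 78, 78, 80, 80, 81, 88,
    91, 92, 93, 93, 95, 95, 96, 96, 97, 99,
    101, 108, 110, 112, 113, 114, 120, 120, 123, 123,
    124, 124, 125, 127, 129, 131, 134, 137, 139, 143,
    144, 145, 151, 154, 156, 157, 176, 182, 186, 187,
    188, 189, 190, 193, 194, 197, 202, 203, 208, 215,
    216, 217, 217, 217, 218, 224, 225, 228, 232, 233,
    250, 255, 275, 275, 275, 276, 286, 292, 307, 307,
    312, 312, 315, 324, 326, 326, 329, 330, 336, 345,
    348, 354, 361, 364, 368, 378, 388, 420, 431, 456,
    462, 467, 498, 517, 536, 538, 566, 632, 644, 745,
    806, 826, 871, 952, 1205, 1312, 1358, 1630, 1643, 2366,
]

# Upper limits of the bins [0,250], (250,500], ..., (2250,2500].
right_edges = list(range(250, 2501, 250))


def bin_counts(values, edges):
    """Count values per right-closed bin; `edges` holds the upper bin limits.

    Assumes every value lies between 0 and the last edge.
    """
    counts = [0] * len(edges)
    for v in values:
        # bisect_left finds the first edge >= v, so v == 250 lands in bin 0.
        counts[bisect_left(edges, v)] += 1
    return counts


counts = bin_counts(intervals, right_edges)
print(len(intervals), counts)
```

Running it prints `190 [141, 32, 7, 4, 1, 2, 2, 0, 0, 1]`. Using `bisect_left` rather than `bisect_right` is what makes each bin closed on the right, which matters here because the value 250 sits exactly on an edge.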
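Step 3: Calculate the Relative Frequency for Each Bin
Compute the height of the histogram for each bin by dividing the count of data points in the bin by the total number of data points, which is 190.
  • Bin \( [0, 250] \): \( \frac{141}{190} \approx 0.742 \)
  • Bin \( (250, 500] \): \( \frac{32}{190} \approx 0.168 \)
  • Bin \( (500, 750] \): \( \frac{7}{190} \approx 0.037 \)
  • Bin \( (750, 1000] \): \( \frac{4}{190} \approx 0.021 \)
  • Bin \( (1000, 1250] \): \( \frac{1}{190} \approx 0.005 \)
  • Bin \( (1250, 1500] \): \( \frac{2}{190} \approx 0.011 \)
  • Bin \( (1500, 1750] \): \( \frac{2}{190} \approx 0.011 \)
  • Bin \( (1750, 2000] \): \( \frac{0}{190} = 0 \)
  • Bin \( (2000, 2250] \): \( \frac{0}{190} = 0 \)
  • Bin \( (2250, 2500] \): \( \frac{1}{190} \approx 0.005 \)
These are relative-frequency heights. If the histogram is instead scaled so that the bars have total area 1 (a density scale, where the area of each bar equals the proportion of data in its bin), divide each value by the bin width 250. The first height becomes \( \frac{141}{190 \cdot 250} \approx 0.00297 \), followed by approximately 0.00067, 0.00015, 0.00008, 0.00002, 0.00004, 0.00004, 0, 0, and 0.00002. Because all bins have the same width, both scalings give a histogram of the same shape.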
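The arithmetic for the heights can be sketched as follows. The counts are those obtained by tallying the table with the right-closed bins of the question; the variable names, and the choice to print both the relative-frequency scale and the area-one (density) scale, are assumptions made for illustration.

```python
# Bin counts for [0,250], (250,500], ..., (2250,2500], tallied from the table.
counts = [141, 32, 7, 4, 1, 2, 2, 0, 0, 1]
n = sum(counts)   # total number of intervals
bin_width = 250   # every bin has the same width

# Height on the relative-frequency scale: proportion of data in the bin.
rel_freq = [c / n for c in counts]

# Height on the density scale: proportion divided by bin width, so that
# the bar areas add up to 1.
density = [c / (n * bin_width) for c in counts]

for k, (c, r, d) in enumerate(zip(counts, rel_freq, density)):
    left, right = k * bin_width, (k + 1) * bin_width
    opener = "[" if k == 0 else "("  # only the first bin includes its left edge
    label = f"{opener}{left}, {right}]"
    print(f"{label:<13} count={c:>3}  rel.freq={r:.3f}  density={d:.5f}")
```

The relative frequencies sum to 1, and the density heights multiplied by the bin width also sum to 1, which is a quick way to catch a miscounted bin.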
Step 4: Sketch the Histogram
Put the bin edges 0, 250, 500, ..., 2500 on the horizontal axis and draw one bar per bin, with height equal to its relative frequency:
  1. Bin \( [0, 250] \) with height approximately 0.742.
  2. Bin \( (250, 500] \) with height approximately 0.168.
  3. Bin \( (500, 750] \) with height approximately 0.037.
  4. Bin \( (750, 1000] \) with height approximately 0.021.
  5. Bin \( (1000, 1250] \) with height approximately 0.005.
  6. Bin \( (1250, 1500] \) with height approximately 0.011.
  7. Bin \( (1500, 1750] \) with height approximately 0.011.
  8. Bin \( (1750, 2000] \) with height 0.
  9. Bin \( (2000, 2250] \) with height 0.
  10. Bin \( (2250, 2500] \) with height approximately 0.005.
Dividing every height by the bin width 250 (the density scale) changes only the labels on the vertical axis, not the shape of the sketch.
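A rough text rendering can stand in for a hand-drawn sketch. This is only an illustrative sketch: it assumes the right-closed bins of the question and the counts obtained by tallying the table, and the bar length of 50 characters (`max_bar`) is an arbitrary choice.

```python
from math import ceil

# Bin counts for [0,250], (250,500], ..., (2250,2500], tallied from the table.
counts = [141, 32, 7, 4, 1, 2, 2, 0, 0, 1]
n = sum(counts)
bin_width = 250
max_bar = 50  # characters used for the tallest bar (arbitrary)

lines = []
for k, c in enumerate(counts):
    left, right = k * bin_width, (k + 1) * bin_width
    opener = "[" if k == 0 else "("  # only the first bin includes its left edge
    label = f"{opener}{left}, {right}]"
    # Round up so that every non-empty bin shows at least one character.
    bar = "#" * ceil(max_bar * c / max(counts))
    lines.append(f"{label:<13}| {bar} {c / n:.3f}")

print("\n".join(lines))
```

The bars are drawn horizontally, so the long right tail of the histogram appears as a run of very short bars at the bottom of the output. Because bar lengths are rounded up, bars for nearly empty bins are slightly exaggerated; the printed relative frequency next to each bar is the exact height.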
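Step 5: Analyze the Symmetry of the Dataset
Observe the histogram: about 74% of the values fall in the first bin and about 91% in the first two, while a thin tail of isolated values stretches out to 2366 days. The bars drop off sharply to the right and there is no matching tail on the left, so the distribution is right-skewed rather than symmetric.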

Key Concepts

Data Binning
Histograms are a type of bar chart that helps us to visualize the frequency distribution of a dataset. One of the first steps in creating a histogram is data binning. This involves dividing the range of data into intervals, known as "bins". Each bin represents a continuous range of values. In this exercise, the bin ranges are:
  • \([0, 250]\)
  • \((250, 500]\)
  • \((500, 750]\)
  • \((750, 1000]\)
  • \((1000, 1250]\)
  • \((1250, 1500]\)
  • \((1500, 1750]\)
  • \((1750, 2000]\)
  • \((2000, 2250]\)
  • \((2250, 2500]\)
Data binning simplifies a dataset to make it more manageable and easier to analyze.
It reduces noise and helps in identifying trends or patterns in the data. However, it's important to choose the number and size of bins carefully.
With too few bins, distinct features are merged and detail is lost; with too many, most bins hold only a handful of points and the histogram becomes ragged.
Relative Frequency
After determining the bins, the next step is to calculate the relative frequency. Relative frequency is the proportion of data points that fall within each bin, compared to the total number of data points. It is a way to understand how the data is divided across different intervals.
The formula to calculate relative frequency is:
  • \( \text{Relative Frequency} = \frac{\text{Number of data points in a bin}}{\text{Total number of data points}} \)
In this exercise, for instance, the bin \([0, 250]\) contains 141 data points. With a total of 190 data points in the dataset, the relative frequency is computed as:
  • \( \frac{141}{190} \approx 0.742 \)
Relative frequencies help in comparing the sizes of different bins without being affected by the total size of the dataset.
This makes it easy to understand the distribution of data across different intervals.
Symmetry in Distributions
Analyzing symmetry in distributions involves looking at how data is spread across a histogram. Symmetry, or the lack thereof, gives insight into data characteristics. Typically, if both halves of the histogram mirror each other around the center, it is symmetric.
However, in this dataset, most data points are concentrated in the lowest bins, and the frequency drops sharply as the bin values increase, leaving only a few isolated observations in the upper bins. This suggests a right-skewed distribution.
  • Right skewness means there are a few very high values compared to the rest.
  • The tail on the right side of the histogram is longer or fatter than on the left.
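A common numerical companion to the visual check is to compare the mean with the median: a few very large values pull the mean upward but barely move the median. The sketch below applies this rule of thumb to the data from the table; it is an informal check rather than a formal test of skewness, and the variable names are illustrative.

```python
from statistics import mean, median

# Ordered intervals (in days), copied row by row from the table in the question.
intervals = [
    0, 1, 1, 2, 2, 3, 4, 4, 4, 6,
    7, 10, 11, 12, 12, 12, 13, 15, 15, 16,
    16, 16, 17, 17, 18, 19, 19, 19, 20, 20,
    22, 23, 24, 25, 27, 28, 29, 29, 29, 31,
    31, 32, 33, 34, 34, 36, 36, 37, 40, 41,
    41, 42, 43, 45, 47, 48, 49, 50, 53, 54,
    54, 55, 56, 59, 59, 61, 61, 65, 66, 66,
    70, 72, 75, 78, 78, 78, 80, 80, 81, 88,
    91, 92, 93, 93, 95, 95, 96, 96, 97, 99,
    101, 108, 110, 112, 113, 114, 120, 120, 123, 123,
    124, 124, 125, 127, 129, 131, 134, 137, 139, 143,
    144, 145, 151, 154, 156, 157, 176, 182, 186, 187,
    188, 189, 190, 193, 194, 197, 202, 203, 208, 215,
    216, 217, 217, 217, 218, 224, 225, 228, 232, 233,
    250, 255, 275, 275, 275, 276, 286, 292, 307, 307,
    312, 312, 315, 324, 326, 326, 329, 330, 336, 345,
    348, 354, 361, 364, 368, 378, 388, 420, 431, 456,
    462, 467, 498, 517, 536, 538, 566, 632, 644, 745,
    806, 826, 871, 952, 1205, 1312, 1358, 1630, 1643, 2366,
]

sample_mean = mean(intervals)      # pulled upward by the few long intervals
sample_median = median(intervals)  # middle of the ordered data

print(f"mean   = {sample_mean:.1f} days")
print(f"median = {sample_median:.1f} days")
print("mean exceeds median:", sample_mean > sample_median)
```

For this dataset the mean is about 213.4 days and the median is 113.5 days, so the mean is nearly twice the median, which is consistent with the long right tail seen in the histogram.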
Understanding whether a distribution is skewed helps in many areas, from statistical inference to predictive modeling.
For example, a right-skewed dataset might require using different statistical methods or transformations to achieve normality for analysis purposes.