Problem 3
Question
In an article in Biometrika, an example is discussed about mine disasters during the period from March 15, 1851, to March 22, 1962. A dataset of 190 recorded time intervals (in days) between successive coal mine disasters involving ten or more men killed has been obtained. The ordered data are listed in Table 15.6.
$$ \begin{array}{rrrrrrrrrr} \hline \hline
0 & 1 & 1 & 2 & 2 & 3 & 4 & 4 & 4 & 6 \\
7 & 10 & 11 & 12 & 12 & 12 & 13 & 15 & 15 & 16 \\
16 & 16 & 17 & 17 & 18 & 19 & 19 & 19 & 20 & 20 \\
22 & 23 & 24 & 25 & 27 & 28 & 29 & 29 & 29 & 31 \\
31 & 32 & 33 & 34 & 34 & 36 & 36 & 37 & 40 & 41 \\
41 & 42 & 43 & 45 & 47 & 48 & 49 & 50 & 53 & 54 \\
54 & 55 & 56 & 59 & 59 & 61 & 61 & 65 & 66 & 66 \\
70 & 72 & 75 & 78 & 78 & 78 & 80 & 80 & 81 & 88 \\
91 & 92 & 93 & 93 & 95 & 95 & 96 & 96 & 97 & 99 \\
101 & 108 & 110 & 112 & 113 & 114 & 120 & 120 & 123 & 123 \\
124 & 124 & 125 & 127 & 129 & 131 & 134 & 137 & 139 & 143 \\
144 & 145 & 151 & 154 & 156 & 157 & 176 & 182 & 186 & 187 \\
188 & 189 & 190 & 193 & 194 & 197 & 202 & 203 & 208 & 215 \\
216 & 217 & 217 & 217 & 218 & 224 & 225 & 228 & 232 & 233 \\
250 & 255 & 275 & 275 & 275 & 276 & 286 & 292 & 307 & 307 \\
312 & 312 & 315 & 324 & 326 & 326 & 329 & 330 & 336 & 345 \\
348 & 354 & 361 & 364 & 368 & 378 & 388 & 420 & 431 & 456 \\
462 & 467 & 498 & 517 & 536 & 538 & 566 & 632 & 644 & 745 \\
806 & 826 & 871 & 952 & 1205 & 1312 & 1358 & 1630 & 1643 & 2366 \\
\hline \hline \end{array} $$
a. Compute the height on each bin of the histogram with bins \([0,250]\), \((250,500], \ldots, (2250,2500]\).
b. Make a sketch of the histogram. Would you view the dataset as being symmetric or skewed?
Step-by-Step Solution
Key Concepts
Data Binning
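Following the question, the data are grouped into ten bins of width 250 days. Each bin is closed on the right, so the observation 250 belongs to the first bin, not the second. Counting the observations in Table 15.6 gives:
- \([0, 250]\): 141
- \((250, 500]\): 32
- \((500, 750]\): 7
- \((750, 1000]\): 4
- \((1000, 1250]\): 1
- \((1250, 1500]\): 2
- \((1500, 1750]\): 2
- \((1750, 2000]\): 0
- \((2000, 2250]\): 0
- \((2250, 2500]\): 1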
Binning reduces noise and helps reveal trends or patterns in the data. However, the number and width of the bins should be chosen carefully: too few bins can mask important details, while too many can make the histogram ragged and hard to read.
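The binning step can be sketched in Python. This is a minimal illustration, not part of the original solution: the names `DATA`, `bin_index`, and `bin_counts` are hypothetical, and the bins are assumed to be right-closed as in the question. NumPy's `numpy.histogram` is deliberately not used, because its bins are half-open on the right (except the last one), which would move the observation 250 into the second bin.

```python
# Table 15.6: 190 ordered time intervals (days) between coal mine disasters.
DATA = [
    0, 1, 1, 2, 2, 3, 4, 4, 4, 6,
    7, 10, 11, 12, 12, 12, 13, 15, 15, 16,
    16, 16, 17, 17, 18, 19, 19, 19, 20, 20,
    22, 23, 24, 25, 27, 28, 29, 29, 29, 31,
    31, 32, 33, 34, 34, 36, 36, 37, 40, 41,
    41, 42, 43, 45, 47, 48, 49, 50, 53, 54,
    54, 55, 56, 59, 59, 61, 61, 65, 66, 66,
    70, 72, 75, 78, 78, 78, 80, 80, 81, 88,
    91, 92, 93, 93, 95, 95, 96, 96, 97, 99,
    101, 108, 110, 112, 113, 114, 120, 120, 123, 123,
    124, 124, 125, 127, 129, 131, 134, 137, 139, 143,
    144, 145, 151, 154, 156, 157, 176, 182, 186, 187,
    188, 189, 190, 193, 194, 197, 202, 203, 208, 215,
    216, 217, 217, 217, 218, 224, 225, 228, 232, 233,
    250, 255, 275, 275, 275, 276, 286, 292, 307, 307,
    312, 312, 315, 324, 326, 326, 329, 330, 336, 345,
    348, 354, 361, 364, 368, 378, 388, 420, 431, 456,
    462, 467, 498, 517, 536, 538, 566, 632, 644, 745,
    806, 826, 871, 952, 1205, 1312, 1358, 1630, 1643, 2366,
]

WIDTH = 250   # bin width in days
N_BINS = 10   # [0,250], (250,500], ..., (2250,2500]


def bin_index(x: int, width: int = WIDTH) -> int:
    """Return the index of the right-closed bin that contains x."""
    # (x - 1) // width sends 1..250 to 0, 251..500 to 1, and so on;
    # max(..., 0) also places x = 0 in the first bin [0, 250].
    return max((x - 1) // width, 0)


def bin_counts(data, width=WIDTH, n_bins=N_BINS):
    """Count how many observations fall in each bin."""
    counts = [0] * n_bins
    for x in data:
        counts[bin_index(x, width)] += 1
    return counts


counts = bin_counts(DATA)
for i, c in enumerate(counts):
    bracket = "[" if i == 0 else "("   # only the first bin is closed on the left
    print(f"{bracket}{i * WIDTH}, {(i + 1) * WIDTH}]: {c}")
```

Running it prints one line per bin, starting with `[0, 250]: 141`; the ten counts add up to 190.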
Relative Frequency
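The formula to calculate relative frequency is:
- \( \text{Relative Frequency} = \frac{\text{Number of data points in a bin}}{\text{Total number of data points}} \)
- For the first bin \([0, 250]\), which contains 141 of the 190 observations: \( \frac{141}{190} \approx 0.742 \)
The height of a histogram bar is the relative frequency divided by the bin width, so that the total area of the histogram equals 1:
- \( \text{Height} = \frac{\text{Number of data points in a bin}}{\text{Total number of data points} \times \text{Bin width}} \)
- For the first bin: \( \frac{141}{190 \times 250} \approx 0.00297 \)
Relative frequencies make it easy to compare how the data are distributed across the intervals, independently of the sample size.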
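The relative frequencies, and the histogram heights asked for in part a (relative frequency divided by bin width, so that the total area is 1), can be computed with a short Python sketch. The bin counts are hard-coded as an assumption, tallied from Table 15.6 with the right-closed bins of the question; the variable names are illustrative only.

```python
from fractions import Fraction

# Bin counts for [0,250], (250,500], ..., (2250,2500], tallied from Table 15.6.
COUNTS = [141, 32, 7, 4, 1, 2, 2, 0, 0, 1]
WIDTH = 250          # bin width in days
n = sum(COUNTS)      # total number of observations (190)

rel_freqs = [c / n for c in COUNTS]             # relative frequency per bin
heights = [c / (n * WIDTH) for c in COUNTS]     # histogram height per bin

for i, (c, rf, h) in enumerate(zip(COUNTS, rel_freqs, heights)):
    bracket = "[" if i == 0 else "("
    label = f"{bracket}{i * WIDTH}, {(i + 1) * WIDTH}]"
    print(f"{label:>13}: count={c:3d}  rel.freq={rf:.4f}  height={h:.6f}")

# Exact check with fractions: summing height * width over all bins gives 1.
area = sum(Fraction(c, n * WIDTH) * WIDTH for c in COUNTS)
print("total area:", area)
```

The exact `Fraction` check confirms that the heights define a density: the bar areas add up to exactly 1.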
Symmetry in Distributions
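A distribution is symmetric when its histogram looks roughly like a mirror image on either side of its center. In this dataset, however, most data points are concentrated in the lowest bin (141 of the 190 intervals lie in \([0, 250]\)), and the frequencies drop off as the bins move to the right, with only six observations above 1000 days. The dataset is therefore skewed to the right.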
- Right skewness means that there are a few values much larger than the rest, such as the intervals of 1630, 1643, and 2366 days.
- The tail on the right side of the histogram is longer than the tail on the left.
In practice, a right-skewed dataset may call for different statistical methods, or for a transformation that brings the data closer to normality before methods that assume normality are applied.
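Part b asks for a sketch of the histogram. A minimal, hypothetical Python script can draw a rough text version and compare the mean with the median; it is an illustration, not part of the original solution. Because all bins have the same width, bars proportional to the counts have the same shape as bars proportional to the heights, and a mean well above the median is a common indication of right skewness.

```python
import statistics

# Table 15.6: 190 ordered time intervals (days) between coal mine disasters.
DATA = [
    0, 1, 1, 2, 2, 3, 4, 4, 4, 6,
    7, 10, 11, 12, 12, 12, 13, 15, 15, 16,
    16, 16, 17, 17, 18, 19, 19, 19, 20, 20,
    22, 23, 24, 25, 27, 28, 29, 29, 29, 31,
    31, 32, 33, 34, 34, 36, 36, 37, 40, 41,
    41, 42, 43, 45, 47, 48, 49, 50, 53, 54,
    54, 55, 56, 59, 59, 61, 61, 65, 66, 66,
    70, 72, 75, 78, 78, 78, 80, 80, 81, 88,
    91, 92, 93, 93, 95, 95, 96, 96, 97, 99,
    101, 108, 110, 112, 113, 114, 120, 120, 123, 123,
    124, 124, 125, 127, 129, 131, 134, 137, 139, 143,
    144, 145, 151, 154, 156, 157, 176, 182, 186, 187,
    188, 189, 190, 193, 194, 197, 202, 203, 208, 215,
    216, 217, 217, 217, 218, 224, 225, 228, 232, 233,
    250, 255, 275, 275, 275, 276, 286, 292, 307, 307,
    312, 312, 315, 324, 326, 326, 329, 330, 336, 345,
    348, 354, 361, 364, 368, 378, 388, 420, 431, 456,
    462, 467, 498, 517, 536, 538, 566, 632, 644, 745,
    806, 826, 871, 952, 1205, 1312, 1358, 1630, 1643, 2366,
]

WIDTH = 250
counts = [0] * 10
for x in DATA:
    # Right-closed bins [0,250], (250,500], ...; x = 0 also goes in the first bin.
    counts[max((x - 1) // WIDTH, 0)] += 1

# Text sketch: one '#' per 3 observations, rounded up so non-empty bins stay visible.
for i, c in enumerate(counts):
    label = f"{'[' if i == 0 else '('}{i * WIDTH}, {(i + 1) * WIDTH}]"
    print(f"{label:>13} | {'#' * ((c + 2) // 3)}")

# A mean well above the median is a quick numerical sign of right skewness.
mean = statistics.mean(DATA)
median = statistics.median(DATA)
print(f"mean = {mean:.1f} days, median = {median} days")
```

With this data the median is 113.5 days while the mean is about 213 days, and the first bar dwarfs all the others, consistent with a right-skewed dataset.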