Problem 3

Question

In an article in Biometrika, an example is discussed about mine disasters during the period from March 15, 1851, to March 22, 1962. A dataset has been obtained of 190 recorded time intervals (in days) between successive coal mine disasters involving ten or more men killed. The ordered data are listed in Table 15.6.

$$ \begin{array}{rrrrrrrrrr} \hline \hline
0 & 1 & 1 & 2 & 2 & 3 & 4 & 4 & 4 & 6 \\
7 & 10 & 11 & 12 & 12 & 12 & 13 & 15 & 15 & 16 \\
16 & 16 & 17 & 17 & 18 & 19 & 19 & 19 & 20 & 20 \\
22 & 23 & 24 & 25 & 27 & 28 & 29 & 29 & 29 & 31 \\
31 & 32 & 33 & 34 & 34 & 36 & 36 & 37 & 40 & 41 \\
41 & 42 & 43 & 45 & 47 & 48 & 49 & 50 & 53 & 54 \\
54 & 55 & 56 & 59 & 59 & 61 & 61 & 65 & 66 & 66 \\
70 & 72 & 75 & 78 & 78 & 78 & 80 & 80 & 81 & 88 \\
91 & 92 & 93 & 93 & 95 & 95 & 96 & 96 & 97 & 99 \\
101 & 108 & 110 & 112 & 113 & 114 & 120 & 120 & 123 & 123 \\
124 & 124 & 125 & 127 & 129 & 131 & 134 & 137 & 139 & 143 \\
144 & 145 & 151 & 154 & 156 & 157 & 176 & 182 & 186 & 187 \\
188 & 189 & 190 & 193 & 194 & 197 & 202 & 203 & 208 & 215 \\
216 & 217 & 217 & 217 & 218 & 224 & 225 & 228 & 232 & 233 \\
250 & 255 & 275 & 275 & 275 & 276 & 286 & 292 & 307 & 307 \\
312 & 312 & 315 & 324 & 326 & 326 & 329 & 330 & 336 & 345 \\
348 & 354 & 361 & 364 & 368 & 378 & 388 & 420 & 431 & 456 \\
462 & 467 & 498 & 517 & 536 & 538 & 566 & 632 & 644 & 745 \\
806 & 826 & 871 & 952 & 1205 & 1312 & 1358 & 1630 & 1643 & 2366 \\
\hline \hline \end{array} $$

a. Compute the height on each bin of the histogram with bins $[0,250]$, $(250,500], \ldots,(2250,2500]$.

b. Make a sketch of the histogram. Would you view the dataset as being symmetric or skewed?

Step-by-Step Solution

Answer

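a. Taking the height of a bin to be its relative frequency, the ten bins have heights of approximately 0.742, 0.168, 0.037, 0.021, 0.005, 0.011, 0.011, 0, 0, and 0.005.

b. The dataset is right-skewed, with most values in the lower bins.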

Step 1: Determine the Range of Each Bin

Identify the range of values each bin covers based on the given intervals. The exercise specifies ten bins of width 250, each closed on the right: $[0, 250]$, $(250, 500]$, $(500, 750]$, $(750, 1000]$, $(1000, 1250]$, $(1250, 1500]$, $(1500, 1750]$, $(1750, 2000]$, $(2000, 2250]$, and $(2250, 2500]$. A value that falls exactly on a boundary, such as 250, therefore belongs to the lower of the two adjacent bins.

Step 2: Count the Data Points in Each Bin

Go through the ordered dataset and count the number of data points that fall into each bin. The counts must add up to 190.

- Bin $[0, 250]$: 141 data points (the 140 values up to 233, plus the value 250).
- Bin $(250, 500]$: 32 data points.
- Bin $(500, 750]$: 7 data points.
- Bin $(750, 1000]$: 4 data points.
- Bin $(1000, 1250]$: 1 data point.
- Bin $(1250, 1500]$: 2 data points.
- Bin $(1500, 1750]$: 2 data points.
- Bin $(1750, 2000]$: 0 data points.
- Bin $(2000, 2250]$: 0 data points.
- Bin $(2250, 2500]$: 1 data point.

Check: $141 + 32 + 7 + 4 + 1 + 2 + 2 + 0 + 0 + 1 = 190$.
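The tally can be double-checked with a short script. This is a minimal sketch, assuming the values are copied from Table 15.6 as printed in the question and that the bins are closed on the right as the question states; the names `DATA` and `bin_index` are illustrative only.

```python
# Intervals (in days) between successive disasters, copied from Table 15.6.
DATA = [
    0, 1, 1, 2, 2, 3, 4, 4, 4, 6,
    7, 10, 11, 12, 12, 12, 13, 15, 15, 16,
    16, 16, 17, 17, 18, 19, 19, 19, 20, 20,
    22, 23, 24, 25, 27, 28, 29, 29, 29, 31,
    31, 32, 33, 34, 34, 36, 36, 37, 40, 41,
    41, 42, 43, 45, 47, 48, 49, 50, 53, 54,
    54, 55, 56, 59, 59, 61, 61, 65, 66, 66,
    70, 72, 75, 78, 78, 78, 80, 80, 81, 88,
    91, 92, 93, 93, 95, 95, 96, 96, 97, 99,
    101, 108, 110, 112, 113, 114, 120, 120, 123, 123,
    124, 124, 125, 127, 129, 131, 134, 137, 139, 143,
    144, 145, 151, 154, 156, 157, 176, 182, 186, 187,
    188, 189, 190, 193, 194, 197, 202, 203, 208, 215,
    216, 217, 217, 217, 218, 224, 225, 228, 232, 233,
    250, 255, 275, 275, 275, 276, 286, 292, 307, 307,
    312, 312, 315, 324, 326, 326, 329, 330, 336, 345,
    348, 354, 361, 364, 368, 378, 388, 420, 431, 456,
    462, 467, 498, 517, 536, 538, 566, 632, 644, 745,
    806, 826, 871, 952, 1205, 1312, 1358, 1630, 1643, 2366,
]

BIN_WIDTH = 250
N_BINS = 10


def bin_index(x, width=BIN_WIDTH):
    """Index of the right-closed bin holding the non-negative integer x.

    Bin 0 is [0, width]; bin k (k >= 1) is (k*width, (k+1)*width].
    """
    if x <= width:
        return 0
    # For integers, (x - 1) // width equals ceil(x / width) - 1.
    return (x - 1) // width


counts = [0] * N_BINS
for x in DATA:
    counts[bin_index(x)] += 1

print(counts)       # [141, 32, 7, 4, 1, 2, 2, 0, 0, 1]
print(sum(counts))  # 190
```

Note that the single value 250 is what separates the right-closed convention of the question from a left-closed one: with bins of the form $[0, 250)$, $[250, 500)$ it would be counted in the second bin instead.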

Step 3: Calculate the Relative Frequency for Each Bin

Compute the height of the histogram for each bin by dividing the count of the data points in each bin by the total number of data points, which is 190.

- Bin $[0, 250]$: $ \frac{141}{190} \approx 0.742 $
- Bin $(250, 500]$: $ \frac{32}{190} \approx 0.168 $
- Bin $(500, 750]$: $ \frac{7}{190} \approx 0.037 $
- Bin $(750, 1000]$: $ \frac{4}{190} \approx 0.021 $
- Bin $(1000, 1250]$: $ \frac{1}{190} \approx 0.005 $
- Bin $(1250, 1500]$: $ \frac{2}{190} \approx 0.011 $
- Bin $(1500, 1750]$: $ \frac{2}{190} \approx 0.011 $
- Bin $(1750, 2000]$: $ \frac{0}{190} = 0 $
- Bin $(2000, 2250]$: $ \frac{0}{190} = 0 $
- Bin $(2250, 2500]$: $ \frac{1}{190} \approx 0.005 $

If the histogram is instead normalized so that the total area of the bars equals 1, divide each relative frequency by the bin width 250; the first bin then has height $ \frac{141}{190 \cdot 250} \approx 0.00297 $. Because all bins have the same width, the shape of the histogram is identical under either convention.
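The division can be sketched in a few lines of Python. The counts below are an assumption of this example: they are the tallies obtained from Table 15.6 with the right-closed bins $[0,250]$, $(250,500], \ldots$ given in the question. The variable names are illustrative, and the second list shows the area-one (density) scaling alongside the relative frequencies.

```python
# Assumed bin counts for [0,250], (250,500], ..., (2250,2500].
counts = [141, 32, 7, 4, 1, 2, 2, 0, 0, 1]
n = sum(counts)      # total number of intervals, 190
bin_width = 250

# Relative frequency: share of the data in each bin.
rel_freq = [c / n for c in counts]

# Density height: relative frequency per unit of bin width,
# so that the bar areas add up to 1.
density = [f / bin_width for f in rel_freq]

for k, (f, d) in enumerate(zip(rel_freq, density)):
    lo, hi = k * bin_width, (k + 1) * bin_width
    print(f"{lo:>4} to {hi:<4}  rel. freq. {f:.3f}  density {d:.5f}")
```

Both lists describe the same histogram shape; only the scale of the vertical axis differs.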

Step 4: Sketch the Histogram

Draw a bar for each bin with the height corresponding to its relative frequency:

1. Bin $[0, 250]$ with height approximately 0.742.
2. Bin $(250, 500]$ with height approximately 0.168.
3. Bin $(500, 750]$ with height approximately 0.037.
4. Bin $(750, 1000]$ with height approximately 0.021.
5. Bin $(1000, 1250]$ with height approximately 0.005.
6. Bin $(1250, 1500]$ with height approximately 0.011.
7. Bin $(1500, 1750]$ with height approximately 0.011.
8. Bin $(1750, 2000]$ with height 0.
9. Bin $(2000, 2250]$ with height 0.
10. Bin $(2250, 2500]$ with height approximately 0.005.
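A rough text sketch of the histogram can be produced without any plotting library. The following is only an illustration, assuming the bin counts obtained by tallying Table 15.6 with the right-closed bins of the question; the helper `render` and its `max_bar` parameter are hypothetical names chosen for this example.

```python
import math

# Assumed bin counts for [0,250], (250,500], ..., (2250,2500].
counts = [141, 32, 7, 4, 1, 2, 2, 0, 0, 1]


def render(counts, bin_width, max_bar=50):
    """Return one text line per bin, with bars drawn sideways.

    The tallest bar gets max_bar characters; every non-empty bin gets
    at least one character so that it stays visible.
    """
    n = sum(counts)
    tallest = max(counts)
    lines = []
    for k, c in enumerate(counts):
        lo, hi = k * bin_width, (k + 1) * bin_width
        bar = "#" * math.ceil(max_bar * c / tallest)
        lines.append(f"{lo:>4} to {hi:<4} | {bar} {c / n:.3f}")
    return lines


for line in render(counts, 250):
    print(line)
```

The long first bar followed by a handful of very short ones is the visual signature of the right skew discussed in the next step.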

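Step 5: Analyze the Symmetry of the Dataset

Observe the histogram: about three quarters of the values (141 of 190) lie in the first bin, the bar heights drop off sharply after that, and a thin tail stretches out to the largest value, 2366 days. There is no matching tail on the left, so the distribution is right-skewed rather than symmetric.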

Key Concepts

Data Binning, Relative Frequency, Symmetry in Distributions

Data Binning

Histograms are a type of bar chart that helps us to visualize the frequency distribution of a dataset. One of the first steps in creating a histogram is data binning. This involves dividing the range of data into intervals, known as "bins". Each bin represents a continuous range of values. In this exercise, the bin ranges are:

$[0, 250]$
$(250, 500]$
$(500, 750]$
$(750, 1000]$
$(1000, 1250]$
$(1250, 1500]$
$(1500, 1750]$
$(1750, 2000]$
$(2000, 2250]$
$(2250, 2500]$

Data binning simplifies a dataset to make it more manageable and easier to analyze.
It reduces the noise and helps in identifying trends or patterns in the data. However, it's important to choose the number and size of bins carefully.
Selecting too few bins can mask important details, while too many leaves only a few data points in each bin and makes the histogram look noisy.
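Besides the number and size of the bins, the choice of which endpoint is closed decides where boundary values go. The sketch below contrasts the right-closed bins stated in the exercise, $[0,250]$, $(250,500], \ldots$, with left-closed bins $[0,250)$, $[250,500), \ldots$ for non-negative integer data; both function names are hypothetical and exist only for this illustration.

```python
def bin_right_closed(x, width):
    """Bin index for [0, w], (w, 2w], (2w, 3w], ... (x a non-negative integer)."""
    return 0 if x <= width else (x - 1) // width


def bin_left_closed(x, width):
    """Bin index for [0, w), [w, 2w), [2w, 3w), ... (x a non-negative integer)."""
    return x // width


for x in (249, 250, 251, 500):
    print(x, bin_right_closed(x, 250), bin_left_closed(x, 250))
# 249 0 0
# 250 0 1
# 251 1 1
# 500 1 2
```

Only values that sit exactly on a boundary are affected; in this dataset that is the single observation 250.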

Relative Frequency

After determining the bins, the next step is to calculate the relative frequency. Relative frequency is the proportion of data points that fall within each bin, compared to the total number of data points. It is a way to understand how the data is divided across different intervals.
The formula to calculate relative frequency is:

$ \text{Relative Frequency} = \frac{\text{Number of data points in a bin}}{\text{Total number of data points}} $

In this exercise, for instance, the bin $[0, 250]$ contains 141 data points. With a total of 190 data points in the dataset, the relative frequency is computed as:

$ \frac{141}{190} \approx 0.742 $

Relative frequencies help in comparing the sizes of different bins without being affected by the total size of the dataset.
This makes it easy to understand the distribution of data across different intervals.

Symmetry in Distributions

Analyzing symmetry in distributions involves looking at how data is spread across a histogram. Symmetry, or the lack thereof, gives insight into data characteristics. Typically, if both halves of the histogram mirror each other around the center, it is symmetric.
However, in this dataset, most data points are concentrated in the lowest bin, the frequency drops off sharply after it, and a few isolated values lie far to the right, up to 2366 days. This suggests a right-skewed distribution.

Right skewness means there are a few very high values compared to the rest.
The tail on the right side of the histogram is longer or fatter than on the left.

Understanding whether a distribution is skewed helps in many areas, from statistical inference to predictive modeling.
For example, a right-skewed dataset might require using different statistical methods or transformations to achieve normality for analysis purposes.
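A quick numerical check of the skew, complementary to reading it off the histogram, is to compare the mean with the median and to compute a moment-based skewness coefficient. This is a sketch under stated assumptions: the values are copied from Table 15.6 as printed in the question, and the coefficient used here (third central moment divided by the 1.5 power of the second) is one common definition, not something the exercise itself asks for.

```python
from statistics import mean, median

# Intervals (in days) between successive disasters, copied from Table 15.6.
DATA = [
    0, 1, 1, 2, 2, 3, 4, 4, 4, 6,
    7, 10, 11, 12, 12, 12, 13, 15, 15, 16,
    16, 16, 17, 17, 18, 19, 19, 19, 20, 20,
    22, 23, 24, 25, 27, 28, 29, 29, 29, 31,
    31, 32, 33, 34, 34, 36, 36, 37, 40, 41,
    41, 42, 43, 45, 47, 48, 49, 50, 53, 54,
    54, 55, 56, 59, 59, 61, 61, 65, 66, 66,
    70, 72, 75, 78, 78, 78, 80, 80, 81, 88,
    91, 92, 93, 93, 95, 95, 96, 96, 97, 99,
    101, 108, 110, 112, 113, 114, 120, 120, 123, 123,
    124, 124, 125, 127, 129, 131, 134, 137, 139, 143,
    144, 145, 151, 154, 156, 157, 176, 182, 186, 187,
    188, 189, 190, 193, 194, 197, 202, 203, 208, 215,
    216, 217, 217, 217, 218, 224, 225, 228, 232, 233,
    250, 255, 275, 275, 275, 276, 286, 292, 307, 307,
    312, 312, 315, 324, 326, 326, 329, 330, 336, 345,
    348, 354, 361, 364, 368, 378, 388, 420, 431, 456,
    462, 467, 498, 517, 536, 538, 566, 632, 644, 745,
    806, 826, 871, 952, 1205, 1312, 1358, 1630, 1643, 2366,
]

n = len(DATA)
avg = mean(DATA)
med = median(DATA)  # average of the 95th and 96th ordered values

# Central moments and the moment-based skewness coefficient.
m2 = sum((x - avg) ** 2 for x in DATA) / n
m3 = sum((x - avg) ** 3 for x in DATA) / n
skewness = m3 / m2 ** 1.5

print(f"mean = {avg:.1f}, median = {med}, skewness = {skewness:.2f}")
```

A mean well above the median together with a positive skewness coefficient agrees with the right skew seen in the histogram.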
