Problem 4

Question

At La Guardia Airport in New York, wind speeds (in miles per hour) were measured at 7:00 a.m. on each of the first 29 days of May 1973. The measurements were:
\[ \begin{array}{rrrrrrrrrr} 7.4 & 8.0 & 12.6 & 11.5 & 14.3 & 14.9 & 8.6 & 13.8 & 20.1 & 8.6 \\ 6.9 & 9.7 & 9.2 & 10.9 & 13.2 & 11.5 & 12.0 & 18.4 & 11.5 & 9.7 \\ 9.7 & 16.6 & 9.7 & 12.0 & 16.6 & 14.9 & 8.0 & 12.0 & 14.9 & \end{array} \]
a) Sketch the empirical distribution function of the given data series and draw a histogram with the following classes:
$$ (5.0,7.0],\ (7.0,9.0],\ (9.0,11.0],\ \ldots,\ (19.0,21.0] $$
b) Calculate the arithmetic mean, the median, the range, the empirical variance, the empirical standard deviation, and the interquartile range.
c) Illustrate the structure of the data series with a boxplot drawn beneath the histogram.

Step-by-Step Solution

Answer
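The histogram (class frequencies 1, 5, 6, 7, 6, 2, 1, 1) and the EDF show the shape of the distribution. Mean \(\bar{x} \approx 11.97\) mph, median 11.5 mph, range 13.2 mph, empirical variance \(s^2 \approx 11.35\) (mph)², standard deviation \(s \approx 3.37\) mph, and IQR 4.6 mph (with \(Q_1 = 9.7\), \(Q_3 = 14.3\); other quartile conventions give slightly different values). The boxplot shows a slightly right-skewed distribution without outliers.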
Step 1: Calculate Frequencies
Count the number of values in each right-closed class interval (the counts add up to n = 29):
  • (5.0, 7.0]: 1
  • (7.0, 9.0]: 5
  • (9.0, 11.0]: 6
  • (11.0, 13.0]: 7
  • (13.0, 15.0]: 6
  • (15.0, 17.0]: 2
  • (17.0, 19.0]: 1
  • (19.0, 21.0]: 1
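The counts above can be double-checked with a short Python sketch. It assumes right-closed classes (a, b] exactly as stated in the exercise; the function name `class_frequencies` is illustrative, not taken from the source.

```python
# The 29 wind speeds (mph) in the order given in the exercise
speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]

# Class boundaries 5.0, 7.0, ..., 21.0 (eight classes of width 2)
edges = [5.0 + 2.0 * k for k in range(9)]


def class_frequencies(values, edges):
    """Count the values in each right-closed class (edges[j], edges[j + 1]]."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for j in range(len(edges) - 1):
            if edges[j] < x <= edges[j + 1]:
                counts[j] += 1
                break
    return counts


freqs = class_frequencies(speeds, edges)
for (a, b), f in zip(zip(edges, edges[1:]), freqs):
    print(f"({a:.1f}, {b:.1f}]: {f}")
print("total:", sum(freqs))
```

No measurement falls exactly on a class boundary, so the open/closed choice does not change the counts for this data set.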
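Step 2: Draw Histogram
Draw a histogram with the class intervals on the x-axis and, over each interval, a bar whose height is the class frequency from Step 1. Because all classes have the same width (2 mph), bar heights proportional to the counts give a correct picture; for a density scale, divide each count by \(n \cdot 2 = 58\).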
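No plotting library is assumed here. As a minimal sketch, the histogram can be printed sideways as text, one `#` per observation, using the same right-closed classes. The helper `text_histogram` is a hypothetical name; with a plotting library one would draw vertical bars of the same heights instead.

```python
speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]
edges = [5.0 + 2.0 * k for k in range(9)]  # 5.0, 7.0, ..., 21.0


def text_histogram(values, edges, symbol="#"):
    """Return one text line per class (a, b]; the bar length equals the class count."""
    lines = []
    for a, b in zip(edges, edges[1:]):
        count = sum(1 for x in values if a < x <= b)
        lines.append(f"({a:4.1f}, {b:4.1f}] {symbol * count}")
    return lines


for line in text_histogram(speeds, edges):
    print(line)
```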
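Step 3: Calculate Empirical Distribution Function
The empirical distribution function (EDF) is \(\hat{F}(x) = \frac{1}{n}\,\#\{i : x_i \le x\}\), the share of observations not exceeding \(x\). Order the values first; \(\hat{F}\) is then a right-continuous step function that equals 0 for \(x < 6.9\), jumps by \(k/29\) at each distinct value that occurs \(k\) times (for example by \(4/29\) at 9.7, where it reaches \(11/29\)), and equals 1 for \(x \ge 20.1\).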
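A minimal Python sketch of the EDF, assuming the definition above: `bisect_right` on the sorted data returns the number of observations that are at most `x`. The function name `edf` is illustrative.

```python
from bisect import bisect_right

speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]
ordered = sorted(speeds)
n = len(ordered)


def edf(x):
    """Empirical distribution function: share of observations <= x."""
    return bisect_right(ordered, x) / n


# Value of the step function at each distinct jump point (useful for sketching)
for v in sorted(set(ordered)):
    print(f"F({v:4.1f}) = {bisect_right(ordered, v)}/{n}")
```

The printed pairs are exactly the corner points needed to sketch the staircase by hand.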
Step 4: Calculate Mean
Compute the arithmetic mean of the 29 values. Their sum is 347.2 mph, so
\[ \bar{x} = \frac{1}{29} \sum_{i=1}^{29} x_i = \frac{347.2}{29} \approx 11.97 \text{ mph} \]
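The arithmetic can be checked with a short sketch; `math.fsum` is used only to keep floating-point rounding in the sum small.

```python
import math

speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]
n = len(speeds)
total = math.fsum(speeds)   # accurate floating-point sum of all measurements
mean = total / n
print(f"sum = {total:.1f}, n = {n}, mean = {mean:.2f} mph")
```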
Step 5: Calculate Median
Sort the 29 wind speeds. Because n = 29 is odd, the median is the middle (15th) ordered value, \(\tilde{x} = x_{(15)} = 11.5\) mph. For an even number of observations, take the average of the two middle values.
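A minimal sketch of the median rule just described, covering both the odd and the even case; the result is cross-checked against Python's `statistics.median`.

```python
import statistics

speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # 0-based index 14 is the 15th value for n = 29
    return (s[mid - 1] + s[mid]) / 2


print("median:", median(speeds))
assert median(speeds) == statistics.median(speeds)
```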
Step 6: Calculate Range
To calculate the range, subtract the smallest value in the dataset from the largest value:
\[ \text{Range} = \max(x_i) - \min(x_i) = 20.1 - 6.9 = 13.2 \text{ mph} \]
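The same computation in Python, as a small sketch; the variable is named `data_range` so that it does not shadow the built-in `range`.

```python
speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]
smallest, largest = min(speeds), max(speeds)
data_range = largest - smallest   # spread between the extreme observations
print(f"range = {largest} - {smallest} = {data_range:.1f} mph")
```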
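Step 7: Calculate Empirical Variance
The formula for the empirical variance, together with an equivalent shortcut form, is:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right) \]
With \(\sum x_i^2 = 4474.5\) and \(n\bar{x}^2 = 347.2^2/29 \approx 4156.82\), the sum of squared deviations is about 317.68, so \(s^2 \approx 317.68/28 \approx 11.35\) (mph)². Some textbooks define the empirical variance with the factor \(1/n\) instead, which gives \(317.68/29 \approx 10.95\) (mph)².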
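A sketch that evaluates both forms of the formula and both divisor conventions, then cross-checks the \(1/(n-1)\) value against `statistics.variance`, which uses the divisor \(n-1\).

```python
import math
import statistics

speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]
n = len(speeds)
mean = math.fsum(speeds) / n

# Definition: sum of squared deviations from the mean
ss_def = math.fsum((x - mean) ** 2 for x in speeds)
# Shortcut form: sum of squares minus n * mean^2 (equal up to rounding)
ss_short = math.fsum(x * x for x in speeds) - n * mean ** 2

var_n_minus_1 = ss_def / (n - 1)   # convention used in this solution
var_n = ss_def / n                 # 1/n convention used by some textbooks

print(f"sum of squared deviations = {ss_def:.4f}")
print(f"s^2 (1/(n-1)) = {var_n_minus_1:.4f}")
print(f"s^2 (1/n)     = {var_n:.4f}")
assert math.isclose(var_n_minus_1, statistics.variance(speeds))
```

The shortcut form is convenient by hand, but the definitional form is numerically safer in floating point when the mean is large relative to the spread.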
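Step 8: Calculate Empirical Standard Deviation
The empirical standard deviation is the square root of the variance:
\[ s = \sqrt{s^2} \approx \sqrt{11.35} \approx 3.37 \text{ mph} \]
(With the factor \(1/n\) in the variance, \(s \approx 3.31\) mph.)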
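In Python, `statistics.stdev` returns the square root of the variance with divisor \(n-1\); the sketch confirms that relationship explicitly.

```python
import math
import statistics

speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]
s = statistics.stdev(speeds)   # sqrt of the variance with divisor n - 1
assert math.isclose(s, math.sqrt(statistics.variance(speeds)))
print(f"s = {s:.2f} mph")
```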
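Step 9: Calculate Interquartile Range (IQR)
The interquartile range is the difference between the third and the first quartile, \(\text{IQR} = Q_3 - Q_1\). Quartile conventions differ. With the rule \(x_p = x_{(\lfloor np \rfloor + 1)}\) for non-integer \(np\), the values \(np = 7.25\) and \(21.75\) give \(Q_1 = x_{(8)} = 9.7\) and \(Q_3 = x_{(22)} = 14.3\), so \(\text{IQR} = 4.6\) mph. Taking the medians of the lower and upper 14 values (excluding the median) gives \(Q_1 = 9.45\), \(Q_3 = 14.6\), and \(\text{IQR} = 5.15\) mph.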
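A minimal sketch of the order-statistic rule used above. It assumes one common textbook convention (midpoint of \(x_{(np)}\) and \(x_{(np+1)}\) when \(np\) is an integer); the function name `quantile_order_stat` is hypothetical.

```python
import math

speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]


def quantile_order_stat(values, p):
    """Empirical p-quantile for 0 < p < 1 (one common textbook convention):
    x_(floor(np) + 1) if n*p is not an integer, otherwise the mean of
    x_(np) and x_(np + 1). Order statistics x_(i) are 1-based here.
    Assumes n*p is computed exactly, as it is for p = 0.25, 0.5, 0.75."""
    s = sorted(values)
    n = len(s)
    np_ = n * p
    k = math.floor(np_)
    if np_ != k:
        return s[k]                    # 0-based index k is x_(k + 1)
    return (s[k - 1] + s[k]) / 2       # midpoint of x_(k) and x_(k + 1)


q1 = quantile_order_stat(speeds, 0.25)
q3 = quantile_order_stat(speeds, 0.75)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1:.2f} mph")
```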
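Step 10: Draw Boxplot
Construct a boxplot below the histogram from Step 2, on the same horizontal scale, showing the minimum (6.9), Q1 (25th percentile), the median (11.5), Q3 (75th percentile), and the maximum (20.1). With \(Q_1 = 9.7\) and \(Q_3 = 14.3\), the fences \(Q_1 - 1.5 \cdot \text{IQR} = 2.8\) and \(Q_3 + 1.5 \cdot \text{IQR} = 21.2\) contain all observations, so there are no outliers and the whiskers reach the minimum and maximum.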
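The numbers needed for the boxplot can be collected in one sketch. It uses `statistics.quantiles(..., method="inclusive")` from the Python standard library (3.8+), which for this data set yields \(Q_1 = 9.7\), median 11.5, \(Q_3 = 14.3\); the 1.5·IQR fence rule is the usual Tukey convention.

```python
import statistics

speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]

# Quartiles via the standard library (cut points for 4 equal groups)
q1, med, q3 = statistics.quantiles(speeds, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in speeds if x < lower_fence or x > upper_fence]

print("five-number summary:", min(speeds), q1, med, q3, max(speeds))
print(f"fences: [{lower_fence:.2f}, {upper_fence:.2f}], outliers: {outliers}")
```

An empty `outliers` list means the whiskers can be drawn all the way to the minimum and maximum.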

Key Concepts

Histogram
A histogram is a useful way to visualize the distribution of data. It consists of adjacent bars over class intervals (bins); in this exercise, the wind speeds are divided into the classes (5.0,7.0], (7.0,9.0], and so on. Because all classes here have the same width, the height of each bar can show the number of measurements in that interval; with unequal class widths, the area of each bar (not its height) must be proportional to the frequency.

To draw a histogram:
  • First, divide the data into intervals (bins).
  • Count the frequency of data points in each interval.
  • Draw a bar for each interval, where the height represents the frequency.

Histograms help to visualize the shape of the data distribution, indicating whether it's skewed, symmetrical, or has any gaps or outliers.
Arithmetic Mean
The arithmetic mean, or simply the mean, is a measure of central tendency that gives us an idea about the average value of a dataset. It is calculated by summing all the measurements and dividing by the number of measurements. The formula is:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Where:
  • \( x_i \) are the individual measurements
  • \( n \) is the total number of measurements
In this exercise, the mean is the average 7:00 a.m. wind speed over the first 29 days of May 1973, a single value describing typical morning wind conditions at the airport.
Empirical Standard Deviation
The empirical standard deviation measures the amount of variation or dispersion in a dataset. It tells us how much the individual measurements typically deviate from the mean. The formula is:
\[ s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 } \]

Where:
  • \( x_i \) are the individual measurements
  • \( \bar{x} \) is the mean
  • \( n \) is the total number of measurements
A high standard deviation means that the data points are spread out over a larger range of values, and a low standard deviation indicates that they tend to be close to the mean.
Boxplot
A boxplot (or box-and-whisker plot) is a graphical representation that shows the summary of a dataset. It displays the dataset's minimum, first quartile (Q1), median, third quartile (Q3), and maximum values. The steps to draw a boxplot are:
  • Identify the five-number summary: minimum, Q1, median, Q3, and maximum.
  • Draw a box from Q1 to Q3.
  • Draw a line inside the box at the median.
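  • Extend 'whiskers' from each side of the box to the min and max values (in the common Tukey variant, only to the most extreme values within 1.5 · IQR of the box, with points beyond that drawn individually as outliers).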

The boxplot helps in understanding the distribution, spread, and potential outliers in the dataset. It complements the histogram by providing a concise summary of the distribution.
Quartile
Quartiles are values that divide a dataset into four equal parts. They provide insight into the distribution of the data. The key quartiles include:
  • First Quartile (Q1): The 25th percentile, below which 25% of data lies.
  • Median (Q2): The 50th percentile, which is the middle value of the dataset.
  • Third Quartile (Q3): The 75th percentile, below which 75% of data lies.

To calculate them, you need to:
  • Order the dataset from smallest to largest.
  • Find the median (Q2).
  • Q1 is the median of the lower half of the data.
  • Q3 is the median of the upper half of the data.
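The interquartile range, IQR = Q3 - Q1, is the width of the interval containing the middle 50% of the data and indicates the dataset's spread. For an odd number of observations, conventions differ on whether the median itself belongs to both halves, so different textbooks and software can report slightly different quartiles.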
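A minimal sketch of the "median of each half" procedure listed above. The flag `include_median` is a hypothetical parameter that switches between the two conventions for odd n; for the wind data it gives Q1 = 9.45, Q3 = 14.6 when the median is excluded and Q1 = 9.7, Q3 = 14.3 when it is included.

```python
speeds = [
    7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6,
    6.9, 9.7, 9.2, 10.9, 13.2, 11.5, 12.0, 18.4, 11.5, 9.7,
    9.7, 16.6, 9.7, 12.0, 16.6, 14.9, 8.0, 12.0, 14.9,
]


def _median_sorted(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles_halves(values, include_median=False):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole data, upper half.
    For odd n, include_median decides whether the middle value joins both halves."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = s[: half + 1], s[half:]
    else:
        lower, upper = s[:half], s[n - half:]
    return _median_sorted(lower), _median_sorted(s), _median_sorted(upper)


for flag in (False, True):
    a, b, c = quartiles_halves(speeds, include_median=flag)
    print(f"include_median={flag}: Q1={a:.2f}, Q2={b}, Q3={c:.2f}, IQR={c - a:.2f}")
```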