Problem 7

Question

Let \(X_{1}, X_{2}, \ldots\) be a sequence of independent and identically distributed random variables with distribution function \(F\). Define \(F_{n}\) as follows: for any \(a\) $$ F_{n}(a)=\frac{\text { number of } X_{i} \text { in }(-\infty, a]}{n} $$ Consider \(a\) fixed and introduce the appropriate indicator random variables (as in Section 13.4). Compute their expectation and variance and show that the law of large numbers tells us that $$ \lim _{n \rightarrow \infty} \mathrm{P}\left(\left|F_{n}(a)-F(a)\right|>\varepsilon\right)=0 $$

Step-by-Step Solution

Answer
The law of large numbers ensures \(F_{n}(a)\) converges to \(F(a)\) in probability as \(n\) approaches infinity.
Step 1: Define Indicator Variables
Introduce the indicator random variable for each \(X_i\). Define \(I_i(a) = \mathbb{1}_{(-\infty, a]}(X_i)\), where \(\mathbb{1}_{A}(x)\) is 1 if \(x \in A\) and 0 otherwise. Thus, \(I_i(a)\) is 1 if \(X_i \leq a\) and 0 otherwise. This represents the event that the random variable \(X_i\) is less than or equal to \(a\).
Step 2: Express \(F_n(a)\) in Terms of \(I_i(a)\)
Express \(F_n(a)\) as the average of the indicator variables: \[ F_n(a) = \frac{1}{n} \sum_{i=1}^{n} I_i(a). \] The sum counts how many of \(X_1, \ldots, X_n\) satisfy \(X_i \leq a\), so \(F_n(a)\) is the proportion of the first \(n\) observations that are at most \(a\).
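To make the formula concrete, here is a minimal Python sketch that computes \(F_n(a)\) as an average of indicators. The function names `indicator` and `empirical_cdf` and the sample values are made up for illustration; they do not come from the exercise.

```python
def indicator(x, a):
    """Return 1 if x <= a, else 0: the indicator of the interval (-inf, a]."""
    return 1 if x <= a else 0


def empirical_cdf(sample, a):
    """F_n(a): the average of the indicators I_i(a) over the sample."""
    return sum(indicator(x, a) for x in sample) / len(sample)


# Hypothetical data: three of the five values (0.2, -0.7, 0.9) are <= 1.0.
sample = [0.2, 1.5, -0.7, 3.1, 0.9]
print(empirical_cdf(sample, 1.0))  # proportion 3/5
```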
Step 3: Expectation of Indicator Variables
Compute the expectation of \(I_i(a)\). Since \(I_i(a)\) is an indicator variable, \(E[I_i(a)] = P(X_i \leq a) = F(a)\) by definition of the distribution function \(F\) of each \(X_i\).
Step 4: Expectation of \(F_n(a)\)
Compute the expectation of \(F_n(a)\) using the linearity of expectation: \[ E[F_n(a)] = E\left(\frac{1}{n} \sum_{i=1}^n I_i(a)\right) = \frac{1}{n} \sum_{i=1}^{n} E[I_i(a)] = \frac{1}{n} \sum_{i=1}^{n} F(a) = F(a). \]
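Step 5: Variance of Indicator Variables
Compute the variance of \(I_i(a)\). Since \(I_i(a)\) takes only the values 0 and 1, it is a Bernoulli random variable with success probability \(F(a)\), and \(I_i(a)^2 = I_i(a)\). Therefore \(E[I_i(a)^2] = F(a)\) and \[ \text{Var}(I_i(a)) = E[I_i(a)^2] - \left(E[I_i(a)]\right)^2 = F(a) - F(a)^2 = F(a)(1-F(a)). \]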
Step 6: Variance of \(F_n(a)\)
Use the variance sum rule for independent random variables to find the variance of \(F_n(a)\): \[ \text{Var}(F_n(a)) = \text{Var}\left(\frac{1}{n} \sum_{i=1}^{n} I_i(a)\right) = \frac{1}{n^2} \sum_{i=1}^{n} \text{Var}(I_i(a)) = \frac{1}{n} F(a)(1-F(a)). \]
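As a sanity check on Steps 4 and 6, the mean and variance of \(F_n(a)\) can be computed exactly for a small case. The sketch below assumes only that the count of \(X_i \leq a\) is Binomial\((n, p)\) with \(p = F(a)\), which follows from the independence of the indicators; the function name and the values \(n = 50\), \(p = 0.3\) are arbitrary choices for illustration.

```python
from math import comb


def ecdf_mean_and_variance(n, p):
    """Exact mean and variance of F_n(a) = K/n, where K ~ Binomial(n, p), p = F(a)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    second_moment = sum((k / n) ** 2 * w for k, w in enumerate(pmf))
    return mean, second_moment - mean**2


n, p = 50, 0.3  # arbitrary illustrative values
mean, var = ecdf_mean_and_variance(n, p)
# Compare with the closed forms F(a) and F(a)(1 - F(a))/n.
print(mean, p)
print(var, p * (1 - p) / n)
```

Up to floating-point rounding, the two numbers on each printed line should agree.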
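Step 7: Applying the Weak Law of Large Numbers
The weak law of large numbers states that the average of independent, identically distributed random variables with finite mean and variance converges in probability to their common mean. The indicators \(I_1(a), I_2(a), \ldots\) are independent and identically distributed because the \(X_i\) are, and they have mean \(F(a)\) and finite variance \(F(a)(1-F(a))\). Hence, for any \(\varepsilon > 0\), \[ P\left(|F_n(a) - F(a)| > \varepsilon\right) \rightarrow 0 \text{ as } n \rightarrow \infty.\] In other words, \(F_n(a)\) is a consistent estimator of \(F(a)\).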
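Step 8: Conclusion
The same limit can be verified directly with Chebyshev's inequality, using the mean and variance of \(F_n(a)\) computed in Steps 4 and 6. For any \(\varepsilon > 0\), \[ \mathrm{P}\left(\left|F_n(a) - F(a)\right| > \varepsilon\right) \leq \frac{\text{Var}(F_n(a))}{\varepsilon^2} = \frac{F(a)(1-F(a))}{n \varepsilon^2} \leq \frac{1}{4 n \varepsilon^2}. \] The right-hand side tends to 0 as \(n \rightarrow \infty\), so \[ \lim_{n \rightarrow \infty} \mathrm{P}\left(\left|F_n(a) - F(a)\right| > \varepsilon\right) = 0. \] This completes the proof.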
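To see how quickly the Chebyshev bound \(F(a)(1-F(a))/(n\varepsilon^2)\) shrinks, the following sketch evaluates it for a few sample sizes. The helper names `chebyshev_bound` and `sample_size_for` are hypothetical, and the second helper relies on the worst case \(F(a)(1-F(a)) \leq 1/4\), so it may ask for more observations than are really needed.

```python
import math


def chebyshev_bound(F_a, n, eps):
    """Chebyshev upper bound on P(|F_n(a) - F(a)| > eps), capped at 1."""
    return min(1.0, F_a * (1 - F_a) / (n * eps**2))


def sample_size_for(delta, eps):
    """An n large enough that the worst-case bound 1/(4 n eps^2) is at most delta."""
    return math.ceil(1 / (4 * delta * eps**2))


# Illustrative values: F(a) = 0.3 and eps = 0.05 are arbitrary choices.
for n in (10, 100, 1_000, 10_000):
    print(n, chebyshev_bound(0.3, n, 0.05))
```

The bound decreases like \(1/n\), which is enough for the limit in the exercise even though it is usually far from sharp.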

Key Concepts

Bernoulli random variable
In probability and statistics, a Bernoulli random variable is a simple yet pivotal concept. It represents a random outcome that can have only two possible results, typically labeled as 0 and 1. These outcomes usually correspond to 'failure' (0) and 'success' (1).

It is named after Jacob Bernoulli and is most commonly used to model dichotomous events, such as flipping a coin or checking whether it rains on a given day. The probability of success (1) is denoted by \( p \) and failure (0) by \( 1-p \).

  • The expectation (mean) of a Bernoulli random variable \( X \) is \( E[X] = p \).
  • The variance of a Bernoulli random variable \( X \) is \( \text{Var}(X) = p(1-p) \).


In this exercise, each indicator \( I_i(a) \) is a Bernoulli random variable with success probability \( p = F(a) \), where 'success' means \( X_i \leq a \). Its mean \( F(a) \) and variance \( F(a)(1-F(a)) \) follow immediately, and the law of large numbers then says that the proportion of successes in many independent trials is close to \( p \).
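The two moment formulas for a Bernoulli variable can be checked directly from its two-point distribution. This is a small sketch; the function name `bernoulli_moments` and the value \( p = 0.25 \) are chosen only for illustration.

```python
def bernoulli_moments(p):
    """Mean and variance of a Bernoulli(p) variable from its two-point pmf."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(x * w for x, w in pmf.items())
    # Because x**2 == x for x in {0, 1}, the second moment equals the mean.
    second_moment = sum(x**2 * w for x, w in pmf.items())
    return mean, second_moment - mean**2


mean, var = bernoulli_moments(0.25)
print(mean, var)  # compare with p and p * (1 - p)
```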
distribution function
A distribution function, also known as the cumulative distribution function (CDF), is a fundamental concept in probability and statistics. It describes the probability that a random variable \( X \) takes on a value less than or equal to a certain threshold. Symbolically, it's denoted as \( F(a) = P(X \leq a) \).

The distribution function has several important properties:
  • It's non-decreasing, meaning it never decreases as the value of \( a \) increases.
  • The function is right-continuous.
  • As \( a \to -\infty \), \( F(a) \to 0 \), and as \( a \to \infty \), \( F(a) \to 1 \).


In practice, the function \( F_n(a) \) is constructed to approximate the distribution function \( F(a) \) by computing the proportion of observed values that fall into the interval \( (-\infty, a] \). As shown in the exercise, for each fixed \( a \) this empirical distribution function \( F_n(a) \) converges in probability to \( F(a) \) as \( n \) becomes large, by the law of large numbers. This provides a reliable way of estimating distributions from sample data.
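The convergence can be observed in a small simulation. The sketch below assumes the \( X_i \) are Uniform(0, 1), a choice made here only because then \( F(a) = a \) for \( 0 \leq a \leq 1 \); the exercise itself allows any distribution function \( F \). The seed and sample sizes are arbitrary.

```python
import random


def empirical_cdf(sample, a):
    """Proportion of sample values that are <= a."""
    return sum(1 for x in sample if x <= a) / len(sample)


rng = random.Random(0)  # fixed seed so the run is reproducible
a = 0.3                 # for Uniform(0, 1), F(a) = a

for n in (100, 10_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    # The absolute error |F_n(a) - F(a)| typically shrinks as n grows.
    print(n, abs(empirical_cdf(sample, a) - a))
```

A single run only illustrates the trend; the statement in the exercise is about the probability of a large error, not about any one sample.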
Chebyshev's inequality
Chebyshev's inequality is a powerful tool in probability theory that gives an upper bound on the probability that a random variable deviates from its mean by more than a given amount. It's particularly useful when you have little information about the distribution of the variable, other than its mean and variance.

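In mathematical terms, Chebyshev's inequality states that for any random variable \( X \) with mean \( \mu \) and finite variance \( \sigma^2 \), and any \( k > 0 \), the probability that \( X \) deviates from \( \mu \) by more than \( k \sigma \) is at most \( \frac{1}{k^2} \). Formally, \[ P(|X - \mu| > k\sigma) \leq \frac{1}{k^2}. \] Equivalently, writing \( \varepsilon = k\sigma \), \( P(|X - \mu| > \varepsilon) \leq \sigma^2 / \varepsilon^2 \) for any \( \varepsilon > 0 \).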

One of the most important applications of Chebyshev's inequality is in proving the weak law of large numbers. Here it shows that the sample mean \( F_n(a) \) of the indicators converges in probability to the true mean \( F(a) \) as the sample size \( n \) increases.
  • Since \( \text{Var}(F_n(a)) = F(a)(1-F(a))/n \), Chebyshev's inequality gives \( P(|F_n(a) - F(a)| > \varepsilon) \leq \frac{F(a)(1-F(a))}{n\varepsilon^2} \).
  • For fixed \( \varepsilon > 0 \), this bound tends to zero as \( n \to \infty \), so the probability that \( F_n(a) \) and \( F(a) \) differ by more than \( \varepsilon \) approaches zero.
This shows that \( F_n(a) \) is a consistent estimator of \( F(a) \), which justifies using empirical data to estimate the distribution function.
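The bound can also be compared with a Monte Carlo estimate of the deviation probability. The following is a rough sketch under made-up settings (\( F(a) = 0.3 \), \( n = 200 \), \( \varepsilon = 0.1 \), 2000 repetitions); the function name `exceed_frequency` is hypothetical, and each indicator is simulated directly as a Bernoulli trial rather than from a specific distribution of \( X_i \).

```python
import random


def exceed_frequency(p, n, eps, trials, seed=0):
    """Fraction of simulated samples in which |F_n(a) - F(a)| > eps, with F(a) = p."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        # Each indicator I_i(a) equals 1 with probability p.
        count = sum(1 for _ in range(n) if rng.random() < p)
        if abs(count / n - p) > eps:
            exceed += 1
    return exceed / trials


p, n, eps = 0.3, 200, 0.1
freq = exceed_frequency(p, n, eps, trials=2000)
bound = p * (1 - p) / (n * eps**2)  # Chebyshev bound on the same probability
print(freq, bound)
```

The simulated frequency is expected to fall well below the bound, since Chebyshev's inequality uses only the mean and variance and is typically conservative.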