Problem 9
Question
An examination of 1000 people showed that 41 were carriers (heterozygotic) of the gene for cystic fibrosis. In a second, independent examination of 2000 people, 79 were found to be carriers of cystic fibrosis. Let \(p\) be the proportion of all people who are carriers of cystic fibrosis. For any number \(p\) in \([0,1],\) let \(L(p)\) be the likelihood of finding that 41 of 1000 people in one study and 79 out of 2000 people in a second independent study are carriers of cystic fibrosis given that the probability of being a carrier is \(p\). Then $$ L(p)=\left(\begin{array}{c} 1000 \\ 41 \end{array}\right) p^{41} \times(1-p)^{959} \times\left(\begin{array}{c} 2000 \\ 79 \end{array}\right) p^{79} \times(1-p)^{1921} $$ where \(\left(\begin{array}{c}1000 \\ 41\end{array}\right) \doteq 1.3 \times 10^{73}\) and \(\left(\begin{array}{c}2000 \\ 79\end{array}\right) \doteq 1.4 \times 10^{143}\) are constants. a. Simplify \(L(p)\). b. Compute \(L^{\prime}(p)\). c. Find the value \(\hat{p}\) of \(p\) for which \(L^{\prime}(p)=0\). The value \(L(\hat{p})\) is the maximum value of \(L(p)\) and \(\hat{p}\) is called the maximum likelihood estimator of \(p\).
Step-by-Step Solution
Key Concepts
Likelihood Function
To compute \(L(p)\), we combine counting with probability. Each study contributes a binomial coefficient, \( \left(\begin{array}{c} 1000 \\ 41 \end{array}\right) \) and \( \left(\begin{array}{c} 2000 \\ 79 \end{array}\right) \), which count the number of ways to choose 41 carriers out of 1000 people and 79 out of 2000, respectively. Each coefficient is multiplied by the probability \(p\) for carriers and \(1-p\) for non-carriers, raised to the number of carriers and non-carriers in that sample.
- The exponents \(41\) and \(79\) on \(p\) count the carriers in each study; the exponents \(959 = 1000 - 41\) and \(1921 = 2000 - 79\) on \(1-p\) count the non-carriers.
- Because the two studies are independent, the joint likelihood is the product of the two binomial probabilities.
This function helps statisticians and researchers estimate the true probability of being a carrier, based on the data provided.
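For part (a), the simplification can be written out by collecting like powers of \(p\) and \(1-p\). The symbol \(C\) is shorthand introduced here for the product of the two binomial coefficients; it does not appear in the original problem, and its value is only as accurate as the rounded constants given there.

$$
L(p)=\left(\begin{array}{c} 1000 \\ 41 \end{array}\right)\left(\begin{array}{c} 2000 \\ 79 \end{array}\right) p^{41+79}(1-p)^{959+1921}=C\, p^{120}(1-p)^{2880}, \qquad C \doteq\left(1.3 \times 10^{73}\right)\left(1.4 \times 10^{143}\right) \doteq 1.8 \times 10^{216}
$$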
Derivative
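Part (b) asks for \(L^{\prime}(p)\), but it is easier to differentiate the logarithm first. Because \(\ln\) is strictly increasing, \(L(p)\) and \(\ln(L(p))\) attain their maximum at the same \(p\), and \(L^{\prime}(p)=L(p) \cdot \frac{d}{dp} \ln(L(p))\). Using the rounded constants from the problem, the log-likelihood is \( \ln(L(p)) \doteq \ln(1.3 \times 10^{73}) + \ln(1.4 \times 10^{143}) + 120 \ln(p) + 2880 \ln(1-p) \) for \(0<p<1\). The constant terms vanish when we differentiate with respect to \(p\), leaving: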
- \(\frac{d}{dp} \ln(L(p)) = \frac{120}{p} - \frac{2880}{1-p}\)
- For \(0<p<1\) we have \(L(p)>0\), so \(L^{\prime}(p)=0\) exactly when this derivative is zero.
Setting this derivative to zero and solving the resulting equation is a common procedure to find extreme points, like maximums or minimums. Here, it provides critical information about the probability \(p\) that maximizes our likelihood function.
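Since part (b) asks for \(L^{\prime}(p)\) itself, the same result can also be reached by applying the product rule directly to the simplified likelihood \(L(p)=C\, p^{120}(1-p)^{2880}\). Here \(C\) is shorthand (assumed for this sketch) for the product of the two binomial coefficients.

$$
L^{\prime}(p)=C\left[120\, p^{119}(1-p)^{2880}-2880\, p^{120}(1-p)^{2879}\right]=C\, p^{119}(1-p)^{2879}\,(120-3000 p)
$$

For \(0<p<1\) the factor \(C\, p^{119}(1-p)^{2879}\) is positive, so the sign of \(L^{\prime}(p)\) is the sign of \(120-3000p\).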
Probability
A probability ranges from 0 to 1, where 0 means the event never occurs and 1 means it always occurs. Here \(p\) lies between these limits and represents the proportion of the population carrying the gene for cystic fibrosis.
- If \(p\) is close to 1, most individuals are carriers.
- If \(p\) is closer to 0, very few are carriers.
When calculating the likelihood function, the probabilities \(p\) and \(1-p\) are used to measure how well a certain value of \(p\) predicts the observed outcomes (number of carriers) in our samples. In summary, understanding probability is key to interpreting and calculating likelihoods, which then feed into larger concepts like maximum likelihood estimation.
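To make "how well a value of \(p\) predicts the data" concrete, the following minimal Python sketch evaluates the log-likelihood at a few candidate values. The function name `log_likelihood` and the candidate values are illustrative choices, not part of the original solution; the logarithm is used because \(L(p)\) itself is too small to compare comfortably in floating point.

```python
import math

# Pooled counts from the two studies in the problem statement.
CARRIERS = 41 + 79          # 120 carriers in total
NON_CARRIERS = 959 + 1921   # 2880 non-carriers in total


def log_likelihood(p: float) -> float:
    """Natural log of L(p), including both binomial coefficients."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # math.comb returns exact integers; math.log accepts arbitrarily large ints.
    log_const = math.log(math.comb(1000, 41)) + math.log(math.comb(2000, 79))
    return log_const + CARRIERS * math.log(p) + NON_CARRIERS * math.log(1.0 - p)


# Candidate values chosen only for illustration.
for p in (0.01, 0.04, 0.10, 0.50):
    print(f"p = {p:.2f}   ln L(p) = {log_likelihood(p):.2f}")
```

A larger value of \(\ln L(p)\) means that value of \(p\) makes the observed counts more probable.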
Maximum Likelihood Estimator
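To find the MLE, we follow these steps:
- Take the derivative of the log-likelihood function: \(\frac{120}{p} - \frac{2880}{1-p}\).
- Set the derivative equal to zero to find the stationary point, where the likelihood function neither increases nor decreases.
- Solve for \(p\): \(120(1-p) = 2880p\), so \(\hat{p} = \frac{120}{3000} = 0.04\), which is the pooled fraction of carriers, \((41+79)/(1000+2000)\).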
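The steps above can be checked numerically. The sketch below is only a sanity check under stated assumptions: the helper names `score` and `log_kernel` and the grid resolution of 0.0001 are hypothetical choices made for illustration, and the constant binomial coefficients are dropped because they do not affect where the maximum occurs.

```python
import math
from fractions import Fraction

carriers = 41 + 79          # pooled carriers: 120
non_carriers = 959 + 1921   # pooled non-carriers: 2880

# Closed-form root of 120/p - 2880/(1 - p) = 0, kept as an exact fraction.
p_hat = Fraction(carriers, carriers + non_carriers)


def score(p: float) -> float:
    """Derivative of the log-likelihood with respect to p."""
    return carriers / p - non_carriers / (1.0 - p)


def log_kernel(p: float) -> float:
    """Log-likelihood without the constant binomial-coefficient terms."""
    return carriers * math.log(p) + non_carriers * math.log(1.0 - p)


# Brute-force check on a grid of step 0.0001 (an arbitrary illustrative choice).
grid = [k / 10000 for k in range(1, 10000)]
best_on_grid = max(grid, key=log_kernel)

print("closed-form estimate:", p_hat, "=", float(p_hat))
print("score at estimate:   ", score(float(p_hat)))
print("best grid point:     ", best_on_grid)
```

The score is positive to the left of the estimate and negative to the right, which confirms that the stationary point is a maximum rather than a minimum.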
The MLE matters because it gives the single value of \(p\) that makes the observed data most probable under the model. It is widely used because of its desirable statistical properties: under standard regularity conditions it is consistent and, in large samples, asymptotically efficient.