Problem 1
Question
The data in Table 4.5 show the numbers of cases of AIDS in Australia by date of diagnosis for successive 3-month periods from 1984 to 1988 (data from National Centre for HIV Epidemiology and Clinical Research, 1994). In this early phase of the epidemic, the numbers of cases seemed to be increasing exponentially.
Table 4.5. Numbers of AIDS cases in Australia by year and quarter of diagnosis, 1984 to 1988.
$$\begin{array}{crrrr}
\hline
\text{Year} & \text{Quarter 1} & \text{Quarter 2} & \text{Quarter 3} & \text{Quarter 4} \\
\hline
1984 & 1 & 6 & 16 & 23 \\
1985 & 27 & 39 & 31 & 30 \\
1986 & 43 & 51 & 63 & 70 \\
1987 & 88 & 97 & 91 & 104 \\
1988 & 110 & 113 & 149 & 159 \\
\hline
\end{array}$$
(a) Plot the number of cases \(y_{i}\) against time period \(i\) \((i=1, \ldots, 20)\).
(b) A possible model is the Poisson distribution with parameter \(\lambda_{i}=i^{\theta}\), or equivalently \[ \log \lambda_{i}=\theta \log i \] Plot \(\log y_{i}\) against \(\log i\) to examine this model.
(c) Fit a generalized linear model to these data using the Poisson distribution, the log-link function and the equation \[ g\left(\lambda_{i}\right)=\log \lambda_{i}=\beta_{1}+\beta_{2} x_{i} \] where \(x_{i}=\log i\). Firstly, do this from first principles, working out expressions for the weight matrix \(\mathbf{W}\) and other terms needed for the iterative equation \[ \mathbf{X}^{T} \mathbf{W} \mathbf{X} \mathbf{b}^{(m)}=\mathbf{X}^{T} \mathbf{W} \mathbf{z} \] and using software which can perform matrix operations to carry out the calculations.
(d) Fit the model described in (c) using statistical software which can perform Poisson regression. Compare the results with those obtained in (c).
Step-by-Step Solution
Key Concepts
Poisson Distribution
The key parameter of the Poisson distribution is \(\lambda\), which represents the mean number of events (here, AIDS cases) in a fixed interval of time. Because the case counts are expected to grow, the model \(\lambda_i = i^{\theta}\) with \(\theta > 0\) lets the mean increase with the time period \(i\).
Key points to remember about the Poisson distribution:
- The distribution is defined on the non-negative integers (0, 1, 2, ...), which suits count data such as the number of AIDS cases.
- It assumes that events occur independently of one another.
- The variance of the distribution equals its mean.
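As a quick numerical check of the equal mean and variance property, the sketch below sums the Poisson probability mass function directly. The rate 4.0 and the truncation point 100 are arbitrary illustrative choices, not values from the exercise.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 4.0        # illustrative rate (an assumption, not taken from the data)
ks = range(100)  # truncation point; the tail beyond 100 is negligible for lam = 4

mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(mean, variance)  # both should be approximately equal to lam
```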
Log-Link Function
The log-link function is defined as:\[g(\lambda_{i}) = \log(\lambda_{i}) = \beta_1 + \beta_2 x_i\]where \(\lambda_{i}\) is the expected count and \(\beta_1 + \beta_2 x_i\) is the linear predictor, with \(x_i = \log i\).
Benefits of using the log-link function in this model:
- Transforms multiplicative relationships in the data to additive ones.
- Keeps the fitted means positive, since \(\lambda_i = \exp(\beta_1 + \beta_2 x_i) > 0\) for any parameter values.
- Simplifies dealing with exponential growth, common in epidemic data.
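Exponentiating the linear predictor makes the multiplicative-to-additive point concrete for this exercise, using only the symbols already defined in the question:
\[ \log \lambda_i = \beta_1 + \beta_2 \log i \quad \Longrightarrow \quad \lambda_i = e^{\beta_1} \, i^{\beta_2} \]
The model in part (b), \(\lambda_i = i^{\theta}\), is therefore the special case \(\beta_1 = 0\), \(\beta_2 = \theta\).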
Iterative Equation
The iterative equation is:\[\mathbf{X}^{T} \mathbf{W} \mathbf{X} \mathbf{b}^{(m)} = \mathbf{X}^{T} \mathbf{W} \mathbf{z}\]Here \(\mathbf{W}\) is the diagonal weight matrix and \(\mathbf{z}\) is the adjusted dependent variable (working response), both evaluated at the previous estimate \(\mathbf{b}^{(m-1)}\). Solving the equation gives the updated estimate \(\mathbf{b}^{(m)}\).
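For reference, the standard GLM expressions for the weights and the working response specialize as follows for the Poisson distribution with log link. The notation \(\eta_i = \mathbf{x}_i^T \mathbf{b}\) for the linear predictor is an addition here and is not used in the question.
\[ w_{ii} = \frac{1}{\operatorname{var}(Y_i)} \left( \frac{\partial \lambda_i}{\partial \eta_i} \right)^{2} = \frac{\lambda_i^{2}}{\lambda_i} = \lambda_i \]
\[ z_i = \eta_i + (y_i - \lambda_i) \frac{\partial \eta_i}{\partial \lambda_i} = \eta_i + \frac{y_i - \lambda_i}{\lambda_i} \]
Both use \(\operatorname{var}(Y_i) = \lambda_i\) and \(\lambda_i = \exp(\eta_i)\), so that \(\partial \lambda_i / \partial \eta_i = \lambda_i\).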
Steps involved:
- Build initial estimates of parameters \(\beta\).
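- Compute \(\mathbf{W}\) and \(\mathbf{z}\) at the current estimates and solve the weighted least-squares equation for the updated estimates.
- Iterate: repeat the update with matrix operations until the estimates change by less than a chosen tolerance.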
- Use statistical software when manual calculations become complex and time-consuming.
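The steps above can be sketched with NumPy for part (c). This is a minimal sketch rather than a definitive implementation: it assumes the Poisson log-link forms \(w_{ii} = \lambda_i\) and \(z_i = \eta_i + (y_i - \lambda_i)/\lambda_i\), starts from \(\eta_i = \log y_i\), and the function name `fit_poisson_irls`, the tolerance and the iteration cap are illustrative choices.

```python
import numpy as np

# Quarterly AIDS cases, 1984 Q1 to 1988 Q4 (Table 4.5)
y = np.array([1, 6, 16, 23, 27, 39, 31, 30, 43, 51,
              63, 70, 88, 97, 91, 104, 110, 113, 149, 159], dtype=float)
i = np.arange(1, 21)
X = np.column_stack([np.ones_like(y), np.log(i)])  # columns: 1 and x_i = log i


def fit_poisson_irls(X, y, tol=1e-10, max_iter=50):
    """Hypothetical helper: IRLS for a Poisson GLM with log link."""
    eta = np.log(y)            # starting values; valid because every y_i > 0 here
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        lam = np.exp(eta)      # current fitted means
        w = lam                # diagonal of W for Poisson with log link
        z = eta + (y - lam) / lam          # working response
        XtW = X.T * w          # X^T W without forming the 20 x 20 matrix
        b_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        eta = X @ b
        if converged:
            break
    lam = np.exp(eta)
    cov = np.linalg.inv((X.T * lam) @ X)   # inverse information matrix
    return b, np.sqrt(np.diag(cov))


b, se = fit_poisson_irls(X, y)
print("estimates (beta_1, beta_2):", b)
print("standard errors:", se)
```

The estimates and standard errors printed here are the quantities to compare with the output of a packaged Poisson regression routine in part (d).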
AIDS Epidemic Data Analysis
To capture the growth in case numbers over the 20 quarters:
- Organize data chronologically to track changes over time.
- Plot the number of cases against time periods to visualize trends.
- Use logarithmic transformations to stabilize variance and linearize growth patterns.
- Apply GLMs, particularly with a Poisson distribution and log-link function for count data.
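As a numerical companion to the log-log plot in part (b), the sketch below applies the logarithmic transformation and fits an ordinary least-squares line with `numpy.polyfit`. This is an exploratory check only, not the Poisson fit; the variable names are illustrative.

```python
import numpy as np

# Quarterly AIDS cases, 1984 Q1 to 1988 Q4 (Table 4.5)
y = np.array([1, 6, 16, 23, 27, 39, 31, 30, 43, 51,
              63, 70, 88, 97, 91, 104, 110, 113, 149, 159], dtype=float)
i = np.arange(1, 21)

log_i = np.log(i)
log_y = np.log(y)  # defined because no quarter has zero cases

# Ordinary least-squares line: log y = intercept + slope * log i
slope, intercept = np.polyfit(log_i, log_y, 1)
residuals = log_y - (intercept + slope * log_i)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print("largest absolute residual:", np.abs(residuals).max())
```

Under the one-parameter model \(\lambda_i = i^{\theta}\), the line should pass close to the origin with slope near \(\theta\); an intercept clearly away from zero points towards the two-parameter model in part (c).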
Understanding the methods and rationale behind these steps aids in modeling real-world epidemic patterns, enabling the development of informed public health strategies.