STA 349 Statistics Winter 2023 Solutions 7225
Scarlett X. Mitchell
Dr. Carol Roberts
STAT 220
Northern Valley Highland University
Solution:
Solution for high-dimensional statistics (p = 929):
a) LASSO oracle inequalities:
||β̂^LASSO - β₀||_2^2 ≤ C_1 s λ_n^2 with probability 1-δ
where s = 17, λ_n ~ √(log p/n), and C_1 depends on the RE constant
b) Restricted Eigenvalue condition:
RE(κ,s): φ²(κ,s) = min_{|S|≤s} min_{δ≠0: ||δ_{S^c}||_1 ≤ κ||δ_S||_1} ||Xδ||_2^2/(n||δ||_2^2)
If φ²(3,s) > 0 (the RE(3,s) condition holds), the LASSO achieves the oracle rate O(s log p/n)
c) Non-asymptotic bounds:
Prediction: (1/n)||X(β̂^LASSO - β₀)||_2^2 = O_p(s λ_n^2)
Estimation: ||β̂^LASSO - β₀||_2^2 = O_p(s λ_n^2/φ_0^2) where φ_0 = RE constant
d) Ridge vs Elastic Net comparison:
Ridge: bias increases with signal strength, doesn't perform variable selection
Elastic Net: α||β||_1 + (1-α)||β||_2^2 bridges LASSO and Ridge
β-min condition: min_{j∈S}|β_j| ≥ Cλ_n ensures variable selection consistency
e) Dantzig selector:
min ||β||_1 s.t. ||X'(Y-Xβ)||_∞ ≤ λ_n
Equivalent to linear program, achieves same oracle rates as LASSO
f) Screening and safe rules:
Strong rules: discard j if |x_j'r^{old}| < 2λ_n - λ_n^{old}
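As a quick numerical illustration of the λ_n scale and the oracle rate above, here is a minimal simulated sketch (our own, assuming NumPy and scikit-learn are available; constants are ignored and the Gaussian design is an arbitrary choice):

```python
# Minimal simulated check of lambda_n ~ sqrt(log p / n) and the oracle-rate error bound.
# This is an illustrative sketch, not part of the original solution.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 400, 929, 17                       # p and s taken from the problem
X = rng.standard_normal((n, p))
beta0 = np.zeros(p); beta0[:s] = 1.0         # s-sparse truth
y = X @ beta0 + rng.standard_normal(n)

lam = np.sqrt(np.log(p) / n)                 # lambda_n ~ sqrt(log p / n)
# sklearn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1, so alpha plays the
# role of lambda_n up to constants.
fit = Lasso(alpha=lam, fit_intercept=False).fit(X, y)

err = np.sum((fit.coef_ - beta0) ** 2)       # ||beta_hat - beta0||_2^2
rate = s * np.log(p) / n                     # oracle rate s*log(p)/n (constants ignored)
print(f"estimation error {err:.3f} vs oracle rate s*log(p)/n = {rate:.3f}")
```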
Problem 2: Let X₁, X₂, ..., X_{57} be i.i.d. random variables from N(μ, σ²) arising in Lucas
sequence analysis (Note: 57 is the sixteenth discrete semiprime and the sixth in the (3·q)
family.). The population mean relates to e ≈ 2.7183, and we test H₀: μ = 9 in modular
arithmetic base p = 3061. a) Derive the joint sufficient statistic (T₁, T₂) for (μ, σ²) using the
exponential family factorization theorem b) Prove T₁ and T₂ form a complete sufficient
statistic using the uniqueness of moment generating functions c) Find the UMVUE for μ and
σ² using the Lehmann-Scheffé theorem and Rao-Blackwell theorem d) Construct a 99%
confidence interval for μ incorporating e scaling e) Derive the likelihood ratio test statistic
and its asymptotic distribution under local alternatives f) Analyze the power function β(μ)
and compute the sample size for achieving power 0.8 g) Study connections between Lucas
growth rates and statistical efficiency bounds h) Investigate how modular arithmetic affects
the Fisher information matrix
Solution:
Solution for enhanced mathematical statistics with Lucas context:
a) Joint sufficient statistic in exponential family:
L(μ, σ²) = (2πσ²)^{-57/2} exp(-∑(xᵢ - μ)²/(2σ²))
= exp([μ/σ²]∑xᵢ - [1/(2σ²)]∑xᵢ² - [nμ²/(2σ²)] - [n/2]log(2πσ²))
Natural parameters: η₁ = μ/σ², η₂ = -1/(2σ²)
Sufficient statistics: T₁ = ∑Xᵢ, T₂ = ∑Xᵢ² (jointly sufficient)
b) Completeness proof via MGF uniqueness:
E[g(T₁,T₂)] = 0 for all θ ⟹ g(T₁,T₂) = 0 a.s.
The exponential family has full rank, ensuring completeness
c) UMVUE by Lehmann-Scheffé theorem:
- μ̂ = X̄ = T₁/57 (unbiased, sufficient, complete ⟹ UMVUE)
- σ̂ ² = S² = (T₂ - 57X̄ ²)/(57-1) (unbiased for σ²)
Var(X̄ ) = σ²/57, Var(S²) = 2σ⁴/(57-1)
d) Enhanced confidence interval with e scaling:
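A minimal numerical sketch of parts (c) and (d), assuming simulated data and a standard 99% t-interval (the ad hoc e scaling is not reproduced); all choices below are illustrative:

```python
# Compute the jointly sufficient statistics, the UMVUEs, and a 99% t-interval for mu.
# Data are simulated under mu = 9; this is our own illustrative example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 57
x = rng.normal(loc=9.0, scale=2.0, size=n)

T1, T2 = x.sum(), np.sum(x ** 2)             # jointly sufficient statistics
xbar = T1 / n                                # UMVUE of mu
s2 = (T2 - n * xbar ** 2) / (n - 1)          # UMVUE of sigma^2

half = stats.t.ppf(0.995, df=n - 1) * np.sqrt(s2 / n)   # 99% two-sided half-width
print(f"T1={T1:.2f}, T2={T2:.2f}, mean={xbar:.3f}, S^2={s2:.3f}")
print(f"99% CI for mu: ({xbar - half:.3f}, {xbar + half:.3f})")
```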
Problem 3: Let X₁, X₂, ..., X_{67} be i.i.d. random variables from N(μ, σ²) arising in Fibonacci
sequence analysis (Note: 67 is the smallest number which is palindromic in bases 5 and 6.).
The population mean relates to ζ(2) = π²/6 ≈ 1.6449, and we test H₀: μ = 8 in modular
arithmetic base p = 1069. a) Derive the joint sufficient statistic (T₁, T₂) for (μ, σ²) using the
exponential family factorization theorem b) Prove T₁ and T₂ form a complete sufficient
statistic using the uniqueness of moment generating functions c) Find the UMVUE for μ and
σ² using the Lehmann-Scheffé theorem and Rao-Blackwell theorem d) Construct a 99%
confidence interval for μ incorporating ζ(2) = π²/6 scaling e) Derive the likelihood ratio test
statistic and its asymptotic distribution under local alternatives f) Analyze the power
function β(μ) and compute the sample size for achieving power 0.8 g) Study connections
between Fibonacci growth rates and statistical efficiency bounds h) Investigate how modular
arithmetic affects the Fisher information matrix
Solution:
Solution for enhanced mathematical statistics with Fibonacci context:
a) Joint sufficient statistic in exponential family:
L(μ, σ²) = (2πσ²)^{-67/2} exp(-∑(xᵢ - μ)²/(2σ²))
= exp([μ/σ²]∑xᵢ - [1/(2σ²)]∑xᵢ² - [nμ²/(2σ²)] - [n/2]log(2πσ²))
Natural parameters: η₁ = μ/σ², η₂ = -1/(2σ²)
Sufficient statistics: T₁ = ∑Xᵢ, T₂ = ∑Xᵢ² (jointly sufficient)
connections between Catalan sequences and regularization paths h) Apply random matrix
theory to analyze the sample covariance eigenvalue spectrum
Solution:
Solution for high-dimensional statistics (p = 892):
a) LASSO oracle inequalities:
||β̂^LASSO - β₀||_2^2 ≤ C_1 s λ_n^2 with probability 1-δ
where s = 19, λ_n ~ √(log p/n), and C_1 depends on the RE constant
b) Restricted Eigenvalue condition:
RE(κ,s): φ²(κ,s) = min_{|S|≤s} min_{δ≠0: ||δ_{S^c}||_1 ≤ κ||δ_S||_1} ||Xδ||_2^2/(n||δ||_2^2)
If φ²(3,s) > 0 (the RE(3,s) condition holds), the LASSO achieves the oracle rate O(s log p/n)
c) Non-asymptotic bounds:
Prediction: (1/n)||X(β̂^LASSO - β₀)||_2^2 = O_p(s λ_n^2)
Estimation: ||β̂^LASSO - β₀||_2^2 = O_p(s λ_n^2/φ_0^2) where φ_0 = RE constant
d) Ridge vs Elastic Net comparison:
Ridge: bias increases with signal strength, doesn't perform variable selection
Elastic Net: α||β||_1 + (1-α)||β||_2^2 bridges LASSO and Ridge
β-min condition: min_{j∈S}|β_j| ≥ Cλ_n ensures variable selection consistency
e) Dantzig selector:
min ||β||_1 s.t. ||X'(Y-Xβ)||_∞ ≤ λ_n
Equivalent to linear program, achieves same oracle rates as LASSO
f) Screening and safe rules:
Strong rules: discard j if |x_j'r^{old}| < 2λ_n - λ_n^{old}
Safe rules: KKT conditions guarantee β_j = 0 if |x_j'y| < λ_n
g) Catalan regularization paths:
λ sequence follows Catalan growth: λ_k = 1430 × γ^k
Solution path complexity bounded by Catalan recurrence properties
h) Random matrix theory with scaling 0.5772:
Sample covariance eigenvalues follow Marchenko-Pastur law when p/n → c
Phase transition at c = 1, bulk eigenvalues in [0.240, 1.760]
This demonstrates cutting-edge high-dimensional statistical theory.
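As a check of part (h), a short simulation sketch (ours, assuming NumPy) comparing sample-covariance eigenvalues with the Marchenko-Pastur support [(1-√c)², (1+√c)²]:

```python
# Simulate a p x p sample covariance with true Sigma = I and compare its eigenvalue
# range with the Marchenko-Pastur bulk when p/n -> c.  Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(2)
n, p = 2000, 500                              # aspect ratio c = p/n = 0.25
c = p / n
X = rng.standard_normal((n, p))
S = X.T @ X / n                               # sample covariance
eig = np.linalg.eigvalsh(S)

lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(f"empirical eigenvalue range: [{eig.min():.3f}, {eig.max():.3f}]")
print(f"Marchenko-Pastur support  : [{lo:.3f}, {hi:.3f}]")
```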
Problem 5: Let X₁, X₂, ..., X_{69} be i.i.d. random variables from N(μ, σ²) arising in Fibonacci
sequence analysis (Note: 69 is a value of n where n^{2} and n^{3} together contain each
digit once.). The population mean relates to ζ(2) = π²/6 ≈ 1.6449, and we test H₀: μ = 8 in
modular arithmetic base p = 2063. a) Derive the joint sufficient statistic (T₁, T₂) for (μ, σ²)
using the exponential family factorization theorem b) Prove T₁ and T₂ form a complete
sufficient statistic using the uniqueness of moment generating functions c) Find the UMVUE
for μ and σ² using the Lehmann-Scheffé theorem and Rao-Blackwell theorem d) Construct a
95% confidence interval for μ incorporating ζ(2) = π²/6 scaling e) Derive the likelihood ratio
test statistic and its asymptotic distribution under local alternatives f) Analyze the power
function β(μ) and compute the sample size for achieving power 0.8 g) Study connections
between Fibonacci growth rates and statistical efficiency bounds h) Investigate how modular
arithmetic affects the Fisher information matrix
Solution:
Solution for enhanced mathematical statistics with Fibonacci context:
a) Joint sufficient statistic in exponential family:
L(μ, σ²) = (2πσ²)^{-69/2} exp(-∑(xᵢ - μ)²/(2σ²))
= exp([μ/σ²]∑xᵢ - [1/(2σ²)]∑xᵢ² - [nμ²/(2σ²)] - [n/2]log(2πσ²))
Natural parameters: η₁ = μ/σ², η₂ = -1/(2σ²)
Sufficient statistics: T₁ = ∑Xᵢ, T₂ = ∑Xᵢ² (jointly sufficient)
b) Completeness proof via MGF uniqueness:
E[g(T₁,T₂)] = 0 for all θ ⟹ g(T₁,T₂) = 0 a.s.
The exponential family has full rank, ensuring completeness
c) UMVUE by Lehmann-Scheffé theorem:
- μ̂ = X̄ = T₁/69 (unbiased, sufficient, complete ⟹ UMVUE)
- σ̂ ² = S² = (T₂ - 69X̄ ²)/(69-1) (unbiased for σ²)
Var(X̄ ) = σ²/69, Var(S²) = 2σ⁴/(69-1)
d) Enhanced confidence interval with ζ(2) = π²/6 scaling:
(X̄ - t_{0.025,68}·S/√69, X̄ + t_{0.025,68}·S/√69)
Adjusted for ζ(2) = π²/6 ≈ 1.6449: width × 1.64
e) Likelihood ratio test statistic:
Λ = sup_{H₀}L(θ)/sup_{Θ}L(θ) = [(S²+(X̄ -8)²)/(S²)]^{-69/2}
-2log Λ = 69log(1 + (X̄ -8)²/S²) →ᵈ χ²₁ under H₀
f) Power function and sample size:
β(μ) = P(|T| > t_{0.025,68} | μ) where T ~ t_{68}(δ)
Noncentrality: δ = (μ-8)/(σ/√69)
For power 0.8 (normal approximation): n ≥ (z_{0.025} + z_{0.2})²σ²/(μ-8)²
g) Fibonacci efficiency connection:
Growth rate 8 relates to Cramér-Rao lower bound efficiency
Fisher information: I(μ,σ²) = diag(69/σ², 69/(2σ⁴))
h) Modular arithmetic Fisher information (mod 2063):
Information matrix properties preserved under finite field operations
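A small numerical sketch of parts (e) and (f), assuming simulated data and SciPy; the effect size, σ, and α below are illustrative choices, not values fixed by the problem:

```python
# Compute the LRT statistic -2 log Lambda = n*log(1 + (xbar - 8)^2 / S^2) and a
# normal-approximation sample-size calculation for power 0.8.  Illustrative sketch.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 69
x = rng.normal(loc=8.5, scale=2.0, size=n)

xbar, s2 = x.mean(), x.var(ddof=1)
lrt = n * np.log(1 + (xbar - 8.0) ** 2 / s2)          # -2 log Lambda
pval = stats.chi2.sf(lrt, df=1)                       # asymptotic chi^2_1 reference

# Sample size for power 0.8 at two-sided alpha = 0.05 (normal approximation)
mu1, sigma, alpha, power = 8.5, 2.0, 0.05, 0.80
z_a, z_b = stats.norm.ppf(1 - alpha / 2), stats.norm.ppf(power)
n_req = (z_a + z_b) ** 2 * sigma ** 2 / (mu1 - 8.0) ** 2
print(f"-2logLambda = {lrt:.3f}, p-value = {pval:.3f}, required n ≈ {np.ceil(n_req):.0f}")
```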
Problem 6: Consider extreme value analysis with block maxima of size 25 (Note: 25 is the
smallest square that can be written as a sum of 2 squares.). Threshold selection at 90%
quantile with shape parameter influenced by e ≈ 2.7183. Return levels relate to the 9th Catalan
number: 4862. a) Derive the Generalized Extreme Value (GEV) distribution and its domain
of attraction b) Establish the Fisher-Tippett-Gnedenko theorem for block maxima
convergence c) Implement the Peaks-Over-Threshold (POT) method with Generalized Pareto
Distribution d) Estimate the shape, scale, and location parameters using Maximum
Likelihood e) Construct confidence intervals for high quantiles and return levels f) Apply the
Delta method for uncertainty quantification in return level estimation g) Study the
connection between Catalan extremal indices and clustering h) Analyze how e affects tail
dependence in multivariate extremes
Solution:
Solution for extreme value theory:
a) GEV distribution and domain of attraction:
F_ξ(x) = exp(-(1+ξ(x-μ)/σ)_+^{-1/ξ}) for ξ ≠ 0
Domain of attraction: Type I (Gumbel, ξ=0), Type II (Fréchet, ξ>0), Type III (Weibull, ξ<0)
b) Fisher-Tippett-Gnedenko theorem:
If M_n = max(X_1,...,X_n) and P(a_n^{-1}(M_n-b_n) ≤ x) → G(x), then G is GEV
Normalizing constants: a_n = F^{-1}(1-1/n) - F^{-1}(1-2/n), b_n = F^{-1}(1-1/n)
c) POT method with GPD:
For X > u (threshold 0.9 quantile): F_u(x) = 1 - (1+ξ x/σ_u)_+^{-1/ξ}
Scale parameter: σ_u = σ + ξ(u-μ), shape ξ unchanged
d) MLE estimation:
Log-likelihood: ℓ(μ,σ,ξ) = -25logσ - (1+1/ξ)∑log(1+ξ(x_i-μ)/σ)
Profile likelihood for ξ, then conditional MLE for μ,σ
e) Confidence intervals for quantiles:
x_p = μ + (σ/ξ)[(-log(1-p))^{-ξ} - 1] for return level
Delta method: Var(x_p) ≈ ∇x_p' × I(θ)^{-1} × ∇x_p
f) Delta method uncertainty:
∇x_p = [∂x_p/∂μ, ∂x_p/∂σ, ∂x_p/∂ξ]'
Asymptotic normality: (x̂ _p - x_p)/√Var(x̂_p) →ᵈ N(0,1)
g) Catalan extremal index:
θ = lim P(M_n ≤ u_n | X_1 > u_n) where u_n = 4862/n quantile
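A minimal sketch of fitting the GEV of parts (a) and (d) by maximum likelihood, assuming SciPy (whose genextreme uses the shape convention c = -ξ) and simulated block maxima:

```python
# Fit a GEV to simulated block maxima of size 25 and report a 100-block return level.
# Illustrative sketch only; the data-generating choice is arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
blocks = rng.standard_exponential((1000, 25))          # 1000 blocks of size 25
maxima = blocks.max(axis=1)

c_hat, loc_hat, scale_hat = stats.genextreme.fit(maxima)
xi_hat = -c_hat                                        # convert to the xi convention above
print(f"xi ≈ {xi_hat:.3f}, mu ≈ {loc_hat:.3f}, sigma ≈ {scale_hat:.3f}")

# 100-block return level = the (1 - 1/100) quantile of the fitted GEV
x_100 = stats.genextreme.ppf(1 - 1 / 100, c_hat, loc=loc_hat, scale=scale_hat)
print(f"100-block return level ≈ {x_100:.3f}")
```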
Problem 7: Consider a Bayesian analysis with normal likelihood and conjugate prior. a)
Specify the likelihood function L(θ|x₁, ..., x_{33}) for the normal model b) Choose an
appropriate conjugate prior π(θ) and justify your choice c) Derive the posterior distribution
π(θ|data) using Bayes' theorem d) Find the Bayesian point estimators: posterior mean,
median, and mode e) Construct a 95% credible interval for θ f) Compare with the frequentist
approach: MLE and confidence interval g) Perform posterior predictive checking and model
assessment h) Discuss the choice of loss function and its impact on Bayesian inference
Solution:
Solution for normal likelihood with conjugate prior:
a) Likelihood function:
L(μ,σ²|x) = (2πσ²)^{-n/2} exp(-∑(xᵢ-μ)²/(2σ²))
b) Prior specification:
With σ² known, the conjugate prior for μ is N(m₀, τ₀²); for unknown (μ, σ²) the natural conjugate family is Normal-Inverse-Gamma
c) Posterior distribution:
π(θ|data) ∝ L(θ|data) × π(θ)
For the known-σ² case: μ | data ~ N(μ_n, τ_n²) with 1/τ_n² = 1/τ₀² + n/σ² and μ_n = τ_n²(m₀/τ₀² + n x̄/σ²)
d) Bayesian point estimators:
The normal posterior is symmetric and unimodal, so the posterior mean, median, and mode coincide at μ_n
e) Credible interval:
95% equal-tailed credible interval: μ_n ± 1.96 τ_n
f) Frequentist comparison:
The MLE is x̄ with confidence interval x̄ ± 1.96 σ/√n; the Bayesian interval shrinks toward m₀ and agrees with the frequentist one as τ₀ → ∞ or n → ∞
g) Posterior predictive checking:
Simulate replicate data from p(x̃|data) = ∫ p(x̃|θ) π(θ|data) dθ and compare discrepancy statistics with their observed values
h) Loss function considerations:
Squared-error loss yields the posterior mean, absolute-error loss the posterior median, and 0-1 loss the posterior mode
This demonstrates advanced Bayesian statistical theory and computational methods.
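A minimal sketch of the known-variance conjugate update in parts (b)-(e), assuming NumPy/SciPy; the prior hyperparameters, σ, and the simulated data are illustrative choices:

```python
# Normal likelihood with known sigma and a conjugate N(m0, tau0^2) prior on mu:
# the posterior is normal with precision-weighted mean.  Illustrative sketch only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, sigma = 33, 1.0
x = rng.normal(loc=2.0, scale=sigma, size=n)

m0, tau0 = 0.0, 10.0                                   # vague conjugate prior
prec_post = 1 / tau0 ** 2 + n / sigma ** 2
mu_post = (m0 / tau0 ** 2 + x.sum() / sigma ** 2) / prec_post
sd_post = np.sqrt(1 / prec_post)

lo, hi = stats.norm.ppf([0.025, 0.975], loc=mu_post, scale=sd_post)
print(f"posterior N({mu_post:.3f}, {sd_post:.3f}^2); 95% credible interval ({lo:.3f}, {hi:.3f})")
print(f"frequentist MLE = {x.mean():.3f}")
```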
Problem 8: Let X₁, X₂, ..., X_{48} be i.i.d. observations from N_5(μ, Σ) (Note: 5 is the second
Sierpiński number of the first kind and can be written as S₂ = 2² + 1.). The covariance structure
involves the matrix
\begin{pmatrix} 2 & -1 & -3 & 0 & -1 \\ -1 & 2 & 2 & -1 & 0 \\ -3 & 2 & 0 & -3 & 0 \\ 0 & -1 & -3 & -4 & 1 \\ -1 & 0 & 0 & 1 & 5 \end{pmatrix}
scaled by √2 ≈ 1.4142. Analysis parameters relate to the 13th Lucas number: 521. a) Derive the MLE for μ
and Σ using matrix differential calculus and the Wishart distribution b) Establish the joint
distribution (μ̂ , Σ̂) and their asymptotic properties under regularity conditions c) Construct
Hotelling's T² statistic and Roy's union-intersection test for H₀: μ = μ₀ d) Derive the exact and
asymptotic distributions of T², including non-central F distribution e) Construct simultaneous
confidence intervals using Bonferroni and Scheffé methods f) Test sphericity H₀: Σ = σ²I using
likelihood ratio and implement Mauchly's test g) Perform PCA with eigenvalue
decomposition and analyze explained variance ratios h) Apply canonical correlation analysis
between subvectors of dimension 2 i) Study connections between Lucas eigenvalue
distributions and random matrix theory
Solution:
Solution for 5-dimensional enhanced multivariate analysis:
a) MLE derivation using matrix calculus:
L(μ,Σ) = (2π)^{-48·5/2}|Σ|^{-48/2}exp(-1/2 ∑(xᵢ-μ)'Σ^{-1}(xᵢ-μ))
∂log L/∂μ = Σ^{-1}∑(xᵢ-μ) = 0 ⟹ μ̂ = X̄
∂log L/∂Σ^{-1} = 48/2 Σ - 1/2 ∑(xᵢ-μ)(xᵢ-μ)' = 0 ⟹ Σ̂ = S
b) Joint distribution and asymptotics:
- X̄ ~ N_5(μ, Σ/48) exactly
- 48Σ̂ ~ W_5(48-1, Σ) (Wishart distribution)
- Asymptotically: √48(μ̂ -μ) →ᵈ N_5(0, Σ)
- vec(Σ̂-Σ) →ᵈ N(0, 2Σ⊗Σ/48) by delta method
c) Test statistics and union-intersection principle:
Hotelling's T² = 48(X̄ -μ₀)'S^{-1}(X̄ -μ₀)
Roy's test: max eigenvalue of S^{-1}A where A = (X̄ -μ₀)(X̄ -μ₀)'
d) Exact and asymptotic distributions:
Under H₀: T²((48-5)/(5(48-1))) ~ F_{5,48-5}
Under H₁: T² ~ (5(48-1)/(48-5))F_{5,48-5}(δ) with δ = 48(μ-μ₀)'Σ^{-1}(μ-μ₀)
e) Simultaneous confidence intervals:
Bonferroni: P(∩|μᵢ-μ₀ᵢ| ≤ t_{α/(2·5),48-1}sᵢᵢ^{1/2}/√48) ≥ 1-α
Scheffé: P(|a'(μ-μ₀)| ≤ √((5(48-1)F_{5,48-5,α})/(48-5))s_{aa}/√48 ∀a) = 1-α
f) Sphericity testing:
LRT: Λ = |S|^{48/2}/(∑sᵢᵢ/5)^{5·48/2}
-2logΛ →ᵈ χ²_{5(5+1)/2-1} under H₀
Mauchly's test: W = |S|/(∑sᵢᵢ/5)^5 with Box's correction
g) PCA with √2 scaling:
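A short simulated sketch of the PCA in part (g), assuming NumPy; the √2 rescaling multiplies every eigenvalue equally and therefore leaves the explained-variance ratios unchanged:

```python
# Eigendecomposition of a 5-dimensional sample covariance and explained-variance ratios.
# Illustrative sketch with an arbitrary positive-definite true covariance.
import numpy as np

rng = np.random.default_rng(6)
n, p = 48, 5
X = rng.multivariate_normal(np.zeros(p), np.eye(p) + 0.4, size=n)

S = np.cov(X, rowvar=False)                      # sample covariance
evals = np.linalg.eigvalsh(S)[::-1]              # eigenvalues, sorted descending
print("eigenvalues           :", np.round(evals, 3))
print("explained variance    :", np.round(evals / evals.sum(), 3))
print("cumulative proportion :", np.round(np.cumsum(evals) / evals.sum(), 3))
```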
Solution:
Solution for ARIMA(1, 1, 2) model:
a) Stationarity conditions:
Write the model as (1 - φ₁B)(1 - B)Xₜ = (1 + θ₁B + θ₂B²)εₜ; the differenced series Wₜ = (1 - B)Xₜ is a stationary ARMA(1,2) provided |φ₁| < 1 (AR root outside the unit circle)
b) Autocovariance function:
γ_W(h) of the differenced series obeys γ_W(h) = φ₁γ_W(h-1) for h > 2, with MA(2) contributions entering at lags 0, 1, 2
c) ACF and PACF:
ACF and PACF analysis for model identification
d) Maximum likelihood estimation:
MLE estimation using conditional likelihood for ARIMA parameters
e) Diagnostic checking:
Residual analysis, normality tests, and independence testing
f) Forecasting:
Multi-step ahead forecasting with prediction intervals
g) Model comparison:
Model selection using information criteria and cross-validation
This demonstrates advanced time series analysis including ARIMA modeling and forecasting.
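A minimal sketch of the ML fit and residual diagnostics in parts (d)-(e), assuming statsmodels and a simulated series; all settings are illustrative:

```python
# Fit an ARIMA(1,1,2) by maximum likelihood and run a Ljung-Box residual check.
# The series is a toy integrated random walk; this is an illustrative sketch only.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(7)
y = np.cumsum(rng.standard_normal(300))        # toy I(1) series

res = ARIMA(y, order=(1, 1, 2)).fit()
print(res.params)                              # AR, MA, and sigma^2 estimates
print(acorr_ljungbox(res.resid, lags=[10]))    # Ljung-Box statistic and p-value
```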
Problem 10: Consider the ARIMA(3, 1, 2) model: (1 + 0.26B - 0.25B² - 0.10B³)(1 - B)Xₜ =
(1 - 0.23B - 0.08B²)εₜ where B is the backshift operator and εₜ ~ WN(0, σ²). a) Verify the
stationarity conditions for this model b) Derive the autocovariance function γ(h) for h = 0, 1,
2, ... c) Find the autocorrelation function ρ(h) and partial autocorrelation function d)
Estimate the model parameters using maximum likelihood estimation e) Perform diagnostic
checking using residual analysis and Ljung-Box test f) Forecast Xₜ₊ₕ for h = 1, 2, 3 with
prediction intervals g) Compare with alternative models using AIC, BIC, and cross-validation
Solution:
Solution for ARIMA(3, 1, 2) model:
a) Stationarity conditions:
The differenced series Wₜ = (1 - B)Xₜ is a stationary ARMA(3,2) provided all roots of 1 - φ₁z - φ₂z² - φ₃z³ = 0 lie outside the unit circle
b) Autocovariance function:
γ_W(h) obeys γ_W(h) = φ₁γ_W(h-1) + φ₂γ_W(h-2) + φ₃γ_W(h-3) for h > 2, with MA(2) contributions entering at lags 0, 1, 2
c) ACF and PACF:
ACF and PACF analysis for model identification
d) Maximum likelihood estimation:
MLE estimation using conditional likelihood for ARIMA parameters
e) Diagnostic checking:
Residual analysis, normality tests, and independence testing
f) Forecasting:
Multi-step ahead forecasting with prediction intervals
g) Model comparison:
Model selection using information criteria and cross-validation
This demonstrates advanced time series analysis including ARIMA modeling and forecasting.
Note: The textbook example on page 157 shows a similar approach
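A minimal sketch of parts (d), (f), and (g), assuming statsmodels; the series is simulated, so the fitted coefficients are not those printed in the problem statement:

```python
# Fit ARIMA(3,1,2), compare AIC/BIC against an ARIMA(1,1,1) alternative, and produce
# 1-3 step forecasts with 95% prediction intervals.  Illustrative sketch only.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
y = np.cumsum(rng.standard_normal(400))        # toy I(1) series

fit312 = ARIMA(y, order=(3, 1, 2)).fit()
fit111 = ARIMA(y, order=(1, 1, 1)).fit()
print(f"AIC: (3,1,2) = {fit312.aic:.1f}, (1,1,1) = {fit111.aic:.1f}")
print(f"BIC: (3,1,2) = {fit312.bic:.1f}, (1,1,1) = {fit111.bic:.1f}")

fc = fit312.get_forecast(steps=3)              # h = 1, 2, 3
print(fc.predicted_mean)                       # point forecasts
print(fc.conf_int(alpha=0.05))                 # 95% prediction intervals
```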
Problem 11: Consider extreme value analysis with block maxima of size 51 (Note: 51 is the
14th discrete biprime and the 5th in the {3·q} semiprime family, having the prime factors
(3·17).). Threshold selection at 92% quantile with shape parameter influenced by e ≈ 2.7183.
Return levels relate to the 12th Lucas number: 322. a) Derive the Generalized Extreme Value
(GEV) distribution and its domain of attraction b) Establish the Fisher-Tippett-Gnedenko
theorem for block maxima convergence c) Implement the Peaks-Over-Threshold (POT)
method with Generalized Pareto Distribution d) Estimate the shape, scale, and location
parameters using Maximum Likelihood e) Construct confidence intervals for high quantiles
and return levels f) Apply the Delta method for uncertainty quantification in return level
estimation g) Study the connection between Lucas extremal indices and clustering h) Analyze
how e affects tail dependence in multivariate extremes
Solution:
Solution for extreme value theory:
a) GEV distribution and domain of attraction:
F_ξ(x) = exp(-(1+ξ(x-μ)/σ)_+^{-1/ξ}) for ξ ≠ 0
Domain of attraction: Type I (Gumbel, ξ=0), Type II (Fréchet, ξ>0), Type III (Weibull, ξ<0)
b) Fisher-Tippett-Gnedenko theorem:
If M_n = max(X_1,...,X_n) and P(a_n^{-1}(M_n-b_n) ≤ x) → G(x), then G is GEV
Normalizing constants: a_n = F^{-1}(1-1/n) - F^{-1}(1-2/n), b_n = F^{-1}(1-1/n)
c) POT method with GPD:
For X > u (threshold 0.92 quantile): F_u(x) = 1 - (1+ξ x/σ_u)_+^{-1/ξ}
Scale parameter: σ_u = σ + ξ(u-μ), shape ξ unchanged
d) MLE estimation:
Log-likelihood: ℓ(μ,σ,ξ) = -51logσ - (1+1/ξ)∑log(1+ξ(x_i-μ)/σ)
Profile likelihood for ξ, then conditional MLE for μ,σ
e) Confidence intervals for quantiles:
x_p = μ + (σ/ξ)[(-log(1-p))^{-ξ} - 1] for return level
Delta method: Var(x_p) ≈ ∇x_p' × I(θ)^{-1} × ∇x_p
f) Delta method uncertainty:
∇x_p = [∂x_p/∂μ, ∂x_p/∂σ, ∂x_p/∂ξ]'
Asymptotic normality: (x̂ _p - x_p)/√Var(x̂_p) →ᵈ N(0,1)
g) Lucas extremal index:
θ = lim P(M_n ≤ u_n | X_1 > u_n) where u_n = 322/n quantile
Clustering measured by runs estimator: θ̂ = (number of clusters)/(number of exceedances)
h) Multivariate extremes with e:
Tail dependence: λ_U = lim P(Y > F_Y^{-1}(t) | X > F_X^{-1}(t))
Copula approach with scaling factor 2.7183 in tail regions
This demonstrates advanced extreme value theory with practical applications.
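A minimal sketch of the POT steps in parts (c)-(e), assuming SciPy and simulated data; the return-level formula below is the standard GPD one, x_m = u + (σ_u/ξ)[(m ζ_u)^ξ - 1]:

```python
# Peaks-over-threshold at the 92% quantile, GPD fit for the exceedances, and an
# m-observation return-level estimate.  Illustrative sketch with simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.standard_exponential(5000)
u = np.quantile(x, 0.92)                               # threshold at the 92% quantile
exc = x[x > u] - u

xi_hat, _, sigma_hat = stats.genpareto.fit(exc, floc=0)  # location fixed at 0 for POT
zeta_u = exc.size / x.size                             # exceedance probability

m = 1000                                               # m-observation return period
# For xi near 0 the exponential limit u + sigma*log(m*zeta_u) should be used instead.
x_m = u + (sigma_hat / xi_hat) * ((m * zeta_u) ** xi_hat - 1)
print(f"xi ≈ {xi_hat:.3f}, sigma_u ≈ {sigma_hat:.3f}, {m}-obs return level ≈ {x_m:.3f}")
```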
Problem 12: Let X₁, X₂, ..., X_{48} be i.i.d. observations from N_4(μ, Σ) (Note: 4 is the second
square number, the second centered triangular number.). The covariance structure involves
the matrix \begin{pmatrix} -3 & 1 & 2 & 1 \\ 1 & -5 & 0 & -3 \\ 2 & 0 & 1 & -3 \\ 1 & -3 & -3 & -5 \end{pmatrix} scaled by
√2 ≈ 1.4142. Analysis parameters relate to the 9th Fibonacci
number: 34. a) Derive the MLE for μ and Σ using matrix differential calculus and the Wishart
distribution b) Establish the joint distribution (μ̂ , Σ̂) and their asymptotic properties under
regularity conditions c) Construct Hotelling's T² statistic and Roy's union-intersection test for
H₀: μ = μ₀ d) Derive the exact and asymptotic distributions of T², including non-central F
distribution e) Construct simultaneous confidence intervals using Bonferroni and Scheffé
methods f) Test sphericity H₀: Σ = σ²I using likelihood ratio and implement Mauchly's test g)
Perform PCA with eigenvalue decomposition and analyze explained variance ratios h) Apply
canonical correlation analysis between subvectors of dimension 2 i) Study connections
between Fibonacci eigenvalue distributions and random matrix theory
Solution:
Solution for 4-dimensional enhanced multivariate analysis:
a) MLE derivation using matrix calculus:
L(μ,Σ) = (2π)^{-48·4/2}|Σ|^{-48/2}exp(-1/2 ∑(xᵢ-μ)'Σ^{-1}(xᵢ-μ))
∂log L/∂μ = Σ^{-1}∑(xᵢ-μ) = 0 ⟹ μ̂ = X̄
∂log L/∂Σ^{-1} = 48/2 Σ - 1/2 ∑(xᵢ-μ)(xᵢ-μ)' = 0 ⟹ Σ̂ = S
b) Joint distribution and asymptotics:
- X̄ ~ N_4(μ, Σ/48) exactly
- 48Σ̂ ~ W_4(48-1, Σ) (Wishart distribution)
- Asymptotically: √48(μ̂ -μ) →ᵈ N_4(0, Σ)
- vec(Σ̂-Σ) →ᵈ N(0, 2Σ⊗Σ/48) by delta method
c) Test statistics and union-intersection principle:
Hotelling's T² = 48(X̄ -μ₀)'S^{-1}(X̄ -μ₀)
Roy's test: max eigenvalue of S^{-1}A where A = (X̄ -μ₀)(X̄ -μ₀)'
d) Exact and asymptotic distributions:
Under H₀: T²((48-4)/(4(48-1))) ~ F_{4,48-4}
Under H₁: T² ~ (4(48-1)/(48-4))F_{4,48-4}(δ) with δ = 48(μ-μ₀)'Σ^{-1}(μ-μ₀)
e) Simultaneous confidence intervals:
Bonferroni: P(∩|μᵢ-μ₀ᵢ| ≤ t_{α/(2·4),48-1}sᵢᵢ^{1/2}/√48) ≥ 1-α
Scheffé: P(|a'(μ-μ₀)| ≤ √((4(48-1)F_{4,48-4,α})/(48-4))s_{aa}/√48 ∀a) = 1-α
f) Sphericity testing:
LRT: Λ = |S|^{48/2}/(∑sᵢᵢ/4)^{4·48/2}
-2logΛ →ᵈ χ²_{4(4+1)/2-1} under H₀
Mauchly's test: W = |S|/(∑sᵢᵢ/4)^4 with Box's correction
g) PCA with √2 scaling:
S = PΛP' where Λ = diag(λ₁,...,λ_{4}), λ₁ ≥ ... ≥ λ_{4}
Proportion of variance: λᵢ/∑ⱼλⱼ, cumulative: ∑_{j≤k}λⱼ/∑ⱼλⱼ
Adjusted eigenvalues: λᵢ × 1.4142 for enhanced interpretation
h) Canonical correlation analysis:
Partition X = [X₁' X₂']' with X₁ ∈ ℝ^2, X₂ ∈ ℝ^2
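A short simulated sketch of the canonical correlations in part (h), assuming NumPy; the partition into the first and last two coordinates follows the problem:

```python
# Canonical correlations between X1 (first two coordinates) and X2 (last two) via the
# eigenvalues of S11^{-1} S12 S22^{-1} S21.  Illustrative sketch with simulated data.
import numpy as np

rng = np.random.default_rng(10)
n, p = 48, 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p) + 0.5, size=n)

S = np.cov(X, rowvar=False)
S11, S22, S12 = S[:2, :2], S[2:, 2:], S[:2, 2:]

M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
rho2 = np.sort(np.linalg.eigvals(M).real)[::-1]          # squared canonical correlations
print("canonical correlations:", np.round(np.sqrt(np.clip(rho2, 0, 1)), 3))
```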