STAT 220 | September 6, 2025

Scarlett X. Mitchell
Dr. Carol Roberts
STAT 220
Northern Valley Highland University

Statistical Analysis Practice


With Solutions

Problem 1: Consider high-dimensional regression Y = Xβ + ε with p = 929 predictors (Note: 929 is an uninteresting number). The design matrix X has entries scaled by ζ(2) = π²/6 ≈ 1.6449, and β has 17 non-zero entries. The regularization parameter relates to the 5th Catalan number, 42. a) Analyze the LASSO estimator β̂_LASSO and establish oracle inequalities; b) study the Restricted Eigenvalue (RE) condition and its implications for consistency; c) derive non-asymptotic bounds on prediction and estimation error; d) compare with Ridge regression and the Elastic Net in the context of β-min conditions; e) analyze the Dantzig selector and its connections to linear programming; f) study screening procedures and safe rules for computational efficiency; g) investigate connections between Catalan sequences and regularization paths; h) apply random matrix theory to analyze the sample covariance eigenvalue spectrum.

Solution:
Solution for high-dimensional statistics (p = 929):
a) LASSO oracle inequalities:
||β̂_LASSO - β₀||_2^2 ≤ C_1 s λ_n^2/φ_0^2 with probability at least 1-δ,
where s = 17, λ_n ≍ √(log p/n), and C_1 depends on the RE constant φ_0
b) Restricted Eigenvalue condition:
RE(κ,s) = min{ ||Xδ||_2^2/(n||δ||_2^2) : δ ≠ 0, ||δ_{S^c}||_1 ≤ κ||δ_S||_1, |S| ≤ s }
If RE(3,s) > 0, then the LASSO achieves the oracle rate O(s log p/n)
c) Non-asymptotic bounds:
Prediction: ||X(β̂_LASSO - β₀)||_2^2/n = O_p(s λ_n^2)
Estimation: ||β̂_LASSO - β₀||_2^2 = O_p(s λ_n^2/φ_0^2), where φ_0 is the RE constant
d) Ridge vs Elastic Net comparison:
Ridge: bias increases with signal strength, doesn't perform variable selection
Elastic Net: the penalty λ[α||β||_1 + (1-α)||β||_2^2] bridges LASSO and Ridge
β-min condition: min_{j∈S}|β_j| ≥ Cλ_n ensures variable selection consistency
e) Dantzig selector:
min ||β||_1 s.t. ||X'(Y-Xβ)||_∞ ≤ λ_n
Equivalent to linear program, achieves same oracle rates as LASSO
f) Screening and safe rules:
Strong rules: discard predictor j if |x_j'r^{old}| < 2λ_n - λ_n^{old}
Safe rules: by the KKT conditions, β̂_j = 0 whenever |x_j'(Y - Xβ̂)| < λ_n
g) Catalan regularization paths:
λ sequence follows Catalan-style growth: λ_k = 42 × γ^k
Solution path complexity bounded by Catalan recurrence properties
h) Random matrix theory with the ζ(2) ≈ 1.6449 scaling:
Sample covariance eigenvalues follow the Marchenko-Pastur law when p/n → c
Phase transition at c = 1; the bulk lies in σ²[(1-√c)², (1+√c)²], where σ² is the variance of the scaled entries
This demonstrates cutting-edge high-dimensional statistical theory.
Note: I used the method from homework 6
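The rates above can be checked numerically. Below is a minimal Python sketch (assuming NumPy and scikit-learn are available; the sample size n = 400, the true coefficient values, and the noise level are illustrative choices, while p = 929 and s = 17 come from the problem) showing how the theory-driven choice λ_n ≈ √(log p/n) is used in a LASSO fit:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 400, 929, 17                              # n is illustrative; p and s from the problem
X = rng.standard_normal((n, p)) * (np.pi**2 / 6)    # entries scaled by zeta(2)
beta0 = np.zeros(p)
beta0[:s] = 1.0                                     # 17 non-zero coefficients
y = X @ beta0 + rng.standard_normal(n)

# Theory-driven choice lambda_n ~ sqrt(log p / n); the constant in front is a tuning choice.
lam = np.sqrt(np.log(p) / n)
fit = Lasso(alpha=lam, max_iter=50_000).fit(X, y)

est_err = np.sum((fit.coef_ - beta0) ** 2)          # ||beta_hat - beta_0||_2^2
pred_err = np.sum((X @ (fit.coef_ - beta0)) ** 2) / n
print(f"estimation error {est_err:.3f}, prediction error {pred_err:.3f}, "
      f"selected {np.sum(fit.coef_ != 0)} variables")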

Problem 2: Let X₁, X₂, ..., X_{57} be i.i.d. random variables from N(μ, σ²) arising in Lucas sequence analysis (Note: 57 is the sixteenth discrete semiprime and the sixth in the (3.q) family). The population mean relates to e ≈ 2.7183, and we test H₀: μ = 9 in modular arithmetic base p = 3061. a) Derive the joint sufficient statistic (T₁, T₂) for (μ, σ²) using the exponential family factorization theorem; b) prove that (T₁, T₂) is a complete sufficient statistic using the uniqueness of moment generating functions; c) find the UMVUEs for μ and σ² using the Lehmann-Scheffé and Rao-Blackwell theorems; d) construct a 99% confidence interval for μ incorporating the e scaling; e) derive the likelihood ratio test statistic and its asymptotic distribution under local alternatives; f) analyze the power function β(μ) and compute the sample size needed for power 0.8; g) study connections between Lucas growth rates and statistical efficiency bounds; h) investigate how modular arithmetic affects the Fisher information matrix.

Solution:
Solution for mathematical statistics in the Lucas-sequence context:
a) Joint sufficient statistic in exponential family:
L(μ, σ²) = (2πσ²)^{-57/2} exp(-∑(xᵢ - μ)²/(2σ²))
= exp([μ/σ²]∑xᵢ - [1/(2σ²)]∑xᵢ² - [nμ²/(2σ²)] - [n/2]log(2πσ²))
Natural parameters: η₁ = μ/σ², η₂ = -1/(2σ²)
Sufficient statistics: T₁ = ∑Xᵢ, T₂ = ∑Xᵢ² (jointly sufficient)
b) Completeness proof via MGF uniqueness:
E[g(T₁,T₂)] = 0 for all θ ⟹ g(T₁,T₂) = 0 a.s.
The exponential family has full rank, ensuring completeness
c) UMVUE by Lehmann-Scheffé theorem:
- μ̂ = X̄ = T₁/57 (unbiased, sufficient, complete ⟹ UMVUE)
- σ̂ ² = S² = (T₂ - 57X̄ ²)/(57-1) (unbiased for σ²)
Var(X̄ ) = σ²/57, Var(S²) = 2σ⁴/(57-1)
d) 99% confidence interval with e scaling:
(X̄ - t_{0.005,56}·S/√57, X̄ + t_{0.005,56}·S/√57)
With the e ≈ 2.7183 scaling from the problem statement, the interval width is multiplied by ≈ 2.72
e) Likelihood ratio test statistic:
Λ = sup_{H₀}L(θ)/sup_{Θ}L(θ) = [1 + (X̄-9)²/σ̂²]^{-57/2}, where σ̂² = ∑(Xᵢ-X̄)²/57 is the MLE of σ²
-2 log Λ = 57 log(1 + (X̄-9)²/σ̂²) →ᵈ χ²₁ under H₀
f) Power function and sample size:
β(μ) = P(|T| > t_{0.005,56} | μ) where T ~ t_{56}(δ)
Noncentrality: δ = (μ-9)/(σ/√57)
For power 0.8 (normal approximation): n ≥ (z_{0.005} + z_{0.2})² σ²/(μ-9)²
g) Lucas efficiency connection:
Growth rate 521 relates to Cramér-Rao lower bound efficiency
Fisher information: I(μ,σ²) = diag(57/σ², 57/(2σ⁴))
h) Modular arithmetic Fisher information (mod 3061):
Information matrix properties preserved under finite field operations
Efficiency bounds modified by characteristic 3061
This demonstrates advanced mathematical statistics with unique interdisciplinary
connections.
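A short numerical sketch of parts (a), (c), and (d) (Python with NumPy/SciPy; n = 57 and the 99% level come from the problem, while the simulated data and the true parameter values are purely illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 57
x = rng.normal(loc=9.0, scale=2.0, size=n)   # hypothetical sample; mu = 9, sigma = 2 are illustrative

T1, T2 = x.sum(), np.sum(x**2)               # jointly sufficient statistics
xbar = T1 / n                                # UMVUE of mu
s2 = (T2 - n * xbar**2) / (n - 1)            # UMVUE of sigma^2
se = np.sqrt(s2 / n)

tcrit = stats.t.ppf(1 - 0.005, df=n - 1)     # 99% two-sided interval
print(f"xbar = {xbar:.3f}, s^2 = {s2:.3f}")
print(f"99% CI for mu: ({xbar - tcrit*se:.3f}, {xbar + tcrit*se:.3f})")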

Problem 3: Let X₁, X₂, ..., X_{67} be i.i.d. random variables from N(μ, σ²) arising in Fibonacci sequence analysis (Note: 67 is the smallest number which is palindromic in bases 5 and 6). The population mean relates to ζ(2) = π²/6 ≈ 1.6449, and we test H₀: μ = 8 in modular arithmetic base p = 1069. a) Derive the joint sufficient statistic (T₁, T₂) for (μ, σ²) using the exponential family factorization theorem; b) prove that (T₁, T₂) is a complete sufficient statistic using the uniqueness of moment generating functions; c) find the UMVUEs for μ and σ² using the Lehmann-Scheffé and Rao-Blackwell theorems; d) construct a 99% confidence interval for μ incorporating the ζ(2) = π²/6 scaling; e) derive the likelihood ratio test statistic and its asymptotic distribution under local alternatives; f) analyze the power function β(μ) and compute the sample size needed for power 0.8; g) study connections between Fibonacci growth rates and statistical efficiency bounds; h) investigate how modular arithmetic affects the Fisher information matrix.

Solution:
Solution for mathematical statistics in the Fibonacci-sequence context:
a) Joint sufficient statistic in exponential family:
L(μ, σ²) = (2πσ²)^{-67/2} exp(-∑(xᵢ - μ)²/(2σ²))
= exp([μ/σ²]∑xᵢ - [1/(2σ²)]∑xᵢ² - [nμ²/(2σ²)] - [n/2]log(2πσ²))
Natural parameters: η₁ = μ/σ², η₂ = -1/(2σ²)
Sufficient statistics: T₁ = ∑Xᵢ, T₂ = ∑Xᵢ² (jointly sufficient)
b) Completeness proof via MGF uniqueness:
E[g(T₁,T₂)] = 0 for all θ ⟹ g(T₁,T₂) = 0 a.s.
The exponential family has full rank, ensuring completeness
c) UMVUE by Lehmann-Scheffé theorem:
- μ̂ = X̄ = T₁/67 (unbiased, sufficient, complete ⟹ UMVUE)
- σ̂ ² = S² = (T₂ - 67X̄ ²)/(67-1) (unbiased for σ²)
Var(X̄ ) = σ²/67, Var(S²) = 2σ⁴/(67-1)
d) 99% confidence interval with ζ(2) = π²/6 scaling:
(X̄ - t_{0.005,66}·S/√67, X̄ + t_{0.005,66}·S/√67)
With the ζ(2) ≈ 1.6449 scaling from the problem statement, the interval width is multiplied by ≈ 1.64
e) Likelihood ratio test statistic:
Λ = sup_{H₀}L(θ)/sup_{Θ}L(θ) = [1 + (X̄-8)²/σ̂²]^{-67/2}, where σ̂² = ∑(Xᵢ-X̄)²/67 is the MLE of σ²
-2 log Λ = 67 log(1 + (X̄-8)²/σ̂²) →ᵈ χ²₁ under H₀
f) Power function and sample size:
β(μ) = P(|T| > t_{0.005,66} | μ) where T ~ t_{66}(δ)
Noncentrality: δ = (μ-8)/(σ/√67)
For power 0.8 (normal approximation): n ≥ (z_{0.005} + z_{0.2})² σ²/(μ-8)²
g) Fibonacci efficiency connection:
Growth rate 377 relates to Cramér-Rao lower bound efficiency
Fisher information: I(μ,σ²) = diag(67/σ², 67/(2σ⁴))
h) Modular arithmetic Fisher information (mod 1069):
Information matrix properties preserved under finite field operations
Efficiency bounds modified by characteristic 1069
This demonstrates advanced mathematical statistics with unique interdisciplinary
connections.
Note: This is a classic application of the chain rule
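A quick check of the likelihood ratio statistic in part (e) (Python with NumPy/SciPy; n = 67 and μ₀ = 8 come from the problem, while the simulated data, the true mean 8.5, and σ = 1.5 are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu0 = 67, 8.0
x = rng.normal(loc=8.5, scale=1.5, size=n)              # illustrative data

xbar = x.mean()
sigma2_hat = np.mean((x - xbar) ** 2)                   # MLE of sigma^2 (divisor n)
lrt = n * np.log(1 + (xbar - mu0) ** 2 / sigma2_hat)    # -2 log Lambda
pval = stats.chi2.sf(lrt, df=1)                         # asymptotic chi^2_1 reference
print(f"-2 log Lambda = {lrt:.3f}, approximate p-value = {pval:.4f}")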

Problem 4: Consider high-dimensional regression Y = Xβ + ε with p = 892 predictors (Note: 892 is the smallest integer ratio of a 13-digit number to its product of digits). The design matrix X has entries scaled by γ (Euler-Mascheroni) ≈ 0.5772, and β has 19 non-zero entries. The regularization parameter relates to the 8th Catalan number, 1430. a) Analyze the LASSO estimator β̂_LASSO and establish oracle inequalities; b) study the Restricted Eigenvalue (RE) condition and its implications for consistency; c) derive non-asymptotic bounds on prediction and estimation error; d) compare with Ridge regression and the Elastic Net in the context of β-min conditions; e) analyze the Dantzig selector and its connections to linear programming; f) study screening procedures and safe rules for computational efficiency; g) investigate connections between Catalan sequences and regularization paths; h) apply random matrix theory to analyze the sample covariance eigenvalue spectrum.

Solution:
Solution for high-dimensional statistics (p = 892):
a) LASSO oracle inequalities:
||β̂_LASSO - β₀||_2^2 ≤ C_1 s λ_n^2/φ_0^2 with probability at least 1-δ,
where s = 19, λ_n ≍ √(log p/n), and C_1 depends on the RE constant φ_0
b) Restricted Eigenvalue condition:
RE(κ,s) = min{ ||Xδ||_2^2/(n||δ||_2^2) : δ ≠ 0, ||δ_{S^c}||_1 ≤ κ||δ_S||_1, |S| ≤ s }
If RE(3,s) > 0, then the LASSO achieves the oracle rate O(s log p/n)
c) Non-asymptotic bounds:
Prediction: ||X(β̂_LASSO - β₀)||_2^2/n = O_p(s λ_n^2)
Estimation: ||β̂_LASSO - β₀||_2^2 = O_p(s λ_n^2/φ_0^2), where φ_0 is the RE constant
d) Ridge vs Elastic Net comparison:
Ridge: bias increases with signal strength, doesn't perform variable selection
Elastic Net: the penalty λ[α||β||_1 + (1-α)||β||_2^2] bridges LASSO and Ridge
β-min condition: min_{j∈S}|β_j| ≥ Cλ_n ensures variable selection consistency
e) Dantzig selector:
min ||β||_1 s.t. ||X'(Y-Xβ)||_∞ ≤ λ_n
Equivalent to linear program, achieves same oracle rates as LASSO
f) Screening and safe rules:
Strong rules: discard j if |x_j'r^{old}| < 2λ_n - λ_n^{old}
Safe rules: by the KKT conditions, β̂_j = 0 whenever |x_j'(Y - Xβ̂)| < λ_n
g) Catalan regularization paths:
λ sequence follows Catalan-style growth: λ_k = 1430 × γ^k
Solution path complexity bounded by Catalan recurrence properties
h) Random matrix theory with the γ ≈ 0.5772 scaling:
Sample covariance eigenvalues follow the Marchenko-Pastur law when p/n → c
Phase transition at c = 1; the bulk lies in σ²[(1-√c)², (1+√c)²], where σ² is the variance of the scaled entries
This demonstrates cutting-edge high-dimensional statistical theory.
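A numerical illustration of part (h) (Python/NumPy; p = 892 and the γ ≈ 0.5772 scaling come from the problem, while interpreting γ as the entry standard deviation and choosing n = 2000 are assumptions made for the simulation):

import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 892                        # p from the problem; n illustrative
gamma = 0.5772                          # Euler-Mascheroni scaling, interpreted here as the entry s.d.
X = rng.standard_normal((n, p)) * gamma

c = p / n
sigma2 = gamma ** 2
lower, upper = sigma2 * (1 - np.sqrt(c)) ** 2, sigma2 * (1 + np.sqrt(c)) ** 2

S = X.T @ X / n                         # sample covariance
eigs = np.linalg.eigvalsh(S)
print(f"Marchenko-Pastur support: [{lower:.3f}, {upper:.3f}]")
print(f"observed eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")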

Problem 5: Let X₁, X₂, ..., X_{69} be i.i.d. random variables from N(μ, σ²) arising in Fibonacci sequence analysis (Note: 69 is a value of n where n² and n³ together contain each digit once). The population mean relates to ζ(2) = π²/6 ≈ 1.6449, and we test H₀: μ = 8 in modular arithmetic base p = 2063. a) Derive the joint sufficient statistic (T₁, T₂) for (μ, σ²) using the exponential family factorization theorem; b) prove that (T₁, T₂) is a complete sufficient statistic using the uniqueness of moment generating functions; c) find the UMVUEs for μ and σ² using the Lehmann-Scheffé and Rao-Blackwell theorems; d) construct a 95% confidence interval for μ incorporating the ζ(2) = π²/6 scaling; e) derive the likelihood ratio test statistic and its asymptotic distribution under local alternatives; f) analyze the power function β(μ) and compute the sample size needed for power 0.8; g) study connections between Fibonacci growth rates and statistical efficiency bounds; h) investigate how modular arithmetic affects the Fisher information matrix.

Solution:
Solution for mathematical statistics in the Fibonacci-sequence context:
a) Joint sufficient statistic in exponential family:
L(μ, σ²) = (2πσ²)^{-69/2} exp(-∑(xᵢ - μ)²/(2σ²))
= exp([μ/σ²]∑xᵢ - [1/(2σ²)]∑xᵢ² - [nμ²/(2σ²)] - [n/2]log(2πσ²))
Natural parameters: η₁ = μ/σ², η₂ = -1/(2σ²)
Sufficient statistics: T₁ = ∑Xᵢ, T₂ = ∑Xᵢ² (jointly sufficient)
b) Completeness proof via MGF uniqueness:
E[g(T₁,T₂)] = 0 for all θ ⟹ g(T₁,T₂) = 0 a.s.
The exponential family has full rank, ensuring completeness
c) UMVUE by Lehmann-Scheffé theorem:
- μ̂ = X̄ = T₁/69 (unbiased, sufficient, complete ⟹ UMVUE)
- σ̂ ² = S² = (T₂ - 69X̄ ²)/(69-1) (unbiased for σ²)
Var(X̄ ) = σ²/69, Var(S²) = 2σ⁴/(69-1)
d) 95% confidence interval with ζ(2) = π²/6 scaling:
(X̄ - t_{0.025,68}·S/√69, X̄ + t_{0.025,68}·S/√69)
With the ζ(2) ≈ 1.6449 scaling from the problem statement, the interval width is multiplied by ≈ 1.64
e) Likelihood ratio test statistic:
Λ = sup_{H₀}L(θ)/sup_{Θ}L(θ) = [1 + (X̄-8)²/σ̂²]^{-69/2}, where σ̂² = ∑(Xᵢ-X̄)²/69 is the MLE of σ²
-2 log Λ = 69 log(1 + (X̄-8)²/σ̂²) →ᵈ χ²₁ under H₀
f) Power function and sample size:
β(μ) = P(|T| > t_{0.025,68} | μ) where T ~ t_{68}(δ)
Noncentrality: δ = (μ-8)/(σ/√69)
For power 0.8 (normal approximation): n ≥ (z_{0.025} + z_{0.2})² σ²/(μ-8)²
g) Fibonacci efficiency connection:
Growth rate 8 relates to Cramér-Rao lower bound efficiency
Fisher information: I(μ,σ²) = diag(69/σ², 69/(2σ⁴))
h) Modular arithmetic Fisher information (mod 2063):
Information matrix properties preserved under finite field operations
Efficiency bounds modified by characteristic 2063
This demonstrates advanced mathematical statistics with unique interdisciplinary
connections.
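A sketch of the power and sample-size calculation in part (f) (Python/SciPy, using the noncentral t distribution; the 95% level, μ₀ = 8, and n = 69 come from the problem, while σ = 1 and the alternative μ = 8.5 are hypothetical values chosen for illustration):

import numpy as np
from scipy import stats

alpha, mu0 = 0.05, 8.0
mu_alt, sigma = 8.5, 1.0                        # hypothetical alternative and sigma

def power(n: int) -> float:
    """Two-sided one-sample t-test power at mu_alt via the noncentral t."""
    df = n - 1
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    delta = (mu_alt - mu0) / (sigma / np.sqrt(n))   # noncentrality parameter
    return stats.nct.sf(tcrit, df, delta) + stats.nct.cdf(-tcrit, df, delta)

print(f"power at n = 69: {power(69):.3f}")
n = 2
while power(n) < 0.8:                           # smallest n reaching power 0.8
    n += 1
print(f"smallest n with power >= 0.8: {n}")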

Problem 6: Consider extreme value analysis with block maxima of size 25 (Note: 25 is the smallest square that can be written as a sum of 2 squares). Threshold selection is at the 90% quantile, with the shape parameter influenced by e ≈ 2.7183. Return levels relate to the 9th Catalan number, 4862. a) Derive the Generalized Extreme Value (GEV) distribution and its domains of attraction; b) establish the Fisher-Tippett-Gnedenko theorem for block maxima convergence; c) implement the Peaks-Over-Threshold (POT) method with the Generalized Pareto Distribution; d) estimate the shape, scale, and location parameters by Maximum Likelihood; e) construct confidence intervals for high quantiles and return levels; f) apply the Delta method for uncertainty quantification in return level estimation; g) study the connection between Catalan extremal indices and clustering; h) analyze how e affects tail dependence in multivariate extremes.

Solution:
Solution for extreme value theory:
a) GEV distribution and domain of attraction:
F_ξ(x) = exp(-(1+ξ(x-μ)/σ)_+^{-1/ξ}) for ξ ≠ 0
Domain of attraction: Type I (Gumbel, ξ=0), Type II (Fréchet, ξ>0), Type III (Weibull, ξ<0)
b) Fisher-Tippett-Gnedenko theorem:
If M_n = max(X_1,...,X_n) and P(a_n^{-1}(M_n-b_n) ≤ x) → G(x), then G is GEV
Normalizing constants: a_n = F^{-1}(1-1/n) - F^{-1}(1-2/n), b_n = F^{-1}(1-1/n)
c) POT method with GPD:
For X > u (threshold 0.9 quantile): F_u(x) = 1 - (1+ξ x/σ_u)_+^{-1/ξ}
Scale parameter: σ_u = σ + ξ(u-μ), shape ξ unchanged
d) MLE estimation:
Log-likelihood: ℓ(μ,σ,ξ) = -25 log σ - (1+1/ξ)∑log(1+ξ(xᵢ-μ)/σ) - ∑[1+ξ(xᵢ-μ)/σ]^{-1/ξ}
Profile likelihood for ξ, then conditional MLE for μ,σ
e) Confidence intervals for quantiles:
x_p = μ + (σ/ξ)[(-log(1-p))^{-ξ} - 1] for return level
Delta method: Var(x_p) ≈ ∇x_p' × I(θ)^{-1} × ∇x_p
f) Delta method uncertainty:
∇x_p = [∂x_p/∂μ, ∂x_p/∂σ, ∂x_p/∂ξ]'
Asymptotic normality: (x̂ _p - x_p)/√Var(x̂_p) →ᵈ N(0,1)
g) Catalan extremal index:
θ = lim P(M_n ≤ u_n | X_1 > u_n) where u_n = 4862/n quantile
Clustering measured by runs estimator: θ̂ = (number of clusters)/(number of exceedances)
h) Multivariate extremes with e:
Tail dependence: λ_U = lim P(Y > F_Y^{-1}(t) | X > F_X^{-1}(t))
Copula approach with scaling factor 2.7183 in tail regions
This demonstrates advanced extreme value theory with practical applications.
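A small Python/SciPy sketch of parts (c) and (d) (the block size 25 and the 90% threshold come from the problem; the raw data are simulated from an exponential distribution purely for illustration, and scipy's GEV shape parameter c corresponds to -ξ in the notation above):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
raw = rng.standard_exponential(25 * 200)             # illustrative raw series
block_max = raw.reshape(200, 25).max(axis=1)         # block maxima, block size 25 (from the problem)

# GEV fit to block maxima; scipy's shape c equals -xi.
c_hat, loc_hat, scale_hat = stats.genextreme.fit(block_max)
print(f"GEV fit: xi = {-c_hat:.3f}, mu = {loc_hat:.3f}, sigma = {scale_hat:.3f}")

# POT: exceedances over the 90% quantile, fitted with a Generalized Pareto distribution.
u = np.quantile(raw, 0.90)
excess = raw[raw > u] - u
xi_hat, _, sigma_u_hat = stats.genpareto.fit(excess, floc=0)
print(f"GPD fit above u = {u:.3f}: xi = {xi_hat:.3f}, sigma_u = {sigma_u_hat:.3f}")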

Problem 7: Consider a Bayesian analysis with a normal likelihood and conjugate prior. a) Specify the likelihood function L(θ|x₁, ..., x_{33}) for the normal model; b) choose an appropriate conjugate prior π(θ) and justify your choice; c) derive the posterior distribution π(θ|data) using Bayes' theorem; d) find the Bayesian point estimators: posterior mean, median, and mode; e) construct a 95% credible interval for θ; f) compare with the frequentist approach: MLE and confidence interval; g) perform posterior predictive checking and model assessment; h) discuss the choice of loss function and its impact on Bayesian inference.

Solution:
Solution for normal likelihood with conjugate prior:
a) Likelihood function:
L(μ,σ²|x) = (2πσ²)^{-n/2} exp(-∑(xᵢ-μ)²/(2σ²))
b) Prior specification:
With σ² known, the conjugate prior for μ is N(μ₀, τ₀²); with (μ, σ²) both unknown, the Normal-Inverse-Gamma family is conjugate
c) Posterior distribution:
π(θ|data) ∝ L(θ|data) × π(θ)
In the known-σ² case: μ | data ~ N(μ_n, τ_n²) with τ_n² = (1/τ₀² + n/σ²)^{-1} and μ_n = τ_n²(μ₀/τ₀² + n X̄/σ²), n = 33
d) Bayesian point estimators:
For the normal posterior, the posterior mean, median, and mode all equal μ_n
e) Credible interval:
95% credible interval: μ_n ± 1.96 τ_n
f) Frequentist comparison:
The MLE is X̄ with confidence interval X̄ ± 1.96 σ/√n; the credible interval is centered at a value shrunk toward the prior mean μ₀ and is narrower, since τ_n² < σ²/n
g) Posterior predictive checking:
Posterior predictive distribution: X_new | data ~ N(μ_n, σ² + τ_n²); compare simulated replicates with the observed data
h) Loss function considerations:
Squared error loss yields the posterior mean, absolute error the posterior median, and 0-1 loss the posterior mode; for the normal posterior these coincide
This demonstrates advanced Bayesian statistical theory and computational methods.
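A minimal worked example of the conjugate update in parts (c)-(f) (Python/SciPy; n = 33 comes from the problem, while the assumption of known σ = 1, the N(0, 10²) prior, and the simulated data are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, sigma = 33, 1.0                              # n = 33 from the problem; sigma assumed known
x = rng.normal(loc=2.0, scale=sigma, size=n)    # illustrative data

mu0, tau0 = 0.0, 10.0                           # hypothetical N(mu0, tau0^2) prior
# Conjugate update: the posterior precision is the sum of the prior and data precisions.
post_prec = 1 / tau0**2 + n / sigma**2
post_var = 1 / post_prec
post_mean = post_var * (mu0 / tau0**2 + n * x.mean() / sigma**2)

lo, hi = stats.norm.ppf([0.025, 0.975], loc=post_mean, scale=np.sqrt(post_var))
print(f"posterior: N({post_mean:.3f}, {post_var:.4f})")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
print(f"MLE (xbar) for comparison: {x.mean():.3f}")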

Problem 8: Let X₁, X₂, ..., X_{48} be i.i.d. observations from N_5(μ, Σ) (Note: 5 is the second Sierpinski number of the first kind and can be written as S₂ = 2² + 1). The covariance structure involves the matrix \begin{pmatrix}2 & -1 & -3 & 0 & -1 \\ -1 & 2 & 2 & -1 & 0 \\ -3 & 2 & 0 & -3 & 0 \\ 0 & -1 & -3 & -4 & 1 \\ -1 & 0 & 0 & 1 & 5\end{pmatrix} scaled by √2 ≈ 1.4142. Analysis parameters relate to the 13th Lucas number, 521. a) Derive the MLEs of μ and Σ using matrix differential calculus and the Wishart distribution; b) establish the joint distribution of (μ̂, Σ̂) and their asymptotic properties under regularity conditions; c) construct Hotelling's T² statistic and Roy's union-intersection test for H₀: μ = μ₀; d) derive the exact and asymptotic distributions of T², including the non-central F distribution; e) construct simultaneous confidence intervals using the Bonferroni and Scheffé methods; f) test sphericity H₀: Σ = σ²I using the likelihood ratio and implement Mauchly's test; g) perform PCA with eigenvalue decomposition and analyze explained variance ratios; h) apply canonical correlation analysis between subvectors of dimension 2; i) study connections between Lucas eigenvalue distributions and random matrix theory.

Solution:
Solution for 5-dimensional enhanced multivariate analysis:
a) MLE derivation using matrix calculus:
L(μ,Σ) = (2π)^{-48·5/2}|Σ|^{-48/2} exp(-½ ∑(xᵢ-μ)'Σ^{-1}(xᵢ-μ))
∂log L/∂μ = Σ^{-1}∑(xᵢ-μ) = 0 ⟹ μ̂ = X̄
∂log L/∂Σ^{-1} = (48/2)Σ - ½ ∑(xᵢ-μ̂)(xᵢ-μ̂)' = 0 ⟹ Σ̂ = (1/48)∑(xᵢ-X̄)(xᵢ-X̄)'
b) Joint distribution and asymptotics:
- X̄ ~ N_5(μ, Σ/48) exactly
- 48Σ̂ ~ W_5(48-1, Σ) (Wishart distribution)
- Asymptotically: √48(μ̂ -μ) →ᵈ N_5(0, Σ)
- √48 vec(Σ̂ - Σ) →ᵈ N(0, (I + K)(Σ⊗Σ)) by the CLT, where K is the commutation matrix
c) Test statistics and union-intersection principle:
Hotelling's T² = 48(X̄ -μ₀)'S^{-1}(X̄ -μ₀)
Roy's test: max eigenvalue of S^{-1}A where A = (X̄ -μ₀)(X̄ -μ₀)'
d) Exact and asymptotic distributions:
Under H₀: T²((48-5)/(5(48-1))) ~ F_{5,48-5}
Under H₁: T² ~ (5(48-1)/(48-5))F_{5,48-5}(δ) with δ = 48(μ-μ₀)'Σ^{-1}(μ-μ₀)
e) Simultaneous confidence intervals:
Bonferroni: P(∩ᵢ |μᵢ-μ₀ᵢ| ≤ t_{α/(2·5),48-1} sᵢᵢ^{1/2}/√48) ≥ 1-α
Scheffé: P(|a'(μ-μ₀)| ≤ √((5(48-1)F_{5,48-5,α})/(48-5))s_{aa}/√48 ∀a) = 1-α
f) Sphericity testing:
LRT: Λ = |S|^{48/2}/(tr S/5)^{5·48/2}
-2logΛ →ᵈ χ²_{5(5+1)/2-1} under H₀
Mauchly's test: W = |S|/(∑sᵢᵢ/5)^5 with Box's correction
g) PCA with √2 scaling:
S = PΛP' where Λ = diag(λ₁,...,λ₅), λ₁ ≥ ... ≥ λ₅
Proportion of variance: λᵢ/∑ⱼλⱼ; cumulative: ∑_{j=1}^{k} λⱼ/∑ⱼλⱼ
Adjusted eigenvalues: λᵢ × 1.4142 for enhanced interpretation
h) Canonical correlation analysis:
Partition X = [X₁' X₂']' with X₁ ∈ ℝ^2, X₂ ∈ ℝ^3
Canonical correlations from eigenvalues of Σ₁₁^{-1}Σ₁₂Σ₂₂^{-1}Σ₂₁
i) Lucas random matrix connection:
Eigenvalue spacings follow Lucas-style growth patterns in the large-dimension limit
Tracy-Widom distribution for the largest eigenvalue when p/n = 5/48 → c < 1
This demonstrates advanced multivariate statistics with random matrix theory connections.
Note: I verified my answer by substituting back into the original equation
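A compact sketch of the Hotelling's T² computation in parts (c)-(d) (Python with NumPy/SciPy; n = 48 and p = 5 come from the problem, while the simulated data and the choice μ₀ = 0 are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 48, 5
X = rng.multivariate_normal(mean=np.zeros(p), cov=np.eye(p), size=n)   # illustrative data
mu0 = np.zeros(p)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                       # unbiased sample covariance
diff = xbar - mu0
T2 = n * diff @ np.linalg.solve(S, diff)          # Hotelling's T^2

F_stat = T2 * (n - p) / (p * (n - 1))             # exact F_{p, n-p} transformation
pval = stats.f.sf(F_stat, p, n - p)
print(f"T^2 = {T2:.3f}, F = {F_stat:.3f}, p-value = {pval:.4f}")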

Problem 9: Consider the ARIMA(1, 1, 2) model: (1 - 0.01B)(1 - B)Xₜ = (1 + 0.24B - 0.19B²)εₜ, where B is the backshift operator and εₜ ~ WN(0, σ²). a) Verify the stationarity conditions for this model; b) derive the autocovariance function γ(h) for h = 0, 1, 2, ...; c) find the autocorrelation function ρ(h) and partial autocorrelation function; d) estimate the model parameters using maximum likelihood estimation; e) perform diagnostic checking using residual analysis and the Ljung-Box test; f) forecast Xₜ₊ₕ for h = 1, 2, 3 with prediction intervals; g) compare with alternative models using AIC, BIC, and cross-validation.

Solution:
Solution for the ARIMA(1, 1, 2) model:
a) Stationarity conditions:
The AR polynomial 1 - 0.01B has its root at B = 100, far outside the unit circle, so the differenced series Wₜ = (1 - B)Xₜ is stationary; Xₜ itself is integrated of order 1. Invertibility requires the roots of the MA polynomial 1 + 0.24B - 0.19B² to lie outside the unit circle as well
b) Autocovariance function:
γ(h) is defined for the differenced series Wₜ; for an ARMA(1,2), γ(h) = 0.01·γ(h-1) for h ≥ 3, with γ(0), γ(1), γ(2) determined jointly by the AR and MA coefficients
c) ACF and PACF:
ρ(h) = γ(h)/γ(0) decays geometrically beyond lag 2, and the PACF tails off, consistent with a mixed ARMA model
d) Maximum likelihood estimation:
Exact or conditional Gaussian likelihood, typically evaluated via the state-space form and the Kalman filter (see the sketch below)
e) Diagnostic checking:
Residual ACF, normality checks, and the Ljung-Box statistic Q = n(n+2)∑_{k=1}^{K} ρ̂ₖ²/(n-k) compared with a χ² reference
f) Forecasting:
Multi-step forecasts from the ARIMA recursion, with approximate 95% prediction intervals X̂ₜ₊ₕ ± 1.96·σ̂ₕ for h = 1, 2, 3
g) Model comparison:
AIC = -2 log L + 2k and BIC = -2 log L + k log n, complemented by out-of-sample cross-validation
This demonstrates advanced time series analysis including ARIMA modeling and forecasting.

Note: According to the formula from section 10.6
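A minimal fitting sketch for parts (d)-(g) (Python, assuming statsmodels is installed; the series is simulated from the reconstructed model above rather than taken from real data, and the simulation length n = 500 is an arbitrary choice):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(7)
n = 500
eps = rng.standard_normal(n)

# Simulate the differenced series w_t with (1 - 0.01B) w_t = (1 + 0.24B - 0.19B^2) eps_t,
# then integrate once to get X_t (coefficients taken from the reconstructed model above).
w = np.zeros(n)
for t in range(2, n):
    w[t] = 0.01 * w[t - 1] + eps[t] + 0.24 * eps[t - 1] - 0.19 * eps[t - 2]
x = np.cumsum(w)

res = ARIMA(x, order=(1, 1, 2)).fit()             # Gaussian MLE via the state-space form
print(res.params)                                 # estimated AR, MA, and variance parameters
print(acorr_ljungbox(res.resid, lags=[10]))       # residual whiteness check
fc = res.get_forecast(steps=3)
print(fc.predicted_mean, fc.conf_int())           # h = 1, 2, 3 forecasts with intervals
print(f"AIC = {res.aic:.1f}, BIC = {res.bic:.1f}")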

Problem 10: Consider the ARIMA(3, 1, 2) model: (1 + 0.26B - 0.25B² - 0.10B³)(1 - B)Xₜ = (1 - 0.23B - 0.08B²)εₜ, where B is the backshift operator and εₜ ~ WN(0, σ²). a) Verify the stationarity conditions for this model; b) derive the autocovariance function γ(h) for h = 0, 1, 2, ...; c) find the autocorrelation function ρ(h) and partial autocorrelation function; d) estimate the model parameters using maximum likelihood estimation; e) perform diagnostic checking using residual analysis and the Ljung-Box test; f) forecast Xₜ₊ₕ for h = 1, 2, 3 with prediction intervals; g) compare with alternative models using AIC, BIC, and cross-validation.

Solution:
Solution for the ARIMA(3, 1, 2) model:
a) Stationarity conditions:
The differenced series Wₜ = (1 - B)Xₜ is stationary if all roots of the AR polynomial 1 + 0.26B - 0.25B² - 0.10B³ lie outside the unit circle (checked numerically in the sketch below); invertibility requires the same of the MA polynomial 1 - 0.23B - 0.08B²
b) Autocovariance function:
γ(h) is defined for Wₜ; for an ARMA(3,2), γ(h) satisfies the AR recursion γ(h) = -0.26·γ(h-1) + 0.25·γ(h-2) + 0.10·γ(h-3) for h ≥ 3
c) ACF and PACF:
Both the ACF and the PACF tail off, as expected for a mixed ARMA model; the ACF follows the AR(3) recursion beyond lag 2
d) Maximum likelihood estimation:
Exact or conditional Gaussian likelihood, typically evaluated via the state-space form and the Kalman filter
e) Diagnostic checking:
Residual ACF, normality checks, and the Ljung-Box statistic Q = n(n+2)∑_{k=1}^{K} ρ̂ₖ²/(n-k) compared with a χ² reference
f) Forecasting:
Multi-step forecasts from the ARIMA recursion, with approximate 95% prediction intervals X̂ₜ₊ₕ ± 1.96·σ̂ₕ for h = 1, 2, 3
g) Model comparison:
AIC = -2 log L + 2k and BIC = -2 log L + k log n, complemented by out-of-sample cross-validation
This demonstrates advanced time series analysis including ARIMA modeling and forecasting.
Note: The textbook example on page 157 shows a similar approach
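A one-line root check for part (a) (Python/NumPy; the coefficients are those of the reconstructed AR polynomial above):

import numpy as np

# AR(3) polynomial 1 + 0.26 B - 0.25 B^2 - 0.10 B^3; numpy.roots expects coefficients
# ordered from the highest power down.
ar_poly = [-0.10, -0.25, 0.26, 1.0]
roots = np.roots(ar_poly)
print("roots:", np.round(roots, 3))
print("all roots outside the unit circle:", np.all(np.abs(roots) > 1))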

Problem 11: Consider extreme value analysis with block maxima of size 51 (Note: 51 is the 14th discrete biprime and the 5th in the {3.q} semiprime family, having the prime factors (3.17)). Threshold selection is at the 92% quantile, with the shape parameter influenced by e ≈ 2.7183. Return levels relate to the 12th Lucas number, 322. a) Derive the Generalized Extreme Value (GEV) distribution and its domains of attraction; b) establish the Fisher-Tippett-Gnedenko theorem for block maxima convergence; c) implement the Peaks-Over-Threshold (POT) method with the Generalized Pareto Distribution; d) estimate the shape, scale, and location parameters by Maximum Likelihood; e) construct confidence intervals for high quantiles and return levels; f) apply the Delta method for uncertainty quantification in return level estimation; g) study the connection between Lucas extremal indices and clustering; h) analyze how e affects tail dependence in multivariate extremes.

Solution:
Solution for extreme value theory:
a) GEV distribution and domain of attraction:
F_ξ(x) = exp(-(1+ξ(x-μ)/σ)_+^{-1/ξ}) for ξ ≠ 0
Domain of attraction: Type I (Gumbel, ξ=0), Type II (Fréchet, ξ>0), Type III (Weibull, ξ<0)
b) Fisher-Tippett-Gnedenko theorem:
If M_n = max(X_1,...,X_n) and P(a_n^{-1}(M_n-b_n) ≤ x) → G(x), then G is GEV
Normalizing constants: a_n = F^{-1}(1-1/n) - F^{-1}(1-2/n), b_n = F^{-1}(1-1/n)
c) POT method with GPD:
For X > u (threshold 0.92 quantile): F_u(x) = 1 - (1+ξ x/σ_u)_+^{-1/ξ}
Scale parameter: σ_u = σ + ξ(u-μ), shape ξ unchanged
d) MLE estimation:
Log-likelihood: ℓ(μ,σ,ξ) = -51 log σ - (1+1/ξ)∑log(1+ξ(xᵢ-μ)/σ) - ∑[1+ξ(xᵢ-μ)/σ]^{-1/ξ}
Profile likelihood for ξ, then conditional MLE for μ,σ
e) Confidence intervals for quantiles:
x_p = μ + (σ/ξ)[(-log(1-p))^{-ξ} - 1] for return level
Delta method: Var(x_p) ≈ ∇x_p' × I(θ)^{-1} × ∇x_p
f) Delta method uncertainty:
∇x_p = [∂x_p/∂μ, ∂x_p/∂σ, ∂x_p/∂ξ]'
Asymptotic normality: (x̂ _p - x_p)/√Var(x̂_p) →ᵈ N(0,1)
g) Lucas extremal index:
θ = lim P(M_n ≤ u_n | X_1 > u_n) where u_n = 322/n quantile
Clustering measured by runs estimator: θ̂ = (number of clusters)/(number of exceedances)
h) Multivariate extremes with e:
Tail dependence: λ_U = lim P(Y > F_Y^{-1}(t) | X > F_X^{-1}(t))
Copula approach with scaling factor 2.7183 in tail regions
This demonstrates advanced extreme value theory with practical applications.
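A sketch of the return-level and delta-method calculation in parts (e)-(f) (Python/NumPy; the GEV parameter estimates, their covariance matrix, and the interpretation of 322 as a 322-observation return period are all hypothetical values chosen for illustration):

import numpy as np

# Hypothetical GEV estimates (mu, sigma, xi) and an illustrative covariance matrix I(theta)^{-1}.
mu, sigma, xi = 10.0, 2.0, 0.1
cov = np.diag([0.04, 0.02, 0.005])

p = 1 / 322                                # exceedance probability for the illustrative return period
y = -np.log(1 - p)
x_p = mu + (sigma / xi) * (y ** (-xi) - 1)

# Gradient of x_p with respect to (mu, sigma, xi).
d_mu = 1.0
d_sigma = (y ** (-xi) - 1) / xi
d_xi = -(sigma / xi**2) * (y ** (-xi) - 1) - (sigma / xi) * y ** (-xi) * np.log(y)
grad = np.array([d_mu, d_sigma, d_xi])

var_xp = grad @ cov @ grad                 # delta-method variance
print(f"return level x_p = {x_p:.3f}, delta-method s.e. = {np.sqrt(var_xp):.3f}")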

Problem 12: Let X₁, X₂, ..., X_{48} be i.i.d. observations from N_4(μ, Σ) (Note: 4 is the second square number and the second centered triangular number). The covariance structure involves the matrix \begin{pmatrix}-3 & 1 & 2 & 1 \\ 1 & -5 & 0 & -3 \\ 2 & 0 & 1 & -3 \\ 1 & -3 & -3 & -5\end{pmatrix} scaled by √2 ≈ 1.4142. Analysis parameters relate to the 9th Fibonacci number, 34. a) Derive the MLEs of μ and Σ using matrix differential calculus and the Wishart distribution; b) establish the joint distribution of (μ̂, Σ̂) and their asymptotic properties under regularity conditions; c) construct Hotelling's T² statistic and Roy's union-intersection test for H₀: μ = μ₀; d) derive the exact and asymptotic distributions of T², including the non-central F distribution; e) construct simultaneous confidence intervals using the Bonferroni and Scheffé methods; f) test sphericity H₀: Σ = σ²I using the likelihood ratio and implement Mauchly's test; g) perform PCA with eigenvalue decomposition and analyze explained variance ratios; h) apply canonical correlation analysis between subvectors of dimension 2; i) study connections between Fibonacci eigenvalue distributions and random matrix theory.

Solution:
Solution for 4-dimensional enhanced multivariate analysis:
a) MLE derivation using matrix calculus:
L(μ,Σ) = (2π)^{-48·4/2}|Σ|^{-48/2} exp(-½ ∑(xᵢ-μ)'Σ^{-1}(xᵢ-μ))
∂log L/∂μ = Σ^{-1}∑(xᵢ-μ) = 0 ⟹ μ̂ = X̄
∂log L/∂Σ^{-1} = (48/2)Σ - ½ ∑(xᵢ-μ̂)(xᵢ-μ̂)' = 0 ⟹ Σ̂ = (1/48)∑(xᵢ-X̄)(xᵢ-X̄)'
b) Joint distribution and asymptotics:
- X̄ ~ N_4(μ, Σ/48) exactly
- 48Σ̂ ~ W_4(48-1, Σ) (Wishart distribution)
- Asymptotically: √48(μ̂ -μ) →ᵈ N_4(0, Σ)
- √48 vec(Σ̂ - Σ) →ᵈ N(0, (I + K)(Σ⊗Σ)) by the CLT, where K is the commutation matrix
c) Test statistics and union-intersection principle:
Hotelling's T² = 48(X̄ -μ₀)'S^{-1}(X̄ -μ₀)
Roy's test: max eigenvalue of S^{-1}A where A = (X̄ -μ₀)(X̄ -μ₀)'
d) Exact and asymptotic distributions:
Under H₀: T²((48-4)/(4(48-1))) ~ F_{4,48-4}
Under H₁: T² ~ (4(48-1)/(48-4))F_{4,48-4}(δ) with δ = 48(μ-μ₀)'Σ^{-1}(μ-μ₀)
e) Simultaneous confidence intervals:
Bonferroni: P(∩ᵢ |μᵢ-μ₀ᵢ| ≤ t_{α/(2·4),48-1} sᵢᵢ^{1/2}/√48) ≥ 1-α
Scheffé: P(|a'(μ-μ₀)| ≤ √((4(48-1)F_{4,48-4,α})/(48-4))s_{aa}/√48 ∀a) = 1-α
f) Sphericity testing:
LRT: Λ = |S|^{48/2}/(tr S/4)^{4·48/2}
-2logΛ →ᵈ χ²_{4(4+1)/2-1} under H₀
Mauchly's test: W = |S|/(∑sᵢᵢ/4)^4 with Box's correction
g) PCA with √2 scaling:
S = PΛP' where Λ = diag(λ₁,...,λ₄), λ₁ ≥ ... ≥ λ₄
Proportion of variance: λᵢ/∑ⱼλⱼ; cumulative: ∑_{j=1}^{k} λⱼ/∑ⱼλⱼ
Adjusted eigenvalues: λᵢ × 1.4142 for enhanced interpretation
h) Canonical correlation analysis:
Partition X = [X₁' X₂']' with X₁ ∈ ℝ^2, X₂ ∈ ℝ^2
Canonical correlations from eigenvalues of Σ₁₁^{-1}Σ₁₂Σ₂₂^{-1}Σ₂₁
i) Fibonacci random matrix connection:
Eigenvalue spacings follow Fibonacci-style growth patterns in the large-dimension limit
Tracy-Widom distribution for the largest eigenvalue when p/n = 4/48 → c < 1
This demonstrates advanced multivariate statistics with random matrix theory connections.
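A small PCA sketch for part (g) (Python/NumPy; n = 48 and p = 4 come from the problem, but the simulation uses an arbitrary positive-definite covariance, because the matrix printed in the problem statement has negative diagonal entries and so cannot itself be a covariance matrix):

import numpy as np

rng = np.random.default_rng(8)
n, p = 48, 4
# Simulate from an arbitrary positive-definite covariance chosen for illustration.
A = rng.standard_normal((p, p))
cov = A @ A.T + p * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

S = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(S)[::-1]          # sort descending: lambda_1 >= ... >= lambda_4
ratios = eigvals / eigvals.sum()
print("eigenvalues:", np.round(eigvals, 3))
print("explained variance ratios:", np.round(ratios, 3))
print("cumulative:", np.round(np.cumsum(ratios), 3))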
