STAT 32950 Multivariate Statistical Analysis: Applications and Techniques
Description
This course introduces statistical methods for analyzing, modeling, and interpreting multivariate and high-dimensional data. The focus is on understanding dependence structure, reducing dimensionality, uncovering latent patterns, and building predictive models. Topics include principal component analysis, factor models, canonical correlation analysis, clustering and mixture models, regularized regression (ridge and lasso), sparse methods, covariance estimation, and tree-based methods such as random forests.
Emphasis is placed on geometric intuition, computational implementation, and practical modeling considerations rather than classical distribution theory. Students will gain experience applying these methods to real datasets and comparing linear and nonlinear approaches in modern data analysis settings.
Suggested reading:
Applied Multivariate Statistical Analysis by Johnson, R. and Wichern, D. (2007).
An Introduction to Statistical Learning with Applications in R by James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021).
Grading
- Homework assignments: 35%
  - There will be 3 assignments in total.
  - Late homework will not be accepted for grading, except for medical emergencies with proof.
  - Homework must be submitted through Gradescope and is due at 11:59 pm on the due date.
- Midterm: 25%
- Final: 30%
- Group project: 10%