Statistical Inference for Cell Type Deconvolution

Abstract

Integrating heterogeneous datasets across different measurement platforms poses fundamental challenges for statistical inference. An important example is cell type deconvolution, where cell type proportions in bulk RNA-seq data are estimated using reference single-cell data from different sources, leading to platform-specific scaling effects, measurement noise, and biological heterogeneity. Existing methods often treat estimated proportions as observed in downstream analyses, potentially compromising validity when comparing multiple individuals. We introduce MEAD, a statistical framework for estimation and inference in deconvolution with externally approximated design matrices. We establish necessary and sufficient conditions for identifiability under arbitrary gene-specific cross-platform scaling differences and develop valid inferential procedures for both individual-level proportions and comparisons across individuals, accounting for gene–gene correlation and shared estimation uncertainty. Simulations and real-data analyses demonstrate competitive estimation accuracy and reliable statistical inference.
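As a rough illustration of the basic deconvolution problem only, the sketch below estimates proportions by non-negative least squares on a noise-free toy example. It is not the MEAD procedure: it assumes no cross-platform scaling, no measurement noise, and an exactly known reference matrix, which are precisely the simplifications the abstract argues against. The function name estimate_proportions and all numbers are hypothetical.

```python
def estimate_proportions(reference, bulk, n_iter=2000):
    """Toy deconvolution: minimize ||bulk - reference @ beta||^2 subject to beta >= 0
    by projected gradient descent, then rescale beta to sum to one.

    reference: list of rows, one per gene, each with one expression value per cell type
    bulk:      list of bulk expression values, one per gene
    """
    n_genes = len(reference)
    n_types = len(reference[0])
    # Step size 1 / (2 * trace(X^T X)); the trace bounds the largest eigenvalue,
    # so this never exceeds the inverse Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(x * x for row in reference for x in row))
    beta = [0.0] * n_types
    for _ in range(n_iter):
        # Residual of the current fit, gene by gene.
        resid = [
            sum(reference[g][k] * beta[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of the squared error with respect to each coefficient.
        grad = [
            2.0 * sum(reference[g][k] * resid[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        beta = [max(0.0, b - step * d) for b, d in zip(beta, grad)]
    total = sum(beta)
    return [b / total for b in beta]


# Hypothetical reference: 4 genes (rows) by 2 cell types (columns).
reference = [[10.0, 1.0], [2.0, 8.0], [5.0, 5.0], [1.0, 9.0]]
true_p = [0.3, 0.7]
# Noise-free bulk sample generated as a mixture of the reference profiles.
bulk = [sum(x * p for x, p in zip(row, true_p)) for row in reference]

p_hat = estimate_proportions(reference, bulk)
print(p_hat)
```

In this idealized setting the point estimate recovers the true mixture. The sketch returns no uncertainty measure, which is the gap the inferential procedures described in the abstract are meant to address.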

Publication
The Journal of the Royal Statistical Society, Series B, accepted
Lin Gui
Ph.D. student
Jingshu Wang
Assistant Professor in Statistics