Statistical Inference for Cell Type Deconvolution

Feb 25, 2026

Abstract

Integrating heterogeneous datasets across different measurement platforms poses fundamental challenges for statistical inference. An important example is cell type deconvolution, where cell type proportions in bulk RNA-seq data are estimated using reference single-cell data from different sources, which introduces platform-specific scaling effects, measurement noise, and biological heterogeneity. Existing methods often treat estimated proportions as observed in downstream analyses, potentially compromising validity when comparing multiple individuals. We introduce MEAD, a statistical framework for estimation and inference in deconvolution with externally approximated design matrices. We establish necessary and sufficient conditions for identifiability under arbitrary gene-specific cross-platform scaling differences and develop valid inferential procedures for both individual-level proportions and comparisons across individuals, accounting for gene–gene correlation and shared estimation uncertainty. Simulations and real-data analyses demonstrate competitive estimation accuracy and reliable statistical inference.

Type

Journal article

Publication

The Journal of the Royal Statistical Society, Series B, accepted

Lin Gui

Ph.D. student

Jingshu Wang

Assistant Professor in Statistics