Latent Unknown Clustering with Integrated Data

LUCID estimates latent unknown clusters from multiple types of omics data and phenotypic traits measured at the individual level. Faced with high-dimensional data, alternative approaches attempt to explore all possible direct and indirect effects of the omic features within a mediation framework. With multiple high-dimensional data types, the large number of resulting effect estimates makes interpretation challenging. Potential solutions include data reduction techniques, such as principal components or feature selection algorithms. However, for these approaches it is often unclear how to interpret or combine multiple types of omics data, or how to select features from the high-dimensional integrated data. We are developing a statistical approach, LUCID, that attempts to unify these ideas in a latent variable analysis: it estimates subgroups of individuals that differ in their association with the outcome and whose profiles are characterized by multiple omic factors. The goal is to leverage the omic data to identify clusters of individuals with differences in the outcome and with similar profiles of risk factors and biomarkers. Rather than proceeding in stages, latent cluster estimation and the corresponding effect estimation are performed jointly. The model can be used to better identify disease associations or predict an individual’s potential risk, while also suggesting possible biological mechanisms defined by a combination of all factors.
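
The idea of joint rather than staged estimation can be illustrated with a toy example. The sketch below is not the LUCID implementation. It assumes a simplified model in which a latent cluster generates both the omic features and a continuous outcome, each with independent Gaussian distributions per cluster, and it fits that model with a basic EM algorithm. The function name `fit_joint_latent_clusters`, the simulated data, and all parameter choices are hypothetical and serve only as illustration.

```python
import numpy as np


def fit_joint_latent_clusters(Z, y, k=2, n_iter=50):
    """EM for a toy joint model: latent cluster -> omic features Z and outcome y.

    Hypothetical helper for illustration only; not the LUCID implementation.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n = Z.shape[0]

    # Deterministic start: order samples along the first principal component
    # of the stacked data [Z, y] and cut that ordering into k equal groups.
    D = np.column_stack([Z, y])
    D = D - D.mean(axis=0)
    _, _, vt = np.linalg.svd(D, full_matrices=False)
    order = np.argsort(D @ vt[0])
    resp = np.zeros((n, k))
    for j, idx in enumerate(np.array_split(order, k)):
        resp[idx, j] = 1.0

    for _ in range(n_iter):
        # M-step: mixing weights plus cluster-specific means and variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / n
        mu_z = (resp.T @ Z) / nk[:, None]
        var_z = (resp.T @ Z**2) / nk[:, None] - mu_z**2 + 1e-6
        mu_y = (resp.T @ y) / nk
        var_y = (resp.T @ y**2) / nk - mu_y**2 + 1e-6

        # E-step: the omic and outcome likelihoods enter the same posterior,
        # so cluster membership and outcome effects are estimated jointly.
        log_z = -0.5 * (
            (Z[:, None, :] - mu_z[None, :, :]) ** 2 / var_z[None, :, :]
            + np.log(2 * np.pi * var_z[None, :, :])
        ).sum(axis=2)
        log_y = -0.5 * (
            (y[:, None] - mu_y[None, :]) ** 2 / var_y[None, :]
            + np.log(2 * np.pi * var_y[None, :])
        )
        log_r = np.log(pi)[None, :] + log_z + log_y
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

    return resp, pi, mu_z, mu_y


# Simulated data (an assumption for this sketch): two subgroups that differ
# in four omic features and in the mean of a continuous outcome.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 100)
Z = rng.normal(loc=3.0 * truth[:, None], scale=1.0, size=(200, 4))
y = rng.normal(loc=2.0 * truth, scale=1.0)

resp, pi, mu_z, mu_y = fit_joint_latent_clusters(Z, y, k=2)
labels = resp.argmax(axis=1)  # most probable cluster for each individual
```

Here `resp` holds each individual's posterior cluster probabilities and `mu_y` the estimated outcome mean in each cluster. A staged analysis would instead cluster on `Z` alone and then regress `y` on the resulting labels, so the outcome would never inform cluster membership.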

Selected Related Publications

A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits
Bioinformatics. 2020 Feb 01; 36(3):842-850.


Yinqi Zhao (yinqi at usc dot edu), Jingxuan He (hejingxu at usc dot edu)