Active Projects

Latent Unknown Clustering with Integrated Data (LUCID) is designed to leverage omic data for identifying clusters of individuals with differences in the outcome and with similar profiles of risk factors and biomarkers. Rather than performing the analysis in stages, latent cluster estimation and the corresponding effect estimation are performed jointly. The model can be used to better identify disease associations or predict an individual’s potential risk, while also suggesting possible biological mechanisms defined by a combination of all factors.

Joint Analysis of Marginal summary statistics (JAM) unites the ideas of mediation and latent clustering, using summary statistics from multiple omic studies to develop a causal inference framework that identifies mediating effects of biologically relevant factors on outcomes. Using only summary statistics, this approach goes well beyond current methods to characterize pathways and the corresponding intermediates and SNPs contributing to those associations.

xtune: Regularized Regression with Feature-Specific Penalties Integrating External Information extends standard penalized regression (Lasso, Ridge, and Elastic-net) to allow feature-specific shrinkage based on external information, with the goal of achieving better prediction accuracy and variable selection. Examples of external information include the grouping of predictors, prior knowledge of biological importance, external p-values, functional annotations, etc. The multiple tuning parameters are chosen using an Empirical Bayes approach, which is implemented with a majorization-minimization algorithm.
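
The idea of feature-specific shrinkage can be illustrated with a minimal Python sketch for the ridge case only. This is not the xtune implementation or its API: the function names, the log-linear map from external information to penalties, and the fixed coefficients alpha are all assumptions made for illustration, and the Empirical Bayes estimation of the tuning parameters is not shown.

```python
import numpy as np


def feature_specific_ridge(X, y, penalties):
    """Minimize ||y - X b||^2 + sum_j penalties[j] * b[j]^2 in closed form.

    Hypothetical helper: standard ridge uses one penalty for all features,
    whereas here each feature j gets its own penalty.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalties = np.asarray(penalties, dtype=float)
    # Normal equations with a diagonal penalty matrix in place of lambda * I.
    A = X.T @ X + np.diag(penalties)
    return np.linalg.solve(A, X.T @ y)


def penalties_from_external(Z, alpha):
    """Assumed log-linear map from external information to penalties.

    Z is a (features x external variables) matrix, e.g. group indicators;
    alpha is taken as given here rather than estimated from the data.
    """
    Z = np.asarray(Z, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return np.exp(Z @ alpha)
```

For example, if Z holds one indicator column per predictor group, a larger alpha for a group translates into stronger shrinkage of every coefficient in that group, while a group believed to be biologically important can be given a smaller penalty.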

Completed Projects

PriorityPruner is a software program that prunes a list of SNPs by removing those in high linkage disequilibrium (LD) with other SNPs in the list, while preferentially keeping SNPs of higher priority (e.g., the most significant SNPs in a genome-wide association study).
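
Priority-based LD pruning of this kind can be sketched as a simple greedy procedure. The Python below is an illustrative assumption, not PriorityPruner's actual algorithm or interface: the function name, the use of p-values as the priority score, the r^2 lookup keyed by SNP pairs, and the 0.8 threshold are all hypothetical.

```python
def priority_prune(snps, priority, r2, threshold=0.8):
    """Greedily keep high-priority SNPs and drop SNPs in high LD with a kept one.

    snps:      list of SNP identifiers
    priority:  dict mapping SNP id to a score where lower means higher
               priority (for example a GWAS p-value)
    r2:        dict mapping frozenset({a, b}) to the pairwise r^2;
               pairs that are missing are treated as r^2 = 0
    threshold: r^2 value at or above which two SNPs count as being in high LD
    """
    kept = []
    # Visit SNPs from highest to lowest priority.
    for snp in sorted(snps, key=lambda s: priority[s]):
        # Keep the SNP only if it is not in high LD with any SNP already kept.
        if all(r2.get(frozenset((snp, k)), 0.0) < threshold for k in kept):
            kept.append(snp)
    return kept
```

With three SNPs where the most significant one is in high LD with the second but not the third, the sketch keeps the first and third and drops the second.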

Snagger is an extension to the existing open-source software Haploview that uses pairwise r² linkage disequilibrium between single nucleotide polymorphisms (SNPs) to select tagSNPs.

Bayesian Variable Selection (BVS) focuses on analyzing case-control association studies involving a group of genetic variants. In particular, we are interested in modeling the outcome variable as a function of a multivariate genetic profile using Bayesian model uncertainty and variable selection techniques.