Breast Cancer & Genetic Epidemiology

Breast Cancer genetic study in African-ancestry populations (PIs Zheng and Haiman: R01CA202981):

Breast cancer is the most commonly diagnosed malignancy in the United States. The age-adjusted mortality rate of this cancer is more than 40% higher in African Americans (AAs) than in whites, for reasons that are poorly understood. Since 2007, genome-wide association studies (GWAS) conducted in Asian and European descendants have identified nearly 100 susceptibility loci for this cancer. However, only a few of the initially identified risk variants can be directly replicated in AAs, owing to small sample sizes in previous studies and racial differences in genetic architectures and genetic/environmental modifiers. GWAS are often not equipped to study structural variants and are inefficient for capturing low-frequency variants. These variants, although virtually uninvestigated to date, are believed to contribute substantially to the heritability of breast cancer and other complex traits, particularly in African-ancestry populations. Furthermore, compared with Asian- and European-ancestry populations, the African-ancestry genome is much more heterogeneous and thus more informative, particularly as we expand the scope of genetic studies from common to less-common variants using next-generation sequencing technology. Herein, we propose a large consortium study in AAs to systematically search the whole genome to discover novel genetic susceptibility factors for breast cancer and further evaluate the influence of germline risk variants on breast cancer biology. Nearly 20,000 AA breast cancer patients and an equal number of controls will be included in this study. In Stage 1, we propose to sequence the whole genome for 1,200 breast cancer cases and 600 controls for association analyses. We will then use these sequencing data, along with data from other sources, to build a novel, comprehensive reference panel for imputation and meta-analysis of approximately 6,300 cases and 6,300 controls genotyped in four previous GWAS conducted in African-ancestry populations.
We will use publicly available genetic data, including functional genomic data, to enhance the ability of the two aforementioned analyses to identify promising breast cancer susceptibility genes and variants for replication. In Stage 2, we will replicate approximately 60,000 promising variants in 5,500 cases and 5,500 controls. Genes/variants that show a promising association in Stage 2 will be evaluated further in Stage 3, which comprises two sub-stages (3A and 3B) in approximately 7,500 cases and 7,500 controls. Finally, we will use gene expression signatures to evaluate how germline risk variants identified in this study and previous studies affect the major signaling pathways of breast cancer. This proposed study will generate critically needed data in AAs to improve the understanding of the genetics, biology, and etiology of breast cancer.
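The abstract does not say which meta-analysis method will be used to combine the four previous GWAS. As a purely illustrative sketch, a fixed-effect inverse-variance meta-analysis of one variant's per-study effect estimates could look like the following. The choice of method, the function name, and the input numbers are all assumptions made for this example and are not taken from the study.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Combine per-study log odds ratios for ONE variant.

    Assumption: inverse-variance fixed-effect weighting. This is a common
    choice for GWAS meta-analysis, not necessarily the study's own method.
    """
    if len(betas) != len(standard_errors):
        raise ValueError("betas and standard_errors must have the same length")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    return pooled_beta, pooled_se, z_score

# Hypothetical per-study estimates for a single variant (illustrative only).
study_betas = [0.10, 0.30]
study_ses = [0.10, 0.10]
pooled_beta, pooled_se, z_score = fixed_effect_meta(study_betas, study_ses)
```

With equal standard errors, as in this made-up input, the pooled estimate is simply the average of the per-study estimates, and the pooled standard error shrinks by a factor of the square root of the number of studies.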

Genetic Risk Prediction of Breast Cancer for African Americans (PIs Huo and Haiman: R01CA228198):

More than 180 susceptibility loci for breast cancer have been identified by genome-wide association studies (GWAS), mainly in Caucasian populations. However, many of these risk variants cannot be directly replicated in women of African ancestry, suggesting that causal variants are yet to be identified. Polygenic risk scores (PRS), which aggregate common genetic variants identified by GWAS, have been developed to predict genetic risk of breast cancer for Caucasian women, but there is no validated PRS for African American women. Linkage disequilibrium in African-ancestry populations is much less extensive than in Caucasian and Asian populations, which makes African-ancestry populations ideal for pinpointing causative variants once a breast cancer susceptibility locus has been localized. Therefore, we propose a comprehensive analytical study that leverages several types of existing genetic datasets for breast cancer, both available to us and in the public domain, to address three specific aims. First, we aim to conduct cross-ethnic fine-mapping analysis to narrow down the lists of candidate causal variants in 180+ breast cancer loci. We will compile and harmonize genetic data from studies of breast cancer in women of African ancestry, including 7,525 cases and 6,207 controls, and leverage the association results from Caucasians (>122,000 cases and >105,000 controls), East Asians (>14,000 cases and >13,000 controls), and Latinos (4,400 cases and 7,500 controls). We will use a Bayesian statistical method to directly incorporate multiple functional annotations for the top variants in each locus. Second, we aim to develop breast cancer polygenic risk score models in African Americans by leveraging functional annotations, linkage disequilibrium, and gene expression data. Several PRSs will be developed for overall breast cancer risk and by estrogen receptor status, cross-validated internally, and validated with external studies.
Third, we aim to develop a breast cancer risk prediction model that combines genetic and non-genetic factors. The proposed study will efficiently utilize several types of existing data using innovative integrative approaches and has the potential to advance the field by narrowing down the genetic regions containing causal variants. More importantly, the risk prediction model has good potential to translate knowledge from GWAS into the practice of breast cancer screening.
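To make concrete what it means for a polygenic risk score to "aggregate" variants, here is a minimal sketch of the simplest form of PRS: a sum of risk-allele dosages weighted by per-allele log odds ratios. The variants, odds ratios, and dosages below are hypothetical, and the models proposed in this project (which also draw on functional annotations, linkage disequilibrium, and gene expression data) would be more elaborate than this.

```python
import math

def polygenic_risk_score(dosages, log_odds_ratios):
    """Weighted sum of risk-allele dosages (each between 0 and 2).

    Assumption: a plain additive PRS; weights are per-allele log odds ratios.
    """
    if len(dosages) != len(log_odds_ratios):
        raise ValueError("dosages and log_odds_ratios must have the same length")
    return sum(d * b for d, b in zip(dosages, log_odds_ratios))

# Hypothetical per-allele odds ratios for three variants (illustrative only).
weights = [math.log(1.10), math.log(1.25), math.log(0.90)]
# One individual's risk-allele dosages; fractional values arise from imputation.
dosages = [2, 1, 0.5]
score = polygenic_risk_score(dosages, weights)
```

The resulting score is on the log odds scale, so it is usually interpreted relative to the score distribution in a reference population rather than as an absolute risk.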