Hua Loo-Keng Center for Mathematical Sciences

Post-GWAS Secondary Phenotype Analysis is Cost-Benefit Only with Valid Analytical Approach

2018-07-17

Speaker: Guolian Kang

Time: June 27, 9:00-10:00

Venue: N219

Abstract: Over the past decades, genome-wide association studies (GWAS) have successfully identified common variants associated with common or rare diseases. The study designs most commonly used for GWAS are based on a primary outcome; they include the case-control (CC) design for studying a common disease and the extreme phenotype sequencing (EPS) design for studying an ordinal or continuous phenotype, as in the well-known National Heart, Lung, and Blood Institute Exome Sequencing Project. Besides the primary outcome, extensive data are often available on secondary phenotypes (SPs) that may be correlated with the primary outcome and share common genetic variants with it. Although naïve GWAS methods could be applied to the secondary phenotypes, they yield biased risk estimates when the primary outcome and the secondary phenotype are correlated. The bias arises because the selected GWAS samples are not a random, representative sample with respect to the secondary phenotype. Thus, the critical question is how to analyze these secondary phenotypes in the post-GWAS era. Here, two novel statistical methods are proposed, one for the CC design (STcc) and one for the EPS design (STEPS). Extensive simulation studies show that the two methods control the false positive rate well and have greater power than naïve methods, and that these properties are robust to the effect pattern of the genetic variant (risk or protective), to whether the variant is rare or common, and to the trait distribution. To demonstrate their cost-benefit advantage, we also mimicked real practice by re-designing two new retrospective studies in which the primary outcome of interest is the same as the SP in the EPS study. An application to a genome-wide association study of Benign Ethnic Neutropenia with 7 SPs under an EPS design also demonstrates the striking superiority of the two proposed methods over their alternatives.
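
The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the speaker's STcc or STEPS method; it is a toy example under assumed parameters (allele frequency 0.3, liability coefficients 0.5 and 0.8, threshold 2.5, all hypothetical). A variant G affects only the primary disease, the secondary phenotype Y is correlated with the disease, and G has no effect on Y. A naïve regression of Y on G should then give a slope near zero in the full population, but not in a case-control sample that oversamples cases.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_pop=200_000, n_per_group=2_000, seed=1):
    """Return (population slope, case-control slope) of Y on G.

    All numeric settings are illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)
    pop_g, pop_y = [], []
    cases, controls = [], []
    for _ in range(n_pop):
        # Genotype: number of minor alleles (0, 1 or 2), allele frequency 0.3.
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        # Secondary phenotype: independent of G, so the true effect is zero.
        y = rng.gauss(0.0, 1.0)
        # Primary disease: depends on both G and Y through a liability score.
        liability = 0.5 * g + 0.8 * y + rng.gauss(0.0, 1.0)
        is_case = liability > 2.5  # makes the disease rare (a few percent)
        pop_g.append(g)
        pop_y.append(y)
        (cases if is_case else controls).append((g, y))

    # Case-control design: equal numbers of cases and controls.
    sample = rng.sample(cases, n_per_group) + rng.sample(controls, n_per_group)
    cc_g = [g for g, _ in sample]
    cc_y = [y for _, y in sample]
    return ols_slope(pop_g, pop_y), ols_slope(cc_g, cc_y)


if __name__ == "__main__":
    pop_slope, cc_slope = simulate()
    print(f"naive slope, full population:     {pop_slope:.3f}")
    print(f"naive slope, case-control sample: {cc_slope:.3f}")
```

In this setup the population slope stays close to zero while the case-control slope is clearly positive, which is the kind of spurious association that a valid secondary-phenotype method has to correct for.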