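Type: Applied research    Pre-defense date: 2018-03-22
Start (proposal) date: 2014-12-15    Thesis completion date: 2018-01-19
Venue: Conference room, 3rd floor, Yifu Science and Technology Building    Source of thesis topic: International collaborative research project    Thesis length: 45,000 characters
Title: Bayesian Methods in Complex Disease Association Analysis and Prenatal Screening
Keywords: Bayesian methods, hidden Markov model, association analysis, local haplotype sharing, linkage disequilibrium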
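Abstract: Genome-wide association studies (GWAS) genotype dense genetic markers in population samples and, through multi-center, large-sample clinical studies, repeatedly test the association between genotype and disease phenotype in order to find the genetic markers that influence disease. For most complex diseases, however, common single-nucleotide polymorphisms (SNPs) explain less than 10% of phenotypic variation, a phenomenon known as "missing heritability". Analyzing many common SNPs jointly across the genome could account for a substantial share of complex phenotypic variation. Compared with conventional GWAS, haplotype-based association analysis is therefore better suited to revealing the genetic mechanisms of complex diseases. But as the number of SNPs grows, the number of distinct haplotypes increases sharply and the population frequency of each becomes very low. Such high-dimensional, massive, sparse data pose major challenges for statistical analysis and make it difficult to localize causal loci precisely. This thesis studies the statistics of haplotype association analysis and develops a new association method that effectively reduces the dimensionality of the data, so that more associations between genetic markers and complex diseases can be found efficiently and at low cost.
This thesis proposes a haplotype association method based on a hidden Markov model and a Bayesian regression model, and verifies its statistical power. The method first builds a two-layer hidden Markov model to fit linkage disequilibrium, from which it infers ancestral haplotypes and the loading of each ancestral haplotype at every marker in every individual. It then computes local haplotype sharing (the probability that two diploid individuals inherit the same ancestral haplotype). Finally, it uses a Bayesian regression model to test the association between local haplotype sharing and phenotype. The method has three advantages: it overcomes the uncertainty of haplotype phasing; it avoids using a fixed window as the haplotype width; and it requires the same number of tests as single-SNP analysis. To improve computational performance, the thesis reduces the time complexity from quadratic to linear, making the method suitable for data sets with tens of thousands of samples and millions of markers.
We developed software implementing the algorithm and used it to analyze data on seven complex diseases from the public Wellcome Trust Case Control Consortium data set, finding eight novel associations between seven gene regions and five disease phenotypes. Among them, GRIK4, which encodes a protein of the glutamate-gated ion channel family, is strongly associated with both coronary artery disease and rheumatoid arthritis.
Building on this, the thesis introduces a Bayesian matrix regression model that extends the haplotype association method from single-phenotype analysis to joint analysis of multiple phenotypes, and develops a second version of the software. Applying it to a data set of immune responses to a trivalent influenza vaccine revealed two significant trans-acting expression quantitative trait loci (trans-acting eQTLs). The first is an association between a SNP in IFNAR2, a gene encoding an interferon-alpha binding protein, and expression of the olfactory receptor gene OR2AG1; the second is an association between a SNP in the calcitonin receptor gene CALCR and expression of the interferon alpha-inducible protein gene IFI27.
In addition, we applied Bayesian theory to noninvasive prenatal screening and developed a method, based on Bayesian inference, for noninvasive prenatal screening of fetal chromosomes. Simulations confirmed that the statistical power of the Bayesian method is markedly higher than that of the conventional Z-test. The thesis analyzed 3504 clinical samples from pregnant women and identified 9 false positives of the conventional test (out of 51 positives). Compared with the conventional Z-test, the method makes effective use of prior information on fetal fraction, improving screening accuracy; it lowers the requirement on sequencing data volume, reducing cost; and it can compute positive and negative predictive values, giving it greater clinical value.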
English title: BAYESIAN METHOD FOR COMPLEX DISEASE ASSOCIATION STUDY AND PRENATAL SCREENING
English keywords: Bayes, hidden Markov model, association, local haplotype sharing, linkage disequilibrium
English abstract: Genome-wide association studies (GWAS) genotype a genome-wide set of genetic variants in many individuals, through multi-center, large-sample clinical studies, to detect and verify genetic variants associated with phenotypes. Current GWAS typically focus on associations between single-nucleotide polymorphisms (SNPs) and phenotypes such as major human diseases. However, for the majority of complex phenotypes, common variants analyzed as single SNPs explain less than 10% of phenotypic variation, which is known as "missing heritability". A significant amount of phenotypic variation can be explained by common variants if genome-wide SNPs are analyzed jointly. Thus, haplotype association is more powerful than single-SNP association for unveiling the etiology of complex phenotypes. But as the number of SNPs increases, the number of haplotypes increases dramatically and the population frequency of each haplotype becomes very low. Such high-dimensional, massive, sparse data pose great challenges for statistical analysis. We studied the structure of haplotypes and developed a novel haplotype association method in order to find more causal variants effectively. The method is presented and its power demonstrated. Relying on a two-layer hidden Markov model for linkage disequilibrium (LD), the method first infers ancestral haplotypes and their loadings at each marker for each individual. The loadings are then used to quantify local haplotype sharing between individuals at each marker. A Bayesian regression model links local haplotype sharing to phenotypes to test for association. Compared with existing haplotype association methods, our method integrates out phase uncertainty, avoids arbitrariness in specifying haplotypes, and requires the same number of tests as single-SNP analysis. In addition, we reduced the time complexity from putatively quadratic to linear; consequently, our method is applicable to large data sets.
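As a rough illustration of how loadings might be turned into local haplotype sharing at a single marker, the sketch below treats each individual's loadings as a probability vector over ancestral haplotypes and takes the sharing between two individuals to be the probability that one randomly drawn haplotype from each descends from the same ancestral haplotype. This inner-product definition, the function name `local_haplotype_sharing`, and the toy numbers are assumptions made for illustration; the abstract does not give the thesis's exact formula.

```python
import numpy as np


def local_haplotype_sharing(loadings):
    """Pairwise sharing at one marker (illustrative definition, not the thesis's).

    loadings: array of shape (n_individuals, n_ancestral_haplotypes); row i holds
    the assumed probabilities that a haplotype of individual i descends from each
    ancestral haplotype, so every row must sum to 1.
    """
    loadings = np.asarray(loadings, dtype=float)
    if not np.allclose(loadings.sum(axis=1), 1.0):
        raise ValueError("each row of loadings must sum to 1")
    # Entry (i, j) is sum_k loadings[i, k] * loadings[j, k].
    return loadings @ loadings.T


# Toy data: three individuals, two ancestral haplotypes (made-up values).
example_loadings = np.array([[1.0, 0.0],
                             [0.5, 0.5],
                             [0.0, 1.0]])
sharing = local_haplotype_sharing(example_loadings)
```

In this toy case the first and third individuals carry different ancestral haplotypes and share nothing, while the second individual shares partially with both. Note that forming the full pairwise matrix is quadratic in the number of individuals; the linear-time computation mentioned in the abstract is not reproduced here.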
We implemented the algorithm in software, applied it to data from the Wellcome Trust Case Control Consortium, and discovered eight novel associations between seven gene regions and five disease phenotypes. Among these, GRIK4, which encodes a protein that belongs to the glutamate-gated ion channel family, is strongly associated with both coronary artery disease and rheumatoid arthritis. Building on this, we introduced Bayesian matrix regression to extend the haplotype association method from single-phenotype analysis to joint multi-phenotype analysis, and developed a second version of the software. We applied it to a set of immune-response data for a trivalent influenza vaccine and discovered two trans-acting response-eQTLs. The first is between a SNP in IFNAR2, a gene that encodes an interferon alpha binding protein, and a probe in OR2AG1, a member of the olfactory receptor family. The second is between a SNP in CALCR, a calcitonin receptor that maintains calcium homeostasis, and a probe in IFI27, which encodes interferon alpha-inducible protein 27. We also applied Bayesian inference to disease screening and developed a novel method for analyzing noninvasive prenatal testing (NIPT) data to detect fetal trisomies such as Down syndrome. A power comparison demonstrated that our Bayesian method is markedly better than the current Z-test method. We analyzed 3405 noninvasive prenatal screening (NIPS) samples and spotted at least 9 (out of 51) possible Z-test false positives. Compared with the Z-test method, the Bayesian method uses information on fetal DNA fraction to improve screening accuracy, permits even lower sequencing coverage, and can provide positive and negative predictive values, which are of clinical importance. Following the clinical trials, the corresponding commercial software is being trialed.
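To make the contrast with the Z-test concrete, here is a minimal sketch of how a prior on fetal fraction can change the interpretation of the same Z-score. It assumes a simple textbook model in which a trisomy shifts the expected Z-score by f / (2 * cv), where f is the fetal fraction and cv is the coefficient of variation of the chromosome read fraction in euploid pregnancies. The function names, the truncated normal prior on f, and every numeric value are illustrative assumptions, not the model or parameters used in the thesis.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_trisomy(z, ff_mean, ff_sd, cv, prior_prob, grid=200):
    """Posterior probability of trisomy given a Z-score (illustrative model).

    The prior on fetal fraction f is a normal(ff_mean, ff_sd) density
    discretized on midpoints of (0, 0.5). Under trisomy the Z-score is
    assumed normal with mean f / (2 * cv) and unit variance; under euploidy
    it is assumed standard normal.
    """
    fractions = [0.5 * (i + 0.5) / grid for i in range(grid)]
    weights = [normal_pdf(f, ff_mean, ff_sd) for f in fractions]
    total = sum(weights)
    # Marginal likelihood under trisomy, averaging over the fetal fraction prior.
    like_trisomy = sum(
        w * normal_pdf(z, f / (2.0 * cv)) for f, w in zip(fractions, weights)
    ) / total
    like_euploid = normal_pdf(z, 0.0)
    numerator = prior_prob * like_trisomy
    return numerator / (numerator + (1.0 - prior_prob) * like_euploid)


# The same Z-score of 3.5 (a Z-test positive at the usual cutoff of 3)
# under two hypothetical fetal fraction priors.
p_high_ff = posterior_trisomy(z=3.5, ff_mean=0.20, ff_sd=0.02, cv=0.005, prior_prob=0.003)
p_low_ff = posterior_trisomy(z=3.5, ff_mean=0.04, ff_sd=0.01, cv=0.005, prior_prob=0.003)
```

Under these assumptions, a Z-score of 3.5 is far too small for a true trisomy when the fetal fraction is around 20%, so the posterior probability is close to zero and the Z-test positive looks like a false positive; with a fetal fraction around 4% the same Z-score is quite plausible under trisomy. The posterior probability for a sample that tests positive plays the role of a per-sample positive predictive value.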
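Academic discussions
Host | Date | Location | Speaker | Topic
National Health and Family Planning Commission | 2017-04-12 | Beijing | Mark I Evans | NIPT hysteria: the right answer to the wrong question
Shanghai First Maternity and Infant Hospital, Tongji University | 2016-05-26 | Shanghai | Dick Oepkes | Prenatal screening & diagnosis in the Netherlands: development, experience and challenges
Peking University Health Science Center | 2015-06-14 | Beijing | Xu Hanli (徐寒黎) | Monte Carlo methods
Peking University | 2015-04-02 | Beijing | Xu Hanli (徐寒黎) | Treatment of acute-on-chronic liver failure and chronic liver failure in chronic severe hepatitis B with Traditional Chinese Medicine combined therapy
USDA/ARS CNRC | 2013-12-06 | Houston | Xu Hanli (徐寒黎) | Group reading: a map of positive selection in human genome
USDA/ARS CNRC | 2013-08-16 | Houston | Xu Hanli (徐寒黎) | Group reading: heritability and genomics of gene expression in peripheral blood
Baylor College of Medicine | 2013-05-08 | Houston, TX, USA | Xu Hanli (徐寒黎) | Detecting association with haplotype and multiple phenotypes
Beijing Hospital | 2016-12-10 | Beijing | Xu Hanli (徐寒黎) | Noninvasive screening technologies for genomic disorders and single-gene disorders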
     
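Academic conferences
Conference | Date | Location | Own presentation | Presentation title
8th National High-Level Symposium on Genetic Diagnosis of Deafness | 2015-12-10 to 2015-12-16 | Beijing | | Noninvasive prenatal diagnosis of Single Gene Disorders
Statistical Genetics Seminar | 2014-05-23 | Houston, TX, USA | | Detecting local haplotype sharing and haplotype association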
     
Representative work
Paper title
Informative priors on fetal fraction increase power of the noninvasive prenatal screen
 
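Defense committee
Name | Title | Supervisor category | Affiliation | Chair | Remarks
Chen Feng (陈峰) | Senior; Professor | Doctoral supervisor | Nanjing Medical University | |
Xiao Pengfeng (肖鹏峰) | Senior; Professor | Doctoral supervisor | Southeast University | |
Guan Yongtao (关永涛) | Associate senior; Assistant Professor | Doctoral supervisor | Baylor College of Medicine | |
Gu Wanjun (顾万君) | Senior; Professor | Doctoral supervisor | Southeast University | |
Liu Hongde (刘宏德) | Associate senior; Associate Professor | Doctoral supervisor | Southeast University | |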
      
Defense secretary
Name | Title | Affiliation | Remarks
Tu Jing (涂景) | Other; Lecturer | Southeast University |