返回
类型 基础研究 预答辩日期 2018-03-18
开始(开题)日期 2016-10-20 论文结束日期 2017-11-15
地点 东南大学中心楼105室 论文选题来源 国家自然科学基金项目     论文字数 8 (万字)
题目 大规模本体推理方法研究
主题词 本体推理,大规模易处理性,并行计算,NC复杂度
摘要 本体语言 RDFS及 OWL被广泛地用于知识库相关的应用中。本体推理作为支撑这些应用的重要服务在查询问答、本体调试、不一致性处理等方面扮演者重要的角色。另一方面,本体推理具有较高的计算复杂度。因此,随着越来越多的真实本体数据集被发布,如何提升大规模本体推理的效率给研究者们带来了挑战。 为了处理大规模本体推理问题,已有的工作均采用并行技术设计新的推理算法并且实现高效的推理系统。现有的工作在大量实验的基础上论证了并行技术对大规模本体推理具有较好的性能提升作用。然而,从时间复杂度的角度分析,对于“并行技术是否能提升本体推理性能”这个问题并不能一概而论。在第 6章,我们通过相应实验证实,存在本体使得并行技术不能有效地提升推理任务的性能。另一方面,诸多著名的数据集,如 YAGO,被已有的工作证明:对其的并行推理具有较高的性能 [1];然而,这些数据集却是由非大规模易处理( non-scaling-tractable)的本体语言表示的,即其推理问题在最坏情况是串行的,或者问题复杂度在 NC复杂度 [2]之上。基于以上讨论,为使得并行技术更好地服务于大规模本体推理,本文以本体推理的大规模易处理性( scalable-tractability)为核心展开研究,即从并行复杂度和数据的角度来寻找对于怎样的本体而言,其推理问题是大规模易处理的,或者是在 NC复杂度中的。本文将给出一系列性质;基于这些性质,进一步设计和优化并行算法;开发者和用户也可以利用这些性质来构建领域本体,使得其推理问题在理论上被保证是大规模易处理的。 针对以上问题,本文主要进行以下研究: 1)因为 datalog语言作为研究本体推理任务的基础,本文正文部分首先着重研究 datalog推理任务(即 datalog评估计算)的大规模易处理性,也就是考察满足怎样性质的 datalog程序其推理任务在 NC复杂度类中。这一部分将给出可以处理datalog推理任务的 NC算法。基于对该 NC算法的分析给出 datalog程序的大规模易处理类,即该类中的 datalog程序都可以由上述 NC算法完成推理任务。 2)接下来,文章分别探讨两类重要的本体推理任务:物化任务( materialization)和分类任务(classification)。针对物化任务,研究两个在实际中应用广泛的本体语言 DL-Lite和 DHL (Description Horn Logic)语言。本文研究给出结论: DL-Lite的两个最核心子语言 DL-Litecore和 DL-LiteR是大规模易处理的。针对 DHL语言,提出了语言使用上的一个约束,使得满足这个约束的 DHL本体,其物化任务都可被 NC算法处理,也就是满足了大规模易处理性。接着,本文进一步地将对于 DHL的研究结论推广到它的一个扩展语言 DHL(?),该扩展语言允许复杂角色包含公理(complex role inclusionaxioms)。 3)针对分类任务,本文研究了本体语言 OWL EL的核心语言 EL+。该部分首先定义了 EL+语言的一个子语言,并且证明该子语言的分类任务可被规约到 DHL(?)语言的物化任务;也就是说该子语言只要满足针对 DHL(o)语言提出的相应约束便可保证其分类任务是大规模易处理的。进一 步地,文章将研究完整的 EL+语言,在 DHL(o)语言约束上进一步给出针对 EL+语言的约束,使得其分类任务是大规模易处理的。 4)文章在实验部分考察了诸多知名的基准数据,大型知识库以及从不同数据源收集而来的真实本体。通过考察,本文发现这些本体中有很大一部分是满足正文研究中提出的约束,也就是说,针对它们的推理任务是大规模易处理的。本文针对物化任务和分类任务利用研究中提出的技术实现了并行物化算法以及并行分类算法,并将它们分别与目前知名的推理系统(具体为 RDFox, CEL和 ELK)进行比较;所使用的测试数据也是从不同领域收集而来的真实本体。实验结果表明了,本文的两个系统相较于所对比的推理机,在处理大规模易处理的本体上具有更好的效果。
英文题目 Research On Approaches to Large-scale Ontology Reasoning
英文主题词 Ontology Reasoning, Scalable-Tractability, Parallelism, NC Complexity
英文摘要 The ontology languages RDFS and OWL have been widely used in many real applications. The task of reasoning plays an important role in the services built on ontologies and supports other tasks, like query answering, ontology repairing and inconsistency handling. With a rapid growth of semantic data, it is challenging to conduct the reasoning tasks on large-scale ontologies efficiently. Current work proposes parallel algorithms and employs computing platforms of high-performance to make materialization sufficiently efficient and scalable in practice. However, there do exist ontologies, for which the reasoning tasks cannot always be improved by parallelism (an experiment is set to show it in Chapter 6). On the other hand, some well-known large-scale ontologies, such as YAGO, have been shown to have good performance for parallel reasoning[1], but they are expressed in ontology languages that are not scaling-tractable in theory, i.e., the efficiency of reasoning may not be improved on a parallel implementation[2]. Motivated by the above issue, this paper attempts to study the problem of scalable-tractability of ontology reasoning from a theoretical perspective. That is, this work aims to identify the ontologies for which the corresponding reasoning task is scaling-tractable, i.e., in NC complexity. The results of the scalable-tractability can be further used to optimize reasoning algorithms and to guide ontology engineers in creating scalingtractable ontologies. To this end, this paper provides the following solutions. 1) This paper focuses on datalog rewritable ontology languages, and first studies the scalable-tractability of the datalog reasoning task. Two NC algorithms are proposed in this part. Based on the analysis of these NC algorithms, this paper identifies two classes of datalog rewritable ontologies (called scaling-tractable classes) such that reasoning over them is scaling-tractable. 2) In the main part of this paper, two important ontology reasoning tasks are studied, materialization and classification. For materialization, this paper studies two ontology languages, DL-Lite and DHL (Description Horn Logic). This paper shows that materialization over any ontology expressed in DL-Litecore or DL-LiteR is scaling-tractable. For DHL, there exist ontologies where materialization can hardly be parallelized. This paper proposes to restrict the usage of DHL such that the restricted ontologies are scaling-tractable, and, further extends the results to a DHL extension DHL(o) that also allows complex role inclusion axioms. 3) For classification, this paper studies the core language of OWL EL, EL+. In this part, a sub-language ELP of EL+ is first defined. The classification of ELP is proved to be reducible to the materialization of DHL(o). Thus, the restriction of DHL(o) can also guarantee the scalable-tractability of ELP classification. Further, the full EL+ is studied. A new usage restriction is proposed such that the classification of EL+ can be handled by an NC algorithm; in other words, it is scaling-tractable. 4) To verify the practical usability of the above results, several real-world datasets are investigated. This paper then shows that many ontologies belong to the scaling-tractable classes. Based on the results and techniques of scalable-tractability, this paper further proposed optimized parallel algorithms for materialization and classification, and compares them to the state-of-the-art reasoners, i.e., RDFox, CEL and ELK. The experimental results show that the optimization used in the implementation of this paper results in a better performance on scaling-tractable ontologies compared with the test reasoners.
学术讨论
主办单位时间地点报告人报告主题
东南大学 2014.8.13 计算机楼4楼 林钦佑 人工智能的下一个前沿: 知识计算
东南大学 2015.12.17 计算机楼4楼 Amit Sheth Knowledge will Propel Machine Understanding of Big Data
东南大学 2016.8.15 计算机楼2楼 肖仰华 知识图谱顶级会议回顾及进展报告
东南大学 2017.7.2 计算机楼4楼 Prof. Hans Uszkoreit Open Knowledge,Corporate Knowledge and the Analytics of Dynamic Unstructured Data
东南大学 2015.1.2 计算机楼4楼 周张泉 Reasoning with fuzzy-EL+ Ontologies using MapReduce
东南大学 2015.7.8 计算机楼2楼 周张泉 浅谈逻辑、规则推理及其内核与应用
东南大学 2016.12.2 计算机楼2楼 周张泉 OWL本体语言与规则推理
东南大学 2017.5.6 计算机楼4楼 周张泉 On Parallel Reasoning
     
学术会议
会议名称时间地点本人报告本人报告题目
ICTAI2015 2015.11.9 意大利滨海维耶特里 A Platform-independent Approach for Parallel Reasoning with OWL EL Ontologies using Graph Representation
ECAI2016 2016.8.14 荷兰海牙 Exploring Parallel Tractability of Ontology Materialization
     
代表作
论文名称
A Platform-Independent Approach for Parallel Reasoning with OWL EL Ontologies Using Graph
Exploring Parallel Tractability of Ontology Materialization
Reasoning with Large Scale OWL 2 EL Ontologies based on MapReduce
The Unreliability of Language - A Common Issue for Knowledge Engineering and Buddhism
GEL: A Platform-Independent Reasoner for Parallel Classification with OWL EL Ontologies Using Graph
 
答辩委员会组成信息
姓名职称导师类别工作单位是否主席备注
瞿裕忠 正高 教授 博导 南京大学计算机学院
许卓明 正高 教授 博导 河海大学计算机学院
高志强 正高 教授 博导 东南大学计算机学院
翟玉庆 正高 硕导 东南大学计算机学院
李必信 正高 教授 博导 东南大学计算机学院
      
答辩秘书信息
姓名职称工作单位备注
倪庆剑 副高 副教授 东南大学计算机学院