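Type: Basic research   Pre-defense date: 2018-03-29
Start (proposal) date: 2015-10-10   Thesis completion date: 2017-12-18
Location: 2nd-floor meeting room, Computer Building, Jiulonghu Campus, Southeast University   Topic source: 973 and 863 Program projects   Thesis length: 6.3 (in units of 10,000 characters)
Title: Methods for Optimizing Streaming Media Quality of Service Based on User Behavior (基于用户行为的流媒体服务质量优化方法)
Keywords: Internet TV, user behavior analysis, popularity prediction, caching strategy, big data computing model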
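Abstract: As HD, 3D and related technologies mature and spread, streaming media services, represented by Internet TV, have become a major component of Internet traffic. As an important streaming media application of the "Internet+" era, Internet TV is popular with users for its rich content and ease of use. At the same time, limited network transmission resources struggle to meet the viewing demand of massive Internet TV audiences, causing network congestion, packet loss and delay that seriously degrade the user experience. To relieve the pressure on Internet data transmission and to improve the service quality and user experience of Internet TV systems, this thesis starts from an analysis of Internet TV user behavior, studies how the popularity of Internet TV content changes, builds a high-accuracy content popularity prediction model, designs a cache scheduling algorithm that adapts to multiple popularity evolution trends, and implements an Internet user behavior analysis platform with interactive processing capability. The main work and contributions are as follows:
(1) To address the long training time, large sample requirements and poor accuracy on suddenly popular programs of existing prediction models, the thesis proposes a popularity prediction model for Internet TV programs based on viewing behavior. A behavioral dynamics classification method divides the evolution of program popularity into four types: endogenous critical, endogenous subcritical, exogenous critical and exogenous subcritical. The prediction model is built with a least squares support vector machine tuned by double-population particle swarm optimization. The thesis measures and analyzes 6 billion viewing records from 2.8 million users of an Internet TV platform. Experimental results show that, compared with other existing methods, the proposed model improves prediction accuracy by more than 17% and effectively shortens the prediction period.
(2) Because popularity evolution trends differ greatly between Internet TV platforms, existing prediction models cannot be reused across platforms. The thesis therefore proposes an Internet popularity prediction method based on trend detection. First, dynamic time warping distance is used to compute the similarity between program popularity time series, and random forest regression is used to build popularity prediction models for the Internet TV programs clustered by K-Medoids. Then, a gradient boosting decision tree identifies the trend of newly published programs. Finally, based on the maximum entropy principle, the predicted value of each trend is combined with the classification probability to obtain the program popularity prediction. Experimental results show that, compared with a traditional single prediction model, the method can adjust dynamically to the viewing behavior of users in different regions and improves prediction accuracy by 20%.
(3) To address the excessive storage that Internet TV platforms consume in order to raise the hit rate of popular programs, the thesis proposes PPRA, a program cache scheduling algorithm based on popularity prediction. First, a random forest algorithm is used to build the program popularity prediction model, and principal component analysis reduces the feature dimension to counter the curse of dimensionality in the selected features. Then, the programs in the cache are scheduled according to the predicted popularity. Finally, PPRA is evaluated on 120 days of viewing data from 1.3 million users of a broadcasting operator. Experimental results show that, at the same cache hit rate, PPRA needs only 30% of the storage space required by the LRU and LFU algorithms, which significantly reduces platform construction costs.
(4) Based on the methods above, the thesis designs and implements an Internet TV user behavior analysis system and builds a behavior event capture subsystem consisting of agents embedded in set-top boxes and a cluster of capture servers. It also develops a SQL-on-Hadoop engine with distributed transaction management that is compatible with unstructured data sources. Experimental results show that, on a commercial Internet TV platform with 16 million users and a 32-node cluster, the system processes 10.2 TB of user behavior data per day, nearly 40 times faster than comparable systems based on Hadoop/Hive.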
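Contribution (2) measures the similarity of popularity time series with dynamic time warping (DTW) and then combines trend-specific predictions with classification probabilities. The following is a minimal Python sketch of those two steps, not the thesis implementation. The function names `dtw_distance` and `combine_predictions` are hypothetical, the local cost is assumed to be the absolute difference, and the combination is assumed to be a plain probability-weighted average (the abstract only states that the combination follows the maximum entropy principle).

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is |x - y|; the thesis does not state its choice.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def combine_predictions(trend_predictions, trend_probabilities):
    """Probability-weighted average of per-trend forecasts (an assumed rule)."""
    if len(trend_predictions) != len(trend_probabilities):
        raise ValueError("one probability is needed per trend prediction")
    total = sum(trend_probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    weighted = sum(p * w for p, w in zip(trend_predictions, trend_probabilities))
    return weighted / total


if __name__ == "__main__":
    # Two daily view-count curves with the same shape but a stretched middle.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    # Four trend-specific forecasts with equal class probabilities.
    print(combine_predictions([100, 200, 300, 400], [0.25, 0.25, 0.25, 0.25]))
```

A pairwise matrix of such distances is the kind of input a K-Medoids clustering step can work from, since K-Medoids only needs dissimilarities between items rather than coordinates.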
English title: OPTIMIZATION APPROACHES FOR QOS OF STREAMING MEDIA BASED ON USER BEHAVIORS
English keywords: Internet TV, user behavior analysis, popularity prediction, caching strategy, big data computation model
English abstract: As HD and 3D technologies become widespread, streaming media services represented by Internet TV have become a major part of Internet traffic. As an important streaming media application in the "Internet+" era, Internet TV is favored by users for its rich content and convenience. At the same time, limited network transmission resources struggle to meet the demand of massive Internet TV viewing, resulting in network congestion, packet loss and delay that seriously degrade the Internet TV user experience. To reduce the pressure on Internet data transmission and to improve the service quality and viewing experience of Internet TV, this thesis analyzes the behavior of Internet TV users, studies how Internet TV content popularity changes, and builds a high-precision content popularity prediction model. It proposes a cache scheduling algorithm that adapts to multiple popularity evolution trends and implements an Internet user behavior analysis platform with interactive processing capabilities. The main work and contributions are as follows:
(1) To address the long training time, large sample requirements and poor prediction accuracy for suddenly popular programs in existing prediction models, an Internet TV program popularity prediction model based on viewing behavioral dynamics features is proposed. 6 billion viewing records from 2.8 million subscribers of an Internet TV platform are measured, and the evolution of program popularity is divided into four types: endogenous critical, endogenous subcritical, exogenous critical and exogenous subcritical. Prediction models of Internet TV program popularity are constructed for each type using least squares support vector machines with double-population particle swarm optimization. The experimental results show that, compared with existing prediction models, prediction accuracy increases by more than 17% and the forecast period is effectively shortened.
(2) Because program popularity evolution varies greatly between Internet TV platforms, existing prediction models cannot be used across platforms. A forecasting approach for program popularity based on trend detection is therefore proposed. First, a K-Medoids algorithm based on dynamic time warping distance groups the popularity evolution of programs into four trends. Then, four trend-specific prediction models are built separately using random forest regression. Using features extracted from the electronic program guide and early viewing records, newly published programs are classified into the four trends by a gradient boosting decision tree. Finally, the forecast values from the trend-specific models are combined with the classification probabilities to produce the final prediction. The experimental results show that, compared with a traditional single forecasting model, this approach can be dynamically adjusted to users' viewing behavior in different regions and improves prediction accuracy by 20%.
(3) To address the excessive storage that existing Internet TV platforms consume in order to increase the hit rate of popular programs, PPRA, a cache scheduling algorithm based on popularity prediction, is proposed. First, a random forest algorithm is used to construct the forecasting model of program popularity, and the curse of dimensionality in the selected features is mitigated by principal component analysis. Then, the programs in the cache are scheduled according to the predicted program popularity. Finally, the PPRA algorithm is tested on 120 days of viewing data from 1.3 million users. Experimental results show that, at the same cache hit rate, PPRA needs only 30% of the storage space required by the LRU and LFU algorithms, which can significantly reduce platform construction costs.
(4) Based on the above methods, the thesis designs and implements an Internet TV user behavior analysis system and constructs a behavior event capture subsystem consisting of agents embedded in set-top boxes and a capture server cluster. In addition, a SQL-on-Hadoop engine with distributed transaction management that is compatible with unstructured data sources is developed. Experimental results show that, on a commercial Internet TV platform with 16 million users and a 32-node cluster, the system handles 10.2 TB of user behavior data per day, nearly 40 times faster than comparable systems based on Hadoop/Hive.
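Contribution (3) contrasts popularity-driven cache scheduling with LRU. The Python sketch below illustrates the general idea only and is not the PPRA algorithm: the class names, the `predicted_popularity` dictionary (standing in for the output of a prediction model) and the "replace the least popular cached program only if the newcomer scores higher" rule are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Baseline: evict the least recently used program when the cache is full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items = OrderedDict()

    def request(self, key):
        """Return True on a cache hit, False on a miss (the key is then cached)."""
        hit = key in self.items
        if hit:
            self.items.move_to_end(key)  # mark as most recently used
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # drop the least recently used
            self.items[key] = True
        return hit


class PredictedPopularityCache:
    """Hypothetical policy: keep the programs with the highest predicted popularity."""

    def __init__(self, capacity, predicted_popularity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.predicted = predicted_popularity  # assumed: program id -> score
        self.items = set()

    def _score(self, key):
        return self.predicted.get(key, 0.0)

    def request(self, key):
        hit = key in self.items
        if not hit:
            if len(self.items) < self.capacity:
                self.items.add(key)
            else:
                victim = min(self.items, key=self._score)
                # Admit the new program only if it is predicted to be more popular.
                if self._score(victim) < self._score(key):
                    self.items.remove(victim)
                    self.items.add(key)
        return hit


def hit_rate(cache, trace):
    """Fraction of requests in the trace that were served from the cache."""
    hits = sum(1 for key in trace if cache.request(key))
    return hits / len(trace)


if __name__ == "__main__":
    trace = ["a", "b", "c"] * 2               # cyclic requests, cache holds 2
    scores = {"a": 3.0, "b": 2.0, "c": 1.0}   # made-up predicted popularity
    print(hit_rate(LRUCache(2), trace))
    print(hit_rate(PredictedPopularityCache(2, scores), trace))
```

On this toy cyclic trace LRU keeps evicting the program it is about to need, while the score-based policy keeps the two programs it expects to be most popular. The trace and scores are made up, so the numbers say nothing about the 30% storage figure reported in the abstract.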
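Academic discussions
Host | Date | Location | Speaker | Topic
School of Computer Science and Engineering | 2014-05-27 | Room A6232, Wireless Valley, Jiangning, Nanjing | Zhu Chengang (朱琛刚) | Identifying Video Bitrate and Resolution in Encrypted YouTube Traffic
School of Computer Science and Engineering | 2014-12-10 | 3rd-floor meeting room, Computer Building, Jiulonghu Campus, Southeast University | Zhu Chengang (朱琛刚) | Research on OpenFlow-Based Network Fault Diagnosis
School of Computer Science and Engineering | 2015-06-25 | Southeast University, Room A6232, Wireless Valley, Jiangning, Nanjing | Hu Xiaoyan (胡晓燕) | Research on Cache-Aware Routing Mechanisms
School of Computer Science and Engineering | 2013-06-09 | 3rd-floor meeting room, Computer Building, Jiulonghu Campus, Southeast University | Yan Yingliang (闫营亮) | Design and Implementation of a Web Page-Based Bot Detection System
School of Computer Science and Engineering | 2014-12-10 | 3rd-floor meeting room, Computer Building, Jiulonghu Campus, Southeast University | Bi Jun (毕军) | Future Networks: Network Architecture Transformation, Software-Defined Networking and Address-Driven Networking
School of Computer Science and Engineering | 2015-12-14 | Room A6232, Wireless Valley, Jiangning, Nanjing | Guo Xiaojun (郭晓军) | Research Progress on Active Network Flow Watermarking
School of Computer Science and Engineering | 2012-12-07 | Room A6232, Wireless Valley, Jiangning, Nanjing | Zhu Chengang (朱琛刚) | User Behavior Research on a Large-Scale HD Interactive TV Platform
School of Computer Science and Engineering | 2013-10-16 | 3rd-floor meeting room, Computer Building, Jiulonghu Campus, Southeast University | Zhu Chengang (朱琛刚) | A Cache Scheduling Algorithm for Internet TV Programs Based on Popularity Prediction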
     
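Academic conferences
Conference name | Date | Location | Own presentation | Title of own presentation
ACM, Southeast University | 2016-06-17 | Nanjing, Jiangsu | | RBAS: A Real-Time User Behavior Analysis System for Internet TV in Cloud Computing
EAI, Ho Chi Minh City University of Transport | 2017-09-04 | Ho Chi Minh City, Vietnam | | Program Popularity Prediction Approach for Internet TV Based on Trend Detecting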
     
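Representative works
Paper title
A Cache Scheduling Algorithm for Internet+ TV Programs Based on Popularity Prediction (基于流行度预测的互联网+电视节目缓存调度算法)
RBAS: A Real-Time User Behavior Analysis System for Internet TV in Cloud Computing
A Popularity Prediction Model for Internet TV Programs Based on Viewing Behavior (基于收视行为的互联网电视节目流行度预测模型)
Big Data Analytics for Program Popularity Prediction in Broadcast TV Industries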
 
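Defense committee members
Name | Title | Supervisor category | Affiliation | Chair | Remarks
Zhu Yuelong (朱跃龙) | Senior rank, Professor | Doctoral supervisor | Hohai University | |
Gong Jian (龚俭) | Senior rank, Professor | Doctoral supervisor | Southeast University | |
Nie Changhai (聂长海) | Senior rank, Professor | Doctoral supervisor | Nanjing University | |
Zhang Gongxuan (张功萱) | Senior rank, Professor | Doctoral supervisor | Nanjing University of Science and Technology | |
Wang Kun (王堃) | Senior rank, Professor | Doctoral supervisor | Nanjing University of Posts and Telecommunications | |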
      
Defense secretary
Name | Title | Affiliation | Remarks
Hu Xiaoyan (胡晓艳) | Other, Lecturer | Southeast University |