秦偉俊
Abstract: With the development of information technology and the Internet, the search engine has become a key piece of infrastructure supporting Internet services. Its defining characteristic is ultra-large scale, which shows mainly in two areas: data storage and data processing. Targeting a new architecture for a network operating system, this project achieved breakthroughs in key technologies including large-scale distributed data sharing and management, efficient resource scheduling, virtual resource management, and the operation and maintenance of large-scale distributed systems. It produced a number of technical results: high-performance storage engine and storage system design; a load balancing strategy based on virtual partitions in MapReduce environments; a performance-driven data balancing method for heterogeneous Hadoop clusters; a resource metering method for cloud computing; distributed computing engine design; data-intensive computing system design; a job completion time prediction algorithm based on dynamic adjustment of a historical-information ratio factor; an idle-node evaluation scheduling algorithm; an input load balancing strategy for MapReduce jobs; remote virtual operation of applications and open application platform design; multi-level network interconnection design for ultra-large-scale data centers; monitoring and management technology for large-scale cluster systems; remote software deployment technology for large-scale cluster systems; low-power, high-density server and rack design technology; and a machine and cluster management system. Supported by Baidu's Internet services and built on an ultra-large-scale cloud computing data center, the project designs and implements a new-generation network operating system with independent intellectual property rights. The system takes support for the search business as its core and the data center's cluster platform as its service foundation, providing ultra-large-scale data storage and data processing capacity for search engine services, as well as operational support for open application services built on search.
Key Words: Network Operating System; Search Service; Data Storage Engine; Resource Scheduling; System Monitoring; Hadoop; MapReduce
Full-text link (real-name registration required): http://www.nstrs.cn/xiangxiBG.aspx?id=65523&flag=1