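Abstract: This paper examines the feasibility of anomaly detection based on outlier mining and applies a 2k-distance-based outlier mining method to intrusion detection. Because this method does not handle symbolic (categorical) attribute data well, the symbolic data are processed with a code mapping method, and principal component analysis (PCA) is then used to reduce the dimensionality of the attributes expanded by code mapping. The implementation scheme is described in detail, and simulation experiments verify the feasibility of the method.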
Keywords: intrusion detection; outlier; 2k-distance; code mapping; principal component analysis
CLC number: TP391   Document code: A
Article ID: 1004-373X(2010)11-0114-03
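This section names the code mapping step but does not specify it. The following is a minimal sketch that assumes it behaves like one-hot encoding: each distinct symbolic value becomes its own 0/1 attribute, which is why the attribute count grows and PCA is needed afterwards. The function names `build_code_map` and `code_mapping` are hypothetical. The sample values are the `protocol_type` symbols (`tcp`, `udp`, `icmp`) that appear in KDD99.

```python
from typing import Dict, List, Sequence


def build_code_map(values: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct symbolic value a column index, in first-seen order."""
    code_map: Dict[str, int] = {}
    for v in values:
        if v not in code_map:
            code_map[v] = len(code_map)
    return code_map


def code_mapping(values: Sequence[str], code_map: Dict[str, int]) -> List[List[float]]:
    """Replace each symbolic value with a 0/1 indicator vector (one column per value).

    Assumption of this sketch: a value absent from code_map maps to an all-zero vector.
    """
    width = len(code_map)
    rows: List[List[float]] = []
    for v in values:
        row = [0.0] * width
        idx = code_map.get(v)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    protocols = ["tcp", "udp", "icmp", "tcp"]
    cmap = build_code_map(protocols)
    print(cmap)                           # {'tcp': 0, 'udp': 1, 'icmp': 2}
    print(code_mapping(protocols, cmap))  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

One symbolic attribute with m distinct values becomes m numeric attributes. With high-cardinality KDD99 fields such as `service`, this expansion is what motivates the PCA step.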
Intrusion Detection Method Based on Outlier Mining
YANG Cheng-cheng1, HUANG Bin2
(1. Chang’an University, Xi’an 710021, China; 2. Putian University, Putian 351100, China)
Abstract: The feasibility of anomaly detection based on outlier mining is discussed, and an outlier detection method based on the 2k-distance is applied to intrusion detection. Because this method does not handle symbolic attribute data well, a code mapping method is adopted to process the symbolic data, and principal component analysis (PCA) is used to reduce the dimensionality of the attributes expanded by code mapping. The implementation scheme is described in detail, and a simulation experiment verifies the feasibility of the method.
Keywords: intrusion detection; outlier detection; 2k-distance; code mapping; principal component analysis
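The PCA step that compresses the code-mapped attributes can be sketched as follows. The sketch assumes standard covariance-based PCA. To stay dependency-free, it finds eigenvectors by power iteration with deflation rather than calling a library eigensolver. The helper names and the 85% cumulative-contribution threshold in `choose_k` are illustrative assumptions, not values taken from the paper.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center(data: Matrix) -> Matrix:
    """Subtract each column's mean so every attribute has zero mean."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of already-centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov: Matrix, k: int, iters: int = 500,
                   seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration and deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    vectors: Matrix = []
    values: List[float] = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]  # start from a unit vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        vectors.append(v)
        values.append(lam)
        # Deflation: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vectors, values


def choose_k(cov: Matrix, threshold: float = 0.85) -> int:
    """Smallest k whose components explain at least `threshold` of the total variance."""
    total = sum(cov[i][i] for i in range(len(cov)))
    if total <= 0.0:
        return 1
    _, values = top_components(cov, len(cov))
    explained = 0.0
    for k, lam in enumerate(values, start=1):
        explained += lam
        if explained / total >= threshold:
            return k
    return len(cov)


def pca_reduce(data: Matrix, threshold: float = 0.85) -> Matrix:
    """Project centred data onto the leading components selected by choose_k."""
    centered = center(data)
    cov = covariance(centered)
    comps, _ = top_components(cov, choose_k(cov, threshold))
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in comps] for r in centered]


if __name__ == "__main__":
    # Two perfectly correlated attributes: a single component should be kept.
    rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
    print(pca_reduce(rows))
```

Power iteration keeps the sketch short, but it converges slowly when leading eigenvalues are nearly equal. For real KDD99-sized data, a library eigensolver would be the practical choice.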
0 Introduction
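Outlier mining [1] is an important research direction in data mining. Its task is to discover, in large volumes of complex data, the novel patterns carried by a small fraction of anomalous records that differ significantly from the patterns of normal data. From a statistical standpoint, outlier mining and cluster analysis are similar to some extent, but they differ fundamentally. Clustering seeks records with identical or similar properties and groups them into one class, whereas outlier mining seeks records whose properties differ from those of every class. The targets of outlier mining are often anomalous records embedded in large amounts of high-dimensional data, and the behavior behind them can have serious consequences. In network intrusion, for example, such behavior can disable terminals on the network, cause data loss, and alter programs.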
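As a concrete illustration of this distance-based view of outliers, the sketch below scores each record by the distance to its k-th nearest neighbour and reports the highest-scoring records. It is a generic k-nearest-neighbour distance score used as a stand-in. The 2k-distance measure the paper builds on is not defined in this section, so this should not be read as the authors' algorithm, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the distance to its k-th nearest neighbour (excluding itself).

    Points inside a dense group get small scores. Points unlike every group get large ones.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores: List[float] = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points: Sequence[Sequence[float]], k: int, n: int) -> List[int]:
    """Indices of the n highest-scoring points, i.e. candidate anomalous records."""
    scores = knn_distance_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    # Four records form a tight group. The fifth lies far from all of them.
    records = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
    print(top_n_outliers(records, k=2, n=1))  # [4]
```

Unlike clustering, the sketch never assigns records to classes. It only measures how isolated each record is, which matches the contrast drawn in the paragraph above.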
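This paper applies outlier mining to intrusion detection [2-3], specifically the 2k-distance-based outlier mining algorithm, and uses the KDD99 data set [4] as experimental data to analyze the feasibility and detection performance of the scheme. Because existing outlier…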