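Abstract: Outlier detection can promptly reveal anomalies in computer networks and so provide important clues for risk warning and control. To this end, a computer network high-dimensional data outlier detection system based on local information entropy is designed. In the high-dimensional data collection module, the Wireshark tool captures raw high-dimensional data packets from the computer network. In the high-dimensional data storage module, MySQL, ZooKeeper, and Redis databases are set up to store the captured packets. In the high-dimensional data outlier detection module, a micro-clustering partitioning algorithm divides the stored packets into several micro-clusters. The local information entropy of each micro-cluster is then computed to determine whether that micro-cluster contains outliers, and the outliers inside it are mined according to their degree of deviation. Finally, the high-dimensional data visualization module presents the detection results. Experiments show that the designed system effectively collects and partitions high-dimensional computer network data, effectively detects outliers in these data, and does so efficiently.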
Keywords: computer network; high-dimensional data; outlier detection; local information entropy; Wireshark tool; micro-clustering partitioning
CLC number: TN919.1-34; TP391    Document code: A    Article ID: 1004-373X(2024)10-0091-05
A computer network high-dimensional data outlier detection system based on local information entropy
Abstract: Outlier detection enables anomalies in computer networks to be detected in a timely manner, providing important clues for risk warning and control. On this basis, a computer network high-dimensional data outlier detection system based on local information entropy is designed. In the high-dimensional data collection module, the Wireshark tool is used to collect raw high-dimensional data packets from computer networks. The high-dimensional data storage module uses MySQL, ZooKeeper, and Redis databases to store the collected packets. In the high-dimensional data outlier detection module, the stored packets are partitioned by a micro-clustering algorithm into several micro-clusters. The local information entropy of each micro-cluster is calculated to determine whether it contains outliers, and the outliers within each micro-cluster are then mined according to their degree of deviation. Finally, the high-dimensional data visualization module presents the outlier detection results. The experimental results show that the system can effectively collect and partition high-dimensional data from computer networks and effectively detect outliers in these data, with high detection efficiency.
Keywords: computer network; high-dimensional data; outlier detection; local information entropy; Wireshark tool; micro-clustering partitioning
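The abstract describes the detection module only at a high level: micro-cluster partitioning, a per-cluster local information entropy test, and then a deviation-degree ranking inside the flagged clusters. The Python sketch below shows one way these three stages could fit together. It is not the authors' algorithm. Every concrete choice in it is a hypothetical assumption made for illustration: the leader-style partitioning with a fixed `radius`, the binned per-attribute Shannon entropy, the "higher entropy means the cluster may contain outliers" direction of the test, the distance-to-centroid ratio used as the deviation degree, and the thresholds `bin_width`, `entropy_threshold`, and `deviation_threshold`.

```python
import math
from collections import Counter


def micro_cluster(points, radius):
    """Hypothetical leader-style partitioning: each point joins the first
    micro-cluster whose seed lies within `radius`; otherwise it seeds a new one."""
    clusters = []  # list of (seed, members) pairs
    for p in points:
        for seed, members in clusters:
            if math.dist(seed, p) <= radius:
                members.append(p)
                break
        else:  # no existing seed was close enough
            clusters.append((p, [p]))
    return [members for _, members in clusters]


def local_entropy(members, bin_width):
    """Assumed entropy measure: bin each attribute into intervals of width
    `bin_width`, take the Shannon entropy (bits) per attribute, and average."""
    n, dims = len(members), len(members[0])
    total = 0.0
    for d in range(dims):
        counts = Counter(math.floor(p[d] / bin_width) for p in members)
        total -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return total / dims


def deviation_degrees(members):
    """Assumed deviation degree: distance to the cluster centroid divided by
    the mean distance to the centroid within the same micro-cluster."""
    dims = len(members[0])
    centroid = tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
    dists = [math.dist(p, centroid) for p in members]
    mean = sum(dists) / len(dists)
    return [x / mean if mean > 0 else 0.0 for x in dists]


def detect_outliers(points, radius, bin_width, entropy_threshold, deviation_threshold):
    """Three-stage sketch: partition -> entropy screen -> deviation ranking."""
    outliers = []
    for members in micro_cluster(points, radius):
        # Assumption: singletons are skipped, and a low-entropy (homogeneous)
        # micro-cluster is taken to contain no outliers.
        if len(members) < 2 or local_entropy(members, bin_width) <= entropy_threshold:
            continue
        for p, degree in zip(members, deviation_degrees(members)):
            if degree > deviation_threshold:
                outliers.append(p)
    return outliers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (2.0, 2.0),
           (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
    print(detect_outliers(pts, radius=3.0, bin_width=1.0,
                          entropy_threshold=0.5, deviation_threshold=2.0))
    # → [(2.0, 2.0)]
```

In the toy data, the second micro-cluster falls into a single bin per attribute and has zero entropy, so it is skipped. Only the first cluster is examined, and there (2.0, 2.0) lies about 2.5 times the mean centroid distance away. Because the entropy screen runs first, deviation degrees are computed only for clusters flagged as likely to hold outliers. This is one plausible reason a two-stage design can be efficient, although the formulas used in the paper may differ.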
0 Introduction
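In computer networks, data such as network traffic, user behavior, and social network records are high-dimensional [1], and they may hide important information and patterns. An outlier is an observation that differs significantly from the other data points in a dataset [2-4]. An outlier may also indicate an abnormal event, malicious behavior, or an important opportunity. Accurate and efficient outlier detection is therefore important for network security, data analysis, and decision support. For example, in network security, detecting anomalous traffic can reveal network attacks and virus propagation; …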