







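Abstract: To improve data completeness and preserve data utility, a method for filling incomplete data based on probability similarity is proposed. The probability similarity matrix of the incomplete data is computed quantitatively, and the result is combined with the ROUSTIDA algorithm to fill the incomplete data and obtain a complete data set. On this basis, decision rules are constructed to ensure filling performance for data with multiple missing attributes, and a discernibility matrix is set up to optimize the algorithm's filling effect on incomplete data. Test results show that the proposed method can compute similarity values between different data objects and complete the data filling effectively: the completeness of the filled data is above 95% in all cases, and the error of the filled values is below 0.17 in all cases, indicating a good filling effect.
Keywords: probability similarity; incomplete data; data filling; ROUSTIDA algorithm; similarity matrix; discernibility matrix; decision rule
CLC number: TN919-34; TP301    Document code: A    Article ID: 1004-373X(2025)04-0079-04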
Research on filling incomplete data based on probability similarity
TONG Lihong, SUN Shibao
(Henan University of Science and Technology, Luoyang 471000, China)
Abstract: To improve data completeness and preserve data utility, a method for filling incomplete data based on probability similarity is proposed. The probability similarity matrix of the incomplete data is calculated quantitatively, and the result is combined with the ROUSTIDA algorithm to fill the incomplete data and obtain a complete data set. On this basis, decision rules are constructed to ensure the filling performance for data with multiple missing attributes, and a discernibility matrix is set up to optimize the algorithm's filling effect on incomplete data. The test results show that the proposed method can calculate the similarity values between different data objects and complete the data filling effectively. The completeness of the filled data is above 95% and the error of the filled values is below 0.17 in all cases, indicating a good filling effect.
Keywords: probability similarity; incomplete data; data filling; ROUSTIDA algorithm; similarity matrix; discernibility matrix; decision rule
0 Introduction
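In practical applications and research, data are often missing or incomplete for a variety of reasons, which poses challenges for data analysis and modeling [1]. Research on incomplete data filling therefore aims to develop effective techniques and algorithms that use the information in the available data to fill in the missing parts, improving the accuracy and efficiency of data processing and analysis. This in turn better supports decision-making and problem solving [2-3], gives industries more reliable data processing solutions, and promotes the development and application of data science and artificial intelligence.
To fill data effectively, reference [4] describes the missing data with sparse vectors, sparsifies the data by constructing a sparse matrix, and then completes the filling with an iterative weighted thresholding algorithm. In practice, when the data attributes differ greatly, the filling effect of this method is unsatisfactory. To ensure the filling effect, reference [5] builds a data filling model according to the complexity of the associations among the attributes of the incomplete data, and selects the single-output subnetworks with better learning ability to complete the filling. ……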