陳少權(quán) 杜翠鳳



【摘 要】針對現(xiàn)有社團發(fā)現(xiàn)算法忽略節(jié)點之間的關(guān)系強度及其動態(tài)性的問題,提出改進CPM算法挖掘移動通信的用戶關(guān)系圈。首先,采用TF-IDF剔除非重要通話群體;然后,引入時間衰減因子,采用用戶通信行為衡量用戶之間動態(tài)關(guān)系強度,結(jié)合關(guān)系閾值剔除節(jié)點之間的弱連接構(gòu)建大小不同的派系;再次,利用變異系數(shù)來衡量派系中用戶關(guān)系之間的相近性,采用變異系數(shù)閾值C.V*剔除關(guān)系不緊密的派系;最后,根據(jù)用戶輸入的k值,發(fā)現(xiàn)k-派系社團。實驗證明,通過重疊社團模塊度EQ評測指標(biāo)驗證,該算法的精確度高于傳統(tǒng)算法的精確度,具有一定的擴展性。
【關(guān)鍵詞】關(guān)系強度;時間衰減;關(guān)系閾值;變異系數(shù)
Relationship Mining of Mobile Communication Users Based on Improved CPM
CHEN Shaoquan, DU Cuifeng
[Abstract] In view of that existing association discovery algorithms ignore the relationship strength and dynamic nature, an improved CPM algorithm is proposed for relationship mining of mobile communication users. First, TF-IDF is used to eliminate unimportant communication groups. Then, the time attenuation factor is introduced and user communication behavior is used to measure the dynamic relationship strength between users. The relationship threshold is used to eliminate the weak connections between nodes to construct different factions of different sizes. Thirdly, the variation coefficient is used to measure the similarity between the user relationships in the factions. The variation coefficient threshold C.V* is used to eliminate the factions which are not closely related. Finally, the k- factional community is found on the basis of the k value input by the user. The experiments prove that the accuracy of the algorithm is higher than that of the traditional algorithm by the EQ evaluation index of overlapping community module degree, and it has a certain extensibility.
[Key words]relationship strength; time attenuation; relationship threshold; coefficient of variation
1 引言
移動通信用戶關(guān)系圈是根據(jù)移動用戶的通話關(guān)系特征、移動用戶的行為特征,對移動用戶進行社團劃分。這種基于用戶特征的劃分,能夠幫助運營商更加了解用戶關(guān)系網(wǎng)絡(luò)的構(gòu)成,為電信業(yè)務(wù)的拓展提供科學(xué)的支撐。當(dāng)前有不少成熟的社團劃分算法,如Palla等人[1]于2005年首先提出了派系過濾算法(CPM,Clique Percolation Method),該算法突破了傳統(tǒng)的非重疊社團劃分算法的限制,能夠用來分析重疊的社團結(jié)構(gòu),其分析結(jié)果更貼近現(xiàn)實的社團結(jié)構(gòu)。由于CPM的計算復(fù)雜度較大,Lancichinetti等人[2]于2009年從網(wǎng)絡(luò)局部結(jié)構(gòu)的角度出發(fā),提出了基于局部擴展思想的重疊社團挖掘算法(LFM, Local Fitness Measure),提升了社團劃分的速度。Evans[3]和Ahn[4]等人突破傳統(tǒng)以網(wǎng)絡(luò)節(jié)點為研究對象進行網(wǎng)絡(luò)劃分的局限,提出了邊聚類的社團劃分方法?!?br>