A prediction model of massive 5G network users' revisit behavior based on telecom big data
SUN Yudi
(School of Digital Commerce, Jiangsu Vocational Institute of Commerce, Nanjing 211168, Jiangsu, China)
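Users of 5G networks generate large volumes of access data, which makes their revisit behavior difficult to predict accurately. This paper therefore proposes a prediction model of massive 5G network users' revisit behavior based on telecom big data. Historical online-behavior feature data are extracted from telecom big data to build a data set. A multi-order weighted Markov chain model is then introduced: the autocorrelation coefficient of each order is computed to obtain the model weights, and the model statistic is calculated. Analysis of these quantities yields the one-step transition probability matrix of the Markov chain for each step length, from which the revisit behavior of massive 5G network users is predicted. Experimental results show that the proposed model has the lowest mean error and standard deviation and the highest accuracy, recall, precision, and F1 score among the compared models, indicating a clear advantage in predicting user revisit behavior.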
telecom big data; prediction of users' revisit behavior; multi-order weighted Markov chain model; one-step transition probability matrix; autocorrelation coefficient
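With the rapid development of 5G telecom networks, people can browse news, download data, and buy goods through all kinds of websites, which makes daily life more convenient and broadens their knowledge. These activities inevitably generate massive amounts of network data. Mining useful information from these data with suitable algorithms, and predicting which websites a user may visit and which goods a user may buy, has become a very active research topic. For users who are likely to revisit or repurchase, targeted recommendations based on their visit history and preferences can increase their willingness to buy to some extent. Users' browsing, operation, and access histories are stored in databases as log files, and using these behavioral data to determine whether a user will revisit is of great importance to the sustainable development of network platforms.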
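Reference [1] combined a deep neural network with different regularization methods and, by forming different groups, predicted revisit behavior on the data set according to selected data features. Reference [2] predicted user clicks on the basis of user behavior sequences: the user's historical behaviors were sorted by interaction time to obtain a behavior sequence, and a word embedding model was introduced into the deep factorization machine (DeepFM) model to learn the sequence adaptively, yielding a list of the user's interests and capturing changes in those interests to make the prediction.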
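Neither of these methods is well suited to the current 5G big data network environment. This paper therefore proposes a prediction model of massive 5G network users' revisit behavior based on telecom big data. First, users' browsing data, behavior data, operation data, attribute data, and other information are extracted from server nodes to build a 5G telecom network data set. Next, a multi-order weighted Markov chain model is constructed, and its transition matrix and initial probability vector are computed. Finally, weights are computed from the autocorrelation coefficients of each step length, and analysis of the weights yields the one-step transition probability matrix of the Markov chain for each step length, from which the revisit behavior of 5G network users is predicted. In the experiments, the proposed model is compared with other methods: its mean prediction error and standard deviation are consistently lower than those of the other two methods, and its prediction accuracy is considerably higher.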

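Table 1 Parsing of the data collected from the 5G telecom network
Before user revisit behavior can be predicted, a 5G telecom network data set must be built [3]. To ensure that the user behavior data are accurate and up to date, several server nodes in the 5G telecom network are selected and collection devices are deployed on them. The collected content falls into several broad categories, including user browsing data, user attribute data, user access behavior data [4], and user access depth data; Table 1 describes the data collected from the 5G telecom network.
The sampling frequency [5] of the 5G telecom network data is set to 0.2 samples/s, that is, one sample every 5 s. According to the type of information collected, the data are stored in 30 databases, which together contain more than 280 fields plus several extension fields. The data used in this paper come from the public database of a real website; each record covers all browsing and operation actions during one page visit, so the data reflect users' behavior faithfully.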

Figure 1 Construction process of the 5G telecom network data set

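1.2.1 Multi-order weighted Markov chain model
Telecom big data involve a large number of users, a large volume of data per user, and highly varied user data, so analyzing and processing them is often inefficient and difficult. A Markov chain model [7-9] is therefore introduced to predict the revisit behavior of 5G telecom network users.
The Markov chain model makes the following assumption about users' online behavior: web browsing is a random process, specifically a homogeneous discrete Markov chain. The feature set formed by a user's online behavior can therefore be regarded as the range of a discrete random variable [10]; in other words, the user's browsing process forms a sequence of values of that variable, and the sequence has the Markov property.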






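In summary, given the transition probability matrix, knowing the initial probability vector of the Markov chain model is enough to predict, for any time, the probability that a user revisits and the network interval that is revisited.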
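The statement above can be illustrated with a minimal sketch of a homogeneous discrete Markov chain. It is not the paper's implementation: the function names, the two-state encoding (0 = no revisit, 1 = revisit), the toy history, and the uniform fallback for states with no observed transitions are all assumptions made for illustration.

```python
def transition_matrix(states, n_states):
    """Estimate the one-step transition matrix by counting observed transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state that is never left in the data gets a uniform row.
        matrix.append([c / total for c in row] if total else [1.0 / n_states] * n_states)
    return matrix


def step(dist, matrix):
    """Advance a probability row vector by one step (vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def predict(initial, matrix, n_steps):
    """Return the state distribution after n_steps transitions."""
    dist = list(initial)
    for _ in range(n_steps):
        dist = step(dist, matrix)
    return dist


# Hypothetical toy history: 0 = no revisit, 1 = revisit.
history = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
P = transition_matrix(history, 2)
p0 = [0.0, 1.0]          # the user is known to be in state 1 at time 0
p3 = predict(p0, P, 3)   # distribution over the two states three steps later
```

Here `p3[1]` is the estimated probability that the user is in the revisit state three steps after the start, which is the kind of quantity the initial probability vector and transition matrix determine.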
1.2.2 Prediction of user revisit behavior





Table 2 Quantities computed under different model orders
(2) Compute the statistic from the quantities in Table 2.
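The weighting idea of this subsection, in which autocorrelation coefficients of each order become weights on the transition matrices of each step length, can be sketched as follows. This follows the commonly used formulation of a weighted Markov chain and is an assumption, not the paper's own equations: the weight of order k is taken as |r_k| divided by the sum of |r_k| over all orders, and the statistic mentioned in step (2) is not reproduced. All function names and the toy series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: no correlation information
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def lag_weights(series, max_lag):
    """Normalize |r_k| for k = 1..max_lag so that the weights sum to 1."""
    magnitudes = [abs(autocorrelation(series, k)) for k in range(1, max_lag + 1)]
    total = sum(magnitudes)
    if total == 0:
        return [1.0 / max_lag] * max_lag  # assumption: fall back to equal weights
    return [m / total for m in magnitudes]


def lag_transition_matrix(states, n_states, lag):
    """Transition matrix estimated from pairs of states that are `lag` steps apart."""
    counts = [[0] * n_states for _ in range(n_states)]
    for origin, target in zip(states, states[lag:]):
        counts[origin][target] += 1
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total for c in row] if total else [1.0 / n_states] * n_states)
    return rows


def weighted_forecast(series, states, n_states, max_lag):
    """Combine the lag-k predictions for the next time step using the lag weights."""
    weights = lag_weights(series, max_lag)
    combined = [0.0] * n_states
    for k, w in enumerate(weights, start=1):
        matrix = lag_transition_matrix(states, n_states, k)
        origin = states[-k]  # state observed k steps before the predicted time
        for j in range(n_states):
            combined[j] += w * matrix[origin][j]
    best = max(range(n_states), key=combined.__getitem__)
    return best, combined


# Hypothetical daily visit counts, discretized into two states at a threshold of 4.
visits = [3, 5, 4, 6, 2, 5, 7, 3, 4, 6]
visit_states = [0 if v < 4 else 1 for v in visits]
predicted_state, probabilities = weighted_forecast(visits, visit_states, 2, max_lag=3)
```

The predicted state is the one with the largest combined probability; using several lags lets observations further back contribute in proportion to how strongly the series is correlated at that lag.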

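To verify whether the proposed model is equally sound and effective in practice, comparative experiments were carried out. The experimental data were extracted from the public database of a large network. The collected data were cleaned beforehand by removing entries with a high rate of missing values, and the classification models provided through the scikit-learn interface were trained on the data set.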
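The cleaning step can be sketched as below, under the assumption that it means dropping fields whose share of missing values is too high. The 0.5 threshold, the function name, and the field names are hypothetical; the paper does not state them.

```python
def drop_sparse_fields(records, max_missing_rate=0.5):
    """Keep only the fields whose fraction of missing (None) values is at most the limit."""
    fields = sorted({key for record in records for key in record})
    n = len(records)
    kept = [
        field for field in fields
        if sum(record.get(field) is None for record in records) / n <= max_missing_rate
    ]
    return [{field: record.get(field) for field in kept} for record in records]


# Hypothetical log records with a mostly empty "referrer" field.
raw = [
    {"dwell_seconds": 12, "referrer": None},
    {"dwell_seconds": 30, "referrer": None},
    {"dwell_seconds": None, "referrer": "ad"},
]
cleaned = drop_sparse_fields(raw)
```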
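First, the proposed model is compared with the models of references [1] and [2]. Each of the three models is applied to the users' online behavior over the same time period to produce a final revisit behavior prediction. The mean error and the standard deviation of the user revisit behavior predictions of the three models are shown in Figure 2 and Figure 3, respectively.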

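Figure 2 Mean error of user revisit behavior prediction for the three models
Figures 2 and 3 show clearly that, as the data volume increases, the proposed model has the smallest mean error and standard deviation in user revisit behavior prediction. The model of reference [2] has a somewhat lower mean error than that of reference [1], whereas the model of reference [1] has a somewhat lower standard deviation than that of reference [2].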

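Figure 3 Standard deviation of user revisit behavior prediction for the three models
Next, five metrics are used to further assess the user revisit behavior prediction performance of the three models: recall, precision, F1 score, accuracy (ACC), and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. User revisit behavior prediction is essentially a binary classification problem. According to the true class of a sample and the class predicted by the algorithm, each prediction is a true positive (TP), false positive (FP), true negative (TN), or false negative (FN), and the four counts sum to the total number of samples. TP+FP is the number of samples predicted as positive and TP+FN the number of actual positive samples; FN+TN is the number of samples predicted as negative and FP+TN the number of actual negative samples.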



ACC is a performance metric defined as the ratio of the number of correctly classified samples to the total number of samples.
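For reference, the four count-based metrics can be computed from TP, FP, TN, and FN as in the sketch below. It uses the standard textbook definitions, which are assumed to match the ones intended here; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall, precision, F1 score, and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0          # share of actual positives found
    precision = tp / (tp + fp) if tp + fp else 0.0       # share of predicted positives that are right
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)           # share of all samples classified correctly
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for 100 samples.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```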

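Applying a model to the training set gives a predicted probability for each sample. This probability is compared with a probability threshold: a sample whose predicted probability exceeds the threshold is assigned to the positive class, and otherwise to the negative class. Sorting the training set by predicted probability then gives the final prediction performance of the algorithm. To compare the three models more fairly and accurately, 10-fold cross-validation is used to compile the final experimental results; the user revisit behavior prediction results of the three models are given in Table 3.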
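The evaluation steps in this paragraph can be sketched in plain Python as follows. The 0.5 threshold, the interleaved fold assignment, and the function names are assumptions; the paper does not give its threshold or how folds were formed. The AUC is computed from its ranking interpretation, the fraction of positive-negative pairs in which the positive sample receives the higher probability.

```python
def classify(probabilities, threshold=0.5):
    """Label a sample positive (1) only when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold i holds out every k-th sample starting at i."""
    indices = list(range(n_samples))
    for i in range(k):
        test = indices[i::k]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test


def auc(labels, probabilities):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [p for y, p in zip(labels, probabilities) if y == 1]
    negatives = [p for y, p in zip(labels, probabilities) if y == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for four samples.
example_labels = [0, 0, 1, 1]
example_probs = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(example_labels, example_probs)
folds = list(k_fold_indices(25, k=10))
```

In a 10-fold run, a model would be fitted on each `train` index list and scored on the matching `test` list, and the per-fold metrics would then be averaged.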

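Table 3 User revisit behavior prediction results of the three models
Table 3 shows that the proposed model gives the best result of the three models on every metric, which indicates that it predicts the revisit behavior of 5G network users most accurately. The reason is that the proposed model uses the multi-order weighted Markov chain model to analyze and process the telecom big data order by order: computing the one-step transition probability matrix for each step length captures the features of users' historical online behavior, and progressively deeper analysis of these features yields the revisit behavior prediction.
In the 5G telecom network environment, this paper uses a multi-order weighted Markov chain model to extract users' historical online-behavior feature data from big data and analyzes these data to determine users' browsing habits and preferences, so that revisit behavior can be predicted accurately and efficiently. In comparative experiments against other models, the proposed model showed the best prediction performance and predicted user revisit behavior accurately.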
[1] LU Y H, SONG J L, WANG M, et al. The study on the prediction model based on deep neural network together with sparse group lasso[J]. Chinese Journal of Health Statistics, 2021, 38(6): 821-827. (in Chinese)
[2] GU Y R, WANG Y, YANG H G. Multi-action click prediction model for short video users based on user’s behavior sequence[J]. Journal of Electronics & Information Technology, 2023: 10.11999/JEIT211458. (in Chinese)
[3] CAO W C, WANG K, GAN H C, et al. User online purchase behavior prediction based on fusion model of CatBoost and Logit[J]. Journal of Physics: Conference Series, 2021, 2003(1): 012011.
[4] LI H R, LIN F Q, LU X, et al. Systematic analysis of fine-grained mobility prediction with on-device contextual data[J]. IEEE Transactions on Mobile Computing, 2022, 21(3): 1096-1109.
[5] QIAO S B, PANG S C, WANG M, et al. Online video popularity regression prediction model with multichannel dynamic scheduling based on user behavior[J]. Chinese Journal of Electronics, 2021, 30(5): 876-884.
[6] NIU B, SUI L, TANG J R, et al. Prediction of microblog users’ forwarding behavior based on interactive and active information[C]//Proceedings of the 2020 International Conference on Aviation Safety and Information Technology. New York: ACM Press, 2020: 554-559.
[7] XIAO Y P, LI J H, ZHU Y F, et al. User behavior prediction of social hotspots based on multimessage interaction and neural network[J]. IEEE Transactions on Computational Social Systems, 2020, 7(2): 536-545.
[8] HU G Y, ZHOU Z J, HU C H, et al. Hidden behavior prediction of complex system based on time-delay belief rule base forecasting model[J]. Knowledge-Based Systems, 2020, 203: 106147.
[9] SUDAN B, CANSIZ S, OGRETICI E, et al. Prediction of success and complex event processing in E-learning[C]//Proceedings of 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). Piscataway: IEEE Press, 2020: 1-6.
[10] SOLTANI N Y. Online learning of sparse Gaussian conditional random fields with application to prediction of energy consumers behavior[C]//Proceedings of 2021 IEEE Statistical Signal Processing Workshop (SSP). Piscataway: IEEE Press, 2021: 486-490.
[11] SUN L T, GAO S W, WANG L. An automatic test sequence generation method based on Markov chain model[C]//Proceedings of 2021 World Conference on Computing and Communication Technologies (WCCCT). Piscataway: IEEE Press, 2021: 91-96.
[12] DENNIS L A, FU Y, SLAVKOVIK M. Markov chain model representation of information diffusion in social networks[J]. Journal of Logic and Computation, 2022, 32(6): 1195-1211.
[13] PENG L, WEN L, QIANG L, et al. Research on complexity model of important product traceability efficiency based on Markov chain[J]. Procedia Computer Science, 2020, 166: 456-462.
[14] HAN C, CHEN J, TAN M K, et al. A tensor-based Markov chain model for heterogeneous information network collective classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(9): 4063-4076.
[15] CRUZ I R, LINDSTRÖM J, TROFFAES M C M, et al. Iterative importance sampling with Markov chain Monte Carlo sampling in robust Bayesian analysis[J]. Computational Statistics & Data Analysis, 2022, 176: 107558.
[16] ALAMOUDI A, LIU M L, PAYANI A, et al. Predicting mobile users traffic and access-time behavior using recurrent neural networks[C]//Proceedings of 2021 IEEE Wireless Communications and Networking Conference (WCNC). Piscataway: IEEE Press, 2021: 1-6.
[17] LIU K, TATINATI S, KHONG A W H. A weighted feature extraction technique based on temporal accumulation of learner behavior features for early prediction of dropouts[C]//Proceedings of 2020 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE). Piscataway: IEEE Press, 2021: 295-302.
[18] SETIA S, JYOTI V, DUHAN N. HPM: a hybrid model for user’s behavior prediction based on N-gram parsing and access logs[J]. Scientific Programming, 2020: 1-18.
[19] CHEN L Y, WANG L H, ZHOU Y X. Research on data mining combination model analysis and performance prediction based on students’ behavior characteristics[J]. Mathematical Problems in Engineering, 2022: 1-10.
[20] RASOULI A, ROHANI M, LUO J. Bifold and semantic reasoning for pedestrian behavior prediction[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2022: 15580-15590.
[21] ZHOU H, YU K M, CHEN Y C, et al. A hybrid feature selection method RFSTL for manufacturing quality prediction based on a high dimensional imbalanced dataset[J]. IEEE Access, 2021, 9: 29719-29735.
[22] JIANG L, LIU H, JIANG H, et al. Heuristic and neural network based prediction of project-specific API member access[J]. IEEE Transactions on Software Engineering, 2022, 48(4): 1249-1267.
A prediction model of massive 5G network users’ revisit behavior based on telecom big data
SUN Yudi
School of Digital Commerce, Jiangsu Vocational Institute of Commerce, Nanjing 211168, China
Users in 5G networks generate a large amount of access data, which makes it difficult to predict users’ revisit behavior accurately. Therefore, a prediction model of massive 5G network users’ revisit behavior based on telecom big data was proposed. Users’ historical online behavior feature data were extracted from the telecom big data to build a data set. A multi-order weighted Markov chain model was introduced. The model weights were obtained by calculating the autocorrelation coefficient of each order, and the model statistic was calculated. After analysis, the one-step transition probability matrix of the Markov chain for each step length was obtained, so as to accurately predict the revisit behavior of massive users in 5G networks. The experimental results show that the proposed model has the lowest mean error and standard deviation, as well as the highest accuracy, recall, precision, and F1 score, which indicates that the proposed method has a clear advantage in predicting users’ revisit behavior.
telecom big data, prediction of users’ revisit behavior, multi-order weighted Markov chain model, one-step transition probability matrix, autocorrelation coefficient
CLC number: TP357
Document code: A
DOI: 10.11959/j.issn.1000-0801.2023026

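SUN Yudi (1981– ), female, is an associate professor at the School of Digital Commerce, Jiangsu Vocational Institute of Commerce. Her main research interests are ontology and knowledge engineering.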
2022–12–28;
2023–02–07
“Qing Lan Project” Excellent Teaching Team Program of Jiangsu Universities in 2021; “Leading Talents” Program of Jiangsu Vocational Institute of Commerce