A Novel Recommendation Method Based on Linear Regression

2017-09-18 09:11:24  WANG Zhaoguo, XIE Feng, GUAN Yi, XUE Yibo

Intelligent Computer and Applications, 2017, Issue 4


(… School, Harbin 150001; 2 Tsinghua National Laboratory for Information Science and Technology, Beijing 100084)

Document code: A    Article ID: 2095-2163(2017)04-0001-05

(1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China; 2 National Lab for Information Sci. & Tech, Tsinghua University, Beijing 100084, China)

Abstract: With the development of social media, the Internet has become not only a tool for obtaining information but also a channel for sharing it. User-generated content exposes people to information overload, so a great deal of truly valuable information is hard to find. Because it requires little user involvement, the personalized recommender system is considered one of the most promising ways to address information overload. However, collaborative filtering, currently the most mature and widely used recommendation method, faces problems such as data sparsity and diversity, and its recommendation quality is not ideal. This paper proposes a recommendation method based on linear regression. A linear regression model is built from the rating frequency information of users or items to predict users' ratings of unrated items. The method has the advantages of low complexity, incremental updating, and high accuracy.


0 Introduction

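In recent years, the spread and growth of social networks have changed the way people passively receive information, and user-generated content has grown explosively. Ordinary users, faced with massive amounts of information, struggle to find the parts that truly interest them; this is the information overload problem [1]. Portal sites categorize information by its attributes to help users index it quickly, and search engines analyze the user's query to return the most relevant content. Although both greatly improve the efficiency with which users obtain information, they require too much user involvement and cannot sense user interests automatically; moreover, users often do not know what they want, or cannot describe their interests effectively with keywords. In addition, the results returned by categorization and search are barely personalized, and the user experience is poor. Recommender systems [2] analyze users' historical behavior, build a personalized interest model for each user, and proactively push content that may interest the user; this is regarded as the most promising approach to solving information overload.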

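Collaborative filtering [3] is currently the most widely used recommendation method in personalized recommender systems. Wikipedia defines collaborative filtering as "using the preferences of a group with similar interests and shared experience to recommend information of interest to the user". Collaborative filtering methods fall into two classes [3]: heuristic-based methods [4-9] and model-based methods [10-16]. Heuristic-based methods derive a user-item rating matrix from users' implicit or explicit behavior toward items, then compute similarities between users or between items, and finally produce rating predictions and recommendations from the ratings and similarities of neighboring users or items. Depending on whether similarity is computed between users or between items, heuristic-based methods are further divided into user-based collaborative filtering [4] and item-based collaborative filtering [17]. Because they are easy to deploy and efficient, heuristic methods are already widely used in commercial systems such as Amazon. However, problems such as data sparsity and diversity make it difficult to improve their recommendation performance effectively.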

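To improve recommendation accuracy, model-based methods use the user-item rating matrix to train more accurate rating prediction models, such as clustering [16, 18], Bayesian belief networks [6, 19], Markov decision processes [20], and latent semantic models [21]. Although model-based methods improve prediction accuracy, they suffer from complex models, many parameters, and a strong dependence on the statistical properties of the dataset, which is an important reason why they are difficult to apply in practical recommender systems.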

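This paper proposes a recommendation method based on linear regression. The method uses the rating frequency information of users or items to build a linear regression model between a given rating of a user or item and its most frequent rating, and then uses this model to predict unknown ratings directly from historical rating frequencies. Compared with traditional methods, it greatly reduces computational complexity, so that the algorithm completes all computation in O(n) time and is easy to apply in industrial production; it draws on collective intelligence and estimates model parameters from statistical information, which gives it good robustness to noise; and it supports incremental updating, handling newly generated user behavior in constant time, so its real-time performance is good.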
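The section that defines the model is not part of this excerpt, so the following is only a minimal sketch of the two ingredients named above: a rating-frequency histogram that can be updated in constant time, and an ordinary least-squares line relating actual ratings to a most frequent rating. The names `RatingFrequency` and `fit_line`, the 1 to 5 rating scale, and the tie-breaking rule are assumptions made for illustration, not the authors' formulation.

```python
from collections import defaultdict


class RatingFrequency:
    """Per-key histogram of ratings (hypothetical helper, 1..5 scale assumed)."""

    def __init__(self, scale=(1, 2, 3, 4, 5)):
        self.scale = scale
        # One small fixed-size histogram per user (or per item).
        self.counts = defaultdict(lambda: dict.fromkeys(scale, 0))

    def add(self, key, rating):
        # Incremental update: one counter increment per new rating.
        self.counts[key][rating] += 1

    def mode(self, key):
        # Most frequent rating; ties go to the higher rating (an assumption).
        hist = self.counts[key]
        return max(self.scale, key=lambda r: (hist[r], r))


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All x values are identical: fall back to the mean of y.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x
```

In this sketch, `xs` would hold the most frequent rating of the user (or item) behind each training rating and `ys` the actual ratings; a prediction is then `a * mode + b`.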

2.4 Experimental Results

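To compare how well different methods tolerate data sparsity, this section splits the MovieLens 1M dataset into training and test sets at different ratios. The training-set proportion x% increases from 10% to 90% in steps of 10%. The methods described in Section 2.3 are compared on rating prediction accuracy, classification accuracy, and model-building and prediction time, as detailed below.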
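The splitting protocol can be sketched as follows. The text does not say how ratings are assigned to each side, so a seeded random shuffle is assumed here; `split_ratings` and `FRACTIONS` are hypothetical names.

```python
import random


def split_ratings(ratings, train_fraction, seed=0):
    """Shuffle the ratings and cut them into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so a split can be reproduced
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Training proportions from 10% to 90% in steps of 10%.
FRACTIONS = [step / 10 for step in range(1, 10)]
```

Each element of `ratings` can be any record, for example a `(user, item, rating)` tuple; smaller training fractions correspond to sparser training data.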

2.4.1 Prediction Accuracy

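To measure the rating prediction accuracy of the linear-regression-based recommendation method, this paper uses the error metrics MAE and RMSE introduced in Section 2.2. Because RMSE is the more widely used of the two, only the RMSE comparison of the methods under different degrees of data sparsity is reported here, as shown in Fig. 1.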

Fig. 1 RMSE comparison experiment

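Fig. 1 shows that, for every training/test split ratio, the RMSE of the linear-regression-based method is far lower than that of item-based collaborative filtering, that is, its prediction accuracy is higher. It also shows that simply using a weighted value of the user's rating frequency and the item's rating frequency as the prediction is fairly accurate as well, which indicates that rating frequency information is of considerable value for rating prediction. When the data are sparsest, with a training-set proportion of only 10%, the RMSE of the linear-regression-based method is very close to that of the item average rating; as the training-set proportion increases, however, the linear-regression-based method shows a clear advantage in rating prediction accuracy.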
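Section 2.2 is not included in this excerpt; assuming it uses the standard definitions of the two error metrics, they can be computed as in the sketch below (the function names are illustrative).

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large errors more than MAE."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

Lower values are better for both metrics, which is why the lower RMSE curve in Fig. 1 corresponds to higher prediction accuracy.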

2.4.2 Classification Accuracy

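Rating prediction accuracy measures the gap between predicted and actual ratings, whereas a real system only cares whether the top N items recommended to a user match the user's interests. This paper therefore compares both the predicted rating and the actual rating with a preference threshold (the dataset uses a 5-point scale, and the threshold is set to 3) and checks whether the predicted preference of the user for the item agrees with the actual one; this is the preference classification metric discussed in Section 2.2. Because neither precision nor recall alone can measure the classification performance of the predictions, only their combined metric, the F-measure, is compared here, as shown in Fig. 2.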

Fig. 2 F-measure comparison experiment

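Fig. 2 shows that, as the training-set proportion increases, the F-measure of every method except item-based collaborative filtering rises and stays far above that of item-based collaborative filtering. In addition, the linear-regression-based method performs best among all methods: once the training-set proportion exceeds 30%, its F-measure is higher than that of the item-average-rating method.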
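The thresholded classification metric can be sketched as below. Two details are assumptions, since the text does not state them: a rating counts as "liked" when it is greater than or equal to the threshold, and the F value is the balanced F1 score. The function name `f_measure` is illustrative.

```python
def f_measure(predicted, actual, threshold=3.0):
    """Return (precision, recall, F1) for thresholded like/dislike labels."""
    pred_like = [p >= threshold for p in predicted]  # assumed: >= means "liked"
    true_like = [a >= threshold for a in actual]
    tp = sum(p and t for p, t in zip(pred_like, true_like))
    fp = sum(p and not t for p, t in zip(pred_like, true_like))
    fn = sum(t and not p for p, t in zip(pred_like, true_like))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, a method cannot score well by maximizing only one of the two.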

2.4.3 Time Performance

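Production environments always impose requirements on the response time of recommendations, and large real-world systems with more than a hundred million users and items are especially sensitive to algorithm running time. This section compares the model-building time and prediction time of the linear-regression-based method with those of the methods introduced in Section 2.3. The results are shown in Table 2, where IA denotes Item Average, IC denotes Item Correlation, RF denotes Rating Frequency, and LR denotes Linear Regression.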

Tab. 2 Comparison of modeling time and prediction time

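Training ratio   Time         IA       IC        RF      LR
20%              T_model      0.167    29.019    0.159   0.250
                 T_predict    4.028    9.107     0.954   16.639
40%              T_model      0.302    59.133    0.268   0.441
                 T_predict    3.179    9.096     0.674   21.928
60%              T_model      0.455    84.834    0.370   0.689
                 T_predict    2.153    5.577     0.462   22.133
80%              T_model      0.585    110.368   0.474   1.136
                 T_predict    1.121    6.430     0.256   13.485

The item-average-rating method only needs to compute the average rating of each item during model building, and the method based on user and item rating frequencies likewise only needs to compute the most frequent rating of each user and of each item, so model building is very fast for these two methods. Item-based collaborative filtering takes the longest to build: it must compute the Pearson correlation coefficient between every pair of items (the item similarity), select for each item a fixed number of most similar items as its neighbor set, and then, during rating prediction, weight the user's ratings of the items in the neighbor set of the unrated item to obtain the prediction. Overall, the model-building and prediction times of the linear-regression-based method are fairly competitive and can meet the prediction-time requirements of real systems.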
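To make the cost of the item-based baseline concrete, here is a minimal sketch of Pearson item similarity and a similarity-weighted prediction. It is a simplified variant, not the paper's implementation: means are taken over co-rating users only, neighbors are chosen at prediction time among items the user has rated, and non-positive similarities are ignored. `pearson`, `predict`, and the parameter `k` are hypothetical names.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two items over the users who rated both.

    Each argument maps user -> rating for one item.
    """
    common = ratings_a.keys() & ratings_b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[u] for u in common) / n
    mean_b = sum(ratings_b[u] for u in common) / n
    cov = sum((ratings_a[u] - mean_a) * (ratings_b[u] - mean_b) for u in common)
    var_a = sum((ratings_a[u] - mean_a) ** 2 for u in common)
    var_b = sum((ratings_b[u] - mean_b) ** 2 for u in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def predict(user, target, item_ratings, k=2):
    """Similarity-weighted average of the user's ratings on up to k neighbors."""
    scored = []
    for other, ratings in item_ratings.items():
        if other != target and user in ratings:
            sim = pearson(item_ratings[target], ratings)
            if sim > 0:  # keep only positively correlated neighbors
                scored.append((sim, ratings[user]))
    scored.sort(reverse=True)
    top = scored[:k]
    if not top:
        return None  # no usable neighbor for this user and item
    return sum(s * r for s, r in top) / sum(s for s, _ in top)
```

Computing `pearson` for every pair of items grows quadratically with the number of items, which is consistent with IC having by far the largest modeling times in Tab. 2.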

3 Conclusion

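This paper proposed a recommendation method based on linear regression. The method uses the rating frequency information of users and of items to build a user-based linear regression model and an item-based linear regression model, uses both models to predict a user's rating of an unrated item, and weights the two predictions to obtain the final result. Experiments on a public dataset show that the method not only predicts ratings more accurately than the current mainstream method (item-based collaborative filtering) but also achieves better classification accuracy. In addition, its time performance is far better than that of item-based collaborative filtering, so it can be adopted by larger real-world systems. Future work will further improve the method, build the linear regression models by incremental updating, and deploy the method in a real system to test its recommendation quality.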
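The final weighting step can be sketched as a convex combination of the two model outputs. The weight `alpha`, its default of 0.5, and the clamping to a 1 to 5 scale are assumptions for illustration; this excerpt does not state how the weights are chosen.

```python
def blend(user_pred, item_pred, alpha=0.5, low=1.0, high=5.0):
    """Combine user-model and item-model predictions into one rating."""
    score = alpha * user_pred + (1 - alpha) * item_pred
    # Keep the result inside the rating scale (an added safeguard).
    return min(high, max(low, score))
```

For example, `blend(4.0, 3.0)` averages the two predictions, while a smaller `alpha` shifts the result toward the item-based model.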

References:

[1] LIU Jianguo, ZHOU Tao, WANG Binghong. Research progress of personalized recommendation systems[J]. Progress in Natural Science, 2009, 19(1): 1-15.

[2] RESNICK P, VARIAN H R. Recommender systems[J]. Communications of the ACM, 1997, 40(3): 56-58.

[3] ADOMAVICIUS G, TUZHILIN A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(6): 734-749.

[4] RESNICK P, IACOVOU N, SUCHAK M, et al. GroupLens: An open architecture for collaborative filtering of netnews[C]// Proceedings of the 1994 ACM conference on Computer supported cooperative work. Chapel Hill, North Carolina, USA: ACM, 1994: 175-186.

[5] SHARDANAND U, MAES P. Social information filtering: Algorithms for automating "word of mouth"[C]// Proceedings of the SIGCHI conference on human factors in computing systems. Denver, USA: ACM Press, 1995: 210-217.

[6] BREESE J S, HECKERMAN D, KADIE C. Empirical analysis of predictive algorithms for collaborative filtering[C]// Proceedings of the Fourteenth conference on uncertainty in artificial intelligence. Madison, Wisconsin: ACM, 1998: 43-52.

[7] DELGADO J, ISHII N. Memory-based weighted majority prediction for recommender systems[C]// ACM SIGIR 1999 Workshop on Recommender Systems: Algorithms and Evaluation. Berkeley UC: Citeseer, 1999: 1-5.

[8] NAKAMURA A, ABE N. Collaborative filtering using weighted majority prediction algorithms[C]// ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, 1998: 395-403.

[9] YANG J M, LI K F. Recommendation based on rational inferences in collaborative filtering[J]. Knowledge-Based Systems, 2009, 22(1): 105-114.

[10] GOLDBERG K, ROEDER T, GUPTA D, et al. Eigentaste: A constant time collaborative filtering algorithm[J]. Information Retrieval, 2001, 4(2): 133-151.

[11] BILLSUS D, PAZZANI M J. Learning collaborative information filters[C]// ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, 1998: 46-54.

[12] GETOOR L, SAHAMI M. Using probabilistic relational models for collaborative filtering[C]// Workshop on Web Usage Analysis and User Profiling (WEBKDD'99). New York, NY, USA: Citeseer, 1999: 1-6.

[13] HOFMANN T. Collaborative filtering via gaussian probabilistic latent semantic analysis[C]// Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. Toronto, Canada: ACM, 2003: 259-266.

[14] MARLIN B M. Modeling user rating profiles for collaborative filtering[C]// Advances in neural information processing systems. Vancouver and Whistler, British Columbia, Canada: DBLP, 2003: 1-8.

[15] PAVLOV D X, PENNOCK D M. A maximum entropy approach to collaborative filtering in dynamic, sparse, high-dimensional domains[C]// Advances in neural information processing systems. Cambridge, MA, USA: MIT Press, 2002: 1441-1448.

[16] UNGAR L H, FOSTER D P. Clustering methods for collaborative filtering[C]// AAAI workshop on recommendation systems. Menlo Park, California: AAAI Press, 1998: 1-16.

[17] SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C]// Proceedings of the 10th international conference on World Wide Web. Hong Kong: ACM, 2001: 285-295.

[18] CHEE S H S, HAN J, WANG K. RecTree: An efficient collaborative filtering method[M]// KAMBAYASHI Y, WINIWARTER W, ARIKAWA M. Data Warehousing and Knowledge Discovery. DaWaK 2001. Lecture Notes in Computer Science. Berlin: Springer, 2001: 141-151.

[19] SU X, KHOSHGOFTAAR T M. Collaborative filtering for multi-class data using belief nets algorithms[C]// Tools with Artificial Intelligence, 2006. ICTAI'06. 18th IEEE International Conference on. Arlington, VA: IEEE, 2006: 497-504.

[20] SHANI G, HECKERMAN D, BRAFMAN R I. An MDP-based recommender system[J]. The Journal of Machine Learning Research, 2005, 6: 1265-1295.

[21] HOFMANN T. Latent semantic models for collaborative filtering[J]. ACM Transactions on Information Systems (TOIS), 2004, 22(1): 89-115.

[22] SARWAR B, KARYPIS G, KONSTAN J, et al. Analysis of recommendation algorithms for e-commerce[C]// Proceedings of the 2nd ACM conference on Electronic commerce. Minneapolis, Minnesota, USA: ACM, 2000: 158-167.

[23] KARATZOGLOU A, AMATRIAIN X, BALTRUNAS L, et al. Multiverse recommendation: N-dimensional tensor factorization for context-aware collaborative filtering[C]// Proceedings of the fourth ACM conference on Recommender systems. Barcelona, Spain: ACM, 2010: 79-86.

[24] JAMALI M, ESTER M. A matrix factorization technique with trust propagation for recommendation in social networks[C]// Proceedings of the fourth ACM conference on Recommender systems. Barcelona, Spain: ACM, 2010: 135-142.

[25] CREMONESI P, KOREN Y, TURRIN R. Performance of recommender algorithms on top-n recommendation tasks[C]// Proceedings of the fourth ACM conference on Recommender systems. Barcelona, Spain: ACM, 2010: 39-46.

[26] SARWAR B, KARYPIS G, KONSTAN J, et al. Application of dimensionality reduction in recommender system - a case study[R]. Minneapolis: University of Minnesota, 2000.