A User-Recommendation Method Based on Social Media

2014-03-22 05:51:41

ZTE Communications, 2014, No. 1

Hong Chen1, Shengmei Luo1, Lei Hu1, and Xiuwen Wang2

(1. ZTE Corporation, Nanjing 210012, China;

2. National Computer Network Emergency Response Technical Team, Beijing 100029, China)

User-analysis techniques are mainly used to recommend friends and information. This paper discusses the data characteristics of microblog users and describes a multidimensional user-recommendation algorithm that takes into account microblog length, the relevance between microblogs and users, and the familiarity between users. The experimental results show that this multidimensional algorithm is more accurate than a traditional recommendation algorithm.

Keywords: social media; user recommendation; information recommendation; relation analysis

1 Introduction

With the rapid development of the Internet, the way information is exchanged has been changing. In the Web 1.0 era, information was propagated through simple static webpages; in the Web 2.0 era, however, information is published, propagated, and commented on dynamically through chains of user relationships. Web 2.0 is like a virtual society in which every user can freely share information, and interaction between users is receiving more attention. Users are not only consumers but also producers of information. In the Web 1.0 era, websites were mainly news portals (e.g., Sohu.com and Sina.com) and search engines (e.g., Google and Baidu). In the Web 2.0 era, websites are mainly online communities, online applications, social networking sites, blogs, and wikis.

The most important difference between Web 1.0 and Web 2.0 is the central position of users and user relationships. As in the real world, every Internet user has their own interests, which can be inferred from their online behavior, and some users even form common-interest groups.

This paper examines certain characteristics of Sina Weibo microblog users in order to determine their interest relationships and behaviors (e.g., publishing, forwarding, commenting, and adding contacts). Such analysis can help users better establish their own microblog relationship networks.

2 Related Work

Analyzing Weibo users involves analyzing their relationships, the subjects they talk about, their interests, and their ego networks. The results of such analysis can be used for making recommendations, advertising, or providing information. Much research on social networking technologies has been done both in China and abroad.

In general, mainstream recommendation algorithms are based on collaborative filtering, content, or graphs.

2.1 Collaborative Filtering

Recommendation algorithms based on collaborative filtering may be either user-based (UserCF) or item-based (ItemCF).

The main idea behind a user-based collaborative filtering algorithm is that items liked by users whose interests resemble the target user's are likely to be liked by the target user as well. First, the algorithm finds a set of users whose interests are similar to those of the target user. Then, from this set, it finds items that the target user has not yet interacted with and recommends them. The key to the first step is calculating the similarity between the interests of any two users, which collaborative filtering derives mainly from the similarity of their behaviors.
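The two UserCF steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a production recommender: each user is represented only by the set of items they interacted with (implicit feedback), similarity is cosine similarity over those sets, and the function names (`user_similarity`, `user_cf`) and toy data are hypothetical.

```python
from math import sqrt

def user_similarity(items_a: set, items_b: set) -> float:
    """Cosine similarity between two users' sets of interacted items."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))

def user_cf(history: dict, target: str, k: int = 2, n: int = 3) -> list:
    """Recommend up to n unseen items to `target` from its k most similar users."""
    seen = history[target]
    # Step 1: rank the other users by behavioral similarity and keep the top k.
    neighbors = sorted(
        ((user_similarity(seen, items), user)
         for user, items in history.items() if user != target),
        reverse=True,
    )[:k]
    # Step 2: score each unseen item by the summed similarity of the neighbors who chose it.
    scores = {}
    for sim, user in neighbors:
        for item in history[user] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]

# Hypothetical toy data: each user maps to the set of items they interacted with.
history = {
    "alice": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "carol": {"c", "e"},
    "dave": {"x"},
}
print(user_cf(history, "alice"))  # ['d', 'e']
```

Summing neighbor similarities rewards items chosen by several close neighbors; weighting by explicit ratings or down-weighting very popular items are common variations not shown here.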

ItemCF algorithms have been widely applied in industry; the recommendation algorithms of Amazon, Netflix, Hulu, and YouTube are all based on ItemCF. The main idea behind ItemCF is that a user tends to prefer items similar to those they liked before, so the algorithm makes recommendations according to the user's past favorites. Note that ItemCF does not use the content of items to calculate item similarity; instead, it analyzes users' behavior. The algorithm calculates the similarity between items and then generates a recommendation list based on these similarities and the user's historical behavior.
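A comparably minimal ItemCF sketch follows, again assuming implicit feedback. Item similarity is computed purely from behavior as w_ij = |N(i) ∩ N(j)| / sqrt(|N(i)| |N(j)|), where N(i) is the set of users who chose item i; this particular normalization and the names `item_similarity` and `item_cf` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarity(history: dict) -> dict:
    """Item-item cosine similarity computed only from user behavior (co-occurrence)."""
    count = defaultdict(int)                    # |N(i)|: number of users who chose item i
    co = defaultdict(lambda: defaultdict(int))  # number of users who chose both i and j
    for items in history.values():
        for i in items:
            count[i] += 1
        for i, j in combinations(sorted(items), 2):
            co[i][j] += 1
            co[j][i] += 1
    return {i: {j: c / sqrt(count[i] * count[j]) for j, c in row.items()}
            for i, row in co.items()}

def item_cf(history: dict, sim: dict, target: str, n: int = 3) -> list:
    """Score unseen items by their similarity to items already in the user's history."""
    seen = history[target]
    scores = defaultdict(float)
    for i in seen:
        for j, w in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += w
    return sorted(scores, key=lambda j: (-scores[j], j))[:n]

# Hypothetical toy data.
history = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"b", "c"}, "u4": {"a"}}
sim = item_similarity(history)
print(item_cf(history, sim, "u4"))  # ['b', 'c']
```

Because the item-item table depends only on behavior, it can be precomputed offline and reused for every user, which is one reason ItemCF is popular in industrial systems.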

UserCF recommendations reflect what is popular within small groups of users with similar interests, whereas ItemCF recommendations are based on the user's own historical interests. UserCF recommendations are therefore more social and reflect the popularity of items within a given small group, while ItemCF recommendations are more personal and reflect the user's own history.

2.2 Content-Based Recommendations

Content-based recommendation algorithms are theoretically derived from information retrieval [1], [2] and information filtering [3]. They do not require users to rate items; instead, they rely on machine learning to extract the user's interests from descriptive content. The algorithm extracts the features of items for which the user has given feedback, models the user's interests, calculates the similarity between the user's interests and candidate items, and makes a recommendation.
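The profile-then-match loop described above can be sketched with plain term counts. The assumptions are deliberately simple: item descriptions are whitespace-tokenized text, the user profile is the summed term counts of items the user liked, and similarity is cosine similarity; a real system would more likely use TF-IDF weighting or learned features. All names and data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_profile(liked_docs: list) -> Counter:
    """User interest model: summed term counts of items the user gave positive feedback on."""
    profile = Counter()
    for doc in liked_docs:
        profile.update(doc.lower().split())
    return profile

def content_recommend(liked_docs: list, candidates: dict, n: int = 2) -> list:
    """Rank candidate items by similarity between their description and the user profile."""
    profile = build_profile(liked_docs)
    scored = {cid: cosine(profile, Counter(text.lower().split()))
              for cid, text in candidates.items()}
    return sorted(scored, key=lambda c: (-scored[c], c))[:n]

liked = ["cloud computing platform", "mobile cloud services"]
candidates = {
    "p1": "cloud storage for mobile",
    "p2": "football match report",
    "p3": "computing cloud",
}
print(content_recommend(liked, candidates))  # ['p3', 'p1']
```

No other user's behavior is consulted, which illustrates both the strength (new items can be scored immediately) and the weakness (no collective judgment) discussed below.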

Content-based recommendation algorithms make use of a user's historical data, so the user's interest model can change as their preferences change. The merits of this kind of algorithm are [4], [5]:

·There are no cold-start or sparse matrix problems.

·Recommendations can be made to users with special interests.

·New or unpopular items can be recommended.

·The contents of recommended items can be listed，and the results can be explained.

·Content-modeling technology is relatively mature.

The drawbacks of content-based recommendation algorithms are that the content must be easy to extract into meaningful features, the content must be well structured, and the user's interests must be expressible in terms of the content's features. Also, these algorithms cannot draw on the judgments of other users.

2.3 Latent-Factor Model

The latent-factor model is used in both recommendation systems and machine learning. It is an effective way of calculating semantic distance for classification and clustering. The main idea behind a latent-factor model is to link the user's interests to items through latent factors. Latent-factor techniques have produced many well-known models and methods, including pLSA, LDA, the latent class model, the latent topic model, and matrix factorization. These are essentially similar, and some of them can be used in personalized recommendation systems.
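Matrix factorization, one of the latent-factor methods listed above, can be illustrated with a short stochastic-gradient-descent sketch. Each user u and item i receive k-dimensional latent vectors P[u] and Q[i], and the model predicts r_ui ≈ P[u] · Q[i]. The rank k, learning rate, regularization weight, epoch count, and toy ratings are assumptions chosen for illustration only.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=1000, seed=0):
    """Learn latent factors P (users) and Q (items) so that r_ui ≈ P[u] · Q[i], using plain SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted preference of user u for item i: dot product of their latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

# Hypothetical observed (user, item, rating) triples; unobserved pairs are what we want to predict.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = factorize(ratings, n_users=3, n_items=3)
rmse = (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(f"training RMSE: {rmse:.3f}; predicted r(0, 2): {predict(P, Q, 0, 2):.2f}")
```

The latent dimensions are not labeled in advance; they play the role of the hidden "interest classes" that link users to items.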

3 User Recommendation

The main purpose of user recommendation is to recommend new friends to a target user according to the user's existing friends and historical behavior. This increases the overall density and activity of social networks.

In social networks, user recommendation is known as link prediction. In [6], various methods for predicting links between users were studied and compared. Here, we introduce a user-recommendation algorithm based on the social network graph and content matching.

The main idea behind content-based matching is that friends tend to have similar or related content and personal attributes (e.g., company, school, labels, location, and IP address).
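One simple way to quantify attribute overlap is Jaccard similarity over (field, value) pairs. The sketch below assumes that representation and that multi-valued fields such as labels are stored as sets; the field names, values, and the choice of Jaccard are hypothetical, and an IP address could be added as a further field in the same way.

```python
def attribute_pairs(profile: dict) -> set:
    """Flatten a profile into (field, value) pairs; multi-valued fields such as labels expand."""
    pairs = set()
    for field, value in profile.items():
        values = value if isinstance(value, (set, list, tuple)) else [value]
        pairs.update((field, v) for v in values if v)
    return pairs

def attribute_similarity(a: dict, b: dict) -> float:
    """Jaccard similarity of two users' personal attributes."""
    pa, pb = attribute_pairs(a), attribute_pairs(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0

u = {"company": "ZTE", "school": "University A",
     "labels": {"data mining", "cloud"}, "location": "Nanjing"}
v = {"company": "ZTE", "school": "University B",
     "labels": {"data mining"}, "location": "Nanjing"}
print(attribute_similarity(u, v))  # 0.5
```

Matching on (field, value) pairs rather than raw values keeps, for example, a school name from accidentally matching an identical company name.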

User recommendation based on social network graphs is widely used: friends of the target user's friends are recommended to the target user. Relationships on microblogs can be roughly categorized as following, followed, and mutual following, so a natural policy is to recommend by followings or by mutual followings. Sina Weibo uses this method. However, recommendation by followings is flawed in that celebrities always appear at the top of the list. Recommendation by mutual followings works much better because few users are mutually followed by a celebrity.
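The difference between recommending by followings and by mutual followings can be reproduced on a toy graph. This sketch assumes candidates are ranked by the number of the target's neighbors who link to them (common-neighbor counting); the graph, user names, and functions are hypothetical.

```python
from collections import Counter

def mutual_graph(follows: dict) -> dict:
    """Keep only edges where both users follow each other."""
    return {u: {v for v in vs if u in follows.get(v, set())} for u, vs in follows.items()}

def friends_of_friends(graph: dict, target: str, n: int = 3) -> list:
    """Rank two-hop neighbors by how many of the target's neighbors link to them."""
    direct = graph.get(target, set())
    counts = Counter(
        c for f in direct for c in graph.get(f, set())
        if c != target and c not in direct
    )
    return sorted(counts, key=lambda c: (-counts[c], c))[:n]

# follows[x] is the set of users that x follows.
follows = {
    "u": {"a", "b"},
    "a": {"u", "star", "c"},
    "b": {"u", "star"},
    "c": {"a"},
    "star": set(),  # the celebrity follows no one back
}
print(friends_of_friends(follows, "u"))                # ['star', 'c']
print(friends_of_friends(mutual_graph(follows), "u"))  # ['c']
```

In the following graph the celebrity `star` ranks first because two of u's followees follow it; once only mutual edges are kept, `star` disappears because it follows no one back.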

This paper describes a user-recommendation algorithm that can be used online. We combine the two methods discussed above to reduce computational complexity and make the algorithm more suitable for online use. Suppose, for example, that a user u follows 500 other users. To make friend recommendations for u, it is very important to choose the candidates carefully.

If the candidate pool is every user on the microblog website, the algorithm becomes too computationally expensive [7], [8]. In the conventional method, u's followers and their followers are the candidates. This greatly reduces time complexity, but database operations still take a lot of time. To reduce complexity further, we propose the combined method shown in Algorithm 1.
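As a hypothetical illustration only (not the authors' Algorithm 1), the sketch below shows one way graph-based and content-based evidence could be combined while keeping the candidate pool small. It assumes a symmetric mutual-follow graph, restricts candidates to two-hop mutual-follow neighbors, and blends the fraction of shared friends with the Jaccard overlap of attribute sets using an assumed weight `alpha`; all names and data are invented for the example.

```python
def combined_scores(mutual, attrs, target, alpha=0.5, n=5):
    """Hypothetical blend of graph and content evidence over two-hop mutual-follow candidates."""
    direct = mutual.get(target, set())
    # Candidate pool: mutual friends of mutual friends, excluding the target and existing friends.
    candidates = {c for f in direct for c in mutual.get(f, set())} - direct - {target}

    def jaccard(x, y):
        union = x | y
        return len(x & y) / len(union) if union else 0.0

    scores = {}
    for c in candidates:
        # Graph evidence: fraction of the target's friends who are also friends with c
        # (direct is non-empty whenever any candidate exists).
        graph_score = len(direct & mutual.get(c, set())) / len(direct)
        # Content evidence: overlap of attribute sets (company, school, labels, ...).
        content_score = jaccard(attrs.get(target, set()), attrs.get(c, set()))
        scores[c] = alpha * graph_score + (1 - alpha) * content_score
    return sorted(scores, key=lambda c: (-scores[c], c))[:n]

mutual = {"u": {"a", "b"}, "a": {"u", "x", "y"}, "b": {"u", "x"},
          "x": {"a", "b"}, "y": {"a"}}
attrs = {"u": {"ZTE", "Nanjing"}, "x": {"Beijing"}, "y": {"ZTE", "Nanjing"}}
print(combined_scores(mutual, attrs, "u"))  # ['y', 'x']
```

Restricting candidates to two hops bounds the work by the size of u's neighborhood rather than by the size of the whole site, which is the motivation the text gives for not using all users as candidates.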

Algorithm 1. A combined user-recommendation algorithm

4 Experiments

4.1 Data Description, Experimental Strategy, and Evaluation Methods

To determine the effectiveness of our proposed algorithm, we crawled the social networks and microblogs of 103 Sina Weibo users. The collected data included 30,609 users, 807,374 relationships, and 2,503,458 microblogs, and was acquired on May 1, 2013. This data serves as both the training set and the evaluation set.

First, for each user we held out a selection of the users they follow (their followees) and used the recommendation algorithm to produce a recommendation list. Then, we compared this list with the held-out followees.

Let U denote the set of seed users for the entire experimental data set, i.e., the set of experimental users. We make recommendations from three candidate sets: popular or expert users, ordinary users, and all users.

For each u in U, we randomly remove half of the users from Followee(u), the set of users that u follows. The removed users form the test set T(u), and the remaining users form the training set R(u).
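The hold-out step can be sketched as follows, assuming followees are sortable, hashable identifiers and that, when |Followee(u)| is odd, the smaller half goes to T(u). The function name and the fixed seed are assumptions added for reproducibility.

```python
import random

def split_followees(followees, seed=0):
    """Randomly hold out half of u's followees as T(u); the rest form the training set R(u)."""
    rng = random.Random(seed)
    items = sorted(followees)  # sort first so a given seed always yields the same split
    rng.shuffle(items)
    half = len(items) // 2     # with an odd count, T(u) gets the smaller half
    return set(items[:half]), set(items[half:])

followee_u = {"a", "b", "c", "d", "e"}  # hypothetical Followee(u)
T_u, R_u = split_followees(followee_u)
print(len(T_u), len(R_u))  # 2 3
```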

The recommendation algorithm scores the candidate users in S(u) and sorts them in descending order of score to obtain the recommendation list, which is also denoted S(u).

We take S(u) as the test result and T(u) as the ground truth; comparing the two shows how well the recommendation algorithm performs. The first N users of S(u) are denoted S_N(u). We use two evaluation indexes: precision rate and recall rate. The recall rate of the recommendation is given by

$$\mathrm{Recall}=\frac{\sum_{u\in U}\left|S_N(u)\cap T(u)\right|}{\sum_{u\in U}\left|T(u)\right|}$$

The precision rate of the recommendation is given by

$$\mathrm{Precision}=\frac{\sum_{u\in U}\left|S_N(u)\cap T(u)\right|}{\sum_{u\in U}\left|S_N(u)\right|}$$

To evaluate Top-N precision and recall comprehensively, we select several list lengths N, calculate the precision and recall rates for each, and draw a precision/recall curve. In this experiment, N is taken as 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50.
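The Top-N evaluation loop can be sketched as below. It assumes pooled definitions: hits are summed over all test users and divided either by the total length of the returned lists (precision) or by the total number of held-out users (recall). A list shorter than N contributes only its actual length. The function names and toy data are hypothetical.

```python
def precision_recall(ranked: dict, held_out: dict, n: int) -> tuple:
    """Pooled Top-N precision and recall over all test users.

    ranked[u]   -- recommendation list S(u), best candidate first
    held_out[u] -- held-out followees T(u)
    """
    hits = sum(len(set(ranked[u][:n]) & held_out[u]) for u in ranked)
    recommended = sum(len(ranked[u][:n]) for u in ranked)
    relevant = sum(len(held_out[u]) for u in ranked)
    precision = hits / recommended if recommended else 0.0
    recall = hits / relevant if relevant else 0.0
    return precision, recall

def pr_curve(ranked, held_out, sizes=range(5, 55, 5)):
    """One (N, precision, recall) point per list length N = 5, 10, ..., 50."""
    return [(n, *precision_recall(ranked, held_out, n)) for n in sizes]

ranked = {"u1": ["a", "x", "b", "y"], "u2": ["c", "z"]}
held_out = {"u1": {"a", "b"}, "u2": {"c", "d"}}
print(precision_recall(ranked, held_out, 2))  # (0.5, 0.5)
```

Plotting the precision and recall values returned by `pr_curve` against N gives curves of the kind shown in the figures that follow.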

4.2 Results

The hypothesis of our experiment is that multidimensional recommendation is better than one-dimensional recommendation. Users in the first set, popular or expert users, have more than 500,000 fans and are verified. Users in the second set are ordinary users; this set contains no popular or expert users. Users in the third set are a mixture of popular and ordinary users. We make recommendations from each of the three sets and evaluate them using the evaluation indexes.

4.2.1 Popular or Expert Users

We keep only the popular or expert users in both the removed followee set T(u) and the recommendation list S(u), and select the Top-N qualified users from S(u) as the final recommendation list. Then, we calculate the precision and recall rates using the evaluation formulas. The results are shown in Table 1.
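The per-segment evaluation amounts to filtering both S(u) and T(u) with the same predicate before taking the Top-N. In this sketch the `Account` type, its field names, and the sample accounts are hypothetical; the popularity threshold (more than 500,000 fans and a verified account) is taken from the description of the first user set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Account:
    name: str
    fans: int
    verified: bool

def is_popular(a: Account) -> bool:
    # Threshold from the text: more than 500,000 fans and a verified account.
    return a.fans > 500_000 and a.verified

def restrict_to_segment(ranked, held_out, in_segment):
    """Filter S(u) (order preserved) and T(u) to one user segment before Top-N evaluation."""
    seg_ranked = {u: [c for c in lst if in_segment(c)] for u, lst in ranked.items()}
    seg_held = {u: {c for c in held_out[u] if in_segment(c)} for u in ranked}
    return seg_ranked, seg_held

star = Account("star", 2_000_000, True)
fan = Account("fan", 300, False)
big = Account("big", 900_000, False)  # many fans but not verified, so not "popular"
ranked = {"u": [fan, star, big]}
held_out = {"u": {star, fan}}
pop_ranked, pop_held = restrict_to_segment(ranked, held_out, is_popular)
print([a.name for a in pop_ranked["u"]])  # ['star']
```

The ordinary-user setting uses the negated predicate, and the all-users setting skips the filter.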

From Table 1, the precision and recall rates are 0.163 and 0.253, respectively, when N = 50. The precision and recall curves are shown in Fig. 1.

4.2.2 Ordinary Users

Similarly, we keep only the qualified ordinary users in T(u) and S(u), and select the Top-N qualified users for inclusion in the recommendation list. Then, we calculate the precision and recall rates using the evaluation formulas. The results are shown in Table 2.

From Table 2, the precision and recall rates are 0.359 and 0.072, respectively, when N = 50. The precision and recall curves are shown in Fig. 2.

4.2.3 All Users

Here, recommendation results are not divided into different sets. We mix popular users with ordinary users and calculate the precision and recall rates using the evaluation formulas. The results are shown in Table 3.

Table 1. Recall and precision rates of popular users

Figure 1. Precision and recall curves for popular or expert users.

From Table 3, the precision and recall rates are 0.329 and 0.058, respectively, when N = 50. The precision and recall curves are shown in Fig. 3.

4.2.4 Comparison

The precision and recall rates of the three Top-N recommendation strategies are shown in Fig. 4.

As Fig. 4 shows, the precision rate when recommending ordinary users is higher than when recommending popular users or all users. Precision when recommending popular users is lower than when recommending ordinary users, mainly because popular users make up a very small proportion of microblog users: of about 500 million users in total, fewer than 3 million are popular users. Also, people are usually already aware of popular users before they start using a microblog website, so they may find and follow popular users through other channels, such as a search engine. A given user is also likely to follow a limited number of stars or experts and does not readily follow other popular users. Therefore, the precision rate when recommending popular users is relatively low. The precision rate when recommending from all users is lower than when recommending ordinary users mainly because popular users are included in the mixed set. Comparing Fig. 4a and Fig. 4b, the higher the precision rate, the lower the recall rate. This is also clearly shown by the ratio of precision to recall for the three recommendation strategies (Fig. 5).

Table 2. Precision and recall rates for ordinary users

Figure 2. Precision and recall curves for ordinary users.

Table 3. Precision and recall rates for all users

Figure 3. Precision and recall curves for all users.

Figure 4. (a) Recall rates and (b) precision rates for the three Top-N recommendation strategies.

Figure 5. Ratio of precision to recall for the three recommendation strategies.

The advantages of multidimensional recommendation are that it avoids the bias caused by popular users and is more consistent with user psychology.

5 Conclusion

Hundreds of millions of people use social media, represented in this paper by Sina Weibo. As users publish, communicate, and comment, they generate a large amount of content, and the relationships between them expand. Social media as a whole is connected by user entities whose offline behaviors are gradually migrating online.

The algorithm in Section 3 is effective for recommending new friends to a target user, and its time and space complexity meets the needs of online applications. However, further work is needed to improve the algorithm's accuracy.

[1] F. Diaz, “Regularizing query-based retrieval scores,” Information Retrieval, vol. 10, no. 6, pp. 531-562, 2007. doi: 10.1007/s10791-007-9034-8.

[2] S. Clemencon, G. Lugosi, and N. Vayatis, “Ranking and empirical minimization of U-statistics,” The Annals of Statistics, vol. 36, no. 2, pp. 844-874, 2008. doi: 10.1214/009052607000000910.

[3] N. Belkin and B. Croft, “Information filtering and information retrieval,” Comm. ACM, vol. 35, no. 12, pp. 29-37, 1992. doi: 10.1145/138859.138861.

[4] A. B. Barragáns-Martínez, E. Costa-Montenegro, J. C. Burguillo, M. Rey-López, F. A. Mikic-Fonte, and A. Peleteiro, “A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition,” Information Sciences, vol. 180, no. 22, pp. 4290-4311, 2010.

[5] M. Pazzani, “A framework for collaborative, content-based, and demographic filtering,” Artificial Intelligence Review, vol. 13, no. 5-6, pp. 393-408, 1999. doi: 10.1023/A:1006544522159.

[6] The Link Prediction Problem for Social Networks [Online]. Available: http://www.cs.cornell.edu/home/kleinber/link-pred.pdf

[7] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, “Methods and metrics for cold-start recommendations,” in Proc. 25th Ann. Int’l ACM SIGIR Conf., 2002, pp. 253-260. doi: 10.1145/564376.564421.

[8] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin, “Incorporating contextual information in recommender systems using a multidimensional approach,” ACM Transactions on Information Systems, vol. 23, no. 1, pp. 103-145, 2005. doi: 10.1145/1055709.1055714.

Manuscript Received: August 20, 2013

Biographies

Hong Chen (chen.hong3@zte.com.cn) received her BS degree from the Department of Information Engineering, Nanjing University of Posts and Telecommunications, in 2007. She currently works as a senior research engineer at ZTE Corporation. Her research interests include social network analysis and intelligent question answering. She holds four patents.

Shengmei Luo (luo.shengmei@zte.com.cn) received his MS degree in telecommunication and electronics engineering from Harbin Institute of Technology, China, in 1996. He is a chief architect at ZTE Corporation. His research interests include cloud mobile Internet, cloud computing, and industrial application of planning. He has published 20 papers.

Lei Hu (hu.lei2@zte.com.cn) received his MS degree from the Laboratory of Intelligent Recognition and Image Processing, Beihang University, in 2008. He is a senior research engineer in the area of mass data analysis at ZTE Corporation. His research interests include data mining, information retrieval, and social network analysis. He holds three patents and has published four papers in these fields.

Xiuwen Wang (wxw@cert.org.cn) received her PhD degree in computer system architecture from the Institute of Computing Technology, Chinese Academy of Sciences, in 2010. She now works as a senior research engineer at the China National Computer Network Emergency Response Team (CNCERT). Her research interests include cloud social network analysis and data analysis.
