
Modeling and Predicting of News Popularity in Social Media Sources

Computers, Materials & Continua, 2019, Issue 10 (posted 2019-11-07)

Kemal Akyol and Baha Şen

Abstract: The popularity of news, which conveys the day's newsworthy events to people, is of substantial importance to its audience. People interact with news websites and share news links or their opinions. This study uses supervised machine learning techniques to predict news popularity in social media sources. These techniques basically consist of two phases: a) the training data is given as input to the learning algorithm, and b) the performance of the trained algorithm is tested on the test data. In this way, knowledge is discovered from the data. In this context, first, twelve datasets are obtained from a set of data within the frame of four categories: Economy, Microsoft, Obama and Palestine. Second, news popularity prediction in social network services is carried out using the Gradient Boosted Trees, Multi-Layer Perceptron and Random Forest learning algorithms. The prediction performance of each algorithm is examined using the Mean Absolute Error, Root Mean Squared Error and R-squared evaluation metrics. The results show that most of the models built with these algorithms are applicable to this problem. Consequently, a comprehensive study of news popularity prediction is presented, using different techniques and drawing conclusions about the performance of the algorithms.

Keywords: News popularity, sentiment scores, social network services, Gradient Boosted Machines, Multi-Layer Perceptron, Random Forest.

1 Introduction

News conveys the newsworthy events of the day to people. Predicting the popularity of news, that is, the size of the audience for a particular news item or journal, is an important problem in modern data mining [Alswiti and Rodan (2017)]. Popularity is measured through people's interaction with news websites, where they share links to news items or their opinions [Lerman and Ghosh (2010)]. Further, both social sharing websites and news websites are used to read the news. Online news popularity is assessed through factors such as the number of shares, comments and likes on social media. Online consumption of news content, which is a large and still growing market for traditional printed media, has undergone major changes [Canneyt, Leroux, Dhoedt et al. (2018)].

The spread of news to a large number of readers within a short period is very important for its popularity. Therefore, different sources compete to produce content for a major subset of the population [Bandari, Asur and Huberman (2012)]. Since user behavior on social media reflects events in the real world, researchers have found that social media data can be used to make predictions about the future. Such data offers an advantage in information acquisition: it is available in large quantities that may be difficult to collect from other sources. That is, news popularity can be measured by means of it [Lawrence, Chase, Kyle et al. (2017)]. The evaluation of this subject is relatively novel for researchers.

Some of the studies addressing this subject are as follows. Alswiti and Rodan examined the effect of feature selection on popularity prediction using different features, classification models and attribute ranking models. In their study, the Random Forest classifier achieved the best accuracy for all features, while the J48 and AdaBoost classifiers showed varying sensitivity to feature selection [Alswiti and Rodan (2017)]. Canneyt et al. presented a model to predict online news popularity. By analyzing the view patterns of online news, they introduced suitable models via well-chosen basis functions. Using an actual news dataset, they showed that combining content, meta-data and temporal behavior features leads to significantly improved predictions. The Gradient Tree Boosting algorithm proved the most successful for news popularity prediction in their study [Canneyt, Leroux, Dhoedt et al. (2018)]. Bandari et al. [Bandari, Asur and Huberman (2012)] built a multi-dimensional feature space derived from attributes of articles and evaluated the effect of these features on online article popularity. Using both regression and classification algorithms, they obtained an overall accuracy of 84% on Twitter despite the randomness in human behavior. Fletcher and Park explored the influence of individual trust in news media on sharing preferences and online news engagement behaviors across eleven countries [Fletcher and Park (2017)]. Anil and Indiramma discussed the importance of recommendation systems, which are useful for finding interesting items, along with different methodologies and social factors [Anil and Indiramma (2015)]. Kywe et al. used a taxonomy to analyze the massive amount of information and the huge number of people interacting through Twitter [Kywe, Lim and Zhu (2012)]. Keneshloo et al. dealt with popularity prediction and built models using metadata, content, temporal and social features. The study was applied to real data from the Washington Post [Keneshloo, Wang, Han et al. (2016)]. Uddin et al. focused on predicting, before publication, the popularity of online news in terms of shares by using the Gradient Boosting Machine algorithm [Uddin, Patwary, Ahsan et al. (2016)]. Lee et al. [Lee, Moon and Salamatian (2012)] proposed a framework based on survival analysis for modelling and predicting the popularity of online content. The framework infers the likelihood that the content will be popular. The model uses two popularity metrics, the lifetime of the content and the comment count, together with a set of explanatory factors. Kümpel et al. reviewed 461 scientific, peer-reviewed articles quantitatively and qualitatively. The articles dealt with the relationship between news sharing and social media from 2004 to 2014 [Kümpel, Karnowski and Keyling (2015)]. Tatar et al. introduced a valuable study based on user comments. They analyzed how effectively prediction models can rank online news automatically [Tatar, Antoniadis, Amorim et al. (2014)]. Fernandes et al. [Fernandes, Vinagre and Cortez (2015)] introduced a proactive intelligent decision support system to detect the popularity of news early. The Random Forest classifier gave the best accuracy, 73%, on 39,000 articles taken from the Mashable website. Wu and Shen identified the properties of news propagation by tracing data on Twitter. Using these characteristics, they implemented a news popularity prediction model that can very quickly predict the final number of retweets of a news tweet [Wu and Shen (2015)]. Liu and Zhang [Liu and Zhang (2017)] found that the grammatical construction of titles may affect news popularity positively. They calculated a score for the traditional category and author features using a logarithmic conversion, and presented a novel methodology to predict online news popularity before publication.

As can be seen from these studies, diverse features are used as input data for regression or classification approaches. This study uses the sentiment scores of the title and headline together with the number of views of each news item recorded at 20-minute intervals over 2 days, and presents news popularity prediction models for social media sources built with the Gradient Boosted Machines (GBM), Multi-Layer Perceptron (MLP) and Random Forest (RF) machine learning algorithms. These algorithms are used in many research areas such as medicine, social media and other areas of daily life.

The main focus of this study is the modeling and prediction of news popularity in social media sources. In this context, the study consists of two modules. The first applies data pre-processing techniques to all datasets. The second demonstrates the performance of machine learning algorithms based on boosting, neural networks and ensemble learning. The algorithms are implemented on the datasets and their performance is discussed.

The rest of the paper is organized as follows. Section 2 presents the materials and methods. Section 3 gives the experimental study and results. Finally, the paper ends with conclusions in Section 4.

2 Material and methods

2.1 Data

The data consists of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn. It was collected from public end-points of the social media sources and is already anonymized and aggregated by the data owners. The news data file describes the news items; it consists of 93239 instances, and each news item is described by 11 attributes, which are explained in Tab. 1. The data descriptors are based on information obtained by querying the official media sources Google News and Yahoo News [Moniz and Torgo (2018)].

A set of data files, called Feedbacks, describes the evolution of the news items' popularity in the social media sources Facebook, Google+ and LinkedIn. News was collected during a two-year period, from January 7, 2013 to January 7, 2015, for each of the four categories: Economy, Microsoft, Obama and Palestine. News popularity is measured as the number of views recorded at 20-minute intervals over the 2 days following publication. This set is composed of 12 data files, one for each combination of category and social media source.
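As a small illustration of how the 12 data files arise, the sketch below enumerates every combination of category and social media source. The file-naming scheme is hypothetical and is used only for illustration; the actual file names are not given in the text.

```python
from itertools import product

# Category and platform names as listed in the text.
categories = ["Economy", "Microsoft", "Obama", "Palestine"]
platforms = ["Facebook", "GooglePlus", "LinkedIn"]

# Hypothetical naming scheme "<platform>_<category>.csv" (an assumption).
feedback_files = [f"{platform}_{category}.csv"
                  for platform, category in product(platforms, categories)]

print(len(feedback_files))
```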

Table 1:Descriptions of attributes in news data file

The dataset, which contains a large amount of data, is pre-processed and restructured by discarding the instances that include N/A (null) values. After these preprocessing steps, the number of news items in each category is presented in Tab. 2.
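The discarding step can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the authors' code: the column names and the set of missing-value markers are hypothetical. With pandas, which the study reports using, `DataFrame.dropna()` plays a comparable role.

```python
def drop_incomplete(rows, missing=(None, "", "N/A")):
    """Keep only the rows in which no field holds a missing-value marker."""
    return [row for row in rows
            if not any(value in missing for value in row.values())]


# Toy rows with hypothetical column names; real records have many more fields.
rows = [
    {"id": 1, "t1": 0, "t2": 3},
    {"id": 2, "t1": "N/A", "t2": 5},
    {"id": 3, "t1": 1, "t2": None},
]
clean = drop_incomplete(rows)
print(len(clean), "of", len(rows), "rows kept")
```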

Table 2:The number of instances in social media sources

2.2 Methods

In this study, the modeling and prediction of news popularity in social media sources are performed using GBM, MLP and RF, which are among the popular machine learning algorithms, and the experimental results are compared.

Briefly, GBM builds new models sequentially during learning in order to better predict the target variable. The goal is to construct each new base learner so that it is maximally correlated with the negative gradient of the loss function associated with the whole ensemble [Friedman (2001)].
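The idea can be made concrete with a minimal sketch of gradient boosting for the squared-error loss, where the negative gradient is simply the residual between the target and the current prediction. The base learner here is a one-split regression stump on a single feature, and the toy data, number of rounds and learning rate are illustrative assumptions rather than the settings used in this study.

```python
def fit_stump(x, residuals):
    """Find the one-split regression stump with the lowest squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # initial model: the mean of the targets
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_rounds):
        # For squared loss the negative gradient equals y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


# Toy one-dimensional data (illustrative only).
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.1, 4.8, 5.0]
model = gradient_boost(x, y)
print(model(2), model(5))
```

Each round adds a small correction in the direction that most reduces the remaining error, which is why the ensemble improves gradually rather than in one step.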


The back-propagation MLP is a feed-forward network that updates its weights based on the differences between the predicted and actual values of the target variable. The main idea is to iteratively minimize the mean squared error between the actual and predicted values [Alpaydin (2010)].

The RF, introduced by Breiman, is an ensemble learning algorithm built from randomized decision trees. The main difference between this algorithm and a single decision tree is that a decision tree searches for the best attribute among all attributes when splitting a node, whereas RF searches for the best attribute among a random subset of the attributes. Therefore, this algorithm generally gives better results through better modeling [Breiman (2001)]. The internal parameters of the algorithms and their values were assigned as given in Tab. 3.
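The two sources of randomness in RF can be sketched as follows: each tree is trained on a bootstrap sample of the instances, and each node split considers only a random subset of the features. The helper names, the seed and the subset size below are illustrative assumptions; they are not the parameter values of Tab. 3.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree trains on its own sample."""
    return [rng.choice(rows) for _ in rows]


def candidate_features(n_features, n_candidates, rng):
    """Pick the random subset of feature indices examined at one node split."""
    return rng.sample(range(n_features), n_candidates)


rng = random.Random(0)    # fixed seed so the sketch is repeatable
rows = list(range(10))    # stand-ins for ten training instances
sample = bootstrap_sample(rows, rng)
features = candidate_features(n_features=9, n_candidates=3, rng=rng)
print(sample, features)
```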

Table 3:Internal parameters for algorithms

3 Experiments and results

The proposed study consists of two main modules: data processing and machine learning. The first module carries out the preparation steps given in Pseudo Code 1 for the machine learning module. In addition to the original data retrieved from the social media sources, the pre-processed dataset contains the sentiment scores of both the title and the headline of the news items. Therefore, the pre-processed datasets are described by 147 attributes (2 sentiment values, for the title and headline, 144 popularity measurements, and the outcome variable, the news items' popularity). The flowchart of the proposed study is given in Fig. 1.
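The attribute count can be checked with a little arithmetic: sampling every 20 minutes over 2 days yields 144 measurements, and adding the 2 sentiment values and the outcome variable gives 147. The variable names in this sketch are illustrative.

```python
MINUTES_PER_DAY = 24 * 60

# One measurement every 20 minutes over a 2-day window.
n_measurements = (2 * MINUTES_PER_DAY) // 20

n_sentiment = 2   # sentiment score of the title and of the headline
n_outcome = 1     # the news item's popularity (target variable)

n_attributes = n_sentiment + n_measurements + n_outcome
print(n_measurements, n_attributes)
```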

Figure 1:The flow chart of the study

The attribute information for these datasets is presented in Tab. 4. All data collection and processing procedures mentioned in these steps are implemented in Python 2.7 on the Anaconda platform.

Table 4:The information of attributes for these datasets

The second module, news popularity prediction, receives the processed data and splits it into training and test sets in order to evaluate the performance of the prediction models GBM, MLP and RF. The steps of this module, given in Pseudo Code 2, are executed on the 'Knime' platform using integrated Python programming imported from the 'protobuf' library. Python code can run in a node on this platform. The 'numpy' and 'pandas' libraries are used in both modules to handle the large amount of data.

In our study, the performance of the models is evaluated using the Mean Absolute Error (MAE), the Root Mean Squared Error (RMSE) and the R-squared coefficient (R2), which indicate how well the predictions match the actual results. These metrics are given by the following equations, respectively.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|e_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}}$$

The MAE and RMSE metrics are based on statistical summaries of $e_i$ ($i = 1, 2, \ldots, n$), where $e_i = P_i - O_i$ is the individual model prediction error, $n$ is the number of data instances, and $P_i$ and $O_i$ are the predicted and observed values, respectively [Willmott and Matsuura (2005)].

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y$ is the observed response variable, $\bar{y}$ its mean and $\hat{y}$ the corresponding predicted values. The R2 coefficient measures the proportion of the variation in the target variable that is explained by the model. This coefficient typically takes a value between 0 and 1, where 1 equates to a perfect fit of the model [Alexander, Tropsha and Winkler (2015)].
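The three metrics can be computed directly from their definitions. The sketch below uses only the Python standard library; the function names and the toy predicted and observed values are illustrative and are not results from this study.

```python
import math


def mae(predicted, observed):
    """Mean Absolute Error: mean of |P_i - O_i|."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return sum(abs(e) for e in errors) / len(errors)


def rmse(predicted, observed):
    """Root Mean Squared Error: square root of the mean of (P_i - O_i)^2."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def r_squared(predicted, observed):
    """R2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values (illustrative only).
predicted = [3.0, 1.0, 2.0, 8.0]
observed = [4.0, 0.0, 2.0, 7.0]
print(mae(predicted, observed), rmse(predicted, observed),
      r_squared(predicted, observed))
```

Note that `r_squared` is undefined when all observed values are equal, because the total sum of squares is then zero.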

This study focuses on the analysis of the attributes of news data in social media sources and evaluates the performance of the RF, GBM and MLP algorithms for news popularity prediction. A randomly selected 70% of the data is used as the training set, and the remainder is used as the test set. The models are first trained on the training sets and then tested on the test sets. The R2, MAE and RMSE measures are used to evaluate the performance of the models in all experiments. Tabs. 5-8 compare the performance of the models obtained with the Pseudo Code 2 algorithm on the datasets. This module also indicates that the sentiment scores of the news and the final value of the news items' popularity are highly influential in predicting news popularity. Sentiment analysis, also known as opinion mining, is a field of text mining that examines people's opinions, judgments and ideas about entities [Liu and Zhang (2012)]. The qdap R package [Rinker (2013)] is used to obtain the sentiment score.
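A random 70/30 split of the kind described above can be sketched as follows. The function name, the fixed seed and the toy data are assumptions made for illustration; the study itself performs this step on the Knime platform.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = rows[:]          # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))         # stand-ins for 100 news items
train, test = split_rows(rows)
print(len(train), len(test))
```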

Tab. 5 shows the performance of the models on the social media sources for the Economy dataset. As shown in this table:

a) All algorithms perform satisfactorily on the Facebook source for the Economy dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the MLP based model on this source.

b) All algorithms perform satisfactorily on the Google+ source for the Economy dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the RF based model on this source.

c) All algorithms perform satisfactorily on the LinkedIn source for the Economy dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the RF based model on this source.

Table 5:The performances of the models for Economy dataset

Table 6:The performances of the models for Microsoft dataset

Tab. 6 shows the performance of the models on the social media sources for the Microsoft dataset. As shown in this table:

a) All algorithms perform satisfactorily on the Facebook source for the Microsoft dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the RF based model on this source.

b) All algorithms perform satisfactorily on the Google+ source for the Microsoft dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the MLP based model on this source.

c) All algorithms perform satisfactorily on the LinkedIn source for the Microsoft dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the MLP based model on this source.

Tab. 7 shows the performance of the models on the social media sources for the Obama dataset. As shown in this table, all algorithms perform satisfactorily on the Facebook, Google+ and LinkedIn sources for the Obama dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the RF based model for all sources.

Table 7:The performances of the models for Obama dataset

Table 8:The performances of the models for Palestine dataset

Tab. 8 shows the performance of the models on the social media sources for the Palestine dataset. As shown in this table:

a) All algorithms perform satisfactorily on the Facebook source for the Palestine dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the MLP based model on this source.

b) All algorithms perform satisfactorily on the Google+ source for the Palestine dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the RF based model on this source.

c) All algorithms perform satisfactorily on the LinkedIn source for the Palestine dataset. Further, the MAE measure is the same for all models. The maximum R2 and minimum RMSE are obtained with the MLP based model on this source.

Since the datasets used in this study were released only in February 2018, no published study uses them yet. However, because the subject is popular, machine learning studies have been performed on other datasets. For this reason, sample studies that apply machine learning to different datasets are presented in Tab. 9.

Table 9:Sample of studies performed on different datasets

4 Conclusion

News conveys the newsworthy events of the day to people. News popularity is measured through people's interaction with news websites or social media platforms, where they share their opinions or news links. Scientists use social media data since it reflects user behavior in the real world. This study uses a set of data consisting of news items and their popularity in the social media sources Facebook, Google+ and LinkedIn. It is composed of 12 data files, one for each combination of the Economy, Microsoft, Obama and Palestine categories and the social media sources. The study consists of two phases: the preparation of the data and the design of the prediction models. The pre-processed datasets are described by 147 attributes (2 sentiment values, for the title and headline, 144 measurements of popularity at 20-minute intervals over a total of 2 days, and the outcome variable, the news items' popularity). Prediction models built with the GBM, MLP and RF learning algorithms are introduced for the twelve datasets and empirical tests are performed. The success of most models on each dataset is approximately the same. Further, this study will provide a beneficial reference for news popularity prediction.

Acknowledgement: The authors would like to thank Fernandes et al. [Fernandes, Vinagre and Cortez (2015)] for providing the datasets.
