Comparative study for machine learning classifier recommendation to predict political affiliation based on online reviews

2021-10-19 01:19:34 Hayat Ullah, Bashir Ahmad, Iqra Sana, Anum Sattar, Aurangzeb Khan, Saima Akbar, Muhammad Zubair Asghar

CAAI Transactions on Intelligence Technology, 2021, Issue 3

1 Department of Computer Science, Qurtuba University of Science and Technology, Dera Ismail Khan, Khyber Pakhtunkhwa, Pakistan

2 Institute of Computing and Information Technology, Gomal University, Dera Ismail Khan, Khyber Pakhtunkhwa, Pakistan

3 Department of Computer Science, University of Science and Technology, Bannu, Khyber Pakhtunkhwa, Pakistan

Abstract In the current era of social media, different platforms such as Twitter and Facebook have frequently been used by leaders and the followers of political parties to participate in political events, campaigns, and elections. The acquisition, analysis, and presentation of such content have received considerable attention from opinion‐mining researchers. For this purpose, different supervised and unsupervised techniques have been used. However, they have produced less efficient results, which need to be improved by incorporating additional classifiers with extended data sets. The authors investigate different supervised machine learning classifiers for classifying the political affiliations of users. For this purpose, a data set of political reviews is acquired from Twitter and annotated with different polarity classes. After pre‐processing, different machine learning classifiers, such as K‐nearest neighbor, naïve Bayes, support vector machine, extreme gradient boosting, and others, are applied. Experimental results illustrate that support vector machine and extreme gradient boosting have shown promising results for predicting political affiliations.

1 | INTRODUCTION

The evolution of Web 2.0 has brought about revolutionary changes in user‐generated content. Among the various changes that it brought, the most significant is written text, which is deemed an important medium for the expression of opinion. Currently, opinion resources like discussion forums, review sites, social media, and blogs have gained currency. These contents can greatly help in obtaining information about the attitudes of people towards diverse socio‐political issues, commercial phenomena, products or services, news, and many other topics. The process of sorting out opinions from the Internet is called opinion mining [1, 2].

There are different ways to use opinion mining techniques for public welfare. For example, in marketing, it helps a firm to judge the effectiveness of publicity and understand the popularity of a product. With the help of this method, a company can have a much clearer vision than with traditional methods such as interviews and surveys [3]. Recently, social media platforms, such as Twitter, have become some of the most popular platforms used by the public and by political leaders and parties to express sentiments regarding political events, campaigns, and elections [3].

1.1 | Research study motivation

Different techniques, including supervised and unsupervised methods, have been applied to predict election results [4].

The basic limitation of the supervised machine learning techniques used for political affiliation prediction is the lack of sufficient labelled data sets and the limited applicability of such techniques. Earlier studies [4–6] have used limited data sets and a limited number of machine learning classifiers for political affiliation prediction. Boutet et al. [5] used supervised machine learning classifiers with a limited number of features for classifying the political alignment of users. The major limitation of their work is the less efficient results, which the authors here aim to address by incorporating additional classifiers in addition to support vector machine (SVM).

1.2 | Novelty of the work

The baseline study [4] used a few classifiers with a small‐scale data set. The major limitation of their work was the use of a few classifiers on a data set of small size, which yielded poor performance. The proposed work is an enhancement of the work performed by Juneja and Ojha [4], using multiple classifiers evaluated on an extended data set. Moreover, an increased number of joint label reviews, divided into multiple classes, are used.

The proposed work is novel in terms of the efficient prediction of political affiliation results on the data sets acquired from Twitter using supervised machine learning algorithms. This will enable researchers to develop more advanced applications for election prediction based on Twitter feedback.

1.3 | Problem formulation

The problem under investigation is to classify tweets containing political affiliation by taking this task as a multi‐class classification problem and using a training and testing data set. The aim is to classify the new tweet as belonging to a particular political party.

Herein, the authors classify and predict the public sentiments that have an affiliation with a particular political party using different ML classifiers and recommend the classifier with the best results. The performance results of different machine classifiers are compared with respect to the data sets pertaining to different political parties.

The proposed study aims at predicting the political affiliation from the public feedback (reviews/tweets) by applying different supervised machine learning techniques, and recommending the classifier with the best results.

1.4 | Research questions

(i) How can a comparison of different machine learning classifiers be performed on the data set of political review for the prediction of political affiliations?

(ii) What is the efficiency of multiple ML classifiers to predict political affiliations from online political reviews?

(iii) What is the efficiency of the proposed classifier with respect to the baseline method?

1.5 | Research contributions

The major contributions of the proposed work are as follows:

(i) Acquiring and pre‐processing a political affiliation data set

(ii) Classifying users' feedback with respect to their political affiliations using supervised machine learning techniques

(iii) Comparing the results of machine learning classifiers and recommending the best classifier

Section 2 presents related work; research methodology is presented in Section 3; Section 4 gives the results and discussion; and the final section provides the conclusion to the study with some future directions.

2 | RELATED WORK

In this section, a review of selected studies conducted on the prediction of political affiliations and election results is presented.

The work performed by Sahu and Nanda [7] aims to predict the election results for Indian political parties from Twitter data. They applied Python tools to classify tweets as positive or negative. After evaluating the positive and negative classes for each political party, they found that most of the voters had a positive attitude towards the Bharatiya Janata Party, and it was predicted that the Bharatiya Janata Party would win the 2019 Lok Sabha general election.

Khan et al. [8] performed sentiment classification of user reviews using a supervised learning technique from the perspective of comparative opinion mining. They collected user reviews from YouTube comparing iOS versus Android, Facebook versus Twitter, and Microsoft versus Google. The data set included about 6000 reviews. Different machine learning classifiers, namely decision tree (DT), naïve Bayes (NB), SVM, K‐nearest neighbor (KNN), random forest (RF), logistic regression (LR), and extreme gradient boosting (XGBoost), were applied to the data set through an Anaconda‐based Python framework. Different evaluation measures, such as F1‐score, precision, accuracy, and recall, were applied. On 3000 reviews, the DT gave the best performance; on 6000 reviews, the RF classifier gave the best result.

The work performed by Imran et al. [9] aimed at the quantitative prediction of personality from Facebook posts. The data were collected from different Facebook accounts. They applied different classifiers, namely SVM, NB, and DT, using the Weka application. They checked the neuroticism, agreeableness, conscientiousness, and openness of Facebook users. They first collected the data, then pre‐processed it, and finally applied the machine learning classifiers. Among these classifiers, SVM and DT yielded the best results.

To predict the Delhi Corporation Elections results, Juneja and Ojha [4] implemented a logistic regression classifier, NB classifier, Bernoulli NB classifier, SGDC, NuSVC, SVC, and linear SVC. Python and the NLTK platform were used to implement the proposed model. A pre‐processing stage was also applied to achieve better results. Among the classifiers, multinomial NB yielded the best results, with 78% accuracy. As a future enhancement, a web‐based interface with multi‐lingual support and an increased number of classification categories could improve the system's performance.

To predict US presidential elections, Wicaksono [10] proposed a system based on a binary multinomial NB classifier with sentiment aggregation support. The results obtained are promising in terms of accurate prediction. The proposed method is generic and can be applied in multiple domains; however, for better performance, the applicability of other classifiers needs to be investigated.

The work performed by Sharma and Moh [6] aimed at predicting an Indian election using a Twitter‐based sentiment analysis technique by collecting more than 42,000 tweets. After applying basic pre‐processing steps, classification was performed using dictionary‐based and supervised machine learning techniques, namely NB and SVM. The performance evaluation results illustrated that an accuracy of 68% was achieved by the dictionary‐based approach, whereas NB and SVM attained 62% and 78%, respectively. Furthermore, other classifiers, such as regression and random forest, could be investigated for better results.

The work performed in [11] aimed at the Twitter‐based trend prediction of election results. During the election period from 15 March 2014 to 12 May 2014, tweets were collected and a sentiment analysis technique was applied to predict the election results. It was observed that the coefficient of tweets was statistically significant and positive for all models. Even after incorporating nationality/regional factors into the analysis, the results remained consistent. The study was limited by small sample sizes.

The work performed by Kalmegh [12] carried out a comparative evaluation of RandomTree, REPTree, and Simple Cart on Indian news using the Weka platform. Around 649 news items were collected and further processed into a term frequency matrix. The output of the NLP process was used to correlate the news (video, textual, audio) with the relevant e‐learning contents. The RandomTree algorithm was found to perform best. The overall performance of the Simple Cart and REPTree algorithms was not acceptable, because both algorithms were able to classify only politics news correctly.

Mejova et al. [13] proposed a supervised machine learning classification system by applying the tuned SVMlight classifier for the prediction of sentiments expressed by political candidates. The main emphasis was on classifying tweets in terms of liberal sentiments. Different pre‐processing steps, such as stop word removal and tokenisation, were used. Overall, satisfactory results were obtained. However, further enhancement can be made by predicting whether the candidate is liberal or anti‐republican.

The work performed by Boutet et al. [5] aimed at collecting tweets related to the 2010 UK general election. The resulting data set contains nearly 1,150,000 tweets pertaining to more than 220,000 users. They used three classifiers, namely a volume classifier, a retweet classifier, and an SVM classifier. The empirical results showed that the accuracy of the proposed classification technique was 86%.

Conover et al. [14] proposed a machine learning‐based system on the eve of the US mid‐term elections in 2010. To discriminate between users in the ‘right’ and ‘left’ classes, they used linear SVMs and applied latent semantic analysis (LSA) of hashtags. An SVM trained on the full text of users' tweets predicted political affiliation with 91% accuracy. Human annotators labelled 506 users as having a ‘right’ political alignment, 373 users as ‘left’, and 77 as ‘ambiguous’.

The work performed by Ellen and Parameswaran [15] classified online posts by the author's group affiliation. The two data sets that they attempted to classify consist of real‐world data discussing current issues pertaining to the Palestinian/Israeli dialogue. Supervised classification algorithms, namely SVM and k‐NN, were used. The greater improvement of k‐NN is an important indication that a better methodology for combining features would enable SVMs to capitalise on the added features.

An election prediction system for users' opinions published on the election prediction site http://www.electionprediction.org/ was proposed by Kim and Hovy [16]. They used the SVM classifier to predict opinions, outperforming all the baseline methods. Previously, researchers in opinion analysis mostly focused on judgement opinions, which express positive or negative sentiment about a topic. In the future, a model for predictive opinion in other domains, such as the real estate market and stock markets, will be proposed. Table 1 presents a review of selected studies.

3 | RESEARCH METHODOLOGY

Different modules are used in the proposed system: (i) data collection and annotation (assigning party names), (ii) pre‐processing, (iii) applying different machine learning classifiers, (iv) comparing the performance of the multiple classifiers, and finally (v) recommending the best classifier.

3.1 | Data set collection and assigning party names

For compiling data sets, the data compilation module is applied, where the reviews are collected from Twitter. It is used to filter out noisy data and gives input to the noise reduction module. Detailed information on the data set is given in Table 2. For data set collection, the authors used online political reviews and identified their affiliation with a political party of Pakistan. Data were collected from the social networking site Twitter. The data used herein are publicly available and no personal information was collected. The data set consists of 2018 tweets. The sentences are tagged with political party names, which are: PMLN, PTI, PPP, and Others. The Others class contains data on different parties, including MQM, TLP, ANP, MQMP, JUI, JUIF, PSP, BAP, PkMAP, JIP, and BRP.

TABLE 1 Review of selected studies

TABLE 2 Details of the proposed data set

The acquired data set is split into two parts, testing (20%) and training (80%), used for experimental purposes.

The convention in the research community is to reserve 20–25% of the data for testing purposes. For large data sets, 25% of the data is taken for testing to obtain good accuracy; the 80:20 split is also referred to as the Pareto principle [17].

3.1.1 | Training data

Three professional political analysts were asked to manually assign sentiment tags in the training data set. Each comment received three votes, one per annotator, and was tagged with the label that obtained the majority vote. An inter‐annotator agreement of 84.2% with a Kappa (K) score of 0.86 was obtained, which is very satisfactory [15] and shows a high degree of agreement between the human annotators. The training data set consists of 80% (D: 1615) of the tweets. Training data set samples are presented in Table 3.
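The majority‐vote tagging described above can be sketched as follows. This is a minimal illustration rather than the authors' annotation tooling: the function names `majority_label` and `percent_agreement`, the party labels used as tags, and the sample votes are all hypothetical, and the Kappa score is not computed here.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by a strict majority of annotators, or None if none exists."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

def percent_agreement(all_votes):
    """Share of items on which every annotator assigned the same label."""
    unanimous = sum(1 for votes in all_votes if len(set(votes)) == 1)
    return unanimous / len(all_votes)

# Hypothetical votes from three annotators for three tweets.
votes_per_tweet = [
    ["PTI", "PTI", "PMLN"],     # two of three agree
    ["PPP", "PPP", "PPP"],      # unanimous
    ["PTI", "PMLN", "Others"],  # no majority
]
labels = [majority_label(v) for v in votes_per_tweet]
print(labels)
print(percent_agreement(votes_per_tweet))
```

Items without a majority are returned as `None` in this sketch; how such cases were handled in the study is not stated in the text.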

TABLE 3 Sample tweets from the training data set

3.1.2 | Testing data

Testing was performed on the remaining 20% of the data set. Test data are used to evaluate the model, and the testing data set is supplied after the model has been trained on the training data set. In other words, test data are given to the classifier in order to assess whether it is working satisfactorily [18].

The data set can be split into testing and training parts using different techniques, namely (i) random split, (ii) cross‐validation, and (iii) cross‐validation using hold‐out [19], summarised as follows:

(i) Random split: In this method, a data set is split into test and train parts according to a certain proportion, for example 20:80, by selecting a random sample. The random split is reported to be more robust than other approaches, because the data set is divided more correctly [19]. A 20:80 split was performed by randomly dividing the data set into testing and training parts.

(ii) Cross‐validation: In this method, a data set is repeatedly partitioned into testing and training parts by randomly choosing subsets. Moreover, the data set is divided into two parts, that is, validation and training [19].

(iii) Cross‐validation using hold‐out: The data set is divided into three segments of different sizes, that is, training, validation, and testing. The training stage fits the model; the validation stage checks the model's performance and helps to avoid overtraining, and training stops once the performance at the validation stage is good enough [20]. Testing data set specimens are presented in Table 4.

A data set was created in an MS Excel file. Tables 3 and 4 show sample tweets from the training and testing sets. Political tweets collected from Twitter are presented herein. The testing and training data, shown in Table 5, were pre‐processed before being passed to the ML classifiers [21].

Algorithm 1 displays the steps needed to split a data set into training and testing.
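A random 80:20 split of the kind described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' Algorithm 1: the function name `train_test_split`, the fixed seed, and the placeholder tweets are assumptions, and the test size is rounded down so that 2018 tweets yield the 1615 training tweets reported above.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)          # reproducible shuffle
    n_test = int(len(shuffled) * test_fraction)    # rounds down (assumption)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder data standing in for the 2018 collected tweets.
tweets = [f"tweet {i}" for i in range(2018)]
train, test = train_test_split(tweets)
print(len(train), len(test))
```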

3.2 | Pre‐processing

Various pre‐processing steps are applied to the collected tweets. These include hashtag removal, tokenisation, and removal of extra characters [3]. Algorithm 2 shows the pre‐processing tasks, implemented in Python using the Anaconda framework [22].

The pre‐processing steps are listed [3] as follows:

3.2.1 | Tokenisation

In tokenisation, the text is split into small units (tokens). To do this, the NLTK tokeniser is used in the Python environment.

3.2.2 | Stop word removal

Stop words play no role in the detection of sentiment words, so such words are systematically eliminated by using a pre‐assembled list. Examples of such words include ‘is’, ‘a’, ‘the’, etc.

TABLE 4 Review sets from the data set (testing)

In Algorithm 2, pre‐processing tasks are presented.

Algorithm 2 Pre‐processing steps.
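The pre‐processing steps named in this section (hashtag removal, tokenisation, removal of extra characters, and stop word removal) can be sketched as below. This is an assumption‐laden illustration, not the authors' Algorithm 2: it uses regular expressions from the standard library instead of the NLTK tokeniser, the stop word list is a tiny illustrative subset, and lower‐casing plus URL and mention removal are added assumptions.

```python
import re

# Illustrative subset only; the study uses a pre-assembled stop word list.
STOP_WORDS = {"is", "a", "the", "an", "of", "and", "to", "in"}

def preprocess(tweet, remove_stop_words=True):
    """Clean one tweet and return its list of tokens."""
    text = tweet.lower()                        # assumption: case folding
    text = re.sub(r"https?://\S+", " ", text)   # assumption: drop URLs
    text = re.sub(r"[@#]\w+", " ", text)        # drop hashtags (and mentions)
    tokens = re.findall(r"[a-z]+", text)        # tokenise; drops digits/punctuation
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

sample = "The PTI rally is a success! #Election2018 https://t.co/xyz"
print(preprocess(sample))
print(preprocess(sample, remove_stop_words=False))
```

The `remove_stop_words` flag mirrors the two experimental settings reported later (with and without stop words).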

3.3 | Apply machine learning classifications

The obtained data set is divided into testing and training parts. In the next stage, the political tweets are classified into different categories, PTI, PMLN, PPP, and Others, using different supervised machine learning classifiers, namely SVM, KNN, decision tree, XGBoost, and logistic regression, by applying an Anaconda‐based Python framework. The Anaconda framework was used because it is open source and user‐friendly for Python. Anaconda provides 100 pre‐installed packages. The labelled tweets are used to determine the efficiency and performance of each machine learning classifier separately. The general workflow of the machine learning classifiers is shown in Figure 1.

3.4 | Feature engineering

During machine learning algorithm implementation, the following steps are applied: (i) feature vector, (ii) term frequency (TF), and (iii) TF and inverse document frequency (IDF).

(i) Feature vector: This converts tweets into token count matrices [23].

(ii) TF: This measure counts the number of occurrences of a term in a given tweet [24].

(iii) TF and IDF: Term frequency and inverse document frequency are computed over a given data set to indicate the significance of words [23].
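The three steps above can be illustrated with a small, self‐contained sketch. It uses the textbook weighting, TF multiplied by log(N / document frequency), which is an assumption: the text does not state the exact variant used, and libraries commonly apply smoothed versions. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(tokenised_docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(tokenised_docs)
    doc_freq = Counter()
    for tokens in tokenised_docs:
        doc_freq.update(set(tokens))           # count each term once per document
    weights = []
    for tokens in tokenised_docs:
        counts = Counter(tokens)               # feature vector of raw token counts
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy tokenised tweets (hypothetical).
docs = [["pti", "wins", "vote"], ["pmln", "wins", "vote"], ["ppp", "rally"]]
weights = tf_idf(docs)
print(weights[0])
```

In the first toy document, the party name receives a higher weight than the words shared with another document, which is the behaviour that makes TF‐IDF useful for separating classes.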

3.5 | Naïve Bayes

This classifier belongs to the family of probabilistic classifiers and is based on Bayes' theorem. It is supervised in nature and used for regression as well as classification. In most cases, good results are obtained when the NB classifier is applied on small and large data sets, and it is most suitable when the number of input features is high.

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$

where P(Y|X) is the probability of Y given X; P(X|Y) is the probability of X (attributes) given class Y; P(X) is the prior probability of X; and P(Y) is the prior probability of the response (target) variable.

3.6 | Random forest

On the basis of hyperparameter tuning, RF is more flexible than other classifiers. It gives valuable and efficient results most of the time. Random forests are used for regression and classification tasks. The result of RF is obtained by combining the outcomes of the individual DTs. An RF with more decision trees results in better generalisation [25]. The mathematical formula is as follows:

TABLE 5 Tweets before and after cleaning

FIGURE 1 General workflow of machine learning classifiers

where z is the number of specimens drawn with replacement, w is the training instances y, and wz is the classification tree or training.

3.7 | Support vector machine

SVM is a supervised ML algorithm for classification. It is used for linear and non‐linear problems. The SVM operates by finding the line or hyperplane that splits the data set into classes.

The basic concept of linear SVM with respect to sentiment classification is to determine the hyperplane that divides the corpus [25]. Its numerical formula is as shown below:

where T = data set of tweets, and y will belong to the value of z, indicating whether the element has a relationship with that class.

3.8 | Logistic regression

Based on the training and testing data sets, the goal of LR is to classify the tweets into different polarity categories. It predicts the sentiment class for the tweets. LR is a fast predictive classifier that can generalise well by avoiding overfitting, and it performs well on new data sets [26]. Its numerical formula is:

where x represents a constant, and the remaining terms represent the equation boundary function.

3.9 | K‐nearest neighbor

K‐nearest neighbor is an instance‐based learning classifier used to solve regression as well as classification problems. In industry, it is most widely used for classification problems. It is a lazy learner: all the instances of the training data set are stored in an n‐dimensional space. It classifies new cases by a majority vote of their K nearest neighbours [26]. The numerical formula is:

where b_i is the i‐th sample and q is the prediction outcome.
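The majority‐vote rule of KNN can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in the experiments: Euclidean distance, k = 3, the function name `knn_predict`, and the two‐dimensional toy vectors are all hypothetical choices.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query point by a simple majority vote of its k nearest neighbours."""
    distances = sorted(
        (math.dist(vec, query), label)           # Euclidean distance (assumption)
        for vec, label in zip(train_vectors, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy two-dimensional feature vectors standing in for tweet features.
train_vectors = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_labels = ["PTI", "PTI", "PMLN", "PMLN", "PMLN"]
print(knn_predict(train_vectors, train_labels, (0.2, 0.1)))
print(knn_predict(train_vectors, train_labels, (1.0, 0.9)))
```

Each of the k neighbours contributes one equal vote regardless of how far it is from the query, which is the property that the discussion of KNN's low accuracy in Section 4 refers to.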

3.10 | Extreme gradient boosting

The gradient boosting framework provides a basis for the XGBoost classifier [27]. It gives promising results in distributed environments such as SGE, MPI, and Hadoop. The mathematical formulation is given as:

The above equation is explained as follows: f(a) is the function that is to be minimised, where F is a positive constant, n is the total number of points, and z denotes the function.

3.11 | Decision tree

The decision tree is a supervised learning technique that is commonly used for text classification and other problems. In this technique, instances are classified based on the sorting of feature values. In the DT, a node represents a feature of the instance to be classified, while a branch represents a value of that feature. Classification starts at the root node, and instances are sorted according to their feature values. A divide‐and‐conquer approach is used in the tree construction [26]. The numerical formula is:

where z is a subset, y denotes the root node, and a denotes the leaves of the tree.

3.12 | How the suggested method works

The supervised learning‐based opinion‐mining method for political reviews starts by taking tweets as input, applying the pre‐processing steps, and classifying the tweets into different polarity categories, namely PMLN, PTI, PPP, and Others. The data set is divided into testing (20%) and training (80%) parts. The reviews and their polarity labels are given to the classifier during the training phase. When the training phase is completed, the classifier is evaluated on the remaining test data. The results obtained are evaluated using different metrics such as recall, F‐score, accuracy, and precision.

The working steps of the suggested method are given in Algorithm 3.

Algorithm 3 Working steps of the proposed system

4 | RESULTS AND DISCUSSION

This section deals with answering research questions by describing experiments.

TABLE 6 Parameter setting for KNN

4.1 | Answer to RQ 1: ‘How can a comparison of different machine learning classifiers be performed on a data set of political review for the prediction of political affiliations?’

To answer RQ1, the authors applied multiple ML classifiers to the data set and evaluated the performance of each. The parameter settings recommended for the KNN classifier are described in Table 6. For comparing the efficiency of different classifiers on the acquired data set, the authors applied different evaluation measures: (i) accuracy, (ii) precision, (iii) recall, and (iv) F1‐score, described as follows.

(i) Accuracy: The proportion of correctly predicted observations out of the total number of observations is known as accuracy. Mathematically, it is formulated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.

(ii) Precision: The positive predictive value, which measures the exactness of the model, is known as precision. Precision is high when there are few false positives. A mathematical formulation is presented as follows:

$$P = \frac{TP}{TP + FP}$$

where P = Precision, TP = True Positive, and FP = False Positive.

(iii) Recall: This measures the proportion of positive cases that are correctly classified by the model, and is also called sensitivity. High recall indicates that few positive instances are misclassified as negative. A mathematical formulation is presented as follows:

$$R = \frac{TP}{TP + FN}$$

where R = Recall, TP = True Positive, and FN = False Negative.

(iv) F1‐score: The harmonic mean of recall and precision is the F‐score or F1‐measure. Mathematically, it is computed as follows:

$$F_1 = \frac{2 \times P \times R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}$$

where R = Recall, P = Precision, TP = True Positive, FP = False Positive, and FN = False Negative.
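For a multi‐class task such as the four party labels used here, the four measures can be computed per class by treating one label as positive and all others as negative. The sketch below illustrates this under that one‐vs‐rest assumption; the function names and the toy label lists are hypothetical and are not taken from the experiments.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one label treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels (hypothetical): one PTI tweet is misclassified as PMLN.
y_true = ["PTI", "PTI", "PMLN", "PPP", "Others", "PMLN"]
y_pred = ["PTI", "PMLN", "PMLN", "PPP", "Others", "PMLN"]
print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred, "PTI"))
print(per_class_metrics(y_true, y_pred, "PMLN"))
```

Averaging the per‐class values with equal weight gives the macro‐averaged scores of the kind reported for cross‐validation later in this section.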

Tables 7 and 8 give parameter settings for XGBoost and SVM classifiers, respectively.

4.2 | Answer to RQ 2: ‘What is the efficiency of multiple ML classifiers to predict political affiliations from online political reviews?’

To answer RQ2, multiple ML classifiers are implemented on the data set, as detailed below.

Experiment #1: The experiment is executed on the data set containing 2018 tweets about ‘political affiliation’ with stop words. Different ML classifiers, namely SVM, KNN, XGBoost, RF, DT, LR, and NB, are applied and the results are shown in Table 9. The authors used different metrics, including recall, F1‐score, precision, and accuracy. It is clear that the support vector machine performs better than the other machine learning classifiers, with recall (92%), F1‐measure (92%), accuracy (92.08%), and precision (92%).

Experiment #2: The authors performed experimentation ‘without stop words’ on the data set containing 2018 tweets about ‘political affiliation’. After applying multiple ML classifiers, the experimental results obtained are shown in Table 10. Different machine learning classifiers, namely NB, SVM, XGBoost, KNN, LR, DT, and RF, are implemented and the results are evaluated using different measures such as recall, F1‐score, precision, and accuracy. The best performance was recall (93%), F1‐measure (93%), accuracy (92.82%), and precision (93%), achieved by the XGBoost classifier, which clearly performs better than the other machine learning classifiers.

Conclusion: Tables 9 and 10 are now compared. Table 9 reports the results with stop words, and Table 10 reports the results without stop words. The accuracy is better in the ‘without stop words’ table. The accuracy of SVM is 92.08 in Table 9 and 92.33 in Table 10. The accuracy of the XGBoost classifier is 91.34 in Table 9 and 92.84 in Table 10. Support vector machine is the best classifier in Table 9 (with stop words), and the XGBoost classifier is the best in Table 10 (without stop words). The K‐neighbors classifier has shown the worst performance in Table 9 (with stop words) as well as in Table 10 (without stop words).

The accuracy of the K‐neighbors classifier is 71.29% in Table 9 and 77.48% in Table 10. The authors used the TF‐IDF feature engineering technique for the machine learning classifiers and also experimented with another feature technique, namely embedding‐based features with a BiLSTM deep learning model, obtaining recall (84%), F1‐measure (84%), accuracy (84%), and precision (84%).

TABLE 7 Parameter settings of extreme gradient boosting classifier without stop words

TABLE 8 Parameter settings of support vector machine classifier without stop words

TABLE 9 Experimental results of multiple ML classifiers (with stop words)

TABLE 10 Experimental results of multiple ML classifiers (without stop words)

4.2.1 | Best performance classifiers with stop words

The results presented in Table 9 depict that SVM performed well when stop words were taken into account, achieving the best results with reference to recall, accuracy, f‐score, and precision. The following are the possible justifications supported by the literature:

(i) Most text categorisation problems are linearly separable: The textual data used in this work is categorised into different labels, based on the class tags used in the training data set. Such data are linearly separable, which results in the best performance of the SVM classifier [24].

(ii) High dimensionality of input space: When learning text classifiers, a large number of features (more than 10,000) must be handled. The SVM classifier has overfitting protection that does not necessarily depend on the number of features, so it is able to deal with such large feature spaces [24]. The same applies to this data set, where the feature space has more than 9000 dimensions, which is why SVM gives the best classification performance on this data set.

4.2.2 | Best performance classifier without stop words

In the experiments described herein (Table 10), the XGBoost classifier performed better than the other classifiers in the ‘without stop words’ case. A possible reason is as follows. Ren et al. [28] report that XGBoost is a fast implementation of the GB algorithm, which accounts for its speed and high accuracy. In this data set, XGBoost also gave the best accuracy when stop words were removed.

4.2.3 | Worst performance classifiers with stop words

The results shown in Tables 9 and 10 depict that the KNN exhibited the worst performance in the case of ‘with stop words’ as well as ‘without stop words’. The main reason for the worst accuracy of KNN is as follows: the class of new data is determined by a simple majority voting system, which ignores the proximity between data; this is unacceptable when the distances of the nearest neighbours to the test data differ greatly [29]. This is the main reason why the KNN classifier gives a low accuracy in this work.

4.2.4 | Classifier with worst accuracy

KNN: The cause of the low accuracy of the KNN is the determination of new data classes, which is based on a simple vote majority system, where the majority vote system ignores the closeness between data, which is unacceptable when the distance of each nearest neighbour differs greatly from the distance of the test data [29].

The next classifier that showed low performance is NB, for the following reasons: (i) NB classifiers require a large data set to give good results, and their performance on a small data set is poor [30]. The proposed system uses only one data set of a couple of thousand tweets, that is, 2018 tweets. Because the data set is small, NB gives poor results. (ii) NB is not an efficient predictor on multi‐class data sets [30]. There are four classes in the proposed study, which is why NB performed poorly during classification.

Certain other factors also contribute significantly to the performance of an ML classifier, such as the number of classes, the data set size, the number of examples in the data set, the proportion in which the data set is divided into training and testing sets, and the structure of the data set [31].

4.2.5 | Recommending the classifiers with performances

The results in Table 9 show that SVM, and those in Table 10 that XGBoost, yielded the best performance when compared with the other machine learning classifiers; these two classifiers are therefore recommended.

4.2.6 | Results of cross‐validation for different classifiers

The authors applied 10‐fold cross‐validation to the different classifiers. The results reported in Table 11 give the mean and standard deviation of the accuracy, macro precision, macro recall, and macro F‐1 scores.
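A minimal sketch of how such a 10‐fold procedure can be organised is given below. The helper names `k_fold_indices` and `summarise` are hypothetical, the folds are contiguous (in practice the data would usually be shuffled first), and the use of the sample standard deviation is an assumption, since the exact estimator is not stated.

```python
import statistics

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def summarise(scores):
    """Return (mean, sample standard deviation) of the per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# 2018 is the data set size reported in this study.
folds = list(k_fold_indices(2018, k=10))
print(len(folds), [len(test) for _, test in folds])
```

Each per‐fold score (accuracy, macro precision, macro recall, macro F‐1) would be collected in a list and passed to `summarise` to obtain a mean and standard deviation of the kind reported in Table 11.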

4.3 | Answer to RQ.3:‘What is the efficiency of the proposed classifier with respect to the baseline method?’

The answer to RQ.3 is given by comparing the efficiency of the recommended classifiers with the baseline studies.

TABLE 11 Cross‐validation of different classifiers

4.3.1 | Comparison with baseline methods

Juneja and Ojha [4]: Juneja and Ojha [4] predicted the Delhi Corporation election results by applying NB, SVC, NuSVC, linear SVC, and MNB. MNB gave the best accuracy (80%) (Table 12).

Sharma and Moh [6]: Sharma and Moh [6] aimed at predicting the Indian election results using Twitter‐based analysis. They obtained 78% accuracy with SVM (Table 12).

Boutet et al. [5]: Boutet et al. [5] collected tweets related to the 2010 UK general election. They used three classifiers, namely (i) a volume classifier, (ii) a retweet classifier, and (iii) an SVM classifier. They achieved an accuracy of 86% through Bayesian‐volume (Table 12).

4.3.2 | Proposed work (with and without stop words)

SVM has yielded the best performance with stop words and the XGBoost classifier has shown good results without stop words. The results are presented in Table 12 in terms of F‐score, precision, accuracy, and recall.
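For reference, the four measures can be computed from true and predicted labels as in the following sketch. Macro averaging is assumed here, in line with the macro scores named for the cross‐validation results; the function name `macro_scores` and the toy labels are illustrative only and are not the study's data.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_labels = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }

# Toy labels for illustration only.
y_true = ["PTI", "PTI", "PMLN", "PPP"]
y_pred = ["PTI", "PMLN", "PMLN", "PPP"]
print(macro_scores(y_true, y_pred))
```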

4.3.3 | Statistical analysis

Consider two models, M1 (SVM) and M2 (XGBoost), which are evaluated on an independent data set. Let n denote the number of records. The error rate for SVM is denoted by e1, and e2 denotes the error rate for XGBoost. The main goal is to verify whether the observed difference between e1 and e2 is statistically significant [32]. It is formulated as follows:

The variances of the error rates are e1(1 – e1)/n and e2(1 – e2)/n. At the (1 – α)% confidence level, the confidence interval for dt is given by the following equation:
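Under the usual normal approximation for comparing two error rates, the observed difference, its variance, and the interval can be written as follows. This is a standard form that is consistent with the definitions above, not a verbatim reproduction of the authors' equations; the symbols d, \hat{\sigma}_d, and z_{\alpha/2} are introduced here as assumptions.

```latex
d = \lvert e_1 - e_2 \rvert, \qquad
\hat{\sigma}_d^{2} = \frac{e_1 (1 - e_1)}{n} + \frac{e_2 (1 - e_2)}{n}, \qquad
d_t = d \pm z_{\alpha/2}\, \hat{\sigma}_d
```

For a two‐sided interval at the 95% confidence level, z_{\alpha/2} is approximately 1.96.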

In the above equation, the values for accuracy, error rate, and accuracy difference with and without stop words are taken from the results of the SVM and XGBoost classifiers. The analysis results are depicted in Table 13.

Upper Level = 0.117205 and Lower Level = 0.082795

In the aforementioned computations, the authors applied a two‐sided test to check whether dt = 0 or dt ≠ 0. After inserting the values in the aforementioned equation, the confidence interval for dt is obtained at the 95% confidence level. If the interval spans the value zero, the difference cannot be claimed to be statistically significant at the 95% confidence level. Here, the upper limit is 0.117205 and the lower limit is 0.082795. The interval does not span zero, which is why it can be claimed that the difference is statistically significant. Table 14 shows the analysis results.
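The decision rule used in this analysis (the difference is significant only when the interval excludes zero) can be sketched as follows. The function names are hypothetical, and the error rates and sample sizes passed to `difference_interval` are illustrative values rather than the figures of Table 13; only the interval limits passed to `is_significant` are the ones quoted in the text.

```python
from math import sqrt
from statistics import NormalDist

def difference_interval(e1, e2, n1, n2, confidence=0.95):
    """Normal-approximation interval for the difference of two error rates."""
    d = abs(e1 - e2)
    variance = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(variance)
    return d - half_width, d + half_width

def is_significant(lower, upper):
    """The difference is significant only if the interval excludes zero."""
    return not (lower <= 0.0 <= upper)

# Illustrative inputs (assumed values, not taken from the tables).
lower, upper = difference_interval(0.15, 0.25, n1=30, n2=5000)
print(lower, upper, is_significant(lower, upper))

# Interval limits quoted in the text for the 'with stop words' case.
print(is_significant(0.082795, 0.117205))
```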

TABLE 12 Comparison with baseline results

TABLE 13 Analysis results with stop words

TABLE 14 Analysis results without stop words

Upper Level = 0.13714 and Lower Level = 0.0628598

In the aforementioned computations for the case without stop words, the authors applied a two‐sided test to check whether dt = 0 or dt ≠ 0. After inserting the values in the equation, the confidence interval for dt was obtained at the 95% confidence level. If the interval spans the value zero, the difference cannot be claimed to be statistically significant at the 95% confidence level. The obtained upper limit is 0.13714 and the lower limit is 0.0628598. The interval does not span zero, which is why it can be claimed that the difference is statistically significant.

5 | CONCLUSION AND FUTURE WORK

This study compares multiple ML classifiers for political reviews based on performance. The following tasks have been performed: (i) a data set of political affiliation was prepared from Twitter, (ii) the tweets were cleaned by applying different pre‐processing steps, (iii) multiple machine learning classifiers were applied to the data set, (iv) the results of the classifiers were compared, and (v) the classifier with the best result was recommended.

The proposed system classifies tweets into four political classes: ‘PTI, PMLN, PPP, and Others’. Different ML classifiers are applied in this study, namely XGBoost, SVM, DT, NB, KNN, and RF. Performance evaluation is based on different assessment measures, that is, accuracy, F‐measure, precision, and recall. The obtained results illustrate that, with stop words, the support vector machine classifier gives the best results (A: 92.08, F‐score: 92, P: 92, and R: 92), and, without stop words, the extreme gradient boosting classifier gives the best results (A: 92.82, F‐score: 93, P: 93, and R: 93), while the KNN classifier gives the worst performance of all the classifiers.

5.1 | Limitations

1. The data set is segmented into testing and training splits using a random splitting method

2. It has been observed that the performance of ML classifiers for classifying tweets is reduced by increasing the number of polarity classes (i.e. four classes) in the given data set [33]

3. The data set used in this study is limited in size (2018 tweets); a larger data set should be used [34]

5.1.1 | Future works

1. Other data set splitting techniques, such as stratified sampling and hold‐out cross‐validation, among others, could be applied to assess the performance of ML classifiers

2. Decreasing the number of class labels used to annotate the reviews in the data set may improve the results

3. A larger data set can help in conducting experiments that yield better results

4. It is necessary to investigate why SVM gives the best performance with stop words, whereas XGBoost performs better than the other classifiers after stop words are removed
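As an illustration of the first item, stratified sampling keeps the class proportions of the full data set in both the training and the testing split. The following is a minimal sketch under stated assumptions: the function name `stratified_split`, the 20% test fraction, and the class counts in the example are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so that each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random choice within each class only
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical class distribution over the four classes used in this study.
labels = ["PTI"] * 50 + ["PMLN"] * 30 + ["PPP"] * 10 + ["Others"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(Counter(labels[i] for i in test_idx))
```

Unlike a purely random split, this guarantees that a small class such as ‘Others’ is represented in the test set in proportion to its share of the data.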

ORCID

Muhammad Zubair Asghar https://orcid.org/0000-0003-3320-2074
