
Research on identity recognition of English mail author based on writing style

2019-10-21 16:16:15 徐坤豪
大東方, 2019, Issue 9

徐坤豪

Abstract: The content of an email is often very short, but its language style is distinctive. We therefore believe that, under ideal sample conditions, part of a text's style can be used to identify its author. As feature values we use the proportion of short words in an email, the ratio of distinct word types, the average word length, the mean and variance of lexical density, and the maximum usage ratio of a single word. Principal component analysis of these features yields two principal components, which reflect word density and vocabulary non-repetition. Taking the two principal components as the independent and dependent variables, we draw a scatter diagram for each author and find that these diagrams follow certain patterns and reflect the differences between authors. We therefore use a BP neural network for identification, with the extracted principal components as input features and a four-bit binary number as the author number; a certain number of emails from each author are selected for training. We find that the test output is best when the learning rate is 0.01 and the hidden layer is set to 50, with a correct identification rate of 87.5%.

Key words: text feature; principal component analysis; scatter diagram; BP neural network pattern identification

I. Problem Analysis and Model Establishment

1.1 SPSS principal component analysis

The extracted feature values are input into SPSS, and principal component analysis is used to reduce the dimension of the feature set.
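The feature values can be computed from the raw email text before they are passed to the statistical software. The following is a minimal sketch, not the procedure used in this study: the function name `email_features`, the three-letter threshold for a "short word", the small function-word list, and the definition of lexical density as the share of non-function words per sentence are all assumptions made for illustration.

```python
import re
from collections import Counter
from statistics import mean, pvariance

# Tiny illustrative function-word list (assumption; a real list would be longer).
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in",
                  "on", "at", "for", "is", "are", "was", "i", "you", "it"}

# A word is a run of letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def lexical_density(words):
    """Share of words that are not function words (assumed definition)."""
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)


def email_features(text, short_len=3):
    """Return simple lexical features for one email body."""
    lowered = text.lower()
    words = WORD_RE.findall(lowered)
    if not words:
        raise ValueError("email contains no words")
    counts = Counter(words)

    # Lexical density per sentence; pieces without any word are skipped.
    densities = []
    for sentence in re.split(r"[.!?]+", lowered):
        sentence_words = WORD_RE.findall(sentence)
        if sentence_words:
            densities.append(lexical_density(sentence_words))

    total = len(words)
    return {
        "short_word_ratio": sum(len(w) <= short_len for w in words) / total,
        "type_ratio": len(counts) / total,            # distinct words / all words
        "avg_word_length": mean(len(w) for w in words),
        "density_mean": mean(densities),
        "density_var": pvariance(densities),
        "max_word_ratio": max(counts.values()) / total,  # most frequent word
    }
```

Calling `email_features` on each email gives one row of a feature table, with one column per feature, which is the form that principal component analysis expects.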

Intuitively there appears to be a correlation between the variables, but this needs to be tested. In the correlation test output, Bartlett's test of sphericity gives a P value < 0.001. Combining the two indexes, this shows that the variables are correlated and can be analyzed by factor analysis. The eigenvalues of components 1 and 2 are both greater than 1, and together they explain 79.773% of the variance, which is a good result. Therefore, we extract components 1 and 2 as the principal components, which captures the main information in the features.
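The same two steps, the sphericity test and the extraction of components with eigenvalues greater than 1, can be sketched outside SPSS. The code below is an illustrative NumPy version under stated assumptions: the function names `bartlett_sphericity` and `pca_kaiser` are hypothetical, the analysis is run on the correlation matrix, and SPSS may scale or rotate its component scores differently.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's sphericity statistic and degrees of freedom for data X (n x p)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|; slogdet avoids underflow in |R|.
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof


def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    # Standardize each feature so that the analysis uses correlations.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    explained = eigvals[keep].sum() / eigvals.sum()
    scores = Z @ eigvecs[:, keep]              # component scores per email
    return eigvals, explained, scores
```

The statistic returned by `bartlett_sphericity` is compared against a chi-square distribution with `dof` degrees of freedom to obtain the P value (for example with `scipy.stats.chi2.sf`, if SciPy is available). The value `explained` corresponds to the cumulative variance figure reported above, and the columns of `scores` are the component values that can be plotted or used as network inputs.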

In the eight figures, the abscissa represents principal component 2, namely "the ability to identify the author through average sentence length", and the ordinate represents principal component 1, namely "the ability to identify the author through the proportion of different words in the total words". Each figure shows the relationship between these two abilities for one author. From the SPSS output we can see that these two abilities show certain relationships for each author and clear differences between authors. Therefore, we can use these two components as input parameters for training a BP neural network and then identify the authors of the text.

1.2 The solution of neural network

We use the two extracted principal components as the input of the neural network and a four-bit binary number to represent the author. Because the target outputs are binary, an S-shaped logarithmic (log-sigmoid) function is chosen as the transfer function of the output neurons. Through repeated testing, we set the learning rate to 0.01, the maximum number of iterations to 10000, and the hidden layer to 50.
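The network described here can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this study, and it rests on several assumptions: "hidden layer 50" is read as a single hidden layer of 50 neurons, both layers use the log-sigmoid function, training is plain batch gradient descent on the mean squared error, and the names `to_bits`, `train_bp` and `predict_bits` are hypothetical.

```python
import numpy as np


def to_bits(author_id, n_bits=4):
    """Encode an integer author number as bits, most significant bit first."""
    return [(author_id >> k) & 1 for k in reversed(range(n_bits))]


def sigmoid(z):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, T, hidden=50, lr=0.01, max_iter=10000, seed=0):
    """Train a 2-layer sigmoid network by batch backpropagation."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    losses = []
    for _ in range(max_iter):
        H = sigmoid(X @ W1 + b1)              # hidden activations
        Y = sigmoid(H @ W2 + b2)              # output activations
        err = Y - T
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the error through both sigmoid layers.
        dY = err * Y * (1.0 - Y)
        dH = (dY @ W2.T) * H * (1.0 - H)
        # Gradient step, averaged over the training emails.
        W2 -= lr * (H.T @ dY) / n
        b2 -= lr * dY.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return (W1, b1, W2, b2), losses


def predict_bits(params, X):
    """Forward pass, thresholding each output neuron at 0.5."""
    W1, b1, W2, b2 = params
    Y = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return (Y >= 0.5).astype(int)
```

Here `X` holds the two principal component scores per email and `T` holds the rows produced by `to_bits`, for example `to_bits(5)` gives `[0, 1, 0, 1]`. An email counts as correctly identified when all four predicted bits match the author's code. How well this sketch trains depends on the data and on the initialization, so the settings above should be treated as starting points.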

After running the neural network many times, we found that seven of the eight selected authors were basically identified, an accuracy rate of 87.5%. We therefore consider that this model can identify the author of an email. We selected two of the scatter diagrams as examples.

II. Conclusions

The lexical and structural features extracted by the model reflect the characteristics of different authors to a certain extent, so the proposed method of identifying the author of an email based on vocabulary and structure is effective. Through principal component analysis and scatter plot analysis, we conclude that the lexical features we selected can be used to distinguish different authors, with a recognition rate of up to 87.5%. In training the BP neural network, we found that the number of hidden layers has the greatest impact on the accuracy of the final test result, so determining the hidden layers accurately is the key factor in BP neural network training; the learning rate of the BP network also affects the learning effect.


(Author affiliation: Shandong University of Technology)
