
    Research on Identity Recognition of English Email Authors Based on Writing Style

    2019-10-21 16:16 Xu Kunhao
    大東方 (Da Dongfang), 2019, Issue 9


    Abstract: Emails are often very short, but their linguistic style is distinctive. We therefore hypothesize that, given an adequate sample, certain stylistic features of a text can be used to identify its author. We use six features:

    - the proportion of short words in an email;
    - the proportion of distinct word types;
    - the average word length;
    - the mean of lexical density;
    - the variance of lexical density;
    - the maximum single-word usage ratio.

    Principal component analysis of these features yields two principal components, which reflect lexical density and vocabulary non-repetition, respectively. Taking one principal component as the independent variable and the other as the dependent variable, we draw a scatter diagram for each author. These diagrams follow clear patterns that reflect the differences between authors. We then use a BP (back-propagation) neural network for identification. The extracted principal components are the input features, a four-bit binary number serves as the author label, and a certain number of emails from each author are used for training. The test output is best with a learning rate of 0.01 and a hidden-layer size of 50, giving an identification accuracy of 87.5%.

    Key words: text feature; principal component analysis; scatter diagram; BP neural network; pattern identification

    I. Problem Analysis and Model Establishment

    1.1 SPSS principal component analysis

    The extracted feature values are input into SPSS, and principal component analysis is used to reduce the dimensionality of the feature set.
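The paper does not spell out how its features were computed. The Python sketch below shows one way the six features listed in the abstract could be computed for a single email. Several details are assumptions made for illustration, not the author's definitions:

- the tokenizer;
- the three-letter threshold for a "short word";
- the small, hypothetical function-word list used to approximate lexical density, which is computed per sentence and then summarized by its mean and variance;
- reading the abstract's "maximum number of single use ratio" as the share of the most frequent word.

```python
import re
import statistics
from collections import Counter

# Hypothetical function-word list used to approximate lexical density;
# the paper does not say how content words were identified.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "for", "with", "is", "are", "was", "were", "be", "it", "i", "you",
    "he", "she", "we", "they", "this", "that", "not", "as", "by",
}
SHORT_WORD_MAX_LEN = 3  # assumption: a "short word" has at most 3 letters


def tokenize(text):
    """Lower-case word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def style_features(email_text):
    """Return six stylometric features for one email (illustrative sketch)."""
    words = tokenize(email_text)
    if not words:
        raise ValueError("email contains no words")
    n = len(words)
    counts = Counter(words)

    # Lexical density per sentence = content words / all words in that sentence.
    densities = []
    for sentence in re.split(r"[.!?]+", email_text):
        toks = tokenize(sentence)
        if toks:
            content = sum(1 for w in toks if w not in FUNCTION_WORDS)
            densities.append(content / len(toks))

    return {
        "short_word_ratio": sum(len(w) <= SHORT_WORD_MAX_LEN for w in words) / n,
        "type_token_ratio": len(counts) / n,  # distinct words / total words
        "avg_word_length": sum(len(w) for w in words) / n,
        "density_mean": statistics.mean(densities),
        "density_var": statistics.pvariance(densities),
        # Interpretation (assumption): share of the single most frequent word.
        "max_single_word_ratio": counts.most_common(1)[0][1] / n,
    }
```

Each email then contributes one row of six values to the feature matrix that is passed to principal component analysis.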

    The variables appear to be correlated, but this must be tested, so the correlation test results were output. Bartlett's test of sphericity gives P < 0.001. Taken together, the two test indices show that the variables are correlated and therefore suitable for factor analysis. The eigenvalues of components 1 and 2 are greater than 1, and together they explain 79.773% of the variance, which is satisfactory. We therefore extract components 1 and 2 as the principal components, capturing the main sources of variation in the features.
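The same checks can be reproduced outside SPSS: Bartlett's test of sphericity, the eigenvalue-greater-than-1 rule, and the cumulative variance explained. The NumPy sketch below assumes a feature matrix `X` with one row per email and one column per feature. `bartlett_sphericity` returns the chi-square statistic and its degrees of freedom, and the P value can then be obtained with `scipy.stats.chi2.sf(chi_square, dof)`. The component scores are unscaled projections onto the eigenvectors. SPSS saves standardized scores, so the two differ by a per-component scale factor and possibly a sign.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's test of sphericity for X of shape (n_samples, n_features).

    Returns (chi_square, dof). A large chi-square (small P) means the
    correlation matrix differs from the identity, so PCA is sensible.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)                  # ln|R| (R assumed positive definite)
    chi_square = -((n - 1) - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof


def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each feature
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                 # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                              # Kaiser criterion
    explained = eigvals[keep].sum() / eigvals.sum()   # cumulative variance explained
    scores = Z @ eigvecs[:, keep]                     # unscaled component scores
    return scores, eigvals, explained
```

With six features, `explained` is the counterpart of the 79.773% reported above, and `scores` has one column per retained component.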

    In the eight plots, the abscissa represents principal component 2, namely "the ability to identify the author by average sentence length". The ordinate represents principal component 1, namely "the ability to identify the author by the proportion of distinct words in the total word count". Each plot shows, for one author, the relationship between these two abilities. The SPSS output shows both clear relationships and clear differences between these two abilities across authors. We therefore use these two components as the input parameters for training a BP neural network, which then identifies the authors of the texts.

    1.2 Neural network solution

    The two extracted principal components are used as the input of the neural network. Each author is represented by a four-bit binary number whose bits are 0 or 1, so the output neurons must produce values between 0 and 1. For this reason the S-shaped logarithmic (log-sigmoid) function was chosen as the transfer function of the output neurons. After repeated testing, the learning rate was set to 0.01, the maximum number of iterations to 10,000, and the hidden-layer size to 50.
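A BP network with this configuration can be sketched in NumPy. It has two principal-component inputs, four log-sigmoid output neurons, a learning rate of 0.01 and at most 10,000 iterations. The following details are assumptions, not taken from the paper:

- "50" is read as a single hidden layer of 50 neurons;
- hidden units use `tanh`, the same shape as MATLAB's default `tansig`;
- training uses full-batch gradient descent on squared error, with no early stopping;
- the weight initialization is hypothetical.

`author_code` is an illustrative helper for the four-bit labels.

```python
import numpy as np


def logsig(z):
    """Log-sigmoid transfer function; output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def author_code(author_id, bits=4):
    """Encode an author number as a binary target vector, most significant bit first."""
    return np.array([(author_id >> k) & 1 for k in reversed(range(bits))], dtype=float)


def train_bp(X, Y, hidden=50, lr=0.01, max_iter=10000, seed=0):
    """Full-batch back-propagation for a network with one hidden layer.

    X: (n, 2) principal-component scores; Y: (n, 4) binary author codes.
    Hidden units use tanh (assumed); output units use log-sigmoid.
    Loss (assumed): mean over samples of the summed squared bit errors.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))  # hypothetical init
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = X.shape[0]
    for _ in range(max_iter):
        H = np.tanh(X @ W1 + b1)               # forward pass: hidden layer
        O = logsig(H @ W2 + b2)                # forward pass: output bits
        dO = (O - Y) * O * (1 - O) * (2 / n)   # loss gradient w.r.t. output pre-activations
        dH = (dO @ W2.T) * (1 - H ** 2)        # back-propagate through tanh
        W2 -= lr * (H.T @ dO)                  # gradient-descent updates
        b2 -= lr * dO.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2


def predict_bits(params, X):
    """Forward pass only: soft output bits in (0, 1)."""
    W1, b1, W2, b2 = params
    return logsig(np.tanh(X @ W1 + b1) @ W2 + b2)
```

Four bits can encode up to 16 labels, which is enough for the eight authors studied here. The log-sigmoid output keeps each bit in the same (0, 1) range as its target.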

    After many runs of the neural network, we found that seven of the eight selected authors were essentially identified correctly, an accuracy of 87.5% (7/8). We therefore consider that this model can identify the author of an email. Two representative scatter diagrams were selected, as follows:
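The four output neurons can be decoded back into an author number and scored, as in the short NumPy sketch below. It assumes a 0.5 threshold on each output and a per-sample accuracy. The paper does not state whether accuracy was counted per author or per email.

```python
import numpy as np


def decode_bits(outputs, threshold=0.5):
    """Threshold each output neuron and read the bits as an author number (MSB first)."""
    bits = (np.asarray(outputs) >= threshold).astype(int)
    weights = 2 ** np.arange(bits.shape[1] - 1, -1, -1)  # [8, 4, 2, 1] for four bits
    return bits @ weights


def identification_accuracy(outputs, true_ids):
    """Fraction of samples whose decoded author number matches the true one."""
    predicted = decode_bits(outputs)
    return float(np.mean(predicted == np.asarray(true_ids)))
```

For example, with eight authors numbered 1 to 8 and one author misread as another, `identification_accuracy` returns 7/8 = 0.875.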

    II. Conclusions

    The lexical-structure features used in the model reflect the characteristics of different authors to a certain extent. This shows that the proposed method, which identifies the author of an email from vocabulary and structural features, is effective. Principal component analysis and scatter-plot analysis show that the selected lexical features can distinguish different authors, with a recognition rate of up to 87.5%. In training the BP neural network, we found that the size of the hidden layer has the greatest impact on the final test accuracy. Determining the hidden layer accurately is therefore the key factor in training a BP neural network. The learning rate comes next, since it also affects how well the network learns.


    (Author affiliation: Shandong University of Technology)
