
    An Abstractive Summarization Technique with Variable Length Keywords as per Document Diversity

Computers, Materials & Continua, 2021, Issue 3

Muhammad Yahya Saeed, Muhammad Awais, Muhammad Younas, Muhammad Arif Shah, Atif Khan, M. Irfan Uddin and Marwan Mahmoud

1Department of Software Engineering, Government College University Faisalabad, Faisalabad 38000, Pakistan

2Department of IT and Computer Science, Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Haripur, Pakistan

    3Department of Computer Science, Islamia College Peshawar, Peshawar, Pakistan

    4Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan

    5Faculty of Applied Studies, King Abdulaziz University, Jeddah, Saudi Arabia

Abstract: Text summarization is an essential area of text mining, which provides procedures for text extraction. In natural language processing, text summarization maps documents to a representative set of descriptive words. The objective of text extraction is therefore to obtain reduced, expressive content from text documents. Text summarization has two main areas: abstractive and extractive summarization. Extractive text summarization has two further approaches: the first applies the sentence score algorithm, and the second follows word embedding principles. All such text extractions have limitations in conveying the basic theme of the underlying documents. In this paper, we have employed text summarization by TF-IDF with PageRank keywords, the sentence score algorithm, and Word2Vec word embedding. The study compared these forms of text summarization with the actual text by calculating cosine similarities. Furthermore, TF-IDF based PageRank keywords are extracted from the other two extractive summarizations. An intersection over these three types of TF-IDF keywords is performed to generate a more representative set of keywords for each text document. This technique generates variable-length keywords as per document diversity instead of selecting fixed-length keywords for each document. This form of abstractive summarization improves metadata similarity to the original text compared to all other forms of summarized text. It also solves the issue of deciding the number of representative keywords for a specific text document. To evaluate the technique, the study used a sample of more than eighteen hundred text documents. The abstractive summarization follows the principles of deep learning to create uniform similarity of the extracted words with the actual text and all other forms of text summarization. The proposed technique provides a more stable measure of similarity than existing forms of text summarization.

Keywords: Metadata; PageRank; sentence score; Word2Vec; cosine similarity

    1 Introduction

It is challenging to process unstructured text documents without some prior idea about them in the form of metadata. By applying text summarization techniques, the cost of text processing decreases, as the text mining algorithm utilizes only those documents which mainly relate to the text queries. Although these text queries rely solely on the metadata, the query results comprise the actual text documents [1,2]. Assessing multiple text documents is a time-consuming and challenging task; therefore, metadata extraction techniques have a vital role in text mining [3]. Extractive summarization yields actual lines but does not convey the whole theme of the written text. Keyword extraction techniques extract crucial words but generally do not verify the significance of these words over the actual or reduced extracted text [4,5]. Ultimately, the efficiency of these information extraction techniques either decreases or increases as per the content and context vulnerabilities of the underlying text [6]. Less efficient text assessment fails to represent the real theme of long text and risks wasting the effort applied during query processing [7].

There are two broader ways of keywords-based metadata abstractive summarization. The first is single-document keyword extraction, and the other is making a dictionary of keywords for multiple text documents [8]. Query processing utilizes the individual and collective metadata in three different manners to assess a similar set of documents instead of processing the vast bulk of text documents. In the first method, the query matching process marks the related set of documents with single-document metadata. In the second method, we match the individual document metadata with each other and group them as per metadata similarity criteria. This step facilitates forming a dictionary of unique keywords for each set of related documents. In this technique, a piece of text can fall into multiple subgroups. To address the deficiencies of these two methods, we apply the clustering technique. In this third method, the text clustering technique creates unique clusters from the keywords-based metadata, and these clusters represent a specific group of documents [8,9]. Metadata processing fundamentally depends on the quality of the underlying abstractive text summarization technique.

In the second technique, the reduced text contains those potential lines of the text which may represent the actual context of the long paragraph. There are two widely applied techniques for sentence extraction. In the first, we initially identify unique words and then calculate their occurrence frequencies. We use these frequencies to assign scores to the various lines of the text document. Ultimately, the high-score lines are extracted to represent the actual text. This sort of text extraction mainly follows the Sentence Score Algorithm (SSA) [10,11]. In the second method, we follow the word embedding principles of the Word2Vec model. In this model, instead of counting individual term frequencies, the frequencies of mutual occurrences play the leading role, e.g., bread & butter, tea & sugar. In the current paper, we have applied the Python package Gensim for word embedding [11,12]. Using this package, we have applied the Gensim-Based Word2Vec Algorithm (GW2VA) to create text summaries.

Natural Language Processing (NLP) has numerous real-life applications that require text mining to identify the necessary text [3,7]. In these applications, issues regarding metadata processing exist, such as length, type, and diversity [9,13-15]. Our study jointly applies multiple summarizations to address the following problems.

Improved Parallel Corpus Analysis: We have created an enriched corpus besides just creating the keyword-based assessment of the text. This technique has a single-time corpus creation mechanism with multiple benefits. The first benefit is the option of cross-similarity verification and ease of switching between numerous forms of metadata. This corpus creation technique involves a one-time effort and reduces the query processing cost with better accuracy. Secondly, it relies on four main corpus processing areas combined under one method: Single Document Summarization (SDS), Multiple Document Summarization (MDS), abstractive summarization, and extractive summarization.

How Many Keywords Are Sufficient to Extract: Previous text extraction approaches mostly extract fixed-length TF-IDF based PRK from a paragraph. Usually, there exist no specific rules to fix the number of keywords for a long article [7,11,16]. We have designed an approach which gives variable-length keywords for each paragraph.

Applicable to Previous Studies: This study applies to all those studies which have used a keywords-based metadata approach in text mining. We have presented the comparative results in Section 5 of this paper.

Unique Features: The presented study differs from other keyword extraction studies. Previous studies do not target the information diversity in the multiple documents of a given corpus [5,14,17]. The current technique follows the principles of deep learning. It interrelates the number of keywords of a paragraph with the diversity of its information, i.e., for more diverse text, our technique generates more keywords, and for less diverse documents, the keywords are fewer. This keyword extraction technique attempts to ensure that every paragraph has an equally capable set of keywords to present the actual theme of the article. We have discussed the applied techniques in Section 3 of this paper and explained every step of our technique in Section 4.

This section has given an overview of the presented work and the scope and need of the paper. The rest of the paper is organized as follows: Section 2 presents the related work. Section 3 describes the applied techniques. Section 4 gives a detailed description of the implemented steps, with a summary. Section 5 provides a step-by-step briefing of the proposed technique with tables and graphs, with a summary. Section 6 concludes the entire work.

    2 Related Work

Qaiser and Ali explored the use of TF-IDF and examined the relevance of keywords in text documents. Their research focused on how this algorithm processes several unstructured documents. First, an algorithm was designed to implement TF-IDF, and then its results were tested against the strengths and weaknesses of the TF-IDF algorithm. They discussed ways to fix the flaws of text relevance and their effects, with future directions of research. They suggested that the TF-IDF algorithm is simple to implement but has limitations; in today's big data world, text mining requires new data processing techniques. For example, they discussed a variant of TF-IDF applied at the inter-language level using statistical translation and showed that genetic algorithms improve TF-IDF. They noted that the search engine giant Google has adapted PageRank-style algorithms to display relevant results when a user places a query. TF-IDF can also work with other methods, such as Naive Bayes, to get better results.

Roul et al. performed classification by using the document contents and their reference structure. Their proposed model combined PageRank with TF-IDF, and their results improved the document relevance to its PageRank keywords. They showed how to fix the reference structure based on document similarity and proved that the proposed classification method is promising and can guide auto-classification. For this, they developed a classification model and a series of page types that recreate the link structure, thereby combining the advantages of TF-IDF with the PageRank algorithm. This work has three goals. First, it presents the proposed classification model in a way that allows us to understand the effectiveness of this approach. Secondly, it makes a comparison with other known standard classification models. Third, queries are processed by a combination of different parameters.

Ganiger and Rajashekharaiah focused on the automatic text reduction area of text mining. They divided summaries into short text passages and phrases. The primary purpose of their work was to obtain concise and informative text from the source document. They used standard algorithms for extracting keywords and applied TF-IDF as a baseline algorithm. In their work, they performed training for keyword extraction algorithms. Their keyword extraction pipeline consisted of multiple parts to compare and evaluate the accuracy and completeness of the applied algorithm. They showed that TF-IDF is the main algorithm for generating suitable keywords.

The work of Yi-ran and Meng-Xin is based on a word-network extraction method that ignores useless characters. They created a network that does not include all existing keywords and proved that the classical algorithms in this area contribute more complexity. In their work, they proposed a keyword extraction method based on the PageRank algorithm. Their proposed algorithm created weights for common words and divided these words by the corresponding word frequency value. By determining the position of the weighting factor, the importance of each word is shown.

Pan et al. worked on keyword extraction and used it in a keyword assessment methodology. They focused on how to use keywords quickly and accurately in text processing. They showed that there exist many options to use keywords in ways that improve the accuracy and flexibility of the text extraction process. In their work, they extracted words through an improved TextRank keyword algorithm based on the TF-IDF algorithm. They applied estimation algorithms to calculate the importance of words in the text-based results and based the execution of the TextRank algorithm on these results. Finally, they used these keywords to delete unnecessary keywords. Their findings showed that their method of extracting keywords is more accurate than traditional TF-IDF and text classification methods.

Li and Zhao proved that sentence score summaries are often less useful in email text filtration, because sending and receiving short character messages makes compound keywords common. They stated that besides using sentence scores, a graph-based classification algorithm gives better keyword assessment. They implemented graphs by building a vector of concepts and by measuring the similarity of the words. Finally, they constructed an array of keywords and extracted keywords by using TextRank. Compared to the traditional TextRank algorithm, their algorithm worked more effectively and improved on text extraction by the conventional TextRank and TF-IDF algorithms.

Mahata et al. used an unsupervised approach that combines PageRank and a neural network algorithm. In their method, they worked with text documents using embedded keywords. The main application of their model was to select a set of keywords relating to the defined similarity estimates. They used two related datasets to apply keyword ranking and to consider text summary suitability. In their work, they tried to find the concepts in the content of the text document and assigned weights to the candidate words. Their proposed system is based on a set of experimental data to evaluate all candidate PageRank keywords. They suggested extending their work to multimodal datasets to extract keywords relating to images and tag them by automatic indexing.

    3 Description of the Applied Techniques

The text assessment process applies various text mining techniques simultaneously. Some of these techniques have a role in almost every type of corpus processing; e.g., text pre-processing is the first step [11,13]. After this, the next text mining processes are word identification, word type assessment, stemming, lemmatization, etc. [12,15]. We discuss these techniques in this section, and all of them relate to the current study.

    3.1 Text Pre-Processing

Text pre-processing techniques cleanse the dataset by eliminating useless tokens. It is a widely accepted practice in all sorts of Natural Language Processing (NLP) projects. This phase removes special symbols and all those elements which add noise to the text [7,13,15]. There exists no specific definition of text noise, but generally, anything which causes unnecessary text processing is noise, e.g., inconsistent data, duplication, wrong writings, etc. Besides removing text elements, there are various other forms of text processing, like case folding. Case folding represents multiple text elements in a specific form; e.g., it may represent proper nouns as capitalized words and the remaining text as small letters. This sort of text processing makes term recognition easy for the text engineer and the search engine [14,16,17].
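As a minimal illustration of this cleansing step, the sketch below removes non-ASCII noise, folds case, and strips digits and symbols. The exact rules are pipeline-specific, so this function is only an assumed simplification; real pipelines also handle duplicates and selective case folding for proper nouns.

```python
import re

def clean_text(text):
    """A minimal text-cleansing sketch: drop non-ASCII characters,
    fold case, and remove digits and special symbols."""
    text = text.encode("ascii", "ignore").decode()  # strip non-ASCII noise
    text = text.lower()                             # simple case folding
    text = re.sub(r"[^a-z\s]", " ", text)           # drop digits and symbols
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace

print(clean_text("Text  mining café 2021!"))  # → 'text mining caf'
```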

    3.2 Tokenization

The critical purpose of tokenization is dividing a sentence into parts, termed tokens, and discarding elements like punctuation. Tokenization is the technique of splitting the data into words, symbols, or phrases [4,17,18]. The straightforward aim of tokenization is to identify meaningful word units. Some tokenization requirements relate to the nature of the language; for example, English and Chinese differ in their use of white space. Similarly, there are certain terms like compound words, slang, and words joined with symbols; all these types of terms need unique tokenization routines in text processing [13,17,19].
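A hedged, minimal sketch of word tokenization follows; the regex keeps internal apostrophes but ignores the compound-word and language-specific cases mentioned above, which toolkits such as NLTK handle properly.

```python
import re

def tokenize(text):
    """Split text into word tokens, keeping internal apostrophes.
    A minimal regex sketch, not a full tokenizer."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)

print(tokenize("Text mining, at scale, isn't trivial!"))
# → ['Text', 'mining', 'at', 'scale', "isn't", 'trivial']
```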

    3.3 Stop Words Removal

NLP has many common words that have little significance in the semantic context, and we name them stop words. There exist various lists of stop words in every text processing system [5,17,20]. These words are essential for the reader, as they give sense to sentences, but they are not always useful for text processing, and search engines do not use them for finding relevant results. These words are like 'and,' 'or,' 'this,' 'is,' etc. They frequently occur in all text documents and create hurdles in text processing. Removal of these words is a typical text processing routine in cases where they have no role in the underlying text mining [17,21,22]. After removing these words from the text, the overhead of text processing is also reduced [21].
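Stop-word removal can be sketched as a simple set filter. The list below is a tiny illustrative subset; real systems use the fuller lists shipped with NLP toolkits such as NLTK.

```python
# A tiny illustrative stop-word list, not an exhaustive one.
STOP_WORDS = {"a", "an", "and", "or", "this", "is", "the", "of", "to", "in"}

def remove_stop_words(tokens):
    # Keep only tokens that carry semantic weight for text mining.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["this", "is", "a", "text", "mining", "task"]))
# → ['text', 'mining', 'task']
```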

    3.4 Parts of Speech Tagging

In Part of Speech (POS) tagging, we recognize nouns, verbs, adverbs, adjectives, etc. In this process, NLP creates annotations of different types with their type identification [13,16,19]. This process performs grammatical tagging and helps to understand the connections within a sentence and their relations. This form of token labelling provides enriched metadata for text query processing [21,23,24].

    3.5 Stemming and Lemmatization

During the stemming process, we try to identify the word's base or stem, and we remove the affixes. We use this step to replace the word so that the specific term comes to its original root. For example, with 'eat' and 'eating,' stemming reduces 'eating' to 'eat,' as 'eat' is the stem. Any line referring to these words can be related to a context where food items are discussed [23,25]. Stemming has a related process, which we call lemmatization. This process ensures that we reduce words to their exact stems. If we have the two words 'care' and 'car,' we cannot convert 'care' to 'car,' as 'car' is not the stem of the word 'care' [15,22].
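The idea of affix removal can be illustrated with a deliberately naive suffix-stripper. This is not the Porter stemmer or a lemmatizer; real stemmers apply measure-based rules, and lemmatizers consult a vocabulary, whereas this sketch only strips a matching suffix when a non-trivial root remains.

```python
# Deliberately naive suffix list; a real stemmer (e.g., Porter)
# applies measure-based rules instead of blind stripping.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the longest matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(crude_stem("eating"))  # → 'eat'
print(crude_stem("care"))    # → 'care' (never conflated with 'car')
```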

    3.6 Term Frequency-Inverse Document Frequency

Cleansed text after pre-processing serves as input to the text summarization process. We calculate TF-IDF for a text corpus, and it provides a numerical value which shows the importance of a word to a document. TF-IDF values increase proportionally with the word's repetition in a given document, and the frequency of the word across the text corpus balances this value [14,27]. This technique is widely used over the internet, as it differentiates the common and uncommon words in a text corpus. Term Frequency (TF) provides the raw frequency of the word in the document. Inverse Document Frequency (IDF) helps to assess whether the word is common or uncommon across documents. First, we count the total documents; then we count the documents containing the word concerned, and we divide these numbers to assess the ratio of documents containing the term [15,28]. The formula of TF-IDF is:

tf-idf(tr, d) = tf(tr, d) x log(N / df(tr))

where tf(tr, d) is the frequency of term tr in document d, df(tr) is the number of corpus documents containing tr, and N is the total number of corpus documents, N = |D|.
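The TF-IDF weighting can be sketched directly from its definition; this minimal version uses a normalized term frequency and a plain logarithmic IDF, whereas libraries apply smoothing variants.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """Compute tf-idf(tr, d) = tf(tr, d) * log(N / df(tr)).
    `corpus` is a list of token lists; `doc_tokens` is one document."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)          # normalized term frequency
    df = sum(1 for doc in corpus if term in doc)              # document frequency
    return tf * math.log(len(corpus) / df) if df else 0.0

corpus = [["text", "mining", "summary"],
          ["text", "query"],
          ["mining", "keywords", "mining"]]
score = tf_idf("mining", corpus[2], corpus)  # tf = 2/3, df = 2, N = 3
```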

    3.7 Page Rank Keywords

We use the PageRank (PR) algorithm for processing text documents which are linked or hyperlinked like web pages. This algorithm works on the principle of assigning each document a numerical value such that this value implies the importance of the document compared to the other materials [11,17,29]. The algorithm works satisfactorily for all those documents which have reciprocal links. We use the PR value to indicate the given page's significance. The PR value is based on two parameters, i.e., the total number of PR pages and the calculated PR values of these pages [7,13,30]. Consider pages W, X, and Y that link to page Z. To calculate the PR value for Z, we take the sum of the PR contributions of W, X, and Y.
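The W, X, Y, Z example above can be sketched with the standard damped PageRank iteration, in which each page shares its rank over its outgoing links. This is a compact assumed version; real implementations also redistribute the rank of dangling nodes and check convergence.

```python
def pagerank(links, damping=0.85, iters=50):
    """Iterative PageRank over a link graph {node: [outgoing targets]}."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    n = len(nodes)
    pr = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * pr[src] / len(targets)
                for t in targets:
                    new[t] += share  # each target gets an equal damped share
        pr = new
    return pr

# Pages W, X, and Y all link to Z, so Z accumulates their rank.
ranks = pagerank({"W": ["Z"], "X": ["Z"], "Y": ["Z"], "Z": []})
```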

    3.8 Sentence Scoring

Sentence scoring is the process that assigns a score to the paragraph lines. This score is based on the frequency of the words in the given paragraph. The process has multiple variants over the internet. Sentence scoring relies either on predefined keywords or on post-analyzed keywords. In the case of predefined words, lines containing the specified words score higher than other lines [22,31]. There exist customized search engines and software applications which highlight the query-specific words in these query-specific lines, and we may extract these lines as per the requirement. In the second way, after performing the document analysis, the high-frequency words facilitate selecting the crucial lines of the text. This latter method of sentence scoring is commonly applied as the extractive summarization method [28,32].
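The frequency-based variant (the SSA idea) can be sketched as follows: score each sentence by the summed frequency of its words over the whole text, then keep the top-scoring sentences in their original order. This simplified version skips stop-word removal and uses a naive split on full stops.

```python
import re
from collections import Counter

def sentence_score_summary(text, top_n=1):
    """Rank sentences by summed word frequency and keep the top_n."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))  # whole-text frequencies
    scored = sorted(sentences,
                    key=lambda s: -sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())))
    keep = set(scored[:top_n])
    return ". ".join(s for s in sentences if s in keep)

text = ("Text mining extracts knowledge. Summaries compress text. "
        "Mining text for summaries saves reading time.")
print(sentence_score_summary(text))
```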

    3.9 Word Embedding

We apply the Word2Vec model for context assessment, and we train this model over the textual data. The fundamental objective of this model is mapping sentence words into a low-dimensional vector space. The model identifies terms with a close relation and keeps the distance between such words small, in view of the context or meanings of these words [15,28]. We train this model with neural networks over the word vectors to predict the document context. The outcome of this processing depends on placing words with similar contexts in close vectors [14,27]. We obtain word clusters which are more probable to occur together. This model keeps improving over time, as it is time-variant; i.e., words which occurred together in the last decade are not necessarily closer to each other in the current decade [17,29].
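The co-occurrence signal Word2Vec learns from can be illustrated without the neural network itself. The sketch below only tallies how often word pairs appear within a small window (e.g., 'bread' and 'butter'); Gensim's Word2Vec turns exactly this kind of signal into dense vectors.

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count how often word pairs appear within `window` tokens of each other.
    An illustration of the signal behind Word2Vec, not the model itself."""
    counts = defaultdict(int)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[(w, tokens[j])] += 1
    return counts

sents = [["bread", "and", "butter"], ["tea", "and", "sugar"], ["bread", "with", "butter"]]
counts = cooccurrence_counts(sents)
# ('bread', 'butter') co-occurs twice; ('bread', 'sugar') never.
```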

    3.10 Cosine Similarity

We use Cosine Similarity (CS) to measure the resemblance among various documents. It endorses the most probable articles of interest for the user, and item resemblance recommendation is subject to the CS value [28,30]. CS serves as an optimal choice when the documents have high-dimensional attributes, particularly in retrieving text for information analysis. We use CS for similarity calculation among both items and users, in the form of item and user vectors [7,18,22]. The CS formula is presented in Eq. (3):

cos(A, B) = (A . B) / (||A|| ||B||)
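Eq. (3) translates directly into code: the dot product of the two vectors divided by the product of their norms.

```python
import math

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||) for equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Term-count vectors over a shared vocabulary.
print(cosine_similarity([3, 4], [3, 4]))  # → 1.0 (identical direction)
print(cosine_similarity([1, 0], [0, 1]))  # → 0.0 (no shared terms)
```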

In this section, we have briefly described the utilized techniques. These techniques perform the text analysis and provide the base for metadata generation. The text analysis studies the keywords in various forms. We have described text pre-processing as the primary step of NLP. To obtain the main keywords of a paragraph, we apply stop word removal; then we apply POS tagging to differentiate the word types. We group the words to their stems and analyze them regarding their context. We have discussed the tokenization process used in abstractive summarization by TF-IDF with PRK, and we have discussed sentence scoring and word embedding, used in the extractive summaries. We have also described CS, as we apply it in the current study to assess the similarity between the various forms of summarization.

4 Corpus Creation, Processing, and Metadata Extraction

In this paper, we have presented a way to select improved keywords from the actual text and its multiple reduced forms. This technique gives an enhanced corpus view and selects a different number of keywords as per the diversity of the numerous types of corpus. We have explained all the steps of our technique in Sections 4.1-4.6, and the block diagram of these steps is depicted in Fig. 1.

    Figure 1: Text mining process from tokenization to summarized text extraction

    4.1 Preprocessing of Text

We have preprocessed the text to remove its abnormalities and utilized this text in the multiple summarizations of our proposed technique. We have performed it in the following steps.

●We have removed non-ASCII characters and unnecessary numerical values from the text.

●We have identified the named entities and abbreviations by using a dictionary approach.

●These steps provided the cleansed text, and we have applied two extractive summarizations over this sentence-based text.

●Tokenization has critical importance in morphological analysis of the text. We have applied tokenization and the further steps by using the Python package NLTK.

●Stop word removal was performed over the cleansed text.

●Next, we performed POS tagging over the cleansed text.

●Then the stemming and lemmatization processes were applied to reduce the text further.

●Our abstractive summarization technique is based on this token-based cleansed text.

    4.2 Word Embedding Based Extractive Text Summarization

Word embedding refers to a set of techniques used for modelling language, and these techniques have NLP learning features. We apply a vocabulary to map the tokens of words and phrases to real-number vectors. We have used the Python package Gensim to obtain Word2Vec summaries of the dataset [6,19]. We have performed it as below.

●This summarization was performed over individual paragraphs of the dataset.

●The obtained summaries were placed in an adjacent column with the actual text paragraph, see Fig. 2(a).

●The text is on average reduced by one third in each paragraph by this summarization technique.

●These summaries rely on those words which frequently occur together in various documents.

    Figure 2: Preview of the actual and summarized corpuses

    4.3 Sentence Score Based Extractive Text Summarization

Text summarization through the sentence score includes four things: text pre-processing, which we have discussed previously; identifying high-frequency words; ranking the sentences as per the score/frequency of the identified words; and extracting the ranked lines in the form of a summary [21,26]. This process has the following steps.

●This summarization was performed over the individual actual paragraphs of the dataset.

●The generated summaries were placed in the adjacent column with the Word2Vec reduced paragraph, see Fig. 2(a).

●The text of each paragraph was reduced by one third on average.

●These summaries utilize those words which have a high frequency in the given document.

    4.4 PRK Based Abstractive Text Summarization

We have applied PRK over the text documents. Generally, eight to ten keywords satisfactorily define the theme of a document. In search engine optimization techniques, a thousand-word web page or blog has five to eight keywords, depending on cost-per-click criteria [31,33]. There exist no specific rules to fix the number of keywords. In our experimentation, we have used long paragraphs consisting of about eight hundred to one thousand words. Therefore, we have taken a fixed ten words for each paragraph as per the above generally followed trend.

●For PRK extraction, tokenization-based cleansed text was generated, i.e., without stop words, numbers, and duplicate words.

●These words were placed in the next column after the sentence score summaries.

●The previous PRK methods have a drawback regarding the selection of the number of keywords to represent the theme of the document.

●We have resolved this drawback in our proposed technique by selecting a variable number of keywords as per context diversity.

    4.5 Variable Length Keywords Creation

We have applied three mainly implemented techniques to generate abstractive and extractive summarizations. Word embedding mainly relates to the document context, whereas the sentence score algorithm generates document-specific word frequency counting [11,17]. We have further extracted the TF-IDF based PRK from both extractive forms of summarization. Then we have taken the intersection of these keywords. The resultant intersection consists of variable-length keywords and handles the diversity of each text paragraph more effectively than the fixed-length PRK [22,27]. We have compared all these forms of abstractive and extractive summarization with the actual text. Our proposed technique of abstractive summarization has shown better resemblance to the actual text.
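The intersection step can be sketched as a plain set operation. The keyword values below are illustrative only; in the paper, each input set holds the TF-IDF based PageRank keywords of one summarization form, and the intersection's size varies naturally with document diversity.

```python
def combined_keywords(prk_actual, prk_word2vec, prk_sentence_score):
    """Intersect the three PRK sets to form the variable-length CPRK set."""
    return set(prk_actual) & set(prk_word2vec) & set(prk_sentence_score)

# Hypothetical PRK sets from the three summarization forms.
cprk = combined_keywords(
    {"mining", "summary", "keyword", "corpus"},
    {"mining", "summary", "vector", "corpus"},
    {"mining", "summary", "corpus", "score"},
)
# → {'mining', 'summary', 'corpus'}: the length varies per document.
```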

    4.6 Corpus Visualization and Single Time Processing

The proposed corpus presentation technique has a one-time cost to fetch the actual paragraph for processing. We have placed the summarized text adjacent to the actual text and then calculated the CS of these summaries with the actual text. This technique provides an efficient corpus analysis mechanism during text mining [21,25]. However, if there exists some storage issue, there is an option to generate ad hoc abstractive and extractive summaries. These ad hoc summaries need no storage for further processing. One-by-one, simultaneous extraction and processing of paragraphs utilizes optimal cost and time for handling the diversity of information in the given text [23,26,32].

This section contains the details about the main steps performed in the presented technique. We have discussed the way of expressing the corpus with its salient themes. For this, we have processed the extended corpus in two stages. In the first stage, sentence-level text cleansing is performed, and this text is used to extract the main lines of the text. Then we analyzed the text word by word and cleansed it to obtain the keyword-based representation of the text. These extracted forms of text substitute for lengthy corpus text processing and save the overhead of cost and time. We have combined the features of these techniques to generate a robust set of keywords with better similarity to the actual text. We have used Python NLP packages to implement the proposed technique.

    5 Results and Discussion

We have taken a diverse news dataset [33] for the experimentation. We have selected long paragraphs consisting of more than eight hundred words, see Figs. 2(a) and 2(b). The actual text of the corpus has six different reduced forms, represented by multiple columns. In Fig. 2(a), Column H consists of variable-length keywords extracted from all forms of summarized text. We have compared these summarizations with the actual text and with each other, as shown in Fig. 2(b). We describe these results in this section.

    5.1 Average of PRK from Word2Vec Summarized Text Comparison

Word2Vec summarized text comprises the words which have a higher probability of occurring together in each document [34,35]. We have extracted TF-IDF based PRKs from this text and compared these PRKs with the PRKs of the actual text, the sentence score summary, and our proposed Combined PRKs (CPRK). The keywords extracted from the Word2Vec reduced summaries have better average similarity with the CPRKs as compared to the remaining two types of abstractive summarization, see Fig. 3.

    Figure 3: Average of PRK from Word2Vec summarized text comparison,with the PRKs of all three other forms of summarized text

    5.2 Average of PRK from Sentence Score Summarized Text Comparison

Sentence score summarized text comprises the words which occur with greater frequency in the document. We have extracted TF-IDF based PRKs from this text and compared these PRKs with the PRKs of the actual text, Word2Vec, and the proposed CPRKs. The keywords extracted from the sentence score summaries have better average similarity with our CPRKs as compared to the remaining two types of abstractive summarization, as shown in Fig. 4.

    Figure 4: Average of PRK from sentence score summarized text comparison, with the PRKs of all three other forms of summarized text

    5.3 Average of PRK from Actual Text Comparison

    We have extracted TF-IDF based PRKs from the actual text and compared them with the PRKs of the Word2Vec summary, the sentence score summary, and our proposed CPRKs. The keywords extracted from the actual text have a better average similarity with the CPRKs as compared to the remaining two types of abstractive summarizations, as presented in Fig. 5.
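All of these comparisons rest on the cosine similarity between bag-of-words vectors. A self-contained sketch of that measure, over token lists rather than a full TF-IDF vectorizer, is:

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    # Represent each token list as a term-frequency vector and
    # compute dot(a, b) / (|a| * |b|).
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Identical keyword sets score 1.0, disjoint sets score 0.0, and the averages of these per-document scores are what Figs. 3 to 5 report.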

    Figure 5: Average of PRK from actual text comparison, with the PRKs of all three other forms of summarized text

    5.4 Extractive Text Summarization Comparison

    The abstractive summarization from the actual text has better similarity with the Word2Vec reduced text, see Fig. 5. The PRKs of Word2Vec have better similarity with the PRKs of the actual text, as compared to the PRKs of the sentence score summary, see Figs. 3 and 4. To elaborate on this fact, we have calculated the similarity of the extractive summarizations with the actual text. The Word2Vec extractive summaries have better similarity with the actual text, as shown in Fig. 6.

    5.5 Variable Length PRK Extraction

    We have taken the intersection of the three abstractive summarizations to form a new type of abstractive summarization. This technique provides a very stable variable-length abstractive summarization. It has a smaller number of words as compared to the extractive summarizations and gives better text similarity with the actual text. This technique solves the issue of deciding how many keywords are enough to extract from a text paragraph, as shown in Fig. 7. We had fixed the PRK length at ten; however, this technique automatically extracts the required number of keywords.
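One plausible reading of this combination step, shown purely as an illustration, is a set operation over the three fixed-length PRK lists. The exact rule is an assumption here: the sketch keeps a keyword when at least two of the three extractions agree on it, which naturally yields a variable-length result per paragraph.

```python
def combined_prk(prk_actual, prk_ss, prk_w2v):
    # prk_actual, prk_ss, prk_w2v: fixed-length keyword lists from the
    # actual text, sentence score summary, and Word2Vec summary.
    a, b, c = set(prk_actual), set(prk_ss), set(prk_w2v)
    # Assumption about the combination rule: merge the pairwise
    # intersections, i.e., keep any keyword two methods agree on.
    return sorted((a & b) | (a & c) | (b & c))
```

Because agreement between the extractions varies with the diversity of each paragraph, the length of the returned CPRK set varies per document rather than being fixed at ten.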

    Figure 6: Comparison of extractive summarizations: Word2Vec summary with sentence score summary

    Figure 7: Variable length keywords as per text diversity

    This technique has doubled the keywords for the actual text paragraphs, as shown in Fig. 8. It ultimately addresses the deficiency of fixed-length TF-IDF based PRK extractions.

    Figure 8: Average of all variable length keywords

    All the previous TF-IDF based PRK extraction techniques fail to guarantee a uniform representation of the text documents with extracted keywords. The proposed CPRKs identify the text paragraphs uniformly, as shown in Fig. 9. In this figure, it is evident that the CPRK has exactly equal similarity with the actual text and the remaining two forms of extractive summarization.

    Figure 9: Variable PRK uniformity to identify multiple forms of corpuses

    Initially, we have applied corpus pre-processing for the removal of text abnormalities. From the text obtained, we have created a sentence score summary and a Word2Vec summary of each paragraph. We have applied POS tagging, stemming, lemmatization, and morphological analysis to the original text, and used this cleansed text in our TF-IDF based abstractive summarization to create the TF-IDF based PRKs. We have placed these summaries in adjacent columns of our CSV corpus file. For all the above summarizations, we have calculated the cosine similarities with the actual text; these values provided a comparative analysis of all the text summarizations. We have further extracted TF-IDF based PRKs from both types of extractive summarizations, obtained by the sentence score algorithm (SSA) and Gensim Word2Vec (GW2VA), and combined the keywords extracted from these extractive summarizations. This provided a unique set of keywords, as a variable-length abstractive summarization, for each paragraph. The cosine similarity (CS) comparisons for each reduced corpus are presented in the form of tables and graphs to show the efficiency of our applied technique.

    6 Conclusion

    The existing studies show that we cannot justify text processing efficiency just in terms of reduced time and cost. This assessment fundamentally begins with justifying the robustness of the applied metadata-generating technique. After that, reduced processing cost becomes justifiable if the metadata sufficiently substitutes for the actual text. All this creates the need for designing optimal metadata techniques, which have an essential role in text information retrieval systems. This technique is different from other TF-IDF based PRK extractions. The previous techniques overlook the information diversity of multiple text documents, and their fixed-length keywords do not provide uniform text identification for every text paragraph. The current technique implements the principles of deep learning and interrelates the number of keywords of a paragraph with the diversity of its information. Therefore, every paragraph has an equally capable set of keywords to present its actual theme. In this paper, we have created three different summarized representations of the actual text. The first form of summarization is TF-IDF based PageRank Keywords. The second and third are sentence score and Gensim-Word2Vec extractive summaries of the text. We have calculated the cosine similarity of these reduced forms with the actual text corpus and presented a detailed comparative analysis of these techniques. Based on these summarizations, we have suggested our improved technique of abstractive summarization with variable-length PRKs.

    Acknowledgement: We are thankful to Government College University Faisalabad (GCUF), Pakistan for providing resources for this research.

    Funding Statement:The author(s) received no specific funding for this study.

    Conflicts of Interest:The authors declare that they have no conflicts of interest.
