
    Embedding Extraction for Arabic Text Using the AraBERT Model

Computers, Materials & Continua, 2022, Issue 7

Amira Hamed Abo-Elghit, Taher Hamza and Aya Al-Zoghby

1 Faculty of Computers and Information, Department of Computer Sciences, Mansoura University, Mansoura, 35516, Egypt

2 Faculty of Computers and Artificial Intelligence, Department of Computer Sciences, Damietta University, Damietta, 34517, Egypt

Abstract: Nowadays, the multi-task learning approach can be used to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task. In this work, we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset. The used schemes include lexical-based similarity features, frequency-based features, and pre-trained model-based features. Also, we used contextual-based embedding models such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT). We used the AraBERT model in two different variants. First, as a feature extractor in addition to the text vectorization schemes' features; we fed those features to various regression models to make a prediction value that represents the relevancy score between Arabic text units. Second, AraBERT is adopted as a pre-trained model, and its parameters are fine-tuned to estimate the relevancy scores between Arabic textual sentences. To evaluate the research results, we conducted several experiments to compare the use of the AraBERT model in its two variants. In terms of Mean Absolute Percentage Error (MAPE), the results show minor variance between AraBERT v0.2 as a feature extractor (21.7723) and the fine-tuned AraBERT v2 (21.8211). On the other hand, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of the coefficient of determination (R2), with values of 0.014050 and -0.032861, respectively.

Keywords: Semantic textual similarity; Arabic language; embeddings; AraBERT; pre-trained models; regression; contextual-based models; concurrency concept

    1 Introduction

Textual similarity is a critical topic in Natural Language Processing (NLP) due to its increasingly important role in related topics such as text classification, retrieval of specific information from data, clustering, topic retrieval, subject tracking, question answering systems, essay grading, and summarization. The textual similarity process estimates the relevancy between text units [1,2]. The variations among the approaches in the literature for textual similarity depend on the text representation scheme used before text comparison. Text representation is a significant task that converts the unregulated form of textual data into a more formal construction before any additional text analysis or use in predictive modeling [3]. Text representation, word embeddings, or vectorization means converting text to numbers, which can be integers or floating-point values, and then using them as input to machine learning algorithms [4]. We can divide word embedding approaches into three categories: frequency-based or statistical-based, prediction-based or pre-trained, and contextual-based word embeddings. The frequency-based approach is the traditional text modeling based on the Bag-of-Words (BOW) representation. It includes One Hot Encoding (OHE), hashing vectorization, Part-Of-Speech (POS) weighting [5], word counts, Term Frequency-Inverse Document Frequency (TFIDF) [4], and N-grams [6]. These vectorization techniques work well; however, they fail to keep the semantic relations between words or the meaning of a text, as they do not consider the context in which a word appears. Consequently, the order of word occurrence is lost, as we create a vector of tokens in randomized order, and they may produce a sparse vector containing many zeros. Prediction-based or pre-trained word embedding models are trained on a large collection of texts to build fixed-length, continuous-valued vectors in a low-dimensional space; the embedding size can vary depending on the target size selected during training. This category includes Word2Vec [7], Doc2Vec [8], FastText [9], GloVe [10], Aravec [11], etc. Pre-trained models save the time spent on obtaining, cleaning, and intensively processing enormous datasets. Unfortunately, however, they do not consider the relations between multiple words or the overall sentence meaning and context within the text.
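As a minimal illustration of the frequency-based category, the sketch below builds word-count and TFIDF vectors with scikit-learn, the toolkit used later in this work for the statistical-based features; the sample sentences are placeholders, not items from the dataset:

```python
# Minimal sketch of frequency-based text vectorization with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the patient has a mild fever", "fever and headache in the patient"]

counts = CountVectorizer().fit_transform(docs)  # raw word counts
tfidf = TfidfVectorizer().fit_transform(docs)   # TFIDF weights

# Both are sparse matrices: most entries are zero, which is the sparsity
# drawback of BOW-style representations noted above.
print(counts.shape, tfidf.shape)
print(tfidf.toarray().round(2))
```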

To overcome the above problems, contextual-based embedding models such as ELMo [12], ULMFiT [13], and BERT [14] are effective for learning complete sentence embeddings. They are used for sequence-level semantic learning over all the sequences in the documents; thus, such models learn divergent embeddings for polysemous words. The ELMo model, for example, is a dynamic language modeling technique that learns word embeddings based on context and accounts for the divergent embeddings of polysemous words; it contains two language models, one in each direction, forming a multilayer Recurrent Neural Network (RNN). The ULMFiT model is a left-to-right language model that boosts the performance of some pre-trained models such as ELMo by including multiple fine-tuning techniques. In contrast to ELMo, which combines separate left-to-right and right-to-left models, BERT uses bidirectional transformer training to provide more precise word embeddings. Three versions of BERT address the Arabic language: the multilingual BERT (mBERT) [15] and two versions of AraBERT [16].

The main objective of this research is to propose an algorithm for estimating the textual similarity scores between Arabic texts and then to use these scores in multiple tasks, such as text ranking, essay grading, and question answering systems. Our detailed objectives are: 1) choosing the best text vectorization scheme to represent texts in the used dataset; 2) picking the regression model whose predictions best represent the relevancy scores between text units, among the applied regressors, in terms of the MAPE and R2 evaluation metrics; 3) reducing the execution time of processing and increasing CPU utilization as much as possible.

To implement our proposed algorithm, we used the AraBERT model in two different variants. First, we used it as a feature extractor model in addition to many other text embedding schemes, such as word counts, TFIDF, and POS weighting as statistical-based approaches, and the FastText and Aravec pre-trained models as prediction-based approaches; we then fed those features to several regressors to make a prediction value that represents the relevancy score between the input texts. Second, we treat the AraBERT model as a pre-trained model and fine-tune its parameters on the textual similarity measurement task, so the obtained results can be used in many other tasks later.

The rest of this paper is organized as follows. The literature is reviewed in Section 2. Section 3 then describes the details of our proposed algorithm, and the experimental settings are introduced in Section 4. We present our experiments' details in Section 5. The discussion of results and implications is given in Section 6. Section 7 finally outlines the conclusion and suggests future work.

    2 Review of Literature

Section 2.1 discusses the concept of textual similarity and the studies addressing it in the literature. Then, Section 2.2 addresses the recent research that used the AraBERT model in multiple NLP tasks.

    2.1 Textual Similarity and Its Approaches

In our previous work [1], we introduced a comprehensive overview of the textual similarity measurement approaches in the literature. We illustrated in detail the differences between the categories of textual similarity: lexical-based, semantic-based, and hybrid-based similarity. We noticed that the differences among the approaches in previous work depend on the text vectorization technique used before the text comparison process. Various text vectorization techniques are used, such as TFIDF, Latent Semantic Indexing (LSI) [17], and graph-based representation [18]. Because of these differing techniques, the similarity measure used to compare text units also differs, as one similarity measure may not be suitable for all representation schemes. We summarized the most prominent attempts to measure the different textual similarity types and compared them according to the applied feature extraction technique, the used dataset, and the results reported by each approach. We then shed light on semantic analysis in the Arabic language, which is divided into four approaches: the word co-occurrence approach, the LSI approach, the feature-based approach, and the hybrid-based approach. Following this taxonomy, we reviewed some of those approaches and summarized them according to the applied technique, the used dataset, the aim of each one, the similarity type (string-based, corpus-based, knowledge-based, or hybrid-based), and the results obtained by each approach.

Recently, [19] proposed a semantics-based approach for post-retrieval query-performance prediction based on semantic similarities measured between entities in documents and queries. It consists of predictors for measuring semantic distinction, semantic query drift, and semantic cohesion in the top-ranked list of retrieved documents. The finding was that the proposed semantic approach is more effective in predicting query performance than term-based methods, because it considers semantic relatedness instead of exact term matching. They evaluated the proposed approach on the Robust04, ClueWeb09-B, and ClueWeb12-B datasets; the queries' rankings according to the proposed predictors were compared against the actual values using the Pearson and Kendall rank correlation coefficients.

On the other hand, [20] proposed a probabilistic framework that incorporates Bidirectional Encoder Representations from Transformers (BERT) via sentence-level semantics into Pseudo-Relevance Feedback (PRF). They obtained term importance at the term level; then they used the fine-tuned BERT model to get the embeddings of the query and the sentences in the feedback document and estimate the relevancy score between them. Next, the term scores at the sentence level are summed. Finally, the term-level and sentence-level weights are balanced by factors, and the top-k terms are combined to generate a new query for the next iteration of processing. They conducted several experiments on six TREC datasets; as shown by the evaluation indicators, the improved models outperformed the existing baseline models.

    2.2 Using AraBERT Model in NLP Tasks

Several researchers have used the AraBERT model, either as a feature extractor or by fine-tuning its parameters for a specific task. For example, [21] proposed three neural models: a Bi-LSTM, a CNN with FastText pre-trained word embeddings, and a Transformer architecture with AraBERT embeddings, combined with three similarity measures for Arabic text similarity and plagiarism detection. They used the question similarity dataset for Semantic Textual Similarity (STS) called Mawdoo3 and the 2015 Arabic PAN dataset for plagiarism detection evaluation. Their results showed that the AraBERT-Transformer with Dot-Product similarity outperformed the other models in terms of Pearson correlation.

Reference [22] is another work that combined different types of classical and contextual embeddings, namely pre-trained word embeddings such as FastText and Aravec, pooled contextual embeddings, and AraBERT embeddings, for the Arabic Named Entity Recognition (NER) task on the AQMAR dataset. These embeddings are then fed into a Bi-LSTM. The experiments showed that the combination of pooled contextual embeddings, FastText embeddings, and BERT embeddings achieved the best performance. The proposed method achieved an F1 score of 77.62 percent, outperforming all previously published results of deep and non-deep learning models on the same dataset.

Reference [23] addressed the pre-trained AraBERT model to learn complete contextual sentence embeddings and showed its utility in Arabic multi-class text categorization. They used it in two variants. In the first, they transferred the AraBERT knowledge to Arabic text categorization by fine-tuning AraBERT's parameters on the OSAC datasets. In the second, they used it as a feature extractor model and fed its outputs to several classifiers, including CNN, LSTM, Bi-LSTM, MLP, and SVM. After comprehensive experiments, the findings showed that the fine-tuned AraBERT model achieved state-of-the-art performance (99%) in terms of F1-score and accuracy.

Reference [24] presented a binary classifier model, based on the AraBERT language model, to decide whether the pairs of verses provided by the QurSim dataset are semantically related or not. They avoided redundancy and generated unrelated verse pairs from the QurSim dataset, dividing it into three datasets for comparison. The experiments showed that AraBERTv0.2 outperformed AraBERTv2 on the three datasets in terms of accuracy score (92%).

Finally, [25] participated in the EACL WANLP-2021 Shared Task 2, "Sarcasm and Sentiment Detection," and proposed a strategy consisting of two systems. The first system investigated whether a given Arabic tweet was sarcastic or not, which required performing deletion, segmentation, and insertion operations on different parts of the text. The other system aimed to detect the sentiment of Arabic tweets from the ArSarcasm-v2 dataset, which involved experimenting with multiple versions of two transformer-based models, AraELECTRA and AraBERT. They achieved seventh and fourth place in the sarcasm and sentiment detection subtasks, respectively.

    3 Methodology

This section presents the methodology implemented for developing the proposed system. First, we describe the dataset used in this work. Then, we explain the proposed method and its modules.

    3.1 Dataset

In this paper, we use the SemEval2017-task3 (Community Question Answering), subtask D (Rerank the correct answers for a new question) dataset, which refers to the Arabic CQA-MD (Community Question Answering-Medical Domain) dataset [26]. It was collected from three Arabic medical websites (WebTeb, Altibbi, and Islamweb) that allow visitors to post questions related to health and medical conditions and get answers from professional doctors. It was divided into training, development, and testing datasets. Every dataset file includes a sequence of threads, each beginning with an original question associated with a list of 30 question-answer (QA) pairs, each carrying one of the following labels: D (Direct) means the QA pair contains a direct answer to the original question; R (Related) means the QA pair includes an answer that covers some of the aspects raised in the original question; and I (Irrelevant) means the QA pair contains an answer irrelevant to the original question.

Fig. 1 illustrates an annotated question from the dataset. Each QA pair is also associated with some metadata: ID (QAID), a unique ID of the question-answer pair; Relevance (QArel), the relevance of the question-answer pair with respect to the question, which is to be predicted at test time; and Confidence (QAconf), the confidence value for the relevance annotation, based on inter-annotator agreement and other factors. This value is available for the training dataset only; it is not available for the development and test datasets.

    Figure 1: Annotated question from the Arabic CQA-MD dataset

So, we use this dataset (training and development) with its associated metadata to accomplish our primary research objective: estimating the relevancy scores between text pairs. We consider the confidence (QAconf) values as the relevancy scores between each question and its QA pairs.

    3.2 Text Preprocessing Phase

In Fig. 2, we propose two preprocessing models, simple preprocessing and full preprocessing, depending on the nature of the task in the subsequent phases. For instance, only a few preprocessing steps are needed to transform data into a form that matches the AraBERT model, such as removing diacritics, punctuation, and URL text. So, we consider this situation in our proposed methodology and define two types of preprocessing. The simple preprocessing procedure includes diacritics removal (the Tashkeel_Removing function), punctuation and URL text removal, and spell checking. Then, we apply tokenization to split the text into its tokens using the AraBERT tokenizer. Afterward, we convert each text to the BERT format by adding the special [CLS] token at the start of each text and a [SEP] token at the end of each sentence. Then, we determine each token's index according to AraBERT's vocabulary. The full preprocessing contains the same steps as simple preprocessing, plus stopword removal, named entity recognition (NER), stemming, and lemmatization, respectively. The tokenization task, however, differs between the two algorithms.
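A minimal sketch of the BERT-format step is given below, assuming the HuggingFace transformers tokenizer and the aubmindlab/bert-base-arabertv02 checkpoint (the paper notes that the AraBERT models are published under the aubmindlab name); the question text is illustrative, not from the dataset:

```python
# Sketch of simple preprocessing output in BERT format via the AraBERT tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02")

text = "ما هو علاج الصداع النصفي"  # illustrative medical question
ids = tokenizer(text)["input_ids"]  # wraps the text in [CLS] ... [SEP]

print(tokenizer.convert_ids_to_tokens(ids))  # ['[CLS]', ..., '[SEP]']
print(ids)  # each token's index in AraBERT's vocabulary
```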

To perform the diacritics removal task (the Tashkeel_Removing function), we use the Tashaphyne Python library [27], an Arabic light stemmer and segmenter; in the normalize module of this package, we specifically use the function strip_tashkeel. We define a set of patterns to detect punctuation symbols and URL text using the re Python library, which provides several functions for finding a specific pattern or string in the text and removing it. Next, we use Farasa [28], an Arabic NLP (ANLP) toolkit that serves the spell-checking task as well as several other tasks such as segmentation, stemming, named entity recognition (NER), and part-of-speech tagging. As shown in Algorithm 1 of the full preprocessing, we use the built-in Python function split, which allows changing the default splitter from whitespace to any symbol or character if needed. We remove stopwords from sentences using the Natural Language Toolkit (NLTK) Python package, which includes a stopwords corpus containing stopword lists for Arabic and many other languages [29]. The Farasa named entity recognizer is used to generate a list of named entities in a text. The aim of using this technique as a preprocessing step is to keep the named entities found in a text unchanged by the stemming task, as shown in Tab. 1.
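A hedged sketch of these cleaning steps follows, combining Tashaphyne's strip_tashkeel, re patterns for URLs and punctuation, and NLTK's Arabic stopword list; the regex patterns here are illustrative, not the exact ones used in this work:

```python
import re
from tashaphyne.normalize import strip_tashkeel
from nltk.corpus import stopwords  # requires nltk.download("stopwords") once

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")   # illustrative URL pattern
PUNCT_PATTERN = re.compile(r"[^\w\s]", flags=re.UNICODE)
ARABIC_STOPWORDS = set(stopwords.words("arabic"))

def simple_clean(text):
    text = strip_tashkeel(text)          # diacritics removal (Tashkeel_Removing)
    text = URL_PATTERN.sub(" ", text)    # URL text removal
    text = PUNCT_PATTERN.sub(" ", text)  # punctuation removal
    tokens = text.split()                # default whitespace splitter
    return [t for t in tokens if t not in ARABIC_STOPWORDS]
```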

Arabic stemmers fall into two categories: light-based stemmers and root-based stemmers. The light-based type is used in the stemming step via the Farasa stemmer web API, and we also use the Khoja stemmer as a root-based stemmer [30]. After applying the NER and stemming processes to a text, we compare the output list of the NER process with the output list of the stemming process to obtain the final representation of the given text, as shown in Fig. 3. Thus, we are given a set of questions, each of which is associated with a set called P that includes its question-answer pairs. To compute our features, we define a question with each of its question-answer pairs as <T1, T2>, where T1 is the original question and T2 is a question from its question-answer pairs, according to three setups:

• Simple processed data setup, in which we perform simple preprocessing on T1 and T2 before using them in the AraBERT model.

• Stemmed data setup, in which the stemming process from the full preprocessing phase is applied to T1 and T2.

• Lemmatized data setup, in which the lemmatization process from the full preprocessing phase is applied to T1 and T2.

    Figure 2: System architecture

Algorithm 1: Full Text Preprocessing
Function: Full_Preprocessing(qi, pi)
Input:
  qi: a question
  pi: list of approximately 30 answer pairs retrieved for qi
Output:
  stemmed_ques: string holding the stemmed version of the preprocessed question // initially null
  lemmatized_ques: string holding the lemmatized version of the preprocessed question // initially null
  stemmed_pairs: preprocessed list of stemmed answers for this question // initially empty
  lemmatized_pairs: preprocessed list of lemmatized answers for this question // initially empty
Variables:
  qo: preprocessed question
  po: list of preprocessed answers for qo
  token_list: list of tokens // initially empty
  tokens_after_stopwords_remove: list of tokens after stopword removal // initially empty
  ner_list: list of tuples produced by the NER process // initially empty
  stems_list: list of stems of tokens // initially empty
Begin
  qo = Tashkeel_Removing(qi)
  qo = URL_Removing(qo)            // each step chains on the previous output
  qo = Punctuation_Removing(qo)
  qo = Spellchecking(qo)
  token_list = Tokenization(qo)
  tokens_after_stopwords_remove = Stopwords_Removing(token_list)
  ner_list = Named_Entity_Recognition(tokens_after_stopwords_remove)
  stems_list = Stemming(tokens_after_stopwords_remove)
  stemmed_ques = Compare(ner_list, stems_list)
  lemmatized_ques = Lemmatize(stemmed_ques)
  for answer = 0, 1, ... over pi do
    po[answer] = Tashkeel_Removing(pi[answer])
    po[answer] = URL_Removing(po[answer])
    po[answer] = Punctuation_Removing(po[answer])
    po[answer] = Spellchecking(po[answer])
    token_list = Tokenization(po[answer])
    tokens_after_stopwords_remove = Stopwords_Removing(token_list)
    ner_list = Named_Entity_Recognition(tokens_after_stopwords_remove)
    stems_list = Stemming(tokens_after_stopwords_remove)
    stemmed_pairs.insert(answer, Compare(ner_list, stems_list))
    lemmatized_pairs.insert(answer, Lemmatize(stemmed_pairs[answer]))
  return stemmed_ques, lemmatized_ques, stemmed_pairs, lemmatized_pairs
End

    Table 1: Representation of the influence of the spell checking and NER processes on stemming and lemmatization results

    Figure 3: Comparison between NER and stemming processes’results

    3.3 Text Vectorization Phase

    This phase consists of two modules: the traditional features module and the AraBERT model.

    3.3.1 Traditional Features Module

In this phase, we execute multiple feature engineering techniques. To begin, we employ three sentence-pair matching metrics: Long Common Substring/Subsequence [31], Levenshtein distance, and Minimum Edit Distance (MED) [32], which directly calculate the similarity (overlap of characters/terms/substrings) between two sequences. To obtain an accurate sequence similarity value, the stopwords are removed and each word is lemmatized; consequently, for each sentence pair, we get three lexical-based similarity features. Second, we apply three types of statistical-based embedding techniques: word counting, TFIDF, and POS weighting. We use the sklearn.feature_extraction module [33] to extract these features from the dataset in a format supported by machine learning algorithms. We consider the sparsity problem these features may cause and try to mitigate it with the following steps in the preprocessing phase: removing stopwords, fixing misspelled words, and reducing words to their lemmas.
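For illustration, the following sketch computes two of the lexical-based similarity features, Levenshtein (minimum edit) distance and longest-common-subsequence length, with plain dynamic programming; the exact implementations used in this work may differ:

```python
def levenshtein(a, b):
    # classic edit-distance DP, one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def lcs_length(a, b):
    # longest common subsequence length via a full DP table
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

print(levenshtein("كتاب", "كتابة"), lcs_length("كتاب", "كتابة"))
```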

To verify that these steps have an effect, we calculate the vocabulary size of our dataset: it is approximately 21835 in the stemmed data setup and approximately 10988 in the lemmatized data setup. We notice that the vocabulary size decreases with the lemmatization process; consequently, the dimensionality of the vectors decreases. Third, we apply some pre-trained word embedding models, FastText and Aravec, both of which support the Arabic language. Tab. 2 shows the versions of pre-trained word embeddings used in this study. In Python, we use the Gensim library, which provides access to FastText and other word embedding algorithms for training and extracting word vectors; it also allows us to download pre-trained models from the internet to be loaded and fine-tuned [34].

We try each of these models individually to initialize word embeddings, although we sometimes cannot find embeddings for some words in a sentence. Consequently, we combine them to complement each other and obtain a larger number of word embeddings. Because of the nature of our dataset, a sentence may contain foreign words, such as the names of medicines or diseases, that are not found in the Aravec or Arabic FastText models. To address this issue, we use the multilingual FastText model, as shown in Fig. 4. We notice that some words are misspelled, so we cannot obtain correct embeddings for them from an embedding model; thus, spell checking is an essential step in the preprocessing phase. Also, we must convert some words to their lemma form to get their embeddings from pre-trained models such as Aravec. However, some terms are not found in an embedding model even after correction; thus, we ignore them in a sentence.
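A hedged sketch of this fallback strategy with Gensim is shown below; the model file names are placeholders. A word is looked up in an Aravec KeyedVectors file first, with a fall back to a FastText model, which can compose a vector for an out-of-vocabulary word from its character n-grams:

```python
from gensim.models import KeyedVectors
from gensim.models.fasttext import load_facebook_model

aravec = KeyedVectors.load("aravec_model.kv")        # placeholder path
fasttext = load_facebook_model("cc.ar.300.bin").wv   # placeholder path

def word_vector(word):
    if word in aravec.key_to_index:
        return aravec[word]          # exact match in Aravec
    return fasttext[word]            # FastText builds a vector even for unseen words
```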

    Table 2: Pre-trained word embedding models

    Figure 4: Process of obtaining word embeddings from the pre-trained models

To obtain a single vector representing the embedding of each sentence, we adopt several methods: averaging the word vectors that form each sentence; multiplying the averaged vectors by a projection matrix; and using smooth inverse frequency (SIF) [35], which estimates each word embedding's weight as a / (a + p(w)), where a is a parameter typically set to 0.001 and p(w) is the frequency of the word in the dataset. Unlike the previous methods, SIF does not assign equal weights to all words in the sentence.
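A minimal sketch of the SIF weighting follows, assuming word_vecs is a word-to-vector lookup and word_freq a raw frequency dictionary; the common-component removal step of the original SIF paper is omitted for brevity:

```python
import numpy as np

def sif_embedding(tokens, word_vecs, word_freq, a=0.001):
    total = float(sum(word_freq.values()))
    vecs, weights = [], []
    for w in tokens:
        if w in word_vecs:
            vecs.append(word_vecs[w])
            weights.append(a / (a + word_freq.get(w, 0) / total))  # a / (a + p(w))
    if not vecs:
        return None
    return np.average(np.array(vecs), axis=0, weights=weights)  # weighted mean
```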

    3.3.2 AraBERT Model

Because we are dealing with Arabic texts, we use the AraBERT model, an Arabic pre-trained language model based on the BERT architecture [14,16]. There are four releases of it: AraBERT v0.1, AraBERT v1, AraBERT v0.2, and AraBERT v2. They differ from each other in whether the Farasa segmenter is used to split affixes from the text. All models are accessible on the HuggingFace model page under the aubmindlab name. AraBERT models are pre-trained on a massive collection of text and can then be fine-tuned for different tasks. Consequently, we used the AraBERT model in two ways. First, we fine-tuned its parameters for the Semantic Textual Similarity (STS) task and fed the AraBERT embeddings to a feed-forward layer containing one neuron with a linear activation function to predict the similarity scores. Second, we applied it as a feature extractor to obtain a fixed-length tensor (768 dimensions for AraBERT Base models and 1024 for AraBERT Large models); to obtain sentence embeddings, the average pooling over all tokens is computed, and the resulting embeddings are fed to the regression models. In both variants, we compare the AraBERT model with the multilingual BERT (mBERT) model [15].
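One way to realize the feature-extractor variant is sketched below, under the assumption of the HuggingFace transformers API: the last hidden state of AraBERT is mean-pooled into a single fixed-length sentence vector (768 dimensions for the base model shown here):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "aubmindlab/bert-base-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def sentence_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # average pooling over tokens
```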

    3.4 Features Extraction Phase

In this phase, we use two methods to extract features from each sentence pair's vectors: kernels and element-wise operations. To begin with, we want to maintain the discriminating power of the lexical-based similarity features relative to the high dimensionality of the vector derived from each BOW feature for each sentence. Consequently, we estimate sentence-pair distances using 12 kernel functions and combine them with the lexical-based similarity features to represent each sentence pair. Tab. 3 shows the 12 kernel functions used in this work. We notice that these features are on different scales, which may affect the fit of the regression models in the following phase; thus, we normalize them into [0, 1] using the max-min normalization technique and standardize them around 0 using the StandardScaler module before building the regression models.

    Table 3: Used kernel functions
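As an illustration of the kernel-feature idea, the sketch below compresses each sentence pair into a few scalar kernel values with sklearn.metrics.pairwise and rescales the stacked features with MinMaxScaler; only four of the twelve kernels are shown, and this particular choice of kernels is an assumption for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import (cosine_similarity, laplacian_kernel,
                                      rbf_kernel, sigmoid_kernel)
from sklearn.preprocessing import MinMaxScaler

def kernel_features(v1, v2):
    # reduce a high-dimensional sentence pair to a handful of kernel values
    v1, v2 = v1.reshape(1, -1), v2.reshape(1, -1)
    return np.array([cosine_similarity(v1, v2)[0, 0],
                     rbf_kernel(v1, v2)[0, 0],
                     laplacian_kernel(v1, v2)[0, 0],
                     sigmoid_kernel(v1, v2)[0, 0]])

pairs = [(np.random.rand(300), np.random.rand(300)) for _ in range(5)]  # placeholders
X = np.vstack([kernel_features(a, b) for a, b in pairs])
X_scaled = MinMaxScaler().fit_transform(X)  # normalize features into [0, 1]
```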

The second method is element-wise operations, a large category of operations, such as arithmetic and comparison, that act on corresponding elements of the respective tensors or vectors. For each sentence pair, we use two operations, multiplication and subtraction, and then concatenate the results into a single tensor.
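A minimal sketch of this pair representation in NumPy:

```python
import numpy as np

def pair_features(u, v):
    # element-wise product and difference, concatenated into one vector
    return np.concatenate([u * v, u - v])

u, v = np.random.rand(768), np.random.rand(768)  # placeholder sentence embeddings
print(pair_features(u, v).shape)                 # (1536,)
```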

    3.5 Regression Phase

    Different machine learning algorithms and deep learning models are considered for building regression models to make predictions that represent textual similarity scores.

    3.5.1 Machine Learning Regression Algorithms

We investigate multiple learning algorithms for regression, such as Random Forest (RF), Support Vector Regressor (SVR), Nu Support Vector Regression (NuSVR), Gradient Boosting (GB), AdaBoost, Least Angle Regression (LARS), Cross-validated Least Angle Regression (LARS-CV), Bagging regressor, Stochastic Gradient Descent (SGD), Ridge regressor, Bayesian Ridge regressor, Decision Trees, Lasso regressor, Elastic Net, Polynomial regressor, and Extreme Gradient Boosting (XGB) [36]. In Python, we use the scikit-learn toolkit [37] to implement these algorithms, except for the XGB regressor, which we implement with the xgboost package.
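A hedged sketch of the regressor comparison loop with a few of the listed models follows; the feature matrix and targets are random placeholders standing in for the sentence-pair features and QAconf scores:

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X = np.random.rand(200, 20)  # placeholder sentence-pair features
y = np.random.rand(200)      # placeholder QAconf relevancy scores

models = {"RF": RandomForestRegressor(), "GB": GradientBoostingRegressor(),
          "AdaBoost": AdaBoostRegressor(), "Ridge": Ridge(),
          "Lasso": Lasso(), "XGB": XGBRegressor()}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
for name, model in models.items():
    preds = model.fit(X_train, y_train).predict(X_test)
    print(name, mean_absolute_error(y_test, preds), r2_score(y_test, preds))
```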

    3.5.2 Deep Learning Models

We implement a multilayer perceptron (MLP) model comprising two hidden layers with the ReLU activation function; the sentence pairs' embeddings are the input fed to these layers. The first hidden layer contains the number of input dimensions plus 50 neurons, and the second hidden layer contains the number of input dimensions plus 10 neurons; i.e., if the input embeddings have shape (600,), the hidden layers contain 650 and 610 neurons, respectively. We experimented with both wider and deeper neural networks for this model, but the wider network outperformed the deeper one in our experiments; hence, we rely on this architecture. We use Adam [38] as the optimization technique and use Mean Square Error (MSE) and Mean Absolute Error (MAE) as the loss and evaluation functions [39]. We set the validation split parameter to 0.2, so 80% of the data is used to train the model and the remaining 20% for validation, with 100 epochs and a batch size of 100. Finally, the output layer contains one neuron with a linear activation function to make the prediction. In Python, we use the Keras API, which runs on top of the TensorFlow and Theano [40] packages for executing high-level neural networks; thus, we first needed TensorFlow installed on our system.
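A sketch of the described MLP in Keras, under the stated assumptions (a 600-dimensional input, hidden sizes of input_dim + 50 and input_dim + 10, Adam, and MSE loss with MAE as an additional metric):

```python
from tensorflow import keras

input_dim = 600  # e.g., size of the concatenated sentence-pair embedding

model = keras.Sequential([
    keras.layers.Dense(input_dim + 50, activation="relu",
                       input_shape=(input_dim,)),      # first hidden layer: 650 neurons
    keras.layers.Dense(input_dim + 10, activation="relu"),  # second: 610 neurons
    keras.layers.Dense(1, activation="linear"),        # one output neuron
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X, y, validation_split=0.2, epochs=100, batch_size=100)
```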

    4 Experimental Settings

    We describe the concurrency concept in Section 4.1, and the evaluation metrics are described in Section 4.2.

    4.1 Concurrency Concepts

Section 3.1 shows that the total number of answer pairs is enormous, and processing all of them may take up to several days. For an initial experiment, we selected a sample of eight training questions with their pairs to be preprocessed; this took 1 h 23 m 48 s. To speed up program execution on this dataset, we therefore used the concurrency concept, which concerns parallel computation. All our experiments were run on a CPU with four cores using Python 3.8. There are three types of concurrency concepts. Multithreading [41] is also known as preemptive multitasking, as the OS knows about each thread and can interrupt it at any moment to start executing another thread. Asyncio [42] is also referred to as cooperative multitasking because the tasks collaborate and decide when to relinquish control. Finally, multiprocessing [43] achieves true concurrent execution because the processes run simultaneously on different CPU cores. We consider only two types here: multithreading and multiprocessing.

A thread pool is a technique for achieving concurrency of execution in a computer program. It keeps a pool of idle threads pre-instantiated and ready to be assigned tasks, eliminating the time required to create them one by one; another advantage is that a thread can be reused once its execution is complete. We use concurrent.futures, a Python standard library module that includes a concrete subclass called ThreadPoolExecutor, which uses multithreading and gives us a pool of threads for submitting tasks; the pool assigns tasks to the available threads and schedules them to run. For applying the multiprocessing concept in Python, we use the multiprocessing library to create multiprocess operations, in which the Process class runs each task and a Queue object stores the results of each process. The multiprocessing module also provides the Pool class, which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes. There is no guarantee that multithreading will be faster, because it depends on the type of program: there is a performance difference between CPU-bound and I/O-bound programs. When the tasks are CPU-intensive, we should consider the multiprocessing module; by contrast, when the tasks are I/O-bound and require plenty of connections, multithreading is recommended. To demonstrate that multithreading is best suited for our I/O-bound program, we ran several experiments on different dataset samples.
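A minimal sketch of submitting preprocessing jobs to a thread pool with concurrent.futures follows; full_preprocessing is a stand-in for Algorithm 1 and the thread data are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def full_preprocessing(question, pairs):
    # stand-in for Algorithm 1: clean, tokenize, stem, and lemmatize
    return question.strip(), [p.strip() for p in pairs]

threads = [("سؤال 1", ["إجابة 1", "إجابة 2"]), ("سؤال 2", ["إجابة 3"])]

with ThreadPoolExecutor(max_workers=8) as pool:  # at most 8 threads on a 4-core CPU
    futures = [pool.submit(full_preprocessing, q, p) for q, p in threads]
    results = [f.result() for f in futures]      # collected in submission order
```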

    4.1.1 Experiments 1 and 2

    Table 4: Comparison between sequential computation/running and the two concepts of parallel computation in experiments 1 and 2

As shown in Tab. 4, both experiments 1 and 2 are applied to the first 8-10 questions with their pairs in the training dataset. In Fig. 5, we observe the following: in both experiments, sequential computation took more execution time than parallel computation, so we eliminate sequential computation in the following experiments. Both types of parallel computation outperform sequential computation, but the difference in execution time between multithreading and multiprocessing is trivial.

    Figure 5: Comparison between running time of sequential computation and the parallel computation

    4.1.2 Experiments 3 and 4

    Table 5: Comparison between sequential computation/running and the two concepts of parallel computation in experiments 3 and 4

As shown in Tab. 5, in experiments 3 and 4 we apply the multithreading and multiprocessing concepts and also combine them into a hybrid concept in an attempt to decrease execution time. From the training dataset, we take a sample of 10 questions and their pairs, ranging from index 76 to index 86; however, only 6-7 questions were processed under the hybrid concept. The fourth experiment aims to explain why only six of the 10 questions in the third experiment were processed. The hybrid concept runs first in this experiment's order; then the other two concepts are applied to the same number and order of questions processed first in the hybrid. We determined that only 6-7 threads worked because the total number of threads that can run on our four-core CPU is eight when all cores are occupied; thus, the peak number of threads that can run is at most eight, provided no other programs or processes are running. In Fig. 6, we observe the following: the peak number of threads that can run on all CPU cores is not constant and changes from one experiment to another; the hybrid concept takes longer to execute than multithreading or multiprocessing individually; and multithreading is the most suitable concept for our task, as it takes the least execution time.

    Figure 6: Comparison between running time of sequential computation and the parallel computation

    4.2 Regression Models Evaluation Metrics

    We used different metrics to evaluate the performance of different regression models on different types of features, as listed below:

Root Mean Square Error (RMSE): the square root of the mean of the sum of squared prediction errors, as shown in Eq. (2) [39]; the prediction error of a row of data is shown in Eq. (1):

$$e_i = y_i - \hat{y}_i \quad (1)$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (2)$$

RMSE converts the error back to the units of the output variable, which makes it meaningful for interpretation. Its value ranges from 0 to ∞; a value of 0 indicates a perfect fit, and the smaller the value, the better the fit.

Mean Absolute Error (MAE): the mean of the sum of absolute prediction errors, as shown in Eq. (3):

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (3)$$

Compared to RMSE, MAE is more robust to the presence of outliers because it uses the absolute value rather than the square. A value of 0 indicates a perfect fit; the smaller the value, the better the fit [39].

The coefficient of determination (R2 score) is the square of the correlation coefficient (R). It determines how well the regression predictions fit the real data points, as shown in Eq. (4):

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (4)$$

R2 typically varies from 0 to 1. A value of 1 indicates that the regression predictions perfectly fit the data [39], while 0 indicates that the model does not explain the variability of the response data around its mean. R2 can be negative when the selected model does not appropriately represent the nature of the data.

The Mean Absolute Percentage Error (MAPE) is a popular metric for assessing generic regression problems [44]. It is given by the formula in Eq. (6); we can multiply it by 100% to express the value as a percentage:

$$\text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (6)$$
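For reference, all four metrics can be computed with scikit-learn as sketched below (mean_absolute_percentage_error is available in recent scikit-learn versions; the arrays are placeholder scores):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

y_true = np.array([0.9, 0.4, 0.7])  # placeholder relevancy scores
y_pred = np.array([0.8, 0.5, 0.6])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred) * 100  # as a percentage
print(rmse, mae, r2, mape)
```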

    5 Experiments and Results

The experimental results are analyzed and presented in the following. A comparison between the stemmed and lemmatized data setups is discussed in Section 5.1. The frequency-based with lexical-based features experiment is described in Section 5.2, and the pre-trained models' experiment in Section 5.3. AraBERT as a feature-extracting model is discussed in Section 5.4. Finally, we compare the findings of the MLP model, in terms of MAE and MAPE, with the best regressors from all the previous experiments.

    5.1 Stemmed and Lemmatized Data Setups Experiment

In this experiment, we aim to determine whether using lemmas to decrease the dimensionality of the vectors produced by the BOW features (word counting, TFIDF, and POS weighting) provides better outcomes in the regression phase. We then identify the influence of the lemmatization and stemming processes on the data by trying these two setups and testing their quality through the results of the regression models, evaluated via the RMSE as shown in Eq. (2). We discard the SGD regressor because its results show that it does not properly capture the problem variables. We select the minimum average of the RMSE values of the regression models for the different features (word counting, TFIDF, and POS weighting), as shown in Fig. 7. We conclude the following:

    Figure 7: Using stemmed and lemmatized data setups: (A) that represents using these data setups on word counting representation, (B) that represents using these data setups on TFIDF representation,and (C) that represents using these data setups on POS weighting representation

In the word counting representation, the average RMSE of the regression models is 0.208107 for the stemmed data setup and 0.371046067 for the lemmatized data setup; hence, the stemmed data setup is more appropriate for word counting. In the TFIDF representation, the average RMSE is 0.452521867 for the stemmed data setup and 0.1848484 for the lemmatized data setup; hence, the lemmatized data setup is more appropriate for TFIDF. In the POS weighting representation, the average RMSE is 0.186642 for the stemmed data setup and 0.184832333 for the lemmatized data setup. The difference between the two values is not particularly significant; hence, the POS weighting features are neutral, indicating that their influence is minor compared to the other features, as demonstrated by the following experiments.

    5.2 Frequency/Lexical Based Features’Experiment

We apply kernel functions to the vectors obtained from each BOW feature for each sentence pair in order to keep the discriminating power of the lexical-based similarity features compared with them. In this experiment, we intend to filter the best regressors based on both the BOW and lexical-based features, to select the best pool of regressors and the best feature types, where the BOW features represent the word counting, POS weighting, and TFIDF features collectively, and the lexical-based features represent MED, Long Common Substring, and Long Common Subsequence. "All features" represents all BOW and lexical-based features together, as shown in Fig. 8. The filtering process selects the smallest values of RMSE and MAE and the highest value of the R2 metric. Then, in ascending order, we rank the selected models by their MAE values, which is the best metric for this purpose because it does not overweight large residuals. According to Fig. 9, we conclude the following:

• We notice that the XGB regressor gives the best values in terms of MAE and MAPE. However, its R2 values are negative, so we do not select it.

• The best regressors are the Gradient Boosting, AdaBoost, and Lasso regressors.

• The best representation schemes for features are Long Common Substring, Long Common Subsequence, and the BOW features (word counting, TFIDF, and POS weighting together).

Figure 9: Results of the best regressors on the BOW and lexical-based features according to (A) MAE evaluation metric, (B) R2 evaluation metric, and (C) RMSE evaluation metric

    5.3 Pre-Trained Models Experiment

In this experiment, we aim to filter the best regressors for each sentence embedding method (averaged word vectors, projected averaged word vectors, and SIF) to select the best pool of regressors and the best sentence embedding representation, using the same evaluation metric settings, as shown in Fig. 10. According to Fig. 11, the regressors AdaBoost, Gradient Boosting, and Ridge show the best results; hence, we use them to identify the best sentence embedding representation. We conclude that the best representation is the SIF sentence embeddings.

    5.4 AraBERT as a Features-Extracting Model Experiment

In this experiment, we aim to filter the best regressors for the embedding models (AraBERT v0.1, AraBERT v1, AraBERT v0.2, AraBERT v2, and mBERT) to select the best pool of regressors and the best of these embedding models, with the same evaluation metric settings as before, as shown in Fig. 12. According to Fig. 12, the four regressors AdaBoost, Gradient Boosting, Elastic Net, and LARS show the best results; thus, we use them to identify the best embedding models. As shown in Fig. 13, we conclude that the best embedding models are the AraBERT v2-Large and AraBERT v0.2-Large embedding models.

Figure 10: Results of different regressors on the sentence embeddings representations according to (A) RMSE evaluation metric, (B) MAE evaluation metric, and (C) R2 evaluation metric


Figure 11: Results of the best regressors on the sentence embeddings representations according to (A) RMSE evaluation metric, (B) MAE evaluation metric, and (C) R2 evaluation metric

Figure 12: Results of different regressors on the five types of features according to (A) RMSE evaluation metric, (B) MAE evaluation metric, and (C) R2 evaluation metric

Figure 13: Results of the best regressors on the five types of features according to (A) RMSE evaluation metric, (B) R2 evaluation metric, and (C) MAE evaluation metric

    6 Discussion of Results and Implications

In this paper, we used the AraBERT model in two different variants to estimate the similarity scores between text units. All the previous experiments demonstrate that the Gradient Boosting and AdaBoost regressors give the best results with the Long Common Substring, Long Common Subsequence, SIF, AraBERT v0.2, and AraBERT v2 text embeddings in terms of RMSE, MAE, and R2. We compare the findings of the MLP model, in terms of MAE and MAPE, with the Gradient Boosting and AdaBoost regressors on these embeddings, as shown in Tabs. 6 and 7. In both tables, the bolded values in each row represent the best (smallest) values according to the MAE and MAPE metrics, on the condition that the corresponding R2 value is positive; consequently, some rows in Tab. 7 contain two bolded values. For example, the best MAPE value for SIF is 21.1508, but its corresponding R2 value is negative (-0.0065); the condition is therefore not met, and 21.7922 is chosen instead. The same holds for the BOW features. According to MAE and MAPE, the BOW features are eliminated because the best regressor for them varies. From both tables, we can determine the best regressor for each embedding model as follows: AraBERT v0.2-Large with AdaBoost, Long Common Substring with AdaBoost, SIF with AdaBoost, Long Common Subsequence with MLP, and AraBERT v2-Large with Gradient Boosting.

In the final experiment, we fine-tune the parameters of AraBERT v2 on the used dataset to estimate the relevancy scores between text units. The comparison between the previous candidate models from Tab. 7 and the fine-tuned AraBERT v2 is illustrated in Tab. 8 in terms of the MAPE metric. Finally, we conclude that AraBERT v0.2-Large as a feature extractor model with AdaBoost has the highest value in terms of R2, and the variance in MAPE values between it and the others is minor. In addition, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of R2.

Table 6: Comparison between the MLP, Gradient Boosting, and AdaBoost regressors according to the MAE evaluation metric

Table 7: Comparison between the MLP, Gradient Boosting, and AdaBoost regressors according to the MAPE evaluation metric

    Table 8: Best regressor for each embedding model according to our experiments

These findings can serve as a first step in a variety of NLP tasks such as text ranking, question answering systems, and essay grading. Additionally, we used the multithreading concurrency concept to reduce processing execution time and increase CPU utilization as much as possible. Based on the findings, we believe this is a better method for preprocessing text pairs than sequential processing and that it can be applied to other datasets and language settings.

    7 Conclusions and Future Work

In this paper, we addressed the textual similarity task, which is of paramount importance for multiple topics in NLP, such as text ranking, essay grading, question answering systems, and text classification. We used the multi-task learning approach to train an algorithm to learn embeddings from our dataset, estimate the textual similarity scores between text units, and use them later in multiple tasks. Our system has two different variants. In the first, we used multiple text vectorization schemes, such as word counts, TFIDF, and POS weighting as statistical-based approaches, and the FastText and Aravec pre-trained models as prediction-based approaches, besides AraBERT as a feature extractor model, to obtain text embeddings; these embeddings are then fed to various regressors to estimate the relevancy scores between text units. In the second variant, we exploited the AraBERT model as a pre-trained model and fine-tuned its parameters for the task of measuring textual similarity. We conducted several experiments on the SemEval2017-task3-subtask-D dataset and showed that AraBERT v0.2-Large as a feature extractor model with AdaBoost achieves the highest R2 value, with only a minor variance in MAPE between it and the other models. Moreover, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of R2. As future work, we intend to use the obtained similarity scores in other NLP tasks and to use the AraGPT or AraELECTRA models to obtain different embeddings.

Acknowledgement: This paper and the research behind it would not have been possible without the exceptional support of my God, my supervisors, my family, and my institution and colleagues.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
