
    Embedding Extraction for Arabic Text Using the AraBERT Model

Computers, Materials & Continua, 2022, Issue 7

Amira Hamed Abo-Elghit, Taher Hamza and Aya Al-Zoghby

1Faculty of Computers and Information, Department of Computer Sciences, Mansoura University, Mansoura, 35516, Egypt

2Faculty of Computers and Artificial Intelligence, Department of Computer Sciences, Damietta University, Damietta, 34517, Egypt

Abstract: Nowadays, the multi-task learning approach can be used to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task. In this work, we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset. The schemes include lexical-based similarity features, frequency-based features, and pre-trained model-based features. We also used contextual-based embedding models such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT). We used the AraBERT model in two different variants. First, as a feature extractor in addition to the text vectorization schemes' features: we fed those features to various regression models to make a prediction value that represents the relevancy score between Arabic text units. Second, AraBERT was adopted as a pre-trained model, and its parameters were fine-tuned to estimate the relevancy scores between Arabic textual sentences. To evaluate the research results, we conducted several experiments to compare the two variants. In terms of Mean Absolute Percentage Error (MAPE), the results show minor variance between AraBERT v0.2 as a feature extractor (21.7723) and the fine-tuned AraBERT v2 (21.8211). On the other hand, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of the coefficient of determination (R²) values (0.014050 and -0.032861, respectively).

Keywords: Semantic textual similarity; Arabic language; embeddings; AraBERT; pre-trained models; regression; contextual-based models; concurrency concept

    1 Introduction

Textual similarity is a critical topic in Natural Language Processing (NLP) due to its increasingly important role in related topics such as text classification, recovery of specific information from data, clustering, topic retrieval, subject tracking, question answering systems, essay grading, and summarization. The textual similarity process estimates the relevancy between text units [1,2]. The variations among the approaches in the literature for textual similarity depend on the text representation scheme used before text comparison. Text representation is a significant task that converts the unregulated form of textual data into a more formal construction before any additional text analysis or use in predictive modeling [3]. Text representation, word embeddings, or vectorization means converting the text to numbers, which can be integers or floating-point values, and then using them as input to machine learning algorithms [4]. We can divide word embedding approaches into three categories: frequency-based or statistical-based, prediction-based or pre-trained, and contextual-based word embeddings. The frequency-based word embedding approach is the traditional text modeling, based on the Bag-of-Words (BOW) representation. It contains One Hot Encoding (OHE), Hashing Vectorization, Part Of Speech (POS) Weighting [5], Word Counts, Term Frequency-Inverse Document Frequency (TFIDF) [4], and N-grams [6]. These vectorization techniques work well; however, they fail to keep a semantic relation between words or the meaning of a text, as they do not consider the context in which a word appears. Consequently, the order of words' occurrence is lost, since we create a vector of tokens in randomized order, and they may produce a sparse vector that contains many zeros. Prediction-based or pre-trained word embedding models are trained on a large collection of texts to build fixed-length, continuous-valued vectors in a low-dimensional space. The embedding size can vary depending on the target size selected during training. This category includes Word2Vec [7], Doc2Vec [8], FastText [9], GloVe [10], Aravec [11], etc. Pre-trained models save the time spent on obtaining, cleaning, and intensively processing enormous datasets. Unfortunately, however, they do not consider the relations between multiple words or the overall meaning and context of sentences within the text.

To overcome the above problems, contextual-based embedding models such as ELMo [12], ULMFiT [13], and BERT [14] are effective for learning complete sentence embeddings. They are used in sequence-level semantics learning of all the sequences in the documents. Thus, such models learn divergent embeddings for polysemous words. The ELMo model, for example, is a dynamic language modeling technique that learns the embeddings of words based on context and considers the divergent embeddings of polysemous words. It contains two language models, one in each direction, that form a multilayer Recurrent Neural Network (RNN). The ULMFiT model is a left-to-right language model that boosts the performance of some pre-trained models such as ELMo by including multiple fine-tuning techniques. In contrast to ELMo, which incorporates separate left-to-right and right-to-left models, BERT uses bidirectional transformer training to provide more precise word embeddings. Three versions of BERT address the Arabic language: the multilingual BERT (mBERT) [15] and two versions of AraBERT [16].

The main objective of this research is to propose an algorithm for estimating the textual similarity scores between Arabic texts, and then to use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. Our detailed objectives are: 1) choosing the best text vectorization scheme to represent texts in the used dataset; 2) picking, from the applied regressors, the best regression model for predicting the relevancy scores between text units in terms of the MAPE and R² evaluation metrics; and 3) reducing the execution time of processing and increasing CPU utilization as much as possible.

To implement our proposed algorithm, we used the AraBERT model in two different variants. First, we used it as a feature extractor model in addition to many other text embedding schemes, such as word counts, TFIDF, and POS weighting as statistical-based approaches, and the FastText and Aravec pre-trained models as prediction-based approaches. We then fed those features to several regressors to make a prediction value that represents the relevancy score between their input texts. Second, we address the AraBERT model as a pre-trained model and fine-tune its parameters on the textual similarity measurement task, so that the obtained results can be used in many other tasks later.

The rest of this paper is organized as follows. The literature is reviewed in Section 2. Section 3 then describes the details of our proposed algorithm, and the experimental settings are introduced in Section 4. We present our experiments' details in Section 5. The discussion of results and implications is introduced in Section 6. Section 7 finally outlines the conclusion and suggests future work.

    2 Review of Literature

Section 2.1 discusses the concept of textual similarity and the studies addressing it in the literature. Then, we address the recent research that used the AraBERT model in multiple NLP tasks in Section 2.2.

    2.1 Textual Similarity and Its Approaches

In our previous work [1], we introduced a comprehensive overview of the textual similarity measurement approaches in the literature. We illustrated in detail the differences between the categories of textual similarity concepts: lexical-based, semantic-based, and hybrid-based similarity. We noticed that the differences among the approaches in the literature depend on the text vectorization technique used before the text comparison process. Various text vectorization techniques are used, such as TFIDF, Latent Semantic Indexing (LSI) [17], and Graph-based Representation [18]. Owing to these differing techniques, the similarity measure used to compare text units also differs, because one similarity measure may not be convenient for all representation schemes. We summarized the most prominent attempts to measure the different textual similarity types and compared them according to the applied feature extraction technique, the used dataset, and the results released by each approach. We then shed light on semantic analysis in the Arabic language, which is divided into four approaches: the word co-occurrence approach, the LSI approach, the feature-based approach, and the hybrid-based approach. Regarding the taxonomy mentioned above, we reviewed some of those approaches and summarized them according to the applied technique, the used dataset, the aim of each one, the similarity type (string-based, corpus-based, knowledge-based, or hybrid-based), and the results obtained by each approach.

Recently, [19] proposed a semantics-based approach for post-retrieval query-performance prediction that depends on semantic similarities measured between entities in documents and queries. It consists of predictors for measuring semantic distinction, semantic query drift, and semantic cohesion in the top-ranked list of retrieved documents. The finding was that the proposed semantic approach is more effective for query-performance prediction than term-based methods because it considers semantic relatedness instead of exact term matching. They evaluated the proposed approach on the Robust04, ClueWeb09-B, and ClueWeb12-B datasets. The queries' rankings according to the proposed predictors were compared with the actual values using the Pearson and Kendall rank correlation coefficients.

On the other hand, [20] proposed a probabilistic framework that incorporates Bidirectional Encoder Representations from Transformers (BERT) via sentence-level semantics into Pseudo-Relevance Feedback (PRF). They obtained the term importance at the term level. Then, they used the fine-tuned BERT model to get the embeddings of the query and the sentences in the feedback document to estimate the relevancy score between them. Next, the term scores at the sentence level are summed. Finally, the term-level and sentence-level weights are balanced by factors, and the top-k terms are combined to generate a novel query for the next iteration of processing. They conducted several experiments on six TREC datasets. As manifested by the evaluation indicators, the improved models outperformed the existing baseline models.

    2.2 Using AraBERT Model in NLP Tasks

Several researchers have used the AraBERT model, either as a feature extractor or by fine-tuning its parameters for a specific task. For example, [21] proposed three neural models: a Bi-LSTM, a CNN with FastText pre-trained word embeddings, and a Transformer architecture with AraBERT embeddings. These were combined with three similarity measures for Arabic text similarity and plagiarism detection. They used the Mawdoo3 question similarity dataset for Semantic Textual Similarity (STS) and the 2015 Arabic PAN dataset for plagiarism detection evaluation. Their results showed that the AraBERT-Transformer with the Dot-Product similarity outperformed the other models in terms of Pearson correlation.

Reference [22] is another work that combined different types of classical and contextual embeddings: pre-trained word embeddings such as FastText and Aravec, pooled contextual embeddings, and AraBERT embeddings for the Arabic Named Entity Recognition (NER) task on the AQMAR dataset. These embeddings are then fed into a Bi-LSTM. The experiments showed that the combination of pooled contextual embeddings, FastText embeddings, and BERT embeddings achieved the best performance. The proposed method achieved an F1 score of 77.62 percent, outperforming all previously published results of deep and non-deep learning models on the same dataset.

Reference [23] addressed the pre-trained AraBERT model to learn complete contextual sentence embeddings and showed its utility in Arabic multi-class text categorization. They used it in two variants. The first transfers the AraBERT knowledge to Arabic text categorization by fine-tuning AraBERT's parameters on the OSAC datasets. The second uses it as a feature extractor model, feeding its output to several classifiers, including CNN, LSTM, Bi-LSTM, MLP, and SVM. After comprehensive experiments, the findings showed that the fine-tuned AraBERT model accomplished state-of-the-art performance (99%) in terms of F1-score and accuracy.

Reference [24] presented a binary classifier model to decide whether the pairs of verses provided by the QurSim dataset are semantically related or not, using the AraBERT language model. They avoided redundancy, generated unrelated verse pairs from the QurSim dataset, and divided it into three datasets for comparison. The experiments showed that AraBERT v0.2 outperformed AraBERT v2 on the three datasets in terms of accuracy score (92%).

Finally, [25] participated in the EACL WANLP-2021 Shared Task 2, "Sarcasm and Sentiment Detection," and proposed a strategy consisting of two systems. The first system investigated whether a given Arabic tweet was sarcastic, which required performing deletion, segmentation, and insertion operations on different parts of the text. The other system aimed to detect the sentiment of Arabic tweets from the ArSarcasm-v2 dataset, which involved experimenting with multiple versions of two transformer-based models, AraELECTRA and AraBERT. They achieved seventh and fourth places in the sarcasm and sentiment detection subtasks, respectively.

    3 Methodology

    This section extensively presents the methodology implemented for developing the proposed system.First, we start by describing the dataset used in this work.Then, we explain the proposed method and its modules.

    3.1 Dataset

In this paper, we use the SemEval2017-task3 (Community Question Answering), subtask-D (Rerank the correct answers for a new question) dataset, which refers to the Arabic CQA-MD (Community Question Answering-Medical Domain) dataset [26]. It was collected from three Arabic medical websites (WebTeb, Altibbi, and Islamweb) that permit visitors to post questions related to health and medical conditions and receive answers from professional doctors. It was divided into training, development, and testing datasets. Every dataset file includes a sequence of threads that begins with the original question, associated with a list of 30 question-answer (QA) pairs, each with one of the following labels: D (Direct) means the QA pair contains a direct answer to the original question; R (Related) means the QA pair includes an answer that covers some of the aspects raised in the original question; and I (Irrelevant) means the QA pair contains an answer irrelevant to the original question.

Fig. 1 illustrates an annotated question from the dataset. Each QA pair is also associated with some metadata, including the following: ID (QAID) is a unique ID of the question-answer pair; Relevance (QArel) is the relevance of the question-answer pair with respect to the question, which is to be predicted at test time; and Confidence (QAconf) is the confidence value for the relevance annotation, based on inter-annotator agreement and other factors. This value is available for the training dataset only; it is not available for the development and test datasets.

    Figure 1: Annotated question from the Arabic CQA-MD dataset

So, we use this dataset (training and development) with its associated metadata to accomplish our primary research objective: estimating the relevancy scores between text pairs. We consider the confidence (QAconf) values as the relevancy scores between the question and its QA pairs.

    3.2 Text Preprocessing Phase

In Fig. 2, we propose two models for preprocessing, simple preprocessing and full preprocessing, depending on the nature of the task in the subsequent phases. For instance, we only need some preprocessing steps to transform data into a form that matches the AraBERT model, such as removing diacritics, punctuation, and URL text. So, we consider this situation in our proposed methodology and define two types of preprocessing steps. The simple preprocessing procedure includes diacritics removal (the Tashkeel_Removing function), punctuation and URL text removal, and spell checking. Then, we apply the tokenization task to split the text into its tokens using the AraBERT tokenizer. Afterward, we change each text to the BERT format by adding the special [CLS] token at the start of each text and a [SEP] token between the sentences and at the end. Then, we determine each token's index according to AraBERT's vocabulary. The full preprocessing contains the same steps as the previous preprocessing type, in addition to stopword removal, named entity recognition (NER), stemming, and lemmatization tasks, respectively. However, the tokenization task differs between the two algorithms.

To complete the diacritics removal task (the Tashkeel_Removing function), we use the Tashaphyne Python library [27], an Arabic light stemmer and segmenter; specifically, we use the strip_tashkeel function from its normalize module. We define a set of patterns to detect punctuation symbols and URL text using the re Python library, which provides several functions to facilitate searching for a specific pattern or string form in the text and removing it. Next, we use Farasa [28], an Arabic NLP toolkit, for the spell-checking task; it also serves several other tasks such as segmentation, stemming, Named Entity Recognition (NER), and part-of-speech tagging. As shown in Algorithm 1 (full preprocessing), tokenization uses the built-in Python split function, which allows changing the default splitter from a space to any symbol or character if needed. Stopwords are then removed from the sentences using the Natural Language Toolkit (NLTK) Python package, which includes a stopwords corpus containing stopword lists for Arabic and many other languages [29]. The Farasa Named Entity Recognizer is used to generate a list of the named entities in a text. The aim of this preprocessing step is to protect the named entities found in a text from any change that might happen to them during the stemming task, as shown in Tab. 1.
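To make these steps concrete, the following is a minimal Python sketch of the simple preprocessing procedure under the stated tools (Tashaphyne for diacritics, re for URL and punctuation patterns, NLTK for stopwords). The regular expressions and function names are our own illustrative choices, and the Farasa spell-checking call is omitted since it is an external web service.

    # A minimal sketch, assuming tashaphyne and nltk are installed and the
    # NLTK stopwords corpus has been downloaded; the patterns are illustrative.
    import re
    from tashaphyne.normalize import strip_tashkeel
    from nltk.corpus import stopwords

    URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')
    PUNCT_PATTERN = re.compile(r'[!"#$%&\'()*+,\-./:;<=>?@\[\]^_`{|}~،؛؟]')

    def simple_preprocess(text):
        """Diacritics, URL, and punctuation removal (spell checking omitted)."""
        text = strip_tashkeel(text)          # remove diacritics (tashkeel)
        text = URL_PATTERN.sub(' ', text)    # drop URLs
        text = PUNCT_PATTERN.sub(' ', text)  # drop punctuation symbols
        return ' '.join(text.split())        # normalize whitespace

    def remove_stopwords(tokens):
        """Stopword-removal step of the full preprocessing procedure."""
        arabic_stops = set(stopwords.words('arabic'))
        return [t for t in tokens if t not in arabic_stops]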

Arabic stemmers are categorized into two categories: light-based stemmers and root-based stemmers. We use a light-based stemmer in the stemming step via the Farasa Stemmer web API, and the Khoja stemmer as a root-based stemmer [30]. Consequently, after applying the NER and stemming processes to a text, we compare the output list from the NER process to the output list from the stemming process to obtain the final representation of the given text, as shown in Fig. 3. Thus, we are given a set of questions, each of which is associated with a set called P that includes question-answer pairs. To compute our features, we define a question with its question-answer pairs as <T1, T2>, where T1 is the original question and T2 is a question from its question-answer pairs, according to three setups:

• Simple processed data setup, in which we perform simple preprocessing on T1 and T2 before using them in the AraBERT model.

• Stemmed data setup, in which the stemming process from the full preprocessing phase is applied to T1 and T2.

• Lemmatized data setup, in which the lemmatization process from the full preprocessing phase is applied to T1 and T2.

    Figure 2: System architecture

Algorithm 1: Full Text Preprocessing
1. Function: Full_Preprocessing (qi, pi)
2. Input:
3.   qi: question
4.   pi: list of approximately 30 question-answer pairs retrieved for qi
5. Output:
6.   stemmed_ques: string representing the stemmed version of the preprocessed question // initially null
7.   lemmatized_ques: string representing the lemmatized version of the preprocessed question // initially null
8.   stemmed_pairs: preprocessed list of stemmed answers for this question // initially empty
9.   lemmatized_pairs: preprocessed list of lemmatized answers for this question // initially empty
10. Variables:
11.   qo: preprocessed question
12.   po: list of preprocessed answers for qo
13.   tokens_list: list of tokens // initially empty
14.   tokens_after_stopwords_remove: list of tokens after stopword removal // initially empty
15.   ner_list: list of tuples produced by the NER process // initially empty
16.   stems_list: list of stems of tokens // initially empty
17. Begin
18.   qo = Tashkeel_Removing (qi)
19.   qo = URL_Removing (qo)
20.   qo = Punctuation_Removing (qo)
21.   qo = Spellchecking (qo)
22.   tokens_list = Tokenization (qo)
23.   tokens_after_stopwords_remove = Stopwords_Removing (tokens_list)
24.   ner_list = Named_Entity_Recognition (tokens_after_stopwords_remove)
25.   stems_list = Stemming (tokens_after_stopwords_remove)
26.   stemmed_ques = Compare (ner_list, stems_list)
27.   lemmatized_ques = Lemmatize (stemmed_ques)
28.   for answer = 0, 1, ... in pi do
29.     po[answer] = Tashkeel_Removing (pi[answer])
30.     po[answer] = URL_Removing (po[answer])
31.     po[answer] = Punctuation_Removing (po[answer])
32.     po[answer] = Spellchecking (po[answer])
33.     tokens_list = Tokenization (po[answer])
34.     tokens_after_stopwords_remove = Stopwords_Removing (tokens_list)
35.     ner_list = Named_Entity_Recognition (tokens_after_stopwords_remove)
36.     stems_list = Stemming (tokens_after_stopwords_remove)
37.     stemmed_pairs.insert (answer, Compare (ner_list, stems_list))
38.     lemmatized_pairs.insert (answer, Lemmatize (stems_list))
39.   return stemmed_ques, lemmatized_ques, stemmed_pairs, lemmatized_pairs
40. End

    Table 1: Representation of the influence of the spell checking and NER processes on stemming and lemmatization results

Figure 3: Comparison between the NER and stemming processes' results

    3.3 Text Vectorization Phase

    This phase consists of two modules: the traditional features module and the AraBERT model.

    3.3.1 Traditional Features Module

In this phase, we execute multiple feature engineering techniques. First, we employ three sentence-pair matching metrics: Long Common Substring/Subsequence [31], Levenshtein distance, and Minimum Edit Distance (MED) [32], which are intended to directly calculate the similarity (overlapping of characters/terms/substrings) of two sequences. To obtain an accurate sequence similarity value, the stopwords are removed and each word is lemmatized. Consequently, for each sentence pair, we get three lexical-based similarity features. Second, we apply three types of statistical-based embedding techniques: word counting, TFIDF, and POS weighting. We use the sklearn.feature_extraction module [33] to extract these features from the dataset in a format supported by machine learning algorithms. We consider the sparsity problem these features may cause and mitigate it through the preprocessing phase: removing stopwords, fixing misspelled words, and reducing words to their lemma.
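As a concrete illustration of these features, the sketch below computes two of the lexical-based measures with plain dynamic programming (to stay self-contained) and extracts TFIDF vectors with sklearn.feature_extraction; the helper names are our own.

    # A minimal sketch of the lexical-based and TFIDF features; the Levenshtein
    # and longest-common-substring routines are standard DP implementations.
    from sklearn.feature_extraction.text import TfidfVectorizer

    def levenshtein(a, b):
        """Minimum number of insertions, deletions, and substitutions."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def longest_common_substring(a, b):
        """Length of the longest contiguous substring shared by a and b."""
        best = 0
        table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    table[i][j] = table[i - 1][j - 1] + 1
                    best = max(best, table[i][j])
        return best

    # TFIDF vectors for a (placeholder) corpus of preprocessed sentences:
    sample = ["first preprocessed text", "second preprocessed text"]
    tfidf = TfidfVectorizer()
    X = tfidf.fit_transform(sample)  # one sparse row per text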

To ensure that these steps have an effect, we calculate the vocabulary size of our dataset: it is approximately 21,835 in the stemmed data setup and approximately 10,988 in the lemmatized data setup. We notice that the number of vocabulary words decreases under the lemmatization process; consequently, the dimensionality of the vectors decreases. Third, we apply some pre-trained word embedding models, FastText and Aravec, both of which support the Arabic language. Tab. 2 shows the versions of pre-trained word embeddings used in this study. In Python, we use the Gensim library, which provides access to FastText and other word embedding algorithms for training and extracting word vectors; it allows us to download pre-trained models from the internet to be loaded and fine-tuned [34].

We try each of these models individually to initialize word embeddings, although we sometimes cannot find embeddings for some words in a sentence. Consequently, we combine them to complement each other and obtain a larger number of word embeddings. Because of the nature of our dataset, a sentence may contain foreign words, such as names of medicines or diseases, that are not found in the Aravec or Arabic FastText models. To address this issue, we used the FastText multilingual model, as shown in Fig. 4. We notice that some words are misspelled, so we cannot get correct embeddings for them from an embedding model; thus, spell checking is an essential step in the preprocessing phase. Second, we must convert some words to their lemma form to get their embeddings from pre-trained models such as the Aravec model. However, some terms are not found in any embedding model even after correcting them; thus, we ignore them in the sentence. A sketch of this fallback lookup follows Fig. 4.

    Table 2: Pre-trained word embedding models

    Figure 4: Process of obtaining word embeddings from the pre-trained models
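The following sketch, under our own assumptions, mirrors the fallback lookup of Fig. 4: try Aravec first, then the Arabic FastText model, then the multilingual FastText model, and ignore the term if all fail. The model paths are placeholders (Aravec ships as a Gensim model, so loading details may differ), and Gensim's loading helpers are used.

    # A minimal sketch of the Fig. 4 lookup; paths and names are placeholders.
    from gensim.models import KeyedVectors
    from gensim.models.fasttext import load_facebook_vectors

    aravec = KeyedVectors.load("aravec.kv")             # placeholder Aravec vectors
    ft_arabic = load_facebook_vectors("cc.ar.300.bin")  # Arabic FastText vectors
    ft_multi = load_facebook_vectors("multi.bin")       # placeholder multilingual model

    def word_vector(word):
        """Return the first embedding found, or None to ignore the term."""
        for model in (aravec, ft_arabic, ft_multi):
            if word in model.key_to_index:
                return model[word]
        return None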

To obtain a single vector representing the embedding of each sentence, we adopt several methods: averaging the word vectors that form each sentence; multiplying the resulting averaged vectors by a projection matrix; and using smooth inverse frequency (SIF) [35], which estimates each word-embedding weight as a/(a + p(w)), where a is a parameter typically set to 0.001 and p(w) is the frequency of the word in the dataset. Unlike the previous methods, SIF does not assign equal weights to every word in the sentence.
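A minimal SIF sketch follows, assuming word frequencies have been counted over the dataset; the common-component removal step of the original SIF paper [35] is omitted for brevity, and the helper names are our own.

    # SIF sentence embedding: weight each word vector by a / (a + p(w)).
    import numpy as np

    def sif_embedding(tokens, word_vector, word_freq, total_words, a=1e-3):
        """tokens: words of one sentence; word_vector: lookup returning a
        vector or None; word_freq/total_words: corpus counts giving p(w)."""
        vecs, weights = [], []
        for w in tokens:
            v = word_vector(w)
            if v is None:
                continue                       # term has no embedding: ignore it
            p_w = word_freq[w] / total_words   # estimated word frequency p(w)
            vecs.append(v)
            weights.append(a / (a + p_w))      # SIF weight
        if not vecs:
            return None
        return np.average(vecs, axis=0, weights=weights)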

    3.3.2 AraBERT Model

Because we are dealing with Arabic texts, we use the AraBERT model, an Arabic pre-trained language model based on the BERT architecture [14,16]. There are four releases of it: AraBERT v0.1, AraBERT v1, AraBERT v0.2, and AraBERT v2. They may differ from each other in the use of the Farasa segmenter, which splits affixes from the text. All models are accessible on the HuggingFace model page under the aubmindlab name. AraBERT models are pre-trained on a massive collection of text and then fine-tuned for different tasks. Consequently, we used the AraBERT model in two ways. First, we investigated and fine-tuned its parameters for the Semantic Textual Similarity (STS) task, feeding the AraBERT embeddings to a feed-forward layer containing one neuron with a linear activation function to predict the similarity scores. Second, we applied it as a feature extractor model to obtain a fixed-length tensor (usually 768 for AraBERT Base models and 1024 for AraBERT Large models). To obtain sentence embeddings, the average pooling over all tokens' layers is estimated; these embeddings are then fed to the regression models. In both variants, we compare the AraBERT model with the Multilingual BERT (mBERT) model [15].
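The sketch below shows the feature-extractor variant with the HuggingFace transformers library; the checkpoint name is one of the aubmindlab releases mentioned above, and mean pooling over the last hidden states stands in for the paper's average pooling over token layers.

    # A minimal sketch, assuming torch and transformers are installed.
    import torch
    from transformers import AutoTokenizer, AutoModel

    NAME = "aubmindlab/bert-base-arabertv02"   # an AraBERT v0.2 base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(NAME)
    model = AutoModel.from_pretrained(NAME)

    def sentence_embedding(text):
        """Return a fixed-length sentence vector (768-d for base models)."""
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state   # (1, n_tokens, 768)
        return hidden.mean(dim=1).squeeze(0)             # average pooling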

    3.4 Features Extraction Phase

In this phase, we use two methods to extract features from each sentence pair's vectors: kernels and element-wise operations. To begin with, we want to maintain the discriminating power of the lexical-based similarity features relative to the dimensionality of the vector derived from each BOW feature for each sentence. Consequently, we estimate sentence-pair distances using 12 kernel functions and combine them with the lexical-based similarity features to represent each sentence pair. Tab. 3 shows the 12 kernel functions used in this work. We notice that these features are on different scales, which may have an impact on the fit of the regression models in the following phase. Thus, we normalize them into [0, 1] using the max-min normalization technique and standardize them around 0 using the StandardScaler module before building the regression models; a sketch follows Tab. 3.

    Table 3: Used kernel functions
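As an illustration, the sketch below computes kernel distances for one sentence pair with sklearn.metrics.pairwise and rescales the stacked features; the kernel list here is a subset of the 12 in Tab. 3, chosen from the kernels scikit-learn exposes, and the random vectors are placeholders.

    # A minimal sketch of the kernel features and the two rescaling steps.
    import numpy as np
    from sklearn.metrics.pairwise import pairwise_kernels
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    KERNELS = ["linear", "polynomial", "rbf", "laplacian", "sigmoid", "cosine"]

    def kernel_features(v1, v2):
        """One scalar kernel value per kernel for a single sentence pair."""
        a, b = v1.reshape(1, -1), v2.reshape(1, -1)
        return np.array([pairwise_kernels(a, b, metric=k)[0, 0] for k in KERNELS])

    pairs = [(np.random.rand(50), np.random.rand(50)) for _ in range(4)]
    features = np.vstack([kernel_features(v1, v2) for v1, v2 in pairs])
    features = MinMaxScaler().fit_transform(features)    # max-min into [0, 1]
    features = StandardScaler().fit_transform(features)  # standardize around 0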

The second method is element-wise operations, a broad category of operations, such as arithmetic and comparison operations, that act on corresponding elements within the respective tensors or vectors. For each sentence pair, we use two types of operations, multiplication and subtraction, and then concatenate the results into a single tensor, as sketched below.
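A one-function sketch of this step:

    # Element-wise pair features: multiply and subtract the two sentence
    # vectors, then concatenate the results into a single feature vector.
    import numpy as np

    def elementwise_features(v1, v2):
        return np.concatenate([v1 * v2, v1 - v2])   # shape (2 * dim,)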

    3.5 Regression Phase

    Different machine learning algorithms and deep learning models are considered for building regression models to make predictions that represent textual similarity scores.

    3.5.1 Machine Learning Regression Algorithms

We investigate multiple learning algorithms for regression tasks, such as Random Forest (RF), Support Vector Regressor (SVR), Nu Support Vector Regression (NuSVR), Gradient Boosting (GB), AdaBoost, Least Angle Regression (LARS), Cross-validated Least Angle Regression (LARS-CV), Bagging regressor, Stochastic Gradient Descent (SGD), Ridge regressor, Bayesian Ridge regressor, Decision Trees, Lasso regressor, Elastic Net, Polynomial regressor, and Extreme Gradient Boosting (XGB) [36]. In Python, we use the scikit-learn toolkit [37] to implement these algorithms, except for the XGB regressor, which we implemented with the xgboost package.
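A sketch of fitting and scoring part of this pool follows; hyperparameters are left at the scikit-learn/xgboost defaults, and only a representative subset of the listed regressors is shown.

    # A minimal sketch of the regressor pool, evaluated here by MAE.
    from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                                  AdaBoostRegressor, BaggingRegressor)
    from sklearn.linear_model import Ridge, BayesianRidge, Lasso, ElasticNet, Lars
    from sklearn.svm import SVR, NuSVR
    from sklearn.metrics import mean_absolute_error
    from xgboost import XGBRegressor

    REGRESSORS = {
        "RF": RandomForestRegressor(), "GB": GradientBoostingRegressor(),
        "AdaBoost": AdaBoostRegressor(), "Bagging": BaggingRegressor(),
        "Ridge": Ridge(), "BayesianRidge": BayesianRidge(), "Lasso": Lasso(),
        "ElasticNet": ElasticNet(), "LARS": Lars(), "SVR": SVR(),
        "NuSVR": NuSVR(), "XGB": XGBRegressor(),
    }

    def evaluate_pool(X_train, y_train, X_test, y_test):
        """Fit each regressor on the pair features and report its MAE."""
        for name, reg in REGRESSORS.items():
            reg.fit(X_train, y_train)
            print(name, mean_absolute_error(y_test, reg.predict(X_test)))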

    3.5.2 Deep Learning Models

We implement a multilayer perceptron (MLP) model that comprises two hidden layers with the ReLU activation function. Sentence pairs' embeddings represent the input fed to these layers. The first hidden layer contains the number of input dimensions plus 50 neurons, and the second hidden layer contains the number of input dimensions plus 10 neurons (e.g., for input embeddings of shape (600,), the hidden layers contain 650 and 610 neurons, respectively). We experimented with wider and deeper neural networks for this model, but the wider network outperformed the deeper one in the experiments; hence, we rely on this architecture. We use Adam [38] as the optimization technique and Mean Square Error (MSE) and Mean Absolute Error (MAE) as the loss and evaluation functions, respectively [39]. We set the validation split parameter to 0.2, so 80% of the data is used to train the model and the remaining 20% for testing, with 100 epochs and a batch_size of 100. Finally, the output layer contains one neuron with a linear activation function to make the prediction. In Python, we use the Keras API, which runs on top of packages such as TensorFlow and Theano [40], for executing high-level neural networks; thus, we first needed TensorFlow installed on our system.
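The described architecture, sketched in Keras under the assumption of 600-dimensional pair embeddings:

    # A minimal sketch of the MLP: two ReLU hidden layers of input_dim + 50
    # and input_dim + 10 neurons, a linear output neuron, Adam, and MSE/MAE.
    from tensorflow import keras

    input_dim = 600  # assumed dimensionality of the pair embeddings
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        keras.layers.Dense(input_dim + 50, activation="relu"),  # first hidden layer
        keras.layers.Dense(input_dim + 10, activation="relu"),  # second hidden layer
        keras.layers.Dense(1, activation="linear"),             # similarity score
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    # model.fit(X, y, validation_split=0.2, epochs=100, batch_size=100)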

    4 Experimental Settings

    We describe the concurrency concept in Section 4.1, and the evaluation metrics are described in Section 4.2.

    4.1 Concurrency Concepts

Section 3.1 shows that the total number of answer pairs is enormous, and processing all of them needs a long time, possibly up to several days. For an initial experiment, we selected a sample that includes eight training questions with their pairs to be preprocessed. The time taken in this experiment was 1 h 23 m 48 s. This is a long time, so to speed up program execution on this dataset, we used the concurrency concept, which concerns parallel computation. All our experiments were run on a CPU processor with four cores using the Python 3.8 interpreter. There are three types of concurrency concepts. Multithreading [41] is also known as preemptive multitasking, as the OS knows about each thread and can interrupt it at any moment to start executing another thread. Second, Asyncio [42] is also referred to as cooperative multitasking because the tasks collaborate and decide when to relinquish control. Finally, multiprocessing [43] achieves true concurrent execution because the processes run simultaneously on different CPU cores. We will only look at two types: multithreading and multiprocessing.

A thread pool is a technique for achieving concurrency of execution in a computer program. It keeps a pool of idle threads pre-instantiated and ready to be assigned tasks, eliminating the creation time required to create them one by one; another advantage is that a thread can be reused once its execution is complete. We use concurrent.futures, a Python standard library module that includes a concrete subclass known as ThreadPoolExecutor, which uses multithreading to provide a pool of threads for submitting tasks. The pool thus created assigns tasks to the available threads and schedules them to run. For applying the multiprocessing concept in Python, we use the multiprocessing library, in which the Process class creates worker processes and a Queue object stores the results of each process. The multiprocessing module also provides the Pool class, which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes. There is no guarantee that multithreading will be faster, because it depends on the type of program: there is a performance difference between CPU-bound and I/O-bound programs. When the tasks are CPU-intensive, we should consider the multiprocessing module. By contrast, when the tasks are I/O-bound and require plenty of connections, the multithreading concept is recommended. To demonstrate that the multithreading concept is best suited to our I/O-bound program, we ran several experiments on different dataset samples.
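The two options, sketched with placeholder work items (the real task would be the Full_Preprocessing routine of Algorithm 1 with its Farasa web-API calls):

    # A minimal sketch comparing the two parallelization options.
    from concurrent.futures import ThreadPoolExecutor
    from multiprocessing import Pool

    def preprocess_question(q):
        return q  # placeholder for Full_Preprocessing(q, pairs)

    questions = ["q1", "q2", "q3"]

    if __name__ == "__main__":
        # Multithreading: a pool of worker threads, suited to I/O-bound tasks.
        with ThreadPoolExecutor(max_workers=8) as executor:
            threaded = list(executor.map(preprocess_question, questions))

        # Multiprocessing: distribute the questions across CPU cores.
        with Pool(processes=4) as pool:
            processed = pool.map(preprocess_question, questions)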

    4.1.1 Experiments 1 and 2

    Table 4: Comparison between sequential computation/running and the two concepts of parallel computation in experiments 1 and 2

As shown in Tab. 4, both experiments 1 and 2 are applied to the first 8-10 questions with their pairs in the training dataset. In Fig. 5, we observe that in both experiments, sequential computation took more execution time than parallel computation; hence, we eliminate sequential computation in the following experiments. Both types of parallel computation outperform sequential computation, but the difference in execution time between multithreading and multiprocessing is trivial.

Figure 5: Comparison between the running times of sequential computation and parallel computation

    4.1.2 Experiments 3 and 4

    Table 5: Comparison between sequential computation/running and the two concepts of parallel computation in experiments 3 and 4

As shown in Tab. 5, in experiments 3 and 4 we apply the multithreading and multiprocessing concepts and also combine them into a hybrid concept in an attempt to decrease execution time. From the training dataset, we take a sample of 10 questions and their pairs, ranging from index 76 to index 86. However, only 6-7 questions were processed under the hybrid concept. The fourth experiment aims to explain why only six of the 10 questions in the third experiment were processed. The hybrid concept comes first in this experiment's order; then, both other concepts are applied to the same number and order of questions that were processed first in the hybrid. We determined that only 6-7 threads worked because the total number of threads that can run on our four-core CPU processor is eight when all cores are occupied. Thus, the peak number of threads that can run is equal to or less than eight, on the condition that no other programs or processes are running. In Fig. 6, we observe the following: the peak number of threads that can run on all CPU cores is not constant; it changes from one experiment to another. The hybrid concept takes a longer execution time compared with multithreading and multiprocessing individually. We conclude that the multithreading concept is the most suitable for our task, as it takes the least amount of time to execute.

Figure 6: Comparison between the running times of the multithreading, multiprocessing, and hybrid concepts

    4.2 Regression Models Evaluation Metrics

    We used different metrics to evaluate the performance of different regression models on different types of features, as listed below:

Root Mean Square Error (RMSE): the square root of the mean of the sum of squared prediction errors, as shown in Eq. (2) [39]; the prediction error of a row of data is shown in Eq. (1). It converts the error values back to the units of the output variable, which makes them meaningful for interpretation. Its value varies from 0 to ∞; a value of 0 indicates a perfect fit, and the smaller the value, the better the fit.

Mean Absolute Error (MAE): the mean of the sum of absolute prediction errors, as shown in Eq. (3). Compared to RMSE, MAE is robust to the presence of outliers because it uses the absolute value rather than squaring the errors. A value of 0 indicates a perfect fit; the smaller the value, the better the fit [39].

The coefficient of determination (R² score) is the square of the correlation coefficient (R). It determines how well the regression predictions fit the real data points, as shown in Eq. (4).

R² typically varies from 0 to 1. A value of 1 indicates that the regression predictions perfectly fit the data [39]; a value of 0 indicates that the model does not explain the variability of the response data around its mean. R² may be negative when the selected model does not appropriately represent the nature of the data.

The Mean Absolute Percentage Error (MAPE) is a popular metric for assessing generic regression problems [44]. It is given by the formula shown in Eq. (6); we can multiply it by 100% to express the value as a percentage.
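The descriptions above correspond to the following standard definitions (the original equation images are not present in the extracted text, so these are reconstructions), with y_i the actual score, ŷ_i the prediction, ȳ the mean of the actual scores, and n the number of samples; Eq. (5) is not recoverable from the surrounding text and is omitted:

    \begin{align}
    e_i &= y_i - \hat{y}_i \tag{1} \\
    \mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{2} \\
    \mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{3} \\
    R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{4} \\
    \mathrm{MAPE} &= \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{6}
    \end{align}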

    5 Experiments and Results

The experimental results are analyzed and presented in the following. A comparison between using stemmed and lemmatized data setups is discussed in Section 5.1. The frequency-based with lexical-based features experiment is described in Section 5.2. The pre-trained models' experiment is described in Section 5.3. The AraBERT as a feature-extracting model experiment is discussed in Section 5.4. Then, we compare the findings of the MLP model in terms of MAE and MAPE with the best regressors from all the previous experiments.

    5.1 Stemmed and Lemmatized Data Setups Experiment

In this experiment, we aim to determine whether using lemmas to decrease the dimensionality of the vector represented by the BOW (word counting, TFIDF, and POS weighting) features provides better outcomes in the regression phase. We identify the influence of the lemmatization and stemming processes on the data by trying both data setups in the regression phase and testing their quality, as indicated by the results of the regression models evaluated via the RMSE shown in Eq. (2). We discard the SGD regressor because its results show that it does not properly explore the problem variables. We select the minimum average of the RMSE values of the regression models with the different features (word counting, TFIDF, and POS weighting), as shown in Fig. 7. We conclude the following:

Figure 7: Using stemmed and lemmatized data setups on (A) the word counting representation, (B) the TFIDF representation, and (C) the POS weighting representation

In the word counting representation, the average of the RMSE values of the regression models equals 0.208107 with the stemmed data setup and 0.371046067 with the lemmatized data setup; hence, the stemmed data setup is more appropriate for word counting than the lemmatized one. In the TFIDF representation, the average RMSE equals 0.452521867 with the stemmed data setup and 0.1848484 with the lemmatized data setup; hence, the lemmatized data setup is more appropriate for TFIDF than the stemmed one. In the POS weighting representation, the average RMSE equals 0.186642 with the stemmed data setup and 0.184832333 with the lemmatized data setup. The difference between the two values is not particularly significant; hence, the POS weighting features are neutral, indicating that their influence is minor in comparison to other features, as demonstrated by the following experiments.

5.2 Frequency/Lexical-Based Features' Experiment

We apply kernel functions to the vectors obtained from each BOW feature for each sentence pair with the purpose of keeping the discriminating power of the lexical-based similarity features compared with them. In this experiment, we intend to filter the best regressors based on both the BOW and lexical-based features to select the best pool of regressors and the best types of features, where the BOW features represent the word counting, POS weighting, and TFIDF features collectively; the lexical-based features represent MED, Long Common Substring, and Long Common Subsequence; and "all features" represents all BOW and lexical-based features together, as shown in Fig. 8. The filtering process is based on selecting the smallest values of the RMSE and MAE metrics and the highest value of the R² metric. Then, in ascending order, we rank the selected models based on the MAE metric values, which is the best metric for this purpose because it does not inflate large residuals. According to Fig. 9, we conclude the following:

• We notice that the XGB regressor gives the best values in terms of MAE and MAPE; however, its R² values are negative, so we do not select it.

• The best regressors are the Gradient Boosting, AdaBoost, and Lasso regressors.

• The best representation schemes for features are Long Common Substring, Long Common Subsequence, and the BOW features (word counting, TFIDF, and POS weighting together).

Figure 9: Results of the best regressors on the BOW and lexical-based features according to (A) MAE evaluation metric, (B) R² evaluation metric, and (C) RMSE evaluation metric

    5.3 Pre-Trained Models Experiment

In this experiment, we aim to filter the best regressors according to each sentence embedding method (averaging word vectors, projected averaging of word vectors, and SIF) to select the best pool of regressors and the best representation of sentence embeddings, using the same evaluation metric settings, as shown in Fig. 10. According to Fig. 11, the regressors AdaBoost, Gradient Boosting, and Ridge show the best results; hence, we use them to identify the best sentence embedding representations. We conclude that the best sentence embedding representation is the SIF sentence embeddings.

    5.4 AraBERT as a Features-Extracting Model Experiment

In this experiment, we aim to filter the best regressors according to the embedding models (AraBERT v0.1, AraBERT v1, AraBERT v0.2, AraBERT v2, and mBERT) to select the best pool of regressors and the best of these embedding models, with the same evaluation metric settings as before, as shown in Fig. 12. According to Fig. 12, the four regressors AdaBoost, Gradient Boosting, Elastic Net, and Lars show the best results. Thus, we use them to identify the best embedding models. As shown in Fig. 13, we conclude that the best embedding models are the AraBERT v2-Large and AraBERT v0.2-Large embedding models.

Figure 10: Results of different regressors on the sentence embedding representations according to (A) RMSE evaluation metric, (B) MAE evaluation metric, and (C) R² evaluation metric

Figure 11: Results of the best regressors on the sentence embedding representations according to (A) RMSE evaluation metric, (B) MAE evaluation metric, and (C) R² evaluation metric

Figure 12: Results of different regressors on the five types of features according to (A) RMSE evaluation metric, (B) MAE evaluation metric, and (C) R² evaluation metric

Figure 13: Results of the best regressors on the five types of features according to (A) RMSE evaluation metric, (B) R² evaluation metric, and (C) MAE evaluation metric

    6 Discussion of Results and Implications

In this paper, we used the AraBERT model in two different variants to estimate the similarity scores between text units. All the previous experiments demonstrate that the Gradient Boosting and AdaBoost regressors give the best results with the Long Common Substring, Long Common Subsequence, SIF, AraBERT v0.2, and AraBERT v2 text embeddings in terms of RMSE, MAE, and R². We compare the findings of the MLP model in terms of MAE and MAPE with the Gradient Boosting and AdaBoost regressors on these embeddings, as shown in Tabs. 6 and 7. In both tables, the bolded values in each row represent the best (smallest) values according to the MAE and MAPE metrics, on the condition that the corresponding R² value is positive; consequently, some rows in Tab. 7 contain two bolded values. For example, the best MAPE value for SIF is 21.1508, but its corresponding R² value is -0.0065, so the condition is not met and 21.7922 is chosen instead. The same is true for the BOW features. According to MAE and MAPE, the BOW features are eliminated because the best regressor for them varies. From both tables, we can determine the best regressor for each embedding model as follows: AraBERT v0.2-Large with AdaBoost, Long Common Substring with AdaBoost, SIF with AdaBoost, Long Common Subsequence with MLP, and AraBERT v2-Large with Gradient Boosting.

In the final experiment, we fine-tune the parameters of AraBERT v2 on the used dataset to estimate the relevancy scores between text units. The comparison between the previous candidate models from Tab. 7 and the fine-tuned AraBERT v2 is illustrated in Tab. 8 in terms of the MAPE metric. Finally, we conclude that AraBERT v0.2-Large as a feature extractor model with AdaBoost has the highest value in terms of R², and the variance in the MAPE values between it and the others is minor. In addition, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of R².

Table 6: Comparison between the MLP, Gradient Boosting, and AdaBoost regressors according to the MAE evaluation metric

Table 7: Comparison between the MLP, Gradient Boosting, and AdaBoost regressors according to the MAPE evaluation metric

    Table 8: Best regressor for each embedding model according to our experiments

The findings of this paper can be used as a first step in a variety of NLP tasks, such as text ranking, question answering systems, and essay grading. Additionally, we used the multithreading concurrency concept to reduce processing execution time and increase CPU utilization as much as possible. Based on the findings, we believe that it is a better method for preprocessing text pairs than sequential processing and that it can be applied to other datasets and language settings.

    7 Conclusions and Future Work

In this paper, we addressed the textual similarity task, which is of paramount importance for multiple topics in NLP, such as text ranking, essay grading, question answering systems, and text classification. We used the multi-task learning approach to train an algorithm to learn the embeddings from our dataset, estimate the textual similarity scores between text units, and use them later in multiple tasks. Our system is divided into two different variants. In the first, we used multiple text vectorization schemes, such as word counts, TFIDF, and POS weighting as statistical-based approaches, and the FastText and Aravec pre-trained models as prediction-based approaches, besides AraBERT as a feature extractor model, to obtain text embeddings. These embeddings were then fed to various regressors to estimate the relevancy scores between text units. In the second variant of the system, we exploited the AraBERT model as a pre-trained model and fine-tuned its parameters for the task of measuring textual similarity. We conducted several experiments on the SemEval2017-task3-subtask-D dataset and showed that using AraBERT v0.2-Large as a feature extractor model with AdaBoost gives the highest value in terms of R²; in addition, the variance in the MAPE values between it and the other models is minor. Moreover, we noticed that AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of R². As for future work, we intend to use the obtained similarity scores in other NLP tasks and to use the AraGPT or AraELECTRA models to obtain different embeddings.

    Acknowledgement:This paper and the research behind it would not have been possible without the exceptional support of my God, my supervisors, my family, and my institution and colleagues.

    Funding Statement:The authors received no specific funding for this study.

    Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
