• <tr id="yyy80"></tr>
  • <sup id="yyy80"></sup>
  • <tfoot id="yyy80"><noscript id="yyy80"></noscript></tfoot>
  • 99热精品在线国产_美女午夜性视频免费_国产精品国产高清国产av_av欧美777_自拍偷自拍亚洲精品老妇_亚洲熟女精品中文字幕_www日本黄色视频网_国产精品野战在线观看 ?

    Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning

    2022-07-08 01:38:44AhmadHusseinAbabneh

    Ahmad Hussein Ababneh

    Abstract—Training and testing different models in the field of text classification mainly depend on the pre-classified text document datasets.Recently, seven datasets have emerged for Arabic text classification, including Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004.This study investigates which of these datasets can provide significant training and fair evaluation for text classification(TC).In this investigation, well-known and accurate learning models are used, including naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machines (SVM), and logistic regression (LR) models.We present relevance and time measures of training the models with these datasets to enable Arabic language researchers to select the appropriate dataset to use based on a solid basis of comparison.The performances of the five learning models across the seven datasets are measured and compared with the performances of the same models trained on a well-known English language dataset.The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time,with the accuracy of 82%.

    1.Introduction

    Text classification (TC) is defined as a text mining process that aims to specify the class or category of a given textual document.The output of such a process can be viewed as a set of classes; each of them contains a set of text documents that represents a specific topic or kind.Single-label TC provides a single label for each document, whereas multi-label TC may generate multiple labels for each document[1].

    The massive amount of texts found on the web and the amount of texts that is being loaded every second necessitate a classification model that can classify each document into its appropriate category, facilitating tasks in other fields, such as natural language processing and information retrieval tasks[2].The importance of TC in other text mining fields has been deeply emphasized by Elnagaret al.[1],[3].

    The training and evaluation of newly developed models in the field of TC mainly depend on the existence of a pre-classified dataset or corpus of text documents[4].A large number of datasets based on the English language have been evaluated for developing TC models using various statistical learning models.In [5]Zhanget al.listed these datasets with their respective statistics, including the number of text documents contained in training and testing sets, the number of classes, and accuracy obtained using several supervised and unsupervised learning models on these datasets.The construction of datasets for the TC research purpose for the Arabic language, which is at the core of this research, is still in its early stages.However,promising research in this field has recently emerged, creating new datasets, such as Single-Label Arabic News Articles Dataset (SANAD), NADiA, and KALIMAT.

    In the field of TC, selecting appropriate datasets is typically a major challenge; thus, researchers should have clear understanding of the data on which a model will be trained and tested to obtain unbiased and significant results[5].It is critical to evaluate the existing datasets and assess whether these datasets were prepared according to a scientific approach.Well-prepared datasets imply consistent judgment across different TC models.

    From our viewpoint, two important factors should be considered when assessing the effectiveness and usefulness of a dataset:

    1) Experiments should be conducted on the dataset using successful machine learning (ML) models in the field of TC.In general, supervised TC models have achieved significant results and have been used in this field since 1968[6].Support vector machines (SVMs), logistic regression (LR),K-nearest neighbor (KNN), etc.have been used intensely in this field.A lot of studies, with various ML models, achieved consistent results[7]-[12].Thus, a new dataset, with its training and testing set, should produce comparable results if it is processed using such pre-trained ML models.

    2) The dataset should be compared with the existing datasets.This comparison includes the application of uniform TC models with uniform implementation constraints.In addition, it includes standard accuracy measures that have been used to evaluate TC models, such as the precision, recall,f1-score, macro_avg,and weighted_avg.Furthermore, a heat map could be constructed for each run to provide a clear picture of the behavior of classifiers over each dataset.

    In this paper, we focus on assessing the quality of Arabic language datasets that have been constructed for training and evaluating TC models developed especially for single-label Arabic TC.The Arabic language is imperative because it is used for more than 450 million people and the religious language for all Muslims worldwide[13].Furthermore, the Arabic language has unique features that make the text processing of the language such a challenging task.These features include 1) the inflectional and derivational nature of the language, 2) the lack of capitalization and short vowels, 3) the existence of different variations of the language in use—the modern, classical, and colloquial, and 4) the permission of the construction of a sentence with an implicit subject[14].

    Several Arabic language datasets have been created for training the newly developed TC models.These datasets include SANAD, Khaleej, Arabiya, Akhbarona[15], KALIMAT, a multipurpose Arabic corpus[16],Waten2004[17], and Khaleej2004[17].There are other datasets, but these datasets do not contain a reasonable number of text documents.This study presents a comprehensive guide for researchers in the field of Arabic TC on the best and reliable datasets to use for model training and evaluation.This research aims to investigate existing Arabic datasets that can provide significant model training and fair evaluation of the models for TC.To achieve this aim, several classifiers have been developed and implemented.These classifiers are based on well-known and tested supervised TC methods.These classifiers are tested on the previously mentioned Arabic datasets.The analysis also includes a comparison with an English dataset that has been used in several research studies[18]-[22].

    After introducing the main problem and explaining our aim in Introduction, the remaining sections of the paper are structured as follows.Section 2 describes the learning models used in our experiment.Section 3 illustrates a general framework and highlights steps performed before, during, and after training the models.Section 4 evaluates the models’ training relevance results and highlights the main findings.Finally, Section 5 draws the main conclusion.

    2.Text Classification Tasks

    Both the supervised and unsupervised ML approaches have been examined in the field of TC[6],[23],[24].The difference between supervised and unsupervised learning lies in labeled data.Supervised learning models use labeled data that represent the models’ knowledge repository, and then use this knowledge to make future predictions.Simply, we teach a model with a sufficient amount of data and then make the model for future predictions.For unsupervised learning models, data are unlabeled, and the models are not aware of any information about the classes or categories in the data; these models attempt to find meaningful patterns in the data[24].

    In this paper, learning models in our experiments include a multinomial naive Bayes (NB) classifier, an SVM classifier, a random forest (RF) classifier, a KNN classifier, and an LR classifier.These classifiers adopted the supervised learning approach, and we have explained the reason for this choice in Introduction.The following subsections explain the five models.

    2.1.NB Model

    The NB classifier is one of the simplest classification learning models but has achieved reasonable accuracy[25],[26].Experiments have shown that it is comparable to advanced models, such as SVM models[7].Furthermore, it is considered an efficient learning method[27], and, as Jianget al.mentioned in [28], NB is among the top ten data mining algorithms due to its efficiency and simplicity.

    NB is a family of models.They use the NB assumption, given pairs of datais a point in ad-dimensional real space,andyi∈Y, whereY={1,2,…,m} is a finite set of classifications.A probabilistic model or a joint distribution betweenxandycan be built under the following assumption:

    Assume a family of distributionsPθf(wàn)orx(i)∈Rdandyi∈Y, such that

    In TC,xandycan be expressed as documents and their classes or words.x-values represent features, and these features may be represented by numerical values, such as frequency.Regardless of the numerical representation of features, the fundamental assumption in NB is the conditional independence[27]:

    If we have anotherx∈Rd, we need to predicty.We need to estimateθf(wàn)romD(e.g., using the maximum likelihood); then, we could computeyfrom the equation:

    From the NB rule:

    The denominator does not depend ony; we will omit it, and if we return to the conditional assumption, we can rewrite the previous equation as follows:

    2.2.RF Model

    RF is identified as one of the most popular and powerful ML algorithms that can perform both classification and regression tasks[29],[30].It is an ensemble ML approach, with a divide-and-conquer approach that combines multiple learning models to form one model to improve the performance[1].Thus, as the name suggests, the model creates several decision trees as a boost of a single tree in a cut model.The model builds up these multiple decision trees using an algorithm, such as the information gain or the Gini index approach[31].In general, the more trees in the forest, the more robust the classification, resulting in higher accuracy[32].

    In RF, to classify a new object based on attributes or features, each tree provides its prediction, and the model chooses the classification that has the most votes (the majority voting) of the trees in the forest.These majority votes then help rectify the high variance which means that a model makes a poor prediction when the training dataset is changed and generally appears in a single decision tree model[31].Thus, RF has a low bias, because it is based on decision trees, and has low variance.It uses the bagging technique (the bagging technique is an ensemble ML approach), which combines multiple decision trees to form a single model.The low bias and variance reduce the overfitting problem, which happens when a trained model performs well with training data but provides inaccurate predictions with the testing data[33].

    Complete mathematical analysis of RF can be found in [32].Assume the number of cases in a training set isN(or the number of items in the training data).Then, samples fromNcases will be taken at random but with replacement.If there areMinput variables or features, a numberm<Mis specified such that at each node,mvariables are selected at random out ofM.The best split ofmis used to split the tree node.The value ofmis kept constant, while the forest is grown.Each tree is grown to the largest extent possible, and there is no pruning.We can predict new data by aggregating the prediction ofktrees[30].

    D={x1,x2,…,xn},xi∈Rdyi∈YY={1,2,…,m}

    Given a training dataset and specified classes , where ,selectkrandom samples with replacement fromD, and build or fitkdecision trees, as follows:

    Select random samples, with replacement, fromDandY; name them asDiYi.

    FithionDiYi, wherehiis a decision tree learner that was constructed from theDirandom sample.

    After training, classifying newx′into the appropriate class (or merely obtaining thevalue) can be made by taking the majority vote from all individual decision trees onx′:

    Besides, the low bias and variance advantage of the RF classifier implies that the classifier takes less time for training and testing than other supervised learning models[30],[31].This is rational, because RF belongs to the low bias family, which reduces the number of assumptions made by RF to fit a model, and it needs a maximum of two tuning parameters[32].

    2.3.KNN Model

    KNN is a powerful classification model in the multidimensional classification space that provides competitive accuracy results[34]-[38].The estimation of the classification assumes that neighbors always share common features and, consequently, common classes.

    KNN assumes that in thed-dimensional space, the closer the two objects, the more likely they belong to the same class.The closeness could be depicted as the distance between the objects or simply between points.The distance can be computed using the Euclidean, Hamming, Manhattan, or Minkowski methods.The closeness could also be seen as the similarity between the objects, and this similarity can be computed using one of the similarity computation methods, such as the cosine similarity[38].

    KNN is a classical ML algorithm that has theoretical guarantees: Asnor the size of dataDbecomes huge,there is some theoretical guarantee about the performance of KNN[36].

    Given a training datasetD={(x1,y1), (x2,y2),…,(xn,yn)},xi∈Rdand a finite set of classesyi∈Y,whereY={1, 2,…,m}, there is some distance metric, such as the Euclidean distance, which measures the distance between two points in thed-dimensional space such that

    wherexi,xj∈Rd,xi=(xi1,xi2,…,xid), andxj=(xj1,xj2,…,xjd).In KNN, if we consider the distance between the newxcalledx′and its closest points (point) as the similarity measure, the class ofx′will be

    If we consider the similarity betweenxiandxj, wherexiis a vector of features, such that=(xi1,xi2,…,xih)and=(xj1,xj2,…,xjl),xi,xj∈Rd, is[2]

    we can generalize the similarity principle and reform (3) to be

    KNN also has a probabilistic formulation[34]as follows.A random variabley:

    whereP(y∣x,D) is the fraction of pointsxiinNk(x), andNk(x) is thek-nearest points toxi.

    The predicted class ofx′is

    2.4.SVM Model

    SVM is a learning model that separates data into classes by creating balanced hyperplanes in a highdimensional space.The idea is to find hyperplanes that maximize the distances between the edges of the classes.SVM gradually seeks to find the maximum empty area between any pair of classes, which is represented as the margin that spans from the central line that separates two classes, to the boundaries of the classes[12].In a multi-classification model, SVM creates a set of hyperplanes and then chooses a subset of them that maximizes the separation margins[26].

    SVM is considered a robust and accurate classification model.This judgment is based on a large number of research studies in different fields, which used SVM and achieved significant results[39]-[46].For the Arabic language, Qadiet al.[11]experimented with 10 supervised learning models, wherein using the SVM model was able to obtain the most significant performance result.A complete mathematical formulation of SVM can be found in [47], but a brief and summarized explanation is essential.

    To illustrate the optimization behind the SVM model, we start from a simple linear model for the binary classification.Given thek-dimensional vectorsx(1),x(2), …,x(n), the training set, and input vectors=(x1,x2,…,xd), wheredis the number of features,xi∈Rd, andyis the classes’ labels,y∈YandY∈{1, -1}, the classification function is

    SVM depicts the data as instances that are embedded in ad-dimensional space and finds a weighted vector=(w1,w2,…,wd),wi∈Rd, such that the linew·x=θ, where operator “·” means the dot product,separates the boundary of two classes (as shown inFig.1).

    Representing a text document in ad-dimensional space means that each distinct word (feature)establishes a dimension, and the document is a point in this space; the classification is performed by creating a separation line, such that all points that fall on one side of the line belong to class 1, while the points which fall on the second side belong to class 2, as shown inFig.2.

    Fig.1.Binary linear classification.

    Fig.2.Text documents on the feature space.

    Now, to define the SVM classifier, we can update the idea of the linear classification.FromD,we have pre-labeled examples (training dataset),,y∈Y, andY∈{1,2,…,m}.Using the same technique in the linear classification, the dot product of the weighted vectorwand the vectorxis.

    In linear classification, multiple lines can exist and separate their classes, but SVM seeks to find the best line that divides data points.To define the best line, SVM considers the distance from the separating line that tells us how confident a prediction is[48].Thus, SVM searches for the line that has the largest margins.

    Define the margin 2γas the length of the empty area between the boundaries of the classes, and the best line is located at the middle of this area, as shown inFig.3.

    Fig.3.Largest margins in SVM.

    Then, the distance betweenAandLis

    Fig.4.Computation of γ.

    AndMis onL, so the=-b, then

    This means that the distance or the margin betweenAandLis the vectorw, which is perpendicular to lineL, multiplying by the coordinates ofAplus the offsetb,γ=(w·A+b)y.So, we want to findwsuch that the smallest marginis as large as possible[47]:

    and this leads to the initial optimization

    such that

    In the previous definition ofγ, the longer thew, the bigger the margin will be; thus, we need to normalizewby its length:

    where

    As shown inFig.5, lineLis defined by a few points that are located on the boundary (not by all points in the space).This means that we can ignore most of the points and focus on points that defined the line (these points are called the support vectors).Let us also require support vectorsxito be on the plane defined by

    Fig.5.Relationship between γ and the support vectors.

    To findγ, we need to find the relation between the support vectors: Assume thatx1is the support vector on one side of lineLandx2is the vector point on the other side ofL,

    We can substitute the value of x1into the equationw·x1+b=1, and we obtain

    Now, the equation w ·x2+b=-1, because it is on the other side ofL:

    since

    This implies that maximizingγcan be expressed as follows:

    which is equivalent to

    Therefore, (7) can be rewritten as

    such that

    So far, we implicitly assume that our data are linearly separable, and it is always possible to find the line that entirely separates the classes.However, most real datasets are noisy, which means that no such separable hyperplanes exist[49].Dealing with such data, where finding out a robust separator seems impossible and this almost happens in every dataset, implies to change our formulation and introduce the penalty parameter:

    whereCis a control variable that can be set using cross-validation.

    The previous equation means that the line should have a maximum margin and makes a small number of mistakes.However, the mistaken points do not have equal severity[49], and we can penalize them using the slack variableξ, as shown inFig.6.

    Fig.6.Separated line with slack variable ξ.

    with the constraint

    A new optimization problem appears in which our goal is to find the largest margin and the smallest sum of the penaltyξi:

    2.5.LR Model

    LR is one of the most widely used classification models[50],[51].Even the name contains the word“regression”, LR is mainly used for classification problems, because it predicts categorical values, not continuous ones.

    In LR, we model our data with a linear functionf(w,b)=wx+b, wherewis the weight andbis the bias,and this produces continuous values.In LR, we seek to covert continuous values to the probability.LR models the probability using the sigmoid function[52]:

    This prediction outputs a probability value between 0 and 1.

    In the approximation functionhθ(x), we have to define thew(weight) andb(bias) using a specific cost function, such as the cross-entropy[52]:

    LR optimizes the cost overwandbusing the gradient descent optimization strategy.It starts at some point and iteratively updates the parameters (wandb) through the calculation of the derivative and then follows the direction of the derivative until reaching the global minimum.In each iteration, the parameters are updated in small steps called the learning rateη.The gradient descent then updateswandbusing the following rules:

    Furthermore, the derivatives can be performed using the following equations[51]:

    3.Structural Framework

    This study’s goal is to measure the relevance of existing Arabic language datasets in the context of TC tasks using well-known and widely used supervised learning models.Therefore, we construct a complete experiment that includes the data collection of the existing datasets, text preprocessing, model implementation, and model performance and analysis.The framework established in this experiment is depicted in Fig.7.

    3.1.Experiment Datasets

    Fig.7shows all the steps performed during the implementation of this experiment.Note that the starting point is to collect datasets used and proposed in the field of TC for the Arabic language.As we stated in Introduction, the selection is based on the number of documents included in the datasets and on the number of classes into which the documents are classified.Therefore, many datasets have been excluded.Datasets selected in the experiment and their features are presented inTable 1.

    Table 1: Features of the Arabic language TC datasets

    Fig.7.Experiment framework.

    InTable 1,seven datasets have been chosen and split into training and testing sets.These datasets have been recently proposed in TC as datasets for training and evaluating newly developed models for the Arabic language[3].Table 1shows that the number of documents in both training and testing sets in these datasets varies, ranging from 5690 documents in Khaleej2004 to 110900 in SANAD.

    The SANAD dataset is the largest because it contains three merged datasets: Khaleej, Akhbarona, and Arabiya.Therefore, we process SANAD as one dataset, and as three datasets by splitting it into three parts.Table 1shows the connection between SANAD and the three datasets, and we refer to them as SANADKhaleej, SANAD-Akhbarona, and SANAD-Arabiya inTable 1, but throughout the paper, they will be referred to as Khaleej, Akhbarona, and Arabiya.We process SANAD in this way to determine whether we can use one of the three parts individually as training and testing sets, especially when they contain sufficient documents.These individual examinations of Khaleej, Akhbarona, and Arabiya have an efficiency effect, since there are significant differences in the size of SANAD as a whole and the sizes of its parts.

    The other three datasets are KALIMAT, Waten2004, and Khaleej2004[15].These datasets are constructed in folders; each folder represents a class (culture, sport, religion, etc.).KALIMAT, Waten2004, and Khaleej2004 datasets are originally not split into training and testing sets, but for consistency with the SANAD’s datasets, we split them according to the splitting ratio used in SANAD.

    For comparison, we use another dataset called “20_newsgroup dataset” that is built in English and has been used to the experiment with new classification models[18]-[22].Another reason that motivates us to use this dataset is the number of documents it contains (18846 documents), as it contains a moderate number of documents (most English language datasets have a huge number of documents, see [5]), it is comparable to the seven Arabic datasets shown inTable 1.

    3.2.Dataset Preprocessing

    The general preprocessing operations have been performed, which include the following:

    1) Tokenization: This operation tokenizes a text and represents the text as a set of words or tokens.

    2) Stopwords removal: This operation includes the removal of any kind of speech that is not a noun or verb.The list of distinct stopwords for the Arabic language contains more than 400 terms[13].

    3) Stemming: The process of removing prefixes and suffixes of a token.The stemming operation is essential and has the positive efficiency and performance effect[14].

    3.3.Learning Model Training

    After preprocessing the text documents, the documents are then split into training and testing documents,as shown inTable 1andFig.7.Learning models are trained in the supervised learning context using the dataset to generate the required learning models.Five learning models are developed based on each training dataset.For example, the experiments developed five learning models for SANAD using five learning approaches described in Section 2.This allows us to measure the performance of each learning approach on each dataset and provides a way to measure how the learning approach suited each dataset.So, the total number of models that are learned from the eight datasets (including the English language one) is 40.

    Python 3 is used to develop these models, and we use a machine with the 8-GB random access memory(RAM) and Intel(R) core? i5-7200U CPU@2.50 GHz 2.71 GHz processor.This piece of information is important, so that readers can relate to the execution time analysis that appears in the evaluation section.

    3.4.Outputs of Learning Models

    The classification process is initiated using the 40 training models, and classification results are collected.Each document will automatically classified into a certain category based on the learning model.The outputs of the learning models are a set of categories.The grouped documents share certain criteria based on the learning method used to perform the classification process.Therefore, for the 8 datasets, we have 40 classification output sets (because each dataset was used to train 5 classification methods).The automatically generated output takes the following quadruple shape:

    The quadruple (SVM, Arabiya, 103, Politics) means that the SVM model classifies document 103 in the Arabiya dataset into the Politics category.

    This representation helps grasp the automatically generated category of the document using different learning models.For example, the same document (103 in Arabiya) has been classified by KNN into the Finance category, and this fact can be easily discovered by processing the quadruple (KNN, Arabiya, 103,Finance), which means that the KNN classifier classifies document 103 in the Arabiya dataset into the Finance category.Since document 103 talks about a loan from the International Monetary Fund to Egypt, the two classification results seem both reasonable.

    The output sets of all experiments are large; they equal the number of documents in the testing sets multiplied by 5.However, the important thing is to measure the classification accuracy and relevancy of each learning method on each dataset.This means that the automatically generated classifications should match the classifications obtained using the testing sets in a significant ratio.The well-known accuracy and relevancy measures in the field of TC are described in the next subsection.

    3.5.Accuracy and Relevance Measures

    The testing phase is established using the testing set of each dataset.The testing set contains preclassified documents (actual classes), and then the learned model predicts a class for each document.Important relevance measures are used to measure the match performance between the pre-classified classes and predicted classes.The performance metrics used in this study include the precision, recall,f1-score, macro_avg, weighted_avg, accuracy, and confusion matrix.For clarity, we assume the following:

    1) True positive: The number of documents that are classified into classciand should be inci.

    2) False positive: The number of documents that are classified into classciand should not be inci.

    3) True negative: The number of documents that are not classified into classciand should not be inci.

    4) False negative: The number of documents that are not classified into classciand should be inci.

    5) Support: The actual number of documents in each class (taken from the testing dataset).

    The relevance measures include

    The confusion matrix has also been used in the evaluation, which tells us if the classification model did right or wrong[38].The confusion matrix has been depicted in a heat map form to ensure clarity for readers.The confusion matrix takes the following form:

    wherexdenotes the number of documents classified into Class 1 and predicted by the learning model as Class 1, whereasyrepresents the number of documents classified into Class 1 and predicted by the learning model as Class 2.Thus, the main diagonal represents the learning model’s correct classification (the actual and the predicated match).

    4.Evaluation and Discussion

    The relevance measurement outcomes for each learning model on the eight datasets are collected for further analysis and discussion.The aim is not to choose a particular dataset or model; the aim is to show,using empirical experiments, the features of each model over the Arabic language datasets shown inTable 1.

    4.1.Accuracy Measure

    The final accuracy results are displayed in Fig.8.Note that the accuracy of the LR and SVM models is higher than that of the other models on all datasets, and it shows that all learning models, except the RF model, obtain the highest accuracy on the English language dataset.The accuracy of all models on the Khaleej and Arabiya datasets is convergent and is higher than the accuracy on the other Arabic language datasets.In this study, all learning models obtain the lowest accuracy on the Waten2004 dataset.

    4.2.Execution Time

    The accuracy of the five classifiers is supplemented with the time (in seconds) taken to train and develop each model.Table 2shows the time in seconds required to develop each model over the eight datasets.The horizontal view represents the time according to the datasets,whereas the vertical view represents the time according to the learning models.Table 2shows that both the LR and SVM models consume a considerable amount of time to be able to achieve the final accuracy, as shown inFig.8, whereas the KNN model consumes the lowest time.

    Fig.8.Accuracy results of the five learning models over the eight datasets.

    Table 2: Time in seconds required to build each model using the eight datasets

    The vertical view of Khaleej and English corpus columns shows the convergence between the behavior of the five models on Khaleej and the English language corpus, respectively.And note that the time consumed by the LR and SVM models on the SANAD dataset is 439.73 s and 1942.84 s, respectively.When we compare this to the accuracy of these models over the SANAD dataset inFig.8, we see that the increase in the time penalty comes with no improvement in the accuracy level.

    4.3.Relevance Measures

    The time execution provides an idea of how long it takes to run each model over each dataset with the hardware specifications mentioned in subsection 3.3; it will not be a critical factor in the experiment, because the datasets have different sizes.Therefore, we explore the relevance measures, because they are important to determine which datasets contain sufficient semantic information that can yield a reasonable classification result.

    4.3.1.Macro Average and Weighted Average

    The accuracy curve inFig.8is supplemented byFigs.9to14, which also measure the relevance of each model over the eight datasets.InFigs.9to 14, the datasets are placed on thex-axis, while they-axis represents the relevance values (precision, recall, etc.).The presentation in this way allowed us to see the behavior of each model on the eight datasets separately as well as the behavior of the models in one picture.The salient points in these figures can be summarized as follows:

    1) InFigs.9to14, two peaks appear; one at Arabiya, which spans to Khaleej, and one at the English language dataset.This note provides a positive indication about the possibility of using Khaleej and Arabiya as training datasets for TC tasks.For all learning models, these two datasets offer a productive environment that leads to significant relevant results.

    Fig.9.The macro_avg of precision values of five learning models over eight datasets.

    Fig.10.The macro_avg of the recall values of five learning models over eight datasets.

    Fig.11.The macro_avg of the f1 score values of five learning models over eight datasets.

    Fig.12.The weighted_avg of precision values of five learning models over eight datasets.

    Fig.13.The weighted_avg of the recall values of five learning models over eight datasets.

    Fig.14.The weighted_avg of the f1 score values of five learning models over eight datasets.

    2) InFigs.9to14, the curves are noted to decrease toward the SANAD dataset, which means that the learning models obtained smaller relevance values, except for the NB model, as shown inFig.9; however,the NB model shows unstable behaviors, as shown inFigs.9to14.This comment is valuable because it is mentioned before that SANAD is a collection of three datasets, that is, Arabiya, Khaleej, and Akhbarona, and when we process these three datasets separately using the learning models, we have obtained significant results.The question is why the fusion of these three datasets reduces relevance measures, which seems illogical because ML models should benefit from an increase in the dataset size.The answer to this question relies on the Arabic language.As we mentioned in Introduction, Arabic is a semantically rich language with a massive number of distinct terms.Although all Arab countries speak the same language, Arabic, each has its set of terms that are commonly used in its media.This variation in terms used from one country to the next impacts the statistics used by the learning model.

    3) InFigs.9to14, the models show the lowest relevance on the Waten2004 and KALIMAT datasets except for the NB model; the reason for this comes from the categories chosen.Both of them contain International and Local categories, which contain International or Local news data.Both the Local and International categories contained documents that could be predicted or classified into another category.For example, if the document processed refers to an international sports event, there is a good chance that the learning model will classify this document into the Sports category rather than the International category.

    4) According to relevance curves inFigs.9to14:

    ? The KNN and RF models show a similar trend, and their relevance scores are confined between 50%and 60%.

    ? The SVM and LR models show a convergent trend, and their relevance scores are confined between 70% and 85%.

    ? The SVM model obtains the most significant scores, which is consistent with the results shown inFig.8.

    ? The NB model shows significant results on Akhbarona, Arabiya, Khaleej, SANAD, and the English language corpus.

    4.3.2.Confusion Matrix and Heat Map

    The confusion matrix provides a clear picture of how the learning model predicted or classified the documents.Table 3represents the confusion matrix of the SVM model on Waten2004.The number of categories is six (Culture, Economy, International, Local, Religion, and Sports).The bold and underlined values are the numbers of documents that the SVM model classified correctly based on the actual classification.

    Table 3: Confusion matrix of SVM on Waten2004 dataset

    Confusion matrix interpretation is crucial because it provides information about the relationship between the categories.For example, 307 documents of 656 are classified into the Culture category, and they are preclassified in the testing set of Waten2004 in the Culture category, whereas 349 (33 + 30 + 80 + 133 + 73)documents that should be in the Culture category are incorrectly classified into other categories.Note that 133 of the documents that should be classified into the Culture category are classified into the Religion category, which means that the SVM model found relationships between the documents that belong to both categories.The confusion matrix can also reveal a problem in the classification process.The International category has 149 (33 + 27 + 37 + 21 + 31) misclassified documents, whereas the Local category has 296 (83+ 90 + 17 + 49 + 57) misclassified documents.We clarified the classification problem of International and Local categories when we analyzedFigs.9to 14, and the confusion matrix asserts our clarification.

    As we find that the SVM model has obtained the most significant relevance results, we present the confusion matrices of the SVM model on the eight datasets.The confusion matrices are presented in the heat map form to facilitate the trace of the performance of the classification process, as shown inFigs.15to21.

    Fig.15.Confusion matrix of SVM experiment on Akhbarona dataset.

    Fig.16.Confusion matrix of SVM experiment on Waten2004 dataset.

    Fig.17.Confusion matrix of SVM experiment on Khaleej2004 dataset.

    Fig.18.Confusion matrix of SVM experiment on Arabiya dataset.

    Fig.19.Confusion matrix of SVM experiment on KALIMAT dataset.

    Fig.20.Confusion matrix of SVM experiment on SANAD dataset.

    InFigs.15to22, the color brightness of a cell represents the number of documents in that cell; the brighter the color, the larger the number of documents.The scale on the right of each figure shows where the color changes.Based on this fact, the classification using Arabiya (Fig.18) seems to be the most significant followed by SANAD (Fig.20) and Khaleej (Fig.21).In these three figures, the diagonal appears, which gives the positive indication of the classification quality.Moreover, the trend of the classification inFigs.18,20, and 21 is similar to the trend inFig.22, which represents the classification using the 20_newgroup dataset.Based on classifications represented inFigs.15to 17, and 19, we can see that brighten cells are not located on the diagonal parts of the maps, which could mean that the number of predicted documents that does not match the number of the actual classification is large.

    Fig.21.Confusion matrix of SVM experiment on the Khaleej dataset.

    Fig.22.Confusion matrix of SVM experiment on the 20_newsgroup dataset.

    4.4.Main Findings after Analysis

    The relevance measures and time constraints should be used to determine which of the seven datasets should be used to train a classification model.In this paper, we aim to provide researchers with the required relevance measures upon which researchers can give their decision.Accordingly, we highlight the following findings from our experiment:

    1) For the five learning models, the use of KALIMAT, Waten2004, and Khaleej2004 as training datasets has produced insignificant relevance results (e.g., for the SVM classifier, the accuracy rates using the datasets are 67%, 66%, and 70%, respectively (Fig.8)).Thus, there is no need to consider the time penalty.

    2) The use of SANAD as a training dataset provides reasonable classification (Fig.20), but the time penalty is high (Table 2).The time required to train the SVM model with the SANAD dataset is ten times that required to train with the Khaleej dataset and 42 times that required to train with the Arabiya dataset.

    3) The use of Arabiya and Khaleej, which are complete datasets that are fused to form SANAD, as training datasets for the five learning models, provides significant relevance results (Figs.8to14) with an acceptable time penalty (Table 2).The advantage of the Arabiya dataset is that it provides the most significant classification, and this is shown in the confusion matrix representation.However, the disadvantage is the fewer number of categories; Arabiya has only five categories (Fig.18).On the other hand, the Khaleej dataset has more categories, but the relevance score is the second-best (Fig.21).

    4) The time required to process Akhbarona is less than that required to process SANAD (Table 2).Therefore, Akhbarona can be used instead of SANAD in training a learning model.

    5) In the seven Arabic language datasets, the number of categories is small (Figs.15to21)—the maximum is seven categories.In contrast, the English dataset 20_newgroup contains 20 categories with 7 subcategories of the Computer category (Fig.22).The categories of the seven Arabic datasets are general and each can contain several subcategories.The detailed categories of the English dataset help in increasing the accuracy of the learning process.A future challenge is to create a dataset that contains a considerable number of categories and classify the documents into these categories in a consistent way that improves the learning strategy to develop a learning model with an accurate classification.

    5.Conclusion

    In the paper, we trained five well-known and widely used learning models using seven recently published Arabic language datasets for TC tasks.The SANAD, Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004,and Khaleej2004 datasets were studied.This study presented all relevance measures that are necessary for the researchers to decide which of these datasets provides the best training and testing environment for TC of the Arabic language text.The selected learning models included the well-known and accurate learning models that have demonstrated their competence in TC.

    The decision of which dataset(s) to use as training and testing sets is based on the behaviors of these datasets during the training and evaluation of the learning models.The behavior was then measured through standard relevance and time measures obtained on 40 different learning experiments.After analyzing all experiments, we have found that the Arabiya and Khaleej datasets provided significant training, and this was reflected in their relevance measures, which reached 82%.Furthermore, we found that the learning time of the models on Arabiya and Khaleej was deemed reasonable.The SANAD and Akhbarona datasets showed a convergent relevance trend, with relevance scores ranging from 56% to 68%.However, the execution time to train the models using the SANAD dataset was insignificant.KALIMAT, Waten2004, and Khaleej2004 showed a moderate performance for both relevance and training time.

    One shortcoming was observed in this study: The number of categories in the seven datasets is small compared with that in the existing English language dataset.The maximum number of categories in the Arabic datasets is seven, and all categories depict general topics that could be subdivided into subcategories.

    Disclosures

    The authors declare no conflicts of interest.

    在线观看国产h片| 男女边吃奶边做爰视频| 久久鲁丝午夜福利片| 丝袜在线中文字幕| 欧美另类一区| 日本黄大片高清| 亚洲少妇的诱惑av| av不卡在线播放| 久久久精品94久久精品| 在线观看免费日韩欧美大片| 久久婷婷青草| 韩国av在线不卡| 国产精品免费大片| 日韩av不卡免费在线播放| 精品人妻在线不人妻| av女优亚洲男人天堂| 高清av免费在线| 国产一区二区三区av在线| av有码第一页| 天天影视国产精品| 99久久精品国产国产毛片| 中文字幕av电影在线播放| 成人国产麻豆网| 日本免费在线观看一区| 尾随美女入室| 九色亚洲精品在线播放| 亚洲欧洲日产国产| 亚洲激情五月婷婷啪啪| 日本免费在线观看一区| 久久久久久久久久成人| 亚洲精品一区蜜桃| 人妻系列 视频| 一二三四中文在线观看免费高清| 日韩三级伦理在线观看| 国产极品天堂在线| 欧美+日韩+精品| 久久精品熟女亚洲av麻豆精品| 亚洲人与动物交配视频| 免费大片18禁| 国产一区亚洲一区在线观看| a级毛片在线看网站| 亚洲国产精品999| 18禁观看日本| 国产免费视频播放在线视频| 色婷婷av一区二区三区视频| 日韩人妻精品一区2区三区| 天天影视国产精品| 晚上一个人看的免费电影| 一级黄片播放器| av一本久久久久| 久久精品国产a三级三级三级| 久久青草综合色| 成人国语在线视频| 亚洲精品日韩在线中文字幕| 成人综合一区亚洲| 欧美日韩亚洲高清精品| 大香蕉97超碰在线| 亚洲国产av影院在线观看| 成人手机av| 亚洲国产精品一区二区三区在线| 亚洲久久久国产精品| 又黄又爽又刺激的免费视频.| 亚洲精品中文字幕在线视频| 亚洲色图综合在线观看| 日本wwww免费看| 最新的欧美精品一区二区| 成年美女黄网站色视频大全免费| 日韩,欧美,国产一区二区三区| 午夜免费观看性视频| 久久精品久久久久久久性| 国产精品久久久久成人av| 久久热在线av| 搡女人真爽免费视频火全软件| 美女大奶头黄色视频| 一级毛片黄色毛片免费观看视频| 亚洲精华国产精华液的使用体验| 老女人水多毛片| 妹子高潮喷水视频| 韩国av在线不卡| 亚洲 欧美一区二区三区| 一本色道久久久久久精品综合| 欧美精品一区二区免费开放| av.在线天堂| 午夜福利视频在线观看免费| 美女视频免费永久观看网站| 性色avwww在线观看| 啦啦啦在线观看免费高清www| 亚洲av国产av综合av卡| 伦理电影大哥的女人| 蜜桃在线观看..| 十八禁高潮呻吟视频| 午夜激情av网站| 亚洲婷婷狠狠爱综合网| 成人影院久久| 夫妻午夜视频| 一区二区av电影网| av.在线天堂| 亚洲在久久综合| 丁香六月天网| 99热全是精品| 狂野欧美激情性bbbbbb| 亚洲 欧美一区二区三区| 黄片播放在线免费| 亚洲精品乱久久久久久| 搡女人真爽免费视频火全软件| 汤姆久久久久久久影院中文字幕| 中文字幕人妻丝袜制服| 久久久久国产网址| 成年女人在线观看亚洲视频| 99热6这里只有精品| 免费日韩欧美在线观看| 一二三四中文在线观看免费高清| 女人精品久久久久毛片| 欧美日韩精品成人综合77777| 国产精品久久久久久久电影| 国产高清三级在线| 亚洲伊人色综图| 乱人伦中国视频| 999精品在线视频| 热re99久久精品国产66热6| a级毛色黄片| 美女国产高潮福利片在线看| 日韩成人av中文字幕在线观看| 高清av免费在线| 日本猛色少妇xxxxx猛交久久| 亚洲av福利一区| 亚洲国产欧美在线一区| 午夜激情久久久久久久| 日韩制服骚丝袜av| 精品国产乱码久久久久久小说| av播播在线观看一区| 免费高清在线观看视频在线观看| 国产女主播在线喷水免费视频网站| 一区在线观看完整版| av在线观看视频网站免费| a级毛片在线看网站| 一级,二级,三级黄色视频| 纵有疾风起免费观看全集完整版| 九九在线视频观看精品| 国产男女超爽视频在线观看| 九草在线视频观看| 久久人人97超碰香蕉20202| av不卡在线播放| 久久久久精品性色| 曰老女人黄片| 免费观看无遮挡的男女| www.熟女人妻精品国产 | 久久亚洲国产成人精品v| 午夜免费男女啪啪视频观看| 久久精品熟女亚洲av麻豆精品| 男女午夜视频在线观看 | 一边摸一边做爽爽视频免费| 亚洲国产色片| 日韩成人av中文字幕在线观看| 中国国产av一级| 久久综合国产亚洲精品| 菩萨蛮人人尽说江南好唐韦庄| 两性夫妻黄色片 | 97人妻天天添夜夜摸| 人妻人人澡人人爽人人| 一区二区三区乱码不卡18| 久久青草综合色| 成人无遮挡网站| 国产免费视频播放在线视频| 一区二区三区四区激情视频| 精品国产国语对白av| 亚洲婷婷狠狠爱综合网| 精品国产国语对白av| 丝袜美足系列| 欧美 亚洲 国产 日韩一| 91精品伊人久久大香线蕉| 亚洲欧美日韩卡通动漫| 最近中文字幕高清免费大全6| 校园人妻丝袜中文字幕| 久久99热这里只频精品6学生| 免费久久久久久久精品成人欧美视频 | 亚洲精品久久久久久婷婷小说| 1024视频免费在线观看| 91午夜精品亚洲一区二区三区| 亚洲国产精品一区二区三区在线| 寂寞人妻少妇视频99o| 国产欧美亚洲国产| 亚洲精品美女久久久久99蜜臀 | 99re6热这里在线精品视频| 成年动漫av网址| 亚洲精品一二三| kizo精华| 国产日韩欧美亚洲二区| 国产深夜福利视频在线观看| 国产爽快片一区二区三区| 老熟女久久久| 精品亚洲乱码少妇综合久久| 精品人妻一区二区三区麻豆| 久久久国产欧美日韩av| 国产色婷婷99| 午夜视频国产福利| 久久精品aⅴ一区二区三区四区 | 国产精品久久久av美女十八| av视频免费观看在线观看| 少妇高潮的动态图| 成人毛片a级毛片在线播放| 免费女性裸体啪啪无遮挡网站| 国产av码专区亚洲av| 国产成人一区二区在线| a级片在线免费高清观看视频| 久久亚洲国产成人精品v| 欧美+日韩+精品| 欧美人与善性xxx| 国产探花极品一区二区| 欧美激情国产日韩精品一区| 九草在线视频观看| 久久女婷五月综合色啪小说| 伦理电影大哥的女人| 亚洲成色77777| 国产熟女欧美一区二区| 波野结衣二区三区在线| 亚洲精品一二三| 少妇被粗大的猛进出69影院 | 亚洲婷婷狠狠爱综合网| √禁漫天堂资源中文www| 亚洲成色77777| 99re6热这里在线精品视频| 欧美成人午夜免费资源| 51国产日韩欧美| 99久久精品国产国产毛片| 一区二区三区乱码不卡18| 自拍欧美九色日韩亚洲蝌蚪91| 欧美日韩精品成人综合77777| 交换朋友夫妻互换小说| 国产av精品麻豆| 高清视频免费观看一区二区| a级片在线免费高清观看视频| 亚洲三级黄色毛片| 熟妇人妻不卡中文字幕| 激情视频va一区二区三区| 免费黄色在线免费观看| 热99国产精品久久久久久7| 精品一区二区三区视频在线| 国语对白做爰xxxⅹ性视频网站| 亚洲,一卡二卡三卡| 亚洲情色 制服丝袜| 自拍欧美九色日韩亚洲蝌蚪91| 一区在线观看完整版| 国产精品一区二区在线观看99| 免费看不卡的av| 精品一品国产午夜福利视频| 国产日韩欧美亚洲二区| 中文字幕免费在线视频6| 狂野欧美激情性bbbbbb| 一级毛片我不卡| 黄色怎么调成土黄色| 国产精品三级大全| 色吧在线观看| 99香蕉大伊视频| 亚洲av国产av综合av卡| 免费在线观看完整版高清| 免费少妇av软件| av线在线观看网站| 欧美变态另类bdsm刘玥| 你懂的网址亚洲精品在线观看| 波多野结衣一区麻豆| 男女国产视频网站| a 毛片基地| 日日啪夜夜爽| 丝瓜视频免费看黄片| 夜夜爽夜夜爽视频| 蜜桃在线观看..| 国产精品久久久久久久电影| 日本午夜av视频| 全区人妻精品视频| 午夜福利在线观看免费完整高清在| 国产男女超爽视频在线观看| 国产一区二区激情短视频 | 熟女人妻精品中文字幕| 国产黄色视频一区二区在线观看| 99热这里只有是精品在线观看| 亚洲av综合色区一区| 九九爱精品视频在线观看| 国产深夜福利视频在线观看| 十八禁高潮呻吟视频| 国产精品久久久av美女十八| av视频免费观看在线观看| kizo精华| 国产在线免费精品| 国产白丝娇喘喷水9色精品| 国产成人精品福利久久| 精品国产一区二区三区久久久樱花| 免费女性裸体啪啪无遮挡网站| 22中文网久久字幕| 中文欧美无线码| 一区在线观看完整版| 精品国产乱码久久久久久小说| 男女下面插进去视频免费观看 | 中文字幕免费在线视频6| 国产成人精品一,二区| 久久婷婷青草| 人妻一区二区av| 亚洲av电影在线进入| 寂寞人妻少妇视频99o| 超色免费av| 欧美日韩精品成人综合77777| 亚洲美女视频黄频| 亚洲一级一片aⅴ在线观看| 久久久久视频综合| 丰满乱子伦码专区| 亚洲欧美一区二区三区国产| 巨乳人妻的诱惑在线观看| 99精国产麻豆久久婷婷| 又大又黄又爽视频免费| 国产免费视频播放在线视频| 中文字幕亚洲精品专区| 久久久久精品人妻al黑| 中文字幕另类日韩欧美亚洲嫩草| 女性生殖器流出的白浆| 亚洲在久久综合| 免费观看av网站的网址| freevideosex欧美| 2022亚洲国产成人精品| av免费在线看不卡| 九色成人免费人妻av| 亚洲av.av天堂| 亚洲av.av天堂| 国产亚洲午夜精品一区二区久久| 99视频精品全部免费 在线| 肉色欧美久久久久久久蜜桃| 纵有疾风起免费观看全集完整版| 91国产中文字幕| 亚洲国产av新网站| 2018国产大陆天天弄谢| 国产高清国产精品国产三级| 高清不卡的av网站| 在线 av 中文字幕| 久久久久久人人人人人| 亚洲国产看品久久| 国产精品.久久久| 母亲3免费完整高清在线观看 | 人体艺术视频欧美日本| 91午夜精品亚洲一区二区三区| 日韩制服骚丝袜av| 制服人妻中文乱码| 国语对白做爰xxxⅹ性视频网站| 日本欧美国产在线视频| 中文字幕亚洲精品专区| 在线观看一区二区三区激情| 日韩精品免费视频一区二区三区 | 丰满迷人的少妇在线观看| 日韩免费高清中文字幕av| 国产老妇伦熟女老妇高清| 国产成人欧美| 国产精品久久久av美女十八| 人妻一区二区av| 日本黄大片高清| 欧美人与性动交α欧美精品济南到 | 亚洲综合色网址| 国产精品女同一区二区软件| 99九九在线精品视频| 2021少妇久久久久久久久久久| 97精品久久久久久久久久精品| 97精品久久久久久久久久精品| 婷婷色av中文字幕| 免费少妇av软件| www.av在线官网国产| 亚洲精品久久久久久婷婷小说| 国产国语露脸激情在线看| 大片电影免费在线观看免费| 亚洲情色 制服丝袜| 亚洲综合精品二区| 天堂俺去俺来也www色官网| 在线亚洲精品国产二区图片欧美| 久久久a久久爽久久v久久| 午夜福利乱码中文字幕| 五月开心婷婷网| 欧美精品一区二区大全| 99热国产这里只有精品6| 制服人妻中文乱码| 9色porny在线观看| 国内精品宾馆在线| 亚洲精品一区蜜桃| 在线精品无人区一区二区三| 狠狠婷婷综合久久久久久88av| 国产精品偷伦视频观看了| 国产精品免费大片| 亚洲高清免费不卡视频| 亚洲欧美一区二区三区国产| 国产精品一二三区在线看| 久久精品熟女亚洲av麻豆精品| 免费av不卡在线播放| 欧美精品人与动牲交sv欧美| 美女主播在线视频| 国产成人精品婷婷| 国产免费视频播放在线视频| 麻豆精品久久久久久蜜桃| 黄色毛片三级朝国网站| 国产精品国产三级国产专区5o| 看免费成人av毛片| 99热网站在线观看| 九色亚洲精品在线播放| 人妻一区二区av| 精品国产乱码久久久久久小说| 我要看黄色一级片免费的| 少妇的逼水好多| 69精品国产乱码久久久| 性色avwww在线观看| 国产av一区二区精品久久| av免费观看日本| 亚洲精品国产av成人精品| 欧美变态另类bdsm刘玥| 中文字幕人妻熟女乱码| 最近手机中文字幕大全| 另类亚洲欧美激情| 最黄视频免费看| www日本在线高清视频| 国产又色又爽无遮挡免| 五月伊人婷婷丁香| www日本在线高清视频| 亚洲国产精品999| 亚洲伊人久久精品综合| 9热在线视频观看99| 久久精品国产亚洲av涩爱| 99久久中文字幕三级久久日本| 亚洲人与动物交配视频| 久久鲁丝午夜福利片| 777米奇影视久久| 国产成人欧美| 亚洲国产精品一区二区三区在线| 亚洲情色 制服丝袜| 极品少妇高潮喷水抽搐| 亚洲精品视频女| 王馨瑶露胸无遮挡在线观看| 久久久精品免费免费高清| 久久久久精品久久久久真实原创| www日本在线高清视频| 99re6热这里在线精品视频| 桃花免费在线播放| 韩国av在线不卡| 国产高清不卡午夜福利| 黑人猛操日本美女一级片| 晚上一个人看的免费电影| av在线观看视频网站免费| 啦啦啦视频在线资源免费观看| 夜夜爽夜夜爽视频| 69精品国产乱码久久久| av不卡在线播放| 人人妻人人爽人人添夜夜欢视频| 色吧在线观看| 另类精品久久| 精品国产露脸久久av麻豆| 国产高清国产精品国产三级| 伊人久久国产一区二区| 亚洲精品日韩在线中文字幕| 欧美日韩国产mv在线观看视频| 18+在线观看网站| 国产成人免费观看mmmm| 青春草国产在线视频| 国产精品久久久久久久电影| 黄片播放在线免费| 一级爰片在线观看| av福利片在线| 久久久久精品性色| 性高湖久久久久久久久免费观看| 大片电影免费在线观看免费| 中文字幕免费在线视频6| 日韩人妻精品一区2区三区| 卡戴珊不雅视频在线播放| 国产精品久久久av美女十八| 天堂中文最新版在线下载| 亚洲,一卡二卡三卡| 国产av一区二区精品久久| 女人精品久久久久毛片| 在线观看www视频免费| 国产免费福利视频在线观看| 久久97久久精品| 久久毛片免费看一区二区三区| 日本vs欧美在线观看视频| 亚洲av综合色区一区| 22中文网久久字幕| 一本大道久久a久久精品| 大香蕉久久网| 黄片播放在线免费| 久久国产精品男人的天堂亚洲 | av国产久精品久网站免费入址| 亚洲性久久影院| 黑人猛操日本美女一级片| 亚洲欧美色中文字幕在线| 成人国产麻豆网| 成人二区视频| 亚洲精品自拍成人| 日本欧美国产在线视频| 90打野战视频偷拍视频| 日韩成人av中文字幕在线观看| 国产av一区二区精品久久| 爱豆传媒免费全集在线观看| 国产在线一区二区三区精| 久久精品久久久久久噜噜老黄| 99热这里只有是精品在线观看| 91午夜精品亚洲一区二区三区| 国产精品熟女久久久久浪| 波野结衣二区三区在线| 中文字幕av电影在线播放| 中国美白少妇内射xxxbb| 九色成人免费人妻av| 韩国精品一区二区三区 | 中文乱码字字幕精品一区二区三区| 国产白丝娇喘喷水9色精品| 国产精品女同一区二区软件| 午夜免费观看性视频| 日韩不卡一区二区三区视频在线| 啦啦啦啦在线视频资源| 亚洲精品日韩在线中文字幕| 男人爽女人下面视频在线观看| 黄色一级大片看看| 国产精品一区二区在线不卡| 久久国产精品男人的天堂亚洲 | 午夜激情av网站| 国产精品久久久久久久电影| 热re99久久国产66热| 在线精品无人区一区二区三| 中国国产av一级| 午夜视频国产福利| 日韩中文字幕视频在线看片| 亚洲少妇的诱惑av| 菩萨蛮人人尽说江南好唐韦庄| 中文字幕免费在线视频6| 一区在线观看完整版| 黄色视频在线播放观看不卡| 在线看a的网站| 99久国产av精品国产电影| 亚洲av电影在线进入| 国产1区2区3区精品| 色5月婷婷丁香| 狂野欧美激情性xxxx在线观看| 久久狼人影院| 亚洲av福利一区| 天天影视国产精品| 国产成人免费观看mmmm| 久久精品国产鲁丝片午夜精品| 中文字幕人妻熟女乱码| 女的被弄到高潮叫床怎么办| 日本午夜av视频| 蜜桃国产av成人99| 一区在线观看完整版| 如何舔出高潮| 欧美人与性动交α欧美精品济南到 | 国产毛片在线视频| 国产精品欧美亚洲77777| 青春草视频在线免费观看| 国产男人的电影天堂91| 国产一区有黄有色的免费视频| av片东京热男人的天堂| 国产在线一区二区三区精| 一区二区av电影网| av线在线观看网站| 日韩制服骚丝袜av| 精品亚洲乱码少妇综合久久| 亚洲国产精品专区欧美| 亚洲图色成人| 极品人妻少妇av视频| 亚洲少妇的诱惑av| 中国三级夫妇交换| 国产亚洲av片在线观看秒播厂| 桃花免费在线播放| 久久久国产欧美日韩av| 少妇的丰满在线观看| 一个人免费看片子| 午夜激情久久久久久久| 免费人成在线观看视频色| 久久久国产一区二区| 国产xxxxx性猛交| 高清av免费在线| 日韩一区二区视频免费看| 国产男人的电影天堂91| 久久婷婷青草| 丰满乱子伦码专区| 宅男免费午夜| 校园人妻丝袜中文字幕| 国产一区二区激情短视频 | 永久网站在线| 美女国产高潮福利片在线看| 22中文网久久字幕| 欧美成人午夜精品| 91aial.com中文字幕在线观看| 国产无遮挡羞羞视频在线观看| 亚洲熟女精品中文字幕| 久久久久久久大尺度免费视频| 最新中文字幕久久久久| 久久精品国产亚洲av天美| av片东京热男人的天堂| av在线老鸭窝| 日韩中文字幕视频在线看片| 午夜影院在线不卡| 日本猛色少妇xxxxx猛交久久| 欧美变态另类bdsm刘玥| 亚洲情色 制服丝袜| 欧美日本中文国产一区发布| 国产免费视频播放在线视频| 久久久久久人人人人人| 国产国语露脸激情在线看| av在线app专区| 一本色道久久久久久精品综合| 国产亚洲精品第一综合不卡 | a级毛片黄视频| 婷婷色麻豆天堂久久| 少妇熟女欧美另类| 成人手机av| 一级黄片播放器| 黄片无遮挡物在线观看| 久久久久久久国产电影| 五月伊人婷婷丁香| 内地一区二区视频在线| 一级毛片 在线播放| 男人添女人高潮全过程视频| 国产精品三级大全| 欧美激情 高清一区二区三区| 国产成人av激情在线播放| 岛国毛片在线播放|