
    Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning


    Ahmad Hussein Ababneh

Abstract—Training and testing different models in the field of text classification mainly depend on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification, including the Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification (TC). In this investigation, well-known and accurate learning models are used, including the naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), and logistic regression (LR) models. We present relevance and time measures of training the models with these datasets to enable Arabic language researchers to select the appropriate dataset on a solid basis of comparison. The performances of the five learning models across the seven datasets are measured and compared with the performances of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%.

1. Introduction

Text classification (TC) is defined as a text mining process that aims to specify the class or category of a given textual document. The output of such a process can be viewed as a set of classes; each of them contains a set of text documents that represents a specific topic or kind. Single-label TC provides a single label for each document, whereas multi-label TC may generate multiple labels for each document [1].

The massive amount of text found on the web and the amount of text being uploaded every second necessitate a classification model that can classify each document into its appropriate category, facilitating tasks in other fields, such as natural language processing and information retrieval [2]. The importance of TC in other text mining fields has been deeply emphasized by Elnagar et al. [1], [3].

The training and evaluation of newly developed models in the field of TC mainly depend on the existence of a pre-classified dataset or corpus of text documents [4]. A large number of English language datasets have been evaluated for developing TC models using various statistical learning models. In [5], Zhang et al. listed these datasets with their respective statistics, including the number of text documents contained in the training and testing sets, the number of classes, and the accuracy obtained using several supervised and unsupervised learning models. The construction of datasets for TC research in the Arabic language, which is at the core of this research, is still in its early stages. However, promising research in this field has recently emerged, creating new datasets, such as the Single-Label Arabic News Articles Dataset (SANAD), NADiA, and KALIMAT.

In the field of TC, selecting appropriate datasets is typically a major challenge; thus, researchers should have a clear understanding of the data on which a model will be trained and tested to obtain unbiased and significant results [5]. It is critical to evaluate the existing datasets and assess whether they were prepared according to a scientific approach. Well-prepared datasets imply consistent judgment across different TC models.

    From our viewpoint, two important factors should be considered when assessing the effectiveness and usefulness of a dataset:

1) Experiments should be conducted on the dataset using successful machine learning (ML) models in the field of TC. In general, supervised TC models have achieved significant results and have been used in this field since 1968 [6]. Support vector machines (SVMs), logistic regression (LR), K-nearest neighbor (KNN), and similar methods have been used intensively in this field. Many studies, with various ML models, have achieved consistent results [7]-[12]. Thus, a new dataset, with its training and testing sets, should produce comparable results if it is processed using such ML models.

2) The dataset should be compared with the existing datasets. This comparison includes the application of uniform TC models with uniform implementation constraints. In addition, it includes standard accuracy measures that have been used to evaluate TC models, such as the precision, recall, f1-score, macro_avg, and weighted_avg. Furthermore, a heat map could be constructed for each run to provide a clear picture of the behavior of the classifiers over each dataset.

In this paper, we focus on assessing the quality of Arabic language datasets that have been constructed for training and evaluating TC models developed especially for single-label Arabic TC. The Arabic language is important because it is used by more than 450 million people and is the religious language of Muslims worldwide [13]. Furthermore, the Arabic language has unique features that make text processing of the language a challenging task. These features include 1) the inflectional and derivational nature of the language, 2) the lack of capitalization and short vowels, 3) the existence of different variations of the language in use (modern, classical, and colloquial), and 4) the permission of constructing a sentence with an implicit subject [14].

Several Arabic language datasets have been created for training newly developed TC models. These datasets include SANAD, Khaleej, Arabiya, Akhbarona [15], KALIMAT, a multipurpose Arabic corpus [16], Waten2004 [17], and Khaleej2004 [17]. There are other datasets, but they do not contain a reasonable number of text documents. This study presents a comprehensive guide for researchers in the field of Arabic TC on the most reliable datasets to use for model training and evaluation. This research aims to investigate which existing Arabic datasets can provide significant model training and fair evaluation of TC models. To achieve this aim, several classifiers have been developed and implemented. These classifiers are based on well-known and tested supervised TC methods and are tested on the previously mentioned Arabic datasets. The analysis also includes a comparison with an English dataset that has been used in several research studies [18]-[22].

After introducing the main problem and explaining our aim in the Introduction, the remaining sections of the paper are structured as follows. Section 2 describes the learning models used in our experiment. Section 3 illustrates a general framework and highlights the steps performed before, during, and after training the models. Section 4 evaluates the models' training relevance results and highlights the main findings. Finally, Section 5 draws the main conclusion.

2. Text Classification Tasks

Both supervised and unsupervised ML approaches have been examined in the field of TC [6], [23], [24]. The difference between supervised and unsupervised learning lies in labeled data. Supervised learning models use labeled data that represent the models' knowledge repository and then use this knowledge to make future predictions. Simply put, we teach a model with a sufficient amount of data and then use the model for future predictions. For unsupervised learning models, data are unlabeled, and the models are not aware of any information about the classes or categories in the data; these models attempt to find meaningful patterns in the data [24].

In this paper, the learning models in our experiments include a multinomial naive Bayes (NB) classifier, an SVM classifier, a random forest (RF) classifier, a KNN classifier, and an LR classifier. These classifiers adopt the supervised learning approach, and we have explained the reason for this choice in the Introduction. The following subsections explain the five models.

2.1. NB Model

The NB classifier is one of the simplest classification learning models but has achieved reasonable accuracy [25], [26]. Experiments have shown that it is comparable to advanced models, such as SVM models [7]. Furthermore, it is considered an efficient learning method [27], and, as Jiang et al. mentioned in [28], NB is among the top ten data mining algorithms due to its efficiency and simplicity.

NB is a family of models that use the NB assumption. Given pairs of data $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$, where each $x_i \in \mathbb{R}^d$ is a point in a $d$-dimensional real space and $y_i \in Y$, where $Y=\{1,2,\ldots,m\}$ is a finite set of classifications, a probabilistic model or a joint distribution between $x$ and $y$ can be built under the following assumption.

Assume a family of distributions $P_\theta$ for $x_i \in \mathbb{R}^d$ and $y_i \in Y$, such that

$$(x_i, y_i) \sim P_\theta(x, y) = P_\theta(y)\,P_\theta(x \mid y).$$

In TC, $x$ and $y$ can be expressed as documents and their classes or words. The $x$-values represent features, and these features may be represented by numerical values, such as frequencies. Regardless of the numerical representation of the features, the fundamental assumption in NB is the conditional independence [27]:

$$P_\theta(x \mid y) = \prod_{j=1}^{d} P_\theta(x_j \mid y).$$

If we have another $x \in \mathbb{R}^d$, we need to predict $y$. We need to estimate $\theta$ from $D$ (e.g., using the maximum likelihood); then, we could compute $y$ from the equation

$$\hat{y} = \arg\max_{y \in Y} P_\theta(y \mid x).$$

From the Bayes rule:

$$P_\theta(y \mid x) = \frac{P_\theta(x \mid y)\,P_\theta(y)}{P_\theta(x)}.$$

The denominator does not depend on $y$, so we omit it; returning to the conditional independence assumption, we can rewrite the previous equation as follows:

$$\hat{y} = \arg\max_{y \in Y} P_\theta(y) \prod_{j=1}^{d} P_\theta(x_j \mid y).$$
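To make the decision rule concrete, the following minimal sketch (not the authors' code) implements a multinomial NB classifier that applies the above argmax rule in log space; the toy word-count matrix and the Laplace smoothing constant are illustrative assumptions.

```python
import numpy as np

def train_multinomial_nb(X, y, alpha=1.0):
    """Estimate log P(y) and log P(word_j | y) from a word-count matrix X with Laplace smoothing."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict_multinomial_nb(X, classes, log_prior, log_likelihood):
    """Return argmax_y [log P(y) + sum_j x_j * log P(word_j | y)] for each document."""
    joint = X @ log_likelihood.T + log_prior
    return classes[np.argmax(joint, axis=1)]

# Toy word-count matrix: rows are documents, columns are vocabulary terms.
X = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
y = np.array([0, 0, 1, 1])
model = train_multinomial_nb(X, y)
print(predict_multinomial_nb(np.array([[1, 0, 2]]), *model))  # -> [0]
```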

2.2. RF Model

RF is identified as one of the most popular and powerful ML algorithms that can perform both classification and regression tasks [29], [30]. It is an ensemble ML approach with a divide-and-conquer strategy that combines multiple learning models to form one model to improve the performance [1]. Thus, as the name suggests, the model creates several decision trees instead of a single tree. The model builds these multiple decision trees using an algorithm such as the information gain or the Gini index approach [31]. In general, the more trees in the forest, the more robust the classification, resulting in higher accuracy [32].

In RF, to classify a new object based on its attributes or features, each tree provides its prediction, and the model chooses the classification that has the most votes (the majority voting) among the trees in the forest. The majority vote helps rectify the high variance that generally appears in a single decision tree model, where the model makes poor predictions when the training dataset is changed [31]. Thus, RF has a low bias, because it is based on decision trees, and low variance. It uses the bagging technique (an ensemble ML approach), which combines multiple decision trees to form a single model. The low bias and variance reduce the overfitting problem, which happens when a trained model performs well with training data but provides inaccurate predictions on the testing data [33].

A complete mathematical analysis of RF can be found in [32]. Assume the number of cases in a training set is $N$ (the number of items in the training data). Then, samples from the $N$ cases are taken at random but with replacement. If there are $M$ input variables or features, a number $m < M$ is specified such that at each node, $m$ variables are selected at random out of $M$. The best split of these $m$ variables is used to split the tree node. The value of $m$ is kept constant while the forest is grown. Each tree is grown to the largest extent possible, and there is no pruning. We can predict new data by aggregating the predictions of the $k$ trees [30].

Given a training dataset $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$ and specified classes $y_i \in Y$, where $x_i \in \mathbb{R}^d$ and $Y=\{1,2,\ldots,m\}$, select $k$ random samples with replacement from $D$, and build or fit $k$ decision trees, as follows:

1) Select random samples, with replacement, from $D$ and $Y$; name them $D_i$ and $Y_i$.

2) Fit $h_i$ on $(D_i, Y_i)$, where $h_i$ is a decision tree learner constructed from the random sample $D_i$.

After training, a new $x'$ can be classified into the appropriate class (or the predicted value $\hat{y}$ can be obtained) by taking the majority vote of all individual decision trees on $x'$:

$$\hat{y} = \operatorname{mode}\{h_1(x'), h_2(x'), \ldots, h_k(x')\}.$$

In addition, the low bias and low variance of the RF classifier imply that the classifier takes less time for training and testing than other supervised learning models [30], [31]. This is rational, because RF belongs to the low-bias family, which reduces the number of assumptions made by RF to fit a model, and it needs at most two tuning parameters [32].
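As a concrete illustration of the bagging-plus-majority-vote behavior described above (not the authors' implementation), the following sketch uses scikit-learn's RandomForestClassifier; the synthetic data and parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: N cases, M features; the real experiments use document features.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k trees, each grown on a bootstrap sample; m = sqrt(M) features tried at each split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

# The prediction for each test point is the majority vote of the individual trees.
print("accuracy:", rf.score(X_test, y_test))
```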

2.3. KNN Model

KNN is a powerful classification model in the multidimensional classification space that provides competitive accuracy results [34]-[38]. The estimation of the classification assumes that neighbors always share common features and, consequently, common classes.

KNN assumes that in the $d$-dimensional space, the closer the two objects, the more likely they belong to the same class. The closeness could be depicted as the distance between the objects or simply between points. The distance can be computed using the Euclidean, Hamming, Manhattan, or Minkowski methods. The closeness could also be seen as the similarity between the objects, and this similarity can be computed using one of the similarity computation methods, such as the cosine similarity [38].

KNN is a classical ML algorithm that has theoretical guarantees: As $n$, the size of the data $D$, becomes huge, there is some theoretical guarantee about the performance of KNN [36].

Given a training dataset $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$, $x_i \in \mathbb{R}^d$, and a finite set of classes $y_i \in Y$, where $Y=\{1,2,\ldots,m\}$, there is some distance metric, such as the Euclidean distance, which measures the distance between two points in the $d$-dimensional space such that

$$\mathrm{dist}(x_i, x_j) = \sqrt{\sum_{l=1}^{d} (x_{il} - x_{jl})^2}$$

where $x_i, x_j \in \mathbb{R}^d$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$, and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jd})$. In KNN, if we consider the distance between a new point $x'$ and its closest points (point) as the similarity measure, the class of $x'$ will be

$$y' = y_{i^*}, \quad i^* = \arg\min_{i} \mathrm{dist}(x_i, x') \qquad (3)$$

If we consider the similarity between $x_i$ and $x_j$, where $x_i$ is a vector of features, such that $x_i = (x_{i1}, x_{i2}, \ldots, x_{ih})$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jl})$, $x_i, x_j \in \mathbb{R}^d$, the cosine similarity is [2]

$$\mathrm{sim}(x_i, x_j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert\,\lVert x_j \rVert}$$

and we can generalize the similarity principle and reform (3) to be

$$y' = y_{i^*}, \quad i^* = \arg\max_{i} \mathrm{sim}(x_i, x').$$

KNN also has a probabilistic formulation [34] as follows. For a random variable $y$:

$$P(y = c \mid x, D) = \frac{1}{k} \sum_{x_i \in N_k(x)} \mathbb{1}(y_i = c)$$

where $P(y \mid x, D)$ is the fraction of points $x_i$ in $N_k(x)$ that belong to class $y$, and $N_k(x)$ is the set of the $k$-nearest points to $x$.

The predicted class of $x'$ is

$$\hat{y}(x') = \arg\max_{c \in Y} P(y = c \mid x', D).$$
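The following minimal sketch (not the authors' implementation) applies the Euclidean-distance voting rule above; the toy two-dimensional data and the choice $k=3$ are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by the majority vote of its k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the points in N_k(x_new)
    votes = Counter(y_train[nearest])      # class frequencies among the k neighbors
    return votes.most_common(1)[0][0]

# Toy two-dimensional feature space with two classes.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.05]), k=3))  # -> 1
```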

2.4. SVM Model

SVM is a learning model that separates data into classes by creating balanced hyperplanes in a high-dimensional space. The idea is to find hyperplanes that maximize the distances between the edges of the classes. SVM gradually seeks to find the maximum empty area between any pair of classes, which is represented as the margin that spans from the central line that separates two classes to the boundaries of the classes [12]. In a multi-classification model, SVM creates a set of hyperplanes and then chooses a subset of them that maximizes the separation margins [26].

SVM is considered a robust and accurate classification model. This judgment is based on a large number of research studies in different fields, which used SVM and achieved significant results [39]-[46]. For the Arabic language, Qadi et al. [11] experimented with 10 supervised learning models, and the SVM model obtained the most significant performance result. A complete mathematical formulation of SVM can be found in [47], but a brief and summarized explanation is essential.

To illustrate the optimization behind the SVM model, we start from a simple linear model for binary classification. Given the training set of $d$-dimensional vectors $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$, where each input vector $x=(x_1, x_2, \ldots, x_d)$, $x \in \mathbb{R}^d$, $d$ is the number of features, and $y$ is the class label, $y \in Y$ with $Y=\{1, -1\}$, the classification function is

$$h(x) = \operatorname{sign}(w \cdot x + b).$$

SVM depicts the data as instances that are embedded in a $d$-dimensional space and finds a weight vector $w=(w_1, w_2, \ldots, w_d)$, $w \in \mathbb{R}^d$, such that the line $w \cdot x = \theta$, where the operator "$\cdot$" denotes the dot product, separates the boundary of the two classes (as shown in Fig. 1).

Representing a text document in a $d$-dimensional space means that each distinct word (feature) establishes a dimension, and the document is a point in this space; the classification is performed by creating a separation line, such that all points that fall on one side of the line belong to class 1, while the points that fall on the other side belong to class 2, as shown in Fig. 2.

Fig. 1. Binary linear classification.

Fig. 2. Text documents on the feature space.

Now, to define the SVM classifier, we can update the idea of the linear classification. From $D$, we have pre-labeled examples (the training dataset) $(x_i, y_i)$, $y_i \in Y$, and $Y=\{1,2,\ldots,m\}$. Using the same technique as in the linear classification, the dot product of the weight vector $w$ and the vector $x$ is $w \cdot x = \sum_{j=1}^{d} w_j x_j$.

In linear classification, multiple lines can exist that separate the classes, but SVM seeks to find the best line that divides the data points. To define the best line, SVM considers the distance from the separating line, which tells us how confident a prediction is [48]. Thus, SVM searches for the line that has the largest margins.

Define the margin $2\gamma$ as the length of the empty area between the boundaries of the classes; the best line is located at the middle of this area, as shown in Fig. 3.

Fig. 3. Largest margins in SVM.

Let $A$ be a data point and let $M$ be a point on the separating line $L$, with $w$ taken as the unit normal of $L$ (Fig. 4). Then, the distance between $A$ and $L$ is

$$d(A, L) = (A - M) \cdot w$$

Fig. 4. Computation of γ.

and $M$ is on $L$, so $w \cdot M = -b$; then

$$d(A, L) = A \cdot w - M \cdot w = w \cdot A + b.$$

This means that the distance or the margin between $A$ and $L$ is the vector $w$, which is perpendicular to line $L$, multiplied by the coordinates of $A$ plus the offset $b$, i.e., $\gamma = (w \cdot A + b)\,y$. So, we want to find $w$ such that the smallest margin $\gamma_i$ is as large as possible [47]:

$$\max_{w} \min_{i} \gamma_i$$

and this leads to the initial optimization

$$\max_{w, b, \gamma} \gamma$$

such that

$$y_i\,(w \cdot x_i + b) \ge \gamma, \quad i = 1, \ldots, n.$$

In the previous definition of $\gamma$, the longer the $w$, the bigger the margin will be; thus, we need to normalize $w$ by its length:

$$\gamma = \frac{y\,(w \cdot x + b)}{\lVert w \rVert}$$

where

$$\lVert w \rVert = \sqrt{\sum_{j=1}^{d} w_j^2}.$$

As shown in Fig. 5, line $L$ is defined by a few points that are located on the boundary (not by all points in the space). This means that we can ignore most of the points and focus on the points that define the line (these points are called the support vectors). Let us also require the support vectors $x_i$ to be on the planes defined by

$$w \cdot x_i + b = \pm 1.$$

Fig. 5. Relationship between γ and the support vectors.

To find $\gamma$, we need to find the relation between the support vectors. Assume that $x_1$ is the support vector on one side of line $L$ and $x_2$ is the support vector on the other side of $L$:

$$x_1 = x_2 + 2\gamma\,\frac{w}{\lVert w \rVert}.$$

We can substitute the value of $x_1$ into the equation $w \cdot x_1 + b = 1$, and we obtain

$$w \cdot \left(x_2 + 2\gamma\,\frac{w}{\lVert w \rVert}\right) + b = 1.$$

Now, the equation $w \cdot x_2 + b = -1$ holds, because $x_2$ is on the other side of $L$; thus

$$-1 + 2\gamma\,\frac{w \cdot w}{\lVert w \rVert} = 1$$

since

$$\frac{w \cdot w}{\lVert w \rVert} = \lVert w \rVert, \quad \text{which gives} \quad \gamma = \frac{1}{\lVert w \rVert}.$$

This implies that maximizing $\gamma$ can be expressed as follows:

$$\max_{w} \frac{1}{\lVert w \rVert}$$

which is equivalent to

$$\min_{w} \frac{1}{2}\,\lVert w \rVert^2.$$

Therefore, (7) can be rewritten as

$$\min_{w, b} \frac{1}{2}\,\lVert w \rVert^2$$

such that

$$y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, n.$$

So far, we have implicitly assumed that our data are linearly separable and that it is always possible to find a line that entirely separates the classes. However, most real datasets are noisy, which means that no such separating hyperplane exists [49]. Dealing with such data, where finding a perfect separator is impossible (and this happens in almost every dataset), requires changing our formulation and introducing a penalty parameter:

$$\min_{w, b} \frac{1}{2}\,\lVert w \rVert^2 + C \cdot (\text{number of mistakes})$$

where $C$ is a control variable that can be set using cross-validation.

The previous equation means that the line should have a maximum margin and make a small number of mistakes. However, the mistaken points do not have equal severity [49], and we can penalize them using the slack variable $\xi$, as shown in Fig. 6.

Fig. 6. Separating line with slack variable ξ.

Each mistaken point is penalized by its own slack $\xi_i$, with the constraint

$$y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$

A new optimization problem appears in which our goal is to find the largest margin and the smallest sum of the penalties $\xi_i$:

$$\min_{w, b, \xi} \frac{1}{2}\,\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i.$$
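As an illustration of the soft-margin formulation, the sketch below uses scikit-learn's LinearSVC, whose C parameter plays the role of the penalty parameter $C$ above; the TF-IDF features, toy documents, and labels are assumptions made for demonstration, not the authors' exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy documents; the real experiments use the Arabic datasets of Table 1.
docs = ["the match ended with a late goal",
        "the central bank raised interest rates",
        "the striker scored twice in the final",
        "markets fell after the inflation report"]
labels = ["Sports", "Finance", "Sports", "Finance"]

# LinearSVC minimizes (1/2)||w||^2 + C * sum of slack penalties, i.e., the soft-margin problem above.
clf = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
clf.fit(docs, labels)
print(clf.predict(["the goalkeeper saved a penalty"]))
```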

2.5. LR Model

LR is one of the most widely used classification models [50], [51]. Even though the name contains the word "regression", LR is mainly used for classification problems, because it predicts categorical values, not continuous ones.

In LR, we model our data with a linear function $f(w,b) = w x + b$, where $w$ is the weight and $b$ is the bias, and this produces continuous values. In LR, we seek to convert these continuous values into probabilities. LR models the probability using the sigmoid function [52]:

$$h_\theta(x) = \frac{1}{1 + e^{-(w x + b)}}.$$

This prediction outputs a probability value between 0 and 1.

In the approximation function $h_\theta(x)$, we have to define $w$ (the weight) and $b$ (the bias) using a specific cost function, such as the cross-entropy [52]:

$$J(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[\, y_i \log h_\theta(x_i) + (1 - y_i) \log\!\left(1 - h_\theta(x_i)\right) \right].$$

LR optimizes the cost over $w$ and $b$ using the gradient descent optimization strategy. It starts at some point and iteratively updates the parameters ($w$ and $b$) through the calculation of the derivatives and then follows the direction of the derivatives until reaching the global minimum. In each iteration, the parameters are updated in small steps scaled by the learning rate $\eta$. Gradient descent then updates $w$ and $b$ using the following rules:

$$w := w - \eta\,\frac{\partial J}{\partial w}, \qquad b := b - \eta\,\frac{\partial J}{\partial b}.$$

Furthermore, the derivatives can be computed using the following equations [51]:

$$\frac{\partial J}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x_i) - y_i \right) x_i, \qquad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x_i) - y_i \right).$$
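The following minimal sketch (not the authors' code) implements the sigmoid, the cross-entropy gradients, and the update rules above for binary labels; the toy data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, eta=0.1, n_iter=2000):
    """Gradient descent on the cross-entropy cost: w := w - eta*dJ/dw, b := b - eta*dJ/db."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        h = sigmoid(X @ w + b)       # h_theta(x) for every sample
        dw = (X.T @ (h - y)) / n     # dJ/dw = (1/n) * sum((h - y) * x)
        db = np.sum(h - y) / n       # dJ/db = (1/n) * sum(h - y)
        w -= eta * dw
        b -= eta * db
    return w, b

# Toy binary problem: class 1 when the first feature is large.
X = np.array([[0.1, 1.0], [0.2, 0.8], [0.9, 0.2], [1.1, 0.1]])
y = np.array([0, 0, 1, 1])
w, b = fit_logistic_regression(X, y)
print((sigmoid(X @ w + b) > 0.5).astype(int))  # -> [0 0 1 1]
```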

3. Structural Framework

This study's goal is to measure the relevance of existing Arabic language datasets in the context of TC tasks using well-known and widely used supervised learning models. Therefore, we construct a complete experiment that includes the data collection of the existing datasets, text preprocessing, model implementation, and model performance and analysis. The framework established in this experiment is depicted in Fig. 7.

3.1. Experiment Datasets

Fig. 7 shows all the steps performed during the implementation of this experiment. Note that the starting point is to collect datasets used and proposed in the field of TC for the Arabic language. As we stated in the Introduction, the selection is based on the number of documents included in the datasets and on the number of classes into which the documents are classified. Therefore, many datasets have been excluded. The datasets selected in the experiment and their features are presented in Table 1.

    Table 1: Features of the Arabic language TC datasets

Fig. 7. Experiment framework.

In Table 1, seven datasets have been chosen and split into training and testing sets. These datasets have been recently proposed in TC as datasets for training and evaluating newly developed models for the Arabic language [3]. Table 1 shows that the number of documents in the training and testing sets of these datasets varies, ranging from 5690 documents in Khaleej2004 to 110900 in SANAD.

The SANAD dataset is the largest because it contains three merged datasets: Khaleej, Akhbarona, and Arabiya. Therefore, we process SANAD both as one dataset and as three datasets by splitting it into three parts. Table 1 shows the connection between SANAD and the three datasets; we refer to them as SANAD-Khaleej, SANAD-Akhbarona, and SANAD-Arabiya in Table 1, but throughout the paper, they are referred to as Khaleej, Akhbarona, and Arabiya. We process SANAD in this way to determine whether one of the three parts can be used individually as training and testing sets, especially when they contain sufficient documents. These individual examinations of Khaleej, Akhbarona, and Arabiya have an efficiency effect, since there are significant differences between the size of SANAD as a whole and the sizes of its parts.

The other three datasets are KALIMAT, Waten2004, and Khaleej2004 [15]. These datasets are organized in folders; each folder represents a class (culture, sport, religion, etc.). The KALIMAT, Waten2004, and Khaleej2004 datasets are not originally split into training and testing sets, but for consistency with the SANAD datasets, we split them according to the splitting ratio used in SANAD.

For comparison, we use another dataset, the "20_newsgroup dataset", which is built in English and has been used to experiment with new classification models [18]-[22]. Another reason that motivates us to use this dataset is the number of documents it contains (18846 documents): it contains a moderate number of documents (most English language datasets have a huge number of documents, see [5]) and is therefore comparable to the seven Arabic datasets shown in Table 1.
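For readers who want to reproduce the English-language comparison, the corpus can be obtained, for example, through scikit-learn's fetch_20newsgroups loader (the paper does not state how the corpus was obtained); combining the standard train and test subsets yields the 18846 documents mentioned above.

```python
from sklearn.datasets import fetch_20newsgroups

# Load the standard train/test split of the 20_newsgroup corpus (downloads on first use).
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

print(len(train.data) + len(test.data))   # 18846 documents in total
print(len(train.target_names))            # 20 categories
```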

3.2. Dataset Preprocessing

    The general preprocessing operations have been performed, which include the following:

    1) Tokenization: This operation tokenizes a text and represents the text as a set of words or tokens.

2) Stopwords removal: This operation removes any part of speech that is not a noun or a verb. The list of distinct stopwords for the Arabic language contains more than 400 terms [13].

3) Stemming: The process of removing the prefixes and suffixes of a token. The stemming operation is essential and has a positive effect on efficiency and performance [14]. (A minimal sketch of these three operations is given after this list.)
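The sketch below illustrates the three operations using NLTK's Arabic stopword list and ISRI stemmer, with a simple whitespace tokenizer; these tool choices are illustrative assumptions, as the paper does not state which libraries were actually used.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem.isri import ISRIStemmer

nltk.download("stopwords", quiet=True)        # one-time download of the stopword lists

arabic_stopwords = set(stopwords.words("arabic"))
stemmer = ISRIStemmer()

def preprocess(text):
    """1) tokenize (simple whitespace split), 2) remove Arabic stopwords, 3) stem each token."""
    tokens = text.split()
    tokens = [t for t in tokens if t not in arabic_stopwords]
    return [stemmer.stem(t) for t in tokens]

print(preprocess("????? ??????? ??? ????"))
```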

3.3. Learning Model Training

After preprocessing the text documents, the documents are split into training and testing documents, as shown in Table 1 and Fig. 7. The learning models are trained in the supervised learning context using each training set to generate the required learning models. Five learning models are developed based on each training dataset. For example, the experiments developed five learning models for SANAD using the five learning approaches described in Section 2. This allows us to measure the performance of each learning approach on each dataset and provides a way to measure how well each learning approach suits each dataset. Thus, the total number of models learned from the eight datasets (including the English language one) is 40.

Python 3 is used to develop these models, and we use a machine with 8-GB random access memory (RAM) and an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz (up to 2.71 GHz) processor. This information is important so that readers can relate to the execution time analysis that appears in the evaluation section.
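The overall training procedure can be sketched as follows; the scikit-learn estimators, their hyperparameters, the TF-IDF features, and the load_dataset helper are illustrative assumptions rather than the authors' exact configuration, with time.perf_counter used to record the training time in seconds.

```python
import time
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

MODELS = {
    "NB": MultinomialNB(),
    "RF": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": LinearSVC(),
    "LR": LogisticRegression(max_iter=1000),
}

def run_experiment(dataset_names, load_dataset):
    """Train every model on every dataset and record the accuracy and training time (seconds)."""
    results = []
    for name in dataset_names:
        X_train, y_train, X_test, y_test = load_dataset(name)   # hypothetical loader
        for model_name, estimator in MODELS.items():
            clf = make_pipeline(TfidfVectorizer(), estimator)
            start = time.perf_counter()
            clf.fit(X_train, y_train)
            elapsed = time.perf_counter() - start
            results.append((model_name, name, clf.score(X_test, y_test), elapsed))
    return results
```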

3.4. Outputs of Learning Models

The classification process is initiated using the 40 trained models, and the classification results are collected. Each document is automatically classified into a certain category based on the learning model. The outputs of the learning models are sets of categories. The grouped documents share certain criteria based on the learning method used to perform the classification process. Therefore, for the 8 datasets, we have 40 classification output sets (because each dataset was used to train 5 classification methods). The automatically generated output takes the following quadruple shape: (model, dataset, document ID, predicted category).

    The quadruple (SVM, Arabiya, 103, Politics) means that the SVM model classifies document 103 in the Arabiya dataset into the Politics category.

This representation helps grasp the automatically generated category of a document using different learning models. For example, the same document (103 in Arabiya) has been classified by KNN into the Finance category, and this fact can be easily discovered by processing the quadruple (KNN, Arabiya, 103, Finance), which means that the KNN classifier classifies document 103 in the Arabiya dataset into the Finance category. Since document 103 talks about a loan from the International Monetary Fund to Egypt, both classification results seem reasonable.
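A simple way to store and query such quadruples is sketched below; the namedtuple representation is an illustrative choice, and the example values mirror those discussed above.

```python
from collections import namedtuple

# (model, dataset, document ID, predicted category)
Prediction = namedtuple("Prediction", ["model", "dataset", "doc_id", "category"])

outputs = [
    Prediction("SVM", "Arabiya", 103, "Politics"),
    Prediction("KNN", "Arabiya", 103, "Finance"),
]

# How did each model classify document 103 of the Arabiya dataset?
for p in outputs:
    if p.dataset == "Arabiya" and p.doc_id == 103:
        print(f"{p.model} -> {p.category}")
```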

The output sets of all experiments are large; their total size equals the number of documents in the testing sets multiplied by 5. However, the important thing is to measure the classification accuracy and relevancy of each learning method on each dataset. This means that the automatically generated classifications should match the classifications obtained from the testing sets at a significant ratio. The well-known accuracy and relevancy measures in the field of TC are described in the next subsection.

3.5. Accuracy and Relevance Measures

The testing phase is established using the testing set of each dataset. The testing set contains pre-classified documents (actual classes), and the learned model then predicts a class for each document. Important relevance measures are used to measure the match between the pre-classified classes and the predicted classes. The performance metrics used in this study include the precision, recall, f1-score, macro_avg, weighted_avg, accuracy, and confusion matrix. For clarity, we assume the following:

1) True positive (TP): The number of documents that are classified into class $c_i$ and should be in $c_i$.

2) False positive (FP): The number of documents that are classified into class $c_i$ and should not be in $c_i$.

3) True negative (TN): The number of documents that are not classified into class $c_i$ and should not be in $c_i$.

4) False negative (FN): The number of documents that are not classified into class $c_i$ and should be in $c_i$.

5) Support: The actual number of documents in each class (taken from the testing dataset).

The relevance measures include

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$

$$\text{f1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}, \qquad \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where the macro_avg of a measure is its unweighted mean over all classes, and the weighted_avg is its mean over all classes weighted by the support of each class.

The confusion matrix has also been used in the evaluation; it tells us whether the classification model did right or wrong [38]. The confusion matrix is depicted in a heat map form to ensure clarity for readers. For two classes, the confusion matrix takes the following form:

$$\begin{array}{c|cc}
 & \text{Predicted Class 1} & \text{Predicted Class 2} \\ \hline
\text{Actual Class 1} & x & y \\
\text{Actual Class 2} & z & u
\end{array}$$

where $x$ denotes the number of documents classified into Class 1 and predicted by the learning model as Class 1, whereas $y$ represents the number of documents classified into Class 1 but predicted by the learning model as Class 2. Thus, the main diagonal represents the learning model's correct classifications (where the actual and the predicted classes match).
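A minimal sketch of computing these measures and rendering the confusion matrix as a heat map is shown below, using scikit-learn and seaborn as illustrative tools (the paper does not name its metric or plotting libraries); the label arrays are made up for demonstration.

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative actual and predicted classes for a three-class testing set.
y_true = ["Sports", "Politics", "Finance", "Sports", "Politics", "Finance"]
y_pred = ["Sports", "Politics", "Politics", "Sports", "Finance", "Finance"]

# Precision, recall, f1-score, and support per class, plus accuracy, macro_avg, and weighted_avg.
print(classification_report(y_true, y_pred))

# Confusion matrix rendered as a heat map; the main diagonal holds the correct classifications.
labels = ["Finance", "Politics", "Sports"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
sns.heatmap(cm, annot=True, xticklabels=labels, yticklabels=labels, cmap="viridis")
plt.xlabel("Predicted class")
plt.ylabel("Actual class")
plt.show()
```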

4. Evaluation and Discussion

The relevance measurement outcomes for each learning model on the eight datasets are collected for further analysis and discussion. The aim is not to choose a particular dataset or model; the aim is to show, using empirical experiments, the features of each model over the Arabic language datasets shown in Table 1.

4.1. Accuracy Measure

The final accuracy results are displayed in Fig. 8. Note that the accuracy of the LR and SVM models is higher than that of the other models on all datasets, and all learning models, except the RF model, obtain their highest accuracy on the English language dataset. The accuracy of all models on the Khaleej and Arabiya datasets is convergent and is higher than the accuracy on the other Arabic language datasets. In this study, all learning models obtain their lowest accuracy on the Waten2004 dataset.

4.2. Execution Time

The accuracy of the five classifiers is supplemented with the time (in seconds) taken to train and develop each model. Table 2 shows the time in seconds required to develop each model on the eight datasets. The horizontal view represents the time according to the datasets, whereas the vertical view represents the time according to the learning models. Table 2 shows that both the LR and SVM models consume a considerable amount of time to achieve the final accuracy shown in Fig. 8, whereas the KNN model consumes the least time.

Fig. 8. Accuracy results of the five learning models over the eight datasets.

    Table 2: Time in seconds required to build each model using the eight datasets

The vertical view of the Khaleej and English corpus columns shows the convergence between the behaviors of the five models on Khaleej and the English language corpus, respectively. Note also that the time consumed by the LR and SVM models on the SANAD dataset is 439.73 s and 1942.84 s, respectively. When we compare this to the accuracy of these models on the SANAD dataset in Fig. 8, we see that the increased time penalty comes with no improvement in the accuracy level.

4.3. Relevance Measures

The execution time provides an idea of how long it takes to run each model on each dataset with the hardware specifications mentioned in subsection 3.3; it is not a critical factor in the experiment, because the datasets have different sizes. Therefore, we explore the relevance measures, because they are important to determine which datasets contain sufficient semantic information that can yield reasonable classification results.

4.3.1. Macro Average and Weighted Average

The accuracy curve in Fig. 8 is supplemented by Figs. 9 to 14, which also measure the relevance of each model over the eight datasets. In Figs. 9 to 14, the datasets are placed on the x-axis, while the y-axis represents the relevance values (precision, recall, etc.). This presentation allows us to see the behavior of each model on the eight datasets separately as well as the behavior of all models in one picture. The salient points in these figures can be summarized as follows:

1) In Figs. 9 to 14, two peaks appear: one at Arabiya, which spans to Khaleej, and one at the English language dataset. This provides a positive indication of the possibility of using Khaleej and Arabiya as training datasets for TC tasks. For all learning models, these two datasets offer a productive environment that leads to significant relevance results.

Fig. 9. The macro_avg of the precision values of the five learning models over the eight datasets.

Fig. 10. The macro_avg of the recall values of the five learning models over the eight datasets.

Fig. 11. The macro_avg of the f1-score values of the five learning models over the eight datasets.

Fig. 12. The weighted_avg of the precision values of the five learning models over the eight datasets.

Fig. 13. The weighted_avg of the recall values of the five learning models over the eight datasets.

Fig. 14. The weighted_avg of the f1-score values of the five learning models over the eight datasets.

2) In Figs. 9 to 14, the curves decrease toward the SANAD dataset, which means that the learning models obtained smaller relevance values, except for the NB model, as shown in Fig. 9; however, the NB model shows unstable behavior across Figs. 9 to 14. This observation is valuable because, as mentioned before, SANAD is a collection of three datasets, that is, Arabiya, Khaleej, and Akhbarona, and when we processed these three datasets separately using the learning models, we obtained significant results. The question is why the fusion of these three datasets reduces the relevance measures, which seems illogical because ML models should benefit from an increase in the dataset size. The answer relies on the Arabic language. As we mentioned in the Introduction, Arabic is a semantically rich language with a massive number of distinct terms. Although all Arab countries speak the same language, Arabic, each has its own set of terms that are commonly used in its media. This variation in the terms used from one country to the next impacts the statistics used by the learning model.

3) In Figs. 9 to 14, the models show the lowest relevance on the Waten2004 and KALIMAT datasets, except for the NB model; the reason for this comes from the categories chosen. Both datasets contain International and Local categories, which hold international or local news data. Both the Local and International categories contain documents that could be predicted or classified into another category. For example, if the processed document refers to an international sports event, there is a good chance that the learning model will classify this document into the Sports category rather than the International category.

4) According to the relevance curves in Figs. 9 to 14:

• The KNN and RF models show a similar trend, and their relevance scores are confined between 50% and 60%.

• The SVM and LR models show a convergent trend, and their relevance scores are confined between 70% and 85%.

• The SVM model obtains the most significant scores, which is consistent with the results shown in Fig. 8.

• The NB model shows significant results on Akhbarona, Arabiya, Khaleej, SANAD, and the English language corpus.

4.3.2. Confusion Matrix and Heat Map

The confusion matrix provides a clear picture of how the learning model predicted or classified the documents. Table 3 represents the confusion matrix of the SVM model on Waten2004. The number of categories is six (Culture, Economy, International, Local, Religion, and Sports). The bold and underlined values are the numbers of documents that the SVM model classified correctly based on the actual classification.

    Table 3: Confusion matrix of SVM on Waten2004 dataset

Confusion matrix interpretation is crucial because it provides information about the relationships between the categories. For example, 307 documents out of 656 are classified into the Culture category and are pre-classified in the testing set of Waten2004 in the Culture category, whereas 349 (33 + 30 + 80 + 133 + 73) documents that should be in the Culture category are incorrectly classified into other categories. Note that 133 of the documents that should be classified into the Culture category are classified into the Religion category, which means that the SVM model found relationships between the documents that belong to both categories. The confusion matrix can also reveal a problem in the classification process. The International category has 149 (33 + 27 + 37 + 21 + 31) misclassified documents, whereas the Local category has 296 (83 + 90 + 17 + 49 + 57) misclassified documents. We clarified the classification problem of the International and Local categories when we analyzed Figs. 9 to 14, and the confusion matrix supports our clarification.

As we found that the SVM model obtained the most significant relevance results, we present the confusion matrices of the SVM model on the eight datasets. The confusion matrices are presented in the heat map form to facilitate tracing the performance of the classification process, as shown in Figs. 15 to 22.

Fig. 15. Confusion matrix of SVM experiment on the Akhbarona dataset.

Fig. 16. Confusion matrix of SVM experiment on the Waten2004 dataset.

Fig. 17. Confusion matrix of SVM experiment on the Khaleej2004 dataset.

Fig. 18. Confusion matrix of SVM experiment on the Arabiya dataset.

Fig. 19. Confusion matrix of SVM experiment on the KALIMAT dataset.

Fig. 20. Confusion matrix of SVM experiment on the SANAD dataset.

In Figs. 15 to 22, the color brightness of a cell represents the number of documents in that cell; the brighter the color, the larger the number of documents. The scale on the right of each figure shows where the color changes. Based on this, the classification using Arabiya (Fig. 18) seems to be the most significant, followed by SANAD (Fig. 20) and Khaleej (Fig. 21). In these three figures, the diagonal stands out, which gives a positive indication of the classification quality. Moreover, the trend of the classification in Figs. 18, 20, and 21 is similar to the trend in Fig. 22, which represents the classification using the 20_newsgroup dataset. Based on the classifications represented in Figs. 15 to 17 and 19, we can see that the bright cells are not located on the diagonals of the maps, which means that a large number of the predicted classes do not match the actual classifications.

Fig. 21. Confusion matrix of SVM experiment on the Khaleej dataset.

Fig. 22. Confusion matrix of SVM experiment on the 20_newsgroup dataset.

4.4. Main Findings after Analysis

The relevance measures and time constraints should be used to determine which of the seven datasets should be used to train a classification model. In this paper, we aim to provide researchers with the required relevance measures upon which they can base their decision. Accordingly, we highlight the following findings from our experiment:

1) For the five learning models, the use of KALIMAT, Waten2004, and Khaleej2004 as training datasets produced insignificant relevance results (e.g., for the SVM classifier, the accuracy rates using these datasets are 67%, 66%, and 70%, respectively (Fig. 8)). Thus, there is no need to consider the time penalty.

2) The use of SANAD as a training dataset provides reasonable classification (Fig. 20), but the time penalty is high (Table 2). The time required to train the SVM model with the SANAD dataset is ten times that required to train with the Khaleej dataset and 42 times that required to train with the Arabiya dataset.

3) The use of Arabiya and Khaleej, which are two of the complete datasets fused to form SANAD, as training datasets for the five learning models provides significant relevance results (Figs. 8 to 14) with an acceptable time penalty (Table 2). The advantage of the Arabiya dataset is that it provides the most significant classification, as shown in the confusion matrix representation. However, the disadvantage is its smaller number of categories; Arabiya has only five categories (Fig. 18). On the other hand, the Khaleej dataset has more categories, but its relevance score is the second best (Fig. 21).

4) The time required to process Akhbarona is less than that required to process SANAD (Table 2). Therefore, Akhbarona can be used instead of SANAD in training a learning model.

5) In the seven Arabic language datasets, the number of categories is small (Figs. 15 to 21); the maximum is seven categories. In contrast, the English dataset 20_newsgroup contains 20 categories, with 7 subcategories of the Computer category (Fig. 22). The categories of the seven Arabic datasets are general, and each can contain several subcategories. The detailed categories of the English dataset help increase the accuracy of the learning process. A future challenge is to create a dataset that contains a considerable number of categories and classifies documents into these categories in a consistent way, improving the learning strategy so as to develop a learning model with accurate classification.

5. Conclusion

In this paper, we trained five well-known and widely used learning models using seven recently published Arabic language datasets for TC tasks. The SANAD, Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004 datasets were studied. This study presented all the relevance measures that researchers need to decide which of these datasets provides the best training and testing environment for TC of Arabic text. The selected learning models included well-known and accurate learning models that have demonstrated their competence in TC.

The decision of which dataset(s) to use as training and testing sets is based on the behaviors of these datasets during the training and evaluation of the learning models. The behavior was measured through standard relevance and time measures obtained from 40 different learning experiments. After analyzing all the experiments, we found that the Arabiya and Khaleej datasets provided significant training, and this was reflected in their relevance measures, which reached 82%. Furthermore, we found that the learning time of the models on Arabiya and Khaleej was reasonable. The SANAD and Akhbarona datasets showed a convergent relevance trend, with relevance scores ranging from 56% to 68%. However, the execution time to train the models using the SANAD dataset was considerable. KALIMAT, Waten2004, and Khaleej2004 showed a moderate performance for both relevance and training time.

    One shortcoming was observed in this study: The number of categories in the seven datasets is small compared with that in the existing English language dataset.The maximum number of categories in the Arabic datasets is seven, and all categories depict general topics that could be subdivided into subcategories.

    Disclosures

    The authors declare no conflicts of interest.
