
    An Enhanced Automatic Arabic Essay Scoring System Based on Machine Learning Algorithms

    2023-12-12 15:51:30 Nourmeen Lotfy, Abdulaziz Shehab, Mohammed Elhoseny and Ahmed Abu-Elfetouh
    Computers, Materials & Continua, 2023, Issue 10

    Nourmeen Lotfy, Abdulaziz Shehab 2,*, Mohammed Elhoseny 3 and Ahmed Abu-Elfetouh

    1 Department of Information Systems, Faculty of Computers and Information Science, Mansoura University, Mansoura, 35516, Egypt

    2 Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia

    3 College of Computing and Informatics, University of Sharjah, Sharjah, United Arab Emirates

    ABSTRACT Despite extensive efforts to improve intelligent educational tools for smart learning environments, automatic Arabic essay scoring remains a major research challenge. The nature of Arabic writing style makes the problem even more complicated. This study designs, implements, and evaluates an automatic Arabic essay scoring system. The proposed system starts by pre-processing the student answer and model answer dataset using data cleaning and natural language processing tasks. It then comprises two main components: the grading engine and the adaptive fusion engine. The grading engine employs string-based and corpus-based similarity algorithms separately. The adaptive fusion engine then prepares the students’ scores to be passed to feature selection algorithms such as Recursive Feature Elimination and Boruta. Next, machine learning algorithms, namely Decision Tree, Random Forest, Adaboost, Lasso, Bagging, and K-Nearest Neighbor, are employed to improve the system’s efficiency. The experimental results of the grading engine showed that the Extracting DIStributionally similar words using CO-occurrences (DISCO) similarity measure achieved the best correlation values. Furthermore, in the adaptive fusion engine, the Random Forest algorithm outperformed all other machine learning algorithms using the (80%–20%) splitting method on the original dataset. It achieved 91.30%, 94.20%, 0.023, 0.106, and 0.153 in terms of Pearson’s Correlation Coefficient, Willmott’s Index of Agreement, Mean Square Error, Mean Absolute Error, and Root Mean Square Error, respectively.

    KEYWORDS Arabic; corpus-based similarity; correlation; machine learning; string-based similarity; text similarity

    1 Introduction

    Assessment is an essential component of the educational process. There are two types of writing assessment [1]. Long essay scoring evaluates relatively long responses, which help students improve their writing skills and reflect their understanding of the subject content; such essays generally include an introduction, body, and conclusion. Short essay scoring is frequently used for short text answers to “define” and “why” questions; the score depends on the content of the response, and style is unimportant [2]. Manual assessment is labor-intensive and requires significant effort, time, and resources. It puts considerable strain on teachers, especially those who teach many students and frequently assign writing tasks. Moreover, in manual assessment of writing questions, different human graders can assign different scores to the same answer, which many students regard as unfair.

    Text similarity, a category of text classification, is determined by how similar two texts are to one another [3]. It can be approached in three ways [4]: string-based similarity, corpus-based similarity, knowledge-based similarity, or a combination of them. String-based similarity has two categories [4]: character-based and term-based techniques, which measure similarity by comparing the characters or the terms (words) of the two sequences, respectively. Examples of character-based distance measures include Damerau-Levenshtein (DL) [5], N-Gram [6], Smith-Waterman, and Jaro. Examples of term-based distance measures include the Overlap coefficient [7], the Matching coefficient [8], and Cosine similarity [9]. Corpus-based similarity measures, such as Latent Semantic Analysis (LSA) [10], Extracting DIStributionally similar words using CO-occurrences (DISCO1) [11], and DISCO2 [12], determine the degree of similarity between words using information derived from large corpora. Knowledge-based similarity is a semantic similarity approach that determines the degree of similarity between words using data from semantic networks [13]; the best-known and most widely adopted networks are Arabic WordNet [14] and English WordNet [15]. Finally, combinations of these approaches can be used together to achieve the best performance.
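    To make the string-based category concrete, the sketch below implements a few of the measures named above in plain Python. It is illustrative only: the Damerau-Levenshtein function is the common optimal-string-alignment variant, the N-Gram similarity is assumed to be a Dice coefficient over character bigrams, and all function names are hypothetical rather than the implementations used in the study.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal-string-alignment distance: insert, delete, substitute, adjacent swap."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def dl_similarity(a: str, b: str) -> float:
    """Distance normalized by the longer string, turned into a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - damerau_levenshtein(a, b) / longest


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the sets of character n-grams (assumed variant)."""
    ga = {a[i:i + n] for i in range(len(a) - n + 1)}
    gb = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def cosine_similarity(a_terms: Sequence[str], b_terms: Sequence[str]) -> float:
    """Cosine of the term-frequency vectors of two token lists."""
    ca, cb = Counter(a_terms), Counter(b_terms)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return 0.0 if norm == 0 else dot / norm


def overlap_coefficient(a_terms: Sequence[str], b_terms: Sequence[str]) -> float:
    """Shared distinct terms divided by the size of the smaller term set."""
    sa, sb = set(a_terms), set(b_terms)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))
```

    Character-based measures operate on single words or raw strings, so they tolerate spelling variants, while term-based measures compare whole token lists.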

    One of the most common applications of text similarity is the Automatic Essay Scoring (AES) system. It is designed to score and evaluate student responses automatically based on a predefined training set of answer documents, and it frequently provides appropriate feedback and corrections for the assessment process [16,17]. Compared with manual grading, AES systems reduce effort, time, and the cost of institutional resources while also achieving fairness in marking student answers [17]. Some studies on scoring Arabic essays have been presented. Unfortunately, given differences in writing methods, answer lengths, multiple synonyms, spelling errors, grammar, and morphological structure, no empirical AES software systems for Arabic have been developed. Thus, the main question of this research is how Arabic essay questions, with these challenges, can be graded automatically in Arabic universities and schools that depend on smart learning, Artificial Intelligence (AI), and Natural Language Processing (NLP) technologies. More detail is presented in Section 2.

    This study prepares a dataset in a sociology course that includes 270 short answers (27 essay questions × 10 student answers per question). It extends the dataset of Shehab et al. [18] and is used to build the proposed system, which consists of two main components. First, the grading engine measures the similarity values between students’ answers and model answers using string-based and corpus-based similarity measures under various scenarios. Second, the adaptive fusion engine fuses the similarity scores to enhance overall system accuracy by increasing correlation and reducing error. In this engine, six Machine Learning (ML) algorithms, namely Decision Tree (DT), Random Forest (RF), Adaboost, Lasso, Bagging, and K-Nearest Neighbor (KNN), are applied after Feature Selection (FS) algorithms such as Recursive Feature Elimination (RFE) and Boruta.

    The main contributions of this paper are as follows:

    • Creating a novel dataset that may help researchers interested in Arabic essay scoring systems.

    • Comparing the performance of a set of widely adopted text similarity measures from different categories, including string-based and corpus-based similarity measures, in assessing answers to essay questions in the Arabic language.

    • Designing and implementing an automated Arabic essay grading system that consists of two main subsystems: a grading engine, which measures the similarity between students’ answers and model answers using a number of similarity measures, and an adaptive fusion engine, which fuses the similarity scores using ML algorithms to enhance the overall system accuracy.

    • Using various evaluation methods, such as Pearson’s Correlation Coefficient (CORR), Willmott’s Index of Agreement (D), Mean Square Error (MSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and runtime, to assess the models’ performance.

    • Providing feedback and reports for each institution, student, and teacher relying on smart learning, AI, and NLP technologies.

    The rest of this work is organized as follows: Section 2 presents related work. Section 3 describes the proposed framework. Section 4 shows the experimental results and discussion. Section 5 presents the conclusion and future work.

    2 Related Works

    Several automatic grading systems for English have been used: Project Essay Grading (PEG) [19], IntelliMetric [20], C-rater [21], Paperless School free-text Marking Engine (PS-ME) [22], Automark [23], and Bayesian Essay Test Scoring System (BETSY) [24], whereas only a few studies have been conducted for Arabic [25–34]. This section therefore discusses some of the efforts made toward developing Arabic automatic essay grading systems.

    Reafat et al. [25] proposed an LSA-based method for evaluating Arabic essays. Only 29 student answer papers were used in the experiment (5 for training and 24 for testing). The method is primarily concerned with the effect of stop-word removal on achieving a satisfactory grading level. The reported correlation between automatic and manual scores is 0.91. However, the system lacks feedback on answers for students and teachers.

    Gomaa et al. [26] used a hybrid approach that combined multiple similarity measures, including the longest common subsequence. They created a dataset of 610 Arabic-language student responses, which were translated into English before being scored. They tried to address the challenges of processing Arabic text. However, the framework has several drawbacks: the absence of good stemming techniques, the loss of context structure because many words are not translated semantically from Arabic to English, and the need to feed the experimental results to a machine-learning algorithm that takes a long time to process.

    Gomaa et al. [27] investigated similarity algorithms for Arabic automatic short-answer grading, combining and assessing string-based and corpus-based similarity measures. Students’ answers are handled holistically and partially, and useful feedback is provided. Moreover, the authors created a new benchmark Arabic dataset with 50 questions and 12 answers for each question. Finally, the evaluation results demonstrated that the proposed model could be deployed in a real environment.

    Ewees et al. [28] compared Cosine similarity and the KNN algorithm within the LSA method for scoring Arabic essays automatically. They also enhanced LSA with pre-processing tasks such as processing the entered text, unifying letterforms, removing formatting, replacing synonyms, stemming, and removing stop-words. The results showed that Cosine similarity with LSA produced better values than KNN with LSA. The system has an overall correlation of 0.88 with the teachers’ evaluations.

    Alghamdi et al. [29] demonstrated a hybrid computerized Arabic essay assessment system that combines LSA with three linguistic aspects: 1) word frequency, 2) word stemming, and 3) spelling errors. The approach also requires determining the reduced dimensionality used in LSA, which was examined to assess the effectiveness of the system. The system graded 96.72% of the test data correctly, and automatic and manual scores correlated at 0.78, comparable to the inter-human correlation of 0.7.

    Al-Jouie et al. [30] presented an automatic evaluator of student essays in Arabic, modeled after the scheme used by schoolteachers in Riyadh, the capital of Saudi Arabia. Essays are graded on language proficiency, essay structure, and content relevance to the topic. To this end, the authors created a method based on the LSA similarity measure and rhetorical structure theory. The system was tested on over 300 essays, all handwritten by schoolchildren and covering a wide range of topics. Performance was assessed using the machine-human correlation in grading: the system achieved an overall correlation of 0.79 with the teachers’ evaluations and an overall accuracy of 78.33% on the test data.

    Shehab et al. [18] applied four text similarity measures, two string-based and two corpus-based, to 210 Arabic student answers from an in-house dataset to score them and find an efficient essay grading method. The character-based N-Gram approach was preferred over the word-based approach because it is simpler and produces more reliable results on noisy data, such as answers with grammatical or spelling errors. The researchers showed that the character-based N-Gram algorithm outperformed the other three measures (Damerau-Levenshtein, LSA, and DISCO2) in terms of CORR (0.82).

    Azmi et al. [31] presented a hybrid Arabic scoring system that considers LSA, writing style, spelling errors, and other lexical aspects. The system was tested on 350 Arabic essays collected from schoolchildren. The best reported accuracy was 0.9, with a CORR of 0.76. The relatively high accuracy may be because both “exact” and “within range” auto-scores were counted as correct. An auto-score is “exact” when its difference from the actual score is between 0 and 0.5, and “within range” when the difference is between 0.5 and 2.5.
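    The two acceptance bands can be expressed as a small helper. This is a hypothetical sketch that assumes a difference of exactly 0.5 counts as “exact” and that differences up to and including 2.5 count as “within range”; the original work may treat the boundaries differently.

```python
from typing import Sequence, Tuple


def exact_and_within_range_rates(auto: Sequence[float],
                                 actual: Sequence[float]) -> Tuple[float, float]:
    """Return (exact rate, exact-or-within-range rate) for paired scores."""
    exact = within = 0
    for a, t in zip(auto, actual):
        diff = abs(a - t)
        if diff <= 0.5:      # "exact": difference in [0, 0.5]
            exact += 1
        elif diff <= 2.5:    # "within range": difference in (0.5, 2.5]
            within += 1
    n = len(auto)
    return exact / n, (exact + within) / n
```

    Counting both bands as correct, as the second returned value does, shows why a reported accuracy can be much higher than the corresponding correlation.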

    Al Awaida et al. [32] presented the only work that used Arabic WordNet (AWN) in Arabic AES systems. The intention was to increase the system’s accuracy by replacing words in student answers with their synonyms. The authors used the F-score to add the selected features to the feature space and Cosine similarity to measure the similarity between the student answer and the model answer. An in-house dataset containing 120 questions, with three model answers for each question, was collected from schoolchildren and used for testing. The impact of using AWN was compared: the authors reported CORR values ranging from 0.5 to 1 and an MAE of 0.117, and concluded that using AWN improves text similarity accuracy.

    Abdeljaber [17] proposed a string-based text similarity measure, the Longest Common Subsequence (LCS), for grading short answers to Arabic essay questions with the help of Arabic WordNet. The framework reportedly achieved its best results with an RMSE of 0.81 and a CORR of 0.94 on a dataset of 330 student answers. However, no feedback or corrections on the answers were given to students and teachers.

    Nael et al. [33] were the first researchers to develop a method for assessing Arabic short answers using deep learning approaches. They proposed AraScore after empirical studies with a baseline model, a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), and two transformer-based language models: Bidirectional Encoder Representations from Transformers (BERT) and Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). All tests were run on the Automated Student Assessment Prize (ASAP) short answer scoring dataset, which has 17,205 responses and approximately 1,600 answers for each question. They achieved state-of-the-art performance with a Quadratic Weighted Kappa (QWK) score of 0.78, demonstrating the strength and robustness of modern Arabic NLP techniques. However, the system was trained and tested on translated text that differs syntactically from authentic Arabic text.

    Badry et al. [34] attempted to create an Automatic Arabic Short Answer Grading (AASAG) model using semantic similarity techniques to gauge how semantically similar a student’s response is to the model response. The approach was applied to the Arabic Dataset for Automatic Short Answer Grading Evaluation (AR-ASAG), one of the few publicly accessible Arabic datasets, which includes 2133 pairs of student responses and model responses in formats such as txt, xml, and db. The efficacy of the approach was assessed through two tests using two weighting schemas: a local schema and a hybrid local and global schema. The hybrid local and global weight-based LSA approach produced better results than local weight-based LSA, with an F1-score of 82.82% and an RMSE of 0.798.

    To conclude, most of the aforementioned studies lack high accuracy, low processing time, or feedback. Therefore, missing feedback, limited correction of answers to Arabic essay questions for students and teachers, and high processing time are considered research gaps that should be addressed by an effective tool based on text similarity and ML algorithms, as in the proposed system.

    Table 1 summarizes state-of-the-art work on Arabic automatic essay grading systems. For each previous work, it states the text similarity approach, the dataset size, and the evaluation measures used.

    Table 1: Previous Arabic automatic essay grading systems

    3 Proposed System

    This section explains the components of the proposed system. The following subsections describe the dataset, text pre-processing, the grading engine, the adaptive fusion engine, and the report with score and feedback. Fig. 1 summarizes these components in a block diagram, and each component is discussed in detail in the following subsections.

    3.1 Student Answer and Model Answer (SA&MA) Dataset

    We used a dataset prepared in a sociology course to apply the similarity algorithms between SA and MA. It extends the dataset used in Shehab et al. [18], which was answered by secondary level 3 students, with a total of 270 student answers (27 questions × 10 student answers per question). Human graders evaluated the answers against the model answers, with scores ranging from 0 (completely incorrect) to 5 (completely correct). Each grader was unaware of the other grader’s corrections and grades. The mean of the two graders’ scores is taken as the reference standard for the grading process. Table 2 displays a sample question, model answer, student answers, and average scores, illustrating the attributes of the Arabic text in the sociology dataset.

    3.2 Text Pre-Processing

    This crucial stage in any model transforms raw data into a usable format, correcting errors in data collected from the real world and producing more accurate results and performance [35–37]. The text pre-processing component consists of two processes, data cleaning and NLP tasks, which are described in detail below.

    Figure 1: The proposed system’s flow diagram

    3.2.1 Data Cleaning

    Good results are more likely when the dataset has been well cleaned. Therefore, two data cleaning sub-processes are applied: rebuilding missing data and managing unwanted outliers.

    • Rebuilding missing data: The original dataset contains some missing data, such as a student’s missing score for a sociology question. These missing values are rebuilt using the mean strategy: the average of the available scores is computed and substituted for each null value.

    • Managing unwanted outliers: Questions, answers, and scores in the dataset that are considered outliers are removed manually, which can enhance the performance of the system.
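    Mean imputation for the missing scores can be sketched as follows. The function name `impute_with_mean` is hypothetical, and the sketch assumes the mean is taken over the observed scores of the same column (for example, all scores of one question), with missing values represented as `None`.

```python
from statistics import mean
from typing import List, Optional


def impute_with_mean(scores: List[Optional[float]]) -> List[float]:
    """Replace missing (None) scores with the mean of the observed scores."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores to average")
    fill = mean(observed)
    return [fill if s is None else s for s in scores]
```

    For example, `impute_with_mean([4.0, None, 2.0])` returns `[4.0, 3.0, 2.0]`.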

    3.2.2 Natural Language Processing (NLP) Tasks

    The NLP tasks define the scenarios of the grading process used when measuring the similarity values between SA and MA. The major NLP tasks are Raw, Tokenization, Stop-words, Stemming, and Stop-stem. Raw computes the similarity values without applying any NLP task. Tokenization splits a text sequence into sentences and then splits those sentences into individual tokens [4,18]. Stop-words must be eliminated from the text because they carry no notable significance and add no meaning to the text’s classification. Stemming removes all prefixes, suffixes, and infixes from a word, returning it to its original root. Finally, Stop-stem applies both the Stop-words and Stemming tasks. An example of these tasks was applied to sample Arabic data.
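    The four grading scenarios can be sketched as a small pipeline. The stop-word list and the prefix and suffix lists below are tiny hypothetical examples, and `light_stem` is a naive light stemmer that only strips one prefix and one suffix; it is not a root extractor like the stemming described above, so a real system would substitute a proper Arabic stemmer.

```python
import re
from typing import List

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هو", "هي", "أن"}
# Hypothetical affix lists for a naive light stemmer (NOT a root extractor).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")
SCENARIOS = ("raw", "stemming", "stop-words", "stop-stem")


def tokenize(text: str) -> List[str]:
    # \w matches Unicode letters, so Arabic words are kept as single tokens.
    return re.findall(r"\w+", text)


def light_stem(token: str) -> str:
    """Strip at most one known prefix and one known suffix, keeping >= 2 letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[: -len(s)]
            break
    return token


def preprocess(text: str, scenario: str = "raw") -> List[str]:
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    tokens = tokenize(text)
    if scenario in ("stop-words", "stop-stem"):
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if scenario in ("stemming", "stop-stem"):
        tokens = [light_stem(t) for t in tokens]
    return tokens
```

    Running every similarity measure on the output of each scenario is what turns one measure into four separate scores per answer.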

    3.3 Grading Engine

    The grading engine computes the similarity values between SA and MA using text similarity algorithms, including string-based and corpus-based algorithms, separately under various scenarios. To test the grading engine, thirteen string-based algorithms and two corpus-based algorithms are implemented.

    The proposed system employs the Bag-of-Words (BOW) model to compute sentence-to-sentence similarity instead of word-to-word similarity. The BOW model represents each sentence as a collection of words, disregarding word order and grammar rules.

    The similarity score between SA and MA is obtained by first computing the similarity score for each pair of words and then computing an overall score from these values [38]. To compute the overall similarity score, an N × M similarity matrix is constructed, where N is the word count of the MA and M is the word count of the SA. In the similarity matrix, each word of the MA is represented by a row and each word of the SA by a column. After the similarity matrix is constructed, the similarity score between SA and MA is computed using Eq. (1) [26].

    where f(w) is the relative frequency of a word in MA or SA, and Sim(w, SA) is computed either as the maximum similarity (MaxSim) or the average similarity (AvgSim). MaxSim is the largest similarity score between a word w and the words of the SA, whereas AvgSim is computed by summing the similarity values between w and the words of the SA and dividing the result by the word count of the SA. Sim(w, MA) is computed in the same way.
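    The sketch below assumes that Eq. (1) takes the common symmetric form: the final score is the average of the MA-to-SA and SA-to-MA directional scores, each a frequency-weighted mean of MaxSim or AvgSim values. The function names and the `word_sim` callback (any word-level measure returning a value in [0, 1]) are hypothetical, and both answers are assumed to be non-empty token lists.

```python
from typing import Callable, List, Sequence

WordSim = Callable[[str, str], float]


def similarity_matrix(ma: Sequence[str], sa: Sequence[str],
                      word_sim: WordSim) -> List[List[float]]:
    """N x M matrix: row i is the i-th MA word, column j the j-th SA word."""
    return [[word_sim(w_ma, w_sa) for w_sa in sa] for w_ma in ma]


def _aggregate(values: Sequence[float], mode: str) -> float:
    # MaxSim: best match; AvgSim: sum of similarities divided by the word count.
    return max(values) if mode == "max" else sum(values) / len(values)


def text_similarity(ma: Sequence[str], sa: Sequence[str],
                    word_sim: WordSim, mode: str = "max") -> float:
    m = similarity_matrix(ma, sa, word_sim)
    cols = list(zip(*m))  # column j: similarities of SA word j to every MA word
    # Averaging over token occurrences equals weighting each distinct word by
    # its relative frequency f(w): a word seen k times out of n adds k/n.
    ma_side = sum(_aggregate(row, mode) for row in m) / len(m)
    sa_side = sum(_aggregate(col, mode) for col in cols) / len(cols)
    return 0.5 * (ma_side + sa_side)
```

    For instance, with an exact-match `word_sim`, the answers `["a", "b"]` and `["a", "c"]` each have half of their words matched, so the score is 0.5.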

    3.4 Adaptive Fusion Engine

    Correlation measures are applied to determine how closely the model and the human grader agree when assigning grades. The experimental CORR and D values showed that more work was needed to raise the correlation to a satisfactory level. Thus, the adaptive fusion engine is added to combine the different similarity values obtained and enhance the efficiency of the proposed system.

    This component consists of the following processes: preparing the obtained scores, applying feature selection algorithms, validating the model’s results, applying machine learning algorithms, and measuring evaluation metrics. These processes are described in detail below.

    3.4.1 Features Preparation

    In the proposed system, sixty features are employed, derived from fifteen well-known string-based and corpus-based text similarity algorithms under four scenarios: Raw, Stemming, Stop-words, and Stop-stem. The scores obtained with a single text similarity algorithm under a single NLP scenario form one feature. Feature preparation involves two sub-processes: normalization and feature scaling.

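    • Normalization: Normalization transforms the source data into a different format that enables efficient data pre-processing. A min-max scaler is used to scale and transform each feature to the range 0 to 1 [39], as shown in Eq. (2):

    $$X_{std} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}, \qquad X_{scaled} = X_{std}\,(\max - \min) + \min \tag{2}$$

    where $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the feature, and min, max = feature range.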

    This produces a similarity value for each feature in the interval [0, 1], where 1 indicates identical answers and 0 indicates no meaningful resemblance between SA and MA.

    • Feature Scaling: Feature scaling distributes the independent features of the dataset over a predetermined range. The standard scaler is one of the most widely used feature scaling algorithms; it is important because it speeds up the learning process [40]. It is applied to each normalized input feature in the original dataset, which improves the results of the proposed framework. The mean and standard deviation of each normalized feature are computed to standardize it, and the new value $A_{scaled}$ of each sample $A$ is computed as shown in Eq. (3) [37]:

    $$A_{scaled} = \frac{A - \mu}{\sigma} \tag{3}$$

    where $\mu$ and $\sigma$ are the mean and standard deviation of the feature.
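    The two preparation steps can be sketched column-wise on a feature matrix whose 60 columns correspond to the (algorithm, scenario) pairs. This dependency-free sketch follows the formulas of scikit-learn’s `MinMaxScaler` and `StandardScaler` (population standard deviation, constant columns left unscaled); the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List

Matrix = List[List[float]]


def min_max_scale(X: Matrix, lo: float = 0.0, hi: float = 1.0) -> Matrix:
    """Column-wise min-max scaling to [lo, hi], as in Eq. (2)."""
    cols = list(zip(*X))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    out = []
    for row in X:
        new_row = []
        for v, mn, mx in zip(row, mins, maxs):
            std = 0.0 if mx == mn else (v - mn) / (mx - mn)  # constant column -> lo
            new_row.append(std * (hi - lo) + lo)
        out.append(new_row)
    return out


def standard_scale(X: Matrix) -> Matrix:
    """Column-wise z-score (A - mean) / std, as in Eq. (3)."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sds)] for row in X]
```

    In the described pipeline the two steps are chained, that is, `standard_scale(min_max_scale(X))`.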

    3.4.2 Feature Selection (FS) Algorithms

    Modern data analysis, visualization, and modeling involve massive amounts of high-dimensional data, so computational cost and memory usage can grow exponentially. FS is one of the most basic and frequently used techniques for addressing these issues. It eliminates irrelevant, noisy, or redundant features and retains the information relevant to the specific problem [41].

    There are three types of FS methods: filter, wrapper, and embedded. Filter methods use the characteristics of the data to select the most discriminative features in two steps: first, all features are ranked according to specific criteria; second, the highest-ranked features are chosen [42,43]. Wrapper methods evaluate features using the intended learning algorithm [42,43]. Finally, embedded methods select features while the modeling algorithm is running. Two popular FS algorithms, RFE and Boruta, are used in the proposed system to reduce the number of features and improve the results.

    • Recursive Feature Elimination (RFE) Algorithm

    The RFE algorithm was introduced by Guyon et al. [44]. It is popular because it is simple to set up and use and is effective. Two essential configuration parameters are set during implementation: the number of features to select, and the estimator, that is, the algorithm (with its parameters) used to rank the features.

    Features are selected recursively because, according to [45], the relative importance of each feature can change significantly when it is evaluated over a different subset of features during the stepwise elimination process. A final ranking is then constructed from the (inverse) order in which features are eliminated, and the FS process selects the first n features of this ranking. The pseudo-code of the RFE algorithm is shown in Algorithm 1.

    Algorithm 1: The pseudocode for the RFE
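    The elimination loop can be sketched in plain Python. This is a minimal sketch, not the authors’ Algorithm 1: to stay dependency-free it substitutes the absolute coefficients of an ordinary least-squares fit for the random-forest importances used in the study, and the function names are hypothetical. With scikit-learn, a comparable setup would be `sklearn.feature_selection.RFE(estimator=RandomForestRegressor(), n_features_to_select=...)`, whose `ranking_` attribute uses the same convention (rank 1 for selected features).

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
# Hypothetical importance interface: fit on (X, y), return one score per column.
Importance = Callable[[Matrix, Sequence[float]], List[float]]


def ols_abs_coef(X: Matrix, y: Sequence[float]) -> List[float]:
    """|coefficients| of a least-squares fit without intercept (assumes scaled X)."""
    p = len(X[0])
    # Normal equations (X^T X) w = X^T y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    w = [0.0] * p
    for c in reversed(range(p)):  # back substitution
        w[c] = (b[c] - sum(A[c][k] * w[k] for k in range(c + 1, p))) / A[c][c]
    return [abs(v) for v in w]


def rfe_ranking(X: Matrix, y: Sequence[float], importance: Importance,
                n_features_to_select: int, step: int = 1) -> List[int]:
    """RFE ranking: 1 = kept; larger ranks were eliminated earlier."""
    remaining = list(range(len(X[0])))
    eliminated: List[List[int]] = []
    while len(remaining) > n_features_to_select:
        subset = [[row[j] for j in remaining] for row in X]
        scores = importance(subset, y)  # refit on the current feature subset
        k = min(step, len(remaining) - n_features_to_select)
        worst = set(sorted(range(len(remaining)), key=lambda i: scores[i])[:k])
        eliminated.append([remaining[i] for i in sorted(worst)])
        remaining = [f for i, f in enumerate(remaining) if i not in worst]
    ranking = [1] * len(X[0])
    # Inverse elimination order: the last batch removed gets rank 2, and so on.
    for rank, batch in enumerate(reversed(eliminated), start=2):
        for f in batch:
            ranking[f] = rank
    return ranking
```

    Refitting inside the loop is the point of RFE: a feature that looks weak on all sixty columns may gain or lose importance once correlated columns have been removed.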

    • Boruta Algorithm

    The Boruta algorithm is a wrapper built around the random forest classification algorithm implemented in the R package randomForest [46]. It considers multivariable relationships and is effective for both classification and regression problems. Moreover, it is based on the idea that adding randomness to the system and collecting results from an ensemble of randomized samples reduces the misleading impact of random fluctuations and correlations [47].

    Algorithm 2 shows the working steps of the Boruta algorithm. According to [48], shadow attributes (shadowAttrs) are first created by extending the original dataset (D); the RF algorithm is then applied to the extended dataset (extendedDataset), and the Z score of each feature is computed (zScoreSet). Next, the maximum Z score among the shadow attributes (MZSA) is determined, and a hit is recorded for each feature that scores better than MZSA. A two-sided test of equality with MZSA is then performed: features with significantly higher importance than MZSA are considered important (appliedSet), and features with significantly lower importance are considered unimportant (canceledSet). Finally, the algorithm is repeated for several iterations until the predefined number of RF runs is reached or every feature has been assigned as important or unimportant.

    Algorithm 2: The pseudocode for Boruta

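    The shadow-attribute logic can be illustrated with a simplified, dependency-free sketch. It is Boruta-like rather than Boruta itself: it uses |Pearson r| as a stand-in for random-forest importance (so the real features’ importances do not change between iterations), and it makes a single two-sided binomial decision after a fixed number of iterations instead of Boruta’s iterative testing. All names are hypothetical; maintained implementations include the R package `Boruta` and the `BorutaPy` class of the Python `boruta` package.

```python
import random
from math import comb
from typing import Dict, List, Sequence


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """|Pearson r|, used here as a stand-in for random-forest importance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / (sxx * syy) ** 0.5


def boruta_like(X: List[List[float]], y: Sequence[float], n_iter: int = 50,
                alpha: float = 0.05, seed: int = 0) -> Dict[str, List[int]]:
    rng = random.Random(seed)
    cols = [list(c) for c in zip(*X)]
    real_imp = [abs_pearson(c, y) for c in cols]
    hits = [0] * len(cols)
    for _ in range(n_iter):
        shadows = []
        for c in cols:
            s = c[:]
            rng.shuffle(s)  # shadow attribute: same values, link to y destroyed
            shadows.append(s)
        mzsa = max(abs_pearson(s, y) for s in shadows)  # best shadow score
        for j, imp in enumerate(real_imp):
            if imp > mzsa:
                hits[j] += 1
    result: Dict[str, List[int]] = {"confirmed": [], "rejected": [], "tentative": []}
    for j, h in enumerate(hits):
        # Under H0 a feature beats the best shadow with probability 0.5 per run.
        p_high = sum(comb(n_iter, k) for k in range(h, n_iter + 1)) / 2 ** n_iter
        p_low = sum(comb(n_iter, k) for k in range(0, h + 1)) / 2 ** n_iter
        if p_high < alpha:
            result["confirmed"].append(j)
        elif p_low < alpha:
            result["rejected"].append(j)
        else:
            result["tentative"].append(j)
    return result
```

    Features that are neither clearly better nor clearly worse than the best shadow stay “tentative”; real Boruta keeps iterating on such features until a decision is reached or the run limit is hit.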

    As mentioned above, FS is a solution to high data dimensionality: it reduces the dimensionality of the data, improves model performance, and increases accuracy and reliability. With RFE, forty-four features are selected and sixteen are eliminated, using an RF regressor as the estimator and setting the number of features with the n_features_to_select parameter. With Boruta, thirty-seven features are selected and twenty-three are eliminated, using an RF regressor estimator and setting the number of RF estimators with the n_estimators parameter.

    3.4.3 Model Validation

    In the experiments, we used a widely used form of cross-validation: the random hold-out method. It estimates the effectiveness of ML algorithms on predictive modeling problems using data that were not used to train the model. It is quick and simple and can be applied to any supervised learning algorithm for regression or classification problems.

    This procedure divides the dataset into two groups. The first group, the training dataset, is used to fit the model, and the second group, the test dataset, is used to predict unseen values with the model [49]. The dataset was randomly divided using several training-testing ratios, and two were chosen because of their effectiveness: 70%–30% and 80%–20% for the training and testing phases, respectively. The aim was to find the training-testing split that maximizes the CORR and D values between manual and automatic grading of essay questions while reducing the teacher’s inaccuracy.
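    A random hold-out split can be sketched as a seeded shuffle followed by a single cut. The function name and the default seed are hypothetical; assuming all 270 answers are kept, `train_fraction=0.8` yields 216 training and 54 testing answers, and `0.7` yields 189 and 81.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], train_fraction: float,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle the indices once, then cut them into training and testing parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be in (0, 1)")
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split reproducible
    cut = round(len(rows) * train_fraction)
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]
```

    Fixing the seed matters when comparing models: every algorithm is then trained and tested on exactly the same partition.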

    3.4.4 Machine Learning (ML) Models

    Several ML algorithms were used, namely K-Nearest Neighbor (KNN), Lasso, Random Forest (RF), Decision Tree (DT), Bagging, and Adaboost. Table 3 shows the parameters of the ML methods used in the experiments.

    Table 3: The parameters of the ML methods

    3.4.5 Evaluation Metrics

    Five evaluation metrics, namely CORR, D, MSE, MAE, and RMSE, were used in the experiments to evaluate the performance of the models.

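    Pearson’s Correlation Coefficient: Pearson’s Correlation Coefficient (CORR) measures the extent to which the model and the grader agree when assigning grades. It is computed using Eq. (4):

    $$CORR = \frac{\sum_{i=1}^{N}(A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i=1}^{N}(A_i - \bar{A})^2}\,\sqrt{\sum_{i=1}^{N}(B_i - \bar{B})^2}} \tag{4}$$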

    Willmott’s index of agreement: Willmott’s index of agreement (D), which ranges from 0 to 1, is a standardized measure of the degree of model prediction error [50]. A value of 1 represents a perfect match, whereas 0 represents complete disagreement [50]. The index can detect proportional and additive differences between the means and variances of the observed and simulated data [51]. It is computed using Eq. (5):

    $$D = 1 - \frac{\sum_{i=1}^{N}(A_i - B_i)^2}{\sum_{i=1}^{N}\left(|B_i - \bar{A}| + |A_i - \bar{A}|\right)^2} \tag{5}$$

    Mean Square Error: Mean Square Error (MSE) quantifies the amount of error in a statistical model as the average squared difference between the observed and predicted values. It is never negative and is used in regression predictive modeling; a smaller MSE indicates a better fit. It is computed using Eq. (6):

    $$MSE = \frac{1}{N}\sum_{i=1}^{N}(A_i - B_i)^2 \tag{6}$$

    Mean Absolute Error: Mean Absolute Error (MAE) is the average of the absolute differences between the predicted and observed values. It is never negative, is used in regression predictive modeling, and is less sensitive to outliers than MSE. It is computed using Eq. (7):

    $$MAE = \frac{1}{N}\sum_{i=1}^{N}|A_i - B_i| \tag{7}$$

    Root Mean Square Error: Root Mean Square Error (RMSE) is a popular metric for measuring the difference between predicted and observed values. It is never negative, and lower values are better. It is computed using Eq. (8):

    $$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(A_i - B_i)^2} \tag{8}$$

    where $N$ is the number of values, $A_i$ is the $i$-th original (observed) value, $B_i$ is the $i$-th predicted value from regression, $A$ is the vector of observed values, $B$ is the vector of predicted values, and $\bar{A}$ and $\bar{B}$ are the averages of the observed and predicted values, respectively.
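    The five metrics can be checked with a short, dependency-free helper. The function name `evaluate` is hypothetical; `observed` holds the human scores $A_i$ and `predicted` the model outputs $B_i$, and both are assumed to be non-constant so that the correlation is defined.

```python
from math import sqrt
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """CORR, Willmott's D, MSE, MAE, and RMSE for paired observed/predicted scores."""
    n = len(observed)
    a_bar = sum(observed) / n
    b_bar = sum(predicted) / n
    pairs = list(zip(observed, predicted))
    sq_err = sum((a - b) ** 2 for a, b in pairs)
    abs_err = sum(abs(a - b) for a, b in pairs)
    cov = sum((a - a_bar) * (b - b_bar) for a, b in pairs)
    var_a = sum((a - a_bar) ** 2 for a in observed)
    var_b = sum((b - b_bar) ** 2 for b in predicted)
    # Denominator of Willmott's D: "potential error" around the observed mean.
    potential = sum((abs(b - a_bar) + abs(a - a_bar)) ** 2 for a, b in pairs)
    mse = sq_err / n
    return {
        "CORR": cov / sqrt(var_a * var_b),
        "D": 1 - sq_err / potential,
        "MSE": mse,
        "MAE": abs_err / n,
        "RMSE": sqrt(mse),
    }
```

    For example, observed scores [1, 2, 3, 4] against predictions [1, 2, 3, 5] give MSE = MAE = 0.25 and RMSE = 0.5.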

    4 Experimental Results and Discussion

    4.1 The Environment

    The experiments were run on a ZBook laptop with an Intel(R) Core(TM) i7-6600U CPU @ 2.60 GHz (2.81 GHz), 16 GB of Random Access Memory (RAM), and a 64-bit Operating System (OS). The Java IDE NetBeans (IDE 8.2) was used to implement the text similarity algorithms of the grading engine, and Anaconda (Jupyter, Python 3) was used to develop and evaluate the adaptive fusion engine. Furthermore, each text similarity technique was tested under four scenarios, Raw, Stemming, Stop-words, and Stop-stem, and after the FS methods were applied, the dataset was divided into training and testing sets using the random hold-out technique for model validation before running the ML algorithms.

    4.2 Results and Discussion

    The proposed system has two main components: the grading engine and the adaptive fusion engine. The following subsections present the results of each in detail.

    4.2.1 Results of the Grading Engine

    Table 4 reports the results of the similarity algorithms that achieved the most suitable CORR and D values; the remaining similarity algorithms are omitted.

    Table 4: The correlation results between SA and MA using some text similarity algorithms under various scenarios

    Based on Table 4, the DISCO2 similarity measure outperforms the other string-based and corpus-based similarity algorithms in every scenario, with CORR and D values, respectively, of 79.90% and 83.60% in the Raw task, 80.60% and 85.10% in the Stemming task, 79.10% and 82.60% in the Stop-words task, and 79.40% and 83.00% in the Stop-stem task. DISCO2 produces the best CORR and D values because it exploits groups of words with similar distributions, so a student answer that uses different but distributionally similar words can still be matched to the model answer.
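    To show how the four scenarios change what a similarity measure sees, here is a minimal pure-Python sketch. DISCO2 itself requires a precomputed distributional word space, so it is not reproduced; a simple token-overlap (Jaccard) measure is swapped in as an example of a string-based measure. The stop-word list and the light stemmer are hypothetical toy versions, not the resources used in the paper.

```python
# Hypothetical toy resources; a real system would use a full Arabic
# stop-word list and an established stemmer.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هو", "هي"}
PREFIXES = ("وال", "ال")            # "and the", "the"
SUFFIXES = ("ات", "ون", "ين", "ة")  # common plural / feminine endings


def light_stem(token):
    """Strip at most one known prefix and one known suffix, keeping >= 2 letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[:-len(s)]
            break
    return token


def preprocess(text, scenario):
    """Apply one of the four scenarios: raw, stemming, stop-words, stop-stem."""
    tokens = text.split()
    if scenario in ("stop-words", "stop-stem"):
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if scenario in ("stemming", "stop-stem"):
        tokens = [light_stem(t) for t in tokens]
    return tokens


def jaccard(tokens_a, tokens_b):
    """Token-overlap similarity |A & B| / |A | B| (a simple string-based measure)."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


model_answer = "المعلومات هي البيانات المعالجة"    # "Information is processed data"
student_answer = "البيانات المعالجة هي معلومات"    # same idea, different word forms
for scenario in ("raw", "stemming", "stop-words", "stop-stem"):
    score = jaccard(preprocess(model_answer, scenario),
                    preprocess(student_answer, scenario))
    print(f"{scenario}: {score:.2f}")
```

    In this toy example, stemming maps المعلومات and معلومات to the same stem, which raises the overlap from 0.60 (Raw) to 1.00 (Stemming), while removing the stop word alone lowers it to 0.50.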

    4.2.2 Results of the Adaptive Fusion Engine

    The ML algorithms were applied to the scores produced by the similarity algorithms to improve the correlation values and reduce the errors.

    In this subsection, five experiments are conducted to evaluate the proposed system. The evaluation metrics CORR, D, MSE, MAE, and RMSE are used to represent the models' performance. In addition, the runtime of model training and testing is measured to show how much time is saved by applying FS algorithms compared with using the dataset without FS.

    The five experiments are labeled A, B, C, D, and E. Experiment A reports the CORR and D metrics, Experiment B the MSE metric, Experiment C the MAE metric, Experiment D the RMSE metric, and Experiment E the training and testing runtimes. Each experiment is discussed for both the 70%–30% and the 80%–20% training-testing splits.

    Experiment A: CORR and D Evaluation Metrics of the Regressors

    According to Figs. 2 and 3, in (a), the best CORR and D values among the ML algorithms are achieved by the RF and Bagging algorithms, whether or not FS algorithms are applied to the dataset. On the dataset without FS, RF and Bagging achieve CORR values of 87.60% and 86.50% and D values of 92.10% and 91.70%, respectively. With the RFE algorithm, RF and Bagging achieve CORR values of 88.00% and 86.50% and D values of 92.00% and 91.70%, respectively. With the Boruta algorithm, RF and Bagging achieve CORR values of 88.20% and 86.80% and D values of 92.50% and 91.90%, respectively. In contrast, DT has the worst results, with CORR and D values of 72.00% and 75.00% on the dataset without FS, 71.00% and 74.00% with the RFE algorithm, and 74.10% and 77.20% with the Boruta algorithm, respectively.

    Figure 2: (a) Results of the CORR (70%–30%). (b) Results of the CORR (80%–20%)

    Figure 3: (a) Results of the D (70%–30%). (b) Results of the D (80%–20%)

    In (b), the RF and Bagging ensemble methods are the best algorithms in terms of CORR and D, whether or not FS algorithms are applied. On the dataset without FS, RF achieves CORR and D values of 91.30% and 94.20%, and Bagging achieves 90.60% and 93.90%. With the RFE algorithm, RF and Bagging achieve CORR values of 91.00% and 90.60%, respectively, and the same D value of 94.00%. With the Boruta algorithm, RF and Bagging achieve CORR values of 90.90% and 90.40% and D values of 94.00% and 93.80%, respectively. In contrast, DT has the worst results, with CORR and D values of 82.00% and 85.00%, respectively, both on the dataset without FS and with the RFE algorithm, and 76.60% and 78.40%, respectively, with the Boruta algorithm.

    Experiment B: MSE Evaluation Metric of the Regressors

    According to Fig. 4, in (a), the best MSE values among the ML algorithms are achieved by the RF and Bagging algorithms, whether or not FS algorithms are applied. On the dataset without FS, RF and Bagging achieve MSE values of 0.029 and 0.030, respectively. With the RFE algorithm, RF and Bagging achieve MSE values of 0.028 and 0.030, respectively. With the Boruta algorithm, RF and Bagging achieve MSE values of 0.027 and 0.030, respectively. In contrast, DT has the worst results, with MSE values of 0.062 on the dataset without FS, 0.063 with the RFE algorithm, and 0.056 with the Boruta algorithm.

    Figure 4: (a) Results of the MSE (70%–30%). (b) Results of the MSE (80%–20%)

    In (b), the RF and Bagging ensemble methods are again the best algorithms in terms of MSE. On the dataset without FS, RF achieves an MSE of 0.023 and Bagging an MSE of 0.025. With the RFE algorithm, RF and Bagging both achieve an MSE of 0.024. With the Boruta algorithm, RF and Bagging achieve MSE values of 0.024 and 0.025, respectively. In contrast, DT records the worst results, with MSE values of 0.043 on the dataset without FS, 0.044 with the RFE algorithm, and 0.056 with the Boruta algorithm.

    Experiment C: MAE Evaluation Metric of the Regressors

    According to Fig. 5, in (a), the best MAE values among the ML algorithms are achieved by the RF and Bagging algorithms, whether or not FS algorithms are applied. On the dataset without FS, RF and Bagging achieve MAE values of 0.114 and 0.115, respectively. With the RFE algorithm, RF and Bagging achieve MAE values of 0.112 and 0.115, respectively. With the Boruta algorithm, RF and Bagging achieve MAE values of 0.110 and 0.114, respectively. In contrast, DT has the worst results, with an MAE of 0.173 both on the dataset without FS and with the RFE algorithm, and 0.167 with the Boruta algorithm.

    In (b), the RF and Bagging ensemble methods are the best algorithms in terms of MAE. On the dataset without FS, RF achieves an MAE of 0.106 and Bagging an MAE of 0.108. With both the RFE and Boruta algorithms, RF and Bagging achieve the same MAE of 0.107. DT has the worst results, with MAE values of 0.142 on the dataset without FS, 0.145 with the RFE algorithm, and 0.155 with the Boruta algorithm.

    The MAE is less suitable as a training loss than as an evaluation metric: its gradient with respect to a prediction depends only on the sign of $A_i - B_i$, not on the size of the error, so the gradient magnitude stays large even when the error is small, and the absolute value is not differentiable where the error is zero. During the training of ML models, this lack of smooth differentiability can cause convergence problems.
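    A small illustration of this point, treating each metric as a per-sample loss of the prediction $B_i$: the MAE (sub)gradient keeps magnitude 1 however small the error becomes, whereas the MSE gradient $-2(A_i - B_i)$ shrinks with the error. The helper names below are hypothetical.

```python
def mae_grad(a, b):
    """Subgradient of |a - b| with respect to the prediction b."""
    diff = a - b
    if diff > 0:
        return -1.0
    if diff < 0:
        return 1.0
    return 0.0  # |x| has no derivative at 0; 0 is one valid subgradient


def mse_grad(a, b):
    """Gradient of (a - b) ** 2 with respect to the prediction b."""
    return -2.0 * (a - b)


observed = 1.0
for error in (0.5, 0.05, 0.005):
    pred = observed - error
    print(f"error={error}: MAE grad={mae_grad(observed, pred):+.3f}, "
          f"MSE grad={mse_grad(observed, pred):+.3f}")
```

    The MAE gradient stays at -1 for all three errors, while the MSE gradient falls from -1 to about -0.01, which lets gradient-based training take smaller steps as it approaches the target.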

    Figure 5: (a) Results of the MAE (70%–30%). (b) Results of the MAE (80%–20%)

    Experiment D: RMSE Evaluation Metric of the Regressors

    Fig. 6(a) shows that the best RMSE values among the ML algorithms are achieved by the RF and Bagging algorithms, whether or not FS algorithms are applied. On the dataset without FS, RF and Bagging achieve RMSE values of 0.169 and 0.174, respectively. With the RFE algorithm, RF and Bagging achieve RMSE values of 0.168 and 0.175, respectively. With the Boruta algorithm, RF and Bagging achieve RMSE values of 0.165 and 0.173, respectively. In contrast, DT has the worst results, with RMSE values of 0.249 on the dataset without FS, 0.250 with the RFE algorithm, and 0.237 with the Boruta algorithm.

    Figure 6: (a) Results of the RMSE (70%–30%). (b) Results of the RMSE (80%–20%)

    In (b), the RF and Bagging ensemble methods are the best algorithms in terms of RMSE. On the dataset without FS, RF achieves an RMSE of 0.153 and Bagging an RMSE of 0.157. With the RFE algorithm, RF and Bagging achieve RMSE values of 0.154 and 0.156, respectively. With the Boruta algorithm, RF and Bagging achieve RMSE values of 0.155 and 0.158, respectively. In contrast, DT records the worst results, with RMSE values of 0.207 on the dataset without FS, 0.209 with the RFE algorithm, and 0.237 with the Boruta algorithm.

    RF and Bagging outperform the other ML algorithms in the experiments above for several reasons: they turn weak learners into a strong learner by combining N learners; they improve accuracy in regression problems; they can automatically balance datasets; they work effectively with continuous values; and they can automatically detect missing values in data. Furthermore, RF can handle large amounts of data with thousands of features, and it considers only a random subset of the features when splitting each tree node, whereas Bagging keeps all features available to every tree. This decorrelates the trees in RF and helps explain why RF's results are better than those of Bagging. DT, by contrast, records the worst results because of its instability: any change in the dataset can change the structure of the optimal decision tree. Moreover, a single tree is prone to overfitting, which makes its predictions frequently erroneous.
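    To make the Bagging versus RF distinction concrete, here is a minimal pure-Python sketch that uses regression stumps (depth-1 trees) as hypothetical weak learners. Both variants train each learner on a bootstrap sample and average the predictions; the RF-style variant additionally restricts each learner to a random subset of features. Because a stump has only one split, the per-split feature subset of a real Random Forest coincides here with a per-tree subset. The function names and data are hypothetical, and this is not the implementation used in the experiments.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree (stump), searching only the given features."""
    best = None  # (sum of squared errors, feature, threshold, left mean, right mean)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split: predict the sample mean everywhere
        m = mean(y)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_ensemble(X, y, n_estimators=25, max_features=None, seed=0):
    """Bagging if max_features is None; otherwise each stump sees a random
    subset of max_features features (the Random Forest idea)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    learners = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = range(p) if max_features is None else rng.sample(range(p), max_features)
        learners.append(fit_stump(Xb, yb, feats))
    # Aggregate by averaging the individual predictions (regression).
    return lambda row: mean(learner(row) for learner in learners)


# Hypothetical data: three similarity scores per answer as features, grade as target.
X = [[i / 10, (i % 3) / 3, 0.5] for i in range(10)]
y = [i / 10 for i in range(10)]
bagging = fit_ensemble(X, y)
forest = fit_ensemble(X, y, max_features=1)
print([round(bagging(row), 2) for row in X])
print([round(forest(row), 2) for row in X])
```

    Setting `max_features=None` gives plain Bagging; passing a small `max_features` forces learners onto different features, which decorrelates them, the property credited above for RF's edge over Bagging.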

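    Regarding the FS algorithms, RFE and Boruta reduced the number of features while keeping the evaluation metrics close to those obtained with the original dataset. RFE achieves these results because it is simple to use and effective at identifying the relevant features of the dataset: it repeatedly fits the model and removes the least important features. Boruta is particularly useful when a dataset with many variables is used for model building, because it compares the importance of each real feature with that of randomly shuffled copies ("shadow" features) and keeps only the features that are consistently more important than these copies.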
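    A simplified pure-Python sketch of the shadow-feature idea behind Boruta follows. As an assumption for illustration, |Pearson r| with the target stands in for the Random Forest importance that Boruta actually uses, and a fixed hit-rate threshold replaces Boruta's statistical test. The function names and data are hypothetical.

```python
import random


def abs_corr(x, y):
    """|Pearson r| between a feature column and the target (stand-in importance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def boruta_like(columns, y, n_iter=50, min_hit_rate=0.9, seed=0):
    """Return indices of features whose importance beats the best shadow
    feature in at least min_hit_rate of the iterations (simplified Boruta)."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        shadows = []
        for col in columns:
            shadow = col[:]      # copy the real feature...
            rng.shuffle(shadow)  # ...and shuffle it to break any link with y
            shadows.append(shadow)
        best_shadow = max(abs_corr(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if abs_corr(col, y) > best_shadow:
                hits[j] += 1
    return [j for j, h in enumerate(hits) if h / n_iter >= min_hit_rate]


# Hypothetical feature columns (e.g. scores from two similarity measures) and grades.
y = [i / 19 for i in range(20)]
informative = [v + 0.01 * (i % 2) for i, v in enumerate(y)]  # tracks the grade closely
noise = [(-1) ** i for i in range(20)]                       # unrelated to the grade
print(boruta_like([informative, noise], y))
```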

    Experiment E: Runtime of the Regressors

    Reducing runtime is an important factor when executing the models, because it lowers costs and improves productivity. Figs. 7 and 8 show the training and testing runtimes, in milliseconds (ms), of the models on the dataset before and after applying the FS algorithms. Both the training and testing runtimes decreased after applying FS, and the Boruta algorithm achieved the lowest runtime in all phases, both training and testing.
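    The runtimes in Figs. 7 and 8 are reported in milliseconds. A minimal way to measure such wall-clock times in Python is `time.perf_counter`; the training function below is a hypothetical stand-in whose cost grows with the number of features, mimicking the effect of FS on runtime. The paper does not state which timing method it used.

```python
import time


def timed_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def train_stub(rows):
    """Hypothetical stand-in for model training: work grows with rows x features."""
    return sum(sum(r) for r in rows)


full = [[0.5] * 40 for _ in range(2000)]  # e.g. all features
reduced = [r[:10] for r in full]          # e.g. after feature selection
_, t_full = timed_ms(train_stub, full)
_, t_reduced = timed_ms(train_stub, reduced)
print(f"full: {t_full:.2f} ms, reduced: {t_reduced:.2f} ms")
```

    Single timings are noisy, so in practice each measurement would be repeated and averaged before comparing runs with and without FS.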

    Figure 7: (a) Results of the training time (70%). (b) Results of the training time (80%)

    To conclude the discussion, when comparing the training-testing splits to find the one that maximizes the correlation value, the 80%–20% split achieved better performance than the 70%–30% split. Nevertheless, the choice of random hold-out ratio is left to the user. The obtained findings also demonstrate that the proposed system outperforms [18] and can accurately predict student answer scores.

    Figure 8: (a) Results of the testing time (30%). (b) Results of the testing time (20%)

    5 Conclusion and Future Work

    The essay question is a crucial type of question that can enhance students' writing abilities and reveal their comprehension of the subject matter. However, manually grading essay responses can be arduous for instructors, especially with large classes and frequent writing assignments. Automatic assessment technology, such as Automated Essay Scoring (AES) systems, offers a viable solution to this issue. These systems automatically grade and assess student responses based on a set of questions on which they have been trained and provide feedback to the educational institution. This study developed and evaluated an Arabic AES system that uses NLP techniques and machine learning algorithms to grade essay responses. The system has two key components: the grading engine and the adaptive fusion engine. The grading engine uses similarity algorithms to compare student responses with model answers, while the adaptive fusion engine employs FS and machine learning algorithms to improve grading accuracy. The experimental results revealed that the DISCO2 similarity measure and the RF algorithm outperformed the other measures and algorithms, respectively. The study demonstrated that FS and machine learning algorithms can provide instructors with a precise and efficient solution for grading student essays. Future research will expand the dataset, assess the system's applicability to various subjects, and use knowledge-based measures. Moreover, to improve the accuracy and generalizability of the proposed system, pre-trained and fine-tuned transformer models will be considered in future work.

    Acknowledgement: The authors thank Prof. Amira Rezk for the guidance, encouragement, and unlimited support from start to end, and for the great and valuable ideas that enhanced this study.

    Funding Statement: The authors received no specific funding for this study.

    Author Contributions: Study conception and design: Nourmeen Lotfy, Abdulaziz Shehab; data collection: Nourmeen Lotfy; analysis and interpretation of results: Nourmeen Lotfy, Abdulaziz Shehab, Mohammed Elhoseny; draft manuscript preparation: Nourmeen Lotfy, Abdulaziz Shehab, Ahmed Abu-Elfetouh. All authors reviewed the results and approved the final version of the manuscript.

    Availability of Data and Materials: The data that support the findings of this study are available from the corresponding author, Abdulaziz Shehab, upon reasonable request.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
