
    An Intelligent Approach for Intrusion Detection in Industrial Control System

Computers, Materials & Continua, 2023, Issue 11

    Adel Alkhalil,Abdulaziz Aljaloud,Diaa Uliyan,Mohammed Altameemi,Magdy Abdelrhman,Yaser Altameemi,Aakash Ahmad and Romany Fouad Mansour

    1Department of Information and Computer Science,College of Computer Science and Engineering,University of Ha’il,Ha’il,81481,Saudi Arabia

    2Applied College,University of Ha’il,Ha’il,81481,Saudi Arabia

    3College of Education,New Valley University,El-Kharga,72511,Egypt

    4College of Art,University of Ha’il,Ha’il,81481,Saudi Arabia

    5School of Computing and Communications,Lancaster University,Leipzig,04109,Germany

    6College of Science,New Valley University,El-Kharga,72511,Egypt

ABSTRACT Supervisory control and data acquisition (SCADA) systems are computer systems that gather and analyze real-time data; distributed control systems are specially designed automated control systems consisting of geographically distributed control elements; and smaller control systems, such as programmable logic controllers, are industrial solid-state computers that monitor inputs and outputs and make logic-based decisions. In recent years, the security of industrial control systems has received considerable attention. With the advancement of information technologies, the risk of cyberattacks on industrial control systems has increased drastically. Because these systems are so inextricably tied to human life, any damage to them can have devastating consequences. To provide an efficient solution to such problems, this paper proposes a new approach to intrusion detection. First, the important features in the dataset are determined by the difference between the distributions of unlabeled and positive data, which is deployed for the learning process. Then, a prior estimation of the class is proposed based on a support vector machine. Simulation results show that the proposed approach has better anomaly detection performance than existing algorithms.

KEYWORDS Industrial control system; anomaly detection; intrusion detection; system protection

    1 Introduction

The industrial control system is a control system for industrial production and an important part of national infrastructure. As the core control equipment of national infrastructure, it is widely used in key fields such as water conservancy, nuclear power, and energy; its security is therefore related to the national economy and people's livelihood [1].

With the rapid growth of industrial control systems, which are now extensively employed, security problems are becoming more common. The "Stuxnet" virus outbreak in 2010 directly caused substantial damage to the centrifuges of Iran's nuclear plants. After the Stuxnet virus spread, the industrial control system became one of the primary targets of attackers [2]. The global WannaCry ransomware epidemic in 2017 made use of the high-risk "EternalBlue" vulnerability to spread globally, disrupting major businesses such as energy, transportation, and communications in many nations [3]. In March 2018, the United States Computer Emergency Readiness Team issued security warning TA18-074A, which detailed a cyber-attack on a power facility in the United States by Russian hackers. The goal of the attack was to gather intelligence and implant programs into computers for subsequent attacks, resulting in massive losses for the power plant [4]. In 2019, a network attack targeted the computer system control center of the Guri Hydropower Station, Venezuela's largest power plant, creating a nationwide power outage and affecting around 30 million people. The Guri Hydropower Station was attacked again in July of the same year, resulting in widespread outages in 16 states, including Lagas [5]. Because industrial control systems are such an important aspect of national infrastructure, attacks on them frequently result in more significant consequences and bigger economic losses.

Given the security dangers to industrial control systems, using intrusion detection measures for defense is a critical step. Various aspects of intrusion detection for industrial control systems are now being explored, and intelligent detection of intrusions into industrial control systems is accomplished by combining machine-learning models. Among the different machine learning models, the one-class support vector machine (OCSVM) model requires just one sort of training data, allowing it to detect unknown intrusions; as a result, it has become a popular approach for intrusion detection in industrial control systems. Due to the lack of negative example training data, the trained model tends to have a high false positive rate (FPR). This work therefore provides a positive unlabeled learning model for intrusion detection, trains the model using normal traffic as positive labeled data, and retains the model's ability to detect unknown intrusions. Because the proposed learning model employs both a class of labeled data and the unlabeled data to be identified for model training, its classification performance is frequently superior to that of an anomaly detection model.

    The main contributions of this paper can be summarized as follows:

• Because the trained model will have a high false positive rate (FPR) due to a lack of negative example training data, this work supplies a positive unlabeled learning model for intrusion detection, trains the model using ordinary traffic as positive labeled data, and retains the model's ability to detect unknown intrusions. Because the proposed learning model uses both labeled and unlabeled data for training, its classification performance is typically superior to that of an anomaly detection model.

• This paper analyzes the class prior probability estimation algorithm based on the positive label frequency, divides out a reliable positive example set through the one-class SVM model, improves the calculation method of the positive label frequency, and keeps the error of the prior probability estimate small.

• Based on the concealment characteristics of industrial control system attacks, positive unlabeled learning is applied to the intrusion detection of industrial control systems, a neural network is built for learning, the classification model is trained using only normal traffic as labeled data, and experiments on public datasets confirm the model's efficacy.

This paper is structured as follows. Section 2 presents the research status of intrusion detection and positive unlabeled learning in industrial control systems. Section 3 presents the main research content of this paper. Section 4 verifies the effectiveness of the proposed algorithm through experiments. Section 5 summarizes the article.

    1.1 Symbols and Notation

    Table 1 lists the symbols and corresponding descriptions.

    Table 1: Symbols and description

    2 Related Work

    2.1 Overview of Industrial Control System

From top to bottom, the industrial control network layer model is separated into five layers: the enterprise resource layer, production management layer, process monitoring layer, field control layer, and field device layer; the real-time requirements differ between layers. As indicated in Fig. 1, the enterprise resource layer primarily consists of the functional units of the ERP system that are used to offer decision-making operation methods for the employees of the enterprise decision-making layer.

    The field device layer is the lowest level of industrial control and contains certain field devices such as sensors,monitors,and other execution equipment units that are used to perceive and run the production process.

    Field devices are monitored and controlled using the process monitoring layer and the field control layer.SCADA and HMI are the primary components of the process monitoring layer.SCADA may monitor and operate on-site operational equipment to perform data acquisition,equipment control,measurement,parameter modification,and other operations.HMI stands for human-machine interface,and it is used to communicate information between the system and the user.The on-site control layer is mostly PLC,which communicates with the HMI,receives control orders and query requests and communicates with field devices,controlling them by delivering operation instructions.

    The production management layer includes MES and MOMS,which are used to manage the production process,such as manufacturing data management,production scheduling management,etc.

    Figure 1:Industrial control system architecture

    The top layer is the enterprise resource layer,where the enterprise resource planning(ERP)system manages core business processes,such as production or product planning,material management,and financial conditions.

    2.2 Features of Intrusion Detection in Industrial Control System

    There are significant differences between the intrusion detection of industrial control systems and the intrusion detection of the Internet.Due to the particularity of the environment of industrial control systems,it has unique characteristics[6]:

• High real-time performance. Industrial control systems are usually deployed in fields such as electric power and nuclear energy, and the systems have high real-time requirements, so intrusion detection also requires high real-time performance.

• Limited resources of industrial control equipment. Industrial control systems contain a large number of sensors and actuators that perform specific operations. To reduce costs, their computing and storage resources are usually very limited.

• Devices are difficult to update and reboot. The industrial control system is closely connected with the physical world, and it is usually impossible to suspend work; otherwise, serious harm will be caused to the entire industrial control system, personnel, and the environment.

    Based on the characteristics of the above industrial control system,higher requirements are put forward for the intrusion detection system:

• Real-time. Industrial control systems have higher real-time requirements for intrusion detection, requiring intrusion detection systems to use real-time information from industrial control systems for detection.

• Limited resources. The limited resources of the industrial control system restrict the methods of intrusion detection and require the intrusion detection model to have low resource consumption. The time complexity of some algorithms based on deep learning is relatively high; regardless of the training time, some deep neural network models have a very large number of parameters and complex network structures, and the required training and prediction time is also longer. With limited resources, such complex deep neural network models are difficult to apply to intrusion detection in industrial control systems. Therefore, when applying a neural network model to the intrusion detection of industrial control systems, it is necessary to focus on the complexity of the model and make the neural network structure as simple as possible while ensuring accuracy.

• Devices are difficult to update and restart. This feature constrains the intrusion detection model in two ways. First, because it is difficult to update the model on the equipment, the model needs good generalization performance; that is, a model trained on the training data must also perform well when applied to real data. Second is the requirement on indicators: since the device cannot be restarted or suspended, intrusion detection must have a high precision rate; that is, it is better to miss an intrusion than to raise a false alarm.

The above are the characteristics of industrial control systems. When performing intrusion detection, it is usually necessary to analyze the system's traffic. Industrial data is characterized by high dimensionality and strong correlation, which increases the training time of the intrusion detection model. Therefore, it is necessary to perform feature extraction on the industrial data to reduce the complexity of subsequent data modeling and processing.

    Based on the requirements of high precision rate and low resource consumption of industrial control systems,as well as the difficulty of obtaining data labels,this paper constructs a shallow neural network for PU learning,which is used for intrusion detection of industrial control systems.At the same time,given the high dimensionality and strong correlation of industrial control system data,a feature selection algorithm based on PU learning is proposed for data dimensionality reduction.

    2.3 Literature of Intrusion Detection Methods in Industrial Control Systems

Industrial control system intrusion detection can be divided into traffic-based detection, device state-based detection, and protocol-based detection. In terms of traffic, features are constructed from the real traffic of the industrial control system, such as flow duration, port, and other information, and then combined with machine learning models for detection, such as the one-class support vector machine (SVM) [7]. In terms of equipment status, reference [8] proposed an intrusion detection method based on the CUSUM algorithm. In this method, the difference between the actual value obtained by the sensor and the value predicted by the model is used as the statistical sequence, the offset is designed according to the 3σ principle, and a constant determines the threshold; experiments verified that the method can effectively detect deviation attacks and geometric attacks. In terms of protocols, some industrial control protocols are open, and detection rules can be formulated according to the specifications of these protocols to detect specific industrial control protocols, such as the Modbus protocol [8,9].

With the rapid development of machine learning and artificial intelligence, their influence has gradually extended to the field of intrusion detection, and a large number of machine learning models are used for intrusion detection. The applicable machine learning algorithms can be divided into traditional classification and clustering models [10,11], ensemble models, anomaly detection models, and neural networks. Due to the rapid development of neural networks and their better classification performance compared with traditional machine learning models, intrusion detection based on traditional classification models is gradually cooling down. The ensemble model and the anomaly detection model each have their own characteristics. An ensemble model, such as random forest [12], obtains better classification performance by integrating multiple base classifiers. The advantages of anomaly detection models such as OCSVM are: 1) they can detect unknown intrusions; 2) only background traffic is required as training data. With the deepening of research, neural networks such as autoencoders are also used for unsupervised anomaly detection [13].

The most commonly used anomaly detection algorithm for intrusion detection is OCSVM. Reference [14] investigated the application of the one-class SVM algorithm to the intrusion detection of industrial control systems. On the network layer and the transport layer, the OCSVM algorithm is used for TCP/IP traffic anomaly detection of the SCADA system. On the application layer, the OCSVM model is trained on the normal communication flow of Modbus TCP for intrusion detection. The paper also pointed out three main problems of OCSVM anomaly detection in industrial control systems: feature construction, parameter optimization, and a high false positive rate.

Dynamic control center architectures are vulnerable to a variety of potentially active and passive cyber-attacks, as already explored for various industrial control protocols such as IEEE C37.118, IEC 61850, DNP3, and IEC-104, which put a variety of power system assets, such as RTUs, PMUs, protection systems, or relays, as well as control room servers, in danger. MITM attacks, data spoofing (such as inserting fake commands to trip lines or manipulating PMU measurement information), eavesdropping, or reconnaissance assaults are examples of common active or passive attack types. Intrusion detection systems (IDSs) enable the detection of unlawful activities or occurrences in ICT systems and reduce cyberattacks on vital infrastructures as a common defensive strategy. To identify cyber-attacks occurring during PMU data transmission based on the IEEE C37.118 protocol, specification-based NIDSs with a variety of stateful or stateless deep packet inspections have been provided.

    Compared with classic anomaly detection models such as one-class SVM,the deep learning model has improved the detection rate,but it takes longer to train the model.

    Table 2 summarizes the work related to intrusion detection of industrial control systems based on machine learning in recent years.From the analysis of related work,the research on intrusion detection of industrial control systems has the following trends:

    Table 2: Summary of various state-of-the-art methods

• A tendency toward anomaly detection. The intrusion detection of industrial control systems is more often treated as an anomaly detection problem. In terms of model selection, a one-class classification model such as the one-class SVM or an unsupervised model such as the autoencoder (AE) is preferred for identification [15].

• A tendency toward high precision. In recent research, some researchers tend to optimize model parameters through parameter optimization algorithms such as Particle Swarm Optimization (PSO) and the Gravitational Search Algorithm (GSA), so that the model has better classification performance.

• A tendency toward real-time, efficient models. Due to limited resources, the industrial control system requires the model to have a small computational cost, and related work on intrusion detection for industrial control systems pays more attention to models with low computation consumption. At the same time, most models are trained after feature selection or feature extraction methods, such as principal component analysis (PCA) and the Fisher score, for dimensionality reduction, thereby reducing the time and computation required for model training. The long short-term memory network (LSTM) is also compared.

    2.4 Positive Unlabeled Learning

Positive unlabeled learning is a neural network-based anomaly detection approach that estimates the binary classification error using positive and unlabeled data sets, allowing the positive unlabeled learning model to attain classification performance similar to that of a binary classification model. Because positive unlabeled learning requires training the model with both positive and unlabeled data sets, the mixing ratio of positive and negative samples in the unlabeled data set must first be estimated before applying it to positive unlabeled learning [31,32]; this is known as class prior estimation. The main approach to class prior probability estimation starts from the distribution of the positive and unlabeled data sets: the distribution of the unlabeled data set is a mixture of the positive data distribution and the negative data distribution, so the class prior probability can be obtained by comparing the distributions of the positive and unlabeled data sets [33-35]. In addition, class prior probability estimation based on the positive label frequency is one of the most advanced approaches at present. Reference [36] proposed the TIcE algorithm, which divides out reliable positive examples in the unlabeled data set to estimate the frequency of positive labels and is currently the algorithm with the lowest time complexity.

Reference [37] first theoretically analyzed the positive unlabeled learning problem, compared positive unlabeled learning with the binary classification model, and estimated the binary classification loss under the condition of a known class prior probability π; in theory, the same decision surface as the binary classification model can be obtained. This method is called uPU (unbiased positive unlabeled learning). Because the loss function of the uPU model needs to satisfy a symmetry condition, reference [38] continued this research, gave a method for applying loss functions that do not satisfy the symmetry condition to uPU, and verified that non-convex and convex loss functions achieve similar precision.

    Reference[39]further compared the positive unlabeled learning model with the binary classification model and analyzed the reasons why the positive unlabeled learning model performed better than the binary classification model in some cases.

Reference [40] proposed the nnPU (positive unlabeled learning with a non-negative risk estimator) algorithm to solve the problem that uPU is prone to overfitting. Based on uPU, it changes the way the binary classification loss is estimated: it ensures that the estimated negative example loss is always non-negative, thereby avoiding the problems caused by a negative estimated loss, and it was shown that the performance of nnPU is better than that of uPU. Finally, reference [41] summarized existing positive unlabeled learning work and analyzed seven main problems of positive unlabeled learning, including the assumptions of positive unlabeled learning, evaluation indicators, main models, and class priors.

    3 Proposed Intrusion Detection Learning Mechanism

The problem of intrusion detection in industrial control systems has received the attention of scholars as an anomaly detection problem, but some classic anomaly detection algorithms, such as the one-class SVM algorithm, have a high false positive rate, and their classification performance has a large gap compared with the binary classification model. This paper proposes to use positive unlabeled learning for intrusion detection. This method has been shown to have classification performance close to binary classification while requiring only one type of labeled training data, like the one-class SVM model.

The intrusion detection process based on positive unlabeled learning is shown in Fig. 2. In feature engineering, it is necessary to analyze features using the positive labeled data and the unlabeled data, select key features, reduce the data dimensionality, and reduce the impact of irrelevant features on the model's classification performance. At the same time, the class prior probability of positive unlabeled learning is used as prior knowledge, which needs to be processed alongside feature engineering. By analyzing the positive data and the unlabeled data, a model is built to estimate the class prior probability of the unlabeled data set. Then the positive labeled data, unlabeled data, and class prior probability after feature selection are combined to train the positive unlabeled learning model, and finally the model outputs classification labels for the unlabeled data set.

    Figure 2:Proposed algorithm flowchart

Based on the above process, the main research content of this part is divided into three parts. First, a feature selection algorithm based on positive unlabeled learning is explored, and the importance of features is analyzed based on positive labeled data and unlabeled data. Second, a class prior probability estimation algorithm is studied to improve the accuracy of class prior probability estimation and provide important prior knowledge for positive unlabeled learning. Finally, based on the data after feature selection and the estimated class prior probability, the classification model is trained by positive unlabeled learning.

    In this paper,the problems of anomaly detection are answered in a targeted manner:

• In terms of feature engineering, this paper studies a feature importance calculation method based on positive unlabeled learning, which can be used as a feature selection metric for the feature selection of industrial control system data;

• In terms of resource constraints and real-time issues in industrial control systems, this paper chooses a shallow neural network, which requires less storage and computing resources and meets the needs of industrial control systems;

• In terms of the false alarm rate, positive unlabeled learning has been shown to perform similarly to the binary classification model and to have a higher accuracy rate than unsupervised anomaly detection models.

    3.1 Feature Importance

In industrial control systems, data has the characteristics of high dimensionality and strong correlation. Many machine learning problems become difficult when the data dimensionality is high, a phenomenon known as the curse of dimensionality. Feature selection is an important part of feature engineering; its principle is to extract key features from all features to achieve dimensionality reduction. Feature selection methods can be divided into two categories: wrapper-based and filter-based. Wrapper-based feature selection usually selects a base model for multiple rounds of training and gradually screens out redundant features according to the classification performance of the trained model. Filter-based feature selection calculates the importance of features, sets a threshold to filter out irrelevant features, and further filters out redundant features through correlation analysis.

In positive unlabeled learning, since there is only one class of labeled samples, it is difficult to evaluate the performance of a wrapper-based model. Therefore, the filter-based feature selection method is used in this paper. Commonly used feature importance calculation methods are shown in Table 3.

    Table 3: Feature determination methods

The importance score of the filter-based feature selection method is calculated by evaluating the correlation between features and labels, and features that have an obvious correlation with the target category are considered key features. However, in positive unlabeled learning, there is only one class of labeled samples, and the feature importance calculation methods of the binary classification setting cannot be used directly. Therefore, it is necessary to find a feature importance calculation method suitable for positive unlabeled learning scenarios.

Inspired by this importance calculation idea from binary classification, this paper presents a key feature identification method for PU learning. Considering that the unlabeled data set is a mixture of positive and negative samples, the attribute values of a feature in the unlabeled data set include two parts: positive values and negative values. If the feature is strongly related to the class label, then the distribution of this feature's attribute values in the unlabeled data should show obvious bimodal or multimodal characteristics, and the feature distributions of different classes should differ considerably, as shown in Fig. 3. When the feature is weakly correlated with the class label, the feature distribution of the positive samples is similar to that of the negative samples.

    Figure 3:Comparison of feature relation of data correlation

The distribution difference of a feature between the positive data set and the unlabeled data set can therefore be used as the importance of the feature. The Kullback-Leibler (KL) divergence can describe the difference between two distributions, and its discrete form is shown in formula (1).

The KL divergence requires the probability of a feature attribute value when calculating the difference between two feature distributions. First, considering that the value range of the feature's attribute values is not limited, it is necessary to apply max-min standardization before the calculation, so that the normalized attribute values fall into the [0,1] interval. Second, feature attribute values may be either continuous or discrete. To handle both uniformly, the algorithm divides the [0,1] interval into equal parts and calculates the KL divergence using the frequency of samples in each small interval as the probability. The specific steps are shown in Algorithm 1.
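Formula (1) is not reproduced in this version of the text; the standard discrete form of the KL divergence that the description refers to, for a feature whose normalized values are binned into B equal-width intervals of [0,1], is

D_KL(P_f || U_f) = Σ_{i=1..B} P_f(i) · log( P_f(i) / U_f(i) )        (1)

where P_f(i) and U_f(i) denote the frequency of the feature's normalized values falling into the i-th bin in the positive data set and in the unlabeled data set, respectively.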

Time complexity analysis: In the third step of the algorithm, data standardization is carried out using max-min standardization, and the time complexity of this step is O(mn). Steps 4 to 6 calculate the feature importance: by dividing the [0,1] interval into equal parts, the KL divergence is calculated with the frequency in each small interval as the probability, and the time complexity of this part is O(mn). So the total time complexity of the algorithm is O(mn).

    Through KL divergence,the estimated value of feature importance can be given in the scene with only positive label data,and key features can be distinguished from irrelevant features.In the case of redundant features,features can be filtered based on feature importance,such as setting feature importance thresholds or specifying the number of selected features.
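To make the computation concrete, the following is a minimal Python sketch of the importance calculation described in Algorithm 1; the bin count, helper name, and use of NumPy are illustrative assumptions rather than the authors' exact implementation.

import numpy as np

def kl_feature_importance(X_pos, X_unl, n_bins=20, eps=1e-10):
    """Estimate feature importance as the KL divergence between the
    positive and unlabeled distributions of each feature (Algorithm 1 sketch)."""
    n_features = X_pos.shape[1]
    importance = np.zeros(n_features)
    for j in range(n_features):
        # Max-min normalize the feature using the combined value range
        both = np.concatenate([X_pos[:, j], X_unl[:, j]])
        lo, hi = both.min(), both.max()
        span = (hi - lo) if hi > lo else 1.0
        p = (X_pos[:, j] - lo) / span
        u = (X_unl[:, j] - lo) / span
        # Histogram frequencies over equal-width bins of [0, 1]
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        p_hist, _ = np.histogram(p, bins=bins)
        u_hist, _ = np.histogram(u, bins=bins)
        p_freq = p_hist / p_hist.sum() + eps
        u_freq = u_hist / u_hist.sum() + eps
        # Discrete KL divergence between the two binned distributions
        importance[j] = np.sum(p_freq * np.log(p_freq / u_freq))
    return importance

# Usage: keep the K features with the largest scores
# scores = kl_feature_importance(X_positive, X_unlabeled)
# selected = np.argsort(scores)[::-1][:32]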


    3.2 Class Prior Probability Estimation for PU Learning

In an industrial control system, it is very difficult to collect a large amount of intrusion data, but collecting traffic and status codes during the normal operation of the system is relatively simple. Taking the data in the normal state as the positive labeled data for positive unlabeled learning is therefore in line with the actual situation of industrial control systems. In positive unlabeled learning, it is very important to analyze the data to be detected and obtain the class prior probability. The class prior probability of positive unlabeled learning is defined as π = p(y = 1); when the collection of samples satisfies the SCAR (selected completely at random) assumption, the class prior probability equals the proportion of positive samples in the unlabeled data set.

Definition 1 (SCAR assumption): The selection of labeled samples has nothing to do with the attributes of the samples and is completely random, namely:
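The equation referenced here is missing from this version of the text; under the standard SCAR formulation the condition reads

p(s = 1 | x, y = 1) = p(s = 1 | y = 1) = c        (2)

i.e., the probability that a positive sample is selected as labeled (s = 1) does not depend on its attributes x.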

According to the source of the positive data, the setting can be divided into two categories: One Sample (OS) and Two Samples (TS). In the OS setting, data is collected by random sampling only once; that is, a part of the real data is randomly collected, some positive samples are dug out of the collected data and labeled, and the remaining data serves as the unlabeled data. In the TS setting, sampling is performed twice; that is, a part of the positive labeled data is first randomly collected, and the unlabeled data set is obtained by separate random sampling from the real data.

Since the positive labeled data is randomly selected, an intermediate variable c is introduced in this scenario, called the positive label frequency (label frequency), which is defined as c = p(s = 1 | y = 1), where s = 1 indicates that the sample is selected as labeled. The relationship between the label frequency and the class prior probability can be expressed by Eq. (3).
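Eq. (3) itself is not reproduced in this version of the text; the standard relationship implied by the surrounding definitions is

π = p(y = 1) = p(s = 1) / c        (3)

where p(s = 1) is the fraction of labeled samples in the data.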

Therefore, the class prior probability can be estimated by estimating the positive label frequency c. In particular, in the TS scenario, the positive data and unlabeled data can be mixed, the positive samples can be regarded as randomly selected and labeled positive samples, and the frequency of positive labels can likewise be estimated.

In the TIcE algorithm, the lower bound of the positive label frequency is obtained using a decision tree, yielding an estimate of the positive label frequency. In this paper, the one-class SVM algorithm is used to improve the TIcE algorithm: the one-class SVM algorithm is proposed to divide out the reliable positive example set, and the positive label frequency is then estimated.

The one-class SVM algorithm is a classic anomaly detection algorithm. When it uses the RBF kernel function, its performance is similar to that of support vector data description (SVDD). The one-class SVM algorithm can be thought of as finding a hypersphere in the feature space that contains the positive samples while making the radius of the hypersphere as small as possible. Its problem description is shown in formula (4).
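Formula (4) is not reproduced in this version of the text; the standard SVDD-style optimization problem that the hypersphere description corresponds to is

min_{R, a, ξ}  R² + C · Σ_i ξ_i    s.t.  ||φ(x_i) − a||² ≤ R² + ξ_i,  ξ_i ≥ 0        (4)

where a is the center of the hypersphere, R its radius, φ the kernel-induced feature map, and ξ_i the slack variables.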

For the estimation of the positive label frequency, the estimated value can be given by Chebyshev's inequality. Through Chebyshev's inequality, the number L_S of labeled samples in the positive example set S satisfies formula (5),

where L_S obeys a binomial distribution, the expectation of the random variable L_S is E(L_S) = c·N_S, the variance is D(L_S) = c(1-c)·N_S, and N_S is the total number of samples in the positive example set S. Substituting into formula (5) gives formula (6).

Let δ = c(1-c)·N_S/ε²; then formula (6) is equivalent to formula (7).

Through formula (7), the upper and lower bounds of the positive label frequency c can be constrained with probability δ, as shown in formula (8).
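Formulas (5)-(8) are missing from this version of the text; based on the surrounding definitions, a plausible reconstruction of the chain is

P( |L_S − E(L_S)| ≥ ε ) ≤ D(L_S)/ε²        (5)

P( |L_S − c·N_S| ≥ ε ) ≤ c(1-c)·N_S/ε²        (6)

P( |L_S/N_S − c| ≥ √( c(1-c)/(δ·N_S) ) ) ≤ δ        (7)

L_S/N_S − √( c(1-c)/(δ·N_S) )  ≤  c  ≤  L_S/N_S + √( c(1-c)/(δ·N_S) )   with probability at least 1 − δ        (8)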

In the TIcE algorithm, since the algorithm for finding reliable positive examples is a decision tree, the number of samples in each leaf node decreases as the tree is divided, and some leaf nodes deviate from the real sample mixing ratio, so only the lower bound of the estimate can be used. However, by dividing the reliable positive example set with the one-class SVM algorithm, the number of samples in the positive example set can be constrained, so the midpoint of the interval can be taken as the estimated value of the positive label frequency c, and the class prior probability can then be calculated. This is called the one-class SVM-based class prior estimation algorithm (one-class SVM-cE).

Compared with the TIcE algorithm, one-class SVM-cE first changes the algorithm for finding reliable positive examples from a decision tree to a one-class SVM. On the one hand, this allows the number of reliable positive samples to be limited through the parameters of the one-class SVM model, avoiding the problems caused by an unreliable positive example set. On the other hand, the data used to train the model is reduced: TIcE needs both the positive data set and the unlabeled data set when constructing the decision tree, whereas the one-class SVM is trained on the positive data set alone.

When the TIcE algorithm estimates the class prior probability, it needs to repeatedly construct decision trees for different unlabeled data sets, which is expensive in practical applications. The one-class SVM-cE algorithm, however, only needs the positive data set when building the model, and the trained model can be reused on different unlabeled data sets; after the model is trained, the time complexity of the OCSVM-cE algorithm is reduced to O(n). The specific steps are shown in Algorithm 2.

    Algorithm 2 can analyze the industrial control data to be detected,estimate its class prior probability,provide important prior knowledge for positive unlabeled learning,and avoid collecting industrial control system intrusion detection data,greatly reducing labor costs.
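A minimal Python sketch of the OCSVM-cE idea, using scikit-learn's OneClassSVM, is given below; the parameter values, the conversion from label frequency to class prior, and the function name are assumptions for illustration rather than the exact Algorithm 2.

import numpy as np
from sklearn.svm import OneClassSVM

def ocsvm_ce_prior(X_pos, X_unl, nu=0.1, gamma="scale"):
    """Sketch of the OCSVM-cE idea: train a one-class SVM on positive data,
    treat the accepted region as a reliable positive set S, estimate the
    label frequency c inside S, and convert it into a class prior estimate."""
    n_p, n_u = len(X_pos), len(X_unl)

    # Train on positive (normal) data only; the fitted model can be reused
    # for other unlabeled sets without retraining.
    ocsvm = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_pos)

    # Reliable positive region S = samples the model accepts (prediction +1)
    labeled_in_S = int(np.sum(ocsvm.predict(X_pos) == 1))
    unl_in_S = int(np.sum(ocsvm.predict(X_unl) == 1))
    n_S = labeled_in_S + unl_in_S
    if labeled_in_S == 0 or n_S == 0:
        raise ValueError("no reliable positive region found; adjust nu/gamma")

    # Label frequency c = p(s=1 | y=1), approximated by the share of labeled
    # samples inside S, assuming S contains (almost) only positives.
    c = labeled_in_S / n_S

    # Under SCAR, p(y=1) over all data = p(s=1)/c; subtracting the labeled
    # positives gives the positive proportion inside the unlabeled set
    # (this conversion is illustrative, not the paper's exact formula).
    p_s1 = n_p / (n_p + n_u)
    n_pos_total = (p_s1 / c) * (n_p + n_u)
    prior_unlabeled = max(0.0, min(1.0, (n_pos_total - n_p) / n_u))
    return c, prior_unlabeled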

    3.3 Neural Network in Positive Unlabeled Learning

In industrial control systems, intrusions are highly concealed and updated quickly. From "Stuxnet" to "Duqu" and then to "Flame", traditional classification-based intrusion detection technology struggles to cope with such evolution, so intrusion detection is treated here as anomaly detection: although it cannot identify the type of intrusion, it can still warn of unknown intrusions. In this paper, the positive unlabeled learning method is used for intrusion detection, and normal traffic is used as the labeled data, which participates in the training of the model together with the data to be detected. The positive unlabeled learning approach, like an anomaly detection algorithm, can detect unknown attacks, and it has been demonstrated that the trained model has an accuracy comparable to the binary classification model.

    3.3.1 Positive Unlabeled Learning under Data Imbalance

In PU learning, the unlabeled data set is treated as a negative example data set containing noisily labeled samples, and the binary classification loss is computed using the class prior probability. Formula (9) depicts the expected binary classification loss.

    However,in positive unlabeled learning,there are no labeled negative examples,so the loss of negative examples cannot be directly calculated.In nnPU,it is proposed to estimate the loss of negative examples through unlabeled data sets,which is also the core idea of nnPU.The unlabeled data set mixes positive and negative samples,and it is regarded as a negative data set containing wrongly labeled samples,then the loss expectation can be expressed as follows:

where π is the class prior probability in the unlabeled dataset, l is the loss function, and U_P is the set of positive samples in the unlabeled dataset. In formula (10), E_U(l(f(x),-1)) can be directly calculated, and E_N(l(f(x),-1)) is the negative sample loss to be estimated, so the problem is transformed into calculating E_UP(l(f(x),-1)).

    In the TS scenario,both the positively labeled dataset and the unlabeled dataset are obtained by random sampling,so the expected loss of the positively labeled dataset and the expected loss of the positive sample in the unlabeled dataset are approximate,as follows:

    Combine formulas(10) and (11) to get the method of estimating binary classification error,as shown in formula(12).

This formula is called the non-negative risk estimator [40], where max(0, E_U(l(f(x),-1)) - π·E_P(l(f(x),-1))) is the estimated negative example loss and E_P(l(f(x),1)) is the expectation of the positive sample loss.
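The referenced formulas (10)-(12) are not reproduced in this version of the text; following the surrounding description and the nnPU formulation, they correspond to

E_U( l(f(x),-1) ) = π·E_UP( l(f(x),-1) ) + (1-π)·E_N( l(f(x),-1) )        (10)

E_UP( l(f(x),-1) ) ≈ E_P( l(f(x),-1) )        (11)

R(f) = π·E_P( l(f(x),1) ) + max( 0, E_U( l(f(x),-1) ) − π·E_P( l(f(x),-1) ) )        (12)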

    When performing intrusion detection,normal traffic is taken as a positive sample,so the proportion of positive samples in the unlabeled data set to be detected is usually much larger than that of negative examples,and there is a problem of data imbalance.

To deal with the data imbalance problem caused by a small class prior probability, the loss function of positive unlabeled learning is set to the focal loss, shown in Fig. 4 and written as formula (13).

During the training process of the model, when positive samples are misidentified, they are regarded as difficult samples. At this time, there is a gap of tens or even hundreds of times between (f(x_i))^γ and (1-f(x_i))^γ, so the weight of difficult samples can be increased to improve the classification performance of nnPU under data imbalance. The modified non-negative risk estimator is shown in formula (14).
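Formulas (13) and (14) are missing from this version of the text; the standard focal loss, written with f(x_i) as the predicted positive probability, is

l_fl( f(x_i), +1 ) = −(1 − f(x_i))^γ · log f(x_i),    l_fl( f(x_i), −1 ) = −(f(x_i))^γ · log( 1 − f(x_i) )        (13)

and formula (14) is obtained by substituting l_fl for the loss l in the non-negative risk estimator of formula (12).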

    Figure 4:Loss comparison under various values of learning rate γ

    The specific steps of positive unlabeled learning are shown in Algorithm 3.

    From the above analysis,it can be seen that compared with the binary classification model,positive unlabeled learning is adjusted in the error calculation,the binary classification error is estimated through the risk estimator,and the estimated binary classification error is used for backpropagation to adjust the parameters of the neural network model.
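A minimal PyTorch-style sketch of one training step with the non-negative risk estimator and a focal-style loss is given below; the function names, γ value, and clamping details are illustrative assumptions rather than the exact Algorithm 3.

import torch

def focal_loss(p, target, gamma=2.0, eps=1e-7):
    """Focal loss for a predicted positive probability p and target in {+1, -1}."""
    p = p.clamp(eps, 1 - eps)
    if target == 1:
        return -((1 - p) ** gamma) * torch.log(p)
    return -(p ** gamma) * torch.log(1 - p)

def nnpu_risk(model, x_pos, x_unl, prior, gamma=2.0):
    """Non-negative PU risk estimate (sketch of formulas (12)/(14))."""
    p_pos = torch.sigmoid(model(x_pos)).squeeze(-1)    # positive-labeled batch
    p_unl = torch.sigmoid(model(x_unl)).squeeze(-1)    # unlabeled batch

    loss_p_pos = focal_loss(p_pos, 1, gamma).mean()    # E_P[l(f(x), +1)]
    loss_p_neg = focal_loss(p_pos, -1, gamma).mean()   # E_P[l(f(x), -1)]
    loss_u_neg = focal_loss(p_unl, -1, gamma).mean()   # E_U[l(f(x), -1)]

    neg_risk = loss_u_neg - prior * loss_p_neg
    # Clamp the estimated negative risk at zero (non-negative risk estimator)
    return prior * loss_p_pos + torch.clamp(neg_risk, min=0.0)

# One optimization step (illustrative):
# loss = nnpu_risk(net, batch_pos, batch_unl, prior=pi_hat)
# optimizer.zero_grad(); loss.backward(); optimizer.step()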

    3.3.2 Neural Network Settings

    In the process of using machine learning methods for industrial control system intrusion detection,it is necessary to pay attention to the real-time requirements of the industrial control system for the model,and the model is required to quickly make judgments on the input data.Therefore,the neural network structure used needs to be simplified as much as possible.On the one hand,the simplified model can reduce the detection response time and improve the real-time performance of the model.On the other hand,it can reduce the demand for computing resources and is more in line with the application scenarios of industrial control systems.

    Positive unlabeled learning is a learning algorithm based on neural networks,which trains neural network models by estimating classification errors in scenarios where there is only one type of labeled data.The difference in neural network structure will also affect the performance of the model.In this section,we discuss two positive unlabeled learning models with different network structures.

    The first is a fully connected deep neural network (DNN).It is a neural network with multiple hidden layers.In theory,DNN can fit any function.Reference [42] discussed the classification performance of DNN with different numbers of hidden layers in intrusion detection,and the results show that when performing binary classification,the DNN model with three hidden layers can have relatively high classification performance,and as the number of layers increases,the classification performance does not improve significantly.Therefore,in this paper,a DNN model with 3 hidden layers is selected,and the numbers of the three hidden nodes are 256,64,and 16,respectively.The network structure settings of the model are shown in Table 4.

    Table 4: Parameters of the neural network model

    The positive unlabeled learning completes a binary classification task through DNN,divides all samples to be detected into normal traffic and intrusion traffic,and the output of DNN is mapped to the[0,1]interval through the Sigmoid function to complete the binary classification task.

In the DNN, batch normalization (BN) is performed between two fully connected layers; that is, the output of each hidden-layer neuron is standardized so that the input to the nonlinear transformation function falls into a region that is more sensitive to the input. The use of BN can speed up the convergence of the neural network. In addition, BN allows the model to use a higher learning rate and reduces the model's sensitivity to network parameter initialization. It can also act as a regularizer and in some cases can eliminate the need for dropout.

The activation function in the DNN is the ReLU function: (1) it can speed up network training, since compared with sigmoid and tanh its derivative is faster to compute; (2) it helps prevent the gradient from vanishing, because when the input is too large or too small the derivatives of sigmoid and tanh are close to 0, whereas ReLU is an unsaturated activation function and does not suffer from this problem; (3) it makes the network sparse.

The weight update algorithm uses the Adam algorithm, an adaptive learning rate optimization algorithm with the advantages of fast convergence and low memory usage.
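The DNN just described can be sketched in a few lines of PyTorch; the class name and the single-logit output are assumptions for illustration and not the authors' exact implementation.

import torch
import torch.nn as nn

class PUDetectorDNN(nn.Module):
    """Shallow DNN matching the structure in Table 4: three hidden layers
    (256, 64, 16) with batch normalization and ReLU, and a single output
    for normal-vs-intrusion classification. The input width is assumed to
    be the number of selected features."""
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, 64), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Linear(64, 16), nn.BatchNorm1d(16), nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x)   # raw logit; apply sigmoid to get a probability

# model = PUDetectorDNN(n_features=32)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)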

The second is the convolutional neural network (CNN). In this paper, a simple CNN network structure, the LeNet-5 structure, is adopted. Considering that LeNet-5 is a network for processing two-dimensional images with a required input of 32×32, while the data of industrial control systems are usually one-dimensional vectors, the network structure is adjusted to replace the two-dimensional convolutions in LeNet-5 with one-dimensional convolutions, and the input size is 32×1. Therefore, it is necessary to perform feature selection and reduce the dimensionality to 32 before training the model. The first layer of the network uses a 5×1 convolution; after this layer, six feature maps with a size of 28×1 are obtained, and maximum pooling with a size of 2 then changes the size to 14×1. The second convolutional layer uses a 5×1 convolution to output 16 feature maps with a size of 10×1, which maximum pooling with a size of 2 changes to 5×1. Finally, all feature maps are flattened and fed into fully connected layers: the fully connected part has two layers, with 120 neurons in the first layer and 84 in the second. Finally, according to the classification category, the output is produced through the softmax function. The model structure of the industrial control system based on LeNet-5 is shown in Fig. 5. The "?" in the input (?, 32, 1) represents the batch size, and the activation function uses the ReLU function.

    Figure 5:CNN-based framework for intrusion detection in positive unlabeled learning
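As a concrete rendering of the structure just described (and shown in Fig. 5), a minimal PyTorch sketch of the one-dimensional LeNet-5-style network follows; the class name and the single-logit output used for PU learning are illustrative assumptions.

import torch
import torch.nn as nn

class PUDetectorLeNet1D(nn.Module):
    """One-dimensional LeNet-5-style CNN from Section 3.3.2:
    32x1 input -> 6@28x1 (5x1 conv) -> 6@14x1 (pool 2) ->
    16@10x1 (5x1 conv) -> 16@5x1 (pool 2) -> FC 120 -> FC 84 -> output."""
    def __init__(self, n_out=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 6, kernel_size=5), nn.ReLU(),    # (batch, 6, 28)
            nn.MaxPool1d(2),                              # (batch, 6, 14)
            nn.Conv1d(6, 16, kernel_size=5), nn.ReLU(),   # (batch, 16, 10)
            nn.MaxPool1d(2),                              # (batch, 16, 5)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, n_out),
        )

    def forward(self, x):          # x shape: (batch, 1, 32)
        return self.classifier(self.features(x))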

So far, the model structure based on positive unlabeled learning has been obtained. The offline training steps of the intrusion detection model based on positive unlabeled learning are as follows (a minimal end-to-end sketch follows the list):

• Read the data, including the positive labeled data and the unlabeled data to be detected, and perform data preprocessing;

• Using the OCSVM-cE technique, estimate the class prior probability of the unlabeled data set and save the OCSVM model;

• Calculate the feature importance through the KL divergence, set the threshold th or the number of selected features K, and perform feature selection according to the feature importance to obtain a new training data set;

• Initialize a deep neural network and use the new training data set after feature selection to train the PU learning model. The training process is shown in Algorithm 3;

• Export the trained neural network and return the predicted values for the unlabeled dataset.
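Putting the pieces above together, the offline procedure can be sketched end-to-end as follows; this reuses the illustrative helper functions and classes from the earlier snippets, and the feature count, epoch number, and learning rate are assumptions, with data preprocessing omitted.

import numpy as np
import torch

def train_offline(X_pos, X_unl, k_features=32, epochs=50, lr=0.01):
    # 1) Class prior estimation on the raw data (OCSVM model can be saved/reused)
    _, prior = ocsvm_ce_prior(X_pos, X_unl)

    # 2) Feature selection via KL-divergence importance
    scores = kl_feature_importance(X_pos, X_unl)
    sel = np.argsort(scores)[::-1][:k_features]
    Xp = torch.tensor(X_pos[:, sel], dtype=torch.float32)
    Xu = torch.tensor(X_unl[:, sel], dtype=torch.float32)

    # 3) Train the PU model with the non-negative risk estimator (Algorithm 3)
    model = PUDetectorDNN(n_features=k_features)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nnpu_risk(model, Xp, Xu, prior)
        loss.backward()
        opt.step()

    # 4) Return the trained model and predictions for the unlabeled data
    with torch.no_grad():
        preds = (torch.sigmoid(model(Xu)).squeeze(-1) > 0.5).int().numpy()
    return model, preds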

    4 Experimental Results

    4.1 Data Introduction and Analysis

Three publicly available datasets for intrusion detection are used in the experiments: NSL-KDD [43], UNSW-NB15 [44], and WADI [45]. The data in the NSL-KDD and UNSW-NB15 datasets are based on features extracted from Internet traffic, including the basic characteristics of a flow (such as transport layer protocol type and port), timing information of the flow, and connection content characteristics; similar features can also be extracted from industrial control system traffic. At the same time, to further verify the effectiveness of the model in an industrial control scenario, the WADI dataset is introduced: on the one hand, it provides industrial control data collected on an industrial control testbed, and on the other hand, it reflects the unbalanced characteristics of industrial control data.

In terms of attack types, the NSL-KDD dataset improves on the KDD CUP 99 dataset by removing some redundant data. The dataset contains normal traffic and 22 types of attack traffic, mainly belonging to four categories: denial of service (DoS), monitoring and probing (Probing), illegal remote machine access (R2L), and unauthorized ordinary-user access (U2R). The UNSW-NB15 dataset is an intrusion detection dataset generated by the Australian Centre for Cyber Security, including samples of 9 types of attacks such as DoS and Backdoors. The WADI dataset is collected on a water distribution testbed, which consists of several large tanks that supply water to user tanks; it contains 16 attacks whose goal is to stop the water supply to the user tanks.

    In the experiment,the UNSW-NB15 dataset uses the training and testing data sets provided by the official website,with a total of 257,673 samples.The WADI dataset uses labeled data from October 2019.The sample size of each data set is shown in Table 5.

    Table 5: Description of datasets

    4.2 Data Preprocessing

In the division of training and test data sets, based on the true labels of the samples, a specified number of positive samples are randomly selected from the positive data as the training set, and the remaining data are used as the test set. In terms of data processing, the string-valued fields in NSL-KDD, such as protocol type and service, require one-hot encoding to convert the strings into vectors; the dimensionality of the NSL-KDD dataset after encoding increases from 41 to 122. The data in the UNSW-NB15 and WADI datasets contain no null values or strings, so they can be used directly.
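For illustration, the categorical string fields of NSL-KDD can be one-hot encoded with pandas as sketched below; the file name and column names are assumptions.

import pandas as pd

# Assumed NSL-KDD categorical columns (illustrative names)
categorical = ["protocol_type", "service", "flag"]

df = pd.read_csv("nsl_kdd_train.csv")
df_encoded = pd.get_dummies(df, columns=categorical)
# After encoding, the 41 original features expand to 122 columns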

    The equipment used in this experiment:the processor is Intel core i7 8750H,the operating system is 64-bit Windows 10 Home Chinese Edition,the hard disk is Western Digital SN720,and the memory is 16 GB.

    4.3 Evaluation Index

    After the model is trained,the data set to be predicted is classified through the model,and based on the judgment result of the model,the confusion matrix shown in Table 6 can be established.

    Table 6: Positive and counterexample matrix

As shown in Table 6, rows represent the true category of the data, and columns represent the category predicted by the model. In intrusion detection, the focus is on the model's ability to identify intrusion samples, so the precision and recall of intrusion samples are used as evaluation indicators. Precision describes the proportion of samples predicted as positive whose true label is positive, as shown in formula (15).

The recall rate is shown in formula (16); it describes the proportion of all truly positive samples that the model recognizes as positive.

The F1-score is also often used as an evaluation index. It is the harmonic mean of precision and recall, as shown in formula (17).
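Formulas (15)-(17) are not reproduced in this version of the text; in terms of the confusion matrix of Table 6 they are the standard definitions

Precision = TP / (TP + FP)        (15)

Recall = TP / (TP + FN)        (16)

F1 = 2 · Precision · Recall / (Precision + Recall)        (17)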

    In addition to the above indicators,in the intrusion detection scenario,due to the large amount of data faced,the time taken for model training and prediction is also an important indicator to measure the performance of the model.

    4.4 Analysis of Results

    4.4.1 Effectiveness Analysis and Time Efficiency of Feature Importance

    In this experiment,the importance of each feature is first calculated by random forest in the binary classification scenario,and compared with the feature importance calculated based on KL divergence to verify the effectiveness of the feature weight calculated using KL divergence.

In the experiment, 2000 positive samples were randomly selected from all samples as the positive labeled data set, then 2000 positive samples and 4000 negative samples were mixed as the unlabeled data set, and all remaining samples were used as the test set. Figs. 6 and 7 show the experimental results of the KL-OCSVM and KDE-OCSVM algorithms on the NSL-KDD dataset and the UNSW-NB15 dataset, respectively.

Further, the correlation between the feature importance values obtained by the two algorithms is calculated and a correlation test is carried out. On the UNSW-NB15 data set, the average correlation coefficient of the normalized feature importance of the two algorithms is 0.72, and the p-value of the test is 4.29×10^-7. The correlation coefficient on the NSL-KDD data set is 0.9364, and the p-value of the test is 1.15×10^-56. At a significance level of 0.05, it can be concluded that there is a significant correlation between the feature importance calculated by KL divergence and the feature importance in the binary classification case; that is, the feature importance calculated by KL divergence is effective.

    Figure 6:Evaluation of feature importance(UNSW-NB15)

    Figure 7:Evaluation of feature importance(NSL-KDD)

    4.4.2 Class Prior Probability Estimation

To verify the effectiveness of the OCSVM-cE algorithm proposed in this paper, it is compared with the following class prior probability estimation algorithms.

• KM1/KM2 algorithms. These algorithms embed the distributions of the positive and unlabeled data sets into a kernel space and solve for the class prior probability via a quadratic programming problem. They are currently among the algorithms with the highest estimation accuracy.

• TIcE algorithm. This algorithm partitions all samples using a decision tree, tightens the lower bound of the positive label frequency through subsets, obtains an estimate of the positive label frequency, and then calculates the class prior. It is currently the class prior probability estimation algorithm with the lowest time complexity.

    ? One-class SVM-cE algorithm.The algorithm proposed in this paper trains the One-class SVM model to find reliable positive examples of unlabeled datasets,estimates the label frequency of positive examples through the reliable positive examples,and then calculates the class prior probability.

    In the class prior probability estimation problem,the core evaluation index is the estimation accuracy,that is,the error between the estimated value and the real value.In addition,the time complexity of the algorithm is also an important evaluation index.

    Based on the above evaluation indicators,the following two experiments are designed for verification:1)To verify the accuracy of class prior probability estimation,construct unlabeled data sets with different class prior probabilities in the experiment,and estimate the class prior probabilities of the constructed unlabeled data sets through four different baseline algorithms,analyze the error between the estimated value of different algorithms and the real value;2)Verify the time complexity of the algorithm.In this experiment,we first compare the time required for each algorithm to estimate the class prior probability under the same sample size,and then estimate the time trend of the class prior probability under different sample sizes.

    The first is the accuracy of class prior probability estimates.In the experiment,the sample size of the positive label dataset is set to 1000,and the number of negative samples in the unlabeled dataset is 2000,respectively,constructing unlabeled datasets with class prior probabilities of 0.1,0.2,0.3,0.4,and 0.5.The class prior probabilities were estimated for the constructed datasets using baseline algorithms,respectively.

The experimental results are shown in Figs. 8 and 9. The abscissa is the true class prior probability, and the ordinate is the absolute error between the estimated value and the true value. The experimental results show that the one-class SVM-cE algorithm maintains high estimation accuracy on the two data sets; its error is close to that of the KM2 algorithm and stays below 0.05, and the stability of its estimates is better.

    Figure 8:Error comparison of algorithms(UNSW-NB15)

    During the experiment,the TIcE algorithm has a large positive error.This is because the TIcE algorithm estimates the real label frequency by seeking the lower bound of the label frequency,which will cause the estimated label frequency to be lower than the real value,so the estimated class prior probability is larger than the true value.In the one-class SVM-cE algorithm,the one-class SVM algorithm is used to find reliable positive examples,avoiding the use of lower bounds,and improving the accuracy of estimation.

    Figure 9:Error comparison of algorithms(NSL-KDD)

    To further test the stability of the one-class SVM-cE algorithm estimation,the number of samples in the positive label data set is set to 2000,and the value is randomly selected in the interval[0.1,0.9]as the class prior probability to construct the unlabeled data set,and the experiment is repeated 100 times,computes the error between the class prior probability estimate and the true value.

Fig. 10 shows the boxplot of the 100 repeated experiments. It can be found that the estimation of the one-class SVM-cE algorithm on the NSL-KDD and UNSW-NB15 data sets is better than that on the WADI data set, with the quartiles of the error below 0.05, while the lower quartile of the estimation error on the WADI dataset is 0.0407, the median is 0.0672, and the upper quartile is 0.0884. There are only two outliers, so the estimates on WADI are relatively stable, with the error concentrated in the [0.05, 0.1] interval. Combining the estimation results on the three data sets, OCSVM-cE is a stable class prior probability estimation algorithm.

    Figure 10:Error comparison of various datasets


In PU learning, the class prior probability is important prior knowledge, and its estimation error directly affects the performance of the trained model. Through experiments, we further explore the influence of the class prior probability estimation error on model performance. In the experiment, the true class prior probability of the unlabeled data set is set to 0.4, and different values in the interval [0,1], in steps of 0.05, are taken as the estimated class prior probability. The results are shown in Fig. 11, which presents the experimental results on the UNSW-NB15 dataset with 10,000 positive labeled samples and 20,000 negative samples in the unlabeled dataset. The abscissa is the estimated class prior probability, and the ordinate is the F1-score. It can be observed that when the estimated class prior probability is 0.4, the F1-score achieves its highest value and the model's performance is the best; the F1-score decreases as the error between the estimated and true class prior probabilities increases. When the estimated value is zero, all unlabeled samples are classified as negative, and when the estimated value is one, all unlabeled samples are classified as positive. From the F1-score analysis, the class prior probability estimation error should be less than 0.05 to guarantee that the model has satisfactory classification performance.

    Figure 11: F1-score evaluation under class prior

    Fig. 12 shows that, with the number of positive samples fixed at 1000, the time required by the one-class SVM-cE algorithm and the TIcE algorithm is positively correlated with the number of unlabeled samples [46,47]. Considering that one-class SVM-cE only needs positive samples to train the one-class SVM model, it can be considered more suitable for intrusion detection application scenarios, and the one-class SVM model trained in the process can be reused: when class prior probability estimation is performed on a new unlabeled dataset, the model can be loaded directly to classify reliable positive examples.
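    The model reuse mentioned above can be as simple as persisting the fitted one-class SVM and loading it when a new unlabeled batch needs a prior estimate; the snippet below is a hedged sketch using joblib with a hypothetical file name and synthetic stand-in data, not the authors' implementation.

    import joblib
    import numpy as np
    from sklearn.svm import OneClassSVM

    X_pos = np.random.default_rng(0).normal(size=(1000, 10))   # stand-in for labeled normal traffic

    # Train once on the labeled positive traffic and persist the fitted model
    ocsvm = OneClassSVM(nu=0.1, gamma="scale").fit(X_pos)
    joblib.dump(ocsvm, "ocsvm_positive_model.joblib")           # hypothetical path

    # Later, when a new unlabeled capture needs a prior estimate,
    # reload the model instead of retraining it
    ocsvm_reused = joblib.load("ocsvm_positive_model.joblib")
    X_unlabeled = np.random.default_rng(1).normal(size=(5000, 10))
    reliable_positives = ocsvm_reused.predict(X_unlabeled) == 1 # mask of reliable positives
    prior_estimate = reliable_positives.mean()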

    Fig. 13 compares the runtime of the proposed algorithm on the IEC 60870-5-104 [48] and DNP3 [49] datasets. As can be seen from Fig. 13, the runtime of the proposed algorithm on the IEC dataset is lower than on DNP3.

    4.4.3 Positive Unlabeled Learning Performance Analysis

    The neural network settings of the compared binary classification models are as follows. The DNN settings are the same as the DNN used for PU learning: the model contains three hidden layers, with 256 neurons in the first layer, 64 in the second, and 16 in the third, but positive and negative samples with real labels are used during training. The CNN uses the LeNet-5 structure. The input is a 32×32 image, and the first layer uses a 5×5 convolution [50] to produce six 28×28 feature maps, which 2×2 max pooling reduces to 14×14. The second convolutional layer uses a 5×5 convolution to output sixteen 10×10 feature maps, which 2×2 max pooling reduces to 5×5. Finally, all feature maps are flattened and fed into fully connected layers with 120 neurons in the first layer and 84 in the second, and the output is produced through a softmax function according to the classification category. The RNN sets the number of hidden layer nodes to 80.
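    For readability, the two comparison architectures described above can be written down directly; the sketch below uses PyTorch purely as an assumed framework (the paper does not state its implementation) and mirrors the stated layer sizes.

    import torch
    import torch.nn as nn

    # DNN baseline: three hidden layers of 256, 64 and 16 neurons
    class DNN(nn.Module):
        def __init__(self, in_features, n_classes=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_features, 256), nn.ReLU(),
                nn.Linear(256, 64), nn.ReLU(),
                nn.Linear(64, 16), nn.ReLU(),
                nn.Linear(16, n_classes),
            )

        def forward(self, x):
            return self.net(x)

    # CNN baseline with the LeNet-5 layout described in the text
    class LeNet5(nn.Module):
        def __init__(self, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 6 maps of 28x28
                nn.ReLU(),
                nn.MaxPool2d(2),                  # -> 6 maps of 14x14
                nn.Conv2d(6, 16, kernel_size=5),  # -> 16 maps of 10x10
                nn.ReLU(),
                nn.MaxPool2d(2),                  # -> 16 maps of 5x5
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
                nn.Linear(120, 84), nn.ReLU(),
                nn.Linear(84, n_classes),         # softmax applied inside the loss
            )

        def forward(self, x):                     # x: (batch, 1, 32, 32)
            return self.classifier(self.features(x))

    print(LeNet5()(torch.zeros(4, 1, 32, 32)).shape)   # torch.Size([4, 2])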

    Figure 12: Comparison of estimation time of the algorithms (UNSW-NB15)

    Figure 13: Comparison of estimation time of the proposed algorithm under IEC 60870-5-104 and DNP3 datasets

    In the experiment, the number of positive labeled samples is set to 10,000, the number of negative examples in the unlabeled dataset is 2000, the class prior probability is 0.9, the learning rate is 0.01, and the number of iterations is 50.

    Table 7 shows the comparison results of positive unlabeled learning and the binary classification models. The comparison experiments in the table can be divided into two categories: comparing positive unlabeled learning with binary classification under the same network structure (DNN/CNN), and comparing positive unlabeled learning with binary classification models that currently perform best. According to the experimental data, the precision of positive unlabeled learning under the same network topology is comparable to that of the binary classification model, with only a small difference in the recall rate [51]. As analyzed previously, industrial control intrusion detection requires high precision from the model, aiming to "prefer false negatives rather than false positives", so positive unlabeled learning is suitable for industrial control intrusion detection; compared with advanced models such as CNN-BiLSTM, it maintains only a small gap in precision. At the same time, positive unlabeled learning reduces the requirements on training data compared with binary classification: only one type of labeled data is needed, which can effectively reduce the data collection work, and because only positive and unlabeled data are used for training, the model can mine unknown types of intrusion.

    Table 7 compares the performance of PU learning with the binary classification models; we then compare the proposed learning approach with anomaly detection models and analyze the performance difference under the same condition of having only one type of labeled data. From the research analysis listed in Table 2, the anomaly detection models currently used for intrusion detection are mainly AE and one-class SVM. AE is an unsupervised model consisting of two parts, an encoder and a decoder: the encoder finds a compressed representation of the given data, and the decoder reconstructs the original input, with anomaly detection performed by calculating the error between the reconstructed input and the original input [52]. In the research on one-class SVM for intrusion detection, the main work is focused on feature engineering. In this experiment, feature selection is based on the feature importance metric of positive unlabeled learning, and one-class SVM is used for anomaly detection. In terms of parameter settings, the one-class SVM sets the upper limit of the error to 0.1, and the AE adopts the default settings in the source code.
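    The two baseline detectors described above can be sketched as follows. This is a minimal, assumed setup (scikit-learn's OneClassSVM with nu=0.1 as the error upper bound, and a small autoencoder scored by reconstruction error), with synthetic data and the AE training loop omitted for brevity; it is not the authors' source code.

    import torch
    import torch.nn as nn
    from sklearn.svm import OneClassSVM

    # One-class SVM baseline: nu=0.1 acts as the upper bound on the training error
    ocsvm = OneClassSVM(nu=0.1, gamma="scale")

    # Autoencoder baseline: the encoder compresses, the decoder reconstructs;
    # anomalies are flagged by a large reconstruction error
    class AE(nn.Module):
        def __init__(self, in_features, code_dim=8):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_features, 32), nn.ReLU(),
                                         nn.Linear(32, code_dim))
            self.decoder = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(),
                                         nn.Linear(32, in_features))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def anomaly_scores(ae, x):
        # Per-sample reconstruction error used as the anomaly score
        with torch.no_grad():
            return ((ae(x) - x) ** 2).mean(dim=1)

    # Toy usage with synthetic "normal" traffic features (AE training loop omitted)
    X_train = torch.randn(1000, 20)
    ae = AE(in_features=20)
    scores = anomaly_scores(ae, X_train)              # threshold on these scores
    ocsvm.fit(X_train.numpy())                        # trained on normal data only
    preds = ocsvm.predict(X_train.numpy())            # +1 normal, -1 anomaly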

    Table 8 shows the comparison results of positive unlabeled learning and the anomaly detection models. The indicators in the table show that the performance of the one-class SVM and AE models on the WADI dataset is poor, which is caused by the imbalance of the test data: the ratio of positive to negative data in the test set is about 16:1. This also shows that the one-class SVM and AE algorithms are insufficient when dealing with unbalanced data, whereas positive unlabeled learning improves model performance under unbalanced data through the focal loss [53]. Therefore, positive unlabeled learning significantly improves the precision and recall rates, and on the three datasets the proposed algorithm has significantly better precision than AE and one-class SVM.
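    The focal loss referred to above down-weights easy examples so that the rare class contributes more to the gradient; a standard binary focal loss has the form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The sketch below is a generic formulation with assumed default alpha and gamma, not necessarily the exact variant used in the paper.

    import torch

    def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), averaged over the batch
        p = torch.sigmoid(logits)
        p_t = torch.where(targets == 1, p, 1 - p)                 # probability of the true class
        alpha_t = torch.where(targets == 1, torch.full_like(p, alpha),
                              torch.full_like(p, 1 - alpha))
        loss = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))
        return loss.mean()

    # Example: imbalanced batch with few anomalies (label 1)
    logits = torch.randn(32)
    targets = (torch.rand(32) < 0.06).long()                       # roughly 16:1 imbalance
    print(binary_focal_loss(logits, targets))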

    Table 8: Proposed algorithm evaluation on various datasets (anomaly detection)

    Combining the results of Tables 7 and 8, it is not difficult to find that although the proposed learning approach is similar to the anomaly detection algorithms in terms of training data, requiring only one type of labeled data, the classification performance of the trained model is considerably better than that of the anomaly detection algorithms. This is especially true in industrial control scenarios: taking the WADI dataset as an example, where the ratio of normal to abnormal data is as high as 16:1, the proposed learning approach still maintains high precision and recall, with only a slight difference in precision compared with some binary classification algorithms. Combined with the previously discussed characteristics of industrial control scenarios, the proposed learning approach is suitable for anomaly detection in industrial control scenarios.

    To sum up, this paper proposes using positive unlabeled learning for intrusion detection. It is similar to anomaly detection, but it requires labeled positive data in the training set, and the positive data must satisfy the SCAR (selected completely at random) condition. It provides intrusion detection with high precision and high recall, both significantly improved compared with unsupervised anomaly detection models; in particular, it is close to the binary classification models in terms of precision.

    5 Conclusion

    Industrial control systems are mostly utilized in nuclear power, water conservation, and other critical infrastructures, so it is important to assure their safety. The intrusion detection system ensures network security and is an important component of industrial control system security. In this study, a positive unlabeled learning approach for intrusion detection in industrial control systems is proposed, which uses normal traffic as labeled data to find aberrant samples in the data. A feature significance calculation approach is deployed for feature selection to address the high dimensionality and strong correlation of industrial control system data. Simultaneously, the class prior probability estimation algorithm is enhanced: the one-class SVM-cE algorithm is employed for class prior probability estimation, which increases the stability and accuracy of the estimate. Finally, experiments are performed to validate the efficiency of the suggested learning approach. When compared with a supervised binary classification model, the proposed learning model maintains comparable precision while having a slightly lower recall rate. Although the suggested learning approach avoids using negative data, it also imposes limits on the positive data: positive samples must be picked at random, that is, their distribution must be the same as the distribution of positive samples in the unlabeled dataset. This is a shortcoming of the suggested approach, and future research can concentrate on executing positive unlabeled learning on datasets with selection bias.

    Acknowledgement:This research is supported by the University of Ha’il-Saudi Arabia.

    Funding Statement:This research has been funded by the Research Deanship at the University of Ha’il-Saudi Arabia through Project Number RG-20146.

    Author Contributions:The authors confirm their contribution to the paper as follows:study conception and design: A.Alkhalil,D.Uliyan;data collection: M.Altameemi;analysis and interpretation of results: A.Abdelrhman,Y.Altameemi;draft manuscript preparation: A.Ahmad,R.Mansour,A.Alkhalil.All authors reviewed the results and approved the final version of the manuscript.

    Availability of Data and Materials:The data used for the findings of this study is available within this article.

    Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
