
    Rotating machinery fault detection and diagnosis based on deep domain adaptation: A survey

    2023-02-09 08:58:44
    Chinese Journal of Aeronautics, 2023, Issue 1

    Siyu ZHANG, Lei SU*, Jiefei GU, Ke LI, Lang ZHOU, Michael PECHT

    a Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China

    b HUST-Wuxi Research Institute, Wuxi 214071, China

    c Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742, USA

    KEYWORDS Deep learning; Domain adaptation; Fault detection and diagnosis; Transfer learning

    Abstract In practical mechanical fault detection and diagnosis, it is difficult and expensive to collect enough large-scale supervised data to train deep networks. Transfer learning can reuse the knowledge obtained from the source task to improve the performance of the target task, which performs well on small data and reduces the demand for high computation power. However, the detection performance is significantly reduced by direct transfer due to the domain difference. Domain adaptation (DA) can transfer the distribution information from the source domain to the target domain and solve a series of problems caused by the distribution difference of data. In this survey, we review various current DA strategies combined with deep learning (DL) and analyze the principles, advantages, and disadvantages of each method. We also summarize the application of DA combined with DL in the field of fault diagnosis. This paper provides a summary of the research results and proposes future work based on analysis of the key technologies.

    1. Introduction

    With the rapid development of modern industry, industrial system integration and information technology have also been growing, and highly reliable components and systems are essential to the safe operation of aerospace equipment. Undetected aviation faults may lead to catastrophic failures of aviation machinery. As the core component of aircraft engines, space shuttles, and other rotating machinery, the mechanical transmission system is prone to various faults when it operates for long periods at high speed, under heavy load, and in harsh conditions, which directly affects the safe operation of the mechanical system. Rotating machinery, represented by aviation equipment, has therefore attracted the attention of researchers. Early fault detection and diagnosis technology can predict the fault development trend, which plays a key role in preventing faults in mechanical transmission systems. Therefore, to prevent minor faults from developing into major accidents, all industries attach great importance to intelligent fault diagnosis systems for rotating machinery [1].

    DL has achieved great success in the intelligent fault diagnosis of rotating machinery and has benefited industrial applications. Compared with shallow machine learning, DL can automatically extract high-level features and fuse feature extraction and classification into one structure, which avoids a large amount of trial and error and achieves reliable performance [2,3]. However, DL demands long training times and large numbers of labeled samples. In many practical applications, it is often difficult and expensive to obtain enough large-scale labeled data to train the deep network to adequate performance. Further, most current machine learning algorithms rely on the basic assumption that the training and testing data are independent and identically distributed. This assumption rarely holds in industry, because the data change with time and space. As a result, the generalization error of the diagnosis model often fails to meet the fault diagnosis requirements of actual production [4].

    Transfer learning can transfer a model trained on a separate, labeled source domain to the desired unlabeled or sparsely labeled target domain, which works well on small data and reduces the demand for high computational power. However, direct cross-domain transfer results in poor performance because of a phenomenon known as domain shift, in which the probability distributions of data and labels differ between domains. Domain shift appears in many situations, such as from data to data, from RGB images to depth images, and from simulation to reality [5].

    DA relaxes the constraint in traditional machine learning that the training and testing data must be independent and identically distributed [6]. It can mine invariant features and essential structures between different but related domains, which effectively addresses the problems resulting from domain shift, small target size, and unbalanced data [7]. In recent years, DA has become a focus of transfer learning. In the DL era, three kinds of DA methods are usually employed to obtain a diagnosis model for the source and target domains: the discrepancy-based method, the adversarial-based method, and the reconstruction-based method. The discrepancy-based method reduces the discrepancy between the source and target domains on the corresponding feature layer of the network. The adversarial-based method introduces a domain discriminator to encourage domain confusion and thereby learn invariant features between domains. The reconstruction-based method reconstructs the samples of the source and target domains to ensure intra-domain differences and inter-domain commonality. In this paper, these deep DA approaches are introduced by category: the discrepancy-based method is divided into the statistic transformation method, structure optimization method, and geometric transformation method; the adversarial-based method is analyzed based on two common algorithms; and the reconstruction-based method is introduced based on the encoder-decoder structure and its variants. On the basis of the above methods, the multi-source domain adaptation (MDA) method is further analyzed for supervised data from multiple source domains. Then, the innovations of deep DA methods and the actual testing results for typical rotating machinery fault parts are discussed. Finally, based on these deep DA strategies, this paper proposes future work for deep DA in fault diagnosis.

    2. Deep DA and its application in rotating machinery

    DA can reduce the distribution differences between domains, so that the model trained on the source data performs well on the target data. The transfer process of deep DA reduces the differences between domains by continually adjusting the network parameters. The goal of DA is to use the labeled source data X_S to learn a classifier that predicts the label Y_T of the target data X_T, as shown in Fig. 2. When the difference is small enough, the model parameters and the detection results on the target domain will both be optimal. Once the environment changes, the data collected in the new environment can be treated as the target data, on which transfer can achieve excellent accuracy without retraining a new model from randomly initialized parameters. Deep DA therefore has several major advantages, as shown in Fig. 3. Because DA reduces the distribution difference, labeled data close to the target data can be used to build the model and to increase the annotation of the target data. During training, model parameters trained on big data by large companies can be transferred, fine-tuned, and updated adaptively for the target task to achieve better results. In practical applications, DA is used for adaptive learning to meet personalized needs: considering the similarities and differences between users, a general model is flexibly adjusted to complete the target task.

    However, DA can also suffer from negative transfer, as shown in Fig. 3. The core problem of DA is to find the similarity between the two domains. Therefore, if the two domains share little or no similarity, the effect of DA will be greatly degraded. The main reasons for negative transfer are: (1) data problems: the source domain and the target domain are not similar at all; (2) method problems: the source domain and the target domain are similar, but the DA method is not good enough to find the components that can be transferred. Negative transfer hinders the research and application of DA. In practical applications, negative transfer can be avoided by finding reasonable similarity and by choosing or developing suitable transfer learning methods.

    Fig. 1 Marginal distribution of domains.

    Fig. 2 Process of DA.

    In recent years, there have been three major areas of research on DA. The first group is the discrepancy-based method, which measures the distance between the source domain and the target domain on the feature layers of the model and uses statistical approaches to reduce the domain difference. Its main research directions are statistic transformation, structure optimization, and geometric transformation. The second group is the adversarial-based method, which contains two competing structures. This group can be categorized into two subgroups: generative adversarial DA and non-generative adversarial DA. In this case, a domain discriminator is used to encourage domain confusion via an adversarial objective that minimizes the distance between the two domains, which is one of the new research topics in DL. The third group is the reconstruction-based method, in which the domain difference is reduced by mapping the source data, the target data, or both into a shared domain. Encoder-decoder models and generative adversarial nets (GAN) are typical examples. Fig. 4 shows the three research areas of DA.

    Fig. 4 Three research groups of DA.

    Fig. 3 Major advantages and disadvantage of DA.

    Table 1 Comparison of four main detection methods for bearing.

    Deep DA can diagnose many components of rotating machinery and has achieved satisfactory results. The most important components are complex mechanical transmission parts, such as bearings and gears. These parts work at high speed under variable working conditions, the operating environment is often very complex, and diagnostic samples are difficult to obtain. However, the requirements for detection accuracy and efficiency are often high. Therefore, deep DA is an effective method to diagnose these components. For example, the bearing is one of the most important components in rotating machinery. There are several methods to diagnose bearings, such as signal processing and analysis, shallow machine learning, deep learning, and deep DA. Signal processing and analysis identifies a bearing fault according to the energy change in each frequency band of the vibration signal when the fault occurs. Although the existing feature extraction methods have achieved good results in application, the process is complex and some steps rely on expert experience. Shallow machine learning learns quickly and needs fewer training samples, so it can realize fast fault diagnosis of bearings, but it also relies on experience to select strongly correlated fault features. Deep learning learns from the raw fault data layer by layer and establishes the mapping between fault samples and fault categories without manual feature extraction, but it is difficult to apply effectively when fault samples are scarce. Because of the advantages of deep DA, it is the preferred method for diagnosing rotating machinery with small samples in complex environments. Table 1 compares the four main detection methods for bearings. With the development of rotating machinery, in which the quality requirements for transmission parts are increasingly high, deep DA will have growing potential as an effective method for detecting faults in mechanical transmission components. This survey gives a specific overview of the research advances of deep DA, followed by a description of applications for fault detection of rotating machinery components.
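The band-energy idea behind signal processing and analysis can be illustrated with a minimal NumPy sketch. The sampling rate, the 30 Hz shaft component, the 3200 Hz "fault" component, and the choice of eight equal-width bands are all hypothetical values chosen for illustration; real bearing diagnosis would use measured vibration data and fault frequencies derived from the bearing geometry.

```python
import numpy as np

def band_energies(signal, fs, n_bands):
    """Split the one-sided spectrum into equal-width bands and return
    the fraction of total spectral energy that falls in each band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, n_bands + 1)
    # Map every frequency bin to a band index in [0, n_bands - 1].
    idx = np.clip(np.digitize(freqs, edges) - 1, 0, n_bands - 1)
    energies = np.bincount(idx, weights=spectrum, minlength=n_bands)
    return energies / energies.sum(), edges

fs = 12000                      # hypothetical sampling rate in Hz
t = np.arange(fs) / fs          # one second of samples
healthy = np.sin(2 * np.pi * 30 * t)                    # shaft rotation only
faulty = healthy + 0.8 * np.sin(2 * np.pi * 3200 * t)   # assumed fault-excited component

e_healthy, edges = band_energies(healthy, fs, n_bands=8)
e_faulty, _ = band_energies(faulty, fs, n_bands=8)
fault_band = int(np.argmax(e_faulty - e_healthy))
print("band with the largest energy increase:", fault_band, edges[fault_band:fault_band + 2])
```

Comparing the two energy vectors shows which band gains energy when the assumed fault component is present; choosing the bands and interpreting the change is the step that depends on expert experience.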

    3. Research advances of deep DA

    This section reviews the research advances of deep DA, which provides the basis for analyzing the application of DA in fault detection of rotating machinery components. The main research directions of deep DA shown in Fig. 4, namely the discrepancy-based method, adversarial-based method, and reconstruction-based method, are examined in turn.

    3.1. Discrepancy-based method

    The basic idea of the discrepancy-based method is to align the feature distributions of different domains by minimizing the distribution difference. According to the measurement and transformation methods used, the discrepancy-based method can be further divided into statistic transformation, structure optimization, and geometric transformation sub-methods, as shown in Table 2 [8-19]. The differences between these subgroups are elaborated in the following. Statistic transformation starts from the network parameters and minimizes the domain differences by adjusting them. The most commonly used measures for comparing and reducing distribution shift are maximum mean discrepancy (MMD), the coral measure, and Wasserstein distance, among others. MMD compares the mean value difference between the source domain and the target domain [8], which can be expressed as
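A common empirical form of this quantity is sketched below. The symbols n_S, n_T, x_i^S, x_j^T, and the reproducing kernel Hilbert space H are assumptions introduced for illustration, since only the mapping φ(·) is named in the text:

```latex
\mathrm{MMD}(X_S, X_T) =
\left\| \frac{1}{n_S}\sum_{i=1}^{n_S}\phi\!\left(x_i^S\right)
      - \frac{1}{n_T}\sum_{j=1}^{n_T}\phi\!\left(x_j^T\right) \right\|_{\mathcal{H}}
```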

    Table 2 Summary for discrepancy-based method.

    where φ(·) is the mapping. On this basis, MMD was combined with a neural network for the first time (DaNN) in 2014 [20], which reduces the distribution difference in the latent space. However, its representation ability is limited by its shallow feature extraction layer, so the DA problem cannot be solved effectively. By combining MMD with a convolutional neural network (CNN), deep domain confusion (DDC) was proposed, which extends deep CNNs to DA [21]. The corresponding loss function is consistent with that of DaNN. However, DDC adapts only one layer of the network, which limits the transferability across multiple layers. Based on DDC, deep adaptation networks (DAN) were proposed with two improvements. The first is to apply MMD to multiple layers: the later layers, which extract more private features, were selected for transfer. The second is to replace MMD with multiple kernel MMD (MK-MMD), which is a convex combination of multiple kernel functions [9]. The corresponding multilayer loss can be calculated as
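In the usual statement of DAN, the multilayer objective adds an MK-MMD penalty over the adapted layers to the source classification loss. The form below is a hedged sketch of that objective; every symbol in it (J, θ, n_a, λ, l_1, l_2, d_k, D_s^l, D_t^l) is an assumption introduced here rather than notation taken from this survey:

```latex
\min_{\Theta}\;
\frac{1}{n_a}\sum_{i=1}^{n_a} J\!\left(\theta\!\left(x_i^a\right), y_i^a\right)
+ \lambda \sum_{l=l_1}^{l_2} d_k^2\!\left(\mathcal{D}_s^{\,l}, \mathcal{D}_t^{\,l}\right)
```

Here J is the classification loss on the n_a labeled samples, λ weights the penalty, and d_k^2 is the MK-MMD between the layer-l source and target representations for layers l_1 through l_2.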

    In the coral measure, A is the second-order feature transformation, the superscript T denotes the transpose, C_S is the source covariance matrix, and C_T is the target covariance matrix. Coral is simple and efficient to solve in deep DA, and it suits mismatches in low-level image statistics such as texture, edge contrast, and color. Therefore, coral is usually applied to the shallow layers of DL, as shown in Fig. 6. Sometimes it is necessary to transform one distribution into another continuously while preserving the geometric characteristics of the distribution itself, so that the result can serve as a reliable measure of the feature difference between task-specific classifiers. For this purpose, a sliced Wasserstein discrepancy method was proposed by defining a sliced barycenter of discrete measures [12]. However, the Wasserstein distance is difficult to calculate, which limits its application. All the above methods perform well in their respective domain scenarios, but it is not clear which layer or layers of a deep network should implement DA.
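Two of the statistics discussed in this subsection, MMD and the coral measure, can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation of any cited method: the Gaussian kernel, the uniform averaging over bandwidths (a simplified stand-in for the learned convex combination in MK-MMD), the 1/(4d^2) scaling of the covariance distance, and the synthetic data are all choices made here.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(xs, xt, sigmas=(1.0,)):
    """Biased estimate of squared MMD, averaged uniformly over the given bandwidths."""
    total = 0.0
    for s in sigmas:
        total += (gaussian_kernel(xs, xs, s).mean()
                  + gaussian_kernel(xt, xt, s).mean()
                  - 2.0 * gaussian_kernel(xs, xt, s).mean())
    return total / len(sigmas)

def coral_loss(xs, xt):
    """Squared Frobenius distance between feature covariances, scaled by 1/(4 d^2)."""
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False)
    ct = np.cov(xt, rowvar=False)
    return float(np.sum((cs - ct) ** 2) / (4.0 * d * d))

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 5))
same = rng.normal(0.0, 1.0, size=(200, 5))      # same distribution as source
shifted = rng.normal(1.5, 2.0, size=(200, 5))   # shifted mean and scale

print("MMD^2 same / shifted:", mmd2(source, same), mmd2(source, shifted))
print("coral same / shifted:", coral_loss(source, same), coral_loss(source, shifted))
```

Both values should be noticeably larger for the shifted set than for the set drawn from the source distribution, which is the property a discrepancy-based loss relies on.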

    The above studies assume that the conditional distributions of the source and target domains are the same and that only the marginal distributions differ. Sometimes, however, it is better to measure the difference of the joint distribution (the conditional and the marginal distribution), in which case the source classifier cannot be applied to the target data directly. For better generalization, a joint adaptation network combined with MMD (JMMD) [13] was proposed in 2016, which minimizes the joint distribution difference by simultaneously learning class and feature invariants between domains. Fig. 7 shows the main research methods of joint distribution adaptation; different kinds of data distribution difference lead to different combination methods. In the above methods, the marginal distribution and the conditional distribution contribute equally to the network. However, the two distributions are not equally important for some tasks. Therefore, dynamic distribution adaptation (DDA) was introduced to dynamically adjust the importance of each distribution, which achieves a better distribution adaptation effect [14]. DDA is the first accurate quantitative estimation method that uses the global and local domain properties to estimate the value of the balance factor μ, which can be expressed as
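The form below is the one usually given for this estimate and should be read as an assumption, since the symbols d_M, d_c, and C do not appear in the surrounding text. Here d_M is taken to be a distance between the marginal distributions of the two domains (the global property), d_c the corresponding distance for class c (the local property), and C the number of classes:

```latex
\hat{\mu} \approx 1 - \frac{d_M}{d_M + \sum_{c=1}^{C} d_c}
```

Under this reading, μ is close to 0 when the marginal difference dominates and close to 1 when the class-conditional differences dominate.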

    Fig. 6 Application of coral.

    Fig. 7 Research of joint distribution adaptation.

    Fig. 5 Research of MMD.

    Structure optimization starts from the architecture of the network to minimize the domain discrepancy, and it can be applied in most deep DA networks. The related domain knowledge, which can be represented by the statistics of the batch normalization (BN) layer, is stored in the weight matrix of each layer. The statistics in the BN layer can be adjusted to obtain well-trained cross-domain models without introducing additional parameters to tune. Therefore, the statistics of the source data in each BN layer can be replaced with those of the target data to align the distributions, as shown in Fig. 8. BN normalizes the mean and standard deviation of each individual feature channel, so that each layer receives features with a similar distribution, whether they come from the source or the target [15]. Fig. 9 shows the research advances for BN; it is easy to extend not only to most deep DA networks but also to multiple source domains. In contrast to statistic transformation methods such as MMD and the coral measure, which update the weights for deep DA, this method only adjusts the statistics in the BN layer. However, some neurons are not effective for all domains because of domain biases, and it is useless for such neurons to capture other features and clutter. In view of this, the weight coefficient optimization method has been widely studied because it pays more attention to shared features without completely discarding feature information. The weighting scheme used in deep DA can quantify transferable source examples and control the importance of the source data in learning the target task [16]. The weight coefficient optimization method aims to adjust the DL architecture to improve its ability to learn shared features, which resolves the ambiguity between shared and private features but does not completely eliminate the influence of the private part.
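The BN-statistics replacement described above can be sketched for a single layer in NumPy. This is a minimal sketch under assumptions: the features are synthetic, the scale and shift parameters are fixed at 1 and 0, and the function names are hypothetical rather than taken from the cited work.

```python
import numpy as np

def bn_statistics(features, eps=1e-5):
    """Per-channel mean and standard deviation, as a BN layer would store them."""
    mean = features.mean(axis=0)
    std = np.sqrt(features.var(axis=0) + eps)
    return mean, std

def batch_norm(features, mean, std, gamma=1.0, beta=0.0):
    """Normalize with the given statistics; gamma and beta stay shared across domains."""
    return gamma * (features - mean) / std + beta

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(500, 4))   # synthetic source-domain activations
target = rng.normal(3.0, 2.0, size=(500, 4))   # synthetic target-domain activations

src_mean, src_std = bn_statistics(source)
tgt_mean, tgt_std = bn_statistics(target)

# Target data normalized with source statistics stays shifted and rescaled.
target_with_source_stats = batch_norm(target, src_mean, src_std)
# Replacing the statistics with target ones aligns the layer output with the source case.
target_adapted = batch_norm(target, tgt_mean, tgt_std)

print(target_with_source_stats.mean(axis=0))
print(target_adapted.mean(axis=0))
```

Only the stored statistics change between the two calls; no weights are updated, which is why the approach adds no parameters to tune.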

    Geometric transformation starts from the geometrical properties of the source and target domains to reduce the domain shift. Inspired by incremental learning, the Grassmann manifold and its geodesics are widely used in geometric transformation methods for DA. The basic idea is to treat the data of the source domain and the target domain as two points on the Grassmann manifold and then take points along the geodesic between them to form a path along which the source domain can be transformed into the target domain. The geodesic flow kernel method is the most typical; it introduces kernel learning (KL) and resolves the question of how many intermediate points to use by integrating over the infinitely many points on the path, as shown in Fig. 10 [17]. Inspired by the intermediate representations, Luo et al. introduced an end-to-end progressive graph learning framework in which a graph neural network with episodic training and adversarial learning addresses the domain shift at both the sample and manifold levels [18]. Kundu et al. aligned the high-density source regions in the latent space by learning a latent space for the target, which accommodated new target classes in the latent space while preserving semantic information [19]. These approaches take into account the specific geometric structure of the source and target domains to describe the nature of the data, which effectively reduces the domain differences, although they face a major bottleneck in their high computational complexity.
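The idea of treating each domain as a point on the Grassmann manifold can be illustrated with a short NumPy sketch. It does not implement the geodesic flow kernel itself; it only computes the PCA subspace of each domain and the geodesic distance between the two subspaces from their principal angles. The synthetic data, the subspace dimension of 2, and the 30-degree rotation are assumptions made for illustration.

```python
import numpy as np

def pca_basis(x, dim):
    """Orthonormal basis (n_features x dim) spanning the top principal directions."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T

def principal_angles(basis_a, basis_b):
    """Principal angles between two subspaces given by orthonormal bases."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def geodesic_distance(basis_a, basis_b):
    """Geodesic distance on the Grassmann manifold: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(basis_a, basis_b)))

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2)) * np.array([3.0, 2.0])
source = np.zeros((300, 6))
source[:, :2] = latent                      # source data lies in span(e0, e1)

theta = np.pi / 6                           # assumed domain shift: rotate e0 toward e2
rotation = np.eye(6)
rotation[0, 0], rotation[0, 2] = np.cos(theta), -np.sin(theta)
rotation[2, 0], rotation[2, 2] = np.sin(theta), np.cos(theta)
target = source @ rotation.T

basis_s = pca_basis(source, 2)
basis_t = pca_basis(target, 2)
dist = geodesic_distance(basis_s, basis_t)
print("geodesic distance between domain subspaces:", dist)
```

In this constructed case one principal angle is zero and the other equals the rotation angle, so the distance should come out close to π/6. Methods such as the geodesic flow kernel build on the same subspaces and angles to construct a kernel along the whole path.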

    Fig. 8 Batch normalization method.15

    Fig. 9 Research of BN.

    3.2. Adversarial-based method

    Adversarial training is inspired by the two-player game in game theory. The generator and the discriminator compete against each other to complete the adversarial training. The corresponding training process for generative adversarial nets (GAN) can be expressed as:
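The standard GAN value function is assumed here, written with the distributions P_m(x) and P_Z(z) used in the explanation that follows. The name V(D, G) is introduced for illustration:

```latex
\min_{G}\max_{D} V(D, G) =
\mathbb{E}_{x \sim P_m(x)}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim P_Z(z)}\!\left[\log\!\left(1 - D\!\left(G(z)\right)\right)\right]
```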

    where E_{x∼P_m(x)} denotes the expectation over the true data distribution, E_{z∼P_Z(z)} denotes the expectation over the noise distribution from which data are generated, D(x) is the discriminator output for true data, and D(G(z)) is the discriminator output for generated data. In this survey, the adversarial-based method based on GAN is further divided into two sub-methods: the generative adversarial DA (GADA) method, which has an additional generator, and the non-generative adversarial DA (non-GADA) method, which does not.

    GADA can learn a transformation in an unsupervised way based on GAN, and most research focuses on generating data that are similar to the target data while keeping the annotations. The coupled generative adversarial net, composed of two GANs, was developed to adjust the joint distribution of the source domain and the target domain. By imposing a parameter-sharing constraint on the two GANs, their capacity was restricted so that they generate data with the same distribution from noise [22]. Bousmalis et al. transformed the source data into the target data with a generator and used a discriminator to judge real from fake. When the discriminator could no longer tell them apart, the generated data followed the same distribution as the target data [23]. Volpi et al. proposed a new DA method with feature augmentation and domain invariance. First, a feature extractor (ES) with a softmax layer (C) was trained on the source data, and a generator was trained adversarially to perform data augmentation in the source feature space. Then, a domain-invariant feature extractor (E1) was trained adversarially together with the generated features. Finally, the proposed module can be combined with the previously trained softmax layer to perform testing on the source and target data, as shown in Fig. 11 [24]. Johnson et al. presented a novel theory for the GAN approach that does not depend on the traditional minimax formulation. A strong discriminator and a good generator were designed via composite functional gradient learning, so that the distance measure between domains improves after each functional gradient step until it converges to zero [25]. Generating target data with ground-truth annotations is one of the effective means of addressing the lack of training data.

    Fig. 10 Geodesic flow kernel method.17

    The aim of deep DA is to learn domain-invariant feature representations from the source data and the target data. Therefore, it is crucial to make these representations indistinguishable across domains. Inspired by GAN, the domain confusion loss produced by the discriminator is used without a generator to improve deep DA, which is called the non-GADA method. In non-GADA, the role of the generator changes: it no longer generates new samples but instead performs feature extraction. In this way, the generator is replaced by a feature extractor in many adversarial-based deep DA methods. In 2017, adversarial discriminative domain adaptation (ADDA) was proposed as a general framework, and many existing methods can be regarded as special cases of it. In this method, a classifier was first pre-trained on the source data. Then, the discriminator structure was used to project the target data into the source domain. Finally, the source classifier was used to classify the target data [26]. Shen et al. estimated the Wasserstein distance between the source and target domains and optimized the feature parameters via adversarial training to minimize it [27]. Long et al. introduced a discriminator for features and a discriminator for class information to align the joint distribution between domains [28]. Cao et al. designed an adversarial mechanism in which each discriminator is weighted and the weight is determined by the predicted labels of the target data; that is, the discriminators are weighted at the category level to select the source data suitable for the target data [29]. Pinheiro et al. combined a similarity-based classifier with adversarial training to jointly learn domain-invariant features and categorical prototype representations [30]. Luo et al. added a discriminator after the classifier to adaptively weight the adversarial loss of different features, which emphasized the importance of class-level feature alignment in reducing the domain shift [31]. Zhang et al. proposed domain-symmetric networks, which add a classifier based on shared neurons to learn both domain discrimination and domain confusion [32]. Lee et al. used adversarial dropout to learn strongly discriminative features and designed the loss function to realize robust DA [33]. Shu et al. proposed a novel DA method based on curriculum learning and adversarial training to select noiseless and transferable source data, which enhanced positive transfer [34]. Yu et al. extended the concept of DDA to GAN and proposed a dynamic adaptive network that addresses dynamic joint distribution adaptation (JDA) via two discriminators [35]. Xue et al. proposed a novel deep adversarial mutual learning method, in which two groups of domain discriminators learn from each other to obtain domain-invariant properties [36]. Shin et al. presented a new two-phase pseudo-label densification method that uses intrinsic spatial correlations in the first phase and confidence-based easy-hard classification in the second phase. On this basis, feature alignment was achieved via adversarial learning [37]. Hu et al. proposed a discriminative partial domain adversarial framework (DPDAN), which uses hard binary weights to maximize the distribution divergence between the source-negative data and all other data and to minimize the domain shift between the target data and the source-positive data [38], as shown in Fig. 12. Wu et al. proposed a dual mixup regularized learning network, which includes category mixup regularization and domain mixup regularization to enforce prediction consistency and explore more intrinsic features for better DA [39]. Bao et al. presented a two-level approach, in which the first level uses MMD to reduce the distribution discrepancy of deep features between domains and the second level brings the deep features closer to their category centers via a domain adversarial network [40].
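The two competing objectives in a non-generative adversarial setup can be written down in a few lines of NumPy. The sketch below only evaluates the losses for a fixed, hand-chosen discriminator; it does not train anything. The inverted-label form of the feature-extractor loss follows the common ADDA-style formulation, and the one-dimensional features, the discriminator weights, and all function names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(p_source, p_target, eps=1e-12):
    """Binary cross-entropy with source features labelled 1 and target features labelled 0."""
    return float(-np.mean(np.log(p_source + eps))
                 - np.mean(np.log(1.0 - p_target + eps)))

def confusion_loss(p_target, eps=1e-12):
    """Feature-extractor loss with inverted labels: target features should look like source."""
    return float(-np.mean(np.log(p_target + eps)))

rng = np.random.default_rng(0)
source_feat = rng.normal(2.0, 0.5, size=200)       # hypothetical source features
target_unaligned = rng.normal(-2.0, 0.5, size=200) # target features before adaptation
target_aligned = rng.normal(2.0, 0.5, size=200)    # target features after ideal adaptation

def discriminator(features, w=1.0, b=0.0):
    """Hand-chosen linear discriminator returning P(feature comes from the source)."""
    return sigmoid(w * features + b)

p_s = discriminator(source_feat)
for name, feat in [("unaligned", target_unaligned), ("aligned", target_aligned)]:
    p_t = discriminator(feat)
    print(name, "D loss:", discriminator_loss(p_s, p_t), "confusion loss:", confusion_loss(p_t))
```

When the target features are aligned with the source features, the confusion loss falls while the discriminator loss rises, which is the tension that adversarial training exploits. A fully confused discriminator that outputs 0.5 everywhere gives a discriminator loss of 2 ln 2.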

    Compared with other generative models, GADA uses a random distribution to generate samples directly, so that in theory it can truly approach the target data, which is the biggest advantage of GAN. However, mode collapse may occur during GAN training, in which the generator always produces the same samples and cannot continue learning. When the generator collapses, the discriminator also pushes similar samples in the same direction, and the training cannot continue. Non-GADA does not rely on a generator to produce identically distributed samples but realizes domain alignment through adversarial training with the discriminator, which can maintain high sample diversity. However, adversarial training needs to reach a Nash equilibrium, which cannot always be achieved, so the training is unstable compared with traditional training methods.

    3.3. Reconstruction-based method

    In much deep DA research, a data reconstruction task based on the auto-encoder structure and GAN has been proposed to preserve the invariant features while keeping the individual features of each domain during the transfer. In the basic auto-encoder framework, an encoding and decoding process is designed that first encodes the data into feature representations and then decodes these representations back into a reconstructed version. The reconstruction-based DA method usually shares the encoder to learn domain-invariant representations and keeps the domain-specific representations by minimizing the reconstruction loss for the source and target domains.

    Fig. 11 Detailed framework of deep DA method combined GAN.24

    Fig. 12 Detailed framework of DPDAN.38

    The reconstruction-based auto-encoder aims to reconstruct the input information; it can extract good feature descriptions and has strong feature learning ability. The auto-encoder was combined with KL divergence to reconstruct data, which achieves better domain alignment [41]. On this basis, Bousmalis et al. proposed domain separation networks (DSN). The source and the target domain were each divided into two parts (a public part and a private part) via feature reconstruction and a domain distance measure. The public part learns the domain-invariant features, while the private part maintains the independent features of each domain, as shown in Fig. 13 [42]. Ghifary et al. designed two networks: a classification model for the source data and a reconstruction network for the target data. The two networks were trained at the same time, encoding the data with a shared network to learn the common features [43]. By separating the space in such a way, the shared features are not influenced by the individual features of each domain, so that better DA ability can be achieved. To realize knowledge transfer between two domains with large distribution differences, a selective learning algorithm was built, which defines an intermediate domain to carry the information flow from the source domain to the target domain through reconstruction, with a regularization term added to control the data selection for transfer [44]. Furthermore, a stacked local constraint auto-encoder method was used to learn domain-invariant features via back propagation and a low-dimensional manifold. To measure the importance of each neuron in aligning the domain differences, the method calculates the weighted sum of neighboring data on the defined manifold [45]. To further guarantee the separation effect and promote the completeness and uniqueness of the learned features, a hybrid generative network comprising encoder, decoder, classifier, and separation modules has attracted wide attention, in which the separation module learns the units that are irrelevant to classification [46]. In summary, the reconstruction technique used in DA can be seen as an auxiliary task that separates the shared and private characteristics of the domains so that only the shared characteristics are transferred, which avoids the negative transfer caused by private characteristics. However, because the decoder is trained by minimizing the reconstruction error rather than by fooling a discriminator as in GAN, it may be harder to generate sharp new images. Combining the auto-encoder with other models (such as GAN and dictionary models) can achieve better performance.
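The separation of shared and private representations can be illustrated with the loss terms that a DSN-style model combines. The sketch below is an assumption-laden simplification: it uses a plain mean squared error for reconstruction, a soft orthogonality penalty for the shared/private separation, random arrays in place of learned codes, and hypothetical weights alpha, beta, and gamma.

```python
import numpy as np

def difference_loss(shared, private):
    """Soft orthogonality penalty: squared Frobenius norm of shared^T private."""
    return float(np.sum((shared.T @ private) ** 2))

def reconstruction_loss(x, x_hat):
    """Mean squared reconstruction error (a simplification chosen for this sketch)."""
    return float(np.mean((x - x_hat) ** 2))

def total_objective(task, recon, diff, sim, alpha=0.1, beta=0.05, gamma=0.25):
    """Weighted sum of task, reconstruction, difference, and similarity terms (weights hypothetical)."""
    return task + alpha * recon + beta * diff + gamma * sim

rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 3))                 # stand-in for shared (public) codes

# A private code that overlaps heavily with the shared code.
private_overlap = shared + 0.1 * rng.normal(size=(100, 3))

# A private code with the shared directions projected out.
raw = rng.normal(size=(100, 3))
coef = np.linalg.lstsq(shared, raw, rcond=None)[0]
private_orthogonal = raw - shared @ coef

print("difference loss, overlapping:", difference_loss(shared, private_overlap))
print("difference loss, orthogonal :", difference_loss(shared, private_orthogonal))
```

The penalty is large when the private code repeats information already present in the shared code and close to zero when the two codes are orthogonal, which is how the shared features are kept free of domain-specific content.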

    Fig. 13 Detailed framework of DSN.42

    Owing to the respective advantages of the reconstruction technique and GAN, deep DA methods that combine the two have attracted the attention of researchers. In these methods, dual-learning mechanisms, one for the primal task and the other for the dual task, teach each other via reinforcement learning and the reconstruction error [47]. Inspired by dual learning, an adversarial loss was learned to bring the distributions of the domains closer, while an inverse mapping was defined to reconstruct the domain: the generated images are mapped back to the source domain under a cycle consistency loss to learn the mapping between the input and the output in the aligned dataset, as shown in Fig. 14 [48]. Dual learning was further improved to minimize the losses of the two GANs together with the reconstruction errors to complete the transformation between domains, which handles the case in which neither the source domain nor the target domain has labels [49]. To ensure a one-to-one mapping, a combination of a standard GAN and a GAN with reconstruction loss was developed to learn the relation between domains [50]. Although the reconstruction-based GAN method combines the advantages of the two algorithms, it also has some problems. The reconstruction error improves the authenticity of the generated samples but also greatly reduces their diversity, which creates a trade-off. In addition, the complex structure makes parameter updating difficult.
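The cycle consistency loss mentioned above can be sketched with toy linear mappings in NumPy. The forward mapping G, the two candidate inverse mappings, the L1 form of the loss, and the random data are all assumptions for illustration; in practice both mappings are neural networks trained jointly with the adversarial losses.

```python
import numpy as np

def cycle_consistency_loss(x, y, g, f):
    """Mean absolute error of F(G(x)) against x plus that of G(F(y)) against y."""
    forward = np.mean(np.abs(f(g(x)) - x))
    backward = np.mean(np.abs(g(f(y)) - y))
    return float(forward + backward)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4)) + 3.0 * np.eye(4)   # toy source-to-target map
a_inv = np.linalg.inv(a)

def g(x):
    """Toy mapping from the source domain to the target domain."""
    return x @ a

def f_good(y):
    """Inverse mapping that exactly undoes g."""
    return y @ a_inv

def f_bad(y):
    """Mapping that is not the inverse of g."""
    return y @ a.T

x = rng.normal(size=(50, 4))   # synthetic source samples
y = rng.normal(size=(50, 4))   # synthetic target samples

print("cycle loss with true inverse:", cycle_consistency_loss(x, y, g, f_good))
print("cycle loss with wrong map   :", cycle_consistency_loss(x, y, g, f_bad))
```

The loss is near zero only when the two mappings invert each other, which is the constraint that pushes the learned translation toward a one-to-one correspondence.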

    4. Research and application of deep DA in fault diagnosis

    Thus far, previous studies can be divided into two stages according to the diagnosis framework. In the first stage, the diagnosis framework mainly includes data collection, feature extraction, and fault diagnosis (Fig. 15(a)). Under this framework, much effort has been devoted to the extraction and selection of handcrafted features, which relies on a wide range of expertise.51,52 In addition, the designed features are usually aimed at specific tasks, so their adaptability is limited when the diagnostic problem is new or the physical characteristics differ from those of the original system. Furthermore, the final diagnosis performance is usually sensitive to the model parameters, which means that an additional parameter optimization process is needed.53,54 To solve these problems, a diagnosis framework with automatic feature learning based on DL was developed in the second stage, which provides an end-to-end learning process from the input data to the output diagnosis label, as shown in Fig. 15(b).55 DL shows strong feature learning and fitting ability, which has significantly promoted the application of artificial intelligence in fault diagnosis. However, domain differences are ignored in most of the developed methods. In actual industrial diagnosis tasks, fault data are limited, and the training data usually come from experimental environments or from the historical data of related equipment. Because of the complex application scenarios of the equipment, real-time data may follow different feature distributions. Therefore, research on cross-domain fault diagnosis has important practical significance.

    In recent years, deep DA has shown wide application prospects, and its current development was comprehensively reviewed in Section 3. Related algorithms, such as discrepancy-based, adversarial-based, and reconstruction-based methods, have gradually been designed for image detection tasks. However, in the field of intelligent fault diagnosis, deep DA has rarely been considered as a way to enhance the applicability and flexibility of the diagnosis framework for tasks in different fields,56-58 as shown in Fig. 15(c). In the following, the research advances of deep DA for fault detection of rotating machinery components are described in detail and divided into three categories.

    Fig. 14 Detailed framework of reconfiguration technology-based GAN.48

    Fig. 15 Intelligent fault diagnosis framework. (a) Stage I,51,52 (b) Stage II55 and (c) New one.56

    4.1. Discrepancy-based method for fault diagnosis

    In the actual working environment of rotating machinery, it is difficult to obtain usable diagnosis information, which creates a bottleneck in diagnosis. Therefore, many researchers have carried out exploratory studies on fault diagnosis of rotating machinery based on deep DA methods. The deep DA method used in DL measures the difference between the source domain and the target domain on the corresponding feature layers of two networks. The measurement has two goals. The first is to measure the similarity of the two domains, telling us not only qualitatively whether they are similar but also quantitatively how similar they are. The second is to increase the similarity between the two domains, based on this measurement, via the chosen machine learning method so as to complete the transfer learning. Finding the similarity of the two domains is therefore the core, and the next step is how to measure and use it.

    Because kernel function mapping is introduced into MMD to improve its computational and memory efficiency, MMD has become one of the most commonly used transformation methods for cross-domain fault diagnosis. Single-layer MMD in deep fault diagnosis has achieved a certain degree of cross-domain feature extraction.59-65 As shown in Fig. 16, MMD was introduced into the adaptation layer after the feature extraction layers to measure the feature distribution difference between the source and target domains. This difference was then added to the network loss for training, so that the network is trained on its original loss plus the MMD term.
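
    The idea of adding an MMD term to the network loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited work: it assumes a single Gaussian kernel with bandwidth `sigma`, the biased MMD estimator, and a hypothetical trade-off weight `lam`; the feature vectors are made-up numbers.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_kernel(a_set, b_set):
        total = sum(gaussian_kernel(a, b, sigma) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def total_loss(network_loss, source_feats, target_feats, lam=1.0):
    """Original network loss plus a weighted MMD penalty (lam is an assumed weight)."""
    return network_loss + lam * mmd_squared(source_feats, target_feats)


# Hypothetical adaptation-layer features for three samples per domain.
source = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
near_target = [[0.1, 0.0], [0.0, 0.2], [-0.2, 0.1]]   # similar distribution
far_target = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]     # shifted distribution

print(mmd_squared(source, near_target))  # small value
print(mmd_squared(source, far_target))   # much larger value
```

    In a real network the features would come from the adaptation layer and the penalty would be minimized jointly with the classification loss by backpropagation; the pure-Python version above only shows how the quantity is computed.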

    Fig. 17 shows the diagnosis accuracy of DL and the proposed deep DA trained with different data samples. All trials of deep DA achieve an accuracy greater than 95%, and the average accuracy is 97.96%, which is higher than that of traditional DL.62 Further, some variant methods based on MMD have been developed and applied to fault diagnosis. Li et al. utilized CMD to build a DA CNN for fault testing.66 Zhang et al. applied MMD to sparse filtering by mapping the source and target data into a kernel space to obtain more transferable features. To reduce the domain discrepancy, the L1-norm and L2-norm were applied to MMD to obtain a sparse distribution of the domains.67 Zhang et al. combined the maximum variance discrepancy with MMD for feature matching.68 Wang et al. evaluated the distribution distance of the corresponding domains in a high-dimensional kernel space.69 Deng et al. developed an order spectrum transfer algorithm to transform the target data to the source domain according to the fault characteristic orders.70 Coral is used in deep neural networks in a similar way to MMD, with the MMD layer replaced by coral. It is more powerful than DDC (which only aligns sample means) and much simpler to optimize than DAN.11 Si et al. matched the second-order moments and the high-order moments at the same time to achieve a new intelligent fault diagnosis approach for bearings, which combined MMD and coral.71 Xu et al. modified a second-order statistics fusion network based on coral to learn deep nonlinear domain-invariant features.72 On this basis, the high-order moments of the domain-specific distributions73 were proposed after the fully connected layers to achieve bearing fault diagnosis. However, adapting only one layer of the network leads to low transferability across multiple layers. 
By extending single-layer DA to multi-layer DA, An et al.,74 Zhu et al.75 and Che et al.76 proposed models based on multi-layer MK-MMD for defect diagnosis, which realize domain alignment in the last two feature extraction layers of the network. Zhang et al.77 and Li et al.78 developed intelligent fault diagnosis approaches for bearings that reduce the distribution discrepancy of the learned transferable features based on multi-layer MMD. Zhang et al. designed a two-stage method to obtain strong fault-discriminative and domain-invariant performance, which pre-trains the feature extractor on the labeled source data and fine-tunes it on the target data. Li et al. developed different kernel MMDs to construct multiple deep transfer networks.79 Xiong et al. adopted CMD to minimize the domain discrepancy at each dense block inside the network.80 Most existing DA methods based on MMD are applicable to vectors only. To adapt the source and target tensor representations directly, a set of alignment matrices was introduced, without vectorization, to align the tensor representations of the two domains into an invariant tensor subspace.81 All the above methods perform well in their respective domain scenarios, but it is not clear which layer or layers of a deep network need DA. Hoffman et al. observed that the first several layers are independent of classification and the last layers are independent of domain, so the first several layers were used to extract the source and target features, which were then fused in the later layers for classification.82 All the above studies assume that the conditional distributions of the source domain and the target domain are the same and that the marginal distributions are different. 
That is, only the distance between the hidden-layer features of the source domain and the target domain is considered. Sometimes it is better to measure the difference between the joint distributions, because the source classifier cannot be applied to the target data directly. To make the approach more general, a joint adaptation network combined with MMD was proposed,13 which minimizes the joint distribution difference by simultaneously learning from class and feature invariants between domains. The corresponding loss function compares the joint distributions of the source and target domains.
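
    In the JAN-style formulation, this comparison is usually expressed as a squared distance between the two joint-distribution embeddings. The display below is a sketch in the notation of this section; the name $D_L$ and the per-layer reproducing kernel Hilbert spaces $\mathcal{H}^{l}$ in the norm subscript are assumptions:

```latex
D_{L}(R_S, R_T) =
  \left\lVert
    C_{X^{S,1:|L|}}(R_S) - C_{X^{T,1:|L|}}(R_T)
  \right\rVert^{2}_{\bigotimes_{l=1}^{|L|} \mathcal{H}^{l}}
```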

    Fig. 16 Detailed framework of DL with MMD.62

    Fig. 17 Comparison results of DL with MMD.62

    In this loss function, $C_{X^{S,1:|L|}}(R_S)$ is the joint distribution of the source domain computed via the tensor product of feature spaces, and $C_{X^{T,1:|L|}}(R_T)$ is the joint distribution of the target domain. Pseudo label learning is one of the most commonly used methods to adjust the conditional distribution, and it is usually combined with statistical transformation to adjust the joint distribution between domains. For example, MMD and pseudo labels were used to adjust the marginal and conditional distributions to achieve multi-representation adaptation;83,84 pseudo label learning combined with a similarity guide constraint can reduce the distance between similar images and increase the distance between dissimilar images;85 and pseudo label learning combined with a domain discriminative loss based on coral can align the joint distribution in a typical two-stream network framework for DA (J-DCDA), as shown in Fig. 18.86 On this basis, Han et al. developed a deep transfer network based on a joint distribution adaptation (JDA) approach, which showed smooth convergence and avoided negative adaptation in comparison with marginal distribution adaptation. Compared with other methods on different detection tasks for rotating machinery, deep JDA achieves the highest accuracy, as shown in Fig. 19.56 Tong et al. realized JDA via MMD and pseudo test labels obtained from a nearest-neighbor classifier in the feature space, which extracted transferable features for testing under different environments.87,88 Yang et al. imposed regularization terms of multi-layer DA and pseudo label learning on the parameter set of a domain-shared CNN, which can learn transferable and discriminative features.89 Qian et al. developed an improved JDA, which conducted dimension reduction on high-dimensional inputs and used softmax regression to generate more accurate pseudo labels.90 Further, MK-MMD combined with pseudo test labels has been developed.91 Wang et al. 
aligned, via MMD, the conditional distributions of the multi-scale high-level features extracted by a multi-scale feature extractor.92 Wu et al. introduced the grey wolf optimization algorithm to adaptively learn the key parameters of joint distribution adaptation.93
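
    The role of pseudo labels in conditional distribution alignment can be sketched as follows. This is an illustrative simplification, not the method of any cited paper: it assumes a 1-nearest-neighbor rule for the pseudo labels and replaces the kernel MMD with the squared distance between class means (the linear-kernel special case). The class names and feature values are hypothetical.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_neighbor_pseudo_labels(source_feats, source_labels, target_feats):
    """Give each unlabeled target sample the label of its closest source sample."""
    pseudo = []
    for t in target_feats:
        best = min(range(len(source_feats)),
                   key=lambda i: squared_distance(source_feats[i], t))
        pseudo.append(source_labels[best])
    return pseudo


def mean_vector(rows):
    """Component-wise mean of a list of equally long vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def conditional_discrepancy(source_feats, source_labels, target_feats, pseudo_labels):
    """Sum over classes of the squared distance between source and target class means."""
    total = 0.0
    for c in sorted(set(source_labels)):
        src_c = [f for f, y in zip(source_feats, source_labels) if y == c]
        tgt_c = [f for f, y in zip(target_feats, pseudo_labels) if y == c]
        if not tgt_c:  # no target sample is currently assigned to this class
            continue
        total += squared_distance(mean_vector(src_c), mean_vector(tgt_c))
    return total


# Hypothetical 2-D features: two labeled source classes, two unlabeled target samples.
source_feats = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]]
source_labels = ["normal", "normal", "inner_race", "inner_race"]
target_feats = [[0.5, 0.4], [3.6, 3.5]]

pseudo = nearest_neighbor_pseudo_labels(source_feats, source_labels, target_feats)
print(pseudo)
print(conditional_discrepancy(source_feats, source_labels, target_feats, pseudo))
```

    In the iterative schemes described above, the pseudo labels would be recomputed after each update of the feature extractor, so that the class-wise discrepancy and the labels refine each other.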

    A series of joint distribution adaptation approaches have been developed according to the characteristics of the data distributions, since different distribution differences lead to different combination methods. In practical applications, the marginal and conditional distributions often contribute in different proportions depending on the data distribution characteristics of the domains, and they are not equally important in some tasks. To quantitatively evaluate the domain differences, a balance factor is employed to weigh the two distributions, and it needs to be set according to the data distribution.94 On this basis, a multi-kernel dynamic distribution adaptation method (MDDA) based on DL was proposed to achieve cross-machine fault diagnosis, which defined a mixed kernel function to map different domains to a unified space and dynamically evaluated the importance of the marginal and conditional distributions,95 as shown in Fig. 20. Moreover, for bearing fault diagnosis, MDDA solves the problem that DDA needs to obtain some cross-characteristics of different domains in advance. Fig. 21 shows the diagnosis results of different methods under different levels of additional noise, and MDDA shows the best test performance. The quantitative estimation of the marginal and conditional distributions is of great significance to the study of DA, and applying different probability distribution adaptation methods to DL is a trend. However, because DL has multiple feature layers, it is not clear which layer needs domain adaptation. At the same time, an inappropriate distance metric or transformation method will produce a poor transfer result. The details of the statistic transformation method in rotating machinery are summarized in Table 3.56,62,66-81,87-95
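
    The balance factor mentioned above is commonly written in the following form. This is a sketch with assumed notation: $\mu \in [0, 1]$ is the balance factor, $P$ and $Q$ denote the marginal and conditional distributions, $D$ is a distribution distance such as MMD, and $C$ is the number of classes.

```latex
D(\Omega_S, \Omega_T) =
    (1 - \mu)\, D(P_S, P_T)
  + \mu \sum_{c=1}^{C} D^{(c)}(Q_S, Q_T)
```

    With $\mu \to 0$ only the marginal distributions are aligned, and with $\mu \to 1$ only the class-conditional distributions are aligned; dynamic variants estimate $\mu$ from the data instead of fixing it by hand.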

    Fig. 18 Detailed framework of J-DCDA method.86

    Fig. 19 Detailed framework and comparison results of DL with JDA.56

    Fig. 20 Detailed framework of MDDA.95

    Fig. 21 Comparison results of MDDA.95

    The BN method has performed well in fault diagnosis. As shown in Fig. 22,96 for bearing fault detection, BN was employed after the feature layers (WDCNN) to align the mean and variance of the two domains, which achieved high-performance testing under different conditions. Fig. 23 shows the comparison results of different methods under six domain shifts. FFT-SVM performs poorly in domain adaptation, while MLP and DNN perform better, both achieving roughly 80% accuracy. In contrast, the proposed WDCNN method is much more accurate than the compared algorithms, achieving 90.0% accuracy on average and demonstrating the domain invariance provided by BN. BN used in a stacked autoencoder and DNN has also achieved high-precision bearing fault diagnosis by fine-tuning and modifying the network.97 Further, Hu et al. used an adaptive BN based on the exponential moving average instead of the common BN to self-adapt to the traits of different domains.98 Wu et al. proposed an adaptive logarithm normalization to preprocess the data distribution.99 The structure of the BN method is simple: it only adjusts the statistics of the BN layer and has no additional parameters to tune. However, this kind of global information will weaken the particularity of each domain in some cases, and some neurons are not effective for all domains because of domain biases. It is useless for such neurons to capture other features and clutter. 
In view of this situation, Zhang et al. designed a CNN with kernel dropout for fault diagnosis of bearings under different operating conditions.100 Shen et al. selected valid source channels via the cross Minkowski distance and fused the target channels via a two-order selective ensemble.101 Further, the weight coefficient optimization method has been widely studied because it pays more attention to shared features and does not completely discard feature information. Fine-tuning part of the weights can interleave the DL layers in the adaptor with those in the base to improve the domain perceptual sensitivity.102 The weight coefficient method was combined with statistical transformation to build an attention-aware weighted distance method103 and an adaptive weighted complement entropy104 to learn the discriminant features in different domains, which encouraged incorrect classes to receive uniform and low scores in the process of DA. A novel fault diagnosis network based on a modified TrAdaBoost method for weight coefficient optimization was used to test small target data under different operating conditions and fault types, and it achieved higher accuracy.105 By combining the swarm optimization method and L2 regularization to optimize the weights, the DL network ensured diagnosis stability under different conditions.106 The detection accuracies are above 90%, much higher than those of the traditional methods. However, the algorithm also has shortcomings. If the source samples are noisy at the beginning and the number of iterations is not well controlled, training the classifier becomes more difficult. Overall, the BN method and the weight coefficient optimization method have performed well in fault diagnosis. 
When the source data and the target data have many similarities, the weight coefficient optimization method can achieve good results. It adjusts the architecture of the DL model to improve its ability to learn more shared features, which resolves the ambiguity between shared features and private features but does not completely eliminate the influence of the private part. The details of the structure optimization method in rotating machinery are listed in Table 4.96-101,105,106
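
    The BN-based adaptation described above (adjusting only the BN layer statistics) can be sketched as follows. This is a minimal illustration under stated assumptions: statistics are recomputed from one target batch rather than tracked by an exponential moving average, `gamma`, `beta` and `eps` are the usual BN parameters with assumed default values, and the activation values are made up.

```python
import math


def batch_stats(features):
    """Per-channel mean and (biased) variance over a batch of feature vectors."""
    n = len(features)
    columns = list(zip(*features))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n
                 for col, m in zip(columns, means)]
    return means, variances


def batch_norm(features, means, variances, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each channel with the given statistics, then scale and shift."""
    return [[gamma * (v - m) / math.sqrt(var + eps) + beta
             for v, m, var in zip(row, means, variances)]
            for row in features]


# Hypothetical activations: the target batch is the source batch under a shifted operating point.
source_batch = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
target_batch = [[11.0, 110.0], [12.0, 112.0], [13.0, 114.0]]

src_mean, src_var = batch_stats(source_batch)
tgt_mean, tgt_var = batch_stats(target_batch)

source_out = batch_norm(source_batch, src_mean, src_var)
naive = batch_norm(target_batch, src_mean, src_var)      # source statistics reused on target
adapted = batch_norm(target_batch, tgt_mean, tgt_var)    # statistics replaced by target ones

print(naive[0])     # far from the range seen during training
print(adapted[0])   # matches the corresponding source activation
```

    No weights are changed in this step, which is why the text describes the approach as having no additional parameters to tune.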

    The Grassmann manifold and the geodesic method are widely used in geometric transformation methods. For example, Rui et al. extended the sampling geodesic flow via smooth polynomial functions described by splines on the Grassmann manifold.107 Shaw et al. used the Grassmannian manifold to exploit the parameter space structure for DA via subspace estimation.108 Li et al. minimized the domain difference while keeping the discriminative information, finding new representations in a common subspace via the conjugate gradient method on the Grassmann manifold.109 Thopalli et al. presented a multiple subspace alignment method, which used low-dimensional subspaces to represent the datasets and exploited the natural geometry of the subspaces on the Grassmann manifold to align domains.110 Hua et al. proposed a new progressive data augmentation method for DA, which generated a series of intermediate virtual domains via interpolation to achieve multiple subspace alignment.111 Inspired by the intermediate representations, the source and target data of rotating machinery are projected into intermediate subspaces along the shortest geodesic path connecting the two d-dimensional subspaces on the Grassmann manifold, which reduces the domain differences via the intermediate subspaces connecting the source domain and the target domain.57,58,112 To address the fact that normal data far outnumber fault data in actual diagnosis, a DA model based on the geodesic flow kernel was further improved to realize fault diagnosis when there is only normal data in the target domain.58 To make the model more adaptable to the target domain, a cross-domain stacked denoising autoencoder network was built, which introduced MMD and manifold regularized fine-tuning to perform cross-domain and cross-task fault diagnosis. Fig. 25 shows the framework, and Fig. 24 shows the corresponding results, in which the method performs well on both gear detection and rolling bearing detection under different tasks.113 To eliminate feature distortions during distribution alignment, an adaptive factor based on the A-distance was proposed to dynamically adjust the influence of the conditional and marginal distributions, and manifold embedded distribution alignment based on this adaptive factor was then applied in a new transferable fault diagnosis method to obtain the transformed representations.114 These approaches take into account the specific geometric structures of the source and target domains to describe the nature of the data, which effectively reduces the domain differences, although they face a major bottleneck in their high computational complexity. The details of the geometric transformation method in rotating machinery are listed in Table 5.58,113,114
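
    The notion of intermediate subspaces along a geodesic can be illustrated in the simplest possible setting. The sketch below is restricted to one-dimensional subspaces (d = 1), where the Grassmann geodesic reduces to rotating one unit direction into the other through the principal angle; the general d-dimensional case needs a matrix decomposition and is not shown. The function names and the example vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def geodesic_direction(u_src, u_tgt, t):
    """Unit vector spanning the intermediate 1-D subspace at position t in [0, 1]."""
    u = normalize(u_src)
    v = normalize(u_tgt)
    c = dot(u, v)
    if c < 0:                       # v and -v span the same line; use the closer one
        v = [-x for x in v]
        c = -c
    theta = math.acos(min(1.0, c))  # principal angle between the two subspaces
    if theta < 1e-12:               # subspaces coincide, nothing to interpolate
        return u
    # Unit vector orthogonal to u inside the plane spanned by u and v.
    w = normalize([b - c * a for a, b in zip(u, v)])
    return [math.cos(t * theta) * a + math.sin(t * theta) * b for a, b in zip(u, w)]


def project(features, direction):
    """Coordinates of each feature vector in the 1-D subspace."""
    return [dot(f, direction) for f in features]


# Hypothetical principal directions of the source and target data.
source_dir = [1.0, 0.0]
target_dir = [0.0, 1.0]
samples = [[2.0, 1.0], [0.5, 3.0]]

for t in (0.0, 0.5, 1.0):
    direction = geodesic_direction(source_dir, target_dir, t)
    print(t, direction, project(samples, direction))
```

    Concatenating or integrating the projections over many values of `t` is the intuition behind sampling geodesic flow and the geodesic flow kernel, respectively.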

    Table 3 Application of statistic transformation method for rotating machinery.

    Fig. 22 Detailed framework of DL with BN.96

    Fig. 23 Comparison results of DL with BN.96

    It can be concluded from Table 3 (Refs. 56,62,66-81,87-95), Table 4 (Refs. 96-101,105,106), and Table 5 (Refs. 58,113,114) that the discrepancy-based method has been widely used to detect unlabeled fault data in practical applications, and the comparison results show satisfactory performance. However, the marginal distribution obtained by the hidden layers may be destroyed by the nonlinear mapping, which weakens the DA performance. Secondly, although most existing deep DA methods achieve domain adaptation by adjusting the marginal distribution, their domain adaptation ability is still insufficient. Matching both the marginal distribution and the conditional distribution helps to achieve better domain adaptation. When there is no labeled data in the target domain, pseudo label learning is used to align the conditional distribution. This method needs to update the network parameters iteratively to obtain satisfactory performance, which increases the training time. In summary, understanding the distribution discrepancy characteristics of different domains is very important for achieving high accuracy and efficiency in fault diagnosis. Future research on the discrepancy-based method will continue along several lines. For the statistical transformation method, effectively finding an appropriate difference measure and transformation function will be the key research area. For the structure optimization method, it is necessary to eliminate the influence of the private part while keeping the feature information intact. For the geometric transformation method, the key is to further normalize the intermediate step size and the transformation path. In addition, joint distribution adaptation is often used to improve the performance of DA, but it lacks a systematic mechanism for evaluating the marginal and conditional distributions. How to achieve high-performance DA with an appropriate discrepancy-based method while ensuring that both the marginal and conditional distributions contribute properly is also an important research task.

    Table 4 Application of structure optimization method for rotating machinery.

    Fig. 24 Results of DL with manifold learning.113

    4.2. Adversarial-based method for fault diagnosis

    Based on the theory of adversarial training, the generative adversarial net (GAN) was proposed, which consists of two parts. One is the generator (G), which is responsible for generating samples that are as realistic as possible. The other is the discriminator (D), which judges whether the generated samples are real. The generated samples are encouraged to imitate true samples so that the discriminator cannot distinguish the differences between the two domains.
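
    For reference, the competition between G and D described above is usually stated as the standard GAN minimax objective, where $p_{data}$ is the distribution of real samples and $p_z$ is the noise prior fed to the generator (both symbols are notation assumed here):

```latex
\min_{G} \max_{D} V(D, G) =
    \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

    The discriminator maximizes $V$ by scoring real samples high and generated samples low, while the generator minimizes it by producing samples that the discriminator scores as real.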

    Table 5 Application of geometric transformation method for rotating machinery.

    Fig. 25 Structure of DL with manifold learning.113

    Fig. 26 Two-stage model based on GADA.115

    Fig. 27 Comparison results of two-stage model based on GADA.115

    Fig. 28 Detailed framework of WD-DTL.121

    The adversarial-based method uses GAN to align the domain distributions and has been successfully used in mechanical fault diagnosis because of its strong flexibility and robustness. For the GADA method in fault diagnosis, a two-stage model training approach was proposed, which generated data for different classes in the target domain via different generators in the first stage and trained the cross-domain classifier in the second stage (Fig. 26).115 Fig. 27 shows the comparison results of different methods under four cross-domain fault diagnosis tasks. The proposed method achieves the best testing accuracy under all tasks, with all results above 92%, which demonstrates the superiority of the GADA fault diagnosis framework. To avoid losing information in GADA, a cycle-consistent GAN was further designed for fault diagnosis DA, which generated new data based on known conditions via two circular mappings in the GAN and pre-trained a classifier for testing raw fault diagnosis data under various conditions.116 To solve the data imbalance problem in rolling bearing fault diagnosis, Li et al. proposed a unified framework incorporating a predictive generative denoising autoencoder and a deep coral network, in which the generative model is used to generate extra fault data and the deep coral network is used for fault recognition via correlation alignment.117 To fully utilize the information in the labeled source data, Wu et al. designed a BN long short-term memory model to learn the mapping between the two domains and generate auxiliary data; a transfer maximum classifier discrepancy method was then applied via adversarial training to align the probability distributions of the generated auxiliary data and the unlabeled target data.118

    Fig. 29 Comparison results of WD-DTL.121

    For the non-GADA method, the feature extraction part of the network can be regarded as a generator. In transfer learning, the process of generating samples can sometimes be avoided because a source domain and a target domain already exist: the data in one of the domains (usually the target data) can be seen as the generated samples. As one of the commonly used unsupervised deep DA algorithms, the non-GADA method has been integrated into a unified framework for fault diagnosis119 and has been developed further by using a domain discriminator to encourage feature extractors to learn domain-independent features.120 Based on the idea of adversarial training, a new Wasserstein distance-based deep transfer learning network (WD-DTL) for fault diagnosis tasks was proposed, as shown in Fig. 28, which learned domain features with the feature extractor and minimized the distribution distance between domains via adversarial training. Fig. 29 shows the comparison results of different methods under different fault diagnosis tasks.121 The testing accuracy of WD-DTL increases from 59.47% to 64% as the number of samples increases, and the accuracies of WD-DTL are all higher than those of DAN and CNN. However, the results also reveal a limitation of WD-DTL: the improvement is limited when the domain differences are large. Further, Li et al. used an adversarial training scheme to realize marginal domain fusion for different bearing working conditions.122,123 Guo et al. built a domain classifier and a distribution discrepancy metric based on MMD to learn domain-invariant features.124 Jiao et al. presented a double-level adversarial DA for fault diagnosis, which achieved domain-level alignment via adversarial training between the feature extractor and the domain discriminator, and class-level alignment via adversarial training based on the Wasserstein discrepancy between the feature extractor and two classifiers.125 Jiao et al. 
further designed a one-dimensional residual network for adaptive feature learning, which used JMMD and an adversarial discriminator to eliminate the differences in the conditional and marginal distributions.126 To further realize class-level alignment between domains, Li et al. created an adversarial multi-classifier method for fault diagnosis, which exploited the overfitting phenomena of different classifiers via adversarial training to extract domain-invariant features and build cross-domain classifiers.127 To obtain more valid data for conditional distribution alignment, Zhang et al. proposed a statistical distribution recalibration method of soft labels (SDRS); SDRS and a center distance metric were then used to reduce the distribution differences of fault data via adversarial learning.128 To reduce negative transfer, Li et al. proposed a class-weighted adversarial network that attaches class-level weights to the source domain to encourage positive transfer of the shared features and ignore the source outliers.129 To improve domain generalization, Li et al. used a domain augmentation method to expand the available data, which used domain adversarial training and distance metric learning to learn generalized features of different domains and enhanced robustness by scaling the temporal vibration data horizontally.130,131 To enhance the adaptability of auxiliary data, Li et al. used the joint distribution of labeled auxiliary data and unlabeled target data via domain-adversarial training, which improved the performance of transfer and classification.132 To solve the problems of high training cost and low testing accuracy of traditional deep DA methods, such as DDC and RevGrad, Qin et al. 
proposed a parameter-sharing adversarial DA model, which built a shared classifier and a domain classifier to reduce the complexity of the model structure and added coral to enhance the domain confusion via unbalanced adversarial training.133 To handle unsupervised cross-domain fault diagnosis tasks, Zhao et al. developed an improved joint distribution adaptation model via adversarial learning, which calculated the value of the joint discrepancy more accurately, without any approximation.134
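
    For intuition about the Wasserstein distance used as a domain discrepancy above, the one-dimensional case has a closed form. The sketch below is not how WD-DTL computes it (a learned critic network is used there for high-dimensional features); it only shows the quantity for two equal-size samples of a single scalar feature, with made-up values.

```python
def wasserstein_1d(source, target):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples the optimal transport plan matches the sorted
    values one to one, so the distance is the mean absolute difference
    between the sorted samples.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(source), sorted(target))
    return sum(abs(s - t) for s, t in pairs) / len(source)


# Hypothetical values of one scalar feature under two working conditions.
source_feature = [0.0, 1.0, 2.0]
shifted_feature = [3.0, 4.0, 5.0]

print(wasserstein_1d(source_feature, source_feature))   # identical samples
print(wasserstein_1d(source_feature, shifted_feature))  # samples shifted by 3
```

    Unlike divergences that saturate when two distributions do not overlap, this distance keeps growing with the size of the shift, which is one reason it is attractive as a training signal for domain alignment.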

    The application details of the adversarial-based method in fault diagnosis are shown in Table 6.115-118,120-130,132-134 The research and application of GAN in the DA field are of great significance for practical fault diagnosis. Because GAN does not require models that follow any particular kind of factorization, any generator and discriminator can be used. Therefore, GADA is a very flexible design framework, and various types of loss functions can be integrated into the GAN model. When the probability density cannot be calculated in DA, some generative models that rely on the natural interpretation of data cannot be learned and applied. GADA can still be used in this case because it introduces a clever internal adversarial training mechanism, which can approximate some objective functions that are not easy to calculate. Although great progress has been made, the interpretability of the generative model is poor because its distribution is not expressed explicitly, and the model is difficult to train because the discriminator and the generator must be well synchronized. Further, the generator easily degenerates because of the collapse problem, while the discriminator pushes similar samples in similar directions, so the training cannot continue. Non-GADA does not rely on a generator to produce identically distributed samples and keeps high sample diversity. In addition, for both GADA and non-GADA, the discriminant network effectively provides an adaptive loss according to the task and data, which is more robust than a non-adversarial network with a fixed loss. However, adversarial training needs to reach a Nash equilibrium, which gradient descent can achieve in some cases but not in others, so the training is unstable compared with traditional training methods.

    Table 6 Application of adversarial-based method for rotating machinery.

    4.3. Reconstruction-based method for fault diagnosis

    Fig. 30 Detailed framework of sparse stacked denoising autoencoder with MMD.136

    Fig. 31 Comparison results of sparse stacked denoising autoencoder with MMD.136

    Fig. 32 Detailed framework of CatAAE.144

    The reconstruction of the source data or the target data can improve the performance of DA, and it has gradually been applied with success in fault diagnosis. To achieve better performance, the aforementioned methods can be used simultaneously. For example, considering the reconstruction ability of the data, MMD was combined with the autoencoder structure to minimize the domain discrepancy between the training and testing features, which improved the accuracy of fault diagnosis.135,136 Fig. 30 shows a sparse stacked denoising autoencoder structure with MMD,136 in which the target data are sent directly to the model trained on the source data in the fine-tuning process. Therefore, the model pre-trained on the source domain can be applied directly to the target data without retraining. Comparison results of the proposed reconstruction-based method and other methods are shown in Fig. 31,136 which demonstrate the superiority of the proposed DA method. Inspired by the shared encoder structure, a shared dictionary matrix that combined two regularization terms on a common low-dimensional subspace was proposed to learn the knowledge from the source and target domains, with the aim of reducing large differences between domains.137 To effectively diagnose gearbox faults with very few training data, He et al. 
presented a new deep transfer multi-wavelet autoencoder, which designed a new type of multi-wavelet autoencoder based on a multi-wavelet activation function and utilized a similarity measure to select high-quality auxiliary data for training a source model containing the features shared with the target domain.138 Peng et al. proposed a new DA model with a sparse autoencoder and a CNN, which drastically reduced the risk of negative transfer by using the instantaneous rotating speed information of the target domain in the training process.139 Li et al. utilized an autoencoder model to project the features of different equipment into the same subspace and adopted a cross-machine adaptation algorithm based on MMD for knowledge generalization, which minimized the distribution discrepancy between domains.140 Cao et al. developed a soft JMMD based on class weight bias to reduce the marginal and conditional distribution differences of the extracted features via a reconstruction-based unsupervised learning strategy. Its conflict with the classification task is low, which is more conducive to transfer learning.141 Synthetic fault data may not follow the true fault data distribution or may over-exploit the small amount of available data, which results in bias or overfitting. Moreover, the value of the abundant normal data, which contain essential information for fault discrimination, has not been well explored. To address these issues, Lu et al. developed a new two-stage transferable common feature space mining method for fault diagnosis. A weakly supervised DA convolutional autoencoder with MMD was used to learn the shared features underlying multi-domain data in the first stage, and the common feature net was then combined with a unique feature net to construct a dual-channel feature extraction and comparison model in the second stage, which can mine both the transferable shared features and the unique features of different faults.142

    Fig. 33 Comparison results of CatAAE.144

    Table 7 Application of reconstruction-based method for rotating machinery.

    Inspired by GAN, more and more researchers have embedded the idea of GAN in reconstruction-based methods. For GAN-based reconstruction in fault diagnosis, a framework combining GAN and the stacked denoising autoencoder was developed to perform cross-domain fault diagnosis tasks.143 To adjust the conditional distribution at the same time, a categorical adversarial autoencoder (CatAAE) was proposed, comprising an encoder, a decoder, and a discriminator, which imposes a prior class distribution to obtain a satisfactory unsupervised clustering result. As shown in Fig. 32, the encoder acts as the generator and produces latent vectors (fake samples), which are made hard to distinguish from random samples (true samples) by training against a discriminator.144 Fig. 33 shows the corresponding comparison results, which show good intra-class compactness and inter-class separation compared with other methods. In addition, the deep stack autoencoder145 and the variational autoencoder146 have been successfully used in cross-domain fault identification by combining them with a domain discriminator. The application details of the reconstruction-based method in fault diagnosis are shown in Table 7.135-146 Deep DA methods based on reconstruction have attracted wide attention because they can effectively separate the shared and private features between domains without relying on explicit functions and avoid negative transfer. However, their flexibility also brings some problems. In particular, reconstruction ignores the private features of the target domain, and the model is hard to train under adversarial training.

    Fig. 34 Data distribution in different domains.

    5. Multi-source DA method (MDA)

    In practice, there may be more than one labeled dataset. Unlike the general single-source DA problem, MDA can draw on supervised data from multiple source domains with different distributions, which is of great practical significance. For example, data can sometimes be obtained in multiple domains. A very natural idea is to combine these data into one dataset to train the model. However, as shown in Fig. 34, because the distribution of each domain is different, simply pooling the data does not yield sufficiently useful training data and may sometimes even have a negative impact on the model. Therefore, MDA has attracted more attention from both academia and industry. In essence, the purpose of DA with a single source domain or multiple source domains is to align the features of the target domain with those of the source domain, so single-source DA methods can also be applied to MDA. On this basis, Rebuffi et al. proposed a residual adaptive module to compress multiple domains and share substantial parameters between domains to achieve multiple-domain learning.147 Mancini et al. associated the source domain with a latent domain to find multiple latent domains and introduced specific domain alignment layers based on BN to learn variables.148 Zhao et al. proposed a novel multi-source distilling DA method, which mapped the target data to the feature space of each source by minimizing the empirical variance and selected the source data via a domain weight computed from the difference between each source domain and the target domain.149 Peng et al. used moment matching to align the multiple source domains with the target domain and derived an error bound in the framework of cross-moment divergence.150 Li et al. presented a new multi-source DA method based on a mutual learning network, including a branch conditional adversarial DA network trained on the target domain and a single source domain, and a guidance conditional adversarial DA network trained on the target domain and the multi-source domain. The two networks were aligned with each other to realize mutual learning, as shown in Fig. 35.151 Lin et al. introduced a multi-source sentiment GAN to find a unified sentiment latent shared space to handle multi-source data.152 The above research shows that MDA methods combining different deep DA methods have been designed, but many implementation details remain open, such as how to align the target domain with multiple source domains, whether feature extractors are shared, how to select more relevant sources, and how to combine multiple predictions from different classifiers.

    Fig. 35 MDA with GAN.151

    Fig. 36 MDA with GAN for fault diagnosis.153

    Multiple source domains often exist in actual industry, which has led to the wide application of MDA in fault diagnosis. A fault diagnosis approach based on multiple source domains has been successfully designed, which used local Fisher discriminant analysis to learn discriminant directions from multimodal data by preserving the intra-class local structure and utilized the Karcher mean to compute the mean source subspace on the Grassmann manifold to assist the target task.58 Furthermore, in order to achieve more flexible DA, GAN has been introduced into the training process of MDA. As shown in Fig. 36, an MDA framework based on GAN was developed to extract general features with discriminative information about different equipment and machine health conditions, which transferred diagnostic knowledge learned from multi-source rotating machines to the target machine via adversarial training.153 Fig. 37 shows the superiority of MDA over other methods under different training sizes. In general, the adversarial MDA method consists of a shared feature extractor, a multi-classifier, and a domain discriminator, whose parameters are updated via a cross-entropy loss, a domain alignment loss, and a domain classifier alignment loss to minimize the distribution differences across all domains.154 A multi-source DA framework was presented to transfer knowledge from multiple labeled source domains to a single unlabeled target domain by aligning feature-level and task-specific distributions based on the sliced Wasserstein discrepancy, which can be easily turned into a single-source DA problem and readily extended to unsupervised DA in other fields.155 Wang et al. developed an MDA method that applies different weights to different source domains for machinery fault classification. The weights are assigned under different working conditions according to the distributional similarity of each source domain to the target domain.156
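
    The idea of weighting source domains by their similarity to the target can be sketched as follows. This is an illustrative assumption rather than the scheme of Wang et al. or any other cited work: the discrepancy is taken to be the squared distance between feature means (a linear-kernel MMD), the weights are a softmax over negative discrepancies, and the names `mean_discrepancy`, `source_weights`, `combine_predictions`, and `temperature` are hypothetical.

```python
import numpy as np


def mean_discrepancy(source: np.ndarray, target: np.ndarray) -> float:
    """Squared Euclidean distance between feature means (linear-kernel MMD)."""
    diff = source.mean(axis=0) - target.mean(axis=0)
    return float(diff @ diff)


def source_weights(sources, target, temperature: float = 1.0) -> np.ndarray:
    """Softmax over negative discrepancies: closer sources get larger weights."""
    distances = np.array([mean_discrepancy(s, target) for s in sources])
    logits = -distances / temperature
    logits -= logits.max()  # subtract the maximum for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def combine_predictions(per_source_probs, weights: np.ndarray) -> np.ndarray:
    """Weighted average of class-probability arrays, one per source classifier."""
    stacked = np.stack(per_source_probs)  # (n_sources, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)


# Synthetic features: three sources at increasing distance from the target.
rng = np.random.default_rng(0)
target = rng.normal(0.2, 1.0, size=(300, 4))
sources = [rng.normal(m, 1.0, size=(300, 4)) for m in (0.0, 1.0, 3.0)]
weights = source_weights(sources, target)

# Made-up class probabilities from three source-specific classifiers.
per_source_probs = [
    np.array([[0.9, 0.1], [0.3, 0.7]]),
    np.array([[0.6, 0.4], [0.5, 0.5]]),
    np.array([[0.1, 0.9], [0.8, 0.2]]),
]
fused = combine_predictions(per_source_probs, weights)
print(weights)
print(fused)
```

    Because the weights sum to one, the fused output remains a valid probability distribution per sample. A lower `temperature` concentrates the weight on the closest source, while a higher one approaches uniform averaging.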

    The application details of MDA in fault diagnosis are shown in Table 8.153,155,156 MDA is a powerful development for exploiting more of the available information, as it can utilize supervised data from multiple sources with different distributions.157 However, how to choose the most relevant data in each source domain automatically and adaptively is still a key problem.

    6. Discussion

    As a kind of transfer learning, DA fits the setting where the source data labels are available and the target data labels are unavailable, which is common in practical detection environments in various areas. The broad application prospects of deep DA have been recognized in different research fields.56-58 Some DA algorithms have gradually been designed for detection, which can be categorized into four branches: the discrepancy-based DA method, adversarial-based DA method, reconstruction-based method, and multi-source DA method. Table 9 shows the characteristics of the different deep DA methods.

    In discrepancy-based methods, distance metric functions represented by MMD and CORAL are easy to implement, need no additional parameters, and are computationally efficient.8,11 After combining them with DL, all network parameters are adjusted by back propagation to minimize the domain differences.9,20,21 The BN method realizes DA by optimizing the network structure; it only adjusts the parameters of the BN layers and does not introduce additional parameters.15 The geometric transformation method represented by manifold transformation can describe the characteristics of the data well, especially their specific geometric structure.17 All discrepancy-based methods describe the distribution characteristics of domains well; however, when the distribution characteristics cannot be calculated in DA, some models that rely on the natural interpretation of data cannot be learned and applied. In this case, the adversarial-based method was developed. It can adaptively fit the objective function according to specific tasks and data to learn the distribution characteristics via an internal adversarial training mechanism, which yields more robust detection results.17 However, the discriminator is usually implemented as part of the network, which adds new learnable parameters. To avoid negative transfer caused by private features, the reconstruction-based method was proposed, which focuses on separating shared and private features between domains and can be seen as an auxiliary task in DL.158 However, it is more difficult to generate a sharp new image, because the decoder is trained to minimize the reconstruction error rather than to fool the discriminator as in GAN. By combining the reconstruction method with GAN, the advantages of both algorithms and great flexibility can be obtained. The reconstruction error addresses the authenticity of the generated samples but also greatly reduces their diversity, which is a trade-off. Further, to exploit supervised data from multiple source domains, MDA, which can be combined with the above methods, has received attention as a way to explore more of the available information.
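
    To make concrete why a metric such as CORAL needs no additional trainable parameters, the sketch below computes a CORAL-style loss in NumPy as the squared Frobenius norm of the difference between the source and target feature covariance matrices, scaled by 1/(4d^2) as in the Deep CORAL formulation. The function name and the synthetic features are assumptions for illustration only.

```python
import numpy as np


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """Squared Frobenius distance between feature covariances, scaled by 1/(4 d^2)."""
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False)  # columns are feature dimensions
    cov_t = np.cov(target, rowvar=False)
    return float(np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d))


# Synthetic features (assumed data, not from the survey).
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(300, 5))
similar = rng.normal(0.0, 1.0, size=(300, 5))  # same covariance structure
scaled = 3.0 * rng.normal(0.0, 1.0, size=(300, 5))  # larger variance

loss_similar = coral_loss(source, similar)
loss_scaled = coral_loss(source, scaled)
loss_mean_shift = coral_loss(source, source + 5.0)  # only the mean differs
print(loss_similar, loss_scaled, loss_mean_shift)
```

    The loss depends only on second-order statistics, so a pure mean shift leaves it essentially at zero. This is one reason such a term is often paired with a mean-matching term or a classification loss in practice.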

    Fig. 37 Comparison results of MDA with GAN.153

    Table 8 Application of MDA method for rotating machinery.

    Deep DA-based methods perform well in reducing cross-domain discrepancy and have gradually been used to solve the domain shift problem in intelligent fault diagnosis. In this survey, we have comprehensively reviewed the present development of deep DA in rotating machinery fault detection and diagnosis. In recent years, only a few studies have considered applying DA to enhance the applicability and flexibility of rotating machinery fault detection and diagnosis across different domains.56-58 Nevertheless, deep DA is still expected to be more widely used for fault diagnosis of rotating machinery in the future because of its broad capabilities. The application of DA in different detection fields shows that the harder problems for DA are far from being solved (Table 9). Based on the introduction and analysis of the literature on deep DA-based fault diagnosis models published in different periods, we summarize the main problems and future research directions. More studies need to focus on these difficult problems to achieve higher-performance fault diagnosis in rotating machinery:

    Table 9 Summary for deep DA method.

    (1) As we have seen, unsupervised deep DA has been successfully applied to fault diagnosis, but it relies on the classification performance on the source data. However, if the marginal distributions are significantly different, jointly minimizing the source classification loss and the divergence between domains can actually increase the target error.94,95 How to better align the label distributions without target labels is an important problem.

    (2) In the discrepancy-based method, the transfer results depend on the metric and transformation method used to describe the distribution discrepancy (such as MMD,77 CMD,78 BN,97 and the Grassmann manifold113,114). If the discrepancy cannot be correctly described by the measured distance, the diagnosis model will transfer poorly to the target domain. Therefore, how to design a metric and transformation method suited to the specific data is one of the necessary research directions in DA. Further, it is not clear which layer needs to be measured.82

    (3) For deep DA based on the GAN method, the best results for generalization ability and feature matching in the source domain are expected to produce good generalization ability in the target domain. However, if XS∩XT=∅ and the feature mapping space is complex enough, the distributions of different categories in the source and target domains may be mismatched in the feature space even though the two optimization objectives of DA are met simultaneously: 1) the error rate of the classifier in the source domain is as small as possible; 2) in the feature space, the feature distributions in the source and the target domains are identical. In this situation, the model can be mistaken for being well trained. Therefore, how to add alignment constraints is one of the necessary research directions in GAN. Further, the interpretability of the generative model is poor because its distribution is not expressed explicitly, and it is hard to train.

    (4) Reconstruction-based methods need to pay more attention to the decoder in order to generate better images. Further, they ignore the uniqueness of the target domain in the process of classification, so how to mine more information in the feature space is important. MDA lacks the ability to process labeled source data of various modalities, such as vibration data, acoustic data, and image data. It is necessary to find a way of integrating different data modalities in the multi-source DA method.157 Further, how to choose the most relevant data automatically and adaptively in each source domain is still a key problem.

    (5) For all deep DA methods, there is no hyperparameter tuning method for specific tasks. Most of them depend on empirical values or cross-validation, which is time-consuming and labor-intensive. It is necessary to design a uniform hyperparameter tuning method so that other methods can get similar benefits from this tuning; sometimes, due to the domain transformation in fault detection, it is also necessary to further optimize the parameters to realize a bidirectional network. In addition, in the literature analyzed above, different detection methods are compared to confirm the superiority of the respective DA methods. However, the DA methods proposed in different papers are not compared on the same dataset, so it is difficult to compare them directly. More experiments are needed to compare these methods. Similarly, other promising methods may be superior to DA methods on some datasets, which can be evaluated by additional experiments.
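
    The failure mode described in item (3) can be illustrated with a toy example. In the sketch below, which is an assumed construction and not an experiment from the survey, a one-dimensional feature space places the two classes at opposite positions in the source and target domains. The marginal feature distributions match and the source error is near zero, yet the source-trained decision rule is wrong on the target.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Balanced binary labels in both domains.
labels_s = np.arange(n) % 2
labels_t = np.arange(n) % 2

# Source: class 0 sits near -1 and class 1 near +1 in the feature space.
feat_s = np.where(labels_s == 1, 1.0, -1.0) + 0.1 * rng.normal(size=n)
# Target: the feature mapping places the classes the other way round.
feat_t = np.where(labels_t == 1, -1.0, 1.0) + 0.1 * rng.normal(size=n)


def classify(features: np.ndarray) -> np.ndarray:
    """Threshold classifier that separates the source classes."""
    return (features > 0.0).astype(int)


source_accuracy = float(np.mean(classify(feat_s) == labels_s))
target_accuracy = float(np.mean(classify(feat_t) == labels_t))

# Crude check that the marginal feature distributions look alike.
mean_gap = abs(float(feat_s.mean() - feat_t.mean()))
std_gap = abs(float(feat_s.std() - feat_t.std()))

print(f"source accuracy: {source_accuracy:.2f}")
print(f"target accuracy: {target_accuracy:.2f}")
print(f"mean gap: {mean_gap:.3f}, std gap: {std_gap:.3f}")
```

    Both objectives listed in item (3) are satisfied here, which is why class-level (conditional) alignment constraints are needed in addition to marginal alignment.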

    Based on the above analyses, Fig. 38 summarizes several possible directions for future research on deep DA methods: designing metrics and transformation methods for measuring specific domain differences to achieve more accurate testing; studying alignment constraints in GAN for more effective transfer; mining more feature information for better training; developing DA methods with multimodal data fusion to exploit more information; building a general adaptive hyperparameter tuning module applicable to a wider range of tasks; and conducting further experimental comparisons between DA methods to clarify which aspects of a DA method are responsible for performance gains and to find more effective combinations of DA methods. Furthermore, designing more realistic label distribution fitting methods in unsupervised settings, visualizing data distribution characteristics in adversarial-based DA methods to enhance their interpretability, and developing adaptive methods for selecting relevant data in multi-source DA are also expected to be better addressed in future research.

    Fig. 38 Future development of DA.

    7. Conclusions

    In this survey, we reviewed deep DA technology and its application in fault diagnosis from the aspects of the discrepancy-based method, adversarial-based method, reconstruction-based method, and multi-source DA method. For the discrepancy-based method, statistic transformation, structure optimization, and geometric transformation are introduced, and the application technologies in these three operations are analyzed. The adversarial-based method is divided into two categories, GADA and non-GADA, depending on whether there are additional generators; in non-GADA, the feature extraction structure is usually regarded as the generator. The reconstruction-based method is introduced in terms of encoder-decoder and GAN structures, which can ensure the invariance of features during the transfer. For labeled datasets from multiple domains, MDA is analyzed based on the above methods. Finally, based on the analysis of the deep DA methods in these references, we summarized the current problems and outlined possible future work on deep DA methods for intelligent fault diagnosis.

    Declaration of Competing Interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    Acknowledgements

    This work is supported by the National Natural Science Foundation of China (Grant Nos. 52175096, 51775243, and 11902124), the fellowship of China Postdoctoral Science Foundation (Grant No. 2021T140279) and 111 Project (Grant No. B18027).
