
    Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization

2022-01-25 12:50:52
IEEE/CAA Journal of Automatica Sinica, 2022, Issue 2

Hanjiang Hu, Hesheng Wang, Zhe Liu, and Weidong Chen

Abstract—Visual localization is a crucial component in the application of mobile robots and autonomous driving. Image retrieval is an efficient and effective technique in image-based localization methods. Due to the drastic variability of environmental conditions, e.g., illumination changes, retrieval-based visual localization is severely affected and becomes a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. Then, a novel gradient-weighted similarity activation mapping loss (Grad-SAM) is incorporated for finer localization with high accuracy. We also propose a new adaptive triplet loss to boost the contrastive learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of models with and without the Grad-SAM loss. Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMU-Seasons dataset. The strong generalization ability of our approach is verified on the RobotCar dataset using models pre-trained on the urban parts of the CMU-Seasons dataset. Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines in medium or high precision, especially under challenging environments with illumination variance, vegetation, and night-time images. Moreover, real-site experiments have been conducted to validate the efficiency and effectiveness of the coarse-to-fine strategy for localization.

    I.INTRODUCTION

VISUAL localization is an essential problem in visual perception for autonomous driving and mobile robots [1]–[3], and is low-cost and efficient compared with global positioning system-based (GPS-based) or light detection and ranging-based (LiDAR-based) localization methods. Image retrieval, i.e., recognizing the most similar place in the database for each query image [4]–[6], is a convenient and effective technique for image-based localization; it serves place recognition for loop closure and provides an initial pose for finer 6-DoF camera pose regression [7], [8] for relocalization in simultaneous localization and mapping (SLAM).

However, the drastic perceptual changes caused by long-term environmental condition variance, e.g., changing seasons, illumination, and weather, cast serious challenges on image-based localization in long-term outdoor self-driving scenarios [9]. Traditional feature descriptors (SIFT, BRIEF, ORB, BRISK, etc.) can only be used for image matching under scenes without significant appearance changes, due to their reliance on raw image pixels. With convolutional neural networks (CNNs) making remarkable progress in the field of computer vision and autonomous driving [10], learning-based methods have gained significant attention owing to the robustness of deep features against changing environments for place recognition and retrieval [11]–[13].

Contrastive learning is an important technique for image recognition tasks [14]–[16], also known as deep metric learning, which aims to learn metrics and latent representations with closer distances for similar images. Compared to face recognition, supervised learning for place recognition [13], [17] suffers from the difficulty of determining which clip of images should be grouped to the same place in a sequence of continuous images. Moreover, supervised contrastive learning methods for outdoor place recognition [18], [19] need numerous paired samples for model training due to heterogeneously entangled scenes with multiple environmental conditions, which is costly and inefficient. Additionally, considering the feature maps with salient areas used in the explanation of CNNs for classification tasks [20]–[22], retrieval-based localization could be addressed through such attentive or contextual information [23], [24]. However, these methods have no direct access to the similarity of the extracted features, so they are not appropriate for high-precision localization.

To address these issues, we first propose an unsupervised and implicitly content-disentangled representation learning approach through probabilistic modeling to obtain domain-invariant features (DIF) based on multi-domain image translation with a feature consistency loss (FCL). For retrieval with high accuracy, a novel gradient-weighted similarity activation mapping (Grad-SAM) loss is introduced inside the training framework, inspired by [20]–[22]. Furthermore, a novel unsupervised adaptive triplet loss is incorporated in the pipeline to promote the training of FCL or Grad-SAM, and the two-stage test pipeline is implemented in a coarse-to-fine manner for performance compensation and improvement. We further investigate the localization and place recognition performance of the proposed method by conducting extensive experiments on both the CMU-Seasons dataset and the RobotCar-Seasons dataset. Compared to state-of-the-art image-based baselines, our method presents competitive results in medium and high precision. In the real-site experiment, the proposed two-stage method is validated to be simultaneously time-efficient and effective. An example of image retrieval is shown in Fig. 1. Our contributions are summarized as follows:

    1) A domain-invariant feature learning framework is proposed based on multi-domain image-to-image translation architecture with feature consistency loss and is statistically formulated as a probabilistic model of image disentanglement.

2) A new Grad-SAM loss is proposed inside the framework to leverage the localizing information of the feature map for high-accuracy retrieval.

3) A novel adaptive triplet loss is introduced for FCL or Grad-SAM learning to enable self-supervised contrastive learning, yielding an effective two-stage retrieval pipeline from coarse to fine.

4) The effectiveness of the proposed approach is validated on the CMU-Seasons and RobotCar-Seasons datasets for visual localization through extensive experimentation. Our results are on par with state-of-the-art baselines of image retrieval-based localization for medium and high precision. The time-efficiency and applicability of the approach are also shown through a real-site experiment.

The rest of this paper is organized as follows. Section II presents the related work on place recognition and representation learning for image retrieval. Section III presents the formulation of the domain-invariant feature learning model with FCL. Section IV introduces the adaptive triplet loss and the two-stage retrieval pipeline with the Grad-SAM loss. Section V shows the experimental results on visual localization benchmarks. Finally, in Section VI we draw our conclusions and present some suggestions for future work.

    II.RELATED WORK

    A.Place Recognition and Localization

Outdoor visual place recognition has been studied for many years for visual localization in autonomous driving or loop closure detection in SLAM, in which the most similar images are retrieved from a key frame database for query images. Traditional feature descriptors have been used in traditional robotic applications [25], [26] and are aggregated for image retrieval and matching [27]–[30], which has successfully addressed most cases of loop closure detection in visual SLAM [31] without significant environmental changes. VLAD [32] is the most successful hand-crafted feature for place recognition and has been extended to different versions. NetVLAD [4] extracts deep features through a VLAD-like network architecture. DenseVLAD [6] presents impressive results by extracting multi-scale SIFT descriptors for aggregation under drastic perceptual variance. To reduce the false positive rates of single feature-based methods, sequence-based place recognition [33], [34] has been proposed for real-time loop closure in SLAM.

Fig. 1. On the first row, column (a) shows a query image under the Overcast + Mixed Foliage condition and column (b) shows the retrieved image under the Sunny + No Foliage condition. On the second row, the gradient-weighted similarity activation maps are shown for the above images. The activation map visualizes the salient area of the image which contributes most to the matching and retrieval across the different environments.

Since convolutional neural networks (CNNs) have successfully addressed many tasks in computer vision [35], long-term visual place recognition and localization have developed significantly with the aid of CNNs [4], [13], [36]. Some solutions to the change of appearance are based on image translation [37]–[40], where images are transferred across different domains based on generative adversarial networks (GANs) [41], [42]. Porav et al. [43] first translate query images to the database domain through CycleGAN [44] and retrieve target images through hand-crafted descriptors. ToDayGAN [45] similarly translates night images to day images and uses DenseVLAD for retrieval. Jenicek and Chum [36] propose to use a U-Net to obtain a photometrically normalized image and find a deep embedding for retrieval. However, the generalization ability of translation-based methods is limited, because the accuracy of image-level retrieval largely depends on the quality of the translated image, in contrast to retrieval with latent features.

Some other recent work follows the pipeline of learning robust deep representations through neural networks together with semantic [46], [47], geometric [48], [49], or context-aware information [23], [24], [50], [51], etc. Although these models can perform image retrieval at the feature level, the representation features are trained with the aid of auxiliary information which is costly to obtain in most cases. Requiring the least human effort for auxiliary perception information, and inspired by the class activation map [20]–[22] in the visual explanation of CNNs, we introduce the notion of activation map into representation learning for fine place recognition; its necessity and advantage lie in implementing retrieval in the latent feature space with self-supervised attentive information, without any human effort or laborious annotations.

    B.Disentanglement Representation

Latent representation reveals the feature vectors in the latent space which determine the distribution of samples. Therefore, it is essential to find the latent disentangled representation to analyze the attributes of the data distribution. A similar application is the latent factor model (LFM) in recommender systems [52]–[54], where the latent factor contributes to the preference of specific users. In the field of style transfer or image translation [37], [55], deep representations of images are modeled according to the variations of data which depend on different factors across domains [56], [57], e.g., disentangled content and style representations. Supervised approaches [58], [59] learn class-specific representations through labeled data, and many works have appeared to learn disentangled representations in unsupervised manners [60], [61]. Recently, fully- and partially-shared representations of the latent space have been investigated for unsupervised image-to-image translation [39], [40]. Inspired by these methods, where the content code is shared across all the domains but the style code is domain-specific, our domain-invariant representation learning is probabilistically formulated and modeled as an extended and modified version of CycleGAN [44] or ComboGAN [38].

For the application of representation learning to place recognition under changing environments, where each environmental condition corresponds to one domain style and the images share similar scene content across different environments, it is appropriate to adopt the disentangled-representation assumption for this problem. Recent works on condition-invariant deep representation learning [5], [62]–[64] in long-term changing environments mainly rely on variance removal or other auxiliary information introduced in Section II-A. Reference [17] removes the dimensions related to the changing condition through PCA for the deep embeddings of the latent space obtained from a classification model. Reference [12] separates the condition-invariant representation from VLAD features with GANs across multiple domains. Reference [65] filters the distracting feature maps in the shallow CNN layers but matches with deep features in deeper layers to improve condition- and viewpoint-invariance [66] using image pairs. Compared to these two-stage or supervised methods, we adopt domain-invariant feature learning methods [63], [64], which offer direct, low-cost, and efficient learning.

    C.Contrastive Learning

Contrastive learning, a.k.a. deep metric learning [14], [67], stems from distance metric learning [68], [69] in machine learning but extracts deep features through deep neural networks, i.e., it learns appropriate embeddings and metrics for effective discrimination between similar and dissimilar sample pairs. With the help of neural networks, deep metric learning typically utilizes siamese networks [70], [71] or triplet networks [72], [73], which make the embeddings of the same category closer than those of different categories given triple labeled input samples, for face recognition, person re-identification, etc.

Coming to long-term place recognition and visual localization, many works have recently used supervised learning together with siamese networks and triplet loss [18], [62]. To avoid the vanishing gradient caused by small distances between negative pairs in the triplet loss form of [14], [15] proposes another form of triplet loss. Because of the need for hand-annotated data in supervised learning, Radenović et al. [19] propose to leverage the geometry of 3D models from structure-from-motion (SfM) for triplet learning in an automated manner. But SfM is offline and costly, so end-to-end training is not possible. Instead, we employ an unsupervised triplet training technique adapted to the DIFL framework [63], so that a domain-invariant and scene-specific representation can be trained in an unsupervised and end-to-end way efficiently.

    III.FORMULATION OF DOMAIN-INVARIANT FEATURE LEARNING

    A.Problem Assumptions

Our approach to long-term visual place localization and recognition is modeled in the setting of multi-domain unsupervised image-to-image translation, where all query and database images are captured from multiple identical sequences across environments. Images in different environmental conditions belong to corresponding domains respectively. Let the total number of domains be denoted as N, and let two different domains be randomly sampled from {1, ..., N} for each translation iteration, e.g., i, j ∈ {1, ..., N}, i ≠ j. Let x_i ∈ X_i and x_j ∈ X_j represent images from these two domains. For the multi-domain image-to-image translation task [38], the goal is to find all conditional distributions p(x_i|x_j), ∀ i ≠ j, i, j ∈ {1, ..., N}, given the known marginal distributions p(x_i), p(x_j) and the translated conditional distributions p(x_{j→i}|x_j), p(x_{i→j}|x_i). Since different domains correspond to different environmental conditions, we suppose the conditional distribution p(x_i|x_j) is unimodal and deterministic, in contrast to the multimodal distribution across only two domains in [40]. As N increases to infinity and becomes continuous, the multi-domain translation model covers more domains and can be regarded as a generalized multi-modal translation with limited domains.

Fig. 2. The architecture overview for model training from domain i to j and the general image retrieval pipeline at test time. The involved losses include the GAN loss, cycle loss, feature loss, SAM loss, and triplet loss. Note that the SAM loss is only used for the fine model, and the encoder, decoder, and discriminator are specific to each domain.

Like the shared-latent-space assumption in recent unsupervised image-to-image translation methods [39], [40], the content representation c is shared across different domains while the style latent variable s_i belongs to each specific domain. For the image joint distribution in one domain, x_i ∈ X_i, the image is generated from the prior distributions of content and style, x_i = G_i(s_i, c), where the content and style are independent of each other. Since the conditional distribution p(x_i|x_j) is deterministic, the style variable is only embodied in the latent generator of the specific domain, i.e., the domain-specific decoder G_i^*(c) = G_i(s_i, c). Under such assumptions, our method could be regarded as implicitly partially-shared, although only the content latent code is explicitly shared across multiple domains with corresponding generators. Following the previous work [40], we further assume that the domain-specific decoder functions for the shared content code, G_i^*, are deterministic and that their inverse encoder functions exist, where E_i^* = (G_i^*)^{-1}. Our goal of domain-invariant representation learning is then to find the underlying decoders G_i^* and encoders E_i^* for all the environmental domains through neural networks, so that the domain-invariant latent code c can be extracted for any given image sample x_i through c = E_i^*(x_i). The overall architecture based on these assumptions is shown in Fig. 2, and its details are introduced in Sections III and IV.

    B.Model Architecture

We adopt the multi-domain image-to-image translation architecture [38], which is an expansion of CycleGAN [44] from two domains to multiple domains. The generator networks in the framework are decoupled into domain-specific pairs of encoders E_i and decoders G_i for any domain i. The encoder is the first half of the generator while the decoder is the second half for each domain. For image translation across multiple domains, the encoders and decoders can be combined freely like building blocks. The discriminators D_i are also domain-specific for domain i and optimized through adversarial training as well. The detailed architectures of the encoder, decoder, and discriminator for each domain are the same as in ComboGAN [38]. Note that [63] applies the ComboGAN architecture to image retrieval with a feature consistency loss, resulting in an effective self-supervised retrieval-based localization method. However, in this section we further formulate the architecture in a probabilistic framework, combining multi-domain image translation and domain-invariant representation learning.

For images in similar sequences under different environments, first suppose domains i, j are selected randomly and the corresponding images are denoted as x_i, x_j. The basic DIFL framework is shown in Fig. 3, including the GAN loss, cycle consistency loss, and feature consistency loss. For the image translation pass from domain i to domain j, the latent feature is first encoded by encoder E_i and then decoded by decoder G_j. The translated image goes back through encoder E_j and decoder G_i to compute the cycle consistency loss (1) [44]. Also, the translated image goes through the discriminator D_j to compute the adversarial loss (2) [41]. The pass from domain j to domain i is symmetric.

The adversarial loss (2) makes the translated image x_{i→j} indistinguishable from the real image x_j and the distribution of translated images close to the distribution of real images.

The cycle consistency loss (1) originates from CycleGAN [44], which has been proved to infer a deterministic translation [40] and is suitable for representation learning through image translation among multiple domains. For the pure multi-domain image translation task, i.e., ComboGAN [38], the total ComboGAN loss only contains the adversarial loss and the cycle consistency loss.
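As a concrete illustration, the following is a minimal PyTorch-style sketch of how the cycle consistency loss (1) and adversarial loss (2) could be combined into a ComboGAN-style objective (3) for one randomly sampled domain pair (i, j); the module names enc, dec, and disc, the least-squares GAN form, and the weighting are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def combogan_losses(enc, dec, disc, x_i, x_j, i, j, lambda_cyc=10.0):
    """One training pass from domain i to j (the j-to-i pass is symmetric).

    enc[k], dec[k], disc[k]: domain-specific encoder, decoder and discriminator
    for domain k; x_i, x_j: image batches from domains i and j.
    """
    z_i = enc[i](x_i)                # shared content code of x_i
    x_i2j = dec[j](z_i)              # translated image x_{i->j}
    x_i2j2i = dec[i](enc[j](x_i2j))  # reconstruction back to domain i

    # Cycle consistency loss (1): the round trip should recover x_i.
    loss_cyc = F.l1_loss(x_i2j2i, x_i)

    # Adversarial loss (2), least-squares form: the translated image should be
    # indistinguishable from real images of domain j for discriminator D_j.
    pred_fake = disc[j](x_i2j)
    loss_gan = F.mse_loss(pred_fake, torch.ones_like(pred_fake))

    # ComboGAN-style total (3): adversarial + weighted cycle consistency.
    return loss_gan + lambda_cyc * loss_cyc
```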

Fig. 3. Network architecture for image translation from domain i to j. Constrained by the GAN loss, cycle loss, and feature loss, the latent feature code is the domain-invariant representation. The discriminator D_j yields the GAN loss through adversarial training, given the real image of domain j and the image translated from domain i to j.

Since every domain owns its own encoder, decoder, and discriminator, the overall architecture is complicated, but it can be modeled as a probabilistic graph if all the encoders and decoders are regarded as conditional probability distributions. Supposing the optimality of the ComboGAN loss (3) is reached, the complex forward propagation during training can be simplified and the representation embedding can be analyzed.

Without loss of generality, images x_{i,m}, x_{j,n} are selected from image sequences x_i, x_j, i ≠ j, where m, n denote the places in the shared image sequences and are related only to the content of the images. According to the assumptions in Section III-A, m, n represent the shared domain-invariant content latent code c across different domains. For the translation from image x_{i,m} to domain j, we have

The latent code z_{i,m} encodes the relationship between domain i and the content m of the image, from (4). Due to the adversarial loss (2), the translated image x_{i→j,m} has the same distribution as image x_{j,n}, i.e., x_{i→j,m}, x_{j,n} ~ p(x_j). For the reconstructed image from (7), the cycle consistency loss (1) constrains it to the original image x_{i,m}.

From (4) and (5), we have

which indicates that x_{i→j,m} and i are independent if the optimality of the adversarial loss (2) is reached, and z_{i→j,m} and i are also independent from (6). Similarly, z_{i,m} and j are independent for any j ≠ i. Combining (5), (6) and (4), (7), we can find the relationship between z_{i,m} and z_{i→j,m} and the weak form of the inverse constraint on encoders and decoders below:

When the optimality of the original ComboGAN loss (3) is reached, for any i ≠ j, the latent codes z_{i,m} and z_{i→j,m} are not related to j and i, respectively, which is consistent with the proposition that cycle consistency loss cannot infer shared-latent learning in [39]. Consequently, the representation embeddings are not domain-invariant and not appropriate for image retrieval, and the underlying inverse encoders and decoders have not been found through the vanilla ComboGAN image translation model.

    C.Feature Consistency Loss

To obtain the shared latent feature across different domains, unlike [39], we use an additional loss exerted on the latent space, called the feature consistency loss, as proposed in [63]. Under the above assumptions, for image x_i from domain i it is formulated as

As a result, the domain-invariant feature [63] can be extracted by combining all the weighted losses together:

Here we give the theoretical analysis for FCL. Supposing the optimality of the DIF loss (13) is reached, (4)–(11) are still satisfied. Additionally, because of the feature consistency loss (12), based on (4), (6), (10), we have

Since z_{i→j,m} and i are independent (as discussed in the previous section), z_{i,m} and i are independent for any domain i from (14), which indicates that the latent feature is well shared across multiple domains and represents the content latent code for any image from any domain. Furthermore, the trained encoders and decoders are inverse to each other, and the goal of finding the underlying encoders E_i^* and decoders G_i^* is reached according to Section III-A. It is therefore appropriate to use the content latent code as the image representation across different environmental conditions.
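To make the feature consistency loss concrete, below is a minimal PyTorch-style sketch of an FCL term in the spirit of (12), added on top of the translation losses sketched in Section III-B; the L1 distance is one plausible choice (Section V-B mentions both L2 and cosine variants), and the module names follow the same illustrative assumptions as before.

```python
import torch.nn.functional as F

def fcl_loss(enc, dec, x_i, i, j):
    """Feature consistency loss (a sketch of (12)): the latent code of x_i and
    the latent code of its translation into domain j should coincide."""
    z_i = enc[i](x_i)      # content code of the real image x_i
    x_i2j = dec[j](z_i)    # translated image x_{i->j}
    z_i2j = enc[j](x_i2j)  # content code of the translated image
    # Pull the two codes together so the latent feature becomes domain-invariant.
    return F.l1_loss(z_i2j, z_i)
```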

    IV.COARSE-TO-FINE RETRIEVAL-BASED LOCALIZATION

    A.Gradient-Weighted Similarity Activation Mapping Loss

The original domain-invariant feature (13) cannot exploit the contextual or localizing information of the content latent feature map; as a result, the performance of place recognition at high accuracy is limited. To this end, we propose a novel gradient-weighted similarity activation mapping loss for the shared latent feature to fully discover the weighted similar areas for high-accuracy retrieval.

Inspired by CAM [20], Grad-CAM [21], and Grad-CAM++ [22] in the visual explanation of classification with convolutional neural networks, we regard the place recognition task as an extension of image multi-classification with infinite target classes, where each database image represents a single target class for each query image during the retrieval process. Then, for each query image, the similarity to each database image is treated as the score before softmax, or the probability, in a multi-classification task, and the database image with the largest similarity is the retrieved result, analogous to the classification result with the largest probability.

Ideally, suppose the identical content latent feature maps from domains i, j, i.e., z_{i,m}, z_{j,m}, have the shape n×h×w, where the identical content m is omitted for brevity. First, the mean value of the cosine similarity over the height and width dimensions is calculated below:

Y is the similarity score between z_i and z_j. Following the definition of Grad-CAM [21], we have the similarity activation weight and map:

Equations (17) and (18) are the mathematical formulation of the proposed Grad-SAM, where the activation map is aggregated from the gradient-weighted feature maps, retaining the localizing information of the deep feature map. In order to input only the positively-activated areas for training, we apply a ReLU function to obtain the final activation map L_{i,j} or L_{j,i}.
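Since the rendered equations are not reproduced in this version of the text, the following LaTeX restatement of (16)–(18) is a reconstruction from the prose above and the Grad-CAM definition [21]; the exact normalization constants and index conventions are assumptions.

```latex
% (16) mean cosine similarity over the h x w spatial locations
Y = \frac{1}{hw}\sum_{u=1}^{h}\sum_{v=1}^{w}
    \frac{\langle z_i^{(:,u,v)},\, z_j^{(:,u,v)} \rangle}
         {\lVert z_i^{(:,u,v)} \rVert\,\lVert z_j^{(:,u,v)} \rVert}

% (17) gradient-weighted channel importance for feature map z_i
\alpha_k^{i,j} = \frac{1}{hw}\sum_{u=1}^{h}\sum_{v=1}^{w}
    \frac{\partial Y}{\partial z_i^{(k,u,v)}}

% (18) similarity activation map, aggregated over the n channels
% (the ReLU keeps only the positively-activated areas)
L_{i,j} = \mathrm{ReLU}\!\left(\sum_{k=1}^{n} \alpha_k^{i,j}\, z_i^{(k)}\right)
```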

Particularly, as shown in Fig. 4, inside the unsupervised DIFL architecture, the content latent codes z_{i,m}, z_{j,n} are drawn from the same shared distribution, but z_{i,m} ≠ z_{j,n} for unpaired places m ≠ n. The similarity activation maps L_{i,m}, L_{i→j,m} can be visualized by resizing them to the original image size, as in Fig. 4. According to the FCL loss (12), z_{i,m} and z_{i→j,m} tend to be identical, which means that the similarity between them is meaningful, and so is the SAM loss. Therefore, the self-supervised Grad-SAM loss for domain i can be formulated below based on (16)–(18):

where z_{i,m} and z_{i→j,m} are substituted for z_i and z_j in (16)–(18), and L_{i,m} and L_{i→j,m} are short for L_{i,j,m} and L_{i→j,i,m} derived from (17) and (18).
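The following PyTorch-style sketch shows one way the Grad-SAM maps and the self-supervised SAM loss (19) could be computed with automatic differentiation; the use of torch.autograd.grad, the L1 distance between the two activation maps, and the tensor shapes are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def grad_sam_map(z_query, z_ref):
    """Gradient-weighted similarity activation map of z_query with respect to
    its similarity to z_ref; both tensors have shape (n, h, w)."""
    if not z_query.requires_grad:          # allow calling on detached tensors
        z_query = z_query.detach().requires_grad_(True)
    # (16): cosine similarity over channels, averaged over spatial locations.
    Y = F.cosine_similarity(z_query, z_ref, dim=0).mean()
    # (17): channel weights from the spatially pooled gradient of Y.
    grads = torch.autograd.grad(Y, z_query, create_graph=True)[0]  # (n, h, w)
    alpha = grads.mean(dim=(1, 2))                                 # (n,)
    # (18): gradient-weighted aggregation of the feature maps, then ReLU.
    return F.relu((alpha[:, None, None] * z_query).sum(dim=0))     # (h, w)

def sam_loss(z_i, z_i2j):
    """Self-supervised Grad-SAM loss (19): the activation maps of the real and
    translated latent codes of the same place should agree."""
    return F.l1_loss(grad_sam_map(z_i, z_i2j), grad_sam_map(z_i2j, z_i))
```

Because create_graph=True is set, gradients of the SAM loss can flow back through the gradient computation itself, so the loss remains trainable end-to-end.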

Fig. 4. Illustration of one branch of the SAM loss from domain i to j. The real image in domain i is first translated to a fake image in domain j, and the gradient of the similarity w.r.t. each feature map can be calculated, denoted by red dashed lines. The activation map is then the sum of the feature maps weighted by the gradients, shown as the color-gradient line from red to black, and the SAM loss can be calculated in a self-supervised manner. Note that the notations L_{i,m} and L_{i→j,m} here are short for L_{i,j,m} and L_{i→j,i,m} derived from (17) and (18).

    B.Adaptive Triplet Loss

Although domain-invariant feature learning is achieved through the feature consistency loss (12), and the Grad-SAM loss (19) further supports finer retrieval with salient localizing information on the latent feature map, it is difficult to distinguish different latent content codes using domain-invariant features without explicit metric learning. As the distance between latent features with the same content decreases due to the feature consistency loss (12) and Grad-SAM loss (19), the distance between latent features of different contents may be forced to shrink as well, resulting in mismatched retrievals for test images in long-term visual localization.

Toward this end, we propose a novel adaptive triplet loss based on the feature consistency loss (12) and Grad-SAM loss (19) to improve the contrastive learning of the latent representation inside the self-supervised DIFL framework. Suppose unpaired images x_{i,m}, x_{j,n} are selected from domains i, j, i ≠ j, where m, n represent the content of the images. Note that, to keep the training pipeline unsupervised, one of the selected images is horizontally flipped while the other is not, so that m ≠ n is assured for the negative pair. The operation of flipping only one of the input images is random and also functions as data augmentation, since the flipped images follow the distribution of the original images. Details can be found in Section V-A. For the self-supervised contrastive learning, the positively paired samples are not given but generated from the framework in (4)–(6) and (16)–(18), i.e., z_{i,m}, z_{i→j,m} and L_{i,m}, L_{i→j,m}. For the negatively paired samples, because images under the same environmental condition tend to be closer than those under different conditions, a stricter constraint is imposed for negative pairs using the translated image and the other real image, which are under the same environment but at different places, i.e., z_{i→j,m}, z_{j,n} and L_{i→j,m}, L_{j,n}.

Moreover, in order to improve the efficiency of the triplet loss for representation learning during the late iterations, the negative pair with the least distance between the original code and the translated one is automatically selected as the hard negative pair from a group of random negative candidates z_{j,n} or L_{j,n}, as shown in (20) and (21). The adaptive triplet loss is calculated through these hard negative pairs without any supervision or extra prior information.

We adopt the basic form of the triplet loss from [15], but the margin depends on the feature consistency loss (12) or Grad-SAM loss (19), which adapts it to the representation learning of (12) or (19). The illustrations of the adaptive triplet loss for FCL and SAM are shown in Figs. 5 and 6. The adaptive triplet losses for FCL and Grad-SAM for domain i are shown below:

where the hyperparameters m_f, m_s are the margins, i.e., the amount by which the distance of negative pairs should exceed the distance of the self-generated positive pairs when the image translation is well trained, i.e., p(x_{i→j,m}) = p(x_{j,m}). However, a constant margin affects the joint model training with the FCL or Grad-SAM loss, so we propose a self-adaptive term, which is the exponential function of the negative FCL loss or Grad-SAM loss weighted by α_f or α_s.

Combined with the adaptive triplet loss (22) or (23), at the beginning of the whole model training the exponential adaptive term is close to 0, so the triplet loss term does not affect the FCL (12) or Grad-SAM (19). But as training proceeds, the triplet loss gradually dominates the model training, since the exponential adaptive term grows towards 1.
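A minimal PyTorch-style sketch of the adaptive triplet loss for FCL, i.e., one possible reading of (20)–(22), is given below. The hinge-style triplet form, the hard-negative selection over a candidate list, and the exact way the exponential adaptive term enters the margin are assumptions chosen to match the behavior described above, not the paper's exact equations.

```python
import torch
import torch.nn.functional as F

def adaptive_triplet_fcl(z_i_m, z_i2j_m, negatives, fcl_value, m_f=5.0, alpha_f=2.0):
    """Adaptive triplet loss for FCL (a sketch in the spirit of (20)-(22)).

    z_i_m     : latent code of the real image x_{i,m}
    z_i2j_m   : latent code of the translated image x_{i->j,m} (positive pair)
    negatives : list of latent codes z_{j,n} (same environment j, place n != m)
    fcl_value : current FCL loss (12) as a scalar tensor, drives the margin
    """
    d_pos = F.l1_loss(z_i2j_m, z_i_m)              # positive-pair distance

    # (20)/(21): hard negative = candidate closest to the translated code.
    d_neg = torch.stack([F.l1_loss(z_i2j_m, z_n) for z_n in negatives]).min()

    # Adaptive margin: exp(-alpha_f * FCL) is close to 0 early in training
    # (translation not yet learned) and approaches 1 as the FCL loss shrinks.
    margin = m_f * torch.exp(-alpha_f * fcl_value)

    # Hinge-style triplet term: penalize when the negative pair is not at
    # least `margin` farther from the anchor than the positive pair.
    return F.relu(d_pos - d_neg + margin)
```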

    C.Coarse-to-Fine Image Retrieval Pipeline

Different applications have different requirements for coarse- or high-precision localization, e.g., loop closure and relocalization in SLAM and 3D reconstruction. As shown in Section III-C, the feature consistency loss, together with the cycle consistency loss and GAN loss in image-to-image translation, contributes to the domain-invariant representation learning architecture, where the latent feature is independent of the multiple environmental domains so that it can be used for image representation and retrieval across different environments. The Grad-SAM loss in Section IV-A is incorporated into this basic architecture to learn salient areas and attentive information from the original latent feature, which is important for high-precision retrieval. The adaptive triplet loss in Section IV-B balances the self-supervised representation learning and the feature consistency loss, which improves the retrieval results, as shown by the ablation studies in Section V-D.

Fig. 5. Illustration of the one-branch adaptive triplet loss for FCL from domain i to j. The inputs of the loss are the encoded latent features from the real images in domains i, j and the translated image i→j, resulting in the negative pairs within the red dashed box and the positive pairs within the green dashed box. Note that the positive pairs only differ in the environment, while the place is the only difference for the negative pairs.

Fig. 6. Illustration of the one-branch adaptive triplet loss for Grad-SAM from domain i to j. The inputs of the loss are the similarity activation maps from the real images in domains i, j and the translated image i→j. The negative pairs are bounded with the red dashed box while the positive pairs are bounded with the green dashed box. Note that the activation maps in domain i from the two pairs are slightly different.

For image retrieval, we adopt the coarse-to-fine strategy to fully leverage the models with different training settings for different specific purposes. The DIFL model with FCL (12) and triplet loss (22) aims to find the database retrieval for the query image using general domain-invariant features and results in better localization performance within larger error thresholds, as shown in Section V-D, which gives a good initial range of retrieved candidates and can be used as a coarse retrieval.

TABLE I  ABLATION STUDY ON DIFFERENT STRATEGIES AND LOSS TERMS

    The total loss for coarse-retrieval model training is shown below:

where λ_cyc, λ_FCL, and λ_Triplet_FCL are the hyperparameters weighting the different loss terms.

Furthermore, to obtain finer retrieval results, we incorporate the Grad-SAM loss (19) with its triplet loss (23) into the coarse-retrieval model, which fully exploits the localizing information of the feature map and promotes high-accuracy retrieval across different conditions, as shown in Table I. However, according to Section V-D, the accuracy of low-precision localization for the fine-retrieval model is lower than that of the coarse-retrieval model, which shows the necessity of the initial coarse retrieval. The total loss for the finer model training is shown below:

where λ_cyc, λ_FCL, λ_SAM, λ_Triplet_SAM, and λ_Triplet_FCL are the hyperparameters for each loss term.

Once the coarse and fine models are trained, the test pipeline consists of coarse retrieval followed by finer retrieval. The 6-DoF poses of the database images are given, while the goal is to find the poses of the query images. We first pre-encode each database image under the reference environment into feature maps through the coarse model offline, forming the database of coarse features. At test time, for every query image, we extract the feature map using the coarse encoder of the corresponding domain and retrieve the top-N most similar ones from the pre-encoded coarse features in the database. The N candidates are then encoded through the fine model to obtain secondary feature maps, and the query image is also encoded through the fine model to obtain the query feature. The most similar one among the N candidates is retrieved as the final result for localization. Although the coarse-to-fine strategy may not obtain the globally most similar retrieval in some cases, it increases the accuracy within the coarse error threshold compared to the single fine model, as shown in Section V-D, which is beneficial to the application of pose regression for relocalization. It may also benefit from the filtered coarse candidates in some cases, as in Table I, improving medium-precision results. The 6-DoF pose of the query image is taken as that of the finally-retrieved database image.
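The two-stage test pipeline just described can be summarized by the following sketch; the helper names, the flattened-cosine metric for the fine stage, and the plain tensor operations are assumptions for illustration rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_retrieve(query_img, db_imgs, coarse_enc, fine_enc, top_n=3):
    """Two-stage retrieval: top-N candidates by the coarse model, re-ranked
    with the fine model's features."""
    # Offline in practice: pre-encode all database images with the coarse model.
    db_coarse = [coarse_enc(img) for img in db_imgs]           # (n, h, w) maps

    # Stage 1: coarse retrieval with the mean spatial cosine similarity (16).
    q_coarse = coarse_enc(query_img)
    coarse_scores = torch.stack(
        [F.cosine_similarity(q_coarse, z, dim=0).mean() for z in db_coarse])
    cand = coarse_scores.topk(top_n).indices.tolist()

    # Stage 2: fine retrieval over the candidates, cosine over flattened features.
    q_fine = fine_enc(query_img).flatten()
    fine_scores = torch.stack(
        [F.cosine_similarity(q_fine, fine_enc(db_imgs[i]).flatten(), dim=0)
         for i in cand])

    # Index of the final retrieval; its known 6-DoF pose is reused for the query.
    return cand[fine_scores.argmax().item()]
```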

V. EXPERIMENTAL RESULTS

We conduct a series of experiments on the CMU-Seasons dataset and validate the effectiveness of the coarse-to-fine pipeline with the proposed FCL loss, Grad-SAM loss, and adaptive triplet loss. With the model trained only on the urban parts of the CMU-Seasons dataset in an unsupervised manner, we compare our results with several image-based localization baselines on the untrained suburban and park parts of the CMU-Seasons dataset and on the RobotCar-Seasons dataset, showing the advantage under scenes with massive vegetation and robustness to huge illumination changes. To prove the practical validity and applicability for mobile robotics, we have also conducted real-site field experiments under different environments with more viewing angles, using a mobile robot equipped with a camera and RTK-GPS. We run these experiments on two NVIDIA 2080Ti cards with 64 GB RAM on an Ubuntu 18.04 system. Our source code and pre-trained models are available at https://github.com/HanjiangHu/DISAM.

    A.Experimental Setup

The first series of experiments is conducted on the CMU-Seasons dataset [9], which is derived from the CMU Visual Localization dataset [75]. It was recorded by a vehicle with left-side and right-side cameras over a year along a route roughly 9 kilometers long in Pittsburgh, U.S. The environmental change of seasons, illumination, and especially foliage is very challenging on this dataset. Reference [9] benchmarks the dataset and presents the ground-truth camera poses only for the reference database images, adding new categories and area divisions of the original dataset as well. There are 31 250 images in 7 slices for the urban area, 13 736 images in 3 slices for the suburban area, and 30 349 images in 7 slices for the park area. Each area has one reference condition and eleven query environmental conditions. The condition of the database is Sunny + No Foliage, and the conditions of the query images can be any weather combined with any vegetation condition, e.g., Overcast + Mixed Foliage. Since the images in the training dataset contain both left-side and right-side ones, the horizontal flipping operation is reasonable and acceptable for the unsupervised generation of negative pairs and for data augmentation, as introduced in Section IV-B.

The second series of experiments is conducted on the RobotCar-Seasons dataset [9], derived from the Oxford RobotCar dataset [76]. The images were captured with three Point Grey Grasshopper2 cameras on the left, rear, and right of the vehicle along a 10 km route under changing weather, season, and illumination across a year in Oxford, U.K. It contains 6954 triplets of database images under the overcast condition, 3100 triplets of day-time query images under 7 conditions, and 878 triplets of night-time images under 2 conditions. In the experiment we only test rear images with the models pre-trained on the urban part of the CMU-Seasons dataset, to validate the generalization ability of our approach. Considering that not all conditions of the RobotCar dataset have exactly corresponding conditions in CMU-Seasons, we choose the pre-trained models under the conditions with the most similar descriptions and dates from the CMU-Seasons dataset for all the conditions in the RobotCar dataset, as listed in Table II. Note that for the conditions which are not included in CMU-Seasons, we use the pre-trained models under the reference condition, Overcast + Mixed Foliage, instead, for the sake of fairness.

TABLE II  CONDITION CORRESPONDENCE FOR THE ROBOTCAR DATASET

The images are scaled to 286×286 and randomly cropped to 256×256 during training, but directly scaled to 256×256 during testing, leading to feature maps with the shape 256×64×64. We follow the protocol introduced in [9], which measures the percentage of correctly-localized query images. Since we focus on high and medium precision, the pose error thresholds are (0.25 m, 2°) and (0.5 m, 5°), while the coarse-precision (low-precision) threshold (5 m, 10°) is omitted except for the ablation study. We choose several image-based localization methods as baselines, FAB-MAP [74], DIFL-FCL [63], NetVLAD [4], and DenseVLAD [6], which are among the best image-based localization methods.
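For reference, the preprocessing just described could be expressed with torchvision transforms roughly as follows; this is a sketch of the stated sizes only, and interpolation modes and normalization are left out as assumptions.

```python
from torchvision import transforms

# Training: scale to 286x286, then take a random 256x256 crop.
train_tf = transforms.Compose([
    transforms.Resize((286, 286)),
    transforms.RandomCrop(256),
    transforms.ToTensor(),
])

# Testing: scale directly to 256x256 (yielding 256x64x64 feature maps here).
test_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
```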

    B.Evaluation on CMU-Seasons Dataset

Following the transfer learning strategy for DIFL in [63], we fine-tune the pre-trained models from [63] at epoch 300, which were trained only with the cycle consistency loss and GAN loss on all the images of the CMU-Seasons dataset for the pure image translation task. Then, for the representation learning task, the model is fine-tuned with images from the urban area in an unsupervised manner, without paired images across conditions. After adding the other loss terms in (24) or (25), we continue training until epoch 600, with a learning rate linearly decreasing from 0.0002 to 0. The model is then trained in the same manner until epoch 1200, in further 300-epoch stages. In order to speed up and stabilize the training process with the triplet loss, we use random negative pairs from epoch 300 to epoch 600 for the fundamental representation learning and adopt hard negative pairs from epoch 600, as described in Section IV-B. We choose the hard negative pair from 10 pairs of negative samples for each iteration.

For the coarse-retrieval model training, the weight hyperparameters are maximally set as λ_cyc = 10, λ_FCL = 0.1, and λ_Triplet_FCL = 1, all of which increase linearly from 0 as training proceeds to balance the multi-task framework. Similarly, for the fine-retrieval model training we set λ_cyc = 10, λ_FCL = 0.1, λ_SAM = 1000, λ_Triplet_SAM = 1, and λ_Triplet_FCL = 1 with a similar training strategy. The fine model uses both the L2 and cosine similarity metrics for the FCL terms, while only the L2 metric is used for the FCL terms in the coarse model. For the adaptive triplet loss, we set m_f = 5, α_f = 2 in the triplet FCL loss (22) and m_s = 0.1, α_s = 1000 in the triplet SAM loss (23). During the two-stage retrieval, the number of coarse candidates top-N is set to 3, which makes it both efficient and effective. In the two-stage retrieval pipeline, we use the mean value of the cosine similarity over the height and width dimensions as the metric during the coarse retrieval, as shown in (16). For the fine retrieval, we use the normal cosine similarity on the flattened secondary features, due to the salient information in the feature map.

Our final results are compared with the baselines in Table III, which shows that ours outperforms the baseline methods for high- and medium-precision localization, (0.25 m, 2°) and (0.5 m, 5°), in the park and suburban areas, demonstrating powerful generalization ability because the model is only trained on the urban area. The medium-precision localization in the urban area is affected by numerous dynamic objects. We further compare the performance on different foliage categories from [9], Foliage and Mixed Foliage, with the reference database under No Foliage, which is the most challenging setting for this dataset. The results are shown in Table IV, from which we can see that our result is better than the baselines under different foliage conditions for localization with medium and high precision. To investigate the performance under different weather conditions, we compare the models with the baselines on the Overcast, Cloudy, and Low Sun conditions with the reference database under Sunny in Table V, which covers almost all the weather conditions. It can be seen that our results present the best medium- and high-accuracy results under most of the weather conditions. The Cloudy weather contains plenty of clouds in the sky, which introduces some noise into the activation map for fine retrieval relative to the clear sky under Sunny, and which can be regarded as a kind of dynamic object.

TABLE III  RESULTS COMPARISON TO BASELINES ON THE CMU-SEASONS DATASET

TABLE IV  COMPARISON WITH BASELINES ON FOLIAGE CONDITIONS (REFERENCE IS NO FOLIAGE)

From the results over different areas, vegetation, and weather, it can be seen that the finer retrieval boosts the results of the coarse retrieval. Moreover, the coarse-to-fine retrieval strategy gives better performance than the fine-only method in some cases, showing the significance and effectiveness of the two-stage strategy for high- and medium-precision localization. A reasonable explanation for the good performance under different foliage and weather conditions is that the latent content code is robust and invariant to changing vegetation and illumination. All the results (including ours) are from the official benchmark website of long-term visual localization [9]. Some fine-retrieval results are shown in Fig. 7, where the activation maps give the localizing information of the feature maps and the salient areas mostly lie around the edges or adjacent parts of different instance patches, due to the gradient-based activation.

TABLE V  COMPARISON WITH BASELINES ON WEATHER CONDITIONS (REFERENCE IS SUNNY)

    C.Evaluation on RobotCar Dataset

In order to further validate the generalization ability of the proposed method to unseen scenarios, we directly use the models pre-trained on the urban area of CMU-Seasons to test on the RobotCar dataset, according to the corresponding CMU-Seasons condition for every query condition of RobotCar, based on Table II. Considering that there are many more database images than query images under each condition, the two-stage strategy is skipped for practicality and efficiency, and only the coarse-only and fine-only models are tested. The metric for both coarse and fine retrieval is the mean value of the cosine similarity over the height and width dimensions, as shown in (16).

The comparison results are shown in Table VI, where we can see that our method outperforms the other baseline methods under the Night and Night-rain conditions. Note that the model we use for the night-time retrieval is the same as that of the database, because night-time images are not included in the training set; this shows the effectiveness of the representation learning in the latent space of the autoencoder-structured model. Since the images under the Night and Night-rain conditions have too little context or localizing information to find correct similarity activation maps, the coarse model performs better than the finer model.

Our results under all the Day conditions are the best in terms of high-precision performance, showing powerful generalization ability in unknown scenarios and environments by attaining satisfactory retrieval-based localization results. All the results (including ours) are also from the official benchmark website of long-term visual localization [9]. Some day-time results are shown in Fig. 8, covering all the environments which have similar counterparts among the models pre-trained on the CMU-Seasons dataset.

    D.Ablation Study

Fig. 7. Results on the CMU-Seasons dataset. For each set of images in (a) to (e), the top left is the query image while the top right is the database image under the Sunny + No Foliage condition. The query images of sets (a) to (e) are under the conditions of Low Sun + Mixed Foliage, Overcast + Mixed Foliage, Low Sun + Snow, Low Sun + Foliage, and Sunny + Foliage, respectively. The visualizations of the similarity activation maps are on the bottom row for all the query and database RGB images.

TABLE VI  RESULTS COMPARISON TO BASELINES ON THE ROBOTCAR DATASET

For the further ablation study in Table I, we implement different strategies (Coarse-only, Fine-only, and Coarse-to-fine) and different loss terms (FCL, Triplet FCL, SAM, and Triplet SAM) during model training, and test them on the CMU-Seasons dataset. The only difference between Coarse-only and Fine-only lies in whether the model is trained with SAM or not, while the Coarse-to-fine strategy follows the two-stage strategy in Section IV-C. It can be seen that Coarse-only models perform the best in low-precision localization, which is suitable for providing the rough candidates for the upcoming finer retrieval. With the incorporation of the SAM-related losses, the medium- and high-precision accuracies increase while the low-precision one decreases. Coarse-to-fine combines the advantages of Coarse-only and Fine-only, improving the low-precision localization of fine models as well as the medium- and high-precision localization of coarse models simultaneously, which shows the effectiveness and significance of the two-stage strategy in overcoming both weaknesses. Furthermore, because of the high-quality potential candidates provided by the Coarse-only model, some medium-precision results of Coarse-to-fine on the last row perform the best and the other results are extremely close to the best ones, which shows the promising performance of the two-stage strategy.

From the first two rows of Coarse-only and Fine-only in Table I, the Flipped Negative and the Hard Negative samples are shown to be necessary and beneficial to the final results, especially the flipping operation for data augmentation. On the third and fourth rows, DIFL with FCL performs better than vanilla ComboGAN (3), which indicates that FCL helps to extract the domain-invariant feature. Due to the effective self-supervised triplet loss with hard negative pairs, the performance with Triplet FCL or Triplet SAM is significantly improved compared with the results on the fourth or ninth row, respectively. To validate the effectiveness of the Adaptive Margin in the triplet loss, we compare the results of Constant Margin and Adaptive Margin, which show that the model with the adaptive margin gives better results than that with the constant margin for both Triplet FCL and Triplet SAM. The last row of the Fine-only strategy shows that the hybrid adaptive triplet losses of both FCL and SAM are beneficial to the fine retrieval. Note that the settings of training and testing for Table I are internally consistent, but differ slightly from the experimental settings in [63] in many aspects, such as training epochs, the metrics for retrieval, the choice of the pre-trained models for testing, etc. Also, the adaptive margin for the triplet loss is partially influenced by the hard negative samples: a smaller distance of negative pairs means a relatively smaller margin over the positive pairs, which pushes the positive distance down, and the adaptive margin consequently increases.

Fig. 8. Results on the RobotCar dataset. For each set of images in (a) to (e), the top left is the day-time query image while the top right is the database image under the Overcast condition. The query images of sets (a) to (e) are under the conditions of Dawn, Overcast-summer, Overcast-winter, Snow, and Sun, respectively. The visualizations of the similarity activation maps are on the bottom row for all the query and database RGB images.

    E.Real-Site Experiment

For the real-site experiment, the dataset is collected with an AGV carrying a ZED Stereo Camera and RTK-GPS; the mobile robot is shown in Fig. 9(a). The route we choose is around the lawn beside the school building on the campus, is around 2 km long, and is shown in Fig. 9(b). We collect data under different environments covering weather, daytime, and illumination changes, and classify them as Sunny, Overcast, and Night, respectively. There are 12 typical places serving as key frames, each with 25 different viewing angles for challenging localization, marked as red circles in Fig. 9(b), compensating for the single perspective of the driving scenes in both the CMU-Seasons and Oxford RobotCar datasets. There are 300 images for each environment, and some samples of the dataset are shown in Fig. 10. The same places along the routes are mainly within a distance of 5 m, which yields the 25 ground-truth images per place for place recognition from the GPS data.

Fig. 9. Image (a) shows the mobile robot used to collect the dataset, with RTK-GPS and a ZED Stereo Camera. Image (b) shows the routes of the dataset under changing environments, illustrated with differently colored lines; the red circles indicate the 12 typical places for recognition and retrieval with different perspectives.

Since all three environments are during autumn, we use the CMU-Seasons pretrained models under Low Sun + Mixed Foliage for Sunny, Overcast + Mixed Foliage for Overcast, and Cloudy + Mixed Foliage for Night in the experiments. The three place recognition experiments are: query images under Sunny with the database under Overcast, query images under Sunny with the database under Night, and query images under Night with the database under Overcast.

Fig. 10. Dataset images for the real-site experiment. Column (a) is Sunny, column (b) is Overcast, and column (c) is Night. The first and last two rows show the changing perspectives, which gives more image candidates for the typical places.

For each query image, we retrieve the top-N candidates (N from 1 to 25) from the database and calculate the average recall rate to demonstrate the performance of the coarse-only and fine-only methods. For the coarse-to-fine method, the coarse-only model first retrieves the top-2N candidates (2N from 2 to 50) and then the fine-only model retrieves the finer top-N candidates (N from 1 to 25) from them; the average recall is calculated over all the query images.
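For clarity, the evaluation just described could be computed as in the following sketch; the ground-truth format (a set of correct database indices per query) and the pre-computed score matrices are illustrative assumptions.

```python
import numpy as np

def recall_at_n(score_matrix, gt_sets, n):
    """Average recall@N: fraction of queries whose top-N retrievals contain
    at least one ground-truth database image.

    score_matrix : (num_queries, num_db) similarity scores
    gt_sets      : list of sets of correct database indices, one per query
    """
    top_n = np.argsort(-score_matrix, axis=1)[:, :n]
    hits = [len(set(top_n[q]) & gt_sets[q]) > 0 for q in range(len(gt_sets))]
    return float(np.mean(hits))

def coarse_to_fine_recall_at_n(coarse_scores, fine_scores, gt_sets, n):
    """Two-stage recall@N: top-2N by the coarse model, re-ranked by the fine
    model, then the top-N of the re-ranked list is evaluated."""
    hits = []
    for q in range(len(gt_sets)):
        cand = np.argsort(-coarse_scores[q])[: 2 * n]            # coarse top-2N
        reranked = cand[np.argsort(-fine_scores[q][cand])][:n]   # fine top-N
        hits.append(len(set(reranked) & gt_sets[q]) > 0)
    return float(np.mean(hits))
```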

As shown in Fig. 11, the results of the three proposed methods are validated under three different retrieval environmental settings. From the results, it can be seen that the coarse-only method performs better than the fine-only method in large-scale place recognition, which is consistent with the coarse-precision results on CMU-Seasons in Table I. Besides, the coarse-to-fine strategy clearly improves the performance of both the coarse-only and fine-only methods, which shows the effectiveness and applicability of the two-stage method. The coarse-to-fine performance within top-5 recall is limited by the performance of the fine model, and improves as the number of database retrieval candidates (N) increases.

Since time consumption is important for place recognition in robotic applications, we have measured the time cost of the three proposed methods in the real-site experiment. As shown in Table VII, the average inference time is the time to extract the feature representation through the encoder, while the average retrieval time is the time to retrieve the top 25 out of 300 database candidates through brute-force search in the real-site experiment. Comparing the three methods, it can be seen that although the inference time of coarse-to-fine is almost the sum of those of coarse-only and fine-only, the time consumption for representation extraction is still short enough. For brute-force retrieval, the time of coarse-to-fine is only slightly larger than that of the coarse-only and fine-only methods, because the second, finer-retrieval stage only finds the top 25 out of 50 coarse candidates, which costs much less time. Note that the retrieval time could be significantly reduced through other search structures, such as a KD-tree instead of brute-force search, but these techniques are beyond the focus of this work, so Table VII only gives a relative time comparison of the three proposed strategies, validating the time-efficiency and effectiveness of the two-stage method.
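As a hedged illustration of the indexing option mentioned above, cosine retrieval can be recast as a Euclidean nearest-neighbor search over L2-normalized descriptors, which tree-based indices such as scikit-learn's KD-tree support; the descriptor dimensionality and library choice here are assumptions, and in very high dimensions the practical gain over brute force depends on the data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Database of flattened, L2-normalized feature descriptors (assumed shape).
db = np.random.rand(300, 4096).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

# With unit-norm vectors, Euclidean ranking matches cosine-similarity ranking,
# so a KD-tree can replace the brute-force cosine search.
index = NearestNeighbors(n_neighbors=25, algorithm='kd_tree').fit(db)

query = np.random.rand(1, 4096).astype(np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)
dists, top25 = index.kneighbors(query)   # indices of the 25 nearest database items
```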

    VI.CONCLUSION

In this work, we have formulated a domain-invariant feature learning architecture for long-term retrieval-based localization with a feature consistency loss (FCL). A novel loss based on the gradient-weighted similarity activation map (Grad-SAM) is then proposed to improve the high-precision performance. The adaptive triplet loss based on the FCL loss or Grad-SAM loss is incorporated into the framework to form the coarse or fine retrieval methods, resulting in the coarse-to-fine testing pipeline. Our proposed method is compared with several state-of-the-art image-based localization baselines on the CMU-Seasons and RobotCar-Seasons datasets, where our results outperform the baseline methods for image retrieval in medium- and high-precision localization in challenging environments. The real-site experiment further validates the efficiency and effectiveness of the proposed method. However, one concern about our method is that the performance in dynamic scenes is weak compared to other image-based methods; this could be addressed in the future by adding semantic information to enhance the robustness to dynamic objects. Another direction is a unified model for robust visual localization in which the front-end network collaborates better with the representation learning.

Fig. 11. Results of the real-site experiment. (a) is the result of Sunny query images with the database under Overcast; (b) is the result of Sunny query images with the database under Night; (c) is the result of Night query images with the database under Overcast.

TABLE VII  TIME CONSUMPTION OF DIFFERENT METHODS

    ACKNOWLEDGMENT

The authors would like to thank Zhijian Qiao from the Department of Automation at Shanghai Jiao Tong University for his contribution to the real-site experiments, including the collection of the dataset and the comparison experiments.
