
    A Position-Aware Transformer for Image Captioning

Computers Materials & Continua, 2022, Issue 1

Zelin Deng, Bo Zhou, Pei He, Jianfeng Huang, Osama Alfarraj and Amr Tolba

1 School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, 410114, China

2 School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China

3 Advanced Forming Research Centre, University of Strathclyde, Renfrewshire, PA4 9LJ, Glasgow, United Kingdom

4 Department of Computer Science, Community College, King Saud University, Riyadh, 11437, Saudi Arabia

5 Department of Mathematics and Computer Science, Faculty of Science, Menoufia University, Egypt

Abstract: Image captioning aims to generate a corresponding description of an image. In recent years, neural encoder-decoder models have been the dominant approaches, in which a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are used to translate an image into a natural language description. Among these approaches, visual attention mechanisms are widely used to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. However, most conventional visual attention mechanisms are based on high-level image features, ignoring the effects of other image features, and giving insufficient consideration to the relative positions between image features. In this work, we propose a Position-Aware Transformer model with image-feature attention and position-aware attention mechanisms to address the above problems. The image-feature attention first extracts multi-level features by using a Feature Pyramid Network (FPN), then utilizes the scaled-dot-product to fuse these features, which enables our model to detect objects of different scales in the image more effectively without increasing the number of parameters. In the position-aware attention mechanism, the relative positions between image features are obtained first, and then incorporated into the original image features to generate captions more accurately. Experiments are carried out on the MSCOCO dataset, and our approach achieves competitive BLEU-4, METEOR, ROUGE-L, and CIDEr scores compared with some state-of-the-art approaches, demonstrating the effectiveness of our approach.

Keywords: Deep learning; image captioning; transformer; attention; position-aware

    1 Introduction

Image captioning [1] aims to describe the visual contents of an image in natural language. It is a sequence-to-sequence problem and can be viewed as translating an image into its corresponding descriptive sentence. With these characteristics, the model not only needs to be able to identify objects, actions, and scenes in the image, but also needs to be powerful enough to capture and express the relationships of these elements in a properly-formed sentence. This scheme simulates the extraordinary ability of humans to convert large amounts of visual information into descriptive semantic information.

Earlier captioning approaches [2,3] used unsophisticated templates together with two auxiliary modules, an object detector and an attribute detector. The two detectors filled the blank items of the templates to generate a complete sentence. Following the great successes achieved by deep neural networks [4] in computer vision [5,6] and natural language processing [7,8], a broad collection of image captioning methods has been proposed [1,9,10]. Based on the neural encoder-decoder framework [1], these methods use a Convolutional Neural Network (CNN) [4] to encode the input image into image features. Subsequently, a Recurrent Neural Network (RNN) [11] is applied to decode these features word-by-word into a natural language description of the image.

However, there are two major drawbacks in plain encoder-decoder based models: (1) the image representation does not change during the caption generation process; (2) the decoder processes the image representation from a global view, rather than focusing on local aspects related to parts of the description. Visual attention mechanisms [12-15] can solve these problems by dynamically attending to the parts of the image features that are relevant to the semantic context of the current partially-completed caption.

RNN-based caption models have become the dominant approaches in recent years, but the recurrent structure of the RNN makes models suffer from gradient vanishing or gradient exploding as sentences grow longer, and precludes parallelization within training examples. Recently, the work of Vaswani et al. [16] showed that the transformer has excellent performance on machine translation and other sequence-to-sequence problems. It is based on the self-attention mechanism and enables models to be trained in parallel by excluding recurrent structures.

Human-like and descriptive captions require the model to describe the primary objects in the image and also to present their relations in a fluent style. However, image features obtained by a CNN commonly correspond to a uniform grid of equally-sized image regions, and each feature only contains information about its corresponding region, irrespective of its relative position to any other feature. Thus, it is hard to obtain an accurate expression. Furthermore, these image features are mainly visual features extracted from a global view of the image, and only contain a small amount of the local visual features that are crucial for detecting small objects. Such limitations of image features keep the model from producing more human-like captions.

In order to obtain captions of superior quality, a Position-Aware Transformer model for image captioning is proposed. The contributions of this model are as follows: (1) To enable the model to detect objects of different scales in the image without increasing the number of parameters, the image-feature attention is proposed, which uses the scaled-dot-product to fuse multi-level features within an image feature pyramid; (2) To generate more human-like captions, the position-aware attention is proposed to learn the relative positions between image features, so that features can be interpreted from the perspective of spatial relationships.

The rest of this paper is organized as follows. In Section 2, previous key works on image captioning and the transformer architecture are briefly introduced. In Section 3, the overall architecture and the details of our approach are presented. In Section 4, the results of the experiments on the COCO dataset are reported and analyzed. In Section 5, the contributions of our work are summarized.

    2 Related Works

    2.1 Image Captioning and Attention Mechanism

Image captioning is the task of generating a descriptive sentence for an image. It requires an algorithm to understand and model the relations between visual and textual elements. With the development of deep learning, a variety of methods based on deep neural networks have been proposed. Vinyals et al. [1] first proposed an encoder-decoder framework, which used a CNN as the encoder and an RNN as the decoder. However, the input of the RNN was a single, unchanging representation of the image, and this representation was generally analyzed from an overall perspective, thus leading to a mismatch between the context of visual information and the context of semantic information.

To solve the above problems, Xu et al. [12] introduced the attention mechanism into image captioning, which guided the model to different salient regions of the image dynamically at each step, instead of feeding all image features to the decoder at the initial step. Based on Xu's work, more and more improvements to attention mechanisms have been developed. Chen et al. [13] proposed spatial and channel-wise attention, in which the attention mechanism calculated where (spatial locations at multiple layers) and what (channels) the visual attention focused on. Anderson et al. [14] proposed a combined bottom-up and top-down visual attention mechanism: the bottom-up mechanism chose a set of salient image regions through object detection, and the top-down mechanism used task-specific context to predict the attention distribution over the chosen image regions. Lu et al. [15] proposed adaptive attention by adding a visual sentinel, determining when to attend to the image or to the visual sentinel.

    2.2 Transformer and Self-Attention Mechanism

Recurrent models have limitations on parallel computation and suffer from gradient-vanishing or gradient-exploding problems when trained with long sentences. Vaswani et al. [16] proposed the transformer architecture and achieved state-of-the-art results for machine translation. Experimental results showed that the transformer was superior in quality while being more parallelizable and requiring significantly less training time. Recently, the works in [17,18] applied the transformer to the task of image captioning and improved model performance. Without recurrence, the transformer uses the self-attention mechanism to compute the relation between any two elements of a single input and outputs a contextualized representation of this input, avoiding vanishing or exploding gradients and accelerating the training process.
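For reference, the scaled-dot-product attention at the core of the transformer [16] is computed from queries Q, keys K, and values V, which are linear projections of the same input sequence, with d_k the key dimension:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V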

    2.3 Relative Position Information

Most attention mechanisms for image captioning attend to CNN features at each step [12,13], but CNN features do not contain relative position information, which makes relative position information unavailable during the caption generation process. Moreover, not all words have corresponding CNN features. Consider Fig. 1a and its ground-truth caption "A brown toy horse stands on a red chair". The words "stands" and "on" do not have corresponding CNN features, but can be determined by the relative position information between CNN features (see Fig. 1b). Therefore, we developed the position-aware attention to learn relative position information during training.

    3 The Proposed Approach

To generate more reasonable captions, a Position-Aware Transformer model is proposed to make full use of the relative position information. It contains two components: the image encoder and the caption decoder. As shown in Fig. 2, the combination of the Feature Pyramid Network (FPN) [19], the image-feature attention, and the position-aware attention serves as the encoder that produces the visual features, while the decoder is the original transformer decoder. Given an image, the FPN is first leveraged to obtain two kinds of image features: high-level visual features containing the global semantics of the image, and low-level visual features containing the local details of the image [19]. These two kinds of features are fed into the image-feature attention and the position-aware attention to obtain fused features containing relative position information. Finally, the transformer takes the fused features and the start token <BOS> or the partially-completed sentence as input, and outputs the probability of each word in the dictionary being the next word of the sentence.
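As a rough sketch of this pipeline (module names such as fpn, image_feature_attention, position_aware_attention, and transformer_decoder are illustrative placeholders, not the authors' code), one decoding step could be organized as follows:

```python
import torch

def next_word_distribution(image, partial_caption, fpn, image_feature_attention,
                           position_aware_attention, transformer_decoder):
    """One decoding step of the position-aware transformer (illustrative sketch only)."""
    # 1. Extract multi-level features with the FPN: a high-level map carrying global
    #    semantics and a low-level map carrying local details.
    v_high, v_low = fpn(image)
    # 2. Fuse the two levels with the image-feature attention (Section 3.1).
    v_fused = image_feature_attention(v_high, v_low)
    # 3. Inject relative position information between feature-map regions (Section 3.2).
    v_pos = position_aware_attention(v_fused)
    # 4. The transformer decoder attends to the encoded features and the partially
    #    completed caption, and returns logits over the vocabulary for each position.
    word_logits = transformer_decoder(tgt=partial_caption, memory=v_pos)
    # Probabilities of the next word, taken from the last decoding position.
    return torch.softmax(word_logits[:, -1], dim=-1)
```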

Figure 1: Original image and relative position. (a) Original image (b) Red arrows represent relative position information

Figure 2: Overall structure of our proposed approach

    3.1 Image-Feature Attention for Feature Fusion

The input of image captioning is an image. Traditional methods use a CNN model pre-trained on the image classification task as the feature extractor and mostly adopt the final conv-layer feature map as the image representation. However, not all objects in the image have corresponding features stored in this representation, particularly small-sized objects, as shown in Fig. 3.

Figure 3: Original image and its features. (a) Original image (b) The first-level feature (c) The second-level feature (d) The third-level feature (e) The fourth-level feature

Fig. 3a is the original image, and the others are image features with semantics ranging from low-level to high-level. The lower the feature is, the more information it contains and the weaker the semantics it presents. Weaker semantics make it harder for the model to grasp the topic of the image; less information makes it harder to capture the local details of the image. As a result, determining an optimal level of image features invariably leads to an unwinnable trade-off. To recognize image objects at different scales, we use the FPN model to construct a feature pyramid. Features in the pyramid combine low-resolution, semantically strong features with high-resolution, semantically weak features via a top-down pathway and lateral connections. In this work, the feature pyramid has four feature maps in total: the first two are high-level features and the rest are low-level features.

Predicting on each level of a feature pyramid has many limitations; in particular, the inference time increases considerably, making this approach impractical for real applications. Moreover, training deep networks end-to-end on all features is infeasible in terms of memory. To build an effective and lightweight model, we choose one feature from the high-level features and one from the low-level features, respectively: V_low = {v_{l1}, ..., v_{lm}}, v_{li} ∈ R^{d_model}, and V_high = {v_{h1}, ..., v_{hn}}, v_{hj} ∈ R^{d_model}, where d_model is the hidden dimension of the model. Because the low-level features are still too large to use directly (e.g., 4 times larger than the high-level features in spatial size), the image-feature attention is then used to fuse these two features, as shown in Fig. 4.

Figure 4: The structure of image-feature attention

As shown in Fig. 4, the image-feature attention takes V_low and V_high as input and first uses Eq. (1) to calculate the relevance-coefficient matrix C between the elements of V_low and V_high.

The relevance-coefficient matrix C is then used to compute the attention weights W according to Eq. (2).

Finally, the attention weights W are applied to calculate a weighted sum of V_low, and the fused feature V_fused is computed by Eq. (3),

where d_model is the hidden dimension of our approach, and W^Q, W^K, and W^V are learnable parameters updated during the training process.
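Since Eqs. (1)-(3) follow the scaled-dot-product form, a minimal PyTorch sketch of this fusion might look as follows; treating V_high as the queries and V_low as the keys and values is an assumption based on the description above, not the authors' released code:

```python
import math
import torch
import torch.nn as nn

class ImageFeatureAttention(nn.Module):
    """Fuses high-level and low-level FPN features via scaled-dot-product attention."""
    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.w_q = nn.Linear(d_model, d_model, bias=False)  # W^Q
        self.w_k = nn.Linear(d_model, d_model, bias=False)  # W^K
        self.w_v = nn.Linear(d_model, d_model, bias=False)  # W^V

    def forward(self, v_high: torch.Tensor, v_low: torch.Tensor) -> torch.Tensor:
        # v_high: (batch, n, d_model) high-level features
        # v_low:  (batch, m, d_model) low-level features
        q, k, v = self.w_q(v_high), self.w_k(v_low), self.w_v(v_low)
        # Eq. (1): relevance-coefficient matrix C between V_high and V_low
        c = q @ k.transpose(-2, -1) / math.sqrt(self.d_model)
        # Eq. (2): attention weights W via softmax over the low-level positions
        w = torch.softmax(c, dim=-1)
        # Eq. (3): fused feature as a weighted sum of the (projected) low-level features
        return w @ v  # (batch, n, d_model)
```

Under this reading, the fused feature keeps the spatial size of the high-level feature (e.g., a 14 × 14 grid of regions), which matches the fused feature size used in the experiments of Section 4.7.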

    3.2 Position-Aware Attention

RNNs capture the relative positions between input elements directly through their recurrent structure. However, the recurrent structure is abandoned in the transformer in favor of self-attention, and CNN features do not contain relative position information. As mentioned earlier, relative position information is helpful for achieving an accurate expression, so introducing it explicitly is a considerably important step. When dealing with the machine translation task, the transformer manually introduces position information into the model using sinusoidal position coding. But sinusoidal position coding might not work for image captioning, because images and language sentences are two very different ways of describing things: images mainly contain visual information, while sentences mainly contain semantic information. In this work, rather than using an elaborate hand-crafted function as the transformer does, the position-aware attention is proposed to learn relative position information during training.

Because an image is split into a uniform grid of equally-sized regions from the perspective of image features, we model the image features as a regular directed graph, see Fig. 5. Each vertex (a blue block in the image) stands for the feature of a certain image region, and each directed edge (a red arrow) denotes the relative position between two vertices. Note that in this graph all the edges are directed, because the relative position from feature A to B is different from the relative position from feature B to A.

Figure 5: The directed graph model of image features

The position-aware attention takes two inputs, V_fused and an edge matrix E, in which each element E_ij represents the edge from vertex S_i to vertex S_j. In this case, we use Eq. (4) to calculate the relevance coefficients among the elements of V_fused.

A new representation of V_fused is then obtained by incorporating the relative position information according to Eq. (5).
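Although Eqs. (4) and (5) are not reproduced here, a common way to realize this kind of relative-position-aware self-attention (a sketch consistent with the description above, not necessarily the authors' exact formulation) is, for elements v_i of V_fused and learned edge embeddings E_ij:

e_{ij} = \frac{(v_i W^Q)\,(v_j W^K + E_{ij})^{\top}}{\sqrt{d_{model}}}, \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k}\exp(e_{ik})}, \qquad
z_i = \sum_{j} \alpha_{ij}\,(v_j W^V + E_{ij})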

Given a feature map of size m × n, the directed graph model has mn vertices, and each vertex has edges that directly connect it to every other vertex, so the position-aware attention has to maintain O(m^2 n^2) edges, which are redundant in most cases because objects are usually located sparsely in the image. Moreover, maintaining edges with O(m^2 n^2) space complexity significantly increases the number of parameters to be trained.

In order to reduce the space complexity, the locations of two vertices in the horizontal and vertical directions are leveraged to construct the relative position between these two vertices. As shown in Fig. 6, the vertices are placed in a Cartesian coordinate system, and each vertex has a unique coordinate.

Figure 6: Using differences in the horizontal and vertical directions to construct the relative positions

Algorithm 1: Calculate the edge matrix E for each element of an m × n feature map

Instead of using the edge that directly connects two vertices (the dashed line in Fig. 6), the coordinates of these two vertices are utilized to compute the edge. For example, if S_i has coordinate (2, n) and S_j has coordinate (4, n-3), their distance (from S_i to S_j) in the horizontal direction is -2 and in the vertical direction is 3, so their relative position (from S_i to S_j) E_ij is represented by the pair (-2, 3). In practice, in order to get a compact computation process, we use Algorithm 1 to get an edge matrix E for each element.

The model only needs to store two kinds of edges in this way: one is the set of 2n-1 horizontal-offset edges and the other is the set of 2m-1 vertical-offset edges, so there are 2·(m+n-1) edges in total. For a feature map of size m × n, we reduce the space complexity of storing edges from O(m^2 n^2) to O(max(m,n)) by using the coordinates of two vertices to compute their edge.
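A compact way to compute such edge indices (a sketch in the spirit of Algorithm 1; the variable names are ours) is to take pairwise differences of the row and column coordinates and use them to index a small table of 2·(m+n-1) learnable edge embeddings:

```python
import torch

def edge_indices(m: int, n: int):
    """Relative-position indices between all pairs of cells in an m x n feature map.

    Returns two (m*n, m*n) integer tensors: row_idx[i, j] and col_idx[i, j] index into
    tables of 2*m-1 vertical-offset and 2*n-1 horizontal-offset embeddings, respectively.
    """
    rows = torch.arange(m).repeat_interleave(n)  # row coordinate of each vertex
    cols = torch.arange(n).repeat(m)             # column coordinate of each vertex
    # Pairwise coordinate differences, shifted so that all indices are non-negative.
    row_idx = rows[:, None] - rows[None, :] + (m - 1)  # values in [0, 2m-2]
    col_idx = cols[:, None] - cols[None, :] + (n - 1)  # values in [0, 2n-2]
    return row_idx, col_idx

# The edge embedding E_ij can then be built by looking up (and, e.g., summing) a
# vertical-offset embedding via row_idx and a horizontal-offset embedding via col_idx,
# so only 2*(m+n-1) embedding vectors are stored instead of O(m^2 n^2) edges.
row_idx, col_idx = edge_indices(14, 14)
```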

    4 Experimental Results and Analysis

    4.1 Metrics

Our caption model was evaluated with several different evaluation metrics, including BLEU [20], CIDEr [21], METEOR [22], and SPICE [23]. These metrics focus on different aspects of the generated captions and give a quantitative scalar evaluation value. BLEU is a precision-based metric and is traditionally used in machine translation to measure the similarity between the generated captions and the ground-truth captions. CIDEr measures consensus in generated captions by performing a Term Frequency-Inverse Document Frequency weighting for each n-gram. METEOR is based on explicit word-to-word matches between the generated captions and the ground-truth captions. SPICE is a semantic-based method that measures how well caption models recover the objects, attributes, and relations shown in the ground-truth captions.

    4.2 Loss Functions

Given the ground-truth sentence S_gt = {y_0, y_1, ..., y_t} and its corresponding image I, the sentence S_gt was split into two parts, S_target = S_gt[0:-1] and S_target_y = S_gt[1:]. The model was trained by minimizing the following cross-entropy loss:
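In its standard form for caption models (reconstructed here from the surrounding definitions, since the original typeset equation is not reproduced in this copy), this loss is

L_{XE}(\theta) = -\sum_{i=1}^{t} \log p_{\theta}\left(y_i \mid y_0, \ldots, y_{i-1}, I\right)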

where θ denotes the parameters of the model. At the training stage, the model was trained to generate the next ground-truth word given the previous ground-truth words, while during the testing phase, the model used the previously generated words from the model distribution to predict the next word. This mismatch resulted in error accumulation during generation at test time, because the model had never been exposed to its own predictions. To make a fair comparison with recent works [24], the model was first trained with the standard cross-entropy loss for 15 epochs. After that, the pre-trained model continued to adjust its parameters with the Reinforcement Learning (RL) method described in [24] for another 15 epochs.

    This method can relieve the mismatch between training and testing by minimizing the negative expected reward:
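In standard self-critical notation [24] (again a reconstruction, as the original equation is not reproduced here), this objective can be written as

L_{RL}(\theta) = -\mathbb{E}_{\omega^{s} \sim p_{\theta}}\left[r(\omega^{s})\right]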

where ω^s was the generated sentence and r was the CIDEr score of the generated sentence.

    4.3 Dataset

The MSCOCO2014 dataset [25], one of the most popular datasets for image captioning, was used to evaluate the proposed model. This dataset contains 123,287 images in total (82,783 training images and 40,504 validation images, respectively), and each image has five different captions. To compare our experimental results with other methods precisely, the widely used "Karpathy" split [26] was adopted for the MSCOCO2014 dataset. This split has 113,287 images for training, 5,000 images for validation, and 5,000 images for testing. The performance of the model was measured on the testing set.

    4.4 Data Preprocessing

The images were normalized with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225], and captions longer than 16 words were clipped. Subsequently, a vocabulary was built from three tokens <BOS>, <EOS>, <UNK> and the words that occurred at least 5 times in the preprocessed captions. The token <UNK> represented words appearing fewer than 5 times, while the tokens <BOS> and <EOS> indicated the start and the end of a sentence. Finally, the captions were vectorized by the indices of the words and tokens in the vocabulary. During the training process, for the convenience of transformation between words and indices, two maps, wtoi and itow, were maintained: wtoi maps a word or token to its corresponding index, and itow maps an index back to the word or token.
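A minimal sketch of this preprocessing step (the names wtoi and itow mirror the maps described above; everything else is illustrative):

```python
from collections import Counter

def build_vocab(captions, min_count=5, max_len=16):
    """Build word-index maps from a list of tokenized captions."""
    captions = [c[:max_len] for c in captions]             # clip long captions
    counts = Counter(w for c in captions for w in c)
    words = [w for w, n in counts.items() if n >= min_count]
    vocab = ["<BOS>", "<EOS>", "<UNK>"] + sorted(words)
    wtoi = {w: i for i, w in enumerate(vocab)}             # word  -> index
    itow = {i: w for i, w in enumerate(vocab)}             # index -> word
    return wtoi, itow

def vectorize(caption, wtoi):
    """Map a tokenized caption to indices, surrounding it with <BOS>/<EOS>."""
    unk = wtoi["<UNK>"]
    return [wtoi["<BOS>"]] + [wtoi.get(w, unk) for w in caption] + [wtoi["<EOS>"]]
```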

    4.5 Inference

The inference was similar to that of RNN-based models, and words were generated one at a time. First, the model began with the sequence S_0 that only contained the start token <BOS>, and obtained the dictionary probabilities y_i ~ p(y_i | S_0; θ; I) through the first iteration. Afterwards, a sampling method such as the greedy method or the beam search method was used to generate the first word y_1. Then, y_1 was fed back into the model to generate the next word y_2. This process continued until the end token <EOS> or the maximum length L was reached.
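A greedy version of this inference loop can be sketched as follows (beam search would keep the top-k partial sentences instead of a single one; the model is assumed to return a next-word distribution given the image features and the partial sequence):

```python
import torch

@torch.no_grad()
def greedy_decode(model, image_features, wtoi, itow, max_len=16):
    """Generate a caption word by word until <EOS> or the maximum length is reached."""
    seq = [wtoi["<BOS>"]]
    for _ in range(max_len):
        tokens = torch.tensor(seq).unsqueeze(0)   # (1, current_length)
        probs = model(image_features, tokens)     # assumed (1, vocab_size) next-word distribution
        next_word = int(probs.argmax(dim=-1))     # greedy choice
        seq.append(next_word)
        if itow[next_word] == "<EOS>":
            break
    return [itow[i] for i in seq[1:] if itow[i] != "<EOS>"]
```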

    4.6 Implementation Details

An FPN from a pre-trained instance segmentation model [27] was used to produce features at five levels. Experiments were carried out based on the second and the fourth features. The spatial size of the second feature was set to 14 × 14 and the other was set to 28 × 28 via adaptive average pooling. We did not fine-tune the FPN; thus, the parameters producing the two features were fixed during the whole training process.
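Resizing the selected pyramid levels to these fixed spatial sizes can be done with standard adaptive average pooling; a brief PyTorch illustration (the tensor shapes are hypothetical):

```python
import torch
import torch.nn.functional as F

# Hypothetical FPN outputs: two pyramid levels with different spatial sizes.
p_a = torch.randn(1, 256, 25, 38)    # stands in for the selected high-level map
p_b = torch.randn(1, 256, 100, 152)  # stands in for the selected low-level map

v_high = F.adaptive_avg_pool2d(p_a, (14, 14))  # fixed 14 x 14 grid of regions
v_low = F.adaptive_avg_pool2d(p_b, (28, 28))   # fixed 28 x 28 grid of regions
```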

In Tab. 1, the hyperparameter settings of the position-aware transformer model trained with the standard cross-entropy loss are presented.

For our model trained with the standard cross-entropy loss, we used 6 attention layers, d_model = 256, 4 attention heads, d_head = 64, 1024 feed-forward inner-layer dimensions, and P_dropout = 0.1. This model was trained for 15 epochs; each epoch had 12,000 iterations and the batch size was 10. The initial learning rate of the model was 5 × 10^-4, the warmup strategy with warmup_steps = 20,000 was used to speed up training, and the same weight decay strategy as in [16] was adopted for learning rate adjustment. The Adam optimizer [28] with β1 = 0.9, β2 = 0.98, and ε = 10^-9 was used to update the parameters of our model. During training, we employed label smoothing with value 0.1 [29]. At the inference stage, the beam search method with a beam size of 3 was chosen for better caption generation. The PyTorch framework was adopted to implement our model for image captioning.
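For reference, the warmup schedule of [16] referred to here first increases the learning rate linearly over warmup_steps and then decays it proportionally to the inverse square root of the step number:

lr(step) = d_{model}^{-0.5} \cdot \min\left(step^{-0.5},\ step \cdot warmup\_steps^{-1.5}\right)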

For our model optimized with CIDEr optimization (initialized from the pre-trained cross-entropy model), training continued for another 15 epochs to adjust the parameters. The initial learning rate was set to 1 × 10^-5, and both the warmup and weight decay options were turned off. The rest of the settings were identical to those of the cross-entropy model.

Table 1: Hyperparameter settings of the model

    4.7 Ablation Studies

In this section, we conducted several ablative experiments for the position-aware transformer model on the MSCOCO dataset. In order to further verify the effectiveness of the sub-modules in our model, a Vanilla Transformer model for image captioning was implemented, which regarded the CNN plus the transformer encoder as the image encoder and the transformer decoder as the caption decoder. Based on the vanilla transformer model, two further models (FPN Transformer and Position-aware Transformer) were implemented as follows:

FPN Transformer: a model equipped with the image-feature attention sub-module, which employed the image features built by the FPN.

Position-aware Transformer: a model equipped with both the image-feature attention and position-aware attention sub-modules. This model also used the image features built by the FPN.

In the experiments, the Vanilla Transformer model used a ResNet pre-trained on the ImageNet dataset to encode the given image I into a spatial image feature, which was obtained from the 5th pooling layer of the ResNet. Adaptive average pooling was then applied to obtain an image spatial feature V = {v_1, ..., v_{14×14}}, v_i ∈ R^{d_model}, where 14 × 14 is the number of regions and v_i represents a region of the image. The FPN Transformer used the same FPN network as in [27] to encode the given image I, and used the image-feature attention to fuse the image features built by the FPN to a size of 14 × 14 as well. The Position-aware Transformer was the proposed approach described in Fig. 2. All hyperparameters of the three models were kept the same where possible. In Tab. 2, the test results of the Vanilla Transformer, FPN Transformer, and Position-Aware Transformer on the BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE-L, and CIDEr metrics are presented, and the validation results of the three models are shown in Fig. 7.

As shown in Tab. 2, equipping the Vanilla Transformer model with the image-feature attention and position-aware attention achieves better performance in terms of BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE-L, and CIDEr.

Table 2: The performance of our models optimized by the standard cross-entropy loss

Figure 7: Validation results of several metrics

From Fig. 7, it turns out that the FPN Transformer has better performance than the Vanilla Transformer on all metrics, which is due to the fact that the FPN produces a multi-scale feature representation in which all levels are semantically strong, including the high-resolution levels. This enables a model to detect objects across a large range of scales by scanning over both positions and pyramid levels. It can also be noticed that the combination of image-feature attention and position-aware attention provides the best performance, mainly because the position-aware attention allows features to be interpreted from the perspective of spatial relationships.

SPICE is a semantic-based method that measures how well caption models recover objects, attributes, and relations. To investigate the performance improvement brought by the proposed sub-modules, we report SPICE F-scores over various subcategories on the MSCOCO testing set in Tab. 3 and Fig. 8. When equipped with the image-feature attention, the FPN Transformer increases the SPICE-Objects metric by 2.2% compared with the Vanilla Transformer, exceeding the relative improvement of 1.85% on the SPICE-Relations metric and the relative improvement of 0.15% on the SPICE metric. This shows that the image-feature attention can improve the performance in terms of identifying objects. After incorporating the position-aware attention, the Position-aware Transformer shows a more remarkable relative improvement of 9.0% on the SPICE-Relations metric than the relative improvements on the SPICE and SPICE-Objects metrics, demonstrating that the position-aware attention improves the performance by identifying the relationships between objects.

Table 3: SPICE F-scores over various subcategories on the MSCOCO test set

Figure 8: Performance comparison of different transformers

    4.8 Comparing with Other State-of-the-Art Methods

The experimental results of the Position-aware Transformer and previous state-of-the-art models on the MSCOCO testing set are shown in Tab. 4. All results are produced by models trained with the standard cross-entropy loss. The Soft-Attention model [12], which uses the ResNet-101 as the image encoder, is our baseline model.

Table 4: Experimental results of our approach compared with other methods (optimized by standard cross-entropy loss)

Compared with recent state-of-the-art models, our model shows better performance. When compared with the Bottom-Up model, the METEOR, ROUGE-L, and CIDEr scores increase from 27.0 to 27.8, from 56.4 to 56.5, and from 113.5 to 114.9, respectively, while the BLEU-1 and BLEU-4 scores are comparable. Among these metrics, METEOR, ROUGE-L, and CIDEr are specialized for image captioning tasks, which validates the effectiveness of our model.

The experimental results of the Position-aware Transformer and the Bottom-Up model trained with CIDEr optimization on the MSCOCO testing set are shown in Tab. 5.

As shown in Tab. 5, our model improves the BLEU-4 score from 36.3 to 38.4, the METEOR score from 27.7 to 28.3, the ROUGE-L score from 56.9 to 58.4, and the CIDEr score from 120.1 to 125.5, respectively. All the metrics increase; in particular, the CIDEr metric obtains a 4.5% relative improvement. This shows that the proposed approach has better performance.

Table 5: Experimental results of our approach compared with the Bottom-Up model (optimized by CIDEr optimization)

    5 Conclusion and Future Work

A position-aware transformer with two attention mechanisms, i.e., the position-aware attention and the image-feature attention, is proposed in this work. To generate more accurate and more fluent captions, the position-aware attention enables the model to make use of the relative positions between image features. These relative positions are modeled as the directed edges of a directed graph in which the vertices represent the elements of the image features. In addition, to enable the model to detect objects of different scales in the image without increasing the number of parameters, the image-feature attention brings in multi-level features through the FPN and uses the scaled-dot-product to fuse them. With these innovations, we obtained better performance than some state-of-the-art approaches on the MSCOCO benchmark.

At a high level, our work utilizes multi-level features and position information to increase performance. This suggests several directions for future research: (1) The image-feature attention picks features of particular levels for fusion. However, in some cases, the best choice of levels depends on the specific image. For some images, all the objects may be large, so the fusion of low-level features may introduce noise into the prediction process of the model due to the weak semantics of low-level features; (2) The position-aware attention uses the relative positions between features to infer the words with abstract concepts in descriptions, but not all such words are related to spatial relationships. Based on these issues, further research will be carried out, and we will also apply this approach to text-based image retrieval.

Acknowledgement: The authors extend their appreciation to the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia for funding this work through research Group No. RG-1438-070. This work is supported by NSFC (61977018). This work is also supported by the Research Foundation of Education Bureau of Hunan Province of China (16B006).

Funding Statement: This work was supported in part by the National Natural Science Foundation of China under Grant No. 61977018, by the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia through research Group No. RG-1438-070, and in part by the Research Foundation of Education Bureau of Hunan Province of China under Grant 16B006.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
