
    Attention-Enhanced Voice Portrait Model Using Generative Adversarial Network

Computers, Materials & Continua, 2024, Issue 4

Jingyi Mao, Yuchen Zhou, Yifan Wang, Junyu Li, Ziqing Liu and Fanliang Bu

School of Information Network Security, People’s Public Security University of China, Beijing, 100038, China

ABSTRACT Voice portrait technology explores and establishes the relationship between speakers’ voices and their facial features, aiming to generate corresponding facial characteristics from the voice of an unknown speaker. Owing to their powerful advantages in image generation, Generative Adversarial Networks (GANs) have been widely applied across various fields. Existing Voice2Face methods for voice portraits are primarily based on GANs trained on voice-face paired datasets. However, voice portrait models built solely on GANs are limited in image generation quality and struggle to maintain facial similarity. Additionally, their training process is relatively unstable, which affects the overall generative performance of the model. To overcome these challenges, we propose a novel deep Generative Adversarial Network model for audio-visual synthesis, named AVP-GAN (Attention-enhanced Voice Portrait model using Generative Adversarial Network). This model is based on a convolutional attention mechanism and is capable of generating corresponding facial images from the voice of an unknown speaker. First, to address training instability, we integrate convolutional neural networks with deep GANs. In the network architecture, we apply spectral normalization to constrain the variation of the discriminator, preventing issues such as mode collapse. Second, to enhance the model’s ability to extract relevant features between the two modalities, we propose a voice portrait model based on convolutional attention, which learns the mapping relationship between voice and facial features in a common space along both the channel and spatial dimensions. Third, to enhance the quality of the generated faces, we incorporate a degradation removal module and utilize pretrained facial GANs as facial priors to repair and sharpen the generated facial images. Experimental results demonstrate that our AVP-GAN achieves a cosine similarity of 0.511, outperforming the comparison model and effectively generating high-quality facial images that correspond to a speaker’s voice.

KEYWORDS Cross-modal generation; GANs; voice portrait technology; face synthesis

    1 Introduction

Voice portrait technology aims to analyze the voice of a speaker through cross-modal intelligent techniques, seeking the mapping relationship between facial features and voice, and generating facial images that correspond to the speaker’s identity. This technology draws on biology, speech analysis, image generation, cross-modal learning and other disciplines, and has a wide range of applications in public security, judicial identification, medical diagnosis and personalized entertainment. For example, in practical public security work, network fraud has become frequent and widespread around the world, growing into a global crime problem. When the police obtain audio samples of a criminal suspect, they can use voice portrait technology to generate facial images that resemble the suspect’s identity characteristics and thereby provide important investigative clues. On the other hand, the technology can also be used to generate a personalized digital avatar of a speaker from voice, enhancing the experience of human-computer interaction and offering new possibilities for personalized entertainment and other fields. In this paper, we analyze the cross-modal correlation between speakers’ speech and their facial features. We aim to reconstruct an image with the facial characteristics of the speaker from a short segment of the speaker’s voice.

There is a close relationship between the human voice and face. From a person’s voice, we can extract many biometric features such as age, gender, emotion, race, lip movement, weight, skin color, and facial shape [1,2]. Recently, research based on deep learning has begun to explore the relationship between voice and faces. In the study of speech-face matching, Wen et al. [3] achieved voice-face cross-modal matching by mapping the voice and face modalities to common covariates and learning their common embedding. With deeper study of the relationship between speech and face, Kameoka et al. [4] proposed a method based on an auxiliary classifier, which exploits the correlation between speech and face to generate a face image that matches the input speech. In addition, Duarte et al. [5] proposed a Conditional Generative Adversarial Network (CGAN) model to generate face pixels directly from speech. Although these works can produce reasonable voice portrait results, finding the corresponding feature mapping between voice and face remains a major challenge. Therefore, we investigate how to combine the attention mechanism with the generative and discriminative modules of the network model to further analyze the key pixel blocks of voice and face and improve the model’s performance.

Generative Adversarial Networks (GANs) [6] are effective at generating desired samples and eliminating deterministic biases [7]. Owing to these characteristics, GANs have become one of the mainstream methods in the field of image generation. In this paper, we propose a new attention-enhanced voice portrait model using a Generative Adversarial Network (AVP-GAN). This model only requires voice as input and needs no prior knowledge to generate a human face. Specifically, we first input a segment of the speaker’s voice into the voice encoder to extract Mel-frequency cepstral coefficients (MFCCs) [8], which contain rich facial attribute information. Subsequently, we feed this representation, rich in identity information, into the generative model. To better match the face images generated from voice to real face images in finer facial detail, we combine the Convolutional Block Attention Module (CBAM) [9] with the Deep Convolutional Generative Adversarial Network (DCGAN). This combination performs adaptive scaling in both the spatial and channel dimensions, enabling the discriminator to better assist the generator in producing more distinct and discriminative regions. Finally, to enhance the quality of the generated face images, we employ a pre-trained model to restore them. Overall, our main contributions can be summarized as follows:

• We have designed a novel attention-enhanced voice portrait model using a generative adversarial network (AVP-GAN). Specifically, our AVP-GAN model consists of three modules: the voice encoder module, the face generation module, and the image enhancement module. This model is capable of generating facial images that align with the identity features of a speaker based on their voice.

• We combine the feature extraction ability of convolutional networks with the generative network and apply spectral normalization in the network structure. This ensures that the discriminator satisfies Lipschitz continuity by constraining how sharply the function can change, and helps prevent issues such as mode collapse during training, making the model more stable.

• We combine the CBAM with GANs and apply it to the voice portrait task. By incorporating adaptive scaling in both the spatial and channel dimensions, the network can capture more facial feature details, thereby enhancing the face generation performance of the voice portrait model.

• To address the issue of low fidelity in generated images, we introduce a degradation removal module and utilize pre-trained facial GANs as facial priors for facial restoration. This approach enhances the clarity of the generated facial images. Experimental results demonstrate that our voice portrait model exhibits favorable face generation performance.

The rest of this paper is organized as follows: Section 2 briefly introduces related work on cross-modal generation techniques for voice portraits. Section 3 describes the details of our proposed AVP-GAN model. Section 4 presents the qualitative and quantitative evaluation of our model’s performance. Section 5 concludes the paper.

    2 Related Works

    2.1 Voice Representation Learning

Voice portrait technology aims to explore the relationship between the human voice and face. Voice is produced in the human vocal tract, which comprises the channels and cavities connecting the lungs to the external environment. During voice production, the movement of the tongue, lips, jaw and so on changes the shape of the vocal tract, creating resonance chambers of different shapes and constrictions. Air from the lungs is converted into a series of periodic pulses by the vocal cords and resonates within these chambers, producing an audible voice characterized by spectral peaks and troughs. A large number of studies have shown that voice signals directly or indirectly reflect many human biometric parameters. For example, increasing age [10,11] affects the harmonic structure of the voice signal, and the body size of the speaker can be inferred by analyzing articulation [12]. In the field of voice representation learning, Speech Enhancement GANs (SEGAN) [13] operate end-to-end on raw audio, learning from various speakers and noise types and incorporating them into the network’s parametrization. To adapt to different data distributions, Hong et al. [14] explored Audio ALBERT, which learns self-supervised speech representations and compresses the speech waveform into a vector containing high-level semantic information from the original speech. Yi et al. [15] proposed a self-supervised pre-training architecture that encodes speech through a multi-layer convolutional neural network. MFCCs are designed around the auditory characteristics of the human ear and show good application prospects in speaker verification and voice recognition tasks.

    2.2 Research on the Correlation between Voice and Facial Features

In recent years, finding the relationship between voice and facial features has become a hot research topic. As early as 1916 [16], Swift indirectly hinted at the connection between speech and human body parameters, pointing out that voice quality is solely determined by bone structure, an inherited characteristic, and concluding that voice quality is therefore also a hereditary attribute. There is also a strong correlation between lip shape and a speaker’s pronunciation habits; in fact, the shape of the vocal tract can be determined from the pulse reflection at the lips [2]. For example, British people generally have thinner lips, which has been attributed to the reduced use of retroflex consonants in British English pronunciation: there is less variation in lip movements, the mouth typically tilts backward and appears flatter, and the upper lip often exhibits a converging motion. In contrast, to distinguish between different vowels in Korean pronunciation, Koreans mobilize the muscles of the oral cavity more frequently, which leads to many Koreans appearing to have a puckered mouth shape.

As early as 2006, a study by Rosenblum et al. [17] used point-light experiments to show that visible speech motion can support cross-modal speaker matching, suggesting that humans can utilize the relationship between faces and voices for speaker identification. With the continuous development of machine learning, Nagrani et al. [18] proposed a machine learning algorithm for speech-face association learning, which introduced the task of cross-modal matching between face and voice and used a Convolutional Neural Network (CNN) architecture to solve it. Kameoka et al. [4] combined StarGAN and a Conditional Variational Autoencoder (CVAE) to construct a cross-modal voice conversion (VC) model for generating facial images that match the features of the input speech. Wen et al. [19] proposed a new adaptive-identity-weight framework for voice-face association, which obtained better results on the voice portrait retrieval task. Although there is a large semantic gap between voice and the human face, the above studies confirm a strong correlation between voice and face. In our voice portrait method, we process the voice signal using the log-mel spectrum, which is based on the characteristics of human auditory perception, in the hope of better extracting the features of the audio signal.

    2.3 Cross-Modal Face Generation from Voice

Voice and image are two direct and important channels of human communication. In the field of image processing, image enhancement and segmentation are very important for processing and analyzing images. Wang et al. [20] used GANs to propose a blind face restoration model, achieving effective facial image enhancement. Li et al. [21] proposed fuzzy multilevel image thresholding based on an improved coyote optimization algorithm, which showed strong advantages in image segmentation. In voice-to-face generation, human hearing and vision are the main bases on which the brain obtains external information, responds, judges and makes decisions; they are closely related, and each can provide biometric information for the other. Speech2Face [22] used videos of millions of speakers on the internet to map speech spectrogram features to real face features in a high-dimensional space, training a deep neural network that reconstructs the speaker’s facial image from speech. However, because a single convolutional layer generally cannot capture long-distance features, traditional CNNs may produce poor image quality. To meet this challenge, Wang et al. [23] proposed a new residual speech portrait model based on an attention mechanism, improving the model’s performance. Duarte et al. proposed Wav2Pix [5], a deep neural network for cross-modal visual synthesis; the model requires no prior knowledge, is trained from scratch in an end-to-end manner, and uses GANs to synthesize the speaker’s face image from a given raw speech waveform. Meanwhile, Bragin et al. [24] designed an autoencoder neural network comprising a speech encoder and a facial decoder to reconstruct a human face from speech. Beyond the two-dimensional static voice portrait task, Wu et al. [25] designed a cross-modal three-dimensional facial generation model using 3D Morphable Models, which achieves a rough reconstruction of a 3D face from sound.

    2.4 GAN in Face Generation

GANs are a powerful tool for unsupervised image generation that has been widely studied in computer vision; the core idea comes from the zero-sum game of game theory. A traditional GAN consists of a generator network (Generator) and a discriminator network (Discriminator). Its goal is to train, through adversarial training, the discriminator’s ability to distinguish real samples from fake samples, so that the generator learns to produce realistic data samples that can deceive the discriminator. The loss function is as follows:

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_{z}(z)}[\log(1-D(G(z)))]$$

where $p_{data}(x)$ represents the distribution of real data, $p_{z}$ represents the distribution of the original noise, $G(z)$ represents the generative mapping function, and $D(x)$ represents the discriminative mapping function.
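For concreteness, the following is a minimal PyTorch sketch of how this minimax objective is typically optimized in practice; the function and variable names are illustrative rather than this paper’s exact implementation, and the discriminator is assumed to end in a sigmoid so that its output is a probability:

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, real_x, z):
    """Standard GAN losses for one batch, in binary cross-entropy form.

    real_x: a batch of real samples drawn from p_data(x)
    z:      a batch of noise vectors drawn from p_z(z)
    """
    fake_x = G(z)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))).
    d_real = D(real_x)
    d_fake = D(fake_x.detach())          # detach: do not backprop into G here
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))

    # Generator: non-saturating form, maximize log D(G(z)).
    g_fake = D(fake_x)
    g_loss = F.binary_cross_entropy(g_fake, torch.ones_like(g_fake))
    return d_loss, g_loss
```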

Since being proposed by Ian Goodfellow and colleagues, GANs have gained widespread attention. GAN-based models are widely used in a variety of tasks, including handwritten digit generation, style transfer [26], and face image generation [27]. Regarding the application of GANs to face image generation, Yeh et al. [28] inpainted and repaired damaged face images with good results. In 2017, Karras et al. [29] used GANs with celebrity faces as input to generate new facial images with celebrity facial features. Huang et al. [30] used GANs to generate frontal face images from different angles, which can be used in face verification systems. However, the current application of GANs in the field of voice portraits is limited, and their training is unstable; moreover, existing methods still struggle to generate high-quality facial images. Therefore, we employ an attention-based GAN for the voice portrait task to generate facial images that align with different speech features.

Overall, we have summarized and analyzed the current methods for voice portraits, as outlined in Table 1.

    Table 1: Current methods for voice portraits

    3 Method

Our method aims to solve a challenge in voice-face cross-modal tasks: drawing clear facial features of a person from their voice features. CNNs exhibit strong performance in feature extraction, and GANs comprise a generator (G) and a discriminator (D) that play a game against each other, training toward convergence at a Nash equilibrium; they are widely used in image generation and image translation tasks. We combined CNNs with GANs, adopting a CNN structure and adding spectral normalization [31] in both the generator and discriminator networks to improve training stability and the quality of the generated results. CBAM extracts more detail by adaptively rescaling spatial and channel features. In this research, we adopted a combined approach of using the CBAM with a deep convolutional GAN. Additionally, we incorporated a degradation removal module [20] and a pre-trained image enhancement model serving as a facial prior. This combination aims to achieve higher-quality facial image generation from the speaker’s voice, resulting in improved model performance. In this section, we provide a detailed introduction to our newly proposed AVP-GAN model.

    3.1 Symbolic Conventions

First of all, we stipulate some symbolic conventions. In this paper, we use a set of voice-face pairs as the training dataset, where the symbol $P=\{P_1,P_2,\ldots,P_m\}$ represents the face image data and the symbol $V=\{V_1,V_2,\ldots,V_n\}$ represents the voice data. Here, $m$ and $n$ denote the numbers of face and voice samples in the training set, respectively. Moreover, we use the symbol $Y$ to represent the identities of the speakers in the training set, $Y=\{Y_1,Y_2,\ldots,Y_T\}$, where $T$ represents the total number of different speakers in the training set; $m$, $n$, and $T$ may differ from one another. The identity label of the speaker corresponding to a face is denoted as $Y^P\in Y$; similarly, the identity label of the speaker corresponding to the speech data is denoted as $Y^V\in Y$. We define the relevant symbols as shown in Table 2.

    Table 2: Related symbols and their meanings in this paper

    3.2 Overall

Fig. 1 shows the framework of our proposed AVP-GAN. Because of the inherent correlation between the way a person speaks and their facial features, our objective is to generate a face that corresponds to the speaker’s voice. To achieve this goal, we propose the AVP-GAN model. Specifically, our model consists of three modules: the voice encoder module, the face generation module, and the image enhancement module. We integrate the CBAM with the GAN to focus on extracting correlated features between the voice and facial vectors in a shared mapping space. Additionally, we introduce a degradation removal module in our AVP-GAN model to enhance the clarity of generated images. This effectively produces higher-quality facial images that align with the characteristics of the given voice. More details are described in the ensuing subsections.

Figure 1: (a) Overall structure of AVP-GAN. Specifically, our AVP-GAN model consists of three modules: the voice encoder module, the face generation module, and the image enhancement module. (b) Traditional GAN framework

In summary, first, the speaker’s voice is input into the voice encoder for voice encoding. Second, the CBAM and the GAN are combined to generate, through training, facial images corresponding to the identity of the speaker. Third, the Channel-Split Spatial Feature Transform (CS-SFT) is used to balance the fidelity of the images, and the image enhancement model based on facial priors is applied to obtain higher-quality images. In our AVP-GAN model, there are two discriminators: one judges whether the generated image is a facial image, and the other assesses the similarity between the generated facial image and the voice features.

For the voice encoder module, we normalize the extracted 64-dimensional logarithmic Mel spectrogram, clip the voice segment to around 8 seconds, and input it into the voice encoder network for feature extraction. The voice encoder network is a one-dimensional convolutional neural network designed to extract features from voice for the voice portrait model. The structure of the voice encoder is illustrated in Fig. 2, where k, s, and p represent kernel size, stride, and padding, respectively.

Figure 2: (a) Structure diagram of the voice encoder network; (b) internal structure of the ConvBlock (the internal structure in all network structure diagrams in this article is consistent with the one shown above)
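As a sketch of this pipeline, the following PyTorch/torchaudio code extracts a 64-bin log-mel spectrogram and passes it through a 1D-convolutional encoder. The STFT settings, channel widths, and embedding size are assumptions for illustration, since the exact values in Fig. 2 are not reproduced in the text:

```python
import torch
import torch.nn as nn
import torchaudio

def log_mel(waveform, sample_rate=16000):
    """64-bin log-mel spectrogram from ~8 s of audio (illustrative settings)."""
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=512, hop_length=160, n_mels=64)(waveform)
    return torch.log(mel + 1e-6)  # log compression; epsilon avoids log(0)

class ConvBlock(nn.Module):
    """1D conv -> BatchNorm -> ReLU, mirroring a typical encoder block."""
    def __init__(self, in_ch, out_ch, k=3, s=2, p=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=k, stride=s, padding=p),
            nn.BatchNorm1d(out_ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.net(x)

class VoiceEncoder(nn.Module):
    """Stack of ConvBlocks over the 64 mel channels; pools the time axis
    into a fixed-length voice embedding (dimensions are assumptions)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.blocks = nn.Sequential(
            ConvBlock(64, 128), ConvBlock(128, 256), ConvBlock(256, embed_dim))
        self.pool = nn.AdaptiveAvgPool1d(1)  # collapse variable-length time axis
    def forward(self, mel):                  # mel: (batch, 64, time)
        return self.pool(self.blocks(mel)).squeeze(-1)  # (batch, embed_dim)
```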

    3.3 Generation Module and Image Enhancement Module

CBAM is a lightweight convolutional attention module that adaptively rescales spatial and channel features; combining CBAM with GANs can enhance the discrimination of salient regions and extract richer detailed features [32]. The CBAM consists of two processes: one compresses the spatial dimensions while keeping the channel dimension unchanged, i.e., the Channel Attention Module (CAM); the other compresses the channel dimension while keeping the spatial dimensions unchanged, i.e., the Spatial Attention Module (SAM). Fig. 3 illustrates the entire process of CBAM. The CBAM formulas are shown below, where the input feature is $F\in\mathbb{R}^{C\times H\times W}$, $M_C\in\mathbb{R}^{C\times 1\times 1}$ is the one-dimensional attention map of the channel attention module, and $M_S\in\mathbb{R}^{1\times H\times W}$ is the two-dimensional attention map of the spatial attention module:

$$F'=M_C(F)\otimes F,\qquad F''=M_S(F')\otimes F'$$

Figure 3: The overall process of the convolutional block attention module (CBAM)

For CAM:

$$M_C(F)=\sigma\big(MLP(AvgPool(F))+MLP(MaxPool(F))\big)=\sigma\big(W_1(W_0(F^c_{avg}))+W_1(W_0(F^c_{max}))\big)$$

For SAM:

$$M_S(F)=\sigma\big(f^{7\times 7}([AvgPool(F);MaxPool(F)])\big)=\sigma\big(f^{7\times 7}([F^s_{avg};F^s_{max}])\big)$$

Note that the MLP weights, $W_0\in\mathbb{R}^{C/r\times C}$ and $W_1\in\mathbb{R}^{C\times C/r}$, are shared for both inputs, and the ReLU activation function follows $W_0$.
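A compact PyTorch sketch of these two attention equations follows; the reduction ratio r = 16 and the 7×7 spatial kernel follow the original CBAM paper and are assumptions with respect to this paper’s exact configuration:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, per the equations above."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # CAM: shared MLP W1(ReLU(W0(.))) over avg- and max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        # SAM: 7x7 conv over the concatenated channel-wise avg and max maps
        self.conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, F_in):                       # F_in: (B, C, H, W)
        b, c, _, _ = F_in.shape
        # Channel attention M_C(F)
        avg = self.mlp(F_in.mean(dim=(2, 3)))      # MLP(AvgPool(F))
        mx = self.mlp(F_in.amax(dim=(2, 3)))       # MLP(MaxPool(F))
        m_c = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        F1 = m_c * F_in                            # F' = M_C(F) ⊗ F
        # Spatial attention M_S(F')
        pooled = torch.cat([F1.mean(dim=1, keepdim=True),
                            F1.amax(dim=1, keepdim=True)], dim=1)
        m_s = torch.sigmoid(self.conv(pooled))     # (B, 1, H, W)
        return m_s * F1                            # F'' = M_S(F') ⊗ F'
```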

Generator Network Structure. The network structure of the generator G is composed of multiple layers of 2D transposed convolutions. CBAM blocks are inserted into the structure of G to focus on the important facial features extracted from the voice signal, thereby improving the expressive power of the generator. Specifically, the network structure of G is shown in Fig. 4. In the figure, except for the last layer, a ReLU activation function follows each two-dimensional transposed convolution, and spectral normalization is added in the second to fifth layers. G must be able to generate a face portrait consistent with the identity features of the voice, as follows:

$$I_C(G(V))=Y^V$$

$Y$ represents the identity of an entity providing voice or facial data, $Y^V$ denotes the real identity of the voice subject, and $Y^P$ represents the real identity of the facial subject. $I_C(\cdot)$ denotes the function that maps a voice or facial record to its corresponding identity.
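As an illustration of this structure, a PyTorch sketch follows. The placement of ReLU, the spectral normalization on layers two to five, and the inserted CBAM block (the module sketched earlier) follow the description above, while the channel widths, kernel sizes, and the 64×64 output resolution are assumptions, since the exact values are given only in Fig. 4:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def up_block(in_ch, out_ch):
    """2D transposed conv with spectral norm, followed by ReLU."""
    conv = spectral_norm(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1))
    return nn.Sequential(conv, nn.ReLU(inplace=True))

class Generator(nn.Module):
    """Voice embedding -> 64x64 RGB face (widths/resolution are assumptions)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            # Layer 1: no spectral norm; 1x1 -> 4x4
            nn.ConvTranspose2d(embed_dim, 512, 4, 1, 0), nn.ReLU(inplace=True),
            up_block(512, 256),                     # layer 2: 4x4  -> 8x8
            CBAM(256),                              # attention on salient features
            up_block(256, 128),                     # layer 3: 8x8  -> 16x16
            up_block(128, 64),                      # layer 4: 16x16 -> 32x32
            up_block(64, 32),                       # layer 5: 32x32 -> 64x64
            nn.Conv2d(32, 3, 3, 1, 1), nn.Tanh())   # last layer: Tanh, no ReLU

    def forward(self, v):                           # v: (B, embed_dim)
        return self.net(v.view(v.size(0), -1, 1, 1))
```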

Discriminator Network Structure. The discriminator D includes a discriminator and a classifier. One discriminator serves to identify the authenticity of the image, defining labels for the real photo and the generated photo, respectively; the other discriminator, which we also call a classifier, verifies that the generated faces match the real faces. The network of D consists of multiple layers of 2-dimensional convolutions; except for the last layer, a LeakyReLU activation function follows each 2-dimensional convolution. In addition, spectral normalization is added to every 2D convolution layer except the first and last. The use of spectral normalization in D ensures the Lipschitz continuity of the network, which limits how drastically the network’s function can vary and makes the model more stable. The structure of the D network is shown in Fig. 5.

    Figure 4: The Generator network structure diagram

    Figure 5: Discriminator network structure diagram
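A corresponding discriminator sketch in PyTorch is given below. The rules from the text are applied directly: LeakyReLU after every convolution except the last, and spectral normalization on every convolution except the first and last; the channel widths and 64×64 input size are assumptions:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def down_block(in_ch, out_ch, use_sn=True):
    """Strided 2D conv (+ optional spectral norm) followed by LeakyReLU."""
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
    if use_sn:
        conv = spectral_norm(conv)  # constrains the layer's Lipschitz constant
    return nn.Sequential(conv, nn.LeakyReLU(0.2, inplace=True))

class Discriminator(nn.Module):
    """64x64 RGB face -> real/fake probability (widths are assumptions)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            down_block(3, 64, use_sn=False),   # first layer: no spectral norm
            down_block(64, 128),
            down_block(128, 256),
            down_block(256, 512),
            nn.Conv2d(512, 1, 4, 1, 0),        # last layer: no SN, no LeakyReLU
            nn.Sigmoid())                      # binary real/fake output

    def forward(self, img):                    # img: (B, 3, 64, 64)
        return self.net(img).view(-1)
```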

Image Enhancement Module. The purpose of the image enhancement module is to improve the clarity of the generated facial images by estimating a new, high-quality image from the ones already generated. Our image enhancement module draws inspiration from the work of Wang et al. [20], incorporating a degradation removal module and a pretrained face GAN as a facial prior. The pretrained face GAN model was fine-tuned on the VoxCeleb facial dataset. Through face restoration of the images produced by the generation module, the quality of the generated face images is further improved and credible details are recovered. The loss function is consistent with that of the Generative Facial Prior GAN (GFP-GAN), comprising four parts: adversarial loss, reconstruction loss, facial component loss, and identity preserving loss. The losses are as follows:

Reconstruction Loss:

$$\mathcal{L}_{rec}=\lambda_{l1}\left\|\hat{y}-y\right\|_1+\lambda_{per}\left\|\phi(\hat{y})-\phi(y)\right\|_1$$

where $\hat{y}$ is the restored image, $y$ is the ground truth, and $\phi$ is the pretrained VGG-19 network [33]. $\lambda_{l1}$ and $\lambda_{per}$ denote the loss weights of the L1 and perceptual losses, respectively, $\lambda_{l1}=0.1$, $\lambda_{per}=1$.

Adversarial Loss:

$$\mathcal{L}_{adv}=-\lambda_{adv}\,\mathbb{E}_{\hat{y}}\big[\operatorname{softplus}\big(D(\hat{y})\big)\big]$$

where $D$ denotes the discriminator and $\lambda_{adv}$ represents the adversarial loss weight, $\lambda_{adv}=0.1$.

Facial Component Loss:

$$\mathcal{L}_{comp}=\sum_{ROI}\lambda_{local}\,\mathbb{E}_{\hat{y}_{ROI}}\big[\log\big(1-D_{ROI}(\hat{y}_{ROI})\big)\big]+\lambda_{fs}\left\|\operatorname{Gram}\big(\psi(\hat{y}_{ROI})\big)-\operatorname{Gram}\big(\psi(y_{ROI})\big)\right\|_1$$

where ROI denotes a region of interest from the component collection {eyes, mouth}, $D_{ROI}$ is the local discriminator for each region, and $\psi$ denotes the multi-resolution features from the learned discriminators. $\lambda_{local}$ and $\lambda_{fs}$ denote the loss weights of the local discriminative loss and the feature style loss, respectively, $\lambda_{local}=1$, $\lambda_{fs}=200$. In Eq. (9), the first term is the discriminator loss within the adversarial loss, and the second term is the feature style loss; the Gram matrix statistics are typically effective in capturing essential information about the problem.

Identity Preserving Loss:

$$\mathcal{L}_{id}=\lambda_{id}\left\|\mu(\hat{y})-\mu(y)\right\|_1$$

where $\mu$ represents the face feature extractor and $\lambda_{id}$ denotes the weight of the identity preserving loss, $\lambda_{id}=10$. The identity preserving loss employs the pretrained ArcFace facial recognition model, enforcing that the restoration results maintain a small distance from the ground truth in a compact deep feature space.

The overall model objective is a combination of the above losses:

$$\mathcal{L}_{total}=\mathcal{L}_{rec}+\mathcal{L}_{adv}+\mathcal{L}_{comp}+\mathcal{L}_{id}$$
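A condensed sketch of how these terms combine is shown below; the facial component loss is omitted for brevity, phi and mu stand for the pretrained VGG-19 and ArcFace extractors named above, and all names and signatures are illustrative:

```python
import torch
import torch.nn.functional as F

def total_restoration_loss(y_hat, y, d_logits, phi, mu,
                           lam_l1=0.1, lam_per=1.0, lam_adv=0.1, lam_id=10.0):
    """Combine the stated losses with the stated weights.

    y_hat, y: restored and ground-truth face images
    d_logits: discriminator output for y_hat
    phi:      pretrained VGG-19 feature extractor (perceptual features)
    mu:       pretrained ArcFace face feature extractor
    """
    l_rec = lam_l1 * F.l1_loss(y_hat, y) + \
            lam_per * F.l1_loss(phi(y_hat), phi(y))
    l_adv = -lam_adv * F.softplus(d_logits).mean()
    l_id = lam_id * F.l1_loss(mu(y_hat), mu(y))
    return l_rec + l_adv + l_id   # facial component loss omitted in this sketch
```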

    3.4 Loss Function

In GAN-based face image generation, the loss function is first computed at the discriminator. Since the discriminator typically outputs a binary judgment of true or false, our AVP-GAN model uses a binary cross-entropy loss function to bring the generated data distribution of the generator closer to the real data distribution. Initially, the parameters of the generator are frozen, and the discriminator is trained to judge both real data and generated data, updating the discriminator’s parameters. We define the following loss functions:

where $L_D$, $L_C$, and $L_G$ represent the loss functions of the discriminator (D), classifier (C), and generator (G). Here, $w_n$ represents the per-sample weight, set to a default value of 1 in this paper. $G(p_i)$ represents the face portrait produced by the generator. $y_p$ represents the binary classification of the face portrait, indicating whether the generated face image $p_i$ has the same identity as the real face image $y_i$: if so, $y_p=1$; otherwise, $y_p=0$. $y_v$ represents the binary classification of the voice data, indicating whether the voice identity label $v_i$ is the same as $y_i$: if so, $y_v=1$; otherwise, $y_v=0$.
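The paper’s exact expressions for $L_D$, $L_C$, and $L_G$ are not reproduced above; the following is one plausible weighted binary cross-entropy formulation consistent with this description, with all names and the exact pairing of targets treated as assumptions:

```python
import torch
import torch.nn.functional as F

def avp_losses(d_real, d_fake, c_match, y_v, w_n=1.0):
    """Sketch of weighted BCE terms for D, C, and G.

    d_real, d_fake: discriminator probabilities for real / generated faces
    c_match:        classifier probability that the face matches the voice
    y_v:            binary voice-identity label (float tensor), defined above
    w_n:            per-sample weight (default 1, as in the paper)
    """
    l_d = w_n * (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
                 F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    l_c = w_n * F.binary_cross_entropy(c_match, y_v)
    # The generator is rewarded when D judges its output real and C judges it
    # as matching the input voice identity.
    l_g = w_n * (F.binary_cross_entropy(d_fake, torch.ones_like(d_fake)) +
                 F.binary_cross_entropy(c_match, torch.ones_like(c_match)))
    return l_d, l_c, l_g
```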

    The algorithm description of the AVP-GAN is shown in Algorithm 1.

    4 Experiments

To assess the effectiveness of the proposed method, this section provides detailed explanations of the dataset used, the experimental details, the evaluation metrics, and the experimental results. The specific experimental details are as follows.

    4.1 Datasets and Experimental Details

In the training process, we utilized speech data of 1,251 speakers from VoxCeleb [18,34] and obtained the corresponding facial data from the VGG Face [35] dataset. The identity-matched datasets include approximately 150,000 speech segments and facial images from 1,225 different speakers. Following the data processing approach of previous work [36], the speech was randomly cropped into audio segments of approximately 8 seconds. We extracted 64-dimensional logarithmic mel spectrograms and cropped RGB facial images to a size of 3×64×64. This study follows the partition method of Nagrani et al. [18], in which the training, validation, and test sets are mutually exclusive. The dataset partitioning is detailed in Table 3.

    Table 3: Dataset used in our experiments
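The audio cropping and image preprocessing described above can be sketched as follows; the 16 kHz sample rate is an assumption, and the transforms are illustrative rather than the paper’s exact pipeline:

```python
import random
import torch
import torchvision.transforms as T

def crop_audio(waveform, sample_rate=16000, seconds=8):
    """Randomly crop a waveform to roughly 8 s, as in the training setup."""
    n = seconds * sample_rate
    if waveform.size(-1) <= n:
        return waveform
    start = random.randint(0, waveform.size(-1) - n)
    return waveform[..., start:start + n]

# Resize and crop RGB face images to 3 x 64 x 64 tensors.
face_transform = T.Compose([T.Resize(64), T.CenterCrop(64), T.ToTensor()])
```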

We used PyTorch to implement our method, and the AVP-GAN network proposed in this paper was trained for 50,000 epochs using the Adam optimizer, with a learning rate of 0.0002, β1 of 0.5, and β2 of 0.999. All experiments were conducted on a computer with a 12th Gen Intel(R) Core(TM) i7-12700H CPU, 16 GB RAM, and an NVIDIA GeForce RTX 3070 Ti GPU.
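In code, these optimizer settings correspond to the following (the placeholder modules stand in for the generator and discriminator sketched earlier):

```python
import torch
import torch.nn as nn

# Placeholders; in practice these are the Generator/Discriminator above.
generator, discriminator = nn.Linear(1, 1), nn.Linear(1, 1)

# Adam with the stated hyperparameters: lr = 0.0002, beta1 = 0.5, beta2 = 0.999.
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
```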

    4.2 Evaluation Metrics

To assess the performance of our model on the voice portrait task, we chose Voice2Face [36] as the baseline model for comparison and evaluated our proposed AVP-GAN from both qualitative and quantitative perspectives. For the qualitative evaluation, we trained the Voice2Face model and our model on our hardware and compared the facial images generated by both; both models were tested using voice data of approximately 6 and 8 seconds. For the quantitative evaluation, we employed the face cosine similarity [37] as the assessment metric: a higher face cosine similarity indicates that the generated facial images are closer to the real facial images. Specifically, we utilized ArcFace [38] for facial image preprocessing. After performing facial recognition and feature extraction on the image data, we analyzed the facial image matrices and computed the cosine distance between the corresponding vectors of the two compared images. The formula for the facial cosine similarity is as follows:

$$\cos\theta=\frac{A\cdot B}{\left\|A\right\|\left\|B\right\|}$$

where $A$ and $B$ are the vectors corresponding to the original face image and the generated face image, respectively, and $\theta$ is the angle between vectors $A$ and $B$ in space. The smaller the angle $\theta$, the larger the cosine value, indicating a higher cosine similarity.
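This metric can be computed directly from two feature vectors, e.g., ArcFace embeddings of the real and generated face images (a minimal sketch):

```python
import torch
import torch.nn.functional as F

def face_cosine_similarity(a, b):
    """cos(theta) = A·B / (|A||B|) between two face feature vectors."""
    return F.cosine_similarity(a.flatten(), b.flatten(), dim=0).item()
```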

    4.3 Experimental Results

    4.3.1 Qualitative Evaluation

We input speaker voice segments into our model and test varying durations of recordings of the same speaker’s speech. Fig. 6 displays the test results for six randomly selected audio recordings from different speakers. In each example, we present facial images generated from different durations of speaker voice: 2, 4, 6, and 8 s. The qualitative results indicate that, given voice segments longer than 6 s as input, our model’s output facial images tend to stabilize, depicting clearer facial features and expressions.

    Figure 6: Facial images generated from voice segments of different durations

Fig. 7 presents a comparison between the original real images, the intermediate images (without image enhancement), and the final facial images generated by our AVP-GAN. It is evident that some facial details are not well reconstructed in the intermediate images. The purpose of the image enhancement module is to better restore the details of the face generated from the voice and to improve the clarity of the generated facial images.

Figure 7: Comparison of the final face, intermediate face and original face

Choosing Voice2Face as the comparison model, both models were tested using 8-s audio data. Fig. 8 illustrates examples of the images generated by the comparison model and our AVP-GAN. It can be observed that the face images generated by our method exhibit higher quality, with facial features closer to real faces.

    Figure 8: Comparison of results between our AVP-GAN model and the Voice2Face model

Fig. 9 displays a comparison between the facial images generated by our AVP-GAN model on the test set and the real facial images. We randomly selected seven sets of male and seven sets of female results for presentation. From the images, it can be observed that our model accurately captures features such as gender, age, face shape, facial features, and expressions.

Figure 9: Comparison of 8-s voice-generated faces vs. reference faces

Through the analysis of the qualitative results, we found that attributes such as hairstyle, makeup, and background vary greatly and correlate only weakly with the speaker’s voice, so it is generally difficult to establish a strong connection between these attributes and the voice. Therefore, our model combines the attention mechanism with GANs to seek the information about facial shape, facial features, and gender contained in the voice. From the comparison between the generated faces and the reference faces, it can be seen that our proposed AVP-GAN model can establish a correlation between voice and attributes such as facial shape, expression, and gender, and can generate a face image consistent with the speaker’s identity.

    4.3.2 Quantitative Evaluation

Because objective attributes such as hairstyle, background, and makeup are difficult to capture accurately from sound, directly comparing the similarity of two faces may introduce significant errors. Therefore, we quantitatively assess the depicted faces by first recognizing facial features and then calculating the facial cosine similarity. We generated faces using 4 s and 8 s of voice data, respectively, and calculated the cosine feature similarity between the original faces and the generated faces, allowing a quantitative assessment of the generated facial images. Simultaneously, we conducted ablation experiments on the 8 s voice dataset, evaluating the generated facial images with and without the CBAM attention module to verify its impact within the GAN. The quantitative evaluation results are presented in Table 4.

    Table 4: Quantitative assessment results

The results indicate that the similarity of facial images generated from the 4 s audio dataset is lower than that of facial images generated from the 8 s audio dataset. On the 8-s voice dataset, the model enhanced with the CBAM exhibited significantly higher cosine similarity for generated images than the model without the CBAM, indicating that the CBAM effectively enhances the feature extraction capability of the GAN. Furthermore, on the 8-s voice dataset, the cosine similarity between the facial images generated by our proposed AVP-GAN and the real images surpassed that of the baseline model Voice2Face, indicating that our AVP-GAN model can generate relatively high-quality static facial images.

    5 Conclusion

In this research, we explored a relatively novel audio-visual cross-modal challenge: how to infer facial features from a person’s voice. To enhance the performance of the voice portrait model and increase the similarity between the generated and original faces, we introduced a new deep GAN model for voice portrait generation, the AVP-GAN, which incorporates a convolutional attention mechanism. By integrating CBAM with the convolutional GAN, our model better captures features that align with both the voice and facial characteristics. Additionally, we incorporated an image enhancement module, further improving the clarity of the generated facial images. The experiments demonstrate that our AVP-GAN can generate facial images that better align with the original facial features and exhibit higher clarity. We chose cosine similarity as the quantitative evaluation metric, and the quantitative results indicate that our AVP-GAN achieves a cosine similarity of 0.511, surpassing the comparison model. Furthermore, our AVP-GAN demonstrates a superior ability to generate improved facial images on the 8-s audio dataset, affirming the effectiveness of our approach without the need for any prior information. However, continuous improvement is still required in generating facial images that accurately capture the identity features of the speaker. For instance, our model has room for refinement in exploring the correlation between speech and skin tone, and it has yet to investigate voice portraits for ethnicities from regions such as Asia and Africa. In future work, we plan to explore both the attributes of voice features and cross-modal correlations between speech and facial features, with the goal of further improving the accuracy of facial feature generation.

Acknowledgement: We would like to express our sincere gratitude to Fanliang Bu for his patient guidance and meticulous care during the writing of our paper. We are especially grateful to Dr. Yuchen Zhou for his assistance in writing our paper. Furthermore, we would like to extend our heartfelt thanks to the reviewers for their valuable suggestions.

Funding Statement: This work was supported by the Double First-Class Innovation Research Project for People’s Public Security University of China (No. 2023SYL08).

Author Contributions: The authors confirm their contributions to the paper as follows: conceptualization, methodology, formal analysis, writing of the original draft, validation: Jingyi Mao; conceptualization, supervision, writing review & editing: Fanliang Bu; formal analysis, validation, writing and editing: Yuchen Zhou; investigation, resources, validation, writing and editing: Yifan Wang; investigation, resources, writing and editing: Junyu Li and Ziqing Liu. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The datasets used in this article are freely available in the mentioned references.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
