
    Automated Video Generation of Moving Digits from Text Using Deep Deconvolutional Generative Adversarial Network

2023-12-15 03:57:16 Anwar Ullah, Xinguo Yu and Muhammad Numan
Computers, Materials & Continua, 2023, Issue 11

Anwar Ullah, Xinguo Yu,★ and Muhammad Numan

1National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, 430079, China

2Wollongong Joint Institute, Central China Normal University, Wuhan, 430079, China

ABSTRACT Generating realistic and synthetic video from text is a highly challenging task due to the multitude of issues involved, including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as a Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as a Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed research involves several steps. First, the input text is fed into a Long Short Term Memory (LSTM) based text encoder and then smoothed using Conditioning Augmentation (CA) techniques to enhance the effectiveness of the Generator (G). Next, a DDNN generates video frames by incorporating the enhanced text and random noise, while a modified DCNN acts as a Discriminator (D) that effectively distinguishes between generated and real videos. This research evaluates the quality of the generated videos using standard metrics like Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. By conducting experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text, this research demonstrates significant improvements in both metrics and human study results, confirming the effectiveness of DD-GAN. This research also took on the exciting challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieving successful results. The proposed research demonstrates promising results for generating coherent videos from textual input.

KEYWORDS Generative Adversarial Network (GAN); deconvolutional neural network; convolutional neural network; Inception Score (IS); temporal coherence; Fréchet Inception Distance (FID); Generative Adversarial Metric (GAM)

    1 Introduction

In the current era of advanced technology, Artificial Intelligence (AI) is an active field with many significant applications and valuable research topics. AI is transforming every area of life and is a tool that allows people to rethink how data is analyzed, integrate information, and use the resulting insights to improve decision-making. Among the various AI techniques, machine and deep learning have gained widespread attention from researchers due to their ability to power numerous applications such as image classification, multimedia concept retrieval, text mining, video recommendations, and much more [1]. In deep learning, a layered concept is used to represent data abstraction and build computational models, and algorithms such as convolutional neural networks and generative adversarial networks have completely changed how information processing is perceived. Therefore, deep learning has prevailed in the field of artificial intelligence.

Generative Adversarial Networks (GANs) [2], proposed by Goodfellow et al. in 2014, are deep learning models based on zero-sum game theory, where the total gains of the two players are zero and the gain or loss of each player's utility is precisely balanced [3]. GANs simultaneously train a Generator (G) and a Discriminator (D). The G attempts to capture the potential distribution of the real samples and creates new data samples, while the D is typically a binary classifier that distinguishes real samples from generated samples as accurately as possible. Thus, G and D inherit the structure of currently popular deep neural networks [4,5]. The GAN optimization process is a minimax game whose goal is to reach a Nash equilibrium [6], at which point the G is assumed to have captured the distribution of the real samples. The importance of this emerging generative model therefore lies in preserving the data distribution through unsupervised learning and generating more realistic data [3]. GANs have been extensively studied due to their broad range of applications, including language, image, and video processing.
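For reference, this adversarial game corresponds to the original minimax objective of [2], written here in its standard form:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

where p_data is the distribution of real samples and p_z is the noise prior from which G draws its input.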

Intelligent graphic design tools have the potential to generate engaging and informative videos that help people learn about the world around them. However, these tools can be challenging and inaccessible, especially for those with limited technical knowledge and resources. Therefore, an intelligent system capable of performing text-based video editing tasks is necessary to make video creation easier for people without extensive technical expertise. Such techniques can be applied across various domains, including gaming, virtual reality, and educational materials. In Computer Vision (CV), the use of Generative Adversarial Networks (GANs) for the automatic generation of visual content is a significant advancement, enabling the creation of highly realistic images and videos. By incorporating GANs into intelligent video editing systems, accessibility and ease of use can be improved, empowering more people to create compelling visual content. Significant research has been devoted to improving the quality of results in various fields, including GANs for super-resolution images [7,8], for image and video classification [9,10], for cartoon image generation from face photos [11], and for Computed Tomography (CT) image denoising [12]. GANs are also particularly effective in tackling complex tasks such as multi-domain synthesis [13,14] and multi-view generation [15]. These techniques have demonstrated remarkable success in image denoising, high-resolution synthesis, and the generation of high-quality images and videos, making them valuable tools for various applications.

Video generation from text is a complex task compared to image generation [16-19] because videos are a complex sequence of individual images that follow spatial and temporal dependencies, and it is made more difficult by the semantic alignment required between text and video at both frame and video levels [20]. Realistic and synthetic video generation from text is difficult because of multiple issues, such as capturing the coherence between individual frames, digit deformation, noise between correlated structures, blurred output, and temporal coherence between frames. For example, a hybrid Variational Auto Encoder and Generative Adversarial Network (VAE-GAN) based model was proposed by Li et al. [21] to first produce a "gist" of the video from the given text using the VAE, where the "gist" is an image that specifies the background color and object arrangement. Then, based on the gist, the video's motion and substance are created using the GAN. Unfortunately, the movements in the generated videos are incoherent since they ignore the relationship between successive frames.

On the other hand, Pan et al. [22] paid attention to the temporal coherence between frames and produced videos from the input text using a carefully designed discriminator. However, because text and video are aligned only broadly based on the classical conditional loss [16], some precise semantic words essential for synthesizing the details are ignored in the resulting videos. Moreover, Chen et al. [20] proposed a Bottom-Up Generative Adversarial Network (BoGAN) model to deal with multi-level alignment and coherence problems using region-level, frame-level, and video-level losses, and the results are competitive. However, the model incurs some overhead because of the multiple levels and the 3D deconvolution; in contrast to 1D and 2D deconvolution, 3D deconvolution results in a more significant loss of information. In addition, when applying the video-level discriminator to the whole video, the significant words receive too little attention, causing incoherence problems in adjacent frames.

This research proposes a novel solution for generating moving-digit videos from text using the Deep Deconvolutional Generative Adversarial Network (DD-GAN). This research focuses on generating preschool math videos from text while maintaining spatial and temporal coherence. Using DD-GAN, we can generate synthetic videos for addition, subtraction, multiplication, and division from written text, potentially significantly improving children's math education. Some of the videos generated by DD-GAN can be seen in Fig. 1, using the SBMG, TBMG, and custom math video datasets.

Figure 1: The experimental results of our DD-GAN on the SBMG, TBMG, and custom math video datasets

In the DD-GAN architecture, we start with an LSTM-based text encoder that transforms the text description into a text embedding (φt) and then apply a Conditioning Augmentation (CA) technique [18] to the embedding, which helps to smooth the text conditioning in multiple ways. The resulting text embedding φt is then combined with the conditioning augmentation variable ĉ and a random noise variable z and fed through a Deep Deconvolutional Neural Network (DDNN) that serves as the Generator (G) to produce the video. The DDNN model uses an inverse convolution approach, which allows it to generate videos with spatial and temporal coherence that match the input text description. A modified Deep Convolutional Neural Network (DCNN) serves as the Discriminator (D), which is used to distinguish between the generated and real videos. The DDNN and DCNN networks are trained using an adversarial training scheme: the DDNN is trained to fool the DCNN into thinking that the generated frames are real, while the DCNN is trained to correctly classify the generated frames as fake. This feedback loop between the DDNN and DCNN allows the DDNN to improve over time and generate more realistic videos. The DCNN model has been modified to enhance its ability to identify differences between the generated and real videos, as described later in this paper. First, we trained our DD-GAN on two synthetic datasets, Single-Digit Bouncing Mnist-4 (SBMG) [22] and Two-Digit Bouncing Mnist-4 (TBMG) [22]. It was then trained on a custom early school mathematics video dataset (Math Dataset). For the quantitative assessment, we used the most important and commonly used metrics, namely: (1) Inception Score (IS) [23], (2) Fréchet Inception Distance (FID) [24], (3) Fréchet Inception Distance 2 Video (FID2vid) [25], and (4) Generative Adversarial Metric (GAM) [26]. In addition, we also conducted a human study for further evaluation based on the realism, relevance, and coherence of the generated videos. As a result, our DD-GAN performs better than other state-of-the-art methods. Fig. 2 showcases the simplified architecture of DD-GAN, and Fig. 3 shows the simplified architecture of other state-of-the-art methods, while the complete architecture of the DD-GAN is presented later in this paper.

    Figure 2:Simple architecture of our proposed DD-GAN model

    Figure 3:Simple architecture of the other state-of-the-art text-to-video GAN model

The contributions of the proposed DD-GAN are summarized as follows:

■ Firstly, we introduce a novel Deep Deconvolutional Generative Adversarial Network (DD-GAN) that generates high-quality moving-digit videos from written text. The generated frames are temporally coherent and closely match the given scripts.

■ Secondly, we use a Deep Deconvolutional Neural Network (DDNN) as a Generator (G), whose deep architecture generates moving digits and semantically matched videos from the input text, and we propose several modifications to the Deep Convolutional Neural Network (DCNN) used as a Discriminator (D) so that it exploits the conditional information provided by the input text and effectively distinguishes between generated and real videos.

■ Thirdly, we tested the performance of the DD-GAN model on the Mnist SBMG and TBMG datasets, as well as a custom-generated mathematics video dataset. We evaluated the results using standard metrics such as Inception Score (IS), Fréchet Inception Distance (FID), FID2vid, and the Generative Adversarial Metric (GAM).

■ Fourthly, we also conducted a human study to assess realism, relevance, and coherence and demonstrated significant improvements in both the metrics and the human analysis, thus proving the effectiveness of the proposed approach.

This research work is structured as follows: Section 2 provides a comprehensive overview of related work, examining various video-generation GAN models. The proposed methodology is detailed in Section 3, before presenting the results of our experiments and a thorough discussion of our findings, including limitations, in Section 4. Finally, we conclude our research in the last section.

    2 Related Works

The increasing use of AI Generative Adversarial Network (GAN) models and algorithms for automatic content creation in the media, entertainment, and education sectors has sparked growing interest in automatically generating content such as text, audio, images, and videos. Different variations of GAN models for the generation and synthesis of multimedia content (images, videos, and audio), along with their applications, classification, challenges, and performance, are covered in various reviews and research papers, for example by Lalit and Singh [27]. However, this related work focuses only on video GAN models, which is the nature of our research topic. Vondrick et al. [28] made the first attempt in this direction in 2016 to produce video sequences. Their GAN uses a two-stream technique, with one stream concentrating on the static background and the other on the foreground. They proposed a GAN architecture based on 3D convolutions and generated encouraging results, but closer observation showed that it could only produce fixed-length videos that were somewhat noisy and lacked object structural integrity. In contrast to 1D and 2D deconvolution, 3D deconvolution results in a more significant loss of information. The brief review of video GAN models below is organized into two main categories of conditioning, with our focus being on the conditional category of text-to-video GANs.

    2.1 Unconditional Video Generation

Unconditional video GANs are those with an unsupervised framework in which videos are produced without prior information [29]. The model must capture the data distribution without an input guidance signal that could help narrow the target gap. Training unconditional video GANs is complicated, and some of them have become the foundation for conditional frameworks; for example, Motion Content GAN (MoCoGAN) [30] is an unconditional video GAN model used in conditional video GAN models such as StoryGAN [31] and Text Filter conditioning GAN (TFGAN) [32].

2.2 Conditional Video Generation

Several works use conditional signals in GANs to control the modes of the generated data. These conditions may be based on images [33-35], semantic maps [36-39], audio signals (speech) [40-43], or video [44-47]; due to the nature of our DD-GAN model, we review and explain the current work based on textual conditions in Section 2.3.

    2.3 Text-to-Video Generation

Text-to-video GAN models produce videos conditioned on text and focus on two main purposes: firstly, maintaining semantically aligned consistency between the given text condition and the generated video, and secondly, generating realistic videos that maintain consistency and coherence across frames. Mittal et al. [48] developed a method that merges a variational autoencoder with a recurrent attention mechanism to capture the time-dependent sequence of frames; implemented in 2017, it was the first text-to-video generation approach. They addressed some of the drawbacks of [28], especially the lack of object structural integrity in videos, and they upheld the objects' structure to a significant extent. An improved model, Cap2vid [49], was later proposed, in which the short- and long-term dependencies between frames are incrementally integrated into the generated videos. It specifically addresses the spatiotemporal semantics of the video, thus generating good-quality videos from the caption.

Onward, in 2018, Pan et al. [22] proposed a novel Temporal Generative Adversarial Network Conditioning on Captions (TGANs-C), where the input of the Generator (G) is random noise along with a caption. The latent noise vector and caption embedding are combined as the input to the generator network, which then creates a frame sequence using 3D spatiotemporal convolutions; they tried to overcome the temporal and semantic coherence problems between frames and partly succeeded. In contrast, the GAN model proposed in [21] generates videos using a two-step VAE based on the input text to produce a gist of the video. The gist, which is an image, gives the background color and layout of the object, and after that, the video content and motion are generated. The authors tried to address the mode-collapse problems in the frames and, in some cases, achieved competitive results.

Reference [32] introduced the TFGAN method, which uses multiscale text conditions to generate convolution filters that extract text features from the encoded text. The generated convolution filters are fed into the discriminator to facilitate and strengthen the links between text and video, producing some competitive samples. In contrast, StoryGAN [31], a story visualization model conditioned on multiple sentences, contains a context encoder and a story encoder. Unlike other video GAN models, StoryGAN focuses less on the stability of motion and more on the global consistency of the story, resulting in videos with a coherent storyline rather than just smooth motion.

To summarize, the Cap2vid [49] model improved on earlier work by incrementally integrating short- and long-term dependencies between frames and generated videos, addressing spatiotemporal semantics, and generating good-quality videos from the caption. TGANs-C [22] proposed a novel approach to overcome temporal and semantic coherence problems between frames by using 3D spatiotemporal convolutions to create a frame sequence from a combination of latent noise vectors and caption embeddings as input to the generator network. The GAN model proposed in [21] attempted to address mode-collapse problems in the frames by generating videos using a two-step VAE based on the input text to produce a gist of the video, followed by the video content and motion. TFGAN [32], with multiscale text conditions, generates convolution filters that extract text features from the encoded text and feeds them into the discriminator to facilitate and strengthen the links between text and video. StoryGAN [31], a story visualization model conditioned on multiple sentences, focuses less on the stability of motion and more on the global consistency of the story.

While all the methods discussed above produce positive results, they still have limitations that need to be addressed in future studies: closer examination shows that their generated videos lack object structural integrity and are generally noisy, incoherent, low-quality, and fixed in duration. Current video generation models still fall short of producing high-quality, coherent, and realistic videos that exactly match the input text description. Video generation models aim to teach machines how to generate videos from textual descriptions, which involves advanced techniques in natural language processing, computer vision, and machine learning. These models have many potential applications, such as video summarization, content creation, and virtual reality. By improving these models, machines can better understand human language and generate useful videos for various tasks. Considering the coherence issue, DD-GAN combines video-text semantic matching and frame coherence to generate realistic, coherent videos that match the given text description. This GAN-based approach ensures that synthetic videos are visually convincing while maintaining fidelity to the text, and it represents a promising direction for advancing video synthesis and semantic matching.

    3 Proposed Methodology

The fundamental difficulties in generating video from text lie in capturing both spatial and temporal coherence and the semantic relationship between text and video. This research addresses the temporal coherence problem and proposes a novel approach called the Deep Deconvolutional Generative Adversarial Network (DD-GAN) for generating moving-digit videos from text descriptions. DD-GAN is designed to overcome the challenges of text-to-video generation, particularly in generating synthetic and early school mathematics videos from text.

A GAN is a deep neural network consisting of a Generator (G) and a Discriminator (D). These two components are trained competitively, where G generates new data while D authenticates the data. In our proposed method, the Generator (G) is a deep deconvolutional neural network that generates the data, while the Discriminator (D) is a modified deep convolutional neural network that authenticates the data.

The overall diagram of our proposed DD-GAN method is shown in Fig. 4, which includes the Long Short Term Memory (LSTM) based text encoder, Conditioning Augmentation (CA), the DDNN as the Generator (G), and the DCNN as the Discriminator (D), with every step and process. In the following sections, we present the details of our proposed method.

    3.1 Text Encoder Network

To convert the provided text description into the latent code vector z_text, or machine-readable codes, for video generation, we used an LSTM-based encoder. Words from a phrase are fed one by one in sequence to the encoder at each time step during the testing and training phases. For example, if the text "Digit 2 is moving up and down" is used, the word "Digit" is fed at time step t = 1, the word "2" is provided at time step t = 2, and so on. Each word is first represented as a one-hot vector. As a result, a sentence of length n can be expressed as {w1, w2, ..., wn}, where wt is the one-hot vector of the t-th word. After that, the sentence is sent into a bidirectional LSTM network, which contextualizes each word into ht. Next, we use an LSTM-based encoder to process the contextually embedded word sequence {h1, h2, ..., hn} and take the latent text code z_text ∈ R^{d_text} as the final LSTM output. After obtaining the embedding output, Conditioning Augmentation (CA) techniques are used to further extract valuable features.
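As an illustration, a minimal PyTorch-style sketch of such a bidirectional LSTM text encoder is given below; the module name TextEncoder, the vocabulary handling, and the layer sizes are illustrative assumptions rather than the exact implementation.

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Sketch: word indices -> contextual states h_t -> latent text code z_text."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256, d_text=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)            # word index -> dense vector
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)                   # contextualizes each word (h_t)
        self.to_code = nn.Linear(2 * hidden_dim, d_text)            # final states -> z_text

    def forward(self, word_ids):                                    # word_ids: (batch, n_words)
        h = self.embed(word_ids)                                    # (batch, n_words, embed_dim)
        _, (h_n, _) = self.bilstm(h)                                # h_n: (2, batch, hidden_dim)
        sentence = torch.cat([h_n[0], h_n[1]], dim=-1)              # concat forward/backward states
        return self.to_code(sentence)                               # z_text in R^{d_text}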

Figure 4: The framework of our proposed DD-GAN text-to-video generation model. An LSTM-based encoder is first used for text embedding, Conditioning Augmentation (CA) for smooth conditioning manifolds, the Generator (G) for generating the data, and the Discriminator (D) for authenticating the data

3.2 Conditioning Augmentation (CA) Technique

The majority of text-to-video models obtain the text and encode it using an encoder, resulting in a text embedding (φt). A text embedding (φt) is a nonlinear transformation that produces a latent conditioning variable for the Generator (G). Furthermore, the latent space of φt is extremely high-dimensional, exceeding 100 dimensions. As a result, if there are fewer data or small datasets, the latent data manifold is more likely to break or become discontinuous, which is undesirable for the Generator (G). This issue matters because it causes blurry artefacts, incoherence, and frame disconnectivity. To mitigate the problem, Zhang et al. [18] proposed a technique called Conditioning Augmentation (CA), which adds a new conditioning variable ĉ. In contrast to the fixed or static conditioned text variable c, the latent variable ĉ is sampled from an independent Gaussian distribution N(μ(φt), Σ(φt)), where μ(φt) stands for the mean (average) and Σ(φt) stands for the diagonal covariance matrix. The proposed Conditioning Augmentation provides better outcomes and more training pairs from the smallest possible number of text-frame pairs, encouraging robustness to small changes along the conditioning manifold, which helps to refine the conditioning process and avoid overfitting.

While training, we introduced a regularization term to enhance the performance of G: the Kullback-Leibler (KL) divergence between the standard Gaussian distribution and the conditioning Gaussian distribution. This approach aims to achieve smoothness in the conditioning manifold. Additionally, we employed CA to improve the performance of the proposed method and the smoothness and robustness of the conditioning manifold.
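The CA step can be sketched in PyTorch-style code as follows: φt is mapped to a mean and a log-variance, ĉ is sampled with the reparameterization trick, and the KL term above regularizes the conditioning distribution toward N(0, I). The module name, layer sizes, and dimensions are illustrative assumptions rather than the exact implementation.

import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Sketch: text embedding phi_t -> sampled condition c_hat plus a KL regularizer."""
    def __init__(self, d_text=256, d_cond=128):
        super().__init__()
        self.fc = nn.Linear(d_text, 2 * d_cond)             # predicts mean and log-variance jointly

    def forward(self, phi_t):
        mu, logvar = self.fc(phi_t).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        c_hat = mu + std * torch.randn_like(std)             # reparameterization trick
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch
        kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
        return c_hat, kl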

3.3 Deep Deconvolutional Neural Network as Generator (G)

Generating videos from semantic texts faces two main challenges. First, extracting relevant information from sentences and linking it to video content, which can be addressed with advances in Natural Language Processing (NLP) and multimodal learning. Second, accurately modeling both long- and short-term temporal dependencies of video frames, including slow-changing global and fast-changing local patterns. Existing techniques either focus only on fast-changing patterns with 3D convolutions or treat each frame generation independently, ignoring temporal dependency modeling.

In this research, we propose a novel approach for generating moving-digit videos from text, specifically targeting synthetic and preschool mathematics scripted video generation. The proposed method, the Deep Deconvolutional Generative Adversarial Network (DD-GAN), effectively captures the long-term temporal dependencies between frames and ensures coherence between individual frames during video generation. Through advanced deep learning techniques and a carefully crafted network architecture, our method produces high-quality videos that accurately reflect the intended content of the input text. Fig. 4 illustrates our approach and demonstrates its effectiveness in generating realistic and coherent videos. The DD-GAN has a deep spatiotemporal architecture whose input combines the embedded text φt, the extra Conditioning Augmentation (CA) variable ĉ, and noise z; the result is a sequence of generated video frames. In more detail, each word in the bidirectional LSTM relates to two hidden states, one for the forward direction and one for the backward direction. As a result, we combine these two hidden states to express a word's semantic meaning. E ∈ R^{T×Wn} represents the feature matrix of all words, whose i-th column is the embedding ei of the i-th word; T represents the dimension of the extracted word features, while Wn represents the number of words in a sentence. The actual sentence representation, ē ∈ R^T, concatenates the bidirectional LSTM's last hidden states. We combine the semantic text embedding ē with the additional Conditioning Augmentation (CA) variable ĉ and random noise z ∈ R^{Tz}, z ~ N(0, 1), to generate a video with the same semantic content as the input text. Then we learn a unified embedding using a fully-connected layer.

where We ∈ R^{tp×(tz+T)} represents the embedding weight, tp represents the embedding size, and (;) represents the concatenation operation. Below is the actual input for the Generator (G) to produce the video.

Here, n = 1, ..., l, and l represents the number of frames in the video. In G, weight sharing is commonly employed in both the temporal and spatial directions to reduce the number of parameters and improve model stability. The entire network can be viewed as a nonlinear mapping between semantic texts and desired videos. For frame-wise generation, there is a deep deconvolutional neural network in the spatial direction at each time step. Deconvolution, batch-normalization, and ReLU activation layers are employed first, followed by another deconvolution and up-pooling (up-sampling) layer to upscale the frame size by a factor of two across consecutive levels of feature maps. This process is repeated several (n = 5) times to generate a 64 × 64 video containing 32 frames. Every step is a memory unit, so the state of these memory units is communicated between frames, allowing each frame to be built based on historical data and contributing to temporal coherence. Finally, we combine all of the created frames into a single video of size tl × tc × th × tw, or we can represent the whole video as:

Here, χvideo represents the generated video, and G(p)^(i) denotes the i-th generated frame, where tl is the channel number, tc is the sequence length, and th and tw are the frame height and width. Due to the deep deconvolutional scheme, our DD-GAN can generate synthetic, plausible, preschool mathematics videos from text with realism, relevance, and coherence between frames. The cost function that DD-GAN uses is shown in the equation below:

where n is the length of the text and G is the Generator function for video generation; ∇ denotes the generator function's stochastic gradient, which will be optimized by the generator. Moreover, we build on the Wasserstein GAN [50] formulation.
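In its standard (unconditional) form, the Wasserstein GAN objective of [50] can be written as:

\min_{G} \max_{D \in \mathcal{D}} \; \mathbb{E}_{v \sim p_{\mathrm{data}}}\left[D(v)\right] - \mathbb{E}_{z \sim \mathcal{N}(0,1)}\left[D(G(z))\right]

where \mathcal{D} is the set of 1-Lipschitz functions; in our conditional setting, the text condition is additionally supplied to both networks.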

Here, D represents the discriminator, G represents the generator, εd and εg are used to maintain the maximum and minimum Lipschitz constants of the function, and z represents random noise. Other equations used in our proposed model are given below.

For the Generator (G) and Discriminator (D) losses, we used the following equations, respectively:

where GA means generator accuracy, COR is the correctly organized rate, RS represents the correctly recognized samples, and A denotes the number of all samples.

And for the Discriminator (D) accuracy, we used the equation below:

where DA means discriminator accuracy; similarly, COR denotes the correctly organized rate, AS represents the accepted samples, and finally, ES is the experimental sample.
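To make the frame-wise deconvolutional stack described above concrete, the following PyTorch-style sketch maps the unified embedding p = [z; ē; ĉ] to a single 64 × 64 frame through five deconvolution/up-sampling stages; the module name FrameGenerator, the channel widths, and the 2 × 2 starting resolution are illustrative assumptions rather than the exact implementation.

import torch
import torch.nn as nn

class FrameGenerator(nn.Module):
    """Sketch: unified embedding p = [z; e_bar; c_hat] -> one 64x64 frame."""
    def __init__(self, in_dim=100 + 256 + 128, base_ch=512, out_ch=3):
        super().__init__()
        self.fc = nn.Linear(in_dim, base_ch * 2 * 2)                # project to a 2x2 feature map
        layers, ch = [], base_ch
        for _ in range(5):                                          # n = 5 doublings: 2 -> 64 pixels
            layers += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                       nn.BatchNorm2d(ch // 2),
                       nn.ReLU(inplace=True)]
            ch //= 2
        layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.Tanh()]  # Tanh output, as in Section 4.2
        self.net = nn.Sequential(*layers)

    def forward(self, p):                                           # p: (batch, in_dim)
        x = self.fc(p).view(p.size(0), -1, 2, 2)
        return self.net(x)                                          # (batch, out_ch, 64, 64)

In the full model, such a stack would be unrolled over the l = 32 time steps, with the recurrent memory state shared across steps as described above, and the resulting frames stacked into a video tensor of size tl × tc × th × tw.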

3.4 Modified Deep Convolutional Neural Network as Discriminator (D)

The discriminator's role is crucial in determining the authenticity of a video, as it categorizes the video into two distinct classes: "genuine" and "fake". To differentiate between actual and generated videos, the discriminator relies on three perspectives: (1) the complete video, (2) every video frame, and (3) the movement across adjacent frames. The Deep Convolutional Neural Network (DCNN) has proven to be an effective and accurate tool for classifying image and video data. Therefore, we utilized a modified DCNN as the discriminator to distinguish between genuine and fake videos.

D first extracts video-level features mv from the input video V via deep convolution layers; after that, mv is sent into a fully-connected layer with SoftMax to determine whether the input video is real or fake from a global perspective. We modify the DCNN in the following ways: 1) one 3D convolution layer replaces the last two linear layers; 2) we reduce the layer sizes accordingly (for 64 × 64 videos) for better results and time complexity; 3) following each 3D convolution, batch normalization, a LeakyReLU activation function, and a max-pooling (down-sampling) layer are used; 4) the embedded text (φt) along with the extra CA variable ĉ are combined in the second-to-last layer; 5) the Sigmoid activation function is used at the end. In more detail, convolution, batch-normalization, and LeakyReLU activation layers are employed first, followed by another convolution and down-pooling (max-pooling) layer to downscale the frame size by a factor of two across consecutive levels of feature maps. This process is repeated several (n = 5) times to down-sample the video, which contains 32 frames. Every step is a memory unit, so the state of these memory units is communicated between frames, allowing each frame to be processed based on historical data and contributing to temporal coherence. Finally, we down-sample the video to a single vector to classify whether the video is real or generated.
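A minimal PyTorch-style sketch of such a modified 3D-convolutional discriminator is given below; the module name VideoDiscriminator, the channel widths, and the way the text condition [φt; ĉ] is broadcast and concatenated near the end are illustrative assumptions rather than the exact implementation.

import torch
import torch.nn as nn

class VideoDiscriminator(nn.Module):
    """Sketch: (video, text condition) -> probability that the video is real."""
    def __init__(self, in_ch=3, base_ch=32, d_cond=256 + 128):
        super().__init__()
        blocks, ch = [], in_ch
        for out in [base_ch, base_ch * 2, base_ch * 4, base_ch * 8, base_ch * 8]:  # n = 5 stages
            blocks += [nn.Conv3d(ch, out, 3, padding=1),
                       nn.BatchNorm3d(out),
                       nn.LeakyReLU(0.2, inplace=True),             # leak slope 0.2, as in Section 4.2
                       nn.MaxPool3d((1, 2, 2))]                     # halve height/width, keep 32 frames
            ch = out
        self.features = nn.Sequential(*blocks)
        self.judge = nn.Sequential(nn.Conv3d(ch + d_cond, 1, 1), nn.Sigmoid())

    def forward(self, video, cond):                                 # video: (B, C, 32, 64, 64)
        m_v = self.features(video)                                  # video-level features m_v
        cond = cond[:, :, None, None, None].expand(-1, -1, *m_v.shape[2:])
        score = self.judge(torch.cat([m_v, cond], dim=1))           # condition injected near the end
        return score.mean(dim=[2, 3, 4])                            # global real/fake score per video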

The following objective can be used to jointly train the weights of the Generator (G) and the Discriminator (D).
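Under the Wasserstein formulation above, with the text condition φt supplied to both networks, this joint objective can be sketched as:

\min_{G} \max_{D} \; \mathbb{E}_{v \sim p_{\mathrm{data}}(v)}\left[D(v, \varphi_t)\right] - \mathbb{E}_{z \sim \mathcal{N}(0,1)}\left[D(G(z, \varphi_t), \varphi_t)\right]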

where G and D are the generator and discriminator parameters, respectively, and the distribution of actual videos v is represented by pdata(v). The εd and εg are used to maintain the maximum and minimum Lipschitz constants of the function; the noise drawn from a Gaussian distribution is denoted by z, while the text condition is denoted by φt. When pG(z) = pdata, the objective reaches its global optimum.

    4 Experiments

This section presents the experimental details and results of our DD-GAN. To provide a comprehensive evaluation, we compared our proposed model with several state-of-the-art methods on three datasets: the Mnist-4 single- and two-digit moving/bouncing datasets (SBMG and TBMG) and a custom-generated dataset. Our evaluation included both qualitative and quantitative metrics. By doing so, we aimed to better understand and demonstrate the performance of our DD-GAN in comparison to existing methods.

4.1 Dataset Details and Creation

First, we trained our proposed model on the single-digit bouncing Mnist-4 and two-digit bouncing Mnist-4 datasets (SBMG and TBMG). Both datasets are publicly available and were created according to Mittal et al. [48]. In these datasets, every video contains 16 frames, where every frame is exactly 64 × 64 pixels in size. Furthermore, there is no publicly available dataset of preschool mathematics videos with associated text, so we used the same trick as Li et al. [21] and downloaded a large number of videos from YouTube and Google with related text and tags, as well as making some videos ourselves and writing the associated text for them. We followed a similar concept to Kim et al. [45], who created a dataset for large-scale video classification. The details of all three datasets are given below.

4.1.1 Single-Digit Bouncing Mnist-4 Dataset (SBMG)

This is a publicly available synthetic dataset in which only one digit from 0-9 is moving/bouncing in a specific direction. The dataset contains 1200 GIFs; each GIF is a combination of 16 frames, where every frame is exactly 64 × 64 pixels. The dataset mainly focuses on two motions, up-down and right-left. Every GIF is associated with a single text sentence describing the digit and its direction. Because two motions are rather simple, we improved the dataset with more motion directions, such as down-to-up, up-to-down, right-to-left, and left-to-right.

4.1.2 Two-Digit Bouncing Mnist-4 Dataset (TBMG)

This dataset is also a publicly available synthetic dataset, in which two digits from 0-9 are moving/bouncing together in similar or different directions. Two-digit bouncing Mnist-4 is a newer and more complicated version of the single-digit bouncing Mnist dataset. Similarly, this dataset also has 16-frame GIFs with the same exact 64 × 64-pixel size for a single frame. We also improved this dataset with similar motion directions: down-up, left-right, right-to-left, and left-to-right.

4.1.3 Custom Generated Dataset (Math Dataset)

For this research, we created a new preschool mathematics video dataset. We used the same trick as Li et al. [21] and downloaded videos from YouTube and Google with associated text and tags; we also made some videos ourselves with related text descriptions. However, the downloaded videos are too long, so we divided them into small parts of 3-10 s and later converted them into GIFs, each resized to 64 × 64 pixels with 32 frames per GIF. The processes we followed for data collection are as follows. We selected four keywords: "Addition," "Subtraction," "Multiplication," and "Division." After that, we downloaded videos for each keyword with their title, tags, duration, and description from YouTube and Google and made some ourselves. Then, outlier-removal techniques were used to clean the dataset, and we followed the techniques of Breg et al. [51] to obtain the ten most frequent tags for the set of videos. The quality of the chosen tags is further ensured by correlating them with words in the ImageNet [52] and ActionBank [53] categories. These two datasets verify that the chosen tags contain visually natural objects and actions. Only videos that have at least three of the chosen tags were considered. Some other requirements for our dataset are: 1) every video or GIF has a duration of 3-10 s; 2) the title and description are written purely in English; 3) the title has at least 4-5 meaningful words besides digits and stop words; 4) our dataset contains 1040 videos or GIFs collected from different sources, and every category has 200+ GIFs.

    4.2 Implementation

For a robust comparison, we followed Pan et al.'s [22] methodology. As discussed, we focus only on generating 32-frame videos of 64 × 64 pixels from text, so fl = 32 and fh = fw = 64. In the whole experiment, we used n = 5 steps, which means that we generated 32 frames of video. We used a bi-LSTM model for sentence encoding, where the input, output, and hidden layers are all set to 256, i.e., F = fp = 256. The random noise variable z is set to 100 dimensions. The weights are all initialized using an independent Gaussian distribution N(μ(φt), Σ(φt)) and a normal distribution with a mean μ of 0 and a standard deviation σ of 0.02. In LeakyReLU, the slope of the leak is set to 0.2. Moreover, we set the mini-batch size to 64, the learning rate to 0.0002, and the momentum to 0.5. Furthermore, the Generator (G) uses up-pooling (up-sampling) layers and the Tanh activation function, while the Discriminator (D) uses down-pooling (down-sampling/max-pooling) layers and the sigmoid activation function to distinguish real and fake frames. When training the DD-GAN model, we used an HP Pavilion series machine with a 256 GB Solid State Drive (SSD), a 1 TB hard drive, and an 8 GB NVIDIA GeForce GPU. The training time was several days on this system.
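These hyperparameters can be collected into a small configuration sketch; the use of the Adam optimizer (with the reported momentum of 0.5 read as its β1 coefficient) and the G/D placeholders are assumptions rather than the exact implementation.

import torch
import torch.nn as nn

def init_weights(m):
    # Weights drawn from a normal distribution with mean 0 and standard deviation 0.02 (Section 4.2)
    if isinstance(m, (nn.Conv2d, nn.Conv3d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

batch_size, noise_dim, lr, momentum = 64, 100, 2e-4, 0.5    # values reported above
z = torch.randn(batch_size, noise_dim)                      # 100-dimensional random noise input

# Hypothetical generator/discriminator instances (see the sketches in Section 3):
# G, D = FrameGenerator(), VideoDiscriminator()
# G.apply(init_weights); D.apply(init_weights)
# opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(momentum, 0.999))   # assumed optimizer
# opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(momentum, 0.999))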

VGG-16 was chosen for DD-GAN because it has been widely used in image classification and recognition tasks and has achieved state-of-the-art performance on several benchmark datasets, especially the datasets used in this study. While ResNet-50 and U-Net are advanced models that have shown promising results in computer vision tasks, they were originally designed for different purposes: ResNet-50 to address vanishing gradients and U-Net for image segmentation tasks. Therefore, their architectures may not be optimal for text-to-video generation tasks. On the other hand, VGG-16 has been shown to extract high-level features from images, making it well-suited for generating high-quality videos from text. Our experiments have shown that VGG-16 performs exceptionally well in this task, producing results on par with other state-of-the-art models.

    4.3 Quantitative Evaluation

This research must consider both the visual quality and the semantic match when generating a video from text. For quantitative evaluation, we used four quantifiable evaluation metrics: 1) Inception Score (IS), a mathematical measurement method proposed by Salimans et al. [54], used to observe the diversity and classification quality of the generated frames; 2) Fréchet Inception Distance (FID), used to evaluate the visual quality of each frame; 3) FID2vid, used to measure the temporal consistency and visual quality of the whole video; and 4) the Generative Adversarial Metric (GAM), which further evaluates the generated videos. More details on these metrics are given below.

4.3.1 Inception Score (IS)

Judging the performance of a generative model is not easy, but several numerical approaches can be used. One is the Inception Score (IS), a standard quantitative evaluation metric for generative models.
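In its standard form [54], the Inception Score can be written as:

\mathrm{IS} = \exp\left(\mathbb{E}_{s}\left[D_{\mathrm{KL}}\left(p(y \mid s) \,\|\, p(y)\right)\right]\right)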

Here, s represents the generated frames or samples, and y is the label predicted by the Inception model. The primary motivation for introducing this metric is to generate or synthesize meaningful frames that show the diversity and classification quality of the model. As a result, the KL divergence between the conditional probability distribution p(y|s) and the marginal probability distribution p(y) must be significant. The fine-grained Mnist-4 single-digit bouncing and two-digit bouncing (SBMG and TBMG) datasets and the custom dataset are precisely adapted to the samples s to obtain the best possible performance from the Inception model. We tested the measure on many samples based on [54], with each model randomly picking over 1000 samples. The Inception Score (IS) results of our proposed method and other state-of-the-art methods can be seen in Table 1; a higher score means better performance. Further evaluations based on the Generative Adversarial Metric (GAM), proposed by Im et al. [55], are discussed later in this paper.

Table 1: Performance comparison of our DD-GAN with other state-of-the-art methods by FID, FID2vid, and Inception Score (IS). Smaller is better for FID and FID2vid, while higher is better for IS

4.3.2 Fréchet Inception Distance (FID)

FID is the most common quantitative evaluation metric used for generative models, especially GANs. This metric is used to evaluate the visual quality of the generated frames. In contrast to the Inception Score (IS), which considers only the distribution of the generated frames, the FID compares the distribution of the fake frames (generated by the GAN) with the distribution of the real frames (real dataset frames) that were used to train the Generator (G). We can calculate the FID score using the following equation.
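In its standard form [24], with the symbols explained next, the distance is:

F^2 = \left\lVert fu_1 - fu_2 \right\rVert^2 + \mathrm{Tr}\left(c_1 + c_2 - 2\sqrt{c_1 c_2}\right)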

The score is denoted by F^2, indicating that it is a distance measured in squared units. The feature-wise means of the real frames and the generated frames are referred to as fu_1 and fu_2, respectively. Here, c_1 and c_2 are the covariance matrices of the real and generated feature vectors, commonly denoted by Sigma (Σ). The term ||fu_1 - fu_2||^2 is the sum of squared differences between the two mean vectors, and Tr denotes the trace operation from linear algebra. The FID scores can be seen in Table 1 and the FID feature distance in Fig. 5.

Figure 5: FID score calculated using the SBMG dataset

4.3.3 Fréchet Inception Distance to Video (FID2vid)

FID2vid is an updated form of FID: FID is a frame- or image-level comparison, while FID2vid is a video-level comparison. We extracted features from the second-to-last layer of the Generator (G), where G is trained on the SBMG and TBMG datasets. An FID score is then calculated between the real videos and the generated videos. This metric is used to measure the visual quality and temporal consistency of the whole video.

4.3.4 Generative Adversarial Metric (GAM)

This metric was proposed by Im et al. [55]. GAM allows us to compare two generative adversarial models by competitively pitting one against the other. Given the two generative models, two error-rate ratios are computed between their discriminators.
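Following the formulation in [55], these two ratios can be written as:

y_{\mathrm{test}} = \frac{\epsilon\left(D_1(x_{\mathrm{test}})\right)}{\epsilon\left(D_2(x_{\mathrm{test}})\right)}, \qquad y_{\mathrm{sample}} = \frac{\epsilon\left(D_1(G_2(z))\right)}{\epsilon\left(D_2(G_1(z))\right)}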

The average classification error rate is denoted by ε(·), while the testing set is represented by xtest. If ytest is close to 1, the two models have almost equal capacity to identify real videos. The relationship between ysample and 1 then tells us which model is more likely to deceive the other. For more details, see [55]. We adjust the mutual information term's weights to ensure that ytest is near 1, which allows us to compare the two models using ysample. The results of GAM can be seen in Table 2.

Table 2: Generative Adversarial Metric (GAM): the ysample score with ytest balanced to one

4.4 Human Rank (HR) or Human Evaluation

To better evaluate our proposed DD-GAN model, we also conducted a human study to assess the generated videos' visual quality and semantic consistency. This is because IS, FID, FID2vid, and GAM focus only on measuring the realism of the generated videos and ignore the semantic match between the generated video and the text description. We followed Chen et al.'s [20] methodology. Forty generated videos with associated text descriptions from Mnist-4 (SBMG) and 40 from Mnist-4 (TBMG) were randomly selected and evaluated by 80 human subjects, who are university and college students from Abdul Wali Khan University Mardan (Timergara Campus) and Government Post Graduate College Timergara, Dir Lower, KPK, Pakistan. Students rated the generated videos according to three criteria: 1) Realism: how much realism is found in the generated videos according to the given text; 2) Relevance: the match between the generated videos and their text description; 3) Coherence: the integrity of consecutive frames, or the consistency over time across multiple frames.

Each criterion has ten ranks, ranging from 1 to 10, with 1 for bad and 10 for good. After collecting all the data, our proposed DD-GAN model achieved better performance compared to other state-of-the-art models. The results of the human study can be seen in Table 3.

Table 3: Average ratings of our proposed DD-GAN method and each compared approach on every criterion (realism, relevance, and coherence) over all created videos on the SBMG and TBMG datasets. Higher is better

    4.5 Compared Methods

Several state-of-the-art methods are used to compare the performance of our DD-GAN method: GAN Conditional Latent Space (GAN-CLS) [16], BoGAN [20], T2V [21], TGAN-C [22], VGAN [28], MoCoGAN [30], TFGAN [32], Sync-DRAW [48], Cap2vid [49], and IRC-GAN [56]. The comparison results are shown in Figs. 6 and 7, as well as Tables 1-3.

Figure 6: Using the TBMG dataset, the experimental results of our DD-GAN and various other approaches for the caption "digit 7 is going right then left while digit 3 is going down then up"

Figure 7: Using the SBMG dataset, the experimental results of our DD-GAN and various other approaches for the caption "digit 9 is going up and down"

    4.6 Qualitative Analysis and Results

Figs. 6 and 7 show examples of results generated by several models, including our DD-GAN, using the Mnist-4 single-digit bouncing (SBMG) and two-digit bouncing (TBMG) datasets. VGAN-C does not converge and seems to perform poorly. Although Cap2vid achieves the best FID score on the Mnist-4 single-digit moving (SBMG) dataset for synthetic data, it cannot capture the coherence between individual frames, causing the visual output of this method to appear disordered over the sequence of the produced videos. Regarding temporal organization, the output of TGAN-C, IRC-GAN, and Sync-DRAW appears well-organized, but the digit in every frame is deformed. Although the results generated using GAN-CLS appear realistic, they have considerable noise in each frame and lack movement between frames. In a photo-realistic example, IRC-GAN, MoCoGAN, VGAN-C, and TGAN-C produce some convincing but blurred outputs related to the input description. BoGAN achieved some excellent results using Mnist-4 (SBMG and TBMG) but faces some issues with the temporal coherence between frames. Ultimately, our DD-GAN generated competitive results using the Mnist-4 (SBMG and TBMG) datasets. The results show that our model can generate videos of clear, fine quality and coherence. Using the custom-generated dataset, our model can also generate preschool mathematics videos from the associated text, and the results are exciting and impressive.

For further qualitative evaluation, we present many additional generated samples, which can be seen in Figs. 8-10. The results of our proposed DD-GAN method on the Mnist-4 single-digit bouncing (SBMG) dataset are shown in Fig. 10, the Mnist-4 two-digit bouncing (TBMG) results can be seen in Fig. 8, and the results on the custom-generated dataset are shown in Fig. 9. When compared to videos generated by other approaches, the videos generated by our DD-GAN look more realistic and more similar to the real videos.

Figure 8: Generated videos from text descriptions by our DD-GAN using the TBMG dataset

Figure 9: Generated videos from text descriptions by our DD-GAN using our custom Math dataset

    4.7 More Discussion and Limitations

    4.7.1 More Results on High-Resolution Video

The proposed research is not limited to 64 × 64 resolution. We increased n to 6 to generate videos at a resolution of 128 × 128 pixels, and the results were quite competitive.

Figure 10: Generated videos from text descriptions by our DD-GAN using the SBMG dataset

    4.7.2 Generating More Video Frames

The proposed research generated videos of 16 and 32 frames. We also tried to generate more frames to further judge our model, and we did so successfully, but the coherence between frames was not as strong as desired.

4.7.3 No Motion Case Videos

First, we tried to insert a sentence that does not include the movement or direction of the digits, for example, "Digit 2 + digit 1 = digit 3". We found that our model generated a video without any movement of the digits. After that, we inserted a sentence including digit motion and direction, for example, "Digit 2 + digit 2 = digit 4 moving from up to down", and as a result, it generated a video with digit 4 moving from up to down. Fig. 11 shows the results on the custom dataset without moving any digits.

Figure 11: Videos without motion generated from text descriptions by our DD-GAN using the custom-generated dataset (no motion case)

    4.7.4 Failure Case Videos

In some cases, we faced failures: when we inserted complex, multi-entity, long text description sentences, we found that our proposed model (DD-GAN) did not work well because the semantic information was difficult to extract. Moreover, our custom-generated dataset is also complicated, so the generated videos sometimes contain disconnectivity between frames.

4.7.5 Complex Structure Limitations

Some further limitations can be found in our proposed DD-GAN, such as instability when learning over too many frames, the inability to generate well-defined videos of larger size or longer length, and the inability to handle complex and long description sentences, which can affect the results. Moreover, while generating essential mathematics videos from text, some limitations need to be addressed. These include the fact that the movement of digits is not permanently fixed; a digit simply moves in the same direction again and again. Additionally, giving motion to all digits can result in confusing and unintelligible video content, and the direction of digit movements can also be problematic, among other issues. While these limitations present significant challenges, they will be considered in future work to improve the quality and clarity of generated mathematics videos. In our future work, we are considering training DD-GAN to generate longer, larger, and more realistic videos from text descriptions, including but not limited to mathematics videos.

    4.7.6 DD-GAN Step-by-Step Improvement

The proposed DD-GAN has undergone several improvements to enhance its efficiency and accuracy. Initially, we utilized a traditional deep convolutional GAN, but we gradually enhanced the performance by incorporating advanced techniques. We first integrated an LSTM text encoder into our model, which resulted in a slight improvement in accuracy. Next, we implemented Conditioning Augmentation (CA) techniques, changed layers, reduced the number of layers, and experimented with different activation functions. These incremental changes allowed us to improve efficiency and accuracy significantly. Please refer to Table 4 and Figs. 12 and 13 to see the step-by-step process and the corresponding performance gains. In the table, we used abbreviations with the following meanings: NM: No Modification, LSTM: Long Short Term Memory, CA: Conditioning Augmentation, CL: Changing Layers, RLS: Reducing Layer Size, AF: Activation Functions, and MM: More Modifications.

Table 4: The effect of every modification of the DD-GAN evaluated on the SBMG and TBMG datasets. A higher Inception Score is better, while lower FID and FID2vid mean better results

Figure 12: The effect of every modification of the DD-GAN evaluated on the TBMG dataset

Figure 13: The effect of every modification of the DD-GAN evaluated on the SBMG dataset

    5 Conclusion

Generating realistic and synthetic videos from a text description is a complex task requiring sophisticated techniques. This research proposes a Deep Deconvolutional Generative Adversarial Network (DD-GAN) framework to address this challenge. The DD-GAN framework consists of a Deep Deconvolutional Neural Network (DDNN) as the Generator (G), which generates moving-digit videos from text, and the generated videos should not be distinguishable from real ones by the Deep Convolutional Neural Network (DCNN) used as the Discriminator (D). The DD-GAN model can generate moving digits and preschool math videos while maintaining temporal coherence between adjacent frames and ensuring that the resulting videos are well-matched with the given text. To train the DD-GAN model, we used two publicly available synthetic Mnist datasets (SBMG and TBMG), a custom dataset of mathematics videos collected from publicly accessible online sources, and some custom-made videos with matching text-video pairs. The proposed model performed well on the evaluation metrics, including Inception Score (IS), FID, FID2vid, and the Generative Adversarial Metric (GAM), compared to existing state-of-the-art GAN methods. Similarly, our proposed DD-GAN also performs well regarding realism, relevance, and coherence based on the human study. Moreover, we discussed future directions and limitations that can be addressed in upcoming work.

Acknowledgement: The authors are thankful to the National Engineering Research Center for E-Learning at Central China Normal University (Wuhan) and the Wollongong Joint Institute, Central China Normal University (Wuhan), for every bit of support. We would like to extend our acknowledgement to those who were involved in this work, directly or indirectly.

Funding Statement: This work is partially supported by the General Program of the National Natural Science Foundation of China (Grant No. 61977029).

Author Contributions: A. Ullah: conceptualization, methodology, software, review, editing, writing original draft, and funding acquisition. X. Yu: conceptualization, methodology, reviewing, and funding acquisition. M. Numan: validation, data collection, and review.

    Availability of Data and Materials:Data will be made available on request.

    Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
