
    MOVIE: Mesh oriented video inpainting network

2021-06-24 04:34:10

    Liu Sen, Zhang Zhizheng, Yu Tao, Chen Zhibo

    CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

Abstract: Video inpainting aims to fill holes across different frames given limited spatio-temporal contexts. Existing schemes still struggle to achieve precise spatio-temporal coherence, especially in hole areas, due to inaccurate modeling of motion trajectories. In this paper, we introduce the flexible shape-adaptive mesh as the basic processing unit and mesh flow as the motion representation, which can describe complex motions in hole areas more precisely and efficiently. We propose a Mesh Oriented Video Inpainting nEtwork, dubbed MOVIE, to estimate mesh flows and then complete the hole regions in the video. Specifically, we first design a mesh flow estimation module and a mesh flow completion module to estimate the mesh flow for visible contents and holes in a sequential way, which decouples the mesh flow estimation for visible and corrupted contents for easier optimization. A hybrid loss function is further introduced to optimize the flow estimation performance for the visible regions, the entire frames and the inpainted regions respectively. We then design a polishing network to correct the distortion of the inpainted results caused by the mesh flow transformation. Extensive experiments show that MOVIE not only achieves a more than four-times speed-up in completing the missing areas, but also yields more promising results with much better inpainting quality in both quantitative and perceptual metrics.

    Keywords: mesh flow; deep neural networks; video inpainting

    1 Introduction

Video inpainting is of high importance for many professional video post-production applications, including video editing, scratch or damage repair, and logo or watermark removal in broadcast videos[1-4]. The goal of video inpainting is to fill the missing regions of a given video sequence with spatially and temporally consistent content. Unlike image inpainting, where only spatial consistency needs to be considered, achieving this consistency requires us not only to exploit spatial contexts but also to exploit the contents of nearby frames. To this end, solving the temporal misalignment problem plays a dominant role.

For video data, temporal motions are complex due to local human or object motions, global camera motions, and other environmental dynamics. Previous works targeting video tasks such as video super-resolution and video stabilization[5-8] achieve promising results by explicitly taking advantage of motion information such as optical flow, motion vectors and homography. Despite this, aligning features of adjacent frames is more challenging in video inpainting than in other video tasks because of the absent pixels in the hole areas: the missing contents introduce noise and make the motion information unreliable.

For the task of video inpainting, several temporal alignment solutions have been investigated, including patch matching methods, motion-based methods, 3D convolutional neural networks and attention-based neural networks. ① Patch matching methods select the most similar patch from adjacent frames for temporal coherence[9-12]. These methods may give rise to block artifacts in scenarios with complex textures. ② Motion-based methods estimate motion information, such as optical flow or homography, to copy corresponding contents from nearby frames. Some works compute optical flows between two adjacent frames directly, which fails to provide precise flow predictions in the absence of pixel information[13-15]. A state-of-the-art optical flow-based work uses a sequence of flow maps from consecutive frames to complete the target optical flow[16]. However, the dense computation of pixel-level flow is very time-consuming, so efficiency remains under-explored. Homography-based methods use global affine transformation matrices to align two frames[13,17]. They only work well for planar motions or motions caused by camera rotations. ③ 3D convolutional neural networks use 3D filters to convolve the features from reference frames with the target frame[18-20]. They have a limited window size and suffer from high computational cost. ④ Attention-based neural networks compute similarities between the hole boundary pixels in the target and the non-hole pixels in the references via an attention module[21]. They are unstable because the context information around the hole boundary is insufficient for similarity computation. So far, an efficient and accurate temporal alignment solution for video inpainting remains under-investigated.

To address the temporal alignment problem, we propose to introduce the flexible shape-adaptive mesh as the basic processing unit and mesh flow as the motion representation. We argue that, for videos with holes, mesh flow is more efficient and effective than other motion representations. First, mesh flow can represent more complex and coherent motions than homography and motion vectors, because it describes the motion trajectory of each pixel with a multi-parameter model. Second, mesh flow can represent more accurate and robust motions in the hole area than optical flow, because it introduces richer spatial context information. Third, mesh flow is computationally more efficient than optical flow because it is a sparse motion field. A more detailed theoretical analysis is given in Section 3.1.

To take advantage of mesh flow for more effective and efficient video inpainting, we propose a Mesh Oriented Video Inpainting nEtwork, called MOVIE, which consists of a sequential mesh flow estimation network and a polishing network. Since computing the mesh flow directly from the frames is unreliable and easily causes misalignments due to the holes in the video, we design two sequential modules in the sequential mesh flow estimation network. Specifically, the mesh flow estimation module predicts mesh flows for the visible contents of the frames to guarantee the accuracy of the computed motions. The mesh flow completion module then completes the mesh flow in the hole regions of the target frame by learning from a sequence of adjacent mesh flows. We design a hybrid loss function to optimize the flow estimation performance for the visible regions, the entire frames and the inpainted regions respectively, and train the sequential mesh flow estimation network in an end-to-end and self-supervised manner. For the polishing network, we align the frames with the estimated mesh flows in a propagation manner and feed them into the network for further refinement. The polishing network is trained to correct the distortion of the hole regions of each frame caused by the mesh flow transformation. Experimental results reveal the superiority of our method over state-of-the-art schemes.

We summarize our contributions as follows:

(Ⅰ) We are the first to propose taking advantage of mesh flows as the motion representation for addressing the misalignment problem in video inpainting, and we demonstrate its superiority in both effectiveness and efficiency compared with other motion representations.

(Ⅱ) We propose a simple yet effective model and a hybrid loss function to better estimate mesh flows for videos with holes and to better exploit the mesh flow for video inpainting.

(Ⅲ) We evaluate our method on various challenging videos and demonstrate that our approach achieves impressive improvements in both effectiveness and efficiency compared with state-of-the-art approaches.

2 Related work

Recent years have witnessed remarkable progress in video inpainting through deep learning-based approaches. In this section, we provide an overview of the literature in terms of alignment techniques, given their effectiveness in handling temporal consistency. We summarize the existing video inpainting methods into four categories: patch matching methods, motion-based methods, 3D convolutional neural networks and attention-based neural networks.

    2.1 Patch matching methods

Early works are mainly patch-based optimization methods[9-12]. They split each frame of the video into small patches and recover the hole region by pasting the most similar patches from other frames in the video. Since patch-based alignment only describes a simple translational motion for each patch, block effects are obvious in the completed videos. Furthermore, the computation of patch similarity suffers from a large search space, which makes the completion process extremely slow.

    2.2 Motion-based methods

Motion-based methods estimate motion information first, then warp the content from nearby frames to the target frame. The motion representations that have been investigated in deep learning-based approaches are optical flow and homography.

Optical flow. The most widely used motion representation in deep learning-based video inpainting is optical flow, which describes per-pixel motions between frames for warping the visible content of reference frames to the hole area of the target frame. Five optical flow-related works have been published, which explore various strategies to exploit optical flow information[13-16,22]. Three works estimate flows only between adjacent frames[13-15]. Chang et al. proposed a selective scheme that combines an optical flow warping model and an image-based inpainting model[15]. Ding et al. considered two branches of optical flows generated from images and deep features separately[14]. Woo et al. used the flow field computed between the previously completed frame and the target frame as an auxiliary model to enforce temporal consistency[13]. However, estimating optical flow on the hole region directly easily leads to incorrect flow predictions. Kim et al. further proposed to estimate flows of feature maps between the source and five reference frames at multiple scales, and to complete the hole based on the aggregation of the five aligned features[22]. Despite considering long-range frames, the flows are still computed on the hole area, so this addresses the above issue with little success. Xu et al. proposed a Deep Flow Completion network that completes the optical flow by watching a sequence of flow maps from consecutive frames, which is then used to guide the propagation of pixels to fill the missing regions in the video[16]. This strategy provides more accurate optical flow estimation than previous approaches. Despite its significant performance improvements, the dense computation of pixel-level flow is very time-consuming, so efficiency remains under-explored.

Homography. Two homography-based methods have been proposed that predict global transformation parameters for aligning frames[13,17]. They both compute affine matrices between multiple reference frames and the target frame for alignment, followed by an aggregation and refinement process. In the aggregation stage, Woo et al. proposed a non-local attention model to pick the best matching patches in the aligned frames[13], while Lee et al. proposed a context matching module to assign a weight to each aligned frame[17]. However, homography cannot describe complex motions, which limits its application. Besides, to ensure long-range temporal dependency, they complete the current frame by visiting reference frames over long temporal distances, even the whole video shot. This strategy results in intensive computational cost despite the simplicity of homography.

2.3 3D convolutional neural networks

Several 3D convolutional networks have been proposed that use 3D filters to convolve the features from reference frames with the target frame, which is equivalent to a temporal alignment[18-20]. Wang et al. proposed a 3D-2D encoder-decoder network, which uses the output of the 3D completion network to guide the 2D completion network[19]. Chang et al. performed video inpainting with 3D gated convolutions and a temporal patch discriminator[18], and further introduced a learnable gated temporal shift module to replace the computation-intensive 3D convolutional layers, leading to a 3x reduction in computation[20]. However, the computational cost is still heavy. Moreover, the temporal window size of these 3D convolution-based methods is limited, so they lack the ability to handle the long-range temporal dependency challenge.

2.4 Attention-based neural networks

Since attention modules can be used for feature matching, Oh et al. proposed an asymmetric attention block that computes similarities between the hole boundary pixels in the target and the non-hole pixels in the references in a non-local manner[21]. The results are unstable when the situation is complex and the hole boundary is small, since the boundary then provides insufficient context for similarity computation.

    3 Mesh oriented video inpainting network

    3.1 Representations of motion information

Towards a better understanding of mesh flow, we first formulate the concept of mesh flow and compare it with other motion representations.

Given a target frame T(x, y) and a reference frame R(x, y), we aim to find mapping functions x′ = f(x, y) and y′ = g(x, y) to minimize the following objective function:

E = d{T(x, y), R(x′, y′)}    (1)

To compute the mesh flow, we first partition the frame into non-overlapping regular blocks and treat each block as a basic processing unit, the mesh. Each mesh is a flexible shape-adaptive quadrilateral and can be arbitrarily transformed according to its four vertices. We then compute the motions of the pixels at the vertices of the mesh quadrilaterals, and interpolate the motions of the remaining pixels from the nodal motions with a bilinear interpolation kernel. The motion model of the vertices of the mesh quadrilaterals can be expressed as follows:

    (2)

where v denotes the vertices of the mesh quadrilaterals.

Mesh flow possesses several characteristics: ① mesh flow describes the motion trajectory of each pixel with a multi-parameter model, i.e., the motion of each pixel is computed from the four nodal motions of the mesh quadrilateral it belongs to; ② mesh flow represents coherent motion trajectories across mesh quadrilaterals (see Figure 2); ③ mesh flow computes the motion of each vertex according to the four quadrilaterals it belongs to, which introduces rich context information for each vertex; ④ mesh flow is a sparse motion field.
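To make the interpolation step concrete, the following minimal PyTorch sketch (with assumed tensor shapes; the helper name is ours, not the paper's) upsamples the sparse nodal motions of a mesh flow to a dense per-pixel flow map with a bilinear kernel, as described above:

```python
import torch.nn.functional as F

def mesh_flow_to_dense_flow(nodal_flow, frame_h, frame_w):
    """Upsample a mesh flow to a dense per-pixel flow map.

    nodal_flow: tensor of shape (2, H/s + 1, W/s + 1) holding the (u, v)
    motion of each mesh vertex, where s is the mesh size (assumed layout).
    Returns a dense flow of shape (2, frame_h, frame_w).
    """
    # align_corners=True keeps the interpolated values at the vertex
    # positions equal to the nodal motions themselves.
    dense = F.interpolate(nodal_flow.unsqueeze(0), size=(frame_h, frame_w),
                          mode="bilinear", align_corners=True)
    return dense.squeeze(0)
```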

Figure 1. Comparison of the state-of-the-art methods in terms of quality and speed on 37 video sequences with 2139 frames in total.

    Figure 2. Motion representations, including homography, optical flow, motion vector and mesh flow.

Homography describes affine mapping between two images, which is formulated as below:

x′ = a0x + a1y + a2, y′ = b0x + b1y + b2    (3)

where all pixels share the same affine transformation parameters a0, a1, a2, b0, b1, b2.

Since homography can only represent camera motion over a stationary scene, it cannot handle complex scenarios with multiple object motions.

Optical flow computes motion information at the pixel level. The motion model of each pixel can be expressed as follows:

x′ = x + u(x, y), y′ = y + v(x, y)    (4)

where (u(x, y), v(x, y)) denotes the displacement of the pixel at (x, y).

It is worth noting that what we need are the motion trajectories of the pixels in the hole area. Since optical flow cannot introduce rich context information for these pixels as mesh flow does, optical flow estimation in the hole area is not reliable. In addition, optical flow is a dense motion field, which makes flow estimation computation-intensive and time-consuming: for a 256×256 frame with a mesh size of 8, for example, a mesh flow stores motions only at 33×33 vertices, whereas an optical flow stores a motion for every one of the 256×256 pixels.

Motion vector describes motion information in a patch-based manner. Since the realistic motion within a patch can be more complicated than a translation, motion vectors cannot provide a precise motion representation. Furthermore, using motion vectors as the motion description easily produces blocking effects in the predicted image due to discontinuities across block boundaries (see Figure 2). In comparison, mesh flow can represent more complex non-linear motion transformations and produces more continuous results.

Overall, mesh flow is more efficient and effective than other motion representations for the video inpainting task, in which the video contains holes.

    3.2 Sequential mesh flow estimation network

Given a sequence of frames {It | t = 1, …, n} with holes {Ht | t = 1, …, n}, our goal is to estimate the mesh flows {Mt | t = 1, …, n−1} between frames, wherein we also need to take into account the corrupted contents (corresponding to the holes).

Since the spatial information is insufficient due to the holes in the video, computing the mesh flows directly from the frames is unreliable and easily causes misalignments. Thus, we propose to estimate the mesh flow of each entire frame in a sequential manner. The first step is to compute the mesh flow corresponding to the visible regions of the frames. Afterwards, we estimate the mesh flow corresponding to the corrupted parts (i.e., the holes) based on the adjacent mesh flows. The framework is illustrated in Figure 3, and a sketch of the data flow follows the figure.

Figure 3. Overview of the sequential mesh flow estimation network. The network consists of two modules: the mesh flow estimation module and the mesh flow completion module. The mesh flow estimation module predicts mesh flows for the visible regions of the frames, and the mesh flow completion module completes the missing area of the mesh flow of the target frame by learning from a sequence of adjacent mesh flows.
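As a rough illustration of this two-stage design, the sketch below assumes hypothetical callables estimation_net and completion_net with the interfaces described in the comments; it illustrates the data flow, not the paper's implementation:

```python
import torch

def sequential_mesh_flow_estimation(frames, holes, estimation_net,
                                    completion_net, N):
    """Two-stage mesh flow estimation (illustrative only).

    Assumes estimation_net(a, b) predicts a mesh flow from the visible
    content of two frames, and completion_net(flow, neighbors) fills the
    hole region of one mesh flow given its stacked neighboring flows.
    frames and holes are lists of tensors; holes are 1 inside the hole.
    """
    # Stage 1: mesh flows for visible content between consecutive frames.
    visible_flows = []
    for t in range(len(frames) - 1):
        flow = estimation_net(frames[t] * (1 - holes[t]),
                              frames[t + 1] * (1 - holes[t + 1]))
        visible_flows.append(flow)

    # Stage 2: complete each mesh flow from up to N neighbors on each side.
    completed_flows = []
    for t in range(len(visible_flows)):
        lo, hi = max(0, t - N), min(len(visible_flows), t + N + 1)
        neighbors = torch.stack(visible_flows[lo:hi])
        completed_flows.append(completion_net(visible_flows[t], neighbors))
    return completed_flows
```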

    3.3 The hybrid loss function for the sequential mesh flow estimation network

We optimize the sequential mesh flow estimation network in a self-supervised manner. First, we upsample the mesh flows to pixel-level flow maps by interpolating the nodal motions with a bilinear interpolation kernel. We then align the frames based on the pixel-level flow maps, and minimize the L1 distance between the target frames and the aligned frames.

For the first module, the mesh flow estimation module, we adopt the L1 loss for the visible regions. Since (2×N+1) mesh flows are needed for completing one mesh flow, we compute the average of these (2×N+1) L1 distances.

    L1(visible region)=

    (5)

where I denotes the input frame, H denotes the holes, and ω′ denotes the warping function with the predicted mesh flow θ′.

For the second module, the mesh flow completion module, we propose two L1 loss functions: one for the entire frame and one for the inpainted regions.

    (6)

where I denotes the input frame, H denotes the holes, and ω″ denotes the warping function with the final completed mesh flow θ″.

    In summary, the hybrid loss function of the sequential mesh flow estimation network is as follows:

L = L1(visible region) + L1(the frame) + L1(inpainted region)    (7)
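As a rough illustration, the hybrid loss of Eq. (7) could be computed as in the following sketch; the exact masking and averaging used in the paper may differ, and hole_mask is assumed to be 1 inside the hole:

```python
def hybrid_loss(target, aligned_visible, aligned_full, hole_mask):
    """Three L1 terms: visible region, entire frame, inpainted region.

    aligned_visible is warped with the mesh flow from the estimation
    module, aligned_full with the final completed mesh flow; all inputs
    are tensors of the same shape.
    """
    loss_visible = ((1 - hole_mask) * (target - aligned_visible)).abs().mean()
    loss_frame = (target - aligned_full).abs().mean()
    loss_inpainted = (hole_mask * (target - aligned_full)).abs().mean()
    return loss_visible + loss_frame + loss_inpainted
```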

    3.4 Polishing network

We illustrate the procedure of the polishing network in Figure 4. Given the mesh flows estimated by the sequential mesh flow estimation network, we perform the alignment procedure between frame pairs as described in Section 3.3. The whole procedure has three steps: forward propagation, backward propagation and refinement. In the first step, we warp the visible content of the first frame into the hole of the second frame with the mesh flow, and update the mask of the second frame; we apply the same warping operation pair by pair from the first frame to the last frame. In the second step, we repeat the same procedure backwards. In the final step, we concatenate the aligned frames and holes with the input frames and holes, and feed them into a residual block-based polishing network to generate the final refined output. A sketch of the propagation follows the figure caption below.

Figure 4. The process of video polishing. We first use the estimated mesh flows to warp the contents forwards in a propagation manner, then repeat the same procedure backwards. Finally, we concatenate the aligned frames and holes with the input frames and holes, and feed them into a residual block-based polishing network to generate the final refined output.
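The bidirectional propagation can be sketched as follows; the helper names and mask conventions (masks are 1 inside holes) are our assumptions, not the paper's code:

```python
def propagate_and_fill(frames, masks, flows_fwd, flows_bwd, warp):
    """Fill holes by propagating visible content forwards, then backwards.

    warp(x, flow) applies an upsampled mesh flow to an image or mask;
    flows_fwd[t] aligns frame t with frame t+1, flows_bwd[t] with t-1.
    """
    n = len(frames)
    # Forward pass: fill each frame's hole from the previous frame.
    for t in range(1, n):
        warped_img = warp(frames[t - 1], flows_fwd[t - 1])
        warped_mask = warp(masks[t - 1], flows_fwd[t - 1])
        fillable = masks[t] * (1 - warped_mask)  # hole pixels with a valid source
        frames[t] = frames[t] * (1 - fillable) + warped_img * fillable
        masks[t] = masks[t] * (1 - fillable)     # shrink the hole mask
    # Backward pass: repeat from the last frame to the first.
    for t in range(n - 2, -1, -1):
        warped_img = warp(frames[t + 1], flows_bwd[t + 1])
        warped_mask = warp(masks[t + 1], flows_bwd[t + 1])
        fillable = masks[t] * (1 - warped_mask)
        frames[t] = frames[t] * (1 - fillable) + warped_img * fillable
        masks[t] = masks[t] * (1 - fillable)
    return frames, masks
```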

    Figure 5. Video sequence for quantitative comparisons. The masks are selected from other videos in the DAVIS dataset.

Figure 6. Qualitative results compared with the state-of-the-art methods on the DAVIS dog-agility video sequence. Our method completes the white and red pole with a more precise and coherent structure.

    Figure 7. Qualitative results compared with the state-of-the-art methods on DAVIS motocross-bumps video sequence. Our method can complete the two white-stripe warning lines more continuously.

We design two loss functions, specific to the inpainted region and the entire frame respectively, to train the polishing network. The L1-based loss function is designed for refining the adopted contents in the hole regions of the frame, while the adversarial loss[24] is designed to make the completed contents more realistic and more consistent with the visible regions.

    (8)

    In summary, the total loss of the polishing network is as follows:

L = L1(inpainted region) + Ladv    (9)
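The paper does not spell out the adversarial formulation, so the sketch below uses a simple non-saturating generator term for illustration only:

```python
import torch.nn.functional as F

def polishing_loss(target, output, hole_mask, discriminator):
    """L1 loss on the inpainted region plus an adversarial term (Eq. (9)).

    The non-saturating adversarial term is an assumption; the paper's
    exact GAN loss[24] may differ.
    """
    l1_inpainted = (hole_mask * (target - output)).abs().mean()
    l_adv = F.softplus(-discriminator(output)).mean()  # -log sigmoid(D(x))
    return l1_inpainted + l_adv
```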

    3.5 Implementation detail

We train our method in two stages: first the sequential mesh flow estimation network, then the polishing network. Both models run on hardware with an Intel(R) Xeon(R) CPU E5-2620 v4 and GeForce GTX 1080Ti GPUs.

We train the sequential mesh flow estimation network on 3471 videos from the YouTubeVOS[25] dataset. For each sample, we select 12 frames from one video with a random frame step between 1 and 5. For the hole masks, we use the irregular mask dataset provided by the image inpainting work PartialConv[26], which contains 12000 mask files. We further augment the mask dataset to 480000 mask files by performing random translation, flipping and rotation. During training, we randomly select 12 masks for each sample.
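The mask augmentation can be sketched as follows; the translation and rotation ranges are illustrative assumptions, since the paper only names the operations:

```python
import random
import numpy as np
from scipy import ndimage

def augment_mask(mask):
    """Randomly translate, flip and rotate a binary hole mask of shape (H, W)."""
    h, w = mask.shape
    # Random translation by up to a quarter of the mask size (assumed range).
    dy = random.randint(-h // 4, h // 4)
    dx = random.randint(-w // 4, w // 4)
    mask = np.roll(mask, (dy, dx), axis=(0, 1))
    # Random horizontal / vertical flips.
    if random.random() < 0.5:
        mask = np.fliplr(mask)
    if random.random() < 0.5:
        mask = np.flipud(mask)
    # Random rotation with nearest-neighbor sampling to keep the mask binary.
    angle = random.uniform(0.0, 360.0)
    mask = ndimage.rotate(mask, angle, reshape=False, order=0)
    return (mask > 0).astype(np.uint8)
```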

We use the Adam optimizer with β = (0.9, 0.999) and a learning rate of 10⁻⁴. Using one GeForce GTX 1080Ti GPU, early convergence takes about 8 hours, and final convergence takes about one week (the PSNR improves by about 1 dB).

We then build the training data for the polishing network by performing the temporal alignment operations on the 3471 videos of the YouTubeVOS dataset. For each video, the alignment operation propagates forwards and backwards with the output of the sequential mesh flow estimation network. The randomly selected masks are also saved with the aligned results.

We use the Adam optimizer with β = (0.9, 0.999) and a learning rate of 10⁻⁴. Using one GeForce GTX 1080Ti GPU, early convergence takes about 2 hours, and final convergence takes about 3 days (the PSNR improves by about 1 dB).

    4 Experiments

    4.1 Evaluation datasets

To demonstrate the qualitative and quantitative performance of our proposed method MOVIE, we evaluate it on the DAVIS[27,28] dataset, which provides pixel-wise foreground object annotations.

For qualitative evaluation, we test on several video sequences with large motions and use the labeled pixel-wise foreground objects as holes. For quantitative evaluation, we randomly select 37 video sequences with 2139 frames in total from the DAVIS dataset. Since ground truths of the removed regions are not available when objects are removed directly from the video, we randomly select a mask sequence for each video from other videos in the DAVIS dataset. Figure 5 shows two samples of the test sequences, which contain large foreground objects in the hole regions and complicated motions. We report the evaluation in terms of PSNR and SSIM, which are commonly used in video inpainting tasks. We also conduct ablation studies on these 37 video sequences. Inference speed is computed on an NVIDIA GTX 1080 Ti GPU for frames of 256×256 pixels.
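For reference, the PSNR reported here can be computed for 8-bit frames as in the following sketch:

```python
import numpy as np

def psnr(ground_truth, prediction):
    """PSNR between two 8-bit frames (peak value 255)."""
    gt = ground_truth.astype(np.float64)
    pred = prediction.astype(np.float64)
    mse = np.mean((gt - pred) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```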

    4.2 Baselines

    We compare our approach with the state-of-the-art approaches which can be categorized as follows:

Patch-based. Patch-based[11] completes the holes of a video via patch-based similarity matching.

Optical flow-based. DVI[22] is a two-frame optical flow-based method, and Flow-guided[16] uses a sequence of flow maps from consecutive frames to complete the target optical flow.

Homography-based. CPnet[17] performs temporal alignment by computing affine matrices between two frames.

3D convolution-based. GateTSM[20] and 3DGatedConv[18] both introduce new modules into 3D convolution layers for better performance and faster computation.

Attention-based. OnionPeel[21] uses an asymmetric attention block to compute similarities between the hole boundary pixels in the target and the non-hole pixels in the references in a non-local manner.

    4.3 Qualitative results

We illustrate the qualitative results in Figures 6 and 7, where both video sequences contain large motions. As shown in the figures, our method completes the white and red pole with a more precise and coherent structure (Figure 6) and completes the two white-stripe warning lines more continuously (Figure 7). Our method is able to deal with these complicated situations, while the state-of-the-art methods have limitations in producing consistent results, in which obvious artifacts can be observed.

The performance improvement comes from two aspects: the superiority of mesh flow and the design of the sequential mesh flow estimation network. Mesh flow can represent complex non-linear motion transformations and coherent motion trajectories across mesh quadrilaterals. Furthermore, the design of the sequential mesh flow estimation network guarantees precise estimation of the mesh flow in the hole areas of the video.

    4.4 Quantitative results

We show the quantitative results in Table 1. Our method produces a significant improvement (more than 1 dB PSNR) over the current state-of-the-art methods on these challenging sequences, which contain complex motions of foreground objects, and shows speedups of about 4x over the fastest method and more than 20x over the best-performing method.

Table 1. Quantitative comparison with the state-of-the-art methods. Our method produces a significant improvement (more than 1 dB PSNR) over the current state-of-the-art methods, and shows speedups of about 4x over the fastest method and more than 20x over the best-performing method.

The reconstruction values demonstrate that several methods fail to produce high reconstruction quality due to the complex foreground motions in the hole regions, including the patch-based method[11], the two-frame optical flow-based method DVI[22], the homography-based method CPnet[17], and the 3D convolution-based methods GateTSM[20] and 3DGatedConv[18]. Their weaknesses are analyzed in detail in Section 2. Flow-guided[16] and OnionPeel[21] can inpaint with higher PSNR but are not reliable: Flow-guided[16] is easily affected by the noise introduced by the missing contents when estimating motion trajectories of the hole area at the pixel level, and OnionPeel[21] fails because of the small dimension of the hole boundary used for computing similarities.

In comparison, our method handles complex motions more precisely and efficiently. The results show that mesh flow, combined with the well-designed sequential mesh flow estimation network, provides more precise temporal alignment. Meanwhile, our method is a computationally efficient solution.

    4.5 Ablation study on the sequential mesh flow estimation network

In this section, we conduct a series of ablation studies to analyze the effectiveness of each component of the sequential mesh flow estimation network. Quantitative analyses are conducted on the 37 video sequences described in Section 4.1, whose mask sequences are selected from other videos. We train each model for 50,000 iterations and optimize all models under the same training settings for a fair comparison. The training process takes about 8 hours.

(Ⅰ) The effectiveness of the sequential mesh flow estimation: Our model estimates mesh flows in a sequential manner: it first estimates mesh flows for the visible contents of the frames, then completes the mesh flows of the hole areas by learning from the adjacent mesh flows. To analyze the effectiveness of this sequential strategy, we compare with a direct mesh flow estimation model which estimates the mesh flows directly from a sequence of frames and holes. As illustrated in Figure 8, estimating mesh flows in a sequential manner generates more accurate mesh flows, while estimating mesh flows directly fails completely.

    Figure 8. Ablation study on the effectiveness of the sequential mesh flow estimation strategy.

Figure 9. Ablation study on the hybrid loss function in the sequential mesh flow estimation network. The symbols in the top-left corner of each frame represent the three loss functions respectively: L1 loss for the visible region, L1 loss for the entire frame, and L1 loss for the inpainted region.

    (Ⅱ) Ablation study on mesh size: As shown in Table 2, we analyze the influence of mesh size.

    Table 2. Ablation study on different mesh sizes.

    Table 3. Ablation study on number of references.

The results show that setting the mesh size to 8 achieves the best performance. A mesh size larger than 8 may fail to describe the complex motions in the holes, while a mesh size smaller than 8 cannot exploit sufficient information for aligning frames.

(Ⅲ) Ablation study on the number of references: We further analyze the influence of the number of reference frames. Note that using more than 12 mesh flows leads to an out-of-memory error, hence we set the number of mesh flows to at most 10. As shown in Table 3, a larger number leads to better performance.

(Ⅳ) Ablation study on the hybrid loss function: To evaluate the hybrid loss function of the sequential mesh flow estimation network, we train the model with different combinations of the three loss functions in the hybrid loss. As shown in Table 4, each of the three loss functions makes a positive contribution to the final performance.

    Table 4. Ablation study on the hybrid loss function of the sequential mesh flow estimation network.

Table 5. Ablation study on the polishing network. The polishing network achieves a 1 dB PSNR improvement.

The aligned results are illustrated in Figure 9. Specifically, the two results in the left column indicate that the results fail completely without the L1 loss for the entire frame. The comparison between the results in the right column and the middle column shows that the L1 loss for the inpainted region leads to more consistent texture.

Figure 10. Ablation study on the polishing network. The polishing network smooths the artifacts of the result and makes it more visually plausible.

The effectiveness of the L1 loss for the visible region is illustrated by the comparison between the two results in the right column.

    4.6 Ablation study on the polishing network

In this section, we conduct an ablation study on the polishing network. The results in Table 5 indicate that the polishing network achieves a 1 dB PSNR improvement. Figure 10 shows that the polishing network smooths the artifacts of the results and makes them more visually plausible.

    5 Conclusion

In this paper, we propose an efficient and effective method for video inpainting. In essence, our main idea is to introduce mesh flow as a more proper representation of motion information so as to better target the temporal misalignment problem in video inpainting. Specifically, we design a sequential mesh flow estimation network which first predicts mesh flows only for the visible regions of the frames, then completes the holes of the mesh flows by learning from the adjacent mesh flows. We further design a polishing network to polish the aligned results. Experimental results show that our method yields more promising results with higher inpainting quality in both quantitative and perceptual metrics, and achieves at least a four-times speed-up in completing the missing areas.
