
    Research on Facial Expression Capture Based on Two-Stage Neural Network

    Computers, Materials & Continua, 2022, Issue 9

    Zhenzhou Wang, Shao Cui, Xiang Wang* and JiaFeng Tian

    1 School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, 050000, China

    2 School of Engineering, Newcastle University, Newcastle Upon Tyne, NE98, United Kingdom

    Abstract: To generate realistic three-dimensional animation of a virtual character, capturing real facial expression is the primary task. Due to diverse facial expressions and complex backgrounds, facial landmarks recognized by existing strategies suffer from deviations and low accuracy. Therefore, this paper proposes a method for facial expression capture based on a two-stage neural network that takes advantage of improved multi-task cascaded convolutional networks (MTCNN) and a high-resolution network. Firstly, the convolution operation of the traditional MTCNN is improved. The face information in the input image is quickly filtered by feature fusion in the first stage, and Octave Convolution replaces the ordinary convolution in the second stage to enhance the feature extraction ability of the network, which further rejects a large number of false candidates. The model outputs more accurate facial candidate windows for better landmark recognition and locates the faces. Then the images cropped after face detection are input into the high-resolution network. Multi-scale feature fusion is realized by parallel connection of multi-resolution streams, and rich high-resolution heatmaps of facial landmarks are obtained. Finally, the changes of the recognized facial landmarks are tracked in real-time. The expression parameters are extracted and transmitted to the Unity3D engine to drive the virtual character's face, which realizes synchronous facial expression animation. Extensive experimental results obtained on the WFLW database demonstrate the superiority of the proposed method in terms of accuracy and robustness, especially for diverse expressions and complex backgrounds. The method can accurately capture facial expression and generate three-dimensional animation effects, making online entertainment and social interaction more immersive in shared virtual space.

    Keywords: Facial expression capture; facial landmarks; multi-task cascaded convolutional networks; high-resolution network; animation generation

    1 Introduction

    Facial expression capture technology is a major research topic in computer graphics, character animation and other fields. With the emergence of shared virtual space, synchronous avatar facial expression can convey human emotions and an effective social experience, which plays a crucial role in digital entertainment, the film and television animation industry and other fields [1]. For example, movies like Avatar, which is well known to all, use 3D facial expression capture and generation technology to make virtual character expression animation more realistic [2]. Facial expression features are extracted based on artificial intelligence technology, so that this information endows the virtual character with authentic facial expressions and emotions. The interaction between users and information will contribute to the development of the metaverse, which is an important evolution of future social ways [3,4].

    Extracting effective landmarks from images is the core step of facial expression capture, and the obtained landmarks can be used as the basic data for facial expression parameters [5]. Recent studies [6,7] have shown that the technique is limited by expensive capture tools, complex model reconstruction of facial expression and the inaccuracy of extracting facial landmarks using traditional image processing. It is difficult to ensure the accuracy of facial landmarks because facial features have different scale types and undergo different degrees of expression. In view of these problems, this paper focuses on how to improve the accuracy of facial landmarks under diverse facial expressions and complex backgrounds based on deep learning technology, and on achieving high-fidelity, real-time virtual character expression animation.

    Therefore, this paper proposes the MTHR-Face model based on a two-stage neural network for facial expression capture. Firstly, the real-time video is captured by the camera, and the target face is detected and aligned by the improved multi-task cascaded convolutional networks (MTCNN), so that a more accurate facial candidate window is obtained for landmark recognition. Then, combined with a high-resolution network (HRNetV2-W18), the method realizes recognition of high-resolution facial expression features and real-time tracking under different levels of expression. Finally, the action units (AU) are extracted by regression over the recognized facial landmarks to capture facial expression, and the mapping relationship between AU and Blendshape expression bases is built to synchronize the expression of the virtual character and generate animation. The overall process is shown in Fig. 1.

    In summary, the contributions of this paper are as follows:

    1) An improved MTCNN model is proposed to efficiently extract face feature information from images and accurately detect faces. This model can eliminate complex background and align faces, thus further reducing the difficulty of the facial landmark recognition task.

    2) A two-stage neural network that combines the improved MTCNN and HRNetV2-W18 is introduced to recognize 98 rich facial landmarks and capture expression changes. This method not only maintains the high resolution of the facial feature images throughout the whole process, but also achieves more accurate landmark recognition under different degrees of expression.

    3) Experiments show that the performance of the MTHR-Face method on the WFLW database is significantly better than other existing methods, especially for faces in difficult scenarios such as expression, large pose, and occlusion. Additionally, the algorithm can track facial expression in real-time and generate virtual character expression animation.

    Figure 1: General flow chart of the expression capture system

    2 Related Work

    Facial expression capture systems. Sibbing et al. [8] used 5 synchronized cameras to shoot faces without markers, mainly employing 3D face reconstruction to establish the connection between frames by 2D grid tracking and thus reproduce facial expression animation; however, reconstructing facial expression consumed considerable time. Cao et al. [9,10], based on the constructed FaceWarehouse 3D expression library, used a regression method that extracts 3D facial expression from 2D videos to locate facial landmarks and track facial expression in real time, and generated 3D faces through registration and realistic facial expression animation with BlendShape. However, the offline training stage needed to collect data from each face and produced a relatively rough facial expression model. Weise et al. [11] used the Kinect motion-sensing peripheral as an expression capture tool, but the noise and error of the obtained 3D information were large. Therefore, a facial expression model was trained to reduce the noise and error and ensure real-time animation, but the generated expressions lacked variability and freedom. In recent years, facial expression capture based on deep learning has become a hot research topic. Laine et al. [12] trained a convolutional neural network on videos of facial expressions to generate 3D expression performance using multi-view stereoscopic tracking and enhanced animation. However, the captured facial expression data is difficult to reproduce for large numbers of users and requires a lot of training data.

    Facial landmark recognition. The Active Shape Model (ASM) [13], a classical algorithm among traditional methods, and AAM [14], an improved algorithm based on ASM, detect facial landmarks with a shape change model. The detection results of these studies depend strongly on the datasets of their models, and their generalization performance and robustness are poor. Feng et al. [15] proposed a new cascade shape regression (CSR) structure for robust facial landmark detection on unconstrained human faces. However, due to the limited capability of handcrafted feature representation, there is still a problem of misalignment in facial landmark detection using traditional methods. In recent years, deep learning methods have greatly improved the accuracy of facial landmark detection. After the cascade convolutional neural network (DCNN) proposed by Sun et al. [16], it was improved [17] and applied to 68-point facial landmark detection. Newell et al. [18] used a heatmap regression model and designed a stacked hourglass network for human pose estimation. Wu et al. [19] designed the LAB boundary-aware algorithm, which used eight stacked hourglass networks to predict the boundary heatmaps of facial features and decode coordinate information. Yang et al. [20] used supervised transformation of standardized faces and a stacked hourglass network to obtain predictive heatmaps, which achieved good results. However, the hourglass model consumed huge computational resources and still lacked robustness of facial landmark detection in large-angle posture scenarios.

    The MTHR-Face method based on a two-stage neural network leverages the best advantages of the improved MTCNN and the HRNetV2-W18 model. The improved MTCNN can quickly detect faces and obtain accurate facial candidate windows, laying a foundation for facial landmark recognition. Combined with the HRNetV2-W18 model, this method not only maintains high resolution of the facial feature images throughout the whole process, but also reduces the error of landmark recognition under various expressions. Real-time facial expression capture can realize the synchronization of virtual characters' expressions and generate animation with high fidelity. It is of great significance for real-time emotional communication and interactive control between users and avatars in shared virtual space.

    3 The Traditional MTCNN Model

    At present, in the field of machine vision, the multi-task cascaded convolutional network (MTCNN) [21], as shown in Fig. 2, is a model for face detection and facial landmark localization with high computational speed and good performance. It is a cascaded architecture composed of three convolutional neural networks: the Proposal Network (P-Net), the Refine Network (R-Net) and the Output Network (O-Net). Their main tasks are face classification, bounding box regression and facial landmark localization, respectively. Face detection and alignment as well as facial landmark extraction are realized in a coarse-to-fine process.

    Figure 2: Structure of the three-stage convolutional networks in MTCNN

    The input image is processed by the following three-stage cascaded framework:

    Step 1: The preprocessing operation scales the given image to different sizes to build an image pyramid, which is provided as input to the cascaded networks.

    Step 2: The P-Net structure is a fully convolutional neural network. Images of size 12×12 obtained from the image pyramid are the input of the network. In order to ensure a higher recall rate, a 3×3 convolution kernel carries out preliminary feature extraction to obtain facial candidate windows and their bounding box regression vectors. Bounding box regression is then used to adjust the windows, and non-maximum suppression (NMS) is performed to filter highly overlapped candidates. The remaining candidates, resized to 24×24, are mapped back to the original image.

    Face/non-face determination in P-Net is a two-class classification problem, so the loss function for face detection classification is the cross-entropy function. The formula is as follows:

$$L_i^{det} = -\left(y_i^{det}\log(p_i) + \left(1 - y_i^{det}\right)\left(1 - \log(p_i)\right)\right) \tag{1}$$

    where $p_i$ represents the probability that the candidate is a face region and the notation $y_i^{det} \in \{0, 1\}$ represents the ground-truth label.

    Step 3: R-Net is basically constructed as a convolutional neural network. Because many candidate windows remain after P-Net, this network filters out a large number of poor candidates to achieve high-precision filtering and face region optimization. R-Net adds a fully connected layer to classify the feature images output by P-Net; the candidate windows are then fine-tuned with the bounding box regression vector, and NMS is finally performed to remove overlapping windows from the selected candidates.

    The bounding box regression task of R-Net selects the candidate windows of P-Net and filters out a large number of non-face candidate windows. The Euclidean loss for each sample $x_i$ is employed to measure the distance between the predicted and the actual face candidate windows. Its loss function is:

$$L_i^{box} = \left\|\hat{y}_i^{box} - y_i^{box}\right\|_2^2 \tag{2}$$

    where $\hat{y}_i^{box}$ is the regression vector obtained from the network and $y_i^{box}$ is the ground-truth bounding box coordinate.

    Step 4: The basic structure of O-Net is a relatively complex convolutional neural network. This network takes more input features and, in this stage, identifies face regions with more supervision, further refining the coordinates of the face detection windows to make the results more precise. Finally, this network outputs the face candidate windows and the coordinates of five landmarks.

    Similar to the bounding box regression process, the loss function of this step still computes the deviation between the predicted and the actual landmark positions. The Euclidean loss minimized in the process of landmark localization is as follows:

$$L_i^{landmark} = \left\|\hat{y}_i^{landmark} - y_i^{landmark}\right\|_2^2 \tag{3}$$

    where $\hat{y}_i^{landmark}$ indicates the predicted coordinates and $y_i^{landmark}$ indicates the ground-truth coordinates for the $i$th sample.
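    For concreteness, the following is a minimal PyTorch sketch of the three losses in Eqs. (1)-(3); the tensor shapes and the function name are illustrative assumptions rather than the authors' implementation, and Eq. (1) is realized here with the standard binary cross-entropy.

```python
import torch.nn.functional as F

def mtcnn_losses(p_face, y_det, box_pred, box_gt, lmk_pred, lmk_gt):
    """Sketch of the three MTCNN training losses in Eqs. (1)-(3).

    p_face:            (B,) predicted face probabilities
    y_det:             (B,) ground-truth face/non-face labels in {0, 1}
    box_pred, box_gt:  (B, 4) bounding box regression vectors
    lmk_pred, lmk_gt:  (B, 10) five (x, y) landmark coordinates
    """
    l_det = F.binary_cross_entropy(p_face, y_det.float())   # Eq. (1)
    l_box = ((box_pred - box_gt) ** 2).sum(dim=1).mean()    # Eq. (2)
    l_lmk = ((lmk_pred - lmk_gt) ** 2).sum(dim=1).mean()    # Eq. (3)
    return l_det, l_box, l_lmk
```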

    4 MTHR-Face Method

    The primary task of facial expression capture is to accurately recognize facial landmarks. Due to the variety of facial landmark configurations caused by human emotion, it is difficult to guarantee consistent animation generated from the facial expression of virtual characters. Therefore, the MTHR-Face method in this paper performs facial expression capture based on the improved MTCNN and HRNetV2-W18, as shown in Fig. 3. The face detection rate has a definite impact on facial feature capture and lays the foundation for subsequently recognizing landmarks accurately, so the convolution operation of the traditional MTCNN is improved. Firstly, a feature fusion operation is carried out in the P-Net, and Octave Convolution replaces the ordinary convolution in the R-Net, which can obtain more precise face candidate windows and detect target faces quickly. Then, combined with HRNetV2-W18, the images cropped after face detection are input to four consecutive multi-scale cascaded parallel convolutional neural networks for feature fusion, obtaining 98 rich high-resolution facial landmarks. The two-stage neural network can effectively maintain the spatial information and high resolution of facial feature images and make the landmarks converge under the complex geometric changes of facial expression.

    Figure 3: Block diagram of the MTHR-Face algorithm

    4.1 Improved MTCNN for Face Detection

    The purpose of face detection is to obtain appropriate facial regions to simplify landmark recognition, and efficient, accurate detection of the face region directly affects the subsequent recognition task. Therefore, the convolution operation of the traditional MTCNN is improved. Its function is to efficiently filter the face information in the image, optimize the network structure of the model, improve the feature representation ability of the convolution layers, and ensure accurate and efficient execution of subsequent tasks.

    4.1.1 Feature Fusion Operation

    P-Net, as the first stage, generates a large number of overlapping face candidate windows, resulting in a long running time. Therefore, the network is improved to shorten this time, as shown in Fig. 4 below. Firstly, the input P-Net image is subjected to a three-layer convolution operation and the loss of the image is calculated. Structure ① in Fig. 4 shows that a multi-scale convolution operation is performed on the input image features, followed by an aggregation operation to enhance the detection effect. Secondly, the first- and second-layer feature images are fused to improve the expressive ability of the network and efficiently filter the face information in the image. Finally, structure ② in Fig. 4 is the decomposition of the convolution kernel, which reduces the number of parameters and improves the detection rate of the network. A speculative code sketch of these two ideas follows Fig. 4.

    Figure 4: Improved P-Net structure
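    The exact layer configuration of the improved P-Net is only given in Fig. 4, so the following PyTorch sketch is speculative: it illustrates fusing the first- and second-layer feature maps (after aligning channels and resolution) and decomposing a 3×3 kernel into 3×1 and 1×3 convolutions; all channel widths are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class FusedPNetBlock(nn.Module):
    """Speculative sketch of the improved P-Net ideas: feature fusion of the
    first two conv layers, plus a 3x3 kernel decomposed into 3x1 + 1x3
    convolutions (structure 2) to reduce parameters. Widths are assumed."""

    def __init__(self, c1=10, c2=16):
        super().__init__()
        self.conv1 = nn.Conv2d(3, c1, 3)        # first-layer features
        self.conv2 = nn.Conv2d(c1, c2, 3)       # second-layer features
        self.align = nn.Conv2d(c1, c2, 1)       # match channels before fusion
        self.decomp = nn.Sequential(            # decomposed 3x3 kernel
            nn.Conv2d(c2, c2, (3, 1), padding=(1, 0)),
            nn.Conv2d(c2, c2, (1, 3), padding=(0, 1)),
        )

    def forward(self, x):
        f1 = F.relu(self.conv1(x))
        f2 = F.relu(self.conv2(f1))
        # fuse layer-1 and layer-2 features at the layer-2 resolution
        fused = f2 + F.interpolate(self.align(f1), size=f2.shape[2:])
        return self.decomp(fused)
```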

    4.1.2 Introducing New Octave Convolution

    Due to the small number of convolutional layers at each level of the MTCNN structure, the extracted features are not sufficient to fully represent face details. The whole network should make full use of the features extracted from the convolution layers and increase the face detection capability of the multi-pose model, laying a foundation for facial landmark recognition. Therefore, a new convolution operation, Octave Convolution (Octconv) [22], is introduced into the R-Net of MTCNN, replacing only the original convolution, so that the network better realizes the refinement and regression of face candidate windows and improves detection accuracy.

    The principle of Octconv is to process the low-frequency and high-frequency components of the corresponding frequency tensors effectively within the convolution layer to jointly form new output features. The operating principle of the tensor decomposition is shown in Fig. 5, where X and Y are the input and output tensors respectively. The output of each layer consists of low-frequency and high-frequency signals, and each part is composed of the high-frequency and low-frequency components of the previous layer's output in a certain proportion. The formulas for the output high-frequency and low-frequency signals are as follows:

$$Y^H = f\left(X^H; W^{H\to H}\right) + \mathrm{upsample}\left(f\left(X^L; W^{L\to H}\right), 2\right) \tag{4}$$

$$Y^L = f\left(X^L; W^{L\to L}\right) + f\left(\mathrm{pool}\left(X^H, 2\right); W^{H\to L}\right) \tag{5}$$

    where $f(\cdot; W)$ denotes a convolution with parameters $W$, $\mathrm{pool}(X, 2)$ is average pooling with stride 2, and $\mathrm{upsample}(X, 2)$ doubles the spatial resolution.

    Fig. 5 shows that the two green arrows correspond to the information update of the high- and low-frequency feature maps, while the two red arrows facilitate information exchange between the two frequencies. The introduction of this convolution can reduce the operation time and improve the face detection rate by halving the resolution of the low-frequency information in the input data. Compared with the features extracted by the original convolution, the spatial redundancy of the features is reduced and the feature representation capability of the convolution layer is improved.

    Figure 5: Detailed design of the Octave Convolution
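    As a reference for Eqs. (4)-(5), here is a minimal PyTorch sketch of an Octave Convolution layer; the channel split ratio alpha and the use of average pooling and nearest-neighbor upsampling follow the original Octconv paper and are not specified by this one.

```python
import torch.nn as nn
import torch.nn.functional as F

class OctConv(nn.Module):
    """Minimal sketch of Octave Convolution (Chen et al. [22]).
    alpha is the fraction of channels carrying low-frequency features."""

    def __init__(self, in_ch, out_ch, alpha=0.5, k=3):
        super().__init__()
        in_lo, out_lo = int(alpha * in_ch), int(alpha * out_ch)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        p = k // 2
        self.hh = nn.Conv2d(in_hi, out_hi, k, padding=p)  # high -> high
        self.hl = nn.Conv2d(in_hi, out_lo, k, padding=p)  # high -> low
        self.lh = nn.Conv2d(in_lo, out_hi, k, padding=p)  # low  -> high
        self.ll = nn.Conv2d(in_lo, out_lo, k, padding=p)  # low  -> low

    def forward(self, x_hi, x_lo):
        # Eq. (4): Y^H = f(X^H) + upsample(f(X^L))
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2)
        # Eq. (5): Y^L = f(X^L) + f(pool(X^H))
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo
```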

    4.2 Facial Landmarks Recognition Combined with High-Resolution Network

    Facial landmark recognition aims to detect the positions of the eyebrows, eyes, nose, mouth and facial contour in the face image. Because feature information risks being lost under different levels of facial expression, more reliable landmarks are needed to accurately capture facial expression and generate animation. Therefore, the HRNetV2-W18 model [23] is selected in this paper as the face feature extractor, and the high resolution of the facial feature map is maintained during the whole process, so that the prediction of facial landmarks is spatially more accurate. This network with high-resolution performance is transferred to the task of facial landmark recognition. In addition, it has achieved a good level on the MPII Human Pose database [24].

    HRNetV2-W18 has four stages connected in parallel, as shown in Fig. 6. The first stage involves high-resolution convolution. At the beginning of each stage, a parallel stream with a lower resolution than the current minimum is added to connect the multi-resolution streams in parallel. The parallel connections of streams with different resolutions can effectively preserve the original features while extracting deeper features downward. At the end of each stage, multi-scale feature fusion is performed on the output images of the streams with different resolutions, which forms a high-resolution network structure. Finally, the highest-resolution feature map is used to predict facial landmarks. The network maintains high-resolution representations through the whole process for spatially precise heatmap estimation. It mainly includes parallel multi-resolution convolution, repeated multi-resolution fusion and heatmap regression.

    Figure 6: HRNetV2-W18 structure

    4.2.1 Parallel Multi-Resolution Convolution

    Starting from the input high-resolution image information as the first stage, streams from high resolution to low resolution are added layer by layer to form parallel, connected stages as the network depth gradually increases. The main body of the network consists of four parallel streams. Each new stream is produced by two stride-2 3×3 convolutions, so the resolution is halved in turn and the corresponding number of channels is doubled. Therefore, the resolutions of the parallel streams of a later stage consist of the resolutions from the previous stage plus an extra lower one. An example network structure containing 4 parallel streams is logically as follows:

$$\begin{array}{llll}
\mathcal{N}_{11} \rightarrow & \mathcal{N}_{21} \rightarrow & \mathcal{N}_{31} \rightarrow & \mathcal{N}_{41}\\
\searrow & \mathcal{N}_{22} \rightarrow & \mathcal{N}_{32} \rightarrow & \mathcal{N}_{42}\\
 & \searrow & \mathcal{N}_{33} \rightarrow & \mathcal{N}_{43}\\
 & & \searrow & \mathcal{N}_{44}
\end{array} \tag{6}$$

    where $\mathcal{N}_{sr}$ is a sub-stream in the $s$th stage and $r$ is the resolution index. The resolution index of the first stream is $r = 1$. The resolution of index $r$ is $\frac{1}{2^{r-1}}$ of the resolution of the first stream.

    4.2.2 Repeated Multi-Resolution Fusion

    The process of multi-resolution fusion in the parallel network is shown in Fig. 7. For feature maps larger than the current scale, a stride-2 3×3 convolution layer is used for down-sampling. After passing through this convolution layer, the side length of the feature map becomes half of the original and the area becomes one quarter. For feature maps smaller than the current scale, an up-sampling operation is required: the side length is doubled and the area is quadrupled. Firstly, an interpolation method is used to expand the resolution. Then, a 1×1 convolution layer is used to change the number of channels. Finally, the features from the different sources are summed to obtain the fused features. A sketch of this procedure follows Fig. 7.

    Figure 7: Fusion process of high, medium and low resolution respectively
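    The following PyTorch sketch illustrates the fusion rule just described for one target scale; the per-stream widths and the bilinear interpolation mode are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class FuseToScale(nn.Module):
    """Sketch of multi-resolution fusion toward one target scale:
    down-sampling via repeated stride-2 3x3 convs, up-sampling via
    interpolation followed by a 1x1 conv, then element-wise summation."""

    def __init__(self, channels, target):
        # channels: per-stream widths, ordered high to low resolution
        super().__init__()
        self.target = target
        self.ops = nn.ModuleList()
        for i, c in enumerate(channels):
            if i < target:        # higher resolution: chain of stride-2 convs
                layers, cur = [], c
                for j in range(target - i):
                    out_c = channels[target] if j == target - i - 1 else cur
                    layers.append(nn.Conv2d(cur, out_c, 3, stride=2, padding=1))
                    cur = out_c
                self.ops.append(nn.Sequential(*layers))
            elif i > target:      # lower resolution: 1x1 conv after upsampling
                self.ops.append(nn.Conv2d(c, channels[target], 1))
            else:
                self.ops.append(nn.Identity())

    def forward(self, xs):        # xs: feature maps, high to low resolution
        size = xs[self.target].shape[2:]
        out = 0
        for i, (x, op) in enumerate(zip(xs, self.ops)):
            if i > self.target:   # interpolate first, then 1x1 conv
                x = F.interpolate(x, size=size, mode='bilinear',
                                  align_corners=False)
            out = out + op(x)     # summed fusion (Fig. 7)
        return out
```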

    The exchange units introduced across the parallel streams in multi-resolution fusion make each stream repeatedly receive information from the other parallel streams. Taking the third stage as an example, it is divided into several exchange blocks, each composed of three parallel convolution units and one exchange unit across the parallel units. The scheme of exchanging information is as follows:

$$\left\{C_{s1}^{b},\, C_{s2}^{b},\, C_{s3}^{b}\right\} \rightarrow \mathcal{E}_{s}^{b} \tag{7}$$

    where $C_{sr}^{b}$ represents the convolution unit in the $r$th resolution of the $b$th block in the $s$th stage, and $\mathcal{E}_{s}^{b}$ is the corresponding exchange unit. Formula (7) corresponds to Fig. 7.

    Each output is an aggregation of the input maps, as shown in the following formula:

$$Y_k = \sum_{i=1}^{s} a\left(X_i, k\right) \tag{8}$$

    The exchange unit across stages has an extra output map, as shown in the following formula:

$$Y_{s+1} = a\left(Y_s, s+1\right) \tag{9}$$

    where $s$ is the number of parallel streams, the inputs are $s$ response maps $\{X_1, X_2, \ldots, X_s\}$ and the outputs are $s$ response maps $\{Y_1, Y_2, \ldots, Y_s\}$ whose resolutions and widths are the same as those of the inputs. The function $a(X_i, k)$ up-samples or down-samples $X_i$ from resolution $i$ to resolution $k$.

    4.2.3 Heatmap Estimation

    Heatmap regression first extracts features from the face image and then learns the feature information through the HRNetV2-W18 model. Finally, a convolution layer is added after the last layer of the model, and the positions of the facial landmarks estimated by the regressors are converted into high-resolution heatmaps. The improvement of the HRNetV2 model over HRNetV1 mainly lies in the fact that the final output feature map integrates the information from the low-resolution feature maps, which can improve the accuracy of the facial landmark recognition task, as shown in Fig. 8 below.

    Figure 8: Different types of HRNet output layers

    The network is applied to the task of facial landmark recognition. For a face image $P$, the network obtains $N$ heatmaps $H(P)$, where $N$ is the total number of facial landmarks. Heatmap-based landmark recognition adopts the Gaussian distribution principle, i.e., the maximum value marks the landmark position in the heatmap; the predicted position of each landmark is decoded from the corresponding heatmap, and the coordinates of the facial landmarks are output, as shown in the following formula:

$$L(i) = \arg\max_{(x,\,y)} H_i(P)(x, y) \tag{10}$$

    where $i$ is the heatmap index corresponding to a facial landmark and $L(i)$ gives the coordinates of the $i$th landmark.
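    A minimal sketch of this argmax decoding step, assuming the heatmaps arrive as a single (N, H, W) tensor:

```python
import torch

def decode_heatmaps(heatmaps):
    """Decode landmark coordinates as the argmax position of each heatmap.

    heatmaps: (N, H, W) tensor, one heatmap per landmark
    returns:  (N, 2) tensor of (x, y) pixel coordinates
    """
    n, h, w = heatmaps.shape
    flat_idx = heatmaps.view(n, -1).argmax(dim=1)       # peak index per map
    ys = torch.div(flat_idx, w, rounding_mode='floor')  # row of the peak
    xs = flat_idx % w                                   # column of the peak
    return torch.stack([xs, ys], dim=1).float()
```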

    5 Experimental Results and Application

    In this section, extensive experiments and analyses are carried out to prove the robustness and effectiveness of the proposed method. Firstly, the following paragraphs describe the datasets, training details and evaluation metrics. Secondly, in terms of performance and metrics, MTHR-Face is compared with other algorithms and its results are analyzed. Finally, the method in this paper is used to capture facial expression and generate animation in real-time.

    5.1 Datasets

    WIDER FACE [25] is a benchmark database for face detection including 32,203 images and 393,703 labeled face bounding boxes. 70% of the images are taken as the training set and the remainder as the test set.

    The Wider Facial Landmarks in the Wild (WFLW) [19] is considered the most challenging database. It contains 10,000 faces (7,500 for training and 2,500 for testing) with 98 fully manually annotated facial landmarks. This database also features rich attribute annotations in terms of pose, expression, illumination, makeup, occlusion and blur, which can effectively evaluate the robustness of the algorithm for large-angle posture and complex expression.

    5.2 Training Details

    The experiment uses Python 3.7 and is developed in the PyCharm integrated development environment. The two models of the MTHR-Face algorithm, the improved MTCNN and HRNetV2-W18, were trained independently. Both networks were implemented in PyTorch. In this paper, the MTHR-Face algorithm is implemented on the Windows 10 operating system with an NVIDIA GTX 2080Ti (16 GB) GPU and an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz.

    For the improved MTCNN, P-Net, R-Net and O-Net are trained separately. The minibatch size is set to 256 and the learning rate is 1e-3. In the process of network training, the loss functions of Eqs. (1)-(3) are adopted for the three main tasks of MTCNN. The Intersection-over-Union (IoU) ratio is calculated against the ground-truth face coordinate area: if the IoU value is greater than 0.65, the sample is positive; if the IoU value is less than 0.3, it is negative; samples with IoU between 0.4 and 0.65 are considered to contain only partial face information. Negative and positive samples are used for the face classification task, while positive and partial faces are used for bounding box regression. A sketch of this assignment rule is given below.
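    The following minimal Python sketch restates the sample-assignment rule; the box format and the handling of the unused 0.3-0.4 band are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter + 1e-9)

def assign_sample(candidate, gt_box):
    """Label a candidate window by its IoU with the ground-truth face box,
    following the thresholds used in this paper."""
    v = iou(candidate, gt_box)
    if v > 0.65:
        return "positive"   # used for classification and box regression
    if v < 0.3:
        return "negative"   # used for classification only
    if 0.4 <= v <= 0.65:
        return "partial"    # used for box regression only
    return "ignored"        # 0.3 <= v < 0.4: assumed unused
```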

    For training the HRNetV2-W18 model, the input images are cropped using the face candidate windows of the improved MTCNN and resized to 256×256 resolution. Data augmentation is applied on the WFLW database by random flipping, rotation (between ±30°), added Gaussian noise and color enhancement to improve the generalization ability of the network and the robustness of the model, as shown in Fig. 9 below. The model is optimized using the Adam optimizer with an initial learning rate of 0.0001, and the batch size is set to 16. Training runs for 300 epochs in total. A sketch of such an augmentation pipeline follows Fig. 9.

    Figure 9: Examples of image enhancement methods for the WFLW database. ((a) A sample cropped face image from the WFLW database; (b) Image flipping; (c) Image rotation; (d) Gaussian noise; (e) Median filtering; (f) Color enhancement)
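    As an image-only illustration (a real landmark pipeline must also transform the 98 annotated coordinates under flips and rotations), a torchvision-based sketch of the augmentations listed above might look as follows; only the ±30° rotation comes from the paper, all other magnitudes are assumptions.

```python
import torch
import torchvision.transforms as T

# Sketch of the described augmentations applied to a PIL face crop.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                 # random flipping
    T.RandomRotation(degrees=30),                  # rotation in [-30, +30]
    T.ColorJitter(brightness=0.3, contrast=0.3,
                  saturation=0.3),                 # color enhancement
    T.ToTensor(),
    T.Lambda(lambda x:                             # additive gaussian noise
             (x + 0.01 * torch.randn_like(x)).clamp(0, 1)),
])
```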

    5.3 Evaluation Metric

    In this paper, we use the normalized mean error (NME), the failure rate (FR), the area under the curve (AUC) and the cumulative error distribution (CED) curve of the samples to measure the facial landmark localization error.

    NME is a widely used metric to evaluate the performance of facial landmark recognition. The error of each landmark is normalized and then averaged to get the final result. The formula is as follows:

$$NME = \frac{1}{N}\sum_{i=1}^{N}\frac{\left\|\hat{x}_i - x_i\right\|_2}{d} \tag{11}$$

    where $N$ indicates the number of facial landmarks, $\hat{x}_i$ indicates the predicted position of the $i$th landmark and $x_i$ indicates its ground-truth position. For the WFLW database, the normalizing distance $d$ is the distance between the outer eye corners ("inter-ocular").
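    A one-function NumPy sketch of Eq. (11):

```python
import numpy as np

def nme(pred, gt, inter_ocular):
    """Normalized mean error for one image.
    pred, gt: (N, 2) landmark arrays; inter_ocular: normalizing distance."""
    return np.linalg.norm(pred - gt, axis=1).mean() / inter_ocular
```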

    FR is calculated based on the NME value. For the WFLW database, the threshold ε is set to 0.1. When the NME of an image is larger than 0.1, the case is deemed a failure of facial landmark recognition. We derive the FR as the rate of failures in a test set.

    AUC provides another insight into the quality of facial landmark recognition. It is deduced from the CED curve: a non-negative curve is plotted from zero to the FR threshold, and the area under it is the AUC. The formula is as follows:

$$AUC = \int_{0}^{\varepsilon} \frac{N_{l \le e}}{N}\, de \tag{12}$$

    where $N_{l \le e}$ represents the number of images whose error $l$ is no larger than $e$ and $N$ is the total number of test images. Performance is higher when the CED curve is correspondingly higher, and CED curve evaluation indexes are widely used in benchmark databases for facial landmark recognition.
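    A short NumPy sketch of the FR and AUC computations from a list of per-image NME values; the numerical integration grid is an implementation choice.

```python
import numpy as np

def failure_rate(errors, threshold=0.1):
    """Fraction of test images whose NME exceeds the threshold."""
    return (np.asarray(errors) > threshold).mean()

def auc_from_ced(errors, threshold=0.1, steps=1000):
    """Area under the CED curve up to the threshold, normalized to [0, 1].
    CED(e) = fraction of images with NME <= e (Eq. (12))."""
    errors = np.asarray(errors)
    es = np.linspace(0, threshold, steps)
    ced = [(errors <= e).mean() for e in es]
    return np.trapz(ced, es) / threshold
```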

    5.4 Experimental Results and Analysis

    Since the WFLW database covers diverse expressions and contains various types of challenges, the proposed method is compared with the LAB algorithm and the HRNetV2-W18 model alone, both on the final output subjective result images and on the above evaluation metrics, which can comprehensively evaluate the robustness of MTHR-Face.

    LAB is much more computationally expensive than MTHR-Face and the HRNetV2-W18 model because its architecture stacks eight hourglass modules. The hourglass network first reduces the resolution and then increases it, whereas HRNetV2-W18 gradually adds high-to-low resolution streams connected in parallel and performs repeated multi-scale fusions, which maintains the high-resolution information of the face feature maps and is spatially accurate. LAB, with its low-to-high resolution recovery structure, risks losing feature information, as shown in Fig. 10b. When the human face makes complex expression changes, HRNetV2-W18 used alone, as in Fig. 10c, recognizes some landmarks beyond the target face region. However, the MTHR-Face method proposed in this paper combines HRNetV2-W18 with the improved MTCNN, which not only effectively maintains the spatial information of the face, but also recognizes facial landmarks more accurately under different levels of expression, enabling realistic virtual animation effects subsequently. Fig. 10 below shows the comparison results of test faces in facial landmark recognition under the different algorithms. It can be seen that the MTHR-Face method in Fig. 10d recognizes facial landmarks more accurately under different expressions, such as normal and amazed, because the improved MTCNN adjusts the face posture, detects the face and crops the face image based on the face candidate windows before input into the HRNetV2-W18 model, which further reduces the difficulty of the landmark regression task and makes the recognized facial landmarks more convergent.

    Figure 10: Comparison of the results of each algorithm under different levels of expression

    To further evaluate the performance of the MTHR-Face method, Tab. 1 below reports NME (lower is better), FR (lower is better) and AUC (higher is better) on the testset and six subsets of WFLW for MTHR-Face, HRNetV2-W18 alone and the LAB algorithm. The results show that MTHR-Face performs better than LAB and HRNetV2-W18 alone by a significant margin on every subset. LAB is weak at recognizing facial landmarks on WFLW samples of extreme diversity, such as big pose, exaggerated expressions and heavy occlusion. However, when combined with the improved MTCNN, HRNetV2-W18 decreases NME from 4.60% to 4.39%. Note that MTHR-Face also outperforms HRNetV2-W18 alone in all other metrics, which is much more beneficial to facial landmark recognition. Compared with the other methods, the innovations proposed in this paper exhibit a definite improvement on each subset of the WFLW database. These results demonstrate that the method alleviates the low accuracy and deviation caused by diverse expressions or complex backgrounds. Besides, the CED curve in Fig. 11 shows that the MTHR-Face curve is higher than the other two between 0.02 and 0.1, which means the performance of the proposed method is significantly better than that of the other algorithms on the WFLW testset, and it shows definite robustness to extreme changes under different degrees of expression.

    Table 1: Comparison in terms of NME (lower is better), FR (lower is better) and AUC (higher is better) on the WFLW testset and its six subsets (98 landmarks)

    Figure 11: CED curves of the different algorithms on the WFLW database

    5.5 Animation Generation

    After collecting data from the 98 facial landmarks recognized by MTHR-Face, a Support Vector Machine (SVM) [26] is used to train the strength estimation model of the facial action units (AU) and extract the parameters [27], which yields the classification and regression of facial expression. Finally, the expression parameters are applied by establishing the mapping relationship between AU and Blendshape expression bases to realize facial expression capture and animation generation.

    Blendshape [28] is a linear combination of the natural and other facial shapes and controls the facial expression of a 3D model through semantic weights. Based on the division method of the Facial Action Coding System, a specific facial expression is represented by E:

$$E = E_0 + \sum_{i=1}^{18} e_i E_i \tag{13}$$

    where $E_0$ is the neutral expression base, $e_i$ is the weighted coefficient of the corresponding expression base $E_i$, and the facial expression module contains 18 facial action unit channels.
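    A small NumPy sketch of Eq. (13), under the common convention (an assumption here) that each expression base is stored as a vertex offset from the neutral face:

```python
import numpy as np

def blend_expression(e0, bases, weights):
    """E = E0 + sum_i e_i * E_i over 18 action-unit expression bases.

    e0:      (V, 3) neutral face vertices
    bases:   (18, V, 3) expression bases stored as offsets from e0
    weights: (18,) normalized AU intensities, typically in [0, 1]"""
    return e0 + np.tensordot(weights, bases, axes=1)
```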

    In the process of driving the expression animation, we first use Maya [29] to model the test virtual character in Fig. 12 and bind the facial skeleton, then design the BlendShape expression controller of the character model and create the corresponding AU shape expression bases on the virtual character's face. Secondly, the AU parameters extracted from real faces are normalized and mapped to the Morph Targets controller in the Unity3D engine. Finally, the mapping relationship is used to drive the facial expression changes of the virtual character to realize human-computer facial expression interaction [30].

    Figure 12: Real-time expression generation animation effects

    The MTHR-Face model is used to capture the changes of facial features in real-time, and the virtual character modeled in Maya is selected as the test target model. The facial expression captured in real-time is then mapped to generate synchronous facial expression animation. The animation effects driven by different emotions are shown in Fig. 12. While ensuring real-time performance, the average frame rate of the generated animation reaches 20 fps. It can be seen that MTHR-Face can accurately capture the user's facial expression and drive the virtual character's expression animation in real-time.

    6 Conclusion

    The MTHR-Face model based on a two-stage neural network is proposed for facial expression capture. The method leverages the best advantages of the improved MTCNN and the HRNetV2-W18 model. Benefiting from the robust face detection of the improved MTCNN, the cropped images are input into the HRNetV2-W18 model to obtain 98 high-resolution facial landmarks, which not only effectively maintains the spatial information of the face, but also recognizes facial landmarks more accurately under different levels of expression and complex backgrounds. Finally, based on the movement of the recognized landmarks, the expression parameters are transmitted to the Unity3D engine to drive the virtual character's face and generate real-time expression animation. Experimental results show that the method achieves more outstanding performance than others in all metrics on the WFLW database and improves the accuracy of facial landmark recognition; in particular, compared with HRNetV2-W18 alone, the method's NME is reduced from 4.60% to 4.39%. MTHR-Face can accurately capture facial expression and generate animation. Future work will consist of training on other databases to verify MTHR-Face, and possibly of other representations and tasks, such as head pose estimation and face texture feature extraction, to enable rich emotional communication and interaction between user and avatar in shared virtual space.

    Acknowledgement: We would like to thank the anonymous reviewers for their valuable and helpful comments, which substantially improved this paper. We would also like to thank all of the editors for their professional advice and help.

    Funding Statement: This research was funded by the College Student Innovation and Entrepreneurship Training Program, Grant Numbers 2021055Z and S202110082031, and the Special Project for Cultivating Scientific and Technological Innovation Ability of College and Middle School Students in Hebei Province, Grant Number 2021H011404.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
