
    A Weakly-Supervised Crowd Density Estimation Method Based on Two-Stage Linear Feature Calibration

IEEE/CAA Journal of Automatica Sinica, April 2024

Yong-Chao Li, Rui-Sheng Jia, Ying-Xiang Hu, and Hong-Mei Sun

Abstract—In a crowd density estimation dataset, annotating crowd locations is an extremely laborious task, and the location annotations are not even used by the evaluation metrics. In this paper, we aim to reduce the annotation cost of crowd datasets and propose a crowd density estimation method based on weakly-supervised learning which, in the absence of crowd position supervision, regresses the crowd count directly, using only the number of pedestrians in the image as supervised information. For this purpose, we design a new training method that exploits the correlation between global and local image features through incremental learning. Specifically, we design a parent-child network (PC-Net) focusing on the global and the local image respectively, and propose a linear feature calibration structure to train the PC-Net jointly: the child network learns feature transfer factors and feature bias weights and uses them to linearly calibrate the features extracted by the parent network, improving the convergence of the network with local features hidden in the crowd images. In addition, we use a pyramid vision transformer as the backbone of the PC-Net to extract crowd features at different levels, and design a global-local feature loss function (L2), which we combine with a crowd counting loss (LC) to enhance the sensitivity of the network to crowd features during training; this effectively improves the accuracy of crowd density estimation. The experimental results show that the PC-Net significantly reduces the gap between fully-supervised and weakly-supervised crowd density estimation, and outperforms the comparison methods on five datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, UCF_QNRF and JHU-CROWD++.

I. INTRODUCTION

WITH the increase of the global population and of human social activities, large crowds often gather in public places, which brings huge hidden dangers to public safety. Therefore, accurately estimating crowd density has become an important research topic in the field of public safety. To train a robust and reliable network for accurate crowd density estimation, most existing methods use fully-supervised or semi-supervised training, in which the network model is trained on ground truth generated by manual annotation. This requires substantial manpower, material, and financial resources, and in large-scale dense crowd images, interference factors such as low resolution, object occlusion, and scale changes make it difficult to label each pedestrian in the crowd. Therefore, how to trade off crowd density estimation accuracy against dataset labeling cost, saving labeling cost without losing counting accuracy, has become a challenge.

Crowd density estimation methods obtain the number of people by extracting crowd information from the image. Existing training methods mainly include fully-supervised methods [1]-[25] and semi-supervised methods [26]-[39]. Fully-supervised methods obtain the ground truth by manually labeling each pedestrian in the image and then train the network model on this ground truth; although they show high performance in crowd density estimation, they require significant manpower, material, and financial resources for labeling. The ground truth for semi-supervised methods comes in two forms: all pedestrians are marked in some images, or some pedestrians are marked in all images. These methods approach fully-supervised performance and show good robustness, but they still need crowd labels, and the training process is very cumbersome. Moreover, both fully-supervised and semi-supervised methods face the limitation of the dataset: when the crowd distribution changes, for example through a change in shooting perspective or in the spatial distribution of the crowd, the ground truth obtained under the current labeling scheme must be re-labeled, and the labeled ground truth is not used to evaluate counting performance during testing. This means that per-pedestrian ground truth is redundant. To reduce the cost of manual labeling, weakly-supervised training methods have been proposed; their main difference from fully-supervised and semi-supervised methods is that weakly-supervised methods require no manual annotation of crowd location information at all, whereas fully-supervised and semi-supervised methods require annotation of all or part of it. In fact, without the demand for locations, crowd numbers can be obtained in other, more economical ways. For instance, for an already collected dataset, the crowd number can be obtained by gathering environmental information, e.g., detecting disturbances in spaces or estimating the number of moving people. Chan et al. [40] segment the scene by crowd motion and estimate the crowd number from the area of the segmented regions. To collect a novel counting dataset, sensor technology can be employed to obtain the crowd number in constrained scenes, such as mobile crowd sensing [41]. Moreover, Sheng et al. [42] propose a GPS-less, energy-efficient sensing scheduling to acquire the crowd number more economically. On the other hand, several approaches [43]-[46] show that there is no tight bond between the crowd number and the locations in the estimated results. The weakly-supervised labels in this paper are all obtained from already collected datasets, using only the crowd count and dropping the location information.

However, although such weakly-supervised methods save dataset labeling cost, an ensuing problem is that, lacking crowd location information as a training label, the network does not know the characteristics of pedestrians at the beginning of training and learns them only after several iterations. This reduces the network's sensitivity to crowd features, makes its convergence very slow, and substantially weakens its ability to fit features, which affects the accuracy of crowd density estimation. Therefore, a weakly-supervised approach that simply removes the crowd location information saves the cost of labeling the dataset but limits the performance of the network, and does not fundamentally solve the problem.

To solve the above problem, inspired by optimal iterative learning control methods [47]-[49], reaction-diffusion neural networks [50], and latent factor analysis models [51]-[54], we reconsider the training approach of the crowd density estimation model while still using weakly-supervised data labels, i.e., only the number of pedestrians in the image as supervision. To compensate for the missing crowd location information and to improve the convergence speed and feature fitting ability of the network, we design a novel and effective training method: a parent-child network with the same parameters learns different features in the crowd, and a linear transformation corrects the location information of the features extracted by the parent network using hidden features learned by the child network, accelerating the network's adaptation to the features. Our training method significantly improves the convergence speed of the network, and its performance and counting accuracy are close to those of fully-supervised methods. Since the parent network has the same parameters as the child network, the increase in the number of parameters of the parent-child network over the parent network alone is very small, and is well within the acceptable range given the improved performance. Accordingly, this paper designs a crowd density estimation method based on weakly-supervised learning, which trains the network by correlating global and local image features to improve the performance of the network model. The main contributions of this paper are as follows:

1) We design a weakly-supervised crowd density estimation method that uses only the crowd count as supervised information, without location labels. It omits the manual location labeling work without losing crowd density estimation performance, and greatly reduces the cost of network training compared to existing fully-supervised methods.

2) We design a novel and effective training approach: a parent-child network trained by incremental learning, in which a linear feature calibration structure uses transfer factors and bias weights to enhance the network's adaptability to hidden features. It improves the performance of weakly-supervised learning methods, and we verify its effectiveness on this task.

3) We design a loss function that adds the error between parent network features and child network features (L2) to the error between the ground truth and predicted counts (LC), and use gradient descent to optimize the features extracted by the parent-child network, accelerating the convergence of training and improving the accuracy of crowd density estimation.

II. RELATED WORK

1) Fully-Supervised/Semi-Supervised Crowd Density Estimation Methods: With the development of big data, machine learning, and convolutional neural networks [55]-[61], a large number of convolutional neural network (CNN)-based crowd density estimation methods have been proposed. Basic CNNs were the first to be applied to crowd density estimation, e.g., CNN-boosting [1] and Wang et al. [2]; these networks use basic CNN layers (convolutional, pooling, and fully connected layers) and require no additional feature information, which makes them simple and easy to implement, but their crowd estimation accuracy is low. Multi-column CNNs were subsequently widely used, such as MCNN [3], MBTTBF [4], Multi-scale-CNN [5], CP-CNN [6], and DADNet [7]; these networks usually use different columns to capture multi-scale information, but the information captured by different columns is redundant and wastes many training resources. To solve this redundancy, single-column CNNs were applied to crowd density estimation, such as CSRNet [8], SANet [9], SPN [10], CMSM [11], TEDnet [12], and IA-MFFCN [13]. These networks deploy a single, deeper CNN instead of the bloated multi-column architecture, do not increase the complexity of the network, and train more efficiently, so they have received extensive attention. However, with the development of density map-based methods, background noise in the image seriously obscures the detailed information of the crowd distribution, and filtering out background noise to highlight the crowd location information has become a challenge.

Therefore, attention mechanisms have been widely introduced into crowd density estimation tasks; they supplement the features extracted by the backbone or head network by encoding distant dependencies or heterogeneous interactions to highlight head positions. ADCrowdNet designs an attention map generation structure [14]; the attentional neural field (ANF) uses local and global self-attention to capture long-range dependencies [15]; the attention guided feature pyramid network (AP-FPN) adaptively combines high-level and low-level features to generate high-quality density maps with accurate spatial location information [16]; the multi-scale feature pyramid network (MFP-Net) designs a feature pyramid fusion module using convolution kernels of different depths and scales [17], expanding the receptive field of the CNN and improving training speed; PDANet uses a feature pyramid to extract crowd features of different scales to improve counting accuracy [18]; and SPN uses a scale pyramid network to effectively capture multi-scale crowd characteristics [10] and obtain more comprehensive crowd feature information. Meanwhile, researchers have attempted to transfer Transformer models from natural language processing to crowd density estimation [19]-[23], [62]-[66]. The Transformer uses self-attention to capture global dependencies between input and output; its advantage is that it is not limited to local interactions, can mine long-distance dependencies, and can compute in parallel, learning the most appropriate inductive bias for the task objective. It thereby captures the global context of the image and models dependencies between global features, which addresses the limited receptive field of CNNs well, especially under the uneven scales present in dense crowds. In 2020, Dosovitskiy et al. [19] proposed the vision transformer (ViT), an image classification method based entirely on the self-attention mechanism and the first work to replace convolution with a Transformer. In 2021, Sun et al. [24] demonstrated the importance of global contextual information for crowd density estimation. In 2021, TDCrowd combined ViT and density maps to estimate the number of people in the crowd [25], which mitigates background noise interference and improves the accuracy of crowd density estimation.

    However, the aforementioned CNN or ViT methods require a large number of labels for training, and labeling the crowd density estimation dataset is a laborious task.

2) Weakly-Supervised Crowd Density Estimation Methods: To reduce the cost of labeling datasets, some weakly-supervised crowd density estimation methods have been developed. These methods need no crowd location labels; image-level count labels serve as the weakly-supervised training signal. In 2016, Borstel et al. [37] proposed a weakly-supervised density estimation method based on Gaussian processes, using the number of objects as the training label; however, this method partitions the image so that different partitions may repeat the same target, causing the estimated count to exceed the actual number. In 2019, Ma et al. [38] proposed a weakly-supervised density estimation method using a Bayesian loss, which computes expectations from the probability density map estimated by the network and regresses the crowd count, improving counting efficiency in the weakly-supervised setting. In 2019, Sam et al. [36] designed an autoencoder to train the network in a weakly-supervised way, updating only a small number of parameters during training, in an attempt to achieve nearly unsupervised crowd density estimation. In 2020, Yang et al. [39] proposed a network based on soft-label ranking, which adds supervision of crowd size to the original crowd density estimation network. In 2020, Sam et al. [29] proposed a weakly-supervised training method that matches statistics of the label distribution without using image-level location labels. To ease the overfitting problem, in 2019, Wang et al. [27] explored the generation of synthetic crowd images to reduce the annotation burden and alleviate overfitting. With the application of ViT to crowd density estimation, in 2021, TransCrowd applied ViT to this task for the first time [21] and proposed a weakly-supervised counting method that greatly improved accuracy in the weakly-supervised mode, but its simple model structure limited feature extraction.

Compared with previous weakly-supervised methods, we propose a weakly-supervised method based on linear calibration of parent-child network features, which effectively reduces labeling cost during training while maintaining state-of-the-art performance, achieving an optimal trade-off between crowd density estimation accuracy and dataset labeling cost.

III. PROPOSED METHOD

    A. Overview of the Network Architecture

To improve the convergence speed of the network under weakly-supervised training, we propose a parent-child network (PC-Net). It exploits the correlation between global and local features in images to enhance the network's ability to fit features, by incrementally learning and continuously linearly correcting the extracted features. The proposed PC-Net structure is shown in Fig. 1; PC-Net achieves a good balance between accuracy and training cost. Specifically, PC-Net is divided into two parts, the Parent network and the Child network, which share the same backbone. We design a pyramid vision transformer as the feature extraction backbone to extract crowd features at different levels. During training, the Parent network learns crowd features from global images, while the Child network learns feature transfer factors and feature bias weights from local images. The crowd features learned by the Parent network are then corrected by a linear calibration structure to obtain a feature map containing richer and more accurate global contextual information. Meanwhile, the Parent and Child networks are updated with the learned weights by gradient descent using different losses to improve the accuracy of crowd density estimation. Finally, a 1 × 1 convolutional layer outputs the final density map. In the following sections, we describe our framework in detail.

    B. Backbone Network

In PC-Net, the backbone is divided into two parts, Parent-Net and Child-Net. To enable incremental learning and linear correction of the crowd features, Parent-Net and Child-Net have the same network structure. To handle the scale variability in crowd images, a pyramid vision transformer feature extraction backbone is designed in this paper to extract crowd features at different levels, as shown in Fig. 1, while multi-scale windows restrict the computation of the vision transformer's self-attention to non-overlapping local regions, which improves computational efficiency. Since the vision transformer cannot directly process 2D images, an image preprocessing step converts 2D images into 1D sequences of image blocks before they are input to the pyramid vision transformer. The image preprocessing process and the structure of the pyramid vision transformer are described below.

    1) Image Partition

Before the image is input into the pyramid vision transformer, the 2D image is converted into a 1D sequence of image blocks. To improve computational efficiency, the input image is divided into N × N fixed windows, the image in each window is divided into fixed-size image blocks, and the self-attention calculation is performed in each window, as shown in Fig. 2.
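As an illustration of this partition step, the following PyTorch sketch splits an image into N × N windows and then into per-window patch sequences; the function name and tensor layout are our own assumptions for illustration, not the authors' code:

    import torch

    def window_patch_partition(img, num_windows, patch_size):
        """Split a (C, H, W) image into num_windows x num_windows windows, then cut
        each window into patch_size x patch_size blocks and flatten them into one
        token sequence per window (illustrative sketch)."""
        # H and W are assumed divisible by num_windows * patch_size
        c, h, w = img.shape
        win_h, win_w = h // num_windows, w // num_windows
        windows = img.unfold(1, win_h, win_h).unfold(2, win_w, win_w)          # (C, N, N, win_h, win_w)
        windows = windows.permute(1, 2, 0, 3, 4).reshape(-1, c, win_h, win_w)  # (N*N, C, win_h, win_w)
        patches = windows.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
        n_win, _, ph, pw, _, _ = patches.shape
        # (N*N, C, ph, pw, p, p) -> (N*N, ph*pw, C*p*p): one token sequence per window
        tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(n_win, ph * pw, -1)
        return tokens

    tokens = window_patch_partition(torch.randn(3, 768, 768), num_windows=4, patch_size=4)
    print(tokens.shape)  # torch.Size([16, 2304, 48]): 16 windows, 2304 tokens of length 48 each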

2) Pyramid Vision Transformer

Fig. 2. The process of the image partition.

Fig. 3. The structure of the pyramid vision transformer. The feature map of each layer is first partitioned to convert the 2D image into a 1D sequence, and the processed 1D sequence is then reshaped to generate 2D features.

When extracting multi-scale crowd features, a multi-layer pyramid vision transformer structure is used. Between layers, the scale of the feature map is controlled by a progressive shrinkage strategy. Simultaneously, the multi-scale window scheme restricts the self-attention calculation to non-overlapping local windows and expands the window layer by layer through cross-window connections, which improves computational efficiency. This paper designs a three-layer transformer-encoder structure, as shown in Fig. 3.

Specifically, the size of the input image is H × W × 3, the size of the output feature map F_i after Layer i is H_i × W_i × C_i, and the size of the image patch in Layer i is K_i × K_i × 3, where K_1 = 4, K_2 = 2, K_3 = 2. The number of windows in Layer i is N_i × N_i, where N_1 = 4, N_2 = 2, N_3 = 1, and each window of Layer i contains a fixed number of image patches. The image patches are linearly projected into a 1D sequence with embedded position information, and after the transformer-encoder extracts features, the feature sequence is rearranged back into a feature map, where C_i is less than C_{i-1}. The transformer-encoder of Layer i includes L_i layers of the twin multi-head self-attention mechanism (TMSA) and a multilayer perceptron (MLP), where L_1 = 2, L_2 = 6, L_3 = 2, and each layer is processed with layer normalization (LN) and residual connections. Before TMSA and MLP, LN normalizes the feature sequence, which makes the training process more stable and effectively avoids gradient vanishing or explosion. Residual connections are used after TMSA and MLP, superimposing the features processed by TMSA and MLP onto the features before processing, to avoid the degradation of matrix weights in the network. The calculation process is as follows:
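With z^(l−1) denoting the token sequence entering layer l, a reconstruction in the standard pre-norm form described above (the source equations are rendered as images and not reproduced here) is

    ẑ^l = TMSA(LN(z^(l−1))) + z^(l−1)
    z^l = MLP(LN(ẑ^l)) + ẑ^l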

The MSA contains m self-attention (SA) modules. In each independent SA, given the input sequence, the query (Q), key (K) and value (V) of the sequence are calculated as follows:
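A reconstruction in the standard scaled dot-product form, where X is the input sequence and the scaling by √d_k (d_k being the key dimension) is an assumption carried over from the original transformer:

    Q = X·W_Q,  K = X·W_K,  V = X·W_V
    SA(X) = softmax(Q·Kᵀ / √d_k)·V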

In the formula, W_Q, W_K and W_V are learnable matrices, and the outputs of the m self-attention modules are connected in series, which can be expressed as
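That is, in the usual multi-head form, where W_O is an output projection matrix (an assumption of this reconstruction):

    MSA(X) = Concat(SA_1(X), ..., SA_m(X))·W_O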

The MLP contains two linear layers with the Gaussian error linear unit (GELU) activation function. This paper uses the GELU activation function of a standard normal distribution, as shown in (8).
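For the standard normal cumulative distribution function Φ, (8) is

    GELU(x) = x·Φ(x) = (x/2)·(1 + erf(x/√2))    (8)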

    The first linear layer expands the dimension of the feature sequence from D to 4D, and the second linear layer shrinks the dimension of the feature sequence from 4D to D.

    C. Linear Feature Calibration

In order to improve the convergence speed and feature fitting ability of the weakly-supervised crowd counting method during training, we propose a linear feature calibration structure. To achieve feature calibration and transfer between Parent-Net and Child-Net, we consider the feature parameters of Parent-Net and Child-Net to belong to the same linear space V^n (n represents the number of channels of the features). Each channel feature in Child-Net can be transferred from the corresponding channel feature in Parent-Net by a linear transformation. Fig. 4 shows how the Child-Net feature parameters are transferred from Parent-Net by linear calibration.

Fig. 4. The process of the linear feature calibration.

In Fig. 4, we define the channel features in Parent-Net as F_P ∈ R^(h×w×n) (h, w, n represent the length, width, and number of channels of the features, respectively), the feature transfer factors as α ∈ R^(1×1×n), and the feature bias weights as β ∈ R^(1×1×n), so the process of linear feature calibration can be expressed as
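With ⊗ denoting channel-wise multiplication broadcast over the h × w spatial positions (a reconstruction consistent with Fig. 4):

    F_C = α ⊗ F_P + β    (9)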

    D. Loss Function

In order to further strengthen the method proposed in this paper, we make full use of the correlation between the local and global crowd feature information to train the network and improve the accuracy of crowd density estimation. A comprehensive loss function is designed, consisting of the LC loss and the L2 loss, as shown in (10).
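The exact form of (10) is not reproduced here; one plausible weighting, assuming a single loss weight λ balances the two terms (the study in Section V-E varies this weight and finds 0.6 optimal), is

    Loss = λ·L_C + (1 − λ)·L_2    (10)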

In the formula, L_C is the counting loss between the number of people estimated by PC-Net and the ground truth, and L_2 is the MSE loss between the density map predicted by PC-Net and the density map predicted by Parent-Net. The L_C counting loss can be expressed as
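A plausible reconstruction of (11), assuming an L1-style counting error averaged over the training set, is

    L_C = (1/N) Σ_{i=1}^{N} |F_Y(X_i, θ) − Y_i|    (11)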

In the formula, N denotes the number of images in the training set, F_Y(X_i, θ) denotes the estimated number of people in image X_i (i = 1, ..., N), θ denotes a set of parameters that can be learned, and Y_i denotes the true number of people in image X_i. The L_2 loss can be expressed as
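Consistent with the MSE definition above, (12) can be reconstructed as

    L_2 = (1/N) Σ_{i=1}^{N} ||Z(X_i, θ) − Z_P(X_i, θ)||²    (12)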

In the formula, N denotes the number of images in the training set, X_i represents the i-th input image, θ denotes a set of parameters that can be learned, Z(X_i, θ) denotes the prediction result of PC-Net, and Z_P(X_i, θ) denotes the prediction result of Parent-Net.

    E. Crowd Density Map Generation

The crowd features extracted by PC-Net contain the location information of each pedestrian. We use the focal inverse distance transform (FIDT) [67] to process the features and generate a visualized crowd density map. Specifically, if there are Z pedestrian feature points in an image, the following processing is performed on the feature maps:
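Following the FIDT formulation [67], (13) computes the nearest-neighbor distance between feature points:

    P(x, y) = min_{(x′,y′)∈Z, (x′,y′)≠(x,y)} √((x − x′)² + (y − y′)²)    (13)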

In (13), Z denotes the set of all crowd feature points, and for any feature point (x, y), the Euclidean distance P(x, y) to its nearest feature point (x′, y′) is calculated. Since the distance between feature points varies greatly, it is difficult to perform distance regression directly, so an inverse function is used for regression, as shown in (14), where I is the processing result of FIDT, C is an additional constant, usually set to 1, to avoid division by zero, and P(x, y) is exponentially processed to slow down the decay of the crowd head information; I is displayed visually to generate the crowd density map. Finally, the predicted crowd density values are obtained by 2D integration and summation of the generated density maps. In the experiments, A = 0.02 and B = 0.75 were set.
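Written out with these constants, the mapping of (14) is (a reconstruction following the FIDT transform in [67]):

    I = 1 / (P(x, y)^(A·P(x,y) + B) + C)    (14)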

IV. EXPERIMENTS

    A. Training Process

In the training phase, one iteration updates the parameters of two models. As shown in Fig. 1, the data are first fed into Parent-Net for training, and the global feature F_P is optimized using the gradient descent method, as follows:
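A plausible reconstruction of this update, with the iteration index omitted:

    F_P ← F_P − ε·∂L_C/∂F_P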

In the formula, ε denotes the learning rate of Parent-Net, and L_C is the counting loss between the crowd number estimated by Parent-Net and the ground truth. Second, we use the linear feature calibration structure to transfer F_P channel-by-channel into Child-Net to obtain F_C, as shown in (9). Since the transfer factor α and the bias weight β used in linear feature calibration need to be learned by Child-Net, we feed the local image data into Child-Net and optimize F_C with the gradient descent method, as follows:
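Correspondingly for the child branch (again a plausible reconstruction):

    F_C ← F_C − μ·∂Loss/∂F_C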

In the formula, μ denotes the learning rate of Child-Net, and Loss is the value of the comprehensive loss function designed in this paper. In the testing phase, we use the best-performing model on the test set to make inferences.
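To make the two-stage update concrete, the following is a minimal per-iteration sketch in PyTorch. The module names (parent_net, child_net, head) and the loss weighting are hypothetical stand-ins for the components of Fig. 1, not the authors' code:

    import torch

    def train_step(parent_net, child_net, head, global_img, local_img, count_gt,
                   opt_parent, opt_child, lam=0.6):
        """One PC-Net training iteration (illustrative sketch; names are hypothetical)."""
        # Stage 1: Parent-Net on the global image, updated with the counting loss L_C.
        f_p = parent_net(global_img)                       # global crowd features
        parent_density = head(f_p)                         # (B, 1, H, W) density map
        l_c = (parent_density.sum(dim=(1, 2, 3)) - count_gt).abs().mean()
        opt_parent.zero_grad(); l_c.backward(); opt_parent.step()

        # Stage 2: Child-Net learns (alpha, beta) from the local image and linearly
        # calibrates the (detached) parent features: F_C = alpha * F_P + beta.
        alpha, beta = child_net(local_img)                 # each (B, n, 1, 1)
        f_c = alpha * f_p.detach() + beta
        pc_density = head(f_c)
        l_c2 = (pc_density.sum(dim=(1, 2, 3)) - count_gt).abs().mean()
        l_2 = torch.mean((pc_density - parent_density.detach()) ** 2)
        loss = lam * l_c2 + (1 - lam) * l_2                # assumed weighting (Section V-E)
        opt_child.zero_grad(); loss.backward(); opt_child.step()
        return l_c.item(), loss.item()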

    B. Training Hyper-Parameter Settings

During training we use the Adam optimizer, the Batch_size is set to 16, and the learning rates ε of Parent-Net and μ of Child-Net are initialized to 0.0001 and multiplied by 0.5 every 50 epochs. The GELU function is used as the activation function to improve the training speed and effectively avoid gradient vanishing and explosion. We use an l2 regularization weight of 0.0001 to avoid over-fitting. Since the images in the dataset have different resolutions, all images are resized to 768 × 768. The experimental environment is shown in Table I.
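For reference, these settings can be written down directly in PyTorch; the stand-in model, batch count, and epoch budget below are assumptions of the sketch:

    import torch

    # Stand-in model; in the paper this would be PC-Net (assumption for the sketch).
    model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

    # Adam optimizer, initial learning rate 1e-4, l2 regularization (weight_decay) 1e-4.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
    # Multiply the learning rate by 0.5 every 50 epochs, as stated above.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

    for epoch in range(200):                       # epoch budget is an assumption
        for _ in range(4):                         # stands in for the training batches
            images = torch.randn(16, 3, 768, 768)  # Batch_size 16, images resized to 768 x 768
            loss = model(images).mean()            # placeholder loss for illustration
            optimizer.zero_grad(); loss.backward(); optimizer.step()
        scheduler.step()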

TABLE I EXPERIMENTAL ENVIRONMENT (SYSTEM, FRAMEWORK, LANGUAGE, CPU, GPU AND RAM)

    C. Datasets

In this work, extensive experiments are conducted on five crowd datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, UCF_QNRF and JHU-CROWD++. Unlike fully-supervised methods, only count-level labels are used as supervision information in the training process. A representative crowd image from each dataset is shown in Fig. 5. The crowd images in each dataset exhibit different degrees of uneven crowd scale variation.

1) ShanghaiTech [3]: It has 1198 crowd images with a total of 330 165 people. The dataset contains two parts, A and B. Part A includes 482 highly crowded images, of which 300 form the training dataset and the remaining 182 form the testing dataset; Part B includes 716 relatively sparse crowd images, of which 400 form the training dataset and the remaining 316 form the testing dataset.

2) UCF_CC_50 [68]: It has 50 crowd images with different resolutions and different viewing angles. The number of pedestrians per image varies from 94 to 4543, with an average of 1280 per image. Due to the limited number of images in this dataset and the large span of crowd counts, five-fold cross-validation is used on this dataset.

3) UCF_QNRF [69]: It has 1535 crowd images with a total of 1 251 642 people, of which 1201 images form the training sample set and the remaining 334 form the test sample set. The number of pedestrians per image varies from 49 to 12 865, with an average of 815 per image.

Fig. 5. Crowd images from five crowd datasets. (a) From the ShanghaiTech Part A dataset; (b) From the ShanghaiTech Part B dataset; (c) From the UCF_CC_50 dataset; (d) From the UCF_QNRF dataset; (e) From the JHU-CROWD++ dataset.

4) JHU-CROWD++ [70]: It is an unconstrained dataset with 4372 images collected under various weather conditions such as rain and snow, containing 2722 training images, 500 validation images, and 1600 testing images. The dataset contains 1.5 million annotations at both the image level and the head level. The number of people per image ranges from 0 to 25 791.

    D. Evaluation Metric

In this paper, we use mean absolute error (MAE), mean squared error (MSE), and mean absolute percentage error (MAPE) as evaluation metrics for PC-Net performance. MAE is the average absolute difference between the target and estimated counts, i.e., the average L1 loss; its value is not dominated by outliers, making it a robust measure of algorithm performance. MSE is the average squared difference between the target and estimated counts, i.e., the average L2 loss, which penalizes larger errors; it magnifies the effect of large errors, making it easier to distinguish models that produce them. MAPE measures the relative error between the estimated and actual values, which makes it easier to compare the variability of algorithms across datasets; expressed as a percentage, it is practical, intuitive, and easy to interpret. MAPE also avoids the "mean squared error inflation" problem that MSE suffers when outliers exist in the dataset, since their impact on MAPE is smaller. In summary, MAE, MSE, and MAPE are chosen to evaluate the algorithm in this paper, which can well demonstrate both the robustness and the accuracy of PC-Net. The calculations are as follows:
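These follow the standard crowd-counting definitions (in this literature, MSE is conventionally reported as a root mean squared error; that convention is assumed here):

    MAE = (1/N) Σ_{i=1}^{N} |C_i − Ĉ_i|
    MSE = √((1/N) Σ_{i=1}^{N} (C_i − Ĉ_i)²)
    MAPE = (100%/N) Σ_{i=1}^{N} |C_i − Ĉ_i| / C_i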

In the formula, N represents the number of test images, C_i represents the actual number of people in the i-th image, and Ĉ_i represents the estimated number of people in the i-th image. Smaller values of MAE, MSE and MAPE mean a smaller error between the estimated and actual number of people, indicating a better experimental result.

    E. Experiment 1: Comparisons With State-of-the-Art Methods

The ShanghaiTech dataset is a crowded, multi-scale dataset used to verify the counting performance of PC-Net. Experiments are performed on this dataset and compared with state-of-the-art methods; the MAE, MSE and MAPE results are given in Table II. The UCF_CC_50 dataset includes 50 grayscale images with different resolutions and viewing angles; it is a very challenging dataset with varied crowd scenes and a limited total number of images. Therefore, five-fold cross-validation is performed to maximize the use of samples: the dataset is randomly divided into 5 equal parts of 10 images each, four parts are used as the training dataset and the remaining one as the testing dataset, and a total of five trainings and testings are performed. The average error is taken as the final experimental result and compared with state-of-the-art methods; the MAE, MSE and MAPE results are given in Table II. The UCF_QNRF dataset is also a crowded, multi-scale dataset, collected from three different sources and including various scenes around the world; its total number of images and total number of people far exceed the first three datasets. JHU-CROWD++ is a very large dataset containing crowd images under various complex weather conditions. For both, comparisons with state-of-the-art methods in terms of MAE, MSE and MAPE are given in Table II.

1) Performance on the ShanghaiTech Dataset: In this paper, PC-Net is compared with state-of-the-art methods, and the results are shown in Table II, where we divide these methods into two groups. The first group is the fully-supervised methods, which use both location information and population number information as supervision. The second group is the weakly-supervised methods, which use only population number information as supervision. According to Table II, PC-Net is very competitive with the first group: although its MAE, MSE, and MAPE are not optimal, they surpass most of the fully-supervised methods, such as GL and LW-Count. PC-Net largely closes the counting performance gap between weakly-supervised and fully-supervised methods, and its labeling cost is much lower than that of fully-supervised methods. The advantage of PC-Net over the second group is more obvious, as its MAE, MSE and MAPE are better than those of existing weakly-supervised methods: on Part A, MAE, MSE and MAPE improve by 11.2%, 14.8% and 12.5%, respectively, and on Part B by 21.5%, 35.4% and 22.2%. This demonstrates that PC-Net, trained with linear feature calibration, achieves the best density estimation performance under a weakly-supervised training mode. Figs. 6(a) and 6(b) show some visualization results of PC-Net on the Part A and Part B datasets.

TABLE II COMPARISON OF PC-NET AND THE STATE-OF-THE-ART METHODS ON THE SHANGHAITECH, UCF_CC_50, UCF_QNRF AND JHU-CROWD++ DATASETS. L DENOTES THAT THE TRAINING LABEL CONTAINS LOCATION INFORMATION, AND C DENOTES THAT THE TRAINING LABEL CONTAINS POPULATION NUMBER INFORMATION. RED AND BLUE INDICATE THE FIRST- AND SECOND-BEST PERFORMANCES, RESPECTIVELY

It can be seen that PC-Net performs well on both datasets, generating accurately distributed, high-resolution density maps whose predictions are close to the true values. Comparing Figs. 6(a) and 6(b), the ShanghaiTech Part A dataset is extremely crowded with little change in crowd scale, while the ShanghaiTech Part B dataset is relatively sparse with large changes in crowd scale, which indicates that PC-Net fits different degrees of crowd scale change well. The third column of Fig. 6 gives the heat map of the Parent-Net output, with red boxes marking obvious misidentifications or omissions. Extracting crowd features using only Parent-Net easily produces misidentifications, whereas the crowd feature correction and transfer process corrects the location information of the crowd well, further compensating for the lack of crowd location information under the weakly-supervised counting method and further improving the accuracy of the crowd counting.

2) Performance on the UCF_CC_50 Dataset: According to Table II, under weakly-supervised training, PC-Net outperforms the other weakly-supervised methods in the second group on the UCF_CC_50 dataset, with MAE, MSE and MAPE improving by 38.8%, 43.7% and 46.3%, respectively, which proves the superiority of PC-Net. However, compared with the first group, PC-Net has obvious shortcomings, probably because the data in this dataset are limited and the crowd counts span a large range. The prediction results are not stable enough, and a small number of images have large errors, which degrades the performance of the method. Fig. 7 shows some visualization results of PC-Net on the UCF_CC_50 dataset.

Fig. 6. Visualization results of the density maps on (a) ShanghaiTech Part A and (b) ShanghaiTech Part B, respectively.

The second column of Fig. 7 gives the crowd density maps generated by PC-Net. It can be seen that PC-Net makes good predictions and generates accurate density maps in crowded scenes with variable scales, and the generated density maps show different sparsity for crowds of different scales; however, the estimates have some error relative to the real values, as in the first set of images, which are among the few images with large errors in our tests. The low brightness of the image may be affecting the counting performance of the network. To further evaluate the visualized crowd density images, we manually labeled several samples containing crowd locations and visualized the crowd positions, as shown in the third column of Fig. 7. An additional set of evaluation metrics, structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), was also used to compare the generated crowd density maps with the labeled density maps, compensating for the shortcomings of one-dimensional metrics such as MAE and MSE. The experimental results show that PC-Net fits the location information of the crowd well; although there are some location errors, they are within an acceptable range. To summarize, PC-Net's counting performance is slightly insufficient for extremely crowded scenes, so more data are needed for training to improve the accuracy of the model on extremely crowded datasets.

Fig. 7. Visualization results of the density maps on UCF_CC_50.

3) Performance on the UCF_QNRF Dataset: According to Table II, compared with the second group of methods in the weakly-supervised mode, the MAE, MSE and MAPE of PC-Net improve by 12.8%, 11.6% and 13.4%, respectively, indicating a significant improvement in prediction. PC-Net achieves optimal counting accuracy on this dataset and shows excellent robustness. Compared with the first group of methods, PC-Net also outperforms some fully-supervised training methods, such as L2R and TEDnet, further narrowing the counting performance gap between weakly-supervised and fully-supervised training; against some of the most advanced crowd density estimation methods its performance is slightly worse, but PC-Net greatly reduces the labeling cost of the dataset. Fig. 8 shows some visualization results of PC-Net on the UCF_QNRF dataset.

Fig. 8. Visualization results of the density maps on UCF_QNRF.

It can be seen in the first image of Fig. 8 that PC-Net fits crowds of different scales well and generates an accurate, high-resolution density map, reflecting its ability to handle drastic changes in crowd scale. PC-Net also generates an accurate density map for the denser crowd in the second image, but there is a certain error between the estimated and real values; this is one of the few images with large errors in the tests of this paper's method, probably because lighting differences interfere with counting accuracy. More training is needed in the next step to improve the robustness of the model and exclude large errors.

4) Performance on the JHU-CROWD++ Dataset: According to Table II, PC-Net has a clear advantage over both the first and second groups of methods; it is superior to weakly-supervised methods such as the advanced TransCrowd. In addition, compared with fully-supervised methods such as MCNN and CSRNet, the counting accuracy of PC-Net improves significantly on this dataset, and MAE, MSE and MAPE all achieve the second-best performance, which proves the effectiveness of our method. Fig. 9 shows some visualization results of PC-Net on the JHU-CROWD++ dataset, including crowd density maps on rainy and snowy days. It can be seen that PC-Net processes crowd images well under deteriorating weather conditions.

    F. Experiment 2: Actual Experiment

In order to test the performance of PC-Net in practical applications, we conducted experiments in several real scenarios. To ensure the applicability and universality of the experiments, images taken by cameras on campuses, in subway stations and on city roads were randomly selected as the test set. The test set contains a total of 400 images covering more than 10 scenes, each containing between 0 and 2000 people, all at a resolution of 768 × 768; these data generally exhibit uneven scales, background noise and other common factors that affect the accuracy of crowd density estimation. We conducted multiple groups of experiments and took the average value as the test result; the experimental results are shown in Table III. Fig. 10 shows some visualization results of the actual experiment.

Fig. 9. Visualization results of the density maps on JHU-CROWD++.

    TABLE III COMPARISON OF PC-NET AND THE OTHER METHODS ON THE RANDOM DATASET

Fig. 10. Visualization results of the density maps of the actual experiment.

It can be seen that PC-Net still outperforms the compared algorithms on this unfamiliar dataset, obtaining the best MAE, MSE, and MAPE. We randomly selected visualization results from four scenes; PC-Net shows some adaptability to scenes it has never seen before, generating accurate, high-resolution crowd density maps whose predicted densities are within an acceptable error range of the real densities. However, the multi-scene test reveals that PC-Net's transfer across multiple scenes is slightly insufficient: in the third and fourth groups of images, the crowd density error is noticeably larger, mainly because PC-Net adapts poorly to those scenes. Therefore, PC-Net needs more training samples and tests in multiple scenes to adjust its model parameters and increase its adaptability to multiple scenes.

V. DISCUSSION

    A. Study of Training Hyper-Parameter Settings

In the training process of the network, the selection of the initial training hyper-parameters is crucial to the success of training. Good settings help avoid gradient vanishing or explosion, let the neural network learn the features of the data more quickly and accurately, and improve the training effect and generalization ability of the model. In order to determine the optimal initialization parameters, we studied the effects of different Batch_size values, learning rates, activation functions and optimizers on the performance of PC-Net on ShanghaiTech Part A. The experimental results are shown in Fig. 11.

Fig. 11. Results of the study of different initialization hyper-parameter settings. (a) MAE values for different Batch_size settings; (b) MAE values for different learning rates; (c) MAE values for different activation functions; (d) MAE values for different optimizers.

It can be seen that PC-Net is more sensitive to the Batch_size and learning rate during training. As the Batch_size increases, the parallel performance of the GPU is fully utilized, speeding up the training of the model; however, a larger Batch_size requires more memory, and may lead to overfitting because the model is more likely to memorize the training batches and thus fail to learn the overall features of the input data. On balance, we set the Batch_size to 16. Due to the complexity of the crowd density estimation task and the depth of PC-Net, we set a small initial learning rate to avoid unstable or divergent training; the experimental results show that optimal model performance is achieved when the initial learning rate is set to 0.0001. For the activation function and the optimizer, the experimental results show that PC-Net is less sensitive: we compared five activation functions (GELU, Sigmoid, ReLU, Tanh, Softmax) and three optimizers (SGD, Adam, Momentum), and PC-Net achieves optimal results with GELU as the activation function and Adam as the optimizer. In summary, we set the Batch_size to 16 and the initial learning rate to 0.0001, and use the GELU activation function and the Adam optimizer at the beginning of training.

    B. Study of Backbone Network

With CNN-based deep learning, the small receptive field of CNNs limits the range over which the network can extract global features. CNN-based methods are therefore good at extracting local crowd information over small regions, but insufficient for extracting global crowd information from the whole image, which makes it difficult for them to establish global context features. ViT, by contrast, captures long-range context dependencies and has a global receptive field, which remedies this deficiency of CNNs well. We computed effective receptive fields for both the VGG network and ViT. Specifically, we measure the effective receptive field of different layers as the absolute value of the gradient of the center location of the feature map with respect to the input. Results are averaged across all channels in each map for 16 randomly selected images, with results in Fig. 12.
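A minimal sketch of this measurement, assuming a generic feature extractor; the toy model at the end is a stand-in for illustration, not the VGG/ViT used in the paper:

    import torch

    def effective_receptive_field(model, img_size=224, n_images=16):
        """Average |d(center activation)/d(input)| over channels and random images
        (illustrative sketch of the measurement described above)."""
        erf = torch.zeros(img_size, img_size)
        for _ in range(n_images):
            x = torch.randn(1, 3, img_size, img_size, requires_grad=True)
            fmap = model(x)                            # (1, C, h, w) feature map of the probed layer
            h, w = fmap.shape[2], fmap.shape[3]
            center = fmap[0, :, h // 2, w // 2].sum()  # sum over channels at the center location
            center.backward()
            erf += x.grad.abs().mean(dim=1)[0]         # average gradient magnitude over input channels
        return erf / n_images

    # Toy two-layer CNN standing in for a VGG/ViT stage (assumption):
    toy = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
                              torch.nn.Conv2d(8, 8, 3, padding=1))
    print(effective_receptive_field(toy).shape)  # torch.Size([224, 224])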

Fig. 12. Visualization results of the effective receptive fields for VGG and ViT.

We observe that the lower-layer effective receptive fields of ViT are indeed larger than those of VGG; while VGG's effective receptive fields grow gradually, ViT's become much more global midway through the network. ViT receptive fields also show a strong dependence on their center patch due to their strong residual connections. Overall, VGG effective receptive fields are highly local and grow gradually, while ViT effective receptive fields shift from local to global. To further verify the superiority of the pyramid vision transformer, we conducted an ablation study in which the first 10 layers of VGG16 replaced the pyramid vision transformer as the backbone network of PC-Net, keeping the other structures the same; the results are shown in Table IV.

As can be seen, the performance of the pyramid vision transformer is significantly better than that of VGG. On the Part A dataset, MAE, MSE and MAPE improve by 5.6%, 2.2% and 6.3%, respectively; on the Part B dataset, by 23.2%, 27.3% and 24.1%; on the UCF_CC_50 dataset, by 10.6%, 9.8% and 14.9%; on the UCF_QNRF dataset, by 13.8%, 11.0% and 15.3%; and on the JHU-CROWD++ dataset, by 23.7%, 26.4% and 27.6%. This is further proof of the superiority of PC-Net's performance.

    TABLE IV RESULTS OF BACKBONE NETWORK ABLATION STUDY

    TABLE V RESULTS OF PYRAMID VISION TRANSFORMER ABLATION STUDY

    C. Study of Pyramid Vision Transformer

The pyramid vision transformer structure proposed in this paper consists of three layers of ViT. To verify the rationality of this choice, ablation experiments were conducted on the five datasets, keeping the other structures the same and testing the performance of the pyramid vision transformer under different configurations. The results are shown in Table V, where L* represents the number of ViT layers in the pyramid vision transformer.

As can be seen, the performance of PC-Net improves as the first three layers of ViT are stacked in the pyramid vision transformer, but when a 4th layer is added, the performance is almost the same as with 3 layers, and some metrics even decrease; when the number of layers increases to 5, performance starts to decrease rapidly. We believe that as the depth of the network increases, the gradients in backpropagation may become very small, leading to gradient vanishing, or very large, leading to gradient explosion; these problems make training difficult and convergence impossible. Moreover, as the depth increases, the number of parameters grows rapidly, which can overfit the network and prevent it from generalizing to new datasets, reducing the generalization ability of the network. Taking these factors into account, we set the number of ViT layers in the pyramid vision transformer to 3.

    D. Study of Linear Feature Calibration

In this paper, we propose a new training method that uses linear feature calibration to train the network through incremental learning, exploiting the correlation between global and local image features. To verify its effectiveness, we tested the convergence speed of the network under different supervision methods on the ShanghaiTech dataset; the results are shown in Fig. 13.

Fig. 13. Convergence speed of networks under different supervision methods. The abscissa is the training epoch and the ordinate is the loss value during training. The three training methods use the same backbone network, namely the backbone proposed in this paper.

Here, the "weakly-supervised" training method means that instead of the linear feature calibration structure proposed in this paper, a channel attention fusion approach is used, in which the features extracted from Parent-Net and Child-Net are weighted and fused. It can be seen that the convergence speed and fitting ability of our proposed training method are clearly better than those of the "weakly-supervised" training method. However, compared with the fully-supervised method, the convergence stability of PC-Net during training is poorer. The reason is that the sample labels carry uncertainty, which increases the learning difficulty of the model; the model may be affected by noise and learn wrong features, resulting in overfitting or underfitting and causing unstable convergence. A nonlinear feature correction process might be tried to increase the stability of the training process.

E. Study of Loss Function

The loss function is very important in the training process of the network, and different loss functions greatly affect the regression performance of the model. Therefore, a comprehensive loss function is designed, and the relative weight of L_2 and L_C is adjusted by a loss weight. To obtain the optimal loss function, experiments are conducted on the ShanghaiTech and UCF_QNRF datasets, and the value of the loss weight is discussed. The results are shown in Fig. 14.

As can be seen, different loss weights affect the performance of the network model: as the loss weight varies, MAE and MSE first decrease and then increase, which proves the rationality of the two-part loss function. The optimal values of MAE and MSE are obtained when the loss weight is set to 0.6, demonstrating the improvement the comprehensive loss function brings to network performance.

    F. Study of Network Parameters

To analyze the parameter complexity and time complexity of PC-Net, we compared MAE, the number of parameters (Params), and inference time on the ShanghaiTech dataset; the experimental results are shown in Table VI.

As can be seen, the advantage of PC-Net is that it uses a weakly-supervised training method, which reduces the training cost, and its MAE and density estimation performance are good; however, its number of parameters is slightly larger and its inference time longer. Consequently, the performance of PC-Net suffers and its density estimation accuracy decreases when applied to devices with limited computational resources, such as embedded devices. In future work, we will therefore consider a lightweight method based on PC-Net: analyzing the parameter bottleneck layers, finding the parts of the network that consume the most time and computational resources, and compressing them to improve the training and application of the network.

VI. CONCLUSION AND FUTURE WORK

Fig. 14. MAE and MSE on the ShanghaiTech Part A and UCF_QNRF datasets under different counting loss weights.

TABLE VI COMPARISON OF THE PARAMS, MAE AND RUNNING TIME OF PC-NET AND OTHER METHODS ON THE SHANGHAITECH DATASET

In this paper, an effective weakly-supervised crowd density estimation method is proposed, with a novel training method that achieves an optimal balance between training cost and counting performance. The network mainly consists of a pair of parent-child networks and a linear feature calibration structure. Specifically, the parent network extracts the crowd features, the child network extracts the feature transfer factors and bias weights, and the features are calibrated by the linear feature calibration structure to improve the convergence speed as well as the fitting ability of the network. In addition, a pyramid vision transformer is used as the backbone of PC-Net to solve the problem of uneven scale in the crowd, while the spatial correlation and crowd sensitivity of the density map are enhanced by the global-local feature loss and the counting loss.

In future work, we will study a crowd counting and positioning method based on PC-Net that not only achieves better personal positioning and counting accuracy, but also has fewer parameters and is more stable during training.
