
ST-SIGMA: Spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting


    Yang Fang|Bei Luo|Ting Zhao|Dong He|Bingbing Jiang|Qilie Liu

    1School of Computer Science and Technology,Chongqing University of Posts and Telecommunications,Chongqing,China

    2School of Communication and Information Engineering,Chongqing University of Posts and Telecommunications,Chongqing,China

    3School of Electrical Engineering,Korea Advanced Institute of Science and Technology(KAIST),Daejeon,Republic of Korea

    4School of Information Science and Technology,Hangzhou Normal University,Hangzhou,China

Abstract Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most existing methods address only one of the two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method that jointly and accurately perceives the AD environment and forecasts the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve the dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space, which serves as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set demonstrate that the proposed ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in terms of both scene perception and trajectory forecasting. The proposed approach also outperforms SOTA methods in terms of model generalisation and robustness and is therefore more feasible for deployment in real-world AD scenarios.

    KEYWORDS feature fusion,graph interaction,hierarchical aggregation,scene perception,scene semantics,trajectory forecasting

    1|INTRODUCTION

In recent years, autonomous driving (AD) has made great progress, and its practical application value is becoming increasingly prominent. However, there are still open challenges [1, 2] within current AD, for example, scene perception and trajectory forecasting, which are important for perceiving the surrounding environment of the ego-agent and predicting the future trajectories of neighbouring traffic agents given sensory data and past motion states [3]. Specifically, scene perception is designed to sense the surroundings to avoid collisions, while trajectory forecasting aims to optimise path planning. Recently, some works [4, 5] have attempted to address these two problems within one framework, which needs to handle two heterogeneous types of information: scene semantics and interaction relations. However, this task is challenging due to the following two main factors. First, the coexistence of multi-category moving agents (i.e. pedestrians, cars, cyclists etc.) in an AD environment makes it hard for LiDAR-only approaches to perceive different shapes and for camera-only approaches to predict motion states. Second, the ego-vehicle faces multimodal interactions with surrounding traffic agents, termed spatial interaction. Furthermore, the future motion trends of traffic agents depend largely on their previous motion states, termed temporal interaction. Since complex spatio-temporal interactions are intertwined, a single-model network often fails to explicitly model these interactions. Most early methods of scene perception rely on object detection and tracking pipelines but cannot identify objects unseen in the training set, and therefore they cannot accurately perceive unseen traffic agents in scenes.

Based on the above observations, this paper employs a bird's-eye view (BEV) and an occupancy grid map (OGM) to represent the surrounding environment and traffic agents' motion states. Figure 1 illustrates three perception pipelines with BEV maps for AD. Figure 1a depicts instance-level prediction [6], which focusses on the 2D detection bounding box without motion estimation [7]. Figure 1b illustrates the method proposed by Wu et al. [8], which performs joint scene perception and motion prediction. Figure 1c demonstrates the prediction results of our proposed ST-SIGMA. Compared with (a) and (b), ST-SIGMA can simultaneously perform both instance-level and pixel-level detection as well as dense motion prediction. Benefiting from the fusion of graph interaction features with scene semantic features and the hierarchical aggregation decoder (HAD), our model achieves better perceptual and predictive performance than SOTA methods, as shown in our prediction results.

Most trajectory forecasting methods [9, 10] in AD deal with the forecasting task by breaking it down into three subproblems [11], that is, detection, tracking, and forecasting. For detection, they use advanced 2D [12, 13] or 3D [14] detectors to perceive the surrounding traffic objects [15]. For tracking, 2D or 3D visual object tracking methods [16, 17] perform the data association, which is essential for generating motion trajectories of seen traffic agents. For forecasting, a temporal model is built on top of the past trajectories obtained by Multiple Object Tracking (MOT). However, this step-by-step manner suffers from several inherent deficiencies. First, owing to the 'barrel effect', trajectory forecasting performance is bounded by the detection and tracking performance. Second, each step has significant computational complexity, and this detection-association-forecasting ideology inevitably increases time consumption, including training and inference overhead, making detection-tracking-forecasting pipelines difficult to run in real time.

In addition, the input modalities for trajectory forecasting models are diverse. Luo et al. [4] model the trajectory forecasting problem by taking LiDAR-only data [18] as the network input. Ivanovic and Pavone [19] use a recurrent sequence model and a variational deep generative model to generate a distribution of future trajectories for each agent [20]. They rely solely on the past motion information of agents to model future motion states and therefore lack context information.

FIGURE 1 (a) Shows the predicted results of a general instance-level detector, (b) is the output of MotionNet, and (c) denotes the predicted results of the proposed ST-SIGMA. The left images are the initial bird's-eye view (BEV) representations, the middle ones are the corresponding ground-truth maps, and the right images are the outputs of the three different methods, respectively. The arrows indicate future motion predictions of the foreground grid cells, and the different colours represent different agent categories. The orientation and the length of an arrow represent the direction and distance of the agent's movement. The central area of the BEV map represents the location of the ego-agent.

To remedy the above-mentioned issues, this paper proposes a unified scene perception and graph interaction encoding framework, ST-SIGMA, consisting of the scene semantic encoder (SSE), graph interaction encoder (GIE), and HAD. SSE takes multi-sweep LiDAR data in BEV as the input to extract high-level semantic features, and GIE utilises agents' previous state information to encode graph-structured interactive relations between neighbouring traffic agents. Then, both output features are propagated into HAD for pixel-level prediction tasks. Notably, our model captures features of both the scene and the interactions to compensate for the deficiencies in prior work.

    In summary,the contribution of this paper is threefold:

    ·An iterative aggregation network is developed as the SSE,which iteratively aggregates shallow and deep features to preserve as much spatial and semantic information as possible for multi‐task dense predictions.

·An attention-based feature fusion method is designed to efficiently fuse the semantic and interaction encoding features to facilitate multimodal feature learning.

    ·The proposed ST‐SIGMA framework can learn scene semantics and graph interaction features in a unified framework for pixel‐level scene perception and trajectory forecasting.

The rest of this paper is organised as follows: related work is discussed in Section 2. Details of the proposed scene perception and trajectory forecasting framework are presented in Section 3. The experimental results and analysis on the nuScenes [21] data set are presented in Section 4. Finally, Section 5 draws conclusions and outlines future work.

    2|RELATED WORK

This section revisits some of the key works that have been proposed for scene perception, graph interaction representation, and trajectory prediction, respectively. We also illustrate the similarities and differences between the proposed ST-SIGMA and other works.

    2.1|Scene perception

The canonical scene perception task targets the identification of the location and class of objects. According to the input modality, this task is categorised into 2D object detection, 3D object detection, and multimodal object detection. 2D object detection includes two-stage methods [22], single-stage methods [23], and transformer-based methods [24]. With the increasing adoption of LiDAR in AD, 3D object detection has recently gained increasing attention. Voxel-based methods voxelise irregular point clouds into 2D/3D compact grids and then adopt 2D/3D Convolutional Neural Networks (CNNs) [6, 25]. Point-based methods leverage permutation-invariant operators to abstract features from raw points [26, 27]. Due to the ambiguity of a single modality, fusion-based object detection is emerging to address this drawback [28, 29]. Specifically, our method follows the pipeline of multimodal object detection, where we leverage LiDAR point clouds and graph interaction data to perform both pixel-level categorisation and instance-level object detection.

    2.2|Graph interaction representation

Graph convolutional networks (GCNs) [30] can model the dependencies between graph nodes and propagate neighbouring information to each node, and they are receiving growing attention in many vision tasks [31, 32]. ST-GCN [33] constructs a sequence of skeleton graphs for action recognition, where each graph node corresponds to a joint of the human body. Refs. [34, 35] use ST-GCN to learn interaction features from past human trajectories and then design a time-extrapolator CNN for future trajectory generation. Weng et al. [9] propose a graph neural network-based feature interaction mechanism that is applied to a unified MOT and forecasting framework to improve socially aware feature learning. However, none of these trajectory prediction methods fully consider the direct or indirect interactions between traffic participants. Instead, the proposed ST-SIGMA explicitly explores the interaction relationships between multiple traffic agents in AD scenes by leveraging ST-GCN as the GIE.

    2.3|Trajectory forecasting

Significant progress has been made in trajectory forecasting based on different data modalities. Fast and Furious [4] develops a joint detection, tracking, and trajectory forecasting framework by encoding an OGM over multiple LiDAR frames. MotionNet [8] and Fang et al. [36] propose a spatio-temporal pyramid network (STPN) to jointly perform pixel-level perception and trajectory prediction. Besides, parallelised tracking and prediction [9] utilises the past motion states of traffic agents in a top-down grid map as the model input, and recurrent neural networks are then applied to extract and aggregate temporal features as the interaction representation for motion prediction. With the development of high-definition map (HDM) data, HDM-based trajectory forecasting methods are receiving more attention. GOHOME [37] exploits graph representations of the HDM with sparse projection to provide a heatmap output that depicts the probability distribution of an ego-agent's future position. THOMAS [38] leverages hierarchical and sparse image generation for multi-agent future heatmap estimation. Unlike these works, this paper utilises the fusion of scene and interaction features for pixel-level trajectory forecasting.

    3|PROPOSED METHOD

We aim to perform simultaneous multi-agent scene perception and trajectory forecasting in the 2D BEV space. First, the BEV representation of the point cloud is fed to the SSE network for scene semantics encoding. Meanwhile, the multi-agent past trajectories are fed to the GIE network for interactive graph encoding. Then, the HAD network fuses the features from SSE and GIE to perform scene perception and trajectory forecasting. The overall ST-SIGMA framework is illustrated in Figure 2. Section 3.1 describes the BEV representation process. Section 3.2 presents details of the scene information manipulation and the semantic encoder building process. Section 3.3 introduces the motion state data preparation and explains how the graph interaction is modelled given agents' past trajectories. Section 3.4 gives the deep aggregation decoder construction and configuration and elaborates on the rationale of the network design.

    3.1|BEV representation

FIGURE 2 The proposed scene perception and trajectory prediction framework, ST-SIGMA, consists of three essential modules: SSE, GIE, and HAD. Each of them plays its own unique role in scene semantics encoding, graph interaction encoding, and feature fusion; please see the corresponding sections for details. GIE, graph interaction encoder; HAD, hierarchical aggregation decoder; SSE, scene semantic encoder; ST-GCN, spatio-temporal graph convolution network; ST-SIGMA, spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting

To make the raw point cloud from the LiDAR sensor structured and compatible with the network input, we need to transfer raw point clouds into BEV maps. Concretely, the origin of the coordinate system for each point cloud sweep changes over time due to the movement of the LiDAR mounted on the ego-agent, which leads to implausible motion estimation. To alleviate this issue, following Ref. [8], we first synchronise all the past point cloud frames to the current coordinate system for coordinate alignment. Then we specify the range of the scene region from the raw 3D point cloud at timestamp t, denoted as M_t ∈ R^{L_c×W_c×H_c}, where L_c, W_c and H_c denote the length, width, and height, respectively. The origin is located at the position of the ego-agent P_0 = [x_0, y_0, z_0] by specifying P_0 as the origin of the synchronised coordinate system. The valid range of the scene M_t w.r.t. P_0 can then be specified, where H_0 is the vertical distance from the LiDAR sensor to the ground. Thereafter, M_t is voxelised with a predefined voxel size [δ_l, δ_w, δ_h] into a discretised grid map I_t. We simply encode I_t as a binary BEV map. In particular, a voxel occupied by the point cloud is assigned a value of 1; otherwise, it is assigned a value of −1. The binary map I_t is taken as the input to the SSE network for scene semantics encoding.
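To make the voxelisation step concrete, the following is a minimal NumPy sketch of converting one synchronised, ego-centred point cloud into a binary BEV occupancy grid; the ranges, voxel sizes and fill values mirror the settings described above and in Section 4.2, but the function name and exact implementation are illustrative assumptions, not the authors' code.

import numpy as np

def point_cloud_to_bev(points, x_range=(-32.0, 32.0), y_range=(-32.0, 32.0),
                       z_range=(-3.0, 2.0), voxel_size=(0.25, 0.25, 0.4)):
    """Voxelise an (N, 3) ego-centred point cloud into a binary BEV grid I_t.
    Occupied voxels are set to 1 and empty voxels to -1 (a sketch of the
    binary encoding described in Section 3.1)."""
    # Keep only points inside the specified scene range M_t.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    # Discretise coordinates into voxel indices.
    ix = ((pts[:, 0] - x_range[0]) / voxel_size[0]).astype(np.int32)
    iy = ((pts[:, 1] - y_range[0]) / voxel_size[1]).astype(np.int32)
    iz = ((pts[:, 2] - z_range[0]) / voxel_size[2]).astype(np.int32)

    nx = int(np.ceil((x_range[1] - x_range[0]) / voxel_size[0]))  # 256
    ny = int(np.ceil((y_range[1] - y_range[0]) / voxel_size[1]))  # 256
    nz = int(np.ceil((z_range[1] - z_range[0]) / voxel_size[2]))  # 13 height channels

    bev = -np.ones((nz, nx, ny), dtype=np.float32)  # empty voxels: -1
    bev[iz, ix, iy] = 1.0                           # occupied voxels: 1
    return bev

Stacking five such maps for timestamps t = −5, …, −1 yields the 5 × 13 × 256 × 256 SSE input tensor reported in Section 4.2.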

    3.2|Scene semantic encoder

In the STPN of the baseline, we observe that the pyramidal connections are linear, and the shallower layers' features are not sufficiently aggregated to mitigate their inherent semantic weaknesses. Instead of STPN, we propose SSE for progressive spatio-temporal feature aggregation, inspired by Ref. [39]. Concretely, SSE takes a set of sequential BEV maps s = {s_t | t = −T, …, −1} ∈ R^{T×Z×X×Y} as the input, where T is the number of BEV maps, and Z, X and Y denote the number of channels (height), the X axis and the Y axis of each map. The network consists of five spatio-temporal convolution (STC) blocks. The first STC block, STC-0, expands the height channels from Z to C while preserving the map resolution. The remaining four blocks double the number of height channels and reduce the resolution by a factor of two at each stage to model high-level semantic features. The output scene semantic feature maps S ∈ R^{16C×X/16×Y/16} are then fused with the graph interaction features G (detailed in Section 3.3) to compose the semantic and interactive aggregations as the final encoding features A, which are further propagated into the decoder network with multiple heads. At the same time, to preserve as much of the spatial information of the low-level features as possible for better resolving pixel-level information in fine-grained output tasks, we apply iterative aggregation connections, which are compositional and non-linear, so that the earlier aggregated layers pass through more aggregations. The iterative aggregation function I aggregates the shallow and deep features from a series of layers {l_1, …, l_n}, which is formulated as follows:
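The body of Equation (1) is not shown here; a plausible reconstruction, assuming the iterative deep aggregation form of Ref. [39] that the surrounding text paraphrases, is

I(l_1, l_2, \ldots, l_n) =
\begin{cases}
l_1, & n = 1 \\
I\big(A(l_1, l_2),\, l_3, \ldots, l_n\big), & \text{otherwise}
\end{cases}    (1)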

where A is the aggregation node. Our SSE network has four iteration levels and six aggregation nodes. Each node A accepts two input branches: the first branch is from feature maps that share the same resolution as the aggregated features; the second branch is the feature maps down-sampled from the first input branch. For dimension and resolution adaptation, the features of the second branch are fed into a projection block P and an up-sampling block U. The projection function P projects the feature map channels of stages 2–6 to 16, and the up-sampling function U then up-samples the projected feature maps to the same resolution as the features of stage 1 (256 × 256). The up-sampled features are taken as input to the aggregation node A in Equation (1). The SSE network is shown in Figure 3.
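As an illustration of one aggregation node, the sketch below combines a same-resolution branch with a deeper branch that is first projected to 16 channels (block P) and up-sampled (block U) before fusion; the module structure, channel counts of the shallow branch and choice of convolution/normalisation layers are assumptions of this sketch, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AggregationNode(nn.Module):
    """One aggregation node A of the SSE: fuse a shallow branch with a
    projected and up-sampled deeper branch (illustrative sketch)."""

    def __init__(self, deep_channels, out_channels=16):
        super().__init__()
        # Projection block P: reduce the deeper branch to 16 channels.
        self.project = nn.Sequential(
            nn.Conv2d(deep_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Aggregation: concatenate both branches and fuse with a 3x3 convolution.
        self.fuse = nn.Sequential(
            nn.Conv2d(out_channels * 2, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, shallow, deep):
        # Up-sampling block U: match the shallower branch's spatial resolution.
        deep = self.project(deep)
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode='bilinear',
                             align_corners=False)
        return self.fuse(torch.cat([shallow, deep], dim=1))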

    3.3|Graph interaction encoder

Besides scene semantic information, the interaction relationships between agents influence their future trajectories in the BEV map. In general, the past motion states of an agent directly or indirectly affect its own and its neighbours' future trajectories. Therefore, we extend the GCN to the ST-GCN to encode the graph interactions between agents, as shown in Figure 4.

To emphasise the difference from the original GCN, we first revisit the preliminary knowledge of the graph convolution network (GCN). The original GCN is formulated as follows:
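A plausible form of Equation (2), reconstructed from the symbols explained below (the normalisation term is written here as 1/Z_i, an assumed notation), is

v_i^{(l+1)} = \sigma\left( \sum_{v_j \in \mathrm{P}(v_i)} \frac{1}{Z_i} \, \mathbf{W}^{(l)} v_j^{(l)} \right)    (2)

where σ(·) is an activation function and Z_i is typically tied to the cardinality Θ of the neighbour set.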

where P is the sampling function that aggregates the information of the neighbours around v_i, and the superscript (l) denotes layer l. W denotes the learnable parameters of the graph model, a normalisation term rescales the aggregated neighbour features, and Θ is the cardinality of the neighbour set.

Given a motion state vector for agent i at timestamp t, it serves as the initial input of the first layer (layer 0) of GIE. A set of sequential motion state vectors v_i^t from timestamp −T to −1 composes the input of GIE, where M is the number of agents and D denotes the feature dimension of each graph node. General graph convolution networks consider only a single-frame graph representation G_t = (V_t, E_t), where V_t = {v_i^t | i = 1, …, M} is the set of vertices of the graph, and E_t is the set of edges e_ij^t between the i-th and j-th vertices. We assume that there is an edge between v_i^t and v_j^t if the L2 distance between them is less than a predefined threshold d. To better encode the mutual interaction between neighbouring agents, each edge is weighted by a kernel function k_ij^t, defined as follows:

where ‖·‖ is the L2 distance in BEV between the i-th and j-th vertices, which means that e_ij^t takes the kernel value if v_i^t and v_j^t are connected, and e_ij^t = 0 otherwise. All k_ij^t form the adjacency matrix A_t of graph G_t. To ensure a proper GCN, the adjacency matrix A_t is normalised with the identity matrix I and the degree matrix Λ as follows:
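Equation (4) plausibly takes the standard symmetric normalisation form consistent with the identity matrix I and degree matrix Λ mentioned above (the hat notation is assumed here):

\hat{A}_t = \Lambda_t^{-\frac{1}{2}} \, (A_t + I) \, \Lambda_t^{-\frac{1}{2}}    (4)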

Similar to Refs. [33, 34], ST-GCNs define a new spatio-temporal graph G consisting of sequential subgraphs, G = {G_t | t = −T, …, −1}, and all G_t share the same topology but vary the vertex attributes v_t with different timestamps t. Thus, the graph is defined as G = (V, E), in which V = {v_i | i = 1, …, M}, and E is the corresponding set of edges. This means that each vertex of G consists of the temporal aggregation of spatial node attributes; in this way, the temporal information is naturally embedded in graph G. In addition, the normalised adjacency matrix A is the stack of {A_{−T}, …, A_{−1}}, in which each A_t is calculated as in Equation (4). The vertices' representation of layer l at timestamp t is denoted as V_t^(l), and V^(l) is the temporal stack of {V_{−T}^(l), …, V_{−1}^(l)}. Our final ST-GCN layer is formulated as follows:
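A plausible reconstruction of Equation (5), assuming the ST-GCN layer of Refs. [33, 34] applied to the temporally stacked vertices and the normalised adjacency of Equation (4), is

V^{(l+1)} = \sigma\left( \hat{A} \, V^{(l)} \, \mathbf{W}^{(l)} \right)    (5)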

where W^(l) denotes the learnable parameters at layer l. After N layers of graph convolution, the ST-GCN outputs the final graph feature maps V^(N), denoted as the graph interaction embedding G, which is further fused with the scene semantics encoding S and is then taken as input to the HAD network for dense prediction tasks.
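Putting the pieces together, the following sketch builds a kernel-weighted adjacency for one frame, applies the symmetric normalisation, and runs a single graph convolution over the stacked frames; the kernel corresponds to the inverse-distance kernel k5 selected in Section 4.4.3, the distance threshold d is omitted for brevity, and all tensor shapes and layer choices are assumptions for illustration.

import torch

def frame_adjacency(v, eps=1e-6):
    """v: (M, D) motion states of M agents at one timestamp.
    Returns a normalised adjacency for one ST-GCN layer (a sketch)."""
    xy = v[:, :2]                                   # assume the first two dims are BEV x, y
    dist = torch.cdist(xy, xy)                      # pairwise L2 distances
    a = torch.where(dist > eps, 1.0 / dist.clamp(min=eps), torch.zeros_like(dist))
    a = a + torch.eye(a.size(0))                    # A_t + I
    deg = a.sum(dim=1)                              # degree matrix Λ (diagonal entries)
    d_inv_sqrt = torch.diag(deg.clamp(min=eps).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt              # Λ^{-1/2} (A_t + I) Λ^{-1/2}

def st_gcn_layer(V, A_hat, W):
    """V: (T, M, D_in) stacked vertex features, A_hat: (T, M, M),
    W: (D_in, D_out). One graph convolution applied per frame."""
    return torch.relu(torch.einsum('tmn,tnd,de->tme', A_hat, V, W))

# Minimal usage example with the sizes from Section 4.2 (T=5, M=20, D=8).
T, M, D = 5, 20, 8
V = torch.randn(T, M, D)
A_hat = torch.stack([frame_adjacency(V[t]) for t in range(T)])
W = torch.randn(D, 16)
out = st_gcn_layer(V, A_hat, W)   # (5, 20, 16) graph interaction features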

    3.4|Hierarchical aggregation decoder

Given the scene semantic feature maps S output by SSE and the graph interaction feature maps G output by GIE, the proposed hierarchical aggregation decoder network takes S and G as input for simultaneous multi-layer feature aggregation and heterogeneous feature fusion. Specifically, S is a stack of the multi-scale feature maps {s^(1), …, s^(N)}, where s^(n) is the output of the n-th aggregation stage with resolution [X/2^{n−1}, Y/2^{n−1}], and [X, Y] is the original size of the BEV map. There are five stages of feature maps: the fifth-stage features s^(5) are directly output by the STC-4 block and are further fused with G, while the feature maps of stages one to four, {s^(n)}_{n=1}^{4}, are produced by the iterative aggregation operation defined in Equation (1). As for G, the original input of GIE is the T-sequential stack of the agents' motion state matrix Q. For model suitability, we first transform the dimensional order of Q to Q̂ ∈ R^{D×T×M}. Then, the ST-GCN feature maps V^(N) are produced by the ST-GCN network, given the graph vertices Q̂ and the adjacency matrix A ∈ R^{T×M×M}, and V^(N) is fed into a graph residual block, the graph representation aggregator (GRA), which is composed of a stack of residual convolution operations, to obtain the final graph interaction feature maps G with spatial resolution X/16 × Y/16. Thus, s^(5) and G share the same feature resolution and can be seamlessly fused [40] after applying the ST-GCN and GRA networks. We design three fusion approaches for heterogeneous feature fusion, as shown in Figure 5, and conduct an ablation study of their effectiveness in Section 4.4.3. The core of the ST-SIGMA framework lies in the HAD network, shown in Figure 3, which plays a vital role in two aspects of the dense prediction tasks. First, it enables us to neatly fuse multimodal feature maps even with diverse dimensions, resolutions, and channels. Second, the HAD network can iteratively and progressively aggregate features from shallow to deep to learn a deeper, finer-resolution decoder with fewer parameters and better accuracy.
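To make the three fusion variants of Figure 5 concrete, the sketch below implements concatenation-based, addition-based, and a simple attention-augmented fusion over the equally sized maps s^(5) and G; the attention design here (a channel gate produced by a small convolution) is only an assumed stand-in for the paper's attention operation.

import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuse scene semantic features s5 and graph interaction features g,
    both of shape (B, C, H, W). A sketch of the three variants in Figure 5."""

    def __init__(self, channels, mode='att'):
        super().__init__()
        self.mode = mode
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)   # for 'concat'
        self.gate = nn.Sequential(                                       # for 'att'
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, s5, g):
        if self.mode == 'concat':          # (a) channel-wise concatenation
            return self.reduce(torch.cat([s5, g], dim=1))
        if self.mode == 'add':             # (b) element-wise addition
            return s5 + g
        # (c) attention-augmented addition: gate the graph features before adding.
        attn = self.gate(torch.cat([s5, g], dim=1))
        return s5 + attn * g

# Example with the encoder output size reported in Section 4.2 (32 x 16 x 16).
fuse = FeatureFusion(channels=32, mode='att')
fused = fuse(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))  # (2, 32, 16, 16)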

FIGURE 3 Hierarchical aggregation decoder (HAD) and scene semantics encoder (SSE) architectures. The SSE is responsible for scene semantic feature extraction. The HAD fuses the features from the SSE and graph interaction encoder (GIE) and then performs dense predictions. STC, spatio-temporal convolution

FIGURE 4 Graph interaction encoder (GIE), which is implemented with the spatio-temporal graph convolution network (ST-GCN). Generally, the ST-GCN is the temporal stack of spatial graph representations. Please refer to Section 3.3 for more details

FIGURE 5 Three feature fusion approaches are used in our method, and C is the concatenation operation. (a) Is the concatenation-based fusion, (b) is the addition-based fusion, and (c) is the hybrid fusion method, which includes concatenation, addition, and attention operations. GIE, graph interaction encoder; SSE, scene semantic encoder

    3.5|Loss function

The proposed HAD network generates the final feature map F ∈ R^{Z×X×Y}, where Z, X, Y are the channel, width, and height of the feature map, followed by three prediction heads for object detection, pixel-level categorisation, and trajectory forecasting, respectively. The feature maps F are first passed through a bottleneck conv2d layer for feature adaptation. Each task is supervised by one of the following three loss functions.

For the object detection loss, we apply the cross-entropy (CE) loss for box classification, which is defined as follows:
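A plausible form of Equation (6), assuming a standard binary cross-entropy averaged over N box samples (N is assumed notation), is

L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \big]    (6)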

where y_i denotes the ground-truth label of the i-th sample, y_i = 1 means the foreground, and p_i is the probability of belonging to the foreground predicted by the learnt model. For bounding box regression, we employ a linear combination of the l1-norm loss and the generalised IoU loss L_GIoU [41]. The final regression loss can be formulated as follows:
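Equation (7) plausibly combines the two terms described below over the positive samples:

L_{reg} = \sum_{i} \mathbb{1}\{y_i = 1\} \big[ \lambda_{IoU} \, L_{GIoU}(b_i, b) + \lambda_{1} \, \lVert b_i - b \rVert_1 \big]    (7)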

where 1{y_i = 1} is an indicator function for the positive samples, b_i is the i-th predicted bounding box, and b is the ground-truth bounding box. λ_IoU and λ_1 are the regularisation parameters.

For the pixel-level categorisation loss, we employ the focal loss (FL) [42] to handle class imbalance issues, which is defined as follows:
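A plausible form of Equation (8), assuming the standard focal loss of Ref. [42] with focusing parameter γ (assumed notation) and the p_{t,i} defined below, is

L_{flc} = -\frac{1}{N} \sum_{i} (1 - p_{t,i})^{\gamma} \log(p_{t,i})    (8)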

where p_{t,i} = p_i if y_i = 1, and p_{t,i} = 1 − p_i otherwise; p_i is the predicted probability that the i-th pixel belongs to the foreground category, and y_i is the ground-truth category label. Interested readers can refer to the related literature [42] for more details.

For the trajectory forecasting loss, following the analysis in Ref. [45], we employ a weighted smooth L1 loss function L_tf for trajectory forecasting, where the weight setting follows that of the categorisation loss. However, the above loss can only guarantee global normalisation during training and cannot guarantee local spatio-temporal consistency. Therefore, we additionally adopt a spatio-temporal consistency loss to augment spatio-temporal consistency learning, which is defined as follows:
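Equation (9) is not shown here; based on the spatial- and temporal-consistency constraints described below (with X denoting predicted displacements, an assumed notation), one plausible form is

L_{stc} = \beta_{sc} \sum_{k} \sum_{(i,j),(i',j') \in b_k} \big\lVert X^{t}_{i,j} - X^{t}_{i',j'} \big\rVert \;+\; \beta_{tc} \sum_{k=1}^{K} \big\lVert \bar{X}^{t}_{b_k} - \bar{X}^{t+\Delta t}_{b_k} \big\rVert    (9)

where ‖·‖ is the smooth L1 loss, as stated below.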

In Equation (9), ‖·‖ is the smooth L1 loss, b_k is the object instance with index k, and the compared quantities in R^2 are the predicted displacement vectors at position (i, j) and position (i′, j′) at time t, respectively. It is assumed that the motion states of all pixels within an instance box should be very close to each other without much jitter, referred to as spatial consistency. Similar to spatial consistency, the predicted motion state of each agent, denoted as the average movement of all its included pixels, should be smooth without large displacement changes during a short time duration Δt, where K is the number of cells, and β_tc and β_sc represent the weight parameters of the temporal- and spatial-consistency losses, respectively.

The overall loss function of the ST-SIGMA model is the weighted sum of the above multi-task losses, which is defined as follows:
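With the trade-off weights π_i named below, Equation (10) plausibly reads

L = \pi_1 L_{cls} + \pi_2 L_{reg} + \pi_3 L_{flc} + \pi_4 L_{stc}    (10)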

where L_cls and L_reg constitute the instance detection loss, L_flc and L_stc are the pixel-level categorisation and trajectory forecasting losses, and π_i are the trade-off parameters for balancing the multi-task learning.

    4|EXPERIMENTS AND ANALYSIS

In this section, we evaluate the performance of the proposed ST-SIGMA method on the nuScenes data set. First, we give an introduction to the data set. Second, the implementation details and the evaluation criterion are presented. Then, we give the details of the experimental analysis and compare the proposed method with existing SOTA methods. Finally, we demonstrate the effectiveness and efficiency of each module through comprehensive ablation studies.

    4.1|Training and test data set

We use the nuScenes data set for all our experiments and evaluation. The nuScenes data set provides a full suite of sensor data for autonomous vehicles, including six cameras, one LiDAR, five millimetre-wave radars, as well as a Global Positioning System and an inertial measurement unit. It contains 1000 scenes with annotated samples, each of which is sampled at 20 Hz and contains various driving scenarios. Because nuScenes only provides 3D bounding boxes for point cloud object detection and does not provide motion or trajectory information, we obtain the motion states between two adjacent frames by calculating the displacement vectors of the corresponding point clouds within the labelled bounding boxes, based on their X–Y coordinate values and the displacement relative to their centre positions. For point clouds outside the bounding boxes, such as the background, road, and roadside buildings, the movement values are set to zero. At the same time, we crop the point clouds and set the range on the x-axis to 32 m for the positive and negative directions, respectively, and the same range on the y-axis. On the z-axis, taking into account that the LiDAR sensor is mounted on top of the vehicle, the negative direction is set to 3 m and the positive direction to 2 m.

    4.2|Implementation details

The size of the scene range M_t is set as [−32, 32] × [−32, 32] × [−3, 2] m³, and M_t is then discretised with the voxel size [0.25, 0.25, 0.4] into a grid map I_t of size [256, 256, 13]. We use five temporal frames of synchronised point clouds as the SSE network input, with tensor size 5 × 13 × 256 × 256. We define five categories for instance-level classification and pixel-level categorisation prediction, namely vehicle, pedestrian, bicycle, background, and others. The 'others' category includes all the remaining objects in nuScenes to handle possible unseen objects beyond the data used in our paper. For the GIE network, we use an 8-dimensional motion vector for each traffic agent, in which each quantity contains x-axis and y-axis components. For the spatio-temporal graph G, we construct the network input with size 5 × 8 × 20 × 20, generated from the same five temporal frames as the input of SSE. The SSE encoder outputs the scene semantic features S, GIE outputs the graph interaction features G, and both of them share the same size 32 × 16 × 16. Then, we apply three feature fusion approaches to fuse them, that is, channel-wise concatenation, channel-wise addition, and attention-augmented addition, and we further verify their effectiveness in the ablation studies. The visualisation of the results predicted by ST-SIGMA is shown in Figure 6.

    4.3|Evaluation criterion

For trajectory forecasting, we calculate the relative displacements of the corresponding point clouds in adjacent frames. Meanwhile, all grid cells within the BEV map are classified into three groups according to their moving speed, that is, static, slow (speed ≤ 5 m/s), and fast (speed > 5 m/s). In each group, the average L2-norm distance between the estimated displacement and the ground-truth displacement, that is, the average displacement error (ADE), is calculated as in Equation (11):
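A plausible form of Equation (11), with ŷ_n^t and y_n^t denoting the predicted and ground-truth positions of the n-th agent at timestamp t (assumed notation), is

ADE = \frac{1}{N \, T_{future}} \sum_{n=1}^{N} \sum_{t=1}^{T_{future}} \big\lVert \hat{y}_n^{t} - y_n^{t} \big\rVert_2    (11)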

In Equation (11), N represents the number of traffic agents, and the predicted and ground-truth trajectories of the n-th traffic agent are compared at each timestamp t, where t = 1, …, T_future. Equation (12) computes the Overall cell category Accuracy (OA) of all cells, formulated as follows:
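Equation (12) plausibly reduces to the ratio of correctly classified cells to all cells:

OA = \frac{CC}{AC}    (12)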

where Correctly Classified cells (CC) represents the number of correctly classified cells, and AC represents the total number of cells. Equation (13) calculates the Mean Category Accuracy (MCA), which indicates the average category accuracy:
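Equation (13) plausibly averages the five per-category accuracies listed below:

MCA = \frac{1}{5} \big[ CA(Bg) + CA(Vehicle) + CA(Ped) + CA(Bike) + CA(Others) \big]    (13)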

where CA(Bg) represents the classification accuracy of the background, CA(Vehicle) represents the classification accuracy of vehicles, CA(Ped) represents the classification accuracy of pedestrians, CA(Bike) represents the classification accuracy of bicycles, and CA(Others) represents the classification accuracy of other unseen traffic objects, respectively.

    4.4|Experimental analysis

FIGURE 6 The visualisation of the results predicted by each head, for example, the predicted pixel categorisation (using different colours to represent different categories), bounding boxes, and the trajectories of traffic agents

We evaluate the performance of our proposed ST-SIGMA method and compare it with other SOTA perception and trajectory forecasting methods. Specifically, for pixel-level categorisation and trajectory forecasting, we compare with the following methods: the static model, which only considers the static environment; FlowNet3D [43], HPLFlowNet [44], Neural scene flow prior [46] and recurrent closest point [47], which estimate the scene flow between adjacent point cloud frames under a linear dynamics assumption; PointRCNN [27], which combines the PointRCNN detector and a Kalman filter for bounding box prediction and trajectory forecasting of objects in the BEV representation; LSTM-Encoder-Decoder [45], which estimates multi-frame OGMs using the same prediction head as ST-SIGMA while preserving its backbone network; and MotionNet [8], which adopts LiDAR point clouds and the STPN framework for scene perception and motion prediction. It is noteworthy that our ST-SIGMA is inspired by the baseline [8] but makes three novel contributions. First, we discard the STPN backbone and replace it with the iterative aggregation network, which can better propagate low-level features to high-level stages for multi-scale feature aggregation. Second, we employ a GIE to further learn the interactive relations between traffic agents. Third, we introduce an additional instance-box prediction head for instance-level object detection, which can directly boost pixel categorisation performance by capturing higher-level semantic information from traffic scenes. For evaluating the performance of bounding box prediction, we compare our method with SOTA detectors, including PointPillars [6], PointPainting [29], shape signature networks [48], class-balanced grouping and sampling [49], and CenterPoint [14]. Moreover, we further compare the complexity of the proposed ST-SIGMA with different scene perception and trajectory forecasting methods.

    4.4.1|Quantitative analysis

To better evaluate the performance of the proposed method in different traffic scenarios, we first divide all the grid cells into three groups by moving speed: static (velocity = 0 m/s), slow (velocity ≤ 5 m/s), and fast (velocity > 5 m/s). As shown in Tables 1–5, ST-SIGMA-Baseline adopts the STPN backbone network proposed by the baseline method to extract scene semantic features from the BEV representation and is additionally equipped with GIE. ST-SIGMA employs the iterative aggregation network instead of STPN and includes SSE, GIE, HAD, and the attention-based fusion.

From Table 1, we can see that for static cells, the static model achieves the best trajectory forecasting performance, but its ADE increases with the moving speed of the grid cells, which means the static model only performs well under extreme conditions and is not suitable for real-world, dynamic AD scenes. A similar phenomenon occurs for FlowNet3D [43], HPLFlowNet [44], and PointRCNN [27]; all of them achieve good prediction results for static cells, but their performance drops dramatically for moving objects. Unlike the above methods, MotionNet [8] has fairly stable performance and shows excellent results for slowly moving objects; it outperforms all other methods, including ours, when the moving velocity is ≤ 5 m/s. In contrast, our proposed ST-SIGMA method yields the best performance for fast-moving objects. Specifically, it outperforms MotionNet by 0.0217 in terms of ADE and 0.0637 in terms of median displacement error, respectively. This superiority is attributed to the graph interaction features, which can effectively model dynamic interactive relations, especially for fast-moving traffic agents with velocity > 5 m/s. In addition, we can conclude that each component (SSE, HAD, and the feature fusion approaches) plays a positive role in the performance improvement. Notably, ST-SIGMA with the attention-based fusion approach achieves the best trajectory forecasting performance, surpassing the baseline by 2.3%; please refer to Tables 1 and 5 for more details.

For evaluating pixel categorisation performance, we apply two types of loss functions: CE and FL. As shown in Table 2, ST-SIGMA+ALL+CE denotes ST-SIGMA with the CE loss, and ST-SIGMA+ALL+FL denotes ST-SIGMA with the FL function. We compare the proposed method with PointRCNN, LSTM-Encoder-Decoder, and MotionNet. Table 2 demonstrates that, across all five categories, the proposed ST-SIGMA achieves the highest accuracy for the four foreground categories, that is, car, pedestrian, bike, and others. We consider MotionNet as the baseline method; with CE, ST-SIGMA increases categorisation accuracy by 1.5% compared to the baseline. When applying FL, it further improves over the baseline by 1.8%. For the pedestrian category with a small sample size, our method achieves a significant performance improvement. Specifically, it increases the categorisation accuracy from 77.2 to 83.8, outperforming the baseline method by 6.6%. For the 'others' foreground category, whose categorical information is not specified in this paper, ST-SIGMA still achieves much better accuracy than the baseline, with more than a 10% performance improvement. The MCA of ST-SIGMA is 75.5, which is 4.4% higher than that of the baseline. Notably, there is no obvious performance improvement for the background category or the overall cell categorisation. Our analysis is that, for most point cloud frames, the point clouds belonging to the background category far outnumber the foreground point clouds. Once we use multi-frame BEV maps to obtain temporally aggregated scene features, these aggregated features facilitate the detection of foreground objects with few point clouds, but for the background category with a large number of point clouds, the iterative aggregation network may cause over-fitting due to over-aggregation of background features. We will address this issue in a future study.

TABLE 1 Comparison of trajectory forecasting performance between our method and some state-of-the-art methods

    TABLE 2 The comparison of pixel‐level categorisation performance

    TABLE 3 The complexity comparison of the proposed ST‐SIGMA and some state‐of‐the‐art scene perception and trajectory forecasting methods

Besides dense pixel-wise predictions, we additionally add instance-level detection for object bounding box prediction, as shown in Table 4. We compare our performance with that of the most commonly used 3D object detectors. Our detector is based on the aggregated BEV maps and only performs 2D detection, so the detection performance of ST-SIGMA is far inferior to most SOTA 3D detectors, such as CenterPoint. However, employing this detection function helps to disambiguate the pixel categorisation process by providing both a spatial consistency constraint and additional semantic information. To further evaluate the efficiency of the proposed method, we compare the complexity of ST-SIGMA with the following methods: PointPillars [6], PointRCNN [27], FlowNet3D [43] and MotionNet [8]. Table 3 shows that the complexity of the proposed ST-SIGMA is about 10% higher than that of the baseline method. We attribute the complexity growth to the use of multimodal data input and the iterative network architecture.

    4.4.2|Qualitative analysis

To qualitatively evaluate the proposed ST-SIGMA method, we visualise the predicted results of pixel categorisation and trajectory forecasting in Figures 7 and 8, and compare them with the corresponding ground-truth maps and the predicted results of the baseline method. For the convenience of qualitative analysis, we first give a detailed description of the elements and attributes in the following figures.

Specifically, in Figure 7a, there are five different colours of point clouds; different colours denote different categories of traffic agents, for example, blue points represent the background, purple points denote vehicles (cars or trucks), black points denote pedestrians, green points denote bicycles, and red points denote other foreground categories. In Figure 7b, besides the point clouds, there are five different colours of arrows, where each colour represents the same category information as the points, and the arrows represent the future trajectory of each point. Concretely, the length of an arrow indicates the moving distance, and its direction indicates the moving orientation. Figure 7a shows the comparison of the ground-truth maps, the predicted results of our ST-SIGMA, and the baseline method. In the pixel categorisation results, fewer pixels are misclassified by ST-SIGMA than by the baseline, as shown in the regions enclosed by the circles. This demonstrates that the categorisation performance of the proposed method is better than that of the baseline. As for the trajectory forecasting results in Figure 7b, it is clear that the points at the bottom of the map are misclassified by the baseline, whereas ST-SIGMA gives robust categorisation results thanks to the additional instance detection function. More visualisations of trajectory forecasting results are shown in Figure 8, where each row represents the comparison between the ground truth, the predicted results of ST-SIGMA, and the baseline in a traffic scene. In the first row, the points belonging to the 'others' category at the top of the map are misclassified as vehicles by the baseline. Instead, the proposed ST-SIGMA can accurately categorise them. However, in the second row, for traffic agents that are too close to or too far from the ego-vehicle, our method occasionally produces false detections, which indicates that our method has lower robustness than the baseline method in these traffic scenarios.

    TABLE 4 The comparison of object detection performance

FIGURE 7 The pixel-level categorisation prediction results (first row) and the trajectory forecasting results (second row); from left to right: the ground-truth category map, the predicted results of ST-SIGMA, and the predicted results of the baseline method. ST-SIGMA, spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting

    TABLE 5 Ablation study about the effect of each component on the ST‐SIGMA framework

    4.4.3|Ablation study

In this section, we assess the validity and effectiveness of each proposed component through comprehensive ablation studies. First, we focus on the proposed ST-SIGMA framework, which consists of four key components: SSE, GIE, HAD, and the feature fusion approaches (Con, Add, and Att). Table 5 shows that ST-SIGMA-Baseline adopts the STPN backbone network from the baseline to model BEV maps and is additionally equipped with GIE. In Table 5, the second row replaces STPN with the iterative aggregation network for scene semantics encoding and leaves the rest unchanged. The third row further applies the HAD network for hierarchical multimodal feature aggregation. The fourth to sixth rows apply the concatenation-based fusion, element-wise addition-based fusion, and attention-based fusion approaches, respectively. As shown in Figure 5, (a) is the concatenation fusion (Con), (b) is the element-wise addition fusion (Add), and (c) is the attention-based fusion (Att). We can again conclude that each component plays a positive role in the performance improvement of ST-SIGMA. If we take MotionNet as the baseline, we can see that simply adding interaction encoding alone does not improve the trajectory forecasting accuracy. ST-SIGMA with the attention-based fusion approach obtains the best performance, surpassing MotionNet by 2.3%.

FIGURE 8 The pixel-level categorisation predictions of three selected scenes; from left to right: the ground-truth category map, the predicted results of ST-SIGMA, and the predicted results of the baseline. ST-SIGMA, spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting

Furthermore, we analyse the effect of different weighted adjacency matrix kernel functions, which define the mutual influence between vertices and provide prior knowledge of the social relations among traffic agents. Since the graph interactions have a large impact on the future trajectories of agents, we specifically analyse the influence of the different kernel functions k1, k2, k3, k4 and k5 in Equation (14) on the trajectory forecasting performance (ADE) of multiple traffic agents, and the results are shown in Table 6.
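The kernels of Equation (14) are not shown here; a plausible reconstruction from the verbal descriptions that follow (ε is a small residual constant and σ the RBF bandwidth, both assumed notation) is

k_1(v_i^t, v_j^t) = 1
k_2(v_i^t, v_j^t) = \lVert v_i^t - v_j^t \rVert_2
k_3(v_i^t, v_j^t) = 1 / \big( \lVert v_i^t - v_j^t \rVert_2 + \epsilon \big)
k_4(v_i^t, v_j^t) = \exp\!\big( -\lVert v_i^t - v_j^t \rVert_2^2 / (2\sigma^2) \big)
k_5(v_i^t, v_j^t) = 1 / \lVert v_i^t - v_j^t \rVert_2 \ \text{if} \ \lVert v_i^t - v_j^t \rVert_2 \neq 0, \ \text{and } 0 \ \text{otherwise}    (14)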

In Equation (14), we set k1 = 1 as the baseline in the weighted adjacency matrix. The second kernel function k2 is defined as the Euclidean distance (L2 norm) between traffic agents to simulate their influence on each other. The third kernel k3 is defined as the inverse of the L2 norm, with a residual parameter ε added to the denominator to make sure the denominator is not equal to 0. The fourth kernel function k4 is calculated using the Gaussian radial basis function [50]. The fifth kernel k5 is also defined as the inverse of the L2 norm, but different from k3, we set k5 = 0 when the L2 distance between v_i^t and v_j^t is 0, where v_i^t and v_j^t represent the i-th agent and j-th agent at timestamp t. This indicates that two traffic agents are considered to be the same object when they are at the same location. The performances of these kernel functions are shown in Table 6. Through the ablation experiments, we can see that the best performance is produced by k5, since the future motion of traffic agents is more sensitive to the influence of other, similar objects physically close to them. Therefore, this study computes similarity measurements for traffic agents to address this issue; without this condition, the relationship between traffic agents cannot be correctly represented in the model. Therefore, this paper uses k5 to define the adjacency matrix in all experiments.

    TABLE 6 Ablation analysis about the influence of different kernel functions on trajectory forecasting performance

In addition, we also try to find the optimal number of input BEV frames for trajectory forecasting performance. To this end, we draw the curves in Figure 9 to show the relationship between the number of input frames and the average displacement errors of trajectory forecasting. It is observed that when the number of input frames increases from 1 to 5, all ADEs consistently decrease. However, the time and space complexity increase significantly with the number of input frames, so we need to make a trade-off between the number of input frames and the model performance. As shown in Figure 9, when the number of frames exceeds 5, the model performance gradually saturates and even decreases for fast-moving traffic agents. Hence, we set the number of input frames to 5 for all experiments.

    5|CONCLUSIONS

This paper proposes ST-SIGMA, a unified framework for simultaneous scene perception and trajectory forecasting. The proposed method can jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents, which is crucial for real-world AD systems. Experimental results show that the proposed ST-SIGMA framework outperforms the SOTA method with a 4.4% higher MCA for pixel categorisation and a 2.3% lower ADE for trajectory forecasting. In future work, we intend to utilise multimodal data fusion (LiDAR, camera, HD-Map) and adopt traffic rules to further improve the performance of scene perception and trajectory forecasting while maintaining an acceptable complexity cost.

FIGURE 9 Ablation analysis of the effect of the number of input frames on trajectory forecasting performance. Considering the accuracy-efficiency trade-off, we chose 5 as the optimal number of input frames for all experimental settings. ADE, average displacement error

    ACKNOWLEDGEMENTS

This work was supported in part by the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN202100634 and No. KJZD-K201900605), the National Natural Science Foundation of China (No. 62006065), and the Basic and Advanced Research Projects of CSTC (No. cstc2019jcyj-zdxmX0008).

    CONFLICT OF INTEREST

The authors declare that they have no conflict of interest related to this work.

    DATA AVAILABILITY STATEMENT

    Research data are not shared.

    ORCID

Yang Fang: https://orcid.org/0000-0001-6705-4757
