
      Visual SLAM in dynamic environments based on object detection

Defence Technology, 2021, Issue 5

Yong-bao Ai, Ting Rui, Xiao-qing Yang *, Jia-lin He, Lei Fu, Jian-bin Li, Ming Lu

a College of Field Engineering, People's Liberation Army Engineering University, Nanjing, 210007, China

b JinKen College of Technology, Nanjing, 211156, China

c Research Institute of Chemical Defense, Academy of Military Sciences, Beijing, 102205, China

Keywords: Visual SLAM; Object detection; Dynamic object probability model; Dynamic environments

ABSTRACT A great number of visual simultaneous localization and mapping (VSLAM) systems need to assume static features in the environment. However, moving objects can vastly impair the performance of a VSLAM system that relies on the static-world assumption. To cope with this challenging topic, a real-time and robust VSLAM system for dynamic environments, based on ORB-SLAM2, is proposed. To reduce the influence of dynamic content, we incorporate a deep-learning-based object detection method into the visual odometry; a dynamic object probability model is then added to raise the efficiency of the object detection network and enhance the real-time performance of our system. Experiments on both the TUM and KITTI benchmark datasets, as well as in a real-world environment, show that our method can significantly reduce tracking error and drift and enhance the robustness, accuracy and stability of the VSLAM system in dynamic scenes.

1. Introduction

With the fast development of urbanization, the city will become the main scenario for future wars based on man-machine coordinated combat and unmanned combat. Achieving effective reconnaissance of the urban combat environment before or during a war has become one of the key factors for victory in urban warfare. The rise of simultaneous localization and mapping (SLAM) technology enhances the capacity to reconnoiter urban battlefield environments in real time. In the field of mobile robot research, map construction technology is the core of perception, modeling, planning and understanding in an unknown environment. SLAM uses the data captured by exteroceptive sensors to self-locate and build a map of the surrounding environment at the same time. This technology can be utilized in unmanned aerial vehicles (UAVs), unmanned ground vehicles (UGVs) and so on, which must self-locate and navigate in order to scout the urban battlefield environment and carry out precise strikes on military targets.

After decades of rapid development, numerous mature SLAM solutions have emerged, such as PTAM [1], LSD-SLAM [2], DSO [3], ORB-SLAM2 [4] and VINS-Mono [5], and SLAM has become a pivotal technology in the fields of autonomous robots and computer vision. However, to simplify the problem formulation, most state-of-the-art SLAM works depend on the core underlying assumption of a static environment containing only rigid, non-moving objects. Robust and accurate pose estimation and localization in dynamic scenes still face crucial challenges, mainly in two respects: on the one hand, it is not easy to define a dynamic object from planar pixels; on the other hand, it is difficult to detect and track dynamic objects across sequential images.

The lack of robustness in dynamic environments is the core problem facing robotic perception today. The conventional technique for handling moving objects in the SLAM optimization process is to detect them and then either regard them as outliers [6,7] or track them separately using traditional multi-target tracking [8]. Nevertheless, dynamic objects, such as humans, exist in many real-life environments. While a small portion of dynamic objects can be handled by treating them as noise, a large proportion of dynamic objects violates the static environment assumption, which limits the use of many existing visual odometry methods in real applications.

In this study, we put forward a visual SLAM method for dynamic environments, which combines deep-learning-based object detection with a dynamic object probability model to distinguish static and dynamic areas of the scene effectively and accurately. Our experiments verify that the resulting VSLAM shows improved accuracy and robustness in dynamic scenarios. The main contributions of our work are summarized as follows:

• A new SLAM framework based on ORB-SLAM2 combined with object detection is proposed to reduce the influence of moving objects on camera pose estimation and dense 3D point cloud mapping. The deep object detection neural network serves as a preprocessing stage to filter out data related to dynamic targets from the static scene.

• A novel dynamic object probability model is put forward to enhance our VSLAM system's ability to separate dynamic objects from static scenes. It calculates the moving probability of each keyframe point and updates and propagates the moving probability of feature points and map points in the tracking thread of our SLAM.

In the rest of the paper, Section 2 describes relevant work on dynamic SLAM systems and on object detection deep learning methods adopted in VSLAM. Section 3 provides the architecture of our SLAM system and the details of our approach to vision-based SLAM in dynamic environments. Afterward, Section 4 shows the qualitative and quantitative performance of our SLAM system on the TUM [9] and KITTI [10] benchmark datasets and in dynamic indoor scenarios, revealing the effectiveness, availability and robustness of the system. Finally, Section 5 presents the conclusions of this work and briefly discusses research directions that could improve our approach.

2. Related work

The SLAM in dynamic environments. As early as 2003, Wang et al. [11] proposed a Bayesian formulation of SLAM in dynamic environments, providing a solid foundation for understanding and solving the DATMO problem [12]. In 2007 they put forward a mathematical framework to integrate SLAM and moving object tracking; the SLAM and moving object tracking (SLAMMOT) problem was introduced, and they argued that the two tasks are mutually beneficial [13]. Soon afterward, more and more scholars paid attention to this field, and related research works emerged in large numbers. Bleser et al. describe a conditional probability framework in which the model is initialized using a CAD model of one object in a dynamic environment [14]. With the RANSAC algorithm [15], the camera pose and structure reconstruction are acquired, and the uncertainty present in computer vision research is taken into account in a probabilistic way. In the literature [16], an approach that combines a least-squares formulation of SLAM and sliding window optimization with generalized expectation maximization is presented to directly cope with both dynamic and static objects in a SLAM system. Imre et al. use a multiple camera setup to estimate the pose trajectory and map in generic dynamic environments: static cameras are utilized to build a sparse 3D model of the scenario, with respect to which the moving camera pose is estimated at every moment [17]. Similarly, in reference [18], a multi-robot simultaneous localization and tracking (MR-SLAT) algorithm based on the extended Kalman filter is proposed, which maintains the relation between teammate robots and dynamic objects via an augmented state. Walcott-Bryant et al. describe a Dynamic Pose Graph (DPG-SLAM) which enables a robot with laser sensors to remain localized in a challenging dynamic scene over time [19], keeping an efficient and up-to-date map in long-term, low-dynamic environments. In Ref. [20], Einhorn and Gross combine normal distribution transform and occupancy mapping with a method for detecting and tracking moving objects in highly dynamic scenes. In reference [21], a robust visual odometry combining a Kinect-style RGB-D camera with an IMU, which generates 3D feature points by integrating depth information with RGB color information, is applied to a highly dynamic environment. Likewise, in Ref. [22] an IMU is incorporated into a monocular visual odometry algorithm based on the extended Kalman filter, within which multilevel patch features are directly tracked. Rünz and Agapito present a method that explicitly deals with dynamics and enforces both spatial and temporal coherence by combining motion segmentation with object instance segmentation, reconstructing and tracking moving objects with separate 3D models [23]. In reference [24], a novel method is proposed to segment and track multiple moving objects in real dynamic scenarios utilizing the ML-RANSAC algorithm. Barnes et al. use an offline multi-session mapping approach to generate a pixel-wise ephemerality mask and depth map for a monocular visual odometry, reducing drift and improving ego-motion estimation in challenging urban traffic environments [25]. In reference [26], a dense RGB-D SLAM approach that detects moving objects while reconstructing the background structure is proposed to enhance autonomous robotic capability in complex and highly dynamic scenes. In the recent work [27], an innovative sparse motion removal (SMR) model differentiates dynamic regions from static areas in each input frame; the dynamic regions are then discarded and the static ones are fed into a feature-based SLAM for visual localization.

Object detection used in VSLAM systems. In recent years, since deep learning has achieved important advances in object detection, the autonomous robotics community has combined deep learning with SLAMIDE issues [28]. Ekvall et al. propose a novel object detection method named Receptive Field Co-occurrence Histograms (RFCH), embedded into a SLAM system to provide rich representations of natural scenes; the system is able to address problems such as complex backgrounds, dynamic scenes and varying illumination [29]. In articles [30,31], Wang et al. present a novel moving object detection (MOD) algorithm used in a robot mono-SLAM system for dynamic environments. Similarly, Gálvez-López et al. combine a novel object recognition method based on bags of binary words [32] with a monocular SLAM so that the two work together and benefit each other in real time [33]. In addition, a novel object detection method based on DPM [34] is used to add semantic information to the sparse map of a monocular SLAM [35]. In another unique approach, fast generic object detection, associating prior height information with image region sizes, is integrated into a Kalman-filtering-based monocular SLAM in order to address the intrinsic problem of unknown global scale in a mono-SLAM system [36]. In the literature [37], the SSD object detection algorithm [38] and an improved version of the 3D unsupervised segmentation method proposed in Ref. [39] are incorporated into the RGB-D ORB-SLAM2 framework, and a semantic map can be built and updated on the fly. Likewise, Zhong et al. also combine the SSD object detector with RGB-D ORB-SLAM2 in dynamic and complex environments, making the object detector and SLAM mutually beneficial in their Detect-SLAM framework [40]. In the last two years, the feature-based S-PTAM [41] framework has been integrated with a modified Faster R-CNN network [42], which is applied to estimate class, Bounding Box data, orientation and dimensions [43]. Xiao et al. put forward Dynamic-SLAM based on the monocular ORB-SLAM2 system, which makes use of the SSD object detector to detect moving objects at the semantic level and a missed-detection compensation algorithm to improve the recall of the detector [44]. In this work, from a much less explored perspective, we present a robust visual SLAM system based on deep-learning-based object detection and a dynamic object probability model, which is primarily appropriate for dynamic environments.

Fig. 1. The overview of our VSLAM. Front-end procedures: tracking and object detection; back-end procedures: local mapping, loop closing and global optimization.

3. System description

In this section, the technical details of our SLAM system are elaborated. The section is split into four parts. First, the framework of our SLAM is presented. Next, a brief introduction of the object detection method adopted in the system is given. Then the dynamic object probability (DOP) model that we put forward to estimate motion likelihoods among sequential frames is illustrated at length. Finally, the means of discarding outliers in our system is presented.

3.1. Overview

The critical task of visual SLAM in dynamic environments is to locate dynamic objects in the real environment and estimate the camera pose accurately. To the best of our knowledge, ORB-SLAM2 has outstanding performance in various scenarios, indoor or outdoor, and has been applied to drones and driverless cars. However, ORB-SLAM2 is based on the static-world assumption and still has shortcomings in handling dynamic scenes, so we integrate our method into the ORB-SLAM2 system in order to enhance its robustness and stability in highly dynamic environments.

The block diagram of the system is shown in Fig. 1. Our system has four main parallel threads: tracking, object detection, local mapping and loop closing. First, the raw stereo RGB images or RGB and depth frames are processed in the tracking thread. The tracking thread extracts keyframes from the sequential RGB frames and sends them to the object detection thread. In this thread, only keyframes go through the deep object detection convolutional neural network, and predefined dynamic categories of objects (such as pedestrians, cars and riders) are detected accurately. Afterward, the DOP model is built in order to discriminate dynamic and static areas and to propagate the dynamic object probability through feature matching and matching point expansion in the tracking thread. Furthermore, most ORB features pertaining to moving regions are discarded, and an exact camera pose estimate is acquired. Finally, a fifth thread is launched to perform full BA after the local mapping and loop closing threads have run.
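To make the tracking-to-detection hand-off concrete, the following minimal sketch mimics the thread layout described above. It is a hypothetical illustration with stub functions, not the authors' implementation (which builds on ORB-SLAM2 in C++); the keyframe policy and detector output are placeholders.

```python
import queue
import threading

keyframe_queue: queue.Queue = queue.Queue()  # tracking -> object detection

def is_keyframe(frame_id: int) -> bool:
    # Stub keyframe policy: every 5th frame (the real system uses
    # ORB-SLAM2's keyframe selection criteria).
    return frame_id % 5 == 0

def detect_objects(frame_id: int):
    # Stub detector: would run YOLOv4 on the keyframe and return
    # Bounding Boxes of the predefined dynamic classes.
    return [("person", (120, 80, 100, 200))]

def tracking(frames):
    for fid in frames:
        # Here: extract ORB features, propagate moving probabilities,
        # discard dynamic points, then estimate the camera pose.
        if is_keyframe(fid):
            keyframe_queue.put(fid)
    keyframe_queue.put(None)  # sentinel to stop the detector thread

def object_detection():
    while (fid := keyframe_queue.get()) is not None:
        boxes = detect_objects(fid)
        print(f"keyframe {fid}: {boxes}")  # results feed the DOP model

t = threading.Thread(target=object_detection)
t.start()
tracking(range(20))
t.join()
```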

3.2. Object detection

The core task of object detection is to accurately and efficiently identify and locate object instances of predefined categories in images. This technique is vital to many real-world applications such as automated driving, intelligent transportation and video surveillance. In the proposed system, a state-of-the-art detector, YOLOv4, an efficient and powerful object detection model implemented by Alexey (https://github.com/AlexeyAB/darknet), is adopted to predict the classes and bounding boxes of objects in real time [45]. YOLOv4 is designed to be faster, to have fewer parameters and to be more accurate than the SSD used in reference [46], and it can be easily trained and used. The object detector network, trained on the MS COCO dataset [47], can detect 80 classes in total and acquire real-time, high-quality and persuasive object detection results.
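As one illustration of how such a detector can be invoked, the hedged sketch below loads YOLOv4 weights from the AlexeyAB repository through OpenCV's DNN module; the file paths, input size and thresholds are assumptions, and this is not necessarily how the authors integrated the detector.

```python
import cv2

# Assumed local copies of the config/weights from
# https://github.com/AlexeyAB/darknet (paths are placeholders).
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

image = cv2.imread("keyframe.png")  # a color keyframe
class_ids, scores, boxes = model.detect(
    image, confThreshold=0.5, nmsThreshold=0.4)
for cid, score, box in zip(class_ids, scores, boxes):
    x, y, w, h = box  # box in (x, y, width, height) pixel coordinates
    print(f"class {cid}: score {score:.2f}, box {(x, y, w, h)}")
```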

The YOLOv4 network takes a color keyframe as input and outputs the corresponding Bounding Boxes, which label areas in the keyframe with the predetermined categories, e.g., cars, persons, laptops and so on. The Bounding Box of each class can be easily used in the VSLAM system to separate movable object regions from the static background region. The detector results for the keyframe are then input to the dynamic object probability model of our system; the detailed implementation is shown in Section 3.3.
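The separation itself reduces to a point-in-box test. Below is a minimal sketch, with hypothetical names and an assumed set of dynamic classes, of deciding whether an ORB feature point falls inside a dynamic-object Bounding Box (this decision becomes the state S_t(x_i) in Section 3.3).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)

DYNAMIC_CLASSES = {"person", "car", "rider"}  # assumed dynamic categories

def instant_state(point: Tuple[float, float],
                  detections: List[Tuple[str, Box]]) -> int:
    """Return 1 if the feature point falls inside a dynamic-object
    Bounding Box, else 0."""
    px, py = point
    for label, (x, y, w, h) in detections:
        if label in DYNAMIC_CLASSES and x <= px <= x + w and y <= py <= y + h:
            return 1
    return 0

# Example: a feature at (150, 120) inside a detected person box.
print(instant_state((150, 120), [("person", (120, 80, 100, 200))]))  # -> 1
```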

3.3. Dynamic object probability (DOP) model

In research on visual SLAM in dynamic environments, the optical flow method has been used to detect which pixels of a frame correspond to moving objects [48,49]. However, the movement of the camera itself inevitably introduces errors into the optical flow field, and when the camera viewpoint changes substantially, the performance of the optical flow method deteriorates. In order to differentiate between dynamic and static regions efficiently and accurately, the DOP model is proposed in this paper. The probability that a feature point belongs to a moving object is called the dynamic object probability. As shown in Fig. 2, the moving probability of ORB feature points on a keyframe is divided into four states. We then utilize the high-confidence points to propagate the moving probability to adjacent unmatched ORB feature points during matching point expansion.

The purpose of the DOP model is to update and propagate the moving probability of feature points on keyframes in the tracking thread, so that the efficiency of object detection toward dynamic objects is greatly improved. Considering the spatial-temporal consistency of image sequences, only keyframes are selected to be processed in the object detection thread; the DOP model is then built and the moving probability is propagated frame by frame in the tracking thread. Furthermore, the moving probability P_t(X_i) of 3D points which have matched ORB feature points in the keyframe is constantly updated in the local map with the following equation:

P_t(X_i) = (1 − α) · P_{t−1}(X_i) + α · S_t(x_i)    (1)

Fig. 2. Four states of the moving probability of ORB feature points.

where P_{t−1}(X_i) denotes the moving probability of the 3D point X_i after the update in the last keyframe K_{t−1}. If it is the first observation of the point, we set P_{t−1}(X_i) = P_init = 0.5. S_t(x_i) is the state of the matched ORB feature point x_i in the current keyframe K_t, which depends on the detected areas: if the ORB feature point x_i falls into the Bounding Box of a dynamic object, it is treated as a determinate moving point, whose state value is S_t(x_i) = 1; the remaining points are regarded as determinate static points, with S_t(x_i) = 0. The factor α smooths the instant detection results: a higher value means more sensitivity to the instant detection results, while a lower value means that more historical results from multiple views are considered. We set α = 0.3 in our RGB-D SLAM experiments and α = 0.5 in our stereo SLAM system, which suits complex and highly dynamic environments well.
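Reading Eq. (1) as the convex combination reconstructed above, the keyframe update reduces to a couple of lines. The sketch below reflects that reading and is illustrative only, not the authors' code.

```python
P_INIT = 0.5  # prior for points observed for the first time

def update_moving_probability(p_prev: float, s_t: int,
                              alpha: float = 0.3) -> float:
    """P_t(X_i) = (1 - alpha) * P_{t-1}(X_i) + alpha * S_t(x_i).

    alpha = 0.3 in the RGB-D experiments, 0.5 in the stereo system."""
    return (1.0 - alpha) * p_prev + alpha * s_t

# A point repeatedly detected inside a dynamic Bounding Box drifts toward 1.
p = P_INIT
for _ in range(5):
    p = update_moving_probability(p, s_t=1)
print(round(p, 3))  # ~0.916
```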

The moving probability of each ORB feature point is estimated and updated frame by frame via two means: feature matching and matching point expansion. As shown in Fig. 3, the moving probability of ORB feature points in the current frame is propagated from the points in the last frame. In the process of feature matching, when an ORB feature point x_i is matched to a point in the last frame, the moving probability of the matched point is propagated to it.

In addition, when an ORB feature point is matched to a 3D point in the local map, it is also given a moving probability whose value is the same as that of the matched map point. Furthermore, if a feature point has matched points both in the last frame and in the local map, we select the probability value of the local map as its moving probability. An initial probability value P_init is then assigned to the other, unmatched points in the frame; P_init is set to 0.5, because we have no prior knowledge about which state those points belong to. The rule for propagating the moving probability through feature matching is as follows:

P(x_i) = P(X_j),        if x_i is matched to the map point X_j;
P(x_i) = P_{t−1}(x'_i), if x_i is matched only to the feature point x'_i in the last frame;
P(x_i) = P_init,        otherwise.    (2)

Subsequently, matching point expansion is designed to expand the moving probability from the high-confidence points to adjacent points that do not correspond to any matched points in the feature matching process. It is based on the idea that neighboring points are in the same state in most cases. After the moving probability is propagated through feature matching, high-confidence points, which include both static and dynamic feature points, are selected, and their influence area is expanded to a round region with radius γ. We then search for unmatched points within the area. The probability value of the found points is updated following equation (3):

where P_init denotes the initial moving probability. Note that if a point is influenced by more than one high-confidence point, the influences of all these adjacent high-confidence points are summed. The influence of a high-confidence point is formulated in terms of the difference between its moving probability and P_init, where λ(d) represents a distance factor and C denotes a constant value.
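Since the exact form of Eq. (3) is not reproduced in this text, the sketch below only illustrates one plausible reading: the distance-weighted influences λ(d) · (P(x_k) − P_init) / C of all nearby high-confidence points x_k are summed onto the unmatched point's initial probability. The linear form of λ(d), the radius value and the constant C are all assumptions.

```python
import math
from typing import List, Tuple

P_INIT = 0.5
GAMMA = 15.0  # assumed expansion radius in pixels
C = 1.0       # assumed normalization constant from Eq. (3)

def distance_factor(d: float, gamma: float = GAMMA) -> float:
    # Assumed form of lambda(d): linear decay to zero at the radius.
    return max(0.0, 1.0 - d / gamma)

def expand_probability(unmatched_pt: Tuple[float, float],
                       high_conf: List[Tuple[Tuple[float, float], float]]
                       ) -> float:
    """Sum the influences of all high-confidence points within GAMMA."""
    p = P_INIT
    for (hx, hy), hp in high_conf:
        d = math.hypot(unmatched_pt[0] - hx, unmatched_pt[1] - hy)
        if d <= GAMMA:
            p += distance_factor(d) * (hp - P_INIT) / C
    return min(max(p, 0.0), 1.0)  # clamp to a valid probability

# An unmatched point near a confident dynamic point inherits a high value.
print(expand_probability((105.0, 100.0), [((100.0, 100.0), 0.95)]))  # -> 0.8
```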

3.4. Outlier removal

Fig. 3. Diagram of the frame-by-frame propagation of the moving probability. The size of the points represents the confidence.

A core problem for visual SLAM in dynamic scenarios is the rejection of landmarks that actually lie on moving objects. Nevertheless, if the object detection network alone is used to differentiate static and dynamic areas of the image, it will fail to estimate lifelong models when a priori dynamic objects remain static, e.g., parked cars or sitting people. On the other hand, in extremely challenging scenarios where moving objects can cover almost the whole image view, some remaining correspondences declared as inliers may still belong to moving objects. For these reasons, we propose to integrate object detection with the DOP model to perform outlier removal successfully.

Next, we explain how to identify areas in the image that belong to moving objects by combining object detection with the DOP model. By means of the object detection algorithm expounded in Section 3.2, the Bounding Boxes of the predefined dynamic and static object categories in the keyframe are obtained accurately. Then the DOP model is built to estimate and update the moving probability of each ORB feature point frame by frame, whereupon we are able to differentiate between static and dynamic points precisely and efficiently. After that, in the tracking thread, we delete ORB feature points located on dynamic parts of the scene, as well as moving map points in the local map, before the camera pose estimation. Finally, more robust and accurate camera ego-motion results are acquired through our VSLAM system. The whole process of outlier removal is described in Fig. 4: the outlier removal of our RGB-D SLAM is shown in the top row and the bottom row represents our stereo SLAM, while the results of ORB-SLAM2 are on the right side of the demarcation line. Compared with ORB-SLAM2, the ORB feature points on moving objects are deleted completely by our means; it is obvious that the dynamic ORB feature points are removed and only the remaining static points are used to perform the camera pose estimation. In addition, the probability distribution of dynamic 3D points in the local map is updated in quick succession.

Fig. 4. The process of discarding outliers.

Notably, the object detector has an inherent problem: the Bounding Box of an object often exceeds the range of the object's profile, which leads to overlapping Bounding Boxes. We therefore put forward an algorithm, shown in Algorithm 1, to compute the moving probability of ORB feature points in these regions.

Algorithm 1. Computing the moving probability of ORB feature points in the overlapping areas of several Bounding Boxes.
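The body of Algorithm 1 is not reproduced in this extract, so the following sketch is only a plausible reading, not the authors' algorithm: a point falling in the overlap of dynamic and static Bounding Boxes is treated as ambiguous, and its instant state is left undecided so that the prior probability is kept.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
DYNAMIC_CLASSES = {"person", "car", "rider"}  # assumed dynamic categories

def state_with_overlap(point: Tuple[float, float],
                       detections: List[Tuple[str, Box]]) -> Optional[int]:
    """Return 1 (dynamic), 0 (static), or None when the point lies in
    the overlap of dynamic and static boxes (ambiguous -> no update)."""
    px, py = point
    in_dynamic = in_static = False
    for label, (x, y, w, h) in detections:
        if x <= px <= x + w and y <= py <= y + h:
            if label in DYNAMIC_CLASSES:
                in_dynamic = True
            else:
                in_static = True
    if in_dynamic and in_static:
        return None  # ambiguous overlap: keep the prior probability
    return 1 if in_dynamic else 0

dets = [("person", (100, 100, 80, 160)), ("chair", (150, 180, 100, 100))]
print(state_with_overlap((160, 200), dets))  # overlap region -> None
```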

4. Experiments

In this section, the performance of our visual SLAM is analyzed both quantitatively and qualitatively. We evaluate our system in dynamic environments on the public TUM [9] and KITTI [10] datasets and test the tracking time of the system. Furthermore, we integrate our RGB-D SLAM with ROS and qualitatively demonstrate it on a physical robot in a dynamic indoor environment to evaluate its accuracy, robustness and real-time performance. All of the experiments are carried out on a computer with an Intel i7 CPU, a TITAN GPU and 12 GB memory. The physical robot is a TurtleBot3, and image sequences are captured by the Azure Kinect DK sensor, which provides highly accurate color and depth images and camera calibration parameters.

4.1. Evaluation using the benchmark datasets

The TUM RGB-D dataset provides many sequences in dynamic indoor scenes with accurate ground-truth data. It contains walking, sitting and desk sequences; the walking sequences are mainly utilized for our experiments, since they are highly dynamic scenarios in which two persons walk back and forth. Furthermore, the KITTI dataset, captured by driving around a mid-size city, in rural areas and on highways, provides many sequences in dynamic scenes with accurate ground-truth trajectories directly obtained from the output of the GPS/IMU localization unit projected into the coordinate system of the left camera after rectification. Sequence 01 is captured by driving on highways, and sequence 04 on a city road; these two sequences are mainly used in our experiments because they are highly dynamic environments.

In this section, in order to verify the effectiveness of the proposed method in dynamic environments, we compare it with the ORB-SLAM2 system, which is widely accepted as one of the most robust state-of-the-art VSLAM systems at present. The metric of absolute pose error (APE) is well suited to measuring the performance of the visual system, and the metric of relative pose error (RPE) is utilized to measure the drift of the visual odometry, so we compute APE and RPE for the quantitative evaluation. The values of the root-mean-square error (RMSE) and standard deviation (STD) best demonstrate the robustness and stability of the system, so they are chosen as the measures of evaluation.
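For reference, here is a minimal sketch of how the RMSE and STD of the translational APE can be computed from associated, aligned trajectories. The actual evaluation presumably relied on a standard tool (e.g., the TUM benchmark scripts), so this is illustrative only.

```python
import numpy as np

def ape_stats(est_xyz: np.ndarray, gt_xyz: np.ndarray):
    """RMSE and STD of the translational absolute pose error.

    est_xyz, gt_xyz: (N, 3) arrays of associated, aligned positions [m]."""
    errors = np.linalg.norm(est_xyz - gt_xyz, axis=1)  # per-pose APE
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    std = float(np.std(errors))
    return rmse, std

# Toy example with a small constant offset on one axis.
gt = np.zeros((100, 3))
est = gt + np.array([0.02, 0.0, 0.0])
print(ape_stats(est, gt))  # -> (0.02, 0.0)
```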

The results of our system and RGB-D ORB-SLAM2 on seven sequences of the TUM dataset are shown in Table 1 and Table 2. As far as APE is concerned, the improvements in RMSE and STD reach up to 98.29% and 98.07%, respectively; for RPE, the improvements in RMSE and STD reach up to 63.53% and 77.68%. As can be seen, compared with RGB-D ORB-SLAM2, our SLAM system performs outstandingly in most highly dynamic sequences and achieves an order-of-magnitude enhancement. This indicates that the robustness and stability of the visual SLAM system are significantly improved by our means in highly dynamic environments. However, in the static or low-dynamic cases, the improvement in APE is small and the RPE of a few static or low-dynamic sequences is not improved, so the performance of our system is almost similar to that of ORB-SLAM2.

Table 1. Results of the absolute pose error metric (APE [m]).

Table 2. Results of the relative pose error metric (RPE [m]).

Fig. 5. The results of APE in the fr3/walking_xyz sequence. (a) and (b) are the trajectories of ORB-SLAM2 and our SLAM compared with the ground truth, respectively; (c) and (e) are the contrastive curves among the three trajectories in xyz and rpy mode, respectively; (d) is the statistical comparison of ORB-SLAM2 with our SLAM; (f) shows the box plots of the APE of ORB-SLAM2 and our SLAM.

Fig. 5 and Fig. 6 show selected APE and RPE plots of ORB-SLAM2 and our SLAM in the highly dynamic fr3/walking_xyz sequence. It is obvious that the errors are significantly decreased and that our SLAM system is more robust and stable than RGB-D ORB-SLAM2 in highly dynamic scenarios.

Fig. 7 and Fig. 8 show the APEs and RPEs of our stereo SLAM on five sequences of the KITTI Odometry dataset. From the histograms, it can be seen that our system significantly outperforms stereo ORB-SLAM2 in the highly dynamic sequences 01 and 04, and the trajectories estimated with our VSLAM are more accurate than those of ORB-SLAM2. Compared with the scene of sequence 04, sequence 01 is more intricate, from which it can be inferred that our VSLAM system will be more stable and robust in more challenging dynamic environments. Moreover, our system has performance comparable to ORB-SLAM2 with respect to tracking accuracy in the static sequences (00, 02, 03).

Fig. 6. The results of RPE in the fr3/walking_xyz sequence. (a) shows the RPE curves of ORB-SLAM2 and our SLAM; (b) is the statistical comparison of ORB-SLAM2 with our SLAM; (c) shows the violin plots of the RPE of ORB-SLAM2 and our SLAM.

Fig. 7. Absolute pose errors on the 5 sequences of the KITTI Odometry dataset. Lower is better.

Fig. 8. Relative pose errors on the 5 sequences of the KITTI Odometry dataset. Lower is better.

4.2. Timing analysis

In most real-world applications of visual SLAM, e.g., autonomous robots, driverless cars and augmented reality, real-time performance is a vital indicator for evaluating a SLAM system. We therefore test the tracking times on sequence 05 of the KITTI dataset. Table 3 shows the tracking times. It can be seen that the computation time of our system is similar to that of ORB-SLAM2 and sufficiently short for real-time applications.

Table 3. The mean and median tracking times of the two frameworks.

4.3. Evaluation in a real environment

A majority of SLAM systems perform registration between the current frame and the last frame, and the estimated transformation between these frames is assumed to be due to camera motion, which leads to irreversible corruption because dynamic objects are likely to be fused into the map. To demonstrate the robustness and usability of our system, we carry out a real-world RGB-D SLAM experiment in a dynamic indoor scene, with a person walking straight in front of a desk in a laboratory. Images with a resolution of 640 × 576, captured at 30 Hz, were processed on an NVIDIA Jetson TX2 platform. The indoor sequence was composed of a total of 662 pairs of color and depth images gathered by the Azure Kinect DK RGB-D camera in the laboratory. The total duration of the experiment was about 85 s, and the full trajectory was approximately 4.6 m long from the initial camera position.

Fig. 9 depicts the dense mapping performed by our RGB-D SLAM system in the laboratory. As we can observe in Fig. 9 (a), ORB-SLAM2 is not able to distinguish the moving person walking in front of the desk and builds a series of dense point clouds of the person's profile into the map, since its algorithm is based on the static-world assumption and the moving person is mistaken for a static object. In contrast, in Fig. 9 (b), our system deals with dynamic objects accurately: although slight distortion occurs in the reconstructed desk where the dynamic person passed by, the profile of the moving person is discarded thoroughly. This fully proves that our system is more robust, accurate and stable than ORB-SLAM2 in dynamic environments.

Fig. 9. Dense 3D reconstruction point cloud maps. In (a), the profile of the moving person is shown in the red dotted box and the red arrow points in the direction of the person's movement.

5. Conclusion

In this paper, we have proposed a real-time visual SLAM system that performs well in highly dynamic environments with many independently moving objects. Object detection and a dynamic object probability model are incorporated into the ORB-SLAM2 system, which yields significant improvements in highly dynamic environments. In addition, for the inherent problem that object detection hardly obtains a Bounding Box that fits the object well, an algorithm is proposed to settle this effectively. By these means, we achieve effectiveness similar to the pixel-wise precision of semantic segmentation methods while running faster than semantic segmentation. Finally, we present experimental results illustrating the improved accuracy of our proposed means and the efficiency and usability of our implementation, indicating that we are able to deal with challenging dynamic objects and considerably improve visual odometry estimation. Consequently, more robust and accurate localization and mapping results are obtained in highly dynamic scenarios.

Nevertheless, there is room for improvement. In our visual SLAM system, the deep neural network used in the object detection thread is a supervised method; that is to say, the detector model may hardly predict correct results when there are significant differences between the training scenes and the actual scenes. In future work, we could employ self-supervised or unsupervised deep learning approaches in order to overcome this issue.

      Declaration of competing interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Acknowledgements

The research presented in this paper has been funded by the National Natural Science Foundation of China (No. 61671470).
