Tingting Li·Haowei Zhu·Chunhe Hu·Junguo Zhang
Abstract Most existing deep learning methods rely on large amounts of annotated data, so they are inappropriate for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates a convolutional block attention module, extracts high-level and discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting issue caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset verify that the proposed method is superior to state-of-the-art few-shot learning approaches.
Keywords Forest fire smoke detection·Few-shot learning·Channel attention module·Spatial attention module·Prototypical network
Forest fires are fast-spreading and difficult to extinguish due to the large amount of combustible material. They not only cause serious damage to human life and property, but also lead to the rapid degeneration of the ecological environment (Peng and Wang 2019). Therefore, early detection of forest fires is essential for disaster reduction. Compared with sensors, image-based fire detection methods are more suitable for outdoor environments such as forests, mountains, and parking areas (Frizzi et al. 2016). These methods are generally divided into flame detection and smoke detection (Barmpoutis et al. 2020). Smoke is an important sign for early fire detection because it spreads faster than flame and moves over a wide area (Wang et al. 2019). Conventional image-based fire smoke detection methods rely on handcrafted low-level features (e.g., color, texture, and shape) designed from experience. However, such methods can only be applied in specific scenarios, and their accuracy decreases once the environment changes (Hu and Lu 2018).
With the rapid development of artificial intelligence and computer vision technology, deep learning has made significant strides on several challenging visual tasks (Ferreira et al. 2020; Xie et al. 2019; Hu and Guan 2020; Liu et al. 2019). Fire smoke detection methods based on convolutional neural networks (CNNs) have recently drawn much attention and have significantly improved the accuracy of fire smoke detection. Compared with traditional image-based smoke detection approaches, these methods can extract deep features automatically during the feature extraction process. Moreover, they can be applied to variable wilderness environments. However, the false alarm rate of forest fire smoke detection increases when smoke accounts for only a small portion of the image (Li et al. 2018). This is because convolutional neural networks have difficulty focusing on small smoke regions and cannot extract discriminative smoke features (Li et al. 2018). In this case, the network tends to rely on the image background rather than the smoke itself. In addition, these images often have complex backgrounds and may contain suspected smoke areas such as shadows, clouds, haze, and fog. These areas are one of the key challenges of small smoke detection.
Recently, few-shot learning (FSL) methods have become the main approach to solving the overfitting problem caused by limited training data. The success of fire smoke detection methods based on deep learning can be partially attributed to large amounts of annotated data (Xue and Wang 2020). However, forest fire smoke images far away from the camera are very hard to capture in real life, and there are also few small smoke images in public forest fire smoke datasets. Therefore, it is necessary to propose an effective detection approach for small smoke based on limited training data. Few-shot learning follows the learning-to-learn paradigm and aims to solve a target classification task by learning a set of base classification tasks (Rizve et al. 2021). The target task dataset is divided into a support set and a query set. Few-shot learning methods are able to classify query images using a few annotated support images (Zhang et al. 2021). How to leverage information from the support images of small smoke is another key challenge in the small smoke detection scenario.
To address the aforementioned challenges, we propose a novel hybrid attention-based few-shot learning (FSL) algorithm, called Attention-Based Prototypical Network (ABPNet), for forest fire smoke detection. The proposed method consists of two modules: a feature extraction module and a meta-learning module. First, inspired by the fact that attention mechanisms can increase the discriminative ability of features weighted by attention maps, a CNN embedded with the Convolutional Block Attention Module (CBAM) is designed as the feature extraction module to extract high-level and discriminative forest fire smoke features. Second, a meta-learning module is implemented to compare class prototypes with query targets, thereby avoiding the overfitting problem of few-shot forest fire smoke detection. Finally, the performance of the hybrid attention-based few-shot learning algorithm is validated on our forest fire smoke dataset and the miniImageNet dataset.
The remainder of this paper is organized as follows. The Introduction reviews work related to fire detection, few-shot learning, and attention mechanisms. The proposed ABPNet method is systematically introduced in Materials and methods. In Results and discussion, we compare the performance of our method with classical few-shot learning algorithms. Finally, the paper is concluded in Conclusions.

In this section, we present the strengths and weaknesses of flame detection and smoke detection methods, then review state-of-the-art few-shot learning approaches for object detection, and finally introduce the application of different attention mechanisms.
Image-based fire detection methods can be classified into flame detection and smoke detection. Previous image-based approaches (Ko et al. 2009; Chen et al. 2010; Yuan 2012; Yuan et al. 2013) rely on prior knowledge to extract smoke and flame features manually and therefore lack robustness. Owing to their ability to automatically extract features and learn complex representations, fire detection methods based on deep neural networks have drawn much attention in recent years. Tao et al. (2016) proposed a novel smoke detection framework based on CNNs that extracts smoke features automatically. Shen et al. (2018) used an optimized YOLO model for flame detection from video frames. To effectively exploit long-range motion context and spatial representation, Yin et al. (2019) designed a novel recurrent motion-space context model. These methods not only overcome the limitations of conventional image-based methods, but also remarkably improve fire detection accuracy. To further improve fire detection performance and localize fires, many studies have integrated conventional machine learning algorithms into CNNs. Maksymiv et al. (2016) integrated AdaBoost, local binary patterns (LBP), and a CNN in a smoke detection algorithm to improve smoke detection performance and reduce time complexity. Luo et al. (2017) proposed a strategy to implicitly enlarge the suspected regions and designed a CNN-based smoke detection algorithm via the motion characteristics of smoke. Barmpoutis et al. (2019) combined the power of Faster R-CNN with multidimensional dynamic texture analysis based on higher-order LDSs for flame detection. Although the flame detection methods mentioned above are widely used for fire detection, they may not be suitable for complex forest environments. It is almost impossible to view the flame from surveillance cameras in time because of the amount of cover and the fact that flames have a narrower spread than smoke (Shi et al. 2018).
Few-shot learning (FSL) methods (e.g., data augmentation (Perez and Wang 2017), transfer learning (Sun et al. 2020), and meta-learning (Wu et al. 2020a)) have previously been employed to tackle the problem of overfitting and to relieve the difficulty and cost of large-scale image annotation (Wang et al. 2020a). Xu et al. (2017) proposed a deep domain adaptation method for forest fire smoke detection and trained it on synthetic as well as real forest fire smoke images, which demonstrated the effectiveness of synthetic smoke images in deep learning. To improve the accuracy of smoke detection, an effective dense optical flow approach based on transfer learning was designed by Wu et al. (2020b). These methods can mitigate the overfitting issue and reduce the false alarm rate. However, data augmentation and transfer learning can only alleviate overfitting; the problem cannot be completely eliminated (Geng et al. 2019). Recent research has focused on solving this challenge via meta-learning. The goal of meta-learning-based FSL is to extract transferable meta-knowledge from a set of labeled examples so that the model generalizes directly to unseen classes (Hou et al. 2019). Vinyals et al. (2016) introduced a matching network with a weighted nearest neighbor classifier that is useful in labeling multiple components of one-shot classification tasks. Snell et al. (2017) first proposed the prototypical network for FSL, which represents each class by the mean of its support images. Boney and Ilin (2017) extended prototypical networks to augment training data via a soft-assignment method in a semi-supervised few-shot learning scenario. Numerous improved prototypical networks have been proposed. These methods are simple and efficient compared with recent meta-learning methods, while still producing state-of-the-art results (Snell et al. 2017).
Attention mechanisms improve the feature representations of networks by focusing on important features and ignoring unnecessary information, which is inspired by the human visual perception process (Mnih et al. 2014). Attention mechanisms were first applied to natural language processing (NLP) in (Bahdanau et al. 2015) and are now widely used in computer vision tasks. To perform explicit spatial transformations of features, Jaderberg et al. (2016) proposed a new self-contained module for neural networks called the spatial transformer, which gains accuracy across several tasks. Hu et al. (2017) concentrated on the channel relationship and proposed the Squeeze-and-Excitation (SE) block. This block not only improves the representational power of a network, but also brings significant improvements in performance without increasing the computational cost. To capture more sophisticated channel-wise dependencies, a number of improved SE blocks have been proposed (Wang et al. 2020b; Qin et al. 2021). An Efficient Channel Attention (ECA) module based on a local cross-channel interaction strategy was proposed by Wang et al. (2020b); it is extremely light-weight and shows good generalization ability in various visual tasks such as object detection and instance segmentation. In addition, many researchers have combined spatial attention and channel attention to design more sophisticated attention modules. Woo et al. (2018) presented the CBAM, a new efficient architecture that uses both spatial and channel-wise attention. Experiments on different benchmark datasets have demonstrated the superiority of this module over using only channel-wise attention. A Convolutional Triplet Attention Module (CTAM) capable of capturing cross-dimension interaction between spatial attention and channel attention was proposed by Misra et al. (2021); in terms of parameters and floating point operations (FLOPs), it is essentially as light-weight as a standard convolution module.
The proposed method is evaluated on our forest fire smoke (FFS) dataset and the miniImageNet dataset (Vinyals et al. 2016). The FFS dataset includes 5 categories: smoke, cloud, fog, trees, and cliffs, while miniImageNet contains 100 categories. Each category in both datasets has 600 RGB images of dimensions 224×224. Figure 1 presents examples from the FFS dataset categories. The miniImageNet dataset is utilized as the base dataset to ensure an adequate amount of training data, since diversity in the training data can improve the generalization ability of a few-shot learning model. Therefore, 80 classes are randomly selected from the miniImageNet dataset and then split into training and validation sets at a ratio of 4:1, following the split introduced by Ravi and Larochelle (2017). The remaining 20 classes are used as a benchmark dataset, along with the FFS dataset, to evaluate the performance of our proposed method.
In this paper, a novel hybrid attention-based few-shot learning algorithm for forest fire smoke detection is proposed. The proposed framework consists of two components: the feature extraction module and the meta-learning module. The overall framework is shown in Fig. 2.
Fig. 1 Sample images from the forest fire smoke dataset: a smoke image, b cloud image, c fog image, d tree image, and e cliffs image
Fig. 2 The architecture of our proposed network. Discriminative features F′ are extracted from the support images x_i and the query image x_q by the feature extraction module. The prototype of class j is then computed from the features of the 5 support images via the meta-learning module. The Euclidean distance between the class prototype and the query feature F′(x_q) is calculated to determine the similarity of the two features, and finally the detection result is output
We define a set of base tasks T_1 and a K-shot target task T_2. The base dataset D_base with M = 80 base classes is used for training and validation. The base dataset in this paper contains a massive number of classes and labeled samples; the larger the base dataset, the better the model's learning ability. For each episode, a subset is randomly selected from the base dataset as a training task, and each class of the subset is then split into a support set and a query set. The target task in this paper is 5-way 5-shot forest fire smoke detection. The target dataset D_target with N = 5 test classes is used for testing. This dataset is divided into a support set S and a query set, where x_i denotes an annotated support image, y_i is the corresponding label for x_i, and x_q denotes an unlabeled query image. The base classes and target classes are mutually exclusive. The goal is to leverage the supervision provided by the support set to correctly classify the query images.
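To make the episodic training setup concrete, the following is a minimal sketch of how one N-way K-shot episode could be sampled; the function name sample_episode and the images_by_class dictionary are illustrative and not part of the original implementation.

```python
import random
import torch

def sample_episode(images_by_class, n_way=5, k_shot=5, n_query=15):
    """Sample one N-way K-shot episode from {class_label: [image tensors]}."""
    classes = random.sample(sorted(images_by_class), n_way)
    support, query, query_labels = [], [], []
    for j, c in enumerate(classes):
        imgs = random.sample(images_by_class[c], k_shot + n_query)
        support.append(torch.stack(imgs[:k_shot]))   # annotated support images x_i
        query.append(torch.stack(imgs[k_shot:]))     # unlabeled query images x_q
        query_labels += [j] * n_query
    # support: (n_way, k_shot, C, H, W); query: (n_way * n_query, C, H, W)
    return torch.stack(support), torch.cat(query), torch.tensor(query_labels)
```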
Feature extraction module
We propose a CNN based on the CBAM (Woo et al. 2018) as the feature extraction module. This module consists of a CNN backbone and two CBAMs, as shown in Fig. 3. Inspired by the effectiveness of CNNs in fire smoke detection, we employ a CNN to extract high-level and detailed smoke features. The CNN contains four convolutional blocks. Each block comprises a 64-filter convolutional layer, an activation function layer, and a max-pooling layer. Convolution is used to generate feature maps from the input images. The feature maps of the t-th convolutional layer can be represented as F_t^k (k = 1, 2, ..., n_t), where n_t denotes the number of feature maps. To obtain F_t^k, the feature maps F_{t-1}^i of the (t-1)-th convolutional layer are first convolved with the filter W_t^k, the bias b_t^k is added, and the result is then passed through the rectified linear unit (ReLU) activation function Φ(·):

$$F_t^k = \Phi\Big(\sum_{i=1}^{n_{t-1}} F_{t-1}^{i} * W_t^{k} + b_t^{k}\Big)$$
Fig.3 Architecture of the feature extraction module
where * denotes the convolution operation. In addition, to mitigate overfitting and accelerate convergence, a batch normalization layer is appended to each convolutional block after the convolutional layer (Gu et al. 2020).
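As a concrete illustration, one such convolutional block could be written in PyTorch as below; only the filter count (64), batch normalization, ReLU, and max-pooling are specified in the text, so the 3×3 kernel and 2×2 pooling window are assumptions.

```python
import torch.nn as nn

def conv_block(in_channels, out_channels=64):
    """One convolutional block: 64-filter convolution -> batch norm -> ReLU -> max-pooling."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),  # kernel size assumed
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2),                                     # pooling window assumed
    )
```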
We also adopt the CBAM to improve the representation capacity of global information. As shown in Fig. 4, the CBAM comprises two components: a one-dimensional (1D) channel attention map M_CAM ∈ R^(C×1×1) and a two-dimensional (2D) spatial attention map M_SAM ∈ R^(1×H×W). The former exploits the inter-channel relationships of the features, while the latter utilizes the inter-spatial relationships of the features (Woo et al. 2018). The overall attention operations can be summarized as:

$$F' = M_{CAM}(F) \otimes F, \qquad F'' = M_{SAM}(F') \otimes F'$$

where F denotes the input feature maps and ⊗ denotes element-wise multiplication.
Fig.4 The structure of CBAM
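A minimal PyTorch sketch of the CBAM described above is given below. It follows the formulation of Woo et al. (2018); the reduction ratio (16) and the 7×7 spatial kernel are defaults from that paper rather than values reported here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Produces M_CAM in R^(C x 1 x 1) from average- and max-pooled channel descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                 # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))                  # global max pooling branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """Produces M_SAM in R^(1 x H x W) from channel-wise average and max maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Sequentially applies channel attention then spatial attention to a feature map F."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = self.ca(x) * x      # F'  = M_CAM(F)  (x) F
        return self.sa(x) * x   # F'' = M_SAM(F') (x) F'
```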
Meta-learning module
Inspired by the recent application of meta-learning in few-shot object detection (Finn et al. 2017; Fort et al. 2017; Song et al. 2020), we adopt the learning approach of the prototypical network (Snell et al. 2017) to solve the overfitting problem caused by limited small smoke images. The proposed network generalizes a classifier from the base dataset and is able to recognize query images from limited support images.
The prototypical network first learns a prototype of each class from the support images and then learns a metric space. Query images are then detected by computing their distances to the prototype representation of each class. Each prototype is taken as the mean vector of the embedded support images:

$$c_j = \frac{1}{|S_j|} \sum_{(x_i,\, y_i) \in S_j} F'(x_i)$$
where S_j denotes the support subset of the corresponding class j. The prototype representation c_j of each class and the feature representation F′(x_q) of the query image are generated by the attention-based neural network.
The distance function applied in the metric space can significantly influence the performance of prototypical networks. The squared Euclidean distance is applied to calculate the similarity between c_j and F′(x_q). The detection probability of the query image x_q is computed with a softmax over the distances and is expressed as follows:

$$p(y = j \mid x_q) = \frac{\exp\big(-d(F'(x_q),\, c_j)\big)}{\sum_{j'} \exp\big(-d(F'(x_q),\, c_{j'})\big)}$$
where d(·, ·) is the Euclidean distance between two vectors.
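A compact sketch of the prototype computation and the distance-based softmax is shown below; the helper names are illustrative and assume the feature extractor returns flat feature vectors F′(x).

```python
import torch

def class_prototypes(support_features):
    """support_features: (n_way, k_shot, dim) -> class prototypes c_j of shape (n_way, dim)."""
    return support_features.mean(dim=1)

def query_log_probs(query_features, prototypes):
    """log p(y = j | x_q) from squared Euclidean distances to the class prototypes."""
    dists = torch.cdist(query_features, prototypes, p=2) ** 2   # (n_query, n_way)
    return torch.log_softmax(-dists, dim=1)
```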
Learning proceeds by minimizing the negative log-probability J = −log p(y = j | x_q) of the true class j via Adam. The loss J for a training base task is computed as follows:

$$J = -\frac{1}{N_b N_q} \sum_{j=1}^{N_b} \sum_{q=1}^{N_q} \log p(y = j \mid x_q)$$
where N_b is the number of classes in the base subset and N_q is the number of query samples per class.
We adopt PyTorch to implement all approaches. Our model was trained via Adam with an initial learning rate of 10^-3, which was subsequently decreased by 50% every 2000 episodes. L2 regularization was adopted to mitigate overfitting. The proposed networks were trained using the Euclidean distance in the 5-shot scenario, with N_b and N_q set to 15 and 5, respectively. Tests were performed using 5-way episodes for 5-shot detection, with 15 query samples per class in each episode.
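Under these settings, the optimizer, learning-rate schedule, and per-episode update could look roughly as follows. This is a sketch that reuses the helper functions from the earlier sketches; model, num_episodes, train_images_by_class, and the weight-decay value used for L2 regularization are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# model: the CBAM-based feature extractor, mapping images to flat feature vectors F'(x)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)       # weight decay assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.5)  # halve lr every 2000 episodes

for episode in range(num_episodes):
    support, query, labels = sample_episode(train_images_by_class)                 # see earlier sketch
    protos = class_prototypes(model(support.flatten(0, 1)).view(5, 5, -1))         # 5-way 5-shot prototypes
    loss = F.nll_loss(query_log_probs(model(query), protos), labels)               # J = -log p(y = j | x_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```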
The 5-way accuracy (Acc.) was employed as the evaluation metric for the proposed forest fire smoke detection method. The accuracy rate (AR), false alarm rate (FAR), detection rate (DR), recall rate (RR), and F1-score (F1) (Mao et al. 2017) were adopted as performance evaluation criteria for the few-shot learning methods on the forest fire smoke dataset.
We explored the influence of the CBAM location within the convolutional neural network on feature extraction capability. The CBAM can be placed in any of the convolutional blocks of the CNN backbone. Figure 5 depicts the accuracy of the CBAM at different locations on the FFS dataset. CBAM_1 means that the CBAM is applied in the first convolutional block of the CNN backbone; the other annotations in Fig. 5 are analogous. The CBAM-based model outperforms the network without CBAM, and accuracy is significantly improved by applying two CBAMs in the feature extraction module. However, when more than two were used, accuracy decreased. The maximum test accuracy (68.61%) was achieved using CBAMs in the first and fourth convolutional blocks. This demonstrates the ability of the CBAM to improve detection accuracy by extracting discriminative features. Therefore, we take the convolutional neural network based on CBAM_14 as the few-shot feature extraction module in this paper.
Fig.5 Few-shot detection accuracy of the CBAM under different convolutional blocks
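With the CBAM_14 configuration selected here, the feature extraction backbone could be assembled as sketched below, building on the conv_block and CBAM sketches given earlier; the 3-channel input and the final flattening step are assumptions.

```python
import torch.nn as nn

class ABPNetBackbone(nn.Module):
    """Four convolutional blocks with CBAMs after the first and fourth blocks (CBAM_14)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64), CBAM(64),    # CBAM in the first block
            conv_block(64, 64),
            conv_block(64, 64),
            conv_block(64, 64), CBAM(64),   # CBAM in the fourth block
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # flat feature vector F'(x) per image
```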
In order to evaluate the class discriminatory ability of our proposed network, we used Gradient-weighted Class Activation Mapping (Grad-CAM) and Guided Grad-CAM (Selvaraju et al. 2020) to visualize the regions of the smoke image that support a particular prediction. The visualization results are shown in Fig. 6; Grad-CAM (1) and Grad-CAM (2) represent the attention map without the CBAM and the attention map of our proposed network, respectively. Our proposed network shows precise smoke localization to support its predictions, and the Guided Grad-CAM can even localize tiny smoke (Fig. 6e and f).
Fig.6 Visualization of randomly selected smoke images from the FFS dataset
We also compared the detection accuracy of the CBAM and the Convolutional Triplet Attention Module (CTAM) (Misra et al. 2021). Like the CBAM, the CTAM captures the interaction between spatial attention and channel attention without increasing model complexity or computation. As shown in Fig. 7, the CBAM and the CTAM placed in both the first and fourth convolutional blocks achieved their highest accuracies, 68.61% and 64.96%, respectively. The detection accuracy of the CBAM is significantly better than that of the CTAM at different locations, with a maximum discrepancy of 3.91%. Therefore, the CBAM is more suitable for forest fire smoke detection than the CTAM.
Fig.7 Comparison of different attention modules based on detection accuracy
We also examined the feature extraction capabilities of different convolutional neural networks. AlexNet, VGG-16, ResNet-18, ResNet-50, and DenseNet-121 were compared with our proposed backbone network. To ensure the reliability and fairness of the experimental results, the integration of the CBAM into each backbone network follows Woo et al. (2018). The average accuracy over 100 tests is shown in Fig. 8. Compared with shallow backbone networks, our proposed backbone network is competitive (68.61%) with a relatively similar training time, and its highest test accuracy is 71.2%. The deeper the backbone network, the better the model performance: DenseNet-121 achieved the highest detection accuracy (82.31%). Although detection accuracy improves significantly, the training cost also increases greatly; the processing time per iteration of DenseNet-121 is more than three times that of our proposed backbone network.
Fig.8 Comparison of different backbone networks based on detection accuracy
Choosing a different distance metric may lead to different model performance. To investigate the distance metric setting, we trained our proposed network and the prototypical network using the Euclidean and cosine distance functions, respectively. Figure 9 illustrates the range of fluctuations over 5 training sessions. The validation accuracy increased rapidly from 0 to 10 epochs and the loss dropped rapidly from 0 to 20 epochs, while both converged after 40 epochs. Convergence with the Euclidean distance function is significantly better than with the cosine distance function. Because some of the curves are very similar, the range in Fig. 9 is not obvious. Our proposed algorithm with the Euclidean distance function has good stability in validation accuracy. The validation accuracy of our proposed method using the Euclidean distance reached approximately 68%, which is 15% higher than that with the cosine distance.
Fig.9 Comparison of the detection accuracy of Cosine and Euclidean distance function
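For reference, the cosine-distance variant used in this comparison could be implemented along the lines below, mirroring the Euclidean sketch given earlier; the exact normalization scheme is an assumption.

```python
import torch
import torch.nn.functional as F

def query_log_probs_cosine(query_features, prototypes):
    """log p(y = j | x_q) from cosine distances to the class prototypes."""
    q = F.normalize(query_features, dim=1)
    p = F.normalize(prototypes, dim=1)
    dists = 1.0 - q @ p.t()                      # cosine distance in [0, 2]
    return torch.log_softmax(-dists, dim=1)
```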
As shown in Table 1, our proposed method improves the test accuracy by 8% compared with the prototypical network and achieves the highest accuracy in 5-way 5-shot detection. The test accuracies of the prototypical network and our method using the Euclidean distance are almost 15% and 17% higher, respectively, than those using the cosine distance. Moreover, compared with the conventional prototypical network, our proposed algorithm with the attention mechanism is shown to have better feature extraction capability and achieves higher accuracy.
Table 1 Few-shot detection accuracy on the forest fire smoke dataset
The Euclidean distance model exhibits greater test accuracy than the cosine distance model. Thus, the Euclidean distance is more suitable for both the prototypical network and our proposed network. This is largely because, unlike the squared Euclidean distance, the cosine distance is not a Bregman divergence (Snell et al. 2017).
We compared our proposed method on the FFS dataset with several existing few-shot methods: the matching network (Vinyals et al. 2016), the prototypical network (Snell et al. 2017), and meta-learning long short-term memory (LSTM) (Ravi and Larochelle 2017).
To verify the effectiveness of our proposed method on the FFS dataset, we compared the evaluation criteria of classical few-shot learning algorithms. Table 2 reports the results. Our proposed method achieved the highest AR (69.83%) and DR (84%) among the models, outperforming the prototypical network by 8.46% and 9%, respectively. RR and F1 are important evaluation metrics for forest fire smoke detection models, with higher RR and F1 values indicating better model performance. The classical algorithms listed may not be suitable for distinguishing smoke from suspected smoke areas; for example, the matching network had the highest FAR (17.00%), which was 4 times that of our proposed network. As shown in Fig. 10, the proposed CBAM-based prototypical network achieved promising few-shot learning performance on the forest fire smoke dataset. In summary, our method outperforms the other methods on the forest fire smoke dataset.
Fig.10 Confusion matrix of the proposed method
Table 2 Comparison of performance evaluation criteria for few-shot learning methods
To further validate the performance of the proposed method, we compare it with several well-known few-shot methods, namely the matching network (Vinyals et al. 2016), the prototypical network (Snell et al. 2017), and meta-learning LSTM (Ravi and Larochelle 2017), on the original miniImageNet test dataset. As reported in Table 3, our method achieves the highest accuracy in 5-way 5-shot detection, proving its ability to extract discriminative features from few support samples.
Table 3 Few-shot detection accuracy on miniImageNet
We proposed an attention-based prototypical network for few-shot forest fire smoke detection. First, a convolutional neural network based on the convolutional block attention module, which includes channel and spatial attention modules, was designed to extract features from support and query images. It automatically extracts high-level image features and focuses on the more discriminative features of small targets. Second, we applied a meta-learning module to alleviate the overfitting problem caused by limited data. It compares the prototypes of the support images with the query features to achieve effective detection. Experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is more effective than recent few-shot learning approaches and achieves the highest test accuracy.