LI Jicai, SUN Shiding, JIANG Haoran, TIAN Yingjie3,4*, XU Xiaoliang
1 School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China;
2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
3 Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China;
4 Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China;
5 Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China
Abstract: In recent years, deep convolutional neural networks have exhibited excellent performance in computer vision and have had a far-reaching impact. Traditional plant taxonomic identification requires high expertise and is time-consuming. Most nature reserves have problems such as incomplete species surveys, inaccurate taxonomic identification, and untimely updating of status data. Simple and accurate recognition of plant images can be achieved by applying convolutional neural network technology and exploring the best network model. Taking 24 typical desert plant species that are widely distributed in the nature reserves in Xinjiang Uygur Autonomous Region of China as the research objects, this study established an image database and selected the optimal network model for the image recognition of desert plant species, to provide decision support for fine management in the nature reserves in Xinjiang, such as species investigation and monitoring, by using deep learning. Since desert plant species were not included in the public dataset, the images used in this study were mainly obtained through field shooting and downloaded from the Plant Photo Bank of China (PPBC). After the sorting process and statistical analysis, a total of 2331 plant images were finally collected (2071 images from field collection and 260 images from the PPBC), including 24 plant species belonging to 14 families and 22 genera. A large number of numerical experiments were also carried out to compare, from different perspectives, a series of 37 convolutional neural network models with good performance, to find the optimal network model that is most suitable for the image recognition of desert plant species in Xinjiang. The results revealed 24 models with a recognition Accuracy greater than 70.000%.
Among them, Residual Network X_8GF (RegNetX_8GF) performed the best, with Accuracy, Precision, Recall, and F1 (which refers to the harmonic mean of the Precision and Recall values) values of 78.333%, 77.654%, 69.547%, and 71.256%, respectively. Considering the demands of hardware equipment and inference time, Mobile Network V2 (MobileNetV2) achieves the best balance among the Accuracy, the number of parameters, and the number of floating-point operations. The number of parameters for MobileNetV2 is 1/16 that of RegNetX_8GF, and the number of floating-point operations is 1/24. Our findings can facilitate efficient decision-making for the management of species survey, cataloging, inspection, and monitoring in the nature reserves in Xinjiang, providing a scientific basis for the protection and utilization of natural plant resources.
Keywords: desert plants; image recognition; deep learning; convolutional neural network; Residual Network X_8GF(RegNetX_8GF); Mobile Network V2 (MobileNetV2); nature reserves
Wild plants constitute the main part of the ecosystem in nature reserves. Thus, carrying out species investigation and classification of wild plants is the primary task for managers (Li et al., 2020). Owing to the high demand for professional knowledge, traditional plant classification and recognition is time-consuming and inefficient (Cao et al., 2018). The existing species surveys in most nature reserves are not comprehensive, and the classification of plant species is not accurate, so status data are not updated in time and the administration agencies of nature reserves are unable to make timely and effective protection and management decisions, as well as countermeasures for ecological recovery (Liu et al., 2018; Wang et al., 2019; Xiao, 2019). In June 2019, the General Office of the State Council of the People's Republic of China issued a document to establish a nature reserve system with national parks as the main body, based on ecological environment regulation and big data platforms, using information technologies such as cloud computing and the Internet of Things to comprehensively grasp the composition, distribution, and dynamic change of nature reserve ecosystems, thus providing scientific support for the management decisions of nature reserves (http://www.gov.cn/zhengce/2019-06/26/content_5403497.htm). Therefore, how to use new information technology to obtain plant-related data more efficiently and accurately has become an urgent problem to be solved by researchers and managers.
With their powerful feature extraction, convolutional neural networks have significant advantages in the recognition and analysis of high-dimensional data such as images, sounds, and texts; they can reduce the damage to fragile plant resources caused by field specimen collection, decrease the difficulty of identifying and classifying similar plant species, and improve work efficiency (Mikolov et al., 2011). In recent years, intelligent recognition of plant images has gradually become a research hotspot (Liu, 2020). In the previous literature, convolutional neural networks were used to recognize images of leaves, fruits, flowers, and other plant organs against simple backgrounds (Hall et al., 2015; Abdullahi et al., 2017; Bargoti and Underwood, 2017; Coulibaly et al., 2019; Cao et al., 2020). Researchers have classified and recognized images of five crops and 100 different ornamental plant species in different natural scenes by convolutional neural networks (Simonyan and Zisserman, 2014; Kussul et al., 2017; Liu, 2018). Several mature convolutional neural network image recognition systems, such as the ''Xingse APP'' and ''Aiplants APP'', have even been widely used in surveys of wild plant resources (Gao et al., 2020). However, the accuracy of these plant image recognition systems is generally low in the classification and recognition of desert plant images under complex natural scenes (Jin, 2020). For example, Zhang and Huai (2016) used hierarchical deep learning to train on and recognize leaf images of plants in simple and complex scenes, and found that the recognition rate for plants in a single scene was as high as 91.11%, while the recognition rate for plants in complex scenes was only 34.38%. The main problems are as follows. First, the number of images of the same desert plant species in different natural scenes is too small, and there are few images that focus on the salient classification characteristics of desert plant species.
Second, previous image recognition systems are based on datasets of urban and rural cultivated plants, of a certain plant organ, or of simple background images. However, in the evolutionary process of long-term adaptation to the special desert environment, the external morphology of different desert plant species has developed similar characteristics (homogenization of plant and branch characteristics, similarity of branch morphology and color, highly degraded leaf patterns, etc.), which increases the difficulty of machine vision recognition and makes misjudgment more likely (He et al., 2006). To solve the first problem, researchers proposed methods for obtaining a large number of plant images conforming to technical requirements (Jin, 2020). The second problem is the key scientific and technical issue that this study focuses on: how to significantly improve the image recognition accuracy of similar plant species in complex natural scenes and select the optimal network model suitable for the image recognition of desert plant species, which is a very challenging task with broad practical application.
In view of the lack of research on the image recognition of desert plant species, this study took a panoramic image set of major desert plant species distributed in nature reserves in Xinjiang Uygur Autonomous Region of China as the research object, and integrated 37 non-lightweight and lightweight models of eight categories that are widely used at present, such as Visual Geometry Group Network (VGG), Residual Network (RegNet), and Mobile Network (MobileNet) (Krizhevsky et al., 2012; He et al., 2016; Howard et al., 2017). By using grid search to find the optimal hyperparameters and comparing performance, we identified the optimal network model suitable for the image recognition of desert plant species, so as to achieve convenient and accurate classification and recognition of desert plant species and provide a solution for large-scale field plant background investigation in nature reserves in Xinjiang in the future.
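The grid-search step mentioned above amounts to exhaustively trying every hyperparameter combination and keeping the one with the best validation score. A minimal sketch follows; the candidate learning rates and the `evaluate` stub are illustrative assumptions, not the paper's exact settings (the paper does search batch sizes of 4, 8, 16, and 32).

```python
from itertools import product

# Candidate hyperparameter grids. The batch sizes match the paper's setup;
# the learning-rate values are assumptions for illustration.
batch_sizes = [4, 8, 16, 32]
learning_rates = [0.1, 0.01, 0.001]

def evaluate(batch_size, lr):
    """Stub standing in for training a model and returning validation accuracy."""
    # A toy scoring function so the sketch runs end to end.
    return 0.7 + 0.01 * (batch_size == 16) + 0.02 * (lr == 0.01)

# Exhaustively try every combination and keep the best-scoring one.
best_score, best_params = -1.0, None
for bs, lr in product(batch_sizes, learning_rates):
    score = evaluate(bs, lr)
    if score > best_score:
        best_score, best_params = score, (bs, lr)

print(best_params)  # the (batch_size, learning_rate) pair with highest validation accuracy
```

In practice `evaluate` would train a candidate network on the training set and report its accuracy on the validation set, which is why a separate validation split is needed.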
At present, there are 201 protected areas in Xinjiang, covering an area of 2.51×10^5 km^2 and accounting for 15.07% of the total land area of Xinjiang (Fig. 1). Among them, there are one World Natural Heritage Site, 28 nature reserves, 24 scenic spots, 13 geological parks, 57 forest parks, 51 wetland parks, and 27 desert parks. In terms of regional distribution, there are 63 in southern Xinjiang and 138 in northern Xinjiang, accounting for 31.00% and 69.00% of the total number, respectively.
Fig. 1 Overview of Xinjiang and spatial distribution of nature reserves in Xinjiang. Note that the figure is based on the standard map (新S(2021)023) of the Map Service System (https://xinjiang.tianditu.gov.cn/main/bzdt.html) marked by the Xinjiang Uygur Autonomous Region Platform for Common Geospatial Information Services, and the standard map has not been modified. Satellite image source: Geospatial Data Cloud (http://www.gscloud.cn/).
Based on the ''List of National Key Protected Wild Plants'' (Ming, 2021), this study selected 24 representative xerophytic desert plant species that are distributed in nature reserves in Xinjiang as the identification objects (Fig. 2). Since desert plant species were not included in the public dataset, the images were mainly obtained through field shooting and downloaded from the Plant Photo Bank of China (PPBC; http://ppbc.iplant.cn/sp/12519). The field collection extended from 2019 to 2021. Rangers in nature reserves were commissioned to take pictures with digital cameras or mobile phones in the natural environment. The pictures were RGB true color images in JPG format. The collected plant images were confirmed by experienced plant experts and labeled manually. Note that some unclear images were deleted directly. After the sorting process and statistical analysis, a total of 2331 plant images were finally collected (2071 images from field collection and 260 images from the PPBC), including 24 plant species belonging to 14 families and 22 genera (Table 1). The training, validation, and test sets were allocated in a ratio of 3:1:1. The plant species information can be found in the Flora of Xinjiang (Xinjiang Flora Editorial Committee, 1992-2004) and the Red List of Chinese Biodiversity: Higher Plant Volume (http://www.iplant.cn/rep/protlist/4).
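The 3:1:1 allocation above can be reproduced with a short routine like the one below; the fixed random seed and the use of a plain index list in place of the actual labeled images are assumptions for illustration.

```python
import random

def split_dataset(items, ratios=(3, 1, 1), seed=42):
    """Shuffle items and partition them into train/validation/test by the given ratios."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    items = list(items)
    rng.shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]     # remainder goes to the test set
    return train, val, test

# With 2331 images and a 3:1:1 ratio, this yields roughly 1398/466/467 images per subset.
train, val, test = split_dataset(range(2331))
print(len(train), len(val), len(test))
```

For a dataset with rare species such as Ammodendron bifolium, a stratified split (per-species shuffling) would keep every class represented in all three subsets; the uniform shuffle here is the simplest variant.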
Fig. 2 Images of the selected 24 desert plant species in nature reserves in Xinjiang
Convolutional neural networks are a branch of deep learning: a kind of feedforward neural network with a deep structure that contains convolutional computation. In recent years, they have been widely used in the field of image recognition (Lecun and Bengio, 1998). A convolutional neural network includes convolutional layers, pooling layers, and fully connected layers (Fig. 3). The mathematical expression of the network is as follows:

F(x) = f_N(f_{N-1}(... f_2(f_1(x)) ...)),    (1)

where x represents the input image; F(x) represents the output of the network, such as the corresponding class or probability of the input image x; N represents the number of hidden layers; and f_i represents the function of the corresponding layer i.
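The composition F(x) = f_N(...f_2(f_1(x))...) can be sketched directly in code; the toy layer functions below are illustrative stand-ins for real convolution, activation, and pooling stages.

```python
from functools import reduce

def compose(layers):
    """Return F(x) = f_N(...f_2(f_1(x))...) for a list of layer functions [f_1, ..., f_N]."""
    return lambda x: reduce(lambda acc, f: f(acc), layers, x)

# Toy layers standing in for convolution / activation / fully connected stages.
f1 = lambda x: x * 2      # e.g., a linear transform
f2 = lambda x: max(0, x)  # e.g., a ReLU activation
f3 = lambda x: x + 1      # e.g., a bias / fully connected stage

F = compose([f1, f2, f3])
print(F(3))   # f3(f2(f1(3))) = f3(f2(6)) = f3(6) = 7
print(F(-3))  # the negative intermediate value is zeroed by the ReLU stage, then +1
```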
Table 1 Basic information of the selected 24 desert plant species in nature reserves in Xinjiang
Fig. 3 Schematic diagram of convolutional neural network
In the convolutional layer, f consists of multiple convolution kernels (g_1, ..., g_{k-1}, g_k), and the common convolution kernel sizes are 1×1, 3×3, 5×5, and so on. Each g_k represents a linear function in the kth kernel, which can be expressed as follows:

g_k(x, y, z) = Σ_u Σ_v Σ_w W_k(u, v, w) I(x+u, y+v, z+w),    (2)

where (x, y, z) represents the position of the pixel in the input image I; W_k(u, v, w) represents the weight of the kernel k; and m, n, and w represent the height, width, and depth of the convolution kernel, respectively.
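The per-kernel linear map can be written out directly as a triple sum; the small single-channel image and 2×2 averaging kernel below are illustrative values, not data from the study.

```python
def conv_at(I, Wk, x, y, z):
    """Evaluate g_k(x, y, z) = sum over (u, v, w) of W_k(u, v, w) * I(x+u, y+v, z+w).

    I and Wk are nested lists indexed [height][width][depth]; the kernel is
    anchored at (x, y, z) with no padding, matching the text's formulation.
    """
    m, n, d = len(Wk), len(Wk[0]), len(Wk[0][0])   # kernel height, width, depth
    total = 0.0
    for u in range(m):
        for v in range(n):
            for w in range(d):
                total += Wk[u][v][w] * I[x + u][y + v][z + w]
    return total

# A 3x3 single-channel image and a 2x2 averaging-style kernel.
I = [[[1], [2], [3]],
     [[4], [5], [6]],
     [[7], [8], [9]]]
Wk = [[[0.25], [0.25]],
      [[0.25], [0.25]]]
print(conv_at(I, Wk, 0, 0, 0))  # 0.25 * (1 + 2 + 4 + 5) = 3.0
```

Sliding the same kernel over every valid (x, y) position produces one output feature map per kernel, which is exactly the convolutional layer's output.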
In the activation layer, f is a pixel-wise nonlinear function, that is, a rectified linear unit, which can be represented by the following equation:

f(x) = max(0, x).    (3)
In the pooling layer,fis a layer-wise nonlinear down-sampling function, which aims to gradually reduce the size of the feature representation.
The fully connected layer can also be considered as a convolutional layer with a kernel size of 1×1. In classification tasks, a prediction layer (i.e., softmax layer) is usually added after the last fully connected layer to calculate the probability that the input image belongs to each class. For instance, if the number of neurons in the prediction layer is C (that is, the number of categories is C), the C values p_1, p_2, ..., p_C can be converted to probability values through the softmax layer (Eq. 4):

p̂_i = exp(p_i) / Σ_{j=1}^{C} exp(p_j).    (4)

The network is trained by minimizing the cross-entropy loss:

L = -Σ_i y_i log(p̂_i),    (5)

where y_i and p̂_i represent the expected label value and the predicted output value of sample i, respectively.
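The softmax conversion and the cross-entropy loss built on it can be checked numerically with a short, dependency-free sketch:

```python
import math

def softmax(scores):
    """Convert C raw scores p_1..p_C into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p_hat):
    """L = -sum_i y_i * log(p_hat_i) for a one-hot label vector y."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p_hat))

probs = softmax([2.0, 1.0, 0.1])        # three class scores -> three probabilities
print(probs)                            # probabilities preserve the score ordering and sum to 1
print(cross_entropy([1, 0, 0], probs))  # loss is small when the true class gets high probability
```

Subtracting the maximum score before exponentiating does not change the result (the factor cancels in the ratio) but prevents overflow for large scores.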
In general, the performance of a convolutional neural network improves as the number of network layers deepens, such as VGG with 16 layers, Google Inception Network (GoogLeNet) with 22 layers, and Residual Network (ResNet) with 152 layers (Simonyan and Zisserman, 2014). However, research shows that no network structure can be guaranteed to outperform all other network structures on any dataset (Liu and Luo, 2019). For a specific dataset, it is necessary to select the network structure with the best performance according to the experimental results. Therefore, this study adopted 37 common non-lightweight and lightweight network structures of eight categories (VGG, ResNet, Dense Convolutional Network (DenseNet), Squeeze Network (SqueezeNet), MobileNet, Shuffle Network (ShuffleNet), Efficient Network (EfficientNet), and RegNet), and adjusted the model parameters to find the best performing network structure. The experimental environment was: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60 GHz, NVIDIA GeForce GTX 2080Ti, and Ubuntu 18.04.1. The PyTorch 1.6 deep learning framework was used, and the batch sizes were set to 4, 8, 16, and 32. Using the Stochastic Gradient Descent (SGD) optimization algorithm (Li et al., 2021), we set the momentum to 0.9 and the weight decay rate to 0.005; the learning rate was determined by grid search.
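SGD with momentum 0.9 and weight decay 0.005, as configured above, corresponds to the update rule sketched below in pure Python (following PyTorch's convention for momentum and weight decay); the quadratic toy objective and the learning rate of 0.05 are assumptions for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum=0.9, weight_decay=0.005):
    """One SGD step following PyTorch's convention:
    v <- momentum * v + (grad + weight_decay * param);  param <- param - lr * v
    """
    g = grad + weight_decay * param    # weight decay acts as an L2-penalty gradient
    velocity = momentum * velocity + g
    return param - lr * velocity, velocity

# Minimize the toy objective f(p) = (p - 3)^2, whose gradient is 2 * (p - 3).
p, v = 10.0, 0.0
for _ in range(200):
    p, v = sgd_momentum_step(p, grad=2.0 * (p - 3.0), velocity=v, lr=0.05)
print(p)  # converges near 3 (shifted very slightly toward 0 by the weight decay)
```

The momentum term accumulates past gradients, which smooths the noisy per-batch gradients that arise with small batch sizes such as 4 or 8.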
In this study, the Accuracy, Precision, Recall, and F1 (which refers to the harmonic mean of the Precision and Recall values) were used to evaluate the model results (Cai, 2020). The Accuracy measures the ratio of all the correct judgment results of the classification model to the total samples. Precision is the proportion of the samples predicted to be positive that are truly positive. Recall refers to the proportion of all the positive samples that are correctly judged to be positive. These indicators are computed as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN),    (6)
Precision = TP / (TP + FP),    (7)
Recall = TP / (TP + FN),    (8)
F1 = 2 × Precision × Recall / (Precision + Recall),    (9)

where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive, and false negative samples in the prediction results, respectively.
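The four indicators reduce to simple ratios of the confusion-matrix counts; the example counts below are arbitrary illustrative values.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)          # harmonic mean of Precision and Recall
    return accuracy, precision, recall, f1

# Example: 8 true positives, 85 true negatives, 2 false positives, 5 false negatives.
acc, prec, rec, f1 = classification_metrics(8, 85, 2, 5)
print(f"{acc:.3f} {prec:.3f} {rec:.3f} {f1:.3f}")
```

For a multi-class task such as the 24-species problem here, these counts are computed per species (one-vs-rest) and then averaged, which is why a model can have high Accuracy but a lower averaged Recall.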
The complexity of different models is measured by the number of parameters and the number of floating-point operations (Shen, 2021). The number of parameters refers to the total number of parameters that need to be trained in the network model, which is used to measure the size of the model. The number of floating-point operations refers to the total number of floating-point operations required for one forward pass, which can be used to measure the algorithm complexity. The higher the number of floating-point operations, the slower the operation speed of the convolutional neural network.
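For a single convolutional layer, both quantities have closed forms; the sketch below uses the common convention of counting one multiply-add as two floating-point operations (an assumption, since counting conventions differ between tools), and the layer dimensions are illustrative.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Trainable parameters of a k x k convolution: k*k*c_in*c_out weights (+ c_out biases)."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def conv_flops(k, c_in, c_out, h_out, w_out):
    """Floating-point operations for one forward pass over an h_out x w_out output map,
    counting each multiply-add as 2 operations."""
    return 2 * k * k * c_in * c_out * h_out * w_out

# A 3x3 convolution mapping 64 -> 128 channels on a 56x56 output feature map.
print(conv_params(3, 64, 128))         # 73856 parameters
print(conv_flops(3, 64, 128, 56, 56))  # 462422016, i.e., ~0.46 G operations
```

Summing these per-layer figures over a whole network yields totals of the same kind reported in Table 2, which is how lightweight designs such as MobileNetV2 are seen to trade a little Accuracy for far fewer parameters and operations.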
The model recognition results of plant images are presented in Table 2. Thirteen models with Accuracy below 70.000% were found, of which the following three were below 55.000%: SqueezeNet1_0, SqueezeNet1_1, and ShuffleNetV2_X0_5. Twenty-four models with Accuracy exceeding 70.000% were found, of which the following nine exceeded 75.000%: EfficientNet_B1, EfficientNet_B3, RegNetX_400MF, RegNetX_800MF, RegNetX_3_2GF, RegNetX_8GF, RegNetX_16GF, RegNetY_3_2GF, and RegNetY_16GF. RegNetX_8GF outperformed the other networks, with Accuracy, Precision, Recall, and F1 values of 78.333%, 77.654%, 69.547%, and 71.256%, respectively.
In addition to the above results, we also compared the number of parameters and the number of floating-point operations for the different network structures. For the number of parameters, there were 16 models smaller than 10.000 M (megabyte, which refers to the storage space occupied by model parameters; 1 M=1024 kilobytes) and four models larger than 100.000 M (VGG11, VGG13, VGG16, and VGG19). For the number of floating-point operations, there were 15 models smaller than 1.000 G (1 G=10^9 floating-point operations). We used two indicators to quantify the relationships of the Accuracy with the number of parameters and the number of floating-point operations (Fig. 4). Amongst the models with an Accuracy higher than 70.000%, MobileNetV2 achieves the best balance among the Accuracy, the number of parameters, and the number of floating-point operations. For MobileNetV2, the Accuracy reaches 71.429%, the number of parameters is only 2.255 M, and the number of floating-point operations is only 0.313 G; the ratio of Accuracy to the number of parameters is 31.676, and the ratio of Accuracy to the number of floating-point operations is 228.206. Although RegNetX_8GF exhibited the best performance, its number of parameters and number of floating-point operations were 16 and 25 times higher, respectively, than those of MobileNetV2.
Table 2 Experimental results of 37 different models used in the image recognition of desert plant species
Fig. 4 Relationships of the Accuracy with the number of parameters (a) and the number of floating-point operations (b) for 37 different models used in the image recognition of desert plant species. M, megabyte, which refers to the storage space occupied by model parameters (1 M=1024 kilobytes); G, 10^9 floating-point operations. VGG, Visual Geometry Group Network; ResNet, Residual Network; DenseNet, Dense Convolutional Network; SqueezeNet, Squeeze Network; MobileNet, Mobile Network; ShuffleNet, Shuffle Network; EfficientNet, Efficient Network; RegNet, Residual Network.
According to the comparative analysis of the above results and considering factors such as hardware equipment and inference time, MobileNetV2 exhibited the best comprehensive performance for the image recognition of desert plant species and had better application prospects in practical work. The classification results of MobileNetV2 and RegNetX_8GF are shown in Table 3, and the confusion matrix is shown in Figure 5.
Table 3 Classification results of MobileNetV2 and RegNetX_8GF in the image recognition of desert plant species
Due to the small amount of data available for Ammodendron bifolium, once the images were divided into the training, validation, and test sets, the results varied greatly and had no analytical value. From a Precision perspective for the remaining 23 plant species, for the MobileNetV2 model, the Precision of all the plant species except Oxytropis bogdoschanica and Haloxylon persicum was higher than 60.000%. Therefore, MobileNetV2 was able to identify the various plant species well and had a high Accuracy. For incorrect classifications, it can be seen from the confusion matrix that one Caragana polourensis image, one Helianthemum songaricum image, one Tamarix taklamakanensis image, and two Corydalis kashgarica images were recognized as Oxytropis bogdoschanica images. Additionally, one Salsola junatovii image, one Populus pruinosa image, one Tamarix taklamakanensis image, one Calligonum ebinuricum image, and one Haloxylon ammodendron image were recognized as Haloxylon persicum images. The Recall of Eremosparton songoricum was the lowest, at only 33.000%. The confusion matrix shows that two Eremosparton songoricum images were recognized as Haloxylon ammodendron in the test set. The Recall of Corydalis kashgarica was the next lowest, at 38.500%. Referring to the confusion matrix, it can be seen that in the test set, one image of Corydalis kashgarica was recognized as each of Ammopiptanthus nanus, Lagochilus lanatonodus, and Haloxylon ammodendron, two images were recognized as Oxytropis bogdoschanica, and three images were predicted as Caryopteris mongholica.
Upon inspecting the original images of the wrongly classified plant species (Fig. 6), it can be found that, due to long-term adaptation to the harsh environment, the misclassified desert plants share similar shape characteristics: their leaves are highly degraded, being scaly or cylindrical, or the plant shape approximates a round sphere. The plant images of Eremosparton songoricum and Haloxylon persicum in spring and summer are very similar in branch shape, branch color, and branching pattern. For example, the images taken from the vertical view of Corydalis kashgarica, Ammopiptanthus nanus, Lagochilus lanatonodus, Haloxylon ammodendron, Oxytropis bogdoschanica, and Caryopteris mongholica are nearly spherical. The taxonomic characteristics that do differ among desert plants lie in fine attributes such as stem smoothness and leaf distribution, with the leaves being scaly or cylindrical. In the process of computer vision recognition, the fine-grained recognition of these subtle attributes is not clear; given the high similarity of external morphological features, this results in low image recognition sensitivity and a high false positive rate for higher plants. This shows that the performance of MobileNetV2 still needs to be strengthened in recognizing similar but different plant species, and the image dataset needs to be improved. From the F1 values, it can be seen that the performances for Eremosparton songoricum, Corydalis kashgarica, Oxytropis bogdoschanica, and Haloxylon persicum are not good enough, being affected by low Precision and Recall values. Considering all the factors, plants with flowers and fruits or distinctive crown shapes and colors, such as Populus euphratica, Cistanche deserticola, Calligonum ebinuricum, Prunus tenella, Lagochilus lanatonodus, and Caryopteris mongholica, obviously differ from the others without these characteristics in the images and have better recognition performances (all the indicators exceeding 80.000%).
In conclusion, without the intervention of experts, the lightweight network MobileNetV2 achieves the automatic classification of plant images accurately and quickly.
Fig. 5 Confusion matrix of MobileNetV2 (a) and RegNetX_8GF (b) in the image recognition of desert plant species. The plant species corresponding to the labels are consistent with those in Figure 2.
Fig. 6 Images of incorrectly classified samples with high similarity of external morphological features
To verify the validity of the optimal model discovered in this study, we selected the Tianchi Bogda Peak Nature Reserve and the Ebinur Lake Wetland National Nature Reserve, where many desert plants occur, for empirical verification. The Ebinur Lake Wetland National Nature Reserve gathers more than 90.00% of the plant species in the deserts of the Junggar Basin, and some endangered and endemic species are also distributed there. It is one of the regions with the most abundant desert plant populations in inland river basins in China, and the plant species here account for about 64.00% of the country's total desert plant species (Yang et al., 2009). The Tianchi Bogda Peak Nature Reserve covers the area where the main peak of the eastern Tianshan Mountains, Bogda Peak, is located. Within a horizontal distance of 80 km from south to north, it has a complete vertical band spectrum of mountains. With about 700 plant species, the area is the most typical representative of vertical mountains in the world's temperate arid regions and is included in the UNESCO Man and the Biosphere Programme network (Su and Niu, 2016). Amongst the 24 desert plant species selected in this study, six are present in the Ebinur Lake Wetland National Nature Reserve, and nine are distributed in the Tianchi Bogda Peak Nature Reserve.
The empirical results of MobileNetV2 and RegNetX_8GF in the image recognition of desert plant species are shown in Table 4. It can be seen that in the image recognition of desert plant species in the Tianchi Bogda Peak Nature Reserve, the Accuracy, Precision, and Recall of these models reached 83.000% or more. The Accuracy of MobileNetV2 is 83.871%, which is more than 5.000 percentage points higher than that of RegNetX_8GF in the image recognition of the 24 desert plant species (Accuracy of 78.333%). MobileNetV2 has high accuracy in the image recognition of desert plant species and good application prospects in practical work. In the image recognition of desert plant species in the Ebinur Lake Wetland National Nature Reserve, each evaluation indicator also reached more than 60.000% for the two models. It can be seen that both MobileNetV2 and RegNetX_8GF achieve high accuracy values. Comparing the performance of these two models between the two nature reserves with respect to the image recognition of the 24 desert plant species, the empirical identification in the Ebinur Lake Wetland National Nature Reserve was poorer, with the values of the evaluation indicators lower than those in the Tianchi Bogda Peak Nature Reserve. There may be a variety of reasons for this. The images of the nine desert plant species in the Tianchi Bogda Peak Nature Reserve were all obtained from the ''Color Atlas of Wild Vascular Bundle Plants in Bogda Biosphere'' (Su and Niu, 2016), and the pictures were also processed by screening and cleaning. The images of the six desert plant species in the Ebinur Lake Wetland National Nature Reserve were taken in the field without cleaning or other processing. These differences resulted in the above comparative findings.
The comparison also illustrates the importance of the quality and quantity of image recognition datasets to network models, and implies that no single network structure can guarantee superiority over other network structures on all datasets. For a specific dataset, we need to conduct experiments and select the network structure with the best performance based on the experimental results. This is also the practical significance of this research.
Table 4 Performances of empirical application of MobileNetV2 and RegNetX_8GF in the image recognition of desert plant species in the Tianchi Bogda Peak Nature Reserve and Ebinur Lake Wetland National Nature Reserve
The Accuracy of MobileNetV2, the optimal model for the image recognition of desert plant species screened in this study, did not reach the more than 90.000% image recognition Accuracy reported for plant species against a single background, indicating that the findings of this study are still some way from practical application (Zhang and Huai, 2016).
Firstly, from the perspective of constructing the image dataset, increasing the amount of image data and enhancing the image quality are helpful to improve the image recognition accuracy (Li, 2022). This study is based on 2331 plant images for model training and testing. At the same time, due to the large difference in pixel size between plant images collected in the field (using mobile phones and cameras) and from the PPBC, the performance of the model classifier is affected to some extent. In the future, transfer learning, data expansion, and image cleaning technology can solve the problems of insufficient data and inconsistent image standards to a certain extent. As for the problem of complex backgrounds in the images, it can be seen from the image recognition results in the Tianchi Bogda Peak Nature Reserve (Table 4) that the Accuracy can reach more than 80.000% if the object is in focus and the features are prominent in an image. Barbedo (2016) also demonstrated that removing the background of an image can improve the image recognition Accuracy by 3.000%. However, background removal requires a lot of work by professionals, which is often difficult to achieve in application. In theory, the more differential features extracted from an image, the higher the Accuracy of image recognition (Gai et al., 2021). Leaves, flowers, and fruits of plants obviously have the advantages of multiple shape features, high recognizability, and high discrimination. Future research can make full use of multi-feature fusion of panoramic plant images and organ images (such as flowers, fruits, and leaves) to further improve the accuracy and sensitivity of the models.
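Data expansion of the kind suggested above can be as simple as adding mirrored copies of each labeled image, since flipping a plant photograph does not change its species label. A dependency-free sketch on nested-list images follows (real pipelines would use library transforms such as random crops, rotations, and flips; the tiny 2×2 image is an illustrative placeholder).

```python
def hflip(image):
    """Horizontally mirror an image stored as a nested list [rows][cols]."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Double a labeled dataset by appending a horizontally flipped copy of each image.

    Flipping preserves the species label, so each (image, label) pair yields two samples.
    """
    return dataset + [(hflip(img), label) for img, label in dataset]

tiny = [([[1, 2], [3, 4]], "Populus euphratica")]
expanded = augment(tiny)
print(len(expanded))   # 2: the original plus one flipped copy
print(expanded[1][0])  # [[2, 1], [4, 3]]
```

Augmentation of this kind is applied to the training set only; validation and test images are left untouched so the evaluation still reflects unmodified field conditions.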
Secondly, from the perspective of data processing and analysis, fine-grained or new network structures can be considered to learn more expressive deep features. It can be seen from the misclassified plant images (Fig. 6) that, due to the similar morphological characteristics of desert plant species, there is still a problem of a high misjudgment rate. On the one hand, the complexity of the collection environment causes uncertainty in expert labeling. However, Bekker and Goldberger (2016) verified that deep convolutional networks can maintain high reliability when the number of mislabeled samples is not very high. On the other hand, some image recognition errors probably occur because subtle attribute features are neglected or cannot be distinguished in the process of model learning. For such extreme cases that cannot be effectively distinguished visually, prior knowledge should be combined to make decisions. How to introduce the existing plant family and genus classification labels as prior information to improve the generalization ability of the neural network and make it more suitable for the image recognition of desert plant species in Xinjiang will be one of the next research topics (Cao et al., 2018).
Thirdly, in terms of the learning algorithm, the current parameter adjustment of convolutional neural networks basically relies on experience and practical operation, requiring constant training, parameter tuning, and repeated trial and error, which consumes a lot of time and energy (Tang, 2020). Automated Machine Learning (AutoML) has become a popular research field in recent years (Liu and Luo, 2019). It automatically builds network structures that can match the accuracy of classic, manually selected networks. Applying it to species identification in protected areas is expected to overcome the subjective faults of manual network selection, select a better network structure objectively, and improve the image recognition accuracy.
Finally, from the perspective of the scope of application of the models, rare desert plant species also include Betula halophila, Reaumuria kaschgarica, etc. (Yin, 1991). However, due to the difficulty of collection, only some desert plant species in Xinjiang were selected as the research objects in this study. In addition, this study is based on static image data processing and analysis. At present, a large number of video surveillance systems have been deployed in the nature reserves in Xinjiang. Therefore, strengthening research on video image data recognition is also an important direction in the future.
Based on image processing and deep learning technology, this study adopted 37 commonly used non-lightweight and lightweight convolutional neural network models of eight categories to recognize the images of 24 desert plant species typically distributed in Xinjiang. The results show that there are 24 models with Accuracy above 70.000% and nine models with Accuracy above 75.000%. Among them, the performance of RegNetX_8GF is better than the other network models. The Accuracy, Precision, Recall, and F1 values of RegNetX_8GF are 78.333%, 77.654%, 69.547%, and 71.256%, respectively, which meet the requirements of conventional image recognition. To further measure the relationships of the Accuracy with the number of parameters and the number of floating-point operations in the models with Accuracy higher than 70.000%, we found that MobileNetV2 achieves the best balance among the Accuracy, the number of parameters, and the number of floating-point operations. The number of parameters for MobileNetV2 is 1/16 that of RegNetX_8GF, and the number of floating-point operations is 1/24. Considering hardware equipment, inference time, and other factors, MobileNetV2 has the best performance in the image recognition of desert plant species and is more suitable for field investigation. To verify the effectiveness of this study, we empirically tested RegNetX_8GF and MobileNetV2 in the image recognition of desert plant species in the Tianchi Bogda Peak Nature Reserve and the Ebinur Lake Wetland National Nature Reserve, and found that MobileNetV2 has good application prospects in practical work.
Due to the limitations of the image datasets, the image recognition accuracy still needs to be improved. In future research, we will further enrich the image sets of desert plant species in Xinjiang in multiple ways and forms, optimize the convolutional neural network models, improve the test accuracy, and provide solutions for the administration agencies of nature reserves to carry out large-scale field plant background investigation, so as to improve work efficiency and decision-making ability.
Acknowledgements
This work was supported by the West Light Foundation of the Chinese Academy of Sciences (2019-XBQNXZA-007) and the National Natural Science Foundation of China (12071458, 71731009).