Yu-Dong Zhang, Muhammad Attique Khan, Ziquan Zhu and Shui-Hua Wang
1School of Informatics, University of Leicester, Leicester, LE1 7RH, UK
2Department of Computer Science, HITEC University Taxila, Taxila, Pakistan
3Science in Civil Engineering, University of Florida, Gainesville, FL 32608, USA
4School of Mathematics and Actuarial Science, University of Leicester, Leicester, LE1 7RH, UK
Abstract: (Aim) COVID-19 is an ongoing infectious disease. It has caused more than 107.45 million confirmed cases and 2.35 million deaths as of 11/Feb/2021. Traditional computer vision methods have achieved promising results on automatic smart diagnosis. (Method) This study aims to propose a novel deep learning method that can obtain better performance. We use the pseudo-Zernike moment (PZM), derived from the Zernike moment, as the extracted features. Two settings are introduced: (i) image plane over unit circle; and (ii) image plane inside the unit circle. Afterward, we use a deep-stacked sparse autoencoder (DSSAE) as the classifier. Besides, multiple-way data augmentation is chosen to overcome overfitting. The multiple-way data augmentation is based on Gaussian noise, salt-and-pepper noise, speckle noise, horizontal and vertical shear, rotation, Gamma correction, random translation, and scaling. (Results) 10 runs of 10-fold cross-validation show that our PZM-DSSAE method achieves a sensitivity of 92.06% ± 1.54%, a specificity of 92.56% ± 1.06%, a precision of 92.53% ± 1.03%, and an accuracy of 92.31% ± 1.08%. Its F1 score, MCC, and FMI arrive at 92.29% ± 1.10%, 84.64% ± 2.15%, and 92.29% ± 1.10%, respectively. The AUC of our model is 0.9576. (Conclusion) We demonstrate that "image plane over unit circle" achieves better results than "image plane inside the unit circle." Besides, the proposed PZM-DSSAE model is better than eight state-of-the-art approaches.
Keywords: pseudo-Zernike moment; stacked sparse autoencoder; deep learning; COVID-19; multiple-way data augmentation; medical image analysis
COVID-19 has caused more than 107.45 million confirmed cases and 2.35 million deaths as of 11/Feb/2021, across about 192 countries/regions and 26 cruise/naval ships [1]. Fig. 1 shows the top 10 countries in cumulative confirmed cases and deaths, respectively. The main symptoms of COVID-19 are a low fever, a new and ongoing cough, and a loss of or change to taste and smell [2]. In the UK, three vaccines are formally approved: Pfizer/BioNTech, Oxford/AstraZeneca, and Moderna. Two COVID-19 diagnosis methods are available. The former is viral testing, which tests for the existence of viral RNA fragments [3]. The shortcomings of the swab test are twofold: (i) the swab samples may be contaminated, and (ii) one needs to wait from several hours to several days for the test results. The latter is chest imaging. Two main chest imaging modalities are available: chest computed tomography (CCT) [4] and chest X-ray (CXR) [5].
Figure 1: Data as of 11/Feb/2021 (a) Cumulative confirmed cases (b) Cumulative deaths
CCT is one of the best chest imaging techniques [6] since it provides the finest resolution and can recognize extremely small nodules in the chest region. CCT employs computer-processed combinations of multiple X-ray observations taken from different angles [7] to produce high-quality 3D tomographic images (virtual slices). In contrast, CXR only provides one 2D image, which performs poorly on soft-tissue contrast. This study focuses on CCT images [8].
Currently, numerous studies are working on using machine learning (ML) and deep learning (DL) technologies [9,10]. For example, Guo et al. [11] employed ResNet-18 for classifying thyroid images. Lu [12] utilized an extreme learning machine (ELM) trained by the bat algorithm (BA). Those two approaches were not developed for COVID-19, but they can be transferred to the COVID-19 dataset easily and are used as comparison baseline approaches in our experiments. For COVID-19 research, Yao [13] proposed a wavelet entropy biogeography-based optimization (WEBBO) method for COVID-19 diagnosis. Wu [14] presented three-segment biogeography-based optimization (3SBBO) for recognizing COVID-19 patients. Wang et al. [15] presented DeCovNet, whose accuracy achieved 90.1%. El-kenawy et al. [16] presented a novel feature selection voting classifier (FSVC) method for COVID-19 classification. Yu et al. [17] presented a GoogleNet-COD method to detect COVID-19. Chen [18] designed a gray-level co-occurrence matrix and support vector machine (GLCM-SVM) method to classify COVID-19 images [19].
To further improve the performance of automatic COVID-19 diagnosis, this paper proposes a novel method that combines a traditional ML approach with a recent DL approach. We use the pseudo-Zernike moment (PZM) as the extracted features, and a deep-stacked sparse autoencoder (i.e., one type of deep neural network) as the classifier. The combination achieves excellent results that outperform eight state-of-the-art approaches. The novelties of our paper lie in the following aspects:
• We are the first to apply the pseudo-Zernike moment to COVID-19 image analysis.
• The deep-stacked sparse autoencoder (DSSAE) works better than traditional classifiers.
• Our proposed PZM-DSSAE model is better than eight state-of-the-art approaches.
We use the dataset in reference [20], which contains 148 COVID-19 patients and 148 healthy control (HC) subjects. Slice-level selection [20] was employed to generate C1 = 320 COVID-19 images and C2 = 320 HC images. The raw images have a size of 1024 × 1024 × 3. A four-step preprocessing was used on this dataset. First, the images are converted to grayscale to save storage. Second, histogram stretching is used to enhance the contrast. Third, border pixels are removed, since they contain the text and ruler on the right side and the check-up bed at the bottom. Finally, downsampling to width W and height H is carried out to further reduce the storage of the dataset. Fig. 2 displays one example of a COVID-19 patient and one example of an HC subject. Algorithm 1 itemizes the pseudocode of preprocessing.
Algorithm 1: Pseudocode of preprocessing
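The four preprocessing steps map directly onto simple array operations. Below is a minimal Python sketch of Algorithm 1, assuming OpenCV for the resizing step; the function name `preprocess` is our own, while the crop margin of 200 pixels and the 256 × 256 target size follow the parameter settings reported later in Tab. 3.

```python
import numpy as np
import cv2  # OpenCV is assumed available for the resize step

def preprocess(img, crop=200, size=(256, 256), g_min=0.0, g_max=255.0):
    """Sketch of Algorithm 1: grayscale -> histogram stretch -> crop -> resize."""
    # Step 1: convert the 1024 x 1024 x 3 slice to grayscale to save storage
    gray = img.mean(axis=2) if img.ndim == 3 else img.astype(np.float32)
    # Step 2: histogram stretch onto the full grayscale range [g_min, g_max]
    lo, hi = gray.min(), gray.max()
    gray = (gray - lo) / (hi - lo + 1e-12) * (g_max - g_min) + g_min
    # Step 3: remove 200 border pixels per side (text/ruler on the right, bed at the bottom)
    gray = gray[crop:-crop, crop:-crop]
    # Step 4: downsample to W x H
    return cv2.resize(gray.astype(np.float32), size, interpolation=cv2.INTER_AREA)
```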
Figure 2: Example of preprocessed images (a) COVID-19 (b) HC
Tab. 1 displays the abbreviation list. Image moments were first introduced by Hu [21], who used geometric moments to generate a set of invariants. Hu's moments have been widely used in knee osteoarthritis classification [22], brain tumor classification [23], etc. However, geometric moments are sensitive to noise. Thus, Teague [24] introduced Zernike moments (ZMs) based on orthogonal Zernike polynomials. Orthogonal moments have been proven to be more robust in noisy conditions, and they can achieve a near-zero value of the redundancy measure [25].
Table 1: Abbreviation list
Later, the pseudo-Zernike moment (PZM) was derived from the Zernike moment. PZMs have been proven to give better performance than other moment functions such as Hu moments, Zernike moments, etc. For example, for an order p, there are (p+1)^2 linearly independent pseudo-Zernike polynomials of orders ≤ p, while there are only (p+1)(p+2)/2 Zernike polynomials. Hence, PZM is more expressive and offers more feature vectors than ZM.
The kernel of PZMs is a set of orthogonal pseudo-Zernike polynomials defined over the polar coordinates inside the unit circle (UC). The 2D PZM of order p with repetition q of an image g(r, θ) is defined as [26]

$$A_{pq} = \frac{p+1}{\pi}\int_{0}^{2\pi}\int_{0}^{1} W_{pq}^{*}(r,\theta)\, g(r,\theta)\, r\, \mathrm{d}r\, \mathrm{d}\theta \tag{1}$$
where the pseudo-Zernike polynomials W_pq(r, θ) of order p are defined as

$$W_{pq}(r,\theta) = S_{p|q|}(r)\, e^{\mathrm{j} q \theta} \tag{2}$$

$$S_{p|q|}(r) = \sum_{s=0}^{p-|q|} (-1)^{s}\, \frac{(2p+1-s)!}{s!\,(p-|q|-s)!\,(p+|q|+1-s)!}\, r^{p-s} \tag{3}$$
where 0 ≤ |q| ≤ p. In practice, the pseudo-Zernike functions toolbox (https://www.mathworks.com/matlabcentral/fileexchange/33644-pseudo-zernike-functions) is used for simplicity and fast calculation. Fig. 3 displays pseudo-Zernike functions of orders p ≤ 5.
Note that PZMs are defined in terms of polar coordinates (r, θ) with |r| ≤ 1. Therefore, the computation of PZM requires a linear transformation of the image plane (IP) coordinates (w, h), 1 ≤ w ≤ W, 1 ≤ h ≤ H, to the UC domain (x, y) ∈ R². There are two commonly used transformations, as shown in Fig. 4: (i) IP over UC; and (ii) IP inside UC. In this study, we use the former (IP over UC), because lesions will not occur within the four corners of the CCT image.
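The following NumPy sketch illustrates Eqs. (1)–(3) together with the two coordinate mappings of Fig. 4. It is a direct-summation approximation of the integral rather than the optimized MATLAB routine cited above; the names `pz_radial` and `pzm_features` are our own.

```python
import numpy as np
from math import factorial

def pz_radial(p, q, r):
    """Pseudo-Zernike radial polynomial S_{p|q|}(r) of Eq. (3)."""
    q = abs(q)
    out = np.zeros_like(r)
    for s in range(p - q + 1):
        c = ((-1) ** s * factorial(2 * p + 1 - s)
             / (factorial(s) * factorial(p - q - s) * factorial(p + q + 1 - s)))
        out = out + c * r ** (p - s)
    return out

def pzm_features(img, p_max=19, ip_over_uc=True):
    """|A_pq| for all 0 <= |q| <= p <= p_max, i.e., (p_max+1)^2 features."""
    H, W = img.shape
    h, w = np.mgrid[0:H, 0:W].astype(float)
    x = (2 * w - W + 1) / W              # map pixel centers to (-1, 1)
    y = (2 * h - H + 1) / H
    dA = 4.0 / (W * H)                   # area element of one pixel in UC coordinates
    if not ip_over_uc:                   # IP inside UC: the whole image fits in the circle
        x, y, dA = x / np.sqrt(2), y / np.sqrt(2), dA / 2.0
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = (r <= 1.0).astype(float)      # for IP over UC, corners fall outside and are discarded
    feats = []
    for p in range(p_max + 1):
        for q in range(-p, p + 1):
            Wpq_conj = pz_radial(p, q, r) * np.exp(-1j * q * theta)  # W_pq^*
            A_pq = (p + 1) / np.pi * np.sum(img * Wpq_conj * mask) * dA
            feats.append(abs(A_pq))
    return np.asarray(feats)             # 400 features when p_max = 19
```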
Figure 3: Pseudo-Zernike functions of orders p ≤ 5
Figure 4: Two transformations (IP: image plane; UC: unit circle) (a) Raw image plane W × H (b) IP over UC (c) IP inside UC
Traditionally, p-order PZMs are fed into shallow classifiers, such as the multi-layer perceptron [27], adaptive differential evolution wavelet neural network (Ada-DEWNN) [28], linear regression classifier (LRC) [29], and kernel support vector machine (KSVM) [30]. In this study, we introduce a customized deep-stacked sparse autoencoder (DSSAE). DSSAE is a type of deep neural network, and we expect DSSAE to achieve better performance than shallow models.
The fundamental element of DSSAE is the autoencoder (AE), which is a typical shallow neural network that learns to map its input X to an output Y. There is an internal code output IN that represents the input X. The whole AE can be divided into two parts: an encoder part (A_X, B_X) that maps the input X to the code IN, and a decoder part (A_Y, B_Y) that maps the code to a reconstructed output Y.
The structure of the AE is displayed in Fig. 5, where the encoder part has weight A_X and bias B_X, and the decoder part has weight A_Y and bias B_Y. We have

$$IN = z_{LS}\left(A_X X + B_X\right) \tag{4}$$

$$Y = z_{LS}\left(A_Y \, IN + B_Y\right) \tag{5}$$
where the output Y is an estimate of the input X, and z_LS is the log-sigmoid function

$$z_{LS}(x) = \frac{1}{1 + e^{-x}} \tag{6}$$
Figure 5: Structure of an AE
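A compact sketch of Eqs. (4)–(6) follows; the variable names mirror the notation above, assuming X is a column feature vector.

```python
import numpy as np

def z_ls(x):
    """Log-sigmoid activation of Eq. (6)."""
    return 1.0 / (1.0 + np.exp(-x))

def ae_forward(X, A_X, B_X, A_Y, B_Y):
    """One AE pass, Eqs. (4)-(5): encode input X to code IN, decode to estimate Y."""
    IN = z_ls(A_X @ X + B_X)   # encoder part (A_X, B_X)
    Y = z_ls(A_Y @ IN + B_Y)   # decoder part (A_Y, B_Y)
    return IN, Y
```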
The sparse autoencoder (SAE) is a variant of the AE that encourages sparsity: only a small fraction of the hidden neurons are allowed to be active at the same time. To minimize the error between the input vector X and the output Y, the raw loss function J_b of the AE is deduced as:

$$J_b = \frac{1}{N_S} \sum_{k=1}^{N_S} \left\| X(k) - Y(k) \right\|^2 \tag{7}$$
where N_S means the number of training samples. From Eqs. (4) and (5), we find the output Y can be expressed as

$$Y = z_{AE}(X) \tag{8}$$
where z_AE is the abstract AE function [31]. Hence, Eq. (7) can be revised as

$$J_b = \frac{1}{N_S} \sum_{k=1}^{N_S} \left\| X(k) - z_{AE}\bigl(X(k)\bigr) \right\|^2 \tag{9}$$
To avoid an over-complete mapping or learning a trivial mapping, we define an L2 regularization term Γ_A on the weights (A_X, A_Y) and a regularization term Γ_s for the sparsity constraint. Therefore, the loss function J_l of the SAE is derived as:

$$J_l = J_b + a_s \, \Gamma_s + a_A \, \Gamma_A \tag{10}$$
where a_s stands for the sparsity regulation factor, and a_A for the weight regulation factor. The sparsity regularization term Γ_s is defined as:

$$\Gamma_s = \sum_{m=1}^{|IN|} z_{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_m\right) = \sum_{m=1}^{|IN|} \left[ \rho \log \frac{\rho}{\hat{\rho}_m} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_m} \right] \tag{11}$$
where z_KL stands for the Kullback–Leibler divergence [32], |IN| is the number of elements of the internal code output IN, ρ̂_m is the m-th neuron's average activation value over all N_S training samples, and ρ is its desired value, viz., the sparsity proportion factor. The weight regularization term Γ_A is defined as

$$\Gamma_A = \frac{1}{2} \left( \left\| A_X \right\|_F^2 + \left\| A_Y \right\|_F^2 \right) \tag{12}$$
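Putting Eqs. (7)–(12) together, the SAE loss can be sketched as below, with the hyperparameter defaults taken from Tab. 3; `rho_hat` is assumed to be precomputed as each hidden neuron's mean activation over the N_S training samples.

```python
import numpy as np

def sae_loss(X, Y, rho_hat, A_X, A_Y, a_s=1.1, a_A=0.001, rho=0.05):
    """Sparse-AE loss J_l = J_b + a_s * Gamma_s + a_A * Gamma_A, Eqs. (7)-(12).
    X and Y have shape (features, N_S); rho_hat has one entry per hidden neuron."""
    N_S = X.shape[1]
    J_b = np.sum((X - Y) ** 2) / N_S                      # raw reconstruction loss, Eq. (7)
    # Kullback-Leibler sparsity term between desired rho and observed rho_hat, Eq. (11)
    Gamma_s = np.sum(rho * np.log(rho / rho_hat)
                     + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    # L2 regularization over encoder and decoder weights, Eq. (12)
    Gamma_A = 0.5 * (np.sum(A_X ** 2) + np.sum(A_Y ** 2))
    return J_b + a_s * Gamma_s + a_A * Gamma_A
```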
The training procedure is set to the scaled conjugate gradient descent (SCGD) method.
We use SAE as the building block and establish the final deep-stacked sparse autoencoder (DSSAE) classifier with three operations: (i) we include the input layer, preprocessing layer, and PZM layer; (ii) we stack four SAEs; (iii) we append a softmax layer at the bottom of our AI model. The details of the proposed PZM-DSSAE model are listed in Tab. 2 and illustrated in Fig. 6. After preprocessing, all the CCT images are normalized to fixed grayscale images of size W × H. Then, PZM is applied to obtain a feature vector of size (p+1)^2 × 1. In the classification stage, four SAE blocks with (M1, M2, M3, M4) neurons are employed. Finally, a softmax layer with M_c neurons is appended, where M_c is the number of categories to be identified.
Table 2: Layer details of proposed PZM-DSSAE model
Figure 6: Structure of proposed PZM-DSSAE model
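A minimal sketch of the classification stage of Fig. 6: the four trained sparse encoders are stacked and followed by a softmax layer. The encoder weights are assumed to come from layer-wise SAE pre-training, and `dssae_forward` is an illustrative name.

```python
import numpy as np

def softmax(z):
    """Softmax over the class axis."""
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def dssae_forward(x, encoders, W_c, b_c):
    """Classification stage of Fig. 6: a PZM feature vector x of size (p+1)^2 x 1
    passes through the four stacked sparse encoders (M1=300, M2=200, M3=100,
    M4=50 neurons, per Tab. 3), then a softmax layer with M_c = 2 neurons gives
    class probabilities (COVID-19 vs. HC).  `encoders` is a list of (A_X, B_X)
    pairs taken from each pre-trained SAE."""
    a = x
    for A_X, B_X in encoders:                 # the four stacked sparse encoders
        a = 1.0 / (1.0 + np.exp(-(A_X @ a + B_X)))
    return softmax(W_c @ a + b_c)             # final softmax layer
```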
The small size of the training set causes overfitting; one solution is data augmentation (DA), which creates fake training images. Multiple-way DA (MDA) is an enhanced DA method. Wang [33] proposed a 14-way data augmentation, in which seven different DA techniques were employed on the k-th training image g(k) and its mirrored image g^(m)(k).
In this study, we add two new DA techniques: speckle noise (SN) [34] and salt-and-pepper noise (SAPN). The SN-altered image is defined as

$$g_{SN}(k) = g(k) + N_{SN} \odot g(k) \tag{13}$$
where N_SN is uniformly distributed random noise and ⊙ denotes element-wise multiplication. The mean and variance of N_SN are set to m_SN and v_SN, respectively.
For the k-th training image g(k), the SAPN-altered image [35] is defined as g_SAPN(k), with its values set as

$$\begin{cases} P\left[g_{SAPN}(k) = g_{\min}\right] = d/2 \\ P\left[g_{SAPN}(k) = g_{\max}\right] = d/2 \\ P\left[g_{SAPN}(k) = g(k)\right] = 1 - d \end{cases} \tag{14}$$
where d stands for the noise density, and P for the probability function. g_min and g_max correspond to black and white colors, respectively. The definitions of g_min and g_max can be found in Algorithm 1.
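Both noise models can be sketched in a few lines of NumPy; the uniform half-width below is chosen so that N_SN has the requested variance v_SN, and the default parameter values follow Tab. 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def speckle(g, m_SN=0.0, v_SN=0.05):
    """SN of Eq. (13): multiplicative uniform noise with mean m_SN, variance v_SN."""
    half = np.sqrt(3.0 * v_SN)          # U(m-h, m+h) has variance h^2 / 3
    n = rng.uniform(m_SN - half, m_SN + half, g.shape)
    return g + n * g

def salt_and_pepper(g, d=0.05, g_min=0.0, g_max=255.0):
    """SAPN of Eq. (14): noise density d, half pepper (g_min), half salt (g_max)."""
    u = rng.uniform(size=g.shape)
    out = g.copy()
    out[u < d / 2] = g_min                      # pepper with probability d/2
    out[(u >= d / 2) & (u < d)] = g_max         # salt with probability d/2
    return out                                  # unchanged with probability 1 - d
```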
First, Q_D different DA methods, as shown in Fig. 7, are applied to g(k). Letting H_m, m = 1, ..., Q_D, denote each DA operation, we have the augmented datasets of the raw image g(k) as:

$$H_m[g(k)], \quad m = 1, \ldots, Q_D \tag{15}$$
Figure 7: Diagram of proposed 18-way DA
Supposing Q_N stands for the number of new images generated by each DA method, we have

$$\left| H_m[g(k)] \right| = Q_N, \quad m = 1, \ldots, Q_D \tag{16}$$
Second, the horizontally mirrored image is generated as:

$$g^{(m)}(k) = z_b[g(k)] \tag{17}$$
where z_b stands for the horizontal mirror function.
Third, all the Q_D different DA methods are performed on the mirrored image g^(m)(k), generating Q_D different datasets.
Fourth, the raw image g(k), the mirrored image g^(m)(k), all the above Q_D-way results of the raw image H_m[g(k)], and the Q_D-way DA results of the horizontally mirrored image H_m[g^(m)(k)] are combined together. The final generated dataset from g(k) is defined as F(k):

$$F(k) = z_a \left\{ g(k), \; g^{(m)}(k), \; H_m[g(k)], \; H_m\!\left[g^{(m)}(k)\right] \;\middle|\; m = 1, \ldots, Q_D \right\} \tag{18}$$
where z_a stands for the concatenation function. Supposing the augmentation factor is Q_A, which stands for the number of images in F(k), we have

$$Q_A = 2 \times (1 + Q_D \times Q_N) \tag{19}$$
Algorithm 2 summarizes the pseudocode of the proposed 18-way DA method.
Algorithm 2: Pseudocode of the proposed 18-way DA method
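A compact sketch of Algorithm 2, assuming each of the Q_D = 9 operations of Fig. 7 is wrapped as a stochastic function; with Q_N = 30, the returned list holds Q_A = 2 × (1 + 9 × 30) = 542 images, matching Eq. (19).

```python
import numpy as np

def multiway_da(g, da_ops, Q_N=30):
    """Sketch of Algorithm 2 (18-way DA).  `da_ops` is a list of the Q_D
    stochastic DA operations H_m of Fig. 7; each is applied Q_N times to the
    raw image g(k) and to its horizontal mirror g^(m)(k), Eqs. (15)-(18)."""
    g_m = np.fliplr(g)                          # z_b: horizontal mirror, Eq. (17)
    out = [g, g_m]                              # keep the two source images
    for H_m in da_ops:                          # the Q_D DA operations
        out += [H_m(g) for _ in range(Q_N)]     # H_m applied to the raw image
        out += [H_m(g_m) for _ in range(Q_N)]   # H_m applied to the mirrored image
    return out                                  # Q_A = 2(1 + Q_D * Q_N) images, Eq. (19)
```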
F-fold cross-validation was used in this study. The whole dataset is divided into F folds. At the f-th trial, 1 ≤ f ≤ F, the f-th fold is selected as the test set, and the rest F − 1 folds [36], [1, ..., f−1, f+1, ..., F], are selected as the training set (Fig. 8). In this study, with F = 10, each fold contains 32 COVID-19 images and 32 HC images.
Figure 8: F-fold cross-validation
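Stratified 10-fold partitioning of the 640 images can be sketched with scikit-learn; the feature matrix below is randomly generated purely for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(640, 400)            # 640 samples x 400 PZM features (illustrative)
y = np.array([0] * 320 + [1] * 320)     # C1 = 320 COVID-19, C2 = 320 HC labels

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for f, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # f-th trial: fold f is the test set; the other F-1 folds form the training set
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]   # 32 COVID-19 + 32 HC images per fold
```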
To avoid randomness, we run the whole above procedure N_R times with different initial random seeds and different cross-validation partitions. The ideal confusion matrix (CM) R_ideal is defined as

$$R_{ideal} = \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix} \tag{20}$$
Note here the off-diagonal entries of R_ideal are all zero, viz., r_ideal(m, n) = 0, ∀ m ≠ n. C_1 and C_2 are the numbers of samples of each category, which can be found in Algorithm 1. Seven measures are defined based on the realistic CM [37], written as

$$R = \begin{bmatrix} TP & FN \\ FP & TN \end{bmatrix} \tag{21}$$
The first four measures are sensitivity, specificity, precision, and accuracy, common in most pattern recognition papers:

$$Sen = \frac{TP}{TP + FN}, \quad Spc = \frac{TN}{TN + FP}, \quad Prc = \frac{TP}{TP + FP}, \quad Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{22}$$

The last three measures are the F1 score, Matthews correlation coefficient (MCC) [38], and Fowlkes–Mallows index (FMI) [39]. They are defined as:

$$F1 = \frac{2\,TP}{2\,TP + FP + FN} \tag{23}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{24}$$

$$FMI = \sqrt{\frac{TP}{TP + FP} \times \frac{TP}{TP + FN}} \tag{25}$$
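The seven measures follow mechanically from the realistic confusion matrix of Eq. (21); a minimal sketch:

```python
import numpy as np

def seven_measures(TP, FN, FP, TN):
    """The seven measures of Eqs. (22)-(25) from a realistic confusion matrix."""
    sen = TP / (TP + FN)                      # sensitivity (recall)
    spc = TN / (TN + FP)                      # specificity
    prc = TP / (TP + FP)                      # precision
    acc = (TP + TN) / (TP + TN + FP + FN)     # accuracy
    f1 = 2 * TP / (2 * TP + FP + FN)          # F1 score
    mcc = ((TP * TN - FP * FN)
           / np.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)))
    fmi = np.sqrt(prc * sen)                  # Fowlkes-Mallows index
    return sen, spc, prc, acc, f1, mcc, fmi
```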
Besides, the receiver operating characteristic (ROC) curve [40] is used to provide a graphical plot of our model. The ROC curve is created by plotting the true positive rate against the false positive rate at various threshold settings. The area under the curve (AUC) is also calculated.
Tab. 3 displays the parameter settings of this study. The number of samples of each class is 320. The minimum and maximum grayscale values are set to (0, 255). For the crop operation, 200 pixels are removed from all four sides. The preprocessed image has a size of 256 × 256. The maximum order of PZM is set to 19, so we have (19+1)^2 = 400 PZM features. The weight regulation factor is a_A = 0.001, the sparsity regulation factor is a_s = 1.1, and the sparsity proportion factor is ρ = 0.05. The neurons of the four SAEs are 300, 200, 100, and 50, respectively. The number of classes to be classified is set to 2. The number of folds in cross-validation is set to 10. The mean and variance of the uniformly distributed random noise in SN are set to 0 and 0.05, respectively. The noise density of SAPN is set to 0.05. The number of different DA methods is set to Q_D = 9, and the number of newly generated images per method is set to Q_N = 30. The augmentation factor is obtained as Q_A = 542 (see Algorithm 2). The number of runs is set to N_R = 10.
Table 3: Parameter settings
Fig. 9 shows the Q_D-way DA applied to a raw image. Due to the page limit, the mirrored image and its corresponding DA results are not displayed. As can be observed in Fig. 9, the multiple-way DA can increase the diversity of our training images.
Figure 9: Q_D-way DA results of a raw image (a) Gaussian noise (b) SAPN (c) SN (d) Horizontal shear (e) Vertical shear (f) Rotation (g) Gamma correction (h) Random translation (i) Scaling
Tab. 4 gives the results of 10 runs of 10-fold cross-validation, where we can see our method achieves a sensitivity of 92.06% ± 1.54%, a specificity of 92.56% ± 1.06%, a precision of 92.53% ± 1.03%, and an accuracy of 92.31% ± 1.08%. Its F1 score, MCC, and FMI arrive at 92.29% ± 1.10%, 84.64% ± 2.15%, and 92.29% ± 1.10%, respectively. The AUC is 0.9576.
In addition, we compared the two transformation settings: IP over UC against IP inside UC (see Fig. 4). The IP inside UC setting achieves a sensitivity of 91.84% ± 2.18%, a specificity of 92.44% ± 1.31%, and an accuracy of 92.14% ± 1.12%, which are worse than the IP over UC setting. This comparison demonstrates why we choose IP over UC in this study. In particular, the receiver operating characteristic (ROC) curves of both settings are displayed in Fig. 10.
The proposed PZM-DSSAE method is compared with eight state-of-the-art methods. The comparisons are carried out on the same dataset via 10 runs of 10-fold cross-validation, and the results are displayed in Tab. 5. Fig. 11 displays the error bars of the proposed method against the eight state-of-the-art methods. We can see that the proposed PZM-DSSAE gives the best performance among all the methods. The reason is threefold: (i) PZM is used as the feature descriptor, (ii) DSSAE is used as the classifier, and (iii) 18-way DA is employed to solve the overfitting problem.
Table 4: 10 runs of statistical analysis of the proposed PZM-DSSAE method
Figure 10: ROC curves of two settings (a) IP over UC (b) IP inside UC
Table 5: Comparison to state-of-the-art methods
Figure 11: Error bar plot of method comparison
This study proposed a novel PZM-DSSAE system for COVID-19 diagnosis. To the best of the authors' knowledge, we are the first to apply PZM to COVID-19 image analysis. Also, two other improvements are carried out: (i) DSSAE is used as the classifier, and (ii) multiple-way data augmentation is employed to help the classifier generalize. Our model yields a sensitivity of 92.06% ± 1.54%, a specificity of 92.56% ± 1.06%, an accuracy of 92.31% ± 1.08%, and an AUC of 0.9576.
In the future, we shall collect more COVID-19 images from more patients and multiple modalities. Also, other advanced AI models will be tested, such as graph neural networks and attention networks.
Funding Statement: This study was supported by the Royal Society International Exchanges Cost Share Award, UK (RP202G0230); the Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); the Hope Foundation for Cancer Research, UK (RM60G0680); and the Global Challenges Research Fund (GCRF), UK (P202PF11).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.