Xiao-Gang Wang(汪小剛) and Hao-Yu Wei(魏浩宇)
1Department of Applied Physics,Zhejiang University of Science and Technology,Hangzhou 310023,China
2Department of Optical Engineering,Zhejiang A&F University,Hangzhou 311300,China
Keywords: optical encryption,nonlinear optical cryptosystem,deep learning,phase retrieval algorithm
With the increasing demands for secured transmission and storage, information security has attracted wide attention and many optical encryption techniques have been proposed to secure information.[1,2]In 1995,Refregier and Javidi first proposed double random phase encoding(DRPE),[3]in which an image is encrypted into white noise by using two statistically independent random phase masks(RPMs)that are respectively located at the input image plane and Fourier transform domain.Inspired by the DRPE method,researchers have continuously investigated the DRPE method in different transform domains,such as Fresnel transform domain,[4]fractional Fourier transform domain,[5]and Gyrator transform domain.[6]Since the DRPE-based optical encryption has high flexibility and expansion capability, consequently, many other optical encryption methods have also been developed, which proves to be feasible and effective.[7-25]
In recent years, the vulnerability analysis of optical encryption systems has attracted wide interest from researchers.The DRPE-based optical encryption schemes have been verified to be vulnerable to different kinds of attacks[26-29]due to its inherent properties of linearity and symmetry. It was also found that they are vulnerable to new attack methods using deep learning (DL) techniques.[30-32]In order to remove the inherent symmetry and linearity of DRPE-based cryptosystems, NOC based on phase-truncated Fourier transform (PTFT) was proposed in 2010.[33]However, the PTFTbased nonlinear cryptosystem was still demonstrated to be vulnerable to some specific attacks using iterative amplitudephase retrieval algorithm (APRA) and has the silhouette problem.[34,35]Meanwhile, several asymmetric encryption schemes based on equal modulus decomposition(EMD)have been proposed in these years,[36,37]which can resist the specific attack and eliminate the silhouette problem. However,it was reported that one can obtain precise results from EMDbased encryption system by performing an attack based on phase retrieval algorithm (PRA).[38]Recently, it has been proved that the learning-based method can also break the optical asymmetric cryptosystems based on PTFT and EMD.[39]
To improve the security level and remove the silhouette problem,amplitude-phase retrieval algorithm(APRA)and phase retrieval algorithm (PRA) have been introduced into nonlinear optical cryptosystems (NOCs).[40,41]The results of analysis show that the APRA-based and PRA-based NOCs can provide a high robustness against various conventional attacks.However,the vulnerabilities of these two types of NOCs against DL-based attacks, to the best of our knowledge, have not been studied before. In this work, a DenseNet model for cryptanalysis is proposed to analyze the securities of the two types of NOCs in the Fourier and Fresnel domains for the first time. In this end-to-end DL-based method, the neural network model is trained by using a set of ciphertexts and the corresponding plaintexts,and the Tensorflow,a popular opensource artificial intelligence library developed by Google, is used for setting up the neural network and conducting the network training. The optimal parameters of this network are determined through a series of forward and backward passes,which minimize a loss function. Numerical experiments show that the trained DenseNet model can be used to retrieve unknown plaintexts from the given ciphertexts without using optical keys, which implies that the PRA-based NOCs cannot resist the proposed DL-based method attack.
The rest of this paper is organized as follows. In Section 2, we first describe the basic principles of two types of NOCs. And then the DL-based model for cryptanalysis is proposed in Section 3. In Section 4, we show the results of test simulations and performance of our proposed DL-based attack method. Finally,some conclusions are drawn in Section 5.
In this subsection, we give a brief introduction to the principle of the Fourier domain APRA-based NOC.[40]In the encryption scheme an amplitude and phase mixture retrieval of the Yang-Gu algorithm is used, hereafter referred to as amplitude-phase retrieval algorithm. The flowchart for encryption is shown in Fig. 1(a), where the encryption process is conducted in two procedures. In the first procedure, based on a mixture-type amplitude-phase retrieval,the secret imagef(x,y) is encrypted into the amplitudeg'(u,v) with a public keyP1(u,v) and a binary phase modulation exp[iπγ1(u,v)].In the second procedure, the amplitudeg'(u,v) is the input image and encrypted into ciphertextC(x',y')in the same way as another public keyP2(x',y') and binary phase modulation exp[iπγ2(x',y')]. Figure 1(b)is a flowchart illustrating thekth iteration in the first procedure of encryption process, which can be given as follows.
(i) In thekth iteration, the plaintextf(x,y) is multiplied by phase function exp[iαk(x,y)], and is then Fourier transformed to the output plane,the complex amplitude function in the output plane can be expressed as
where the functionP?1(u,v)denotes the complex conjugate of the public keyP1(u,v),andP1(u,v)represents the public random phase keys,which are fixed in the encryption process.
(iii) Functiongk(u,v) is multiplied by the public keyP1(u,v) and is then inverse Fourier transformed to the input plane, the complex function in the input plane can be expressed as
where IFT denotes the inverse Fourier transform.The functionIk+1(x,y)exp[iαk+1(x,y)] is the result of the Fourier transform of the functiongk(u,v)P1(u,v), whereIk+1(x,y) and exp[iαk+1(x,y)] represent amplitude and phase function respectively. Note that phase part exp[iαk+1(x,y)] is reserved and used to replace exp[iαk(x,y)]exp[iαk(x,y)]in the next iteration while the amplitude partIk+1(x,y)is discarded.
(iv) The convergence of the iteration process is determined by the correlation coefficient (CC) value between
f(x,y)andIk+1(x,y).
Fig.1. Flowcharts of(a)encryption process and(b)the k-th iteration in the first procedure of encryption process,with PR denoting operation of phase reservation.
The final retrieved amplitudeg(u,v) contains both positive and negative elements, therefore a binary modulation exp[iπγ1(u,v)] is introduced, and the private modulationγ1(u,v)is given by the following equation:
where (x',y') is the coordinate in the spatial domain,G'(x',y') is the result of the Fourier transform of functiong'(u,v)exp[iβ(u,v)].C(x',y')represents the amplitude function that contains both positive and negative elements,The binary modulationγ2(x',y') is generated simultaneously to obtain the ciphertextC'(x',y'). For simplicity,we omit the subscripts denoting the iteration number in each procedure for the unknown quantities. As a result, the decryption keys can be given as follows:[40]
In this previously proposed asymmetric cryptosystem based on APRA,two RPMsP1,P2serve as the public encryption keys while the decryption keysD1andD2are generated by using the public encryption keys. It is alleged that one cannot retrieve the secret image only by using the public keys and the cryptosystem cannot be cracked by using the traditional attack methods.[40]
Based on a Gerchberg-Saxton (GS) PRA and DPRE,Rajput and Nishchal proposed a Fresnel domain PRA-based NOC,which could provide a high level of robustness against known-plaintext, chosen-plaintext, and special attacks.[41]Figure 2 displays a flowchart showing the steps of the encryption process. The ciphertext can be obtained by employing the GS PRA twice.
Fig.2. Flowchart of PRA-based encryption scheme,with PT denoting operation of phase truncation.
(i)In the first level of encryption, the original amplitude imagef(x,y)multiplied by an RPM,exp[iαn(x,y)],is used as the input ofnth iteration,where the output of the first level is given by
The main objective of the NOCs is to break the linearity of conventional systems,where the encryption method and keys are necessary for the recipient to perform the decryption.In those NOCs, the encryption keys are randomly generated by the computer, and the encryption keys,i.e.P1,P2,A1, andA2,are fixed once generated in the iterations. When those encryption keys are used for encrypting different plaintexts, do the NOCs remain unbreakable? In this case, we noticed that the correspondence between cyphertext and plaintext is oneto-one and the encryption keys are the same. Now,a question arises. Can we construct an iteration procedure to asymmetrically map the plaintexts into ciphertexts? Or is it possible to detect the relationships between cyphertext and plaintext by using a series of available “plaintext-ciphertext” pairs? The answer is yes. Since the encryption keys used for encrypting different plaintexts are the same,what we can infer is that the relationship between different ciphertexts is existent. For simplicity, the relationship between the plaintext and the cyphertext in NOCs is described as
where F[·] denotes the encryption algorithm that transforms the plaintext in the input plane into the cyphertext in the output plane.For convenience,we use the same notation,i.e.F[·],for the two different NOCs. The reconstruction process can be given by whereR[·]represents the mapping of the ciphertextsIinto the plaintextsO.
As is well known,the machine learning algorithms prove to be able to yield high-quality solutions in several computer imaging problems during the past few years. Owing to their nonlinear characteristics, it is reasonable to use deep neural networks to estimate the relationship described by Eqs. (21)and (22). In our proposed method, the proposed DenseNet is designed to achieve the approximate equivalent models of NOCs by using a number of“plaintext-ciphertext”pairs.
Figure 3(a)show the proposed DenseNet architecture for cryptanalysis. It adopts encoder-decoder framework[42]and can be described as follows.
The inputs fed to the designed DenseNet model are ciphertexts with a size of 64×64 pixels. Each ciphertext firstly passes through a standard convolutional layer with 24 filter kernels with a size of 3×3, and is then successively decimated by six “Denseblock+AveragePooling” blocks, which serve as an encoder to extract the feature maps from the input ciphertexts. Each denseblock consists of four composite convolution layers,and each layer connects to every other layers within the same block in a feed-forward fashion.[43]Each composite convolutional layer is comprised of three consecutive operations: batch normalization (BN), rectified linear unit (ReLU) and convolution with small filters (size of 3×3);“AveragePooling”denotes 2×2 average pooling layer with stride 2, which is used for downsampling. After transmitting through another dense block, the data pass through six “Upsampling+Denseblock” blocks. Upsampling represents up-convolutional layer, which serves as a decoder to perform pixel-wise regression. Finally, the feature maps pass onto a standard convolutional layer,of which the filter size is 1×1, and the estimate of the plaintext is then produced. In this method, the skip connections are used to pass the highfrequency information learned in the initial layers down the network toward the output reconstruction.[44]
In this method, L2 regularization with weight decay of 0.001 is applied to all convolutional filters. The same regularization is also used in batch normalization. A small dropout rate of 0.05 is set to prevent overfitting. Owing to the GPU memory constraints, our DenseNet model is trained by using the Adam optimizer with a mini-batch size of 2. The training process starts with an initial learning rate of 0.001 and drops it by a factor of 2 after every 50 epochs. The neural network is trained for 300 epochs with the training samples being shuffled at each epoch. In this approach,mean square error(MSE)is used to monitor the network performance and can be defined as
wherenrepresents the pixel number of the input ciphertext,Yidenotes the ground truth data,i.e.the plaintext, and ?Yiis the output prediction,respectively.
Fig.3. Designed DenseNet architecture,indicating(a)training process and(b)testing process of the DenseNet model.
Numerical experiments have been performed to verify the feasibility of the proposed DL-based attack method. In the Fresnel domain nonlinear cryptosystem,the wavelength of laser light isλ=632.8 nm and the two diffraction distances are set to bez1=0.1 m,andz2=0.2 m. In the iteration processes of the two different types of NOCs based on the APRA and PRA,the correlation coefficient(CC)is employed to evaluate the difference between the original input image and its estimates.The CC thresholds for the iteration processes are both set to be 0.98.The experiments are carried out on a database of 10000 grayscale images with a size of 64×64 pixels, which are selected from the Fashion-MNIST datasets randomly.[45]The chosen 9000 images are used as the training data and the rest 1000 images as the testing data.
The proposed DL-based algorithm is implemented on a GeForce GTX1650 GPU by Tensorflow 1.2, a deep learning framework. A learning rate constant ofη= 0.001 is used for the learning of neural network and the number of training epochs is set to be 300. The execution time for the training of DenseNet model for each NOC is about 24.0 h. The MSE curves versus the epoch of the two DenseNets respectively corresponding to the two types of NOCs are shown in Fig. 4, where the MSEs of two networks drop to 0.01 after 300 epochs.
Fig.4. MSE curves in training processes for(a)APRA-based NOC and(b)PRA-based NOC.
To verify the effectiveness of the designed DenseNet model,test sets are fed into each of the pre-trained DenseNets for ARPA-based and PRA-based NOCs. The reconstruction results using the proposed DenseNets are illustrated in Figs. 5(I) and 5(II) separately, where the input testing data,ground truth and the results derived from pre-trained DenseNet are respectively shown in Figs.5(a)-5(c).As can be seen from Fig.5(c),all the retrieved images can be recognized by visual inspection. The average CC values of the retrieved images in Figs. 5(I) and 5(II) are 0.9702 and 0.9675, respectively. The successful decryption of the encrypted images without knowing the correct keys proves the good performance of our proposed DL-based method.
Fig.5. Images retrieved from test data. (I)APRA-based NOC.(II)PRA-based NOC.(a)Testing ciphertext,(b)ground truth,and(c)retrieved images by trained DenseNet.
To further verify the effectiveness of the proposed method for different databases,500 images are then randomly selected for predicting from MNIST database. Figure 6 shows the unknown plaintext images retrieved by utilizing the proposed DenseNet model, which has been pre-trained on the MNIST database. The ciphertext images, ground truth, retrieved images are shown in Figs.6(a)-6(c),respectively. Likewise,the encrypted images can be retrieved with high quality. The average CC values of the retrieved images in Figs.6(I)and 6(II)are 0.9819 and 0.9535,respectively.
Fig.6. Predicted results of MNIST database. (I)APRA-based NOC.(II)PRA-based NOC.(a)Ciphertext images,(b)plaintext images,and(c)retrieved images by using pre-trained DenseNet.
To demonstrate the robustness of the attack method based on the DenseNet,we investigate the influence of occlusion on the performance of the system. The simulation results are shown in Fig. 7, where panels (I) and (II) represent the test results corresponding to the APRA-based NOC and the PRAbased NOC, respectively. Figure 7(a) shows the ciphertext with 5%occlusion,and the plaintexts retrieved by pre-trained DenseNet model are shown in Fig.7(b). The average CC values of retrieved images are 0.9342 and 0.8942 with respect to APRA-based NOC and PRA-based NOC, respectively. Figure 7(c) is the ciphertext with 10% occlusion, and the plaintexts attacked by DenseNet are shown in Fig.7(b),the average CC values of the retrieved images corresponding to the two types of NOCs are 0.8423 and 0.8179,respectively. It can be found that the secret images can still be recovered and recognized when there are few data lost in the ciphertext.
Furthermore, we investigate the effect of noise pollution on the decryption. The zero-mean Gaussian noise is added into the ciphertexts and observe their retrieved images. The simulation results are shown in Fig. 8, where panels (I) and(II)represent the results from the attack on APRA-based NOC and PRA-based NOC,respectively. Figure 8(a)shows ciphertexts polluted by Gaussian noise with standard deviation of 0.01,and the derived images are shown in Fig.8(b),where the corresponding average CC values of the two sets are 0.9423 and 0.9156. The ciphertexts polluted by Gaussian noise with a standard deviation of 0.02 are shown in Fig. 8(c), and the two sets of retrieved images are shown in Fig. 8(d), with the average CC values being 0.8953 and 0.8625,respectively,which demonstrates the good performance against the Gaussian noise.
Compared with conventional Unet and convolution neural network(CNN),the proposed DenseNet has more direct connections between layers, strengthening feature propagation,encouraging feature reuse and substantially reducing the number of parameters. The previously proposed neural networks(CNN and Resnet)[20-22]have been further used to attack the two types of NOCs. However, the results obtained under the same training conditions,i.e.,the same quantity of data,training time and epoch,are not satisfactory. The results are given in Fig.9,which shows the better performance of our proposed neural network.
Fig.8. Robustness test of DenseNet on Gaussian noise with respect to(I)APRA-based NOC and(II)PRA-based NOC,showing(a)ciphertexts with 0.01 variance Gaussian, (b)retrieved images from panel(a), (c)ciphertexts with 0.02 variance Gaussian, and(d)retrieved images from panel(c).
Fig.9. Attack results of NOCs using different neural networks. (I)APRA-based NOC and(II)PRA-based NOC,showing(a)decrypted images using DenseNet,(b)retrieved image by using CNN,and(c)retrieved images using Resnet.
The simulation results show that it is possible for us to estimate the relationship between the outputs and inputs by learning models designed for cryptoanalysis. The proposed DenseNet can be considered as a black box,which can achieve the approximate equivalent models of NOCs by using a series of “plaintext-ciphertext” pairs without additional conditions.The unknown plaintexts can be retrieved by using the trained DenseNet model without decryption keys and complex phase retrieval algorithms.
In this paper,we proposed a novel DL-based method for cryptanalysis of two types of NOCs based on DL for the first time. The designed DenseNet model is built up according to the densely connected convolutional neural network architecture and trained by a series of“ciphertext-plaintext”pairs.With pre-trained DenseNet,secret images can be successfully derived from ciphertexts without using encryption key or decryption key. The robustness of the DenseNet model against occlusion and noise pollutionis also investigated. The results of numerical experiments show that the two types of NOCs are vulnerable to our well-designed DL-based attack. It can also be seen that the performance obtained in this work is better than that obtained by the previously proposed methods.
Acknowledgements
Project supported by the National Natural Science Foundation of China (Grant Nos. 61975185 and 61575178), the Natural Science Foundation of Zhejiang Province, China(Grant No. LY19F030004), and the Scientific Research and Development Fund of Zhejiang University of Science and Technology,China(Grant No.F701108L03).