Linna ZHOU, Zhigao LU, Weike YOU, Xiaofei FANG
1School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100084, China
2School of Information Science & Technology, University of International Relations, Beijing 100091, China
Abstract: In the field of reversible data hiding (RDH), designing a high-precision predictor to reduce the embedding distortion and developing an effective embedding strategy to minimize the distortion caused by embedding information are the two most critical aspects. In this paper, we propose a new RDH method comprising a transformer-based predictor and a novel embedding strategy with multiple embedding rules. In the predictor part, we first design a transformer-based predictor. Then, we propose an image division method that divides the image into four parts, which allows more pixels to be used as context. Compared with other predictors, the transformer-based predictor extends the range of pixels used for prediction from neighboring pixels to global ones, making it more accurate and thus reducing the embedding distortion. In the embedding strategy part, we first propose a complexity measurement that uses pixels in the target blocks. Then, we develop an improved prediction error ordering rule. Finally, we provide, for the first time, an embedding strategy that includes multiple embedding rules. The proposed RDH method effectively reduces the distortion and improves the visual quality of data-hidden images, and experimental results show that it outperforms state-of-the-art methods.
Key words: Reversible data hiding; Transformer; Adaptive embedding strategy
Reversible data hiding (RDH) has been widely used in scenarios requiring high-quality cover signals, e.g., military communication and health care, because secret data can be embedded into the cover signal without loss (Cox et al., 2002). Prediction error expansion (PEE) (Thodi and Rodriguez, 2007) is one of the most widely used techniques to achieve RDH. PEE first predicts the target pixel from its context pixels to generate a prediction error. Then, the prediction error is expanded according to a predefined strategy to embed the data.
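For illustration, a minimal sketch of the basic expansion step assumed in PEE, for a single pixel and one secret bit; overflow handling and the shifting of non-expandable errors that a complete scheme requires are omitted, and all names are illustrative:

```python
def pee_embed(x: int, x_hat: int, b: int) -> int:
    """Embed one secret bit b into pixel x by expanding its prediction error."""
    e = x - x_hat            # prediction error
    e_marked = 2 * e + b     # expanded error now carries the bit in its parity
    return x_hat + e_marked  # marked pixel value


def pee_extract(x_marked: int, x_hat: int) -> tuple[int, int]:
    """Recover the original pixel and the embedded bit from a marked pixel."""
    e_marked = x_marked - x_hat
    b = e_marked & 1         # parity of the expanded error is the hidden bit
    e = e_marked >> 1        # floor division restores the original error
    return x_hat + e, b
```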
So far, one approach to improving PEE has been to design a high-precision prediction method to reduce embedding distortion, since, in PEE, the smaller the prediction error, the better the visual quality after embedding. Most prediction methods focus on improving the predictor, e.g., the difference predictor (Tian, 2003), the median edge direction predictor (Thodi and Rodriguez, 2007), the bilinear interpolation predictor (Sachnev et al., 2009; Luo et al., 2010), the rhombus predictor (Chen et al., 2010), the gradient adaptive predictor (Coltuc, 2011, 2012), and others using multiple predictors (Jafar et al., 2016). The above-mentioned predictors exploit the similarity between the target pixel and its neighboring pixels. However, they all use only one or a few neighboring pixels for prediction, ignoring the correlation between the target pixel and the rest of the image. Targeting this flaw, Weng et al. (2017) designed a prediction pattern in which each to-be-embedded pixel can be predicted by n neighboring pixels surrounding it. The larger the value of n, the more accurate the prediction and the better the embedding performance achieved at low embedding capacity, and vice versa. Hu and Xiang (2022) proposed a global nonlinear predictor based on convolutional neural networks (CNNs). Although this predictor effectively expands the range of pixels used for prediction, it can use only more local pixels rather than the pixels of the whole image, because of the disadvantages of CNNs in processing global information. Therefore, exploring a predictor that can use every pixel of the entire image is necessary. In recent years, transformers have emerged in computer vision. Because a transformer handles long-distance dependencies better than CNNs, transformer technology, whether combined with CNNs or replacing them entirely, has developed rapidly in image generation, image segmentation, image detection, etc. Although transformers have not yet been used in the field of RDH, introducing them into the RDH community makes it possible to further improve prediction performance, given their excellent performance in the image field.
In addition, another approach to improving PEE is to design an effective embedding strategy to reduce the distortion caused by embedding information. Some embedding strategies focus on improving the embedding rules, e.g., histogram shifting (HS) (Thodi and Rodriguez, 2007), pixel value ordering (PVO)-k (Ou et al., 2014), and improved PVO (IPVO) (Weng et al., 2019). However, all existing embedding strategies include only one embedding rule. Since a complete image can be seen as being composed of complex textured blocks and simple smooth blocks, it may be more appropriate to use different embedding rules for different blocks. Therefore, designing an embedding method that can combine multiple embedding rules is necessary. Meanwhile, existing embedding strategies focus on complexity measurements, distinguishing smooth and textured blocks, and determining the embedding areas (Wang X et al., 2015; Hu and Xiang, 2022); the complexity measurements adopted usually use the neighboring pixels or pixel blocks to calculate the complexity of the target pixel (or pixel block). For example, the complexity in He and Cai (2021) was calculated as the sum of the absolute differences between two consecutive pixels in the horizontal or vertical direction. However, the pixels in the target blocks may also affect the complexity, and these pixels should not be ignored.
In this paper, both prediction and embedding techniques are considered; that is, our RDH method consists of a transformer predictor and an adaptive embedding strategy with multiple embedding rules. In the predictor part, we exploit the advantage of transformers in establishing global pixel correlation (Goodfellow et al., 2014; Vaswani et al., 2017; Esser et al., 2021; Zheng et al., 2022) and design a predictor based on the transformer. The transformer predictor extends the range of pixels used for prediction from neighboring pixels to global ones for the first time. An image division method is also proposed for our predictor, which can use twice as many pixels as the rhombus predictor as context. Through our predictor, we obtain many small prediction errors; then, while embedding secret data, the adaptive embedding strategy expands (or secondarily extends) these prediction errors and embeds the secret data. The smaller the prediction error, the smaller the distortion caused by expansion. This guarantees the effectiveness of our predictor.
In the embedding strategy part, we first present a complexity measurement for pixels in target blocks. Then, an effective embedding rule called improved prediction error ordering (IPEO) is proposed by redesigning IPVO (Weng et al., 2019) to sort the prediction errors instead of the pixel values. Next, we offer, for the first time, an embedding strategy that includes multiple embedding rules and selects different rules adaptively according to the complexity of the image. Our adaptive embedding strategy selects the appropriate embedding rule for embedded blocks with different complexities, which improves the embedding capacity and the embedding performance.
Extensive experimental results show that the proposed RDH method provides satisfactory embedding results and exceeds the performance of existing methods.
The rhombus predictor is the first and most classic PEE predictor. Chen et al. (2010) designed the rhombus predictor with the following two steps: feature selection and feature modification based on difference expansion. The rhombus predictor reconstructs the process of embedding features and transforms the original pixel distribution into the prediction error (prediction residual) distribution. Because the residual distribution is similar to the Laplace distribution, the image information is expressed more compactly and effective entropy reduction is achieved. The rhombus predictor divides an image into dot sets and cross sets (Fig.1), and takes the mean of the four neighboring pixels of the target pixel as its prediction value, because these neighboring pixels have a significant impact on the target pixel. Subsequent modifications of the rhombus predictor focus mainly on designing an efficient predictor and on using more pixels as the prediction context.
Fig.1 Image division of a rhombus predictor
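As a reference, a minimal sketch of the rhombus prediction for one target pixel, assuming an integer image array and ignoring border handling; taking the floor of the mean is one common rounding choice, not necessarily the exact convention of Chen et al. (2010):

```python
import numpy as np

def rhombus_predict(img: np.ndarray, i: int, j: int) -> int:
    """Predict pixel (i, j) from the mean of its four rhombus neighbors."""
    s = (int(img[i - 1, j]) + int(img[i + 1, j])
         + int(img[i, j - 1]) + int(img[i, j + 1]))
    return s // 4  # floor of the mean of the four neighbors
```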
Though the rhombus predictor and its improved variants exploit the similarity between the target pixel and its neighboring pixels, they are local and linear, and are not accurate enough for images with complex distributions (Hu and Xiang, 2021). Therefore, it is necessary to explore a global nonlinear predictor that can improve prediction accuracy.
Complexity measurement is a selection strategy that uses the correlation between neighboring pixels, e.g., the local variance (Sachnev et al., 2009), forward variance (Li et al., 2011), error energy estimation (Hong, 2012), local absolute difference (Ou et al., 2013), and full-enclosing context strategy (He et al., 2017). It determines the embedding position by calculating the complexity of pixels (or pixel blocks) and is important for reducing the embedding distortion. He and Cai (2021) calculated the complexity of pixel blocks as the sum of the absolute differences between two consecutive pixels in the horizontal or vertical direction; their method first determines the pixel blocks to be embedded and then embeds the data according to the embedding strategy. Hu and Xiang (2022) directly calculated the complexity of each pixel composing the pixel blocks by adopting an L×L block centered at the to-be-embedded pixel.
In Section 3.4.1, we take the work of He and Cai (2021) as an example to discuss the defect of complexity measurement based on pixel blocks, namely, that the available pixels in the target block are not used in the calculation.
Hu and Xiang (2021) proposed a CNN predictor (CNNP), which introduced deep learning into the RDH field for the first time. The feature extraction step comprises several convolution blocks arranged in parallel to exploit multiple receptive fields. The kernel size of these convolution blocks is K×K, where K is set to 3, 5, or 7. The features extracted from different convolution blocks are added together and fed into two convolution blocks with a kernel size of K=3.
Based on the CNNP, Hu and Xiang (2022) designed a new image division (Fig.2) to enable the CNNP to use more image pixels as context. The cover image is first divided into dot sets and cross sets. Subsequently, the cover image is further divided into 2×2 gray blocks and white blocks.
Fig.2 Hu and Xiang (2022)’s image division method
Although the CNNP effectively expands the range of pixels used for prediction, it can still use only a larger local neighborhood rather than the pixels of the whole image for prediction, because of the disadvantages of CNNs in processing global information (Dosovitskiy et al., 2021).
Li et al. (2013) proposed PVO to sort the pixel values in the blocks divided for embedding. Ou et al. (2014) proposed PVO-k, which adopts more pixels in each divided block. Peng et al. (2014) proposed IPVO by exploiting image redundancy. He and Cai (2021) proposed a dual pairwise PEE strategy to fully exploit the potential of pairwise PEE. In pairwise PEE (Ou et al., 2016), instead of two bits of data, one of the bit combinations 00, 01, and 10 is embedded into a pair of expandable errors. In this way, the embedding distortion can be reduced at the cost of embedding log₂3 bits instead of two bits. Qu and Kim (2015) proposed a novel pixel-based PVO (PPVO). In their method, each pixel is predicted using its sorted context pixels. In this way, almost all the pixels can be predicted, and hence the number of prediction errors is largely increased. Correspondingly, the number of prediction errors capable of carrying data bits increases.
The main idea of PEO (Zhang et al., 2020b) is to exploit the intercorrelations of prediction errors by combining PEE with the recent RDH technique of PVO. Specifically, the prediction errors within an image block are first sorted. Then, the block's maximum and minimum prediction errors are predicted and modified for data embedding. By this approach, image redundancy is better exploited, and promising embedding performance is achieved.
In this section, we begin with the framework of our method. Then, we give the design of the transformer predictor. Finally, we introduce the adaptive embedding strategy. The embedding process is shown in Fig.3.
Fig.3 Framework of the proposed reversible data hiding (RDH) method. The framework consists of a predictor based on a transformer and an embedding strategy including multiple embedding methods. The detailed processes of the predictor and embedding strategy are listed separately (CNN: convolutional neural network)
In this study, we propose a new RDH method (Fig.3). The embedding framework consists of two modules: a transformer predictor and an adaptive embedding strategy. It should be noted that the predictor mentioned in the framework has already been trained.
In the prediction process, we apply our image division method and divide the image into four sets (dot set I1, cross set I2, triangle set I3, and star set I4). At the same time, we divide the secret data into four sets, W1, W2, W3, and W4. Take set I4 as an example. We use I1, I2, and I3 as the context for our transformer predictor to generate the prediction of I4. The specific generation process is as follows: first, mask I4; then, convert the remaining three sets into codes through the encoder and input the sequence of codes into the transformer for content reasoning of the masked region; finally, generate an image of the masked area through the generator. The prediction error of I4 is calculated from I4 and its generated prediction. Our adaptive embedding strategy then uses the prediction error to embed the hidden data. In the embedding process, we first divide the prediction error of I4 into several non-overlapping blocks and calculate each block's complexity. Finally, according to the complexity measurement, we decide the embedding region and embedding order, select different embedding rules to embed the hidden data W4, and obtain the corresponding secret part. The secret parts of I1, I2, and I3 are obtained by the same method. The final secret image IW can be obtained by combining the four secret parts.
The extraction process of secret data is similar to the embedding process. We first use the other three parts of the secret image to generate the prediction of the dot set, which, in cooperation with the corresponding secret part, is used to extract the information W1 and recover the original dot set I1. The same process recovers the other sets. It should be noted that the extraction process must be the reverse of the embedding process. For example, if the embedding starts from I1, the extraction must start from the last embedded set. In the processes of embedding and extraction, the parts used for prediction and complexity calculation are not changed. Therefore, reversibility is guaranteed.
We provide an image division method for our predictor because our predictor can use global pixels for prediction. The more pixels the predictor can use, the better the prediction performance. The rhombus predictor divides an image into dot sets and cross sets, and each pixel can be predicted using four neighboring pixels. Hu and Xiang (2022) adopted a new division method (Fig.2) to improve the rhombus predictor, which can provide nearly three quarters of the image information as the context. First, the image is divided into dot sets and cross sets. Subsequently, the image is further divided into 2×2 gray blocks and white blocks. In terms of local information, the image division method of Hu and Xiang (2022) can use four neighboring pixels as context; in terms of global information, it can use nearly three quarters of the image.
Referring to Wang XY et al. (2021), we propose a method to divide an image into four parts. Fig.4 demonstrates the proposed division method using a 6×6 image. The image is divided into four parts: dot set I1, cross set I2, triangle set I3, and star set I4. In terms of local information, our image division method can use eight neighboring pixels as context, which is twice as many as in Hu and Xiang (2022); in terms of global information, it can use three quarters of the image.
Fig.4 Our image division method
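A minimal sketch of this four-way division, assuming the sets are taken at stride 2 from the offsets (0,0), (0,1), (1,0), and (1,1) described in Section 3.2; the exact correspondence between offsets and set names is illustrative:

```python
import numpy as np

def divide_into_four_sets(img: np.ndarray):
    """Split an image into the four sets used for prediction and embedding."""
    I1 = img[0::2, 0::2]  # dot set:      offset (0, 0), stride 2
    I2 = img[0::2, 1::2]  # cross set:    offset (0, 1), stride 2
    I3 = img[1::2, 0::2]  # triangle set: offset (1, 0), stride 2
    I4 = img[1::2, 1::2]  # star set:     offset (1, 1), stride 2
    return I1, I2, I3, I4
```

With this division, the eight neighbors (horizontal, vertical, and diagonal) of any pixel in one set all belong to the other three sets, which is what allows them to serve as context.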
In image generation, a mask is used to shield the area to be generated, e.g., by zeroing the area or turning it into random noise. Image generation aims to reconstruct the masked part of the original image. Usually, the masked area is a continuous block of pixels, e.g., a dog or an airplane in the image.
However, since we divide the image into four sets, the pixels in each set are discontinuous, and continuing to use the usual mask method would weaken the correlation between pixels. Therefore, we set the mask to a set of pixels with a step size of 2. Fig.5a shows the original image, and Fig.5b shows our mask method. It is not difficult to see that this mask is the same as that in our image division method. Specifically, we take all the pixels with a step size of 2, starting from (0,0), (0,1), (1,0), and (1,1), to obtain four masks corresponding to I1, I2, I3, and I4, respectively, as obtained by the image division. In other words, we can directly reconstruct the target pixels by using the modified mask.
Fig.5 Parts of the test image Lena: (a) unmasked original image; (b) masked image after using our proposed mask method. The mask in (b), used for final training and testing, is the collection of pixels starting from (0,0) with a step size of 2. The white area indicates the masked pixels. In addition, there are three other masks starting from (0,1), (1,0), and (1,1), respectively
3.3.1 Design of transformer predictor
As stated in Section 2.3, a high-performance predictor should be designed to break the limitation of neighboring pixels. Unlike the CNNP, we use an image generation method to predict the target pixels and introduce transformers into RDH.
We need a lightweight image generation algorithm that does not require large datasets to design our predictor, because RDH has only small datasets. We design our predictor by following the vector quantized generative adversarial network (VQGAN) (Esser et al., 2021), which expresses an image in the form of a sequence instead of pixels and learns the image composition according to the indices of the codes. The structure of our predictor is composed of two parts: a convolutional network consisting of an encoder E and a decoder G, and a content reasoning module with transformers. The former is used to build the codebook and generate images, and the latter is used to reason about the contents of the mask. Our predictor is designed to generate smaller prediction errors, which allows data expansion and embedding to be performed more effectively, thus improving the embedding performance.
3.3.2 Learning an effective codebook of images
In this part, we learn a convolutional network consisting of an encoder E and a decoder G to represent an image x ∈ R^{H×W×3} as a spatial collection of codebook entries z_q ∈ R^{h×w×n_z}, where n_z is the dimensionality of the codes, and H and W represent the height and width, respectively (h and w are the corresponding values in the codebook). An equivalent representation is a sequence of h·w indices, which specify the respective entries in the learned codebook (Esser et al., 2021). More precisely, we follow Esser et al. (2021) and approximate a given image x by x̂ = G(z_q). We obtain z_q using the encoding ẑ = E(x) ∈ R^{h×w×n_z} and a subsequent elementwise quantization q(·) of each spatial code ẑ_{ij} onto its closest codebook entry z_k:

z_q = q(ẑ) := argmin_{z_k ∈ Z} ‖ẑ_{ij} − z_k‖.
z_q is readily recovered and decoded to an image x̂ using the following expression:

x̂ = G(z_q) = G(q(E(x))).
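A minimal sketch of the encoding-side quantization described above, assuming the encoder output has shape (h, w, n_z) and the codebook is an array of entries of dimension n_z; names and shapes are illustrative:

```python
import numpy as np

def quantize(z_hat: np.ndarray, codebook: np.ndarray):
    """Map each spatial code of z_hat (h, w, n_z) to its nearest codebook entry.

    Returns the quantized codes z_q and the index map s that the transformer
    uses for content reasoning.
    """
    h, w, n_z = z_hat.shape
    flat = z_hat.reshape(-1, n_z)                                    # (h*w, n_z)
    dist = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # distance to every entry
    s = dist.argmin(axis=1)                                          # nearest-entry indices
    z_q = codebook[s].reshape(h, w, n_z)                             # quantized representation
    return z_q, s.reshape(h, w)
```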
Due to our image division, we need to modify the feature extraction part to adapt to discontinuous mask regions. To avoid problems such as a VQGAN (Esser et al., 2021) being influenced primarily by neighboring pixels in the deep CNN layers, or an image generative pre-trained transformer (iGPT) (Chen et al., 2020) losing important context details due to large-scale down-sampling, we use a stacked (×4) CNN embedding as our encoder. In each block, a 1×1 filter and layer norm are applied for nonlinear projection, followed by a partial convolutional layer (Zheng et al., 2022) that uses a 2×2 filter with stride two to extract visible information. The original partial convolution operation is as follows:

x' = W_p^T (x_p ⊙ m_p) · sum(1)/sum(m_p) + b  if sum(m_p) > 0,  and  x' = 0  otherwise,

where W_p contains the convolution filter weights, b is the corresponding bias, and x_p and m_p are the feature values and mask values, respectively, in the current convolution window.
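A minimal sketch of one window of this partial convolution, following the re-normalization rule above; treating the window as small NumPy arrays is an illustrative simplification of the layer's actual implementation:

```python
import numpy as np

def partial_conv_window(x_p: np.ndarray, m_p: np.ndarray,
                        W_p: np.ndarray, b: float) -> float:
    """One output value of a partial convolution window.

    m_p is 1 for visible pixels and 0 for masked ones; the response is
    re-normalized by the fraction of visible pixels in the window.
    """
    visible = m_p.sum()
    if visible == 0:
        return 0.0                       # a fully masked window contributes nothing
    scale = m_p.size / visible           # sum(1) / sum(m_p)
    return float((W_p * (x_p * m_p)).sum() * scale + b)
```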
We continue to use the loss function in Esser et al. (2021):

L_VQ(E, G, Z) = L_Re + L_sgo + L_commitment = ‖x − x̂‖² + ‖sg[E(x)] − z_q‖₂² + ‖sg[z_q] − E(x)‖₂²,

where L_Re is the reconstruction loss, L_sgo is the stop-gradient operation loss, sg[·] denotes the stop-gradient operation, and L_commitment is the so-called commitment loss (van den Oord et al., 2017).
3.3.3 Reasoning content with transformer
Our transformer encoder is built on the standard qkv self-attention (SA) model (Vaswani et al., 2017). With encoder E and decoder G, we can represent images in terms of a sequence of indices s ∈ {0, 1, ..., |Z|−1}^{h·w} from the codebook, where s is equivalent to z_q = q(E(x)) ∈ R^{h×w×n_z} given by image x, and Z is the codebook. The input of the transformer is processed as follows:

[q, k, v] = z W_qkv,   A = softmax(q kᵀ / √C_h),

where W_qkv is the learned parameter that refines the features into query q, key k, and value v, and A is the dot similarity scaled by the square root of the feature dimension C_h. Then, we compute a weighted sum over all values v via

SA(z) = A v.
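A minimal sketch of this scaled dot-product self-attention for a token sequence z, assuming a single head and one learned projection W_qkv; shapes and names are illustrative:

```python
import numpy as np

def self_attention(z: np.ndarray, W_qkv: np.ndarray) -> np.ndarray:
    """Single-head qkv self-attention over a token sequence z of shape (n, C)."""
    C_h = W_qkv.shape[1] // 3
    q, k, v = np.split(z @ W_qkv, 3, axis=1)          # each of shape (n, C_h)
    A = q @ k.T / np.sqrt(C_h)                        # scaled dot similarity
    A = np.exp(A - A.max(axis=1, keepdims=True))      # numerically stable softmax
    A = A / A.sum(axis=1, keepdims=True)
    return A @ v                                      # weighted sum over all values
```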
With the indices s, the transformer reasons about the content of the masked set, and the decoder G then generates the prediction image of the masked area.
3.4 Adaptive embedding strategy
Our adaptive embedding strategy includes a complexity measurement and multiple embedding rules.The strategy matches appropriate embedding rules for different areas of embedding according to target requirements and reduces embedding distortion.
3.4.1 Complexity measurement
The complexity measurement of He and Cai (2021), shown in Fig.6, is calculated as the sum of the absolute differences between two consecutive pixels in the horizontal or vertical direction.
Fig.6 He and Cai (2021)’s complexity measurement
However, in our method only one quarter of the image is used for embedding each time, while the other three quarters are used for prediction and remain unchanged. Hence, we propose a complexity measurement that also uses these invariant pixels. With the proposed image division method, more context can be provided for the complexity measurement.
Fig.7 shows our proposed complexity measurement. Since the image is divided into four sets, we take the dot set as an example. We use the block complexity measurement method in Section 2.2, and a complexity measurement Com_i is computed for each block. As shown in Fig.7, we first divide the image into the dot set and the non-dot set. To avoid affecting reversibility, we use the non-dot set to calculate the complexity of the dot set. Moreover, we use a method similar to that proposed by He and Cai (2021) to exploit the neighboring pixel blocks. Com_i is calculated as the sum of the absolute differences between two pixels of the same set in the horizontal or vertical direction. For boundary blocks, we simply double the value derived from the nearest full-enclosing pixels. We follow He and Cai (2021) and mark the pixels of the cross set as c_1, c_2, ..., c_n from top to bottom and from left to right. Similarly, the pixels in the triangle set I3 and star set I4 are marked as t_1, t_2, ..., t_n and s_1, s_2, ..., s_n, respectively. The complexity measurement of the neighboring pixel blocks, Com_out, is expressed as follows:

Com_out = Σ_{x∈{c,t,s}} Σ_{i=1}^{n−1} |x_i − x_{i+1}|,
Fig.7 Our complexity measurement
where n is the number of pixels in each set, and x ∈ {c, t, s}. The complexity of the pixels in the target block, Com_in, is derived using the same approach:

Com_in = Σ_{x∈{c,t,s}} Σ_{i=1}^{m−1} |x_i − x_{i+1}|,
where m is the number of pixels in each set, and x ∈ {c, t, s}. The complexity measurement Com_i of each block is then obtained by combining Com_out and Com_in.
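A minimal sketch of this complexity measurement, assuming each term is the sum of absolute differences between consecutive same-set pixels and that Com_out and Com_in are combined by simple addition; the exact weighting used in the paper may differ:

```python
import numpy as np

def directional_complexity(pixels: np.ndarray) -> int:
    """Sum of absolute differences between consecutive pixels of one set."""
    return int(np.abs(np.diff(pixels.astype(int))).sum())

def block_complexity(outer_sets, inner_sets) -> int:
    """Combine the neighboring-block term Com_out and the target-block term Com_in.

    outer_sets / inner_sets: iterables of 1-D arrays of the c, t, and s pixels
    around and inside the block; equal weighting of the two terms is an assumption.
    """
    com_out = sum(directional_complexity(x) for x in outer_sets)
    com_in = sum(directional_complexity(x) for x in inner_sets)
    return com_out + com_in
```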
3.4.2 Embedding strategy of multiple embedding rules
The proposed embedding strategy is the first in the field to use multiple embedding rules, namely IPEO and HS. IPEO is a new embedding rule based on IPVO that sorts the prediction errors rather than the pixel values. We first divide the image into several non-overlapping blocks. For each block, the prediction errors are sorted in ascending order to obtain e_σ(1), e_σ(2), ..., e_σ(n), where σ(·) denotes the position determined by sorting the prediction errors by size. IPEO calculates second-order prediction errors from the sorted prediction errors and embeds the data; the embedding process is similar to that of IPVO. The maximum error pair is calculated as follows:

d_max = e_u − e_v,  (16)

where u = min(σ(n−1), σ(n)) and v = max(σ(n−1), σ(n)).
According to Eq.(16), the expandable second-order errors can be determined. As a result, our two-dimensional (2D) mapping is shown in Fig.8, which describes the transformation of the second-order error pairs, where the pair (1,1) is embedded with log₂3 bits. The other three types of second-order error pairs are embedded with 1, 1, and 0 bits, respectively.
Fig.8 Four types of second-order prediction error pairs of improved prediction error ordering (IPEO). The embedded capacities of the blue, brown, red, and green types are log₂3, 1, 1, and 0 bits, respectively. References to color refer to the online version of this figure
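A minimal sketch of sorting a block's prediction errors and forming the maximum error of Eq.(16); the IPVO-style index convention used here is an assumption about the exact IPEO definition:

```python
def max_second_order_error(errors):
    """Sort a block's prediction errors and form the IPVO-style maximum pair.

    errors: prediction errors of one block (at least two values).  Taking u and v
    from the two largest sorted positions follows IPVO and is an assumption.
    """
    order = sorted(range(len(errors)), key=lambda i: errors[i])  # sigma(1), ..., sigma(n)
    u = min(order[-2], order[-1])
    v = max(order[-2], order[-1])
    return errors[u] - errors[v]   # d_max as in Eq. (16)
```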
However, if the prediction error is small enough, using IPEO will waste embedding capacity. HS is an excellent alternative to IPEO. HS directly embeds a secret bit b ∈ {0, 1} into the prediction error when the error lies within a manually set threshold T, so the embedding capacity of each pixel is 1 or 0 bits.
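A minimal sketch of a threshold-based HS rule on a prediction error, in which small errors carry one bit and large errors are shifted; this particular formulation is an assumption, not necessarily the paper's exact rule:

```python
def hs_embed(e: int, b: int, T: int) -> int:
    """Threshold-based histogram shifting on a prediction error e.

    Errors inside the threshold carry one bit; the rest are shifted and carry
    none, which keeps the mapping invertible at the extractor.
    """
    if -T <= e < T:
        return 2 * e + b                   # expandable error: 1 embedded bit
    return e + T if e >= T else e - T      # shifted error: 0 embedded bits
```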
An effective embedding strategy should ensure invisibility and increase the embedding capacity as much as possible. In addition, it should adapt to images with different complexities. The proposed IPEO can significantly improve imperceptibility, but it leads to a loss of embedding capacity when dealing with images of low complexity. HS can embed a large amount of data when the prediction error is small, but it produces many invalid shifts when the image is more complex, resulting in a small embedding capacity and poor imperceptibility.
Therefore, we propose an embedding strategy consisting of IPEO and HS, and use the complexity to select the embedding rule. We define ranges for smooth, textured, and non-embeddable blocks with our complexity measurement. HS embeds data into smooth blocks, while IPEO is used for textured blocks. The pixels in the non-embeddable blocks are left unchanged. This strategy realizes the adaptive matching of embedding rules to different embedding areas.
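A minimal sketch of the adaptive rule selection, using the thresholds T1 = 2 and T2 = 10 reported in Section 4.3; the exact assignment of the threshold ranges to smooth, textured, and non-embeddable blocks is an assumption:

```python
def select_embedding_rule(complexity: int, T1: int = 2, T2: int = 10) -> str:
    """Choose an embedding rule for a block from its complexity measurement."""
    if complexity < T1:
        return "HS"    # smooth block: histogram shifting
    if complexity < T2:
        return "IPEO"  # textured block: improved prediction error ordering
    return "skip"      # non-embeddable block: pixels left unchanged
```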
In practical applications, some auxiliary information is needed to extract the hidden information and recover the original image at the receiver end. The pixels in the cover image with values 255 and 0 are changed to 254 and 1, respectively, to avoid overflow and underflow errors. A location map (Howard and Vitter, 2016) is used to record whether a pixel valued 1 (254) should be changed back to 0 (255). The auxiliary information, as in He and Cai (2021), includes the block size (4 bits), the thresholds (T1, T2) (24 bits), the end position (16 bits), and the length of the compressed location map (16 bits).
4.1.1 Training parameters
As described in Section 3.2, we modified the usual mask method to use a set of pixels with a step size of 2 (as shown in Fig.5). If the mask step were set to 2 at the very beginning of training, discontinuous pixels would be used directly as context and training might fall into a local minimum. Therefore, the initial step size of the mask was set to N (N > 2), and N gradually decreases during training until it finally reaches 2. Specifically, N was initially set to 8. Compared with directly setting the step to 2, the network can use consecutive pixels to generate target pixels when N = 8. Then, we set N = 4 as a transition and finally fixed the step size at N = 2. This method of decreasing the step size avoids the local minimum problem and improves the training speed. When N = 2, the mask method is the same as the image division method described in Section 3.2. In addition, in the coding stage, images of size H×W are encoded into discrete codes of size (H/f)×(W/f), where f denotes the down-sampling factor. We followed the parameters in Esser et al. (2021) and set f = 16.
4.1.2 Datasets used for training
We trained our models on various datasets, including CelebA-HQ (Liu et al., 2015; Karras et al., 2018), FFHQ (Karras et al., 2019), Places2 (Zhou et al., 2018), and ImageNet (Russakovsky et al., 2015), and saved the parameters of the models trained on the different datasets. In addition, we used six standard test images for testing. Specifically, the six standard test images were 512×512 grayscale images: Lena, Barbara, Boat, Elaine, Lake, and Peppers. We used the models trained on the four different datasets to test these images and recorded the mean squared error (MSE) of the models with different parameters on the test set. As shown in Table 1, the model trained on ImageNet performs best.
Table 1 Mean squared error (MSE) of the absolute prediction errors in the test sets under four datasets
Zhang et al. (2020b) noted that the prediction error determines the embedding performance, so the prediction performance of the PEO method can directly reflect the quality of the predictor. We did not compare the predictor with other predictors separately, but evaluated the whole RDH method in Section 4.3.
We modified the complexity measurement of He and Cai (2021) to adapt it to our predictor, so we compared our work with He and Cai (2021) only in terms of the complexity part. We kept the predictor and the embedded information unchanged and evaluated the two methods by changing only the embedding capacity synchronously. The peak signal-to-noise ratio (PSNR) was the evaluation criterion. Fig.9 compares the complexity measurement of our work with that of He and Cai (2021). Our method divides the image into 4×4 pixel blocks. Specifically, four pixels in each pixel block can be used to embed information, while the remaining 12 pixels can be used for the complexity calculation. In addition, we used half of the neighboring pixel blocks to calculate the complexity, i.e., 24 pixels. Our PSNR was higher than that of He and Cai (2021) when the embedding capacity was small.
Fig.9 Comparison of complexity measurements between our work and He and Cai (2021) (PSNR: peak signal-to-noise ratio)
In this subsection, we conducted quantitative and qualitative experiments to evaluate the embedding performance of the proposed RDH method by comparing it with several state-of-the-art works, including CNN PEO (Hu and Xiang, 2022), location-based PVO (LPVO) (Zhang et al., 2020a), dual pairwise PEE (He and Cai, 2021), pairwise IPVO (Dragoi et al., 2018), and pairwise PEE (Ou et al., 2013). For the proposed RDH method, the block size for embedding data was N = 4×4, meaning that four pixels of each block were embedded at a time. The complexity thresholds T1 and T2 were set to 2 and 10, respectively, and T in HS was set to 5. For the other state-of-the-art works, the implementation details can be found in the corresponding papers.
The embedding performance of the proposed RDH method and the latest research results were evaluated by comparing the PSNR of the data-hidden images. Under the same embedding capacity, the higher the PSNR, the better the performance of the RDH method. Table 2 shows the PSNR values of our RDH method and several state-of-the-art works when the embedded capacity is 10 000 or 20 000 bits. For image Lena, when the embedding capacity is 10 000 bits, the PSNR of the proposed RDH method is as high as 62.25 dB, and when the embedding capacity is 20 000 bits, the PSNR is as high as 58.58 dB. To further compare the performance versus the embedding capacity, we conducted several experiments using different images. Fig.10 shows the changing trend of the PSNR values for the six standard test images. Comparing the results in Fig.10 and Table 2, we find that in all cases the PSNR of the proposed RDH method exceeds those of the most advanced works. This shows that the proposed RDH method can obtain satisfactory results.
Table 2 Peak signal-to-noise ratio (PSNR) values of six classic images of the test set generated by the proposed reversible data hiding (RDH) method, CNN PEO, LPVO, dual pairwise PEE, pairwise IPVO, and pairwise PEE with embedding capacity of 10 000 or 20 000 bits
Fig.10 Comparison of performances between the proposed RDH method and CNN PEO, LPVO, dual pairwise PEE, pairwise IPVO, and pairwise PEE on Lena (a), Barbara (b), Boat (c), Elaine (d), Lake (e), and Peppers (f). CNN: convolutional neural network; PEO: prediction error ordering; LPVO: location-based pixel value ordering; PEE: prediction error expansion; IPVO: improved pixel value ordering; RDH: reversible data hiding; PSNR: peak signal-to-noise ratio. References to color refer to the online version of this figure
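For reference, a minimal sketch of the PSNR computation used as the evaluation metric in Table 2 and Fig.10, assuming 8-bit grayscale images:

```python
import numpy as np

def psnr(cover: np.ndarray, marked: np.ndarray) -> float:
    """Peak signal-to-noise ratio (dB) between the cover and marked images."""
    mse = np.mean((cover.astype(np.float64) - marked.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)
```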
In this paper, we proposed a new RDH method that includes a transformer predictor and an adaptive embedding strategy. In the predictor part, we first introduced a new image division method and showed that it can use more image information as the context for prediction. In addition, we confirmed that the new transformer predictor improves the prediction performance compared with typical predictors because of its ability to handle long-distance information. The transformer predictor successfully extends the range of pixels used for prediction from neighboring pixels to global ones.
In the embedding part, we first proposed a complexity measurement that uses pixels in the target blocks to sort the pixel blocks. Thereafter, we developed the IPEO rule. Finally, we provided an embedding strategy that includes multiple embedding rules, so that more error pairs are generated for embedding data. Experimental results show that the embedding performance of the proposed RDH method is satisfactory and better than those of the current state-of-the-art works. For the test images, the average PSNR was 61.50 dB after hiding 10 000 bits, which exceeds those of the recently reported CNN PEO (Hu and Xiang, 2022), LPVO (Zhang et al., 2020a), dual pairwise PEE (He and Cai, 2021), pairwise IPVO (Dragoi et al., 2018), and pairwise PEE (Ou et al., 2013) methods by 0.73, 1.22, 1.39, 1.92, and 3.19 dB, respectively.
The proposed RDH method is the first to use global pixels and an adaptive strategy with multiple embedding rules. In future work, we believe there is room to improve the prediction performance by using better deep learning methods as predictors and by designing more effective embedding strategies.
Contributors
Zhigao LU designed the research, processed the data, and drafted the paper. Weike YOU helped organize the paper. Xiaofei FANG drew the figures and checked the paper. Zhigao LU, Weike YOU, and Linna ZHOU revised and finalized the paper.
Compliance with ethics guidelines
Linna ZHOU, Zhigao LU, Weike YOU, and Xiaofei FANG declare that they have no conflict of interest.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.