Xiaorui Zhang, Xun Sun, Xingming Sun, Wei Sun and Sunil Kumar Jha
1 Engineering Research Center of Digital Forensics, Ministry of Education, Jiangsu Engineering Center of Network Monitoring, School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China
2 Wuxi Research Institute, Nanjing University of Information Science & Technology, Wuxi, 214100, China
3 Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science & Technology, Nanjing, 210044, China
4 IT Fundamentals and Education Technologies Applications, University of Information Technology and Management in Rzeszow, Rzeszow Voivodeship, 100031, Poland
Abstract: The leakage of medical audio data in telemedicine seriously violates patient privacy. To prevent such leakage, a two-stage reversible robust audio watermarking algorithm is proposed to protect medical audio data. The scheme decomposes the medical audio into two independent embedding domains and embeds the robust watermark and the reversible watermark into the two domains, respectively. To preserve audio quality, the Hurst exponent is used to find suitable positions for watermark embedding. Because the two embedding domains are independent, embedding the second-stage reversible watermark does not affect the first-stage watermark, so the robustness of the first-stage watermark is well maintained. In the second stage, the correlation between sampling points in the medical audio is used to modify the hidden bits of the histogram, which reduces both the modification of the medical audio and the distortion caused by reversible embedding. Simulation experiments show that the scheme is strongly robust against signal processing operations such as MP3 compression at 48 kbps, additive white Gaussian noise (AWGN) at 20 dB, low-pass filtering, resampling, and re-quantization, and that it has good imperceptibility.
Keywords: Telemedicine; privacy protection; audio watermarking; robust reversible watermarking; two-stage embedding
With the rapid development of Internet communication technology, telemedicine has received more and more attention in the medical field, and the protection of medical data has become the primary problem to be solved in telemedicine. Traditional robust watermarking [1–5] and fragile/semi-fragile watermarking [6,7] cause permanent distortion of the medical cover audio, which remains after the watermark is extracted and prevents the doctor from diagnosing effectively. As a kind of digital watermark, a reversible watermark can effectively protect the integrity, authenticity and privacy of medical data; it uses the redundancy of the cover audio to embed watermark information, such as patient privacy data and hospital information, into the cover audio. When receiving medical data, the receiving end can recover the data content non-destructively while extracting the watermark information, thereby ensuring the integrity and authenticity of the data, which helps doctors judge the patient's condition more accurately. Reversible image watermarking can be divided into the following categories: based on compression technology [8], based on difference expansion (DE) [9–11], based on histogram shifting (HS) [12,13], based on prediction-error expansion (PEE) [14–16], and based on integer transformation [17,18].
At present, audio, as an important part of medical data, is widely used in telemedicine. A study by MIT [19] showed that a new artificial intelligence (AI) model can detect asymptomatic patients infected with the novel coronavirus using only medical audio, such as coughs recorded on a phone. Reversible audio watermarking technology, as a protection scheme for such medical audio, can protect the privacy of patients and the integrity of data without affecting the quality of the medical data. According to the embedding domain of the watermark, reversible audio watermarks can be divided into time domain watermarks [20,21], transform domain watermarks [22] and compressed domain watermarks [23].
However, a reversible audio watermark is generally fragile because it lacks sufficient robustness: when it is disturbed by noise or attacked by signal processing operations, the watermark cannot be accurately extracted. In practice, telemedicine data is inevitably subject to malicious or unintentional attacks during transmission. Therefore, robust reversible watermarks have more application scenarios in the field of telemedicine and privacy protection. When the watermarked audio file is not attacked during telemedicine transmission, the watermark can be extracted accurately and the original medical cover audio can be restored non-destructively, which improves the accuracy of the doctor's diagnosis.
There are two mainstream robust reversible watermarking frameworks: one based on two-stage embedding (TSW) [24–27] and one based on histogram modification [28,29]. Coltuc et al. [24] proposed an image authentication framework based on a two-stage distortion-free watermark. In the first stage, a robust watermark is embedded in the DCT coefficients of the image to obtain an intermediate image, and then the difference between the original image and the intermediate image is reversibly embedded in the intermediate image. This method is robust to JPEG compression and has a high embedding capacity, but the reversible watermark embedding in the second stage weakens the robustness of the watermark embedded in the first stage. Wang et al. [27] proposed an independent two-stage watermark embedding, in which the Haar wavelet transform is used to decompose the original image into two independent embedding domains. In this method, the watermark is embedded into the low-frequency embedding domain, and the difference between the original image and the intermediate image is reversibly embedded in the high-frequency embedding domain so that the original cover can be restored. Because the two stages are independent, the reversible embedding in the second stage does not affect the robust embedding in the first stage, thereby improving the robustness of the watermark.
At present, most research on robust reversible watermarking targets images. Because images differ from audio in structure, many image-based robust reversible watermarking schemes cannot be effectively applied to audio. Within the robust reversible watermarking framework based on histogram shifting, Xiang et al. [30] proposed a reversible robust audio watermarking scheme based on high-order difference statistics. The original audio is divided into several non-overlapping sub-audios, and a high-order difference statistics model is used to construct a histogram. The histogram is regarded as a robust feature, and the watermark is embedded in the audio file by shifting the histogram. This scheme has strong robustness to MP3 compression and AWGN.
In this paper, we propose a robust reversible audio watermarking scheme for telemedicine and privacy protection. The scheme first divides the medical cover audio into two independent embedding domains by the frequency domain transform function $F$. The robust watermark is embedded into the low-frequency embedding domain $A_l$ to obtain the watermarked low-frequency embedding domain $A_{wl}$, and then the difference between $A_l$ and $A_{wl}$ is reversibly embedded in the high-frequency embedding domain $A_h$ to obtain the watermarked high-frequency embedding domain $A_{wh}$. Finally, the inverse transform $F^{-1}$ of the frequency domain transform function is used to synthesize $A_{wl}$ and $A_{wh}$ into the watermarked audio. In the watermark extraction process, even if the watermarked audio has been attacked by signal processing operations, the watermark can still be extracted correctly to protect the medical audio. The watermarked audio obtained by this scheme has good audio quality and strong robustness to noise and signal processing operations.
This section introduces the preparatory work of robust reversible medical audio watermarking in four parts. First, the method of decomposing the medical cover audio is introduced. Second, we introduce how the Hurst exponent is used to determine appropriate watermark embedding positions, which improves the imperceptibility of the watermark. Then, the method of constructing a histogram to hide the watermark is introduced. Finally, we introduce a method to prevent overflow after embedding watermarks.
First, the original medical cover audio $A$ is represented as a sequence of sampling points, denoted as $\{a_1, a_2, \ldots\}$. To decompose the medical cover audio into the low-frequency embedding domain $A_l$ and the high-frequency embedding domain $A_h$, the transformation function $F$ processes each sampling point pair $(a_1, a_2)$, as given by Eq. (1).
Here, $a_l$ represents the low-frequency embedding domain signal, and $a_h$ represents the high-frequency embedding domain signal. This decomposition transform decomposes the original medical audio data $A$ into two half-length sub-audio embedding domains: the low-frequency embedding domain $A_l$ and the high-frequency embedding domain $A_h$. The medical cover audio can be reconstructed by the inverse transformation $F^{-1}$ by using Eq. (2). The decomposition effect is shown in Fig. 1.
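The exact form of Eqs. (1) and (2) is not reproduced in this text. As an illustration only, the sketch below assumes that $F$ is the integer Haar transform (floor of the pair average and pair difference), which is one common reversible way to split a signal into two half-length domains. The function names `decompose` and `reconstruct` are hypothetical.

```python
def decompose(samples):
    """Split integer samples into low- and high-frequency half-length domains.

    Assumption: F is the integer Haar transform, which may differ from Eq. (1).
    """
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    low, high = [], []
    for a1, a2 in zip(samples[0::2], samples[1::2]):
        low.append((a1 + a2) // 2)  # floor of the pair average
        high.append(a1 - a2)        # pair difference
    return low, high


def reconstruct(low, high):
    """Invert decompose() exactly, pair by pair."""
    samples = []
    for al, ah in zip(low, high):
        a1 = al + (ah + 1) // 2
        a2 = a1 - ah
        samples.extend((a1, a2))
    return samples
```

Because both directions use only integer arithmetic, `reconstruct(*decompose(x))` returns `x` exactly, which is the property the two-stage scheme relies on.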
Figure 1: (a) Original medical audio; (b) Decomposed low-frequency embedding domain; (c) Decomposed high-frequency embedding domain
In telemedicine, medical audio is a continuous analog signal. Therefore, the values of adjacent sampling points in the medical audio are generally correlated. The Hurst exponent $H$ is an important indicator for judging whether time series data follow a random walk or a biased random walk. It was proposed by the British hydrologist H. E. Hurst in 1951 and can be calculated through rescaled range (R/S) analysis. In this paper, the R/S method is used to calculate the Hurst exponent in order to determine appropriate watermark embedding positions. The specific method is as follows.
The low-frequency embedding domain is decomposed into $N$ non-overlapping sub-audios, each containing $2n$ sampling points. For each sub-audio, the statistic $t$ is calculated according to R/S analysis. Thus, the data pairs $(\log r_i, \log t_i)$, $i = 1, 2, \ldots, m$, are obtained. Taking $\log r_i$ as the independent variable and $\log t_i$ as the dependent variable in a linear regression, the obtained slope is the Hurst exponent.
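The regression described above can be sketched as follows. This is a minimal version of classical R/S analysis, assuming that $r_i$ is a window length and $t_i$ is the average rescaled range over windows of that length. The doubling window sizes and the name `hurst_rs` are illustrative choices, not taken from the paper.

```python
import math


def rescaled_range(chunk):
    """Return R/S of one window, or None if the window is constant."""
    mean = sum(chunk) / len(chunk)
    deviations = [x - mean for x in chunk]
    running, cumulative = 0.0, []
    for d in deviations:
        running += d
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)                    # range R
    std = math.sqrt(sum(d * d for d in deviations) / len(chunk))  # deviation S
    return spread / std if std > 0 else None


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(R/S) against log(window length)."""
    xs, ys = [], []
    size = min_window
    while size <= len(series) // 2:
        values = []
        for start in range(0, len(series) - size + 1, size):
            rs = rescaled_range(series[start:start + size])
            if rs is not None:
                values.append(rs)
        if values:
            xs.append(math.log(size))
            ys.append(math.log(sum(values) / len(values)))
        size *= 2  # assumed schedule: window length doubles each step
    if len(xs) < 2:
        raise ValueError("series too short for a regression")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den  # least-squares slope
```

A steadily rising series gives an estimate close to 1, while a strictly alternating series gives 0, which matches the two extremes of the interpretation of $H$.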
When the value of the Hurst exponent $H$ falls into the range $(0, 0.5)$, the time series has long-term correlation, and the general trend in the future is opposite to the past trend, which is called anti-persistence: if the series goes up in one period, it tends to go down in the next, and vice versa. The closer $H$ is to zero, the stronger the negative correlation. When $H$ is equal to 0.5, the time series has no correlation and behaves as a pure random walk. When $H$ is in the range $(0.5, 1)$, the general trend of the time series in the future is the same as that in the past, which is called persistence. The closer $H$ is to 1, the stronger the positive correlation. When $H$ is 1, the general trend in the future can be predicted from the present, and the time series is a straight line.
Fig. 2 shows the low-frequency embedding domain $A_l$ of the medical cover audio divided into several non-overlapping sub-audios of the same length. Different sub-audios show different trends. The Hurst exponents of these sub-audios are calculated and displayed in Fig. 2, where two thresholds $H_1, H_2 \in (0, 1)$ of the Hurst exponent are used to determine the embedding positions of the watermark so as to minimize the embedding distortion.
Figure 2:Hurst exponent of different sub-audios in the low frequency embedding domain
In the second stage of the scheme, the differences between sampling points are calculated, and the sum of these differences is called the difference statistic. Constructing difference statistics makes better use of the correlation between adjacent sampling points. In this paper, the difference statistics model is used to calculate the difference statistics, and then the histogram of the difference statistics is constructed to realize the reversible embedding in the second stage.
The decomposed high-frequency embedding domain $A_h$ is divided into several non-overlapping sub-audios of equal length. The sampling points in each sub-audio are divided into $M$ groups, and each group has two sampling points. Let $a_h^p(k, q)$ represent the $q$-th sampling point of the $k$-th sampling point group in the $p$-th sub-audio, $q \in \{1, 2\}$, and let $d_h^p(k)$ represent the difference of the $k$-th sampling point group of the $p$-th sub-audio, $1 \le k \le M$, as given by Eq. (3).
The sum of the differences of all sampling point groups in the $p$-th sub-audio is called the difference statistic of the sub-audio, denoted as $S(p)$ and given by Eq. (4).
By changing the difference of the $k$-th sampling point group of the $p$-th sub-audio, the difference statistic of the $p$-th sub-audio is changed, so that the watermark is reversibly embedded in it, as shown in Eq. (5).
In Eq. (5), $B$ represents the shifting quantity of $S(p)$, $\lfloor x \rfloor$ represents the largest integer not exceeding $x$, and $d_h^p(k)'$ represents the changed value of $d_h^p(k)$. Because the shifting process is reversible, $d_h^p(k)$ can be restored. By using Eqs. (4) and (5), the shifted difference statistic $S(p)'$ of the $p$-th sub-audio can be obtained.
By shifting the differences $d_h^p(k)$, the histogram of the difference statistics can be shifted, thereby reversibly embedding the watermark sequence into the sub-audios of the high-frequency embedding domain $A_h$. Let $\alpha(k)$ be the shifting quantity of $d_h^p(k)$.
Then $d_h^p(k)$ is shifted by performing an integer transformation on $a_h^p(k, q)$. The sampling point $a_h^p(k, q)$ is modified by Eq. (8).
where $a_h^p(k, q)'$ represents the shifted sampling point. Because the integer transformation is reversible, the original sampling points can be restored after the watermark is extracted.
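Since Eqs. (3) to (8) are not reproduced in this text, the following sketch only illustrates the idea of shifting a difference statistic reversibly. It assumes that each pair difference is the first sample minus the second, and that a total shift $B$ is spread as evenly as possible over the $M$ groups. All function names are hypothetical, and the exact split used by the paper may differ.

```python
def pair_differences(sub_audio):
    """d(k): first sample minus second sample of each consecutive pair (assumed order)."""
    return [sub_audio[i] - sub_audio[i + 1] for i in range(0, len(sub_audio) - 1, 2)]


def difference_statistic(sub_audio):
    """S(p): sum of the pair differences of one sub-audio."""
    return sum(pair_differences(sub_audio))


def group_shifts(total_shift, groups):
    """Split an integer shift B over M groups so that the parts sum to B."""
    base, extra = divmod(total_shift, groups)
    return [base + 1 if k < extra else base for k in range(groups)]


def shift_statistic(sub_audio, total_shift):
    """Return a copy whose difference statistic is increased by total_shift."""
    out = list(sub_audio)
    for k, alpha in enumerate(group_shifts(total_shift, len(out) // 2)):
        out[2 * k] += (alpha + 1) // 2  # part added to the first sample
        out[2 * k + 1] -= alpha // 2    # part subtracted from the second sample
    return out


def unshift_statistic(shifted, total_shift):
    """Undo shift_statistic() exactly, given the same total_shift."""
    out = list(shifted)
    for k, alpha in enumerate(group_shifts(total_shift, len(out) // 2)):
        out[2 * k] -= (alpha + 1) // 2
        out[2 * k + 1] += alpha // 2
    return out
```

Splitting each group shift between the two samples keeps the change to any single sample near half of the group shift, which is in the spirit of reducing the modification of the audio.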
For medical audio data with a sampling depth of 16 bits, the range of a sampling point is $[-2^{15}, 2^{15} - 1]$, and the transformed $a_h^p(k, q)'$ may fall outside this range. Therefore, pre-processing of the audio is needed to prevent overflow before the watermark is embedded. From Eq. (8), we can get the following.
Define
First, the sample points that will carry the watermark are traversed, and the sample values that are not in the range $(-2^{15} + \sigma, 2^{15} - \sigma)$ are marked. These marked sample points are then adjusted into $(-2^{15} + \sigma, 2^{15} - \sigma)$, and the marked sequence is recorded. In actual experiments, because the range of sampling point values is large and $\sigma$ is small, there are few overflow sampling points, so this step has little impact on audio quality.
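The pre-processing step can be sketched as below. The paper only states that the marked sequence is recorded. This sketch additionally assumes that the amount of each adjustment is stored alongside the position, so that the original values can be restored. The names `prevent_overflow` and `restore_overflow` are hypothetical, and `sigma` stands for the margin $\sigma$.

```python
def prevent_overflow(samples, sigma, bits=16):
    """Move samples into the open interval (-2^(bits-1)+sigma, 2^(bits-1)-sigma).

    Returns the adjusted samples and a list of (index, removed_amount) marks.
    Assumption: storing the removed amount is what makes the step invertible.
    """
    lo = -(1 << (bits - 1)) + sigma
    hi = (1 << (bits - 1)) - sigma
    adjusted, marks = [], []
    for i, a in enumerate(samples):
        if a <= lo:
            clipped = lo + 1
        elif a >= hi:
            clipped = hi - 1
        else:
            adjusted.append(a)
            continue
        marks.append((i, a - clipped))
        adjusted.append(clipped)
    return adjusted, marks


def restore_overflow(adjusted, marks):
    """Put the recorded amounts back to recover the original samples."""
    out = list(adjusted)
    for i, delta in marks:
        out[i] += delta
    return out
```

For typical audio only a handful of samples sit this close to full scale, so the mark list stays short.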
This section introduces the embedding and extraction processes of the watermark in detail, covering both the robust watermark and the reversible watermark. The robust watermark and the reversible watermark are embedded in two different embedding domains of the audio, respectively. The medical audio watermark framework is shown in Fig. 3.
Figure 3: Medical audio watermark framework diagram
The low-frequency embedding domain $A_l$ is divided into several non-overlapping sub-audios of equal length, and the robust watermark is embedded in the sub-audios whose Hurst exponent satisfies $H \in (H_1, H_2)$. Every sub-audio contains $2n$ sampling points, denoted as $a_l(1), a_l(2), \ldots, a_l(2n)$. Define a random mapping relationship $\{1, 2, \ldots, 2n\} \to \{\varphi(1), \varphi(2), \ldots, \varphi(2n)\}$, where $\varphi(\cdot)$ represents the position sequence of the sampling points after mapping. Define $a_l(\varphi(i))$ to represent the value of the $\varphi(i)$-th sampling point in the sub-audio of the low-frequency embedding domain, $1 \le i \le 2n$. Every sub-audio is embedded with a 1-bit watermark $w \in \{0, 1\}$, as follows.
where $a_{wl}(\varphi(i))$ represents the $\varphi(i)$-th sampling point value in the sub-audio of the low-frequency embedding domain after the watermark is embedded, and $\mu$ is the watermark embedding strength.
To minimize the embedding distortion, the minimum masking threshold $t$ is determined by the psychoacoustic model [31]. $d_l$ represents the difference between the sample point sets of the front and back parts of the original sub-audio, as given by Eq. (13).
After the watermark is embedded, the sampling point difference $d_{wl}$ between the front and back parts of the sub-audio is modified as in Eq. (14).
The change of the sampling points may cause overflow. For this reason, it is necessary to mark the sequence of the overflowed sampling points and restore their original sampling point values. In practice, there are few overflow sampling points, so they do not affect the extraction of the watermark.
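The embedding equations of this stage are not reproduced in this text, so the following is only a patchwork-style illustration of the idea, not the paper's rule. It assumes that a keyed shuffle plays the role of the mapping $\varphi$, that a bit is embedded by moving the two halves of the sub-audio in opposite directions by $\mu$, and that the sign of the half difference decides the extracted bit. The masking threshold $t$ and the overflow marks are left out, and all names are hypothetical.

```python
import random


def _halves(length, key):
    """Keyed pseudo-random split of positions into two equal halves (stand-in for phi)."""
    idx = list(range(length))
    random.Random(key).shuffle(idx)
    half = length // 2
    return idx[:half], idx[half:2 * half]


def half_difference(sub_audio, key):
    """Sum of the first half minus sum of the second half after the keyed mapping."""
    first, second = _halves(len(sub_audio), key)
    return sum(sub_audio[i] for i in first) - sum(sub_audio[i] for i in second)


def embed_bit(sub_audio, bit, mu, key):
    """Push the half difference up for bit 1 and down for bit 0 (assumed rule)."""
    first, second = _halves(len(sub_audio), key)
    sign = 1 if bit else -1
    out = list(sub_audio)
    for i in first:
        out[i] += sign * mu
    for i in second:
        out[i] -= sign * mu
    return out


def extract_bit(watermarked, key):
    """Read the bit from the sign of the half difference."""
    return 1 if half_difference(watermarked, key) > 0 else 0


def restore(watermarked, bit, mu, key):
    """Remove the embedding when bit and mu are known, e.g. from the compensation data."""
    return embed_bit(watermarked, bit, -mu, key)
```

With $2n$ samples per sub-audio, the half difference moves by $2n\mu$, so extraction in this sketch is only reliable when the original half difference is smaller than that in magnitude. A larger $\mu$ therefore buys robustness at the cost of distortion.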
In the reversible embedding stage, the watermarked error $E$ generated by the robust watermark is reversibly embedded into the high-frequency embedding domain as compensation information. Because of the characteristics of reversible watermarking, the high-frequency embedding domain can be restored without loss when the compensation information is extracted. The watermarked error $E$ consists of four parts: the watermark embedding strength $\mu$, the Hurst exponent thresholds $H_1$ and $H_2$, the overflow sampling point mark sequence $L$, and the minimum masking thresholds $T = \{t(1), t(2), \ldots, t(N)\}$, where $N$ represents the number of sub-audios in which the watermark is embedded.
The reversible watermark is embedded in the high-frequency embedding domain $A_h$ of the medical audio data $A$, and $A_h$ is divided into several non-overlapping sub-audios of equal length. The sampling points in each sub-audio are divided into $M$ groups, and each group has two sampling points. Then, the difference statistic of each sub-audio is calculated according to Eqs. (3) and (4). Finally, a secret key $T > |S|_{\max}$ is selected, and the shifting quantity $B$ is calculated by Eq. (15).
Here, the set of difference statistics is $S = \{S(p) \mid 1 \le p \le N\}$, where $N$ is the number of sub-audios in the high-frequency embedding domain. Then let
where $k \in [1, M]$. According to Eqs. (6) and (8), the sampling point $a_h^p(k, q)$ in the $p$-th sub-audio is modified, and the reversible watermark is embedded into the high-frequency embedding domain. Finally, the watermarked high-frequency embedding domain $A_{wh}$ is obtained.
With robust reversible audio watermarking, the watermark can be extracted correctly and the medical audio data can be restored non-destructively if the watermarked medical audio is not attacked during telemedicine transmission. If the watermarked medical audio is attacked, the watermark can still be extracted.
In the process of watermark extraction, the frequency domain transform function $F$ is used to decompose the watermarked medical audio into two independent embedding domains. If the medical audio has not been attacked, the watermarked high-frequency embedding domain $A_{wh}$ is divided into several non-overlapping sub-audios of equal length, and the sampling points in each sub-audio are divided into $M$ groups, with two sampling points in each group. Next, the difference statistic $S(p)'$ is calculated according to Eqs. (3) and (4), and the watermark of the $p$-th sub-audio is extracted with the key $T$ by Eq. (17).
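At the level of the difference statistics, keyed embedding and extraction can be sketched as follows. Eqs. (15) and (17) are not reproduced in this text, so the choice $B = 2T$ and the decision rule "a shifted statistic of at least $T$ means bit 1" are assumptions made for illustration. The names `embed_bits` and `extract_bits` are hypothetical.

```python
def embed_bits(stats, bits, key_t):
    """Shift the statistic of each sub-audio by B when its bit is 1.

    Assumption: B = 2 * T, which separates shifted from unshifted statistics
    whenever every |S(p)| is below the key T.
    """
    if any(abs(s) >= key_t for s in stats):
        raise ValueError("key T must exceed every |S(p)|")
    shift_b = 2 * key_t
    return [s + shift_b if w else s for s, w in zip(stats, bits)]


def extract_bits(shifted, key_t):
    """Recover the bits and the original statistics from the shifted ones."""
    shift_b = 2 * key_t
    bits, stats = [], []
    for s in shifted:
        w = 1 if s >= key_t else 0
        bits.append(w)
        stats.append(s - shift_b * w)
    return bits, stats
```

Because unshifted statistics stay inside $(-T, T)$ and shifted ones land in $(T, 3T)$, the two groups never overlap in this sketch, so both the bits and the original statistics come back exactly when there is no attack.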
In the process of extracting the robust watermark, the watermarked low-frequency embedding domain $A_{wl}$ is divided into several non-overlapping sub-audios of equal length. Each sub-audio contains $2n$ sampling points. For the sub-audios whose Hurst exponent satisfies $H \in (H_1, H_2)$, the difference $d_{wl}$ between the sample point sets of the front and back parts of the sub-audio is calculated, and the watermark is extracted by Eq. (18).
The original low-frequency embedding domain is restored to:
If the watermarked medical audio data is attacked, the original medical audio data cannot be recovered, but according to Eq.(18), robust watermark can still be extracted.
This section evaluates the performance of the proposed scheme. The audio data used is taken from the LJ Speech Dataset. The selected speech signals are evenly distributed across all age groups and genders. Audio data with a sampling depth of 16 bits is used as test data in the following experiments. The signal-to-noise ratio (SNR) is used to evaluate the imperceptibility of watermarked medical audio. The larger the SNR, the better the imperceptibility. The SNR is calculated as follows.
where $L$ represents the length of the medical audio data $A$, $A(t)$ represents the value of the $t$-th sampling point in $A$, and $A_w(t)$ represents the value of the $t$-th sampling point in the watermarked medical audio $A_w$. The accuracy of the watermark and the robustness of the algorithm are evaluated by calculating the bit error rate (BER) of the extracted watermark. The lower the BER, the higher the accuracy of the watermark and the stronger the robustness of the algorithm. The BER is calculated as follows.
In Eq. (21), $W_e$ represents the wrongly extracted watermark bits, and $W_c$ represents the original watermark. The experimental results are shown in Fig. 4. Fig. 4a shows the original audio, and Fig. 4b shows the audio after the watermark is embedded, for which the SNR is 38.50 dB. When the audio is not attacked, the watermark is extracted and the original medical audio is restored, as shown in Fig. 4c, where the SNR is $+\infty$. The BER of the extracted watermark is 0, which shows that the watermark can be extracted accurately when there is no attack.
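The two metrics can be computed as in the sketch below. It assumes the standard definitions, namely the SNR as ten times the base-10 logarithm of signal energy over error energy, and the BER as the fraction of watermark bits that differ. The function names are illustrative.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB between the original and the watermarked audio."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same length")
    signal = sum(a * a for a in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, watermarked))
    if noise == 0:
        return math.inf  # perfectly restored audio
    return 10 * math.log10(signal / noise)


def bit_error_rate(original_bits, extracted_bits):
    """Fraction of watermark bits that were extracted incorrectly."""
    if len(original_bits) != len(extracted_bits):
        raise ValueError("bit sequences must have the same length")
    errors = sum(1 for a, b in zip(original_bits, extracted_bits) if a != b)
    return errors / len(original_bits)
```

An SNR of $+\infty$ corresponds to the `noise == 0` branch, that is, to audio that has been restored sample for sample.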
Figure 4: (a) Original audio data LJ001-0001.wav; (b) Watermarked audio (SNR = 38.50 dB); (c) Recovered audio (SNR = +∞)
To test the imperceptibility of the proposed scheme, we selected 5 audio files (LJ001-0001, LJ001-0002, LJ001-0003, LJ001-0004, LJ001-0005) from the LJ Speech Dataset for experiments. From each file, 204,800 sampling points are taken, and the watermark is embedded into this audio segment. The distortion produced by watermark embedding affects the imperceptibility of the audio and is related to the embedding strength $\mu$, the embedding capacity $C$, and the Hurst exponent threshold $H$. This section mainly studies the influence of $\mu$, $C$, and $H$ on audio imperceptibility.
Fig. 5 shows the relationship between the watermark embedding strength $\mu$ and the SNR. In this experiment, the thresholds $H_1$ and $H_2$ are set to 0.3 and 0.4, respectively, and the watermark embedding capacity $C$ is set to 100 bits. It can be seen from Fig. 5 that the imperceptibility of watermarked medical audio data decreases as the watermark embedding strength $\mu$ increases. This is because, according to Eq. (12), a larger embedding strength causes greater distortion, and thus the SNR value decreases.
Figure 5:Relationship between μ and SNR value
Fig. 6 shows the relationship between the watermark embedding capacity $C$ and the SNR. In this experiment, the thresholds $H_1$ and $H_2$ are set to 0.3 and 0.4, respectively, and the watermark embedding strength $\mu$ is set to 1. It can be seen from Fig. 6 that the imperceptibility of watermarked audio decreases as the watermark embedding capacity $C$ increases. The reason is that the larger the embedding capacity, the more sampling points need to be modified and the greater the distortion, which reduces imperceptibility.
To verify the impact of different Hurst exponent thresholds on audio quality, we fixed the watermark embedding strength and the watermark embedding capacity, i.e., $\mu = 1$ and $C = 100$, and chose different thresholds of $H$ to test the watermarked audio data. As can be seen from Tab. 1, when the threshold $H \in (0.3, 0.4)$, the signal-to-noise ratio is the highest and the imperceptibility is the best.
Experiments on the watermarked medical cover audios of LJ001-0001, LJ001-0002, LJ001-0003, LJ001-0004 and LJ001-0005 under MP3 compression, resampling, re-quantization, and AWGN are carried out to verify the robustness of the watermark. We use MATLAB to process the watermarked audio with AWGN, resampling (44.1-22.05-44.1 kHz) and re-quantization (16-8-16 bits). It can be seen from Fig. 7 that all watermarked medical audio has a certain degree of robustness to compression. When the MP3 bit rate is 48 kbps, the watermark information can still be obtained, indicating that the scheme can effectively resist MP3 compression.
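For readers who want to reproduce the re-quantization and resampling attacks outside MATLAB, a rough pure-Python approximation is sketched below. It assumes that re-quantization discards the low-order bits and that resampling keeps every second sample and fills the gaps by linear interpolation. MATLAB's own resampling applies an anti-aliasing filter, so results will differ in detail. The function names are hypothetical.

```python
def requantize(samples, from_bits=16, to_bits=8):
    """Reduce integer samples to to_bits and expand back to from_bits."""
    drop = from_bits - to_bits
    return [(a >> drop) << drop for a in samples]  # low-order bits are lost


def resample_half_and_back(samples):
    """Halve the sampling rate, then restore it by linear interpolation.

    Assumption: no anti-aliasing filter is applied, unlike typical toolboxes.
    """
    kept = samples[0::2]
    out = []
    for i, a in enumerate(kept):
        nxt = kept[i + 1] if i + 1 < len(kept) else a
        out.append(a)
        out.append((a + nxt) // 2)  # interpolated sample between neighbours
    return out[:len(samples)]
```

Both functions return a signal of the original length, so the attacked audio can be fed directly into the extraction step.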
Figure 6:Relationship between C and SNR value
Table 1:Relationship between threshold H and SNR value
Figure 7:Relationship between MP3 and BER
It can be seen from Fig. 8 that all watermarked audio has strong robustness to AWGN. The watermark can be extracted when the SNR of the added noise is 40 dB, and it can still be extracted when the SNR of the noise drops to 20 dB, which means that the proposed scheme can effectively resist AWGN.
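An AWGN attack at a chosen SNR can be approximated as in the following sketch. It assumes that the noise standard deviation is derived from the average signal power and the target SNR in dB. The name `add_awgn` and the `seed` parameter are illustrative, and the output is left as floating-point values rather than being rounded back to 16-bit integers.

```python
import math
import random


def add_awgn(samples, snr_db, seed=0):
    """Add white Gaussian noise so that the expected SNR equals snr_db."""
    rng = random.Random(seed)  # fixed seed for repeatable experiments
    power = sum(a * a for a in samples) / len(samples)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [a + rng.gauss(0.0, sigma) for a in samples]
```

The measured SNR of a finite signal will differ slightly from the target, because the noise power is only matched in expectation.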
Figure 8:Relationship between AWGN and BER
To further verify the performance of this scheme, Tabs. 2 and 3 show the influence of resampling and re-quantization, respectively, on watermark extraction under different embedding capacities. From Tab. 2, it can be seen that under different embedding capacities, resampling has little effect on watermark extraction, and the BER is not higher than 0.0002, indicating that the scheme can effectively resist resampling attacks. Tab. 3 also shows that under different embedding capacities, re-quantization has little effect on watermark extraction, which indicates that this scheme can effectively resist re-quantization attacks.
Table 2:Relationship between C and BER with resampling
Table 3:Relationship between C and BER with re-quantization
In this paper, we propose a new robust reversible medical audio watermarking scheme. The robust watermark and the reversible watermark are embedded into two independent embedding domains, respectively, so the reversible watermark embedding does not affect the robust watermark, which improves the robustness of the watermark. In addition, in the reversible watermarking stage, the correlation between sampling points in the medical audio is used to modify the hidden bits of the histogram, which reduces both the modification of the medical audio and the distortion caused by the reversible watermark. When the medical audio is not attacked, the watermark information can be correctly extracted and the medical audio data can be restored without distortion, ensuring the integrity and authenticity of the medical audio. When the medical audio is attacked, the watermark information can still be extracted, so the copyright of the medical audio remains protected. Simulation experiments show that this scheme has good imperceptibility and strong robustness to MP3 compression, AWGN, low-pass filtering, resampling, and re-quantization.
Acknowledgement: We thank NUIST for giving us the opportunity to carry out this research work.
Funding Statement: This work was supported, in part, by the Natural Science Foundation of Jiangsu Province under Grant Numbers BK20201136, BK20191401; in part, by the National Natural Science Foundation of China under Grant Numbers 61502240, 61502096, 61304205, 61773219; in part, by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
Computers, Materials & Continua, 2022, Issue 5