Multimedia Tools and Applications (2022) 81:33375–33395
https://doi.org/10.1007/s11042-022-13145-y
Robust HDR video watermarking method based
on the HVS model and T-QR
Meng Du 1,2 · Ting Luo 1,2 · Haiyong Xu 2 · Yang Song 1 · Chunpeng Wang 3 · Li Li 4
Received: 16 February 2021 / Revised: 16 June 2021 / Accepted: 10 April 2022 /
Published online: 18 April 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
Abstract
To protect the copyright of high dynamic range (HDR) video, a robust HDR video watermarking method based on the human visual system (HVS) model and Tensor-QR decomposition (T-QR) is proposed. To obtain the main information of the HDR video, key frames are extracted by detecting scene changes, and each key frame is treated as a third-order tensor to preserve its main characteristics. Each key frame is divided into non-overlapping blocks, and each block is decomposed by T-QR to obtain the orthogonal tensor, which consists of three matrices and represents the main energies of the frame. Since the second matrix captures more of the frame's correlations than the other two matrices, it is used to embed the watermark for robustness. Moreover, to trade off watermarking robustness against visual quality, the HVS model of each key frame is computed from luminance perception, image contrast and contrast masking to determine the watermark embedding strength. Experimental results show that the proposed method can resist a variety of tone mapping and video attacks, and is more robust than existing watermarking methods.
Keywords HDR video · HVS model · T-QR · Robust watermarking
1 Introduction
Different from the traditional low dynamic range (LDR) video, the high dynamic range (HDR) video can describe the luminance and color of the real world. However, the HDR video cannot
* Ting Luo
luoting@nbu.edu.cn
1 College of Science and Technology, Ningbo University, Ningbo 315212, China
2 Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
3 School of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
4 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
be directly displayed on currently available display and print devices because of its wide luminance range [10]. The HDR video is therefore usually converted to an LDR video through tone mapping operators (TMOs) and then displayed on currently available display and print devices without losing too much detail [9, 21, 22]. Therefore, how to protect the copyright of the HDR video after TMO has become an urgent problem.
Digital steganography is an information security mechanism used to hide secret data during a communication session by embedding a secret message into a multimedia carrier, where only the sender and the recipient know of the secret data's existence [1]. In order to prevent the secret information from being discovered by an adversary, digital steganography pursues high invisibility and high embedding capacity. However, digital steganography lacks robustness and cannot protect the copyright of the HDR video. Watermarking technology provides an efficient solution to copyright protection [24, 32, 39].
Watermarking can be classified into spatial-domain and transform-domain methods. Spatial-domain watermarking directly modifies pixels to embed the watermark, such as least significant bit (LSB) replacement [41]. Since this type of watermarking is sensitive to any modification, it is widely used in image content authentication. However, pixels of the HDR video are floating-point values, and it is difficult to modify them directly because different HDR videos have different luminance ranges. Recently, some HDR watermarking methods convert the HDR image to another format, such as the RGBE format [8, 37], the LogLuv format [25] and the OpenEXR format [26], which may lose details of the videos. Cheng et al. converted the floating-point format of the HDR image to the RGBE format, and the watermark was embedded by using LSB, without the capability of resisting image attacks.
Compared with the spatial-domain watermarking method, the transform-domain watermarking method is more robust and can resist a variety of malicious and non-malicious attacks [14, 18, 27]. Many transforms have been used for watermarking, such as the Discrete Cosine Transform (DCT) [20], Discrete Wavelet Transform (DWT) [6], Singular Value Decomposition (SVD) [11] and QR decomposition [7]. Nouioua et al. used shot boundary detection to select fast-motion frames, and then utilized SVD and Multiresolution-SVD (MR-SVD) to embed the watermark [33]. Sang et al. randomly selected video frames from the original video, and the watermark was embedded into the approximation sub-band of each frame transformed by DCT and DWT [34]. However, the randomly selected frames cannot represent the main information of the video, and they may be destroyed after encoding; thus, this kind of video watermarking cannot resist the encoding attack. In order to obtain the main information of the video, Ponni et al. utilized the Fibonacci sequence to select key frames, and then embedded the watermark into the key frames by using SVD and DWT [2]. Since the dynamic luminance range of the HDR video differs from that of the LDR video, the key frame extraction techniques of the LDR video cannot be directly applied to the HDR video; otherwise, the key frames of the HDR video may be obtained ineffectively.
Besides resisting video attacks, an HDR video watermarking method should resist special HDR image processing, such as TMO. Bakhsh et al. utilized an artificial bee colony to select the best blocks, and the watermark was embedded into the approximation sub-band of DWT in each selected block [5]. Wu et al. transformed the HDR images into LDR images by using a special TMO to embed the watermark into its DCT domain, but that method was only robust to the pre-determined TMO [36]. Maiorana et al. applied the logarithm, DWT and Radon-DCT to the luminance component of the HDR images, and then employed quantization index modulation (QIM) to embed the watermark; however, the average bit error rate (BER) reached 20%, so the watermarking robustness was not high [28]. Yu et al. used the modified specular-free image to compute the luminance mask of the HDR image, which was used to guide watermark embedding into the low-luminance areas of the HDR image, but the method did not consider the human visual perception characteristics of the HDR image [38].
The above watermarking methods are designed for HDR images, and HDR video watermarking is still in its infancy. Unlike an HDR image watermarking method, an HDR video watermarking method must resist not only image attacks and TMOs but also video attacks. In order to design a robust watermarking method for the HDR video, a multidimensional transformation is required to maintain the strong correlations among the three channels. Since Tensor-QR Decomposition (T-QR) [13] can combine the three channels into a third-order tensor, their main characteristics can be preserved for robustness.
Watermarking robustness can also be enhanced by increasing the embedding strength, but this leads to visual distortion of the watermarked image. Thus, watermarking imperceptibility and robustness are contradictory, and in order to obtain a trade-off between them, watermarking methods based on human visual perception have been studied [4, 12, 15, 23, 35, 40]. Hu et al. used the contrast sensitivity function (CSF) to obtain the perceptual weight of the embedding strength, guiding robust watermark embedding with low image distortion [15]. Lai et al. utilized visual entropy and edge entropy to select the optimal embedding regions, which decreases visual distortion [23]. Zhang et al. combined visual saliency and the contourlet transform to determine the quantization step size, which was modified to embed the watermark, enhancing both imperceptibility and robustness [40]. However, the above visual perception models were designed for LDR content, and they cannot capture the human visual system (HVS) characteristics of the wide luminance range in HDR content. Thus, a special visual perception model should be designed for embedding watermarks in HDR content.
In order to balance watermarking imperceptibility and robustness, Guerrini et al. utilized a luminance perception mask, an activity perception mask and edge perception to compute the perceptual mask of the HDR image in the DWT domain, but the method was not robust to many attacks, since the average BER reached 29% [12]. In order to optimize robustness and imperceptibility, Bai et al. designed a hierarchical embedding intensity and a hybrid perceptual mask to embed the watermark [4]. Vassilios et al. utilized the wavelet transform of the just noticeable difference (JND)-scaled space of the HDR image as the embedding domain, and the CSF was employed to modulate the watermark embedding strength; however, the embedding capacity was low [35]. Moreover, the above HDR watermarking methods only consider the visual characteristics of the HDR image and ignore the temporal correlation of the HDR video.
In this paper, a robust HDR video watermarking method using the HVS model and T-QR is proposed. Key frames are extracted by using scene change detection. Each key frame is divided into non-overlapping blocks, and T-QR is applied to each block to obtain the orthogonal tensor, which consists of the first, second and third matrices. Compared with the other two matrices, the second matrix is more robust, and thus the second matrix is chosen to embed the watermark. Moreover, in order to obtain the trade-off between watermarking imperceptibility and robustness, the HVS model of the HDR video is computed by using the luminance perception, image contrast and contrast masking, which determines the embedding strength. The main contributions of the paper are listed as follows.
1. In order to obtain the main information of the HDR video for embedding watermark, the
key frames are extracted by using the scene change detection.
2. Different from traditional transformations that operate on a single channel, T-QR treats each key frame of the HDR video as a whole, so that the strong correlations within each key frame are preserved for robust watermark embedding.
3. The HVS model of the HDR video is computed by using the luminance perception, image
contrast and contrast masking to balance the imperceptibility and robustness of the HDR
video watermarking.
This paper is organized as follows. Section 2 introduces the background knowledge of the related technology. In Section 3, we describe the processes of watermark embedding and extraction. Section 4 discusses and analyses the experimental results. Finally, Section 5 concludes the paper.
2 Background
In this section, the related background technologies are described, together with the notation used in the rest of the paper. Variables are shown in italics, such as a; matrices are shown in bold letters, such as A; and higher-order tensors are shown in calligraphic letters, such as A.
2.1 Tensor
With the development of the Internet, multimedia data is becoming increasingly multi-dimensional. A tensor is a form of multi-dimensional data that can store a large amount of information. A p-order tensor can be written as:

A = (a_{i_1 i_2 … i_p}) ∈ ℝ^{l_1 × l_2 × … × l_p},  (1)

where l_1, l_2, …, l_p ∈ ℤ indicate the number of elements in each dimension. Therefore, a vector can be considered a first-order tensor, and a matrix can be considered a second-order tensor.
Higher-order tensors can be represented by a set of matrices. For example, a third-order tensor can be divided into horizontal, lateral and frontal slices [31], represented as A_{k::}, A_{:k:} and A_{::k}, respectively, where k indexes the slices along the corresponding mode.
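In NumPy terms, the three slice families of a third-order tensor are simply fixed-index views (a small illustration):

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)  # a 2 x 3 x 4 third-order tensor
horizontal = A[0, :, :]             # horizontal slice A_{k::}
lateral = A[:, 0, :]                # lateral slice A_{:k:}
frontal = A[:, :, 0]                # frontal slice A_{::k}
```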
2.2 QR to T-QR
QR is an important decomposition in linear algebra and is applicable to any matrix: it factors a matrix into an orthogonal matrix and an upper triangular matrix.
Suppose A ∈ ℝ^{m × n} is an image matrix. After QR decomposition, A can be written as:

A = QR,  (2)

where Q ∈ ℝ^{m × n} is an orthogonal matrix and R ∈ ℝ^{n × n} is an upper triangular matrix.
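Eq. (2) can be checked directly with NumPy's QR routine:

```python
import numpy as np

# QR of a small square block: Q is orthogonal, R is upper triangular.
A = np.array([[4.0, 1.0],
              [3.0, 2.0]])
Q, R = np.linalg.qr(A)
```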
For a higher-order QR decomposition, T-QR can be used. Let B ∈ ℝ^{l_1 × l_2 × l_3} be a third-order tensor; then B can be decomposed as:

B P = Q R,  (3)

where Q ∈ ℝ^{l_1 × l_1 × l_3} is an orthogonal tensor, R ∈ ℝ^{l_1 × l_2 × l_3} is an upper triangular tensor, and P ∈ ℝ^{l_2 × l_2 × l_3} is a permutation tensor whose values are 0 or 1 and which satisfies P^T P = P P^T = I, where I is the identity tensor. Each frame of the HDR video can be considered a third-order tensor of size n × m × 3, where l_1 = n, l_2 = m and l_3 = 3. Therefore, after operating T-QR, the orthogonal tensor and the upper triangular tensor are obtained according to Eq. (3). The frontal slices Q_{::k} of the orthogonal tensor form three matrices, named the first, second and third matrices for k = 1, 2 and 3, respectively; let Q_1, Q_2 and Q_3 denote them. Regarding watermarking robustness, Q_2 is chosen to embed the watermark, and the supporting discussion is given in Section 4.1.
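The paper adopts the T-QR of [13]; as an illustration only, a tensor QR in the style of the Kilmer-Martin t-product can be sketched as a slice-wise matrix QR in the Fourier domain along the third mode. This construction omits the permutation tensor P and may differ from the authors' exact variant:

```python
import numpy as np

def t_product(X, Y):
    """t-product of two third-order tensors: slice-wise matrix product
    in the Fourier domain along the third mode."""
    Xf = np.fft.fft(X, axis=2)
    Yf = np.fft.fft(Y, axis=2)
    Zf = np.einsum('ijk,jlk->ilk', Xf, Yf)
    return np.real(np.fft.ifft(Zf, axis=2))

def t_qr(B):
    """Tensor QR sketch: FFT along the third mode, matrix QR of each
    frontal slice in the Fourier domain, then inverse FFT. Satisfies
    t_product(Q, R) == B; R is upper triangular slice-wise in the
    Fourier domain."""
    l1, l2, l3 = B.shape
    Bf = np.fft.fft(B, axis=2)
    Qf = np.zeros((l1, l1, l3), dtype=complex)
    Rf = np.zeros((l1, l2, l3), dtype=complex)
    for k in range(l3):
        Qf[:, :, k], Rf[:, :, k] = np.linalg.qr(Bf[:, :, k], mode='complete')
    return np.real(np.fft.ifft(Qf, axis=2)), np.real(np.fft.ifft(Rf, axis=2))
```

For a frame treated as an n × m × 3 tensor, this yields Q of size n × n × 3 and R of size n × m × 3, matching the dimensions stated above.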
3 Proposed HDR video watermarking method
In order to protect the copyright of the HDR video, a robust HDR video watermarking method based on the HVS model and T-QR is proposed. In this section, key frames of the HDR video are first extracted by using scene change detection. Next, the HVS model is computed based on the luminance perception, image contrast and contrast masking. Then, the process of watermark embedding based on the HVS model and T-QR is illustrated. Finally, the process of watermark extraction is introduced.
3.1 Key frame extraction
In order to obtain the main information of the HDR video, key frames are extracted by using scene change detection. Scene change detection identifies motion scenes in the HDR video by using the histogram difference, and the histogram difference between the HDR video frames is compared with a predefined threshold to determine the key frames:

H = Σ_{e=1}^{l} [h(f_e) − h(f_{e+1})]^2 / max(h(f_e), h(f_{e+1})),  (4)

where l is the number of HDR video frames, h(·) represents the histogram of an HDR video frame, f_e represents the e-th HDR video frame, and max(·) returns the maximum value. If H > θ, where θ is a threshold, the corresponding frame is a key frame.
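A minimal sketch of the histogram-difference test of Eq. (4), applied to each pair of consecutive frames; the bin count, the fixed [0, 1) luminance range and the per-pair decision are assumptions of this sketch:

```python
import numpy as np

def key_frames(frames, theta, bins=64):
    """Return indices of frames that start a new scene, judged by the
    normalized squared histogram difference of consecutive frames."""
    keys = []
    for e in range(len(frames) - 1):
        h1, _ = np.histogram(frames[e], bins=bins, range=(0.0, 1.0))
        h2, _ = np.histogram(frames[e + 1], bins=bins, range=(0.0, 1.0))
        # squared difference normalized by the larger bin count
        diff = (h1 - h2) ** 2 / np.maximum(np.maximum(h1, h2), 1)
        if diff.sum() > theta:
            keys.append(e + 1)  # frame e+1 starts a new scene
    return keys
```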
3.2 HVS model
The proposed HVS model is calculated by combining the luminance perception, image contrast and contrast masking, and is used to determine the embedding strength. The computation of the luminance perception, image contrast and contrast masking is described as follows.
3.2.1 Luminance perception
In order to obtain the luminance perception, a psychophysical study, namely the eye sensitivity model (ESM) [19], is adopted. In the ESM experiment, the observer adapts to a background illumination for a sufficient amount of time, and the illumination is then increased to a level at which the change is just noticeable to the observer. This experiment shows that the luminance range of HDR content covers the entire range that the HVS can see, and that the response of the human eye over this range is neither always linear nor logarithmic. According to the ESM, a non-linear model can be obtained that describes luminance perception under different luminance ranges and converts the input background luminance into luminance in JND units:

log10(L_a) = −3.18,                               if log10(L) < −3.94
             (0.405 log10(L) + 1.6)^2.18 − 3.18,  if −3.94 ≤ log10(L) < −1.44
             log10(L) − 1.345,                    if −1.44 ≤ log10(L) < −0.0184    (5)
             (0.249 log10(L) + 0.65)^2.7 − 1.67,  if −0.0184 ≤ log10(L) < 1.9
             log10(L) − 2.205,                    if log10(L) ≥ 1.9

where L is the background luminance.
3.2.2 Image contrast
Image contrast can be defined in different ways, but it is usually related to variations in image luminance; global contrast measures image sharpness and is a fine-scale image feature. A framework for perceptual contrast processing of HDR images, namely perceptual contrast [29], is adopted: the luminance values of the image are converted into physical contrast values and then into response values of the HVS. The perceptual contrast is computed as follows.
Step.1 The HDR image is transformed from luminance domain to physical contrast
domain. The logarithmic ratio G is used as a measure of contrast between two pixels.
Low-pass contrast is defined as a difference between a pixel and its neighbors at a
particular level t of the Gaussian pyramid.
G^t_{i,j} = log10(L^t_i / L^t_j),  (6)

where L^t_i and L^t_j are the luminance values of neighboring pixels i and j, respectively.
Step.2 Physical contrast domain is converted into a response of the HVS.
T = 54.09288 · G^{0.41850}.  (7)
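A small sketch of Eqs. (6)-(7); the function name is hypothetical, and taking the absolute value of the log-ratio is an assumption here so that the fractional power stays real:

```python
import numpy as np

def contrast_response(L_i, L_j):
    """Log-ratio physical contrast between two neighboring luminances
    (Eq. 6), mapped to an HVS response by the transducer of Eq. (7)."""
    G = np.abs(np.log10(L_i / L_j))   # assumption: magnitude of contrast
    return 54.09288 * G ** 0.41850
```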
3.2.3 Contrast masking
The contrast sensitivity function (CSF) describes the sensitivity of the eye as a function of spatial frequency and adaptation luminance, and can be computed as:
CSF(ρ) = p_4 s_A(l) MTF(ρ) [(1 + (p_1 ρ)^{p_2}) · √(1 − e^{−(ρ/7)^2})]^{−p_3},  (8)
where ρ is the spatial frequency in cycles per degree, p1, p2, p3 and p4 are the fitted parameters,
and sA is the joint photoreceptor luminance sensitivity [30].
The signal-dependent noise N_nCSF can be found from the CSF and is computed as:

N_nCSF = 1 / nCSF[f, o] = MTF(ρ, E_a) s_A(E_a) / CSF(ρ, E_a),  (9)
where Ea is the adapting luminance, and MTF is the modulation transfer function.
BM[f, o] is the activity in the spatial frequency band f and orientation o, which can be
computed as:
B_M[f, o] = |B_R[f, o]| · nCSF[f, o],  (10)

where B_R[f, o] is the f-th spatial frequency band at the o-th orientation of the steerable pyramid.
Contrast masking denotes the visibility reduction of one visual feature in the presence of another. Three components are used to model the contrast masking: the first component is responsible for self-masking, the second for masking across orientations, and the third for masking of adjacent frequency bands.
N_mask[f, o] = k_self (n_f B_M[f, o])^q + k_xo (n_f Σ_{i∈O∖{o}} B_M[f, i])^q + k_xn (n_{f+1} B_M[f+1, o] + n_{f−1} B_M[f−1, o])^q,  (11)

where k_self, k_xo and k_xn are the weights, and n_f = 2^{−(f−1)}.
The HVS model J of the HDR video is computed by summing the luminance perception, image contrast and contrast masking:

J = L_a + T + N_mask.  (12)
3.3 Watermark embedding
In this section, the process of watermark embedding is presented as illustrated in Fig. 1.
Step.1 Key frames are extracted by using scene change detection, and each key frame can be regarded as a third-order tensor A of size M × N × 3. A is divided into non-overlapping blocks of size 4 × 4 × 3, and each block is denoted as K_s, where s is the index of the block.
Fig. 1 Watermark embedding
Step.2 Perform T-QR on each block.
K_s = Q_s R_s (P_s)^{−1}.  (13)
Step.3 The HVS model J is computed by Eq. (12) with the size of M × N, and J is divided into non-overlapping blocks with the size of 4 × 4, denoted as F_s.
Step.4 The watermark W is embedded as:

Q_2^{ws}(2, 1) = avg + ∂/2, Q_2^{ws}(3, 1) = avg − ∂/2, if W(i, j) = 1,  (14)
Q_2^{ws}(2, 1) = avg − ∂/2, Q_2^{ws}(3, 1) = avg + ∂/2, if W(i, j) = 0,  (15)
∂ = ∂max if z(i, j) ≥ v, ∂ = ∂min if z(i, j) < v,  (16)

where z = sum(F_s)/16, v = sum(z)/(M/4 × N/4), avg = (Q_2^s(2, 1) + Q_2^s(3, 1))/2, Q_2^s is the second matrix of Q_s, z(i, j) is the value of z at location (i, j), W(i, j) is the watermark bit, ∂min and ∂max are the minimum and maximum embedding strengths, respectively, and sum(·) returns the sum of the matrix. The modified orthogonal tensor Q_s^w is obtained.
Step.5 Perform inverse T-QR on each block:

K_s^w = Q_s^w R_s (P_s)^{−1}.  (17)

Step.6 Repeat steps 1 to 5 until all key frames of the HDR video are embedded.
For a grey-level video, a group of frames can be considered a third-order tensor, which preserves the temporal correlation of the video. In order to protect the copyright of a grey-level video, the group of frames is divided into non-overlapping blocks, and each block is decomposed by T-QR to extract Q_2^s for embedding the watermark.
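The per-block embedding rule of Eqs. (14)-(16) can be sketched as follows; the helper names are hypothetical, and NumPy's 0-based indexing maps the paper's entries (2,1) and (3,1) to [1,0] and [2,0]:

```python
import numpy as np

def pick_delta(z_ij, v, d_min, d_max):
    """Eq. (16): use the larger strength where the HVS model allows it."""
    return d_max if z_ij >= v else d_min

def embed_bit(Q2, w, delta):
    """Eqs. (14)-(15): re-order the two target entries of the second
    matrix Q2 around their mean, with a gap of delta, to encode bit w."""
    Q2 = Q2.copy()
    avg = (Q2[1, 0] + Q2[2, 0]) / 2
    if w == 1:
        Q2[1, 0], Q2[2, 0] = avg + delta / 2, avg - delta / 2
    else:
        Q2[1, 0], Q2[2, 0] = avg - delta / 2, avg + delta / 2
    return Q2
```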
3.4 Watermark extraction
Watermark extraction is the reverse process of watermark embedding as illustrated in Fig. 2.
Step.1 Each watermarked key frame is regarded as a third-order tensor A*, which is divided into non-overlapping blocks of size 4 × 4 × 3, and each block is denoted as K_s*.
Step.2 Perform T-QR on each block:

K_s* = Q_s* R_s* (P_s*)^{−1},  (18)

where Q_2^{*s} is the second matrix of Q^{*s}.
Step.3 The watermark W* is extracted as:

W*(i, j) = 1 if Q_2^{*s}(2, 1) ≥ Q_2^{*s}(3, 1);  W*(i, j) = 0 if Q_2^{*s}(2, 1) < Q_2^{*s}(3, 1).  (19)
Step.4 Repeat steps 1 to 3 until the watermark is extracted from all key frames.
Fig. 2 Watermark extraction
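The extraction rule of Eq. (19) mirrors the embedding rule; a minimal sketch (same 0-based index convention as above):

```python
import numpy as np

def extract_bit(Q2_star):
    """Eq. (19): the bit is read from the ordering of the two entries
    modified at embedding time ((2,1) and (3,1) in 1-based notation)."""
    return 1 if Q2_star[1, 0] >= Q2_star[2, 0] else 0
```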
4 Experimental results and discussion
In order to demonstrate the effectiveness of the proposed HDR video watermarking method, the Tibul, Sunrise, Playground and ChristmasTree HDR videos are used to evaluate its performance, as illustrated in Fig. 3. Sixteen types of Tone Mapping (TM) attacks are selected from the HDR Toolbox, as shown in Table 1.
The HDR-VDP-2 metric [30] is used to evaluate the quality of the watermarked HDR video, in which V is the imperceptibility index, ranging from 0 to 100. HDR-VDP75% and HDR-VDP95% are the probabilities of detection in at least 75% and 95% of the images, respectively. A high V denotes high visual quality, whereas high HDR-VDP75% and HDR-VDP95% denote low visual quality of the watermarked HDR video. BER is used to evaluate the correctness of watermark extraction, which expresses robustness:
BER = N_w / N_t,  (20)

where N_w and N_t are the number of erroneous watermark bits and the total number of watermark bits, respectively.
4.1 Discussion of Q1, Q2 and Q3
In order to select the most suitable matrix for embedding, the watermark is embedded into Q_1, Q_2 and Q_3 in the same way as in Section 3.3, and the three variants are named Proposed-Q1, Proposed-Q2 and Proposed-Q3, respectively. The watermark is embedded into all four HDR videos, which are then attacked by TM1, TM2, TM4, TM6 and TM13, respectively. The average BERs of the three methods are compared in Table 2: the BERs of Proposed-Q2 are lower than those of Proposed-Q1 and Proposed-Q3,
Fig. 3 HDR video sequences: (a) Tibul, (b) Sunrise, (c) Playground, (d) ChristmasTree
Table 1 16 types of TM attacks

TMs   Name                    TMs   Name
TM1   ChiuTMO                 TM2   AshikhminTMO
TM3   BanterleTMO             TM4   DragoTMO
TM5   ExponentialTMO          TM6   ReinhardTMO
TM7   KimKautzConsistentTMO   TM8   LischinskiTMO
TM9   LogarithmicTMO          TM10  MertensTMO
TM11  NormalizeTMO            TM12  PattanaikTMO
TM13  SchlickTMO              TM14  VanHaterenTMO
TM15  KuangTMO                TM16  WardHistAdjTMO
which denotes that Proposed-Q2 is more robust than Proposed-Q1 and Proposed-Q3. Thus, Q_2 is the most suitable matrix for embedding the watermark.
4.2 Discussion of the embedding strength
∂min and ∂max govern the trade-off between the invisibility and robustness of the proposed method. In the experiment, ∂min and ∂max are set to different values, with ∂max greater than ∂min. Watermarked HDR videos are obtained by the proposed method under the different embedding strengths, and the invisibility and the robustness against different TM attacks are computed. In order to obtain the optimal ∂min and ∂max for each HDR video, Y is calculated under different embedding strengths [38]:

Y = q + (1/5) Σ_{c=1}^{5} (1 − BER_c) × 100,  (21)
where c indexes the five TM attacks TM2, TM10, TM12, TM15 and TM16, q is the imperceptibility index of the watermarked HDR video under the given ∂min and ∂max, and BER_c is the bit error rate of extraction under the c-th TM attack. A large Y means that the corresponding ∂min and ∂max are the most suitable for embedding the watermark.
Taking Tibul as an example, Y reaches its maximum when ∂min = 0.01 and ∂max = 0.03. Similarly, ∂min and ∂max of the other HDR videos are obtained, as shown in Table 3.
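Eq. (21) can be sketched as follows (`strength_score` is a hypothetical name; q is the HDR-VDP-2 imperceptibility index V, and `bers` holds the five TM-attack bit error rates):

```python
def strength_score(q, bers):
    """Joint invisibility/robustness score of Eq. (21), used to pick
    the best (d_min, d_max) pair for one HDR video."""
    return q + sum((1.0 - b) * 100.0 for b in bers) / len(bers)
```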
4.3 Invisibility and robustness
Table 4 shows the values of HDR-VDP75%, HDR-VDP95% and V; the averages of HDR-VDP75%, HDR-VDP95% and V are 6.337%, 3.454% and 76.268, respectively, which denotes that the watermark in the HDR video cannot be observed by human vision, as illustrated in
Table 2 Average BER of all HDR videos

Attacks  Proposed-Q1  Proposed-Q2  Proposed-Q3
TM1      0.0606       0.0177       0.0256
TM2      0.0412       0.0044       0.0383
TM4      0.0535       0.0048       0.0368
TM6      0.0636       0.0035       0.0234
TM13     0.1886       0.0122       0.0568
Table 3 The embedding strength of HDR videos

HDR video      ∂min  ∂max
Tibul          0.01  0.03
Sunrise        0.02  0.04
Playground     0.04  0.06
ChristmasTree  0.01  0.03
Table 4 Invisibility of the HDR videos

HDR videos     HDR-VDP75% (%)  HDR-VDP95% (%)  V
Tibul          3.170           1.450           76.987
Sunrise        22.160          12.360          74.444
Playground     0.012           0.004           72.929
ChristmasTree  0.006           0.003           80.713
Average        6.337           3.454           76.268
Fig. 4. Figure 5 shows that nearly 100% of the watermark can be extracted from the different watermarked HDR videos when they are not under any attack.
In order to demonstrate the robustness of the proposed HDR video watermarking method against TM attacks, TM attacks are applied to Playground and ChristmasTree, respectively, as shown in Table 5. Figure 6 shows the attacked Playground under part of the TM attacks. From Table 5, the average BERs of Playground and ChristmasTree are 0.0095 and 0.0089, respectively, which denotes that the proposed method can resist TM attacks efficiently.
Fig. 4 Watermarked HDR videos: (a) Tibul, (b) Sunrise, (c) Playground, (d) ChristmasTree
Fig. 5 The extracted watermark images: (a) Tibul (BER = 0.0006), (b) Sunrise (BER = 0.0002), (c) Playground (BER = 0.0007), (d) ChristmasTree (BER = 0.0006)
Table 5 BER under 16 types of TM attacks

TM attacks  Playground  ChristmasTree    TM attacks  Playground  ChristmasTree
TM1         0.0131      0.0073           TM2         0.0022      0.0044
TM3         0.0047      0.0055           TM4         0.0035      0.0050
TM5         0.0065      0.0089           TM6         0.0012      0.0022
TM7         0.0241      0.0263           TM8         0.0009      0.0008
TM9         0.0013      0.0012           TM10        0.0301      0.0134
TM11        0.0008      0.0006           TM12        0.0207      0.0248
TM13        0.0039      0.0050           TM14        0.0012      0.0011
TM15        0.0570      0.0380           TM16        0.0638      0.0347
Fig. 6 Attacked Sunrise by using TM attacks: (a) TM1, (b) TM2, (c) TM5, (e) TM6, (f) TM7, (g) TM8
In order to show robustness against video attacks and hybrid attacks, a variety of such attacks are applied to Tibul and Sunrise, as shown in Table 6, including frame average and H.265 (encoder_intra_main10, GOP = 1, QP = 22). Frame averaging computes the average of multiple frames, which can destroy the watermark; however, the proposed method shows strong robustness, with corresponding BERs of 0.0020 and 0.0505, respectively. Since the watermark is embedded into each key frame of the video, frame swapping and frame dropping do not destroy it, and nearly 100% of the watermark can be extracted. Most BERs of Tibul and Sunrise are lower than 0.04, and the average BERs of Tibul and Sunrise are 0.0167 and 0.0246, respectively, which denotes that the proposed method has strong robustness as well. Figure 7 shows watermark extraction from Tibul; the extracted watermarks remain recognizable.
Table 6 BER of video attacks

Attacks                       Tibul   Sunrise
Frame average                 0.0020  0.0505
Frame swapping (30%)          0.0006  0.0002
Frame dropping (30%)          0.0006  0.0002
Frame dropping (50%)          0.0006  0.0002
H.265                         0.0609  0.0282
Salt & Pepper (0.001) + TM1   0.0281  0.0275
Poisson + TM2                 0.0090  0.0652
Gaussian filter (3×3) + TM6   0.0124  0.0230
Sharpen (0.5) + TM8           0.0359  0.0268
Average                       0.0167  0.0246
Fig. 7 Watermark extractions from Tibul: (a) Frame average, (b) Frame swapping (30%), (c) Frame dropping (30%), (d) Frame dropping (50%), (e) H.265, (f) Gaussian filter (3×3) + TM6, (g) Salt & Pepper (0.001) + TM1, (h) Poisson + TM2, (i) Sharpen (0.5) + TM8
4.4 Comparisons
In order to demonstrate the effectiveness of the HVS model, the proposed method without the guidance of the HVS model, namely Proposed-WH, is tested, as shown in Table 7, taking Tibul and ChristmasTree as examples. From Table 7, the V of the proposed method is similar to that of Proposed-WH, but the BERs of the proposed method are clearly better, which denotes that the proposed method is more robust than Proposed-WH. Under the guidance of the HVS model, the HDR video watermarking method obtains more robustness.
Furthermore, to demonstrate the effectiveness of the proposed method, it is compared with Bakhsh's [5], Kang's [17] and Joshi's [16] methods, as shown in Table 8, when Sunrise is under different attacks. From Table 8, the imperceptibility of the proposed method is clearly higher than those of Bakhsh's [5], Kang's [17] and Joshi's [16], and most BERs of the proposed method are lower than theirs, especially for TM1 and TM11. For example, for TM4, the BER of the proposed method is nearly 0.09, 0.06 and 0.3 lower than those of Bakhsh's [5], Kang's [17] and Joshi's [16], respectively. For TM9, the BER of the proposed method is nearly 0.01, 0.1 and 0.3 lower than those of
Table 7 Comparisons with Proposed-WH

                Proposed  Proposed-WH
Tibul
  V             76.987    77.240
  TM1           0.0269    0.0454
  TM2           0.0075    0.0149
  TM3           0.0090    0.0166
  TM13          0.0050    0.0079
  TM14          0.0016    0.0021
ChristmasTree
  V             80.550    80.713
  TM1           0.0073    0.0117
  TM2           0.0044    0.0487
  TM3           0.0055    0.0401
  TM10          0.0134    0.0572
  TM13          0.0050    0.0263
Table 8 Comparisons on different attacks

Attacks                       Proposed  Bakhsh's [5]  Kang's [17]  Joshi's [16]
V                             74.444    67.3772       70.0520      72.0132
TM1                           0.0234    0.0607        0.1445       0.3956
TM4                           0.0084    0.0915        0.0635       0.3849
TM5                           0.0339    0.0512        0.1582       0.3861
TM7                           0.0239    0.0173        0.0791       0.3849
TM9                           0.0401    0.0594        0.1191       0.3235
TM10                          0.0631    0.0750        0.1396       0.3440
TM11                          0.0011    0.0495        0.0137       0.3485
TM13                          0.0350    0.0552        0.1152       0.4199
Gaussian filter (3×3) + TM6   0.0230    0.0897        0.1250       0.3944
Sharpen (0.5) + TM8           0.0268    0.0348        0.1104       0.3851
Poisson + TM2                 0.0652    0.0556        0.2148       0.4135
Average                       0.0313    0.0582        0.1166       0.3800
Table 9 Average BER of the HDR video database [3]

Attacks  Proposed  Bakhsh's [5]  Kang's [17]  Joshi's [16]
TM3      0.0389    0.0784        0.1025       0.0987
TM8      0.0246    0.0981        0.2085       0.1189
TM11     0.0358    0.1011        0.0977       0.1278
TM13     0.0486    0.1023        0.2145       0.2793
Bakhsh's [5], Kang's [17] and Joshi's [16], respectively. Compared with Bakhsh's [5],
although the BERs of the proposed method are higher for TM7 and Poisson + TM2, they are
lower for the other attacks, such as TM1, TM5 and Gaussian filter (3 × 3) + TM6. Compared
with Kang's [17] and Joshi's [16], the proposed method is obviously better. Over all attacks,
the proposed method performs best, since its average BER is nearly 0.03, 0.09 and 0.35 lower
than those of Bakhsh's [5], Kang's [17] and Joshi's [16], respectively. Above all, the proposed
method has a strong capability of protecting the copyright of the HDR video.
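The BER values reported in Tables 8 and 9 measure the fraction of watermark bits that are flipped between embedding and extraction after an attack. A minimal sketch of this computation is given below (the function name and bit-array representation are ours, not taken from the paper):

```python
import numpy as np

def bit_error_rate(original_bits, extracted_bits):
    """Fraction of watermark bits that differ after attack and extraction."""
    original_bits = np.asarray(original_bits, dtype=np.uint8)
    extracted_bits = np.asarray(extracted_bits, dtype=np.uint8)
    assert original_bits.shape == extracted_bits.shape
    return float(np.mean(original_bits != extracted_bits))

# Example: a 32-bit watermark with two flipped bits -> BER = 2/32 = 0.0625
w = np.random.randint(0, 2, 32)
w_attacked = w.copy()
w_attacked[[3, 17]] ^= 1  # simulate two bit errors caused by an attack
print(bit_error_rate(w, w_attacked))  # 0.0625
```

A BER of 0 means the watermark survived the attack perfectly; the averages in Table 8 are taken over all attacked frames of a video.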
4.5 Robustness on other HDR videos
An HDR video database [3] including 10 HDR videos is used to further demonstrate the
robustness of the proposed method. As shown in Table 9, the BERs of the proposed method
are obviously lower than those of Bakhsh's [5], Kang's [17] and Joshi's [16] methods, which
proves the effectiveness of the proposed method again.
5 Conclusion
In this paper, a robust HDR video watermarking method based on the HVS model and T-QR is
proposed. Key frames are extracted by using scene change detection, and each key frame is
regarded as a third-order tensor so that a robust embedding domain can be obtained by using T-QR.
After T-QR decomposition, the orthogonal tensor is calculated, which consists of the first, second and third matrices.
Compared with the other two matrices, the second matrix is more robust, and therefore it is more
suitable for embedding the watermark. The HVS model is computed by using luminance perception,
image contrast and contrast masking, which determines the embedding strength. Experimental results
show that the proposed method can resist various attacks and effectively protect the copyright of
HDR videos. In future work, we will further explore visual perception factors of the HDR video
to guide watermark embedding for improving the watermarking efficiency.
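The T-QR decomposition at the core of the method can be sketched with a Fourier-domain implementation of the t-product, in the spirit of the tensor framework of Kilmer et al. [13, 31]: the third-order tensor is transformed by an FFT along its third mode, a matrix QR is applied to every frontal slice, and the factors are transformed back. This is only an illustrative sketch with our own function names; the paper's block partitioning, bit embedding in the second slice of the orthogonal tensor, and the HVS-based strength computation are omitted.

```python
import numpy as np

def t_qr(A):
    """T-QR of a tensor A with square frontal slices (n x n x n3):
    FFT along the third mode, slice-wise matrix QR in the Fourier
    domain, then inverse FFT, so that A = Q * R under the t-product."""
    Af = np.fft.fft(A, axis=2)
    Qf = np.empty_like(Af)
    Rf = np.empty_like(Af)
    for k in range(A.shape[2]):
        Qf[:, :, k], Rf[:, :, k] = np.linalg.qr(Af[:, :, k])
    # Conjugate symmetry of the FFT slices makes the inverse transforms real.
    return np.real(np.fft.ifft(Qf, axis=2)), np.real(np.fft.ifft(Rf, axis=2))

def t_product(A, B):
    """t-product A * B: slice-wise matrix products in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)  # per-slice matrix multiply
    return np.real(np.fft.ifft(Cf, axis=2))

# Sanity check on a small block, e.g. a 4x4 block of a 3-channel key frame
A = np.random.rand(4, 4, 3)
Q, R = t_qr(A)
print(np.allclose(t_product(Q, R), A))  # True: Q * R reconstructs A
```

In the watermarking pipeline, a frontal slice of the orthogonal tensor Q (the "second matrix" in the paper's terminology) would then be modified according to the watermark bits and the HVS-derived strength before recomposing the block.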
Acknowledgments This work was supported by the Natural Science Foundation of China under Grant Nos.
61971247 and 61501270, the Zhejiang Provincial Natural Science Foundation of China under Grant Nos.
LY22F020020 and LQ20F010002, and the Natural Science Foundation of Ningbo under Grant No. 2021J134. It was
also sponsored by the K. C. Wong Magna Fund in Ningbo University.
Declarations
Conflict of interest We declare that we have no financial or personal relationships with other people or
organizations that could inappropriately influence our work, and that there is no professional or other personal
interest of any nature or kind in any product, service and/or company that could be construed as influencing the
position presented in, or the review of, the manuscript entitled "Robust HDR video watermarking method based
on the HVS model and T-QR".
References
1. Abdulla AA (2015) Exploiting similarities between secret and cover images for improved embedding
efficiency and security in digital steganography. http://bear.buckingham.ac.uk/149/
2. Alias S, Ramakrishnan SP (2018) Fibonacci based key frame selection and scrambling for video
watermarking in DWT–SVD domain. Wirel Pers Commun 102:2011–2031. https://doi.org/10.1007/s11277-018-5252-1
3. Azimi M, Banitalebi-Dehkordi A, Dong Y, Nasiopoulos P (2018) Evaluating the performance of existing
full-reference quality metrics on high dynamic range (HDR) video content. ICMSP 2014: XII International
Conference on Multimedia Signal Processing
4. Bai Y, Jiang G, Yu M, Peng Z, Chen F (2018) Towards a tone mapping robust watermarking algorithm for
high dynamic range image based on spatial activity. Signal Process Image Commun 65:187–200. https://doi.org/10.1016/j.image.2018.04.005
5. Bakhsh FY, Moghaddam ME (2018) A robust HDR images watermarking method using artificial bee
colony algorithm. J Inf Secur Appl 41:12–27. https://doi.org/10.1016/j.jisa.2018.05.003
6. Balasamy K, Suganyadevi S (2021) A fuzzy based ROI selection for encryption and watermarking in
medical image using DWT and SVD. Multimed Tools Appl 80:7167–7186. https://doi.org/10.1007/s11042-020-09981-5
7. Chen Y, Jia ZG, Peng Y, Peng YX, Zhang D (2021) A new structure-preserving quaternion QR decomposition
method for color image blind watermarking. Signal Process 185:108088. https://doi.org/10.1016/j.sigpro.2021.108088
8. Cheng YM, Wang CM (2009) A novel approach to steganography in high-dynamic-range images. IEEE
Multimed 16:70–80 https://doi.ieeecomputersociety.org/10.1109/MMUL.2009.43
9. Chi B, Yu M, Jiang GY, He ZY, Chen F (2020) Blind tone mapped image quality assessment with image
segmentation and visual perception. J Vis Commun Image Represent 67:102752. https://doi.org/10.1016/j.jvcir.2020.102752
10. Choudhury A (2020) Robust HDR image quality assessment using combination of quality metrics.
Multimed Tools Appl 79:22843–22867. https://doi.org/10.1007/s11042-020-08985-5
11. Ernawan F, Kabir MN (2020) A block-based RDWT-SVD image watermarking method using human
visual system characteristics. Vis Comput 36:19–37. https://doi.org/10.1007/s00371-018-1567-x
12. Guerrini F, Okuda M, Adami N, Leonardi R (2011) High dynamic range image watermarking robust against
tone-mapping operators. IEEE Trans Inf Forensics Secur 6:283–295. https://doi.org/10.1109/TIFS.2011.2109383
13. Hao N, Kilmer ME, Braman K, Hoover RC (2013) Facial recognition using tensor-tensor decompositions.
SIAM J Imaging Sci 6:437–463. https://doi.org/10.1137/110842570
14. Hu R, Xiang S (2020) Cover-lossless robust image watermarking against geometric deformations. IEEE
Trans Image Process 30:318–331. https://doi.org/10.1109/TIP.2020.3036727
15. Hu J, Shao Y, Ma W, Zhang T (2015) A robust watermarking scheme based on the human visual system in
the wavelet domain. In: 2015 8th international congress on image and signal processing (CISP), pp 799–
803. https://doi.org/10.1109/CISP.2015.7407986
16. Joshi A, Gupta S, Girdhar M, Agarwal P, Sarker R (2017) Combined DWT–DCT-based video
watermarking algorithm using Arnold transform technique. In: Proceedings of the international conference
on data engineering and communication technology, vol 468, pp 455–463. https://doi.org/10.1007/978-981-10-1675-2_45
17. Kang X, Zhao F, Lin G, Chen Y (2018) A novel hybrid of DCT and SVD in DWT domain for robust and
invisible blind image watermarking with optimal embedding strength. Multimed Tools Appl 77:13197–
13224. https://doi.org/10.1007/s11042-017-4941-1
18. Kang J, Hou JU, Ji SK, Lee H (2020) Robust spherical panorama image watermarking against viewpoint
desynchronization. IEEE Access 8:2169–3536. https://doi.org/10.1109/ACCESS.2020.3006980
19. Khan IR, Huang Z, Farbiz F, Manders CM (2009) HDR image tone mapping using histogram adjustment
adapted to human visual system. In: 2009 7th international conference on information, communications and
signal processing, pp 1–5. https://doi.org/10.1109/ICICS.2009.5397652
20. Khare P, Srivastava V (2020) HT-IWT-DCT-based hybrid technique of robust image watermarking.
Advances in VLSI, communication, and signal processing 683, pp 359–370. https://doi.org/10.1007/978-981-15-6840-4_28
21. Khwildi R, Zaid AO (2020) HDR image retrieval by using color-based descriptor and tone mapping
operator. Vis Comput 36:1111–1126. https://doi.org/10.1007/s00371-019-01719-1
22. Khwildi R, Zaid AO, Dufaux F (2021) Query-by-example HDR image retrieval based on CNN. Multimed
Tools Appl 80:115413–115428. https://doi.org/10.1007/s11042-020-10416-4
23. Lai C (2011) An improved SVD-based watermarking scheme using human visual characteristics. Opt
Commun 284:938–944. https://doi.org/10.1016/j.optcom.2010.10.047
24. Li J, Zhang C (2020) Blind and robust watermarking scheme combining bimodal distribution structure with
iterative selection method. Multimed Tools Appl 79:1373–1407. https://doi.org/10.1007/s11042-019-08213-9
25. Li MT, Huang NC, Wang CM (2011) A data hiding scheme for high dynamic range images. Int J Innov
Comput Inf Control 7:2021–2035
26. Lin YT, Wang CM, Chen WS, Lin FP (2017) A novel data hiding algorithm for high dynamic range
images. IEEE Trans Multimed 19:196–211. https://doi.org/10.1109/TMM.2016.2605499
27. Luo Y, Peng D (2021) A robust digital watermarking method for depth-image-based rendering 3D video.
Multimed Tools Appl 80:14915–14939. https://doi.org/10.1007/s11042-020-10375-w
28. Maiorana E, Campisi P (2016) High-capacity watermarking of high dynamic range images. EURASIP J
Image Video Process:1–15. https://doi.org/10.1186/s13640-015-0100-7
29. Mantiuk R, Myszkowski K, Seidel HP (2006) A perceptual framework for contrast processing of high
dynamic range images. ACM Trans Appl Percept 3:286–308. https://doi.org/10.1145/1166087.1166095
30. Mantiuk R, Kim KJ, Rempel AG, Heidrich W (2011) HDR-VDP-2: a calibrated visual metric for visibility
and quality predictions in all luminance conditions. ACM Trans Graph 30:1–14. https://doi.org/10.1145/2010324.1964935
31. Martin CD, Shafer R, LaRue B (2013) An order-p tensor factorization with applications in imaging. SIAM J
Sci Comput 35:A474–A490. https://doi.org/10.1137/110841229
32. Mousavi S, Naghsh A (2021) A robust Plenoptic image watermarking method using graph-based transform.
Multimed Tools Appl 80:14591–14608. https://doi.org/10.1007/s11042-021-10555-2
33. Nouioua I, Amardjia N, Belilita S (2018) A novel blind and robust video watermarking technique in fast
motion frames based on SVD and MR-SVD. Secur Commun Netw. https://doi.org/10.1155/2018/6712065
34. Sang J, Liu Q, Song CL (2020) Robust video watermarking using a hybrid DCT-DWT approach. J Electron
Sci Technol 100052. https://doi.org/10.1016/j.jnlest.2020.100052
35. Solachidis V, Maiorana E, Campisi P (2013) HDR image multi-bit watermarking using bilateral-filtering-based
masking. Image Processing: Algorithms and Systems XI 8655, p 865505. https://doi.org/10.1117/12.2005240
36. Wu J (2012) Robust watermarking framework for high dynamic range images against tone mapping attacks.
In: Watermarking
37. Yu CM, Wu KC, Wang CM (2011) A distortion-free data hiding scheme for high dynamic range images.
Displays. 32:225–236. https://doi.org/10.1016/j.displa.2011.02.004
38. Yu M, Wang Y, Jiang GY, Bai Y, Luo T (2019) High dynamic range image watermarking based on Tucker
decomposition. IEEE Access 7:113053–113064. https://doi.org/10.1109/ACCESS.2019.2935627
39. Yuan Z, Liu D, Zhang X, Wang H, Su Q (2020) DCT-based color digital image blind watermarking method
with variable steps. Multimed Tools Appl 79:30557–30581. https://doi.org/10.1007/s11042-020-09499-w
40. Zhang Y, Sun YY (2019) An image watermarking method based on visual saliency and contourlet
transform. Optik 186:379–389. https://doi.org/10.1016/j.ijleo.2019.04.091
41. Zhou RG, Hu W, Fan P, Luo G (2018) Quantum color image watermarking based on Arnold transformation
and LSB steganography. Int J Quantum Inf 16. https://doi.org/10.1142/S0219749918500211
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
MENG DU is currently pursuing the master's degree with the Faculty of Information Science and Engineering,
Ningbo University. Her research interests include multimedia security and image processing.
TING LUO received the Ph.D. degree from the Faculty of Information Science and Engineering, Ningbo
University, in 2016. He is currently a Professor with the College of Science and Technology, Ningbo University.
His research interests include multimedia security, image processing, data hiding, and pattern recognition.
HAIYONG XU is currently pursuing the Ph.D. degree with the Faculty of Information Science and Engineering,
Ningbo University. He is currently a Teacher with the College of Science and Technology, Ningbo University.
His research interests include multimedia communication, image processing, and machine learning.
YANG SONG received the M.S. and Ph.D. degrees from Ningbo University in 2015 and 2018, respectively. He is
currently a teacher with the College of Science and Technology, Ningbo University. His research interests
include multimedia communication, image processing, and visual quality assessment.
Chunpeng Wang received the B.E. degree in computer science and technology from Shandong Jiaotong
University, China, in 2010, the M.S. degree from the School of Computer and Information Technology, Liaoning
Normal University, China, in 2013, and the Ph.D. degree from the Faculty of Electronic Information and
Electrical Engineering, Dalian University of Technology, China, in 2017. He is currently a teacher with the
School of Information, Qilu University of Technology (Shandong Academy of Sciences), China. His research
interests mainly include image watermarking and signal processing.
Li Li received the B.S. and M.S. degrees in mathematics from Zhejiang University in 1994 and 1997,
respectively, and the Ph.D. degree in computer science from the same university in 2004. She is currently with
Hangzhou Dianzi University, where she was a professor from 2002 to 2005. Her research interests include
digital image watermarking and computer animation; her current research interests include image/video/3D
mesh watermarking, QR code, and image/video processing.