HQR-Scheme: A High Quality and resilient virtual primary key generation approach for watermarking relational data

https://doi.org/10.1016/j.eswa.2019.06.058Get rights and content

Highlights

  • The proposed metrics help to know if a VPK set will be useful to watermark the data.

  • There is no need to perform the WM embedding to know if the VPK set is good.

  • The cyclic idea of the attribute order helps to design an effective VPK scheme.

  • The WM is detected even after being deleted more than one attribute of R.

Abstract

Most of the watermarking techniques designed to protect relational data often use the Primary Key (PK) of relations to perform the watermark synchronization. Despite offering high confidence to the watermark detection, these approaches become useless if the PK can be erased or updated. A typical example is when an attacker wishes to use a stolen relation, unlinked to the rest of the database. In that case, the original values of the PK lose relevance, since they are not employed to check the referential integrity. Then, it is possible to erase or replace the PK, compromising the watermark detection with no need to perform the slightest modification on the rest of the data. To avoid the problems caused by the PK-dependency some schemes have been proposed to generate Virtual Primary Keys (VPK) used instead. Nevertheless, the quality of the watermark synchronized using VPKs is compromised due to the presence of duplicate values in the set of VPKs and the fragility of the VPK schemes against the elimination of attributes. In this paper, we introduce the metrics to allow precise measuring of the quality of the VPKs generated by any scheme without requiring to perform the watermark embedding. This way, time waste can be avoided in case of low-quality detection. We also analyze the main aspects to design the ideal VPK scheme, seeking the generation of high-quality VPK sets adding robustness to the process. Finally, a new scheme is presented along with the experiments carried out to validate and compare the results with the rest of the schemes proposed in the literature.

Introduction

Databases have been traditionally protected by security methods oriented to control the access of users according to their role in the system. This approach has been mostly implemented through authentication policies and has been backed by other operations of the database management systems like log recording and backups management. Nevertheless, with the increasing of data demands and internet services, portable data and remote accessing of the information are growing in popularity. As a result, the access to data by unauthorized users is even easier, being possible to make illegal copies or to perform data tampering without leaving traces through the network. For that reason, it can be said that traditional database security techniques are not enough to guarantee the protection of the data.

To face those problems, alternative security methods such as watermarking techniques have been proposed. Watermarking techniques allow the protection of the data without restraining their deployment or copying, being that an important reason for their rising popularity. By means of watermarking, data can be protected from tampering, stealing, unauthorized distribution, and many other malicious operations. Watermarking techniques were first proposed for protecting multimedia data (Choudhary, Nath, Panda, 2017, Nematollahi, Vorakulpipat, Rosales, 2017, Wang, 2017) and then were extended to relational data (Ghogare, Junnarkar, 2017, Khanduja, 2017, Rani, Koshley, Halder, 2017, Unnikrishnan, Pramod, 2017). Performing watermarking over relational data have not been an easy task due to their features and differences respect to multimedia data types (Halder, Pal, Cortesi, 2010, Iftikhar, Kamran, Anwar, 2015a, Iftikhar, Kamran, Anwar, 2015b).

The essential element of a watermarking technique is the Watermark (WM), which consists of a set of items called marks mostly represented by bits. The WM must be imperceptibly embedded in the digital asset being protected, preserving the data usability. In that way, malicious users cannot get clues of the WM presence, since they may try to remove it to claim the data ownership. There are some requirements (Halder, Pal, Cortesi, 2010, Khanduja, Chakraverty, Verma, 2016a, Mehta, Aswar, 2014, Pérez Gort, Feregrino Uribe, Nummenmaa, 2017b, Shih, 2017, Xie, Wu, Shen, Hwang, 2016) that watermarking techniques must fulfill to avoid compromising data quality and prevent other undesirable situations. Also, for the extraction, the marks should remain in the watermarked content, at least in a quantity enough to guarantee the WM recognition.

Among the multiple criteria used for WM classification (Halder, Pal, Cortesi, 2010, Patil, Yawalkar, 2015, Prajapati, Tiwari, 2015), two of them are of particular interest to this work: (i) the cover type of the WM, which depends on the type of attribute used to hide the marks, (ii) and the WM intent, which classifies the techniques depending on their application (see Fig. 1). Some watermarking techniques have been developed for copyright protection (Jiang, Chen, Li, 2009, Khanduja, Verma, Chakraverty, 2015, Rao, Patel, Vikani, 2012, Zhang, Gao, Jiang, Zhang, Zhang, 2011), while others named fingerprinting, are oriented to detect traitor users as well as to check the authenticity of the data copies (Gursale, Mohanpurkar, 2014, Iftikhar, Anwar, Kamran, 2014, Mohanpurkar, Joshi, 2015). These techniques are classified as robust, considering the severity of the attacks aimed to compromise them, and the expected resilience against them. On the other hand, WMs can also be used for monitoring data integrity and to protect data against malicious modifications like tampering and fraud (Camara, Li, Li, Xie, 2014, Chang, Wu, 2012, Guo, 2011, Khan, Husain, 2013, Şahin, Ulutaş, İmamoğlu, 2016). These WMs are classified as fragile since the owner of the data not only knows of its presence but he is also benefited from it. For that reason, it is assumed that data will not suffer attacks focused on compromising the WM functioning.

Generally, watermarking techniques are composed of two processes: (A) WM embedding and (B) WM extraction (see Fig. 2). Embedding is composed of two sub-processes, WM generation and marks embedding. WM generation source could be a multimedia file, the database itself, among others (Agrawal, Haas, Kiernan, 2003, Halder, Pal, Cortesi, 2010). WM extraction is usually performed when it is requested as evidence in some litigation (e.g., claim of the data ownership). For that, the WM should remain in the data no matter how long it has been embedded. For relational data, this is a major challenge compared to other data types since the information stored in databases is daily modified through operations called benign updates (addition, update, and elimination of data). Also, robust techniques must guarantee resilience against malicious attacks. The sub-processes composing the extraction are the detection of marks, their extraction, and WM reconstruction. Belonging to the extraction process there is also the WM enhancement optional sub-process that is very useful for cases when the WM signal is meaningful (e.g., when the WM is generated from a multimedia file). This operation makes the technique more resilient to attacks since it can tolerate losing more marks and still achieve the WM identification once it is extracted and enhanced. Also, it can be used to cause a lower distortion during the embedding, helping to avoid compromising the data usability in the process. For both processes (the embedding and the extraction) the value of the private parameters must be the same. Otherwise, WM synchronization will not be achieved. Finally, the simplest watermarking processes require at least the consideration of one parameter defined as the Secret Key (SK), which has to be only known by the data owner (Agrawal & Kiernan, 2002).

Since the first proposal, relational database watermarking techniques have been growing in diversity, backed by a broad theory. Nevertheless, most of the watermarking techniques use the Primary Key (PK) of the relation to decide where and how to embed each mark. The PK stores unique values that identify each tuple in the relation, which is why using it allows high synchronization of the WM. Watermarking techniques that use the PKs are backed by the assumption that there is no way to delete the PKs without compromising the referential integrity of the database, due to their role in the database design. That is why the attackers trying to compromise the WM detection are forced to modify the data, without tampering the PK, since they are also interested in maintaining the quality of the data. Based on that assumption, the majority of watermarking techniques proposed to protect relational data are PK-dependent (e.g., Franco-Contreras, Coatrieux, Cuppens, Cuppens-Boulahia, Roux, 2014, İmamoğlu, Ulutaş, Ulutaş, 2015, Kamran, Suhail, Farooq, 2013, Melkundi, Chandankhede, 2015, Pérez Gort, Feregrino Uribe, Nummenmaa, 2017b).

Despite the PK relevance in database design, watermarked relations are often distributed separately from the database, which allows compromising the WM detection by erasing or updating the PK if the attacker has no intention of using the relation linked to the rest of the database. This has become the perfect attack since the attacker does not require the modification of any data beyond the PK. To avoid this vulnerability (or the absence of PK in the relation), some proposals have been published under the classification of Virtual Primary Key (VPK) schemes, oriented to generate VPKs to be used by watermarking techniques instead of the PKs. The work presented here is oriented to solve some weakness presented by most VPK schemes (oriented to generate VPKs for watermarking techniques of numeric cover type) as well as to find a better way to measure the quality of the VPK set, giving so a more precise idea of how effective can be its use by watermarking techniques. For the case of other cover types different to the numeric one, our approach can also be applied, describing the same behavior, as long as the stored data is handled on its binary representation.

The rest of this paper is organized as follows. Section 2 introduces details concerning the relational data structure and the way watermarking techniques have approached WM embedding in relational data. Section 3 describes the challenges VPK schemes face to be functional and considered for watermarking relational data. Section 4 gives a critical review of the VPK schemes proposed so far. Section 5 presents the set of metrics that better describes the quality of the VPKs and the resilience of a VPK scheme against the elimination of attributes. Section 6 presents the elements describing the ideal design for a VPK scheme and our approach seeking the maximum matching to this design. Section 7 presents the experimental results allowing the comparison between previous schemes and our approach. The conclusion of this paper is given in Section 8.

Section snippets

Watermarking relational data

According to the relational model, relational databases are conceived as a collection of relations corresponding to the entities considered in its design (Date, 2006). The relations are linked using their primary keys, implementing in that way the referential integrity between their records. Each relation presents a table structure where the columns are the attributes and the rows are tuples representing the instances of the entities (see Fig. 3). So far, watermarking techniques for relational

Synchronization problems

Watermarking techniques using the PK of the relation for performing WM embedding and its extraction achieve high synchronization considering that the PK stores unique values used to identify each tuple. Then, high entropy is added to the selection of marks as well as to the places to embed them, which also makes the technique more robust since the WM capacity increases while more marks are considered. The main fragility of this approach is that if the PK can be erased or updated, then WM

Previous work

The first VPK scheme reported in the literature was proposed by Agrawal and Kiernan (2002) to apply the AHK algorithm in relations with no PK. Identified as the S-Scheme by Li et al. (2003), this scheme is conceived to be used by watermarking techniques trying to protect relations composed by one or more than one numeric attributes. When only one attribute composes the relation, the VPK is created by splitting the binary value of the attribute into two fragments, the first one corresponding to

Measuring quality and resilience

As far as we know, the only metric for quantifying the effects of the duplicate problem for WM synchronization was proposed by Li et al. (2003). They defined the equation δ=(wmaxwmin)/wmin, δR:0δ;wmaxZ+,wminZ0+, being δ the duplicate index, wmax the maximum number of times the same mark is embedded and wmin the minimum. According to this equation, the ideal WM embedding is accomplished when all marks are equally embedded, being wmax=wmin, resulting in δ=0. But when any mark is excluded

HQR-Scheme: The high quality and resilient VPK generation approach

According to the literature published, the VPK schemes proposed so far generate too many duplicate values, are vulnerable to the deletion problem, or waste elements of R, denigrating the quality of S and due to that, the WM synchronization. For those reasons, it is important to design a VPK scheme that generates a higher number of exclusive values, varying the attributes involved in the process for each tuple. In this section, we present the main aspects to consider in the design of a VPK

Experimental results

The experiments carried out were focused on measuring the quality of the sets of VPK generated by each VPK scheme and their resilience against the deletion problem. The relation used to embed the WM consists of a subset of the dataset Forest Cover Type (Colorado State, 1999). This subset is formed by the first 30,000 tuples out of the 581,012 that the relation stores. Also, from the 54 attributes, only the first 10 of them were used. This subset is selected to fairly compare the results with

Conclusions

In this paper, we present the metrics to measure the quality of a set of VPK values to be used for watermarking relational data as well as the main aspects to be considered to design a VPK scheme. Based on these considerations and the challenges that practical scenarios bring with them, it was proposed a novel VPK scheme to achieve higher WM synchronization despite the elimination of attributes in the watermarked relation. The experiments carried out show how previous VPK schemes do not

Declaration of Competing Interest

None.

CRediT authorship contribution statement

Maikel Lázaro Pérez Gort: Writing - original draft, Writing - review & editing, Data curation, Investigation, Software, Methodology. Claudia Feregrino-Uribe: Conceptualization, Supervision, Writing - review & editing, Resources. Agostino Cortesi: Methodology, Supervision, Validation. Félix Fernández-Peña: Software, Data curation, Validation.

Acknowledgements

This work was partially supported by a Ph.D. grant No. 714270 from CONACyT, Mexico.

References (42)

  • M.A. Nematollahi et al.

    Video watermarking

    Digital watermarking

    (2017)
  • M.L. Pérez Gort et al.

    A highly-reliable virtual primary key scheme for relational database watermarking techniques

    Proceedings of the international conference on computational science and computational intelligence

    (2017)
  • R. Agrawal et al.

    Watermarking relational data: Framework, algorithms and analysis

    The VLDB journal

    (2003)
  • R. Agrawal et al.

    Watermarking relational databases

    Proceedings of the 28th international conference on very large data bases

    (2002)
  • L. Camara et al.

    Distortion-free watermarking approach for relational database integrity checking

    Mathematical problems in engineering

    (2014)
  • C.-C. Chang et al.

    A blind robust reversible watermark scheme for textual relational databases with virtual primary key

    International workshop on digital watermarking

    (2014)
  • J.-N. Chang et al.

    Reversible fragile database watermarking technology using difference expansion based on svr prediction

    Computer, consumer and control (is3c), 2012 international symposium on

    (2012)
  • S. Choudhary et al.

    Double layered audio zero-watermarking using dwt & dsss

    Communication and signal processing (iccsp), 2017 international conference on

    (2017)
  • Colorado State, U. (1999). Forest covertype, the uci kdd archive. Information and Computer Science. University of...
  • C.J. Date

    An introduction to database systems

    (2006)
  • J. Franco-Contreras et al.

    Robust lossless watermarking of relational databases based on circular histogram modulation

    IEEE transactions on information forensics and security

    (2014)
  • G.R. Ghogare et al.

    Genetic algorithm based reversible watermarking approach for relational data

    IJETT

    (2017)
  • J. Guo

    Fragile watermarking scheme for tamper detection of relational database

    Computer and management (caman), 2011 international conference on

    (2011)
  • N. Gursale et al.

    Distortion minimization fingerprinting technique for relational database

    International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC)

    (2014)
  • R. Halder et al.

    Watermarking techniques for relational databases: survey, classification and comparison.

    J. UCS

    (2010)
  • S. Iftikhar et al.

    A novel and robust fingerprinting technique for digital data based on genetic algorithm

    High-capacity optical networks and emerging/enabling technologies (honet), 2014 11th annual

    (2014)
  • S. Iftikhar et al.

    Rrw-a robust and reversible watermarking technique for relational data

    IEEE Transactions on Knowledge & Data Engineering

    (2015)
  • S. Iftikhar et al.

    A survey on reversible watermarking techniques for relational databases

    Security and Communication Networks

    (2015)
  • M.B. İmamoğlu et al.

    A watermarking technique for relational databases based on partitioning

    Signal processing and communications applications conference (siu), 2015 23th

    (2015)
  • C. Jiang et al.

    Watermarking relational databases for ownership protection based on dwt

    Information assurance and security, 2009. ias’09. fifth international conference on

    (2009)
  • M. Kamran et al.

    A robust, distortion minimizing technique for watermarking relational databases using once-for-all usability constraints

    IEEE Transactions on Knowledge and Data Engineering

    (2013)
  • Cited by (0)

    View full text