
Seminar Report sample

Savitribai Phule Pune University
PROJECT TITLE
A Seminar
By
Student name (BSCDS)
Submitted in partial fulfillment of the requirements
for the degree of
(BSC Data Science)
November 27, 2023
Accepted by the University
Date
HEAD OF THE DEPARTMENT
Acknowledgements
It gives me immense pleasure to present the preliminary seminar report on “Hiding the message behind the Words: Advance in Natural Language Watermarking”. I feel great pleasure in expressing my deep sense of gratitude towards the
administrative and technical staff of SPPU for helping and supporting me. I owe
thanks to my beloved family and friends for their kind co-operation and valuable
help. Last but not least, I express my deep sense of gratitude to all my well-wishers.
Student Name(Roll No)
Abstract
Digital watermarking is a technique for hiding a message in a digital signal and
protecting it with encryption.
Digital watermarking is used to protect content. It is most commonly applied to
images, audio and video. Many digital watermarking techniques exist, but
this paper focuses on text. Text watermarking has been implemented for Chinese,
English, Arabic and Turkish text by different methods. This paper develops a digital watermarking algorithm for English text, based on English grammatical words
and an encryption process.
The procedure mainly counts grammatical words such as
modal verbs, conjunctions, prepositions and articles. From these counts it produces an encrypted message,
which serves as the watermark. The technique was applied to the content of different websites
to verify it, for example https://louisem.com/1912/free-watermark-softwarewatermark-online.
Keywords: Text watermarking; Natural language processing; RSA; Encryption;
Author’s authenticity
Contents
1 Introduction
1.1 Project Overview
1.2 Problem Statement
1.3 Research Objective
2 Literature Review
2.1 Related Documents
3 Digital Watermarking Technique
3.1 Document Structure
3.1.1 Space Coding
3.1.2 Feature Coding
3.2 Natural Language Processing
3.2.1 Synonym substitution
3.2.2 Syntactic transformation
3.2.3 Semantic transformation
4 Methodology
4.1 Mathematical Formula
5 Result
5.1 Case study 1
5.2 Case study 2
List of Tables
List of Figures
4.1 Create a Security key using algorithm
Chapter 1
Introduction
1.1 Project Overview
This is the age of technology, and we use the internet abundantly. When
using the internet, we access large amounts of digital data. In the past, most data
was not available digitally; it existed only as hard copy. Nowadays
digital data is more convenient than hard copy, because sharing, gathering and
storing it is easier than with hard copies of the same data. Digital text data
can also be kept more securely than hard copies, and it is easier to
change when changes are needed. However, digital data not only provides great convenience but is also a
target for malicious attacks and piracy, and it creates copyright problems [1]. In addition, digital text can be illegally copied and is susceptible
to threats such as theft of important data, authentication problems and forgery. There are
several effective ways to counter these threats, based on ensuring confidentiality,
authenticity and integrity [2]. It is impossible
to protect data against copyright infringement completely, but different methods can
reduce the risk.
Well-known watermarking techniques protect digital data against copyright infringement.
Watermarking embeds secret information into the
original digital text; this information can later be compared to establish ownership of the data and
to track infringement [3]. Several kinds of digital watermarking methods have
been developed recently. Some methods identify the
owner using semantic or syntactic analysis, text format, line spacing, or character and
word fonts [2]. Digital watermarking is mostly applied to images, audio, video and text.
Watermarking is very useful for protection against illegal threats, copyright infringement and
piracy.
In this paper, the algorithm works with part-of-speech tags, including modal verbs,
prepositions, conjunctions and articles. The RSA algorithm is used to encrypt the message, which hides information such as the owner's name.
The algorithm is also able to show, with high accuracy, the real author of the document from which text was copied. It also provides protection
against various threats to Word documents by modifying the font of the
letters in the Word document. Finally, testing is used to explore and
show the accuracy and efficiency of the algorithm.
1.2 Problem Statement
Copyright issues arise especially when data is in digital form. Once such data is
published, its owners face copyright problems.
1.3 Research Objective
The main objective of this paper is to find a more effective digital watermarking technique
to protect the owner's digital data.
Chapter 2
Literature Review
2.1 Related Documents
Makarand L. Mali, Nitin N. Patil and J. B. Patil [2] proposed a watermarking algorithm based on English grammatical words combined with an encryption technique. The authors
focused on grammatical categories such as conjunctions, pronouns and modal verbs to
generate the encrypted watermark message.
Chen Li and You Fucheng [3] implemented a text digital watermarking algorithm
for Word documents after studying several text watermarking technologies. It embeds the watermark by modifying the font of the letters in the
Word document. Yingli Zhang et al. [6] proposed a watermarking method
for Word documents, for both Chinese and English, to control dissemination and preserve copyright.
The main idea of that paper is information hiding:
each object of the Word document contains information about the author and
the legal user after encryption.
C. Culnane et al. [7] suggested a watermarking method for formatted text
documents. The method uses word spaces and treats the document as one
long line for watermarking. The authors also proposed a method of
thresholding and threshold buffering.
Xianghe Jing, Huaping Fei, Yu Hao and Zhijun Li [8] proposed a novel text encryption method based on natural language processing (NLP). Three linguistic transformations (synonym substitution, syntactic transformation and semantic transformation)
are introduced, and a new encryption technique is provided.
Daojing Li and Bo Zhang [9] implemented a Dual Watermarking scheme
based on Threshold Cryptography (DWTC) for Web information to address the problems of
robustness and invisibility. Because it is based on threshold cryptography, the watermarking
method can improve robustness.
Mercan Topkara et al. [10] suggested a natural language watermarking method that
applies the watermark through sentence composition. Text components
such as characters, words and lines were modified to carry the necessary information.
The authors also provide an overview of the current status of efficiency in natural language
watermarking, and of tools and techniques for text processing.
Chapter 3
Digital Watermarking Technique
A digital watermark is a signal or string embedded in a noise-tolerant signal. It
is usually used to identify ownership of the copyright of that signal.
Different embedding algorithms have been developed, and researchers have derived
various categories of algorithms and techniques.
3.1 Document Structure
3.1.1 Space Coding
Two types of space coding are mainly used: 1) line space and 2) word space. 1) Line-space
coding works with the space between two adjacent lines of a paragraph. The watermark is
embedded by subtly changing the spacing of adjacent lines. Although this is
robust and the watermark is difficult to track, the capacity of the watermark is very
small and the watermark is difficult to see. 2) Word-space
coding works with the horizontal movement of words: words in the same row
are shifted left or right. Invisible coding is also a part of space coding. Here the watermark
data is attached at the line break, where it is difficult to see whether a
tab or a space sits at the end of the line.
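The invisible coding idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a trailing space encodes bit 0 and a trailing tab encodes bit 1, one bit per line; the function names `embed_bits` and `extract_bits` are hypothetical and do not come from any of the cited papers.

```python
def embed_bits(text: str, bits: str) -> str:
    """Append a trailing space (bit 0) or tab (bit 1) to successive lines."""
    lines = text.split("\n")
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the watermark")
    marked = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")  # discard any existing trailing whitespace
        if i < len(bits):
            line += "\t" if bits[i] == "1" else " "
        marked.append(line)
    return "\n".join(marked)


def extract_bits(text: str) -> str:
    """Read the watermark back from the trailing whitespace of each line."""
    bits = []
    for line in text.split("\n"):
        if line.endswith("\t"):
            bits.append("1")
        elif line.endswith(" "):
            bits.append("0")
    return "".join(bits)


cover = "first line\nsecond line\nthird line\nfourth line"
marked = embed_bits(cover, "101")
recovered = extract_bits(marked)
```

The marked text looks identical to the cover text when displayed, which is the strength of the approach, but any tool that trims trailing whitespace destroys the watermark.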
3.1.2 Feature Coding
Feature coding changes text features such as font family, indent, color, text style and font style.
The variety of features that can carry information, and the resulting information capacity of the watermark, make feature coding more popular than space coding.
3.2 Natural Language Processing
3.2.1 Synonym substitution
Synonym substitution is the most popular and simplest watermark embedding technique.
Its special property is that it does not change the meaning of the sentence:
a word is replaced by one of its synonyms.
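As a rough illustration of synonym substitution, the sketch below hides one bit in each word that belongs to a synonym pair: choosing the first word of the pair encodes 0 and the second encodes 1. The synonym table and the function names are assumptions made for this example, not part of any cited method, and the replacement words are emitted in lower case for simplicity.

```python
import re

# Hypothetical synonym pairs: the first word encodes bit 0, the second bit 1.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]
LOOKUP = {word: (pair, bit)
          for pair in SYNONYM_PAIRS
          for bit, word in enumerate(pair)}


def embed(text: str, bits: str) -> str:
    """Replace substitutable words so that each one carries the next bit."""
    remaining = list(bits)

    def swap(match):
        word = match.group(0)
        entry = LOOKUP.get(word.lower())
        if entry is None or not remaining:
            return word  # not substitutable, or no bits left to hide
        pair, _ = entry
        return pair[int(remaining.pop(0))]

    marked = re.sub(r"[A-Za-z]+", swap, text)
    if remaining:
        raise ValueError("text has too few substitutable words")
    return marked


def extract(text: str, n_bits: int) -> str:
    """Recover the first n_bits bits from the synonym choices in the text."""
    bits = [str(LOOKUP[w.lower()][1])
            for w in re.findall(r"[A-Za-z]+", text)
            if w.lower() in LOOKUP]
    return "".join(bits[:n_bits])


cover = "the quick fox will begin a big journey"
marked = embed(cover, "101")
recovered = extract(marked, 3)
```

Here `marked` becomes "the fast fox will begin a large journey", which reads naturally while carrying the bits 101. The capacity is limited by how many substitutable words the text contains.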
3.2.2 Syntactic transformation
Syntactic transformation works on the syntactic structure of a sentence. The sentence is
changed subtly, with only small changes in its meaning. Converting a sentence between active and passive voice,
splitting a sentence, and placing the topic at the beginning of the sentence are some approaches
to syntactic transformation.
3.2.3 Semantic transformation
The representation of data is changed from one model to another using semantic information.
Different semantic techniques are used for watermarking. Replacing a word or phrase with another that keeps the meaning of the sentence the same is one of the approaches
to semantic transformation.
Chapter 4
Methodology
Sharing information on the internet gives rise to illegal use and copying of data, which
creates problems for data owners and writers. The need for digital text watermarking to protect original
authorship and copyright is therefore growing. To protect authorship and copyright while preserving the original form of the data, a
robust watermarking algorithm is needed. In this paper, a
robust algorithm is proposed for digital watermarking based on Natural Language
Processing techniques. The proposed algorithm works with Parts of Speech (POS)
tags. The method scrapes a web page and sums the total number of occurrences of modal
verbs, prepositions, conjunctions and articles in the text document. It then converts the
number of occurrences into binary and concatenates this binary number with the author’s
ID. This is an approved ID, which is chosen by the owner himself.
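The counting and concatenation steps can be sketched in Python as follows. This is only an approximation under stated assumptions: small hand-written word lists stand in for a real POS tagger, the author's ID is assumed to be an integer that is also written in binary, and the count is assumed to come first in the concatenation. The names `count_grammatical_words` and `build_key` are hypothetical.

```python
import re

# Small illustrative word lists (assumption); a real POS tagger would replace them.
MODAL_VERBS = {"can", "could", "may", "might", "must",
               "shall", "should", "will", "would"}
PREPOSITIONS = {"in", "on", "at", "by", "for", "with", "from", "to", "of"}
CONJUNCTIONS = {"and", "but", "or", "nor", "so", "yet", "because"}
ARTICLES = {"a", "an", "the"}


def count_grammatical_words(text: str) -> dict:
    """Count occurrences of each grammatical word class in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "preposition": sum(w in PREPOSITIONS for w in words),
        "conjunction": sum(w in CONJUNCTIONS for w in words),
        "modal_verb": sum(w in MODAL_VERBS for w in words),
        "article": sum(w in ARTICLES for w in words),
    }


def build_key(text: str, auth_id: int) -> str:
    """Binary form of the total count, followed by the binary author ID."""
    total = sum(count_grammatical_words(text).values())
    return format(total, "b") + format(auth_id, "b")


sample = "The author must sign the document, and a copy will stay with the owner."
counts = count_grammatical_words(sample)
key = build_key(sample, auth_id=5)
```

For the sample sentence the counts are 1 preposition, 1 conjunction, 2 modal verbs and 4 articles, so the total 8 becomes 1000 in binary; with the example ID 5 (101 in binary) the key is 1000101. Word lists misclassify ambiguous words such as "to" or "so", which is why a tagger is the better choice in practice.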
4.1 Mathematical Formula
Let n(p) = Number of total occurrences of preposition in the text document.
n(c) = Number of total occurrences of conjunction in the text.
n(mv) = Number of total occurrences of modal verb in the text.
n(a) = Number of total occurrences of article in the text.
AuthID = Author’s ID
Step-1
\[
\mathrm{Key} = \left\{ \sum_{\mathrm{length}=1}^{n} n(p) + n(c) + n(mv) + n(a) + \mathrm{AuthID} \right\} \tag{4.1}
\]
Step-2
\[
\mathrm{Key} = (\mathrm{Key})_{\mathrm{binary}}
\]
The RSA algorithm is applied to this combined key for the final encryption that generates
the watermark. The algorithm is implemented in Python.
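The RSA step might look like the following minimal sketch. It uses textbook RSA with deliberately tiny primes so that the arithmetic is easy to follow; the primes, the public exponent and the example key value 1000101 are assumptions chosen for illustration, and a real system would use a vetted cryptographic library with large keys and padding.

```python
# Toy textbook RSA parameters (assumption: tiny primes, for illustration only).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)


def rsa_encrypt(message: int) -> int:
    """Encrypt an integer smaller than n with the public key (e, n)."""
    if not 0 <= message < n:
        raise ValueError("message must be in the range [0, n)")
    return pow(message, e, n)


def rsa_decrypt(cipher: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(cipher, d, n)


key_bits = "1000101"              # example combined key in binary
message = int(key_bits, 2)        # interpret the binary key as an integer
watermark = rsa_encrypt(message)  # encrypted watermark
recovered = format(rsa_decrypt(watermark), "b")
```

Decrypting the watermark with the private key gives back the binary key, which is how the owner could later demonstrate authorship. Note that converting to an integer drops leading zeros of the bit string, so a fixed key width would be needed if those matter.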
Figure 4.1: Create a Security key using algorithm
Chapter 5
Result
The proposed algorithm was applied to the content of three web pages to generate watermarks. The results are shown below.
5.1 Case study 1
A web page is found on the internet and scraped so that only its content remains. The
algorithm is applied to this content and creates a unique key, which represents
the watermark; with this key, copyrighted content can be identified. This unique key is the result of
case study 1. Figure 1 and Figure 2
show the process.
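The step of reducing a scraped page to its content can be sketched with Python's standard `html.parser` module. The sketch assumes the page has already been downloaded; a short HTML string stands in for it, and the class name `TextExtractor` and function name `page_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Stand-in for a downloaded page (assumption: no network access in this sketch).
sample_html = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Title</h1><p>The text <b>must</b> stay.</p>"
    "<script>var x = 1;</script></body></html>"
)
content = page_text(sample_html)
```

The resulting `content` string is what the counting step of the algorithm would then process.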
5.2 Case study 2
Another web page is used for the second experiment. Second web page:
https://nytcrossword.com/tag/it-may-allow-a-textdocument-to-be-displayed-on-aweb-page-crosswordclue
Figure 5 and Figure 6 show the process of creating a security key with the
algorithm. Figure 6 shows the encrypted key generated by applying our algorithm.
Chapter 6
Conclusion and Future Scope
In this paper the author presents a new watermarking technique, which applies to English
text and uses some grammatical rules. The technique was applied to many web pages
for secure watermarking. The encrypted watermark produced by this algorithm is
more secure and helps to better protect authorship and copyright. The paper
elaborates the algorithm: the count of grammatical words is converted to
binary, and the RSA algorithm is then applied to produce a robust key. In this paper
the watermarking technique is applied to English only. English was chosen because it is more
compatible than other languages. The proposed algorithm is implemented in Python
because Python is well suited to natural language processing. In the future, the author will
try to extend the algorithm to other languages (especially Latin).
Bibliography
[1] Wang Zhigang, Research of Watermarking Algorithm for WORD Document, China
Science and Technology Information, Mar. 2010, pp. 114.
[2] Makarand L. Mali, Nitin N. Patil, J. B. Patil, Implementation of Text Watermarking Technique Using Natural Language Watermarks, 2013 International Conference
on Communication Systems and Network Technologies.
[3] Chen Li, You Fucheng, The Study on Digital Watermarking Based on Word Document, 2013 International Conference on Mechatronic Sciences.
[4] hen Qing, Zhou Limin, The Research of Digital Watermarking Algorithm Based
on WORD Document Image Processing, 2010, pp. 271–350.
[6] Yingli Zhang, Huaiqing Qin, A Novel Robust Text Watermarking For Word Document, 3rd International Congress on Image and Signal Processing, Vol. 1, pp. 38-42,
October 2010.
[7] C. Culnane, H. Treharne, and A.T.S. Ho, Improving Multi-Set Formatted Binary Text Watermarking Using Continuous Line Embedding, in Proceedings of
IEEE International Conference on Innovative Computing, Information and Control (ICICIC-07), Kumamoto, Japan, pp. 287-29, 2007.
[8] Xianghe Jing, Yu Hao, Huaping Fei, Zhijun Li, Text Encryption Algorithm Based on
Natural Language Processing, 2012 Fourth International Conference on Multimedia Information
Networking and Security.
[9] Daojing Li and Bo Zhang, DWTC: A Dual Watermarking Scheme Based on Threshold Cryptography for Web Document, 2010 International Conference on Computer
Application and System Modeling (ICCASM 2010).
[10] M. Topkara, New Designs for Improving the Efficiency and Resilience of Natural
Language Watermarking, PhD Thesis, Purdue University, West Lafayette, Indiana,
2007.