IMAGE PROCESSING AND FORENSIC VERIFICATION OF
FAKE VIDEOS/IMAGES
Thesis/Dissertation submitted in partial fulfilment of the requirements for the award of the
degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING
By
T. Srikanksha
17K91A05M3
MD Nisha
18K95A0525
V. Kalyan
17K91A05N2
T. Karthik
17K91A05M2
Under the Guidance of
G. Anantha Laxmi
Assistant Professor
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
TKR COLLEGE OF ENGINEERING & TECHNOLOGY
(AUTONOMOUS)
(Accredited by NBA and NAAC with ‘A’ Grade)
Medbowli, Meerpet, Saroornagar, Hyderabad-500097
DECLARATION BY THE CANDIDATE
We, Ms. T. Srikanksha, bearing Roll No. 17K91A05M3, Ms. MD Nisha,
bearing Roll No. 18K95A0525, Mr. V. Kalyan, bearing Roll No. 17K91A05N2,
and Mr. T. Karthik, bearing Roll No. 17K91A05M2, hereby declare that the
project report entitled “IMAGE PROCESSING AND FORENSIC
VERIFICATION OF FAKE VIDEOS/IMAGES”, carried out under the
guidance of G. Anantha Laxmi, Assistant Professor, Department of Computer
Science & Engineering, is submitted in partial fulfilment of the requirements
for the award of the degree of Bachelor of Technology in Computer Science &
Engineering.
By
T. Srikanksha (17K91A05M3)
MD Nisha (18K95A0525)
V. Kalyan (17K91A05N2)
T. Karthik(17K91A05M2)
CERTIFICATE
This is to certify that the project report entitled “IMAGE PROCESSING AND
FORENSIC VERIFICATION OF FAKE VIDEOS/IMAGES”, being
submitted by Ms. T. Srikanksha (17K91A05M3), Ms. MD Nisha (18K95A0525),
Mr. V. Kalyan (17K91A05N2), and Mr. T. Karthik (17K91A05M2) in partial
fulfilment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science & Engineering, to the Jawaharlal Nehru
Technological University, is a record of bonafide work carried out by them
under my guidance and supervision.
Signature of the Guide
Signature of the HOD
G. Anantha Laxmi
Assistant Professor
Dr.A. Suresh Rao
Professor
Signature of the External Examiner
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without mention of the people who made it possible and whose
encouragement and guidance have crowned our efforts with success.
We are indebted to our Internal Guide, G. Anantha Laxmi, Assistant Professor,
Department of Computer Science & Engineering, TKR College of Engineering & Technology,
for their support and guidance throughout our major project.
We are also indebted to the Head of the Department, Dr. A. Suresh Rao, Professor,
Computer Science & Engineering, TKR College of Engineering & Technology, for his support
and guidance throughout our major project.
We extend our deep sense of gratitude to the Principal, Dr. D. V. Ravi Shankar, TKR
College of Engineering & Technology, for permitting us to undertake this major project.
Finally, we express our thanks to one and all who have helped us in successfully completing
this major project. Furthermore, we would like to thank our families and friends for their moral
support and encouragement.
By,
T.Srikanksha(17K91A05M3)
MD Nisha (18K95A0525)
V. Kalyan(17K91A05N2)
T.Karthik(17K91A05M2)
CONTENTS

Abstract                                                i
List of Figures                                         ii
List of Screens                                         iii
Symbols and Abbreviations                               iv

S.No   Topic Name                                       Page No.
1.     Introduction                                     1-2
       1.1 Motivation                                   1
       1.2 Problem Statement                            1
       1.3 Limitations of Existing System               1
       1.4 Proposed System                              2
2.     How Deep Fake Works                              3-7
       2.1 Deepfake Introduction                        3-4
       2.2 Deepfake Creation                            4-6
       2.3 Overview                                     7
3.     Literature Survey                                8-9
4.     Requirement Analysis                             10-12
       4.1 Functional Requirements                      10
       4.2 Non-Functional Requirements                  11
       4.3 Software Requirements                        11
       4.4 Hardware Requirements                        12
5.     Design                                           13-15
       5.1 System Architecture                          13-14
       5.2 Meso4                                        14-15
6.     Coding                                           16-20
       6.1 Datasets Download                            16
       6.2 Coding - MesoNet Neural Network              16-20
7.     Implementation                                   21-27
       7.1 Implementation                               21-24
       7.2 Results and Output Screens                   25-27
8.     Testing                                          28-30
       8.1 Software Validation                          28
       8.2 Software Verification                        28
       8.3 Targets of the Test                          28-29
       8.4 Black-Box Testing                            29
       8.5 White-Box Testing                            29
       8.6 Testing Values                               29-30
9.     Advantages, Disadvantages, Applications          31-32
       9.1 Advantages                                   31
       9.2 Disadvantages                                31
       9.3 Applications                                 32
10.    Conclusion and Future Enhancement                33-34
       10.1 Conclusion                                  33
       10.2 Future Enhancement                          34
       References                                       35-36
ABSTRACT
Deep learning has been successfully applied to resolve various complex problems
ranging from big data analytics to computer vision and human-level control. Advances in deep
learning, however, have also been used to create software that may cause threats to privacy,
democracy, and national security. One such deep learning application is the “deep fake”. Deep
fake algorithms can create fake images and videos that humans cannot distinguish from
authentic ones. It is therefore essential to propose technologies for automatically detecting and
assessing the integrity of digital visual media. This project deals with the methods to detect
deep fakes in the literature to date. We present in-depth discussions on the challenges, research
trends, and orientations related to deep fake technologies. Deep fake detection methods were
proposed as soon as this threat was introduced. The early attempts were based on handcrafted
features obtained from artifacts and inconsistencies in the fake video synthesis process. In this
project, we will apply deep learning techniques to automatically extract salient and
discriminative features to detect deep fakes. Detection of deep fakes is normally
considered a binary classification problem where classifiers are used to classify between
genuine and forged videos. This type of method requires a large database of real and fake
videos to train classification models. Although fake videos are increasingly available,
reference datasets for validating the various detection methods remain limited. By reviewing
the knowledge of deep fakes and state-of-the-art deep fake detection methods, this study
provides a complete overview of deep fake techniques and facilitates a new and more robust
method to deal with the increasingly challenging deep fakes.
LIST OF FIGURES

Fig No.   Title                                                     Page No.
2.1       Deep fake principle                                       5
2.2       Example of Deep fake image                                6
5.1       System Architecture                                       13
5.2       The Network Architecture of Meso4                         15
7.1       Convolutional and Hidden Layers of Meso4 Network Model    21
LIST OF SCREENS

Screen No.   Title                           Page No.
7.1          Correct_real Images             25
7.2          Misclassified_real Images       26
7.3          Correct_deepfake Images         27
7.4          Misclassified_deepfake Images   27
SYMBOLS & ABBREVIATIONS

Acronym   Expansion
CV        Computer Vision
CNN       Convolutional Neural Network
DARPA     Defense Advanced Research Projects Agency
AI        Artificial Intelligence
DF        Deep Fakes
IDE       Integrated Development Environment
RAM       Random Access Memory
CHAPTER 1
INTRODUCTION
1.1 MOTIVATION
Digital video is commonly used by many organizations as evidence of crimes, and many
surveillance systems record information using cameras. Criminals sometimes tamper with
such footage. Deepfake videos are also created to spread fake news around the world,
leading to political and economic damage. To supply a satisfactory solution to this
problem, the concept of fake video detection is introduced. In this concept, we use a set of
techniques to detect whether images/videos are deepfake or real.
1.2 PROBLEM STATEMENT
Cybercriminals are using image processing tools and techniques to commit a variety of
crimes, including image modification and fabrication using cheap fakes and deepfake
videos/images.
The solution should focus on helping the image/video verifier or examiner identify and
differentiate a fabricated image/video from an original one.
1.3 LIMITATIONS OF EXISTING SYSTEM
The existing system provides a group of statistical tools for detecting traces of digital
tampering in the absence of any digital watermark or signature. It characterizes the
statistical correlations that result from specific forms of digital tampering and devises
detection schemes to reveal these correlations. Tools in the same spirit reveal statistical
correlations resulting from the variety of manipulations typically needed to create a digital
forgery, and the sensitivity and robustness of each scheme to counter-attack is analyzed.
Digital forensic techniques of this kind are designed to identify digital forgeries even when
the forgery is perceptually undetectable by humans.
Dept of CSE
1
TKRCET
1.4 PROPOSED SYSTEM
Our project aims at detecting realistic human-synthesized videos, popularly known as
deep fakes, building on the detection methods reported in the literature to date. In this
project, we apply deep learning techniques to automatically extract salient and
discriminative features to detect deep fakes. Deepfake detection is generally treated as a
binary classification problem in which classifiers are used to distinguish authentic
videos from tampered ones. This kind of method requires a large database of real and fake
images/videos to train classification models.
Traditional image forensics methods can be classified according to the image
features that they target, such as local noise estimation, pattern analysis, illumination
modeling, and feature classification. However, with the deep learning breakthrough, the
computer vision (CV) community has radically steered towards neural network techniques.
For example, the recent works are based on Convolutional Neural Networks (CNN). These
CNN-based approaches also aim to capture the aforementioned image features, but in an
explicit way.
We propose a technique that uses Meso4, a convolutional neural network. MesoNet is a
convolutional neural network designed specifically to detect deepfakes. We use MesoNet
to make predictions on image data, classifying each image as real or deepfake.
CHAPTER 2
HOW DEEP FAKE WORKS
2.1 DEEP FAKE INTRODUCTION
Deep fake is a technique that can superimpose face images of a target person onto a video
of a source person to create a video of the target person doing or saying the things the
source person does. Deep learning models such as autoencoders and generative adversarial
networks are applied widely in the computer vision domain to solve various problems.
These models are also used in deep fake algorithms to examine the facial expressions and
movements of a person and synthesize facial images of another person making analogous
expressions and movements.
Deep fake algorithms normally require a large amount of image and video data to train
models that create photo-realistic images and videos. Because public figures such as
celebrities and politicians have a large number of videos and images available online,
they are the initial targets of deep fakes.
Deep fakes have been used to swap the faces of celebrities or politicians onto bodies in
porn images and videos. The first deep fake video emerged in 2017, in which the face of a
celebrity was swapped onto that of a porn actor.
Deep fakes threaten world security when they are employed to create videos of world
leaders giving fake speeches for falsification purposes. Deep fakes can aggravate political
or religious tensions between countries, fool the public and affect results in election
campaigns, or create chaos in financial markets by fabricating a piece of fake news.
Deepfake techniques have also been used to generate fake satellite images of the Earth
containing objects that do not exist, in order to confuse military analysts, e.g., creating a
fake bridge over a river although there is no such bridge in reality, misleading troops who
are guided to cross the bridge in a battle.
There are also positive uses of deep fakes, such as recreating voices for those who have
lost theirs or updating episodes of movies without reshooting them. However, the malicious
uses of deep fakes largely dominate the positive ones. The development of advanced deep
networks and the availability of large amounts of data may help make forged images and
videos almost indistinguishable to humans
and even to elaborate computer algorithms. The process of creating such manipulated
images and videos is very simple today, as it needs as little as an identity photo or a short
video of the target person.
Less and less effort is required to produce stunningly convincing tampered footage;
the technique can even create a deep fake from a single still image. Deepfakes are a threat
not only to public figures but also to ordinary people. For example, a deep fake voice was
used to scam a CEO out of $243,000. The recent release of the software called DeepNude
showed an even more disturbing threat: transforming a person into non-consensual porn.
Likewise, the Chinese app Zao has gone viral lately, as even unskilled people can swap
their faces onto the bodies of movie stars and insert themselves into popular movies, TV
clips, and short videos. These forms of falsification create a huge threat of privacy and
identity violation and affect many aspects of human life. The critical part is finding the
truth in the digital sector. This is even more challenging with deep fakes, as they are mostly
used to serve harmful purposes and almost anyone can create deep fakes these days using
existing deep fake tools. Thus far, many methods have been proposed to detect deep fakes.
Most are based on deep learning, so a clash between malicious and positive uses of deep
learning methods has been arising. To address the threat of face-swapping technology or
deep fakes, the United States Defense Advanced Research Projects Agency (DARPA)
initiated a research scheme in media forensics (named Media Forensics, or MediFor) to
stimulate the development of methods for detecting fake digital visual media.
Recently, Facebook Inc., teaming up with Microsoft Corp. and the Partnership on AI
coalition, launched the Deepfake Detection Challenge to spur research and development
in detecting and stopping deep fakes from being used to deceive viewers. This chapter
surveys methods for creating as well as detecting deep fakes. Section 2.2 presents the
principles of deep fake algorithms and how deep learning has been used to create or
enable such disruptive technologies.
2.2 DEEPFAKE CREATION
Deepfake is a technique that aims to exchange the face of a targeted person with the face
of someone else in a video. It first appeared in autumn 2017 as a script used to generate
face-swapped adult content. Afterward, the technique was improved by a small community,
which notably created a user-friendly application called FakeApp.
The core idea lies in the parallel training of two autoencoders. Their architecture
can vary according to the output size, the desired training time, the expected quality
and the available resources. Traditionally, an autoencoder designates the chaining of an
encoder network and a decoder network. The encoder performs a dimension reduction by
encoding the data from the input layer into a reduced number of variables. The goal of the
decoder is then to use those variables to output an approximation of the original input.
The optimization phase is done by comparing the input and its generated approximation
and penalizing the difference between the two, typically using an L2 distance. In the case
of the deep fake technique, the original autoencoder is fed with images of resolution
64×64×3 (12,288 variables), encodes those images into 1024 variables, and then generates
images of the same size as the input.
The process to create deep fake images is to collect aligned faces of two different
people A and B, then to train an autoencoder EA to reconstruct the faces of A from the
dataset of facial images of A, and an autoencoder EB to reconstruct the faces of B from the
dataset of facial images of B. The trick consists of sharing the weights of the encoding part
of the two autoencoders EA and EB while keeping their respective decoders separate. Once
the optimization is done, any image containing a face of A can be encoded through this
shared encoder but decoded with the decoder of EB. This principle is illustrated in Figures
2.1 & 2.2.
Fig 2.1 Deepfake principle. Top: the training parts with the shared encoder in yellow.
Bottom: the usage part where images of A are decoded with the decoder of B
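The shared-encoder principle can be sketched in Keras. This is a minimal illustrative sketch, not the FakeApp implementation: the single dense code layer and dense decoders are our assumptions, chosen only to mirror the 64×64×3 input and 1024-variable encoding quoted above.

```python
# Sketch of the deepfake shared-encoder idea: one encoder shared between two
# autoencoders, with a separate decoder per identity (illustrative layers only).
import numpy as np
from tensorflow.keras import Input, Model, layers

inp = Input(shape=(64, 64, 3))                      # 64x64x3 = 12,288 input variables
code = layers.Dense(1024, activation='relu')(layers.Flatten()(inp))
encoder = Model(inp, code, name='shared_encoder')   # encoding part, shared by A and B

def make_decoder(name):
    # One decoder per person; these weights are NOT shared.
    z = Input(shape=(1024,))
    y = layers.Dense(64 * 64 * 3, activation='sigmoid')(z)
    return Model(z, layers.Reshape((64, 64, 3))(y), name=name)

decoder_A = make_decoder('decoder_A')
decoder_B = make_decoder('decoder_B')

# Training would fit autoencoder_A on faces of A and autoencoder_B on faces
# of B; because `encoder` is the same object, its weights are shared.
autoencoder_A = Model(inp, decoder_A(encoder(inp)))
autoencoder_B = Model(inp, decoder_B(encoder(inp)))

# The swap: encode a face of A with the shared encoder, decode with B's decoder.
face_of_A = np.random.rand(1, 64, 64, 3).astype('float32')
swapped = decoder_B(encoder(face_of_A)).numpy()
```

Keeping the decoders separate is what forces each decoder to memorize one identity's morphology while the shared encoder carries only pose, illumination, and expression.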
The intuition behind this approach is to have an encoder that captures general
information about the illumination, position and expression of the face, and a dedicated
decoder for each person that reconstitutes the characteristic shapes and details of that
person's face. This separates the contextual information on one side from the
morphological information on the other.
In practice, the results are impressive, which explains the popularity of the
technique. The last step is to take the target video, extract and align the target face from
each frame, use the modified autoencoder to generate another face with the same
illumination and expression, and then merge it back into the video.
Fig 2.2 Example of deepfake image. Original(left) and Deepfake(right)
Fortunately, this system is far from flawless. Basically, the extraction of faces and
their reintegration can fail, especially in the case of face occlusions: some frames can
end up with no facial reenactment, with a large blurred area, or with a doubled facial
contour. However, those technical errors can easily be avoided with more advanced
networks.
More fundamentally, and this is also true of other applications, autoencoders tend to
reconstruct fine details poorly because the input is compressed onto a limited encoding
space; the result thus often appears slightly blurry. A larger encoding space does not solve
the problem: while the fine details are certainly better approximated, the resulting face
loses realism because it tends to resemble the input face, i.e., morphological data are
passed to the decoder, which is an undesired effect.
2.3 OVERVIEW
Deep Fake is a type of artificial intelligence used to create convincing image, audio, and
video hoaxes. The term, which covers both the technology and the resulting fake
content, is a portmanteau of "deep learning" and "fake". There are positive uses for deep
fake technology, like making digital voices for people who have lost theirs or updating
movie footage instead of reshooting it when actors trip over their lines. There has been
tremendous progress in the quality of deep fakes since the first products of the technology
spread only two or three years ago. Since that time, many of the scariest examples of
artificial intelligence (AI)-enabled deep fakes have had technology leaders, governments,
and the media talking about the dangers they could create for communities.
CHAPTER 3
LITERATURE SURVEY
The explosive growth of deep fake video and its illegal use is a major threat to public trust,
justice, and democracy. Due to this, there is an increasing demand for fake video analysis,
detection, and intervention.
Reference 1:
Title: MesoNet: a Compact Facial Video Forgery Detection Network
Author Names: Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen
Description: This paper presents a method to automatically and efficiently detect face
tampering in videos, focusing particularly on two recent techniques used to generate
hyper-realistic forged videos: Deepfake and Face2Face. Traditional image forensics
techniques are usually not well suited to videos due to the compression that strongly
degrades the data. Thus, this paper follows a deep learning approach and presents two
networks, Meso4 and MesoInception4, both with a low number of layers so as to focus
on the mesoscopic properties of images.
Reference 2:
Title: Exposing DeepFake Videos By Detecting Face Warping Artifacts
Author Names: Yuezun Li, Siwei Lyu
Description: This work describes a new deep learning based method that can
effectively distinguish AI-generated fake videos (DeepFakes) from real videos. It detects
warping artifacts by comparing the generated face areas and their surrounding regions
with a dedicated convolutional neural network model. The method is predicated on the
observation that current DF algorithms can only generate images of limited resolutions,
which then need to be further transformed to match the faces to be replaced in the source
video.
Reference 3:
Title: Deepfake Video Detection using Neural Networks
Author Names: Abhijit Jadhav, Abhishek Patange, Jay Patel, Hitendra Patil, Manjushri
Mahajan
Description: This paper discusses how freely available deep learning based software tools
have facilitated the creation of convincing face exchanges in videos that leave few traces
of manipulation, called "DeepFakes" (DF). Recent advances in deep learning have led to
a drastic increase in the realism of fake content and the ease with which it can be created.
Creating a deepfake is easy, but detecting a DF is a major challenge. The authors detect
DF using a convolutional neural network combined with a recurrent neural network: the
CNN extracts features at the frame level, and these features are used to train an RNN that
learns to classify whether a video is fake or real.
Reference 4:
Title: Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos
Author Names: Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Description: This paper discusses an approach that uses a capsule network to detect
various kinds of spoofs, from replay attacks using printed images or recorded videos to
computer-generated videos produced with deep convolutional neural networks (CNNs).
Advancing techniques in deepfake creation are causing a huge problem, leaving people
in a dilemma over how to tell real news from fake. In our project, a deep learning
technique is used to determine whether the data is real or fake. We took a large database
of deepfake and real images and make predictions on the image data using the Meso4
model: a convolutional neural network with four convolutional blocks followed by one
fully-connected hidden layer.
CHAPTER 4
REQUIREMENT ANALYSIS
In this phase, requirements are gathered and analyzed. Meetings with the users determine
requirements such as: Who is going to use the system? How will they use the system?
What data should be input into the system? What data should be output by the system?
These are the general questions that get answered during the requirement gathering phase,
and they specify the requirements that our project should achieve.
After requirement gathering, these requirements are analyzed for their validity, and
the feasibility of incorporating them in the system to be developed is also studied. As a
basis, an article on the different requirements for software development was taken into
account during the process.
4.1 FUNCTIONAL REQUIREMENTS
These are the requirements that the end-user specifically requests as basic facilities
that the system should provide. All these functions must be included in the system as part
of the contract. These are represented or indicated in the form of an input to the system, the
operation carried out and the expected output. These are the requirements stated by the user
which one can see directly in the final product, unlike the non-functional requirements.
● Dataset Collection
● Training dataset
● Testing dataset
● User video upload
● Pre-processing
● Data Loader
● ResNeXt CNN for Feature Generation
● LSTM for Sequence Processing
● Prediction
4.2 NON-FUNCTIONAL REQUIREMENTS
These are basically the quality constraints that the system must satisfy according to
the project contract. The priority or extent to which these factors are implemented varies
from one project to another. They are also called non-behavioral requirements
They basically deal with issues like:
● Portability
● Security
● Maintainability
● Reliability
● Scalability
● Performance
● Reusability
● Flexibility
4.3 SOFTWARE REQUIREMENTS
Software requirements should include both a definition and a specification of
requirements. The software requirements are providing a basis for creating the software
specification. Software requirements are useful in estimating cost, planning team activities,
performing functions and tracing the teams and tracing the team’s progress throughout the
development activity.
OS                              : Windows/Linux/Mac
Programming Language/Platform   : Python
IDE                             : Google Colab (Web IDE)
Python Libraries                : OpenCV, Matplotlib, TensorFlow
4.4 HARDWARE REQUIREMENTS
The hardware requirements may serve as the basis of a contract for the implementation
of the system and should therefore be a complete and consistent specification of the whole
system. Software engineers use them as the starting point for system design.
Processor   : Intel i3 and above
RAM         : 8 GB and higher
Hard Disk   : 500 GB minimum
CHAPTER 5
DESIGN
5.1 SYSTEM ARCHITECTURE
Fig 5.1 System Architecture
5.1.1 Data Sets
We collected a dataset named deepfake_detection, which consists of deepfake and real
image folders. The dataset is large, consisting of 7,104 images; the major part is used for
training the model and the minor part for testing it.
5.1.2 Pre-processing
Dataset pre-processing includes splitting each video into frames (images), followed by
face detection and cropping each frame to the detected face. To keep the number of
images uniform, the average frame count of the video dataset is calculated and a new
processed, reframed dataset containing the images is created. Frames that do not contain
faces are ignored during pre-processing.
Due to the unavailability of the required GPU, we proceeded with our project using image
datasets. We collected a large dataset of deepfake and real images.
5.1.3 Model
The Data Loader loads the pre-processed, face-cropped images and splits them into a
training set and a test set. The images are then passed to the model for training and
mini-batch testing.
5.1.4 Prediction
Our system takes a batch of images as input and makes predictions on the image data,
predicting whether each image is deepfake or real.
5.2 MESO4
This network begins with a sequence of four layers of successive convolutions and pooling,
and is followed by a dense network with one hidden layer. To improve generalization, the
convolutional layers use ReLU activation functions that introduce non-linearities and Batch
Normalization to regularize their output and prevent the vanishing gradient effect, and the
fully-connected layers use Dropout to regularize and improve their robustness.
Sigmoid function:
The sigmoid is a logistic function, a non-linear activation function. The main reason we
use the sigmoid function is that its value lies between 0 and 1.
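As a quick illustration, the sigmoid squashes any real input into (0, 1), which is why the final one-unit sigmoid output of the network can be read as a real-vs-deepfake score:

```python
# Logistic sigmoid: sigma(x) = 1 / (1 + e^(-x)), always strictly between 0 and 1.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and sigmoid(0) is exactly 0.5.
```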
Fig 5.2 The network architecture of Meso-4. Layers and parameters are displayed in the
boxes, output sizes next to the arrows.
CHAPTER 6
CODING
6.1 DATASETS DOWNLOAD
To begin with, let's import our datasets.
#download the dataset deepfake_database.zip
#!gdown https://e.pcloud.link/publink/show?code=XZnsxkZkEAgI1OgQIJHLnNl9ErhV4vpHuV0
In our project we use the datasets in the Validations folder, which consists of deepfakes
and real folders. Download all the images in the two folders and upload these folders to
Google Drive.
6.2 CODING – MESONET NEURAL NETWORK
Mount the Google Drive in which the dataset is present.
Import all the required libraries.
The image dimensions are height, width, and channels (red, green, blue).
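A sketch of this setup, assuming Google Colab and a 256×256 RGB input (the drive-mount lines are Colab-specific and therefore commented out, and the dimensions are our assumption for this model):

```python
# Colab-specific drive mount (commented out so the sketch runs anywhere):
# from google.colab import drive
# drive.mount('/content/drive')

# The required libraries would be imported here (illustrative subset).
import numpy as np

# Image dimensions: height, width, channels (red, green, blue).
IMG_HEIGHT, IMG_WIDTH, CHANNELS = 256, 256, 3
image_shape = (IMG_HEIGHT, IMG_WIDTH, CHANNELS)
```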
Creating a classifier class.
Creating the MesoNet network, which consists of four successive convolutional layers
with pooling, followed by a dense network with one hidden layer.
Instantiating the MesoNet model with pretrained weights.
Rescaling pixel values (between 0 and 255) to a range between 0 and 1, and instantiating
a generator to feed images through the network.
Generating class indices: two classes were found, Deepfake with index 0 and Real with
index 1.
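The rescaling and generator steps can be sketched as follows. The directory name and target size in the commented call are assumptions based on the dataset description; the rescaling itself is demonstrated directly on a synthetic array.

```python
# ImageDataGenerator(rescale=1./255) maps raw pixel values (0-255) into [0, 1].
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)

# Feeding images through the network would use flow_from_directory, which also
# builds the class indices from folder names, e.g. {'deepfake': 0, 'real': 1}:
# generator = datagen.flow_from_directory('validation/', target_size=(256, 256),
#                                         class_mode='binary')

# The rescaling alone, checked on a synthetic pure-white 4x4 RGB image:
raw = np.full((4, 4, 3), 255.0)
scaled = datagen.standardize(raw.copy())   # every 255.0 becomes ~1.0
```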
Doing predictions on image data.
Creating separate lists for correctly classified and misclassified images: correct_real,
misclassified_real, correct_deepfake, and misclassified_deepfake.
Generating predictions on the validation set and storing them in the separate lists. Our
database consists of 7,104 images, and predictions are made on all of them.
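The sorting logic behind those four lists can be sketched in plain Python. The 0.5 threshold follows from the class indices above ({'deepfake': 0, 'real': 1}); the image names and scores here are illustrative stand-ins for the model's actual outputs.

```python
# Sort each prediction into one of the four lists, based on the image's true
# class and whether the thresholded model output matches it.
correct_real, misclassified_real = [], []
correct_deepfake, misclassified_deepfake = [], []

def sort_prediction(image, score, true_label):
    """score: model output in (0, 1); true_label: 0 = deepfake, 1 = real."""
    predicted = 1 if score >= 0.5 else 0
    if true_label == 1:
        (correct_real if predicted == 1 else misclassified_real).append(image)
    else:
        (correct_deepfake if predicted == 0 else misclassified_deepfake).append(image)

sort_prediction('img_a', 0.91, 1)   # real image, predicted real
sort_prediction('img_b', 0.22, 1)   # real image, wrongly predicted deepfake
sort_prediction('img_c', 0.07, 0)   # deepfake, predicted deepfake
```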
Using a plotter function, which takes a batch of images as input and displays the
predictions made on them.
CHAPTER 7
IMPLEMENTATION AND RESULTS
7.1 IMPLEMENTATION
7.1.1 Convolution in Convolutional Neural Networks:
Below are the layers of the Meso4 model, which has four convolutional layers and one
hidden dense layer.
Fig 7.1 Convolutional and Hidden layers of Meso4 Network Model.
The convolutional neural network, or CNN for short, is a specialized type of neural network
model designed for working with two-dimensional image data, although it can also be used
with one-dimensional and three-dimensional data.
Central to the convolutional neural network is the convolutional layer that gives the
network its name. This layer performs an operation called a "convolution". In the context
of a convolutional neural network, a convolution is a linear operation that involves the
multiplication of a set of weights with the input, much like a traditional neural network.
Given that the technique was designed for two-dimensional input, the multiplication is
performed between an array of input data and a two-dimensional array of weights, called
a filter or a kernel.
The filter is smaller than the input data, and the type of multiplication applied
between a filter-sized patch of the input and the filter is a dot product: an element-wise
multiplication between the filter-sized patch of the input and the filter, which is then
summed, always resulting in a single value. Because it results in a single value, the
operation is often referred to as the "scalar product". Using a filter smaller than the input
is intentional, as it allows the same filter (set of weights) to be multiplied by the input
array multiple times at different points on the input. Specifically, the filter is applied
systematically to each overlapping filter-sized patch of the input data, left to right,
top to bottom. This systematic application of the same filter across an image is a powerful
idea: if the filter is designed to detect a specific type of feature in the input, then applying
it systematically across the entire input image gives the filter an opportunity to discover
that feature anywhere in the image. This capability is commonly referred to as translation
invariance, i.e., a general interest in whether the feature is present rather than where it is
present.
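The filter-sized dot product described above can be made concrete with NumPy; the input and kernel values here are made up purely for illustration.

```python
# 'Valid' 2D convolution (strictly, cross-correlation, as used in CNNs): slide
# the kernel over the image and take one dot product at each placement.
import numpy as np

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]     # filter-sized patch of the input
            out[i, j] = np.sum(patch * kernel)    # element-wise multiply, then sum
    return out

image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])      # sums each 2x2 patch's main diagonal
result = conv2d_valid(image, kernel)   # 2x2 output, one scalar per placement
```

Each placement of the kernel collapses a 2×2 patch to a single value, which is exactly the "single value per patch" behaviour described above.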
x1 = Conv2D(8, (3, 3), padding='same', activation = 'relu')(x)
x1 = BatchNormalization()(x1)
x1 = MaxPooling2D(pool_size=(2, 2), padding='same')(x1)
x2 = Conv2D(8, (5, 5), padding='same', activation = 'relu')(x1)
x2 = BatchNormalization()(x2)
x2 = MaxPooling2D(pool_size=(2, 2), padding='same')(x2)
x3 = Conv2D(16, (5, 5), padding='same', activation = 'relu')(x2)
x3 = BatchNormalization()(x3)
x3 = MaxPooling2D(pool_size=(2, 2), padding='same')(x3)
x4 = Conv2D(16, (5, 5), padding='same', activation = 'relu')(x3)
x4 = BatchNormalization()(x4)
x4 = MaxPooling2D(pool_size=(4, 4), padding='same')(x4)
y = Flatten()(x4)
y = Dropout(0.5)(y)
y = Dense(16)(y)
y = LeakyReLU(alpha=0.1)(y)
y = Dropout(0.5)(y)
y = Dense(1, activation = 'sigmoid')(y)
Conv2D: Conv2D is a 2D convolutional layer, which creates a convolution kernel that is
convolved with the layer input to produce a tensor of outputs. The mandatory Conv2D
parameter is the number of filters that the convolutional layer will learn. It is an integer
value and determines the number of output filters in the convolution. In the first Conv2D
layer we used 8 filters.
Kernel: In image processing, a kernel is a convolution matrix that can be used for
blurring, sharpening, embossing, edge detection, and more, by convolving the kernel with
an image. The (3, 3) argument determines the dimensions of the kernel: it is a tuple of 2
integers specifying the height and width of the 2D convolution window.
Padding: The padding parameter of the Keras Conv2D class can take one of two values: 'valid'
or 'same'. By setting the value to 'same', we preserve the spatial dimensions of the volume
so that the output volume size matches the input volume size.
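The effect of the two padding modes on output size can be illustrated with the standard formulas (a sketch; the ceiling division follows the convention Keras uses):

```python
import math

def conv_output_size(n, k, stride=1, padding='valid'):
    """Spatial output size of a convolution along one dimension.
    'same' preserves the size at stride 1; 'valid' shrinks it by k - 1."""
    if padding == 'same':
        return math.ceil(n / stride)
    return math.ceil((n - k + 1) / stride)

print(conv_output_size(28, 3, padding='valid'))  # 26
print(conv_output_size(28, 3, padding='same'))   # 28
```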
Activation: The activation parameter of the Conv2D class is simply a convenience
parameter that lets you supply a string naming the activation function to apply after
performing the convolution.
Batch Normalization: It allows every layer of the network to learn more independently
by normalizing the output of the previous layer. With batch normalization, learning
becomes more efficient, and it can also act as regularization to avoid overfitting the
model.
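A minimal NumPy sketch of the core computation in batch normalization (without the learned scale and shift, gamma and beta, that Keras adds):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch axis to zero mean, unit variance."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 10.0],
                  [3.0, 30.0]])
out = batch_norm(batch)
print(out.mean(axis=0))  # ~[0, 0]
print(out.std(axis=0))   # ~[1, 1]
```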
Maxpooling: Max pooling reduces the spatial dimensions of the output volume. In the
pooling layer we significantly reduce the dimensionality of the data, which greatly speeds
up computation. Max pooling reduces a region of pixel values to that region's maximum
value.
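The 2x2 max-pooling used in the model above can be sketched in NumPy (an illustration only; Keras's MaxPooling2D handles padding and strides internally):

```python
import numpy as np

def max_pool_2x2(x):
    """Reduce each non-overlapping 2x2 region to its maximum value."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]])
print(max_pool_2x2(x))  # [[4 8]
                        #  [9 7]]
```

Each 2x2 block of the 4x4 input collapses to its largest entry, halving both spatial dimensions.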
Dropout: To prevent overfitting we use a dropout layer. It randomly deactivates some
neurons during training, so backpropagation updates the network using only the active
neurons.
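A sketch of what a dropout layer does during training, using NumPy ("inverted" dropout, where the surviving activations are rescaled so the expected sum is unchanged; Keras's Dropout layer behaves the same way):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.5, training=True):
    """Zero a random fraction `rate` of activations; scale the survivors
    by 1/(1 - rate) so the expected output is unchanged."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

a = np.ones(10)
print(dropout(a, rate=0.5))  # roughly half the entries are 0, the rest are 2.0
```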
Sigmoid: The sigmoid is a logistic function, a non-linear activation function. The main
reason we use the sigmoid is that its output lies between 0 and 1, which suits the final
binary (real vs. fake) classification.
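The sigmoid's squashing behaviour in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```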
Let’s train our mesoscopic models. We’ll start with the Meso4 model.
The architecture is incredibly simple, involving just four sets of convolutions. Note that
as this model hasn’t been pre-trained, it’s reasonable that we train it over more iterations
to compensate — this was achieved by training at a learning rate of 0.002 for 30 epochs,
followed by another 20 epochs at a lower learning rate of 2E-4. This difference helps to
balance convergence with overall training time.
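The two-phase schedule described above (0.002 for 30 epochs, then 2e-4 for 20 more) can be written as a simple learning-rate function, which could be passed to a scheduler callback (a sketch of the schedule only; the exact training code is not reproduced here):

```python
def learning_rate(epoch):
    """Two-phase schedule: 0.002 for the first 30 epochs, then 2e-4
    for the remaining 20 epochs."""
    return 0.002 if epoch < 30 else 2e-4

print(learning_rate(0))   # 0.002
print(learning_rate(35))  # 0.0002
```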
With a validation accuracy approaching 70%, this is a good starting point. The difference
in performance can be attributed to a multitude of factors, but the latter network's use of
so-called "Inception" blocks stands out. Essentially, these self-contained blocks run
convolutional layers with small filters in parallel, with the results pooled and
concatenated at the end of the block.
7.2 RESULTS AND OUTPUT SCREENS
We created four lists,
 correct_real
 misclassified_real
 correct_deepfake
 misclassified_deepfake.
The plotter function takes a batch of images as input and runs predictions on them. For
the first output it selects a batch of images from the real dataset, runs predictions, and
returns the images whose prediction values lie between 0.6000 and 1.
Output Screen1: Correct_real images
It selects a batch of images from the real dataset and runs predictions. It returns the
misclassified_real images, whose prediction values lie between 0.0001 and 0.5999.
Output Screen 2: Misclassified_real images
It selects a batch of images from the fake dataset and runs predictions. It returns the
fake images whose prediction values lie between 0.0001 and 0.5999.
Output Screen 3: Correct_fake images
It selects a batch of images from the fake dataset and runs predictions. It returns the
misclassified_fake images, whose prediction values lie between 0.6000 and 1.
Output Screen 4: Misclassified_fake images
Our model achieves about 70% accuracy, so some images may be misclassified.
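The bucketing logic behind the four lists can be sketched as follows, assuming the 0.6 threshold quoted above for real images and treating fake-set predictions below 0.6 as correct (the names follow the text; the exact project code may differ):

```python
def bucket(prediction, is_real, threshold=0.6):
    """Assign one prediction to one of the four result lists."""
    if is_real:
        return 'correct_real' if prediction >= threshold else 'misclassified_real'
    return 'correct_deepfake' if prediction < threshold else 'misclassified_deepfake'

print(bucket(0.83, is_real=True))   # correct_real
print(bucket(0.41, is_real=True))   # misclassified_real
print(bucket(0.12, is_real=False))  # correct_deepfake
```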
CHAPTER-8
TESTING
Software testing is the evaluation of the software against the requirements gathered from
users and the system specifications. Testing is conducted according to the software
development life cycle. Software is validated and verified during the process of testing.
8.1 SOFTWARE VALIDATION
Software validation is the process of examining whether the software satisfies the
user requirements. It is carried out at the end of the software development life cycle.
 Validation ensures the product under development meets the user requirements.
 Validation answers the question: "Are we developing the product that delivers
everything the user needs from this software?"
 Validation emphasizes user requirements.
8.2 SOFTWARE VERIFICATION
Verification is the process of confirming that the software meets the business
requirements and is developed according to the proper specifications and methodologies.
Simply running the software cannot verify this (for example, how can anyone know whether
the architecture or design is correctly implemented by running the software?); only by
reviewing the associated artifacts can someone conclude that the specifications are met.
 Verification ensures the product being developed conforms to the design
specifications.
 Verification concentrates on the design and system specifications.
8.3 TARGET OF THE TEST ARE
The target of testing is to find the errors, faults, and failures in the software.
 Errors - These are actual coding mistakes made by developers. A difference between
the software's output and the desired output is also considered an error.
 Fault - A fault occurs when an error exists. A fault, also known as a bug, is the
result of an error and can cause the system to fail.
 Failure - A failure is the inability of the system to perform the desired task.
Failures mostly occur due to faults in the system.
 Software testing is further classified into different types such as unit testing,
integration testing, system testing, and regression testing.
8.4 BLACK-BOX TESTING
Black-box testing is a software testing method in which the internal structure,
design, and implementation of the item being tested are not known to the tester. It is
carried out to test the functionality of the program and is also called 'behavioural'
testing. The tester in this case has a set of input values and the corresponding desired
results. On providing an input, if the output matches the desired result, the program is
tested 'ok', and is problematic otherwise. In this method, the design and structure of the
code are not known to the tester; testing engineers and end users conduct this test on the
software.
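A black-box test needs only inputs and expected outputs, with no knowledge of internals. A hypothetical example in plain assertions (the `classify` function here is a stand-in for the system under test, not project code):

```python
def classify(prediction):
    """Stand-in system under test: maps a prediction score to a label."""
    return 'real' if prediction >= 0.6 else 'fake'

# Black-box test: known inputs, expected outputs, no peeking inside classify().
test_cases = [(0.95, 'real'), (0.60, 'real'), (0.10, 'fake')]
for value, expected in test_cases:
    assert classify(value) == expected, (value, expected)
print('all black-box cases passed')
```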
8.5 WHITE-BOX TESTING
White-box testing is a software testing method in which the internal structure,
design, and implementation of the item being tested are known to the tester. It is
conducted to test the program and its implementation, in order to improve code efficiency
or structure. It is also known as 'structural' testing. In this method, the design and
structure of the code are known to the tester, and the programmers of the code conduct
this test.
Some of the white box testing techniques are:
 Control-flow testing - The purpose of control-flow testing is to set up test cases
that cover all statements and branch conditions. The branch conditions are tested
for both true and false, so that all statements can be covered.
 Data-flow testing - This technique emphasizes covering all the data variables used
in the program. It tests where the variables were declared and defined and where
they were used or changed.
8.6 TESTING LEVELS
Tests are grouped together based on where they are added in the SDLC or by the level
of detail they contain. In general, there are four levels of testing: unit testing,
integration testing, system testing, and acceptance testing. The purpose of testing levels
is to make software testing systematic and to easily identify all possible test cases at a
particular level.
There are many different testing levels that help check the behavior and
performance of the software. These levels are designed to recognize missing areas and to
reconcile the states of the development life cycle. SDLC models define phases such as
requirement gathering, analysis, design, coding or execution, testing, and deployment.
All these phases go through the process of software testing levels.
 Unit Testing
 Functional Testing
Functional testing is centered on the following items:
 Valid Input: identified classes of valid input must be accepted.
 Invalid Input: identified classes of invalid input must be rejected.
 Functions: identified functions must be exercised.
 Output: identified classes of application outputs must be exercised.
 Performance Testing
 Integration Testing
 System Testing
CHAPTER -9
ADVANTAGES, DISADVANTAGES, AND APPLICATIONS
9.1 ADVANTAGES
Though deep fakes are harmful to society, they also have a few advantages:
 They can be used to recreate the voices of deceased people.
 They attract significant attention online, making web pages carrying such content
popular on search engines such as Google, since many people search for such erotic
or sensational topics.
 Another factual advantage of deep fakes is that they make us aware that such fakes
exist, so we should not believe everything we see around us.
 One of the main applications of this technology is in the film industry.
9.2 DISADVANTAGES
 Rather than benefiting anyone, this artificial intelligence technology has
disadvantages that affect different segments of our society. Apart from creating fake
news and propaganda, deep fakes are mostly used for revenge porn to smear
notable celebrities.
 Until an official statement comes from the targeted personality, many people start
believing the fake, making that person's life difficult, especially when they are
criticized or attacked by their fans via social media platforms such as Facebook,
Twitter, or Instagram.
 Voice or image manipulation can defeat authentication processes. Evidence in
criminal proceedings can be forged (tampering with the initial situation, activities,
and so on), body movements can be manipulated, and identity theft can occur.
9.3 APPLICATIONS
Pornography:
 Deep fakes are used for revenge porn to defame notable celebrities.
 Until an official statement comes from the targeted personality, many people start
believing the fake.
Morphing:
⮚ Computer-generated special effects for sound, image, and video recordings.
⮚ Calculation of intermediate changes between single images or sounds.
The complex process consists of
1. Warping
2. Tweening (intermediate pictures)
3. Cross-dissolving
CHAPTER 10
CONCLUSION AND FUTURE ENHANCEMENT
10.1 CONCLUSION
Although there are a few techniques for detecting deep fakes, we cannot completely rely on
these methods forever. With new deep fakes being created every day, it will become much
harder to detect them. Even with blockchain technologies in the near future, blockchains
are vulnerable to sophisticated cyberattacks that can compromise their integrity and
reliability.
Deep fakes have begun to erode people's trust in media content, as seeing is no
longer believing. They can cause distress and harm to those targeted, increase
misinformation and hate speech, and even spark political tension, inflame the public, or
incite violence or war. Deep learning techniques such as CNNs are increasingly used for
the detection of deepfake videos. The proposed solution, however, achieves higher
efficiency in narrow areas of application, such as a small business enterprise or an
individual user, and concentrates on running in less time with access to limited resources
(such as no internet access, or 8 GB of RAM or less). But the proposed solution can only
help with the existing 7,104 images (extracted from 175 videos); if new deep fakes are
created, this solution might not give the best possible outcome.
10.2 FUTURE SCOPE FOR ENHANCEMENT
In the future, blockchain technology could be used for better deep fake detection.
Blockchain can help detect deep fakes because images or videos can be cryptographically
signed by multiple parties at the source of origin. A cryptographic hash can be assigned
to any video at the time of recording. Because of blockchain's immutability, the hash
data, once entered, cannot be modified. For every instance of video uploading, editing,
and downloading, a smart contract can be written after validation by the original parties.
This ensures the integrity of the video and improves traceability.
The hash data can be compared to the source at every stage of the video's life. If
there is any mismatch between the two datasets, we know that the video has been altered.
Consider an example where police officers and investigators use video cameras to record
crime-scene details. The video is assigned unique hash data (which in this case is the
fingerprints) of every person present. This data is written to the blockchain as a smart
contract with validation from each member. Each download, upload, and share instance is
then checked against the original data to verify its authenticity. Thus, video
manipulation cases can be significantly minimized using blockchain technology.
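The hash-comparison step described above can be illustrated with Python's hashlib (a sketch of the idea only; a real system would record the hash on a blockchain rather than in an in-memory dictionary):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 hash assigned to the video at recording time."""
    return hashlib.sha256(data).hexdigest()

# At recording time the hash is stored immutably (here: a plain dict).
ledger = {'crime_scene.mp4': fingerprint(b'original video bytes')}

def is_unaltered(name: str, data: bytes) -> bool:
    """Compare the current hash with the one stored at the source."""
    return ledger[name] == fingerprint(data)

print(is_unaltered('crime_scene.mp4', b'original video bytes'))  # True
print(is_unaltered('crime_scene.mp4', b'tampered video bytes'))  # False
```

Any edit to the video bytes changes the SHA-256 digest, so a mismatch against the ledger reveals that the video has been altered.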
REFERENCES
1. Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen. MesoNet: a
Compact Facial Video Forgery Detection Network.
2. Yuezun Li, Siwei Lyu. Exposing DeepFake Videos By Detecting Face Warping
Artifacts.
3. Abhijit Jadhav, Abhishek Patange, Jay Patel, Hitendra Patil, Manjushri Mahajan.
Deepfake Video Detection using Neural Networks.
4. Huy H. Nguyen, Junichi Yamagishi, Isao Echizen. Using Capsule Networks to
Detect Forged Images and Videos.
5. J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Nießner. Face2Face:
Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pages 2387–2395, 2016.
6. Yuezun Li, Ming-Ching Chang, and Siwei Lyu. Exposing AI Created Fake Videos
by Detecting Eye Blinking.
7. An Overview of ResNet and its Variants:
https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
8. Datasets:
https://e.pcloud.link/publink/show?code=XZnsxkZkEAgI1OgQIJHLnNl9ErhV4vpHuV0
9. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556, 2014.
10. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov.
Dropout: A simple way to prevent neural networks from overfitting. The Journal of
Machine Learning Research, 15(1):1929–1958, 2014.
11. A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner.
FaceForensics: A large-scale video dataset for forgery detection in human faces.
arXiv preprint arXiv:1803.09179, 2018.
12. B. Bayar and M. C. Stamm. A deep learning approach to universal image
manipulation detection using a new convolutional layer.
13. D. E. King. Dlib-ml: A machine learning toolkit. Journal of Machine Learning
Research, 10:1755–1758, 2009.