Application-Aware Local-Global Source Deduplication
for Cloud Backup Services of Personal Storage
ABSTRACT
In personal computing devices that rely on a cloud storage environment for data
backup, an imminent challenge facing source deduplication for cloud backup
services is the low deduplication efficiency due to a combination of the
resource-intensive nature of deduplication and the limited system resources. In this
paper, we present ALG-Dedupe, an Application-aware Local-Global source
deduplication scheme that improves data deduplication efficiency by exploiting
application awareness, and further combines local and global duplicate detection to
strike a good balance between cloud storage capacity saving and deduplication
time reduction. We perform experiments via prototype implementation to
demonstrate that our scheme can significantly improve deduplication efficiency
over the state-of-the-art methods with low system overhead, resulting in a shortened
backup window, increased power efficiency and reduced cost for cloud backup
services of personal storage.
EXISTING SYSTEM
Data deduplication, although not traditionally considered backup software, can be
quite handy when backing up large amounts of data. The deduplication process
works by identifying unique chunks of data, removing redundant data, and making
the data easier to store. For example, if a marketing director sends a 10 MB
PowerPoint document to 500 people in a company, and each of those people saves
the document to their hard drive, the presentation takes up a collective 5 GB of
storage on the backup disk, tape, or server. With data deduplication, however,
only one instance of the document is actually saved, reducing the 5 GB of storage to
just 10 MB. When the document needs to be accessed, the computer pulls the one
copy that was initially saved. Deduplication drastically reduces the amount of
storage space needed to back up a server or system because the process is more
granular than other compression schemes. Instead of looking through entire files to
determine whether they are the same, deduplication segments data into blocks and
looks for repetition. Redundant blocks are removed from the backup, and more data
can be stored.
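As a rough illustration of the block-level process just described, the following Java sketch (a simplified example, not the system's actual code) splits a byte stream into fixed-size blocks, fingerprints each block with SHA-1, and keeps only blocks whose fingerprints have not been seen before; the block size and the in-memory map are assumptions made for brevity.

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class BlockDedupSketch {
    static final int BLOCK_SIZE = 4 * 1024;             // assumed fixed block size of 4 KB
    private final Map<String, byte[]> store = new HashMap<String, byte[]>(); // fingerprint -> block
    private long logicalBytes = 0;                       // bytes presented for backup
    private long physicalBytes = 0;                      // bytes actually kept after deduplication

    public void backup(byte[] data) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        for (int off = 0; off < data.length; off += BLOCK_SIZE) {
            int len = Math.min(BLOCK_SIZE, data.length - off);
            byte[] block = Arrays.copyOfRange(data, off, off + len);
            String fingerprint = new BigInteger(1, sha1.digest(block)).toString(16);
            logicalBytes += len;
            if (!store.containsKey(fingerprint)) {       // unseen block: keep one copy
                store.put(fingerprint, block);
                physicalBytes += len;
            }                                            // repeated block: nothing new is stored
        }
    }

    public double spaceSavingRatio() {
        return logicalBytes == 0 ? 0.0 : 1.0 - (double) physicalBytes / logicalBytes;
    }
}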
PROPOSED SYSTEM:
We propose an application-aware local-global source deduplication scheme that not
only exploits application awareness, but also combines local and global duplicate
detection, to achieve high deduplication efficiency: it reduces the deduplication
latency to nearly that of application-aware local deduplication while saving almost
as much cloud storage cost as application-aware global deduplication. Our
application-aware deduplication design is motivated by a systematic deduplication
analysis on personal storage.
Advantage:
High deduplication efficiency is achieved by reducing the deduplication latency to
as low as that of application-aware local deduplication while saving as much cloud
storage cost as application-aware global deduplication; the sketch below illustrates
the lookup order that makes this possible.
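The following Java sketch shows the local-then-global lookup order described above; the interface and class names are illustrative assumptions, not the paper's actual API. A chunk fingerprint is first checked against a small on-device index, and the remote cloud index is consulted only on a local miss, so most lookups finish at local-deduplication latency while each unique chunk is still stored in the cloud only once.

public class LocalGlobalDedupe {

    // Minimal abstraction over a fingerprint index; real implementations would be an
    // on-device index on the client and a key-value store in the cloud.
    public interface FingerprintIndex {
        boolean contains(String fingerprint);
        void add(String fingerprint);
    }

    private final FingerprintIndex localIndex;   // per-application index on the personal device
    private final FingerprintIndex globalIndex;  // shared index maintained in the cloud

    public LocalGlobalDedupe(FingerprintIndex localIndex, FingerprintIndex globalIndex) {
        this.localIndex = localIndex;
        this.globalIndex = globalIndex;
    }

    // Returns true if the chunk is new and its data must be uploaded to the cloud.
    public boolean isUniqueChunk(String fingerprint) {
        if (localIndex.contains(fingerprint)) {
            return false;                        // duplicate detected locally: the cheapest path
        }
        localIndex.add(fingerprint);             // remember it so future local lookups hit
        if (globalIndex.contains(fingerprint)) {
            return false;                        // duplicate detected across users or devices
        }
        globalIndex.add(fingerprint);
        return true;                             // globally unique: transfer the chunk data
    }
}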
FEATURES:
1. In personal computing devices that rely on a cloud storage environment for
data backup, deduplication must be performed at the source with limited system
resources.
2. Data deduplication, an effective data compression approach that exploits data
redundancy, partitions large data objects into smaller parts, called chunks, and
represents these chunks by their fingerprints.
3. Each uploaded file is converted into binary form. Computers store all
characters as numbers in binary; binary code uses the digits 0 and 1 to represent
computer instructions or text.
4. When a user uploads the same file again, the previously uploaded data is
checked and the duplicate is not uploaded (deduplication), so cloud space is not
wasted.
5. An uploaded file is not stored in the cloud directly; it is first sent to the cloud
admin, and only after the admin approves it is the file stored in the cloud and
made available for download.
6. Depending on whether a file type is compressed, and on whether SC (static
chunking) can outperform CDC (content-defined chunking) in deduplication
efficiency, we divide files into three main categories: compressed files, static
uncompressed files, and dynamic uncompressed files, as sketched in the example
after this list.
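The Java sketch below illustrates item 6; the extension lists and the mapping from each category to a chunking method (whole-file handling for compressed files, static chunking for static uncompressed files, content-defined chunking for dynamic uncompressed files) are illustrative assumptions based on the description above, not an exhaustive policy.

public class FileClassifier {

    public enum Category { COMPRESSED, STATIC_UNCOMPRESSED, DYNAMIC_UNCOMPRESSED }

    // Classify a file by name; real application awareness would use richer metadata.
    public static Category classify(String fileName) {
        String name = fileName.toLowerCase();
        if (name.endsWith(".zip") || name.endsWith(".jpg") || name.endsWith(".mp3")
                || name.endsWith(".mp4")) {
            return Category.COMPRESSED;           // already compressed: little redundancy inside the file
        }
        if (name.endsWith(".exe") || name.endsWith(".dll") || name.endsWith(".vmdk")) {
            return Category.STATIC_UNCOMPRESSED;  // rarely edited in place: static chunking (SC) is enough
        }
        return Category.DYNAMIC_UNCOMPRESSED;     // documents, mail files, source code: CDC handles shifted content
    }
}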
PROBLEM STATEMENT:
For a backup dataset with logical dataset size L, its physical dataset size will be
reduced to P_L after local source deduplication in personal computing devices and
further decreased to P_GL by global source deduplication in the cloud, where
P_GL ≤ P_L. We divide the backup process into three parts: local duplicate
detection, global duplicate detection, and unique data cloud store. Here, the
latencies for chunking and fingerprinting are included in the duplicate detection
latency. Meanwhile, we assume an average local duplicate detection latency T_L,
an average global duplicate detection latency T_GL, and an average cloud storage
I/O bandwidth B for an average chunk size C, where T_GL ≥ T_L. We can then
build models to calculate the average backup window size per chunk: BWS_L for
local source deduplication based cloud backup and BWS_GL for global source
deduplication based cloud backup.
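To make the problem statement concrete, the Java sketch below computes per-chunk backup window estimates from the quantities defined above (T_L, T_GL, B, C, and the deduplication ratios P_L/L and P_GL/L). The formulas and sample numbers are illustrative assumptions, not the paper's exact model: local-only deduplication pays T_L plus the transfer time of the locally unique fraction, while local-global deduplication additionally pays T_GL for chunks that survive the local check and transfers only the globally unique fraction.

public class BackupWindowModel {
    private final double tLocal;     // T_L: average local duplicate detection latency per chunk (seconds)
    private final double tGlobal;    // T_GL: average global duplicate detection latency per chunk (seconds)
    private final double bandwidth;  // B: average cloud storage I/O bandwidth (bytes per second)
    private final double chunkSize;  // C: average chunk size (bytes)

    public BackupWindowModel(double tLocal, double tGlobal, double bandwidth, double chunkSize) {
        this.tLocal = tLocal;
        this.tGlobal = tGlobal;
        this.bandwidth = bandwidth;
        this.chunkSize = chunkSize;
    }

    // BWS_L for local-only source deduplication; rLocal = P_L / L.
    public double localWindowPerChunk(double rLocal) {
        return tLocal + rLocal * chunkSize / bandwidth;
    }

    // BWS_GL for local-global source deduplication; rLocal = P_L / L, rGlobal = P_GL / L.
    public double localGlobalWindowPerChunk(double rLocal, double rGlobal) {
        return tLocal + rLocal * tGlobal + rGlobal * chunkSize / bandwidth;
    }

    public static void main(String[] args) {
        // Hypothetical numbers purely for illustration.
        BackupWindowModel m = new BackupWindowModel(0.0001, 0.02, 1000000.0, 8 * 1024);
        System.out.printf("BWS_L  = %.6f s%n", m.localWindowPerChunk(0.6));
        System.out.printf("BWS_GL = %.6f s%n", m.localGlobalWindowPerChunk(0.6, 0.4));
    }
}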
SCOPE:
In the last decade, we have seen the world cross a billion Internet-connected
computers, the widespread adoption of Internet-enabled smartphones, the rise of
the "cloud", and the digitization of nearly every photo, movie, song, and other file.
Data has grown exponentially,
devices have proliferated, and the risk of data loss has skyrocketed with
these trends.
MODULE DESCRIPTION:
Number of Modules:
After careful analysis, the system has been identified to have the
following modules:
1. Cloud backup.
2. Personal storage.
3. Source deduplication.
4. Application awareness.
Cloud backup:
Cloud backup, also known as online backup, is a strategy for backing up
data that involves sending a copy of the data over a proprietary or public
network to an off-site server. The server is usually hosted by a third-party service provider, who charges the backup customer a fee based on
capacity, bandwidth or number of users. In the enterprise, the off-site
server might be proprietary, but the chargeback method would be
similar. Online backup systems are typically built around a client
software application that runs on a schedule determined by the level of
service the customer has purchased. If the customer has contracted for
daily backups, for instance, then the application collects, compresses,
encrypts and transfers data to the service provider's servers every 24
hours. To reduce the amount of bandwidth consumed and the time it
takes to transfer files, the service provider might only provide
incremental backups after the initial full backup. Third-party cloud
backup has gained popularity with small offices and home users because
of its convenience. Capital expenditures for additional hardware are not
required and backups can be run dark, which means they can be run
automatically without manual intervention.
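The Java sketch below outlines the client-side flow described in this section; the class and method names, and the full-versus-incremental policy, are assumptions introduced for illustration rather than any particular provider's implementation.

import java.util.Collections;
import java.util.List;

public class BackupAgent {
    private boolean initialFullBackupDone = false;

    // Invoked by a scheduler, for example once every 24 hours per the purchased service level.
    public void runScheduledBackup() {
        List<String> files = initialFullBackupDone
                ? collectFilesChangedSinceLastRun()    // incremental: only data changed since the last run
                : collectAllFiles();                   // first run: full backup
        for (String path : files) {
            byte[] data = readFile(path);
            byte[] compressed = compress(data);        // reduce bandwidth and transfer time
            byte[] encrypted = encrypt(compressed);    // protect data before it leaves the device
            transferToProvider(path, encrypted);       // send to the service provider's off-site servers
        }
        initialFullBackupDone = true;
    }

    // Placeholders standing in for real file-system, compression, encryption, and network code.
    private List<String> collectAllFiles() { return Collections.emptyList(); }
    private List<String> collectFilesChangedSinceLastRun() { return Collections.emptyList(); }
    private byte[] readFile(String path) { return new byte[0]; }
    private byte[] compress(byte[] d) { return d; }
    private byte[] encrypt(byte[] d) { return d; }
    private void transferToProvider(String path, byte[] d) { }
}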
Personal storage:
Cloud storage is a model of data storage where the digital data is stored
in logical pools, the physical storage spans multiple servers (and often
locations), and the physical environment is typically owned and
managed by a hosting company. These cloud storage providers are
responsible for keeping the data available and accessible, and the
physical environment protected and running. People and organizations
buy or lease storage capacity from the providers to store end user,
organization, or application data.
Cloud storage services may be accessed through a co-located cloud
compute service, a web service application programming interface (API)
or by applications that utilize the API, such as cloud desktop storage, a
cloud storage gateway or Web-based content management systems.
Architecture Overview
In the architectural overview of ALG-Dedupe, tiny files are first filtered out by a
file size filter for efficiency reasons, and backup data streams are broken into
chunks by an intelligent chunker using an application-aware chunking strategy.
Data chunks from the same type of files are then deduplicated in the
application-aware deduplicator by generating chunk fingerprints in the hash engine
and performing a data redundancy check against the application-aware index, first
on the local client and then in the cloud.
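The sketch below ties the components of this data path together in Java; the interfaces, the size threshold, and the method names are assumptions for illustration, not the prototype's actual code.

import java.io.File;

public class AlgDedupePipeline {
    static final long TINY_FILE_THRESHOLD = 10 * 1024;  // assumed cutoff for the file size filter

    public interface Chunker { byte[][] chunk(File file); }           // application-aware chunking strategy
    public interface HashEngine { String fingerprint(byte[] chunk); } // hash engine
    public interface Index { boolean putIfAbsent(String fp); }        // true if the fingerprint was not present
    public interface CloudStore { void upload(String fp, byte[] chunk); }

    private final Chunker chunker;
    private final HashEngine hashEngine;
    private final Index localIndex;
    private final Index globalIndex;
    private final CloudStore cloud;

    public AlgDedupePipeline(Chunker chunker, HashEngine hashEngine,
                             Index localIndex, Index globalIndex, CloudStore cloud) {
        this.chunker = chunker;
        this.hashEngine = hashEngine;
        this.localIndex = localIndex;
        this.globalIndex = globalIndex;
        this.cloud = cloud;
    }

    public void backup(File file) {
        if (file.length() < TINY_FILE_THRESHOLD) {
            return;                                      // tiny files skip deduplication for efficiency
        }
        for (byte[] chunk : chunker.chunk(file)) {       // application-aware chunking
            String fp = hashEngine.fingerprint(chunk);   // chunk fingerprint from the hash engine
            if (localIndex.putIfAbsent(fp) && globalIndex.putIfAbsent(fp)) {
                cloud.upload(fp, chunk);                 // only globally unique chunks reach the cloud store
            }
        }
    }
}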
Source deduplication:
In this section, we will investigate how data redundancy, space
utilization efficiency of popular data chunking methods and
computational overhead of typical hash functions change in different
applications of personal computing to motivate our research. We
perform preliminary experimental study on datasets collected from
desktops in our research group, volunteers’ personal laptops, personal
workstations for image processing and financial
analysis, and a shared home server. Table 1 outlines the key dataset
characteristics: the number of devices, applications and dataset size for
each studied workload. To the best of
our knowledge, this is the first systematic deduplication analysis on personal
storage.
Data deduplication:
Data deduplication is a specialized data compression technique for
eliminating duplicate copies of repeating data. Related and somewhat
synonymous terms are intelligent (data) compression and single-instance
(data) storage. This technique is used to improve storage utilization and
can also be applied to network data transfers to reduce the number of
bytes that must be sent. In the deduplication process, unique chunks of
data, or byte patterns, are identified and stored during a process of
analysis. As the analysis continues, other chunks are compared to the
stored copy and whenever a match occurs, the redundant chunk is
replaced with a small reference that points to the stored chunk. Given
that the same byte pattern may occur dozens, hundreds, or even
thousands of times (the match frequency is dependent on the chunk
size), the amount of data that must be stored or transferred can be greatly
reduced.
This type of deduplication is different from that performed by standard
file-compression tools based on algorithms such as LZ77 and LZ78. Whereas these tools
identify short repeated substrings inside individual files, the intent of
storage-based data deduplication is to inspect large volumes of data and
identify large sections – such as entire files or large sections of files –
that are identical, in order to store only one copy. This copy may be
additionally compressed by single-file compression techniques. For
example, a typical email system might contain 100 instances of the same
1 MB (megabyte) file attachment. Each time the email platform is
backed up, all 100 instances of the attachment are saved, requiring 100
MB storage space.
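The Java sketch below illustrates the reference mechanism described in this section; the structure (a chunk store keyed by fingerprint plus a per-file list of references) is a common way to realize it and is offered here as an assumption, not as the system's actual data layout.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChunkStoreSketch {

    public interface HashFunction { String fingerprint(byte[] chunk); }

    private final Map<String, byte[]> chunksByFingerprint = new HashMap<String, byte[]>();

    // Stores a file's chunks and returns its "recipe": an ordered list of references to
    // stored chunks, so each redundant chunk costs one small reference instead of a full copy.
    public List<String> store(List<byte[]> chunks, HashFunction hash) {
        List<String> recipe = new ArrayList<String>();
        for (byte[] chunk : chunks) {
            String fp = hash.fingerprint(chunk);
            if (!chunksByFingerprint.containsKey(fp)) {
                chunksByFingerprint.put(fp, chunk);      // first occurrence: keep the chunk data
            }
            recipe.add(fp);                              // every occurrence: record only a reference
        }
        return recipe;
    }
}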
Application awareness:
Application awareness is the capacity of a system to maintain
information about connected applications to optimize their operation and
that of any subsystems that they run or control.
An application-aware network uses current information about
applications connected to it, such as application state and resource
requirements. That capacity is central to software-defined networking
(SDN), enabling the network to efficiently allocate resources for the
most effective operation of both applications and the network itself.
Application-aware storage systems rely upon built-in intelligence about
relevant applications and their utilization patterns. Once the storage
"understands" the applications and usage conditions, it is possible to
optimize data layouts, caching behaviors, and quality of service (QoS)
levels.
System Configuration:
HARDWARE REQUIREMENTS:
Hardware      -  Pentium
Speed         -  1.1 GHz
RAM           -  1 GB
Hard Disk     -  20 GB
Floppy Drive  -  1.44 MB
Key Board     -  Standard Windows Keyboard
Mouse         -  Two or Three Button Mouse
Monitor       -  SVGA
SOFTWARE REQUIREMENTS:
Operating System  :  Windows
Technology        :  Java and J2EE
Web Technologies  :  HTML, JavaScript, CSS
IDE               :  MyEclipse
Web Server        :  Tomcat
Tool Kit          :  Android Phone
Database          :  MySQL
Java Version      :  J2SDK 1.5
CONCLUSION
In this paper, we propose ALG-Dedupe, an application-aware local-global source deduplication scheme for cloud backup in the personal
computing environment to improve deduplication efficiency. An
intelligent deduplication strategy in ALG-Dedupe is designed to exploit
file semantics to minimize computational overhead and maximize
deduplication effectiveness using application awareness. It combines
local deduplication and global deduplication to balance the effectiveness
and latency of deduplication. The proposed application-aware index
structure can significantly relieve the disk index lookup bottleneck by
dividing a central index into many independent small indices to optimize
lookup performance. In our prototype evaluation, ALG-Dedupe is
shown to improve the deduplication efficiency of the state-of-the-art
application-oblivious source deduplication approaches by a factor of 1.6X to 2.3X
with very low system overhead, shorten the backup window size by 26 percent to
37 percent, improve power efficiency by more than a third, and save 41 percent to
64 percent of cloud cost for the cloud backup service. Compared with our previous
local-deduplication-only design AA-Dedupe, it can reduce cloud cost by 23 percent
without increasing the backup window size. As a direction of future work, we plan
to further optimize our scheme for other resource-constrained mobile devices like
smartphones and tablets and
investigate the secure deduplication issue in cloud backup services of
the personal computing environment.