ISM Notes
Unit – 1
Storage technology
Unit-01/Lecture-01
Introduction to information storage and management (ISM)
Information storage and management is the only subject of its kind to fill the knowledge gap in understanding the varied components of modern information storage infrastructure, including virtual environments. It provides comprehensive learning of storage technology, which will enable you to make more informed decisions in an increasingly complex IT environment. ISM builds a strong understanding of underlying storage technologies and prepares you to learn advanced concepts, technologies, and products. You will learn about the architectures, features, and benefits of intelligent storage systems; storage networking technologies such as FC-SAN, IP-SAN, NAS, object-based and unified storage; business continuity solutions such as backup, replication, and archive; the increasingly critical area of information security; and the emerging field of cloud computing.
Introduction to storage technology

Storage systems are indispensable to modern-day computing. All known computing platforms, ranging from handheld devices to large supercomputers, use storage systems for storing data temporarily or permanently. Beginning with punch cards, which stored only a few bytes of data, storage systems have grown to multi-terabyte capacities while consuming comparatively less space and power. This section introduces storage systems and their components.
Storage definition

Here are a few definitions of storage as it refers to computers:

• A device capable of storing data. The term usually refers to mass storage devices, such as disk and tape drives.

• In a computer, storage is the place where data is held in an electromagnetic or optical form for access by a computer processor. (whatis.com)

• Computer data storage, often called storage or memory, refers to the computer components, devices, and recording media that retain digital data used for computing for some interval of time. (wikipedia.com)

Of these, the Wikipedia definition is the most useful. Likes and dislikes apart, in basic terms, computer storage can be defined as "a device or medium that stores data for later retrieval". From this definition, we can see that a storage device possesses two features, namely "storage" and "retrieval". A storage facility without retrieval options is of no use. A storage device may store application programs, databases, media files, and so on.
As we see in modern-day computers, storage devices come in many forms. Storage devices can be classified based on many criteria. The most basic classification, as we learned in school, is into primary storage and secondary storage. Storage devices can be further classified based on the memory technology they use, their data volatility, and so on.
Storage technologies – storage caching [RGPV/Dec 2011 (10)]

Storage caching is used to buffer blocks of data in order to minimize the utilization of disks or storage arrays and to minimize the read/write latency of storage access. Especially in write-intensive scenarios such as virtual desktops, write caching is very beneficial because it can keep storage latency at a low level even during peak times.
Storage cache can be implemented in four places:

• disk (embedded memory – typically non-expandable)
• storage array (vendor-specific embedded memory + expansion cards)
• computer accessing the storage (RAM)
• storage network (e.g., a provisioning server)

The cache can be subdivided into two categories:

• volatile cache: contained data is lost upon power outage (good for reads or non-critical writes)
• non-volatile cache: data is kept safe in case of a power outage (good for reads and writes); often referred to as battery-backed write cache
To further increase the speed of the disk or storage array, advanced algorithms such as read-ahead/read-behind or command queuing are commonly used.
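To make the volatile read-cache idea concrete, here is a minimal sketch in Python of an LRU (least-recently-used) block cache sitting in front of a slow backing store. The class and function names are illustrative assumptions, not taken from any real storage product.

```python
from collections import OrderedDict

class LRUBlockCache:
    """Minimal volatile read cache: holds at most `capacity` blocks in RAM.

    Illustrative sketch only; real storage caches also handle write-back,
    battery backing, and read-ahead.
    """
    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # function: block_number -> bytes
        self.blocks = OrderedDict()           # block_number -> data

    def read(self, block_number):
        if block_number in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(block_number)
            return self.blocks[block_number]
        # Cache miss: fetch from the slow device and cache the result.
        data = self.read_from_disk(block_number)
        self.blocks[block_number] = data
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block.
            self.blocks.popitem(last=False)
        return data

# Usage: wrap a (simulated) slow disk read.
cache = LRUBlockCache(capacity=2, read_from_disk=lambda n: b"block-%d" % n)
cache.read(1); cache.read(2)
cache.read(1)   # hit: served from RAM, no disk access
cache.read(3)   # miss: evicts block 2, the least recently used
```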
S.no   RGPV questions                               Year       Marks
Q.1    Explain storage technologies in detail?      Dec 2011   10
Unit-01/Lecture-02
Data proliferation – [RGPV/Dec 2015 (2), RGPV/Dec 2013 (10), RGPV/Dec 2013 (5), RGPV/Dec 2011 (5)]
Data proliferation refers to the prodigious amount of data, structured and unstructured, that businesses and governments continue to generate at an unprecedented rate and the usability problems that result from attempting to store and manage that data. While originally pertaining to problems associated with paper documentation, data proliferation has become a major problem in primary and secondary data storage on computers.

While digital storage has become cheaper, the associated costs, from raw power to maintenance and from metadata to search engines, have not kept up with the proliferation of data. Although the power required to maintain a unit of data has fallen, the cost of the facilities which house the digital storage has tended to rise.
Problems caused [RGPV/Dec 2015 (3), RGPV/Dec 2014 (2)]

The problem of data proliferation is affecting all areas of commerce as a result of the availability of relatively inexpensive data storage devices. This has made it very easy to dump data into secondary storage immediately after its window of usability has passed. This masks problems that could gravely affect the profitability of businesses and the efficient functioning of health services, police and security forces, local and national governments, and many other types of organizations. Data proliferation is problematic for several reasons:

• Difficulty when trying to find and retrieve information.
• Increased manpower requirements to manage increasingly chaotic data storage resources.
• Slower network and application performance due to excess traffic as users search and search again for the material they need.
• High cost in terms of the energy resources required to operate storage hardware.
Proposed solutions

• Applications that better utilize modern technology
• Reductions in duplicate data, especially as caused by data movement (a simple deduplication sketch is shown after this list)
• Improvement of metadata structures
• Improvement of file and storage transfer structures
• User education and discipline
• The implementation of information lifecycle management solutions to eliminate low-value information as early as possible before putting the rest into actively managed long-term storage in which it can be quickly and cheaply accessed
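One common tactic against duplicate data is content-addressed deduplication: store each unique chunk of data once, keyed by a hash of its content. The following Python sketch illustrates the idea; the chunk size and class name are arbitrary choices made for illustration.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical chunks are stored only once."""
    CHUNK_SIZE = 4096  # illustrative fixed-size chunking

    def __init__(self):
        self.chunks = {}   # sha256 digest -> chunk bytes
        self.files = {}    # file name -> list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.CHUNK_SIZE):
            chunk = data[i:i + self.CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A duplicate chunk costs only a dictionary entry, not more storage.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

store = DedupStore()
store.put("a.bin", b"x" * 8192)
store.put("b.bin", b"x" * 8192)          # same content: no new chunks stored
assert store.get("b.bin") == b"x" * 8192
print(len(store.chunks), "unique chunk(s) stored")  # -> 1
```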
S.no   RGPV questions                                            Year       Marks
Q.1    What do you mean by data proliferation? Explain the       Dec 2015   3
       data proliferation process and the major problems         Dec 2014   3
       associated with data proliferation.                       Dec 2013   7
                                                                 Dec 2013   10
Q.2    Explain in brief data proliferation?                      Dec 2015   2
                                                                 Dec 2011   5
Unit-01/Lecture-03

Overview of storage infrastructure components – [RGPV/Dec 2015 (7), RGPV/Dec 2013 (7)]
The choice of hard discs can have a profound impact on the capacity, performance and long-term reliability of any storage infrastructure. But it's unwise to trust valuable data to any single point of failure, so hard discs are combined into groups that can boost performance and offer redundancy in the event of disc faults. At an even higher level, those arrays must be integrated into the storage infrastructure – combining storage with network technologies to make data available to users over a LAN or WAN. If you're new to storage, or just looking to refresh some basic concepts, this chapter on data storage components can help to bring things into focus.
The lowest level: hard discs

Hard discs are random-access storage mechanisms that record data on spinning platters coated with extremely sensitive magnetic media. Magnetic read/write heads step across the radius of each platter in set increments, forming concentric circles of data dubbed "tracks." Hard disc capacity is loosely defined by the quality of the magnetic media (bits per inch) and the number of tracks. Thus, a late-model drive with superior media and finer head control can achieve far more storage capacity than models just six to twelve months old. Some of today's hard drives can deliver up to 750 GB of capacity. Capacity is also influenced by specific drive technologies including perpendicular recording, which fits more magnetic points into the same physical disc area.
Grouping the discs: RAID

Hard discs are electromechanical devices and their working life is finite. Media faults, mechanical wear and electronic failures can all cause problems that render drive contents inaccessible. This is unacceptable for any organization, so tactics are often implemented to protect against failure. One of the most common data protection tactics is arranging groups of discs into arrays. This is known as a RAID.

RAID implementations typically offer two benefits: data redundancy and enhanced performance. Redundancy is achieved by copying data to two or more discs – when a fault occurs on one hard disc, duplicate data on another can be used instead. In many cases, file contents are also spanned (or striped) across multiple hard discs. This improves performance because the various parts of a file can be accessed on multiple discs simultaneously, rather than waiting for a complete file to be accessed from a single disc. RAID can be implemented in a variety of schemes, each with its own designation:

• RAID-0 – disc striping is used to improve storage performance, but there is no redundancy.
• RAID-1 – disc mirroring offers disc-to-disc redundancy, but capacity is reduced and performance is only marginally enhanced.
• RAID-5 – parity information is spread throughout the disc group, improving read performance and allowing data for a failed drive to be reconstructed once the failed drive is replaced.
• RAID-6 – multiple parity schemes are spread throughout the disc group, allowing data for up to two simultaneously failed drives to be reconstructed once the failed drive(s) are replaced.

There are additional levels, but these four are the most common and widely used. It is also possible to mix RAID levels in order to obtain greater benefits. Combinations are typically denoted with two digits. For example, RAID-50 is a combination of RAID-5 and RAID-0, sometimes noted as RAID-5+0. As another example, RAID-10 is actually RAID-1 and RAID-0 implemented together, RAID-1+0. For more information on RAID controllers, see the searchstorage.com article "The new breed of RAID controllers". A small sketch of the usable-capacity arithmetic behind these levels follows.
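As a rough aid to comparing these levels, here is a small Python sketch computing usable capacity for an array of equal-sized discs under the standard overhead rules (RAID-0: none; RAID-1: half; RAID-5: one disc of parity; RAID-6: two discs of parity). The function is illustrative only.

```python
def usable_capacity(level, disks, disk_gb):
    """Usable capacity in GB for an array of `disks` equal-sized discs.

    Standard overhead rules: RAID-0 uses all discs, RAID-1 mirrors
    (half usable), RAID-5 sacrifices one disc to parity, RAID-6 two.
    """
    if level == 0:
        return disks * disk_gb
    if level == 1:
        return disks * disk_gb / 2
    if level == 5:
        return (disks - 1) * disk_gb   # needs at least 3 discs
    if level == 6:
        return (disks - 2) * disk_gb   # needs at least 4 discs
    raise ValueError("unsupported RAID level")

# Example: six 500 GB discs.
for lvl in (0, 1, 5, 6):
    print("RAID-%d: %.0f GB usable" % (lvl, usable_capacity(lvl, 6, 500)))
# RAID-0: 3000, RAID-1: 1500, RAID-5: 2500, RAID-6: 2000
```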
Getting storage on the network

Storage is useless unless network users can access it. There are two principal means of attaching storage systems: NAS and SAN. NAS boxes are storage devices behind an Ethernet interface, effectively connecting discs to the network through a single IP address. NAS deployments are typically straightforward and management is light, so new NAS devices can easily be added as more storage is needed. The downside to NAS is performance – storage traffic must compete for NAS access across the Ethernet cable. But NAS access is often superior to disc access at a local server.

The SAN overcomes common server and NAS performance limitations by creating a sub-network of storage devices interconnected through a switched fabric such as FC or iSCSI (called Internet SCSI or SCSI-over-IP). Both FC and iSCSI approaches make any storage device visible from any host, and offer much more availability for corporate data. FC is costlier, but offers optimum performance, while iSCSI is cheaper, but somewhat slower. Consequently, FC is found in the enterprise and iSCSI commonly appears in small and mid-sized businesses. However, SAN deployments are more costly to implement (in terms of switches, cabling and host bus adapters) and demand far more management effort.
S.no   RGPV questions                                       Year       Marks
Q.1    Explain briefly the evolution of storage             Dec 2013   7
       management?
Q.2    Discuss storage infrastructure components.           Dec 2015   7
Unit-01/Lecture-04

Information lifecycle management – [RGPV/Dec 2015 (7), RGPV/Dec 2014 (7), RGPV/Dec 2013 (7), RGPV/Dec 2012 (5)]
Information lifecycle management (ILM) is a comprehensive approach to managing the flow of an information system's data and associated metadata from creation and initial storage to the time when it becomes obsolete and is deleted. Unlike earlier approaches to data storage management, ILM involves all aspects of dealing with data, starting with user practices, rather than just automating storage procedures, as, for example, hierarchical storage management (HSM) does. Also in contrast to older systems, ILM enables more complex criteria for storage management than data age and frequency of access.
ILM products automate the processes involved, typically organizing data into separate tiers according to specified policies, and automating data migration from one tier to another based on those criteria. As a rule, newer data, and data that must be accessed more frequently, is stored on faster but more expensive storage media, while less critical data is stored on cheaper but slower media. However, the ILM approach recognizes that the importance of any data does not rely solely on its age or how often it's accessed. Users can specify different policies for data that declines in value at different rates or that retains its value throughout its life span. A path management application, either as a component of ILM software or working in conjunction with it, makes it possible to retrieve any data stored by keeping track of where everything is in the storage cycle. A minimal sketch of an age-based tiering policy appears below.
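To make the tiering idea concrete, here is a minimal Python sketch of an age-based migration policy. Real ILM products use far richer policies (business value, access frequency, compliance class); the thresholds and tier names here are invented purely for illustration.

```python
import time

def choose_tier(age_days):
    """Age-based placement policy (thresholds are illustrative)."""
    if age_days < 30:
        return "ssd"       # hot data: fast, expensive media
    if age_days < 365:
        return "sas"       # warm data: mid-range media
    return "archive"       # cold data: cheap, slow media

def migrate(catalog, now=None):
    """catalog: {path: (created_epoch, current_tier)} -> list of moves."""
    now = now or time.time()
    moves = []
    for path, (created, tier) in catalog.items():
        target = choose_tier((now - created) / 86400)
        if target != tier:
            moves.append((path, tier, target))
    return moves

catalog = {"/orders/2024.db": (time.time() - 5 * 86400, "ssd"),
           "/logs/2019.tgz": (time.time() - 2000 * 86400, "sas")}
print(migrate(catalog))  # -> [('/logs/2019.tgz', 'sas', 'archive')]
```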
ILM is often considered a more complex subset of data lifecycle management (DLM). DLM products deal with general attributes of files, such as their type, size, and age; ILM products have more complex capabilities. For example, a DLM product would allow you to search stored data for a certain file type of a certain age, while an ILM product would let you search various types of stored files for instances of a specific piece of data, such as a customer number.
Data management has become increasingly important as businesses face compliance issues in the wake of legislation that regulates how organizations must deal with particular types of data. Data management experts stress that information lifecycle management should be an organization-wide enterprise, involving procedures and practices as well as applications.
Information lifecycle management comprises the policies, processes, practices, and tools used to align the business value of information with the most appropriate and cost-effective IT infrastructure from the time information is conceived through its final disposition. Information is aligned with business processes through management policies and service levels associated with applications, metadata, information, and data.
Operations

Operational aspects of ILM include backup and data protection; disaster recovery, restore, and restart; archiving and long-term retention; data replication; and the day-to-day processes and procedures necessary to manage a storage architecture.
Functionality

For the purposes of business records, there are five phases identified as being part of the lifecycle continuum, along with one exception. These are:

• Creation and receipt
• Distribution
• Use
• Maintenance
• Disposition
Creation and receipt deals with records from their point of origination. This could include their creation by a member of an organization at varying levels or receipt of information from an external source. It includes correspondence, forms, reports, drawings, computer input/output, and other sources.
Distribution is the process of managing the information once it has been created or received. This includes both internal and external distribution, as information that leaves an organization becomes a record of a transaction with others.

Use takes place after information is distributed internally, and can generate business decisions, document further actions, or serve other purposes.
Maintenance is the management of information. This can include processes such as filing, retrieval and transfers. While the connotation of 'filing' presumes the placing of information in a prescribed container and leaving it there, there is much more involved. Filing is actually the process of arranging information in a predetermined sequence and creating a system to manage it for its useful existence within an organization. Failure to establish a sound method for filing information makes its retrieval and use nearly impossible. Transferring information refers to the process of responding to requests, retrieval from files, and providing access to users authorized by the organization to have access to the information. While removed from the files, the information is tracked by the use of various processes to ensure it is returned and/or made available to others who may need access to it.
Disposition is the practice of handling information that is less frequently accessed or has met its assigned retention periods. Less frequently accessed records may be considered for relocation to an 'inactive records facility' until they have met their assigned retention period. "Although a small percentage of organizational information never loses its value, the value of most information tends to decline over time until it has no further value to anyone for any purpose. The value of nearly all business information is greatest soon after it is created and generally remains active for only a short time – one to three years or so – after which its importance and usage declines. The record then makes its life cycle transition to a semi-active and finally to an inactive state." [1]
Retention periods are based on the creation of an organization-specific retention schedule, based on research of the regulatory, statutory and legal requirements for management of information for the industry in which the organization operates. Additional items to consider when establishing a retention period are any business needs that may exceed those requirements and consideration of the potential historic, intrinsic or enduring value of the information. If the information has met all of these needs and is no longer considered to be valuable, it should be disposed of by means appropriate for the content. This may include ensuring that others cannot obtain access to outdated or obsolete information, as well as measures for protecting privacy and confidentiality.
Long-term records are those that are identified to have a continuing value to an organization. Based on the period assigned in the retention schedule, these may be held for periods of 25 years or longer, or may even be assigned a retention period of "indefinite" or "permanent". The term "permanent" is used much less frequently outside of the federal government, as it is not feasible to establish a requirement for such a retention period. There is a need to ensure that records of continuing value are managed using methods that keep them persistently accessible for the length of time they are retained. While this is relatively easy to accomplish with paper or microfilm based records by providing appropriate environmental conditions and adequate protection from potential hazards, it is less simple for electronic format records. There are unique concerns related to ensuring that the format they are generated or captured in remains viable and that the media they are stored on remains accessible. Media is subject to both degradation and obsolescence over its lifespan, and therefore policies and procedures must be established for the periodic conversion and migration of electronically stored information to ensure it remains accessible for its required retention periods.
S.no   RGPV questions                                            Year       Marks
Q.1    What are the different phases of the information          Dec 2015   7
       life cycle model?                                         Dec 2014   7
                                                                 Dec 2013   7
Q.2    Explain briefly information life cycle implementation?    Dec 2011   10
                                                                 Dec 2012   5
Unit-01/Lecture-05

Data categorization – [RGPV/Dec 2015 (2), RGPV/Dec 2013 (7), RGPV/Dec 2013 (10), RGPV/Dec 2011 (5)]
Data classification is the categorization of data for its most effective and efficient use. In a basic approach to storing computer data, data can be classified according to its critical value or how often it needs to be accessed, with the most critical or most often used data stored on the fastest media while other data can be stored on slower (and less expensive) media. This kind of classification tends to optimize the use of data storage for multiple purposes – technical, administrative, legal, and economic.
Data can be classified according to any criteria, not only relative importance or frequency of use. For example, data can be broken down according to its topical content, file type, operating platform, average file size in megabytes or gigabytes, when it was created, when it was last accessed or modified, which person or department last accessed or modified it, and which personnel or departments use it the most. A well-planned data classification system makes essential data easy to find. This can be of particular importance in risk management, legal discovery, and compliance with government regulations.

Computer programs exist that can help with data classification, but in the end it is a subjective business and is often best done as a collaborative task that considers business, technical, and other points of view.
Data collections

Data stewards may wish to assign a single classification to a collection of data that is common in purpose or function. When classifying a collection of data, the most restrictive classification of any of the individual data elements should be used. For example, if a data collection consists of a student's name, address and social security number, the data collection should be classified as restricted even though the student's name and address may be considered public information. A small sketch of this most-restrictive rule follows.
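The most-restrictive rule is easy to express in code. Below is a minimal Python sketch; the ordering of levels and the field names are illustrative assumptions.

```python
# Illustrative sensitivity ordering, least to most restrictive.
LEVELS = ["public", "internal", "restricted"]

def classify_collection(element_levels):
    """Return the most restrictive classification among the elements."""
    return max(element_levels, key=LEVELS.index)

record = {"name": "public", "address": "public", "ssn": "restricted"}
print(classify_collection(record.values()))  # -> restricted
```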
Why is it important?

Data classification provides several benefits. It allows an organization to inventory its information assets. In many cases, information asset owners aren't aware of all of the different types of data they hold. It also allows central IT to work with departments to develop specific security requirements that can be readily applied.
In the field of data management, data classification as a part of the information lifecycle management (ILM) process can be defined as a tool for the categorization of data, to enable or help an organization to effectively answer the following questions:

• What data types are available?
• Where are certain data located?
• What access levels are implemented?
• What protection level is implemented, and does it adhere to compliance regulations?
When implemented, it provides a bridge between IT professionals and process or application owners. IT staff are informed about the value of the data, and, on the other hand, management (usually application owners) better understands which segment of the data centre must be invested in to keep operations running effectively. This can be of particular importance in risk management, legal discovery, and compliance with government regulations. Data classification is typically a manual process; however, there are many tools from different vendors that can help gather information about the data.
How to start the process of data classification?

Note that this classification structure is written from a data management perspective and therefore focuses on text and text-convertible binary data sources. Images, video, and audio files are highly structured formats built for industry-standard APIs and do not readily fit within the classification scheme outlined below.
The first step is to evaluate and divide the various applications and data as follows:

• Relational or tabular data (around 15% of non-audio/video data)
  - Generally describes proprietary data which is accessible only through an application or application programming interfaces (APIs).
  - Applications that produce structured data are usually database applications.
  - This type of data usually involves complex procedures for data evaluation and migration between storage tiers.
  - To ensure adequate quality standards, the classification process has to be monitored by subject matter experts.

• Semi-structured or poly-structured data (all other non-audio/video data that does not conform to a system- or platform-defined relational or tabular form)
  - Generally describes data files that have a dynamic or non-relational semantic structure (e.g. documents, XML, JSON, device or system log output, sensor output).
  - Data classification is a relatively simple process of criteria assignment.
  - Data migration between assigned segments of predefined storage tiers is a simple process.
S.no   RGPV questions                                       Year       Marks
Q.1    What is data categorization? Why is it required?     Dec 2013   7
Q.2    What is data categorization? Explain challenges      Dec 2013   10
       for data categorization.
Q.3    Explain briefly data categorization?                 Dec 2015   2
                                                            Dec 2011   5
Unit-01/Lecture-06

Evolution of various storage technologies – [RGPV/Dec 2012 (10), RGPV/Dec 2011 (5)]
DAS (Direct Attached Storage)

When Windows servers leave the factory, they can be configured with several storage options. Most servers will contain one or more local disk drives installed internal to the server's cabinet. These drives are typically used to install the operating system and user applications. If additional storage is needed for user files or databases, it may be necessary to configure direct attached storage (DAS).

DAS is well suited to a small-to-medium sized business where sufficient amounts of storage can be configured at a low startup cost. The DAS enclosure will be a separate adjacent cabinet that contains the additional disk drives. An internal PCI-based RAID controller is typically configured in the server to connect to the storage. SAS (serial attached SCSI) technology is used to connect the disk arrays, as illustrated below.

[Figure: typical DAS configuration]

As mentioned, one of the primary benefits of DAS storage is the lower startup cost to implement. The storage array is managed individually, as the storage is dedicated to a particular server. On the downside, there is typically limited expansion capability with DAS, and limited cabling options (1 to 4 metre cables). Finally, because the RAID controller is typically installed in the server, there is a potential single point of failure in the DAS solution.
SAN (Storage Area Networks) [RGPV/Dec 2014 (2)]

With storage area networks (SANs), we typically see this solution used by medium-to-large size businesses, primarily due to the larger initial investment. SANs require an infrastructure consisting of SAN switches, disk controllers, HBAs (host bus adapters) and fibre cables. SANs leverage external RAID controllers and disk enclosures to provide high-speed storage for numerous potential servers.

The main benefit of a SAN-based storage solution is the ability to share the storage arrays among multiple servers. This allows you to configure the storage capacity as needed, usually by a dedicated SAN administrator. Higher levels of performance throughput are typical in a SAN environment, and data is highly available through redundant disk controllers and drives. The disadvantages include a much higher startup cost for SANs, and they are inherently much more complex to manage. The following diagram illustrates a typical SAN environment.
[Figure: typical SAN environment]
NAS (Network Attached Storage)

A third type of storage solution exists that is a hybrid option called network attached storage (NAS). This solution uses a dedicated server or appliance to serve the storage array. The storage can be shared among multiple clients at the same time across the existing Ethernet network. The main difference between NAS and DAS or SAN is that NAS servers utilize file-level transfers, while DAS and SAN solutions use block-level transfers, which are more efficient.

NAS storage typically has a lower startup cost because the existing network can be used. This can be very attractive to small-to-medium size businesses. Most NAS models implement the storage arrays as iSCSI targets that can be shared across the network. Dedicated iSCSI networks can also be configured to maximize network throughput. The following diagram shows how a NAS configuration might look.
[Figure: typical NAS configuration]
S.no   RGPV questions                                       Year       Marks
Q.1    Define SAN (Storage Area Network).                   Dec 2014   2
Q.2    Explain briefly about the evolution of storage       Dec 2012   10
       technologies and architecture?
Q.3    Explain briefly the evolution of various storage     Dec 2011   5
       technologies.
Unit-01/Lecture-07

Data centre – [RGPV/Dec 2014 (2), RGPV/Dec 2013 (7), RGPV/Dec 2013 (10), RGPV/Dec 2012 (10), RGPV/Dec 2011 (10)]
A data center (sometimes spelled datacenter) is a centralized repository, either physical or virtual, for the storage, management, and dissemination of data and information organized around a particular body of knowledge or pertaining to a particular business.

The National Climatic Data Center (NCDC), for example, is a public data center that maintains the world's largest archive of weather information. A private data center may exist within an organization's facilities or may be maintained as a specialized facility. Every organization has a data center, although it might be referred to as a server room or even a computer closet.

In that sense, data center may be synonymous with network operations center (NOC), a restricted access area containing automated systems that constantly monitor server activity, web traffic, and network performance.
Organizations maintain data centers to provide centralized data processing capabilities across the enterprise. Data centers store and manage large amounts of mission-critical data. The data center infrastructure includes computers, storage systems, network devices, dedicated power backups, and environmental controls (such as air conditioning and fire suppression).

Large organizations often maintain more than one data center to distribute data processing workloads and provide backups in the event of a disaster. The storage requirements of a data center are met by a combination of various storage architectures.
Core elements

Five core elements are essential for the basic functionality of a data center:

• Application: an application is a computer program that provides the logic for computing operations. Applications, such as an order processing system, can be layered on a database, which in turn uses operating system services to perform read/write operations on storage devices.
• Database: more commonly, a database management system (DBMS) provides a structured way to store data in logically organized tables that are interrelated. A DBMS optimizes the storage and retrieval of data.
• Server and operating system: a computing platform that runs applications and databases.
• Network: a data path that facilitates communication between clients and servers or between servers and storage.
• Storage array: a device that stores data persistently for subsequent use.
These core elements are typically viewed and managed as separate entities, but all the elements must work together to address data processing requirements.

Key requirements for data center elements

Uninterrupted operation of data centers is critical to the survival and success of a business. It is necessary to have a reliable infrastructure that ensures data is accessible at all times. While the requirements listed below are applicable to all elements of the data center infrastructure, our focus here is on storage systems.

• Availability: all data center elements should be designed to ensure accessibility. The inability of users to access data can have a significant negative impact on a business.

• Security: policies, procedures, and proper integration of the data center core elements must be established to prevent unauthorized access to information. In addition to the security measures for client access, specific mechanisms must enable servers to access only their allocated resources on storage arrays.

• Scalability: data center operations should be able to allocate additional processing capabilities or storage on demand, without interrupting business operations. Business growth often requires deploying more servers, new applications, and additional databases. The storage solution should be able to grow with the business.

• Performance: all the core elements of the data center should be able to provide optimal performance and service all processing requests at high speed. The infrastructure should be able to support performance requirements.

• Data integrity: data integrity refers to mechanisms such as error correction codes or parity bits which ensure that data is written to disk exactly as it was received. Any variation in data during its retrieval implies corruption, which may affect the operations of the organization.

• Capacity: data center operations require adequate resources to store and process large amounts of data efficiently. When capacity requirements increase, the data center must be able to provide additional capacity without interrupting availability or, at the very least, with minimal disruption. Capacity may be managed by reallocation of existing resources rather than by adding new resources.

• Manageability: a data center should perform all operations and activities in the most efficient manner. Manageability can be achieved through automation and the reduction of human (manual) intervention in common tasks.
S.no   RGPV questions                                       Year       Marks
Q.1    What is a data centre? What are the requirements     Dec 2013   7
       for the design of a secure data centre?              Dec 2012   10
                                                            Dec 2012   10
Q.2    What is the significance of the data center in       Dec 2014   2
       storage technology?
Reference

Book                                              Author                                   Priority
Information Storage and Management                G. Somasundaram, Alok Shrivastava        1
Storage Networks Explained: Basics and            Ulf Troppens, Wolfgang Mueller-Friedt,   2
Application of Fibre Channel SAN, NAS,            Rainer Erkens, Rainer Wolafka,
iSCSI                                             Nils Haustein
Cloud Computing: Principles, Systems &            Nick Antonopoulos, Lee Gillam            3
Applications
Unit – 2

Storage systems architecture

Unit-02/Lecture-01

Architecture of intelligent disk subsystems – [RGPV/Dec 2014 (7), RGPV/Dec 2013 (10)]
In contrast to a file server, a disk subsystem can be visualized as a hard disk server. Servers are connected to the connection port of the disk subsystem using standard I/O techniques such as Small Computer System Interface (SCSI), Fibre Channel or Internet SCSI (iSCSI) and can thus use the storage capacity that the disk subsystem provides. The internal structure of the disk subsystem is completely hidden from the server, which sees only the hard disks that the disk subsystem provides to the server.

The connection ports are extended to the hard disks of the disk subsystem by means of internal I/O channels. In most disk subsystems there is a controller between the connection ports and the hard disks. The controller can significantly increase data availability and data access performance with the aid of a so-called RAID procedure. Furthermore, some controllers realize the copying services instant copy and remote mirroring, and further additional services. The controller uses a cache in an attempt to accelerate read and write accesses to the server.
[Figure: servers are connected to a disk subsystem using standard I/O techniques – one server connected by SCSI, two others connected via a Fibre Channel SAN]
[Figure: servers are connected to the disk subsystem via its ports; internally, the disk subsystem consists of hard disks, a controller, a cache and internal I/O channels]
Disk subsystems are available in all sizes. Small disk subsystems have one to two connection ports for servers or storage networks, six to eight hard disks and, depending on the disk capacity, a storage capacity of a few terabytes. Large disk subsystems have tens of connection ports for servers and storage networks, redundant controllers and multiple I/O channels. A considerably larger number of servers can access a subsystem through a connection over a storage network. Large disk subsystems can store up to a petabyte of data and, depending on the supplier, can weigh well over a tonne. The dimensions of a large disk subsystem are comparable to those of a wardrobe. The architecture of real disk subsystems is more complex and varies greatly; ultimately, however, it will always include the components described here.
Regardless of storage networks, most disk subsystems have the advantage that free disk space can be flexibly assigned to each server connected to the disk subsystem (storage pooling). All servers are either directly connected to the disk subsystem or indirectly connected via a storage network. In this configuration each server can be assigned free storage. Incidentally, free storage capacity should be understood to mean both hard disks that have already been installed but not yet used and free slots for hard disks that have yet to be installed.
[Figure: all servers share the storage capacity of a disk subsystem; each server can be assigned free storage flexibly as required]
S.no   RGPV questions                                       Year       Marks
Q.1    Explain the component architecture of an             Dec 2014   7
       intelligent disk subsystem?                          Dec 2013   10
Unit-02/Lecture-02

Modular vs. integrated – [RGPV/Dec 2015 (7), RGPV/Dec 2013 (7), RGPV/Dec 2011 (5)]
Which unified storage architecture to go with when looking at unified systems is probably a decision based on availability more than functionality. Modular solutions may be more appropriate for users that have an existing storage array that offers NAS services with a gateway module. On the other hand, users buying new unified systems will be looking mainly at integrated systems, since most of the new products will have an integrated architecture.

At one time block storage supported the most important applications, typically production databases, and file storage was associated with user home directories and office productivity applications. But now even the most critical databases are being run on NAS devices.

Server virtualization has further raised the profile of file storage, since applications like VMware store and manipulate entire server instances as individual files. NFS-hosted images are rapidly gaining traction in these environments. "Big data" archive systems in industries such as media and entertainment, oil and gas, and remote sensing, to name a few, also do their work at the file level.
S.no   RGPV questions                                  Year       Marks
Q.1    Differentiate integrated vs. modular arrays?    Dec 2015   7
                                                       Dec 2013   7
                                                       Dec 2011   5
Unit-02/Lecture-03

Volume manager vs. file system – [RGPV/Dec 2013 (7), RGPV/Dec 2011 (5)]
Current file systems trace their roots to the UFS file system, which was proposed in 1965. By the early 1970s, the UNIX file system was up and running. Since then, not much has changed in file systems and there have only been incremental hardware changes. I think the file system and volume manager are the most critical components in achieving I/O performance from both the OS and the underlying hardware. Even the best file system and volume manager can be configured so that the performance is poor. Therefore, my next couple of columns will cover file system and volume management, in addition to file system configuration and tuning.
File System Basics

The purpose of a file system (FS) is to maintain a view of the storage so we can create files. This is done so that users can create, delete, open, close, read, write, and extend files on the device(s). File systems can also be used to maintain security over files.
Volume Manager Basics

The original goal of UNIX volume management (VM), which was developed in the late 1980s, was to group disk devices together so that file systems larger than a single device could be created, and to achieve high performance by striping devices.
Standard VM Inner Workings (Striping)

Most file systems require a VM to group disk and/or RAID devices together. Striping spreads the data across the devices based on the stripe size set within the volume manager. Note that some volume managers support concatenation, which starts with the first device and then only writes to the next device when the first device becomes full. The idea behind striping is to spread the data across multiple devices to improve performance and allow multiple I/O disk-head seeks to occur simultaneously. (Figure 1, not reproduced here, shows what happens with standard striping when multiple files are written at the same time, and what happens when one of those files is removed.)
File Systems that Maintain Their Topology

Some file systems maintain and understand the device topology without a volume manager. These file systems support both striping and round-robin allocation. Round-robin allocation means that each device is used individually: in most cases, each file open moves to the next device; in some file systems, each directory created moves to the next device. (Figure 2, not reproduced here, shows an example of round-robin allocation, which is very different from striping.) Round-robin allocation has some important implications for performance.
File Allocation Comparison

One reason that volume managers do not provide a round-robin allocation method is the interaction between the volume manager and the file system. Every file system must allocate space and maintain consistency, which is one of the main purposes of the file system. There are multiple types of file system allocation, but the real issue is that a volume manager presents a single flat address range over the block devices for the file system to allocate from. The volume manager then translates each address to one of the devices. It is difficult, but not impossible, for the volume manager to pass all of the underlying device topology to a file system. Also, most file systems designed for volume managers do not have an interface to understand the underlying volume topology. Other file systems that control their own topology can easily use round-robin allocation, because they understand the underlying topology.
How Volume Managers and File Systems Work

It is important to fully understand how volume managers and file systems work internally in order to choose the best file system for the application. By understanding the inner workings, you will have a much better idea of what the tunable parameters mean and how to improve performance.
Performance Comparison

Indirect block allocation and its read/write performance can be painfully slow compared to the extent-based allocation method. For example, consider an application doing random reads and writes. To find the block address for a record, a file allocated with indirect blocks must read all of the data areas of the file up to the record in question prior to reading the record. With extent-based allocation, the file system can simply read the inodes in question, which makes an enormous difference in performance. I am unaware of any new file systems using indirect blocks for space allocation because of the huge performance penalties for random I/O. Even for sequential I/O, the performance of indirect blocks is generally less than that of extent-based file systems.
Free Space Allocation and Representation Methods

Each file system uses an algorithm to find and allocate free space within the file system. Most file systems use binary tree (B-tree) allocation to represent free space, but some file systems use bitmaps. Each method of free space representation has advantages and disadvantages.

Bitmap Representation

The use of bitmap representation is less common. In this method, each bit in the map represents a single allocation unit such as 1,024 bytes, 512 KB, or even hundreds of megabytes. Therefore, a single bit can represent a great deal of space.
Free Space Allocation

With each representation type (B-tree or bitmap), free space must be found and allocated. The allocation algorithms find free space based on their internal search strategies. The two most common methods used are first fit and best fit.
First Fit

The first fit method tries to find the first space within the file system that matches the allocation size requested by the file being allocated. In some file systems, the first fit method is used to find the space closest to the last allocation of the file being extended, thereby allowing sequential block addresses to be allocated for the file within the file system.
The best fit method tries to find the best place in the file system for the allocation of the data.
m
This method is used to try to reduce total file system fragmentation. This method always takes
a
more CPU cycles than first fit, because the whole file system must be searched for the best
n
allocation. (Note that in systems using round-robin allocation only, the device on which the
y
d
initial allocation was made must be searched.) This method works to reduce fragmentation,
especially when files cannot be pre-allocated (for file systems that support this method) or for
u
large allocations, such as multiple megabytes. Most vendors do not support this method, and
t
most allocations in file systems are not large because the overhead would be huge. The old
S
Cray NC1FS supports this method by using hardware vector registers to quickly perform the
search.
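The following Python sketch contrasts the two search strategies over a simple free list. The free list here is a plain list of (offset, length) holes; real file systems use B-trees or bitmaps, so this is purely illustrative.

```python
def first_fit(free_list, size):
    """Return the offset of the first hole large enough for `size` blocks."""
    for offset, length in free_list:
        if length >= size:
            return offset
    return None  # no hole big enough

def best_fit(free_list, size):
    """Return the offset of the smallest hole that still fits `size` blocks.

    Scans the whole free list, hence the extra CPU cost noted above.
    """
    candidates = [(length, offset) for offset, length in free_list
                  if length >= size]
    return min(candidates)[1] if candidates else None

free_list = [(0, 8), (100, 3), (200, 4)]   # (offset, length) holes
print(first_fit(free_list, 3))  # -> 0   (first hole that fits)
print(best_fit(free_list, 3))   # -> 100 (tightest fit, least fragmentation)
```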
S.no   RGPV questions                                      Year       Marks
Q.1    Give the difference between volume manager and      Dec 2013   7
       file system?                                        Dec 2011   5
Unit-02/Lecture-04

Physical disk structure – [RGPV/Dec 2012 (10)]
Data on the disk is recorded on tracks, which are concentric rings on the platter around the
spindle. The tracks are numbered, starting from zero, from the outer edge of the platter. The
number of tracks per inch (tpi) on the platter (or the track density) measures how tightly the
tracks are packed on a platter.
Each track is divided into smaller units called sectors. A sector is the smallest individually addressable unit of storage. The track and sector structure is written on the platter by the drive manufacturer using a formatting operation. The number of sectors per track varies according to the specific drive. The first personal computer disks had 17 sectors per track. Recent disks have a much larger number of sectors on a single track. There can be thousands of tracks on a platter, depending on the physical dimensions and recording density of the platter.

Typically, a sector holds 512 bytes of user data, although some disks can be formatted with larger sector sizes. In addition to user data, a sector also stores other information, such as the sector number, head number or platter number, and track number. This information helps the controller to locate the data on the drive, but storing it consumes space on the disk. Consequently, there is a difference between the capacity of an unformatted disk and a formatted one. Drive manufacturers generally advertise the unformatted capacity – for example, a disk advertised as 500 GB will only hold 465.7 GB of user data, with the remaining 34.3 GB used for metadata.

A cylinder is the set of identical tracks on both surfaces of each drive platter. The location of the drive heads is referred to by cylinder number, not by track number.
[Figure: disk structure – sectors, tracks, and cylinders]
Zoned bit recording [RGPV/Dec 2014 (2), RGPV/Dec 2012 (10)]

Because the platters are made of concentric tracks, the outer tracks can hold more data than the inner tracks, because the outer tracks are physically longer than the inner tracks. On older disk drives, the outer tracks had the same number of sectors as the inner tracks, so data density was low on the outer tracks. This was an inefficient use of available space.

Zoned bit recording utilizes the disk efficiently. This mechanism groups tracks into zones based on their distance from the center of the disk. The zones are numbered, with the outermost zone being zone 0. An appropriate number of sectors per track is assigned to each zone, so a zone near the center of the platter has fewer sectors per track than a zone on the outer edge. However, tracks within a particular zone have the same number of sectors.
[Figure: zoned bit recording]
Disk drive performance – [RGPV/Dec 2013 (10)]

A disk drive is an electromechanical device that governs the overall performance of the storage system environment. The various factors that affect the performance of disk drives are discussed in this section.
Disk service time

Disk service time is the time taken by a disk to complete an I/O request. The components that contribute to service time on a disk drive are seek time, rotational latency, and data transfer rate.
Seek time

The seek time (also called access time) describes the time taken to position the R/W heads across the platter with a radial movement (moving along the radius of the platter). In other words, it is the time taken to reposition and settle the arm and the head over the correct track. The lower the seek time, the faster the I/O operation. Disk vendors publish the following seek time specifications:

• Full stroke: the time taken by the R/W head to move across the entire width of the disk, from the innermost track to the outermost track.
• Average: the average time taken by the R/W head to move from one random track to another, normally listed as the time for one-third of a full stroke.
• Track-to-track: the time taken by the R/W head to move between adjacent tracks.

Each of these specifications is measured in milliseconds. The average seek time on a modern disk is typically in the range of 3 to 15 milliseconds. Seek time has more impact on reads of random tracks than of adjacent tracks. To minimize the seek time, data can be written to only a subset of the available cylinders. This results in lower usable capacity than the actual capacity of the drive. For example, a 500 GB disk drive set up to use only the first 40 percent of its cylinders is effectively treated as a 200 GB drive. This is known as short-stroking the drive.
Rotational latency

To access data, the actuator arm moves the R/W head over the platter to a particular track while the platter spins to position the requested sector under the R/W head. The time taken by the platter to rotate and position the data under the R/W head is called rotational latency. This latency depends on the rotation speed of the spindle and is measured in milliseconds. The average rotational latency is one-half of the time taken for a full rotation. Like seek time, rotational latency has more impact on reads and writes of random sectors than of adjacent sectors. Average rotational latency is around 5.5 ms for a 5,400 rpm drive, and around 2.0 ms for a 15,000 rpm drive.
Data transfer rate

The data transfer rate (also called transfer rate) refers to the average amount of data per unit time that the drive can deliver to the HBA. It is important to first understand the process of read and write operations in order to calculate data transfer rates. In a read operation, the data first moves from the disk platters to the R/W heads, and then to the drive's internal buffer. Finally, data moves from the buffer through the interface to the host HBA. In a write operation, the data moves from the HBA to the internal buffer of the disk drive through the drive's interface. The data then moves from the buffer to the R/W heads. Finally, it moves from the R/W heads to the platters. A small worked service-time calculation follows.
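Putting the three components together, here is a small Python sketch of a worked disk service time calculation. The formulae (average rotational latency = half a rotation; transfer time = I/O size divided by transfer rate) follow directly from the definitions above; the sample numbers are illustrative.

```python
def disk_service_time_ms(seek_ms, rpm, io_kb, transfer_mb_s):
    """Service time = seek + average rotational latency + transfer time."""
    rotational_latency_ms = 0.5 * 60_000 / rpm          # half a rotation
    transfer_ms = io_kb / 1024 / transfer_mb_s * 1000   # time to move the data
    return seek_ms + rotational_latency_ms + transfer_ms

# Example: 5 ms average seek, 15,000 rpm drive, 64 KB I/O at 100 MB/s.
t = disk_service_time_ms(seek_ms=5, rpm=15_000, io_kb=64, transfer_mb_s=100)
print("%.2f ms per I/O, about %d IOPS" % (t, 1000 / t))
# -> 7.62 ms per I/O, about 131 IOPS
```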
S.no   RGPV questions                                      Year       Marks
Q.1    What do you understand by zoned bit recording       Dec 2014   2
       and logical block addressing?                       Dec 2012   10
Q.2    Explain various performance criteria of a disk.     Dec 2013   10
       Write the specifications of a disk.
Unit-02/Lecture-05

RAID levels – [RGPV/Dec 2015 (7), RGPV/Dec 2012 (10), RGPV/Dec 2012 (10)]
RAID levels are defined on the basis of striping, mirroring, and parity techniques. These techniques determine the data availability and performance characteristics of an array. Some RAID arrays use one technique, whereas others use a combination of techniques. Application performance and data availability requirements determine the RAID level selection.
Striping

A RAID set is a group of disks. Within each disk, a predefined number of contiguously addressable disk blocks are defined as a strip. The set of aligned strips that spans all the disks within the RAID set is called a stripe.

[Figure: physical and logical representations of a striped RAID set]
Strip size (also called stripe depth) describes the number of blocks in a strip, and is the maximum amount of data that can be written to or read from a single HDD in the set before the next HDD is accessed, assuming that the accessed data starts at the beginning of the strip. All strips in a stripe have the same number of blocks, and decreasing the strip size means that data is broken into smaller pieces when spread across the disks.

Stripe size is the strip size multiplied by the number of HDDs in the RAID set. Stripe width refers to the number of data strips in a stripe. Striped RAID does not protect data unless parity or mirroring is used. However, striping may significantly improve I/O performance. Depending on the type of RAID implementation, the RAID controller can be configured to access data across multiple HDDs simultaneously. The short sketch below shows how a logical block address maps onto a striped set.
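As a concrete illustration of striping, the following Python sketch maps a logical block number onto (disk, block-within-disk) for a RAID-0 style layout. The layout rule (strips laid out round-robin across the disks) is the standard one; the numbers are illustrative.

```python
def locate_block(logical_block, strip_blocks, num_disks):
    """Map a logical block to (disk index, physical block) in a striped set.

    Strips are laid out round-robin: strip 0 on disk 0, strip 1 on disk 1, ...
    """
    strip_number = logical_block // strip_blocks
    offset_in_strip = logical_block % strip_blocks
    disk = strip_number % num_disks
    stripe_row = strip_number // num_disks
    return disk, stripe_row * strip_blocks + offset_in_strip

# 4-disk set, 8 blocks per strip: consecutive strips land on successive disks.
for lb in (0, 7, 8, 31, 32):
    print(lb, "->", locate_block(lb, strip_blocks=8, num_disks=4))
# 0 -> (0, 0), 7 -> (0, 7), 8 -> (1, 0), 31 -> (3, 7), 32 -> (0, 8)
```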
o
Mirroring
a
.c
m
Mirroring is a technique whereby data is stored on two different HDDs, yield- ing two copies of
a
data. In the event of one HDD failure, the data is intact on the surviving HDD (see figure 3-3)
n
and the controller continues to service the host s data requests from the surviving disk of a
mirrored pair.
y
d
When the failed disk is replaced with a new disk, the controller copies the data from the
u
surviving disk of the mirrored pair. This activity is transparent to the host.
t
In addition to providing complete data redundancy, mirroring enables faster recovery from disk
S
failure. However, disk mirroring provides only data protection and is not a substitute for data
backup. Mirroring constantly captures changes in the data, whereas a backup captures point-intime images of data.
Mirroring involves duplication of data
The amount of storage capacity needed is twice the amount of data being stored. Therefore, mirroring is considered expensive and is preferred for mission-critical applications that cannot afford data loss. Mirroring improves read performance because read requests can be serviced by both disks. However, write performance deteriorates, as each write request manifests as two writes on the HDDs. In other words, mirroring does not deliver the same levels of write performance as a striped RAID.
[Figure: mirrored disks in an array]
Parity

Parity is a method of protecting striped data from HDD failure without the cost of mirroring. An additional HDD is added to the stripe width to hold parity, a mathematical construct that allows re-creation of the missing data. Parity is a redundancy check that ensures full protection of data without maintaining a full set of duplicate data.

Parity information can be stored on separate, dedicated HDDs or distributed across all the drives in a RAID set. Consider a parity RAID of five disks: the first four disks, labeled D, contain the data, and the fifth disk, labeled P, stores the parity information, which in this case is the sum of the elements in each row. If one of the Ds fails, the missing value can be calculated by subtracting the sum of the remaining elements from the parity value.
[Figure: parity RAID]
The computation of parity is represented as a simple arithmetic operation on the data. However,
o
parity calculation is a bitwise xor operation. Calculation of parity is a function of the raid
controller.
a
.c
Compared to mirroring, a parity implementation considerably reduces the cost associated with data protection. Consider a raid configuration with five disks. Four of these disks hold data, and the fifth holds parity information. Parity requires 25 percent extra disk space, compared to mirroring, which requires 100 percent extra disk space. However, there are some disadvantages of using parity. Parity information is generated from data on the data disks. Therefore, parity must be recalculated every time there is a change in data. This recalculation is time-consuming and affects the performance of the raid controller.
S.no   Rgpv question                                    Year       Marks
Q.1    Write a short note on striping and mirroring.    Dec 2012   10
Unit-02/Lecture-06
Raid levels – [Rgpv/dec 2014(3),Rgpv/dec2012(10)]
Raid 0
In a raid 0 configuration, data is striped across the HDDs in a raid set. It utilizes the full storage capacity by distributing strips of data over multiple HDDs in the raid set. To read data, all the strips are put back together by the controller. The stripe size is specified at the host level for software raid and is vendor specific for hardware raid. Figure 3-5 shows raid 0 on a storage array in which data is striped across five disks. As the number of drives in the array increases, performance improves because more data can be read or written simultaneously. Raid 0 is used in applications that need high i/o throughput. However, raid 0 does not provide data protection or availability in the event of drive failure, so it is unsuitable for applications that also require high availability.
Raid 1
In a raid 1 configuration, data is mirrored to improve fault tolerance. A raid 1 group consists of at least two HDDs. As explained in mirroring, every write is written to both disks, which is transparent to the host in a hardware raid implementation. In the event of disk failure, the impact on data recovery is the least among all raid implementations, because the raid controller uses the mirror drive for data recovery and continuous operation. Raid 1 is suitable for applications that require high availability.
Nested Raid
Most data centers require both data redundancy and performance from their raid arrays. Raid 0+1 and raid 1+0 combine the performance benefits of raid 0 with the redundancy benefits of raid 1. They use striping and mirroring techniques and combine their benefits. These types of raid require an even number of disks, the minimum being four (see figure 3-7).
Raid 1+0 is also known as raid 10 (ten) or raid 1/0. Similarly, raid 0+1 is also known as raid 01 or raid 0/1. Raid 1+0 performs well for workloads that use small, random, write-intensive i/o. Some applications that benefit from raid 1+0 include the following:
 High transaction rate online transaction processing (oltp)
 Large messaging installations
 Database applications that require high i/o rate, random access, and high availability
A common misconception is that raid 1+0 and raid 0+1 are the same. Under normal conditions, raid levels 1+0 and 0+1 offer identical benefits. However, rebuild operations in the case of disk failure differ between the two.
Raid 1+0 is also called striped mirror. The basic element of raid 1+0 is a mirrored pair,
which means that data is first mirrored and then both copies of data are striped across
multiple HDDs in a raid set. When replacing a failed drive, only the mirror is rebuilt. In other
words, the disk array controller uses the surviving drive in the mirrored pair for data recovery
and continuous operation. Data from the surviving disk is copied to the replacement disk.
Raid 0+1 is also called mirrored stripe. The basic element of raid 0+1 is a stripe. This means that the process of striping data across HDDs is performed first and then the entire stripe is mirrored. If one drive fails, the entire stripe is faulted. A rebuild operation copies the entire stripe, copying data from each disk in the healthy stripe to an equivalent disk in the failed stripe. This causes increased and unnecessary i/o load on the surviving disks and makes the raid set more vulnerable to a second disk failure.
Figure 3-7: Nested raid
Raid 3
Raid 3 stripes data for high performance and uses parity for improved fault tolerance. Parity information is stored on a dedicated drive so that data can be reconstructed if a drive fails. For example, of five disks, four are used for data and one is used for parity. Therefore, the total disk space required is 1.25 times the size of the data disks. Raid 3 always reads and writes complete stripes of data across all disks, as the drives operate in parallel. There are no partial writes that update one out of many strips in a stripe.
Raid 3 provides good bandwidth for the transfer of large volumes of data. Raid 3 is used in
applications that involve large sequential data access, such as video streaming.
Raid 4
Similar to raid 3, raid 4 stripes data for high performance and uses parity for improved fault tolerance. Data is striped across all disks except the parity disk in the array. Parity information is stored on a dedicated disk so that the data can be rebuilt if a drive fails. Striping is done at the block level.

Unlike raid 3, data disks in raid 4 can be accessed independently, so that specific data elements can be read or written on a single disk without reading or writing an entire stripe. Raid 4 provides good read throughput and reasonable write throughput.
Raid 5
Raid 5 is a versatile raid implementation. It is similar to raid 4 because it uses striping, and the drives (strips) are independently accessible. The difference between raid 4 and raid 5 is the parity location. In raid 4, parity is written to a dedicated drive, creating a write bottleneck for the parity disk. In raid 5, parity is distributed across all disks, which overcomes the write bottleneck. The figure below illustrates the raid 5 implementation. Raid 5 is preferred for messaging, data mining, medium-performance media serving, and relational database management system (RDBMS) implementations in which database administrators (DBAs) optimize data access.
Raid 5
Raid 6
Raid 6 works the same way as raid 5, except that raid 6 includes a second parity element to enable survival in the event of the failure of two disks in a raid group. Therefore, a raid 6 implementation requires at least four disks. Raid 6 distributes the parity across all the disks. The write penalty in raid 6 is higher than that in raid 5; therefore, raid 5 writes perform better than raid 6. The rebuild operation in raid 6 may take longer than that in raid 5 because of the presence of two parity sets.
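Before comparing raid types, note that the capacity cost of each level reduces to simple arithmetic. The Python sketch below computes usable capacity from the standard textbook formulas; the disk counts and sizes are illustrative.

# Sketch: usable capacity of common raid levels (standard textbook formulas).
# Disk count and size are illustrative values.

def usable_capacity(level, disks, size_gb):
    """Return usable capacity in GB for a raid set of `disks` drives."""
    if level == 0:                     # striping only, no redundancy
        return disks * size_gb
    if level == 1:                     # mirrored pair
        return size_gb
    if level in (3, 4, 5):             # one disk's worth of parity
        return (disks - 1) * size_gb
    if level == 6:                     # two disks' worth of parity
        return (disks - 2) * size_gb
    if level == 10:                    # mirrored pairs, then striped
        return (disks // 2) * size_gb
    raise ValueError("unsupported level")

for lvl in (0, 1, 5, 6, 10):
    n = 2 if lvl == 1 else 6
    print(f"raid {lvl}: {usable_capacity(lvl, n, 500)} GB usable from {n} x 500 GB")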
Comparison of different raid types
S.no   Rgpv question                                   Year       Marks
Q.1    Write down different levels of RAID             Dec 2015   7
       and compare them.                               Dec 2012   10
                                                       Dec 2011   10
Q.2    How is RAID 4 different from RAID 3?            Dec 2014   3
Unit-02/Lecture-07
Disk service time - [Rgpv/dec2012(10)]
Disk service time (Ts) is the sum of the disk's rotational latency (L), average seek time (T), and internal data transfer time (X):

Ts = L + T + X

Hence the components of disk service time are rotational latency, average seek time, and internal data transfer time.

In a random I/O operation the seek time is high, because the R/W head must seek different sectors on different tracks of the platter to read or write each I/O. Seek time therefore contributes the largest percentage of the disk service time in a random I/O operation.
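As a worked example, the following Python sketch plugs representative numbers into Ts = L + T + X; the rpm, seek time, transfer rate, and i/o size are illustrative assumptions, not the specifications of any particular drive.

# Sketch: computing disk service time Ts = L + T + X with illustrative numbers.

rpm = 15000
avg_rotational_latency_ms = 0.5 * (60_000 / rpm)   # half a revolution: 2.0 ms
avg_seek_time_ms = 4.0                              # assumed drive spec
transfer_rate_mb_s = 100.0                          # assumed internal transfer rate
io_size_kb = 8

transfer_time_ms = (io_size_kb / 1024) / transfer_rate_mb_s * 1000  # ~0.078 ms

ts = avg_rotational_latency_ms + avg_seek_time_ms + transfer_time_ms
print(f"Ts = {ts:.2f} ms")   # seek time dominates for small random i/o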
S.no   Rgpv question                                             Year       Marks
Q.1    Which components constitute the disk service time?        Dec 2012   10
       Which component contributes the largest percentage
       of the disk service time in a random I/O operation?
Additional Topic/Lecture-08
Intelligent disk subsystems overview
Intelligent disk subsystems represent the third level of complexity for controllers after
jbods and raid arrays. The controllers of intelligent disk subsystems offer additional functions
over and above those offered by raid. In the disk subsystems that are currently available on
the market these functions are usually instant copies, remote mirroring and lun masking.
Instant copies
Instant copies can virtually copy data sets of several terabytes within a disk subsystem in a few seconds. Virtual copying means that the disk subsystem fools the attached servers into believing that it is capable of copying such large data quantities in such a short space of time; the actual copying process takes significantly longer. However, the same server, or a second server, can access the virtually copied data after a few seconds.
There are numerous alternative implementations of instant copies. One thing that all implementations have in common is that the pretence of being able to copy data in a matter of seconds costs resources. All realizations of instant copies require controller computing time and cache, and place a load on internal i/o channels and hard disks. The different implementations of instant copy degrade performance at different times. However, it is not possible to choose the most favorable implementation alternative depending upon the application used, because real disk subsystems only ever realize one implementation alternative of instant copy.
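One common realization of instant copy is copy-on-write: the copy is created instantly as a set of pointers to the original blocks, and a block is physically copied only when it is first overwritten. The Python sketch below is a minimal, hypothetical model of that idea, not the mechanism of any specific disk subsystem.

# Sketch: copy-on-write instant copy (one of several possible realizations).
# A "copy" starts as pointers to the original; blocks are copied on first write.

class Volume:
    def __init__(self, blocks):
        self.blocks = blocks                  # block number -> data

class InstantCopy:
    def __init__(self, source):
        self.source = source
        self.own = {}                         # blocks copied out so far

    def read(self, n):
        return self.own.get(n, self.source.blocks[n])

    def write_source(self, n, data):
        """Host writes to the original: preserve the old block for the copy first."""
        if n not in self.own:
            self.own[n] = self.source.blocks[n]   # the copy-on-write step
        self.source.blocks[n] = data

vol = Volume({0: b"A", 1: b"B"})
snap = InstantCopy(vol)          # "copied" in an instant: just pointers
snap.write_source(0, b"X")       # server 1 keeps working on the original
print(snap.read(0), vol.blocks[0])   # b'A' b'X' - the copy still sees old data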
Figure: Instant copies can virtually copy several terabytes of data within a disk subsystem in a few seconds. Server 1 works on the original data (1). The original data is virtually copied in a few seconds. Then server 2 can work with the data copy, whilst server 1 continues to operate with the original data.
Additional Topic Unit - 02/Lecture-09
Disk drive components
A disk drive uses a rapidly moving arm to read and write data across a flat platter coated with magnetic particles. Data is transferred from the magnetic platter through the r/w head to the computer. Several platters are assembled together with the r/w head and controller, most commonly referred to as a hard disk drive (HDD). Data can be recorded and erased on a magnetic disk any number of times. This section details the different components of the disk, the mechanism for organizing and storing data on disks, and the factors that affect disk performance.
Key components of a disk drive are the platter, spindle, read/write head, actuator arm assembly, and controller.
Disk drive components
Platter
A typical HDD consists of one or more flat circular disks called platters. The data is recorded on these platters in binary codes. The set of rotating platters is sealed in a case, called a head disk assembly (HDA). A platter is a rigid, round disk coated with magnetic material on both surfaces (top and bottom). The data is encoded by polarizing the magnetic area, or domains, of the disk surface. Data can be written to or read from both surfaces of the platter. The number of platters and the storage capacity of each platter determine the total capacity of the drive.
Spindle
A spindle connects all the platters, as shown in figure 2-3, and is connected to a motor. The motor rotates the spindle at a constant speed. The disk platter spins at a speed of several thousand revolutions per minute (rpm). Disk drives have spindle speeds of 7,200 rpm, 10,000 rpm, or 15,000 rpm. Disks used in current storage systems have a platter diameter of 3.5" (90 mm). When the platter spins at 15,000 rpm, the outer edge is moving at around 25 percent of the speed of sound. The speed of the platter is increasing with improvements in technology, although the extent to which it can be improved is limited.
Read/write head
Read/write (r/w) heads, shown in figure 2-4, read and write data from or to a platter. Drives
have two r/w heads per platter, one for each surface of the platter. The r/w head changes
the magnetic polarization on the surface of the platter when writing data. While reading
data, this head detects magnetic polarization on the surface of the platter. During reads and
writes, the r/w head senses the magnetic polarization and never touches the surface of the
platter. When the spindle is rotating, there is a microscopic air gap between the r/w heads
and the platters, known as the head flying height. This air gap is removed when the spindle
stops rotating and the r/w head rests on a special area on the platter near the spindle. This area
is called the landing zone. The landing zone is coated with a lubricant to reduce friction
between the head and the platter.
The logic on the disk drive ensures that heads are moved to the landing zone before they touch
the surface. If the drive malfunctions and the r/w head accidentally touches the surface of the
platter outside the landing zone, a head crash occurs. In a head crash, the magnetic coating on
the platter is scratched and may cause damage to the r/w head. A head crash generally results
in data loss.
Actuator arm assembly

The r/w heads are mounted on the actuator arm assembly, which positions the r/w head at the location on the platter where the data needs to be written or read.

Controller

The controller (see figure 2-2 [b]) is a printed circuit board, mounted at the bottom of a disk drive. It consists of a microprocessor, internal memory, circuitry, and firmware. The firmware controls power to the spindle motor and the speed of the motor. It also manages communication between the drive and the host. In addition, it controls the r/w operations by moving the actuator arm and switching between different r/w heads, and performs the optimization of data access.
Additional Topic Unit - 02/Lecture-10
Physical disk structure
Data on the disk is recorded on tracks, which are concentric rings on the platter around the
spindle. The tracks are numbered, starting from zero, from the outer edge of the platter. The
number of tracks per inch (tpi) on the platter (or the track density) measures how tightly the
tracks are packed on a platter.
Each track is divided into smaller units called sectors. A sector is the smallest, individually addressable unit of storage. The track and sector structure is written on the platter by the drive manufacturer using a formatting operation. The number of sectors per track varies according to the specific drive. The first personal computer disks had 17 sectors per track. Recent disks have a much larger number of sectors on a single track. There can be thousands of tracks on a platter, depending on the physical dimensions and recording density of the platter.
m
Typically, a sector holds 512 bytes of user data, although some disks can be formatted with
a
larger sector sizes. In addition to user data, a sector also stores other information, such as
n
sector number, head number or platter number, and track number. This information helps the
y
d
controller to locate the data on the drive, but storing this information consumes space on the
disk. Consequently, there is a difference between the capacity of an unformatted disk and a
u
format- ted one. Drive manufacturers generally advertise the unformatted capacity — for
t
example, a disk advertised as being 500gb will only hold 465.7gb of user data, and the
S
remaining 34.3gb is used for metadata.
A cylinder is the set of identical tracks on both surfaces of each drive platter. The location of drive heads is referred to by cylinder number, not by track number.
Disk structure: sectors, tracks, and cylinders
Additional Topic Unit - 02/Lecture-11
Hot spares [Rgpv/dec2015(2)]
A hot spare refers to a spare HDD in a raid array that temporarily replaces a failed HDD of a raid set. A hot spare takes the identity of the failed HDD in the array. One of the following methods of data recovery is performed, depending on the raid implementation:
 If parity raid is used, the data is rebuilt onto the hot spare from the parity and the data on the surviving HDDs in the raid set.
 If mirroring is used, the data from the surviving mirror is used to copy the data.
When the failed HDD is replaced with a new HDD, one of the following takes place:
 The hot spare replaces the failed HDD permanently. This means that it is no longer a hot spare, and a new hot spare must be configured on the array.
 When a new HDD is added to the system, data from the hot spare is copied to it. The hot spare returns to its idle state, ready to replace the next failed drive.
A hot spare should be large enough to accommodate data from a failed drive. Some systems implement multiple hot spares to improve data availability. A hot spare can be configured as automatic or user initiated, which specifies how it will be used in the event of disk failure. In an automatic configuration, when the recoverable error rate for a disk exceeds a predetermined threshold, the disk subsystem tries to copy data from the failing disk to the hot spare automatically. If this task completes before the damaged disk fails, the subsystem switches to the hot spare and marks the failing disk as unusable. Otherwise, it uses parity or the mirrored disk to recover the data. In the case of a user-initiated configuration, the administrator has control of the rebuild process. For example, the rebuild could occur overnight to prevent any degradation of system performance. However, the system is vulnerable to another failure if a hot spare is unavailable.
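In the automatic case, the logic is essentially a threshold check followed by a proactive copy. The Python sketch below models that decision; the error threshold and the disk model are hypothetical illustrations.

# Sketch: automatic hot-spare handling driven by a recoverable-error threshold.
# Threshold value and disk model are hypothetical.

ERROR_THRESHOLD = 50   # recoverable errors before a proactive copy (assumed)

class Disk:
    def __init__(self, name):
        self.name = name
        self.recoverable_errors = 0
        self.failed = False

def check_and_spare(disk, hot_spare_pool):
    """Copy a failing disk to a hot spare before it dies, if one is free."""
    if disk.failed or disk.recoverable_errors <= ERROR_THRESHOLD:
        return None
    if not hot_spare_pool:
        return None                      # vulnerable: no spare available
    spare = hot_spare_pool.pop()
    print(f"copying {disk.name} -> {spare.name}; marking {disk.name} unusable")
    disk.failed = True                   # taken out of service after the copy
    return spare

d = Disk("disk3"); d.recoverable_errors = 75
pool = [Disk("spare0")]
check_and_spare(d, pool)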
Modern raid controllers can manage a common pool of hot spare disks for several virtual raid
disks. Hot spare disks can be defined for all raid levels that offer redundancy.
The recreation of the data from a defective hard disk takes place at the same time as the write and read operations of the server to the virtual hard disk, so that from the point of view of the server at least some performance reduction can be observed. Modern hard disks come with self-diagnosis programs that report an increase in write and read errors to the system administrator in plenty of time: "Caution! I am about to depart this life. Please replace me with a new disk. Thank you!" To this end, the individual hard disks store the data with a redundant code such as the hamming code. The hamming code permits the correct recreation of the data, even if individual bits are changed on the hard disk. If the system is looked after properly, you can assume that the installed physical hard disks will hold out for a while. Therefore, for the benefit of higher performance, it is generally an acceptable risk to give access by the server a higher priority than the recreation of the data of an exchanged physical hard disk.
A further side-effect of bringing together several physical hard disks to form a virtual hard disk is the higher capacity of the virtual hard disk. As a result, fewer device addresses are used up in the i/o channel, and the administration of the server is also simplified, because fewer hard disks (drive letters or volumes) need to be managed.
Hot spare disk
The disk subsystem provides the server with two virtual disks for which a common hot spare
disk is available (1). Due to the redundant data storage the server can continue to process
data even though a physical disk has failed, at the expense of a reduction in performance (2).
The raid controller recreates the data from the defective disk on the hot spare disk (3). After
the defective disk has been replaced a hot spare disk is once again available (4).
S.no   Rgpv question           Year       Marks
Q.1    What is hot sparing?    Dec 2014   2
Additional Topic Unit - 02/Lecture-12
Front end
The front end provides the interface between the storage system and the host. It consists of two components: front-end ports and front-end controllers. The front-end ports enable hosts to connect to the intelligent storage system. Each front-end port has processing logic that executes the appropriate transport protocol, such as scsi, fibre channel, or iscsi, for storage connections. Redundant ports are provided on the front end for high availability.

Front-end controllers route data to and from cache via the internal data bus. When cache receives write data, the controller sends an acknowledgment message back to the host. Controllers optimize i/o processing by using command queuing algorithms.
Front-end command queuing
Command queuing is a technique implemented on front-end controllers. It determines the execution order of received commands and can reduce unnecessary drive head movements and improve disk performance. When a command is received for execution, the command queuing algorithm assigns a tag that defines the sequence in which commands should be executed. With command queuing, multiple commands can be executed concurrently based on the organization of data on the disk, regardless of the order in which the commands were received.
The most commonly used command queuing algorithms are as follows:
 First in first out (fifo): this is the default algorithm, where commands are executed in the order in which they are received. There is no reordering of requests for optimization; therefore, it is inefficient in terms of performance.
 Seek time optimization: commands are executed based on optimizing read/write head movements, which may result in reordering of commands. Without seek time optimization, commands are executed in the order they are received, for example, in the order a, b, c, and d. The radial movement required by the head to execute c immediately after a is less than what would be required to execute b. With seek time optimization, the command execution sequence would be a, c, b, and d, as shown in the figure and in the sketch below.
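A minimal Python sketch of this reordering idea follows. It greedily executes whichever queued command is nearest the current head position (a shortest-seek-first policy); the track numbers are illustrative, and real controllers use richer algorithms.

# Sketch: seek time optimization as shortest-seek-first reordering.
# Track numbers are illustrative; real controllers use richer heuristics.

def reorder_by_seek(commands, head_track):
    """Greedily execute whichever queued command is nearest the head."""
    pending = dict(commands)          # name -> target track
    order = []
    while pending:
        name = min(pending, key=lambda n: abs(pending[n] - head_track))
        head_track = pending.pop(name)
        order.append(name)
    return order

queue = [("a", 10), ("b", 80), ("c", 25), ("d", 90)]
print(reorder_by_seek(queue, head_track=0))   # ['a', 'c', 'b', 'd']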
Additional Topic Unit - 02/Lecture-13
Hard disks and internal i/o channels
The controller of the disk subsystem must ultimately store all data on physical hard disks.
Standard hard disks that range in size from 36 gb to 1 tb are currently used for this
purpose. Since the maximum number of hard disks that can be used is often limited, the
size of the hard disk used gives an indication of the maximum capacity of the overall disk
subsystem.
When selecting the size of the internal physical hard disks it is necessary to weigh the requirements of maximum performance against those of the maximum capacity of the overall system. With regard to performance it is often beneficial to use smaller hard disks at the expense of maximum capacity: given the same capacity, if more hard disks are available in a disk subsystem, the data is distributed over several hard disks, and thus the overall load is spread over more arms and read/write heads, and usually over more i/o channels (figure 2.4). For most applications, medium-sized hard disks are sufficient.
If small internal hard disks are used, the load is distributed over more hard disks and thus over more read and write heads. On the other hand, the maximum storage capacity is reduced, since in both disk subsystems only 16 hard disks can be fitted.
For applications with extremely high performance requirements, smaller hard disks should be considered. However, consideration should be given to the fact that more modern, larger hard disks generally have shorter seek times and larger caches, so it is necessary to weigh up carefully which hard disks will offer the highest performance for a certain load profile in each individual case.
Standard i/o techniques such as scsi, fibre channel, increasingly serial ata (sata) and serial attached scsi (sas) and, still to a degree, serial storage architecture (ssa) are used for internal i/o channels between connection ports and controller, as well as between controller and internal hard disks. Sometimes, however, proprietary (i.e., manufacturer-specific) i/o techniques are used. Regardless of the i/o technology used, the i/o channels can be designed with built-in redundancy in order to increase the fault tolerance of a disk subsystem. The following cases can be differentiated here:
• active
In active cabling the individual physical hard disks are only connected via one i/o
channel (figure 2.5, left). If this access path fails, then it is no longer possible to
access the data.
• active/passive
In active/passive cabling the individual hard disks are connected via two i/o channels (figure 2.5, right). In normal operation the controller communicates with the hard disks via the first i/o channel, and the second i/o channel is not used. In the event of the failure of the first i/o channel, the disk subsystem switches from the first to the second i/o channel.
• active/active (no load sharing)
In this cabling method the controller uses both i/o channels in normal operation (figure 2.6, left). The hard disks are divided into two groups: in normal operation the first group is addressed via the first i/o channel and the second via the second i/o channel. If one i/o channel fails, both groups are addressed via the other i/o channel.
• active/active (load sharing)
In this approach all hard disks are addressed via both i/o channels in normal operation (figure 2.6, right). The controller divides the load dynamically between the two i/o channels so that the available hardware can be optimally utilized. If one i/o channel fails, then the communication goes through the other channel only.
Active cabling is the simplest and thus also the cheapest to realise, but offers no protection against failure. Active/passive cabling is the minimum needed to protect against failure, whereas active/active cabling with load sharing best utilises the underlying hardware.
Implementation of raid
There are two types of raid implementation: hardware and software.
Software Raid
Software raid uses host-based software to provide raid functions. It is implemented at the
operating-system level and does not use a dedicated hardware controller to manage the raid
array.
Software raid implementations offer cost and simplicity benefits when compared with hardware raid. However, they have the following limitations:
 Performance: software raid affects overall system performance because of the additional cpu cycles required to perform raid calculations. The performance impact is more pronounced for complex implementations of raid.
 Supported features: software raid does not support all raid levels.
 Operating system compatibility: software raid is tied to the host operating system; hence, upgrades to software raid or to the operating system should be validated for compatibility. This leads to inflexibility in the data processing environment.
Hardware raid
In hardware raid implementations, a specialized hardware controller is implemented either on the host or on the array. These implementations vary in the way the storage array interacts with the host.

Controller card raid is a host-based hardware raid implementation in which a specialized raid controller is installed in the host and HDDs are connected to it. The raid controller interacts with the hard disks using a pci bus. Manufacturers also integrate raid controllers on motherboards. This integration reduces the overall cost of the system, but does not provide the flexibility required for high-end storage systems.
Reference

Book                                                    Author                                    Priority
Information Storage and Management                      G. Somasundaram, Alok Shrivastava         1
Storage Networks Explained: Basics and Application      Ulf Troppens, Wolfgang Mueller-Friedt,    2
of Fibre Channel SAN, NAS, iSCSI                        Rainer Erkens, Rainer Wolafka,
                                                        Nils Haustein
Cloud Computing: Principles, Systems & Applications     Nick Antonopoulos, Lee Gillam             3
Unit – 03
Introduction to Networked Storage
Unit - 03/Lecture - 01
Das (direct attached storage) [Rgpv/dec2014(2), Rgpv/dec2013(7), Rgpv/dec2012(10)]
Das is an architecture where storage connects directly to servers. Applications access data
from Das using block-level access protocols. The internal HDD of a host, tape libraries, and
directly connected external HDD packs are some examples of das.
Types of Das

Das is classified as internal or external, based on the location of the storage device with respect to the host.
Internal das

In internal das architectures, the storage device is internally connected to the host by a serial or parallel bus. The physical bus has distance limitations and can be sustained only over a short distance for high-speed connectivity. In addition, most internal buses can support only a limited number of devices, and they occupy a large amount of space inside the host, making maintenance of other components difficult.
External das
In external das architectures, the server connects directly to the external storage device. In most cases, communication between the host and the storage device takes place over the scsi or FC protocol. Compared to internal das, external das overcomes the distance and device-count limitations and provides centralized management of storage devices.
Das benefits and limitations
Das requires a relatively lower initial investment than storage networking. Storage networking architectures are discussed later. Das configuration is simple and can be deployed easily and rapidly. Setup is managed using host-based tools, such as the host os, which makes storage management tasks easy for small and medium enterprises. Das is the simplest solution when compared to other storage networking models and requires fewer management tasks and fewer hardware and software elements to set up and operate.
However, das does not scale well. A storage device has a limited number of ports, which restricts the number of hosts that can directly connect to the storage. The limited bandwidth in das restricts the available i/o processing capability. When capacities are being reached, service availability may be compromised, and this has a ripple effect on the performance of all hosts attached to that specific device or array. The distance limitations associated with implementing das because of direct connectivity requirements can be addressed by using fibre channel connectivity. Das does not make optimal use of resources, due to its limited ability to share front-end ports. In das environments, unused resources cannot be easily re-allocated, resulting in islands of over-utilized and under-utilized storage pools.
Disk utilization, throughput, and cache memory of a storage device, along with the virtual memory of a host, govern the performance of das. Raid-level configurations, storage controller protocols, and the efficiency of the bus are additional factors that affect the performance of das. The absence of storage interconnects and network latency gives das the potential to outperform other storage networking configurations.
Das disk drive interfaces
The host and the storage device in das communicate with each other by using predefined protocols such as IDE/ATA, SATA, SAS, SCSI, and FC. These protocols are implemented on the HDD controller. Therefore, a storage device is also known by the name of the protocol it supports. This section describes each of these storage interfaces in detail.
IDE/ATA
An integrated device electronics/advanced technology attachment (IDE/ATA) disk supports the IDE protocol. The term IDE/ATA conveys the dual naming conventions for various generations and variants of this interface. The IDE component in IDE/ATA provides the specification for the controllers connected to the computer's motherboard for communicating with the device attached. The ATA component is the interface for connecting storage devices, such as cd-roms, floppy disk drives, and HDDs, to the motherboard.
IDE/ATA has a variety of standards and names, such as ATA, ATA/atapi, eide, ATA-2, fast ATA,
ATA-3, ultra ATA, and ultra dma. The latest version of ATA—ultra dma/133—supports a
throughput of 133 mb per second.
In a master-slave configuration, an ATA interface supports two storage devices per connector.
However, if the performance of the drive is important, sharing a port between two devices is
not recommended.
SATA
SATA (serial ATA) is a serial version of the IDE/ATA specification. SATA is a disk-interface technology that was developed by a group of the industry's leading vendors with the aim of replacing parallel ATA.
SATA provides point-to-point connectivity up to a distance of one meter and enables data transfer at a speed of 150 mb/s. Enhancements to SATA have increased the data transfer speed up to 600 mb/s.
A SATA bus directly connects each storage device to the host through a dedicated link, making use of low-voltage differential signaling (LVDS). LVDS is an electrical signaling system that can provide high-speed connectivity over low-cost, twisted-pair copper cables. For data transfer, a SATA bus uses LVDS with a voltage of 250 mv. A SATA bus uses a small 7-pin connector and a thin cable for connectivity. A SATA port uses 4 signal pins, which improves its pin efficiency compared to parallel ATA, which uses 26 signal pins for connecting an 80-conductor ribbon cable to a 40-pin header connector. SATA devices are hot-pluggable, which means that they can be connected or removed while the host is up and running. A SATA port permits single-device connectivity; connecting multiple SATA drives to a host requires multiple ports to be present on the host. The single-device connectivity enforced in SATA eliminates the performance problems caused by cable or port sharing in IDE/ATA.
Parallel scsi [Rgpv Dec2015(3)]
Scsi is available in a variety of interfaces. Parallel scsi (referred to as scsi) is one of the oldest and most popular forms of storage interface used in hosts. Scsi is a set of standards used for connecting a peripheral device to a computer and transferring data between them. Often, scsi is used to connect HDDs and tapes to a host. Scsi can also connect a wide variety of other devices, such as scanners and printers. Communication between the hosts and the storage devices uses the scsi command set. Since its inception, scsi has undergone rapid revisions, resulting in continuous performance improvements. The oldest scsi variant, called scsi-1, provided a data transfer rate of 5 mb/s; scsi ultra 320 provides data transfer speeds of 320 mb/s.
S.no   Rgpv question                                 Year       Marks
Q.1    Write down the advantages and                 Dec 2014   2
       disadvantages of DAS.                         Dec 2013   7
Q.2    What are the limitations of DAS?              Dec 2012   10
Unit-03/Lecture-02
NAS (network attached storage) – [Rgpv/dec2015(2), Rgpv/dec2012(10), Rgpv/dec2011(10)]
Network attached storage (NAS) is an ip-based file-sharing device attached to a local area network. NAS provides the advantages of server consolidation by eliminating the need for multiple file servers. It provides storage consolidation through file-level data access and sharing. NAS is a preferred storage solution that enables clients to share files quickly and directly with minimum storage management overhead. NAS uses network and file-sharing protocols, including the file transfer protocol (ftp) and other protocols for both unix and windows environments, to perform filing and storage functions. Recent advancements in networking technology have enabled NAS to scale up to enterprise requirements for improved performance and reliability in accessing data.
A NAS device is a dedicated, high-performance, high-speed, single-purpose file serving and storage system. NAS serves a mix of clients and servers over an ip network. Most NAS devices support multiple interfaces and networks. A NAS device uses its own operating system and integrated hardware and software components to meet specific file-service needs. Its operating system is optimized for file i/o and, therefore, performs file i/o better than a general-purpose server. As a result, a NAS device can serve more clients than traditional file servers, providing the benefit of server consolidation.
Benefits of NAS
NAS offers the following benefits:
 Supports comprehensive access to information: enables efficient file sharing and supports many-to-one and one-to-many configurations. The many-to-one configuration enables a NAS device to serve many clients simultaneously. The one-to-many configuration enables one client to connect with many NAS devices simultaneously.
 Improved efficiency: eliminates the bottlenecks that occur during file access from a general-purpose file server, because NAS uses an operating system specialized for file serving. It improves the utilization of general-purpose servers by relieving them of file-server operations.
 Improved flexibility: compatible with clients on both unix and windows platforms using industry-standard protocols. NAS is flexible and can serve requests from different types of clients from the same source.
 Centralized storage: centralizes data storage to minimize data duplication on client workstations, simplify data management, and ensure greater data protection.
 Simplified management: provides a centralized console that makes it possible to manage file systems efficiently.
 Scalability: scales well in accordance with different utilization profiles and types of business applications because of its high-performance, low-latency design.
 High availability: offers efficient replication and recovery options, enabling high data availability. NAS uses redundant networking components that provide maximum connectivity options. A NAS device can use clustering technology for failover.
 Security: ensures security, user authentication, and file locking in conjunction with industry-standard security schemas.
NAS (network attached storage) implementation
There are two types of NAS implementations: integrated and gateway. An integrated NAS device has all of its components and storage in a single enclosure. In a gateway implementation, the NAS head shares its storage with a SAN environment.
Integrated NAS
An integrated NAS device has all the components of NAS, such as the NAS head and storage, in a single enclosure, or frame. This makes the integrated NAS a self-contained environment. The NAS head connects to the ip network to provide connectivity to the clients and service the file i/o requests. The storage consists of a number of disks that can range from low-cost ATA to high-throughput FC disk drives. Management software manages the NAS head and storage configurations.
An integrated NAS solution ranges from a low-end device, which is a single enclosure, to a
high-end solution that can have an externally connected storage array.
A low-end appliance-type NAS solution is suitable for applications that a small department
may use, where the primary need is consolidation of storage, rather than high performance or
advanced features such as disaster recovery and business continuity. This solution is fixed in
capacity and might not be upgradable beyond its original configuration. To expand the
capacity, the solution must be scaled by deploying additional units, a task that increases
management overhead because multiple devices have to be administered.
In a high-end NAS solution, external and dedicated storage can be used. This enables
independent scaling of the capacity in terms of NAS heads or storage. However, there is a limit
to scalability of this solution.
Gateway NAS
A gateway NAS device consists of an independent NAS head and one or more storage arrays. The NAS head performs the same functions that it does in the integrated solution, while the storage is shared with other applications that require block-level i/o. Management functions in this type of solution are more complex than those in an integrated environment because there are separate administrative tasks for the NAS head and the storage. In addition to the components that are explicitly tied to the NAS solution, a gateway solution can also utilize the FC infrastructure, such as switches, directors, or direct-attached storage arrays.

The gateway NAS is the most scalable because NAS heads and storage arrays can be independently scaled up when required. Adding processing capacity to the NAS gateway is an example of scaling. When the storage limit is reached, capacity can be added on the SAN independently of the NAS head. Administrators can increase performance and i/o processing capabilities for their environments without purchasing additional interconnect devices and storage. Gateway NAS enables high utilization of storage capacity by sharing it with the SAN environment.
Integrated NAS connectivity
An integrated solution is self-contained and can connect into a standard ip network, although the specifics of how devices are connected within a NAS implementation vary by vendor and model. In some cases, storage is embedded within a NAS device and is connected to the NAS head through internal connections, such as ATA or scsi controllers. In others, the storage may be external but connected by using scsi controllers. In a high-end integrated NAS model, external storage can be directly connected by FC or by dedicated FC switches. In the case of a low-end integrated NAS model, backup traffic is shared on the same public ip network along with the regular client access traffic. In the case of a high-end integrated NAS model, an isolated backup network can be used to segment the traffic so that it does not impede client access. More complex solutions may include an intelligent storage subsystem, enabling faster backup and larger capacities while simultaneously enhancing performance. Figure 7-4 illustrates an example of integrated NAS connectivity.
Gateway NAS connectivity
In a gateway solution, front-end connectivity is similar to that in an integrated solution. An integrated environment has a fixed number of NAS heads, making it relatively easy to determine ip networking requirements. In contrast, networking requirements in a gateway environment are complex to determine due to scalability options. Adding more NAS heads may require additional networking connectivity and bandwidth.

Communication between the NAS gateway and the storage system in a gateway solution is achieved through a traditional FC SAN. To deploy a stable NAS solution, factors such as multiple paths for data, redundant fabrics, and load distribution must be considered.
Factors affecting NAS performance
As NAS uses an IP network, bandwidth and latency issues associated with IP affect NAS performance. Network congestion is one of the most significant sources of latency in a NAS environment. Other factors that affect NAS performance at different levels are:
1. Number of hops: A large number of hops can increase latency because IP processing is
required at each hop, adding to the delay caused at the router.
2. Authentication with a directory service such as LDAP, Active Directory, or NIS: The
authentication service must be available on the network, with adequate bandwidth,
and must have enough resources to accommodate the authentication load.
Otherwise, a large number of authentication requests are presented to the servers,
increasing latency. Authentication adds to latency only when authentication occurs.
3. Retransmission: Link errors, buffer overflows, and flow control mechanisms can result in retransmission. This causes packets that have not reached the specified destination to be resent. Care must be taken when configuring parameters for speed and duplex settings on the network devices and the NAS heads so that they match. Improper configuration may result in errors and retransmission, adding to latency.
4. Overutilized routers and switches: The amount of time that an overutilized device in a network takes to respond is always more than the response time of an optimally utilized or underutilized device. Network administrators can view vendor-specific statistics to determine the utilization of switches and routers in a network. Additional devices should be added if the current devices are overutilized.
5. File/directory lookup and metadata requests: NAS clients access files on NAS devices. The processing required before reaching the appropriate file or directory can cause delays. Sometimes a delay is caused by deep directory structures and can be resolved by flattening the directory structure. Poor file system layout and an overutilized disk system can also degrade performance.
6. Overutilized NAS devices: Clients accessing multiple files can cause high utilization levels on a NAS device, which can be determined by viewing utilization statistics. High utilization levels can be caused by a poor file system structure or insufficient resources in a storage subsystem.
7. Overutilized clients: The client accessing CIFS or NFS data may also be overutilized. An overutilized client requires a longer time to process the responses received from the server, increasing latency. Specific performance-monitoring tools are available for various operating systems to help determine the utilization of client resources.
S.no   Rgpv question                                          Year       Marks
Q.1    What is NAS? Explain how the performance of NAS        Dec 2015   2
       can be affected if the sender and receiver window      Dec 2012   10
       is not synchronized.
Q.2    Discuss various factors that affect NAS performance.   Dec 2012   10
Unit-03/Lecture-03
SAN (storage area network) - [Rgpv/dec2013(7)]
Direct-attached storage (das) is often referred to as a stovepiped storage environment. Hosts own the storage, and it is difficult to manage and share resources on these isolated storage devices. Efforts to organize this dispersed data led to the emergence of the storage area network (SAN). A SAN is a high-speed, dedicated network of servers and shared storage devices. Traditionally connected over fibre channel (FC) networks, a SAN forms a single storage pool and facilitates data centralization and consolidation. A SAN meets storage demands efficiently with better economies of scale. A SAN also provides effective maintenance and protection of data.
Components of SAN
A SAN consists of three basic components: servers, network infrastructure, and storage. These components can be further broken down into the following key elements: node ports, cabling, interconnecting devices (such as FC switches or hubs), storage arrays, and SAN management software.
Nodes, links, and lines
Cabling
SAN implementations use optical fiber cabling. Copper can be used for shorter distances for
back-end connectivity, as it provides a better signal-to-noise ratio for distances up to 30
meters. Optical fiber cables carry data in the form of light. There are two types of optical cables: multi-mode and single-mode.
Multi-mode fiber (mmf) cable carries multiple beams of light projected at different angles simultaneously onto the core of the cable. Based on the bandwidth, multi-mode fibers are classified as om1 (62.5µm), om2 (50µm), and laser-optimized om3 (50µm). In an mmf transmission, multiple light beams traveling inside the cable tend to disperse and collide. This collision weakens the signal strength after it travels a certain distance, a process known as modal dispersion. An mmf cable is usually used for distances of up to 500 meters because of signal degradation (attenuation) due to modal dispersion.
Single-mode fiber (smf) carries a single ray of light projected at the center of the core. These cables are available in diameters of 7-11 microns; the most common size is 9 microns. In an smf transmission, a single light beam travels in a straight line through the core of the fiber. The small core and the single light wave limit modal dispersion. Among all types of fibre cables, single-mode provides minimum signal attenuation over maximum distance (up to 10 km). A single-mode cable is used for long-distance cable runs, limited only by the power of the laser at the transmitter and the sensitivity of the receiver.

Multi-mode and single-mode fiber
Interconnect devices
Hubs, switches, and directors are the interconnect devices commonly used in a SAN.

Hubs are used as communication devices in FC-al implementations. Hubs physically connect nodes in a logical loop or a physical star topology. All the nodes must share the bandwidth because data travels through all the connection points. Because of the availability of low-cost, high-performance switches, hubs are no longer used in SANs. Switches are more intelligent than hubs and directly route data from one physical port to another. Therefore, nodes do not share the bandwidth. Instead, each node has a dedicated communication path, resulting in bandwidth aggregation.
SAN management software
SAN management software manages the interfaces between hosts, interconnect devices, and storage arrays. The software provides a view of the SAN environment and enables management of various resources from one central console. It provides key management functions, including mapping of storage devices, switches, and servers, monitoring and generating alerts for discovered devices, and logical partitioning of the SAN, called zoning. In addition, the software provides management of typical SAN components such as storage components and interconnecting devices.
Connectivity of SAN (storage area network)
FC connectivity
The FC architecture supports three basic interconnectivity options: point-to-point, arbitrated loop (FC-al), and fabric connect.
Point-to-point
Point-to-point is the simplest FC configuration: two devices are connected directly to each other, as shown in figure 6-6. This configuration provides a dedicated connection for data transmission between nodes. However, the point-to-point configuration offers limited connectivity, as only two devices can communicate with each other at a given time. Moreover, it cannot be scaled to accommodate a large number of network devices. Standard das uses point-to-point connectivity.
Point-to-point topology
Fibre channel arbitrated loop
In the FC-al configuration, devices are attached to a shared loop. FC-al has the characteristics of a token ring topology and a physical star topology. In FC-al, each device contends with other devices to perform i/o operations. Devices on the loop must arbitrate to gain control of the loop. At any given time, only one device can perform i/o operations on the loop.
Fibre channel ports – [Rgpv/dec2013(7)]
Ports are the basic building blocks of an FC network. Ports on the switch can be one of the
following types:
■■ n_port: an end point in the fabric. This port is also known as the node port. Typically, it is a host port (hba) or a storage array port that is connected to a switch in a switched fabric.
■■ nl_port: a node port that supports the arbitrated loop topology. This port is also known as the node loop port.
■■ e_port: an FC port that forms the connection between two FC switches. This port is also known as the expansion port. The e_port on an FC switch connects to the e_port of another FC switch in the fabric through a link, which is called an inter-switch link (isl). Isls are used to transfer host-to-storage data as well as fabric management traffic from one switch to another. Isl is also one of the scaling mechanisms in SAN connectivity.
■■ f_port: a port on a switch that connects to an n_port. It is also known as a fabric port and cannot participate in FC-al.
■■ fl_port: a fabric port that participates in FC-al. This port is connected to the nl_ports on an FC-al loop. An fl_port also connects a loop to a switch in a switched fabric. As a result, all nl_ports in the loop can participate in FC-sw. This configuration is referred to as a public loop. In contrast, an arbitrated loop without any switches is referred to as a private loop. A private loop contains nodes with nl_ports and does not contain an fl_port.
■■ g_port: a generic port that can operate as an e_port or an f_port and determines its functionality automatically during initialization.
Fibre channel port
Fibre channel topologies [Rgpv/dec2012(10)]
There are three major fibre channel topologies, describing how a number of ports are connected together. A port in fibre channel terminology is any entity that actively communicates over the network, not necessarily a hardware port. A port is usually implemented in a device such as disk storage, an HBA on a server, or a fibre channel switch.
 Point-to-point (FC-P2P). Two devices are connected directly to each other. This is the simplest topology, with limited connectivity.
 Arbitrated loop (FC-AL). In this design, all devices are in a loop or ring, similar to token ring networking. Adding or removing a device from the loop causes all activity on the loop to be interrupted. The failure of one device causes a break in the ring. Fibre channel hubs exist to connect multiple devices together and may bypass failed ports. A loop may also be made by cabling each port to the next in a ring.
    A minimal loop containing only two ports, while appearing to be similar to FC-P2P, differs considerably in terms of the protocol.
    Only one pair of ports can communicate concurrently on a loop.
    Maximum speed of 8GFC.
 Switched fabric (FC-SW). All devices or loops of devices are connected to fibre channel switches, similar conceptually to modern ethernet implementations. Advantages of this topology over FC-P2P or FC-AL include:
 The switches manage the state of the fabric, providing optimized interconnections.
 The traffic between two ports flows through the switches only; it is not transmitted to any other port.
 Failure of a port is isolated and should not affect the operation of other ports.
 Multiple pairs of ports may communicate simultaneously in a fabric.
S.no   Rgpv question                                   Year       Marks
Q.1    Explain different types of FC ports.            Dec 2013   7
Q.2    Explain different interfaces in FC.             Dec 2012   10
Q.3    Discuss the advantages of FC-SW and FC-AL.
Unit-03/Lecture-04
Content-addressed storage (CAS) – [Rgpv/dec2014(3), Rgpv/dec2013(10)]
CAS is an object-based system that has been purposely built for storing fixed content data. It is designed for secure online storage and retrieval of fixed content. Unlike file-level and block-level data access that use file names and the physical location of data for storage and retrieval, CAS stores user data and its attributes as separate objects. The stored object is assigned a globally unique address known as a content address (ca). This address is derived from the object's binary representation. CAS provides an optimized and centrally managed storage solution that can support single-instance storage (sis) to eliminate multiple copies of the same data.
Features and benefits of CAS
CAS has emerged as an alternative to tape and optical solutions because it overcomes many of their obvious deficiencies. CAS also meets the demand to improve data accessibility and to properly protect, dispose of, and ensure service level agreements for archived data. The features and benefits of CAS include the following:
Content authenticity: it assures the genuineness of stored content. This is achieved by generating a unique content address and automating the process of continuously checking and recalculating the content address for stored objects. Content authenticity is assured because the address assigned to each piece of fixed content is as unique as a fingerprint. Every time an object is read, CAS uses a hashing algorithm to recalculate the object's content address as a validation step and compares the result to its original content address. If the object fails validation, it is rebuilt from its mirrored copy.
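A minimal Python sketch of this recalculate-and-compare step follows. SHA-256 is used purely as an illustrative hashing algorithm; real CAS products derive content addresses in vendor-specific ways.

# Sketch: deriving a content address from an object's binary representation
# and validating the object on every read. SHA-256 is an illustrative choice.
import hashlib

def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def validated_read(stored_data: bytes, original_address: str) -> bytes:
    """Recalculate the address on read; a mismatch means the object is corrupt."""
    if content_address(stored_data) != original_address:
        raise ValueError("validation failed: rebuild from mirrored copy")
    return stored_data

obj = b"fixed content record"
ca = content_address(obj)        # assigned once, at store time
validated_read(obj, ca)          # passes
# validated_read(obj + b"x", ca) # would raise: the content was altered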
Content integrity: refers to the assurance that the stored content has not been altered. The use of a hashing algorithm for content authenticity also ensures content integrity in CAS. If the fixed content is altered, CAS assigns a new address to the altered content, rather than overwriting the original fixed content, providing an audit trail and maintaining the fixed content in its original state. As an integral part of maintaining data integrity and audit trail capabilities, CAS supports parity raid protection in addition to mirroring. Every object in a CAS system is systematically checked in the background. Over time, every object is tested, guaranteeing content integrity even in the case of hardware failure, random error, or attempts to alter the content with malicious intent.
Location independence: CAS uses a unique identifier that applications can leverage to retrieve data, rather than a centralized directory, path names, or urls. Using a content address to access fixed content makes the physical location of the data irrelevant to the application requesting the data. Therefore the location from which the data is accessed is transparent to the application. This yields complete content mobility to applications across locations.
Single-instance storage (SIS): the unique signature is used to guarantee the storage of only a single instance of an object. This signature is derived from the binary representation of the object. At write time, the CAS system is polled to see if it already has an object with the same signature. If the object is already on the system, it is not stored; rather, only a pointer to that object is created. SIS simplifies storage resource management tasks, especially when handling hundreds of terabytes of fixed content.
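The write-time check can be sketched the same way; the signature is again assumed to be SHA-256, and the reference-count bookkeeping is illustrative rather than any vendor's design:

    import hashlib

    store = {}      # signature -> the single stored instance
    refcount = {}   # signature -> number of logical objects pointing at it

    def sis_write(data):
        sig = hashlib.sha256(data).hexdigest()
        if sig in store:
            refcount[sig] += 1   # object already on the system: only a pointer is created
        else:
            store[sig] = data    # first occurrence: the instance is actually stored
            refcount[sig] = 1
        return sig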
Retention enforcement: protecting and retaining data objects is a core requirement of an archive system. CAS creates two immutable components: a data object and a meta-object for every object stored. The meta-object stores the object's attributes and data handling policies. For systems that support object-retention capabilities, the retention policies are enforced until the policies expire.
Record-level protection and disposition: all fixed content is stored in CAS once and is backed up with a protection scheme. The array is composed of one or more storage clusters. Some CAS architectures provide an extra level of protection by replicating the content onto arrays located at a different location. The disposition of records also follows the stringent guidelines established by regulators for shredding and disposing of data in electronic formats.
Technology independence: the CAS system interface is impervious to technology changes. As long as the application server is able to map the original content address, the data remains accessible. Although hardware changes are inevitable, the goal of CAS hardware vendors is to ensure compatibility across platforms.
Fast record retrieval: CAS maintains all content on disks that provide sub-second time to first byte (200 ms–400 ms) in a single cluster. Random disk access in CAS enables fast record retrieval.
S.no   Rgpv question                                       Year      Marks
Q.1    To access data in a SAN, a host uses a physical     Dec2013   10
       address known as a logical address. A host using
       a CAS device does not use or need a physical
       address. Why?
Unit -03 /Lecture-06
Hub, switches, storage array [Rgpv/dec2013(7)]
A hub is the most basic networking device that connects multiple computers or other network devices together. Unlike a network switch or router, a network hub has no routing tables or intelligence on where to send information and broadcasts all network data across each connection. Most hubs can detect basic network errors such as collisions, but having all information broadcast to multiple ports can be a security risk and cause bottlenecks. In the past, network hubs were popular because they were much cheaper than a switch or router, but today most switches do not cost much more than a hub and are a much better solution for any network.

In general, a hub refers to a hardware device that enables multiple devices or connections to be connected to a computer. Another example besides the one given above is a USB hub, which allows dozens of USB devices to be connected to one computer, even though that computer may only have a few USB connections.
Switches

A switch is a device used on a computer network to physically connect devices together. Multiple cables can be connected to a switch to enable networked devices to communicate with each other. Switches manage the flow of data across a network by transmitting a received message only to the device for which the message was intended. Each networked device connected to a switch can be identified using a MAC address, allowing the switch to regulate the flow of traffic. This maximises the security and efficiency of the network.
Because of these features, a switch is often considered more "intelligent" than a network hub. Hubs provide neither security nor identification of connected devices. This means that messages have to be transmitted out of every port of the hub, greatly degrading the efficiency of the network. Switches may operate at one or more layers of the OSI model, including the data link and network layers. A device that operates simultaneously at more than one of these layers is known as a multilayer switch.
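The behavioural difference between a hub and a switch can be shown in a few lines: a hub floods every frame out of every other port, while a switch learns which MAC address sits behind which port and forwards only there. A toy model for illustration, not any real switch's firmware:

    class Switch:
        def __init__(self, num_ports):
            self.num_ports = num_ports
            self.mac_table = {}  # MAC address -> port number

        def receive(self, in_port, src_mac, dst_mac):
            self.mac_table[src_mac] = in_port      # learn where the sender lives
            if dst_mac in self.mac_table:
                return [self.mac_table[dst_mac]]   # forward to the one known port
            # unknown destination: flood like a hub, except the ingress port
            return [p for p in range(self.num_ports) if p != in_port]

A hub corresponds to the flooding branch alone, which is exactly why hubs waste bandwidth and leak traffic to every connected device.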
In switches intended for commercial use, built-in or modular interfaces make it possible to connect different types of networks, including Ethernet, Fibre Channel, ATM, ITU-T G.hn and 802.11. This connectivity can be at any of the layers mentioned. While layer-2 functionality is adequate for bandwidth-shifting within one technology, interconnecting technologies such as Ethernet and Token Ring is easier at layer 3. Devices that interconnect at layer 3 are traditionally called routers, so layer-3 switches can also be regarded as (relatively primitive) routers.
Where there is a need for a great deal of analysis of network performance and security, switches may be connected between WAN routers as places for analytic modules. Some vendors provide firewall,[3][4] network intrusion detection, and performance analysis modules that can plug into switch ports. Some of these functions may be on combined modules.

In other cases, the switch is used to create a mirror image of data that can go to an external device. Since most switch port mirroring provides only one mirrored stream, network hubs can be useful for fanning out data to several read-only analyzers, such as intrusion detection systems and packet sniffers.
Storage array
The fundamental purpose of a SAN is to provide host access to storage resources. The large storage capacities offered by modern storage arrays have been exploited in SAN environments for storage consolidation and centralization. SAN implementations complement the standard features of storage arrays by providing high availability and redundancy, improved performance, business continuity, and multiple host connectivity.
S.no   Rgpv question                                    Year      Marks
Q.1    Explain briefly the following:                   Dec2013   7
       A) node port, B) storage array, C) SAN,
       D) hub, E) switches
Additional Topic Unit - 03/Lecture - 07
JBOD: just a bunch of disks [Rgpv/dec2015(2)]
If we compare disk subsystems with regard to their controllers, we can differentiate between three levels of complexity: (1) no controller; (2) RAID controller; and (3) intelligent controller with additional services such as instant copy and remote mirroring.

If the disk subsystem has no internal controller, it is only an enclosure full of disks (JBOD). In this instance, the hard disks are permanently fitted into the enclosure and the connections for I/O channels and power supply are taken outwards at a single point. Therefore, a JBOD is simpler to manage than a few loose hard disks. Typical JBOD disk subsystems have space for 8 or 16 hard disks. A connected server recognises all these hard disks as independent disks, so 16 device addresses are required for a JBOD disk subsystem incorporating 16 hard disks. In some I/O techniques, such as SCSI and Fibre Channel arbitrated loop, this can lead to a shortage of device addresses. In contrast to intelligent disk subsystems, a JBOD disk subsystem is not capable of supporting RAID or other forms of virtualisation. If required, however, these can be realised outside the JBOD disk subsystem, for example, as software in the server or as an independent virtualisation entity in the storage network.

Fibre Channel overview
The FC architecture forms the fundamental construct of the SAN infrastructure. Fibre Channel is a high-speed network technology that runs on high-speed optical fiber cables (preferred for front-end SAN connectivity) and serial copper cables (preferred for back-end disk connectivity). The FC technology was created to meet the demand for increased speeds of data transfer among computers, servers, and mass storage subsystems. Although FC networking was introduced in 1988, the FC standardization process began when the American National Standards Institute (ANSI) chartered the Fibre Channel Working Group (FCWG). By 1994, the new high-speed computer interconnection standard was developed and the Fibre Channel Association (FCA) was founded with 70 charter member companies. Technical committee T11, which is a committee within INCITS (the International Committee for Information Technology Standards), is responsible for Fibre Channel interfaces. T11 (previously known as X3T9.3) has been producing interface standards for high-performance and mass storage applications.

Higher data transmission speeds are an important feature of the FC networking technology. The initial implementation offered a throughput of 100 MB/s (equivalent to a raw bit rate of 1 Gb/s, i.e. 1.0625 Gb/s in Fibre Channel), which was greater than the speed of Ultra SCSI (20 MB/s) commonly used in DAS environments. FC in full-duplex mode could sustain a throughput of 200 MB/s. In comparison with Ultra SCSI, FC is a significant leap in storage networking technology. The latest FC implementations of 8 GFC offer a throughput of 1600 MB/s (raw bit rate of 8.5 Gb/s), whereas Ultra320 SCSI is available with a throughput of 320 MB/s. The FC architecture is highly scalable and, theoretically, a single FC network can accommodate approximately 15 million nodes.
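The quoted throughput figures follow from the 8b/10b line encoding used at these FC speeds: every 8 data bits are transmitted as 10 line bits. A quick back-of-the-envelope check of the numbers above (ignoring framing overhead):

    def fc_throughput_mb_s(raw_gb_s):
        # 8b/10b encoding: 8 usable bits per 10 line bits, then bits -> bytes
        return raw_gb_s * 1000 * (8 / 10) / 8

    print(fc_throughput_mb_s(1.0625))  # ~106 MB/s: 1GFC, marketed as 100 MB/s
    print(fc_throughput_mb_s(8.5))     # ~850 MB/s per direction: 8GFC,
                                       # i.e. 1600 MB/s in full-duplex mode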
S.no   Rgpv question        Year       Marks
Q.1    What is JBOD?        Dec 2015   2
References

Book                                          Author                           Priority
Information storage management                G. Somasundaram,                 1
                                              Alok Shrivastava
Storage Networks Explained: Basics and        Ulf Troppens, Wolfgang           2
Application of Fibre Channel SAN, NAS,        Mueller-Friedt, Rainer Erkens,
iSCSI                                         Rainer Wolafka, Nils Haustein
Cloud Computing: Principles, Systems and      Nick Antonopoulos, Lee Gillam    3
Applications
Unit – 4
Hybrid storage solutions: Virtualization
Unit 04/Lecture - 01
Storage virtualization – [Rgpv/dec2013(10)]
Virtualization is the technique of masking or abstracting physical resources, which simplifies the infrastructure and accommodates the increasing pace of business and technological changes. It increases the utilization and capability of IT resources, such as servers, networks, or storage devices, beyond their physical limits. Virtualization simplifies resource management by pooling and sharing resources for maximum utilization and makes them appear as logical resources with enhanced capabilities.
Forms of virtualization

Virtualization has existed in the IT industry for several years and in different forms, including memory virtualization, network virtualization, server virtualization, and storage virtualization.
1. Memory virtualization
S
Virtual memory makes an application appear as if it has its own contiguous logi- cal
memory independent of the existing physical memory resources.
Since the beginning of the computer industry, memory has been and continues to be
an expensive component of a host. It determines both the size and the number of
applications that can run on a host.
With technological advancements, memory technology has changed and the cost of
2
memory has decreased. Virtual memory managers have evolved, enabling multiple
applications to be hosted and processed simultaneously.
In a virtual memory implementation, a memory address space is divided into
contiguous blocks of fixed-size pages. A process known as paging saves inactive
memory pages onto the disk and brings them back to physical memory when required.
This enables efficient use of available physical memory among different processes.
The space used by vmms on the disk is known as a swap file. A swap file (also known
as page file or swap space) is a portion of the hard disk that functions like physical
memory (ram) to the operating system. The operating system typically moves the
m
least used data into the swap file so that ram will be available for processes that are
o
more active. Because the space allocated to the swap file is on the hard disk (which is
.c
slower than the physical memory), access to this file is slower.
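A toy model of the paging mechanism just described, assuming a fixed number of physical page frames and least-recently-used eviction into the swap file (real virtual memory managers use more elaborate policies):

    from collections import OrderedDict

    PHYSICAL_FRAMES = 3
    ram = OrderedDict()   # page id -> contents, ordered by recency of use
    swap = {}             # stand-in for the swap file on disk

    def touch(page, contents=None):
        # access a page, faulting it in from swap and evicting the LRU page if needed
        if page in ram:
            ram.move_to_end(page)                   # mark as most recently used
            return ram[page]
        if page in swap:
            contents = swap.pop(page)               # page fault: read back from disk
        if len(ram) >= PHYSICAL_FRAMES:
            victim, data = ram.popitem(last=False)  # evict least recently used page
            swap[victim] = data                     # inactive page goes to the swap file
        ram[page] = contents
        return contents

Accessing a page that sits in the swap file is the slow path the text mentions, because it involves the disk rather than RAM.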
2. Network virtualization

Network virtualization creates virtual networks whereby each application sees its own logical network independent of the physical network. A virtual LAN (VLAN) is an example of network virtualization that provides an easy, flexible, and less expensive way to manage networks. VLANs make large networks more manageable by enabling a centralized configuration of devices located in physically diverse locations.
3. Server virtualization

Server virtualization enables multiple operating systems and applications to run simultaneously on different virtual machines created on the same physical server (or group of servers). Virtual machines provide a layer of abstraction between the operating system and the underlying hardware. Within a physical server, any number of virtual servers can be established, depending on hardware capabilities. Each virtual server seems like a physical machine to the operating system, although all virtual servers share the same underlying physical hardware in an isolated manner. For example, the physical memory is shared between virtual servers but the address space is not. Individual virtual servers can be restarted, upgraded, or even crashed, without affecting the other virtual servers on the same physical machine.
S.no   Rgpv question                                 Year            Marks
Q.1    What are various forms of virtualization?     Rgpv Dec 2011   10
       Explain each in brief.
Unit-04/Lecture -02
Virtual LANs (VLANs) - [Rgpv/dec 2011(10)]
In simple terms, a VLAN is a set of workstations within a LAN that can communicate with each other as though they were on a single, isolated LAN. What does it mean to say that they communicate with each other as though they were on a single, isolated LAN?
Among other things, it means that:

• Broadcast packets sent by one of the workstations will reach all the others in the VLAN.
• Broadcasts sent by one of the workstations in the VLAN will not reach any workstations that are not in the VLAN.
• Broadcasts sent by workstations that are not in the VLAN will never reach workstations that are in the VLAN.
• The workstations can all communicate with each other without needing to go through a gateway. For example, IP connections would be established by ARPing for the destination IP and sending packets directly to the destination workstation; there would be no need to send packets to the IP gateway to be forwarded on.
• The workstations can communicate with each other using non-routable protocols.
A Local Area Network (LAN) was originally defined as a network of computers located within the same area. Today, Local Area Networks are defined as a single broadcast domain. This means that if a user broadcasts information on his/her LAN, the broadcast will be received by every other user on the LAN. Broadcasts are prevented from leaving a LAN by using a router. The disadvantage of this method is that routers usually take more time to process incoming data compared to a bridge or a switch. More importantly, the formation of broadcast domains depends on the physical connection of the devices in the network. Virtual Local Area Networks (VLANs) were developed as an alternative solution to using routers to contain broadcast traffic.

In a traditional LAN, workstations are connected to each other by means of a hub or a repeater. These devices propagate any incoming data throughout the network. However, if two people attempt to send information at the same time, a collision will occur and all the transmitted data will be lost. Once the collision has occurred, it will continue to be propagated throughout the network by hubs and repeaters. The original information will therefore need to be resent after waiting for the collision to be resolved, thereby incurring a significant wastage of time and resources. To prevent collisions from traveling through all the workstations in the network, a bridge or a switch can be used. These devices will not forward collisions, but will allow broadcasts (to every user in the network) and multicasts (to a pre-specified group of users) to pass through. A router may be used to prevent broadcasts and multicasts from traveling through the network.

The workstations, hubs, and repeaters together form a LAN segment. A LAN segment is also known as a collision domain since collisions remain within the segment. The area within which broadcasts and multicasts are confined is called a broadcast domain or LAN. Thus a LAN can consist of one or more LAN segments. Defining broadcast and collision domains in a LAN depends on how the workstations, hubs, switches, and routers are physically connected together. This means that everyone on a LAN must be located in the same area.
Types of Connections

Devices on a VLAN can be connected in three ways based on whether the connected devices are VLAN-aware or VLAN-unaware. Recall that a VLAN-aware device is one which understands VLAN memberships (i.e. which users belong to a VLAN) and VLAN formats.

1) Trunk Link
All the devices connected to a trunk link, including workstations, must be VLAN-aware. All frames on a trunk link must have a special header attached. These special frames are called tagged frames.

2) Access Link
An access link connects a VLAN-unaware device to the port of a VLAN-aware bridge. All frames on access links must be implicitly tagged (untagged) (see Figure 8). The VLAN-unaware device can be a LAN segment with VLAN-unaware workstations or it can be a number of LAN segments containing VLAN-unaware devices (legacy LAN).

3) Hybrid Link
This is a combination of the previous two links. This is a link where both VLAN-aware and VLAN-unaware devices are attached (see Figure 9). A hybrid link can have both tagged and untagged frames, but all the frames for a specific VLAN must be either tagged or untagged.
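The tagged frames mentioned above carry a 4-byte IEEE 802.1Q header between the source MAC address and the EtherType. A small sketch of building that tag; the field layout (TPID 0x8100, then PCP/DEI/VID) follows the 802.1Q standard, while the function itself is only an illustration:

    import struct

    def vlan_tag(vlan_id, priority=0, dei=0):
        # TPID 0x8100 marks the frame as tagged;
        # TCI packs PCP (3 bits), DEI (1 bit) and the 12-bit VLAN ID
        tci = (priority << 13) | (dei << 12) | (vlan_id & 0x0FFF)
        return struct.pack("!HH", 0x8100, tci)

    print(vlan_tag(10).hex())  # '8100000a' -> frame belongs to VLAN 10

A VLAN-aware switch reads the 12-bit VLAN ID from this tag to decide which ports may receive the frame; on an access link the tag is stripped before the frame reaches the VLAN-unaware device.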
Advantages

• Performance. As mentioned above, routers that forward data in software become a bottleneck as LAN data rates increase. Doing away with the routers removes this bottleneck.
• Formation of virtual workgroups. Because workstations can be moved from one VLAN to another just by changing the configuration on switches, it is relatively easy to put all the people working together on a particular project into a single VLAN. They can then more easily share files and resources with each other. To be honest, though, virtual workgroups sound like a good idea in theory, but often do not work well in practice. It turns out that users are usually more interested in accessing company-wide resources (file servers, printers, etc.) than files on each others' PCs.
• Greater flexibility. If users move their desks, or just move around the place with their laptops, then, if the VLANs are set up the right way, they can plug their PC in at the new location and still be within the same VLAN. This is much harder when a network is physically divided up by routers.
• Ease of partitioning off resources. If there are servers or other equipment to which the network administrator wishes to limit access, then they can be put off into their own VLAN. Then users in other VLANs can be given access selectively.
S.no   Rgpv question                   Year            Marks
Q.1    What do you mean by VLANs?      Rgpv Dec 2011   10
Unit-04/Lecture -03
Management matrix – [Rgpv/dec 2012(10), Rgpv/dec 2011(10)]
Definition: A style of management where an individual has two reporting superiors (bosses) - one functional and one operational.

Matrix management is the practice of managing individuals with more than one reporting line (in a matrix organization structure), but it is also commonly used to describe managing cross-functional, cross-business-group and other forms of working that cross the traditional vertical business units. It is a type of organizational management in which people with similar skills are pooled for work assignments, resulting in more than one manager (sometimes referred to as solid line and dotted line reports, in reference to traditional business organization charts).
Management advantages and disadvantages

Key advantages that organizations seek when introducing a matrix include:

• To break business information silos - to increase cooperation and communication across the traditional silos and unlock resources and talent that are currently inaccessible to the rest of the organization.
• To deliver work across the business more effectively - to serve global customers, manage supply chains that extend outside the organization, and run integrated business regions, functions and processes.
• To be able to respond more flexibly - to reflect the importance of both the global and the local, the business and the function in the structure, and to respond quickly to changes in markets and priorities.
• To develop broader people capabilities - a matrix helps develop individuals with broader perspectives and skills who can deliver value across the business and manage in a more complex and interconnected environment.
Key disadvantages of matrix organizations include:

• Mid-level management having multiple supervisors can be confusing, in that competing agendas and emphases can pull employees in different directions, which can lower productivity.
• Mid-level management can become frustrated with what appears to be a lack of clarity with priorities.
• Mid-level management can become over-burdened with the diffusion of priorities.
• Supervisory management can find it more difficult to achieve results within their area of expertise with subordinate staff being pulled in different directions.
Application

The advantages of a matrix for project management can include:

• Individuals can be chosen according to the needs of the project.
• The use of a project team that is dynamic and able to view problems in a different way, as specialists have been brought together in a new environment.
• Project managers are directly responsible for completing the project within a specific deadline and budget.

The disadvantages for project management can include:

• A conflict of loyalty between line managers and project managers over the allocation of resources.
• Projects can be difficult to monitor if teams have a lot of independence.
• Costs can be increased if more managers (i.e. project managers) are created through the use of project teams.
S.no   Rgpv question                             Year            Marks
Q.1    What do you understand by management      Rgpv Dec 2012   10
       matrix? Explain.
Unit-04/Lecture -04
Data center infrastructure - [Rgpv/dec2013(7), Rgpv/dec2012(10), Rgpv/dec2012(10)]
Organizations maintain data centers to provide centralized data processing capabilities across the enterprise. Data centers store and manage large amounts of mission-critical data. The data center infrastructure includes computers, storage systems, network devices, dedicated power backups, and environmental controls (such as air conditioning and fire suppression).

Large organizations often maintain more than one data center to distribute data processing workloads and provide backups in the event of a disaster. The storage requirements of a data center are met by a combination of various storage architectures.

Core elements

Five core elements are essential for the basic functionality of a data center:

• Application: an application is a computer program that provides the logic for computing operations, such as an order processing system.
• Database: more commonly, a database management system (DBMS) provides a structured way to store data in logically organized tables that are interrelated. A DBMS optimizes the storage and retrieval of data.
• Server and operating system: a computing platform that runs applications and databases.
• Network: a data path that facilitates communication between clients and servers or between servers and storage.
• Storage array: a device that stores data persistently for subsequent use.

These core elements are typically viewed and managed as separate entities, but all the elements must work together to address data processing requirements.
Key requirements for data center elements

Uninterrupted operation of data centers is critical to the survival and success of a business. It is necessary to have a reliable infrastructure that ensures data is accessible at all times. While the following requirements are applicable to all elements of the data center infrastructure, our focus here is on storage systems.

[Figure: Key characteristics of data center elements]

• Availability: all data center elements should be designed to ensure accessibility. The inability of users to access data can have a significant negative impact on a business.
• Security: policies, procedures, and proper integration of the data center core elements that will prevent unauthorized access to information must be established. In addition to the security measures for client access, specific mechanisms must enable servers to access only their allocated resources on storage arrays.
• Scalability: data center operations should be able to allocate additional processing capabilities or storage on demand, without interrupting business operations. Business growth often requires deploying more servers, new applications, and additional databases. The storage solution should be able to grow with the business.
• Performance: all the core elements of the data center should be able to provide optimal performance and service all processing requests at high speed. The infrastructure should be able to support performance requirements.
• Data integrity: data integrity refers to mechanisms such as error correction codes or parity bits which ensure that data is written to disk exactly as it was received. Any variation in data during its retrieval implies corruption, which may affect the operations of the organization.
• Capacity: data center operations require adequate resources to store and process large amounts of data efficiently. When capacity requirements increase, the data center must be able to provide additional capacity without interrupting availability, or, at the very least, with minimal disruption. Capacity may be managed by reallocation of existing resources, rather than by adding new resources.
• Manageability: a data center should perform all operations and activities in the most efficient manner. Manageability can be achieved through automation and the reduction of human (manual) intervention in common tasks.
Managing storage infrastructure

Managing a modern, complex data center involves many tasks. Key management activities include:

• Monitoring is the continuous collection of information and the review of the entire data center infrastructure. The aspects of a data center that are monitored include security, performance, accessibility, and capacity (a small capacity-monitoring sketch follows this list).
• Reporting is done periodically on resource performance, capacity, and utilization. Reporting tasks help to establish business justifications and chargeback of costs associated with data center operations.
• Provisioning is the process of providing the hardware, software, and other resources needed to run a data center. Provisioning activities include capacity and resource planning. Capacity planning ensures that the user's and the application's future needs will be addressed in the most cost-effective and controlled manner. Resource planning is the process of evaluating and identifying required resources, such as personnel, the facility (site), and the technology. Resource planning ensures that adequate resources are available to meet user and application requirements.
S.no   Rgpv question                              Year           Marks
Q.1    What is a data centre? What are the        Rgpv Dec2013   7
       requirements for the design of a           Rgpv Dec2012   10
       secure data centre?                        Rgpv Dec2012   10
Unit-04/Lecture -05
Backup & disaster recovery – [Rgpv/dec 2013(10), Rgpv/dec 2012(7), Rgpv/dec 2012(10)]
Backup is a copy of production data, created and retained for the sole purpose of recovering deleted or corrupted data.

With growing business and regulatory demands for data storage, retention, and availability, organizations are faced with the task of backing up an ever-increasing amount of data. This task becomes more challenging as demand for consistent backup and quick restore of data increases throughout the enterprise, which may be spread over multiple sites. Moreover, organizations need to accomplish backup at a lower cost with minimum resources.

Backup purpose

Backups are performed to serve three purposes: disaster recovery, operational backup, and archival.
Disaster recovery

Backups can be performed to address disaster recovery needs. The backup copies are used for restoring data at an alternate site when the primary site is incapacitated due to a disaster. Based on RPO and RTO requirements, organizations use different backup strategies for disaster recovery. When a tape-based backup method is used as a disaster recovery strategy, the backup tape media is shipped and stored at an offsite location. These tapes can be recalled for restoration at the disaster recovery site. Organizations with stringent RPO and RTO requirements use remote replication technology to replicate data to a disaster recovery site. This allows organizations to bring up production systems online in a relatively short period of time in the event of a disaster.
Operational backup

Data in the production environment changes with every business transaction and operation. Operational backup is a backup of data at a point in time and is used to restore data in the event of data loss or logical corruptions that may occur during routine processing. The majority of restore requests in most organizations fall in this category. For example, it is common for a user to accidentally delete an important e-mail or for a file to become corrupted, which can be restored from operational backup.

Operational backups are created for the active production information by using incremental or differential backup techniques, detailed later in this chapter. An example of an operational backup is a backup performed for a production database just before a bulk batch update. This ensures the availability of a clean copy of the production database if the batch update corrupts the production database.
Archival

Backups are also performed to address archival requirements. Although CAS has emerged as the primary solution for archives, traditional backups are still used by small and medium enterprises for long-term preservation of transaction records, e-mail messages, and other business records required for regulatory compliance.

Apart from addressing disaster recovery, archival, and operational requirements, backups serve as a protection against data loss due to physical damage of a storage device, software failures, or virus attacks. Backups can also be used to protect against accidents such as a deletion or intentional data destruction.
Backup methods

Hot backup and cold backup are the two methods deployed for backup. They are based on the state of the application when the backup is performed. In a hot backup, the application is up and running, with users accessing their data during the backup process. In a cold backup, the application is not active during the backup process.

The backup of online production data becomes more challenging because data is actively being used and changed. An open file is locked by the operating system and is not copied during the backup process until the user closes it. The backup application can back up open files by retrying the operation on files that were opened earlier in the backup process. During the backup process, it may be possible that files opened earlier will be closed and a retry will be successful. The maximum number of retries can be configured depending on the backup application. However, this method is not considered robust because in some environments certain files are always open.
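The retry behaviour just described can be sketched as follows. The copy_file callable and the use of OSError stand in for whatever copy routine and locked-file error a real backup application would use; this is an illustration, not any vendor's implementation:

    def backup_with_retries(files, copy_file, max_retries=3):
        # try to back up every file, retrying those that are open/locked
        pending = list(files)
        for attempt in range(max_retries):
            still_locked = []
            for f in pending:
                try:
                    copy_file(f)            # fails while the OS holds the file locked
                except OSError:
                    still_locked.append(f)  # the file may be closed by the next pass
            if not still_locked:
                return []                   # everything was backed up
            pending = still_locked
        return pending                      # files that stayed open after all retries

The files returned by the function are exactly the "always open" cases mentioned above, which is why open file agents are needed as the next step.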
.c
In such situations, the backup application provides open file agents. These agents interact
a
directly with the operating system and enable the creation of consistent copies of open files. In
m
some environments, the use of open file agents is not enough. For example, a database is
a
composed of many files of varying sizes, occupying several file systems. To ensure a
n
consistent database backup, all files need to be backed up in the same state. That does not
y
d
necessarily mean that all files need to be backed up at the same time, but they all must be
syn- chronized so that the database can be restored with consistency.
u
S
t
Backup architecture and process
18
The storage node is responsible for writing data to the backup device (in a backup environment, a storage node is a host that controls backup devices). Typically, the storage node is integrated with the backup server and both are hosted on the same physical platform. A backup device is attached directly to the storage node's host platform. Some backup architectures refer to the storage node as the media server because it connects to the storage device. Storage nodes play an important role in backup planning because they can be used to consolidate backup servers.

Backup software also provides extensive reporting capabilities based on the backup catalog and the log files. These reports can include information such as the amount of data backed up, the number of completed backups, the number of incomplete backups, and the types of errors that may have occurred. Reports can be customized depending on the specific backup software used.
S.no   Rgpv question                             Year            Marks
Q.1    Explain briefly the following:            Rgpv Dec 2013   7
       a) Disaster recovery
       b) Operational backup
       c) Archival
Q.2    What do you understand by backup and      Rgpv Dec 2013   10
       disaster recovery? Explain.
Unit 04/Lecture - 06
Backup topologies – [Rgpv/2013(7)]
Three basic topologies are used in a backup environment: direct-attached backup, LAN-based backup, and SAN-based backup. A mixed topology is also used by combining LAN-based and SAN-based topologies.

In a direct-attached backup, a backup device is attached directly to the client. Only the metadata is sent to the backup server through the LAN. This configuration frees the LAN from backup traffic.

This setup uses a backup device that is not shared. As the environment grows, however, there will be a need for central management of all backup devices and to share the resources to optimize costs. An appropriate solution is to share the backup devices among multiple servers. In this example, the client also acts as a storage node that writes data on the backup device.
In LAN-based backup, all servers are connected to the LAN and all storage devices are directly attached to the storage node. The data to be backed up is transferred from the backup client (source) to the backup device (destination) over the LAN, which may affect network performance. Streaming across the LAN also affects the network performance of all systems connected to the same segment as the backup server. Network resources are severely constrained when multiple clients access and share the same tape library unit (TLU). This impact can be minimized by adopting a number of measures, such as configuring separate networks for backup and installing dedicated storage nodes for some application servers.
[Figure: LAN-based backup topology]

[Figure: SAN-based backup topology]

The mixed topology uses both the LAN-based and SAN-based topologies, as shown in the figure. This topology might be implemented for several reasons, including cost, server location, reduction in administrative overhead, and performance considerations.

[Figure: Mixed backup topology]
Serverless backup

Serverless backup is a LAN-free backup methodology that does not involve a backup server to copy data. The copy may be created by a network-attached controller, utilizing a SCSI extended copy, or by an appliance within the SAN. These backups are called serverless because they use SAN resources instead of host resources to transport backup data from its source to the backup device, reducing the impact on the application server.

Another widely used method for performing serverless backup is to leverage local and remote replication technologies. In this case, a consistent copy of the production data is replicated within the same array or the remote array, which can be moved to the backup device through the use of a storage node.
S.no   Rgpv question                       Year            Marks
Q.1    List and explain different          Rgpv Dec 2013   7
       topologies for backup?
Unit 04/Lecture -07
SNIA storage virtualization taxonomy - [Rgpv/dec2013(7)]
The SNIA (Storage Networking Industry Association) storage virtualization taxonomy provides a systematic classification of storage virtualization, with three levels defining what, where, and how storage can be virtualized.

[Figure: SNIA storage virtualization taxonomy]
The first level of the storage virtualization taxonomy addresses what is created. It specifies the types of virtualization: block virtualization, file virtualization, disk virtualization, tape virtualization, or any other device virtualization.

The second level describes where the virtualization can take place. This requires a multilevel approach that characterizes virtualization at all three levels of the storage environment: server, storage network, and storage. An effective virtualization strategy distributes the intelligence across all three levels while centralizing the management and control functions. Data storage functions, such as RAID, caching, checksums, and hardware scanning, should remain on the array. Similarly, the host should control application-focused areas, such as clustering and application failover, and volume management of raw disks. However, path redirection, path failover, data access, and distribution or load-balancing capabilities should be moved to the switch or the network.
S.no   Rgpv question                           Year            Marks
Q.1    Explain SNIA storage virtualization     Rgpv Dec 2013   7
       taxonomy?
Unit - 04/Lecture-08
Types of storage virtualization - [Rgpv/2013(7)]
Virtual storage is about providing logical storage to hosts and applications independent of physical resources. Virtualization can be implemented in both SAN and NAS storage environments. In a SAN, virtualization is applied at the block level, whereas in NAS, it is applied at the file level.

Block-level storage virtualization

Block-level storage virtualization provides a translation layer in the SAN, between the hosts and the storage arrays. Instead of being directed to the LUNs on the individual storage arrays, the hosts are directed to the virtualized LUNs on the virtualization device. The virtualization device translates between the virtual LUNs and the physical LUNs on the individual arrays. This facilitates the use of arrays from different vendors simultaneously, without any interoperability issues. For a host, all the arrays appear like a single target device, and LUNs can be distributed or even split across multiple arrays.
Block-level storage virtualization extends storage volumes online, resolves application growth requirements, consolidates heterogeneous storage arrays, and enables transparent volume access. It also provides the advantage of non-disruptive data migration. In traditional SAN environments, LUN migration from one array to another was an offline event because the hosts needed to be updated to reflect the new array configuration. In other instances, host CPU cycles were required to migrate data from one array to the other, especially in a multi-vendor environment. With a block-level virtualization solution in place, the virtualization engine handles the back-end migration of data, which enables LUNs to remain online and accessible while data is being migrated. No physical changes are required because the host still points to the same virtual targets on the virtualization device. However, the mappings on the virtualization device should be changed. These changes can be executed dynamically and are transparent to the end user.
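A toy model of this translation layer is sketched below. The dictionary map from virtual LUN to (array, physical LUN) and the function names are illustrative assumptions; real virtualization devices keep far richer mapping metadata:

    # virtual LUN -> (array, physical LUN), as maintained by the virtualization device
    lun_map = {"vlun-7": ("array-A", "lun-12")}

    def route_io(virtual_lun, block):
        # the host addresses the virtual LUN; the device forwards the I/O
        array, physical_lun = lun_map[virtual_lun]
        return array, physical_lun, block

    def migrate(virtual_lun, new_array, new_lun):
        # data is copied to the new array in the background, then only the
        # mapping changes; the host keeps addressing the same virtual target
        lun_map[virtual_lun] = (new_array, new_lun)

    print(route_io("vlun-7", 4096))        # ('array-A', 'lun-12', 4096)
    migrate("vlun-7", "array-B", "lun-3")
    print(route_io("vlun-7", 4096))        # ('array-B', 'lun-3', 4096), host unchanged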
[Figure: Block-level storage virtualization]

Deploying heterogeneous arrays in a virtualized environment facilitates an information lifecycle management (ILM) strategy, enabling significant cost and resource optimization. Low-value data can be migrated from high- to low-performance arrays or disks.

File-level virtualization

File-level virtualization addresses the NAS challenges by eliminating the dependencies between the data accessed at the file level and the location where the files are physically stored. This provides opportunities to optimize storage utilization and server consolidation and to perform non-disruptive file migrations.
S.no   Rgpv question                        Year            Marks
Q.1    Explain block-level storage          Rgpv Dec 2013   7
       virtualization?
Additional Topic Unit -04/Lecture -09
Managing & monitoring
SNMP: the SNMP protocol was the standard used to manage multi-vendor SAN environments. However, SNMP was primarily a network management protocol and was inadequate for providing the detailed information and functionality required to manage the SAN environment. The unavailability of automatic discovery functions, weak modeling constructs, and lack of transactional support are some inadequacies of SNMP in a SAN environment. Even with these limitations, SNMP still holds a predominant role in SAN management, although newer open storage SAN management standards have emerged to monitor and manage these environments more effectively.

Storage management initiative
The Storage Networking Industry Association (SNIA) has been engaged in an initiative to develop a common, open storage and SAN management interface. SMI-S is based on Web-Based Enterprise Management (WBEM) technology and the DMTF's Common Information Model (CIM). The initiative was formally created to enable broad interoperability among heterogeneous storage vendor systems and to enable better management solutions that span these environments. This initiative is known as the Storage Management Initiative (SMI).

The SMI specification, known as SMI-S, offers substantial benefits to users and vendors. It forms a normalized, abstracted model to which a storage infrastructure's physical and logical components can be mapped, and which can be used by management applications such as storage resource management, device management, and data management for standardized, effective, end-to-end control of storage resources.
Using SMI-S, storage software developers have a single normalized and unified object model comprising the detailed documentation for managing the breadth of SAN components. Moreover, SMI-S eliminates the need for development of vendor-proprietary management interfaces, enabling vendors to focus on added-value functions and offering solutions in a way that will support new devices as long as they adhere to the standard. Using SMI-S, device vendors can build new features and functions to manage storage subsystems and expose them via SMI-S. SMI-S-compliant products lead to easier, faster deployment and accelerated adoption of policy-based storage management frameworks.

The information required to perform management tasks is better organized or structured in a way that enables disparate groups of people to use it. This can be accomplished by developing a model or representation of the details required by users working within a particular domain. Such an approach is referred to as an information model. An information model requires a set of legal statements or syntax to capture the representation and expressions necessary to manage common aspects of that domain.
The CIM is a language and methodology for describing management elements. A CIM schema includes models for systems, applications, networks, and devices. This schema also enables applications from different vendors working on different platforms to describe the management data in a standard format so that it can be shared among a variety of management applications.
The following features of SMI-S simplify SAN management:

Common data model: SMI-S agents interact with an SMI-S-enabled device, such as a switch, a server, or a storage array, to extract relevant management data. They can also interact at the management layer to exchange information between one management application and another. They then provide this information to the requester in a consistent syntax and format.

Interconnect independence: SMI-S eliminates the need to redesign the management transport and enables components to be managed by using out-of-band communications. In addition, SMI-S offers the advantages of specifying CIM-XML over the HTTP protocol stack and utilizing the lower layers of the TCP/IP stack, both of which are ubiquitous in today's networking world.

Multilayer management: SMI-S can be used in a multilayer and cross-domain environment, for example, server-based volume managers and network storage appliances. Many storage deployment environments currently employ this combination.

Legacy system accommodation: SMI-S can be used to manage legacy systems by using a proxy agent or can be directly supported by the device itself. SMI-S can coexist with proprietary APIs and agents as well as providing an integration framework for such mechanisms.

Policy-based management: SMI-S includes object models applicable across all classes of devices, enabling a SAN administrator to implement policy-based management for entire storage networks.
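Because SMI-S exposes management data as CIM-XML over HTTP, a generic WBEM client can query any compliant provider. A minimal sketch using the open-source pywbem package; the provider URL, credentials, and namespace are placeholders, and the exact classes exposed vary by vendor:

    import pywbem

    # connect to an SMI-S provider (URL and credentials are placeholders)
    conn = pywbem.WBEMConnection(
        "https://smi-provider.example.com:5989",
        ("admin", "password"),
        default_namespace="interop",
    )

    # list the registered profiles the provider claims to support
    for profile in conn.EnumerateInstances("CIM_RegisteredProfile"):
        print(profile["RegisteredName"])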
Reference

Book                                          Author                           Priority
Information storage management                G. Somasundaram,                 1
                                              Alok Shrivastava
Storage Networks Explained: Basics and        Ulf Troppens, Wolfgang           2
Application of Fibre Channel SAN, NAS,        Mueller-Friedt, Rainer Erkens,
iSCSI                                         Rainer Wolafka, Nils Haustein
Cloud Computing: Principles, Systems and      Nick Antonopoulos, Lee Gillam    3
Applications
Unit - 05
Information storage on cloud
Unit 05/Lecture - 01
Cloud computing – [Rgpv/dec 2014(2), Rgpv/dec 2013(10), Rgpv/dec 2012(10)]
Cloud computing is a term used to refer to a model of network computing where a program or application runs on a connected server or servers rather than on a local computing device such as a PC, tablet or smartphone. Like the traditional client-server model or older mainframe computing,[1] a user connects with a server to perform a task. The difference with cloud computing is that the computing process may run on one or many connected computers at the same time, utilizing the concept of virtualization. With virtualization, one or more physical servers can be configured and partitioned into multiple independent "virtual" servers, all functioning independently and appearing to the user to be a single physical device. Such virtual servers are in essence disassociated from their physical server, and with this added flexibility, they can be moved around and scaled up or down on the fly without affecting the end user. The computing resources have become "granular", which provides end user and operator benefits including on-demand self-service, broad access across multiple devices, resource pooling, rapid elasticity and service metering capability.

In more detail, cloud computing refers to a computing hardware machine or group of computing hardware machines commonly referred to as a server or servers connected through a communication network such as the Internet, an intranet, a local area network (LAN) or wide area network (WAN). Any individual user who has permission to access the server can use the server's processing power to run an application, store data, or perform any other computing task. Therefore, instead of using a personal computer every time to run a native application, the individual can now run the application from anywhere in the world, as the server provides the processing power to the application and the server is also connected to a network via the Internet or other connection platforms to be accessed from anywhere. All this has become possible due to increased computer processing power available to humankind at decreased cost.
In common usage the term "the cloud" has become a shorthand way to refer to cloud computing infrastructure. The term came from the cloud symbol that network engineers used on network diagrams to represent the unknown (to them) segments of a network. Marketers have further popularized the phrase "in the cloud" to refer to software, platforms and infrastructure that are sold "as a service", i.e. remotely through the Internet. Typically, the seller has actual energy-consuming servers which host products and services from a remote location, so end-users don't have to; they can simply log on to the network without installing anything. The major models of cloud computing service are known as Software as a Service, Platform as a Service, and Infrastructure as a Service. These cloud services may be offered in a public, private or hybrid network. Google, Amazon, IBM, Oracle Cloud, Rackspace, Salesforce, Zoho and Microsoft are some well-known cloud vendors.
Characteristics – [Rgpv/dec 2012(10)]

Cloud computing exhibits the following key characteristics:

• Agility improves with users' ability to re-provision technological infrastructure resources.
• Application programming interface (API) accessibility to software that enables machines to interact with cloud software in the same way that a traditional user interface (e.g., a computer desktop) facilitates interaction between humans and computers. Cloud computing systems typically use Representational State Transfer (REST)-based APIs (see the sketch after this list).
• Cost: cloud providers claim that computing costs reduce. A public-cloud delivery model converts capital expenditure to operational expenditure. This purportedly lowers barriers to entry, as infrastructure is typically provided by a third party and does not need to be purchased for one-time or infrequent intensive computing tasks. Pricing on a utility computing basis is fine-grained, with usage-based options, and fewer IT skills are required for implementation (in-house). The e-FISCAL project's state-of-the-art repository[46] contains several articles looking into cost aspects in more detail, most of them concluding that cost savings depend on the type of activities supported and the type of infrastructure available in-house.
• Device and location independence enable users to access systems using a web browser regardless of their location or what device they use (e.g., PC, mobile phone). As infrastructure is off-site (typically provided by a third party) and accessed via the Internet, users can connect from anywhere.
• Maintenance of cloud computing applications is easier, because they do not need to be installed on each user's computer and can be accessed from different places.
• Multitenancy enables sharing of resources and costs across a large pool of users, thus allowing for:
  - centralization of infrastructure in locations with lower costs (such as real estate, electricity, etc.)
  - peak-load capacity increases (users need not engineer for highest possible load levels)
  - utilisation and efficiency improvements for systems that are often only 10-20% utilised.
• Performance is monitored, and consistent and loosely coupled architectures are constructed using web services as the system interface.
• Productivity may be increased when multiple users can work on the same data simultaneously, rather than waiting for it to be saved and emailed. Time may be saved as information does not need to be re-entered when fields are matched, nor do users need to install application software upgrades to their computer.
• Reliability improves with the use of multiple redundant sites, which makes well-designed cloud computing suitable for business continuity and disaster recovery.
• Scalability and elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis in near real-time (note that the VM startup time varies by VM type, location, OS and cloud provider), without users having to engineer for peak loads.
• Security can improve due to centralization of data, increased security-focused resources, etc., but concerns can persist about loss of control over certain sensitive data, and the lack of security for stored kernels. Security is often as good as or better than other traditional systems, in part because providers are able to devote resources to solving security issues that many customers cannot afford to tackle. However, the complexity of security is greatly increased when data is distributed over a wider area or over a greater number of devices, as well as in multi-tenant systems shared by unrelated users. In addition, user access to security audit logs may be difficult or impossible. Private cloud installations are in part motivated by users' desire to retain control over the infrastructure and avoid losing control of information security.
• Virtualization technology allows sharing of servers and storage devices and increased utilization. Applications can be easily migrated from one physical server to another.
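On-demand self-service typically surfaces as an authenticated REST call, as referenced in the API characteristic above. The sketch below uses the Python requests library against a hypothetical endpoint; the URL, token, and payload fields are invented for illustration and do not correspond to any specific vendor's API:

    import requests

    # hypothetical self-service call: provision a virtual server on demand
    resp = requests.post(
        "https://cloud.example.com/api/v1/servers",
        headers={"Authorization": "Bearer <token>"},
        json={"name": "web-01", "cpus": 2, "ram_gb": 4},
    )
    resp.raise_for_status()
    print(resp.json()["id"])  # identifier of the newly provisioned server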
S.no   Rgpv question                             Year       Marks
Q.1    What is cloud computing? Enlist and       Dec 2014   2
       explain essential characteristics of      Dec 2013   10
       cloud computing.
Q.2    What are the essential features of        Dec 2011   10
       cloud computing?                          Dec 2012   10
Unit 05/Lecture - 02
Architecture of cloud computing – [Rgpv/dec 2013(7), Rgpv/dec 2012(10)]
The architectural model is the one introduced in Lecture 01: applications run on one or more connected servers, and virtualization partitions physical servers into independent virtual servers that can be scaled up or down and moved without affecting the end user.
Advantages
u
y
d
t
Cloud computing relies on sharing of resources to achieve coherence and economies of scale,
S
similar to a utility (like the electricity grid) over a network.[8] at the foundation of cloud
computing is the broader concept of converged infrastructure and shared services.
The cloud also focuses on maximizing the effectiveness of the shared resources. Cloud
resources are usually not only shared by multiple users but are also dynamically reallocated per
demand. This can work for allocating resources to users. For example, a cloud computer facility
that serves european users during european business hours with a specific application (e.g.,
email) may reallocate the same resources to serve north american users during north america's
8
business hours with a different application (e.g., a web server). This approach should maximize
the use of computing power thus reducing environmental damage as well since less power, air
conditioning, rackspace, etc. Are required for a variety of functions. With cloud computing,
multiple users can access a single server to retrieve and update their data without purchasing
licenses for different applications.
The term "moving to cloud" also refers to an organization moving away from a traditional capex
model (buy the dedicated hardware and depreciate it over a period of time) to the opex model
(use a shared cloud infrastructure and pay as one uses it).
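A rough back-of-the-envelope comparison makes the CAPEX/OPEX trade-off concrete. The prices below are assumptions chosen for illustration, not quotes from any provider.

```python
# CAPEX vs OPEX, illustrated with assumed figures only.
CAPEX_SERVER = 9000.0      # assumed purchase price of a dedicated server
LIFETIME_MONTHS = 36       # depreciation period
CLOUD_RATE = 0.40          # assumed $/hour for an equivalent cloud instance

capex_per_month = CAPEX_SERVER / LIFETIME_MONTHS      # $250.00, used or not
always_on_cloud = CLOUD_RATE * 24 * 30                # $288.00 for a 30-day month
business_hours_cloud = CLOUD_RATE * 10 * 22           # $88.00: 10 h/day, 22 days

print(f"owned server:          ${capex_per_month:.2f}/month regardless of use")
print(f"cloud, always on:      ${always_on_cloud:.2f}/month")
print(f"cloud, 10 h weekdays:  ${business_hours_cloud:.2f}/month")
```

With these assumed rates, pay-per-use wins clearly for intermittent workloads, while an always-on instance can cost more than owning the hardware, which is exactly why administrators must adapt to the cloud pricing model.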
Proponents claim that cloud computing allows companies to avoid upfront infrastructure costs,
and focus on projects that differentiate their businesses instead of infrastructure.[9] Proponents
also claim that cloud computing allows enterprises to get their applications up and running
faster, with improved manageability and less maintenance, and enables IT to more rapidly
adjust resources to meet fluctuating and unpredictable business demand.[9][10][11] Cloud
providers typically use a "pay as you go" model. This can lead to unexpectedly high charges if
administrators do not adapt to the cloud pricing model.[12]
S.no   RGPV question                                              Year            Marks
Q.1    Explain architectural framework of cloud computing?        Rgpv Dec 2013   7
Q.2    Explain the architectural framework of cloud computing?    Rgpv Dec 2012   10
Q.3    Give a brief note on cloud architecture?                   Rgpv Dec 2011   10
Unit 05/Lecture - 03
Cloud models – [Rgpv/dec 2013(10)&(7)]
If a cloud user accesses services on the infrastructure layer, for instance, she can run her own
applications on the resources of a cloud infrastructure and remain responsible for the support,
maintenance, and security of these applications herself. If she accesses a service on the
application layer, these tasks are normally taken care of by the cloud service provider.
SaaS – [Rgpv/dec 2014(2)]
Software-as-a-Service provides complete applications to a cloud’s end user. It is mainly
accessed through a web portal and service-oriented architectures based on web service
technologies. Credit card or bank account details must be provided to enable the fees for the
use of the services to be billed.
The services on the application layer can be seen as an extension of the ASP (application service
provider) model, in which an application is run, maintained, and supported by a service vendor.
The main differences between the services on the application layer and the classic ASP model
are the encapsulation of the application as a service, the dynamic procurement, and billing by
units of consumption (pay as you go). However, both models pursue the goal of focusing on
core competencies by outsourcing applications.
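The following sketch illustrates what billing by units of consumption can look like; the metered units and their prices are assumptions invented for the example, since real SaaS price lists vary by provider.

```python
# A minimal sketch of "pay as you go" billing by units of consumption.
# Unit names and prices are assumptions for illustration only.
PRICE_PER_UNIT = {"user_month": 8.00, "gb_stored": 0.10, "api_call_1k": 0.05}

def monthly_bill(usage: dict) -> float:
    """Multiply each metered unit by its price: pay only for what was consumed."""
    return sum(PRICE_PER_UNIT[unit] * qty for unit, qty in usage.items())

march = {"user_month": 25, "gb_stored": 120, "api_call_1k": 400}
print(f"March invoice: ${monthly_bill(march):.2f}")   # 25*8 + 120*0.10 + 400*0.05 = 232.00
```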
Software-as-a-Service (SaaS) Stack

PaaS
PaaS comprises the environment for developing and provisioning cloud applications. The
principal users of this layer are developers seeking to develop and run a cloud application for a
particular platform. They are supported by the platform operators with an open or proprietary
language, a set of essential basic services to facilitate communication, monitoring, or service
billing, and various other components, for instance to facilitate startup or ensure an
application’s scalability and/or elasticity (see figure 3). Distributing the application to the
underlying infrastructure is normally the responsibility of the cloud platform operator. The
services offered on a cloud platform tend to represent a compromise between complexity and
flexibility that allows applications to be implemented quickly and loaded in the cloud without
much configuration. Restrictions regarding the programming languages supported, the
programming model, the ability to access resources, and persistency are possible downsides.
Platform-as-a-Service (PaaS) Stack
IaaS
The services on the infrastructure layer are used to access essential IT resources that are
combined under the heading Infrastructure-as-a-Service (IaaS). These essential IT resources
include services linked to computing resources, data storage resources, and the
communications channel. They enable existing applications to be provisioned on cloud
resources and new services implemented on the higher layers.
Physical resources are abstracted by virtualization, which means they can then be shared by
several operating systems and end user environments on the virtual resources – ideally,
without any mutual interference. These virtualized resources usually comprise CPU and RAM,
data storage resources (elastic block store and databases).
Infrastructure-as-a-Service (IaaS) Stack
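As a concrete illustration of consuming such infrastructure services programmatically, the sketch below uses the AWS EC2 API via the boto3 library as one example of an IaaS interface; other providers expose comparable calls. The region, AMI id and instance type are placeholders.

```python
# A sketch of consuming IaaS programmatically, using AWS EC2 via boto3 as
# one concrete example. AMI id, region and instance type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Computing resource: ask the provider for a virtual machine.
run = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image id
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = run["Instances"][0]["InstanceId"]

# Data storage resource: an elastic block store volume, as described above.
vol = ec2.create_volume(AvailabilityZone="us-east-1a", Size=20, VolumeType="gp3")

# Release both resources when done: IaaS capacity is rented, not owned.
# (A real script would wait for the resources to reach a deletable state.)
ec2.terminate_instances(InstanceIds=[instance_id])
ec2.delete_volume(VolumeId=vol["VolumeId"])
```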
S.no   RGPV question                                              Year       Marks
Q.1    What are various cloud models? Explain each in brief.      Dec 2013   10
Q.2    What does Software as a Service provide?                   Dec 2014   2
Unit 05/Lecture - 04
Private Cloud – [Rgpv/dec 2013(10)]
A private cloud is a particular model of cloud computing that involves a distinct and secure
cloud based environment in which only the specified client can operate. As with other cloud
models, private clouds will provide computing power as a service within a virtualised
environment using an underlying pool of physical computing resource. However, under the
private cloud model, the cloud (the pool of resource) is only accessible by a single organisation,
providing that organisation with greater control and privacy.
The technical mechanisms used to provide the different services which can be classed as being
private cloud services can vary considerably and so it is hard to define what constitutes a
private cloud from a technical aspect. Instead such services are usually categorised by the
features that they offer to their client. Traits that characterise private clouds include the ring
fencing of a cloud for the sole use of one organisation and higher levels of network security.
They can be defined in contrast to a public cloud which has multiple clients accessing
virtualised services which all draw their resource from the same pool of servers across public
networks. Private cloud services draw their resource from a distinct pool of physical computers
but these may be hosted internally or externally and may be accessed across private leased
lines or secure encrypted connections via public networks.
The additional security offered by the ring fenced cloud model is ideal for any organisation,
including enterprise, that needs to store and process private data or carry out sensitive tasks.
For example, a private cloud service could be utilised by a financial company that is required by
regulation to store sensitive data internally and who will still want to benefit from some of the
advantages of cloud computing within their business infrastructure, such as on demand
resource allocation.
The private cloud model is closer to the more traditional model of individual local access
networks (LANs) used in the past by enterprise but with the added advantages of virtualisation.
The features and benefits of private clouds therefore are:

Higher security and privacy; public cloud services can implement a certain level of
security but private clouds - using techniques such as distinct pools of resources with
access restricted to connections made from behind one organisation’s firewall,
dedicated leased lines and/or on-site internal hosting - can ensure that operations are
kept out of the reach of prying eyes

More control; as a private cloud is only accessible by a single organisation, that
organisation will have the ability to configure and manage it in line with their needs to
achieve a tailored network solution. However, this level of control removes some of the
economies of scale generated in public clouds by having centralised management of the
hardware

Cost and energy efficiency; implementing a private cloud model can improve the
allocation of resources within an organisation by ensuring that the availability of
resources to individual departments/business functions can directly and flexibly
respond to their demand. Therefore, although they are not as cost effective as public
cloud services due to smaller economies of scale and increased management costs, they
do make more efficient use of the computing resource than traditional LANs as they
minimise the investment into unused capacity. Not only does this provide a cost saving
but it can reduce an organisation’s carbon footprint too

Improved reliability; even where resources (servers, networks etc.) are hosted
internally, the creation of virtualised operating environments means that the network is
more resilient to individual failures across the physical infrastructure. Virtual partitions
can, for example, pull their resource from the remaining unaffected servers. In addition,
where the cloud is hosted with a third party, the organisation can still benefit from the
physical security afforded to infrastructure hosted within data centres

Cloud bursting; some providers may offer the opportunity to employ cloud bursting,
within a private cloud offering, in the event of spikes in demand. This service allows the
provider to switch certain non-sensitive functions to a public cloud to free up more
space in the private cloud for the sensitive functions that require it. Private clouds can
even be integrated with public cloud services to form hybrid clouds where non-sensitive
functions are always allocated to the public cloud to maximise the efficiencies on offer.
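A minimal sketch of the bursting decision just described follows; the capacity figure and the "slots" unit are assumptions for illustration only.

```python
# A minimal sketch of a cloud-bursting placement decision.
# The capacity figure and the "slots" unit are illustrative assumptions.
PRIVATE_CAPACITY = 100          # assumed size of the private resource pool, in slots

def place_workload(demand_slots: int, sensitive: bool) -> str:
    """Keep sensitive work private; burst non-sensitive overflow to a public cloud."""
    if sensitive:
        return "private cloud"        # regulated data never leaves the ring fence
    if demand_slots > PRIVATE_CAPACITY:
        return "public cloud"         # demand spike: burst out to free private space
    return "private cloud"

for demand, sensitive in [(40, True), (140, False), (60, False)]:
    print(f"{demand:>3} slots, sensitive={sensitive!s:<5} -> {place_workload(demand, sensitive)}")
```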
How to build a private cloud –
Private cloud looks and acts like a public cloud, giving your corporation all the speed, agility and cost
savings promised by cloud technology, only it’s single tenant and that tenant is you, right? Well, that’s
the goal, but it’s not quite the reality yet for most enterprises.
1. There must be a converged infrastructure. Servers must be virtualized, and there has to be
underlying software-defined networking and a converged storage fabric. This is not something
that is done very well in the public cloud space now, and it’s an opportunity for corporate IT
operations that haven’t had sophisticated systems in place to leapfrog themselves in the new
era of private cloud.
2. There has to be fully automated orchestration of both system management and software
distribution across the converged infrastructure. That is where the cost saving is: automating
deployment and streamlining the human activity previously required for daily tasks is what will
eventually drive private cloud sales. You have to improve the provisioning process significantly
to legitimately call it a private cloud. If it takes you two weeks to provision resources now,
getting that down to two days is not going to cut it; you’ve got to get it to 15 minutes. You
can’t be sitting around waiting for various levels of approval to happen, because you lose the
agility and speed. That is the difference between virtualization and cloud.
3. There must be a self-service catalog of standard computing offerings available to users
across the company. The litmus test is whether or not the dashboard is available to business
users across the company and not just an interface for traditional IT staff to use to dole out IT
resources. Having just the latter means that IT just has a new toy.
4. There has to be accountability by way of some sort of charge-back, track-back or show-back
mechanism that keeps track of which users are employing which resources and for just how
long. Enterprise Management Associates analyst Torsten Volk argues that at a minimum
providing a show-back mechanism is crucial for any fledgling private cloud: "If you can't at least
show who is responsible for the cycles that have been used, then there is no incentive to use
those resources efficiently."
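The sketch below shows one way such a show-back ledger might be kept; the record fields, the flat per-hour accounting and the example users are assumptions for illustration.

```python
# A minimal show-back ledger, as a sketch of point 4 above: record which user
# held which resource and for how long, then report consumed hours per user.
# Field names and the flat hourly accounting are illustrative assumptions.
from collections import defaultdict
from datetime import datetime

ledger = []  # one record per allocation

def record(user: str, resource: str, start: str, end: str) -> None:
    fmt = "%Y-%m-%d %H:%M"
    hours = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600
    ledger.append({"user": user, "resource": resource, "hours": hours})

def show_back() -> dict:
    """Aggregate consumed hours per user: visibility, not yet an invoice."""
    totals = defaultdict(float)
    for rec in ledger:
        totals[rec["user"]] += rec["hours"]
    return dict(totals)

record("finance", "vm-17", "2015-03-01 09:00", "2015-03-01 18:00")
record("marketing", "vm-22", "2015-03-01 08:00", "2015-03-02 08:00")
print(show_back())   # {'finance': 9.0, 'marketing': 24.0}
```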
S.no   RGPV question                                              Year            Marks
Q.1    What is private cloud? Explain how can we build a          Rgpv Dec 2013   10
       private cloud.
Unit 05/Lecture - 05
Cloud service providers - [Rgpv/dec 2012(5)]
A cloud provider is a company that offers some component of cloud computing – typically
Infrastructure as a Service (IaaS), Software as a Service (SaaS) or Platform as a Service (PaaS) –
to other businesses or individuals. Cloud providers are sometimes referred to as cloud service
providers or CSPs.
There are a number of things to think about when you evaluate cloud providers. The cost will
usually be based on a per-use utility model but there are a number of variations to consider.
The physical location of the servers may also be a factor for sensitive data.
Reliability is crucial if your data must be accessible. A typical cloud storage service-level
agreement (SLA), for example, specifies precise levels of service – such as 99.9% uptime – and
the recourse or compensation that the user is entitled to should the provider fail to provide the
service as described. However, it’s important to understand the fine print in that agreement,
because some providers discount outages of less than ten minutes, which may be too long for
some businesses.
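A quick calculation shows why those SLA percentages and the ten-minute fine print matter:

```python
# Worked example: how much downtime does an uptime percentage actually
# permit? Pure arithmetic, using a 30-day month for simplicity.
HOURS_PER_MONTH = 24 * 30

for sla in (99.0, 99.9, 99.99):
    allowed = HOURS_PER_MONTH * (1 - sla / 100)
    print(f"{sla}% uptime -> {allowed * 60:.0f} minutes of downtime per month")

# 99.9% permits about 43 minutes per month, so a provider that discounts
# outages shorter than ten minutes can burn much of that allowance unnoticed.
```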
Security is another important consideration. Organizations such as the Cloud Security Alliance
(CSA) offer certification to cloud providers that meet their criteria. The CSA's Trusted Cloud
Initiative program was created to help cloud service providers develop industry-recommended,
secure and interoperable identity, access and compliance management configurations and
practices.
S.no   RGPV question                                   Year            Marks
Q.1    Write short note on cloud service provider?     Rgpv Dec 2012   5
Unit 05/Lecture - 06
Cloud Vocabulary - [Rgpv/dec 2013(7), Rgpv/dec 2011(5)]
Cloudburst: The term cloudburst is used with two meanings, one negative and one positive:
1. Cloudburst (negative): The failure of a cloud computing environment due to the
inability to handle a spike in demand.
Cloudburst (positive): The dynamic deployment of a software application that runs on
internal organizational compute resources to a public cloud to address a spike in
demand.
2. Cloudstorming: The act of connecting multiple cloud computing environments.
3. Vertical Cloud: A cloud computing environment optimized for use in a particular vertical
-- i.e., industry -- or application use case.
4. Private Cloud: A cloud computing-like environment within the boundaries of an
organization and typically for its exclusive usage.
5. Internal Cloud: A cloud computing-like environment within the boundaries of an
organization and typically available for exclusive use by said organization.
6. Hybrid Cloud: A computing environment combining both private (internal) and public
(external) cloud computing environments. May either be on a continuous basis or in the
form of a 'cloudburst'.
7. Cloudware: A general term referring to a variety of software, typically at the
infrastructure level, that enables building, deploying, running or managing applications
in a cloud computing environment.
8. External Cloud: A cloud computing environment that is external to the boundaries of
the organization. Although it often is, an external cloud is not necessarily a public cloud.
Some external clouds make their cloud infrastructure available to specific other
organizations and not to the public at-large.
9. Public Cloud: A cloud computing environment that is open for use to the general public,
whether individuals, corporations or other types of organizations. Amazon Web Services
is an example of a public cloud.
10. Virtual Private Cloud (VPC): A term coined by Reuven Cohen, CEO and founder of
Enomaly. The term describes a concept that is similar to, and derived from, the familiar
concept of a Virtual Private Network (VPN), but applied to cloud computing. It is the
notion of turning a public cloud into a virtual private cloud, particularly in terms of
security and the ability to create a VPC across components that are both within the
cloud and external to it.
11. Cloud Portability: The ability to move applications (and often their associated data)
across cloud computing environments from different cloud providers, as well as across
private or internal cloud and public or external clouds.
12. Cloud Spanning: Running an application in a way that its components straddle multiple
cloud environments (which could be any combination of internal/private and
external/public clouds). Unlike Cloud Bursting, which refers strictly to expanding the
application to an External Cloud to handle spikes in demand, Cloud Spanning includes
scenarios in which an application's components are continuously distributed across
multiple clouds.
S.no   RGPV question                             Year       Marks
Q.1    Write short note on cloud vocabulary?     Dec 2013   7
                                                 Dec 2011   5
Unit 05/Lecture - 07
Cloud security – [Rgpv/dec 2013(5), Rgpv/dec 2011(5)]
Cloud computing security or, more simply, cloud security is an evolving sub-domain of
computer security, network security, and, more broadly, information security. It refers to a
broad set of policies, technologies, and controls deployed to protect data, applications, and the
associated infrastructure of cloud computing.
Cloud security is not to be confused with security software offerings that are cloud-based such
as security as a service.
Cloud security controls
Cloud security architecture is effective only if the correct defensive implementations are in
place. An efficient cloud security architecture should recognize the issues that will arise with
security management.[6] Security management addresses these issues with security
controls. These controls are put in place to safeguard any weaknesses in the system and reduce
the effect of an attack. While there are many types of controls behind a cloud security
architecture, they can usually be found in one of the following categories:[6]
Deterrent controls
These controls are set in place to prevent any purposeful attack on a cloud system.
Much like a warning sign on a fence or a property, these controls do not reduce the
actual vulnerability of a system.
Preventative controls
These controls upgrade the strength of the system by managing the vulnerabilities. The
preventative control will safeguard vulnerabilities of the system. If an attack were to
occur, the preventative controls are in place to cover the attack and reduce the damage
and violation to the system's security.
Corrective controls
Corrective controls are used to reduce the effect of an attack. Unlike the preventative
controls, the corrective controls take action as an attack is occurring.
Detective controls
Detective controls are used to detect any attacks that may be occurring to the system.
In the event of an attack, the detective control will signal the preventative or corrective
controls to address the issue.[6]
Cloud Application – [Rgpv/dec 2013(5), Rgpv/dec 2011(5)]
Cloud computing has been credited with increasing competitiveness through cost reduction,
greater flexibility, elasticity and optimal resource utilization. Here are a few situations where
cloud computing is used to enhance the ability to achieve business goals.
1. Infrastructure as a Service (IaaS) and Platform as a Service (PaaS)
When it comes to IaaS, using an existing infrastructure on a pay-per-use scheme seems to be an
obvious choice for companies saving on the cost of investing to acquire, manage and maintain
an IT infrastructure. There are also instances where organizations turn to PaaS for the same
reasons while also seeking to increase the speed of development on a ready-to-use platform to
deploy applications.
2. Private cloud and hybrid cloud
Among the many incentives for using cloud, there are two situations where organizations are
looking into ways to assess some of the applications they intend to deploy into their
environment through the use of a cloud (specifically a public cloud). While in the case of test
and development it may be limited in time, adopting a hybrid cloud approach allows for testing
application workloads, therefore providing the comfort of an environment without the initial
investment that might have been rendered useless should the workload testing fail.
Another use of hybrid cloud is also the ability to expand during periods of limited peak usage,
which is often preferable to hosting a large infrastructure that might seldom be of use. An
organization would seek to have the additional capacity and availability of an environment
when needed on a pay-as-you-go basis.
3. Test and development
Probably the best scenario for the use of a cloud is a test and development environment. This
entails securing a budget, setting up your environment through physical assets, significant
manpower and time. Then comes the installation and configuration of your platform. All this
can often extend the time it takes for a project to be completed and stretch your milestones.
With cloud computing, there are now readily available environments tailored for your needs at
your fingertips. This often combines, but is not limited to, automated provisioning of physical
and virtualized resources.
4. Big data analytics
One of the aspects offered by leveraging cloud computing is the ability to tap into vast
quantities of both structured and unstructured data to harness the benefit of extracting
business value.
Retailers and suppliers are now extracting information derived from consumers’ buying
patterns to target their advertising and marketing campaigns to a particular segment of the
population. Social networking platforms are now providing the basis for analytics on behavioral
patterns that organizations are using to derive meaningful information.
5. File storage
Cloud can offer you the possibility of storing your files and accessing, storing and retrieving
them from any web-enabled interface. The web services interfaces are usually simple. At any
time and place you have high availability, speed, scalability and security for your environment.
In this scenario, organizations are only paying for the amount of storage they are actually
consuming, and do so without the worries of overseeing the daily maintenance of the storage
infrastructure.
There is also the possibility to store the data either on or off premises depending on the
regulatory compliance requirements. Data is stored in virtualized pools of storage hosted by a
third party based on the customer specification requirements.
6. Disaster recovery
This is yet another benefit derived from using cloud, based on the cost effectiveness of a
disaster recovery (DR) solution that provides for a faster recovery from a mesh of different
physical locations at a much lower cost than the traditional DR site with fixed assets, rigid
procedures and a much higher cost.
7. Backup
Backing up data has always been a complex and time-consuming operation. This included
maintaining a set of tapes or drives, manually collecting them and dispatching them to a
backup facility with all the inherent problems that might happen in between the originating
and the backup site. This way of ensuring a backup is performed is not immune to problems
such as running out of backup media, and there is also time to load the backup devices for a
restore operation, which takes time and is prone to malfunctions and human errors.
Cloud-based backup, while not being the panacea, is certainly a far cry from what it used to be.
You can now automatically dispatch data to any location across the wire with the assurance
that neither security, availability nor capacity are issues.
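As an illustration, the sketch below packs a directory into an archive and dispatches it "across the wire" to an S3-compatible object store via boto3. The bucket name and paths are placeholders, and any object storage service with an upload call would serve equally well.

```python
# A sketch of cloud-based backup: pack the data set, then ship the archive
# offsite in one call instead of collecting and dispatching tapes.
# Bucket name and paths are placeholders.
import tarfile
from datetime import date

import boto3

def backup_to_cloud(src_dir: str, bucket: str) -> str:
    archive = f"/tmp/backup-{date.today().isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:    # pack the data set locally
        tar.add(src_dir, arcname="data")
    key = f"backups/{archive.split('/')[-1]}"
    boto3.client("s3").upload_file(archive, bucket, key)   # dispatch offsite
    return key

# Example (placeholder paths): backup_to_cloud("/var/lib/app", "example-backup-bucket")
```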
While the list of the above uses of cloud computing is not exhaustive, it certainly gives an
incentive to use the cloud, when compared to more traditional alternatives, to increase IT
infrastructure flexibility, as well as to leverage big data analytics and mobile computing.
Cloud integration – [Rgpv/dec 2013(5)]
Cloud integration is the process of configuring multiple application programs to share data in
the cloud. In a network that incorporates cloud integration, diverse applications communicate
either directly or through third-party software.
Cloud integration offers the following advantages over older, compartmentalized
organizational methods.

Each user can access personal data in real time from any device.

Each user can access personal data from any location with Internet access.

Each user can integrate personal data such as calendars and contact lists served by
diverse application programs.

Each user can employ the same logon information (username and password) for all
personal applications.

The system efficiently passes control messages among application programs.

By avoiding the use of data silos, data integrity is maintained and data conflicts (which
can arise from redundancy) are avoided.

Cloud integration offers scalability to allow for future expansion in terms of the number
of users, the number of applications, or both.
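A minimal sketch of the idea: two applications (say, a calendar and a mailer) share one contact record through a common cloud endpoint instead of keeping private copies in silos. The hub URL and record fields here are hypothetical.

```python
# Hypothetical sketch of cloud integration: applications read and write one
# authoritative record through a shared cloud service, avoiding data silos.
import requests

HUB = "https://integration.example.com/v1"   # hypothetical shared service

def publish_contact(contact: dict) -> None:
    """Any application writes the single authoritative copy..."""
    requests.put(f"{HUB}/contacts/{contact['id']}", json=contact, timeout=10).raise_for_status()

def read_contact(contact_id: str) -> dict:
    """...and every other application reads that same copy in real time."""
    resp = requests.get(f"{HUB}/contacts/{contact_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()

publish_contact({"id": "c42", "name": "A. Rao", "email": "arao@example.com"})
print(read_contact("c42"))   # calendar and mailer both see the same record
```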
In recent years, cloud integration has gained favor among organizations, corporations, and
government agencies that implement SaaS (Software as a Service), a software distribution
model in which applications are hosted by a vendor or service provider and made available to
users over the Internet.
Risk of cloud computing – [Rgpv/dec 2011]
Cloud benefits
Cloud computing provides a scalable online environment that makes it possible to handle an
increased volume of work without impacting system performance. Cloud computing also offers
significant computing capability and economy of scale that might not otherwise be affordable,
particularly for small and medium-sized organizations, without the IT infrastructure investment.
Cloud computing advantages include:

Lower capital costs — Organizations can provide unique services using large-scale
computing resources from cloud service providers, and then nimbly add or remove IT
capacity to meet peak and fluctuating service demands while only paying for actual
capacity used.

Lower IT operating costs — Organizations can rent added server space for a few hours
at a time rather than maintain proprietary servers without worrying about upgrading
their resources whenever a new application version is available. They also have the
flexibility to host their virtual IT infrastructure in locations offering the lowest cost.

No hardware or software installation or maintenance

Optimized IT infrastructure provides quick access to needed computing services
The risks

Environmental security — The concentration of computing resources and users in a
cloud computing environment also represents a concentration of security threats.
Because of their size and significance, cloud environments are often the target of
virtual machine attacks, bot malware, brute force attacks, and other attacks. Ask your
cloud provider about access controls, vulnerability assessment practices, and patch and
configuration management controls to see that they are adequately protecting your
data.

Data privacy and security — Hosting confidential data with cloud service providers
involves the transfer of a considerable amount of an organization's control over data
security to the provider. Make sure your vendor understands your organization’s data
privacy and security needs. Also, make sure your cloud provider is aware of particular
data security and privacy rules and regulations that apply to your entity, such as HIPAA,
the Payment Card Industry Data Security Standard (PCI DSS), the Federal Information
Security Management Act of 2002 (FISMA), or the privacy considerations of the
Gramm-Leach-Bliley Act.

Data availability and business continuity — A major risk to business continuity in the
cloud computing environment is loss of internet connectivity. Ask your cloud provider
what controls are in place to ensure internet connectivity. If a vulnerability is identified,
you may have to terminate all access to the cloud provider until the vulnerability is
rectified. Finally, the seizure of a data-hosting server by law enforcement agencies may
result in the interruption of unrelated services stored on the same machine.

Record retention requirements — If your business is subject to record retention
requirements, make sure your cloud provider understands what they are and can
meet them.

Disaster recovery — Hosting your computing resources and data at a cloud provider
makes the cloud provider’s disaster recovery capabilities vitally important to your
company’s disaster recovery plans. Know your cloud provider’s disaster recovery
capabilities and ask your provider if they have been tested.
Evaluating your options
Many cloud provider options are available, each with unique risks. As you evaluate your choices
and the associated risks, consider the following:

Cloud providers are sometimes reluctant to produce third-party audit reports unless an
audit clause is included in the contract. Some hosts require clients to pay for reports.

Some internal audit departments are performing control reviews of cloud providers, in
addition to receiving and analyzing third party audit reports. This is driven by certain
controls not being tested, exclusion of pertinent systems, or other factors that require
on-site testing.

Standard cloud provider audit reports typically do not include vulnerability/penetration
testing results. Providers are hesitant to allow scanning, as they believe this may
compromise their infrastructure.
Cloud computing is a widely used format and we don't see this changing anytime soon.
Knowing that you are managing the risks associated with housing your sensitive data offsite will
give you confidence with the platform, so you can take advantage of the opportunities
presented by the cloud.
S.no   RGPV question                                        Year       Marks
Q.1    Write short note on following (any two):             Dec 2013   10
       a) Cloud computing, b) Application of cloud,         Dec 2012   5
       c) Cloud integration                                 Dec 2011   5
Q.2    Write short note on risk of cloud computing?         Dec 2011   5
Unit 05/Lecture - 08
Evolution of cloud computing – [Rgpv/dec 2012(5)]
The trend toward cloud computing started in the late 1980s with the concept of grid computing
when, for the first time, a large number of systems were applied to a single problem, usually
scientific in nature and requiring exceptionally high levels of parallel computation. In Europe,
long distance optical networks were used to tie multiple universities into a massive computing
grid in order that resources could be shared and scaled for large scientific calculations.
Grid computing provided a virtual pool of computation resources but it is different from cloud
computing. Grid computing specifically refers to leveraging several computers in parallel to
solve a particular, individual problem, or to run a specific application. Cloud computing, on the
other hand, refers to leveraging multiple resources, including computing resources, to deliver a
unified service to the end user.
In grid computing, the focus is on moving a workload to the location of the needed computing
resources, which are mostly remote and are readily available for use. Usually a grid is a cluster
of servers on which a large task could be divided into smaller tasks to run in parallel. From this
point of view, a grid could actually be viewed as just one virtual server. Grids also require
applications to conform to the grid software interfaces.
In a cloud environment, computing and extended IT and business resources, such as servers,
storage, network, applications and processes, can be dynamically shaped or carved out from
the underlying hardware infrastructure and made available to a workload. In addition, while a
cloud can provision and support a grid, a cloud can also support non-grid environments, such as
a three-tier web architecture running traditional or Web 2.0 applications.
In the 1990s, the concept of virtualization was expanded beyond virtual servers to higher levels
of abstraction—first the virtual platform, including storage and network resources, and
subsequently the virtual application, which has no specific underlying infrastructure. Utility
computing offered clusters as virtual platforms for computing with a metered business model.
More recently Software as a Service (SaaS) has raised the level of virtualization to the
application, with a business model of charging not by the resources consumed but by the value
of the application to subscribers. The concept of cloud computing has evolved from the
concepts of grid, utility and SaaS computing. It is an emerging model through which users can
gain access to their applications from anywhere, at any time, through their connected devices.
These applications reside in massively scalable data centers where compute resources can be
dynamically provisioned and shared to achieve significant economies of scale.
Companies can choose to share these resources using public or private clouds, depending on
their specific needs. Public clouds expose services to customers, businesses and consumers on
the internet. Private clouds are generally restricted to use within a company behind a firewall
and have fewer security exposures as a result. The strength of a cloud is its infrastructure
management, enabled by the maturity and progress of virtualization technology to manage and
better utilize the underlying resources through automatic provisioning, re-imaging, workload
rebalancing, monitoring, systematic change request handling and a dynamic and automated
security and resiliency platform.
As more enterprises adopt cloud computing, the level of applications is migrating toward the
more mission-critical, and SaaS will become a mainstay of IT strategies.
A number of companies, including Google, Microsoft, Amazon, and IBM, have built enormous
datacenter-based computing capacity all over the world to support their web service offerings
(search, instant messaging, web-based retail). With this computing infrastructure in place, these
companies are already poised to offer new cloud-based software applications.
Large enterprise software solutions, such as ERP (enterprise resource planning) applications,
have traditionally only been affordable to very big enterprises with big IT budgets. However,
companies that sell these solutions are finding they can reach small to medium businesses by
making their very expensive, very complex applications available as internet-based software
services. This ability of SaaS to deliver expensive applications at affordable prices will continue
to accelerate.
S.no   RGPV question                                       Year            Marks
Q.1    Write short note on cloud computing evolution?      Rgpv Dec 2012   5
Additional Topic Unit - 05/Lecture - 09
Cloud storage
Cloud storage is a model of data storage where the digital data is stored in logical pools, the
physical storage spans across multiple servers (and often locations), and the physical
environment is typically owned and managed by a hosting company. These cloud storage
providers are responsible for keeping the data available and accessible, and the physical
environment protected and running. People and organizations buy or lease storage capacity
from the providers to store end user, organization, or application data.
Cloud storage services may be accessed through a collocated cloud compute service, a web
service application programming interface (API) or by applications that utilize the API, such as
cloud desktop storage, a cloud storage gateway or web-based content management systems.
A high level architecture of cloud storage.
Cloud storage is based on highly virtualized infrastructure and is like broader cloud computing
in terms of accessible interfaces, near-instant elasticity and scalability, multi-tenancy, and
metered resources.
Cloud storage typically refers to a hosted object storage service, but the term has broadened to
include other types of data storage that are now available as a service, like block storage.
Cloud storage is:

Made up of many distributed resources, but still acts as one - often referred to as
federated storage clouds [6]
32

Highly fault tolerant through redundancy and distribution of data

Highly durable through the creation of versioned copies

Typically eventually consistent with regard to data replicas.
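To illustrate the web-service API mentioned above, here is a minimal sketch using the S3 object storage API via boto3 as one widely implemented example; the bucket and key names are placeholders, and other object stores expose equivalent put/get operations.

```python
# A sketch of the web-service storage interface: store an object, then fetch
# it back. Bucket and key names are placeholders; the bucket must exist.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-notes-bucket"

# Store an object: the provider, not the user, worries about the disks,
# replication and durability behind this single call.
s3.put_object(Bucket=BUCKET, Key="unit5/notes.txt", Body=b"cloud storage demo")

# Retrieve it from anywhere with a web-enabled interface.
obj = s3.get_object(Bucket=BUCKET, Key="unit5/notes.txt")
print(obj["Body"].read().decode())   # -> "cloud storage demo"
```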
Advantages

Companies need only pay for the storage they actually use, typically an average of
consumption during a month. This does not mean that cloud storage is less expensive,
only that it incurs operating expenses rather than capital expenses.

Organizations can choose between off-premise and on-premise cloud storage options,
or a mixture of the two options, depending on relevant decision criteria that is
complementary to initial direct cost savings potential; for instance, continuity of
operations (COOP), disaster recovery (DR), security (PII, HIPAA, SarbOx, IA/CND), and
records retention laws, regulations, and policies.

Storage availability and data protection is intrinsic to object storage architecture, so
depending on the application, the additional technology, effort and cost to add
availability and protection can be eliminated.

Storage maintenance tasks, such as purchasing additional storage capacity, are
offloaded to the responsibility of a service provider.

Cloud storage provides users with immediate access to a broad range of resources and
applications hosted in the infrastructure of another organization via a web service
interface.

Cloud storage can be used for copying virtual machine images from the cloud to
on-premise locations or to import a virtual machine image from an on-premise location to
the cloud image library. In addition, cloud storage can be used to move virtual machine
images between user accounts or between data centers.
Reference

Book                                             Author                              Priority
Information storage management                   G. Somasundaram, Alok Shrivastava   1
Storage Networks Explained: Basics and           Ulf Troppens, Wolfgang              2
application of Fibre Channel SAN, NAS, iSCSI     Mueller-Friedt, Rainer Erkens,
                                                 Rainer Wolafka, Nils Haustein
Cloud Computing: Principles, Systems &           Nick Antonopoulos, Lee Gillam       3
Applications