Unit 1: Storage Technology

Unit-01/Lecture-01

Introduction to Information Storage and Management (ISM)

Information Storage and Management is the only subject of its kind to fill the knowledge gap in understanding the varied components of a modern information storage infrastructure, including virtual environments. It provides comprehensive coverage of storage technology, which will enable you to make more informed decisions in an increasingly complex IT environment. ISM builds a strong understanding of underlying storage technologies and prepares you to learn advanced concepts, technologies, and products. You will learn about the architectures, features, and benefits of intelligent storage systems; storage networking technologies such as FC-SAN, IP-SAN, NAS, object-based and unified storage; business continuity solutions such as backup, replication, and archive; the increasingly critical area of information security; and the emerging field of cloud computing.

Introduction to storage technology

Storage systems are indispensable to modern computing. All known computing platforms, ranging from handheld devices to large supercomputers, use storage systems to store data temporarily or permanently. Beginning with the punch card, which stored a few bytes of data, storage systems have reached multi-terabyte capacities while consuming comparatively less space and power. This tutorial is intended to give the reader an introduction to storage systems and their components.

Storage definition

Here are a few definitions of storage as the term applies to computers:
- A device capable of storing data. The term usually refers to mass storage devices, such as disk and tape drives.
- In a computer, storage is the place where data is held in an electromagnetic or optical form for access by a computer processor. (whatis.com)
- Computer data storage, often called storage or memory, refers to computer components, devices and recording media that retain digital data used for computing for some interval of time. (wikipedia.com)

Of these, I like the definition coined by wikipedia.com. Likes and dislikes apart, in basic terms computer storage can be defined as "a device or medium that stores data for later retrieval". From this definition, we can see that a storage device possesses two features, namely "storage" and "retrieval". A storage facility without retrieval options is of no use. A storage device may store application programs, databases, media files and so on.

As we see in modern computers, storage devices come in many forms. Storage devices can be classified on many criteria. The most basic classification, as we learned in school, is primary storage versus secondary storage. Storage devices can be further classified by the memory technology they use, by their data volatility, and so on.

Storage technologies: storage caching [RGPV/Dec 2011 (10)]

Storage caching is used to buffer blocks of data in order to minimize the utilization of disks or storage arrays and to minimize the read/write latency of storage access. Especially for write-intensive scenarios such as virtual desktops, write caching is very beneficial because it can keep storage latency at a low level even during peak times. A storage cache can be implemented in four places:
- the disk (embedded memory, typically non-expandable)
- the storage array (vendor-specific embedded memory plus expansion cards)
- the computer accessing the storage (RAM)
- the storage network (e.g. a provisioning server)
The cache can be subdivided into two categories:
- volatile cache: contained data is lost upon power outage (good for reads or non-critical writes)
- non-volatile cache: data is kept safe in case of power outage (good for reads and writes); often referred to as battery-backed write cache

To further increase the speed of the disk or storage array, advanced algorithms such as read-ahead/read-behind or command queuing are commonly used.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Explain storage technologies in detail. | Dec 2011 | 10

Unit-01/Lecture-02

Data proliferation [RGPV/Dec 2015 (2), RGPV/Dec 2013 (10), RGPV/Dec 2013 (5), RGPV/Dec 2011 (5)]

Data proliferation refers to the prodigious amount of data, structured and unstructured, that businesses and governments continue to generate at an unprecedented rate, and to the usability problems that result from attempting to store and manage that data. While originally pertaining to problems associated with paper documentation, data proliferation has become a major problem in primary and secondary data storage on computers.

While digital storage has become cheaper, the associated costs, from raw power to maintenance and from metadata to search engines, have not kept up with the proliferation of data. Although the power required to maintain a unit of data has fallen, the cost of the facilities which house the digital storage has tended to rise.

Problems caused [RGPV/Dec 2015 (3), RGPV/Dec 2014 (2)]

The problem of data proliferation is affecting all areas of commerce as the result of the availability of relatively inexpensive data storage devices. This has made it very easy to dump data into secondary storage immediately after its window of usability has passed. This masks problems that could gravely affect the profitability of businesses and the efficient functioning of health services, police and security forces, local and national governments, and many other types of organizations. Data proliferation is problematic for several reasons:
- Difficulty when trying to find and retrieve information.
- Increased manpower requirements to manage increasingly chaotic data storage resources.
- Slower network and application performance due to excess traffic as users search and search again for the material they need.
- High cost in terms of the energy resources required to operate storage hardware.

Proposed solutions
- Applications that better utilize modern technology
- Reductions in duplicate data (especially as caused by data movement)
- Improvement of metadata structures
- Improvement of file and storage transfer structures
- User education and discipline
- The implementation of information lifecycle management solutions to eliminate low-value information as early as possible before putting the rest into actively managed long-term storage in which it can be quickly and cheaply accessed

RGPV Questions
S.No | Question | Year | Marks
Q.1 | What do you mean by data proliferation? Explain the data proliferation process and the major problems associated with it. | Dec 2015, Dec 2014, Dec 2013, Dec 2013 | 3, 3, 7, 10
Q.2 | Explain in brief data proliferation. | Dec 2015, Dec 2011 | 2, 5

Unit-01/Lecture-03

Overview of storage infrastructure components [RGPV/Dec 2015 (7), RGPV/Dec 2013 (7)]

The choice of hard discs can have a profound impact on the capacity, performance and long-term reliability of any storage infrastructure.
But it's unwise to trust valuable data to any single point of failure, so hard discs are combined into groups that can boost performance and offer redundancy in the event of disc faults. At an even higher level, those arrays must be integrated into the storage infrastructure, combining storage with network technologies to make data available to users over a LAN or WAN. If you're new to storage, or just looking to refresh some basic concepts, this chapter on data storage components can help to bring things into focus.

The lowest level: hard discs

Hard discs are random-access storage mechanisms that commit data to spinning platters coated with extremely sensitive magnetic media. Magnetic read/write heads step across the radius of each platter in set increments, forming concentric circles of data dubbed "tracks". Hard disc capacity is loosely defined by the quality of the magnetic media (bits per inch) and the number of tracks. Thus, a late-model drive with superior media and finer head control can achieve far more storage capacity than models just six to twelve months old. Some of today's hard drives can deliver up to 750 GB of capacity. Capacity is also influenced by specific drive technologies, including perpendicular recording, which fits more magnetic points into the same physical disc area.

Grouping the discs: RAID

Hard discs are electromechanical devices and their working life is finite. Media faults, mechanical wear and electronic failures can all cause problems that render drive contents inaccessible. This is unacceptable for any organization, so tactics are often implemented to protect against failure. One of the most common data protection tactics is arranging groups of discs into arrays, known as RAID. RAID implementations typically offer two benefits: data redundancy and enhanced performance. Redundancy is achieved by copying data to two or more discs; when a fault occurs on one hard disc, duplicate data on another can be used instead. In many cases, file contents are also spanned (or striped) across multiple hard discs. This improves performance because the various parts of a file can be accessed on multiple discs simultaneously, rather than waiting for a complete file to be accessed from a single disc. RAID can be implemented in a variety of schemes, each with its own designation:
- RAID-0: disc striping is used to improve storage performance, but there is no redundancy.
- RAID-1: disc mirroring offers disc-to-disc redundancy, but capacity is reduced and performance is only marginally enhanced.
- RAID-5: parity information is spread throughout the disc group, improving read performance and allowing data for a failed drive to be reconstructed once the failed drive is replaced.
- RAID-6: multiple parity schemes are spread throughout the disc group, allowing data for up to two simultaneously failed drives to be reconstructed once the failed drive(s) are replaced.

There are additional levels, but these four are the most common and widely used. It is also possible to mix RAID levels in order to obtain greater benefits. Combinations are typically denoted with two digits. For example, RAID-50 is a combination of RAID-5 and RAID-0, sometimes noted as RAID-5+0. As another example, RAID-10 is actually RAID-1 and RAID-0 implemented together, RAID-1+0. For more information on RAID controllers, see the SearchStorage.com article "The new breed of RAID controllers". A sketch of a parity-based rebuild follows below.
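To make the parity idea concrete: parity in RAID is normally computed as a bitwise XOR across the data strips, so a failed member can be recomputed from the survivors. Here is a minimal Python sketch of that identity; the helper names (`parity_strip`, `rebuild_strip`) are our own illustrative choices, not taken from any RAID product.

```python
from functools import reduce

def parity_strip(strips):
    """XOR corresponding bytes of the data strips to form the parity strip."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def rebuild_strip(survivors, parity):
    """A lost strip is the XOR of the parity strip with all surviving strips."""
    return parity_strip(survivors + [parity])

# Four data strips striped across four discs, with parity held on a fifth disc.
strips = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = parity_strip(strips)

# Simulate the failure of disc 2 and reconstruct its contents.
lost = strips.pop(2)
assert rebuild_strip(strips, parity) == lost
```

The same XOR identity is also why every write to a parity-protected group forces a parity update, the write penalty discussed with RAID-5 and RAID-6 in Unit 2.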
Getting storage on the network

Storage is useless unless network users can access it. There are two principal means of attaching storage systems: NAS and SAN. NAS boxes are storage devices behind an Ethernet interface, effectively connecting discs to the network through a single IP address. NAS deployments are typically straightforward and management is light, so new NAS devices can easily be added as more storage is needed. The downside to NAS is performance: storage traffic must compete for NAS access across the Ethernet cable. But NAS access is often superior to disc access at a local server. The SAN overcomes common server and NAS performance limitations by creating a sub-network of storage devices interconnected through a switched fabric such as FC or iSCSI (called Internet SCSI or SCSI-over-IP). Both FC and iSCSI approaches make any storage device visible from any host, and offer much more availability for corporate data. FC is costlier but offers optimum performance, while iSCSI is cheaper but somewhat slower. Consequently, FC is found in the enterprise, while iSCSI commonly appears in small and mid-sized businesses. However, SAN deployments are more costly to implement (in terms of switches, cabling and host bus adapters) and demand far more management effort.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Explain briefly the evolution of storage management. | Dec 2013 | 7
Q.2 | Discuss storage infrastructure components. | Dec 2015 | 7

Unit-01/Lecture-04

Information lifecycle management [RGPV/Dec 2015 (7), RGPV/Dec 2014 (7), RGPV/Dec 2013 (7), RGPV/Dec 2012 (5)]

Information lifecycle management (ILM) is a comprehensive approach to managing the flow of an information system's data and associated metadata from creation and initial storage to the time when it becomes obsolete and is deleted. Unlike earlier approaches to data storage management, ILM involves all aspects of dealing with data, starting with user practices, rather than just automating storage procedures as, for example, hierarchical storage management (HSM) does. Also in contrast to older systems, ILM enables more complex criteria for storage management than data age and frequency of access.

ILM products automate the processes involved, typically organizing data into separate tiers according to specified policies, and automating data migration from one tier to another based on those criteria. As a rule, newer data, and data that must be accessed more frequently, is stored on faster but more expensive storage media, while less critical data is stored on cheaper but slower media. However, the ILM approach recognizes that the importance of any data does not rely solely on its age or how often it is accessed. Users can specify different policies for data that declines in value at different rates or that retains its value throughout its life span. A path management application, either as a component of ILM software or working in conjunction with it, makes it possible to retrieve any stored data by keeping track of where everything is in the storage cycle.

ILM is often considered a more complex subset of data lifecycle management (DLM). DLM products deal with general attributes of files, such as their type, size, and age; ILM products have more complex capabilities. For example, a DLM product would allow you to search stored data for a certain file type of a certain age,
while an ILM product would let you search various types of stored files for instances of a specific piece of data, such as a customer number.

Data management has become increasingly important as businesses face compliance issues in the wake of legislation that regulates how organizations must deal with particular types of data. Data management experts stress that information lifecycle management should be an organization-wide undertaking, involving procedures and practices as well as applications.

Information lifecycle management comprises the policies, processes, practices, and tools used to align the business value of information with the most appropriate and cost-effective IT infrastructure from the time information is conceived through its final disposition. Information is aligned with business processes through management policies and service levels associated with applications, metadata, information, and data.

Operations

Operational aspects of ILM include backup and data protection; disaster recovery, restore, and restart; archiving and long-term retention; data replication; and the day-to-day processes and procedures necessary to manage a storage architecture.

Functionality

For the purposes of business records, five phases are identified as being part of the lifecycle continuum:
- Creation and receipt
- Distribution
- Use
- Maintenance
- Disposition

Creation and receipt deals with records from their point of origination. This could include their creation by a member of an organization at varying levels or the receipt of information from an external source. It includes correspondence, forms, reports, drawings, computer input/output, and other sources.

Distribution is the process of managing the information once it has been created or received. This includes both internal and external distribution, as information that leaves an organization becomes a record of a transaction with others.

Use takes place after information is distributed internally, and can generate business decisions, document further actions, or serve other purposes.

Maintenance is the management of information. This can include processes such as filing, retrieval and transfers. While the connotation of 'filing' presumes the placing of information in a prescribed container and leaving it there, there is much more involved. Filing is actually the process of arranging information in a predetermined sequence and creating a system to manage it for its useful existence within an organization. Failure to establish a sound method for filing information makes its retrieval and use nearly impossible. Transferring information refers to the process of responding to requests, retrieving from files and providing access to users authorized by the organization to have access to the information. While removed from the files, the information is tracked by the use of various processes to ensure it is returned and/or available to others who may need access to it.

Disposition is the practice of handling information that is less frequently accessed or has met its assigned retention period. Less frequently accessed records may be considered for relocation to an 'inactive records facility' until they have met their assigned retention period. "Although a small percentage of organizational information never loses its value, the value of most information tends to decline over time until it has no further value to anyone for any purpose.
The value of nearly all business information is greatest soon after it is created and generally remains active for only a short time, one to three years or so, after which its importance and usage declines. The record then makes its lifecycle transition to a semi-active and finally to an inactive state." [1]

Retention periods are based on the creation of an organization-specific retention schedule, based on research of the regulatory, statutory and legal requirements for the management of information for the industry in which the organization operates. Additional items to consider when establishing a retention period are any business needs that may exceed those requirements and consideration of the potential historic, intrinsic or enduring value of the information. If the information has met all of these needs and is no longer considered valuable, it should be disposed of by means appropriate for the content. This may include ensuring that others cannot obtain access to outdated or obsolete information, as well as measures for protecting privacy and confidentiality.

Long-term records are those that are identified as having a continuing value to an organization. Based on the period assigned in the retention schedule, these may be held for periods of 25 years or longer, or may even be assigned a retention period of "indefinite" or "permanent". The term "permanent" is used much less frequently outside of the federal government, as it is not feasible to establish a requirement for such a retention period. There is a need to ensure that records of continuing value are managed using methods that keep them persistently accessible for the length of time they are retained. While this is relatively easy to accomplish with paper or microfilm based records by providing appropriate environmental conditions and adequate protection from potential hazards, it is less simple for electronic format records. There are unique concerns related to ensuring that the format they are generated or captured in remains viable and that the media they are stored on remains accessible. Media is subject to both degradation and obsolescence over its lifespan, and therefore policies and procedures must be established for the periodic conversion and migration of electronically stored information to ensure it remains accessible for its required retention periods.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | What are the different phases of the information lifecycle model? | Dec 2015, Dec 2014, Dec 2013, Dec 2011 | 7, 7, 7, 10
Q.2 | Explain briefly information lifecycle implementation. | Dec 2012 | 5

Unit-01/Lecture-05

Data categorization [RGPV/Dec 2015 (2), RGPV/Dec 2013 (7), RGPV/Dec 2013 (10), RGPV/Dec 2011 (5)]

Data classification is the categorization of data for its most effective and efficient use. In a basic approach to storing computer data, data can be classified according to its critical value or how often it needs to be accessed, with the most critical or most often used data stored on the fastest media while other data can be stored on slower (and less expensive) media. This kind of classification tends to optimize the use of data storage for multiple purposes: technical, administrative, legal, and economic.

Data can be classified according to any criteria, not only relative importance or frequency of use.
For example, data can be broken down according to its topical content, file type, operating platform, average file size in megabytes or gigabytes, when it was created, when it was last accessed or modified, which person or department last accessed or modified it, and which personnel or departments use it the most. A well-planned data classification system makes essential data easy to find. This can be of particular importance in risk management, legal discovery, and compliance with government regulations.

Computer programs exist that can help with data classification, but in the end it is a subjective business and is often best done as a collaborative task that considers business, technical, and other points of view.

Data collections

Data stewards may wish to assign a single classification to a collection of data that is common in purpose or function. When classifying a collection of data, the most restrictive classification of any of the individual data elements should be used. For example, if a data collection consists of a student's name, address and social security number, the data collection should be classified as restricted even though the student's name and address may be considered public information.

Why is it important?

Data classification provides several benefits. It allows an organization to inventory its information assets. In many cases, information asset owners aren't aware of all of the different types of data they hold. It also allows central IT to work with departments to develop specific security requirements that can be readily utilized.

In the field of data management, data classification as a part of the information lifecycle management (ILM) process can be defined as a tool for the categorization of data, enabling and helping an organization to effectively answer the following questions:
- What data types are available?
- Where are certain data located?
- What access levels are implemented?
- What protection level is implemented, and does it adhere to compliance regulations?

When implemented, it provides a bridge between IT professionals and process or application owners. IT staff are informed about the value of the data and, on the other hand, management (usually the application owners) understands better which segment of the data centre has to be invested in to keep operations running effectively. This can be of particular importance in risk management, legal discovery, and compliance with government regulations. Data classification is typically a manual process; however, there are many tools from different vendors that can help gather information about the data.

How to start the process of data classification?

Note that this classification structure is written from a data management perspective and therefore focuses on text and text-convertible binary data sources. Images, video, and audio files are highly structured formats built for industry-standard APIs and do not readily fit within the classification scheme outlined below. The first step is to evaluate and divide the various applications and data as follows:

Relational or tabular data (around 15% of non-audio/video data):
- Generally describes proprietary data which is accessible only through an application or application programming interfaces (APIs).
- Applications that produce structured data are usually database applications.
- This type of data usually brings complex procedures of data evaluation and migration between the storage tiers.
- To ensure adequate quality standards, the classification process has to be monitored by subject matter experts.

Semi-structured or poly-structured data (all other non-audio/video data that does not conform to a system- or platform-defined relational or tabular form):
- Generally describes data files that have a dynamic or non-relational semantic structure (e.g. documents, XML, JSON, device or system log output, sensor output).
- The classification process is relatively simple: criteria assignment.
- Data migration between assigned segments of predefined storage tiers is a simple process.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | What is data categorization? Why is it required? | Dec 2013 | 7
Q.2 | What is data categorization? Explain the challenges of data categorization. | Dec 2013, Dec 2015 | 10, 2
Q.3 | Explain briefly data categorization. | Dec 2011 | 5

Unit-01/Lecture-06

Evolution of various storage technologies [RGPV/Dec 2012 (10), RGPV/Dec 2011 (5)]

DAS (Direct Attached Storage)

When Windows servers leave the factory, they can be configured with several storage options. Most servers will contain one or more local disk drives installed internal to the server's cabinet. These drives are typically used to install the operating system and user applications. If additional storage is needed for user files or databases, it may be necessary to configure direct attached storage (DAS).

DAS is well suited for a small-to-medium sized business where sufficient amounts of storage can be configured at a low startup cost. The DAS enclosure will be a separate adjacent cabinet that contains the additional disk drives. An internal PCI-based RAID controller is typically configured in the server to connect to the storage. SAS (Serial Attached SCSI) technology is used to connect the disk arrays, as illustrated in the following example.

As mentioned, one of the primary benefits of DAS storage is the lower startup cost to implement. Managing the storage array is done individually, as the storage is dedicated to a particular server. On the downside, there is typically limited expansion capability with DAS, and limited cabling options (1 to 4 meter cables). Finally, because the RAID controller is typically installed in the server, there is a potential single point of failure in the DAS solution.

SAN (Storage Area Networks) [RGPV/Dec 2014 (2)]

With storage area networks (SANs), we typically see this solution used by medium-to-large size businesses, primarily due to the larger initial investment. SANs require an infrastructure consisting of SAN switches, disk controllers, HBAs (host bus adapters) and fibre cables. SANs leverage external RAID controllers and disk enclosures to provide high-speed storage for numerous potential servers.

The main benefit of a SAN-based storage solution is the ability to share the storage arrays among multiple servers. This allows you to configure the storage capacity as needed, usually by a dedicated SAN administrator. Higher levels of performance throughput are typical in a SAN environment, and data is highly available through redundant disk controllers and drives. The disadvantages include a much higher startup cost for SANs, and they are inherently much more complex to manage. The following diagram illustrates a typical SAN environment.

NAS (Network Attached Storage)

A third type of storage solution exists as a hybrid option called network attached storage (NAS).
This solution uses a dedicated server or appliance to serve the storage array. The storage can be shared among multiple clients at the same time across the existing Ethernet network. The main difference between NAS and DAS or SAN is that NAS servers utilize file-level transfers, while DAS and SAN solutions use block-level transfers, which are more efficient.

NAS storage typically has a lower startup cost because the existing network can be used. This can be very attractive to small-to-medium size businesses. Most NAS models implement the storage arrays as iSCSI targets that can be shared across the networks. Dedicated iSCSI networks can also be configured to maximize the network throughput. The following diagram shows how a NAS configuration might look.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Define SAN (Storage Area Network). | Dec 2014 | 2
Q.2 | Explain briefly the evolution of storage technologies and architecture. | Dec 2012 | 10
Q.3 | Explain briefly the evolution of various storage technologies. | Dec 2011 | 5

Unit-01/Lecture-07

Data Centre [RGPV/Dec 2014 (2), RGPV/Dec 2013 (7), RGPV/Dec 2013 (10), RGPV/Dec 2012 (10), RGPV/Dec 2011 (10)]

A data center (sometimes spelled datacenter) is a centralized repository, either physical or virtual, for the storage, management, and dissemination of data and information organized around a particular body of knowledge or pertaining to a particular business. The National Climatic Data Center (NCDC), for example, is a public data center that maintains the world's largest archive of weather information. A private data center may exist within an organization's facilities or may be maintained as a specialized facility. Every organization has a data center, although it might be referred to as a server room or even a computer closet. In that sense, data center may be synonymous with network operations center (NOC), a restricted-access area containing automated systems that constantly monitor server activity, web traffic, and network performance.

Organizations maintain data centers to provide centralized data processing capabilities across the enterprise. Data centers store and manage large amounts of mission-critical data. The data center infrastructure includes computers, storage systems, network devices, dedicated power backups, and environmental controls (such as air conditioning and fire suppression).

Large organizations often maintain more than one data center to distribute data processing workloads and provide backups in the event of a disaster. The storage requirements of a data center are met by a combination of various storage architectures.

Core elements

Five core elements are essential for the basic functionality of a data center:
- Application: a computer program that provides the logic for computing operations. Applications, such as an order processing system, can be layered on a database, which in turn uses operating system services to perform read/write operations to storage devices.
- Database: more commonly, a database management system (DBMS) provides a structured way to store data in logically organized tables that are interrelated. A DBMS optimizes the storage and retrieval of data.
- Server and operating system: a computing platform that runs applications and databases.
- Network: a data path that facilitates communication between clients and servers or between servers and storage.
- Storage array: a device that stores data persistently for subsequent use.
These core elements are typically viewed and managed as separate entities, but all the elements must work together to address data processing requirements.

Key requirements for data center elements

Uninterrupted operation of data centers is critical to the survival and success of a business. It is necessary to have a reliable infrastructure that ensures data is accessible at all times. While the requirements shown in Figure 1-6 are applicable to all elements of the data center infrastructure, our focus here is on storage systems.
- Availability: all data center elements should be designed to ensure accessibility. The inability of users to access data can have a significant negative impact on a business.
- Security: policies, procedures, and proper integration of the data center core elements that will prevent unauthorized access to information must be established. In addition to the security measures for client access, specific mechanisms must enable servers to access only their allocated resources on storage arrays.
- Scalability: data center operations should be able to allocate additional processing capabilities or storage on demand, without interrupting business operations. Business growth often requires deploying more servers, new applications, and additional databases. The storage solution should be able to grow with the business.
- Performance: all the core elements of the data center should be able to provide optimal performance and service all processing requests at high speed. The infrastructure should be able to support performance requirements.
- Data integrity: data integrity refers to mechanisms, such as error correction codes or parity bits, which ensure that data is written to disk exactly as it was received. Any variation in data during its retrieval implies corruption, which may affect the operations of the organization.
- Capacity: data center operations require adequate resources to store and process large amounts of data efficiently. When capacity requirements increase, the data center must be able to provide additional capacity without interrupting availability or, at the very least, with minimal disruption. Capacity may be managed by reallocation of existing resources rather than by adding new resources.
- Manageability: a data center should perform all operations and activities in the most efficient manner. Manageability can be achieved through automation and the reduction of human (manual) intervention in common tasks.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | What is a data centre? What are the requirements for the design of a secure data centre? | Dec 2013, Dec 2012 | 7, 10
Q.2 | What is the significance of the data center in storage technology? | Dec 2014 | 2

Reference Books
Priority | Book | Author
1 | Information Storage and Management | G. Somasundaram, Alok Shrivastava
2 | Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI | Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein
3 | Cloud Computing: Principles, Systems and Applications | Nick Antonopoulos, Lee Gillam

Unit 2: Storage Systems Architecture

Unit-02/Lecture-01

Architecture of intelligent disk subsystems [RGPV/Dec 2014 (7), RGPV/Dec 2013 (10)]

In contrast to a file server, a disk subsystem can be visualized as a hard disk server.
Servers are connected to the connection ports of the disk subsystem using standard I/O techniques such as Small Computer System Interface (SCSI), Fibre Channel or Internet SCSI (iSCSI) and can thus use the storage capacity that the disk subsystem provides. The internal structure of the disk subsystem is completely hidden from the server, which sees only the hard disks that the disk subsystem provides to it.

The connection ports are extended to the hard disks of the disk subsystem by means of internal I/O channels. In most disk subsystems there is a controller between the connection ports and the hard disks. The controller can significantly increase data availability and data access performance with the aid of a so-called RAID procedure. Furthermore, some controllers realize the copying services instant copy and remote mirroring, along with further additional services. The controller uses a cache in an attempt to accelerate read and write accesses from the server.

Figure: Servers are connected to a disk subsystem using standard I/O techniques. The figure shows one server connected by SCSI and two others connected by a Fibre Channel SAN.

Figure: Servers are connected to the disk subsystem via the ports. Internally, the disk subsystem consists of hard disks, a controller, a cache and internal I/O channels.

Disk subsystems are available in all sizes. Small disk subsystems have one to two connection ports for servers or storage networks, six to eight hard disks and, depending on the disk capacity, a storage capacity of a few terabytes. Large disk subsystems have several tens of connection ports for servers and storage networks, redundant controllers and multiple I/O channels. A considerably larger number of servers can access a subsystem through a connection over a storage network. Large disk subsystems can store up to a petabyte of data and, depending on the supplier, can weigh well over a tonne. The dimensions of a large disk subsystem are comparable to those of a wardrobe.

The architecture of real disk subsystems is more complex and varies greatly. Ultimately, however, it will always include these components. Regardless of storage networks, most disk subsystems have the advantage that free disk space can be flexibly assigned to each server connected to the disk subsystem (storage pooling). All servers are either directly connected to the disk subsystem or indirectly connected via a storage network. In this configuration each server can be assigned free storage. Incidentally, free storage capacity should be understood to mean both hard disks that have already been installed but not yet used, and free slots for hard disks that have yet to be installed.

Figure: All servers share the storage capacity of a disk subsystem. Each server can be assigned free storage flexibly, as required.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Explain the component architecture of an intelligent disk subsystem. | Dec 2014, Dec 2013 | 7, 10

Unit-02/Lecture-02

Modular vs. Integrated [RGPV/Dec 2015 (7), RGPV/Dec 2013 (7), RGPV/Dec 2011 (5)]

Which unified storage architecture to go with when looking at unified systems is probably a decision based on availability more than functionality. Modular solutions may be more appropriate for users that have an existing storage array that offers NAS services with a gateway module.
On the other hand, users buying new unified systems will be looking mainly at integrated systems, since most of the new products will have an integrated architecture.

At one time block storage supported the most important applications, typically production databases, and file storage was associated with user home directories and office productivity applications. But now even the most critical databases are being run on NAS devices. Server virtualization has further raised the profile of file storage, since applications like VMware store and manipulate entire server instances as individual files. NFS-hosted images are rapidly gaining traction in these environments. "Big data" archive systems in industries such as media and entertainment, oil and gas, and remote sensing, to name a few, also do their work at the file level.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Differentiate integrated vs. modular arrays. | Dec 2015, Dec 2013, Dec 2011 | 7, 7, 5

Unit-02/Lecture-03

Volume manager vs. file system [RGPV/Dec 2013 (7), RGPV/Dec 2011 (5)]

Current file systems trace their roots to the UFS file system, which was proposed in 1965. By the early 1970s, the UNIX file system was up and running. Since then, not much has changed in file systems and there have only been incremental hardware changes. I think the file system and volume manager are the most critical components in achieving I/O performance from both the OS and the underlying hardware. Even the best file system and volume manager can be configured so that performance is poor. Therefore, my next couple of columns will cover file systems and volume management, in addition to file system configuration and tuning.

File System Basics

The purpose of a file system (FS) is to maintain a view of the storage so we can create files. This is done so that users can create, delete, open, close, read, write, and extend files on the device(s). File systems can also be used to maintain security over files.

Volume Manager Basics

The original goal of UNIX volume management (VM), which was developed in the late 1980s, was to group disk devices together so that file systems larger than a single device could be created, and to achieve high performance by striping devices.

Standard VM Inner Workings (Striping)

Most file systems require a VM to group disk and/or RAID devices together. Striping spreads the data across the devices based on the stripe size set within the volume manager. Note that some volume managers support concatenation, which starts with the first device and only writes to the next device when the first device becomes full. The idea behind striping is to spread the data across multiple devices to improve performance and allow multiple I/O disk-head seeks to occur simultaneously. Figure 1 shows what happens with standard striping when multiple files are written at the same time, and what happens when one of those files is removed.

File Systems that Maintain Their Topology

Some file systems maintain and understand the device topology without a volume manager. These file systems support both striping and round-robin allocation. Round-robin allocation means that each device is used individually. In most cases, each file open moves to the next device. In some file systems, it could be that each directory created moves to the next device. Figure 2 shows an example of round-robin allocation, which is very different from striping; the sketch below contrasts the two placement policies.
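As a rough illustration of the difference, here is a minimal Python sketch, assuming a fixed strip size and device count; the helper names (`striped_location`, `round_robin_device`) are our own and do not come from any real volume manager.

```python
def striped_location(logical_block, stripe_size, n_devices):
    """Map a logical block to (device, block-on-device) under striping.

    Blocks are grouped into strips of `stripe_size`; consecutive strips
    rotate across the devices, so one large file engages every spindle.
    """
    strip = logical_block // stripe_size
    device = strip % n_devices
    offset = (strip // n_devices) * stripe_size + logical_block % stripe_size
    return device, offset

def round_robin_device(file_index, n_devices):
    """Round-robin allocation: each newly opened file lands wholly on the
    next device, so concurrent files engage different spindles."""
    return file_index % n_devices

# With 64-block strips over 4 devices, logical block 200 falls on device 3.
print(striped_location(200, 64, 4))   # -> (3, 8)
print(round_robin_device(5, 4))       # the sixth file opened goes to device 1
```

The contrast this is meant to show: striping spreads a single file's I/O over all devices, while round-robin keeps each file on one device but naturally separates concurrently active files onto different spindles.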
I will show how round-robin allocation has some important implications for performance.

File Allocation Comparison

One reason that volume managers do not provide a round-robin allocation method is the interaction between the volume manager and the file system. Every file system must allocate space and maintain consistency, which is one of the main purposes of the file system. There are multiple types of file system allocation, but the real issue is that a volume manager presents a single flat address range for the block devices in the file system, and the file system allocates from that range. The volume manager then translates each address to one of the devices. It is difficult, but not impossible, for the volume manager to pass all of the underlying device topology to a file system. Also, most file systems designed to work with volume managers do not have an interface to understand the underlying volume topology. Other file systems that control their own topology can easily use round-robin allocation, because these file systems understand the underlying topology.

How Volume Managers and File Systems Work

It is important to fully understand how volume managers and file systems work internally to choose the best file system for the application. By understanding the inner workings, you will have a much better idea of what the tunable parameters mean and how to improve performance.

Performance Comparison

Indirect block allocation and its read/write performance can be painfully slow compared to the extent-based allocation method. For example, consider an application doing random reads and writes. To find the block address for a record, a file allocated with indirect blocks must read through the indirect-block areas of the file for the record in question prior to reading the record. With extent-based allocation, the file system can simply read the inode in question, which makes an enormous difference in performance. I am unaware of any new file systems using indirect blocks for space allocation, because of the huge performance penalties for random I/O. Even for sequential I/O, the performance of indirect blocks is generally less than that of extent-based file systems.

Free Space Allocation and Representation Methods

Each file system uses an algorithm to find and allocate free space within the file system. Most file systems use B-tree structures to represent free space, but some file systems use bitmaps. Each method of free space representation has advantages and disadvantages.

Bitmap Representation

The use of bitmap representation is less common. This method is used where each bit in the map represents a single allocation unit such as 1,024 bytes, 512 KB, or even hundreds of megabytes. Therefore, a single bit can represent a great deal of space.

Free Space Allocation

With either representation (B-tree or bitmap), free space must be found and allocated within that representation. The allocation algorithms find free space based on their internal search strategies. The two most common methods used are first fit and best fit.

First Fit

The first fit method tries to find the first space within the file system that matches the allocation size requested for the file being allocated. In some file systems, the first fit method is used to find the space closest to the last allocation of the file being extended, thereby allowing sequential block addresses to be allocated for the file within the file system. A sketch of a first-fit scan over a free-space bitmap follows below.
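Here is a minimal Python sketch of first fit over a free-space bitmap, assuming 0 marks a free allocation unit and 1 a used one; the function name `first_fit` is our own illustrative choice.

```python
def first_fit(bitmap, n_units):
    """Return the start index of the first run of `n_units` free
    allocation units (bit value 0), or -1 if no run is long enough."""
    run_start, run_len = 0, 0
    for i, bit in enumerate(bitmap):
        if bit == 0:
            if run_len == 0:
                run_start = i      # a new free run begins here
            run_len += 1
            if run_len == n_units:
                return run_start   # first run that fits wins
        else:
            run_len = 0            # a used unit breaks the run
    return -1

bitmap = [1, 1, 0, 0, 1, 0, 0, 0, 1]
start = first_fit(bitmap, 3)       # -> 5: the first free run of 3 units
for i in range(start, start + 3):
    bitmap[i] = 1                  # mark the units as allocated
```

Note how the search stops at the first run that fits; best fit, described next, would instead scan the whole map looking for the tightest-fitting run, which is why it costs more CPU cycles.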
Best Fit

The best fit method tries to find the best place in the file system for the allocation of the data. This method is used to try to reduce total file system fragmentation. It always takes more CPU cycles than first fit, because the whole file system must be searched for the best allocation. (Note that in systems using round-robin allocation only, just the device on which the initial allocation was made must be searched.) This method works to reduce fragmentation, especially when files cannot be pre-allocated (for file systems that support pre-allocation) or for large allocations, such as multiple megabytes. Most vendors do not support this method, and most allocations in file systems are not large, because the overhead would be huge. The old Cray NC1FS supported this method by using hardware vector registers to perform the search quickly.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Give the difference between volume manager and file system. | Dec 2013, Dec 2011 | 7, 5

Unit-02/Lecture-04

Physical disk structure [RGPV/Dec 2012 (10)]

Data on the disk is recorded on tracks, which are concentric rings on the platter around the spindle. The tracks are numbered, starting from zero, from the outer edge of the platter. The number of tracks per inch (TPI) on the platter (the track density) measures how tightly the tracks are packed on a platter.

Each track is divided into smaller units called sectors. A sector is the smallest individually addressable unit of storage. The track and sector structure is written on the platter by the drive manufacturer using a formatting operation. The number of sectors per track varies according to the specific drive. The first personal computer disks had 17 sectors per track. Recent disks have a much larger number of sectors on a single track. There can be thousands of tracks on a platter, depending on the physical dimensions and recording density of the platter.

Typically, a sector holds 512 bytes of user data, although some disks can be formatted with larger sector sizes. In addition to user data, a sector also stores other information, such as the sector number, head number or platter number, and track number. This information helps the controller to locate the data on the drive, but storing it consumes space on the disk. Consequently, there is a difference between the capacity of an unformatted disk and a formatted one. Drive manufacturers generally advertise the unformatted capacity; for example, a disk advertised as 500 GB will hold only 465.7 GB of user data, with the remaining 34.3 GB used for metadata.

A cylinder is the set of identical tracks on both surfaces of each drive platter. The location of drive heads is referred to by cylinder number, not by track number.

Figure: Disk structure: sectors, tracks, and cylinders

Zoned bit recording [RGPV/Dec 2014 (2), RGPV/Dec 2012 (10)]

Because the platters are made of concentric tracks, the outer tracks can hold more data than the inner tracks, because the outer tracks are physically longer than the inner tracks, as shown in Figure 2-6 (a). On older disk drives, the outer tracks had the same number of sectors as the inner tracks, so data density was low on the outer tracks. This was an inefficient use of available space. Zoned bit recording utilizes the disk efficiently. As shown in Figure 2-6 (b), this mechanism groups tracks into zones based on their distance from the center of the disk.
The zones are numbered, with the outermost zone being zone 0. An appropriate number of sectors per track is assigned to each zone, so a zone near the center of the platter has fewer sectors per track than a zone on the outer edge. However, tracks within a particular zone have the same number of sectors.

Figure: Zoned bit recording

Disk drive performance [RGPV/Dec 2013 (10)]

A disk drive is an electromechanical device that governs the overall performance of the storage system environment. The various factors that affect the performance of disk drives are discussed in this section.

Disk service time

Disk service time is the time taken by a disk to complete an I/O request. The components that contribute to service time on a disk drive are seek time, rotational latency, and data transfer rate.

Seek time

The seek time (also called access time) describes the time taken to position the R/W heads across the platter with a radial movement (moving along the radius of the platter). In other words, it is the time taken to reposition and settle the arm and the head over the correct track. The lower the seek time, the faster the I/O operation. Disk vendors publish the following seek time specifications:
- Full stroke: the time taken by the R/W head to move across the entire width of the disk, from the innermost track to the outermost track.
- Average: the average time taken by the R/W head to move from one random track to another, normally listed as the time for one-third of a full stroke.
- Track-to-track: the time taken by the R/W head to move between adjacent tracks.

Each of these specifications is measured in milliseconds. The average seek time on a modern disk is typically in the range of 3 to 15 milliseconds. Seek time has more impact on reads of random tracks than of adjacent tracks. To minimize the seek time, data can be written to only a subset of the available cylinders. This results in lower usable capacity than the actual capacity of the drive. For example, a 500 GB disk drive set up to use only the first 40 percent of its cylinders is effectively treated as a 200 GB drive. This is known as short-stroking the drive.

Rotational latency

To access data, the actuator arm moves the R/W head over the platter to a particular track while the platter spins to position the requested sector under the R/W head. The time taken by the platter to rotate and position the data under the R/W head is called rotational latency. This latency depends on the rotation speed of the spindle and is measured in milliseconds. The average rotational latency is one-half of the time taken for a full rotation; for example, a 5,400-rpm spindle completes a rotation in 60,000/5,400, or roughly 11.1 ms, giving an average latency of about 5.5 ms. Like seek time, rotational latency has more impact on the reading/writing of random sectors than on the same operations on adjacent sectors. Average rotational latency is around 5.5 ms for a 5,400-rpm drive, and around 2.0 ms for a 15,000-rpm drive.

Data transfer rate

The data transfer rate (also called transfer rate) refers to the average amount of data per unit time that the drive can deliver to the HBA. It is important to first understand the process of read and write operations in order to calculate data transfer rates. In a read operation, the data first moves from the disk platters to the R/W heads, and then to the drive's internal buffer. Finally, the data moves from the buffer through the interface to the host HBA.
In a write operation, the data moves from the HBA to the internal buffer of the disk drive through the drive's interface. The data then moves from the buffer to the R/W heads. Finally, it moves from the R/W heads to the platters.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | What do you understand by zoned bit recording and logical block addressing? | Dec 2014, Dec 2012 | 2, 10
Q.2 | Explain the various performance criteria of a disk. Write the specifications of a disk. | Dec 2013 | 10

Unit-02/Lecture-05

RAID levels [RGPV/Dec 2015 (7), RGPV/Dec 2012 (10)]

RAID levels are defined on the basis of striping, mirroring, and parity techniques. These techniques determine the data availability and performance characteristics of an array. Some RAID arrays use one technique, whereas others use a combination of techniques. Application performance and data availability requirements determine the RAID level selection.

Striping

A RAID set is a group of disks. Within each disk, a predefined number of contiguously addressable disk blocks are defined as a strip. The set of aligned strips that spans all the disks within the RAID set is called a stripe. Figure 3-2 shows physical and logical representations of a striped RAID set.

Figure 3-2: Striped RAID set

Strip size (also called stripe depth) describes the number of blocks in a strip, and is the maximum amount of data that can be written to or read from a single HDD in the set before the next HDD is accessed, assuming that the accessed data starts at the beginning of the strip. All strips in a stripe have the same number of blocks, and decreasing the strip size means that data is broken into smaller pieces when spread across the disks. Stripe size is the strip size multiplied by the number of HDDs in the RAID set. Stripe width refers to the number of data strips in a stripe.

Striped RAID does not protect data unless parity or mirroring is used. However, striping may significantly improve I/O performance. Depending on the type of RAID implementation, the RAID controller can be configured to access data across multiple HDDs simultaneously.

Mirroring

Mirroring is a technique whereby data is stored on two different HDDs, yielding two copies of the data. In the event of one HDD failure, the data is intact on the surviving HDD (see Figure 3-3) and the controller continues to service the host's data requests from the surviving disk of the mirrored pair.

When the failed disk is replaced with a new disk, the controller copies the data from the surviving disk of the mirrored pair. This activity is transparent to the host.

In addition to providing complete data redundancy, mirroring enables faster recovery from disk failure. However, disk mirroring provides only data protection and is not a substitute for data backup: mirroring constantly captures changes in the data, whereas a backup captures point-in-time images of the data.

Mirroring involves duplication of data, so the amount of storage capacity needed is twice the amount of data being stored. Therefore, mirroring is considered expensive and is preferred for mission-critical applications that cannot afford data loss. Mirroring improves read performance because read requests can be serviced by both disks. However, write performance deteriorates, as each write request manifests as two writes on the HDDs. In other words, mirroring does not deliver the same levels of write performance as a striped RAID.
Figure 3-3: Mirrored disks in an array

Parity

Parity is a method of protecting striped data from HDD failure without the cost of mirroring. An additional HDD is added to the stripe width to hold parity, a mathematical construct that allows re-creation of the missing data. Parity is a redundancy check that ensures full protection of data without maintaining a full set of duplicate data. Parity information can be stored on separate, dedicated HDDs or distributed across all the drives in a RAID set. Figure 3-4 shows a parity RAID. The first four disks, labeled D, contain the data. The fifth disk, labeled P, stores the parity information, which in this case is the sum of the elements in each row. If one of the Ds fails, the missing value can be calculated by subtracting the sum of the remaining elements from the parity value.

Figure 3-4: Parity RAID

The computation of parity is represented here as a simple arithmetic operation on the data. In practice, however, the parity calculation is a bitwise XOR operation, and calculating parity is a function of the RAID controller.

Compared to mirroring, a parity implementation considerably reduces the cost associated with data protection. Consider a RAID configuration with five disks. Four of these disks hold data, and the fifth holds parity information. Parity requires 25 percent extra disk space, compared to mirroring, which requires 100 percent extra disk space. However, there are some disadvantages to using parity. Parity information is generated from the data on the data disks. Therefore, parity is recalculated every time there is a change in data. This recalculation is time-consuming and affects the performance of the RAID controller.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Write a short note on striping and mirroring. | Dec 2012 | 10

Unit-02/Lecture-06

RAID levels [RGPV/Dec 2014 (3), RGPV/Dec 2012 (10)]

RAID 0

In a RAID 0 configuration, data is striped across the HDDs in a RAID set. It utilizes the full storage capacity by distributing strips of data over multiple HDDs in a RAID set. To read data, all the strips are put back together by the controller. The stripe size is specified at the host level for software RAID and is vendor-specific for hardware RAID. Figure 3-5 shows RAID 0 on a storage array in which data is striped across five disks. When the number of drives in the array increases, performance improves because more data can be read or written simultaneously. RAID 0 is used in applications that need high I/O throughput. However, if these applications require high availability, RAID 0 does not provide data protection and availability in the event of drive failures.

RAID 1

In a RAID 1 configuration, data is mirrored to improve fault tolerance. A RAID 1 group consists of at least two HDDs. As explained in the section on mirroring, every write goes to both disks, which is transparent to the host in a hardware RAID implementation. In the event of disk failure, the impact on data recovery is the least among all RAID implementations. This is because the RAID controller uses the mirror drive for data recovery and continuous operation. RAID 1 is suitable for applications that require high availability.

Figure: RAID 1

Nested RAID

Most data centers require both data redundancy and performance from their RAID arrays. RAID 0+1 and RAID 1+0 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1, using striping and mirroring together. These types of RAID
Nested RAID

Most data centers require data redundancy and performance from their RAID arrays. RAID 0+1 and RAID 1+0 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1. They use striping and mirroring techniques and combine their benefits. These types of RAID require an even number of disks, the minimum being four (see Figure 3-7).

RAID 1+0 is also known as RAID 10 (ten) or RAID 1/0. Similarly, RAID 0+1 is also known as RAID 01 or RAID 0/1. RAID 1+0 performs well for workloads that use small, random, write-intensive I/O. Some applications that benefit from RAID 1+0 include the following:

High transaction rate online transaction processing (OLTP)
Large messaging installations
Database applications that require a high I/O rate, random access, and high availability

A common misconception is that RAID 1+0 and RAID 0+1 are the same. Under normal conditions, RAID levels 1+0 and 0+1 offer identical benefits. However, rebuild operations in the case of disk failure differ between the two.

RAID 1+0 is also called a striped mirror. The basic element of RAID 1+0 is a mirrored pair, which means that data is first mirrored and then both copies of the data are striped across multiple HDDs in a RAID set. When replacing a failed drive, only the mirror is rebuilt. In other words, the disk array controller uses the surviving drive in the mirrored pair for data recovery and continuous operation. Data from the surviving disk is copied to the replacement disk.

RAID 0+1 is also called a mirrored stripe. The basic element of RAID 0+1 is a stripe. This means that the process of striping data across HDDs is performed initially, and then the entire stripe is mirrored. If one drive fails, the entire stripe is faulted. A rebuild operation copies the entire stripe, copying data from each disk in the healthy stripe to an equivalent disk in the failed stripe. This causes increased and unnecessary I/O load on the surviving disks and makes the RAID set more vulnerable to a second disk failure.

[Figure: Nested RAID]

RAID 3

RAID 3 stripes data for high performance and uses parity for improved fault tolerance. Parity information is stored on a dedicated drive so that data can be reconstructed if a drive fails. For example, of five disks, four are used for data and one is used for parity. Therefore, the total disk space required is 1.25 times the size of the data disks. RAID 3 always reads and writes complete stripes of data across all disks, as the drives operate in parallel. There are no partial writes that update one out of many strips in a stripe.

[Figure: RAID 3]

RAID 3 provides good bandwidth for the transfer of large volumes of data. RAID 3 is used in applications that involve large sequential data access, such as video streaming.

RAID 4

Similar to RAID 3, RAID 4 stripes data for high performance and uses parity for improved fault tolerance. Data is striped across all disks except the parity disk in the array. Parity information is stored on a dedicated disk so that the data can be rebuilt if a drive fails. Striping is done at the block level. Unlike RAID 3, data disks in RAID 4 can be accessed independently, so that specific data elements can be read or written on a single disk without a read or write of an entire stripe. RAID 4 provides good read throughput and reasonable write throughput.

RAID 5

RAID 5 is a very versatile RAID implementation. It is similar to RAID 4 because it uses striping, and the drives (strips) are independently accessible. The difference between RAID 4 and RAID 5 is the parity location. In RAID 4, parity is written to a dedicated drive, creating a write bottleneck for the parity disk. In RAID 5, parity is distributed across all disks.
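The rotating parity placement can be sketched as follows. The left-symmetric rotation shown here is one common scheme and is only an assumption; the actual pattern is implementation specific.

```python
# A sketch of parity rotation in a RAID 5 set: each stripe's parity strip
# sits on a different disk, so no single disk becomes a parity bottleneck.

def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Disk index holding the parity strip for a given stripe."""
    return (num_disks - 1) - (stripe % num_disks)

for stripe in range(5):
    p = raid5_parity_disk(stripe, num_disks=5)
    print(f"stripe {stripe}: " + " ".join("P" if d == p else "D" for d in range(5)))
```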
The distribution of parity in RAID 5 overcomes the write bottleneck. RAID 5 is preferred for messaging, data mining, medium-performance media serving, and relational database management system (RDBMS) implementations in which database administrators (DBAs) optimize data access.

[Figure: RAID 5]

RAID 6

RAID 6 works the same way as RAID 5, except that RAID 6 includes a second parity element to enable survival in the event of the failure of two disks in a RAID group. Therefore, a RAID 6 implementation requires at least four disks. RAID 6 distributes the parity across all the disks. The write penalty in RAID 6 is higher than that in RAID 5; therefore, RAID 5 writes perform better than RAID 6. The rebuild operation in RAID 6 may take longer than that in RAID 5 due to the presence of two parity sets.

[Figure: RAID 6]

[Figure: Comparison of different RAID types]

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Write down the different levels of RAID and compare them. | Dec 2015 / Dec 2012 / Dec 2011 | 7 / 10 / 10
Q.2 | How is RAID 4 different from RAID 3? | Dec 2014 | 3

Unit-02/Lecture-07  Disk service time – [RGPV/Dec 2012(10)]

Disk service time (Ts) can be calculated from the rotational latency (L), the average seek time (T), and the internal data transfer time (X):

Ts = L + T + X

Hence the components of disk service time are disk rotational latency, average seek time, and internal data transfer time. For example, a 15,000 rpm drive (full rotation 4 ms, so average rotational latency L = 2 ms) with an average seek time of 4 ms and an internal transfer time of 0.5 ms gives Ts = 2 + 4 + 0.5 = 6.5 ms.

In a random I/O operation the seek time will be high, as the R/W head has to seek different sectors on different tracks on the platter to read or write an I/O to or from the disk. So seek time contributes the largest percentage of the disk service time in a random I/O operation.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Which components constitute the disk service time? Which component contributes the largest percentage of the disk service time in a random I/O operation? | Dec 2012 | 10

Additional Topic – Lecture-08  Intelligent disk subsystems overview

Intelligent disk subsystems represent the third level of complexity for controllers, after JBODs and RAID arrays. The controllers of intelligent disk subsystems offer additional functions over and above those offered by RAID. In the disk subsystems that are currently available on the market, these functions are usually instant copies, remote mirroring and LUN masking.

Instant copies

Instant copies can virtually copy data sets of several terabytes within a disk subsystem in a few seconds. Virtual copying means that disk subsystems fool the attached servers into believing that they are capable of copying such large data quantities in such a short space of time. The actual copying process takes significantly longer. However, the same server, or a second server, can access the virtually copied data after a few seconds.

There are numerous alternative implementations for instant copies. One thing that all implementations have in common is that the pretence of being able to copy data in a matter of seconds costs resources. All realizations of instant copies require controller computing time and cache, and place a load on internal I/O channels and hard disks. The different implementations of instant copy force the performance down at different times. However, it is not possible to choose the most favorable implementation alternative depending upon the application used, because real disk subsystems only ever realize one implementation alternative of instant copy.
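One common way instant copy can be realized (an assumption here, since the text does not name a specific technique) is copy-on-write, sketched minimally below. The class and function names are illustrative, not from any vendor's product: the "copy" is created instantly by recording only a reference, and a block is physically copied the first time the original is overwritten.

```python
class InstantCopy:
    """Copy-on-write view of a volume: reads see the data as of copy time."""

    def __init__(self, source_blocks):
        self.source = source_blocks      # the live volume (list of blocks)
        self.saved = {}                  # original blocks preserved on demand

    def read(self, block_no):
        return self.saved.get(block_no, self.source[block_no])

def write_block(volume, copy, block_no, value):
    """Write to the live volume, preserving the old block for the copy first."""
    if block_no not in copy.saved:
        copy.saved[block_no] = volume[block_no]   # the only real copy work
    volume[block_no] = value

volume = ["a", "b", "c"]
snap = InstantCopy(volume)     # "copied" in an instant: no data moved yet
write_block(volume, snap, 1, "B")
assert snap.read(1) == "b" and volume[1] == "B"
```

The sketch also makes visible where the resource cost goes: every first overwrite of a block triggers extra work, which is why instant copies load the controller, cache, and internal I/O channels.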
[Figure: Instant copies can virtually copy several terabytes of data within a disk subsystem in a few seconds. Server 1 works on the original data (1). The original data is virtually copied in a few seconds. Then server 2 can work with the data copy, whilst server 1 continues to operate with the original data.]

Additional Topic – Unit-02/Lecture-09  Disk drive components

A disk drive uses a rapidly moving arm to read and write data across a flat platter coated with magnetic particles. Data is transferred from the magnetic platter through the R/W head to the computer. Several platters are assembled together with the R/W head and controller, most commonly referred to as a hard disk drive (HDD). Data can be recorded and erased on a magnetic disk any number of times. This section details the different components of the disk, the mechanism for organizing and storing data on disks, and the factors that affect disk performance.

Key components of a disk drive are the platter, spindle, read/write head, actuator arm assembly, and controller.

[Figure: Disk drive components]

Platter

A typical HDD consists of one or more flat circular disks called platters. The data is recorded on these platters in binary codes. The set of rotating platters is sealed in a case, called a head disk assembly (HDA). A platter is a rigid, round disk coated with magnetic material on both surfaces (top and bottom). The data is encoded by polarizing the magnetic area, or domains, of the disk surface. Data can be written to or read from both surfaces of the platter. The number of platters and the storage capacity of each platter determine the total capacity of the drive.

Spindle

A spindle connects all the platters, as shown in Figure 2-3, and is connected to a motor. The motor rotates the spindle at a constant speed. The disk platter spins at a speed of several thousand revolutions per minute (rpm). Disk drives have spindle speeds of 7,200 rpm, 10,000 rpm, or 15,000 rpm. Disks used in current storage systems have a platter diameter of 3.5" (90 mm). When the platter spins at 15,000 rpm, the outer edge is moving at around 25 percent of the speed of sound. The speed of the platter is increasing with improvements in technology, although the extent to which it can be improved is limited.

Read/write head

Read/write (R/W) heads, shown in Figure 2-4, read and write data from or to a platter. Drives have two R/W heads per platter, one for each surface of the platter. The R/W head changes the magnetic polarization on the surface of the platter when writing data. While reading data, this head detects the magnetic polarization on the surface of the platter. During reads and writes, the R/W head senses the magnetic polarization and never touches the surface of the platter. When the spindle is rotating, there is a microscopic air gap between the R/W heads and the platters, known as the head flying height. This air gap is removed when the spindle stops rotating and the R/W head rests on a special area on the platter near the spindle. This area is called the landing zone. The landing zone is coated with a lubricant to reduce friction between the head and the platter. The logic on the disk drive ensures that heads are moved to the landing zone before they touch the surface. If the drive malfunctions and the R/W head accidentally touches the surface of the platter outside the landing zone, a head crash occurs.
In a head crash, the magnetic coating on the platter is scratched and the R/W head may be damaged. A head crash generally results in data loss.

Actuator arm assembly

The R/W heads are mounted on the actuator arm assembly, which positions the R/W heads at the location on the platter where the data needs to be written or read.

Controller

The controller (see Figure 2-2 [b]) is a printed circuit board, mounted at the bottom of a disk drive. It consists of a microprocessor, internal memory, circuitry, and firmware. The firmware controls power to the spindle motor and the speed of the motor. It also manages communication between the drive and the host. In addition, it controls the R/W operations by moving the actuator arm and switching between different R/W heads, and performs the optimization of data access.

Additional Topic – Unit-02/Lecture-10  Physical disk structure

Data on the disk is recorded on tracks, which are concentric rings on the platter around the spindle. The tracks are numbered, starting from zero, from the outer edge of the platter. The number of tracks per inch (TPI) on the platter (or the track density) measures how tightly the tracks are packed on a platter.

Each track is divided into smaller units called sectors. A sector is the smallest individually addressable unit of storage. The track and sector structure is written on the platter by the drive manufacturer using a formatting operation. The number of sectors per track varies according to the specific drive. The first personal computer disks had 17 sectors per track. Recent disks have a much larger number of sectors on a single track. There can be thousands of tracks on a platter, depending on the physical dimensions and recording density of the platter.

Typically, a sector holds 512 bytes of user data, although some disks can be formatted with larger sector sizes. In addition to user data, a sector also stores other information, such as the sector number, head number or platter number, and track number. This information helps the controller to locate the data on the drive, but storing it consumes space on the disk. Consequently, there is a difference between the capacity of an unformatted disk and a formatted one. Drive manufacturers generally advertise the unformatted capacity; for example, a disk advertised as 500 GB will only hold 465.7 GB of user data, and the remaining 34.3 GB is used for metadata.

A cylinder is the set of identical tracks on both surfaces of each drive platter. The location of drive heads is referred to by cylinder number, not by track number.

[Figure: Disk structure: sectors, tracks, and cylinders]

Additional Topic – Unit-02/Lecture-11  Hot spares – [RGPV/Dec 2015(2)]

A hot spare refers to a spare HDD in a RAID array that temporarily replaces a failed HDD of a RAID set. A hot spare takes the identity of the failed HDD in the array. One of the following methods of data recovery is performed, depending on the RAID implementation:

If parity RAID is used, then the data is rebuilt onto the hot spare from the parity and the data on the surviving HDDs in the RAID set.
If mirroring is used, then the data from the surviving mirror is used to copy the data.

When the failed HDD is replaced with a new HDD, one of the following takes place:

The hot spare replaces the new HDD permanently. This means that it is no longer a hot spare, and a new hot spare must be configured on the array.
When a new HDD is added to the system, data from the hot spare is copied to it. The hot spare returns to its idle state, ready to replace the next failed drive.
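A minimal sketch of the recovery choice just described, assuming the RAID set is either mirrored or parity protected; the function name and data layout are illustrative only, not from any real controller.

```python
def rebuild_onto_hot_spare(protection, surviving_strips, parity=None):
    """Return the reconstructed contents of the failed disk's strip."""
    if protection == "mirror":
        # Copy from the surviving disk of the mirrored pair.
        return surviving_strips[0]
    if protection == "parity":
        # XOR the parity strip with all surviving data strips (see Lecture-05).
        rebuilt = parity
        for strip in surviving_strips:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, strip))
        return rebuilt
    raise ValueError("no redundancy: data cannot be rebuilt")

# Mirrored pair: the spare simply receives the surviving copy.
assert rebuild_onto_hot_spare("mirror", [b"abc"]) == b"abc"
```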
A hot spare should be large enough to accommodate data from a failed drive. Some systems implement multiple hot spares to improve data availability.

A hot spare can be configured as automatic or user-initiated, which specifies how it will be used in the event of disk failure. In an automatic configuration, when the recoverable error rates for a disk exceed a predetermined threshold, the disk subsystem tries to copy data from the failing disk to the hot spare automatically. If this task is completed before the damaged disk fails, then the subsystem switches to the hot spare and marks the failing disk as unusable. Otherwise, it uses parity or the mirrored disk to recover the data. In the case of a user-initiated configuration, the administrator has control of the rebuild process. For example, the rebuild could occur overnight to prevent any degradation of system performance. However, the system is vulnerable to another failure if a hot spare is unavailable.

Modern RAID controllers can manage a common pool of hot spare disks for several virtual RAID disks. Hot spare disks can be defined for all RAID levels that offer redundancy. The re-creation of the data from a defective hard disk takes place at the same time as the write and read operations of the server to the virtual hard disk, so that, from the point of view of the server, at least a reduction in performance can be observed. Modern hard disks come with self-diagnosis programs that report an increase in write and read errors to the system administrator in plenty of time: "Caution! I am about to depart this life. Please replace me with a new disk. Thank you!" To this end, the individual hard disks store the data with a redundant code such as the Hamming code. The Hamming code permits the correct re-creation of the data, even if individual bits are changed on the hard disk. If the system is looked after properly, you can assume that the installed physical hard disks will hold out for a while. Therefore, for the benefit of higher performance, it is generally an acceptable risk to give access by the server a higher priority than the re-creation of the data of an exchanged physical hard disk.

A further side-effect of bringing together several physical hard disks to form a virtual hard disk is the higher capacity of the virtual hard disks. As a result, fewer device addresses are used up in the I/O channel and thus the administration of the server is also simplified, because fewer hard disks (drive letters or volumes) need to be used.

[Figure: Hot spare disk. The disk subsystem provides the server with two virtual disks for which a common hot spare disk is available (1). Due to the redundant data storage, the server can continue to process data even though a physical disk has failed, at the expense of a reduction in performance (2). The RAID controller re-creates the data from the defective disk on the hot spare disk (3). After the defective disk has been replaced, a hot spare disk is once again available (4).]

RGPV Questions
S.No | Question | Year | Marks
Q.1 | What is hot sparing? | Dec 2014 | 2

Additional Topic – Unit-02/Lecture-12  Front end

The front end provides the interface between the storage system and the host. It consists of two components: front-end ports and front-end controllers. The front-end ports enable hosts to connect to the intelligent storage system.
Each front-end port has processing logic that executes the appropriate transport protocol, such as SCSI, Fibre Channel, or iSCSI, for storage connections. Redundant ports are provided on the front end for high availability.

Front-end controllers route data to and from cache via the internal data bus. When the cache receives write data, the controller sends an acknowledgment message back to the host. Controllers optimize I/O processing by using command queuing algorithms.

Front-end command queuing

Command queuing is a technique implemented on front-end controllers. It determines the execution order of received commands and can reduce unnecessary drive head movements and improve disk performance. When a command is received for execution, the command queuing algorithm assigns it a tag that defines the sequence in which commands should be executed. With command queuing, multiple commands can be executed concurrently based on the organization of data on the disk, regardless of the order in which the commands were received. The most commonly used command queuing algorithms are as follows:

First in, first out (FIFO): this is the default algorithm, where commands are executed in the order in which they are received. There is no reordering of requests for optimization; therefore, it is inefficient in terms of performance.

Seek time optimization: commands are executed based on optimizing read/write head movements, which may result in reordering of commands. Without seek time optimization, the commands are executed in the order they are received, say A, B, C and D. The radial movement required by the head to execute C immediately after A is less than what would be required to execute B. With seek time optimization, the command execution sequence would therefore be A, C, B and D, as shown in the figure (a small sketch of this reordering appears below).

Additional Topic – Unit-02/Lecture-13  Hard disks and internal I/O channels

The controller of the disk subsystem must ultimately store all data on physical hard disks. Standard hard disks that range in size from 36 GB to 1 TB are currently used for this purpose. Since the maximum number of hard disks that can be used is often limited, the size of the hard disks used gives an indication of the maximum capacity of the overall disk subsystem.

When selecting the size of the internal physical hard disks it is necessary to weigh the requirements of maximum performance against those of the maximum capacity of the overall system. With regard to performance it is often beneficial to use smaller hard disks at the expense of the maximum capacity: given the same capacity, if more hard disks are available in a disk subsystem, the data is distributed over several hard disks and thus the overall load is spread over more arms and read/write heads, and usually over more I/O channels (Figure 2.4). For most applications, medium-sized hard disks are sufficient.

[Figure: If small internal hard disks are used, the load is distributed over more hard disks and thus over more read and write heads. On the other hand, the maximum storage capacity is reduced, since in both disk subsystems only 16 hard disks can be fitted.]

For applications with extremely high performance requirements, smaller hard disks should be considered. However, consideration should be given to the fact that more modern, larger hard disks generally have shorter seek times and larger caches, so it is necessary to carefully weigh up which hard disks will offer the highest performance for a certain load profile in each individual case.
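As promised above, here is a minimal sketch of seek time optimization, modeled as a greedy nearest-track scheduler. The track numbers are hypothetical, and real controllers use more elaborate algorithms; this only illustrates the reordering idea from Lecture-12.

```python
def seek_optimized_order(head_track, requests):
    """Greedily pick the queued command whose track is closest to the head."""
    pending = dict(requests)                 # tag -> target track
    order = []
    while pending:
        tag = min(pending, key=lambda t: abs(pending[t] - head_track))
        head_track = pending.pop(tag)
        order.append(tag)
    return order

# Commands arrive as A, B, C, D but execute as A, C, B, D, matching the
# example in the text (C's track is radially closer to A's than B's is).
print(seek_optimized_order(0, [("A", 10), ("B", 90), ("C", 25), ("D", 95)]))
```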
Standard I/O techniques such as SCSI, Fibre Channel, increasingly Serial ATA (SATA) and Serial Attached SCSI (SAS) and, still to a degree, Serial Storage Architecture (SSA) are used for the internal I/O channels between connection ports and controller, as well as between controller and internal hard disks. Sometimes, however, proprietary (i.e., manufacturer-specific) I/O techniques are used. Regardless of the I/O technology used, the I/O channels can be designed with built-in redundancy in order to increase the fault-tolerance of a disk subsystem. The following cases can be differentiated here:

• Active: In active cabling the individual physical hard disks are only connected via one I/O channel (Figure 2.5, left). If this access path fails, then it is no longer possible to access the data.

• Active/passive: In active/passive cabling the individual hard disks are connected via two I/O channels (Figure 2.5, right). In normal operation the controller communicates with the hard disks via the first I/O channel, and the second I/O channel is not used. In the event of the failure of the first I/O channel, the disk subsystem switches from the first to the second I/O channel.

• Active/active (no load sharing): In this cabling method the controller uses both I/O channels in normal operation (Figure 2.6, left). The hard disks are divided into two groups: in normal operation the first group is addressed via the first I/O channel and the second via the second I/O channel. If one I/O channel fails, both groups are addressed via the other I/O channel.

• Active/active (load sharing): In this approach all hard disks are addressed via both I/O channels in normal operation (Figure 2.6, right). The controller divides the load dynamically between the two I/O channels so that the available hardware can be optimally utilized. If one I/O channel fails, then the communication goes through the other channel only.

Active cabling is the simplest and thus also the cheapest to realise, but it offers no protection against failure. Active/passive cabling is the minimum needed to protect against failure, whereas active/active cabling with load sharing best utilises the underlying hardware.

Implementation of RAID

There are two types of RAID implementation: hardware and software.

Software RAID

Software RAID uses host-based software to provide RAID functions. It is implemented at the operating-system level and does not use a dedicated hardware controller to manage the RAID array. Software RAID implementations offer cost and simplicity benefits when compared with hardware RAID. However, they have the following limitations:

Performance: software RAID affects overall system performance. This is due to the additional CPU cycles required to perform RAID calculations. The performance impact is more pronounced for complex implementations of RAID, as detailed later in this chapter.
Supported features: software RAID does not support all RAID levels.
Operating system compatibility: software RAID is tied to the host operating system; hence, upgrades to software RAID or to the operating system should be validated for compatibility. This leads to inflexibility in the data processing environment.
Hardware RAID

In hardware RAID implementations, a specialized hardware controller is implemented either on the host or on the array. These implementations vary in the way the storage array interacts with the host. Controller card RAID is a host-based hardware RAID implementation in which a specialized RAID controller is installed in the host and the HDDs are connected to it. The RAID controller interacts with the hard disks using a PCI bus. Manufacturers also integrate RAID controllers on motherboards. This integration reduces the overall cost of the system, but does not provide the flexibility required for high-end storage systems.

Reference Books
Priority | Book | Author
1 | Information Storage and Management | G. Somasundaram, Alok Shrivastava
2 | Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI | Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein
3 | Cloud Computing: Principles, Systems & Applications | Nick Antonopoulos, Lee Gillam

Unit – 03  Introduction to Networked Storage

Unit-03/Lecture-01  DAS (direct attached storage) – [RGPV/Dec 2014(2), RGPV/Dec 2013(7), RGPV/Dec 2012(10)]

DAS is an architecture where storage connects directly to servers. Applications access data from DAS using block-level access protocols. The internal HDD of a host, tape libraries, and directly connected external HDD packs are some examples of DAS.

Types of DAS

DAS is classified as internal or external, based on the location of the storage device with respect to the host.

Internal DAS

In internal DAS architectures, the storage device is internally connected to the host by a serial or parallel bus. The physical bus has distance limitations and can only be sustained over a shorter distance for high-speed connectivity. In addition, most internal buses can support only a limited number of devices, and they occupy a large amount of space inside the host, making maintenance of other components difficult.

External DAS

In external DAS architectures, the server connects directly to the external storage device. In most cases, communication between the host and the storage device takes place over the SCSI or FC protocol. Compared to internal DAS, an external DAS overcomes the distance and device count limitations and provides centralized management of storage devices.

DAS benefits and limitations

DAS requires a relatively lower initial investment than storage networking. Storage networking architectures are discussed later in this book. DAS configuration is simple and can be deployed easily and rapidly. Setup is managed using host-based tools, such as the host OS, which makes storage management tasks easy for small and medium enterprises. DAS is the simplest solution when compared to other storage networking models, and it requires fewer management tasks and fewer hardware and software elements to set up and operate.

However, DAS does not scale well. A storage device has a limited number of ports, which restricts the number of hosts that can directly connect to the storage. The limited bandwidth in DAS restricts the available I/O processing capability. When capacities are being reached, the service availability may be compromised, and this has a ripple effect on the performance of all hosts attached to that specific device or array. The distance limitations associated with implementing DAS because of direct connectivity requirements can be addressed by using Fibre Channel connectivity.
DAS does not make optimal use of resources, due to its limited ability to share front-end ports. In DAS environments, unused resources cannot be easily re-allocated, resulting in islands of over-utilized and under-utilized storage pools.

Disk utilization, throughput, and cache memory of a storage device, along with the virtual memory of a host, govern the performance of DAS. RAID-level configurations, storage controller protocols, and the efficiency of the bus are additional factors that affect the performance of DAS. The absence of storage interconnects and network latency gives DAS the potential to outperform other storage networking configurations.

DAS disk drive interfaces

The host and the storage device in DAS communicate with each other by using predefined protocols such as IDE/ATA, SATA, SAS, SCSI, and FC. These protocols are implemented on the HDD controller. Therefore, a storage device is also known by the name of the protocol it supports. This section describes each of these storage devices in detail.

IDE/ATA

An Integrated Device Electronics/Advanced Technology Attachment (IDE/ATA) disk supports the IDE protocol. The term IDE/ATA conveys the dual-naming conventions for various generations and variants of this interface. The IDE component in IDE/ATA provides the specification for the controllers connected to the computer's motherboard for communicating with the attached device. The ATA component is the interface for connecting storage devices, such as CD-ROMs, floppy disk drives, and HDDs, to the motherboard. IDE/ATA has a variety of standards and names, such as ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, and Ultra DMA. The latest version of ATA, Ultra DMA/133, supports a throughput of 133 MB per second. In a master-slave configuration, an ATA interface supports two storage devices per connector. However, if the performance of the drive is important, sharing a port between two devices is not recommended.

SATA

SATA (serial ATA) is a serial version of the IDE/ATA specification. SATA is a disk-interface technology that was developed by a group of the industry's leading vendors with the aim of replacing parallel ATA. SATA provides point-to-point connectivity up to a distance of one meter and enables data transfer at a speed of 150 MB/s. Enhancements to SATA have increased the data transfer speed up to 600 MB/s.

A SATA bus directly connects each storage device to the host through a dedicated link, making use of low-voltage differential signaling (LVDS). LVDS is an electrical signaling system that can provide high-speed connectivity over low-cost, twisted-pair copper cables. For data transfer, a SATA bus uses LVDS with a voltage of 250 mV. A SATA bus uses a small 7-pin connector and a thin cable for connectivity. A SATA port uses 4 signal pins, which improves its pin efficiency compared to the parallel ATA interface, which uses 26 signal pins for connecting an 80-conductor ribbon cable to a 40-pin header connector. SATA devices are hot-pluggable, which means that they can be connected or removed while the host is up and running. A SATA port permits single-device connectivity; connecting multiple SATA drives to a host requires multiple ports to be present on the host. The single-device connectivity enforced in SATA eliminates the performance problems caused by cable or port sharing in IDE/ATA.

Parallel SCSI – [RGPV/Dec 2015(3)]

SCSI is available in a variety of interfaces.
Parallel SCSI (referred to simply as SCSI) is one of the oldest and most popular forms of storage interface used in hosts. SCSI is a set of standards used for connecting a peripheral device to a computer and transferring data between them. Often, SCSI is used to connect HDDs and tapes to a host. SCSI can also connect a wide variety of other devices, such as scanners and printers. Communication between the hosts and the storage devices uses the SCSI command set. Since its inception, SCSI has undergone rapid revisions, resulting in continuous performance improvements. The oldest SCSI variant, called SCSI-1, provided a data transfer rate of 5 MB/s; SCSI Ultra320 provides data transfer speeds of 320 MB/s.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Write down the advantages and disadvantages of DAS. | Dec 2014 / Dec 2013 | 2 / 7
Q.2 | What are the limitations of DAS? | Dec 2012 | 10

Unit-03/Lecture-02  NAS (network attached storage) – [RGPV/Dec 2015(2), RGPV/Dec 2012(10), RGPV/Dec 2011(10)]

Network attached storage (NAS) is an IP-based file-sharing device attached to a local area network. NAS provides the advantages of server consolidation by eliminating the need for multiple file servers. It provides storage consolidation through file-level data access and sharing. NAS is a preferred storage solution that enables clients to share files quickly and directly with minimum storage management overhead. NAS uses network and file-sharing protocols to perform filing and storage functions; these include TCP/IP for data transfer and file-service protocols such as CIFS, NFS, and FTP (the original sentence here is garbled; this reading follows the protocols named elsewhere in this lecture). Recent advancements in networking technology have enabled NAS to scale up to enterprise requirements for improved performance and reliability in accessing data.

A NAS device is a dedicated, high-performance, high-speed, single-purpose file serving and storage system. NAS serves a mix of clients and servers over an IP network. Most NAS devices support multiple interfaces and networks. A NAS device uses its own operating system and integrated hardware and software components to meet specific file-service needs. Its operating system is optimized for file I/O and, therefore, performs file I/O better than a general-purpose server. As a result, a NAS device can serve more clients than traditional file servers, providing the benefit of server consolidation.

Benefits of NAS

NAS offers the following benefits:

Supports comprehensive access to information: enables efficient file sharing and supports many-to-one and one-to-many configurations. The many-to-one configuration enables a NAS device to serve many clients simultaneously. The one-to-many configuration enables one client to connect with many NAS devices simultaneously.

Improved efficiency: eliminates the bottlenecks that occur during file access from a general-purpose file server, because NAS uses an operating system specialized for file serving. It improves the utilization of general-purpose servers by relieving them of file-server operations.

Improved flexibility: compatible with clients on both UNIX and Windows platforms using industry-standard protocols. NAS is flexible and can serve requests from different types of clients from the same source.

Centralized storage: centralizes data storage to minimize data duplication on client workstations, simplify data management, and ensure greater data protection.

Simplified management: provides a centralized console that makes it possible to manage file systems efficiently.
Scalability: scales well in accordance with different utilization profiles and types of business applications because of the high-performance, low-latency design.

High availability: offers efficient replication and recovery options, enabling high data availability. NAS uses redundant networking components that provide maximum connectivity options. A NAS device can use clustering technology for failover.

Security: ensures security, user authentication, and file locking in conjunction with industry-standard security schemas.

NAS (network attached storage) implementation

There are two types of NAS implementation: integrated and gateway. An integrated NAS device has all of its components and storage system in a single enclosure. In a gateway implementation, the NAS head shares its storage with a SAN environment.

Integrated NAS

An integrated NAS device has all the components of NAS, such as the NAS head and storage, in a single enclosure, or frame. This makes the integrated NAS a self-contained environment. The NAS head connects to the IP network to provide connectivity to the clients and service the file I/O requests. The storage consists of a number of disks that can range from low-cost ATA to high-throughput FC disk drives. Management software manages the NAS head and storage configurations.

An integrated NAS solution ranges from a low-end device, which is a single enclosure, to a high-end solution that can have an externally connected storage array. A low-end appliance-type NAS solution is suitable for applications that a small department may use, where the primary need is consolidation of storage rather than high performance or advanced features such as disaster recovery and business continuity. This solution is fixed in capacity and might not be upgradable beyond its original configuration. To expand the capacity, the solution must be scaled by deploying additional units, a task that increases management overhead because multiple devices have to be administered. In a high-end NAS solution, external and dedicated storage can be used. This enables independent scaling of the capacity in terms of NAS heads or storage. However, there is a limit to the scalability of this solution.

Gateway NAS

A gateway NAS device consists of an independent NAS head and one or more storage arrays. The NAS head performs the same functions that it does in the integrated solution, while the storage is shared with other applications that require block-level I/O. Management functions in this type of solution are more complex than those in an integrated environment because there are separate administrative tasks for the NAS head and the storage. In addition to the components that are explicitly tied to the NAS solution, a gateway solution can also utilize the FC infrastructure, such as switches, directors, or direct-attached storage arrays.

The gateway NAS is the most scalable because NAS heads and storage arrays can be independently scaled up when required. Adding processing capacity to the NAS gateway is an example of scaling. When the storage limit is reached, it can scale up, adding capacity on the SAN independently of the NAS head. Administrators can increase performance and I/O processing capabilities for their environments without purchasing additional interconnect devices and storage. Gateway NAS enables high utilization of storage capacity by sharing it with the SAN environment.
Integrated NAS connectivity

An integrated solution is self-contained and can connect into a standard IP network, although the specifics of how devices are connected within a NAS implementation vary by vendor and model. In some cases, storage is embedded within a NAS device and is connected to the NAS head through internal connections, such as ATA or SCSI controllers. In others, the storage may be external but connected by using SCSI controllers. In a high-end integrated NAS model, external storage can be directly connected by FC or by dedicated FC switches. In the case of a low-end integrated NAS model, backup traffic is shared on the same public IP network along with the regular client access traffic. In the case of a high-end integrated NAS model, an isolated backup network can be used to segment the traffic from impeding client access. More complex solutions may include an intelligent storage subsystem, enabling faster backup and larger capacities while simultaneously enhancing performance. Figure 7-4 illustrates an example of integrated NAS connectivity.

Gateway NAS connectivity

In a gateway solution, front-end connectivity is similar to that in an integrated solution. An integrated environment has a fixed number of NAS heads, making it relatively easy to determine IP networking requirements. In contrast, networking requirements in a gateway environment are complex to determine due to scalability options. Adding more NAS heads may require additional networking connectivity and bandwidth. Communication between the NAS gateway and the storage system in a gateway solution is achieved through a traditional FC SAN. To deploy a stable NAS solution, factors such as multiple paths for data, redundant fabrics, and load distribution must be considered.

Factors affecting NAS performance

As NAS uses an IP network, the bandwidth and latency issues associated with IP affect NAS performance. Network congestion is one of the most significant sources of latency in a NAS environment. Other factors that affect NAS performance at different levels are:

1. Number of hops: A large number of hops can increase latency because IP processing is required at each hop, adding to the delay caused at the router.

2. Authentication with a directory service such as LDAP, Active Directory, or NIS: The authentication service must be available on the network, with adequate bandwidth, and must have enough resources to accommodate the authentication load. Otherwise, a large number of authentication requests are presented to the servers, increasing latency. Authentication adds to latency only when authentication occurs.

3. Retransmission: Link errors, buffer overflows, and flow control mechanisms can result in retransmission. This causes packets that have not reached the specified destination to be resent. Care must be taken when configuring parameters for speed and duplex settings on the network devices and the NAS heads so that they match. Improper configuration may result in errors and retransmission, adding to latency.

4. Overutilized routers and switches: The amount of time that an overutilized device in a network takes to respond is always more than the response time of an optimally utilized or underutilized device. Network administrators can view vendor-specific statistics to determine the utilization of switches and routers in a network. Additional devices should be added if the current devices are overutilized.
5. File/directory lookup and metadata requests: NAS clients access files on NAS devices. The processing required before reaching the appropriate file or directory can cause delays. Sometimes a delay is caused by deep directory structures and can be resolved by flattening the directory structure. Poor file system layout and an overutilized disk system can also degrade performance.

6. Overutilized NAS devices: Clients accessing multiple files can cause high utilization levels on a NAS device, which can be determined by viewing utilization statistics. High utilization levels can be caused by a poor file system structure or insufficient resources in a storage subsystem.

7. Overutilized clients: The client accessing CIFS or NFS data may also be overutilized. An overutilized client requires a longer time to process the responses received from the server, increasing latency. Specific performance-monitoring tools are available for various operating systems to help determine the utilization of client resources.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | What is NAS? Explain how the performance of NAS can be affected if the sender and receiver windows are not synchronized. | Dec 2015 / Dec 2012 | 2 / 10
Q.2 | Discuss various factors that affect NAS performance. | Dec 2012 | 10

Unit-03/Lecture-03  SAN (storage area network) – [RGPV/Dec 2013(7)]

Direct-attached storage (DAS) is often referred to as a stovepiped storage environment. Hosts own the storage, and it is difficult to manage and share resources on these isolated storage devices. Efforts to organize this dispersed data led to the emergence of the storage area network (SAN). A SAN is a high-speed, dedicated network of servers and shared storage devices. Traditionally connected over Fibre Channel (FC) networks, a SAN forms a single storage pool and facilitates data centralization and consolidation. A SAN meets storage demands efficiently with better economies of scale. A SAN also provides effective maintenance and protection of data.

Components of SAN

A SAN consists of three basic components: servers, network infrastructure, and storage. These components can be further broken down into the following key elements: node ports, cabling, interconnecting devices (such as FC switches or hubs), storage arrays, and SAN management software.

[Figure: Nodes, links, and lines]

Cabling

SAN implementations use optical fiber cabling. Copper can be used for shorter distances for back-end connectivity, as it provides a better signal-to-noise ratio for distances up to 30 meters. Optical fiber cables carry data in the form of light. There are two types of optical cables: multi-mode and single-mode.

Multi-mode fiber (MMF) cable carries multiple beams of light projected at different angles simultaneously onto the core of the cable. Based on the bandwidth, multi-mode fibers are classified as OM1 (62.5 µm), OM2 (50 µm), and laser-optimized OM3 (50 µm). In an MMF transmission, multiple light beams traveling inside the cable tend to disperse and collide. This collision weakens the signal strength after it travels a certain distance, a process known as modal dispersion. An MMF cable is usually used for distances of up to 500 meters because of signal degradation (attenuation) due to modal dispersion.

Single-mode fiber (SMF) carries a single ray of light projected at the center of the core. These cables are available in diameters of 7–11 microns; the most common size is 9 microns.
In an SMF transmission, a single light beam travels in a straight line through the core of the fiber. The small core and the single light wave limit modal dispersion. Among all types of fibre cables, single-mode provides minimum signal attenuation over maximum distance (up to 10 km). A single-mode cable is used for long-distance cable runs, limited only by the power of the laser at the transmitter and the sensitivity of the receiver.

[Figure: Multi-mode and single-mode fiber]

Interconnect devices

Hubs, switches, and directors are the interconnect devices commonly used in a SAN. Hubs are used as communication devices in FC-AL implementations. Hubs physically connect nodes in a logical loop or a physical star topology. All the nodes must share the bandwidth because data travels through all the connection points. Because of the availability of low-cost and high-performance switches, hubs are no longer used in SANs. Switches are more intelligent than hubs and directly route data from one physical port to another. Therefore, nodes do not share the bandwidth; instead, each node has a dedicated communication path, resulting in bandwidth aggregation.

SAN management software

SAN management software manages the interfaces between hosts, interconnect devices, and storage arrays. The software provides a view of the SAN environment and enables management of various resources from one central console. It provides key management functions, including mapping of storage devices, switches, and servers, monitoring and generating alerts for discovered devices, and logical partitioning of the SAN, called zoning. In addition, the software provides management of typical SAN components such as storage components and interconnecting devices.

Connectivity of SAN (storage area network)

FC connectivity

The FC architecture supports three basic interconnectivity options: point-to-point, arbitrated loop (FC-AL), and fabric connect.

Point-to-point

Point-to-point is the simplest FC configuration: two devices are connected directly to each other, as shown in Figure 6-6. This configuration provides a dedicated connection for data transmission between nodes. However, the point-to-point configuration offers limited connectivity, as only two devices can communicate with each other at a given time. Moreover, it cannot be scaled to accommodate a large number of network devices. Standard DAS uses point-to-point connectivity.

[Figure: Point-to-point topology]

Fibre Channel arbitrated loop

In the FC-AL configuration, devices are attached to a shared loop. FC-AL has the characteristics of a token ring topology and a physical star topology. In FC-AL, each device contends with other devices to perform I/O operations. Devices on the loop must arbitrate to gain control of the loop. At any given time, only one device can perform I/O operations on the loop.

Fibre Channel ports – [RGPV/Dec 2013(7)]

Ports are the basic building blocks of an FC network. Ports on the switch can be one of the following types:

N_Port: an end point in the fabric. This port is also known as the node port. Typically, it is a host port (HBA) or a storage array port that is connected to a switch in a switched fabric.

NL_Port: a node port that supports the arbitrated loop topology. This port is also known as the node loop port.

E_Port: an FC port that forms the connection between two FC switches. This port is also known as the expansion port.
The E_Port on an FC switch connects to the E_Port of another FC switch in the fabric through a link, which is called an inter-switch link (ISL). ISLs are used to transfer host-to-storage data as well as fabric management traffic from one switch to another. The ISL is also one of the scaling mechanisms in SAN connectivity.

F_Port: a port on a switch that connects to an N_Port. It is also known as a fabric port and cannot participate in FC-AL.

FL_Port: a fabric port that participates in FC-AL. This port is connected to the NL_Ports on an FC-AL loop. An FL_Port also connects a loop to a switch in a switched fabric. As a result, all NL_Ports in the loop can participate in FC-SW. This configuration is referred to as a public loop. In contrast, an arbitrated loop without any switches is referred to as a private loop. A private loop contains nodes with NL_Ports and does not contain an FL_Port.

G_Port: a generic port that can operate as an E_Port or an F_Port and determines its functionality automatically during initialization.

[Figure: Fibre Channel ports]

Fibre Channel topologies – [RGPV/Dec 2012(10)]

There are three major Fibre Channel topologies, describing how a number of ports are connected together. A port in Fibre Channel terminology is any entity that actively communicates over the network, not necessarily a hardware port. This port is usually implemented in a device such as disk storage, an HBA on a server, or a Fibre Channel switch.[1]

Point-to-point (FC-P2P): Two devices are connected directly to each other. This is the simplest topology, with limited connectivity.

Arbitrated loop (FC-AL): In this design, all devices are in a loop or ring, similar to token ring networking. Adding or removing a device from the loop causes all activity on the loop to be interrupted. The failure of one device causes a break in the ring. Fibre Channel hubs exist to connect multiple devices together and may bypass failed ports. A loop may also be made by cabling each port to the next in a ring. A minimal loop containing only two ports, while appearing to be similar to FC-P2P, differs considerably in terms of the protocol. Only one pair of ports can communicate concurrently on a loop. The maximum speed is 8GFC.

Switched fabric (FC-SW): All devices or loops of devices are connected to Fibre Channel switches, similar conceptually to modern Ethernet implementations. Advantages of this topology over FC-P2P or FC-AL include:
The switches manage the state of the fabric, providing optimized interconnections.
The traffic between two ports flows through the switches only; it is not transmitted to any other port.
Failure of a port is isolated and should not affect the operation of other ports.
Multiple pairs of ports may communicate simultaneously in a fabric.

RGPV Questions
S.No | Question | Year | Marks
Q.1 | Explain different types of FC ports. | Dec 2013 | 7
Q.2 | Explain different interfaces in FC. | Dec 2012 | 10
Q.3 | Discuss the advantages of FC-SW and FC-AL. | – | –

Unit-03/Lecture-04  Content-addressed storage (CAS) – [RGPV/Dec 2014(3), RGPV/Dec 2013(10)]

CAS is an object-based system that has been purposely built for storing fixed-content data. It is designed for secure online storage and retrieval of fixed content. Unlike file-level and block-level data access, which use file names and the physical location of data for storage and retrieval, CAS stores user data and its attributes as separate objects.
The stored object is assigned a globally unique address known as a content address (CA). This address is derived from the object's binary representation. CAS provides an optimized and centrally managed storage solution that can support single-instance storage (SIS) to eliminate multiple copies of the same data.

Features and benefits of CAS

CAS has emerged as an alternative to tape and optical solutions because it overcomes many of their obvious deficiencies. CAS also meets the demand to improve data accessibility and to properly protect, dispose of, and ensure service level agreements for archived data. The features and benefits of CAS include the following (a short sketch of the content-address mechanism appears after this list):

Content authenticity: assures the genuineness of stored content. This is achieved by generating a unique content address and automating the process of continuously checking and recalculating the content address for stored objects. Content authenticity is assured because the address assigned to each piece of fixed content is as unique as a fingerprint. Every time an object is read, CAS uses a hashing algorithm to recalculate the object's content address as a validation step and compares the result to its original content address. If the object fails validation, it is rebuilt from its mirrored copy.

Content integrity: refers to the assurance that the stored content has not been altered. The use of a hashing algorithm for content authenticity also ensures content integrity in CAS. If the fixed content is altered, CAS assigns a new address to the altered content, rather than overwriting the original fixed content, providing an audit trail and maintaining the fixed content in its original state. As an integral part of maintaining data integrity and audit trail capabilities, CAS supports parity RAID protection in addition to mirroring. Every object in a CAS system is systematically checked in the background. Over time, every object is tested, guaranteeing content integrity even in the case of hardware failure, random error, or attempts to alter the content with malicious intent.

Location independence: CAS uses a unique identifier that applications can leverage to retrieve data, rather than a centralized directory, path names, or URLs. Using a content address to access fixed content makes the physical location of the data irrelevant to the application requesting the data. Therefore the location from which the data is accessed is transparent to the application. This yields complete content mobility to applications across locations.

Single-instance storage (SIS): the unique signature is used to guarantee the storage of only a single instance of an object. This signature is derived from the binary representation of the object. At write time, the CAS system is polled to see if it already has an object with the same signature. If the object is already on the system, it is not stored; rather, only a pointer to that object is created. SIS simplifies storage resource management tasks, especially when handling hundreds of terabytes of fixed content.

Retention enforcement: protecting and retaining data objects is a core requirement of an archive system. CAS creates two immutable components: a data object and a meta-object for every object stored. The meta-object stores the object's attributes and data-handling policies. For systems that support object-retention capabilities, the retention policies are enforced until the policies expire.

Record-level protection and disposition: all fixed content is stored in CAS once and is backed up with a protection scheme. The array is composed of one or more storage clusters. Some CAS architectures provide an extra level of protection by replicating the content onto arrays located at a different location. The disposition of records also follows the stringent guidelines established by regulators for shredding and disposing of data in electronic formats.

Technology independence: the CAS system interface is impervious to technology changes. As long as the application server is able to map the original content address, the data remains accessible. Although hardware changes are inevitable, the goal of CAS hardware vendors is to ensure compatibility across platforms.

Fast record retrieval: CAS maintains all content on disks that provide subsecond time to first byte (200 ms–400 ms) in a single cluster. Random disk access in CAS enables fast record retrieval.
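As promised above, here is a minimal sketch of the content-address idea, assuming the address is simply a cryptographic hash of the object's bytes. Real CAS products derive addresses in vendor-specific ways; this is illustrative only.

```python
import hashlib

store = {}   # content address -> object bytes (an in-memory stand-in)

def cas_write(data: bytes) -> str:
    """Store an object and return its content address (single-instance)."""
    address = hashlib.sha256(data).hexdigest()
    store.setdefault(address, data)      # duplicate writes create no new copy
    return address

def cas_read(address: str) -> bytes:
    """Read an object and re-validate its content address."""
    data = store[address]
    if hashlib.sha256(data).hexdigest() != address:
        raise IOError("validation failed: rebuild from the mirrored copy")
    return data

addr = cas_write(b"fixed content, e.g. an archived record")
assert cas_write(b"fixed content, e.g. an archived record") == addr  # SIS
assert cas_read(addr).startswith(b"fixed")
```

Note how the sketch exhibits several features from the list at once: the address is as unique as a fingerprint, altered content would produce a new address rather than overwrite the original, and identical objects are stored only once.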
RGPV Questions
S.No | Question | Year | Marks
Q.1 | To access data in a SAN, a host uses a physical address known as a logical address. A host using a CAS device does not use or need a physical address. Why? | Dec 2013 | 10

Unit-03/Lecture-06  Hub, switches, storage array – [RGPV/Dec 2013(7)]

A hub is the most basic networking device; it connects multiple computers or other network devices together. Unlike a network switch or router, a network hub has no routing tables or intelligence about where to send information, and it broadcasts all network data across each connection. Most hubs can detect basic network errors such as collisions, but having all information broadcast to multiple ports can be a security risk and cause bottlenecks. In the past, network hubs were popular because they were much cheaper than a switch or router, but today most switches do not cost much more than a hub and are a much better solution for any network.

In general, a hub refers to a hardware device that enables multiple devices or connections to be connected to a computer. An example besides the network hub described above is a USB hub, which allows dozens of USB devices to be connected to one computer, even though that computer may only have a few USB connections.

Switches

A switch is a device used on a computer network to physically connect devices together. Multiple cables can be connected to a switch to enable networked devices to communicate with each other. Switches manage the flow of data across a network by transmitting a received message only to the device for which the message was intended. Each networked device connected to a switch can be identified using a MAC address, allowing the switch to regulate the flow of traffic. This maximizes the security and efficiency of the network.

Because of these features, a switch is often considered more "intelligent" than a network hub. Hubs provide neither security nor identification of connected devices. This means that messages have to be transmitted out of every port of the hub, greatly degrading the efficiency of the network.

Switches may operate at one or more layers of the OSI model, including the data link and network layers. A device that operates simultaneously at more than one of these layers is known as a multilayer switch. In switches intended for commercial use, built-in or modular interfaces make it possible to connect different types of networks, including Ethernet, Fibre Channel, ATM, ITU-T G.hn and 802.11.
Storage array

The fundamental purpose of a SAN is to provide host access to storage resources. The large storage capacities offered by modern storage arrays have been exploited in SAN environments for storage consolidation and centralization. SAN implementations complement the standard features of storage arrays by providing high availability and redundancy, improved performance, business continuity, and multiple host connectivity.

RGPV Questions
Q.1 Explain briefly the following: a) node port, b) storage array, c) SAN, d) hub, e) switches. [Dec 2013, 7 marks]

Additional Topic
Unit-03/Lecture-07
JBOD: just a bunch of disks – [Rgpv/dec2015(2)]

If we compare disk subsystems with regard to their controllers, we can differentiate between three levels of complexity: (1) no controller; (2) RAID controller; and (3) intelligent controller with additional services such as instant copy and remote mirroring. If the disk subsystem has no internal controller, it is only an enclosure full of disks (a JBOD). In this instance, the hard disks are permanently fitted into the enclosure and the connections for I/O channels and power supply are taken outwards at a single point. Therefore, a JBOD is simpler to manage than a few loose hard disks. Typical JBOD disk subsystems have space for 8 or 16 hard disks. A connected server recognises all these hard disks as independent disks, so 16 device addresses are required for a JBOD disk subsystem incorporating 16 hard disks. In some I/O techniques, such as SCSI and Fibre Channel arbitrated loop, this can lead to a bottleneck at device addresses. In contrast to intelligent disk subsystems, a JBOD disk subsystem is not capable of supporting RAID or other forms of virtualisation. If required, however, these can be realised outside the JBOD disk subsystem, for example as software in the server or as an independent virtualisation entity in the storage network.
Fibre Channel overview

The FC architecture forms the fundamental construct of the SAN infrastructure. Fibre Channel is a high-speed network technology that runs on high-speed optical fiber cables (preferred for front-end SAN connectivity) and serial copper cables (preferred for back-end disk connectivity). The FC technology was created to meet the demand for increased speeds of data transfer among computers, servers, and mass storage subsystems.

Although FC networking was introduced in 1988, the FC standardization process began when the American National Standards Institute (ANSI) chartered the Fibre Channel Working Group (FCWG). By 1994, the new high-speed computer interconnection standard was developed, and the Fibre Channel Association (FCA) was founded with 70 charter member companies. The T11 Technical Committee, which is the committee within INCITS (International Committee for Information Technology Standards) responsible for Fibre Channel interfaces, was previously known as X3T9.3 and has been producing interface standards for high-performance and mass storage applications.

Higher data transmission speeds are an important feature of the FC networking technology. The initial implementation offered a throughput of 100 MB/s (equivalent to a raw bit rate of 1 Gb/s, i.e. 1062.5 Mb/s in Fibre Channel), which was greater than the speed of Ultra SCSI (20 MB/s) commonly used in DAS environments. The gap between the raw line rate and the usable throughput is mostly due to the 8b/10b line encoding used by these FC generations, under which only eight of every ten bits on the wire carry data; the sketch below checks the arithmetic. FC in full-duplex mode could sustain a throughput of 200 MB/s. In comparison with Ultra SCSI, FC is a significant leap in storage networking technology. The latest FC implementations of 8 GFC offer a throughput of 1600 MB/s (raw bit rate of 8.5 Gb/s), whereas Ultra320 SCSI is available with a throughput of 320 MB/s. The FC architecture is highly scalable, and theoretically a single FC network can accommodate approximately 15 million nodes.
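The quoted figures can be sanity-checked from the line rates. A small illustrative calculation, assuming 8b/10b encoding and treating the 1600 MB/s figure as a full-duplex aggregate; the small remaining gap versus the quoted nominal values is framing and protocol overhead:

```python
def fc_throughput_mb_s(line_rate_gbaud: float, full_duplex: bool = False) -> float:
    """Usable MB/s for an FC link, assuming 8b/10b encoding (1/2/4/8 GFC)."""
    data_bits_per_s = line_rate_gbaud * 1e9 * 8 / 10   # 8 of every 10 bits are data
    mb_per_s = data_bits_per_s / 8 / 1e6               # bits -> megabytes
    return mb_per_s * (2 if full_duplex else 1)

print(fc_throughput_mb_s(1.0625))                  # ~106 MB/s, quoted as 100 MB/s
print(fc_throughput_mb_s(8.5, full_duplex=True))   # ~1700 MB/s, quoted as 1600 MB/s
```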
RGPV Questions
Q.1 What is JBOD? [Dec 2015, 2 marks]

References
Book: Information Storage and Management. Authors: G. Somasundaram, Alok Shrivastava. Priority: 1
Book: Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI. Authors: Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein. Priority: 2
Book: Cloud Computing: Principles, Systems & Applications. Authors: Nick Antonopoulos, Lee Gillam. Priority: 3

Unit-4 Hybrid Storage Solutions and Virtualization

Unit-04/Lecture-01
Storage virtualization – [Rgpv/dec2013(10)]

Virtualization is the technique of masking or abstracting physical resources, which simplifies the infrastructure and accommodates the increasing pace of business and technological changes. It increases the utilization and capability of IT resources, such as servers, networks, or storage devices, beyond their physical limits. Virtualization simplifies resource management by pooling and sharing resources for maximum utilization, and makes them appear as logical resources with enhanced capabilities.

Forms of virtualization

Virtualization has existed in the IT industry for several years and in different forms, including memory virtualization, network virtualization, server virtualization, and storage virtualization.

1. Memory virtualization

Virtual memory makes an application appear as if it has its own contiguous logical memory, independent of the existing physical memory resources. Since the beginning of the computer industry, memory has been and continues to be an expensive component of a host. It determines both the size and the number of applications that can run on a host. With technological advancements, memory technology has changed and the cost of memory has decreased. Virtual memory managers (VMMs) have evolved, enabling multiple applications to be hosted and processed simultaneously. In a virtual memory implementation, a memory address space is divided into contiguous blocks of fixed-size pages. A process known as paging saves inactive memory pages onto the disk and brings them back to physical memory when required. This enables efficient use of the available physical memory among different processes. The space used by VMMs on the disk is known as a swap file. A swap file (also known as a page file or swap space) is a portion of the hard disk that functions like physical memory (RAM) to the operating system. The operating system typically moves the least used data into the swap file so that RAM will be available for more active processes. Because the space allocated to the swap file is on the hard disk (which is slower than physical memory), access to this file is slower.
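A toy illustration of the paging behaviour just described, using a strict least-recently-used policy; real operating systems use cheaper approximations of LRU, and the capacities here are invented for the example:

```python
from collections import OrderedDict

RAM_CAPACITY = 3      # physical page frames available (tiny, for illustration)
ram = OrderedDict()   # page number -> contents, ordered from least to most recent
swap = {}             # the swap file on disk: page number -> contents

def touch(page: int, data: str = "") -> None:
    if page in ram:
        ram.move_to_end(page)            # page hit: mark as most recently used
        return
    if page in swap:
        data = swap.pop(page)            # page fault: bring the page back from disk
    if len(ram) >= RAM_CAPACITY:
        victim, contents = ram.popitem(last=False)   # evict the least recently used page
        swap[victim] = contents                      # ...into the swap file
    ram[page] = data

for p in (1, 2, 3, 1, 4):     # accessing page 4 forces page 2, the LRU page, out
    touch(p, f"page-{p}")
print(list(ram), list(swap))  # [3, 1, 4] [2]
```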
2. Network virtualization

Network virtualization creates virtual networks whereby each application sees its own logical network, independent of the physical network. A virtual LAN (VLAN) is an example of network virtualization that provides an easy, flexible, and less expensive way to manage networks. VLANs make large networks more manageable by enabling a centralized configuration of devices located in physically diverse locations.

3. Server virtualization

Server virtualization enables multiple operating systems and applications to run simultaneously on different virtual machines created on the same physical server (or group of servers). Virtual machines provide a layer of abstraction between the operating system and the underlying hardware. Within a physical server, any number of virtual servers can be established, depending on hardware capabilities (see figure 10-1). Each virtual server seems like a physical machine to the operating system, although all virtual servers share the same underlying physical hardware in an isolated manner. For example, the physical memory is shared between virtual servers but the address space is not. Individual virtual servers can be restarted, upgraded, or even crashed without affecting the other virtual servers on the same physical machine.

RGPV Questions
Q.1 What are the various forms of virtualization? Explain each in brief. [Dec 2011, 10 marks]

Unit-04/Lecture-02
Virtual LANs (VLANs) – [Rgpv/dec 2011(10)]

In simple terms, a VLAN is a set of workstations within a LAN that can communicate with each other as though they were on a single, isolated LAN. What does it mean to say that they communicate with each other as though they were on a single, isolated LAN? Among other things, it means that:

- Broadcast packets sent by one of the workstations will reach all the others in the VLAN.
- Broadcasts sent by one of the workstations in the VLAN will not reach any workstations that are not in the VLAN.
- Broadcasts sent by workstations that are not in the VLAN will never reach workstations that are in the VLAN.
- The workstations can all communicate with each other without needing to go through a gateway. For example, IP connections would be established by ARPing for the destination IP and sending packets directly to the destination workstation; there would be no need to send packets to the IP gateway to be forwarded on.
- The workstations can communicate with each other using non-routable protocols.

A Local Area Network (LAN) was originally defined as a network of computers located within the same area. Today, Local Area Networks are defined as a single broadcast domain. This means that if a user broadcasts information on his/her LAN, the broadcast will be received by every other user on the LAN. Broadcasts are prevented from leaving a LAN by using a router. The disadvantage of this method is that routers usually take more time to process incoming data compared to a bridge or a switch. More importantly, the formation of broadcast domains depends on the physical connection of the devices in the network. Virtual Local Area Networks (VLANs) were developed as an alternative solution to using routers to contain broadcast traffic.

In a traditional LAN, workstations are connected to each other by means of a hub or a repeater. These devices propagate any incoming data throughout the network. However, if two people attempt to send information at the same time, a collision will occur and all the transmitted data will be lost. Once the collision has occurred, it will continue to be propagated throughout the network by the hubs and repeaters. The original information will therefore need to be resent after waiting for the collision to be resolved, thereby incurring a significant wastage of time and resources. To prevent collisions from traveling through all the workstations in the network, a bridge or a switch can be used. These devices will not forward collisions, but will allow broadcasts (to every user in the network) and multicasts (to a pre-specified group of users) to pass through. A router may be used to prevent broadcasts and multicasts from traveling through the network.

The workstations, hubs, and repeaters together form a LAN segment. A LAN segment is also known as a collision domain, since collisions remain within the segment. The area within which broadcasts and multicasts are confined is called a broadcast domain or LAN. Thus a LAN can consist of one or more LAN segments. Defining broadcast and collision domains in a LAN depends on how the workstations, hubs, switches, and routers are physically connected together. This means that everyone on a LAN must be located in the same area.

Types of Connections

Devices on a VLAN can be connected in three ways, based on whether the connected devices are VLAN-aware or VLAN-unaware. Recall that a VLAN-aware device is one which understands VLAN memberships (i.e. which users belong to a VLAN) and VLAN formats. A small sketch of VLAN broadcast scoping follows this list.

1) Trunk Link: all the devices connected to a trunk link, including workstations, must be VLAN-aware. All frames on a trunk link must have a special header attached. These special frames are called tagged frames.

2) Access Link: an access link connects a VLAN-unaware device to the port of a VLAN-aware bridge. All frames on access links must be implicitly tagged (untagged) (see Figure 8). The VLAN-unaware device can be a LAN segment with VLAN-unaware workstations, or it can be a number of LAN segments containing VLAN-unaware devices (a legacy LAN).

3) Hybrid Link: this is a combination of the previous two links, i.e. a link where both VLAN-aware and VLAN-unaware devices are attached (see Figure 9). A hybrid link can have both tagged and untagged frames, but all the frames for a specific VLAN must be either tagged or untagged.
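A toy sketch of the broadcast scoping that VLANs add on top of a switch, using access ports only; the port-to-VLAN assignments are invented, and real switches also carry 802.1Q-tagged frames for several VLANs at once on trunk links:

```python
access_ports = {1: 10, 2: 10, 3: 20, 4: 20}   # port -> access VLAN ID (invented)

def broadcast(in_port: int) -> list[int]:
    vlan = access_ports[in_port]   # frame arrives untagged; the switch tags it internally
    # Deliver only to ports in the same VLAN: each VLAN is its own broadcast domain.
    return [p for p, v in access_ports.items() if v == vlan and p != in_port]

print(broadcast(1))   # [2]: the broadcast stays inside VLAN 10
print(broadcast(3))   # [4]: VLAN 20 never sees VLAN 10 traffic
```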
Advantages

- Performance. As mentioned above, routers that forward data in software become a bottleneck as LAN data rates increase. Doing away with the routers removes this bottleneck.
- Formation of virtual workgroups. Because workstations can be moved from one VLAN to another just by changing the configuration on switches, it is relatively easy to put all the people working together on a particular project into a single VLAN. They can then more easily share files and resources with each other. To be honest, though, virtual workgroups sound like a good idea in theory, but often do not work well in practice. It turns out that users are usually more interested in accessing company-wide resources (file servers, printers, etc.) than files on each others' PCs.
- Greater flexibility. If users move their desks, or just move around the place with their laptops, then, if the VLANs are set up the right way, they can plug their PC in at the new location and still be within the same VLAN. This is much harder when a network is physically divided up by routers.
- Ease of partitioning off resources. If there are servers or other equipment to which the network administrator wishes to limit access, then they can be put off into their own VLAN. Then users in other VLANs can be given access selectively.

RGPV Questions
Q.1 What do you mean by VLANs? [Dec 2011, 10 marks]

Unit-04/Lecture-03
Management matrix – [Rgpv/dec 2012(10), Rgpv/dec 2011(10)]

Definition: a style of management where an individual has two reporting superiors (bosses), one functional and one operational. Matrix management is the practice of managing individuals with more than one reporting line (in a matrix organization structure), but it is also commonly used to describe managing cross-functional, cross-business-group and other forms of working that cross the traditional vertical business units. It is a type of organizational management in which people with similar skills are pooled for work assignments, resulting in more than one manager (sometimes referred to as solid-line and dotted-line reports, in reference to traditional business organization charts).

Management advantages and disadvantages

Key advantages that organizations seek when introducing a matrix include:

- To break business information silos: to increase cooperation and communication across the traditional silos and unlock resources and talent that are currently inaccessible to the rest of the organization.
- To deliver work across the business more effectively: to serve global customers, manage supply chains that extend outside the organization, and run integrated business regions, functions and processes.
- To be able to respond more flexibly: to reflect the importance of both the global and the local, the business and the function, in the structure, and to respond quickly to changes in markets and priorities.
- To develop broader people capabilities: a matrix helps develop individuals with broader perspectives and skills who can deliver value across the business and manage in a more complex and interconnected environment.

Key disadvantages of matrix organizations include:

- Mid-level management having multiple supervisors can be confusing, in that competing agendas and emphases can pull employees in different directions, which can lower productivity.
- Mid-level management can become frustrated with what appears to be a lack of clarity with priorities.
- Mid-level management can become over-burdened with the diffusion of priorities.
- Supervisory management can find it more difficult to achieve results within their area of expertise with subordinate staff being pulled in different directions.

Application

The advantages of a matrix for project management can include:

- Individuals can be chosen according to the needs of the project.
- The use of a project team that is dynamic and able to view problems in a different way, as specialists have been brought together in a new environment.
- Project managers are directly responsible for completing the project within a specific deadline and budget.

The disadvantages for project management can include:

- A conflict of loyalty between line managers and project managers over the allocation of resources.
- Projects can be difficult to monitor if teams have a lot of independence.
- Costs can be increased if more managers (i.e. project managers) are created through the use of project teams.

RGPV Questions
Q.1 What do you understand by management matrix? Explain. [Dec 2012, 10 marks; Dec 2012, 10 marks]

Unit-04/Lecture-04
Data center infrastructure – [Rgpv/dec2013(7), Rgpv/dec2012(10), Rgpv/dec2012(10)]

Organizations maintain data centers to provide centralized data processing capabilities across the enterprise. Data centers store and manage large amounts of mission-critical data. The data center infrastructure includes computers, storage systems, network devices, dedicated power backups, and environmental controls (such as air conditioning and fire suppression). Large organizations often maintain more than one data center to distribute data processing workloads and provide backups in the event of a disaster. The storage requirements of a data center are met by a combination of various storage architectures.

Core elements

Five core elements are essential for the basic functionality of a data center:

- Application: a computer program that provides the logic for computing operations, such as an order processing system.
- Database: more commonly, a database management system (DBMS) provides a structured way to store data in logically organized tables that are interrelated. A DBMS optimizes the storage and retrieval of data.
- Server and operating system: a computing platform that runs applications and databases.
- Network: a data path that facilitates communication between clients and servers or between servers and storage.
- Storage array: a device that stores data persistently for subsequent use.

These core elements are typically viewed and managed as separate entities, but all the elements must work together to address data processing requirements.

Key requirements for data center elements

Uninterrupted operation of data centers is critical to the survival and success of a business. It is necessary to have a reliable infrastructure that ensures data is accessible at all times. While the requirements shown in figure 1-6 are applicable to all elements of the data center infrastructure, our focus here is on storage systems.

[Figure 1-6: Key characteristics of data center elements]

Key characteristics of data center elements:

- Availability: all data center elements should be designed to ensure accessibility. The inability of users to access data can have a significant negative impact on a business.
- Security: policies, procedures, and proper integration of the data center core elements that will prevent unauthorized access to information must be established. In addition to the security measures for client access, specific mechanisms must enable servers to access only their allocated resources on storage arrays.
- Scalability: data center operations should be able to allocate additional processing capabilities or storage on demand, without interrupting business operations. Business growth often requires deploying more servers, new applications, and additional databases. The storage solution should be able to grow with the business.
- Performance: all the core elements of the data center should be able to provide optimal performance and service all processing requests at high speed. The infrastructure should be able to support performance requirements.
- Data integrity: data integrity refers to mechanisms, such as error correction codes or parity bits, which ensure that data is written to disk exactly as it was received. Any variation in data during its retrieval implies corruption, which may affect the operations of the organization.
- Capacity: data center operations require adequate resources to store and process large amounts of data efficiently. When capacity requirements increase, the data center must be able to provide additional capacity without interrupting availability or, at the very least, with minimal disruption. Capacity may be managed by reallocating existing resources rather than by adding new resources.
- Manageability: a data center should perform all operations and activities in the most efficient manner. Manageability can be achieved through automation and the reduction of human (manual) intervention in common tasks.

Managing storage infrastructure

Managing a modern, complex data center involves many tasks. Key management activities include the following (a small monitoring sketch follows this list):

- Monitoring is the continuous collection of information and the review of the entire data center infrastructure. The aspects of a data center that are monitored include security, performance, accessibility, and capacity.
- Reporting is done periodically on resource performance, capacity, and utilization. Reporting tasks help to establish business justifications and chargeback of costs associated with data center operations.
- Provisioning is the process of providing the hardware, software, and other resources needed to run a data center. Provisioning activities include capacity and resource planning. Capacity planning ensures that the user's and the application's future needs will be addressed in the most cost-effective and controlled manner. Resource planning is the process of evaluating and identifying required resources, such as personnel, the facility (site), and the technology. Resource planning ensures that adequate resources are available to meet user and application requirements.
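These monitoring and reporting activities lend themselves to simple automation. A minimal illustrative reporting loop; the array names, sizes and the 80% threshold are invented for the example:

```python
ALERT_THRESHOLD = 0.80   # flag arrays that are more than 80% full (invented policy)

arrays = [
    {"name": "array-01", "capacity_tb": 100, "used_tb": 81},
    {"name": "array-02", "capacity_tb": 100, "used_tb": 42},
]

def utilisation_report(resources):
    """Yield one report line per resource, flagging those that need provisioning."""
    for r in resources:
        used = r["used_tb"] / r["capacity_tb"]
        status = "ALERT: provision more capacity" if used > ALERT_THRESHOLD else "ok"
        yield f'{r["name"]}: {used:.0%} used ({status})'

for line in utilisation_report(arrays):
    print(line)
```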
RGPV Questions
Q.1 What are data centres? What are the requirements for the design of a secure data centre? [Dec 2013, 7 marks; Dec 2012, 10 marks; Dec 2012, 10 marks]

Unit-04/Lecture-05
Backup & disaster recovery – [Rgpv/dec 2013(10), Rgpv/dec 2012(7), Rgpv/dec 2012(10)]

Backup is a copy of production data, created and retained for the sole purpose of recovering deleted or corrupted data. With growing business and regulatory demands for data storage, retention, and availability, organizations are faced with the task of backing up an ever-increasing amount of data. This task becomes more challenging as the demand for consistent backup and quick restore of data increases throughout the enterprise, which may be spread over multiple sites. Moreover, organizations need to accomplish backup at a lower cost with minimum resources.

Backup purpose

Backups are performed to serve three purposes: disaster recovery, operational backup, and archival.

Disaster recovery

Backups can be performed to address disaster recovery needs. The backup copies are used for restoring data at an alternate site when the primary site is incapacitated due to a disaster. Based on RPO (recovery point objective) and RTO (recovery time objective) requirements, organizations use different backup strategies for disaster recovery. When a tape-based backup method is used as a disaster recovery strategy, the backup tape media is shipped and stored at an offsite location. These tapes can be recalled for restoration at the disaster recovery site. Organizations with stringent RPO and RTO requirements use remote replication technology to replicate data to a disaster recovery site. This allows organizations to bring production systems online in a relatively short period of time in the event of a disaster.

Operational backup

Data in the production environment changes with every business transaction and operation. Operational backup is a backup of data at a point in time and is used to restore data in the event of data loss or logical corruption that may occur during routine processing. The majority of restore requests in most organizations fall in this category. For example, it is common for a user to accidentally delete an important e-mail or for a file to become corrupted, and these can be restored from an operational backup.

Operational backups are created for the active production information by using incremental or differential backup techniques, detailed later in this chapter (a small file-selection sketch also follows below). An example of an operational backup is a backup performed for a production database just before a bulk batch update. This ensures the availability of a clean copy of the production database if the batch update corrupts the production database.

Archival

Backups are also performed to address archival requirements. Although CAS has emerged as the primary solution for archives, traditional backups are still used by small and medium enterprises for long-term preservation of transaction records, e-mail messages, and other business records required for regulatory compliance.

Apart from addressing disaster recovery, archival, and operational requirements, backups serve as a protection against data loss due to physical damage of a storage device, software failures, or virus attacks. Backups can also be used to protect against accidents such as accidental deletion or intentional data destruction.
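For the incremental/differential distinction mentioned under operational backup: an incremental backup copies files changed since the last backup of any kind, while a differential backup copies files changed since the last full backup. A minimal sketch; the timestamps are invented and the current directory stands in for a production file system:

```python
import os, time

def files_to_back_up(root: str, reference_time: float) -> list[str]:
    """Return paths under root modified after reference_time."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > reference_time:
                changed.append(path)
    return changed

last_full = time.time() - 7 * 86400      # full backup taken a week ago (invented)
last_backup = time.time() - 1 * 86400    # most recent incremental backup yesterday

incremental = files_to_back_up(".", last_backup)   # smallest copy, longest restore chain
differential = files_to_back_up(".", last_full)    # larger copy; restore = full + latest
```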
Backup methods

Hot backup and cold backup are the two methods deployed for backup. They are based on the state of the application when the backup is performed. In a hot backup, the application is up and running, with users accessing their data during the backup process. In a cold backup, the application is not active during the backup process.

The backup of online production data becomes more challenging because data is actively being used and changed. An open file is locked by the operating system and is not copied during the backup process until the user closes it. The backup application can back up open files by retrying the operation on files that were opened earlier in the backup process. During the backup process, it may be possible that files opened earlier will be closed and a retry will be successful. The maximum number of retries can be configured depending on the backup application. However, this method is not considered robust because in some environments certain files are always open. In such situations, the backup application provides open file agents. These agents interact directly with the operating system and enable the creation of consistent copies of open files.

In some environments, the use of open file agents is not enough. For example, a database is composed of many files of varying sizes, occupying several file systems. To ensure a consistent database backup, all files need to be backed up in the same state. That does not necessarily mean that all files need to be backed up at the same time, but they all must be synchronized so that the database can be restored with consistency.

Backup architecture and process

The storage node is responsible for writing data to the backup device (in a backup environment, a storage node is a host that controls backup devices). Typically, the storage node is integrated with the backup server, and both are hosted on the same physical platform. A backup device is attached directly to the storage node's host platform. Some backup architectures refer to the storage node as the media server because it connects to the storage device. Storage nodes play an important role in backup planning because they can be used to consolidate backup servers.

Backup software also provides extensive reporting capabilities based on the backup catalog and the log files. These reports can include information such as the amount of data backed up, the number of completed backups, the number of incomplete backups, and the types of errors that may have occurred. Reports can be customized depending on the specific backup software used.

RGPV Questions
Q.1 Explain briefly the following: a) disaster recovery, b) operational backup, c) archival. [Dec 2013, 7 marks]
Q.2 What do you understand by backup and disaster recovery? Explain. [Dec 2013, 10 marks]

Unit-04/Lecture-06
Backup topologies – [Rgpv/dec2013(7)]

Three basic topologies are used in a backup environment: direct-attached backup, LAN-based backup, and SAN-based backup. A mixed topology is also used by combining the LAN-based and SAN-based topologies.

In a direct-attached backup, a backup device is attached directly to the client. Only the metadata is sent to the backup server through the LAN. This configuration frees the LAN from backup traffic. It uses a backup device that is not shared. As the environment grows, however, there will be a need for central management of all backup devices and for sharing resources to optimize costs. An appropriate solution is to share the backup devices among multiple servers. In this example, the client also acts as a storage node that writes data on the backup device.

In a LAN-based backup, all servers are connected to the LAN and all storage devices are directly attached to the storage node. The data to be backed up is transferred from the backup client (source) to the backup device (destination) over the LAN, which may affect network performance. Streaming across the LAN also affects the network performance of all systems connected to the same segment as the backup server. Network resources are severely constrained when multiple clients access and share the same tape library unit (TLU). This impact can be minimized by adopting a number of measures, such as configuring separate networks for backup and installing dedicated storage nodes for some application servers.

[Figure: LAN-based backup topology]

[Figure: SAN-based backup topology]

The mixed topology uses both the LAN-based and SAN-based topologies, as shown in the figure. This topology might be implemented for several reasons, including cost, server location, reduction in administrative overhead, and performance considerations.
[Figure: Mixed backup topology]

Serverless backup

Serverless backup is a LAN-free backup methodology that does not involve a backup server to copy data. The copy may be created by a network-attached controller, utilizing a SCSI extended copy, or by an appliance within the SAN. These backups are called serverless because they use SAN resources instead of host resources to transport backup data from its source to the backup device, reducing the impact on the application server.

Another widely used method for performing serverless backup is to leverage local and remote replication technologies. In this case, a consistent copy of the production data is replicated within the same array or to a remote array, and it can then be moved to the backup device through the use of a storage node.

RGPV Questions
Q.1 List and explain different topologies for backup. [Dec 2013, 7 marks]

Unit-04/Lecture-07
SNIA storage virtualization taxonomy – [Rgpv/dec2013(7)]

The SNIA (Storage Networking Industry Association) storage virtualization taxonomy provides a systematic classification of storage virtualization, with three levels defining what, where, and how storage can be virtualized.

[Figure: SNIA storage virtualization taxonomy]

The first level of the storage virtualization taxonomy addresses what is created. It specifies the types of virtualization: block virtualization, file virtualization, disk virtualization, tape virtualization, or any other device virtualization.

The second level describes where the virtualization can take place. This requires a multilevel approach that characterizes virtualization at all three levels of the storage environment: server, storage network, and storage. An effective virtualization strategy distributes the intelligence across all three levels while centralizing the management and control functions. Data storage functions, such as RAID, caching, checksums, and hardware scanning, should remain on the array. Similarly, the host should control application-focused areas, such as clustering and application failover, and volume management of raw disks. However, path redirection, path failover, data access, and distribution or load-balancing capabilities should be moved to the switch or the network.

RGPV Questions
Q.1 Explain the SNIA storage virtualization taxonomy. [Dec 2013, 7 marks]

Unit-04/Lecture-08
Types of storage virtualization – [Rgpv/dec2013(7)]

Virtual storage is about providing logical storage to hosts and applications, independent of physical resources. Virtualization can be implemented in both SAN and NAS storage environments. In a SAN, virtualization is applied at the block level, whereas in NAS, it is applied at the file level.

Block-level storage virtualization

Block-level storage virtualization provides a translation layer in the SAN, between the hosts and the storage arrays. Instead of being directed to the LUNs on the individual storage arrays, the hosts are directed to the virtualized LUNs on the virtualization device. The virtualization device translates between the virtual LUNs and the physical LUNs on the individual arrays. This facilitates the use of arrays from different vendors simultaneously, without any interoperability issues. For a host, all the arrays appear like a single target device, and LUNs can be distributed or even split across multiple arrays.

Block-level storage virtualization extends storage volumes online, resolves application growth requirements, consolidates heterogeneous storage arrays, and enables transparent volume access. It also provides the advantage of non-disruptive data migration. In traditional SAN environments, LUN migration from one array to another was an offline event, because the hosts needed to be updated to reflect the new array configuration. In other instances, host CPU cycles were required to migrate data from one array to the other, especially in a multi-vendor environment. With a block-level virtualization solution in place, the virtualization engine handles the back-end migration of data, which enables LUNs to remain online and accessible while data is being migrated. No physical changes are required because the host still points to the same virtual targets on the virtualization device. However, the mappings on the virtualization device should be changed. These changes can be executed dynamically and are transparent to the end user.

[Figure: Block-level storage virtualization]

Deploying heterogeneous arrays in a virtualized environment facilitates an information lifecycle management (ILM) strategy, enabling significant cost and resource optimization. Low-value data can be migrated from high-performance to low-performance arrays or disks.
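A minimal sketch of the virtual-to-physical translation described above; the virtual LUN names, array names and extent sizes are invented for the example, and real virtualization devices keep this mapping in dedicated hardware or clustered appliances:

```python
# virtual LUN -> ordered (array, physical LUN, size in GB) extents; a virtual LUN
# may be split across arrays without the host ever noticing.
mapping = {
    "vlun-01": [("arrayA", "lun-17", 500), ("arrayB", "lun-03", 500)],
}

def resolve(vlun: str, offset_gb: int):
    """Translate a host I/O offset on a virtual LUN to (array, physical LUN, offset)."""
    for array, plun, size in mapping[vlun]:
        if offset_gb < size:
            return array, plun, offset_gb
        offset_gb -= size
    raise ValueError("offset beyond end of virtual LUN")

print(resolve("vlun-01", 100))   # ('arrayA', 'lun-17', 100)
print(resolve("vlun-01", 700))   # ('arrayB', 'lun-03', 200)

# Non-disruptive migration: repoint an extent at a new array. The host still
# addresses the same virtual target; only this mapping changes.
mapping["vlun-01"][0] = ("arrayC", "lun-99", 500)
```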
File-level virtualization

File-level virtualization addresses the NAS challenges by eliminating the dependencies between the data accessed at the file level and the location where the files are physically stored. This provides opportunities to optimize storage utilization and server consolidation, and to perform non-disruptive file migrations.

RGPV Questions
Q.1 Explain block-level storage virtualization. [Dec 2013, 7 marks]

Additional Topic
Unit-04/Lecture-09
Managing & monitoring

SNMP: the SNMP protocol was the standard used to manage multi-vendor SAN environments. However, SNMP was primarily a network management protocol and was inadequate for providing the detailed information and functionality required to manage the SAN environment. The unavailability of automatic discovery functions, weak modeling constructs, and lack of transactional support are some inadequacies of SNMP in a SAN environment. Even with these limitations, SNMP still holds a predominant role in SAN management, although newer open storage SAN management standards have emerged to monitor and manage these environments more effectively.

Storage Management Initiative

The Storage Networking Industry Association (SNIA) has been engaged in an initiative to develop a common, open storage and SAN management interface. SMI-S is based on Web-Based Enterprise Management (WBEM) technology and the DMTF's Common Information Model (CIM). The initiative was formally created to enable broad interoperability among heterogeneous storage vendor systems and to enable better management solutions that span these environments. This initiative is known as the Storage Management Initiative (SMI).

The SMI specification, known as SMI-S, offers substantial benefits to users and vendors. It forms a normalized, abstracted model to which a storage infrastructure's physical and logical components can be mapped, and which can be used by management applications, such as storage resource management, device management, and data management, for standardized, effective, end-to-end control of storage resources.

Using SMI-S, storage software developers have a single normalized and unified object model, comprising the detailed documentation needed to manage the breadth of SAN components. Moreover, SMI-S eliminates the need for the development of vendor-proprietary management interfaces, enabling vendors to focus on added-value functions and to offer solutions in a way that will support new devices as long as they adhere to the standard. Using SMI-S, device vendors can build new features and functions to manage storage subsystems and expose them via SMI-S. SMI-S-compliant products lead to easier, faster deployment and accelerated adoption of policy-based storage management frameworks.

The information required to perform management tasks is better organized, or structured, in a way that enables disparate groups of people to use it. This can be accomplished by developing a model or representation of the details required by users working within a particular domain. Such an approach is referred to as an information model. An information model requires a set of legal statements, or syntax, to capture the representation and expressions necessary to manage common aspects of that domain.

The CIM is a language and methodology for describing management elements. A CIM schema includes models for systems, applications, networks, and devices. This schema also enables applications from different vendors working on different platforms to describe management data in a standard format so that it can be shared among a variety of management applications.

The following features of SMI-S simplify SAN management (a small WBEM query sketch follows this list):

- Common data model: SMI-S agents interact with an SMI-S-enabled device, such as a switch, a server, or a storage array, to extract relevant management data. They can also interact at the management layer to exchange information between one management application and another. They then provide this information to the requester in a consistent syntax and format.
- Interconnect independence: SMI-S eliminates the need to redesign the management transport and enables components to be managed by using out-of-band communications. In addition, SMI-S offers the advantages of specifying CIM-XML over the HTTP protocol stack and utilizing the lower layers of the TCP/IP stack, both of which are ubiquitous in today's networking world.
- Multilayer management: SMI-S can be used in a multilayer and cross-domain environment, for example with server-based volume managers and network storage appliances. Many storage deployment environments currently employ this combination.
- Legacy system accommodation: SMI-S can be used to manage legacy systems by using a proxy agent, or it can be directly supported by the device itself. SMI-S can coexist with proprietary APIs and agents, as well as providing an integration framework for such mechanisms.
- Policy-based management: SMI-S includes object models applicable across all classes of devices, enabling a SAN administrator to implement policy-based management for entire storage networks.
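Because SMI-S is built on CIM/WBEM, a generic WBEM client can, in principle, query any compliant provider with the same code. A hedged sketch using the open-source pywbem package (pip install pywbem); the provider address, credentials and namespace are hypothetical, and the exact classes and properties exposed vary by vendor and profile:

```python
import pywbem   # third-party WBEM client library

# Hypothetical SMI-S provider address and credentials.
conn = pywbem.WBEMConnection(
    "https://smi-provider.example.com:5989",
    creds=("admin", "password"),
    default_namespace="root/cimv2",
)

# Enumerate storage volumes through the standard CIM model. Because every
# compliant provider maps its hardware to the same classes, the code stays
# vendor-neutral; which properties are populated still differs per array.
for vol in conn.EnumerateInstances("CIM_StorageVolume"):
    print(vol["ElementName"], vol["BlockSize"] * vol["NumberOfBlocks"], "bytes")
```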
References
Book: Information Storage and Management. Authors: G. Somasundaram, Alok Shrivastava. Priority: 1
Book: Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI. Authors: Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein. Priority: 2
Book: Cloud Computing: Principles, Systems & Applications. Authors: Nick Antonopoulos, Lee Gillam. Priority: 3

Unit-05 Information Storage on Cloud

Unit-05/Lecture-01
Cloud computing – [Rgpv/dec 2014(2), Rgpv/dec 2013(10), Rgpv/dec 2012(10)]

Cloud computing is a term used to refer to a model of network computing where a program or application runs on a connected server or servers rather than on a local computing device such as a PC, tablet or smartphone. Like the traditional client-server model or older mainframe computing,[1] a user connects with a server to perform a task. The difference with cloud computing is that the computing process may run on one or many connected computers at the same time, utilizing the concept of virtualization. With virtualization, one or more physical servers can be configured and partitioned into multiple independent "virtual" servers, all functioning independently and appearing to the user to be a single physical device. Such virtual servers are in essence disassociated from their physical server, and with this added flexibility, they can be moved around and scaled up or down on the fly without affecting the end user. The computing resources have become "granular", which provides end-user and operator benefits, including on-demand self-service, broad access across multiple devices, resource pooling, rapid elasticity and service metering capability.

In more detail, cloud computing refers to a computing hardware machine or group of computing hardware machines, commonly referred to as a server or servers, connected through a communication network such as the internet, an intranet, a local area network (LAN) or a wide area network (WAN). Any individual user who has permission to access the server can use the server's processing power to run an application, store data, or perform any other computing task. Therefore, instead of using a personal computer every time to run a native application, the individual can now run the application from anywhere in the world, as the server provides the processing power to the application, and the server is also connected to a network via the internet or other connection platforms to be accessed from anywhere. All this has become possible due to the increased computer processing power available to humankind at decreased cost.

In common usage, the term "the cloud" has become a shorthand way to refer to cloud computing infrastructure. The term came from the cloud symbol that network engineers used on network diagrams to represent the unknown (to them) segments of a network. Marketers have further popularized the phrase "in the cloud" to refer to software, platforms and infrastructure that are sold "as a service", i.e. remotely through the internet. Typically, the seller has actual energy-consuming servers which host products and services from a remote location, so end-users don't have to; they can simply log on to the network without installing anything. The major models of cloud computing service are known as Software as a Service, Platform as a Service, and Infrastructure as a Service. These cloud services may be offered in a public, private or hybrid network. Google, Amazon, IBM, Oracle Cloud, Rackspace, Salesforce, Zoho and Microsoft are some well-known cloud vendors.

Characteristics – [Rgpv/dec 2012(10)]

Cloud computing exhibits the following key characteristics (a small elasticity sketch follows this list):

- Agility improves with users' ability to re-provision technological infrastructure resources.
- Application programming interface (API) accessibility to software enables machines to interact with cloud software in the same way that a traditional user interface (e.g., a computer desktop) facilitates interaction between humans and computers. Cloud computing systems typically use Representational State Transfer (REST)-based APIs.
- Cost: cloud providers claim that computing costs reduce. A public-cloud delivery model converts capital expenditure to operational expenditure. This purportedly lowers barriers to entry, as infrastructure is typically provided by a third party and does not need to be purchased for one-time or infrequent intensive computing tasks. Pricing on a utility computing basis is fine-grained, with usage-based options, and fewer IT skills are required for implementation (in-house). The e-FISCAL project's state-of-the-art repository[46] contains several articles looking into cost aspects in more detail, most of them concluding that cost savings depend on the type of activities supported and the type of infrastructure available in-house.
- Device and location independence enable users to access systems using a web browser regardless of their location or what device they use (e.g., PC, mobile phone). As infrastructure is off-site (typically provided by a third party) and accessed via the Internet, users can connect from anywhere.
- Maintenance of cloud computing applications is easier, because they do not need to be installed on each user's computer and can be accessed from different places.
- Multitenancy enables sharing of resources and costs across a large pool of users, thus allowing for: centralization of infrastructure in locations with lower costs (such as real estate, electricity, etc.); peak-load capacity increases (users need not engineer for the highest possible load levels); and utilisation and efficiency improvements for systems that are often only 10–20% utilised.
- Performance is monitored, and consistent and loosely coupled architectures are constructed using web services as the system interface.
- Productivity may be increased when multiple users can work on the same data simultaneously, rather than waiting for it to be saved and emailed. Time may be saved as information does not need to be re-entered when fields are matched, nor do users need to install application software upgrades on their computers.
- Reliability improves with the use of multiple redundant sites, which makes well-designed cloud computing suitable for business continuity and disaster recovery.
- Scalability and elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis in near real-time (note: the VM startup time varies by VM type, location, OS and cloud provider), without users having to engineer for peak loads.
- Security can improve due to centralization of data, increased security-focused resources, etc., but concerns can persist about loss of control over certain sensitive data, and the lack of security for stored kernels. Security is often as good as or better than in other traditional systems, in part because providers are able to devote resources to solving security issues that many customers cannot afford to tackle. However, the complexity of security is greatly increased when data is distributed over a wider area or over a greater number of devices, as well as in multi-tenant systems shared by unrelated users. In addition, user access to security audit logs may be difficult or impossible. Private cloud installations are in part motivated by users' desire to retain control over the infrastructure and avoid losing control of information security.
- Virtualization technology allows sharing of servers and storage devices and increased utilization. Applications can be easily migrated from one physical server to another.
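Rapid elasticity is the characteristic most easily shown in code. A toy scaling loop; the thresholds, per-VM capacity and load figures are invented, and real providers implement this logic inside their provisioning layer:

```python
CAPACITY_PER_VM = 100   # requests/sec one virtual server can handle (invented)
vms = 2                 # currently provisioned virtual servers

def autoscale(load_rps: float) -> int:
    """One scaling decision for the observed load; returns the new VM count."""
    global vms
    if load_rps > vms * CAPACITY_PER_VM * 0.8:              # scale out before saturating
        vms += 1
    elif vms > 1 and load_rps < (vms - 1) * CAPACITY_PER_VM * 0.5:
        vms -= 1                                            # scale in; metering stops here
    return vms

for load in (120, 250, 300, 90, 40):    # a demand spike, then a lull
    print(load, "rps ->", autoscale(load), "VMs")
```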
RGPV Questions
Q.1 What is cloud computing? Enlist and explain the essential characteristics of cloud computing. [Dec 2014, 2 marks; Dec 2013, 10 marks; Dec 2011, 10 marks]
Q.2 What are the essential features of cloud computing? [Dec 2012, 10 marks]

Unit-05/Lecture-02
Architecture of cloud computing – [Rgpv/dec 2013(7), Rgpv/dec 2012(10)]

As defined in Lecture-01, cloud computing runs programs on connected servers rather than on the local device, using virtualization to partition physical servers into independent virtual servers that can be moved and scaled on the fly without affecting the end user.

[Figure: Cloud computing architecture]

Advantages

Cloud computing relies on sharing of resources to achieve coherence and economies of scale, similar to a utility (like the electricity grid) over a network.[8] At the foundation of cloud computing is the broader concept of converged infrastructure and shared services. The cloud also focuses on maximizing the effectiveness of the shared resources. Cloud resources are usually not only shared by multiple users but are also dynamically reallocated per demand. This can work for allocating resources to users. For example, a cloud computing facility that serves European users during European business hours with a specific application (e.g., email) may reallocate the same resources to serve North American users during North America's business hours with a different application (e.g., a web server). This approach should maximize the use of computing power, thus reducing environmental damage as well, since less power, air conditioning, rack space, etc. are required for a variety of functions. With cloud computing, multiple users can access a single server to retrieve and update their data without purchasing licenses for different applications.

The term "moving to cloud" also refers to an organization moving away from a traditional capex model (buy the dedicated hardware and depreciate it over a period of time) to the opex model (use a shared cloud infrastructure and pay as one uses it). Proponents claim that cloud computing allows companies to avoid upfront infrastructure costs and to focus on projects that differentiate their businesses instead of infrastructure.[9] Proponents also claim that cloud computing allows enterprises to get their applications up and running faster, with improved manageability and less maintenance, and enables IT to more rapidly adjust resources to meet fluctuating and unpredictable business demand.[9][10][11] Cloud providers typically use a "pay as you go" model. This can lead to unexpectedly high charges if administrators do not adapt to the cloud pricing model.[12]

RGPV Questions
Q.1 Explain the architectural framework of cloud computing. [Dec 2013, 7 marks]
Q.2 Explain the architectural framework of cloud computing. [Dec 2012, 10 marks]
Q.3 Give a brief note on cloud architecture. [Dec 2011, 10 marks]

Unit-05/Lecture-03
Cloud models – [Rgpv/dec 2013(10)&(7)]

If a cloud user accesses services on the infrastructure layer, for instance, she can run her own applications on the resources of a cloud infrastructure and remain responsible for the support, maintenance, and security of these applications herself. If she accesses a service on the application layer, these tasks are normally taken care of by the cloud service provider.

SaaS [Rgpv/dec2014]

Software-as-a-Service provides complete applications to a cloud's end user. It is mainly accessed through a web portal and service-oriented architectures based on web service technologies. Credit card or bank account details must be provided to enable the fees for the use of the services to be billed.

The services on the application layer can be seen as an extension of the ASP (application service provider) model, in which an application is run, maintained, and supported by a service vendor. The main differences between the services on the application layer and the classic ASP model are the encapsulation of the application as a service, the dynamic procurement, and billing by units of consumption (pay as you go). However, both models pursue the goal of focusing on core competencies by outsourcing applications.

[Figure: Software-as-a-Service (SaaS) stack]

PaaS

PaaS comprises the environment for developing and provisioning cloud applications. The principal users of this layer are developers seeking to develop and run a cloud application for a particular platform. They are supported by the platform operators with an open or proprietary language, a set of essential basic services to facilitate communication, monitoring, or service billing, and various other components, for instance to facilitate startup or to ensure an application's scalability and/or elasticity (see figure 3). Distributing the application to the underlying infrastructure is normally the responsibility of the cloud platform operator. The services offered on a cloud platform tend to represent a compromise between complexity and flexibility that allows applications to be implemented quickly and loaded in the cloud without much configuration.
m o a .c m Platform-as-a-Service (PaaS) Stack IaaS u a n y d t The services on the infrastructure layer are used to access essential IT resources that are S combined under the heading Infrastructure-as-a-Service (IaaS). These essential IT resources include services linked to computing resources, data storage resources, and the communications channel. They enable existing applications to be provisioned on cloud resources and new services implemented on the higher layers. Physical resources are abstracted by virtualization, which means they can then be shared by several operating systems and end user environments on the virtual resources – ideally, without any mutual interference. These virtualized resources usually comprise CPU and RAM, 12 data storage resources (elastic block store and databases). m o a .c m a n Infrastructure-as-a-Service (IaaS) Stack S.no Rgpv question Q.1 What t u are y d various cloud Year Marks Dec 2013 10 models?Explain each in brief. Q.2 S What does Software as a service Dec 2014 provide. 2 13 Unit 05/Lecture - 04 Private Cloud – [Rgpv/dec 2013(10)] A private cloud is a particular model of cloud computing that involves a distinct and secure cloud based environment in which only the specified client can operate. As with other cloud models, private clouds will provide computing power as a service within a virtualised m environment using an underlying pool of physical computing resource. However, under the o private cloud model, the cloud (the pool of resource) is only accessible by a single organisation providing that organisation with greater control and privacy. a .c m The technical mechanisms used to provide the different services which can be classed as being a private cloud services can vary considerably and so it is hard to define what constitutes a n private cloud from a technical aspect. Instead such services are usually categorised by the y d features that they offer to their client. Traits that characterise private clouds include the ring fencing of a cloud for the sole use of one organisation and higher levels of network security. u They can be defined in contrast to a public cloud which has multiple clients accessing t virtualised services which all draw their resource from the same pool of servers across public S networks. Private cloud services draw their resource from a dsitinct pool of physical computers but these may be hosted internally or externally and may be accessed across private leased lines or secure encrypted connections via public networks. The additional security offered by the ring fenced cloud model is ideal for any organisation, including enterprise, that needs to store and process private data or carry out sensitive tasks. For example, a private cloud service could be utilised by a financial company that is required by regulation to store sensitive data internally and who will still want to benefit from some of the 14 advantages of cloud computing within their business infrastructure, such as on demand resource allocation. The private cloud model is closer to the more traditional model of individual local access networks (LANs) used in the past by enterprise but with the added advantages of virtualisation. 
S.no  Rgpv question                                          Year      Marks
Q.1   What are the various cloud models? Explain each in    Dec 2013  10
      brief.
Q.2   What does Software as a Service provide?              Dec 2014  2

Unit 05/Lecture - 04
Private Cloud – [Rgpv/dec 2013(10)]

A private cloud is a particular model of cloud computing that involves a distinct and secure cloud-based environment in which only the specified client can operate. As with other cloud models, private clouds provide computing power as a service within a virtualised environment using an underlying pool of physical computing resource. However, under the private cloud model, the cloud (the pool of resource) is only accessible by a single organisation, providing that organisation with greater control and privacy.

The technical mechanisms used to provide the different services that can be classed as private cloud services vary considerably, so it is hard to define what constitutes a private cloud from a technical aspect. Instead, such services are usually categorised by the features that they offer to their clients. Traits that characterise private clouds include the ring-fencing of a cloud for the sole use of one organisation and higher levels of network security. They can be defined in contrast to a public cloud, which has multiple clients accessing virtualised services that all draw their resource from the same pool of servers across public networks. Private cloud services draw their resource from a distinct pool of physical computers, but these may be hosted internally or externally and may be accessed across private leased lines or secure encrypted connections via public networks.

The additional security offered by the ring-fenced cloud model is ideal for any organisation, including an enterprise, that needs to store and process private data or carry out sensitive tasks. For example, a private cloud service could be utilised by a financial company that is required by regulation to store sensitive data internally but still wants to benefit from some of the advantages of cloud computing within its business infrastructure, such as on-demand resource allocation. The private cloud model is closer to the more traditional model of individual local area networks (LANs) used in the past by enterprises, but with the added advantages of virtualisation.

The features and benefits of private clouds therefore are:

Higher security and privacy: public cloud services can implement a certain level of security, but private clouds – using techniques such as distinct pools of resources with access restricted to connections made from behind one organisation's firewall, dedicated leased lines and/or on-site internal hosting – can ensure that operations are kept out of the reach of prying eyes.

More control: as a private cloud is only accessible by a single organisation, that organisation has the ability to configure and manage it in line with its needs to achieve a tailored network solution. However, this level of control removes some of the economies of scale generated in public clouds by centralised management of the hardware.

Cost and energy efficiency: implementing a private cloud model can improve the allocation of resources within an organisation by ensuring that the availability of resources to individual departments/business functions can directly and flexibly respond to their demand. Therefore, although they are not as cost-effective as public cloud services due to smaller economies of scale and increased management costs, they do make more efficient use of the computing resource than traditional LANs, as they minimise the investment in unused capacity. Not only does this provide a cost saving, but it can reduce an organisation's carbon footprint too.

Improved reliability: even where resources (servers, networks etc.) are hosted internally, the creation of virtualised operating environments means that the network is more resilient to individual failures across the physical infrastructure. Virtual partitions can, for example, pull their resource from the remaining unaffected servers. In addition, where the cloud is hosted with a third party, the organisation can still benefit from the physical security afforded to infrastructure hosted within data centres.

Cloud bursting: some providers may offer the opportunity to employ cloud bursting within a private cloud offering in the event of spikes in demand. This service allows the provider to switch certain non-sensitive functions to a public cloud to free up more space in the private cloud for the sensitive functions that require it (a toy model of this rule is sketched below). Private clouds can even be integrated with public cloud services to form hybrid clouds, where non-sensitive functions are always allocated to the public cloud to maximise the efficiencies on offer.
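Cloud bursting reduces to a simple placement rule: run everything in the private pool while there is headroom, and overflow only non-sensitive work to the public cloud once a utilisation threshold is crossed. A toy sketch follows; the 80% threshold and the classes are invented for illustration:

    from dataclasses import dataclass

    BURST_THRESHOLD = 0.80  # burst once the private pool is 80% utilised

    @dataclass
    class Workload:
        name: str
        sensitive: bool  # sensitive work must stay in the private cloud

    class PrivatePool:
        def __init__(self, capacity: int):
            self.capacity, self.used = capacity, 0

        def utilisation(self) -> float:
            return self.used / self.capacity

    def place(workload: Workload, pool: PrivatePool) -> str:
        """Keep work private while there is headroom; burst only
        non-sensitive work to the public cloud once the pool is hot."""
        if workload.sensitive or pool.utilisation() < BURST_THRESHOLD:
            pool.used += 1
            return "private cloud"
        return "public cloud (burst)"

    pool = PrivatePool(capacity=10)
    for i in range(12):
        w = Workload(name=f"job-{i}", sensitive=(i % 4 == 0))
        print(w.name, "->", place(w, pool))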
How to build a private cloud

A private cloud looks and acts like a public cloud, giving your corporation all the speed, agility and cost savings promised by cloud technology, only it is single-tenant and that tenant is you, right? Well, that is the goal, but it is not quite the reality yet for most enterprises.

1. There must be a converged infrastructure. Servers must be virtualized, and there has to be underlying software-defined networking and a converged storage fabric. This is not something that is done very well in the public cloud space now, and it is an opportunity for corporate IT operations that have not had sophisticated systems in place to leapfrog themselves in the new era of private cloud.

2. There has to be fully automated orchestration of both system management and software distribution across the converged infrastructure. That is where the cost savings are: automating deployment and streamlining the human activity previously required to do daily tasks. That is what will eventually drive private cloud sales. You have to improve the provisioning process significantly to legitimately call it private cloud. If it takes you two weeks to provision resources now, getting that down to two days is not going to cut it; you have to get it to 15 minutes. You cannot be sitting around waiting for various levels of approval to happen, because you lose the agility and speed. That is the difference between virtualization and cloud.

3. There must be a self-service catalog of standard computing offerings available to users across the company. The litmus test is whether or not the dashboard is available to business users across the company, and not just an interface for traditional IT staff to use to dole out IT resources. Having just the latter means that IT just has a new toy.

4. There has to be accountability by way of some sort of charge-back, track-back or show-back mechanism that keeps track of which users are employing which resources and for just how long.

Enterprise Management Associates analyst Torsten Volk argues that, at a minimum, providing a show-back mechanism is crucial for any fledgling private cloud: "If you can't at least show who is responsible for the cycles that have been used, then there is no incentive to use those resources efficiently." A minimal show-back ledger is sketched below.
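A show-back mechanism of the kind Volk describes can be as simple as a usage ledger that attributes consumption back to each user or department. A minimal sketch, assuming hypothetical hourly rates per resource type:

    from collections import defaultdict

    # Hypothetical hourly cost per resource type, purely for illustration.
    COST_PER_HOUR = {"vm": 0.10, "db": 0.25}
    ledger = []  # records of (user, resource, hours)

    def record(user: str, resource: str, hours: float) -> None:
        ledger.append((user, resource, hours))

    def showback() -> dict:
        """Attribute consumption (in currency units) back to each user."""
        totals = defaultdict(float)
        for user, resource, hours in ledger:
            totals[user] += hours * COST_PER_HOUR[resource]
        return dict(totals)

    record("finance", "vm", 120)
    record("finance", "db", 40)
    record("marketing", "vm", 300)
    print(showback())  # {'finance': 22.0, 'marketing': 30.0}

Even without billing anyone (show-back rather than charge-back), making these totals visible creates the incentive to use resources efficiently that Volk calls for.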
S.no  Rgpv question                                          Year           Marks
Q.1   What is private cloud? Explain how we can build a     Rgpv Dec 2013  10
      private cloud.

Unit 05/Lecture - 05
Cloud service providers – [Rgpv/dec 2012(5)]

A cloud provider is a company that offers some component of cloud computing – typically Infrastructure as a Service (IaaS), Software as a Service (SaaS) or Platform as a Service (PaaS) – to other businesses or individuals. Cloud providers are sometimes referred to as cloud service providers or CSPs.

There are a number of things to think about when you evaluate cloud providers. The cost will usually be based on a per-use utility model, but there are a number of variations to consider. The physical location of the servers may also be a factor for sensitive data.

Reliability is crucial if your data must be accessible. A typical cloud storage service-level agreement (SLA), for example, specifies precise levels of service – such as 99.9% uptime – and the recourse or compensation that the user is entitled to should the provider fail to provide the service as described. However, it is important to understand the fine print in that agreement, because some providers discount outages of less than ten minutes, which may be too long for some businesses.
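The fine print matters because an uptime percentage hides the real durations involved. The short sketch below converts an uptime guarantee into the downtime it still permits, assuming a 30-day month:

    def allowed_downtime_minutes(uptime_pct: float, days: float = 30.0) -> float:
        """Downtime a given uptime percentage still permits per period."""
        return (1 - uptime_pct / 100) * days * 24 * 60

    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% uptime allows {allowed_downtime_minutes(pct):.1f} min/month")
    # 99.0%  -> 432.0 min/month
    # 99.9%  -> 43.2 min/month
    # 99.99% -> 4.3 min/month

So at 99.9%, four or five uncounted sub-ten-minute outages could consume the entire monthly allowance without ever triggering the SLA's compensation clause.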
Security is another important consideration. Organizations such as the Cloud Security Alliance (CSA) offer certification to cloud providers that meet their criteria. The CSA's Trusted Cloud Initiative program was created to help cloud service providers develop industry-recommended, secure and interoperable identity, access and compliance management configurations and practices.

S.no  Rgpv question                                   Year           Marks
Q.1   Write a short note on cloud service providers?  Rgpv Dec 2012  5

Unit 05/Lecture - 06
Cloud Vocabulary – [Rgpv/dec 2013(7), Rgpv/dec 2011(5)]

1. Cloudburst: the term is used with two meanings, one negative and one positive.
   Cloudburst (negative): the failure of a cloud computing environment due to the inability to handle a spike in demand.
   Cloudburst (positive): the dynamic deployment of a software application that runs on internal organizational compute resources to a public cloud to address a spike in demand.
2. Cloudstorming: the act of connecting multiple cloud computing environments.
3. Vertical Cloud: a cloud computing environment optimized for use in a particular vertical (i.e., industry) or application use case.
4. Private Cloud: a cloud computing-like environment within the boundaries of an organization and typically for its exclusive usage.
5. Internal Cloud: a cloud computing-like environment within the boundaries of an organization and typically available for exclusive use by said organization.
6. Hybrid Cloud: a computing environment combining both private (internal) and public (external) cloud computing environments. This may be on a continuous basis or in the form of a 'cloudburst'.
7. Cloudware: a general term referring to a variety of software, typically at the infrastructure level, that enables building, deploying, running or managing applications in a cloud computing environment.
8. External Cloud: a cloud computing environment that is external to the boundaries of the organization. Although it often is, an external cloud is not necessarily a public cloud. Some external clouds make their cloud infrastructure available to specific other organizations and not to the public at large.
9. Public Cloud: a cloud computing environment that is open for use by the general public, whether individuals, corporations or other types of organizations. Amazon Web Services is an example of a public cloud.
10. Virtual Private Cloud (VPC): a term coined by Reuven Cohen, CEO and founder of Enomaly. It describes a concept that is similar to, and derived from, the familiar concept of a Virtual Private Network (VPN), but applied to cloud computing. It is the notion of turning a public cloud into a virtual private cloud, particularly in terms of security and the ability to create a VPC across components that are both within the cloud and external to it.
11. Cloud Portability: the ability to move applications (and often their associated data) across cloud computing environments from different cloud providers, as well as across private or internal clouds and public or external clouds.
12. Cloud Spanning: running an application in such a way that its components straddle multiple cloud environments (which could be any combination of internal/private and external/public clouds). Unlike cloud bursting, which refers strictly to expanding the application to an external cloud to handle spikes in demand, cloud spanning includes scenarios in which an application's components are continuously distributed across multiple clouds.

S.no  Rgpv question                             Year      Marks
Q.1   Write a short note on cloud vocabulary?  Dec 2013  7
                                               Dec 2011  5

Unit 05/Lecture - 07
Cloud security – [Rgpv/dec 2013(5), Rgpv/dec 2011(5)]

Cloud computing security or, more simply, cloud security is an evolving sub-domain of computer security, network security, and, more broadly, information security. It refers to a broad set of policies, technologies, and controls deployed to protect data, applications, and the associated infrastructure of cloud computing. Cloud security is not to be confused with security software offerings that are cloud-based, such as security-as-a-service.

Cloud security controls

Cloud security architecture is effective only if the correct defensive implementations are in place. An efficient cloud security architecture should recognize the issues that will arise with security management.[6] Security management addresses these issues with security controls. These controls are put in place to safeguard any weaknesses in the system and reduce the effect of an attack. While there are many types of controls behind a cloud security architecture, they can usually be found in one of the following categories:[6]

Deterrent controls: these controls are set in place to prevent any purposeful attack on a cloud system. Much like a warning sign on a fence or a property, they do not reduce the actual vulnerability of a system.

Preventative controls: these controls upgrade the strength of the system by managing its vulnerabilities. If an attack were to occur, the preventative controls are in place to cover the attack and reduce the damage and violation to the system's security.

Corrective controls: corrective controls are used to reduce the effect of an attack. Unlike preventative controls, corrective controls take action as an attack is occurring.

Detective controls: detective controls are used to detect any attacks that may be occurring in the system. In the event of an attack, the detective control will signal the preventative or corrective controls to address the issue.[6]
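As a toy illustration of a detective control signalling a corrective one, the sketch below scans authentication log lines for repeated failures from one source and triggers a stubbed corrective action. The log format and the three-failure threshold are invented for the example:

    from collections import Counter

    FAILURE_THRESHOLD = 3  # invented threshold for the example

    def block_source(src: str) -> None:
        """Stub for a corrective control, e.g. adding a firewall rule."""
        print(f"corrective control: blocking {src}")

    def detect(log_lines: list) -> None:
        """Detective control: count failed logins per source and signal
        the corrective control when a source crosses the threshold."""
        failures = Counter()
        for line in log_lines:
            if "LOGIN FAILED" in line:
                src = line.split()[-1]  # source address is the last field
                failures[src] += 1
                if failures[src] == FAILURE_THRESHOLD:
                    block_source(src)

    detect([
        "LOGIN FAILED from 10.0.0.7",
        "LOGIN OK     from 10.0.0.9",
        "LOGIN FAILED from 10.0.0.7",
        "LOGIN FAILED from 10.0.0.7",   # third failure triggers the block
    ])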
Cloud Application – [Rgpv/dec 2013(5), Rgpv/dec 2011(5)]

Cloud computing has been credited with increasing competitiveness through cost reduction, greater flexibility, elasticity and optimal resource utilization. Here are a few situations where cloud computing is used to enhance the ability to achieve business goals.

1. Infrastructure as a Service (IaaS) and Platform as a Service (PaaS)
When it comes to IaaS, using an existing infrastructure on a pay-per-use scheme seems an obvious choice for companies saving on the cost of investing to acquire, manage and maintain an IT infrastructure. There are also instances where organizations turn to PaaS for the same reasons while also seeking to increase the speed of development on a ready-to-use platform to deploy applications.

2. Private cloud and hybrid cloud
Among the many incentives for using cloud, there are two situations where organizations are looking into ways to assess some of the applications they intend to deploy into their environment through the use of a cloud (specifically a public cloud). While in the case of test and development it may be limited in time, adopting a hybrid cloud approach allows for testing application workloads, thereby providing the comfort of an environment without the initial investment that might have been rendered useless should the workload testing fail. Another use of hybrid cloud is the ability to expand during periods of limited peak usage, which is often preferable to hosting a large infrastructure that might seldom be of use. An organization would seek to have the additional capacity and availability of an environment when needed, on a pay-as-you-go basis.

3. Test and development
Probably the best scenario for the use of a cloud is a test and development environment. Traditionally, this entails securing a budget and setting up your environment through physical assets, significant manpower and time. Then comes the installation and configuration of your platform. All of this can often extend the time it takes for a project to be completed and stretch your milestones. With cloud computing, there are now readily available environments tailored for your needs at your fingertips. This often combines, but is not limited to, automated provisioning of physical and virtualized resources.

4. Big data analytics
One of the aspects offered by leveraging cloud computing is the ability to tap into vast quantities of both structured and unstructured data to harness the benefit of extracting business value. Retailers and suppliers are now extracting information derived from consumers' buying patterns to target their advertising and marketing campaigns at a particular segment of the population. Social networking platforms are now providing the basis for analytics on behavioral patterns that organizations are using to derive meaningful information.

5. File storage
Cloud can offer you the possibility of storing your files and accessing, storing and retrieving them from any web-enabled interface. The web services interfaces are usually simple (a sketch appears at the end of this section). At any time and place you have high availability, speed, scalability and security for your environment. In this scenario, organizations are only paying for the amount of storage they are actually consuming, and do so without the worries of overseeing the daily maintenance of the storage infrastructure. There is also the possibility to store the data either on or off premises depending on the regulatory compliance requirements. Data is stored in virtualized pools of storage hosted by a third party based on the customer's specification requirements.

6. Disaster recovery
This is yet another benefit derived from using the cloud, based on the cost-effectiveness of a disaster recovery (DR) solution that provides for a faster recovery from a mesh of different physical locations, at a much lower cost than a traditional DR site with fixed assets, rigid procedures and a much higher cost.

7. Backup
Backing up data has always been a complex and time-consuming operation. This included maintaining a set of tapes or drives, manually collecting them and dispatching them to a backup facility, with all the inherent problems that might happen between the originating and the backup site. This way of ensuring a backup is performed is not immune to problems such as running out of backup media, and there is also the time needed to load the backup devices for a restore operation, which takes time and is prone to malfunctions and human errors. Cloud-based backup, while not a panacea, is certainly a far cry from what it used to be. You can now automatically dispatch data to any location across the wire with the assurance that neither security, availability nor capacity are issues.

While the list of the above uses of cloud computing is not exhaustive, it certainly gives an incentive to use the cloud, when compared with more traditional alternatives, to increase IT infrastructure flexibility as well as to leverage big data analytics and mobile computing.
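The "simple web services interface" of the file storage scenario (use case 5) typically amounts to a store call and a retrieve call. Below is a hedged sketch against Amazon S3 using the boto3 SDK, as one concrete example; the bucket name and file paths are placeholders, and valid credentials are assumed:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-bucket"   # placeholder bucket name

    # Store a file in the cloud ...
    s3.upload_file("report.pdf", BUCKET, "reports/report.pdf")

    # ... and retrieve it later from any web-enabled client.
    s3.download_file(BUCKET, "reports/report.pdf", "report-copy.pdf")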
Cloud integration – [Rgpv/dec 2013(5)]

Cloud integration is the process of configuring multiple application programs to share data in the cloud. In a network that incorporates cloud integration, diverse applications communicate either directly or through third-party software. Cloud integration offers the following advantages over older, compartmentalized organizational methods:

Each user can access personal data in real time from any device.
Each user can access personal data from any location with Internet access.
Each user can integrate personal data such as calendars and contact lists served by diverse application programs.
Each user can employ the same logon information (username and password) for all personal applications.
The system efficiently passes control messages among application programs.
By avoiding the use of data silos, data integrity is maintained and data conflicts (which can arise from redundancy) are avoided.
Cloud integration offers scalability to allow for future expansion in terms of the number of users, the number of applications, or both.

In recent years, cloud integration has gained favor among organizations, corporations, and government agencies that implement SaaS (Software as a Service), a software distribution model in which applications are hosted by a vendor or service provider and made available to users over the Internet.
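The integrity benefit of avoiding data silos can be sketched as two applications reading and writing one shared cloud store instead of keeping duplicate local copies. In the toy model below, a plain dictionary stands in for the cloud-hosted store:

    # Toy model of cloud integration: two applications share one cloud
    # store instead of keeping duplicate silos, so updates are seen by all.
    shared_store = {"contacts": {"alice": "alice@example.com"}}

    def calendar_app_update(name: str, email: str) -> None:
        shared_store["contacts"][name] = email   # write through the shared store

    def mail_app_lookup(name: str) -> str:
        return shared_store["contacts"][name]    # reads always see the latest data

    calendar_app_update("alice", "alice@new-domain.example")
    print(mail_app_lookup("alice"))  # no stale silo copy to reconcile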
Risk of cloud computing – [Rgpv/dec 2011]

Cloud benefits
Cloud computing provides a scalable online environment that makes it possible to handle an increased volume of work without impacting system performance. Cloud computing also offers significant computing capability and economy of scale that might not otherwise be affordable, particularly for small and medium-sized organizations, without the IT infrastructure investment. Cloud computing advantages include:

Lower capital costs – organizations can provide unique services using large-scale computing resources from cloud service providers, and then nimbly add or remove IT capacity to meet peak and fluctuating service demands while only paying for the actual capacity used.
Lower IT operating costs – organizations can rent added server space for a few hours at a time rather than maintain proprietary servers, without worrying about upgrading their resources whenever a new application version is available. They also have the flexibility to host their virtual IT infrastructure in locations offering the lowest cost.
No hardware or software installation or maintenance.
An optimized IT infrastructure provides quick access to needed computing services.

The risks
Environmental security – the concentration of computing resources and users in a cloud computing environment also represents a concentration of security threats. Because of their size and significance, cloud environments are often targets of attacks by virtual machines and bot malware, brute-force attacks, and other threats. Ask your cloud provider about access controls, vulnerability assessment practices, and patch and configuration management controls to see that they are adequately protecting your data.
Data privacy and security – hosting confidential data with cloud service providers involves the transfer of a considerable amount of an organization's control over data security to the provider. Make sure your vendor understands your organization's data privacy and security needs. Also, make sure your cloud provider is aware of the particular data security and privacy rules and regulations that apply to your entity, such as HIPAA, the Payment Card Industry Data Security Standard (PCI DSS), the Federal Information Security Management Act of 2002 (FISMA), or the privacy considerations of the Gramm-Leach-Bliley Act.
Data availability and business continuity – a major risk to business continuity in the cloud computing environment is loss of internet connectivity. Ask your cloud provider what controls are in place to ensure internet connectivity. If a vulnerability is identified, you may have to terminate all access to the cloud provider until the vulnerability is rectified. Finally, the seizure of a data-hosting server by law enforcement agencies may result in the interruption of unrelated services stored on the same machine.
Record retention requirements – if your business is subject to record retention requirements, make sure your cloud provider understands what they are so that it can meet them.
Disaster recovery – hosting your computing resources and data at a cloud provider makes the cloud provider's disaster recovery capabilities vitally important to your company's disaster recovery plans. Know your cloud provider's disaster recovery capabilities and ask your provider whether they have been tested.

Evaluating your options
Many cloud provider options are available, each with unique risks. As you evaluate your choices and the associated risks, consider the following:

Cloud providers are sometimes reluctant to produce third-party audit reports unless an audit clause is included in the contract. Some hosts require clients to pay for such reports.
Some internal audit departments are performing control reviews of cloud providers, in addition to receiving and analyzing third-party audit reports. This is driven by certain controls not being tested, the exclusion of pertinent systems, or other factors that require on-site testing.
Standard cloud provider audit reports typically do not include vulnerability/penetration testing results. Providers are hesitant to allow scanning, as they believe this may compromise their infrastructure.

Cloud computing is a widely used format and we do not see this changing anytime soon. Knowing that you are managing the risks associated with housing your sensitive data offsite will give you confidence in the platform, so you can take advantage of the opportunities presented by the cloud.

S.no  Rgpv question                                          Year      Marks
Q.1   Write a short note on the following (any two):        Dec 2013  10
      a) Cloud computing  b) Application of cloud           Dec 2012  5
      c) Cloud integration                                  Dec 2011  5
Q.2   Write a short note on risk of cloud computing?        Dec 2011  5
Unit 05/Lecture - 08
Evolution of cloud computing – [Rgpv/dec 2012(5)]

The trend toward cloud computing started in the late 1980s with the concept of grid computing, when, for the first time, a large number of systems were applied to a single problem, usually scientific in nature and requiring exceptionally high levels of parallel computation. In Europe, long-distance optical networks were used to tie multiple universities into a massive computing grid so that resources could be shared and scaled for large scientific calculations.

Grid computing provided a virtual pool of computation resources, but it is different from cloud computing. Grid computing specifically refers to leveraging several computers in parallel to solve a particular, individual problem, or to run a specific application. Cloud computing, on the other hand, refers to leveraging multiple resources, including computing resources, to deliver a unified service to the end user.

In grid computing, the focus is on moving a workload to the location of the needed computing resources, which are mostly remote and readily available for use. Usually a grid is a cluster of servers on which a large task can be divided into smaller tasks that run in parallel; the sketch below illustrates the idea. From this point of view, a grid could actually be viewed as just one virtual server.
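The grid idea of dividing one large task into smaller tasks that run in parallel can be sketched with Python's standard multiprocessing pool, where each worker process stands in for a grid node:

    from multiprocessing import Pool

    def partial_sum(chunk):
        """One 'grid node' works on its slice of the larger problem."""
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]       # split the task four ways
        with Pool(processes=4) as pool:
            partials = pool.map(partial_sum, chunks)  # run the pieces in parallel
        print(sum(partials))  # combine results, as a grid scheduler would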
Grids also require applications to conform to the grid software interfaces. In a cloud environment, computing and extended IT and business resources, such as servers, storage, network, applications and processes, can be dynamically shaped or carved out from the underlying hardware infrastructure and made available to a workload. In addition, while a cloud can provision and support a grid, a cloud can also support non-grid environments, such as a three-tier web architecture running traditional or Web 2.0 applications.

In the 1990s, the concept of virtualization was expanded beyond virtual servers to higher levels of abstraction – first the virtual platform, including storage and network resources, and subsequently the virtual application, which has no specific underlying infrastructure. Utility computing offered clusters as virtual platforms for computing with a metered business model. More recently, Software as a Service (SaaS) has raised the level of virtualization to the application, with a business model of charging not by the resources consumed but by the value of the application to subscribers. The concept of cloud computing has evolved from the concepts of grid, utility and SaaS computing. It is an emerging model through which users can gain access to their applications from anywhere, at any time, through their connected devices. These applications reside in massively scalable data centers where compute resources can be dynamically provisioned and shared to achieve significant economies of scale.

Companies can choose to share these resources using public or private clouds, depending on their specific needs. Public clouds expose services to customers, businesses and consumers on the internet. Private clouds are generally restricted to use within a company behind a firewall and have fewer security exposures as a result. The strength of a cloud is its infrastructure management, enabled by the maturity and progress of virtualization technology to manage and better utilize the underlying resources through automatic provisioning, re-imaging, workload rebalancing, monitoring, systematic change request handling, and a dynamic and automated security and resiliency platform.

As more enterprises adopt cloud computing, applications are migrating toward the more mission-critical, and SaaS will become a mainstay of IT strategies. A number of companies, including Google, Microsoft, Amazon, and IBM, have built enormous datacenter-based computing capacity all over the world to support their web service offerings (search, instant messaging, web-based retail). With this computing infrastructure in place, these companies are already poised to offer new cloud-based software applications. Large enterprise software solutions, such as ERP (enterprise resource planning) applications, have traditionally only been affordable to very big enterprises with big IT budgets. However, companies that sell these solutions are finding they can reach small to medium businesses by making their very expensive, very complex applications available as internet-based software services. This ability of SaaS to deliver expensive applications at affordable prices will continue to accelerate.

S.no  Rgpv question                                      Year           Marks
Q.1   Write a short note on cloud computing evolution?  Rgpv Dec 2012  5

Additional Topic
Unit 05/Lecture - 09
Cloud storage

Cloud storage is a model of data storage in which the digital data is stored in logical pools, the physical storage spans multiple servers (and often locations), and the physical environment is typically owned and managed by a hosting company. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment protected and running. People and organizations buy or lease storage capacity from the providers to store end-user, organization, or application data.

Cloud storage services may be accessed through a co-located cloud compute service, a web service application programming interface (API), or applications that utilize the API, such as cloud desktop storage, a cloud storage gateway or web-based content management systems.

Figure: A high-level architecture of cloud storage.

Cloud storage is based on highly virtualized infrastructure and is like broader cloud computing in terms of accessible interfaces, near-instant elasticity and scalability, multi-tenancy, and metered resources. Cloud storage typically refers to a hosted object storage service, but the term has broadened to include other types of data storage that are now available as a service, such as block storage. Cloud storage is:

made up of many distributed resources, but still acts as one – often referred to as federated storage clouds;[6]
highly fault-tolerant through redundancy and distribution of data;
highly durable through the creation of versioned copies;
typically eventually consistent with regard to data replicas.

Advantages
Companies need only pay for the storage they actually use, typically an average of consumption during a month. This does not mean that cloud storage is less expensive, only that it incurs operating expenses rather than capital expenses.
Organizations can choose between off-premise and on-premise cloud storage options, or a mixture of the two, depending on relevant decision criteria that are complementary to the initial direct cost-savings potential; for instance, continuity of operations (COOP), disaster recovery (DR), security (PII, HIPAA, SarbOx, IA/CND), and records retention laws, regulations, and policies.
Storage availability and data protection are intrinsic to object storage architecture, so, depending on the application, the additional technology, effort and cost to add availability and protection can be eliminated.
Storage maintenance tasks, such as purchasing additional storage capacity, are offloaded to the responsibility of a service provider.
Cloud storage provides users with immediate access to a broad range of resources and applications hosted in the infrastructure of another organization via a web service interface.
Cloud storage can be used to copy virtual machine images from the cloud to on-premise locations, or to import a virtual machine image from an on-premise location to the cloud image library. In addition, cloud storage can be used to move virtual machine images between user accounts or between data centers.

Reference Books
1. G. Somasundaram, Alok Shrivastava, "Information Storage and Management"
2. Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein, "Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI"
3. Nick Antonopoulos, Lee Gillam, "Cloud Computing: Principles, Systems and Applications"