Everything You Wanted to Know About Storage, but Were Afraid to Ask
• Do you have a cell phone, PDA, or smartphone?
• Do you have a digital camera?
• Do you have a PC?
• What do all of these devices have in common?
• How do you protect your data?

Digital Footprint Calculator
http://www.emc.com/digital_universe/downloads/web/personal-ticker.htm
• Are you familiar with RAID?

RAID 0
• Data is striped across the HDDs in a RAID set
• The stripe size is specified at the host level for software RAID and is vendor specific for hardware RAID
• As the number of drives in the array increases, performance improves because more data can be read or written simultaneously
• Used in applications that need high I/O throughput
• Does not provide data protection or availability in the event of drive failures

RAID 1
• Mirroring is a technique whereby data is stored on two different HDDs, yielding two copies of the data
• In addition to providing complete data redundancy, mirroring enables faster recovery from disk failure
• Mirroring duplicates data, so the storage capacity needed is twice the amount of data being stored; mirroring is therefore considered expensive
• It is preferred for mission-critical applications that cannot afford data loss

Nested RAID
• Mirroring can be combined with striped RAID by mirroring entire stripes of disks to stripes on other disks
• RAID 0+1 and RAID 1+0 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1
• These types of RAID require an even number of disks, the minimum being four
• RAID 0+1 is also called mirrored stripe
• In RAID 0+1, data is first striped across HDDs, and then the entire stripe is mirrored
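
The striping and mirroring bullets above can be illustrated with a small sketch. It assumes a simple round-robin strip layout, which the slides do not specify; the function names `raid0_locate` and `usable_capacity_gb` and their parameters are hypothetical and chosen only for this illustration. Real controllers may lay out strips differently.

```python
def raid0_locate(logical_block, num_disks, strip_blocks):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes strips are handed out to disks in round-robin order.
    """
    strip_index = logical_block // strip_blocks   # which strip holds the block
    disk = strip_index % num_disks                # strips rotate across the disks
    stripe_row = strip_index // num_disks         # complete stripes before this strip
    offset = stripe_row * strip_blocks + logical_block % strip_blocks
    return disk, offset


def usable_capacity_gb(level, num_disks, disk_gb):
    """Usable capacity for the RAID levels discussed so far."""
    if level == "raid0":
        return num_disks * disk_gb                # no redundancy, all capacity usable
    if level == "raid1":
        if num_disks != 2:
            raise ValueError("this sketch models RAID 1 as a two-disk mirror")
        return disk_gb                            # second disk holds the copy
    if level in ("raid0+1", "raid1+0"):
        if num_disks < 4 or num_disks % 2:
            raise ValueError("nested RAID needs an even number of disks, minimum four")
        return num_disks * disk_gb // 2           # half the disks hold mirror copies
    raise ValueError(f"unsupported level: {level}")


print(raid0_locate(5, num_disks=4, strip_blocks=2))   # (2, 1)
print(usable_capacity_gb("raid1+0", 4, 100))          # 200
```

With four disks and two-block strips, consecutive strips land on different disks, which is why more drives allow more simultaneous reads and writes. The capacity function shows the cost of mirroring: half of the raw capacity.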
Nested RAID
• RAID 1+0 is also called striped mirror
• In RAID 1+0, data is first mirrored, and then both copies of the data are striped across multiple HDDs in a RAID set
• Applications that benefit from RAID 1+0 include:
  – High-transaction-rate Online Transaction Processing (OLTP)
  – Database applications that require a high I/O rate, random access, and high availability

RAID 3
• RAID 3 stripes data for high performance and uses parity for improved fault tolerance
• Parity information is stored on a dedicated drive so that data can be reconstructed if a drive fails
• RAID 3 is used in applications that involve large sequential data access, such as video streaming

RAID 4
• Stripes data at the block level across all disks except the parity disk
• Parity information is stored on a dedicated disk
• Unlike RAID 3, data disks can be accessed independently, so specific data elements can be read or written on a single disk without reading or writing an entire stripe

RAID 5
• RAID 5 is a very versatile RAID implementation
• The difference between RAID 4 and RAID 5 is the parity location
• In RAID 4, parity is written to a dedicated drive, while in RAID 5, parity is distributed across all disks
• Distributing parity in RAID 5 overcomes the write bottleneck of a dedicated parity disk
• RAID 5 is preferred for messaging, medium-performance media serving, and relational database management system (RDBMS) implementations in which database administrators (DBAs) optimize data access

RAID 6
• RAID 6 works the same way as RAID 5 except that RAID 6 includes a second parity element
• Maintaining two parities enables the RAID group to survive the failure of two disks

Hot Spare
• A hot spare is a spare HDD in a RAID array that temporarily replaces a failed HDD of a RAID set
• When the failed HDD is replaced with a new HDD, one of two things takes place:
  – The hot spare replaces the failed HDD permanently, and a new hot spare must be configured on the array
  – Data from the hot spare is copied to the new HDD, and the hot spare returns to its idle state, ready to replace the next failed drive
• A hot spare should be large enough to accommodate data from a failed drive
• Some systems implement multiple hot spares to improve data availability
• A hot spare can be configured as automatic or user initiated, which specifies how it will be used in the event of disk failure

What is an Intelligent Storage System?
• Intelligent storage systems are RAID arrays that:
  – Are highly optimized for I/O processing
  – Have large amounts of cache for improving I/O performance
  – Have operating environments that provide:
    – Intelligence for managing cache
    – Array resource allocation
    – Connectivity for heterogeneous hosts
    – Advanced array-based local and remote replication options

Components of an Intelligent Storage System
• An intelligent storage system consists of four key components: front end, cache, back end, and physical disks
• The front end provides the interface between the storage system and the host
• It consists of two components: front-end ports and front-end controllers
• The front-end ports enable hosts to connect to the intelligent storage system; each port has processing logic that executes the appropriate transport protocol, such as SCSI, Fibre Channel, or iSCSI, for storage connections
• Front-end controllers route data to and from cache via the internal data bus
• When cache receives write data, the controller sends an acknowledgment back to the host
• Controllers optimize I/O processing by using command queuing algorithms
• Command queuing is a technique implemented on front-end controllers
• It determines the execution order of received commands, which can reduce unnecessary drive head movements and improve disk performance

Intelligent Storage System: Cache
• Cache is an important component that enhances I/O performance in an intelligent storage system
• Cache improves storage system performance by isolating hosts from the mechanical delays associated with physical disks, which are the slowest components of an intelligent storage system
• Accessing data from a physical disk usually takes a few milliseconds
• Accessing data from cache takes less than a millisecond
• Write data is placed in cache and then written to disk

Cache Data Protection
• Cache mirroring: each write to cache is held in two different memory locations on two independent memory cards
• Cache vaulting: cache is exposed to the risk of losing uncommitted data during a power failure
  – One approach is to use battery power to write the cache content to disk
  – Storage vendors use a set of physical disks to dump the contents of cache during a power failure

Intelligent Storage System: Back End
• It consists of two components: back-end ports and back-end controllers
• Physical disks are connected to ports on the back end
• The back-end controller communicates with the disks when performing reads and writes and also provides additional, but limited, temporary data storage
• The algorithms implemented on back-end controllers provide error detection and correction, along with RAID functionality
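
The RAID functionality handled by back-end controllers includes the parity protection described earlier for RAID 3, RAID 4, and RAID 5. The slides do not say how parity is computed; the sketch below assumes the commonly used byte-wise XOR for single parity. The names `xor_parity` and `rebuild_missing` are hypothetical, and the second parity element of RAID 6 is deliberately not modeled, because it requires a different calculation.

```python
from functools import reduce


def xor_parity(strips):
    """Return the byte-wise XOR of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild_missing(surviving_strips, parity):
    """Recover the one missing data strip from the survivors and the parity."""
    # XOR-ing the parity with every surviving strip cancels them out,
    # leaving only the contents of the strip that was lost.
    return xor_parity(list(surviving_strips) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # three data strips
parity = xor_parity(data)                         # what the parity disk would hold

# Simulate losing the second disk and rebuilding its strip.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])   # True
```

The same calculation applies whether parity sits on a dedicated disk (RAID 3 and RAID 4) or is distributed across all disks (RAID 5); only the location of the parity strip changes.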
Controller
• Multiple controllers also facilitate load balancing

Intelligent Storage System: Physical Disks
• Disks are connected to the back end with either a SCSI or a Fibre Channel interface

What Are LUNs?
• Physical drives or groups of RAID-protected drives can be logically split into volumes known as logical volumes, commonly referred to as Logical Unit Numbers (LUNs)

High-end Storage Systems
• High-end storage systems, referred to as active-active arrays, are generally aimed at large enterprises for centralizing corporate data
• These arrays are designed with a large number of controllers and a large amount of cache memory
• An active-active array implies that the host can perform I/Os to its LUNs across any of the available paths

Midrange Storage Systems
• Also referred to as active-passive arrays
• A host can perform I/Os to LUNs only through active paths
• Other paths remain passive until the active path fails
• Midrange arrays have two controllers, each with cache, RAID controllers, and disk drive interfaces
• Designed for small and medium enterprises
• Less scalable than high-end arrays

CLARiiON Whiteboard Video

DAS

Direct-Attached Storage (DAS)
• Storage connects directly to servers
• Applications access data from DAS using block-level access protocols
• Examples:
  – Internal HDD of a host
  – Tape libraries
  – Directly connected external HDD
• DAS is classified as internal or external, based on the location of the storage device with respect to the host
• Internal DAS: the storage device is internally connected to the host by a serial or parallel bus
  – The bus has distance limitations for high-speed connectivity
  – It can support only a limited number of devices
  – The devices occupy a large amount of space inside the host
• External DAS: the server connects directly to the external storage device
  – Communication is usually via the SCSI or FC protocol
  – Overcomes the distance and device-count limitations of internal DAS
  – Provides centralized management of storage devices

DAS Benefits
• Ideal for local data provisioning
• Quick deployment for small environments
• Simple to deploy
• Reliability
• Low capital expense
• Low complexity

DAS Connectivity Options
• Host and storage device communicate via protocols
• ATA/IDE and SATA
  – Primarily for internal bus
• SCSI
  – Parallel (primarily for internal bus)
  – Serial (external bus)
• FC
  – High-speed network technology
• Protocols are implemented on the HDD controller
• A storage device is also known by the name of the protocol it supports

DAS Management
• LUN creation, file system layout, and data addressing
• Internal: the host (or third-party software) provides:
  – Disk partitioning (volume management)
  – File system layout
• External:
  – Array-based management
  – Lower TCO for managing data and storage infrastructure

DAS Challenges
• Limited scalability
  – Number of connectivity ports to hosts
  – Number of addressable disks
  – Distance limitations
• For internal DAS, maintenance requires downtime
• Limited ability to share resources (unused resources cannot be easily reallocated)
  – Array front-end ports, storage space
  – Results in islands of over-utilized and under-utilized storage pools

Introduction to SCSI
• SCSI-3 is the latest version of SCSI

SCSI Architecture
• Primary commands common to all devices
• Standard rules for device communication and information sharing
• Interface details such as electrical signaling methods and data transfer modes

SCSI Device Model
• SCSI initiator device
  – Issues commands to SCSI target devices
  – Example: SCSI host adaptor
• SCSI target device
  – Executes commands issued by initiators
  – Examples: SCSI peripheral devices
• Device requests contain a Command Descriptor Block (CDB)
• CDB structure
  – Structured as a sequence of 8-bit bytes (the first byte is the operation code)
  – Defines the command to be executed
  – Contains the operation code, command-specific parameters, and control parameters

SCSI Addressing
• Initiator ID: a number from 0 to 15, with the most common value being 7
• Target ID: a number from 0 to 15
• LUN: a number that specifies a device addressable through a target

SCSI Addressing Example
(figure labels: controller, target, device)

Areas Where DAS Fails
• Just-in-time information to business users
• Integration of information infrastructure with business processes
• Flexible and resilient storage architecture

The Solution?
• Storage networking
  – FC SAN
  – NAS
  – IP SAN

What is a SAN?
• Dedicated high-speed network of servers and shared storage devices
• Provides block-level data access
• Resource consolidation
  – Centralized storage and management
• Scalability
  – Theoretical limit: approx. 15 million devices
• Secure access

Fibre Channel
• Latest FC implementations support 8 Gb/s
• Fibre Channel is a high-speed network technology that runs on high-speed optical fiber cables (for front-end SAN connectivity) and serial copper cables (for back-end disk connectivity)

FC SAN Evolution

Components of SAN
• Three basic components:
  – Servers
  – Network infrastructure
  – Storage
• These can be further broken down into the following key elements:
  – Node ports
  – Cabling
  – Interconnecting devices (such as FC switches or hubs)
  – Storage arrays
  – SAN management software

Components of SAN: Node ports
• Examples of nodes
  – Hosts, storage, and tape libraries
• Ports are available on:
  – HBAs in hosts
  – Front-end adapters in storage
  – Each port has a transmit (Tx) link and a receive (Rx) link
• HBAs perform low-level interface functions automatically to minimize the impact on host performance

Components of SAN: Cabling
• Copper cables for short distances
• Optical fiber cables for long distances
  – Single-mode
    – Carries a single beam of light
    – Distance up to 10 km
  – Multi-mode
    – Can carry multiple beams of light simultaneously
    – Distance up to 500 meters
Components of SAN: Cabling (connectors)
• Node connectors:
  – SC duplex connectors
  – LC duplex connectors
• Patch panel connectors:
  – ST simplex connectors

Components of SAN: Interconnecting devices
  – Hubs
  – Switches
  – Directors

Components of SAN: Storage array
• Storage consolidation and centralization
• Provides:
  – High availability/redundancy
  – Performance
  – Business continuity
  – Multiple host connect

Components of SAN: SAN management software
• A suite of tools used in a SAN to manage the interface between hosts and storage arrays
• Provides integrated management of the SAN environment
• Web-based GUI or CLI

SAN Interconnectivity Options: FC-AL
• Fibre Channel Arbitrated Loop (FC-AL)
  – Devices must arbitrate to gain control
  – Devices are connected via hubs
  – Supports up to 127 devices

SAN Interconnectivity Options: FC-SW
• Fabric connect (FC-SW)
  – Dedicated bandwidth between devices
  – Supports up to 15 million devices
  – Higher availability than hubs

Network-Attached Storage
Think "File Sharing"

Sharing Files
(figure labels: 2.2 GB, 4 GB)

What is NAS?
• IP-based file sharing device attached to a LAN
• Server consolidation
• File-level data access and sharing

Why NAS?
• Dedicated to file serving

Benefits of NAS
• Supports comprehensive access to information
• Improves efficiency and flexibility
• Centralizes storage
• Simplifies management
• Scalability
• High availability through native clustering
• Provides security integration to the environment (user authentication and authorization)

(figure labels: CPU and memory, NICs, file sharing protocols, IP network, NAS OS, storage protocols such as ATA, SCSI, or FC)

Benefits:
• Increases performance throughput (service level) to end users
• Minimizes investment in additional servers
• Provides storage pooling
• Provides heterogeneous file serving
• Uses existing infrastructure, tools, and processes
• Provides continuous availability to files
• Reduces cost for additional OS-dependent servers
• Adds storage capacity nondisruptively
• Consolidates storage management
• Lowers Total Cost of Ownership

IP SAN

Celerra Whiteboard Video

Driver for IP SAN
• In an FC SAN, transfer of block-level data takes place over Fibre Channel
• Emerging technologies provide for the transfer of block-level data over an existing IP network infrastructure

Why IP?
• Easier management
• Existing network infrastructure can be leveraged
• Reduced cost compared to new SAN hardware and software
• Supports multi-vendor interoperability
• Many long-distance disaster recovery solutions already leverage IP-based networks
• Many robust and mature security options are available for IP networks

Block Storage over IP - iSCSI
• SCSI over IP
• IP encapsulation
• Ethernet NIC
• iSCSI HBA
• Hardware-based gateway to Fibre Channel storage
• Used to connect servers

Block Storage over IP - FCIP
• Fibre Channel-to-IP bridge / tunnel (point to point)
• Fibre Channel end points
• Used in DR implementations

What is iSCSI?
• IP-based protocol used to connect hosts and storage
• Carries block-level data over an IP-based network
• Encapsulates SCSI commands and transports them as TCP/IP packets

Components of iSCSI
• iSCSI host initiators
  – Host computer using a NIC or iSCSI HBA to connect to storage
  – iSCSI initiator software may need to be installed
• iSCSI targets
  – Storage array with an embedded iSCSI-capable network port
  – FC-iSCSI bridge
• LAN for IP storage network
  – Interconnected Ethernet switches and/or routers

• No FC components
• Each iSCSI port on the array is configured with an IP address and port number
  – iSCSI initiators connect directly to the array

• Bridge device translates iSCSI/IP to FCP
  – Standalone device
  – Integrated into an FC switch (multi-protocol router)
• iSCSI initiator/host is configured with the bridge as its target
• Bridge generates a virtual FC initiator

• Array provides FC and iSCSI connectivity natively
• No bridge devices needed

FCIP (Fibre Channel over IP)
• FCIP is an IP-based storage networking technology
• Combines the advantages of Fibre Channel and IP
• Creates virtual FC links that connect devices in different fabrics
• FCIP is a distance extension solution
  – Used for data sharing over geographically dispersed SANs

FCoE Whiteboard Video

Question 1
What was EMC’s revenue in 2009?
A. 60 Billion
B. 46.2 Billion
C. 14 Billion
D. 9 Billion

EMC Corporation 2009 At a Glance
• Revenues: $14 billion
• Net income: $1.9 billion
• Employees: ~41,500
• Countries where EMC does business: >80
• R&D investment: ~$1.5 billion
• Operating cash flow: $3.3 billion
• Free cash flow: $2.6 billion
• Founded: 1979

IDC Digital Universe Study
IDC, May 2010

Question 2
How much digital information was created worldwide in 2009?
A. 846 Terabytes
B. 686 Petabytes
C. 0.8 Zettabytes
D. 2502 Exabytes

The Digital Universe 2009-2020
• 2009: 0.8 ZB
• Growing by a factor of 44
• 2020: 35.2 zettabytes
• One zettabyte (ZB) = 1 trillion gigabytes
Source: IDC Digital Universe Study, sponsored by EMC, May 2010

1.2 ZB in 2010 is Equal to . . .
• 75 billion fully loaded 16 GB iPads

What is Driving the Digital Explosion?
• Web 2.0 applications
• Ubiquitous content-generating devices
• 3G/4G
• Longer data retention periods

Regulation Landscape
• Freedom of Information Act
• SEC 17a-4
• HIPAA
• Sarbanes-Oxley

Secure Collaboration
(figure labels: data center and remote site holding data, local copies, remote copies, a backup copy, and a copy for archiving)

Question 3
What percentage of the 0.8 zettabytes of digital information is created by individuals?
A. 30%
B. 50%
C. 70%
D. 90%

The Digital Information World
Individuals create data …companies manage it!
• Create: the share of the digital universe that will be created by individuals
• Manage: the share of the digital universe that will be the responsibility of companies to manage and secure
Source: IDC Digital Universe Study, sponsored by EMC, May 2010

Question 4
How much storage capacity was available on the first Symmetrix 4200 that EMC shipped in 1990?
A. 24 Gigabytes
B. 240 Gigabytes
C. 24 Terabytes
D. 2502 Exabytes

EMC’s Tiered Storage Platforms
Broadest Range of Function, Performance, and Connectivity
(figure labels: product families Symmetrix, CLARiiON, Celerra, EMC Centera, EMC Disk Library, Invista, Connectrix, Rainfinity Global File Virtualization, and ADIC Scalar family; connectivity via SAN, NAS, CAS, iSCSI, IP, Fibre Channel, and FICON)
• 1990: Symmetrix 4200 Integrated Cached Disk Array introduced with a capacity of 24 gigabytes.
• 2009: Symmetrix V-Max systems are available with up to 2 petabytes of usable storage in a single system.

Managing Information Storage
Trends, Challenges and Options
EMC, 2010-2011

Question 6
What is the number 1 challenge identified by IT and storage managers?
A. Storage consolidation
B. Designing & deploying multi-site environments
C. Managing storage growth
D. Making informed strategic / big-picture decisions

Digital Information Storage Challenges
Most important activities/constraints identified as challenges by IT/storage managers
1. Managing storage growth
2. Designing, deploying, and managing backup and recovery
3. Designing, deploying, and managing storage in a virtualized server environment
4. Designing, deploying, and managing disaster recovery solutions
5. Storage consolidation
6. Making informed strategic / big-picture decisions
7. Integrating storage in application environments (such as Oracle, Exchange, etc.)
8. Designing and deploying multi-site environments
9. Lack of skilled storage professionals
Source: input from over 1,450 storage professionals worldwide, Managing Information Storage: Trends, Challenges and Options 2010-2011
http://education.EMC.com/ManagingStorage/

Building an Effective Storage Mgmt Organization
• Hire an additional 22%+ storage professionals . . .
Based on EMC study ‘Managing Information Storage: Trends, Challenges & Options (2010-2011)’ www.emc.com/managingstorage

Where Managers Plan to Find Storage Expertise
Based on EMC study ‘Managing Information Storage: Trends, Challenges & Options (2010-2011)’ www.emc.com/managingstorage

Top IT Certifications by Salary
Source: Certification Magazine, December 2009

Storage Role Across IT Disciplines
Leverage the functionality of storage technology products to:
• Systems Architects/Administrators: maximize performance, increase availability, and avoid costly server upgrades
• Network Administrators: maximize the performance of your network and plan in advance
• Database Administrators: maximize performance, increase availability, and realize faster recoverability of your database
• Application Architects: increase the performance and availability of your application
• IT Project Managers: plan and execute IT projects that involve or are impacted by storage technology components

EMC Academic Alliance

Key Pillars of IT
• Over the last 20 years, the business and IT perspective on the data center has focused on four pillars of Information Technology: operating systems, databases, networking, and software application development
• Based on today’s IT infrastructure, Information Storage is the 5th pillar of IT!

Question 7
What is the name of the EMC-authored book that was released in May 2009?
A. Storage Area Networks for Dummies
B. Storage Networks Explained
C. Administering Data Centers
D. Information Storage and Management

Information Storage and Management (ISM) Modules
• Section 1. Storage System
• Section 2. Storage Networking Technologies & Virtualization
• Section 3. Business Continuity
• Section 4. Storage Security & Management
http://education.EMC.com/ismbook

Information Storage and Mgmt (ISM)
• Section 1. Storage System
KEY CONCEPT COVERAGE
• Data and Information
• Structured and Unstructured Data
• Storage Technology Architectures
• Core Elements of a Data Center
• Information Management
• Information Lifecycle Management
• Host, Connectivity, and Storage
• Block-Level and File-Level Access
• File System and Volume Manager
• Storage Media and Devices
• Disk Components
• Zoned Bit Recording
• Logical Block Addressing
• Little’s Law and the Utilization Law
• Hardware and Software RAID
• Striping, Mirroring, and Parity
• RAID Write Penalty
• Hot Spares
• Intelligent Storage System
• Front-End Command Queuing
• Cache Mirroring and Vaulting
• Logical Unit Number (LUN)
• LUN Masking
• High-end Storage System
• Midrange Storage System

Information Storage and Mgmt (ISM)
• Section 2. Storage Networking Technologies and Virtualization
Key initiatives for all companies: storage consolidation (physical / smaller footprint) and virtualization (logical / greater flexibility)
KEY CONCEPT COVERAGE
• Internal and External DAS
• SCSI Architecture
• SCSI Addressing
• Fibre Channel (FC) Architecture
• Fibre Channel Protocol Stack
• Fibre Channel Ports
• Fibre Channel Addressing
• World Wide Names (WWN)
• Zoning
• Fibre Channel Topologies
• iSCSI Protocol
• Native and Bridged iSCSI
• FCIP Protocol
• NAS Device
• Remote File Sharing
• NAS Connectivity and Protocols
• NAS Performance and Availability
• MTU and Jumbo Frames
• Fixed Content and Archives
• Single-Instance Storage
• Object Storage and Retrieval
• Content Authenticity
• Memory Virtualization
• Network Virtualization
• Server Virtualization
• Storage Virtualization
• In-Band and Out-of-Band Implementations
• Block-Level and File-Level Virtualization

Information Storage and Mgmt (ISM)
• Section 3. Business Continuity
Always available / Never lost
KEY CONCEPT COVERAGE
• Business Continuity
• Information Availability
• Disaster Recovery
• BC Planning
• Business Impact Analysis
• Operational Backup
• Archival
• Retention Period
• Bare-Metal Recovery
• Backup Architecture
• Backup Topologies
• Virtual Tape Library
• Data Consistency
• Host-Based Local Replication
• Array-Based Local Replication
• Copy on First Access (CoFA)
• Copy on First Write (CoFW)
• Restore and Restart
• Synchronous and Asynchronous Replication
• LVM-Based Replication
• Host-Based Log Shipping
• Disk-Buffered Replication
• Three-Site Replication
Maximize data availability, minimize chances of data loss

Information Storage and Mgmt (ISM)
• Section 4. Storage Security and Management
Is my data secure? Data storage security considerations: consolidated, virtualized, and in the cloud
KEY CONCEPT COVERAGE
• Storage Security Framework
• The Risk Triad
• Security Domain
• Infrastructure Right Management
• Access Control
• Alerts
• Management Platform Standards
• Internal Chargeback

EMC Academic Alliance
Developing tomorrow’s Information Storage Professionals…today!
• Partnering with leading institutes of higher education worldwide to bridge the storage knowledge gap in industry
• Providing EMC, customers, and partners with a source for hiring storage-educated graduates
• Hundreds of institutions globally, educating thousands of students
• Offering a unique ‘open’ course on Information Storage and Management
  – Focus on concepts and principles
• Opportunity for EMC to give back as the industry leader
• For the latest list of participating institutions and to introduce us to your alma mater, visit http://education.EMC.com/academicalliance

Becoming an Academic Partner
Required Steps . . .
1. Institution enrolls via the EAA online application. http://info.emc.com/mk/get/EAA_APPL_form?src=&HBX_Account_Number=emc-emccom
2. Institution identifies faculty to teach the course and administer the program.
3. Institution identifies faculty to attend the 5-day ISM Faculty Readiness Seminar (FRS) and clear the ISM certification exam.
4. Institution accesses the secure Faculty website to download teaching aids such as chapter PowerPoints, quizzes, simulators, etc.
5. Institution promotes the ISM course to students.
6. Institution schedules and begins teaching the ISM course.

Summary
• Information storage is one of the fastest-growing sectors within IT.
• Information growth and complexity create challenges and career opportunities.
• Business and industry are looking for IT professionals who know all 5 pillars.
• Those who obtain the skills through formal education and industry qualification have an advantage.