VNX Unified Storage Solutions Design - Student Guide Education Services October 2012 Welcome to VNX Unified Storage Solutions Design. Copyright © 1996, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. EMC2, EMC, Data Domain, RSA, EMC Centera, EMC ControlCenter, EMC LifeLine, EMC OnCourse, EMC Proven, EMC Snap, EMC SourceOne, EMC Storage Administrator, Acartus, Access Logix, AdvantEdge, AlphaStor, ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, ClaimPack, ClaimsEditor, CLARiiON, ClientPak, Codebook Correlation Technology, Common Information Model, Configuration Intelligence, Configuresoft, Connectrix, CopyCross, CopyPoint, Dantz, DatabaseXtender, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, Document Sciences, Documentum, elnput, E-Lab, EmailXaminer, EmailXtender, Enginuity, eRoom, Event Explorer, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, InfoMover, Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS, Max Retriever, MediaStor, MirrorView, Navisphere, NetWorker, nLayers, OnAlert, OpenScale, PixTools, Powerlink, PowerPath, PowerSnap, QuickScan, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, Smarts, SnapImage, SnapSure, SnapView, SRDF, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX, TimeFinder, UltraFlex, UltraPoint, UltraScale, Unisphere, VMAX, Vblock, Viewlets, Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning, VisualSAN, VisualSRM, Voyence, VPLEX, VSAM-Assist, WebXtender, xPression, xPresso, YottaYotta, the EMC logo, and where information lives, are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners. © Copyright 2012 EMC Corporation. All rights reserved. Published in the USA. Revision Date: October 2012 Revision Number: MR-7CP-VNXUNISDTA 5.32/7.1 v1.5 Copyright © 2012 EMC Corporation. All rights reserved Course Introduction 1 This slide gives an overview of the course, the audience at whom it is aimed, and pre-requisite courses. Copyright © 2012 EMC Corporation. All rights reserved Course Introduction 2 Upon completion of this course, you should be able to gather relevant information, analyze it, and use the result of that analysis to design a VNX solution. Copyright © 2012 EMC Corporation. All rights reserved Course Introduction 3 This is the agenda for Day 1 of the class. Copyright © 2012 EMC Corporation. All rights reserved Course Introduction 4 This is the agenda for Day 2 and 3 of the class. 
Copyright © 2012 EMC Corporation. All rights reserved Course Introduction 5 This is the agenda for Day 4 and Day 5 of the class. Copyright © 2012 EMC Corporation. All rights reserved Course Introduction 6 This module focuses on gathering the proper data in preparation for designing a VNX solution. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 1 This lesson covers the theory and methodology of documenting the current hardware environment and requirements prior to the design phase. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 2 In order to model and validate that a storage solution meets the determined business requirements, supporting capacity, connectivity, and performance data from the environment must be collected and modeled using certified toolsets. The modeling of this data provides the necessary criteria to validate a solution. A solution cannot credibly be shown to meet the customer's business requirements without first gathering the appropriate data. This data is used in a modeling process designed to validate the solution. The type of data and collection methods vary according to the different solutions. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 3 As we look at designing VNX storage solutions, it is important to understand the customer's environment: knowing the applications used, how the data is accessed, what and how much data needs to be replicated, across which logical volumes and which physical spindles, and the replication requirements are all critical to designing VNX storage solutions. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 4 Once we understand which devices make up the environment, we need to look at how things are utilized across the servers, storage subsystems, SAN, and WAN. We need to understand application, storage, and network performance. If replication is used, current bandwidth and latency are key considerations for future designs. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 5 Gathering performance data during peak workloads is critical to ensure proper sizing of the environment. To focus on a subset of the hour, day, or week, we look to the customer to identify the peak production times and then confirm them with host- or storage-based production statistics. Properly determining collection intervals is also critical. Smaller collection intervals are preferred over larger ones, provided the collection does not affect the production environment. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 6 Hosts that share logical volumes may unknowingly contend for the same physical spindles and degrade each other's performance. Performance analysis should be performed at both the host and storage level. Storage performance may not show host-side issues, and host-side performance may not identify storage hot spots. Determining current hot spots will help in preventing hot spots in the VNX environment. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 7 All proposed or recommended configurations and solutions must come from actual customer data points. If we don't have the data, we can't make a calculated recommendation. If there are changes or if the analysis is outdated, we need to revisit the configuration to make sure the proposed configuration still meets the customer's objectives. Copyright © 2012 EMC Corporation.
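The peak-workload and collection-interval guidance above lends itself to a small scripted check. The sketch below is illustrative only: it assumes a hypothetical CSV export (timestamp and iops columns) from whichever host- or storage-side collector is in use, and it simply slides a window across the samples to find the busiest period, which can then be confirmed with the customer.

# Minimal sketch: locate the peak production window in collected statistics.
# Assumes a hypothetical CSV with "timestamp,iops" rows exported from the
# collection tool in use; the column names and file name are placeholders.
import csv
from datetime import datetime

def peak_window(csv_path, window_samples=12):
    """Return (start_time, mean_iops) of the busiest rolling window."""
    samples = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            samples.append((datetime.fromisoformat(row["timestamp"]),
                            float(row["iops"])))
    best_start, best_mean = None, -1.0
    for i in range(len(samples) - window_samples + 1):
        window = samples[i:i + window_samples]
        mean_iops = sum(v for _, v in window) / window_samples
        if mean_iops > best_mean:
            best_start, best_mean = window[0][0], mean_iops
    return best_start, best_mean

if __name__ == "__main__":
    start, mean_iops = peak_window("host_stats.csv")
    print(f"Busiest window starts {start}, average {mean_iops:.0f} IOPS")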
All rights reserved Module 1: Data Gathering 8 Most IT environments are dynamic. Changes in the environment can occur frequently, and unless they are taken into account, the final solution may be based on stale data. Care must be taken to work with all parties to ensure that any changes are accounted for in the final solution. Implementing properly timed freezes will help ensure successful implementations/migrations. An example of this would be a migration scheduled for August 28. Let's say we gather the data on August 1 and design the future environment with this data. If on August 20 there is an addition to the current environment which is not accounted for in the future environment design, then we will be unsuccessful when we try to migrate on August 28. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 9 Verifying the scope of the project will ensure that all the proper data is collected and no time will be wasted on resources outside of the scope. Be sure to cover the servers, network, storage, applications, backup and recovery, replication, and any other in-scope resources. It is also important to identify high-profile assets. Planning the solution for high-profile data needs special care. The applications can be very sensitive to performance, and the business may be significantly affected if these applications perform poorly. After data collection, a thorough analysis needs to be performed. Current and final location of data, network architecture, backup and recovery, and replication requirements all need to be considered. After analysis of the current environment, conversation with the Customer is critical to ensure that unnecessary data is not kept and that growth will be accommodated. The final design can then be created and the solution verified. Implementation and validation need to be performed before completion of the project. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 10 It is very important to determine the final list of servers as early as possible in the project. Scope creep can greatly affect the timeliness and efficiency of a project. Determining all servers in scope of the project should be one of the first tasks in designing the solution. Gathering emcgrab output is the next step. Using data gathering tools will be discussed in the next module. It is critical to account for all current storage and storage requirements and ensure no needed data is left behind in the case of migrations. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 11 Gathering the current storage configuration is also one of the first steps in any design. Tools vary depending on the current storage platform. Some of these tools will be discussed in the next module. Once the data is gathered, correlate this information with the emcgrab information. This will help ensure that no storage is unaccounted for. Discussions with the Customer will also determine whether additional storage needs to be allocated for growth. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 12 VNX storage access is done via Block and/or File protocols. A clear distinction should be made between the storage used for the different protocols. Proper design will require special performance considerations for File vs. Block data, as discussed later in this course.
Gathering File storage data means including the underlying storage, the connectivity between the NAS servers/Data Movers and the backend storage, and the network share and NFS export information. For environments with Active Directory, LDAP domains, and NIS, these environmental details must be documented and incorporated in the final design. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 13 Connectivity between hosts, storage, and remote sites is critical to a proper design. Ethernet networks are often used for management of the environment as well as for iSCSI, CIFS, and NFS traffic. Fibre Channel (FC) networks will carry the Fibre Channel data. Converged networks will account for both Ethernet and FC data. Documenting converged networks still requires the same due diligence in gathering both the Ethernet and FC configurations. VLANs, VSANs, and any special networking should be noted and incorporated into the final design. Proper design of remote replication solutions often requires a network assessment to determine bandwidth, latency, and packet loss. If this is required for the environment, it is good to schedule it early in the project. Creating a diagram of the current configuration will help ensure a good understanding of the environment and is a good first step to designing a network solution. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 14 Backups are an often overlooked part of VNX designs. Determining requirements and discussing any current issues are critical to a successful design. Double-check whether backup configuration is in scope of the project, as this can be contentious. Archiving may also be part of the scope. Setting up proper connectivity and implementing the desired retention policies require that all requirements be gathered and documented. Designing replication can be simple or complicated depending on the requirements. All volumes needing replication must be accounted for. Consistency among volumes needs to be documented and designed for, and replication methodologies must be chosen to properly meet the RTOs and RPOs. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 15 E-Lab Advisor provides a simple method of processing the gathered configuration data. The outputs can be used for planning designs, migration plans, and remediation. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 16 E-Lab Advisor is a web-based tool which combines the functionality of the HEAT, SWAT, Celerra Health Check, SYMAPI Log Analyzer, and SANsummary tools. HTML- or PDF-formatted reports can be produced via the upload of data collections (EMCGrab for UNIX, EMCREPORTS for Windows, supportshow for Brocade, show tech-support details for Cisco, product information or Connectrix data collection for McData). These reports can be produced in a few seconds (excluding upload time), saving an average of two hours per report compared to the previous methods. E-Lab Advisor is a complete rewrite of a number of decommissioned tools. Existing tool functionality was retained and new functionality was added.
E-Lab Advisor allows users to:
• Automate high-level health checks for host and fibre switch environments
• Compare actual host configuration against the EMC e-Lab Support Matrix (ESM)
• Compare actual host configuration against the EMC Simple Support Matrix (ESSM)
• Send an XML file to CCA5 for Windows, Solaris, and HP-UX environments
• Generate SANsummary XLS reports to document customer environments
• Generate SANsummary XLS reports for host and switch import into GSSD tools
E-Lab Advisor provides dashboard functionality which allows the user to have a high-level view of the severity of issues before opening a report. SANsummary reporting has been integrated into the E-Lab Advisor web application, removing the need to maintain locally installed software. By using SANsummary's Excel-formatted spreadsheets, users can also import host and fibre channel switch data into EMC Migration Planner (EMP) and Networked Storage Designer (NSD-U), thereby reducing manual data entry. (Continued) Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 17 Major benefits for EMC technical personnel include:
• Standardized reports for customer deliverables
• Automation for E-Lab Navigator supportability checks
• Simplification via a single integrated utility
Major benefits for customers include:
• Improved Web Interface merges multiple existing tools/URLs into a single interface
• Standardized formatting for all reports allows cut and paste from multiple reports without having to reformat, saving remediation time
• Single upload page simplifies the end-user experience
https://elabadvisor.emc.com Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 18 Any necessary remediation of the environment should be communicated to the customer as early as possible. Remediation, such as switch code upgrades, can be difficult to schedule and can lengthen a project timeline if not quickly addressed. The output of E-Lab Advisor should help in producing the remediation document to give to the customer. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 19 This lesson covers interviewing key personnel. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 20 Covering all bases requires discussion with all vested parties. Anyone who will use, manage, support, and base business decisions on the VNX storage environment should be interviewed to ensure that all aspects of storage design have been accounted for. People and companies do not generally seek out technology; they seek out solutions to their needs or pain points. Technology for the sake of technology is not a great marketing strategy. When customers buy from us, they are really buying outcomes, feelings, results, and solutions. In many cases, personnel at the CxO level will view objectives from a business point of view, while administrators are likely to be more technical in nature. It is important to bear in mind that rivalries often exist between various disciplines, and that employees may regard certain procedures as being part of their exclusive responsibility. They may work actively to preserve what they perceive as their personal domain, and may be hostile to, or even try to sabotage, attempts to move responsibilities to other areas. Recent EMC software, e.g. the plugins for VMware, allows server or virtualization administrators to provision their own storage – previously the sole preserve of storage administrators. Copyright © 2012 EMC Corporation.
All rights reserved Module 1: Data Gathering 21 IT often involves both intangible and complex sales. Solutions are often invisible—they run in the background, the hardware is seldom seen, and the majority of personnel may not even know they are there. Storage is a vital part of all solutions, so most customers are, to some extent, in the high technology and storage business. People from vastly different roles are involved in the purchase of high tech solutions. For instance, if you were selling storage management software to a customer, the following personnel could be involved in the decision making and evaluation process:
• CEO, CIO, CTO roles
• Finance/Accounting
• The training department that will train users
• IT personnel who will help integrate and support the software
Each of the roles needs to have the solution explained differently:
• CxO roles want to know the big picture, the bottom line, and ROI. They will not want technical details; they should not be asked questions related to those details.
• Finance and accounting will be concerned about capital costs, warranties, process management, risk, and on-going fees.
• The training department will want to know about implementation, training tools and materials, and how it will impact their ability to educate the users.
• The IT department will want to know specifics on scalability, ease of use, and many other technical issues.
Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 22 Most people are comfortable communicating with one or two of these types of stakeholders. Good architects and salespeople know how to address the needs of each of these groups, and communicate the benefits in a way that is unique and pertinent to each stakeholder type. There are several other important attributes as well:
Knowing the customer market: Understanding the customer market and niche is important. This includes the customer's unique circumstances, competitive environment, and business processes. If this knowledge is at hand, it is possible to easily identify the core business challenges our solution can address.
Knowing the specific customer: Each customer will have unique business challenges and processes that need the support of technology in a special way. Factors that affect the solution needed will depend on their stage of business growth, existing business processes, corporate goals, immediate and long term pain points, as well as management and operations philosophy.
Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 23 Be aware of the solution strengths and limitations: In order to help the customer and become a true resource and problem solver, we must understand the weaknesses and limitations of the solution in addition to its strengths and capabilities. Customers may have unreasonable or ill-defined expectations — by understanding our limitations and communicating them effectively we can dispel any misconceptions. Small misunderstandings at the beginning of a project may lead to major dissatisfaction later. On the other hand, by being aware of the benefits of the solution we maximize revenues and customer satisfaction.
Aim to solve the customer problem: This is an integral part of selling technology. New technologies often stem from problems for which there is no solution.
Become a trusted advisor: Being seen as a trusted advisor can be a real advantage.
To be seen as a consultant, one needs familiarity with needs analysis, must be a subject matter expert, have a high level of rapport with customers, and look for ideal solutions — not the boilerplate solution to client problems. Most large high-tech deals require a number of people to help close the sale. Technical and support staff may need to interact with key staff in the customer company. Once the sale is closed, there will be a need to continue to monitor these interactions to ensure fulfillment of commitments made to the customer. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 24
Costs difficult to justify: Some ideas may work out to be too expensive. The only solution may be to add a touch of realism to make the solution affordable.
Implementation: How will it work in practice? How will it interface with existing business processes? Is it new and untested?
Benefits difficult to measure: New ideas often demand new skills and new thinking. That can make costing them difficult. Training and education costs may also have to be considered.
Risky solutions: New ideas often depend on new and possibly untested technologies. This will lengthen the implementation time, which delays the benefits and increases the costs. The customer may also want performance and operational guarantees, which could add cost to the solution.
Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 25 This lesson covers documenting expectations. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 26 Use server and storage data along with the information learned from interviewing key personnel to document the capacity expectations. Capacity should be documented on a per-LUN basis. Final LUN size should be the current capacity plus any growth the Customer wants to build into the design. Growth can also be accomplished by adding brand new LUNs from the new storage. The application, environment, and migration methods will all play a part in determining whether growth will be accomplished by growing current LUNs or adding new LUNs to the server(s). Expandability is an important design criterion and should be discussed with all key personnel. The ability to grow and shrink data usage will significantly affect Customer satisfaction. The network is also an important area to consider for growth. New servers will require switch ports and bandwidth. Documenting current headroom in the environment and working with the Customer to determine future growth will help determine the final design. Determine performance requirements on a server-by-server basis. If performance needs to be increased, clearly document this so that the final solution will meet the expected goals. Often the performance requirement is for the final performance to be equal to or better than the current performance. As such, ensure good baselines of current performance. By comparing the current performance to the final performance, there will be a clear indication of whether the performance expectations were met. If the Customer has baselining tools, these are good methods of gathering application-level performance. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 27 Availability expectations will determine the storage, network, and server design. Ensure that all parties have indicated their availability needs on a storage, server, network, and application level. Security is also an important part of VNX design.
Clearly documenting the level of security required will allow proper design of administration and support access to the environment. Management expectations are also critical for Customer satisfaction. Determining who will manage the devices, from where, and what access they will need enables the proper setup of the management environment. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 28 This lesson covers how user data will be accessed. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 29 When designing migrations, interviews with key personnel should include discussions of data access methods. Often the data access method will be maintained after the migration. If a change in access method is required, care must be taken to ensure proper connectivity and performance. An example of changing access methods would be a migration from a CX3 to a VNX array where the servers originally used Fibre Channel to access the CX3 and will use FCoE to access the VNX. For new implementations, data access methods must be determined by discussions with the Customer. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 30 This lesson covers determining how data is protected. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 31 For environments with data archiving, primary and secondary storage must be properly designed and configured for smooth operations. Gathering current archiving information plus any desired changes will start the archiving design portion. Connectivity between primary and secondary storage must account for redundancy and bandwidth requirements. Migrating current archived data may or may not require recalling the archived data before the migration. Check the documentation of the specific platforms to determine migration paths and any special migration considerations. Shown here is an example architecture for EMC CTA archiving to an EMC Centera or Atmos system. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 32 Although the VNX does not provide backup software, it is important to consider the backup and restore environment when designing a VNX solution. Physical spindles may need to be dedicated to backup servers or archival locations. Software such as SnapView may need to be considered to offload the backup workload to dedicated backup servers. Replication is also a means of protecting information by providing remote site copies of the data. The VNX series offers all the replication methodologies needed to keep data available and secure: VNX SnapView for block and VNX SnapSure for file for local protection; VNX SAN Copy, offering both push and pull, incremental and full, local and remote copies from and to VNX, CX, and NS Series arrays; VNX MirrorView, both Synchronous and Asynchronous, from and to VNX, CX, and NS Series arrays; and VNX Replicator for file system level replications from and to VNX and NS Series arrays. VNX Replicator is capable of 1-to-1, 1-to-many, many-to-1, and cascading replications. Replicated copies managed by RecoverPoint/SE or other standard EMC replication utilities are not only vital for DR and data protection solutions, they can also be very useful for application testing, offloading backups, reporting, and performing software upgrade tests. Replication Manager fully integrates with RecoverPoint/SE and these replication utilities to enable automated, application-consistent copies.
Combine this with the visibility provided by Data Protection Advisor for Replication to monitor and alert on recovery gaps for remediation. One can also prove compliance with protection policies through integrated reporting. Note: Data Protection Advisor does not support VNX Replicator or SnapSure. Gathering current backup, archiving, and replication methodologies, along with discussions with key personnel, will provide the basis for the final data protection design. Details of designing data protection will be covered later in this course. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 33 The VNX series offers all the replication methodologies needed to keep data available and secure. Integrated with Unisphere, RecoverPoint/SE offers several data protection strategies. RecoverPoint/SE Continuous data protection (CDP) enables local protection for block environments. RecoverPoint/SE Concurrent local and remote (CLR) replication enables concurrent local and remote replication for block environments. RecoverPoint/SE Continuous remote replication (CRR) enables block protection as well as a file-level site DR solution. This enables failover and failback of both block and file data from one VNX to another. Data can be synchronously or asynchronously replicated. During file failover, one or more standby Data Movers at the DR site come online and take over for the primary location. After the primary site has been brought back online, failback allows the primary site to resume operations as normal. Configuration can be active/passive as shown here or active/active, where each array is the DR site for the other. Deduplication and compression allow for efficient network utilization, reducing WAN bandwidth requirements by up to 90% and enabling very large amounts of data to be protected without requiring large WAN bandwidth. RecoverPoint/SE is also integrated with both vCenter and Site Recovery Manager (SRM). SRM integration is block only. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 34 Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 35 Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 36 Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 37 Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 38 This module covered gathering data to prepare for the analysis phase of a VNX design. Copyright © 2012 EMC Corporation. All rights reserved Module 1: Data Gathering 39 This module focuses on tools used to gather data and validate an environment. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 1 This lesson covers VNX utilities and host utilities used to gather data. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 2 These are the VNX utilities. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 3 The Unisphere GUI displays real-time information about a number of object types. This information cannot be saved to disk, and is displayed in text form on the dialogs where it appears. These statistics are only displayed if Statistics Logging is enabled at the storage system level, and are accessible through the Properties page of the object (or the Status page for Incremental SAN Copy). Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 4 Most of these statistics have names that are self-explanatory.
Some additional notes follow: Cache Hit Ratios are calculated values that divide the number of cache hits by the number of reads or writes. They are a measure of how well the cache subsystem is performing. The Number of Unused Blocks Prefetched is a measure of how efficient prefetches are. Lower numbers are better. The Write Cache Flush Ratio is calculated by dividing writes that caused a flush by the total number of writes – essentially a look at whether or not this LUN is causing forced flushes. Utilization is a calculated number that determines how busy an object is. This parameter may not always give an accurate picture; if it is incorrect, it is likely to be too high rather than too low. Note that a LUN will be regarded as busy if 1 or more of its disks are busy with a task related to that LUN. Stripe Crossings and Disk Crossings (Analyzer only) indicate that an I/O has spanned 2 or more disks, or spanned 2 or more stripes. This may be an indication that data is misaligned. See the previous slide that discussed disk and stripe crossings. Note that the concept of stripe crossings is meaningless for a Pool LUN, and the option is not displayed. The Number of Trespass parameter counts the number of times a LUN has been trespassed. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 5 These new LUN parameters show I/O through the owning and non-owning SP for a LUN, and the number of times that I/O has been rerouted. These parameters are displayed on the LUN Statistics tab only if Failover Mode 4 (the mode that supports ALUA) is selected. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 6 These statistics are similar to the equivalent statistics for ordinary LUNs. Note that, as with ordinary LUNs, ALUA statistics will be displayed if the host supports failover mode 4. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 7 Most of these statistics have names which are self-explanatory. The Number of Write Cache Pages is the number of pages allocated to this SP (initially half of the total number of write cache pages). Dirty Pages is a measure of how many write cache pages contain modified data which has not yet been flushed to disk. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 8 These statistics are self-explanatory. Note that, as is the case with some other counters, utilization may be incorrect if the disk is being used by both SPs. VNX OE for Block, and its predecessor FLARE, have used different methods of calculating utilization when objects are used by both SPs; generally, the object will be as busy as, or busier than, reported by the SP with the highest Utilization number. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 9 These statistics are identical to those available from Unisphere Analyzer. One can determine at a glance what impact the Session will have on the Source LUN by looking at the reads from the Source LUN. These show additional reads performed as a result of the session running on the Source LUN. The Writes Larger than Reserved LUN Pool Entry Size measures the number of writes that exceed 64 kB in size (the size of a chunk). Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 10 Though this information may be regarded as not being true performance information, it does give one an idea of how much data is being copied, and at what rate the copy is proceeding. Copyright © 2012 EMC Corporation.
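The calculated statistics described above are simple ratios, and it can be useful to reproduce them when working from raw counter dumps rather than the GUI. The sketch below is illustrative only; the counter values are hypothetical and the variable names are not Unisphere field names.

# Illustrative arithmetic for the calculated statistics described above.
# Counter values are hypothetical; in practice they would come from
# Unisphere statistics dialogs or Navisphere Secure CLI output.

def ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0

read_hits, total_reads = 8200, 10000        # reads satisfied from cache
write_hits, total_writes = 4700, 5000       # writes absorbed by cache
flush_writes = 150                          # writes that caused a flush
unused_prefetch_blocks, prefetched_blocks = 1200, 9000

print(f"Read cache hit ratio:     {ratio(read_hits, total_reads):.2%}")
print(f"Write cache hit ratio:    {ratio(write_hits, total_writes):.2%}")
print(f"Write cache flush ratio:  {ratio(flush_writes, total_writes):.2%} (lower is better)")
print(f"Unused prefetch fraction: {ratio(unused_prefetch_blocks, prefetched_blocks):.2%} (lower is better)")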
All rights reserved Module 2: Tools 11 A number of Navisphere Secure CLI commands return the current values of running counters. These raw values may be massaged into a usable state. Some of those are discussed in following slides. Note that, unless the security file has been created, all commands will contain the username, password and scope. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 12 Note the ‘Prct Cache Pages Owned’ field – SPA owns 51% of the write cache pages, so has been marginally busier than SPB. Page assignment, which is checked every 10 minutes, has been modified to give the busiest SP more of the available cache. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 13 The counters shown here, such as Total Reads and Total Writes, are cumulative counters. They can be reset with a CLI command to enable a clean start. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 14 The getdisk command displays disk information and status, as well as several cumulative counters related to reads, writes, and errors. The Private line shows the block number at which the LUN begins: LUN 500, for example, starts at block 69704 – the preceding blocks (~ 34 MB) are used internally by VNX OE for Block. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 15 The getlun command displays a wealth of information about a LUN or LUNs. Included are the cumulative counters for reads and writes, broken down by I/O size; these are the same statistics displayed by Analyzer’s I/O Size Distribution views. Note that data is not broken down by optimal/non-optimal path. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 16 Navisphere Secure CLI returns MirrorView and SAN Copy data similar to that returned by Unisphere. This information falls into the category of status information rather than performance information. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 17 This lesson covers VNX utilities and host utilities used to gather data. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 18 The basics of Unisphere Analyzer should already be familiar to you. In the next few slides we’ll be looking at the performance parameters monitored by Analyzer, and defining several of them. Some counters include information broken down by optimal and non-optimal paths. Though the main object types for performance analysis are the disk, LUN and SP, note that other objects may also be selected. The RAID Group, host and Storage Group objects allow a subset of the VNX LUNs to be selected and displayed. Viewing a RAID Group will show cumulative values for the disks and LUNs in that RAID Group. Counters for SP front-end ports allow more precise determination of where I/O is going on the SP; these counters include a queue full counter, designed to aid troubleshooting in very large environments with very busy hosts. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 19 These are the performance parameters that Analyzer will display for a Traditional LUN. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Items marked + display the parameter, and the parameter for the optimal as well as non-optimal paths. The optimal/non-optimal information is only displayed if the Advanced checkbox is checked. Some of these parameters have names that are self-explanatory. 
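Because the Secure CLI counters described on the preceding slides (for example, Total Reads and Total Writes from getlun) are cumulative, turning them into rates requires two samples and the elapsed time between them. The minimal sketch below assumes two hypothetical snapshots of those counters have already been parsed out of the CLI output; it is not tied to any particular parsing code.

# Turn cumulative Secure CLI counters (e.g. Total Reads / Total Writes from
# getlun) into per-second rates. The two snapshots below are hypothetical;
# they would normally be parsed from naviseccli output captured at the start
# and end of a measurement interval.

INTERVAL_SECONDS = 600  # assumed ten-minute sample window

sample_t0 = {"Total Reads": 1250000, "Total Writes": 480000}
sample_t1 = {"Total Reads": 1730000, "Total Writes": 655000}

for counter in sample_t0:
    delta = sample_t1[counter] - sample_t0[counter]
    print(f"{counter}: {delta / INTERVAL_SECONDS:.1f} per second "
          f"({delta} operations in {INTERVAL_SECONDS} s)")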
Those that do not, and those that have functions that are not obvious, are discussed below. • Used Prefetches – if any part of the prefetched data is used, it is counted as a used prefetch. High numbers are better • Read Cache Hits/s – read requests satisfied from either read cache or write cache, broken down into Reads from Write Cache/s and Reads from Read Cache/s. The former gives an indication of how many writes are being re-read; the latter gives an idea of prefetching efficiency • Write Cache Hits/s – writes that do not require physical disk access. These may be ‘new’ writes that did not cause forced flushes, or writes made to data still in cache (rehits) • Write Cache Rehits/s – writes made to data still in write cache, and which has not yet been flushed to disk. This gives an idea of the locality of reference of writes • Average Busy Queue Length – counts the length of the queue, but only when the component is already busy. This helps to indicate how bursty the I/O is • Service Time (ms) – time taken to service a request. This does not include the time spent waiting in the queue Note that there are separate categories for SP cache (RAM-based) and FAST Cache (Flash drive based). Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 20 These are the performance parameters that Analyzer will display for a Pool LUN. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Items marked + display the parameter, and the parameter for the optimal as well as non-optimal paths. The optimal/non-optimal information is only displayed if the Advanced checkbox is checked. Some of these parameters have names that are self-explanatory. Those that do not, and those that have functions that are not obvious, are discussed below. • Average Busy Queue Length – counts the length of the queue, but only when the component is already busy. This helps to indicate how bursty the I/O is • Service Time (ms) – time taken to service a request. This does not include the time spent waiting in the queue Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 21 These are the performance parameters that Analyzer will display for a metaLUN. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Items marked + display the parameter, and the parameter for the optimal as well as non-optimal paths. The optimal/non-optimal information is only displayed if the Advanced checkbox is checked. Most of these parameters are identical to those used for standard LUNs. There are 2 parameters not used for Traditional LUNs – LUN Read Crossings/s and LUN Write Crossings/s. These are similar in concept to the Disk Crossings/s used for standard LUNs, except that in these cases, an I/O has accessed more than one LUN. Bear in mind that, as previously mentioned, a metaLUN is a large RAID 0 LUN composed of other LUNs, which will often be RAID 5, RAID 6, or RAID 1/0. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 22 These are the performance parameters that Analyzer will display for an SP. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Some of these parameters have names that are self-explanatory. Those that do not, and those that have functions that are not obvious, are discussed below. 
• SP Cache Dirty Pages (%) – the percentage of write cache pages owned by this SP that were modified since last being read from disk or written to disk • SP Cache Flush Ratio – the ratio of flush operations to write operations, or, put differently, the ratio of back-end write operations to front-end write operations. Low numbers mean better performance • SP Cache High Water Flush On – the number of times in a sample period that dirty pages reached the High Watermark. This is a measure of front-end activity • SP Cache Idle Flush On – the number of times in the last sample period that idle flushing has been used to flush data to LUNs. This is an indication that one or more LUNs are idle • SP Cache Low Water Flush Off – the number of times that watermark processing was turned off because the number of dirty pages reached the Low Watermark. This number should be close to the value for High Water Flush On. If it is not, it is an indication that a high level of front-end I/O activity is preventing the cache from being flushed Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 23 These are the performance parameters that Analyzer will display for an SP port. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Some of these parameters have names that are self-explanatory. Those that do not, and those that have functions that are not obvious, are discussed below. • Queue Full Count – the number of Queue Full events that occurred for a particular front-end port during a polling interval. A queue full response is sent to a host when the port receives more I/O requests than it can handle. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 24 These are the performance parameters that Analyzer will display for a disk. Because a RAID Group is simply a collection of disks, the same parameters are displayed for a RAID Group . Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Some of these parameters have names that are self-explanatory. Those that do not, and those that have functions that are not obvious, are discussed below. Average Seek Distance (GB) – a measure of randomness of the I/O pattern to the disk. Low numbers mean that data is more sequential. Note that this value should be compared to the physical disk size to be meaningful. Note also that this parameter is disk-specific, not LUN-specific, so for Storage Pools with multiple LUNs it is not possible to accurately determine the randomness of a single LUN. For a Pool, because of the way data is distributed, this data is meaningless at the LUN level. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 25 These are the performance parameters that Analyzer will display for a Pool. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 26 These are the performance parameters that Analyzer will display for a SnapView Session. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Note that Analyzer refers to the Snapshot Cache, where Unisphere and the CLI refer to the Reserved LUN Pool. Some of these parameters have names that are self-explanatory. Those that do not, and those that have functions that are not obvious, are discussed below. • Writes Larger Than Cache Chunk Size – strictly, writes to the Source LUN that resulted in 2 or more Reserved LUN chunks being used. 
• Chunks Used in Snapshot Copy Session – the number of Reserved LUN chunks used by this Session Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 27 These are the performance parameters that Analyzer will display for a MirrorView/Asynchronous mirror. Items marked with an asterisk (*) are only displayed when the Advanced checkbox is checked. Some of these parameters have names that are self-explanatory. Those that do not, and those that have functions that are not obvious, are discussed below. • Total Bandwidth (MB/s) and Total Throughput (I/O/sec) – these refer to bandwidth and throughput of the updates made to the secondary image from the primary. They are therefore a measurement of MirrorView traffic across the link between the storage systems • Average Transfer Size (KB) – average size of update I/Os sent from primary image to secondary image • Time Lag (min) – a measure of how far the secondary image is, in time, behind the primary • Data Lag (MB) – a measure of how much data on the primary image is different to that on the secondary image • Cycle Count - the number of updates that completed during the polling interval • Average Cycle Time (min) - the average duration of all updates that finished within the polling interval Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 28 This summarizes information with which you should already be familiar. Note that Analyzer menu options also appear when a host or Storage Group is right-clicked; these options may be used to filter the LUNs being displayed. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 29 The Analyzer CLI commands allow scripted control over the Analyzer polling interval, and the starting and stopping of logging. These Secure CLI commands require that a security file be created, or each command will require username, password and scope. The standalone archiveretrieve command allows scripted retrieval of log data from a storage system. Once the data is stored on the host as a NAR file, the archive dump utility will produce a report, formatted as per user choice, from the raw data, and write it to a file. The archivemerge command may be used to merge 2 NAR files into a single one for viewing; the original files are unaltered. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 30 The –status command displays the Analyzer logging status – started or stopped. The archive –list command lists all archive files, and ignores all other switches. The –path switch allows a folder to be specified; by default archives are saved in the current folder. The –delete switch allows deletion of archive files. The archive –new command starts a new archive (if more than 10 samples have been collected), or returns the name of the newest archive file. The –statusnew switch returns the status of the newly created archive. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 31 The -archiveretrieve command allows scripted retrieval of log data from a storage system. Once the data is stored on the host as a NAR file, the -archivedump command can be used to produce a report, formatted as per user choice, from the raw data, and write it to a file. The archivemerge command may be used to merge 2 NAR files into a single one for viewing; the original files are unaltered. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 32 This lesson covers VNX utilities and host utilities used to gather data. Copyright © 2012 EMC Corporation. 
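The scripted archive workflow described on the preceding slides (retrieve a NAR file with -archiveretrieve, then produce a report with -archivedump) can be driven from a small wrapper script. The sketch below is a rough outline only: the analyzer command names come from the course notes, but the sub-switch names used here (-file, -location, -data, -out), the SP address, and the file names are assumptions and placeholders that should be verified against the Secure CLI reference for the installed release. A security file is assumed to exist, so no username, password, or scope switches are passed.

# Rough outline of scripting Analyzer archive retrieval and dumping with
# Navisphere Secure CLI. The -archiveretrieve and -archivedump command names
# are taken from the course notes; the sub-switches below are assumptions --
# confirm them against the Secure CLI reference before use.
import subprocess

SP_ADDRESS = "10.0.0.1"          # placeholder SP management address
NAR_FILE = "spa_archive.nar"     # placeholder archive name
REPORT_FILE = "spa_archive.csv"  # placeholder report name

def run(args):
    print("Running:", " ".join(args))
    subprocess.run(args, check=True)

# Retrieve the archive from the storage system to the local host.
run(["naviseccli", "-h", SP_ADDRESS, "analyzer", "-archiveretrieve",
     "-file", NAR_FILE, "-location", "."])

# Dump the raw NAR data to a report file for offline review.
run(["naviseccli", "-h", SP_ADDRESS, "analyzer", "-archivedump",
     "-data", NAR_FILE, "-out", REPORT_FILE])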
All rights reserved Module 2: Tools 33 Unisphere Service Manager (USM) is a GUI-based tool used to perform service tasks on the VNX Series systems. USM is a Java-based application that runs on a Windows PC and communicates over the Internet to Powerlink and over a local network to the VNX system. The USM Software Upgrade tool can download File and Block software packages from Powerlink. Also, the USM Software Download wizards guide the administrator, step by step, through the process of upgrading the VNX software, whether File, Block, or Unified. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 34 The Unisphere Service Manager (USM) can be launched from the Service Tasks section under the System tab of Unisphere. If USM is not installed on the management station that you are using, Install Anywhere can automatically prompt you to install it; however, you must have a valid Powerlink account. USM is also available on Powerlink or a VNX Installation Toolbox CD. Once the installation is completed, USM will open, or you can click the Unisphere Service Manager icon on the desktop of the management station. Note: If started from the "Launch USM" option within Unisphere, USM is automatically logged in using the same authentication and privileges as in Unisphere. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 35 In the USM Home screen, the top menu will show four options: (1) the Home tab, which is the screen you are currently on; (2) the Advisories tab, which provides any advisories for all system models; (3) the Downloads tab, which provides a direct link to Powerlink to download the necessary software files; and (4) the Support tab, which is the same as in Unisphere and makes your service and maintenance experience simple. The login options will allow you to log in to either a VNX series platform, a CLARiiON storage system, or a Celerra system running at least DART 5.6. The Reports section makes it easier to view the already-downloaded repository files. When requested, you can also use it to submit a system configuration report to EMC Support personnel. You can also use the Reports section to generate a new system configuration file for your system. Without logging into USM, let's see what else you can do. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 36 The screen shown here is the Downloads tab without logging into any systems. You are provided with four selections: (1) Download VNX Software Updates, (2) Download CLARiiON Software Updates, (3) Download Celerra Update Files, and (4) Download Disk Firmware Packages. Each of the first three selections allows you to download software or file updates for the specified system, and Download Disk Firmware Packages allows you to download selected disk firmware packages. For these selections to work, you need Powerlink access; therefore, you may need to authenticate on the management station. Now that you have seen the capabilities of USM without logging in, let's go through the process to log in to see what else is available. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 37 Besides providing the IP address, username, and password, there are three different scopes available when logging into USM: global, local, and LDAP. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 38 Once you are successfully authenticated, you will automatically see the System view of USM. You have five options at the top of the page: the System, Hardware, Software, Diagnostics, and Support tabs.
Under System you get instant access to: installing or replacing hardware components using the Hardware selection, installing or updating block/file code using the Software selection, and verifying the storage system and gathering diagnostic data using the Diagnostics selection. On the right-side panel, you can verify the system's information, log out of the system, and submit or view the system configuration through the System Reporting tool. User authentication is not needed to view the existing configuration file using the System Reporting Tool. On the lower left side of the screen, you will find an Advisory icon along with a Certificate icon. The Certificate icon has a counter to display how many certificates are currently active for that specific system through USM. On the lower right side of USM, you should be able to see the username that is currently logged in to USM, which is sysadmin in the example shown here. Overall, USM is a GUI-based tool that can be used to maintain the VNX system. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 39 USM can be used to maintain multiple different platforms across the EMC portfolio. Here we see a table of all the current wizards available from within USM and the platforms that are supported. Now that you have seen USM as a whole, let's see some of the wizards and tools available. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 40 The first utility you will learn about is the System Reporting Tool. It allows you to view the system configuration, submit the configuration to EMC with ease, and view the Repository. Once you open USM, the System Reporting Tool is located in the lower right-hand corner under a section titled Reports. By selecting View System Configuration, the System Reporting Tool will pop up. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 41 When the System Reporting window opens, you can see that you can customize it. The System Configuration Wizard allows you two options:
- Generate system configuration: you need to be authenticated in order to access this option. The wizard will collect the system configuration from the VNX system and put it all into one file in one of two format choices.
- Existing system configuration: there is no need to authenticate because the tool will use an existing (xml or zip) file to output the system configuration file in the specified format of your choice.
You can choose to add additional content to the report by selecting the checkbox next to Configuration Analysis to get the report analyzed. This wizard also allows you two choices for the output file: HTML and/or Microsoft Excel. The Microsoft Excel option requires that Excel be installed on the management station that you are using; you can open the file once the tool completes the process. Once the tool has compiled the configuration report in the specified format, it places the file in the repository and provides a button to view it. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 42 The system report's output is an easy-to-read document that provides an overview of the VNX system's configuration. Within the selected area, there is the system's serial number, the name of the configuration, and the version of the software currently installed on the system.
You also get:
- Selected general information about the system under the General tab
- Selected hardware-based information under the Hardware tab
- Detailed information about the available software and enablers installed on the system under the Software tab
- An overview of the user-created storage groups under the Storage tab
- Selected information about all the hosts, initiators, and virtual machines connected to the system under the SAN tab
Each one of the tabs has multiple sub-selections. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 43 Now you will learn about the Hardware tab within Unisphere Service Manager. It includes the tools to increase connectivity and storage, and to replace a faulted disk. You are provided with two options under the Hardware tab of USM. As the name reads, the Hardware tab takes care of selected VNX hardware components. These two options are Hardware Installation and Hardware Replacement. As of this release, Hardware Installation allows you to: (1) increase the total storage capacity by installing additional Disk Array Enclosures in the system and (2) add additional connectivity by installing additional I/O Modules and/or SFPs. The Hardware Replacement option allows you to replace a faulted disk. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 44 One of the capabilities that the VNX series platform has is the ability to grow in storage capacity and connectivity as long as the maximum threshold has not been met for each. USM facilitates this through its many wizards. Two of them are presented under the Hardware Installation options; they are:
- Install Disk Array Enclosure: USM helps determine whether or not you can expand the capacity of the system by adding more DAEs. If approved, it also advises on bus and enclosure location for the DAE.
- Install I/O Module and/or SFPs: USM helps determine whether or not you can install I/O Modules and/or SFPs into the system. Although it does not advise on which ports to install the I/O Modules and/or SFPs in, it is best practice to verify with a configuration guide or the Pocket Reference before designating a slot for a specified I/O Module. For more information about I/O Module assignment, please check with the VNX Procedure Generator.
Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 45 While Hardware Installation allows you to add more storage capacity, Hardware Replacement gives you the capability to replace a faulted disk. First, USM verifies that a faulted disk does exist before issuing any replacement command to the system. The Replace Faulted Disk option, once selected, opens the Disk Replacement Wizard (DRU). The Disk Replacement Wizard is a self-explanatory program, and it provides a step-by-step procedure for identifying whether a faulted disk can be replaced or not. For more information on how to replace a faulted disk, check the VNX Series Platform Maintenance and Troubleshooting course in the Education Services database. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 46 Now you will learn about the Software tab within Unisphere Service Manager. It includes the wizards used to download system upgrade files and packages, disk firmware upgrades, and other tools to service a VNX system and/or other legacy systems. As a service tool, USM also has the capability to update the VNX system. Under the Software tab, there are three options: System Software, Disk Firmware, and Downloads.
The Disk Firmware option opens the Online Disk Firmware Wizard. If an update is available and you do not have it in the repository, the wizard allows you to perform the update via the Install Update option. Note: this option requires that the management station is connected to the Internet. If an update is available and you do have it in the repository, you can select Install from Local Repository. Whether you choose to update the disks from the repository or from online, the disks are updated online with minimal impact to an active environment. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 47 The second option under the Software tab is Download. With the Download button, and a valid Powerlink account, USM can download packages for the VNX system. Upon selecting the Download button, you have the option to do one of two things: (1) Download VNX Software Updates or (2) Download Disk Firmware Packages. The Download VNX Software Updates option lets you download all the software necessary for the VNX system to the local repository. The list of available software includes the Unisphere Client, the Unisphere Initialization Tool, and Navisphere Secure CLI. The Download Disk Firmware Packages option lets you download disk firmware packages for later use. It does not let you update disk firmware; for that you must use the Disk Firmware option under the Software tab. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 48 From time to time, every system needs to be updated. The VNX system is no different, and USM plays a major part in the process. USM is the only GUI-based tool for servicing the VNX system, and its System Software option is the only tool that updates all types of VNX systems. Besides updating the VNX system, it can also be used to install additional enablers on the system using the Install Software option. Alternatively, there are separate command-line methods for updating the VNX for block or file system software; however, USM is the recommended tool for updating all models and types of the VNX Series family. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 49 Once you select Install Software, this window pops up. You need to verify that you are logged in to the system you intend to update. Then you need to specify what type of VNX system you are updating: VNX for file, VNX for block, or Install VNX OE (both). If you choose to install both, the system updates VNX for file first, then VNX for block. If you select VNX for block, you need to check the EMC Support Matrix for compatibility first. Although you are given the option to select either VNX for file or VNX for block, you should never update VNX for block before updating VNX for file. Doing so could greatly impact the environment, including the Control Station losing access to the Storage Processors. The best approach is always to update VNX for file first, or to check with the VNX Procedure Generator before proceeding with the update. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 50 Now you will learn about the Diagnostic tab within Unisphere Service Manager. It includes the steps to collect diagnostic data, including SPCollects and log collection. USM’s Diagnostic tab helps you keep an eye on the system’s status by checking the functionality of every Field Replaceable Unit (FRU) and Customer Replaceable Unit (CRU), as well as the logs. The Verify Storage System button also checks back-end functionality to determine whether the system is functioning normally.
Once the verification is completed, most faults can be taken care of right on the screen. The Verify Storage System button helps you determine the system’s operational status with results on the screen, but if you want to gather information about the system, including SPCollects and log collection from the Control Station, you need the Capture Diagnostic Data button. This button opens the Diagnostic Data Capture wizard. The wizard’s purpose is to initiate the SPCollect process on each SP and the log collection process on the Control Station. Once they are captured, the wizard transfers them to the local repository or to a location of your choice. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 51 This lesson covers VNX utilities and host utilities used to gather data. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 52 The Control Station CLI commands allow management, configuration, and monitoring of the VNX system (a representative set of data-gathering commands is sketched a little further on). Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 53 These are the categories of available tools in the VNX space. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 56 This slide shows the workflow in a typical VNX sales and deployment cycle. Specific tools are listed in the phases where they are used. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 73 This lesson covers VNX utilities and host utilities used to gather data. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 81
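As an illustration of the Control Station CLI commands referred to above, the following is a minimal sketch of data-gathering commands run while logged in to the Control Station. The Data Mover name server_2, the sampling interval, and the count are illustrative, and the statgroup names and option spellings should be confirmed against the server_stats man page for the installed release.

    # List the configured Data Movers and their state
    nas_server -list
    # List file systems, disk volumes (dVols), and storage pools known to the Control Station
    nas_fs -list
    nas_disk -list
    nas_pool -list
    # Basic Data Mover resource statistics (CPU, memory, network) for server_2
    server_sysstat server_2
    # Sample the NFS workload seen by server_2 every 10 seconds, six times
    server_stats server_2 -monitor nfs-std -interval 10 -count 6

Output from these commands, redirected to files, is often the starting point for the analysis work covered in the next module.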
The following slide series will step through the setup of perfmon, and the display of various performance parameters. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 94 This shows the menu structure for Perfmon. Most of the management will take place via the tool buttons. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 95 The first group of tool buttons is displayed here. From left to right they are:
• New Counter Set – remove all counters, and start over
• Clear Display – clear the current chart, and restart the display
• View Current Activity – view real-time information
• View Log Data – get performance information from a log file
• View Graph – view data in chart form (usually the most useful)
• View Histogram – view data in histogram form
• View Report – view data as a text report
The next group of buttons are:
• Add – add a counter to the current set
• Delete – remove a counter from the current set
• Highlight – highlight the line or bar for a specific counter
Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 96 The final group of buttons are:
• Copy Properties – copy counter data to the Clipboard
• Paste Counter List – paste data from the Clipboard into another instance of Perfmon
• Properties – view properties of this instance of Perfmon
• Freeze Display – stop updates to the display
• Update Data – restart updates to the display
• Help – show Perfmon help
Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 97 Here is a perfmon graph view, with several counters chosen for each of 2 physical disks. Note that one of the counters, Disk Bytes/sec on PhysicalDisk 19, has been highlighted by clicking the Highlight tool button. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 98 Select the system to gather information from (normally the local host), the objects to be monitored, and the counters to use for those objects. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 99 The new log is displayed in the counter logs container. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 100 The esxtop utility allows monitoring of ESX Server performance. The result of the monitoring may be sent to a file, but the tool is often used in real-time, interactive mode, as discussed here. Other modes are Batch mode, which allows output to be captured to a file, and Replay mode, which allows replaying (viewing) of previously captured performance information; example invocations are sketched after these notes. Disk parameters are likely to be of more interest than the other categories, such as the CPU and NIC parameters. Memory use can have an effect on disk performance. A shortage of memory will cause the swap area on disk to be used.
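A minimal sketch of the esxtop modes just mentioned, assuming a classic ESX Server service console; the delay and iteration values are illustrative, and the option set should be confirmed against the esxtop man page for the ESX release in use.

    # Interactive mode: press c (CPU), m (memory), n (network), or d (disk) to switch
    # panels, and e to expand an entry of interest
    esxtop
    # Batch mode: capture counters every 5 seconds for 360 iterations (about 30 minutes)
    esxtop -b -d 5 -n 360 > esxtop_capture.csv
    # Replay mode: view performance data from a previously captured vm-support snapshot
    esxtop -R /path/to/vm-support-snapshot

The batch-mode CSV file can be opened in a spreadsheet or imported into perfmon for comparison with the host-side counters described above.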
The options that allow manipulation of displayed fields are useful; there are often too many parameters to display on an 80-column display, and some of the parameters displayed on the screen by default are less important than those in hidden columns. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 101 The CPU display page shows CPU usage for various components of the ESX Server environment, including the VMs. By pressing ‘e’ to expand the display, more detail is shown for components of interest. The VM IDs have been highlighted in the display; these IDs are useful when interpreting the disk display shown next. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 102 This slide shows the unexpanded display for the storage, or disk, statistics. Note that the display shows HBAs, and does not differentiate between internal and external HBAs. In this slide, vmhba0 is the internal HBA, while vmhba1 and vmhba2 are FC HBAs attached to a CLARiiON. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 103 The display has been expanded by means of the c, t and l options to show more detail for controller, target and LUN. LUN 0 (the HLU is 0) is a shared VMFS LUN, used by all 3 VMs. The VM ID shows how each VM is using the LUN. Other LUNs have not been expanded. This view allows a finer granularity than that used by the CLARiiON; events are allocated to the VM that caused them. A comparison of this data and the data generated by Navisphere Analyzer will help clarify where potential performance issues exist. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 104 While memory affects the overall performance of the ESX Server, it has very little direct effect on the performance of the storage subsystem. A high incidence of swapping may cause a path or SP port to be overloaded if the swap file is located on SAN storage. The 3 VMs displayed here are running disk tests on CLARiiON LUNs; though the amount of I/O generated is reasonably high, the memory use of the VMs is low. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 105 This display shows the network information for the ESX Server. It is broken down by NIC (vmnic0 is the only active NIC here), by virtual NIC (vswif0) and by individual VM. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 106 The performance monitoring that can be performed from the VI Client displays a reasonable amount of detail about all aspects of ESX Server and VM performance. The disk monitoring includes bandwidth, throughput, bus resets and aborted commands. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 107 The Performance Chart displayed by the VI Client allows display of CPU, Disk, Memory, Network and System performance parameters. Charts may be displayed as line graphs, stacked graphs, or stacked graphs per VM, as shown in the following slides. A number of different performance counters may be displayed; for the disk subsystem they include bandwidth and throughput counters. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 108 The information presented here is similar to that seen in Navisphere Analyzer. Note, though, that some of the values shown are for a polling interval (20 s default), and are not displayed as ‘per second’ values. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 109 This example shows the stacked chart option.
Only one performance parameter may be displayed, though multiple objects can be chosen, as shown here. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 110 This view displays parameters as they relate to the individual VMs, rather than being system-wide. Only one performance parameter may be displayed on a chart. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 111 This lesson covers tools used to validate the VNX environment. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 112 This example of a Release Note (RN) is for FAST Cache on VNX systems running VNX OE for Block 5.31. Note the sections in the list of topics – these are common to many RNs. Useful information includes a product description, known problems, fixed problems, and troubleshooting. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 121 An example product description from a Release Note. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 122 Questions 1 to 3 Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 123 Questions 1 to 3 Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 124 Questions 4 and 5 Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 125 Questions 4 and 5 Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 126 This module covered tools used to gather data, and tools used to validate the environment. Copyright © 2012 EMC Corporation. All rights reserved Module 2: Tools 127 This module focuses on analyzing collected data. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 1 This lesson covers analyzing data collected from a customer environment. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 2 Capacity analysis is typically simpler than performance analysis. If the VNX system is replacing another storage system, or replacing local storage on production servers, then the storage requirement is already known. For a new installation, the sizes of databases, email inboxes, and so on can usually be estimated fairly well. The level of protection must be chosen carefully, and this will affect the number of disks needed for a given capacity (a short worked example follows these notes). If growth is expected, Virtual Provisioning should be considered. Expansion of LUNs is direct and simple with Pool LUNs; metaLUNs may be used if more predictable performance is desired, but may be too restrictive since only traditional LUNs may be used for metaLUNs. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 3 A Unisphere performance analysis is easy if the Survey is used as the starting point; LUNs with potential issues are flagged, and are easy to identify. More information may be obtained by looking at SP, LUN and disk parameters in the Performance Detail view. Copyright © 2012 EMC Corporation.
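As a minimal worked example of the capacity effect mentioned above (the usable target is illustrative, and per-drive formatted capacities should be taken from the current VNX drive documentation): to deliver roughly 20 TiB of usable capacity,
- RAID 5 (4+1) keeps 4 data drives out of every 5, so about 20 / 0.8 = 25 TiB of formatted drive capacity is needed;
- RAID 6 (6+2) keeps 6 out of every 8, so about 20 / 0.75 ≈ 26.7 TiB is needed;
- RAID 1/0 mirrors everything, so about 20 / 0.5 = 40 TiB is needed.
The same usable capacity therefore costs noticeably more spindles as the protection level increases, which is why the protection choice is settled before the drive count is fixed.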
All rights reserved Module 3: Analysis 4 The Survey view identifies LUNs with Utilization, Response Time or Queue Length which exceed the predetermined thresholds. Note that Bandwidth and Throughput are not useful here. Any LUNs with one or more red borders around a pane need to be investigated. Expanding the LUN in the Performance Detail view will show the disks on which the LUN was created, as well as the SP that owns the LUN. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 5 We need to know if cache is enabled; if it isn’t, the performance we expect will be much lower than if it was. In some cases, especially if cache has been enabled or disabled during the capture interval, the checkbox may be empty and grayed out. In that case, further investigation may be required to determine whether or not the caches are enabled. As an example, if we see write cache hits for a LUN, the cache is enabled. Note that the absence of write cache hits does not, by itself, indicate that write cache is disabled. SP utilization consistently higher than 70% may indicate a performance problem, and will almost certainly contra-indicate the addition of VNX replication software. We also need to look at the dirty pages to get an idea of write cache performance. Ideally, dirty pages should remain within the watermarks, with few excursions above the High Watermark, and few or none to the 100% mark. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 6 Trespassed LUNs are an indication that there may be a hardware problem of some sort. Other LUNs on the same RAID Group should be checked to see if they are trespassed as well. Other LUN performance attributes should also be checked – the forced flush rate, read and write rate and I/O size, and the measure of burstiness of the I/O. Based on what we know about the access pattern, we can set expectations for the performance of the LUN involved. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 7 We look at the disks in much the same way as we look at the LUNs – utilization is important, and usually much more so than LUN utilization. We also need to look at the throughput and bandwidth of the individual disks, and compare with our rule of thumb values. Disk service time and queue length will determine disk response time, while average seek distance may give us an idea of how random the I/O is. Comparing disk and LUN I/O sizes will show if coalescing is taking place – a sign that at least some I/O is sequential. In the case of RAID 1/0, we need to compare the I/O rate of the primary and secondary disks in a pair; they will not always show identical workloads, but will often be fairly similar. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 8 If our predictions and reality don’t match, there are several possible reasons. Those need to be investigated more fully. This first slide mentions some reasons why random I/O workloads may not perform as expected. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 9 On this slide we look at a few reasons why sequential workloads may not perform as expected. We also need to verify whether or not the workload is balanced across all available resources. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 10 This lesson covers analyzing data collected from a customer environment. Copyright © 2012 EMC Corporation. 
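Where trespassed LUNs are suspected, as described above, ownership can also be confirmed from the command line. The sketch below is illustrative: the SP address and LUN number are placeholders, credentials are assumed to come from a naviseccli security file or the -User/-Password/-Scope options, and the exact field labels in the output vary by release.

    # Query the LUN's properties from either SP
    naviseccli -h 10.168.10.10 getlun 600
    # In the output, compare the default owner with the current owner; if they differ,
    # the LUN has trespassed and the underlying cause should be investigated.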
All rights reserved Module 3: Analysis 11 This slide introduces a scenario used to illustrate the analysis process. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 12 The Performance Survey is the view that opens by default when an archive is opened. We can check for red flags to see which LUNs are potential problems. LUNs 600 and 601 are both flagged. Each shows Utilization over 70% in the latter part of the capture, and each shows response times consistently over 20 ms during the latter portion of the capture. Because the LUNs look the same, we’ll concentrate on only one of them – LUN 600. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 13 We note that LUNs 600 and 601 are on the same RAID Group, and that both are owned by SPB. As noted previously, the performance of the LUNs is identical. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 14 SPB Utilization shows a change at around the 18:00 mark, but is consistently below 50% for the entire period under observation. The watermarks are set to 49%/70%, and dirty pages fall inside these limits at all times. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 15 Read cache and write cache are enabled. Read cache size is probably adequate for this environment – further investigation will show whether or not it should be increased. Write cache size is 2048 MiB, and could be increased for this system. Page size is at the default value. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 16 LUN Properties, when viewed from Analyzer, may not always accurately reflect the state of the LUN at any given time. In the example shown, the LUN has read and write cache enabled, but Analyzer does not reflect this. You may need to look at read and write cache hit ratios to determine if cache is enabled. The LUN is not trespassed, and all prefetch settings are at their default values. No problems with the LUNs can be identified here. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 17 LUN bandwidth is low. This could be because the LUN is not heavily utilized, or because I/O sizes are small. We’ll need to look at throughput, and I/O sizes, to check. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 18 Throughput appears to be moderate for a 6-disk R6 LUN (though, of course, other LUNs are also present on the RAID Group). We’ll check for forced flushes, and also check prefetch behavior. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 19 If write cache is enabled for the LUN, then the lack of forced flushes indicates that caching is working well for this LUN, and that the cache subsystem has adequate headroom. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 20 The lack of read cache hits may indicate that reads are very random, that prefetching is not being triggered, or that read cache is turned off. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 21 Though this slide shows only a subset of the available read and write sizes, all were checked. Only the 512 B size shows any activity at all. These small I/Os will not bypass write cache, and if they are random, as seems to be the case, will not cause disk coalescing. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 22 Average seek distance for the initial period under observation (up to 18:00 or so) is fairly low, at 5 GiB or so (disks are 300 GB FC).
After that, the average seek distance changes to around 23 GiB, still moderately low. Disk utilization varies between 50 and 65 percent, and is fairly consistent for the entire interval. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 23 Disk and LUN write sizes are identical until the 18:00 mark, after which disk write sizes increase fourfold. The I/O access pattern to LUNs 600 and 601 did not change over the entire period of observation, so some other factor is causing the increased disk write sizes. Additional LUNs use this set of disks, and their access patterns will need to be investigated. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 24 Response times are reasonable; service time, not shown, is consistent. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 25 The disks are running at over 50% utilization. They will allow additional load, but not a doubling of the workload. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 26 Here are suggestions to improve the performance of the system. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 27 The Survey configuration should use reasonable values; the Utilization and Queue Length values are the recommended ones. The Response Time setting is more dependent on the specific environment – 30 ms may be reasonable in environments that require low latency. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 28 This example shows the queue length over a period of 10 polling intervals. Based on the curve shown, what are the values for average queue length, average busy queue length, and utilization?
Average queue length = ( 6 + 0 + 0 + 4 + 4 + 4 + 0 + 0 + 0 + 0 ) / 10 = 1.8
Average busy queue length = ( 6 + 4 + 4 + 4 ) / 4 = 4.5
Utilization = ( 1 + 0 + 0 + 1 + 1 + 1 + 0 + 0 + 0 + 0 ) / 10 = 0.4 = 40%
Note what the calculated values tell you – the Queue Length is a reasonable measure of how busy this object is. The Average Busy Queue Length is calculated only from polling intervals where the queue value is 1 or more – empty points are ignored. The ABQL is therefore a measure of the burstiness of the I/O. Utilization is calculated by giving any polling interval where there is at least one I/O in the queue a value of 1, and intervals where the queue is empty a value of 0. Note that, as in the slide, where I/O is very bursty, the utilization may be displayed as higher than the actual ‘busy’ time for the object. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 29 This lesson covers analyzing data collected from a customer environment. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 30 Next we’ll look at performance analysis on file-based storage. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 31 In order to demonstrate how the server_stats tool can be used when troubleshooting a performance issue, we will look at a case study regarding a certain business process. This business process has been having throughput issues, and the administrator would like to increase throughput by at least 20%. Most performance issues can be summarized by four questions:
1. What am I getting?
2. Is that what I should be getting?
3. If not, why not?
4. What, if anything, can I do about it?
Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 32
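The case study that follows answers the first two questions with server_stats. A representative set of invocations is sketched here: the Data Mover name (server_2), the 10-second interval, and the sample count are illustrative, and the statgroup names and any output-format options (for example, a CSV format for spreadsheets) should be checked against the server_stats man page for the installed release.

    # Host-facing NFS and CIFS workload: operation rate, I/O size, and read/write mix
    server_stats server_2 -monitor nfs-std -interval 10 -count 30
    server_stats server_2 -monitor cifs-std -interval 10 -count 30
    # NFS per-operation response times, and dVol balance, examined later in the case study
    server_stats server_2 -monitor nfsOps-std -interval 10 -count 30
    server_stats server_2 -monitor diskvolume-std -interval 10 -count 30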
The first step in any performance analysis is to find out what type of workload your environment or application is generating. We need to find out the operation rate (IOPS), I/O size, and access pattern (read or write). Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 33 Let's begin by looking at what the host application is sending to the VNX. From the cifs-std and nfs-std compound stats we can see that the host is only doing NFS transactions. The workload from the host looks small: 8 KiB, random, and 100% write oriented, with about 20,000 write IOPS. Some columns in the above output are omitted. To get the full output, consider running the command for CIFS and NFS in separate windows, or redirect the output into a .csv format. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 34 Next, we will use the nfsOps-std statPath to determine the response time for NFS I/Os. The command displays a response time of 6.5 to 7 ms, which is a little higher than expected but not a real bottleneck. Average response time for a drive should be between 2 and 5.6 ms, depending on the drive's rotational speed. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 35 With the diskvolume-std stat we get statistics on the dVols. Here we are able to verify whether the dVols are well balanced. In this scenario, the write I/O rates are a bit high and the dVols are not as well balanced as they should be. For these dVols to be well balanced, the write IOPS should be in the 800s to 900s. Some of them are in the 1000s, which is not a significant difference. There would be a problem if some of these write IOPS were double in size, in the 1600s. If that were the case, it would be a good indication that AVM was not used in the creation of the file system. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 36 To find out which LUNs are being used by the file system, run nas_fs -i to determine which dVols are being used, and then run nas_disk -i to find the corresponding LUN on the back-end. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 37 With the naviseccli getlun command we are able to verify the I/O size being used to send data to the LUN. To calculate the I/O size, simply divide the blocks-written value by the number of write requests. In this case, 16 blocks are being written per write request, which translates to an 8 KiB I/O size. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 38 Shown here is the I/O size being used to send data to disk. To calculate the I/O size to disk, divide the number of kilobytes written by the number of write requests. For this example, the disks are seeing I/Os of about 32 KiB. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 39 By using server_stats, we came up with the following results: the host application is sending data in 8 KiB I/Os, and the dVols and LUNs are both seeing 8 KiB I/Os as well. The only difference is with the disks, which are seeing 32 KiB I/Os. The reason is that the data leaving the host was logically contiguous in 32 KiB increments. Once this data reaches the storage system cache, the 8 KiB I/Os are coalesced into 32 KiB I/Os and sent to disk. One idea would be to have the host send data in 32 KiB I/Os instead of 8 KiB I/Os to improve efficiency. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 40
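One way to let the host issue larger I/Os, anticipating the mount-option discussion on the following pages, is to remount the NFS file system on the Linux client with larger rsize and wsize values. This is a minimal sketch; the server interface name, export path, and mount point are illustrative.

    # Remount the application file system with 32 KiB read and write sizes
    umount /mnt/app
    mount -t nfs -o rsize=32768,wsize=32768,hard,intr server_2_if:/app_fs /mnt/app
    # Confirm the options now in effect for the mount
    mount | grep /mnt/app

Both the client and the Data Mover must support the requested size; otherwise the negotiated value falls back to the largest size both sides accept, as the next page explains.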
Looking at the file system from the host's perspective, we find that the application file system is mounted with an rsize and wsize of 8 KiB. The rsize and wsize parameters cause the NFS client to try to negotiate a buffer size up to the size specified. A large buffer size does improve performance, but both the server and the client have to support it. In the case where one of them does not support the size specified, the size negotiated will be the largest that both support. In this example, the largest I/O size that the host would be able to use is only 8 KiB. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 41 Needless to say, we can't change the I/O size from the NAS front-end to the Block back-end (dVol or LUN), but we can change the host I/O size. Here is the command to mount the file system with an rsize and wsize of 32 KiB. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 42 Once we changed the mount read and write size on the host, we were able to see an increase in throughput and a decrease in IOPS, making the business process more efficient in transferring data. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 43 HAVT (High Availability Verification Test) will report deficiencies in the HA setup for a host. Because ESXi has no service console, the test is run from an external host – a Windows host. SAN zoning should be checked for path redundancy. The host failover software should also be checked, and an ALUA-supported configuration used where appropriate. Systems with file-level access allow redundancy in the LAN configuration, and this should be verified. Aggregation protocols allow for failover and load balancing; this configuration should also be checked. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 44 The choice of RAID type is important for performance and availability; a compromise may have to be made, especially when pools are used. The back-end port (strictly speaking, not a bus in the SAS environment) selection should be made to maximize availability, especially for FAST Cache drives. Bear in mind that if a RAID Group has all disks but one in the Vault enclosure, the disk could be marked for rebuilding when power is removed. Ensure that hot spares of the correct speed, size, and disk type are available. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 45 This module covered analysis of data collected from the environment. Copyright © 2012 EMC Corporation. All rights reserved Module 3: Analysis 50 This module focuses on design best practices for VNX systems in unified environments. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 1 This lesson covers the introduction to storage design, specifically the terminology used and environmental considerations. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 2 The storage system does not operate alone; other devices in the environment will play an important role as well. A designer will need to know the abilities and limitations of those devices.
On occasion the goals for the environment will conflict, and a measure of compromise will be required. An example is the potential conflict between performance and availability; in the choice of RAID types, for example, RAID 6 would be the optimal choice for availability, whereas RAID 1/0 may be the best choice for performance in the specific environment. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 3 Units used in the design of storage environments can be confusing. In the SI system of measurement, multipliers are decimal, and mega, for example, means 10^6, or 1,000,000. Some measurements used in IT, though, are based on the binary equivalents, which are somewhat larger than the decimal units. Note, for example, that at the TB/TiB level the binary unit is almost 10% larger than the decimal unit. Disk manufacturers specify disk sizes in decimal units, but use a sector size of 512 bytes (a binary value) when discussing formatted sizes. The VNX Block systems, and their CLARiiON predecessors, use 520-byte sectors, and this must be taken into account as well. Note that cache sizes, LUN sizes and file system sizes are specified in binary units. The binary units are a relatively new standard, established by the IEC in 2000. The standard has been accepted by all major standards organizations including the IEEE and NIST. See the Wikipedia article at http://en.wikipedia.org/wiki/Binary_prefix for more detail. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 4 This slide shows the terminology used to specify the amount of space used by Pools. Note that some terms have changed in the 05.32 VNX OE for Block code release. Because Thin LUNs allow oversubscription, and because VNX Snapshots are thinly provisioned, the consumed capacity and user capacity may not be the same for the Pool.
In the Physical Capacity pane, the system reports:
- Total - Total usable space in the Pool (disk space minus RAID overhead)
- Free - Total amount of available space in the Pool
- Percent Full - Percentage of consumed Pool user capacity
- Total Allocation (visible when the VNX Snapshot enabler is installed) - Amount of space in the pool allocated for all data
- Snapshot Allocation (visible when the VNX Snapshot enabler is installed) - Amount of space in the pool allocated to VNX Snapshot LUNs
In the Virtual Capacity pane, the system reports:
- Total Subscription - Total amount of LUN user capacity configured in the pool and (potentially) presented to attached hosts
- Snapshot Subscription (visible when the VNX Snapshot enabler is installed) - Total potential size required by VNX Snapshots if their primary data was completely overwritten, plus the size of any writable (attached) VNX Snapshots
- Percent Subscribed - Percentage of Pool capacity that has been assigned to LUNs; this includes primary and snapshot capacity
- Oversubscribed By - Amount of subscribed capacity that exceeds the usable capacity of the pool; if this value is less than or equal to zero, the option is grayed out
Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 5 This slide shows the LUN types currently available on VNX systems, and looks at some of the terminology used to specify the amount of space used by those LUNs. Thick and Thin LUNs both use a mapping mechanism to keep track of data, and are sometimes referred to as Mapped LUNs, or MLUs. The driver for this LUN type may be described as the MLU driver in documentation and White Papers.
Because Thin LUNs allow oversubscription, the consumed capacity and user capacity may not be the same for the LUN. Thick LUNs have space allocated at creation time, and use additional space for metadata, so their consumed capacity will always be higher than their user capacity, though consumed capacity will not be displayed in the LUN Properties dialog. RAID Groups allocate space to LUNs at LUN creation time, so the LUN size and the consumed capacity will always be identical (or within 1 stripe of identical). Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 6 Storage system performance, and planning for that performance, depends on the sizes of the I/O used in the environment. In VNX Block environments, I/O sizes up to 16 KiB are regarded as small, while I/O sizes above 64 KiB are regarded as large. Various terms exist to describe I/O access patterns; I/O can generally be classified as either random or sequential. Some I/O patterns are single-threaded, where one operation will finish before the next begins; in other cases, multiple operations may be active simultaneously, and this pattern is therefore described as multi-threaded, or as exhibiting concurrent I/O. In some cases, I/O accesses are made to areas of the data surface which are very close to each other, and this nearness is described as locality, or more correctly spatial locality. Spatial locality refers to disk addresses, usually described in terms of logical block addresses, or LBAs. Temporal locality refers to nearness of access on a time basis, and is important when dealing with, for example, FAST VP. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 7 VNX Block systems have OE code which is designed to optimize the use of disk storage. These optimizations will typically try to make disk accesses as sequential as possible, and will try to make I/O sizes as large as possible. Other features deal with the operation of cache. Coalescing refers to the grouping of smaller writes with contiguous LBAs into a larger write. This will be especially efficient if the writes can fill a stripe in parity-protected RAID environments; a full-stripe write, sometimes also known as an MR3 write, will then occur. The RAID write penalty associated with small-block parity-protected writes does not apply to full-stripe writes. As an example, the small-block write penalty for a 4+1 R5 LUN is 4, whereas the full-stripe write penalty for the same LUN is 1.25 – even less than R1/0. Where reads are concerned, multiple contiguous accesses will cause prefetching to be triggered if it is enabled. Cache terminology includes destaging or flushing, the regular copying of data from cache to disk. This activity is controlled by the watermarks, HWM and LWM, and by the idle flushing configuration. Dumping of cache to the vault only occurs on failures or power down of the storage system. Disk crossings, reported by Unisphere Analyzer, occur when a single host I/O touches 2 or more physical disks. This will always occur for I/O of > 64 KiB; for smaller I/Os, it may be an indication that data is misaligned. Data alignment will be discussed later. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 8 FAST Cache and FAST VP are topics which will be covered in more detail later. The terminology is often seen elsewhere, though, so is mentioned here.
An application may have a vast amount of data storage allocated to it, but may be actively using only a small portion of that data space. The portion in use is the active data, or the working data set. Having some parts of the data more active than others is the basis of skew, which both FAST Cache and FAST VP rely on for their operation. Skew is discussed more fully in a later slide. FAST Cache and FAST VP copy or move data, and those data movements are known as promotions, write-backs, or relocations. FAST Cache and FAST VP take some time to gather statistics and perform the data movement; in FAST Cache, this time is known as the warm-up time. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 9 This lesson covers storage design best practices in block-only environments. Note, though, that many of these best practices will also apply to file-only and unified environments due to the relationship between the file front-end and the block back-end. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 10 Some general best practices for all design activity are shown here. Be familiar with the documentation for the systems and software. Documentation will describe features and limitations that are important to bear in mind. Also, make sure to use the latest supported code, and verify code interoperability across systems. Understanding the workload is a very important part of the design process. Different applications have vastly different access patterns and I/O sizes; look at the vendor documentation and white papers to get more information. Remember also that with disk sizes increasing while I/O capability remains about the same, designing for performance first and then looking at capacity is the accepted practice. In many cases, storage system (and host, etc) default values will perform well over a wide range of environments. Changing these values may improve performance in some cases, but can easily lead to performance degradation. Be careful when moving away from default values, and understand the consequences of your actions. VNX storage systems generate informational, warning, and alert messages of various kinds when potential issues exist. Note these messages carefully, and work speedily to resolve the issues. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 11 Hosts connected to VNX systems benefit from multipathing. Direct-attach multipathing requires at least two HBAs for attachment to the 2 SPs; four HBAs allow failover with a lessened chance of LUNs needing to trespass. SAN multipathing also requires at least two HBAs, with each HBA zoned to more than one SP port. Though a multiport HBA may be used to provide 2 HBA ports, and therefore may simplify failover, the HBA may become a single point of failure. 
The advantages of multipathing are:
• Failover from port to port on the same SP, maintaining an even system load and minimizing LUN trespassing
• Port load balancing across SP ports and host HBAs
• Higher bandwidth attach from host to storage system (assuming the host has as many HBAs as paths used)
While PowerPath offers failover and load balancing across all available active paths, this comes at some cost:
• Some host CPU resources are used during both normal operations and failover
• Every active and passive path from the host requires an initiator record; VNX systems allow only a finite number of initiators
• Active paths increase time to fail over in some situations (PowerPath tries several paths before trespassing a LUN from one SP to the other)
Microsoft Multi-Path I/O (MPIO), as implemented by Microsoft Windows Server versions, provides a similar, though more limited, multipathing capability to PowerPath's. Features found in MPIO include failover, failback, round-robin pathing, weighted pathing, and I/O queue depth management. Consult the Microsoft documentation for information on MPIO features and implementation. Linux MPIO is implemented by Device Mapper (dm). Its multipathing capability is similar to PowerPath's, though more limited. The MPIO features found in Device Mapper depend on the Linux release and revision. Review the Native Multipath Failover Based on DM-MPIO for v2.6.x Linux Kernel and EMC Storage Arrays Configuration Guide, available on Powerlink, for more details. VMware Native Multi-Pathing (NMP) allows failover similar to that of PowerPath, with very limited load balancing. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 12 PowerPath versions 5.1 and later are ALUA-compliant releases. PowerPath load balances across optimized paths, and only uses non-optimized paths if all the optimized paths have failed. For example, if an optimized path to the original owning SP fails, I/O is sent across the remaining optimized paths. If all optimized paths fail, for example as the result of a storage processor failure, I/O is sent to the peer SP. If the I/O count on the non-optimized path exceeds a preset value, PowerPath initiates a trespass to change LUN ownership. The non-optimized paths then become the optimized paths, and the optimized paths become the non-optimized paths. Not all multipathing applications or revisions are ALUA compliant; verify that your native host-based failover application can interoperate with ALUA. When configuring PowerPath on hosts that can use ALUA, the default storage system failover mode is Failover Mode 4. This configures the VNX for asymmetric Active/Active operation. This has the advantage of allowing I/O to be sent to a LUN regardless of LUN ownership. Details on the separate failover modes 1 through 4 can be found in the EMC CLARiiON Asymmetric Active/Active Feature — A Detailed Review white paper, available on Powerlink. To take advantage of ALUA features, the host operating system also needs to be ALUA-compliant. Several operating systems support native failover with Active/Passive (A/P) controllers. However, there are exceptions. Refer to the appropriate support guide for O/S support. For example, ALUA-supported Linux operating systems are listed in the EMC® Host Connectivity Guide for Linux. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 13 High availability requires at least two HBA connections to provide redundant paths to the SAN or storage system.
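Once PowerPath (or a native multipathing stack) is in place, the path layout described above can be verified from the host. A brief, illustrative PowerPath check; output fields and formatting vary between PowerPath releases:

    # Summary of HBAs and storage-system ports managed by PowerPath
    powermt display
    # Every managed device with its owning SP and the state of each path
    powermt display dev=all

Each LUN should show live paths through both SPs and through more than one HBA; dead or missing paths point back to the zoning, cabling, and HBA checks discussed in these notes.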
It is a best practice to have redundant HBAs. Using more than one single-port HBA enables port- and path-failure isolation, and may provide performance benefits. Using a multiport HBA provides component cost savings and efficient port management. Multiport HBAs are useful for hosts with few available I/O bus slots, but represent a single point of failure for several ports. With a single-ported HBA, a failure would affect only one port. HBAs should also be placed on separate host buses for performance and availability. This may not be possible on hosts that have a single bus or a limited number of bus slots; in this case, multiport HBAs are the only option. Always use an HBA that equals or exceeds the bandwidth of the storage network, e.g. do not use 2 Gb/s or slower HBAs for connections to 4 Gb/s SANs. FC SANs reduce the speed of the network path to the HBA’s speed, either as far as the first connected switch, or to the storage system’s front-end port if directly connected. This may cause a bottleneck when the intention is to optimize bandwidth. Finally, using the most current HBA firmware and driver from the manufacturer is always recommended. This software may be found in, and should be downloaded from, the vendor (EMC) area of the HBA manufacturer’s website. The Unified Procedure Generator (installation available through Powerlink) provides instructions and the configuration settings for HBAs specific to your storage system. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 14 iSCSI environments may make use of NICs, TOE cards or iSCSI HBAs. The differences include cost, host CPU utilization, and features such as security. The same server cannot use NICs and HBAs to connect to the same VNX storage system. NICs are the typical way of connecting a host to an Ethernet network, and are supported by software iSCSI initiators. Ethernet networks will auto-negotiate down to the lowest common device speed; a slower NIC may bottleneck the storage network’s bandwidth. Do not use legacy 10/100 Mb/s NICs for iSCSI connections to 1 Gb/s or higher Ethernet networks. A TCP Offload Engine (TOE) NIC is a faster type of NIC. A TOE has on-board processors that offload TCP packet segmentation, checksum calculations, and optionally IPSec processing from the host CPU. This allows the host CPU(s) to be used exclusively for application processing. Redundant NICs, iSCSI HBAs, and TOEs should be used for availability. NICs may be either single or multiported. A host with a multiported NIC or more than one NIC is called a multihomed host. Typically, each NIC or NIC port is configured to be on a separate subnet. Ideally, when more than one NIC is provisioned, they should also be placed on separate host buses. Note this may not be possible on smaller hosts having a single bus or a limited number of bus slots, or when the onboard host NIC is used. Not all NICs have the same level of performance. This is particularly true of host motherboard NICs, 10 Gb/s NICs, and 10 Gb/s HBAs. For the most up-to-date compatibility information, check the E-Lab Interoperability Navigator at: http://elabnavigator.EMC.com. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 15 At least two paths between the hosts and the storage system are required for high availability. Ideally, the cabling for these paths should be physically separated.
In addition, paths should be handled by separate switching if hosts and storage systems are not directly connected. This includes redundant, separate HBAs, and attachment to both of the storage system’s storage processors. Path management software such as PowerPath, and dynamic multipathing software on hosts (to enable failover to alternate paths and load balancing), is recommended. For device fan-in, connect low-bandwidth devices such as tape, and lightly utilized or older, slower hosts, to edge switches or director blades. Contact an EMC USPEED Professional (or your EMC Sales representative if a partner) for assistance with FCoE performance. For additional information on FCoE, see the Fibre Channel over Ethernet (FCoE) TechBook available on Powerlink. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 16 iSCSI SANs on Ethernet do not have the same reliability and built-in protocol availability as Fibre Channel SANs; their advantages are that they handle longer transmission distances and are less expensive to set up and maintain. If you require the highest availability for a SAN under 500 m (1640 ft.), a Fibre Channel SAN is recommended. Note that the number of VLANs that may be active per iSCSI port is dependent on the LAN’s bandwidth; a 10 GigE network can support a greater number. Ideally, separate Ethernet networks should be created to ensure redundant communications between hosts and storage systems. The cabling for the networks should be physically as widely separated as is practical. In addition, paths should be handled by separate switching when direct connections are not used. If you do not use a dedicated storage network, iSCSI traffic should be separated onto its own LAN segments or onto a virtual LAN (VLAN). VLANs allow the creation of multiple virtual LANs, as opposed to multiple physical LANs, in your Ethernet infrastructure. This allows more than one logical network to share the same physical network while maintaining separation of the data. Ethernet connections to the storage system should use separate subnets depending on whether they are workload or storage system management related. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 17 Place the storage processor management 10/100 Mb/s ports on separate subnets from the iSCSI front-end network ports. It is also prudent to separate the front-end iSCSI ports of each storage processor onto separate subnets. Do this by placing each port from SPA on a different subnet. Place the corresponding ports from SPB on the same set of subnets. The 10.x.x.x or 172.16.0.0 through 172.31.255.255 private network addresses are completely available. For example, a typical configuration for the iSCSI ports on a storage system with two iSCSI ports per SP would be:
A0: 10.168.10.10 (Subnet mask 255.255.255.0; Gateway 10.168.10.1)
A1: 10.168.11.10 (Subnet mask 255.255.255.0; Gateway 10.168.11.1)
B0: 10.168.10.11 (Subnet mask 255.255.255.0; Gateway 10.168.10.1)
B1: 10.168.11.11 (Subnet mask 255.255.255.0; Gateway 10.168.11.1)
A host with two NICs should have its connections configured similar to the following in the iSCSI initiator to allow for load balancing and failover:
NIC1 (for example, 10.168.10.180) - SP A0 and SP B0 iSCSI connections
NIC2 (for example, 10.168.11.180) - SP A1 and SP B1 iSCSI connections
Note that 128.221.0.0/16 should never be used because the management service ports are hard-configured for this subnet.
There is also a restriction on 192.168.0.0/16 subnets. This has to do with the configuration of the PPP ports. The only restricted addresses are 192.168.1.1 and 192.168.1.2; the rest of the 192.168.x.x address space is usable with no problems. For more information about VLANs and VLAN tagging, please refer to the VLAN Tagging and Routing on EMC CLARiiON white paper available on Powerlink. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 18 Availability refers to the storage system’s ability to provide user access to data in the case of a hardware or software fault. Midrange systems like the VNX series are classified as highly available because they provide access to data without any single point of failure. Performance in degraded mode is typically lower than during normal operation. The following configuration settings can improve performance under degraded-mode scenarios. Single DAE Provisioning is the practice of restricting the placement of a RAID group to a single enclosure. This is sometimes called horizontal provisioning. Single DAE provisioning is the default method of provisioning RAID groups, and, because of its convenience and high availability attributes, is the most commonly used method. In Multiple DAE Provisioning, two or more enclosures are used. An example of a multiple-DAE provisioning requirement is when drives are selected from one or more additional DAEs because there are not enough drives remaining in one enclosure to fully configure a desired RAID Group. Another example is SAS back-end port balancing. The resulting configuration may or may not span back-end ports, depending on the storage system model and the drive-to-enclosure placement. An LCC connects the drives in a DAE to one SP’s SAS back-end port; the peer LCC connects the DAE’s drives to the peer SP. If a single DAE LCC fails, the peer storage processor still has access to all the drives in the DAE, and RAID group rebuilds are avoided. The storage system automatically uses its lower director capability to re-route around the failed LCC and through the peer SP. The peer SP experiences an increase in its bus loading while this redirection is in use. The storage system is in a degraded state until the failed LCC is replaced. When direct connectivity is restored between the owning SP and its LUNs, data integrity is maintained by a background verify (BV) operation. Request forwarding’s advantages of data protection and availability result in a recommendation to provision horizontally. In addition, note that horizontal provisioning requires less planning and labor. If vertical provisioning is used for compelling performance reasons, provision drives within RAID groups to take advantage of request forwarding. This is done as follows:
RAID 5: at least two (2) drives per SAS back-end port in the same DAE.
RAID 6: at least three (3) drives per back-end port in the same DAE.
RAID 1/0: both drives of a mirrored pair on separate back-end ports.
Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 19 FAST Cache: It is required that flash drives be provisioned as hot spares for FAST Cache drives. Hot sparing for FAST Cache works in a similar fashion to hot sparing for traditional LUNs made up of flash drives. However, the FAST Cache feature’s RAID 1 provisioning affects the result.
If a FAST Cache Flash drive indicates potential failure, proactive hot sparing attempts to initiate a repair with a copy to an available flash drive hot spare before the actual failure. An outright failure results in a repair with a RAID group rebuild. If a flash drive hot spare is not available, then FAST Cache goes into degraded mode with the failed drive. In degraded mode, the cache page cleaning algorithm increases the rate of cleaning and the FAST Cache is read-only. A double failure within a FAST Cache RAID group may cause data loss. Note that double failures are extremely rare. Data loss will only occur if there are any dirty cache pages in the FAST Cache at the moment both drives of the mirrored pair in the RAID group fail. It is possible that flash drive data can be recovered through a service diagnostics procedure. The first four drives, 0 through 3, in a DPE or in the DAE-OS of SPE-based VNX models are the system drives. The system drives may be referred to as the Vault drives. On SPE-based storage systems the DAE housing the system drives may be referenced as either DAE0 or DAE-OS. Only SAS drives may be provisioned as system drives on the VNX series. These drives contain files and file space needed for the following:
• Saved write cache in the event of a failure
• Storage system’s operating system files
• Persistent Storage Manager (PSM)
• Operating Environment (OE) configuration database
The remaining capacity of system drives not used for system files can be used for user data. This is done by creating RAID Groups (Pools cannot use system drives) on this unused capacity. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 20 Hot spares are important, though not mandatory, for both RAID Groups and Pools. Note that the hot spare algorithm will first look for the smallest hot spare that can accommodate the used capacity on the failed or failing drive – this means that, under the right circumstances, a 300 GB drive could spare for a 600 GB drive, for example. RAID Groups or Pool tiers that use NL-SAS drives should be configured with RAID 6, especially in the case of larger RAID Groups. Note that the NL-SAS tier in Pools now has two different recommended configurations – 6+2 and 14+2. The latter configuration allows for a Private RAID Group capacity of over 25 TiB when using 2 TB disks, and rebuild times will be lengthy. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 21 RAID-level data protection All the LUNs bound within a Pool will suffer loss of availability, and may suffer data loss, from a complete failure of a Pool RAID group. The larger the number of private RAID groups within the pool, the bigger the effect of a failure. It is important to choose levels of protection for the Pool in line with the value of the Pool contents. Three levels of data protection are available for pools. RAID 5 has good data availability. If one drive of a private RAID group fails, no data is lost. RAID 5 is appropriate for small to moderate-sized homogeneous Pools. It may also be used in small to large Pool tiers provisioned with SAS and Flash drives, which have high availability. RAID 6 provides the highest data availability. With RAID 6, up to two drives may fail in a private RAID group and result in no data loss. Note that this is true double-disk failure protection. RAID 6 is appropriate for any size Pool or Pool tier, including the largest possible, and is highly recommended for NL-SAS tiers.
RAID 1/0 has high data availability. A single disk failure in a private RAID group results in no data loss. Multiple disk failures within a RAID group may be survived. However, a primary and its mirror cannot fail together, or data will be lost. Note that this is not double-disk failure protection. RAID 1/0 is appropriate for small to moderate-sized Pools or Pool tiers. A user needs to determine whether the priority is availability, performance, or capacity utilization. If the priority is availability, RAID 6 is the recommendation. Number of RAID groups A fault domain refers to data availability. A Pool is made up of one or more private RAID groups. A Pool fault domain is a single Pool private RAID group. That is, the availability of a pool is the availability of any single private RAID group. Unless RAID 6 is the level of protection for the entire Pool, avoid creating Pools with a very large number of RAID groups. Rebuild Time and other MTTR functions A failure in a Pool-based architecture may affect a greater number of LUNs than in a Traditional LUN architecture. Quickly restoring RAID Groups from degraded mode to normal operation becomes important for the overall operation of the storage system. Always have hot spares of the appropriate type available. Proactive hot sparing reduces the adverse effect a rebuild would have on back-end performance. In addition, always replace failed drives as quickly as possible to maintain the number of available hot spares. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 22 Avoiding iSCSI network congestion is the primary consideration for achieving iSCSI LAN performance. It is important to take into account network latency and the potential for port oversubscription when configuring your network. Network congestion is usually the result of an ill-suited network configuration or improper network settings. An example of an ill-suited configuration is a legacy CAT5 cable in use on a GigE link. Network settings include IP overhead and protocol configuration of the network’s elements. For example, a common problem is a switch in the data path into the storage system that is fragmenting frames. As a minimum, the following recommendations should be reviewed to ensure the best performance. Simple network topology Both bandwidth and throughput rates are subject to network conditions and latency. It is common for network contention, routing inefficiency, and errors in LAN and VLAN configuration to adversely affect iSCSI performance. It is important to profile and periodically monitor the network carrying iSCSI traffic to ensure consistently high Ethernet network performance. In general, the simplest network topologies offer the best performance. Minimize the length of cable runs, and the number of cables, while still maintaining physically separated redundant connections between hosts and the storage system(s). Avoid routing iSCSI traffic, as this introduces latency. Ideally the host and the iSCSI front-end port are on the same subnet and there are no gateways defined on the iSCSI ports. If they are not on the same subnet, users should define static routes. This can be done per target or subnet using naviseccli connection -route. Latency can substantially degrade an iSCSI-based storage system’s performance. As the distance from the host to the storage system increases, a latency of about 1 millisecond per 200 kilometers (125 miles) is introduced. The short sketch that follows illustrates the effect of this added latency on a single sequential stream.
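To make the distance penalty concrete, here is a rough back-of-the-envelope sketch. It is an assumption-laden model, not an EMC sizing tool: it treats a single sequential stream as one outstanding I/O at a time, so every I/O pays the added link latency on top of its base service time, and it ignores TCP windowing and other real-world effects.

# Rough single-stream throughput model for iSCSI over distance.
# Assumes one outstanding I/O per stream and ~1 ms of added latency per 200 km,
# as stated in the notes above; no TCP window or congestion effects are modeled.
def single_stream_mbps(io_size_kb, base_mbps, distance_km, ms_per_200km=1.0):
    io_mb = io_size_kb / 1024.0                    # I/O size in MB (approximate)
    service_ms = io_mb / base_mbps * 1000.0        # time to move one I/O locally
    added_ms = distance_km / 200.0 * ms_per_200km  # latency introduced by distance
    return io_mb / ((service_ms + added_ms) / 1000.0)

def multi_stream_mbps(streams, io_size_kb, base_mbps, distance_km):
    # Additional concurrent streams hide the latency, up to the base rate.
    return min(streams * single_stream_mbps(io_size_kb, base_mbps, distance_km), base_mbps)

# The 64 KB, 40 MB/s stream discussed in these notes drops to roughly 25 MB/s at
# 200 km, and two or three streams recover most of the original bandwidth:
print(round(single_stream_mbps(64, 40, 200), 1))    # ~24-25 MB/s
print(round(multi_stream_mbps(2, 64, 40, 200), 1))  # back to ~40 MB/s

This is consistent with the worked figure quoted in the notes that follow, and with the recommendation to increase the number of streams for long-distance sequential workloads.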
This latency has a noticeable effect on WANs supporting sequential I/O workloads. For example, a 40 MB/s, 64 KB single stream would average 25 MB/s over a 200 km distance. EMC recommends increasing the number of streams to maintain the highest bandwidth with these long-distance, sequential I/O workloads. Bandwidth-balanced configuration In a balanced-bandwidth iSCSI configuration, the host iSCSI initiator’s bandwidth is greater than or equal to the bandwidth of its connected storage system’s ports. Generally, configure each host NIC or HBA port to only two storage system ports (one per SP). One storage system port should be configured as active, and the other as standby. This avoids oversubscribing a host’s ports. <Continued> Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 23 Network settings Manually override auto-negotiation on the host NIC or HBA and network switches for the following settings. These settings improve flow control on the iSCSI network:
• Jumbo frames
• Pause frames
• TCP Delayed ACK
Jumbo frames Using jumbo frames can improve iSCSI network bandwidth by up to 50 percent. When supported by the network, we recommend using jumbo frames to increase bandwidth. Jumbo frames can contain more iSCSI commands and a larger iSCSI payload than normal frames, without fragmenting or with less fragmenting, depending on the payload size. On a standard Ethernet network the frame size is 1500 bytes. Jumbo frames allow packets configurable up to 9,000 bytes in length. The VNX series supports 4,000, 4,080, or 4,470 MTUs for its front-end iSCSI ports. It is not recommended to set your storage network for jumbo frames any larger than these. If using jumbo frames, all switches and routers in the paths to the storage system must support jumbo frames and be configured for them. For example, if the host and the storage system’s iSCSI ports can handle 4,470-byte frames, but an intervening switch can only handle 4,000 bytes, then the host and the storage system’s ports should be set to 4,000 bytes. (This lowest-common-MTU rule is sketched after these notes.) Note that the File Data Mover has a different jumbo frame MTU than the VNX front-end ports. The larger Data Mover frame setting should be used. <Continued> Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 24 Pause frames Pause frames are an optional flow-control feature that permits the host to temporarily stop all traffic from the storage system. Pause frames are intended to enable the host’s NIC or HBA, and the switch, to control the transmit rate. Due to the characteristic flow of iSCSI traffic, pause frames should be disabled on the iSCSI network used for storage. They may delay traffic unrelated to the specific host-port-to-storage-system links. TCP Delayed ACK On MS Windows and ESX-based hosts, TCP Delayed ACK delays the host’s acknowledgement of a received packet. TCP Delayed ACK should be disabled on the iSCSI network used for storage. When enabled, an acknowledgment is delayed up to 0.5 seconds or until two packets are received. Storage applications may time out during this delay. A host sending an acknowledgment to a storage system after the maximum of 0.5 seconds is possible on a congested network. Because there was no communication between the host computer and the storage system during that 0.5 seconds, the host computer issues Inquiry commands to the storage system for all LUNs based on the delayed ACK.
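The lowest-common-MTU rule from the jumbo frames discussion above can be captured in a tiny helper. This is an illustrative sketch only, with made-up element names; the only VNX-specific values it assumes are the front-end iSCSI port MTUs quoted in these notes (4,000, 4,080, and 4,470).

# Pick a jumbo frame MTU: every element in the path must handle it, and the
# storage front-end ports only accept specific values (per the notes above).
VNX_ISCSI_MTUS = (4000, 4080, 4470)   # front-end iSCSI port MTUs quoted in this guide

def effective_mtu(path_mtus, storage_supported=VNX_ISCSI_MTUS):
    # Largest supported storage MTU that is no bigger than any element in the path.
    path_limit = min(path_mtus)
    candidates = [m for m in storage_supported if m <= path_limit]
    return max(candidates) if candidates else 1500   # fall back to standard frames

# Example from the notes: host and storage ports handle 4,470 bytes, but an
# intervening switch only handles 4,000 bytes -> configure 4,000 end to end.
print(effective_mtu([4470, 4000, 9000]))   # -> 4000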
During periods of congestion and recovery of dropped packets, delayed ACK can slow down the recovery considerably, resulting in further performance degradation. Note that delayed ACK cannot be disabled on Linux hosts. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 25 Performance estimate procedure The steps required to perform a ROM performance estimate are as follows:
• Determine the workload.
• Determine the I/O drive load.
• Determine the number of drives required for Performance.
• Determine the number of drives required for Capacity.
• Analysis
The steps need to be executed in sequence; the output of the previous step is the input to the next step. Determining the workload This is often one of the most difficult parts of the estimation. Many people do not know what the existing loads are, let alone the load for proposed systems. Yet it is crucial for you to make a forecast as accurately as possible. An estimate must be made. The estimate must include not only the total IOPS or bandwidth, but also what percentage of the load is reads and what percentage is writes. Additionally, the predominant I/O size must be determined. Determine the I/O drive load This step requires the use of drive IOPS. To determine the number of drive IOPS implied by a host I/O load, adjust as follows for parity or mirroring operations:
Parity RAID 5: Drive IOPS = Read IOPS + 4*Write IOPS
Parity RAID 6: Drive IOPS = Read IOPS + 6*Write IOPS
<Continued> Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 26 Mirrored RAID 1/0: Drive IOPS = Read IOPS + 2*Write IOPS As an example, the default private RAID group of a RAID 1/0 pool is a (4+4). Assume a homogeneous pool with six private RAID groups. For simplicity, a single LUN is bound to the pool. Further assume the I/O mix is 50 percent random reads and 50 percent random writes with a total host IOPS of 10,000:
Drive IOPS = (0.5 * 10,000) + 2 * (0.5 * 10,000)
Drive IOPS = 15,000
For bandwidth calculations, when large or sequential I/O is expected to fill LUN stripes, use the following approaches, where the write load is increased by a RAID multiplier:
Parity RAID 5: Drive MB/s = Read MB/s + Write MB/s * (1 + (1 / (number of user data drives in group)))
Parity RAID 6: Drive MB/s = Read MB/s + Write MB/s * (1 + (2 / (number of user data drives in group)))
Mirrored RAID 1/0: Drive MB/s = Read MB/s + Write MB/s * 2
For example, the default private RAID group of a RAID 5 pool is a 5-drive 4+1 (four user data drives in the group). Assume the read load is 100 MB/s and the write load is 40 MB/s:
Drive MB/s = 100 MB/s + 40 MB/s * (1 + (1/4))
Drive MB/s = 150 MB/s
Determine the number of drives required for Performance Make a performance calculation to determine the number of drives in the storage system. Divide the total IOPS (or bandwidth) by the per-drive IOPS value provided in Table 9 for small-block random I/O and Table 29 for large-block random I/O. The result is the approximate number of drives needed to service the proposed I/O load. If performing random I/O with a predominant I/O size larger than 16 KB (up to 32 KB) but less than 64 KB, increase the drive count by 20 percent. Random I/O with a block size greater than 64 KB must address bandwidth limits as well. This is best done with the assistance of an EMC USPEED professional. A scripted version of these load formulas follows these notes. <Continued> Copyright © 2012 EMC Corporation.
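Here is a minimal scripted sketch of the drive-load arithmetic above. It is illustrative only: the per-drive IOPS figure must still come from the drive performance tables this guide references, and the 180 IOPS value used below is simply the 15K RPM SAS figure from the worked example on the next page.

import math

# Back-end drive load implied by a host workload, per the ROM formulas above.
WRITE_IOPS_MULTIPLIER = {"RAID5": 4, "RAID6": 6, "RAID10": 2}

def drive_iops(read_iops, write_iops, raid):
    return read_iops + WRITE_IOPS_MULTIPLIER[raid] * write_iops

def drive_mbps(read_mbps, write_mbps, raid, data_drives):
    # Bandwidth version; data_drives is the number of user data drives in the group.
    if raid == "RAID5":
        return read_mbps + write_mbps * (1 + 1 / data_drives)
    if raid == "RAID6":
        return read_mbps + write_mbps * (1 + 2 / data_drives)
    return read_mbps + write_mbps * 2            # RAID 1/0

def approx_drives_for_performance(total_drive_iops, per_drive_iops, io_size_kb):
    drives = total_drive_iops / per_drive_iops
    if 16 < io_size_kb < 64:                     # 20 percent uplift for larger random I/O
        drives *= 1.2
    return math.ceil(drives)

# The examples from these notes:
print(drive_iops(5_000, 5_000, "RAID10"))             # 15,000 drive IOPS
print(drive_mbps(100, 40, "RAID5", data_drives=4))    # 150.0 MB/s
print(approx_drives_for_performance(4_000, 180, 16))  # ~23 drives, before hot spares and system drives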
All rights reserved Module 4: Storage Design Best Practices 27 Determine the number of drives required for Capacity Calculate the number of drives required to meet the storage capacity requirement. Typically, the number of drives needed to meet the required capacity is fewer than the number needed for performance. Remember, the formatted capacity of a drive is smaller than its raw capacity. Add the capacity required for a Virtual provisioning pool to maintain the pool’s file system. This is the pool’s metadata overhead. Furthermore, the system drives require four drives, and it is prudent to add one hot spare drive per 30 drives (rounded to the nearest integer) to the drive count. Do not include the system drives and hot spare drives into the performance calculation when calculating the operational performance. Analysis Ideally, the number of drives needed for the proposed I/O load is the same as the number of drives needed to satisfy the storage capacity requirement. Use the larger number of drives from the performance and storage capacity estimates for the storage environment. Total performance drives Total Approximate Drives = RAID Group IOPS / (Hard Drive Type IOPS) + Large Random I/O adjustment + Hot Spares + System Drives For example, if an application was previously calculated to execute 4,000 IOPS, the I/O is 16 KB random requests, and the hard drives specified for the group are 15K RPM SAS drives (see Table 9 Small block random I/O performance by drive type): Total Approximate Drives = 4,000 / 180 + 0 + ((4,000 / 180) / 30) + 5 Total Approximate Drives = 28 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 28 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 29 VNX systems allow the creation of two types of (block) storage pool – RAID Groups and Pools. RAID Groups (RGs) are the traditional way to group disks into sets. Rules regarding the number of disks allowed in a RG, and the minimum/maximum number for a given RAID type are enforced by the system. Supported RAID types are RAID 1, RAID 1/0, RAID 3, RAID 5 and RAID 6. RAID Groups can be created with a single disk or as unprotected RAID 0 groups, though this is uncommon. Hot Spares consist of a single-disk RAID Group with a single LUN automatically created on it. Note that only Traditional LUNs can be created on a RG. Pools are required for FAST VP (auto-tiering), and may have mixed disk types (Flash, SAS and NLSAS). The number of disks in a Pool depends on the VNX model, and is the maximum number of disks in the storage system less 4. As an example, the VNX5700 has a maximum capacity of 500 disks, and a maximum Pool size of 496 disks. The remaining 4 disks are system drives, which cannot be part of a Pool. At present, only RAID 5, RAID 6 and RAID 1/0 are supported in Pools, and each tier will be one RAID type. Pools have metadata associated with them, and that Pool metadata decreases the amount of available space in the Pool. In the uppermost screenshot, a 4+1 RAID 5 Pool has been created with 600 GB SAS drives. Note that 5 GiB is allocated (and therefore unusable by LUNs) even though the Pool currently has no LUNs created on it. In the lower screenshot, 8 NL-SAS drives of 2 TB have been added as a 6+2 RAID 6 tier. Note that the 5 GiB of space is still consumed – Pool overhead. Copyright © 2012 EMC Corporation. 
All rights reserved Module 4: Storage Design Best Practices 30 Traditional LUNs, sometimes referred to in documentation as FLUs (FLARE Logical Unit, the legacy term), are created on RGs. They exhibit the highest level of performance of any LUN type, and are recommended where predictable performance is required. All LUNs in a RG will be of the same RAID type. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 31 Two different types of LUNs may be created on Pools – Thick LUNs and Thin LUNs. There are significant differences between them in terms of both operation and performance. When a Thick LUN is created, the entire space that will be used for the LUN is allocated; if there is insufficient space in the Pool, the Thick LUN will not be created. The slices that make up the Thick LUN each contain 1 GiB of contiguous Logical Block Addresses (LBAs). Because tracking happens at a granularity of 1 GiB, the amount of metadata is relatively low, and the lookups that are required to find the location of the slice in the Pool are fast. Because lookups are required, Thick LUN accesses will be slower than accesses to Traditional LUNs. Thin LUNs allocate 1 GiB slices when space is needed, but the granularity inside those slices is at the 8 KiB block level. Any 1 GiB slice will be allocated to only 1 Thin LUN, but the 8 KiB blocks will not necessarily be from contiguous LBAs. Oversubscription is allowed, so the total size of the Thin LUNs in a Pool can exceed the size of the available physical data space. Monitoring is required to ensure that out of space conditions do not occur. There is appreciably more overhead associated with Thin LUNs than with Thick LUNs and Traditional LUNs, and performance is substantially reduced as a result. The Pool LUNs in the screenshots were created, with size set to MAX, on the Pools shown 2 slides back. Note that the user space of the Pool LUN is not equal to the free space in the Pool – the mapping of slices and blocks consumes disk space. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 32 As mentioned, metadata is associated with the use of both Thick LUNs and Thin LUNs. The metadata is used to track the location of the data on the private LUNs used in the Pool structure. The amount of metadata depends on the size of the LUN, and may be slightly higher (proportionally) for smaller LUNs – those under around 250 GiB. Thin LUNs will consume around 1 GiB more space than a Thick LUN of the same size. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 33 Thin LUNs should be positioned in Block environments where space saving and storage efficiency outweigh performance as the main goals. Areas where storage space is traditionally over allocated, and where the Thin LUN “allocate space on demand” functionality would be an advantage, include user home directories and shared data space. If FAST VP is a requirement, and Pool LUNs are being proposed for that reason, it is important to remember that Thick LUNs achieve better performance than Thin LUNs. Be aware that Thin LUNs are not recommended in certain environments. Among these are Exchange 2010, and VNX file systems. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 34 Space is assigned to Thin LUNs at a granularity of 8 KiB (inside a 1 GiB slice). 
The implication here is that tracking is required for each 8 KiB piece of data saved on a Thin LUN, and that tracking involves capacity overhead in the form of metadata. In addition, since the location of any 8 KiB piece of data cannot be predicted, each data access to a Thin LUN requires a lookup to determine the data location. If the metadata is not currently memory-resident, a disk access will be required, and an extended response time will result. This makes Thin LUNs appreciably slower than Traditional LUNs, and slower than Thick LUNs. If a Pool with Thin LUNs has a Flash tier, metadata will be relocated to Flash, and LUN performance will improve. Because Thin LUNs make use of this additional metadata, recovery of Thin LUNs after certain types of failure (e.g. cache dirty faults) will take appreciably longer than recovery for Thick LUNs or Traditional LUNs. A strong recommendation, therefore, is to place mission-critical applications on Thick LUNs or Traditional LUNs. In some environments – those with a high locality of data reference – FAST Cache may help to reduce the performance impact of the metadata lookup. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 35 In summary, note that:
• The use of Thin LUNs is contra-indicated in some environments. One of those environments is VNX File, where Thin LUNs should not be used.
• Thin LUNs should never be used where high performance is an important goal.
• Pool space should be monitored carefully (Thin LUNs allow Pool oversubscription whereas Thick LUNs do not).
• The system issues an alert when the consumption of any pool reaches a user-selectable limit. By default, this limit is 70%, which allows ample time for the user to take any corrective action required.
Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 36 In the next slides we’ll take a look at metaLUNs in detail. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 37 A metaLUN is seen by the host as a single SCSI LUN – the individual LUNs that make up a Volume Group will each be seen as a SCSI LUN. Using only 1 SCSI LUN may be an advantage. If the host renumbers LUNs when new LUNs are added, and especially if a system restart is required (possibly after a kernel rebuild), then a metaLUN may be a better choice. If the host does not support a Volume Manager, or does not support a Volume Manager used in conjunction with PowerPath, metaLUNs may fit the bill. VNX Replication Software sees a metaLUN as a single LUN, so any replication will be simpler on a metaLUN than on a Volume Group. As noted before, Volume Managers will automatically multithread large I/O requests to a Volume Group. metaLUNs will not, so use a Volume Manager if this feature is a requirement. Dedicated LUNs may be a better choice when LUN I/O patterns differ widely; mixing random and sequential I/O on the same physical disks is never optimal. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 38 Three metaLUN examples are shown here. metaLUN 1 is a striped metaLUN, made up of 4 identical LUNs. Data is striped across all 4. Note the order of the data on the diagram. The striping will take a while to complete if LUN 0 is already populated. metaLUN 2 is a concatenated metaLUN. Though all 4 LUNs are shown as the same size, there is no requirement that they be the same size; in this case they happen to be.
Data fills LUN 0, then LUN 1, LUN 2, and finally LUN 3. Expansion by concatenation is immediate, but may produce suboptimal results. metaLUN 3 is a hybrid metaLUN, made up of two striped components concatenated together. LUNs 0 and 1 are striped, and are therefore the same size and RAID type. Similarly, LUNs 2 and 3 are striped, and will be the same size and RAID type, though there is no requirement that they be the same size or RAID type as LUNs 0 and 1. The two pairs of LUNs are then concatenated together, an instantaneous process. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 39 This slide shows an example of four 5-disk RAID 5 LUNs striped into a single metaLUN. The Base LUN, LUN 0, is shown on top, with each 64 KB data element shown as Data 00, Data 01, etc. The parity element for each stripe is also shown, though it doesn’t contribute to the calculation of stripe size. There are 4 data elements per stripe, of 64 KB each (the default), for a data stripe size of 256 KB. Only 4 stripes are shown for the sake of clarity. Because the Base LUN is a 5-disk RAID 5 LUN, the metaLUN that uses it should have the Element Size Multiplier set to 4. This means that 1 MB (the 256 KB stripe times the multiplier of 4) of data will be written to each LUN before data is written to the next LUN in turn. As shown at the bottom of the slide, 16 elements, Data 00 through Data 15, are written to LUN 4095 (remember that component LUNs are renumbered when creating a metaLUN), then the next 16 elements are written to LUN 4094, and so on. 16 elements are 1 MB of data. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 40 Component LUNs should be selected carefully for use with metaLUNs; some recommendations are shown above. metaLUNs may be of 3 different types. The next slide shows examples. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 41 The metaLUN stripe segment size is the largest I/O that will be sent to a Component LUN. Setting the stripe element size multiplier is a compromise between the need for large stripes (for bandwidth) and small stripes (to distribute bursty I/O across all Component LUNs). The multiplier needs to be large enough that if the RAID Groups on which Component LUNs are built are expanded (by adding physical disks), I/O can still fill a stripe to allow MR3 writes. Using metaLUNs with slower drives is not generally recommended. If it must be done, remember to keep the RAID Groups small, bear rebuild times in mind, and avoid random I/O. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 42 Striping of metaLUNs will typically produce better performance than concatenation. Where possible, the component LUNs should have equal-sized RAID groups of disks of the same speed. If concatenation is required, the best practice will be to concatenate striped components, to form a hybrid metaLUN. If metaLUNs are made up of LUNs from the same RAID Groups, the base LUN may be selected in a rotating manner, to further spread the load evenly across the component LUNs. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 43 A typical Flash drive has components similar to these. The core of the drive is the Flash controller and the array of Flash RAM components. Because there are multiple paths to the Flash RAM, multiple operations can be performed simultaneously.
This is especially true of reads; writes require additional processing because of the wear leveling or write leveling (2 terms with the same meaning) feature implemented on these drives. As a result of the multiple channels, multithreaded accesses are especially efficient on Flash drives. The internal data organization of Flash drives is different to that of electromechanical disks; that organization, and the internal operation of the drives, makes them very different from traditional disks. It is especially important to note that while mechanical disks are regarded as random-access devices, some accesses are more expensive than others. Reads or writes involving long seeks take longer than the same operations with short seeks. This is not true of Flash drives, which are true random-access devices; performance is uniform over the entire data space. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 44 This example of Flash drive data organization shows that data is divided into 4 KiB pages (some drives will have pages of a different size, e.g. 16 KiB), where a page is the smallest amount of data that a Flash drive can read or write, compared to the 512 B for an electromechanical drive. Writing to a page can only occur when the page is clean (contains no data); modifying an existing page requires that the data be written elsewhere (implemented by write leveling), or that the entire 512 KiB block be erased. A block is the smallest amount of space that can be erased. As the Flash drive fills, it will need to make space for new writes by erasing unused blocks. This block erasure is slow (typically milliseconds), and contributes to the slowdown in performance as Flash drives become full. A garbage collection routine runs on the drive to collect and clean previously used pages; this may also involve consolidation of partly full blocks. Excessive writes directly to the Flash RAM are reduced by consolidating data into DRAM on the drive. This DRAM is made persistent by backing it up with a battery and super-capacitors. Note that blocks are further combined into planes, and that there may be multiple planes on a single die, and potentially multiple dies on a single IC package; this level of organization does not concern us here, though. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 45 Flash drives may be provisioned together in enclosures with any other type drive. In addition, there are no restrictions on the number of Flash drives allowed in any VNX storage system. The drives are high-performance devices, though, and can easily saturate a backend port because of their superior bandwidth and throughput. It is recommended that no more than 12 Flash drives be allocated to a backend port when high bandwidth is the goal, and no more than 5 Flash drives be allocated to a backend port when high throughput is the goal. Note also that RAID 5 gives the best overall ratio of user data to data protection, and is recommended for use where appropriate. Because Flash drive reads are so much faster than writes, the best performance improvement, compared to HDDs, will be seen in environments where the ratio of reads to writes is high. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 46 Flash drives have vastly better random read performance than SAS drives due to the multiple paths to the Flash RAM. 
Throughput is particularly high with I/O sizes of 32 KB or less; it decreases as the block size becomes larger than the Flash drive page size. With four or more threads of large-block reads, Flash drives have up to twice the bandwidth of SAS HDDs, due to the absence of seek time. Flash drives have superior random write performance to SAS HDDs; throughput decreases with increasing block size. When writes are not cached by the SP RAM cache, Flash drives are somewhat slower than SAS HDDs with single-threaded writes; bandwidth improves with thread count. Note that if Flash drives have SP write cache enabled, small-block sequential performance will improve due to cache coalescing. This will potentially allow full-stripe writes to the Flash LUNs. Avoid deploying Flash drives on small-block workloads such as data logging. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 47 Flash drives are indicated for use where HDDs do not have sufficient performance to meet the demands of the environment, particularly where low response times are a requirement. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 48 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 49 This slide covers the use of Flash drives in RAID Groups or dedicated, homogeneous Pools. This mode of deployment ensures peak performance from the drives; note, though, that the benefit is limited to the individual RG or Pool. If only a small number of Flash drives is available, they are often better employed in FAST Cache configurations. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 50 No discussion of FAST Cache and FAST VP is complete without a discussion of data skew, usually simply referred to as skew. Skew is the percentage of total load at the percentage of total capacity where the sum of those values is 100%. The dashed line shows a perfectly linear distribution of workload over the available data area. It is clear that the total load would be 50% at the 50% capacity mark, so the skew would be 50%. In the case of the bold, solid line, 90% of the workload is distributed over 10% of the capacity, for a skew of 90%. Skew values lower than 50% are meaningless (and are the same as 100% minus that value). Data with skew values around the 50% mark is regarded as data with no skew. The calculation of skew, at the LUN or slice level, uses tools available to EMC employees only. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 51 FAST Cache manages a map of pages located on flash drives. Data is cached by promoting it from hard disk drives to flash drives, which improves response time – especially for read misses that would otherwise have to be serviced by the hard drives. Dirty pages are asynchronously written back to hard disk drives when cleaning (demotion) takes place; this optimizes the writes. FAST Cache provides a much larger, scalable second-level cache. The available capacity for FAST Cache is divided equally between SPA and SPB. For example, if 400 GB of FAST Cache is configured (4x 200 GB drives, or 8x 100 GB drives), the available capacity will be around 366 GiB, and SPA and SPB will each be allocated 183 GiB. Unlike FAST VP, FAST Cache works equally well with Pool LUNs and RAID Group LUNs. Bear in mind, though, that enabling and disabling FAST Cache occurs at the LUN level for RAID Group LUNs, but at the Pool level for other LUNs. The capacity arithmetic from the 400 GB example is sketched below.
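As a quick illustration of the capacity arithmetic in the 400 GB example above (a rough sketch only: the exact formatted-capacity overhead varies, which is why the guide's ~366 GiB figure is a little below the raw conversion):

# FAST Cache usable capacity: the drives are configured as RAID 1 mirrored
# pairs, so usable capacity is half the raw capacity, split evenly between SPs.
GIB = 2**30

def fast_cache_split(drive_count, drive_gb):
    raw_bytes = drive_count * drive_gb * 10**9
    mirrored_gib = raw_bytes / 2 / GIB      # RAID 1 halves the raw capacity
    return mirrored_gib, mirrored_gib / 2   # (total GiB, GiB per SP)

total_gib, per_sp_gib = fast_cache_split(4, 200)   # the 4x 200 GB example above
print(round(total_gib, 1), round(per_sp_gib, 1))   # ~372.5 and ~186.3 GiB before
# formatting overhead; the guide quotes ~366 GiB total, i.e. ~183 GiB per SP,
# once the drives' formatted capacity is taken into account.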
Disable FAST Cache for private LUNs, except metaLUN components. The CPL and WIL LUNs already have optimizations that keep them cache-resident, and RLP LUNs are unlikely to benefit from FAST Cache. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 52 The benefits of FAST Cache will not be evident in all environments. This slide lists some of the factors to be aware of when proposing FAST Cache. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 53 Fast Cache is a system wide resource, is easy to set up and can allocate up to 2 TB to cache (read or read/write). Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 54 FAST Cache and/or FAST VP should not be proposed if: • Customer data has no skew. Skew means that I/Os to a LUN are not evenly distributed over the entire data area. Some areas are very busy, while others may be accessed very infrequently. An example of skew is when 5% of the customer data generates 95% of all I/Os. • The customer cannot tolerate a false positive. A false positive means that critical data was placed on a slower tier when it was needed on the fastest tier. • The customer has unrealistic expectations. For example, customer data has traditionally been spread across 100s of 15K drives, and the goal is to replace this with 2 Flash drives in FAST Cache. In cases such as these, homogeneous pools, or the “Highest Available Tier” policy for heterogeneous pools, should be proposed. Finally, in cases where there is skew, do NOT under size the FLASH tier. If 5% of all data is responsible for 95% of all IOs, have 1% at least in FAST Cache and then 5-7% as a Flash tier. Tools such as Pool Sizer and Tier Advisor can help. Where the tools are limited to use by USPEED members, contact your local USPEED member for help. FAST VP should be given time to learn the environment. This is an important expectation to set with customers, especially important when data is migrated from a high disk count environment. There is a huge initial difference between a 100% 15K system and a 5%-20%-75% VNX with FAST VP. It can take FAST VP several days to learn which data is hot and get these slices moved to the higher tiers. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 55 Production workloads often show spatial and/or temporal locality, and thus lend themselves to the use of FAST Cache and/or FAST VP. This is not generally true of benchmark data, and FAST Cache and FAST VP may show little benefit in benchmark environments. This is true of both file and block environments, and makes prediction of performance improvement, as well as demonstration of the benefit of the feature, a difficult task. In addition, there are insufficient tools available to the general field community to aid in the design, implementation and troubleshooting of advanced VNX features such as FAST VP, FAST Cache and compression. As is the case with FAST VP, the “learning” process takes time. The warm-up phase for FAST Cache can take from several minutes to several hours, and this should be taken into account. Workloads that perform sequential activity, especially where I/O sizes are small, are poor candidates for FAST Cache. Be aware that some “housekeeping” activities performed by applications may match this profile, and may cause pollution of FAST Cache. 
Some of this cache pollution is avoided by the sequential data detection added to FAST Cache in the 05.32 VNX OE for Block release. For improved data availability, allocate drives that make up a FAST Cache RAID 1 pair to different back-end ports. This may also have a positive effect on FAST Cache performance. Note that, as a general rule, Flash drives should be spread across multiple buses when the number of Flash drives in use demands it. More than 5 Flash drives can saturate a back-end port, so consider drive placement carefully when using FAST Cache, FAST VP or even Flash-only storage pools. Using Flash drives on Bus 0 is acceptable. DO NOT, however, use Enclosure 0,0 for only one Flash drive in a RAID 1 pair. FAST Cache does not proactively flush the dirty chunks back to the disks but will only flush when it needs capacity for future promotions. This can cause problems where a workload changes rapidly and FAST Cache can not react quickly enough. If a customer needs to disable or resize the FAST Cache, it may take a considerable length of time to de-stage the dirty chunks before FAST Cache can be disabled. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 56 The amount of memory (RAM) per SP, and the maximum SP write cache size, differs for the different VNX models. Advanced software features, such as FAST Cache, FAST VP, Thin LUNs, and Data Compression, consume SP RAM and therefore reduce the amount of available SP write cache. The amount of RAM consumed depends on the VNX model, the features implemented, and the size of FAST Cache, and will be up to 29% for FAST Cache, and between 23% and 37% for any combination (one or more) of the other advanced features. It should be noted that the reduction in available SP write cache is not an indicator that performance will decrease; typically, it is more than offset by the increased performance and efficiency offered by the features that are implemented. For example, although not included in the system cache size metric, significant amounts of memory used by these features are used in caching data used by those features. The rate of cache flushing is a more important metric than write cache size. If a system is already ‘on the edge’ with the cache the right thing to do in any case is to adjust the design to reduce that cache pressure – identify the offending LUNs, and migrate them to better RAID types or faster drives before the advanced features are added to the system. This is in accordance with our best practice recommendation that you have under 60% saturation before attempting to add FAST Cache, as one example. The VNX7500, configured with 48 GiB memory per SP, does not see a major impact in cache size when FAST Cache or FAST VP are used – the larger memory was added to improve performance with those features. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 57 The user can adjust write cache to the previous free page limit by adjusting the watermarks. Both HWM and LWM will need to be adjusted. As an example, if the previous write cache size was 10,000 MiB and the HWM/LWM were configured to 80% and 60%, the write cache headroom was 2,000 MiB. If adding a software feature reduces write cache size by 30%, the new size is 7,000 MiB. To maintain 2,000 MiB headroom, the HWM will have to change to 5,000/7,000 MiB, which is 71%. The LWM can then be configured as around 20% lower, in the 51% region. 
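The watermark arithmetic above is easy to script. The following is an illustrative sketch only; it simply preserves the free-page headroom, in MiB, when the write cache shrinks, as in the 10,000 MiB to 7,000 MiB example.

# Recompute the write cache high/low watermarks so that the free-page headroom
# (in MiB) stays the same after advanced features reduce the write cache size.
def new_watermarks(old_size_mib, old_hwm_pct, new_size_mib, hwm_lwm_gap_pct=20):
    headroom_mib = old_size_mib * (1 - old_hwm_pct / 100)
    new_hwm = round((1 - headroom_mib / new_size_mib) * 100)
    new_lwm = new_hwm - hwm_lwm_gap_pct
    return new_hwm, new_lwm

# Example above: 10,000 MiB at HWM 80% leaves 2,000 MiB of headroom; after a
# 30% reduction to 7,000 MiB, the HWM becomes ~71% and the LWM ~51%.
print(new_watermarks(10_000, 80, 7_000))   # -> (71, 51)

On the lower-end models discussed next, a smaller HWM/LWM gap (around 10) would be passed instead.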
In the lower end of the VNX model range, where the available write cache is a smaller quantity, the difference between the HWM and the LWM may need to be kept smaller (around 10%) to force cache to be flushed more frequently. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 58 FAST Cache and/or FAST VP are supported for all applications. Though their use will typically lead to improved performance for the application, this may not always be the case. Some applications do not exhibit data skew, and some will have background activity which interferes with the FAST Cache and FAST VP statistics gathering. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 59 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 60 This lesson covers designing for File-only environments. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 61 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 62 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 63 When selecting dVols (LUNs) from Traditional RAID groups for use in a pool entry, AVM selects disk volumes such that all LUNs must be from the same storage system and from different RAID groups. Having LUNs come from different RAID groups increases the number of spindles in use and avoids head contention on the disks. All LUNs must also have the same RAID configuration (for example, 4+1 R5) and the same size, thus helping to ensure that all parts of the file system have the same performance potential as well as availability characteristics. If more LUNs are available than needed, the RAID groups with the least utilized LUNs are used first. Utilization is defined as the number of LUNs used by the VNX OE for File in the RAID group divided by the number of LUNs visible to VNX OE for File in the RAID group. After selection based on utilization, if more LUNs are available than are needed, LUNs are chosen in a way that dVols will come from different back-end buses, then SP balancing will be used. If more LUNs are available than are needed, LUNs are chosen such that dVols with higher IDs are selected first in order to avoid the dVols with the lowest IDs. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 64 When selecting dVols (LUNs) from Traditional RAID groups for use in a pool entry, AVM selects disk volumes such that all LUNs must be from the same storage system and from different RAID groups. Having LUNs come from different RAID groups increases the number of spindles in use and avoids head contention on the disks. All LUNs must also have the same RAID configuration (for example, 4+1 R5) and the same size, thus helping to ensure that all parts of the file system have the same performance potential as well as availability characteristics. If more LUNs are available than needed, the RAID groups with the least utilized LUNs are used first. Utilization is defined as the number of LUNs used by the VNX OE for File in the RAID group divided by the number of LUNs visible to VNX OE for File in the RAID group. After selection based on utilization, if more LUNs are available than are needed, LUNs are chosen in a way that dVols will come from different back-end buses, then SP balancing will be used. 
If more LUNs are available than are needed, LUNs are chosen such that dVols with higher IDs are selected first in order to avoid the dVols with the lowest IDs. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 65 Continuing with the methodology described in the previous slide, AVM attempts to create pool entries using 4 dVols. If AVM cannot create a four dVol pool entry, it will attempt to create a three dVol pool entry, then a two, and then finally use a single dVol. AVM will fully consume pool volumes created from four dVols before considering a pool volume containing three disk volumes. This is another method that AVM uses to distribute the load among equally sized pool entries instead of “stacking” file systems on the first available pool entry. EFDs have multiple internal channels that can simultaneously service up to 16 concurrent I/Os. To reach peak EFD performance, all of these channels must be kept busy. This requires that both VNX SPs have access to the EFDs, and that there are multiple I/O queues in front of the EFDs. To accomplish this, you should bind multiple LUNs within an EFD RAID Group, and balance ownership of these LUNs across the SPs. It is acceptable to stripe across all dVols from the same EFD RAID Group (RG), because of the physical structure of Flash drives. AVM will stripe together all dVols of the same size from up to two RGs to create a pool entry. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 66 File systems impose unique performance requirements on the LUNs on which they are built. In order to optimize the performance of file systems, it is recommended that they be built on Traditional (RAID Group) LUNs or Thick (Pool) LUNs only. The use of Thin LUNs is strongly discouraged. Where Pool LUNs are used, it is recommended that the entire pool be used for file system storage, and not shared between File and Block access. To keep performance consistent, and to allow support for slice volumes, the tiering policy should be identical on all Thick LUNs in a pool used for File storage. Additional recommendations are that there should be 1 Thick LUN for each 4 physical disks in the pool, and that the LUN count should be divisible by 10 in order to make striping easier, and balance LUNs across SPs. AVM can make use of Pools and RAID Groups. The storage provisioning activity needs to be performed manually – creating Pool LUNs, assigning them to the Storage Group, and running a Rescan. After these provisioning steps, AVM can be used. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 67 File systems impose unique performance requirements on the LUNs on which they are built. In order to optimize the performance of file systems, it is recommended that they be built on Traditional (RAID Group) LUNs or Thick (Pool) LUNs only. The use of Thin LUNs is strongly discouraged. Where Pool LUNs are used, it is recommended that the entire pool be used for file system storage, and not shared between File and Block access. To keep performance consistent, and to allow support for slice volumes, the tiering policy should be identical on all Thick LUNs in a pool used for File storage. Additional recommendations are that there should be 1 Thick LUN for each 4 physical disks in the pool, and that the LUN count should be divisible by 10 in order to make striping easier, and balance LUNs across SPs. AVM can make use of Pools and RAID Groups. 
The storage provisioning activity needs to be performed manually – creating Pool LUNs, assigning them to the Storage Group, and running a Rescan. After these provisioning steps, AVM can be used. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 68 When volumes are created from Pool LUNs, there are a number of recommendations: • • • • Stripe LUNs rather than using concatenation Use 5 LUNs per stripe (LUN count in pool divisible by 10 as noted on previous slide) Use a stripe size of 256 KiB Choose stripe LUNs in such a way that SP ownership of the first LUN alternates in stripes Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 69 At present there are performance issues when file systems are created on Pool LUNs. The general recommendation is to use Traditional LUNs for file data unless FAST VP is a requirement. A Unisphere wizard makes the creation of a file storage pool simpler if Traditional LUNs are used. If Pool LUNs are used, use Thick LUNs rather than Thin LUNs, and configure those Thick LUNs to the same tiering policy. Thin LUNs are not recommended for VNX File data. If thin provisioning is a requirement for File storage, use the auto extension feature on file systems built on top of Traditional or Thick LUNs. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 70 The guideline, when using advanced features such as compression, is to implement them at the layer that has the most knowledge of the data structure. The user or administrator will have to choose whether to use file-based or block-based compression based on knowledge of the data, and whether it is repetitive or likely compressible in nature. If the data is shared with end users via an NFS export or CIFS share, then file-based compression should be used. Note that the name “File Compression and Deduplication” is the VNX name for the product referred to on the previous platform as “Celerra Deduplication”. If the data is assigned to hosts through Fibre Channel, iSCSI, or FCoE connections, then block level compression should be used. Remember that block level compression converts Traditional LUNs and Thick LUNs to Thin LUNs via an internal migration, as part of the compression process. Since Thin LUNs are not recommended for file system use, block level compression should be avoided when using file systems. The same will apply to environments using applications where Thin LUNs are not recommended. Certain types of block level data lend themselves to compression: • Sharepoint BLOB (Binary Large OBject) externalization to block storage • Atmos/VE, where Atmos is the archival storage target, and its storage is provisioned on compressed LUNs • VMware VM template repositories, which are read-mostly structures • Archive files In block environments with compression, small random reads achieve the best performance, while small, random writes are most expensive. The latter causes data to be decompressed and then overwritten, and the process will add to the response time of the write. The data access pattern expected from archive-type environments involves small, random reads. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 71 As noted before, LUNs that have the compression feature turned on are migrated to Thin LUNs, with the associated performance impact noted previously. 
Decompression of a compressed LUN is possible; note, though, that a Traditional LUN which is compressed and then decompressed (by disabling compression) does not return to being a Traditional LUN; instead, it is a fully-provisioned Thin LUN. Returning to a Traditional LUN will require additional steps, e.g. a LUN migration. Areas of a compressed LUN that are accessed must be decompressed for reads and/or writes to take place. This adds additional performance overhead. Note that this is a decompression, in memory, of only the accessed portion of the data, not a full decompression of the entire LUN. Improvements in the way compression is performed on VNX systems has led to performance that exceeds that found on the previous generation CX systems. This performance improvement is achieved in both throughput-oriented (typically small-block random) and bandwidth-oriented (typically large-block sequential) environments. Thick LUNs will still perform better than compressed LUNs. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 72 In summary: Implement compression at the level that the data is used – file or block. Be aware of the data type and the data access pattern before implementing compression. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 73 The restrictions applicable to AVM are: Create a file system by using only one storage pool. If you need to extend a file system, extend it by using either the same storage pool or by using another compatible storage pool. Do not extend a file system across storage systems unless it is absolutely necessary. File systems might reside on multiple disk volumes. Ensure that all disk volumes used by a file system reside on the same storage system for file system creation and extension. This is to protect against storage system and data unavailability. LUNs that have been added to the file-based storage group are discovered during the normal storage discovery (diskmark) and mapped to their corresponding storage pools on the VNX for file. If a pool is encountered with the same name as an existing user-defined pool or system-defined pool from the same VNX for block system, diskmark will fail. It is possible to have duplicate pool names on different VNX for block systems, but not on the same VNX for block system. Names of pools mapped from a VNX for block system to a VNX for file cannot be modified. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 74 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 75 Automatic file system extension does not work on MGFS, which is the EMC file system type used while performing data migration from either CIFS or NFS to the VNX system by using VNX File System Migration (also known as CDMS). Automatic extension is not supported on file systems created with manual volume management. You can enable automatic file system extension on the file system only if it is created or extended by using an AVM storage pool. Automatic extension is not supported on file systems used with TimeFinder/FS NearCopy or FarCopy. While automatic file system extension is running, the Control Station blocks all other commands that apply to this file system. When the extension is complete, the Control Station allows the commands to run. The Control Station must be running and operating properly for automatic file system extension, or any other VNX feature, to work correctly. 
Automatic extension cannot be used for any file system that is part of a remote data facility (RDF) configuration. Do not use the nas_fs command with the -auto_extend option for file systems associated with RDF configurations. Doing so generates the error message: Error 4121: operation not supported for file systems of type EMC SRDF®. The options associated with automatic extension can be modified only on file systems mounted with read/write permission. If the file system is mounted read-only, you must remount the file system as read/write before modifying the automatic file system extension, HWM, or maximum size options. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 76 Enabling automatic file system extension and thin provisioning does not automatically reserve the space from the storage pool for that file system. Administrators must ensure that adequate storage space exists, so that the automatic extension operation can succeed. When there is not enough storage space available to extend the file system to the requested size, the file system extends to use all the available storage. For example, if automatic extension requires 6 GB but only 3 GB are available, the file system automatically extends to 3 GB. Although the file system was partially extended, an error message appears to indicate that there was not enough storage space available to perform automatic extension. When there is no available storage, automatic extension fails. You must manually extend the file system to recover from this issue. Automatic file system extension is supported with EMC VNX Replicator. Enable automatic extension only on the source file system in a replication scenario. The destination file system synchronizes with the source file system and extends automatically. Do not enable automatic extension on the destination file system. When using automatic extension and thin provisioning, you can create replicated copies of extendible file systems, but to do so, use slice volumes (slice=y). iSCSI virtually provisioned LUNs are supported on file systems with automatic extension enabled. Automatic extension is not supported on the root file system of a Data Mover or on the root file system of a Virtual Data Mover (VDM). Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 77 With thin provisioning enabled, the NFS, CIFS, and FTP clients see the actual size of the VNX Replicator destination file system while they see the virtually provisioned maximum size of the source file system. Thin provisioning is supported on the primary file system, but not supported with primary file system checkpoints. NFS, CIFS, and FTP clients cannot see the virtually provisioned maximum size of any EMC SnapSure™ checkpoint file system. If a file system is created by using a virtual storage pool, the -thin option of the nas_fs command cannot be enabled. VNX for file thin provisioning and VNX for block thin provisioning cannot coexist on a file system. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 78 Use RAID group-based LUNs instead of pool-based LUNs to create system control LUNs. Pool-based LUNs can be created as thin LUNs or converted to thin LUNs at any time. A thin control LUN could run out of space and lead to a Data Mover panic. VNX for block mapped pools support only RAID 5, RAID 6, and RAID 1/0: • RAID 5 is the default RAID type, with a minimum of three drives (2+1). 
Use multiples of five or nine drives. • RAID 6 has a minimum of four drives (2+2). Use multiples of eight or sixteen drives. • RAID 1/0 has a minimum of two drives (1+1). Eight drives per group are recommended. EMC Unisphere™ is required to provision virtual devices (thin and thick LUNs) on the VNX for block system. Any platforms that do not provide Unisphere access cannot use this feature. You cannot mix mirrored and non-mirrored LUNs in the same VNX for block system pool. You must separate mirrored and non-mirrored LUNs into different storage pools on VNX for block systems. If diskmark discovers both mirrored and non-mirrored LUNs in the same pool, diskmark will fail. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 79 Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 80 This lesson covers designing for mixed Block and File environments. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 81 Designing for a mixed environment is little different from designing for a file-only environment. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 82 This lesson covers the design of environments for specific applications. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 83 Exchange 2010 was designed to use large, slow drives, and minimizes access to the physical disks. As a result, FAST Cache is only useful if very high levels of performance are required. Jetstress, used in testing during Exchange deployment, has poor data locality, so FAST Cache is not likely to provide any deterministic performance improvement with Exchange 2010. BDM (Background Database Maintenance), a regular part of Exchange implementations, pollutes FAST VP statistics collection and ranking. Homogeneous Pools, or Traditional LUNs, will not exhibit this effect, and are recommended. The use of “Highest Tier Available” data placement for Exchange data may reduce the effect of BDM on FAST VP LUNs. The use of Thin Pool LUNs should be avoided with Exchange 2010. If Thick Pool LUNs are used, users should be aware that Jetstress causes data to be allocated to LUNs unevenly, causing the initial performance of some LUNs to be poorer than that of others. This could cause Jetstress to report a failure. To work around this, engineering has developed a utility called “SOAPTool” which forces even distribution of the data. If using Thick Pool LUNs, use the SOAPTool utility to ensure optimal performance. Alternatively, Traditional (RAID Group) LUNs may be used instead. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 84 The test configuration was: • 20 SAS spindles (16 for DB, 4 for Log) in a RAID 1/0 configuration • 4 x 100 GB Flash drives in FAST Cache (50% of the working set). Maximum supported TPS and average response time (Ravg) figures, measured at the point where the next sample breached the gating metric of a 2-second Ravg: • SAS only – 1,448 TPS at 0.8 s Ravg • SAS + FAST Cache – 5,778 TPS at 1.95 s Ravg. A VNX5700 was used in this testing, and was configured as shown in the slide. The chart shows SQL Server performance when running on SAS drives only, and compares this to an environment where FAST Cache has been added. Performance levels off as the system saturation point is reached (vertical blue lines in the chart), but at a point which is considerably higher for the FAST Cache-enabled tests. Copyright © 2012 EMC Corporation.
All rights reserved Module 4: Storage Design Best Practices 85 FAST Cache requires a warm-up period before it will show optimal performance improvement. Once the warm-up period has elapsed, the benefit is persistent across server or SP reboots. The SP write cache will need some time to reach optimal efficiency (in terms of rehits, etc) after a reboot. In OLTP environments, Pool LUNs show a reduction in performance of around 14% when compared to Traditional LUNs. FAST Cache will improve performance significantly in this environment. Note that if best performance is required from FAST VP, initial data placement should be set to “Highest Available Tier”. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 86 In virtual desktop environments, FAST Cache has demonstrated a significant performance increase over the use of magnetic disk drives alone. This means that slower, less costly disks can be used in this environment. FAST Cache will service much of the I/O load in the boot phase, and will absorb much of the post-boot load. This type of environment, with high data locality, lends itself to the use of FAST Cache. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 87 This slide shows the throughput of the LUNs (on Flash drives) used to hold the operating system image. LUN names are EFD_Replica LUN 1 and EFD_Replica LUN 2. Note the high level of activity during the boot phase. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 88 With FAST Cache implemented, I/O activity caused by booting and steady-state user load is serviced largely from FAST Cache. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 89 Here is a look at a different virtual desktop environment. Note that the I/O pattern is very similar to that seen in previous slides, with a very high level of disk activity at boot time, and much less at steady state. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 90 In environments such as this one, there is likely to be very high locality of data reference due to common user boot configurations. That is apparent here – the first boot storm loads the boot replica into FAST Cache very quickly, thereby minimizing the FAST Cache warm-up time. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 91 SharePoint data is typically stored in a SQL database. For structured data, this is efficient, and makes good use of the SQL query mechanism. For other, unstructured data, such as files, this is not an optimal use of a SQL Server database. SharePoint therefore allows BLOB (Binary Large OBject) data to be stored external to the SQL Server database. An External BLOB Store (EBS) Provider must be installed on each application Web server in the farm; this Provider allows the use of external file-based storage for up to 80% of the data in typical environments. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 92 For the most frequent operation – browsing – externalized BLOB storage is faster than regular database storage. Note also that there is only a slight difference between the performance of SAS and NL-SAS drives in this environment. Copyright © 2012 EMC Corporation. 
All rights reserved Module 4: Storage Design Best Practices 93 This slide compares a traditional (SQL Server) and an externalized implementation of BLOB storage. Note the saving in disk count, as well as the opportunity to use fewer disks for the same workload. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 94 Historically, DSS workloads have not been a sweet spot for the CX4, except possibly for the CX4-960. The VNX changes this position. This solution does not leverage Flash drives or the FAST Suite, as the large sequential workloads do not lend themselves to this technology. The huge improvements in total throughput (particularly in the lower-end platforms) can drive up to 4.5x the bandwidth on an apples-to-apples basis. The CX4-120 can achieve around 750 MB/s and the VNX5300 can achieve around 3,500 MB/s! The cost of the comparable configuration (Block-only VNX5300) is 84% higher than the CX4; however, to achieve the throughput provided by the VNX5300 with a CX4 would require a CX4-960 platform, which would be considerably more expensive than the VNX5300. The slide shows the VNX5300 being able to scale performance 4.5x compared to the upper limit of its predecessor, the CX4-120. VNX systems are ideal for Oracle DSS environments. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 95 This solution leverages Flash drives and the FAST Suite – the workload lends itself to this technology. The slide shows the typical improvement that can be achieved – 5x the performance from a platform that is marginally more expensive (as configured) than the previous generation system it is compared with. As was the case with Oracle DSS, VNX systems are ideal for Oracle OLTP environments. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 96 This slide summarizes cases where Pools, FAST Cache and FAST VP may be beneficial. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 97 This lesson covers storage design for virtualized environments. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 98 Virtual environments behave in much the same way as physical environments, especially at the VM level. Guidelines for virtual environments are mentioned on the slide. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 99 Questions 1 to 3. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 100 Questions 1 to 3. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 101 Questions 4 and 5. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 102 Questions 4 and 5. Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 103 This module covered best practices for VNX system designs, and best practices for specific host and application environments. See the following document for more information: H10938, EMC VNX Unified Best Practices for Performance – Applied Best Practices Guide (August 2012). Copyright © 2012 EMC Corporation. All rights reserved Module 4: Storage Design Best Practices 104 This module focuses on BC design best practices. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 1 This lesson covers local replication. Copyright © 2012 EMC Corporation.
All rights reserved Module 5: BC Design Best Practices 2 VNX SnapView snapshots make efficient use of space – the technology is pointer-based, and only chunks that have changed occupy space in the Reserved LUN Pool. Those chunks are copied by the Copy on First Write (COFW) process, which can have a significant performance impact on the source LUN. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 3 The existence of a Snapshot has a direct effect on the read performance of the Source LUN. COFW activity reads Source LUN data in 64 KiB chunks; this COFW may be caused by a host write to the Source LUN, or by a secondary host write to the Snapshot. Reads directed at the Snapshot are satisfied by the Source LUN if they have not yet been copied to the RLP as a result of COFW activity. Snapshot activity never causes writes to the Source LUN itself. Source LUN writes are, however, indirectly affected by Snapshot activity: a COFW adds significant latency to a host write, especially when the write cache is saturated. The next slide discusses this added latency. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 4 Snapshot activity, as expected, has a direct effect on the RLP LUN(s). Snapshot reads hit the RLP if data has previously been copied there as a result of a COFW. A write by a secondary host to a virgin Snapshot chunk causes a 64 KiB write to the RLP (as part of the COFW). Subsequent secondary host writes to that chunk are the size of the host write. Normal COFW activity causes 4 writes to the RLP – three of them are related to the map, and may be 8 KiB or 64 KiB in size, depending on the operation being performed, and the remaining write is the 64 KiB data chunk. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 5 Key factors that influence Snap performance are listed on the slide. • Application I/O reports can be used to profile the application I/O activity. Very active Source LUNs cause more COFW activity, especially if the application I/O profile uses small-block, write-intensive, random I/O. In this case, bandwidth to the RLP LUNs far exceeds bandwidth to the Source LUN (a rough sketch of this effect follows the list). The larger the change rate, the less efficient Snapshot operation becomes. Remember, the rule of thumb is that Snapshots work better on Source LUNs that experience less than 30 percent change rates over the life of the Session. • The number of concurrent Sessions may have a dramatic effect on the COFW activity and the amount of RLP space used. • The duration of the Sessions determines how long COFW operations continue, and how much data is stored in the RLP. If Sessions run indefinitely, eventually the RLP holds all of the original data from the Source LUN, and uses slightly more space than the Source LUN. • The Snapshot I/O profile is an important factor to consider. Writes to the Snapshot can affect the Source LUN, due to COFW reads, as well as add extra I/O load to the RLP. • The RLP behaves like any ordinary FLARE LUN, and must be configured carefully – size and performance are both important. RLP LUNs should be spread across multiple RAID Groups for best performance. The RAID type used should be chosen carefully, as well as the disk type. ATA drives do not perform well when used for RLP LUNs. • Each VNX model has different performance characteristics and limits. • If host data was aligned with the native LUN Offset, then SnapView operations cause disk crossings, and performance is degraded even further.
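To make the bandwidth amplification in the first bullet concrete, the following minimal Python sketch (not part of the original course material) estimates worst-case RLP traffic for a small-block, random-write workload. It assumes that every host write touches a previously untouched 64 KiB chunk and that each COFW generates one 64 KiB data-chunk write plus three map writes of 8 KiB each (the usual case described later in these notes), in addition to a 64 KiB read from the Source LUN; the 200 writes/s workload is purely illustrative.

# Rough worst-case estimate of Reserved LUN Pool (RLP) traffic generated by
# SnapView COFW activity for a small-block, random-write workload.
# Assumptions (illustrative): every host write hits an untouched 64 KiB chunk;
# each COFW = one 64 KiB data write + three 8 KiB map writes to the RLP,
# plus a 64 KiB read from the Source LUN.
CHUNK_KIB = 64
MAP_WRITE_KIB = 8
MAP_WRITES_PER_COFW = 3

def rlp_write_kib_per_s(host_write_iops):
    per_cofw_kib = CHUNK_KIB + MAP_WRITES_PER_COFW * MAP_WRITE_KIB   # 88 KiB per COFW
    return host_write_iops * per_cofw_kib

def source_cofw_read_kib_per_s(host_write_iops):
    return host_write_iops * CHUNK_KIB

host_iops, host_io_kib = 200, 4    # 200 random 4 KiB host writes per second
print("Host write bandwidth :", host_iops * host_io_kib, "KiB/s")            # 800
print("RLP write bandwidth  :", rlp_write_kib_per_s(host_iops), "KiB/s")     # 17600
print("Source COFW reads    :", source_cofw_read_kib_per_s(host_iops), "KiB/s")  # 12800

With these assumptions, 800 KiB/s of host writes drives roughly 17,600 KiB/s of RLP writes plus 12,800 KiB/s of Source LUN COFW reads, which is why the RLP must be treated as a performance-critical set of LUNs.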
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 6 Very small I/Os that cause a COFW appear to be affected more than larger I/Os. If a 512 B host write causes a COFW, the ratio of host data : RLP data is between 1 : 160 (1 x 64 KiB write, and 2 x 8 KiB writes) and 1 : 384 (3 x 64 KiB writes). If a 64 KiB host I/O causes a COFW, then the ratio is 1 : 3 at worst, and it appears as though the performance impact is less. Random data is more troublesome with Snapshots than is sequential data. Sequential writes, no matter how small, eventually fill a chunk, and the host data : RLP data ratio is close to optimal. In addition, random data is more likely to trigger a COFW, and the performance impact is more severe. The total number of I/Os, especially writes, is significant. If a LUN is lightly loaded, the extra I/Os caused by COFW activity may not be noticed. As the I/O load increases, whether caused by host reads or writes, the VNX becomes busier, and COFW activity makes noticeable changes to response time. If the host application is very write-intensive, then the COFW load is particularly severe. The number of writes, if used alone, only gives us part of the picture. We need to know, or calculate as best we can, the number of writes that are made to the same chunk. The write cache rehit ratio gives a rough idea; VNX has no native tools or commands to measure write activity onto specific disk areas. The ktrace utility can help here; it is, however, an EMC proprietary tool. Note that customer estimates are likely to be too low, often by an order of magnitude or more. A customer may assume that if a LUN contains 1,000,000 blocks, and 1,000 of them are changed, that this represents a change of 0.1%. This is correct in the absolute sense; for design purposes, however, we need to determine where those writes took place, and how many chunks were touched. If each changed block was on a unique chunk, then 12.8% of the data would have been changed as seen by a Snapshot – 128 times more than the customer estimate. <Continued> Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 7 If we have no reliable data, then our planning procedure must assume that each host write touches a new chunk. A factor which is often overlooked when sizing the RLP is the I/O profile of the Snapshots. Snapshots are often used for backups or similar activities; the number of writes made to the Snapshot in those cases is low. If the secondary host performs write-intensive activity on the Snapshot, then we have to be aware that primary host writes and secondary host writes cause COFW activity. This is particularly severe if the secondary host is allowed to start writing soon after the Session has been started – the impact of the primary host’s COFW is still near its peak. Other factors which are relevant to the I/O profile of the Source LUN are also relevant here, and mentioned in the slide. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 8 If the Sessions on a Source LUN are started at the same time, or close to the same time, then (most of) the chunks that are copied to the RLP are shared among the Snapshots, and only one copy of any chunk is in the RLP. If Sessions are started with long intervals between them, it is likely that some, or much, of the disk data changed between Session starts. Fewer chunks are shared; not only does the RLP take up more disk space, but the number and impact of COFWs are greater.
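Returning to the chunk-granularity arithmetic above: a couple of lines of Python (not part of the course material) reproduce the 12.8% figure, assuming 512 B blocks, 64 KiB chunks, and the worst case stated above in which every changed block lands on a different chunk.

# Worst-case estimate of how much of a Source LUN a Snapshot "sees" as changed,
# given block-level change figures. Assumes 512 B blocks and 64 KiB chunks.
BLOCK_BYTES = 512
CHUNK_BYTES = 64 * 1024
BLOCKS_PER_CHUNK = CHUNK_BYTES // BLOCK_BYTES    # 128 blocks per chunk

def chunk_level_change_pct(total_blocks, changed_blocks):
    # Worst case: every changed block lands on a different 64 KiB chunk
    chunks_touched = min(changed_blocks, total_blocks // BLOCKS_PER_CHUNK)
    return 100.0 * chunks_touched * BLOCKS_PER_CHUNK / total_blocks

print(chunk_level_change_pct(1_000_000, 1_000))   # 12.8 (%), versus the 0.1% block-level estimate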
A single Session writes data to the RLP (to the chunk storage area, to be more accurate) in a sequential manner, even though the data may come from very different areas of the Source LUN. As the number of concurrent Sessions increases, writes to the RLP still remain sequential. Once we start to terminate and restart Sessions, space freed up (in random locations) by previous Sessions is used, and the access pattern becomes more random. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 9 As noted previously, the load on the RLP LUNs can be very high – the number of IOPs can be larger than the writes/s on the Source LUN, and the bandwidth can be very much higher than that of the Source LUN. The performance of the RLs is important because of the impact on host applications. As a result, the disks should be SAS disks, and we should limit the number of RLs per RAID Group. Treating RLs like regular LUNs, and applying performance best practices to them, improves the chance of designing a successful Snapshot implementation. Sizing the RLP LUNs for capacity is a compromise between efficient use of disk space, and efficient use of LUN numbers. In many cases, making the RLs 10% of the average size of the Source LUNs, and allocating 2 RLs per Source LUN, works well. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 10 Performance is a consideration when you position SnapView. It’s important that you and the customer discuss the implications, in order to ensure a successful configuration. SnapView Snapshots typically underperform SnapView Clones in like situations, although at low I/O loads, the impact may not be significant. SnapView Snapshots affect the read performance of the source device; Clones do not, unless they are synchronizing. Write performance to the Source LUN is affected by an active Session, and by a non-fractured Clone. Reads from a Snapshot are slower than reads from Source LUNs or fractured Clones. Avoid environments with small-block random writes whenever possible; that scenario has the most dramatic impact on performance. Snapshots can increase the Source LUN response time significantly. Additional snapshots, however, have a minor impact on response time. The impact continues as long as the SnapView session is active. However, the impact decreases over time, as more chunks are copied to the RLP. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 11 Problems found during modeling must be addressed before the solution can be implemented. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 12 The correct configuration and placement of RLP LUNs is the most important factor when planning for Snapshots. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 13 Here, we briefly take a look at 3 examples of sizing the RLP for performance. Note that, in a purely random environment, each host write (at least initially) causes at least one COFW, no matter the size of the host I/O. If the host I/O size is larger than a chunk, then each host write causes multiple COFWs. In purely sequential environments, the ratio of host I/Os to COFWs is the same as the ratio of chunk size to host I/O size. • For 4 KiB host I/Os, we see a COFW for every 16 host writes. • For 256 KiB host I/Os, we see 4 COFWs for each host write. Copyright © 2012 EMC Corporation. 
All rights reserved Module 5: BC Design Best Practices 14 As noted earlier, an understanding of the COFW process, and the role played by the RLP, is vital to understanding the performance of, and planning for, Snapshots. A LUN which has been added to the Reserved LUN Pool, and then allocated to a Source LUN, has data stored in 3 distinct areas: • A bitmap, found at the beginning of the LUN. This bitmap tracks chunk usage on the Reserved LUN. • A chunk index and status area, which points to the location of data on the LUN, and keeps the status of chunks on the Source LUN. It is indexed by chunk number on the Source LUN, and therefore has a size which is related to Source LUN size. • The chunk storage area. COFW data is saved here. This area occupies the rest of the Reserved LUN. These areas are contiguous on disk. The gaps between them on the slide are there to simplify the illustration. The required size of Area 2 is only known once the Reserved LUN is assigned to a Source LUN. The first COFW to the Reserved LUN triggers creation of the map area, resulting in a much higher level of write activity than is the case for subsequent COFWs. <Continued> Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 15 Each of these areas is subject to SnapView paging, meaning that its data can be paged into SP memory or out to disk as required. This paging operates in 64 kB ‘pages’, so writes to these SnapView areas will always be 64 kB in size when paging, but will be 8 KiB in size in most other circumstances. Similarly, the COFW process writes data in 64 KiB chunks only. When a COFW is performed, each of these areas is updated. A single host write of 64 KiB or less can therefore cause 4 writes of 64 KiB (at worst – usually one 64 KiB and three 8 KiB writes) to be made to the RLP, in addition to the 64 KiB read made from the Source LUN. This has performance implications, especially in environments where access to the Source LUN involves random, small-block writes. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 16 These screenshots show activity on an RLP LUN just after a Session was started. Writes, shown in red, take place to metadata areas on the RL; moves, shown in green, are the actual 64 kB data chunks on the RL. The Source LUN used here is 50 GB in size; had it been larger, the sequential writes seen at around block 200,000 would have appeared elsewhere. As an example, for a 200 GB Source LUN, the line would appear just above the 800,000 block address. The areas highlighted by oval outlines will be expanded in subsequent screenshots. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 17 The index area is accessed largely randomly. This random activity affects the performance of the RLP LUN, and should act as a guideline when selecting disk type and RAID type for RLP LUNs. Note that random reads as well as random writes are occurring in the index area. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 18 Write activity (shown in green – these are actually moves performed by the Data Mover Layer) to the chunk storage area is sequential in nature. Note that this screenshot consists of writes only – the Snapshot was not being read (or, in fact, accessed) by the secondary host. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 19 We’ll take a look at 3 brief examples of sizing the RLP for capacity. 
These are the same as the examples used when sizing for performance. Note that, in a purely random environment, each host write (at least initially) causes at least one COFW, no matter the size of the host I/O. If the host I/O size is larger than a chunk, then each host write causes multiple COFWs. In purely sequential environments, the ratio of host I/Os to COFWs is the same as the ratio of chunk size to host I/O size. For 4 KiB host I/Os, we see a COFW for every 16 host writes, and for 256 KiB host I/Os, we see 4 COFWs for each host write. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 20 For optimal performance, Reserved LUNs should be created on the number of disks, and the RAID type, that match the expected I/O profile. Bear in mind, particularly in environments where the Source LUN has a small-block, write-intensive load, that the RLP may need to handle much more bandwidth than the Source LUN. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 21 This slide is a brief overview of SnapView Clone operations. The 2 major issues associated with Clones are the performance effect of synchronization (related to the size of the Source LUN), and the correct placement of the Clone. The RAID type and disk type chosen for the Clone is also very important. The Clone Private LUN (CPL) – strictly 2 LUNs – contains all the bitmaps used by SnapView Clones. Each LUN must be a minimum of 1 GiB in size. Because a Clone is a full copy, there is no COFW mechanism. While a Clone is synchronized, each write to the Source LUN causes a write to be made to the Clone, which places an additional load on the write cache. Synchronization can be costly because of the tracking granularity associated with Clones. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 22 Clone synchronization reads data from the Source LUN and writes it to one or more Clones, which are independent LUNs. Because of their independence, separate writes go to each, thereby adding additional load to the write cache and disks. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 23 Clone synchronization reads data from the Source LUN and writes it to one or more Clones, which are independent LUNs. Because of their independence, separate writes go to each, thereby adding additional load to the write cache and disks. Remember that when synchronizing or reverse synchronizing, the entire extent is copied if any changes occurred to the extent. The easy way to calculate the extent size for a Clone is to use LUN size in GB as the extent size in blocks, e.g., a 256 GiB LUN has an extent size of 256 blocks = 128 KiB, a 400 GiB LUN has an extent size of 200 KiB, etc. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 24 No more than 8 clones per Source LUN. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 25 The major items to be considered are the number of LUNs to be replicated, how they will be replicated, and the total data size. The use of LVMs and clusters may add an additional level of complexity. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 26 Decisions about expiration of backup volumes are important because they affect the total amount of space required for the local replication solution. Copyright © 2012 EMC Corporation. 
All rights reserved Module 5: BC Design Best Practices 27 Within mission-critical data, some subset of that data may be more time sensitive than others. Subcategories of critical data may be needed, with higher-priority data being protected more rigorously. History shows that data errors tend to recur as a result of consistently recurring events. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 28 To summarize, local replication requirements include, but are not limited to: • Volumes belonging to volume groups • Frequency of replications required • Expiration schedule of replicated copies (stopping Sessions, or removing Clones from Clone Group) Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 29 Note that these are LUN IOPs (as seen from the host). Disk IOPs must take account of the RAID type used for the LUNs. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 30 Note that these are LUN IOPs (as seen from the host). Disk IOPs must take account of the RAID type used for the LUNs. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 31 This lesson covers local replication. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 32 VNX Snapshots address limitations of copy on first write (COFW) SnapView Snapshots. The VNX Snapshot technology is redirect on write (ROW, also referred to as redirect on first write). VNX Snapshots are limited to Pool-based LUNs (i.e., not RAID Group LUNs). Up to 256 writeable VNX Snapshots can be associated with any Primary LUN, though only 255 are user visible. Because the VNX Snapshot uses pointers rather than a full copy of the LUN, it is space-efficient, and can be created almost instantaneously. The ROW mechanism does not use a read from the Primary LUN as part of its operation, and thus eliminates the most costly (in performance terms) part of the process. A Reserved LUN Pool is not required for VNX Snapshots - VNX Snapshots use space from the same Pool as their Primary LUN. Management options allow limits to be placed on the amount of space used for VNX Snapshots in a Pool. VNX Snapshots allow replicas of replicas; this includes Snapshots of VNX Snapshots, Snapshots of attached VNX Snapshot Mount Points, and Snapshots of VNX Snapshot Consistency Groups. VNX Snapshots can coexist with SnapView snapshots and clones, and are supported by RecoverPoint. If all VNX Snapshots are removed from a Thick LUN, the driver will detect this and begin the defragmentation process. This converts Thick LUN slices back to contiguous 1 GiB addresses. The process runs in the background and can take a significant amount of time. The user cannot disable this conversion process directly; however, it can be prevented by keeping at least one VNX Snapshot of the Thick LUN. Note: while a delete process is running, the Snapshot name remains in use. So, if one needs to create a new Snapshot with the same name, it is advisable to rename the Snapshot prior to deleting it. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 33 A VNX Snapshot Mount Point (SMP) is a container that holds SCSI attributes such as the WWN, the Name, and the Storage Group LUN ID. An SMP is similar to a Snapshot LUN in the SnapView Snapshot environment. It is independent of the VNX Snapshot (though it is tied to the Primary LUN), and can therefore exist without a VNX Snapshot attached to it.
Because it behaves like a LUN, it can be migrated to another host and retain its WWN. In order for the host to see the point-in-time data, the SMP must have a VNX Snapshot attached to it. Once the Snapshot is attached, the host will see the LUN as online and accessible. If the Snapshot is detached, and then another Snapshot is attached, the host will see the new point-in-time data without the need for a rescan of the bus. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 34 The VNX Snapshot Consistency Group allows Snapshots to be taken at the same point in time on multiple Primary LUNs. If individual Snapshots were made of the Primary LUNs, it is possible that updates to one or more Primary LUNs could take place between the time of the Snapshot on the first Primary LUN and the time of the Snapshot on the last Primary LUN. This causes inconsistency in the Snapshot data for the set of LUNs. The user can ensure consistency by quiescing the application, but this is unacceptable in many environments. A Consistency Group can have a Snapshot taken of it, and can have members added or removed. Restore operations can only be performed on Groups that have the same members as the Snapshot. This may require modifying Group membership prior to a restore. When a Snapshot is made of a Group, updates to all members are held until the operation completes. This has the same effect as a quiesce of the I/O to the members, but is performed on the storage system rather than on the host. VNX Snapshot Set – a group of all Snapshots from all LUNs in a Consistency Group. For simplicity, it is referred to as a CG Snap throughout the material. VNX Snapshot Family – a group of Snaps from the same Primary LUN. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 35 This slide, and the following slide, compare the two VNX snapshot technologies. In this slide, the processes involved in a new host write to the source LUN (primary LUN) are compared. In the familiar SnapView Snapshot environment, the COFW process reads the original 64 KiB data chunk from the source LUN, writes that chunk to the Reserved LUN, and updates the pointers in the Reserved LUN map area. Once these steps complete, the host write to the Source LUN is allowed to proceed, and the host will receive an acknowledgement that the write is complete. If a SnapView Snapshot is deleted, data in the RLP is simply removed, and no processing takes place on the Source LUN. In the case of a VNX Snapshot, a new host write is simply written to a new location (redirected) inside the Pool. The original data remains where it is, and is untouched by the ROW process. The granularity of Thin LUNs is 8 KiB, and this is the granularity used for VNX Snapshots. If a VNX Snapshot is removed from a Thin LUN, no reorganization of the data is required; the space that was used by the Snapshot is simply returned to the Pool. If the last VNX Snapshot is removed from a Thick LUN, the defragmentation process moves the new data to the original locations on disk, and freed space is returned to the Pool. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 36 In this slide, the processes involved in a secondary host read of a Snapshot are compared. In the familiar SnapView Snapshot environment, data which has not yet been modified is read from the source LUN, while data that has been modified since the start of the SnapView Session is read from the Reserved LUN.
SnapView always needs to perform a lookup to determine whether data is on the Source LUN or Reserved LUN, which causes Snapshot reads to be slower than Source LUN reads. In the case of a VNX Snapshot, the original data remains where it is, and is therefore read from the original location on the Primary LUN. That location will be discovered by a lookup which is no different to that performed on a Thin LUN which does not have a VNX Snapshot, so the performance is largely unchanged. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 37 VNX Snapshot management is performed from the Data Protection tab in the top navigation bar. An option under Wizards is the Snapshot Mount Point Configuration Wizard, while the Consistency Group area has the Create Snapshot Consistency Group link. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 38 When a Pool is configured, the Advanced tab allows the selection of parameters related to the use of VNX Snapshots. The upper checkbox, selected by default, modifies VNX Snapshot behavior based on total Pool utilization; the default behavior will start to delete the oldest VNX Snapshots when the Pool becomes 95% full, and will continue with the deletion until the Pool is 85% full. The lower checkbox, deselected by default, modifies VNX Snapshot behavior based on the amount of Pool space occupied by VNX Snapshots; if it is selected, the default behavior is to start to delete the oldest VNX Snapshots when 25% of the total Pool space is being used by VNX Snapshots, and to continue with the deletion until the total Pool space used by VNX Snapshots reaches 20%. Note that these options are not mutually exclusive. Auto-deletion may be paused by the user at any time that it is running, and may be resumed at any later time. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 39 VNX Snapshots may be taken of individual Primary LUNs, VNX Snapshots, or Consistency Groups. A VNX Snapshot of a Consistency Group implies that VNX Snapshots have been taken of the member Primary LUNs at the same point in time. This can be seen in the Creation Time column for the VNX Snapshots. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 40 This lesson covers local replication. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 41 VNX SnapSure saves disk space and time by allowing multiple snapshot versions of a VNX file system. These logical views are called checkpoints. SnapSure checkpoints can be read-only or read/write. SnapSure is not a discrete copy product and does not maintain a mirror relationship between source and target volumes. It maintains pointers to track changes to the primary file system and reads data from either the primary file system or from a specified copy area. The copy area is referred to as a savVol, and is defined as a VNX File Metavolume. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 42 PFS The PFS is any typical VNX file system. Applications that require access to the PFS are referred to as “PFS Applications”. Checkpoint A point-in-time view of the PFS. SnapSure uses a combination of live PFS data and saved data to display what the file system looked like at a particular point-in-time. A checkpoint is thus dependent on the PFS and is not a disaster recovery solution. It is NOT a copy of a file system. 
SavVol Each PFS with a checkpoint has an associated save volume, or SavVol. The first change made to each PFS data block triggers SnapSure to copy that data block to the SavVol. The SavVol also holds the changes made to a writeable checkpoint. Bitmap SnapSure maintains a bitmap of every data block in the PFS, which identifies whether the data block has changed. Each PFS with a checkpoint has one bitmap that always refers to the most recent checkpoint. The only exception is when a PFS has a writeable checkpoint, where an individual bitmap will be created for each writeable checkpoint to track the changes made to it. Blockmap A blockmap of the SavVol is maintained to record the address in the SavVol of each saved data block. Each checkpoint, read-only or writeable, has its own blockmap. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 43 In the next several slides, we will see how SnapSure works in capturing data from file system modifications and providing data to users and applications. Displayed on this slide is a PFS with data blocks containing the letters A through F. When the first file system checkpoint is created, a SavVol is also created on disk to hold the bitmap, the original data from the PFS, and that particular checkpoint’s blockmap. Each bit of the bitmap will reference a block on the PFS. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 44 Next, we are going to have a user or application make some modifications to the PFS. In this case, we are writing an “H” in the place of the “B”, and a “K” in the place of the “E”. Before these writes can take place, SnapSure will place a hold on the I/Os and copy the “B” and “E” to the SavVol. Then the blockmap will be updated with the location of the data in the SavVol. In this example, the left column of the blockmap refers to the block address in the PFS, and the right column refers to the block address in the SavVol. Next, the bitmap is updated with “1”s wherever a block has changed in the PFS. A “0” means that there were no changes for that block. After this process completes, SnapSure releases the hold and the writes can proceed. If these same two blocks are modified once again, the writes will go through and nothing will be saved in the SavVol. This is true because we already saved the original data from that point in time and anything after that is not Ckpt1’s responsibility. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 45 When a second checkpoint is created, another blockmap will be created in the same SavVol as the old checkpoint. The bitmap that used to refer to Ckpt1 now refers to Ckpt2, which is the newest checkpoint. The bitmap is then reset to all “0”s, waiting for the next PFS modification. Any writes from now on will be monitored by Ckpt2 since it’s the newest checkpoint. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 46 For our next example, we have an application modifying the first, second, and sixth blocks in the PFS with the letters “J”, “L”, and “S”. SnapSure will hold these writes, copy the original data to the SavVol, and update the bitmap and Ckpt2’s blockmap. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 47 If a user or application decides to access Ckpt2, SnapSure will check the bitmap for any PFS blocks that were modified. In this case, the first, second, and sixth blocks were modified.
The data for these blocks will come from the SavVol. Everything else will be read from the PFS. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 48 When an older checkpoint is accessed, like Ckpt1 for example, SnapSure cannot use the bitmap because it refers to the newest checkpoint. SnapSure will have to access the desired checkpoint’s blockmap to check for any data that has been copied to the SavVol. In this case, we’ll take the first and second blocks from the SavVol and use the data to fill the second and fifth blocks of Ckpt1. SnapSure will continue to read all of the old blockmaps as it makes its way to the newest checkpoint. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 49 Once SnapSure arrives at the newest checkpoint, the bitmap can then be utilized to determine any other blocks that have changed in the PFS. In the example above, the bitmap says that the first, second, and sixth blocks have changed in the PFS. Notice that we already have the second block in the PFS accounted for with a “B” from the previous slide due to blockmap1. If SnapSure finds blocks that already have been accounted for, it will simply skip it and go to the next block that is represented in the bitmap. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 50 SnapSure requires a SavVol to hold data when you create the first checkpoint of a PFS. AVM algorithms determine the selection of disks used for a SavVol. AVM tries to match the storage pool for the SavVol with that of the PFS whenever possible. If the storage pool is a system-defined pool, and it is too small for the SavVol, AVM will auto-extend it. User-defined storage pools cannot be auto-extended. The system allows SavVols to be created and extended until the sum of the space consumed by all SavVols on the system exceeds 20% (default) of the total space available. This is tunable in the /nas/sys/nas_param file. When the SavVol High Water Mark (HWM) is reached, SnapSure will extend the SavVol based on the size of the file system: • If PFS < 64 MiB, then extension = 6i4 MB. • If PFS < 20 GiB, then extension = PFS size. • If PFS > 200 GiB, then extension = 10 percent of the PFS size. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 51 The VNX File allows up to 1 GiB of physical RAM per Data Mover to page the bitmap and blockmaps for all the PFS that have checkpoints. The 1 GiB also ensures that sufficient Data Mover memory is available for VNX Replicator. For systems with less than 4 GiB of memory, a total of 512 MiB of physical RAM per Data Mover is allocated for the blockmap storage. Each time a checkpoint is read, the system queries it to find the required data block’s location. For any checkpoint, blockmap entries needed by the system, but not resident in main memory are paged in from the SavVol. The entries stay in main memory until system memory consumption requires them to be purged. A bitmap will consume 1 bit for every block (8 KiB) in the PFS. The blockmap will consume 8 bytes for every block (8 KiB) in the checkpoint. Once a second checkpoint is created, still one bitmap exists. There is one bitmap for the most recent checkpoint only. Blockmaps exist for every checkpoint created. The server_sysstat command with the -blockmap switch will provide the Data Mover memory space currently being consumed by all of the blockmaps. 
The following slide shows an example of calculating the SnapSure memory requirements for a 10 GiB PFS that has two checkpoints created. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 52 The VNX checks the amount of Data Mover memory available before it allows a checkpoint file system to be created, extended, or mounted. If the amount required exceeds the predefined limit of the Data Mover’s total memory allotted for checkpoint (or VNX Replicator) SavVols, an error message is sent to the /nas/log/sys_log file. At this point, you can delete any unused SavVols, upgrade to a Data Mover with more memory, or use another Data Mover with more memory. To avoid this situation, plan memory consumption carefully. Use the calculations displayed here to determine your specific memory requirements. These calculations support the “rule of thumb” that says for every 1 GiB of SavVol space consumed, 1 MiB of memory will be required. The scenario says that a 10 GiB PFS has 2 checkpoints created. One checkpoint has 10% of the PFS in the SavVol, and the other checkpoint has 1% of the PFS in the SavVol. Checkpoint 2 is the only one that will have a bitmap associated with it since it is the most recent checkpoint. Since each bitmap requires 1 bit for every block in the PFS, you must calculate the number of blocks in the PFS. The calculation on the slide shows that there are 1,310,720 8 KiB blocks in the 10 GiB PFS. Each block requires 1 bit, which calculates to 160 KiB. Checkpoint 2 also has a blockmap which will consume 8 bytes for every block in the checkpoint. The checkpoint is 1% of the PFS. The calculation on the slide shows that there are 13,107.2 blocks. Multiply this by 8 bytes per block to get about 102 KiB. Checkpoint 1 does not require a bitmap, and 1,024 KiB is required for its blockmap. As a result, the total memory utilization is 1,024 KiB (checkpoint 1 blockmap) + 102 KiB (checkpoint 2 blockmap) + 160 KiB (checkpoint 2 bitmap) = 1,286 KiB. Using the rule of thumb, the checkpoints have 11% of the PFS in the SavVol, or 1.1 GiB, which would require approximately 1.1 MiB of memory – in line with the calculated total. Note: Remember that VNX Replicator internal checkpoints need to be taken into account. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 53 SnapSure operations typically cause a decrease in performance. Creating a checkpoint requires the PFS to be paused. Therefore, PFS write activity is suspended, but read activity continues while the system creates the checkpoint. The pause time depends on the amount of data in the cache, but it is typically one second or less. SnapSure needs time to create the SavVol for the file system if the checkpoint is the first one. The PFS will see performance degradation every time a block is modified for the first time only. This is known as the Copy On First Write (COFW) penalty. Once that particular block is modified, any other modifications to the same block will not impact performance. Deleting a checkpoint requires the PFS to be paused. Therefore, PFS write activity is suspended momentarily, but read activity continues while the system deletes the checkpoint. Restoring a PFS from a checkpoint requires the PFS to be frozen. Therefore, all PFS activities are suspended during the restore initialization process. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 54 Refreshing a checkpoint requires the checkpoint to be frozen.
Checkpoint read activity is suspended while the system refreshes the checkpoint. During a refresh, the checkpoint is deleted and another one is created with the same checkpoint name. Clients attempting to access the checkpoint during a refresh process experience the following: • NFS clients — The client continues trying to connect indefinitely. When the system thaws, the file system automatically remounts. • CIFS clients — Depending on the application running on Windows, or if the system freezes for more than 45 seconds, the Windows application might drop the link. The share might need to be remounted and remapped. If a checkpoint becomes inactive for any reason, read/write activity on the PFS continues uninterrupted. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 55 NAS Engineering has done some SnapSure performance testing. The results will be shown on the following slides. The test environment was CIFS only. NFS has been shown to perform better under all the conditions that will be described here. CIFS results vary by workload. The testing was done with “pure” workloads including sequential read, random read, sequential write, and random write. Most workloads typically include a combination of all (or some) of these workloads. The throughput was tested on an uncheckpointed PFS, as well as a PFS with a full checkpoint and an empty checkpoint. A full checkpoint means that 100% of the PFS blocks have already been saved in the SavVol. In this case, all writes to existing PFS data were guaranteed to be re-writes and do not affect the checkpoint (COFW was already performed). All reads from the full checkpoint are satisfied by the SavVol. An empty checkpoint means that there is no data in the SavVol; therefore, any write to the PFS requires a write to the SavVol (COFW). All read requests to the empty checkpoint are satisfied by the PFS. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 56 The results clearly show that when SnapSure is providing the expected additional functionality (COFW), a decrease in throughput occurs. When writing to the PFS with an empty checkpoint, we see almost a 50% decrease in throughput. Since every write to a block is the first write to that block, the original block must first be written to the SavVol. Random write shows a similar degradation. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 57 When the checkpoint is empty, reading from the checkpoint is much the same as reading from a file system without any checkpoints. When the checkpoint is fully populated, reading from the checkpoint shows a 90% degradation for a sequential read. This is caused by the SnapSure processing required to check the bitmap/blockmap to determine if the block has been changed and whether it should be read from the PFS or the SavVol. When performing a random read on the full checkpoint, the performance is a bit better than sequential. This is because SavVol reads are random in nature. When reading from a checkpoint, the size of the checkpoint will determine how long it takes to decide whether the block should be read from the SavVol or the PFS, and the time to read. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 58 Writes to a single SavVol are purely sequential. NL-SAS drives have very good sequential I/O performance. On the other hand, reads from a SavVol are nearly always random, where SAS drives perform better.
Workload analysis is important in determining if NL-SAS drives are appropriate for SavVols. Many SnapSure checkpoints are never read from at all; or, if they are, the reads are infrequent and not performance-sensitive. In these cases, NL-SAS drives could be used for SavVols. If checkpoints are used for testing, data mining and data sharing, and experience periods of heavy read access, then SAS drives are a better choice. Be careful when using multiple SavVols on a single set of NL-SAS drives, since the I/O at the disk level will appear more random, which is a pattern where SAS drives perform better. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 59 Migration activity produces significant changes in the file system. Therefore, it is best to complete migration tasks before using SnapSure. This avoids consuming SnapSure resources to capture information that is unnecessary to the checkpoint. If you have a choice, read from the most current (active) checkpoint. When the latest checkpoint is accessed by clients, SnapSure queries its bitmap for the existence of the needed block. Access through the bitmap is faster than access through the blockmap. Therefore, read performance will be slightly better from the most recent checkpoint than from older checkpoints, where blockmaps will need to be read. When multiple checkpoints are active, additional Data Mover resources (memory and CPU) are required. Therefore, less Data Mover memory would be available for read cache and other operations. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 60 This lesson covers remote replication on VNX systems. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 61 Full copy mode copies all data from source to target, and does not track changes to the source while the Session is running. The number of links used and their capacity are decided by the amount of data that needs to move within a given time. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 62 Incremental mode copies all data from source to target initially, and performs incremental updates thereafter. The number of links used and their capacity are decided by the amount of data that needs to move within a given time. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 63 Incremental SAN Copy (ISC) allows the transfer of changed chunks only, from source to destination. ISC copies all changes made up to a user-defined point in time, and uses SnapView Snapshot technology as required to keep track of where those changes are. The changed chunks are then copied from source to destination, and a checkpoint mechanism tracks the progress of the transfer. The Source LUN is available to the host at all times. The Target LUN is only of use to an attached host once the transfer is completed. At that point, the Target LUN will be a consistent, restartable, but previous point-in-time copy of the Source LUN. The steps shown in the graphic are: 1. Primary host writes to the Source LUN. 2. COFW invoked if needed. 3. Acknowledgement from the local storage system. 4. Trigger event. 5. Chunks copied from the local to the remote storage system. 6. Acknowledgement from the remote storage system. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 64 ISC is not designed to be a DR product, though it is often used in that manner. Important considerations are mentioned here.
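The slides that follow walk through a simple ISC sizing exercise in which the amount of changed data per cycle is compared with the capacity of the replication link. As a rough companion to that discussion, here is a minimal Python sketch (not part of the course material) that converts changed data per cycle into a required link rate; the T3 figure of roughly 45 Mb/s is an assumption used only for comparison, and protocol overhead and peak behavior are ignored.

# Rough ISC link sizing: bandwidth needed to push one cycle's worth of changed
# data across the link within the cycle time. Overhead and peaks are ignored,
# so treat the result as a floor, not a guarantee.
T3_MBPS = 45.0   # approximate DS3/T3 line rate in Mb/s (assumed, for comparison only)

def required_link_mbps(changed_gb_per_cycle, cycle_hours):
    # Decimal GB -> megabits, spread evenly across the cycle
    return changed_gb_per_cycle * 1000 * 8 / (cycle_hours * 3600)

need = required_link_mbps(12.5, 1.0)
print(round(need, 1), "Mb/s needed;", "fits under a T3" if need < T3_MBPS else "exceeds a T3")

For the 12.5 GB-per-hour example discussed shortly, this works out to roughly 28 Mb/s, which is why a single T3 is described as sufficient for a one-hour cycle.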
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 65 Locality of reference: the write I/O rate does NOT always equal the number of COFWs. For example, 1,000 I/Os per second at 20% writes = 200 writes per second, which does NOT always cause 200 COFWs per second; there is usually some ‘re-hit’, often significant. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 66 In some cases, the necessary data is not available or the customer is not prepared to provide it. In these cases, it is almost impossible to predict whether the SAN Copy cycle time matches the desired RPO. As such, customer assumptions are always listed as the lowest preference and quite obviously present the most risk to successful delivery of the SAN Copy solution. The change rate and write activity are the two most important factors here. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 67 The low, medium, and high change profiles above indicate the change rates that are likely to be seen in each case. If you map your starting reference point to these change profiles, then the table can help define how the logarithmic curve applies to your customer environment. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 68 The time taken to move the data depends on a number of factors, discussed on this slide. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 69 The key point here is that more resources allow a more even spread of I/O across LUNs. Remember to factor in any Clone operations. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 70 Here we are doing a simple sizing. As we will see, the simple math is an insufficient way to determine cycle times. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 71 We are looking at synchronizing 12.5 GB/hour. Our simple calculation determines that we need less than a T3 to propagate the changes over the link for a one-hour cycle. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 72 Now we walk through a sample sizing effort. The estimated changed data is the total amount of data multiplied by the 12% logarithmic change rate. We are using a high change rate for a conservative estimate. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 73 The simple method only gets us into the ‘ball park’ with a guess at the changed data. We need to know the peak number of changes for a given cycle. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 74 There are a number of ways that you can improve cycle times with a SAN Copy configuration. Here is a list of a few of them. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 75 A key factor is the front-end port utilization. If you can’t get in the front door, the work certainly is not going to get done. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 76 Note that ISC uses 64 KiB transfers for the initial synchronization, where SC can use up to 1 MiB. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 77 Application data may need processing before it becomes usable; this processing, while strictly part of the RTO, is ignored here. Copyright © 2012 EMC Corporation.
All rights reserved Module 5: BC Design Best Practices 78 This lesson covers local replication. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 79 The VNX series offers all the replication methodologies needed to keep data available and secure. Integrated with Unisphere, RecoverPoint/SE offers several data protection strategies. RecoverPoint/SE Continuous data protection (CDP) enables local protection for block environments. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 80 There are two types of splitters that can be employed in a RecoverPoint/SE solution:
1. K drivers – low-level operating system kernel drivers that split the I/O; for Windows hosts only (with RecoverPoint/SE)
2. VNX splitters – the splitter driver runs on the VNX SPs
Choosing the type of splitter is one of the main design concerns in architecting a RecoverPoint/SE solution. Consider the following when choosing a splitter type:
• When LUN sizes greater than 2 TB need to be supported, the VNX splitter is currently the only choice.
• VNX splitters are easy to deploy and manage when scalability is not a big concern, and should be chosen whenever there is an option to choose a VNX splitter.
• Host-based splitters are ideal for small installations where a small hit on host CPU performance is acceptable. Large numbers of hosts mean that manageability will be cumbersome.
• When performance is critical, array-based splitters are a good choice.
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 81 To deploy RecoverPoint/SE release 3.4 and later with VNX splitters, you need:
• The latest VNX OE bundle.
• The latest RecoverPoint splitter engine (driver). Verify whether this is required based on the VNX OE bundle version.
• The RecoverPoint splitter enabler.
• All RPA ports zoned with all available VNX SP ports.
RecoverPoint/SE version 3.4 supports the VNX array-based splitter. This splitter runs in each storage processor of a VNX array and splits all writes to a VNX LUN, sending one copy to the original target and the other copy to the RecoverPoint appliance. Refer to the EMC RecoverPoint Deployment Manager Release Notes for the latest version information. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 82 For sizing journal volumes, use the following equation:
Journal size = (change rate in Mib/s) x (required rollback time in seconds) / (fraction of the journal available after the target-side log reservation) x (system factor, usually 1.05 to allow 5% for internal system needs)
For example: Journal size = 5 Mib/s x 86,400 seconds (24 hours) / 0.8 (80%) x 1.05 = 567,000 Mib (approximately 71 GB)
The minimum size for a journal volume is 5 GB. Note: RecoverPoint field implementers recommend sizing the journal volume at 20% of the data being replicated. The Snapshot consolidation feature allows the snapshots in the copy journal to be consolidated, allowing a longer history of data to be stored. For most customers, the granularity of snapshots becomes less important over time. Snapshot consolidation allows us to retain the crucial per-write or per-second record of write transactions for a specified period of time (for example, the last 24 hours) and only then to start gradually decreasing the granularity of older snapshots at preset intervals (for example, creating daily, then weekly, and then monthly snapshots).
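Returning to the journal-sizing equation above, here is a minimal sketch using the slide's own worked numbers (5 Mib/s change rate, 24-hour rollback, 80% usable journal, 1.05 system factor). The binary-unit GiB conversion at the end is my addition, which is why it prints slightly below the slide's rounded figure.

```python
# Hedged sketch of the RecoverPoint journal-sizing equation from this slide.
# Inputs mirror the slide's worked example; the GiB conversion is added here.
# The minimum journal volume size is 5 GB regardless of the result.

def journal_size_mib(change_rate_mibps, rollback_seconds,
                     usable_fraction=0.8, system_factor=1.05):
    """Journal size in megabits (Mib), following the slide's equation."""
    return change_rate_mibps * rollback_seconds / usable_fraction * system_factor

if __name__ == "__main__":
    mib = journal_size_mib(change_rate_mibps=5, rollback_seconds=24 * 3600)
    gib = mib / 8 / 1024  # Mib -> MiB -> GiB (binary units)
    print(f"{mib:,.0f} Mib ~ {gib:.1f} GiB (the slide rounds this to ~71 GB in decimal units)")
```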
Journal volume sizing when utilizing snapshot consolidation must account for the incremental change of data over the consolidation period. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 83 A single RecoverPoint appliance can sustain an average of 75 MiB/s of write I/O, with peaks of up to 110 MiB/s. This throughput figure should be used to calculate the number of appliances required for the desired replication. A minimum of two RPAs is required for redundancy in any RecoverPoint solution. The maximum sustainable incoming throughput for a single cluster is 600 MiB/s. Always refer to the release notes for the most up-to-date information. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 84 Note that in this discussion, I/O refers to write I/O only. There are three potential bottlenecks in a RecoverPoint/SE environment that may limit the amount of I/O the system can sustain: the WAN pipe between sites, the performance of the remote storage, and the performance of the appliances themselves. The system will be able to sustain the load dictated by the weakest link. If the system cannot sustain the load, it goes into a high-load condition. High load occurs when internal buffers on the appliance fill up. This may happen when the appliances themselves cannot sustain the load or the available WAN bandwidth is insufficient. When the remote storage is the bottleneck, behavior is different; see the Storage Performance section for details. In high load, the system keeps tracking the I/O. Once the high-load condition is over, the system resynchronizes the missed I/O. When sizing, keep in mind that occasional high loads are not problematic for the end user. Hence, the system should be sized to sustain the average load, not the maximum peaks. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 85 During normal operations, RecoverPoint/SE has no impact on the production storage. During initialization of a consistency group, the RP appliance reads from the production storage. This may entail a performance impact on the source storage and theoretically impact production applications. It is possible to configure RP to throttle the read rate from the production storage, although it is difficult to find the optimal rate without impacting production performance. The RecoverPoint recommended best practice for first-time initialization in large environments is to do it one consistency group at a time. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 86 Review the system and locate errors or warnings using the RecoverPoint Management Application. Log files, the system panel, system traffic, and the consistency group status are all useful in finding problems. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 87 This lesson covers remote replication on VNX systems. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 88 The sequence of events when using MirrorView is as follows:
1. The host sends a write to the SP that owns the Primary Image LUN.
2. The Reserved LUN is updated via the COFW process (if required).
3. The Primary Image is updated.
4. A 'write complete' acknowledgement is passed to the host.
5. A Snapshot is taken of the Secondary Image LUN before the update cycle starts.
6. The changed data, tracked by a Snapshot, is sent to the remote SP that owns the Secondary Image when the update is due.
7. Data being updated causes a COFW, which puts the original data in the Reserved LUN.
8. The Secondary Image is updated.
9. Acknowledgements are passed to the Primary Image's owning SP.
10. If required, the Secondary Image may be rolled back to a previous known consistent state.
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 89 MirrorView/A uses SnapView Snapshots and SAN Copy as its underlying technology. The traditional SnapView 64 KiB chunk is still copied from the Source LUN to the RLP. SnapView flags the 2 KiB 'sub-chunks' that actually changed, and transfers only those 2 KiB pieces across the network. This helps improve network performance, but does not affect the performance impact on the Source LUN as a result of COFWs. Note that the latency of a MirrorView/A transfer does not affect performance of the primary image, except that the SnapView Session runs for a longer time. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 90 With MV/A we need to understand the workload to determine the appropriate amount of bandwidth and RLP space. The desired Recovery Point Objective dictates the update interval chosen. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 91 The effect of MirrorView/A on VNX performance cannot be ignored. The chart above shows a typical trace of MirrorView/A performance. In region A, the source LUN is operating normally and is not yet a MirrorView/A image. Response time depends on a number of factors, but is typically around 1 ms for cached writes. Note that we are most concerned about the effect of MirrorView/A on host writes to the production LUN (primary image); the impact on reads is indirect, and typically much less severe. In region B, the mirror is being updated at the specified intervals. Response times peak each time an update cycle is started (because of COFW activity), then decrease over time, until the next update cycle starts. Response times here also depend on various factors; the response time for writes is likely to be at least three times what it was previously, plus the response time for an uncached read; we expect to see a peak of at least 15 ms and, under severe operating conditions, peak response times of 50 ms or more. The line drawn midway through the sawtooth portion shows the average response time when update cycles are running. It is important to note that COFW activity only occurs if data chunks are modified two or more times once the underlying ISC Session has been marked – if there is no rewriting of data chunks (no locality of reference for writes), then only tracking activity takes place. Region C shows the response time curve that is produced by normal COFW activity. Note that this is shown for illustration only – MirrorView/A Sessions are typically not kept active for long enough to see the full decay curve. The time taken to return to the original response time (before the Session started) is known as the recovery time; its duration depends on the size of the Source LUN, the granularity of tracking, the number of writes per second, and the randomness of the data access pattern. <Continued> Copyright © 2012 EMC Corporation.
All rights reserved Module 5: BC Design Best Practices 92 Be aware, then, that sizing a MirrorView/A solution does not consist simply of making sure that there is enough space in the RLP – RLP performance dramatically affects the performance of the primary image. Factors to consider are therefore:
• Source LUN: size, RAID type, number of disks, data access pattern, R/W ratio, IOPS
• RLP LUNs: size, RAID type, number of disks, expected data access pattern, expected R/W ratio, projected IOPS
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 93 This chart shows six update cycles on a MV/A mirror (LUN 50) with the update interval set to 15 minutes from the start of the last update. Immediately after the 5th update cycle ended, the I/O rate to the LUN was doubled. Things to note:
• The effect of MV/A activity on LUN 51, which is not a mirror image, but is on the same RG as LUN 50
• The effect on response time
• The shape of the response time curve during an update
• The length of time taken for update cycle 6 to complete – what does this show?
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 94 Note that there is a performance impact even when MV/A is not actively transferring data. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 95 The sequence of events when using MirrorView is as follows:
1. The host sends a write to the SP that owns the Primary Image LUN.
2. (Optional) The WIL is updated.
3. The Primary Image is updated.
4. The data is sent to the Secondary VNX.
5. The Secondary Image is updated.
6. The Secondary VNX sends an acknowledgement to the Primary VNX.
7. A 'write complete' acknowledgement is sent to the host.
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 96 The size of the I/O matters as it is sent across the link. Smaller pipes have a tougher time with larger block sizes. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 97 Here we can see the effect of various block sizes on the transfer time. As an example, let's use a T3 line:
T3 speed = 45 Mb/s, approximately 4.5 MB/s after protocol overhead
4 KiB block: 4 KiB / 4.5 MB/s ≈ 0.9 ms
32 KiB block: 32 KiB / 4.5 MB/s ≈ 7.3 ms
256 KiB block: 256 KiB / 4.5 MB/s ≈ 58.3 ms
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 98 If data is not required for remote restart, it should not be mirrored. Find out from the customer which data is not required. Examples might include paging space, test filesystems, filesystems used for database reorgs, temporary files, etc. Once you have your base figure, multiply the number of writes per second by the average block size. This gives you the bandwidth requirement before compression. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 99 When we size, we must account for the round trip. In synchronous transfers, an I/O is not complete until the acknowledgement is received. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 100 This slide helps you understand write distribution. Write I/Os are usually concentrated on a few volumes. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 101 IP networks require qualification to ensure they have sufficient quality to carry data replication workloads without significant packet loss or inconsistent packet arrival. Copyright © 2012 EMC Corporation.
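A minimal sketch of the block-size arithmetic in the T3 example above. The block sizes and the 45 Mb/s rate are the slide's own; the ÷10 bits-per-byte conversion is the same protocol-overhead allowance the slides use elsewhere.

```python
# Hedged sketch: per-write transfer time across a replication link for various
# block sizes, reproducing the slide's T3 example (45 Mb/s ~ 4.5 MB/s usable).

def transfer_time_ms(block_kib, link_mbps):
    """One-way wire time for a single write of block_kib KiB on a link of link_mbps Mb/s."""
    usable_kb_per_s = link_mbps / 10 * 1000   # /10 allows for bits plus protocol overhead
    return block_kib * 1.024 / usable_kb_per_s * 1000

if __name__ == "__main__":
    for kib in (4, 32, 256):
        print(f"{kib:>3} KiB over a T3: {transfer_time_ms(kib, 45):.1f} ms")
    # Prints ~0.9, 7.3, and 58.3 ms, matching the slide.
```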
All rights reserved Module 5: BC Design Best Practices 102 Here are some things to watch out for. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 103 Before there were modeling tools, there were individual spreadsheets that all used iterations of the same formulas, applying basic math to predict the results of a MirrorView/S solution. The problem with these models was that they were often prone to errors and miscalculations. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 104 MV Link Service Time = Protocol Converter Delay + Signal Propagation Delay + Data Transfer Time. To estimate the impact of implementing MV on the expected Write IO response times, first we need to calculate the amount of time it takes to send the IO across the MV link to the Secondary and get confirmation back that the data has been received. This is important because, in synchronous mode, the Write IO is not confirmed as complete to the host until it has been sent to the Secondary and an acknowledgement has been received back from the Target VNX. Therefore, host response time for a local Write IO is extended. With MV over Fibre Channel and DWDM implementations, the protocol converter delay for switches or multiplexers is considered insignificant. However, careful consideration should be given to ensuring sufficient buffer-to-buffer credits for MV over Fibre Channel implementations. Signal Propagation Delay: the speed of light through a fiber optic cable is constant, but a small delay is incurred; it can be calculated as 1 millisecond per 125 miles. Remember to take into account the number of round trips needed when calculating this figure. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 105 MV Link Service Time = Protocol Converter Delay + Signal Propagation Delay + Data Transfer Time. Data Transfer Time is the amount of time it takes for a single write IO to be transmitted across the MV link in one direction. It depends on the block size and the bandwidth available on a single MV link. Even though multiple MV links should be configured for resiliency, a single IO can only be transmitted down one link; therefore, data transfer time cannot be improved by increasing the number of links. To improve data transfer time (and therefore the host Write IO response time), it is necessary to use a faster link, such as Fibre Channel. The MV Link Service Time can now be calculated: protocol converter delay + signal propagation delay + data transfer time = MV Link Service Time. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 106 MV Response Time = Current Response Time x 2 + Link Service Time. Now we can calculate the average expected host response time after implementing MV. The MV Write IO response time is estimated from the current response time (or the new expected non-MV response time) and the Link Service Time, using the formula above. The Read IO response time is unaffected by implementing MV. The average expected response time for Read and Write IOs can be calculated using the formula: (Read IO Response Time x Read Ratio) + (Write IO Response Time x Write Ratio). Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 107 With the introduction of corporately sponsored sizing tools, the risk associated with synchronous design is reduced to the data that is input into the tool. Since the tool models using host or VNX performance data, the risk is reduced to the quality of the data collection sample.
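For readers working without those tools, here is a minimal sketch of the link-service-time and response-time formulas above. It uses the slide's stated write-path formula (current response time x 2 + link service time); the distance, block size, link speed, read response time, and read ratio are illustrative assumptions, while the 1 ms cached-write baseline echoes the earlier performance discussion.

```python
# Hedged sketch of the MV/S link service time and blended response-time estimate.
# Formulas follow the slides; all numeric inputs here are illustrative assumptions.

def link_service_time_ms(distance_miles, block_kib, link_mbps,
                         round_trips=2, protocol_delay_ms=0.0):
    """Protocol converter delay + signal propagation delay + data transfer time."""
    propagation = distance_miles / 125 * round_trips                # 1 ms per 125 miles, per trip
    transfer = block_kib * 1.024 / (link_mbps / 10 * 1000) * 1000   # one-way wire time, ms
    return protocol_delay_ms + propagation + transfer

def expected_response_ms(read_ms, write_ms, link_ms, read_ratio):
    """Blended read/write response time after implementing MirrorView/S."""
    mv_write_ms = write_ms * 2 + link_ms   # slide formula: current x 2 + link service time
    return read_ms * read_ratio + mv_write_ms * (1 - read_ratio)

if __name__ == "__main__":
    link = link_service_time_ms(distance_miles=125, block_kib=8, link_mbps=45)
    blended = expected_response_ms(read_ms=6.0, write_ms=1.0, link_ms=link, read_ratio=0.75)
    print(f"Link service time ~{link:.1f} ms, blended response ~{blended:.1f} ms")
```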
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 108 Note that business processes performed on data after it is recovered are ignored when addressing RTO here. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 109 Secondary MirrorView/S images should have write cache enabled; if they don't, they will slow down the synchronization, and even the regular updates caused by host I/O to the primary image. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 110 In environments that use synchronous replication, such as MirrorView/S environments, planning the network link is very important. The link must be sized to carry not only the regular writes, but must also have enough headroom to handle the traffic generated after an event such as a mirror fracture. If the link is sized to carry regular writes only, it may be impossible to resynchronize the mirror after loss of the link for any appreciable length of time. The size of MV/S (and SnapView Clone) extents can cause the amount of data to be replicated to increase dramatically as the fracture duration increases. Bear in mind that once an extent is marked as dirty – by changing even a single block of data – the entire extent will be copied to the replica. This example asks two questions: how much data is marked dirty in a fracture, and how much bandwidth is required in order to copy that dirty data to the replica in a reasonable – usually customer-specified – time? Bandwidth required by regular writes can be calculated by multiplying the number of writes per second by the size of the writes. Here we have 1,000 IOPS with an R/W ratio of 3:1, giving 250 writes/s. Each write is 4 KiB in size, giving a required bandwidth of 250 x 4 KiB = 1,000 KiB/s. To convert KiB/s to Kb/s, multiply by 10 (there are 8 bits in a byte, but using 10 compensates for protocol overhead and the binary-decimal conversion). We therefore have 10,000 Kb/s, or (dividing by 1,000 since this is a serial link) 10 Mb/s. The amount of data marked dirty during a fracture will be the size of the extent multiplied by the number of writes that occurred during the fracture, assuming random I/O. In this case, the extent size is 256 blocks (the LUN size is 256 GiB) = 128 KiB, and the number of writes is 250 writes/s x 5 minutes x 60 seconds/minute = 75,000 writes, which marks 75,000 x 128 KiB = 9,600,000 KiB as dirty. If this data must be copied to the replica in 75 minutes (the resynchronization time), then the bandwidth requirement for the synchronization traffic only is 9,600,000 KiB / (75 minutes x 60 seconds/minute) = 2,133 KiB/s, which can be converted to 21.3 Mb/s. Adding the 10 Mb/s required by regular writes gives a total requirement of 31.3 Mb/s. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 111 This lesson covers remote replication on VNX systems. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 112 VNX Replicator is an IP-based replication solution that produces a read-only, point-in-time copy of a file system, iSCSI LUN, or VDM. The VNX Replication service periodically updates this copy, making it consistent with the production object. Replicator uses internal checkpoints to ensure availability of the most recent point-in-time copy. These internal checkpoints are based on SnapSure technology.
This read-only replica can be used by a Data Mover in the same VNX cabinet (local and loopback replication), or a Data Mover at a remote site (remote replication), for content distribution, backup, and application testing. Replication is an asynchronous process. The target side may be a certain number of minutes out of sync with the source side; the default is 10 minutes. When a replication session is first started, a full backup is performed. After initial synchronization, Replicator only sends changed data over IP. In the event that the primary site becomes unavailable for processing, VNX Replicator enables you to fail over to the remote site for production. When the primary site becomes available, you can use VNX Replicator to synchronize the primary site with the remote site, and then fail back to the primary site for production. You can also use the switchover/reverse features to perform maintenance at the primary site or testing at the remote site. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 113 Replicator can set the amount of data to be sent across the IP network before an acknowledgement is required from the receiving side. Replicator performs TCP window auto-sizing on a per-session, per-transfer basis, so it is not necessary to tune these values. The use of jumbo frames has not been shown to improve source-to-destination transfer rates for Replicator, so it is not critical to enable jumbo frames on the entire replication data path. When transferring data over a WAN, there is a high risk of dropped packets due to the latency and instability of the medium. Using regular frames allows Replicator to resend packets faster and more efficiently. Depending on the nature of the data being transferred, external network compression devices should be able to decrease the packet sizes and improve replication transfer rates for environments with limited WAN bandwidth. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 114 The bandwidth schedule throttles bandwidth by specifying bandwidth limits for specific periods of time. A bandwidth schedule allocates the interconnect bandwidth used on the source and destination sites for specific days and times, instead of using all available bandwidth at all times for the replication. For example, during work hours, 40% of the bandwidth can be allocated to Replicator and then changed to 100% during off hours. Each side of a Data Mover interconnect should have the same bandwidth schedule for all replication sessions using that interconnect. By default, an interconnect provides all available bandwidth at all times. Displayed here is a bandwidth schedule of 10 MB/s from 7 in the morning to 6 at night for all weekdays. On the weekends, Replicator will use all available bandwidth. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 115 When possible, dedicate separate spindles for the source, destination, and SavVol. The pattern of I/O for each of these is distinct, and mixing them on the same physical spindles can lead to contention. Also, a given set of disks should perform most efficiently under a consistent workload. In addition, Replicator (V2) has several patterns of linked or concurrent I/O. For instance, because there is a checkpoint of the PFS, new writes to the PFS will result in additional reads from the PFS and subsequent writes to the SavVol.
If the PFS and SavVol share the same spindles, there will be additional latency for these operations as the disk head positions for the PFS read, and then seeks to position for the SavVol write. Likewise, during the transfer of the delta set, data is read from the source object (primarily) and written immediately to the destination object. If the source and destination share spindles, the seek latency will again be introduced. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 116 NAS Engineering has done some Replicator (V2) performance testing. The results are shown on the following slides. The test environment consists of CIFS clients accessing two NS80s with CX3-80 back-ends and two NS-960s with CX4-960 back-ends. All file systems being replicated are 20 GiB in size and created on 450 GB 15K RPM FC drives. The network used in the testing is a GigE LAN. We will be looking at a remote replication. The first result compares the NS80 and NS-960 full copy transfer rates. The data reported in this test represents a maximum; the systems tested in this characterization were optimized to provide the greatest possible transfer rates. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 117 The session counts in these results are the number of sessions that are actively transferring data, not necessarily the number of configured sessions. On a per-session basis, the NS-960 provides twice the Replicator performance of the NS80. Overall, the NS-960 offers all-around improved performance over the NS80, and as you can see, this improvement also applies to Replicator (V2). Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 118 NL-SAS drives are useful because they maximize available storage while minimizing cost. However, their performance characteristics must be considered before determining that they are appropriate for a specific purpose. Generally, because Replicator SavVol activity is highly sequential, you can use NL-SAS disks for the SavVol without performance impact, especially if the SavVol is built on spindles that are not shared with data file systems. It may also be tempting to use NL-SAS drives for the replication destination for cost efficiency. NL-SAS drives should only be considered appropriate for use by the destination object under one of the following scenarios:
• If it is determined that NL-SAS drives will be able to meet the demands of the anticipated workload on the source object, and that they will further be able to meet the additional I/O requirements of replication, then using NL-SAS for the destination should also provide adequate performance.
• NL-SAS drives provide poor random write performance when compared to SAS drives, but their sequential write performance is nearly equal to SAS. If it is determined that the write workload to the source system will be highly sequential, then it is likely that NL-SAS drives would provide the appropriate performance to receive the Replicator updates.
NL-SAS drives should typically not be used for the destination object under any of the following scenarios:
• If other objects receiving client traffic will be sharing physical spindles with the destination object, the additional workloads could interact with the replication workload to create a more random and intensive workload than either separately.
• If the destination object will serve as the source for a cascaded replication, then the additional workload should be factored in.
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 119 Replicator has policies to control how often the destination object is refreshed, using the max_time_out_of_sync (SLA) setting, which is configured on a per-session basis. This value determines the longest amount of time that will elapse between Replicator transfers. The default SLA value is 10 minutes. The SLA can be set between 1 and 1440 minutes. This value should align with the customer's requested recovery point objective. Use of a larger SLA can be valuable if any of the following are true:
• The source object is prone to bursts of write activity combined with longer periods of relative inactivity. Use of a small SLA will cause Replicator to add additional I/O to the spindles when they are already at their busiest (during the burst). However, if the SLA is set higher, then it is more likely that the source disks will have a chance to absorb the burst of activity and return to a quieter level of activity before the transfer of the changes.
• Write activity to the source object tends to overwrite the same block of data multiple times. With a one-minute SLA, data that is written to the source object is marked for transfer almost immediately. For most workloads in this scenario, there is very little chance that a subsequent write intended for the same block will occur before the next transfer is initiated. As the value of the SLA increases, there is a greater chance for these re-hits, which decrease the amount of data that needs to be transferred.
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 120 This lesson covers remote replication on VNX systems. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 121 The VNX series offers all the replication methodologies needed to keep data available and secure. Integrated with Unisphere, RecoverPoint/SE offers several data protection strategies. RecoverPoint/SE Continuous data protection (CDP) enables local protection for block environments. RecoverPoint/SE Concurrent local and remote (CLR) replication enables concurrent local and remote replication for block environments. RecoverPoint/SE Continuous remote replication (CRR) enables block protection as well as a file Cabinet DR solution. This enables failover and failback of both block and file data from one VNX to another. Data can be synchronously or asynchronously replicated. During file failover, one or more standby Data Movers at the DR site come online and take over for the primary location. After the primary site has been brought back online, failback allows the primary site to resume operations as normal. The configuration can be active/passive, as shown here, or active/active, where each array is the DR site for the other. Deduplication and compression allow for efficient network utilization, reducing WAN bandwidth by up to 90% and enabling very large amounts of data to be protected without requiring large WAN bandwidth. RecoverPoint/SE is also integrated with both vCenter and Site Recovery Manager (SRM). SRM integration is block only. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 122 There are four major components to a RecoverPoint installation.
• RecoverPoint Appliances (RPA) – These are Linux-based appliances that accept the "split" data and route it to the appropriate destination volume, either via IP or Fibre Channel. The RPA also acts as the sole management interface to the RecoverPoint installation.
• RecoverPoint Journal Volumes – Journal volumes are dedicated LUNs on both the production and target sides that are used to stage small-aperture, incremental snapshots of the host data. As the personality of production and target can change during failover and failback scenarios, journal volumes are required on all sides of the replication (production, CDP, and CRR).
• Splitter – The RecoverPoint splitter driver is use-specific, small-footprint software that enables continuous data protection (CDP) and continuous remote replication (CRR). The splitter driver can be loaded on a host, or on a VNX/CLARiiON array.
• Remote Replication – Two RecoverPoint Appliance (RPA) clusters can be connected via TCP/IP or FC in order to perform replication to a remote location. RPA clusters connected via TCP/IP for remote communication transfer "split" data via IP to the remote cluster. The target cluster's distance from the source is only limited by the physical limitations of TCP/IP. RPA clusters can also be connected remotely via Fibre Channel. They can reside on the same fabric or on different fabrics, as long as the two clusters can be zoned together. The target cluster's distance from the source is only limited by the physical limitations of FC. RPA clusters can support distance extension hardware (for example, DWDM) to extend the distance between clusters.
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 123 A fully redundant, high-fidelity network with minimal packet loss greatly improves replication performance. Gateway Load Balancing Protocol (GLBP) or Virtual Router Redundancy Protocol (VRRP) can be used to configure redundant paths in a WAN environment. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 124 In the case of remote replication, the RecoverPoint WAN must be well engineered, with no packet loss or duplication, as these lead to undesirable retransmissions. While planning the network, care must be taken to ensure that the average utilized throughput does not exceed the available bandwidth. Oversubscribing the available bandwidth leads to network congestion, which causes dropped packets and leads to TCP slow start. Network congestion must be considered between switches as well as between the switch and the end device. To determine the bandwidth required to meet end-user requirements and RPO requirements, the I/O fluctuations should be understood and taken into consideration. In order to size the WAN pipe, the relevant data is:
• Average incoming I/O for a representative window, in MB/s (24 hours/7 days/30 days)
• Compression level achievable on the data (this is often hard to obtain and depends on the compressibility of the data; the rule of thumb is 2x to 6x)
Best practice is to dedicate a segment or pipe for the replication traffic, or to implement an external QoS system, to ensure the bandwidth allocated to replication is available to meet the required recovery point objective (RPO). From these numbers, compute the minimum bandwidth requirement of the environment by dividing the average incoming data rate by the estimated compression factor, as sketched below.
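A minimal sketch of that calculation; the 60 MB/s average incoming rate and the 3x compression factor are illustrative assumptions within the 2x-6x rule of thumb quoted above.

```python
# Hedged sketch: minimum WAN bandwidth for RecoverPoint remote replication.
# The average incoming write rate and compression factor are assumed example values.

def min_wan_mbps(avg_incoming_mb_s, compression_factor):
    """Average incoming data rate divided by the achievable compression factor,
    converted to megabits per second (x8 bits per byte)."""
    return avg_incoming_mb_s / compression_factor * 8

if __name__ == "__main__":
    needed = min_wan_mbps(avg_incoming_mb_s=60, compression_factor=3)  # assumed values
    print(f"Minimum WAN bandwidth ~{needed:.0f} Mb/s (no headroom for bursts or high-load recovery)")
```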
It should be noted that allocating this bandwidth for replication does not provide any guarantee on RPO or on the frequency of high loads, because the I/O rate can fluctuate throughout the representative window. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 125 Deduplication is a consistency group policy that eliminates the transfer of repetitive data to a remote site, saving bandwidth. When deduplication is enabled for a consistency group, every new block of data is stored in both the local and the remote RPAs. Each block that arrives at the local RPA is analyzed, and whenever duplicate information is detected, a request is sent to the remote RPA to deliver the data directly to the remote storage, instead of sending the data to the remote site. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 126 Compression is CPU-intensive; therefore, for the best RecoverPoint performance, it is important to configure compression levels correctly. Setting compression too high causes CPU congestion and high loads, because the RPA cannot compress all the data. Similarly, setting compression too low on a bandwidth-limited WAN causes high loads, because more data must be transmitted. The general rules for compression are as follows (all rates in MiB/s):
• With low bandwidth (around 2 MiB/s) or a low upper bound on the write rate (on the order of 5 MiB/s), use the maximum compression level. Note that this is only a general rule; when data is very compressible, 10 MiB/s of data can be handled at the highest compression level.
• With higher bandwidth (around 5 MiB/s) or an upper bound of about 10-15 MiB/s on the write rate, use one of the mid-range compression levels.
• In any other case, use the minimum compression level.
Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 127 Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 128 Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 129 Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 130 Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 131 Test the knowledge acquired through this training by answering the questions in this slide. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 132 Displayed here are the answers from the previous slide. Please take a moment to review them. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 133 Test the knowledge acquired through this training by answering the questions in this slide. Continue to the next page for the answer key. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 134 Displayed here are the answers from the previous slide. Please take a moment to review them. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 135 This module covered BC Design on VNX systems. Copyright © 2012 EMC Corporation. All rights reserved Module 5: BC Design Best Practices 136 This module focuses on a case study and design exercises. Copyright © 2012 EMC Corporation. All rights reserved Module 6: Case Studies 1 Copyright © 2012 EMC Corporation. All rights reserved Module 6: Case Studies 2 This module covered a case study and design exercises. Copyright © 2012 EMC Corporation.
All rights reserved Module 6: Case Studies 3 This course covered gathering relevant information, analyzing it, and using the result of that analysis to design a VNX solution. Copyright © 2012 EMC Corporation. All rights reserved Course Introduction 4 This concludes the training. Thank you for your participation. Please remember to complete the course evaluation available from your instructor. Copyright © 2012 EMC Corporation. All rights reserved