Australian Government Data Centre Strategy 2010-2025
Better Practice Guide: Data Centre Structure
August 2013

Contents

1. Introduction
   Purpose
   Scope
   Policy Framework
   Related documents
2. Discussion
   Overview
   About the data centre structure
   Assessing the Structure
   Limits and Trends
   Operational Considerations
   Conclusion
3. Better practices
   Planning
   Operations
4. Conclusion
   Summary of Better Practices

1. Introduction

The purpose of this guide is to advise Australian Government agencies on ways to improve operations relating to the data centre structure.

Many government functions are critically dependent upon information and communication technology (ICT) systems based in data centres. The principal purposes of the data centre’s physical structure are to house ICT equipment, to control the movement of people and goods through the building, and to distribute air, water and cables.

Applying better practices in planning and using the physical structure can reduce operating costs, increase agility in responding to change, and improve security. A data centre is a substantial, long-lived investment, and is subject to many changes through its operating life. Implementing changes to the data centre structure requires long lead times, particularly when the data centre ICT must keep running while the changes are made. Good planning and operations are central to minimising costs.

Agencies use everything from converted office space to purpose built buildings for their data centres. This guide is intended to be applicable to all circumstances. Each agency remains responsible for determining that the structure meets its business needs.

This guide on structure forms part of a set of better practice guides for data centres.

Purpose

The intent of this guide is to assist managers to assess how well the structure meets their agency’s needs, and to reduce the capital and operating costs relating to the structure.

Scope

This guide considers the physical data centre structure. For the purpose of this guide, the term structure includes the data centre housing the ICT, the building housing the data centre, the building surrounds and the geographic location. All of these elements influence the whole of life costs and the data centre operations.

This advice is intended to be relevant to any data centre sourcing arrangements used by APS agencies. This includes services contracts, as the advice can inform agencies when assessing financial and technical risks in managed services offers.

The considerations for the data centre and building include the fit-out, such as the raised floors (if used), ducts for air and water, trays and/or conduits for power, telecommunication and data cables, and pathways for the movement of goods and people. A detailed discussion of the equipment racks is excluded from this scope.

The structure assists other functions in the data centre, such as security, power and cooling. This guide describes how the structure assists these functions.

This guide is intended for data centres in existing buildings. Agencies will have existing data centres, or acquire data centres using the data centre facilities panel. While part of this guide might assist in planning a new data centre construction project, this is not within the scope of this guide. The guide also does not consider geographic factors for business continuity, such as data centres in several locations.

Policy Framework

The guide has been developed within the context of the Australian public sector’s data centre policy framework. This framework applies to agencies subject to the Financial Management and Accountability Act 1997 (FMA Act).
The data centre policy framework seeks financial, technical and environmental outcomes. The Australian Government Data Centre Strategy 2010 – 2025 (data centre strategy) describes actions that will avoid $1 billion in future data centre costs. The data centre facilities panel, established under the coordinated procurement policy, provides agencies with leased data centre facilities.

The Australian Government ICT Sustainability Plan 2010 – 2015 describes actions that agencies are to take to improve environmental outcomes. The ICT sustainability plan refers to the National Strategy on Waste. Structures take in a large volume of equipment and packaging, and the same volume must eventually be removed. The data centre strategy and the ICT sustainability plan have the same targets and objectives for data centres.

The National Construction Code was created in 2011 by combining the Building Code of Australia and the Plumbing Code of Australia. The National Construction Code controls building design in Australia, and may be further modified by State Government and council regulations. The data centres available from the facilities panel have been confirmed as complying with the Code.

The Australian Government’s Protective Security Policy Framework (PSPF) provides agencies with mandatory directions and advisory guidance on data centre security issues.

The Commonwealth Property Management Framework provides overarching policy guidance on all property leased or owned by the Commonwealth. Data centres leased through the Data Centre Facilities Panel are compliant with this policy framework.

Related documents

Information about the data centre strategy, and DCOT targets and guidance can be obtained from the Data Centre section (datacentres@finance.gov.au).

The data centre better practice guides also cover:

• Power: the data centre infrastructure supplying power safely, reliably and efficiently to the ICT equipment and the supporting systems.
• Cooling: the mechanical and electrical systems that provide conditioned air at the optimum temperature, humidity and pressure.
• Data Centre Infrastructure Management: the system that monitors and reports the state of the data centre. Also known as the building management system.
• Fire protection: the detection and suppression systems that minimise the effect of fire on people and the equipment in the data centre.
• Security: the physical security arrangements for the data centre. This includes access controls, surveillance and logging throughout the building, as well as perimeter protection.
• Equipment racks: this guide brings together aspects of power, cooling, cabling, monitoring, fire protection, security and structural design to achieve optimum performance for the ICT equipment.
• Environment: this guide examines data centre sustainability, including packaging, electronic waste, water use and greenhouse gas generation.

2. Discussion

Overview

This section outlines the potential benefits and risks to agencies from the data centre structure.

Agencies which apply good planning and operations to their data centre structures should have few problems. This is largely because the structure is relatively unchanging, while power, cooling and ICT equipment change on an hourly and daily basis. However, as most changes to a structure have long lead times, agencies should keep their data centre planning current.

The main functions of a data centre structure are to:

• House the ICT equipment and the supporting systems, by addressing the physical requirements.
• Control the movement of people and equipment, in and around the data centre.
• Provide the fit out that distributes air, water and electricity through a data centre.

The geographic location and building perimeter influence the operating costs and physical security of a data centre.
These are also addressed in this section.

About the data centre structure

The typical purpose-built data centre has an operational life of 15 to 25 years, and has permanent and semi-permanent elements. Permanent elements are designed for the full operational life, and include the walls, floors, ceilings, corridors and so on. Semi-permanent elements include the ICT floor space, cable trays, ducting for air and cables, and pipes for liquids. Also referred to in this guide as the fit out, these elements are designed to be upgraded, removed and extended as operational needs change through the life of the data centre.

Data centres that have been set up in converted office space tend to have a shorter operational life than purpose built data centres. In part, this is due to the mismatch between the requirements of a building suitable for people and those of a building suitable for ICT. Typically, ICT weighs much more, and needs far more power and cooling than people. Another common factor contributing to a shorter operational life is that a data centre in converted office space usually gets minimal investment, and so causes more failures and issues and absorbs more management attention.

Assessing the Structure

If a data centre is to hold agency ICT assets, or APS staff will regularly work at the building, an agency should satisfy itself that the structure complies with Australian building standards. The overarching document is the National Construction Code (formerly the Building Code of Australia). This document refers to many other current standards.

Agencies are generally advised to rely on compliance certificates rather than making the assessment themselves. This approach should be sufficient for most circumstances. However, if the structure poses significant risks to the agency then thorough inspections may be warranted.

An ongoing challenge for data centre managers is that the data centre structure lasts far longer than the ICT equipment that it holds. The constant churn of equipment means constantly changing requirements and expectations. Agencies should plan to carry out routine assessments of the structure. These assessments must involve the ICT, facilities and property people working in concert.

The assessments follow a cycle. The initial assessment will select the data centre. Subsequent reviews will arise due to planned major changes, typically caused by ICT equipment refresh cycles (every 3 to 5 years), or by machinery of government organisation changes. The final assessment of a structure will determine that the data centre no longer meets the agency’s needs cost effectively, and that a new data centre should be found, thus beginning a new assessment cycle.

The assessments of the structure should consider:

• Does the structure meet the requirement of the ICT equipment, and how well is the requirement met?
   o For each requirement that is satisfied, how cost effectively is this done? For example, two data centres may each offer a floor with a carrying capacity of 2,000 kg/m², but in one the floor is also a thermal mass, which reduces the cooling costs by 10% per annum.
   o For each requirement that is not satisfied, what is the cost to alter the structure to suit the agency’s purposes? This is highly pertinent to considerations of security and fit out.
• Does the structure meet the agency’s requirements for ICT availability? What redundancy levels are available for power and telecommunications services to the site?
• Given historic ICT growth trends, how long will either space or environmental support be adequate?
• How efficiently can the agency add or remove capacity? What is the cost of operating the as-yet unused capacity?

This last point requires significant analysis. Deferring capacity upgrades over the life of a data centre typically reduces capital and operating costs.
However, agencies cannot be certain about the future, and changing data centre capacity is a substantial project, commonly lasting more than 12 months and with a budget over $1 million.

Key Planning Principle: Enough capacity to minimise the costs

Whether a particular decision represents value for money depends most often on whether the structure has the capability and capacity to meet the agency’s requirements over the foreseeable future. Agencies should regularly predict their requirements, know the limits of the structure’s capacity, and seek to make fewer, larger changes. This approach is likely to minimise the whole of life costs.

The following figure describes the common pattern for most value for money assessments of data centre structures. When the requirements are compared to the capacity offered by the data centre, the optimum range will typically be above just meeting the requirements. This result is primarily due to whole of life costs being minimised when the structure has some capacity to respond to minor changes.

[Figure: Value for Money (vertical axis) plotted against Comparison of Requirements to Capacity (horizontal axis), with ranges labelled Just, Optimum, Good and Okay]

The best value for money result is commonly achieved when the requirements are met and there is excess capacity and capability to manage changes for the foreseeable future. The foreseeable future is typically about three to five years, consistent with the ICT equipment refresh cycle.

Solutions to the left of the optimum (green) range require more funds to respond to changes than the optimum. Solutions to the right are over-engineered, and the excess capacity is wasted.

This model applies to many data centre decisions, including weight, cabling, floor space, cooling and power. In a simple example, assume the labour costs to install cable are almost the same to install 100 cables as 200 cables.
If the initial requirement is for 80 cables, but this is expected to rise to 180 cables over the next five years, then the value for money assessment could be:

• Just: install 100 cables, and install the additional cables as and when required, which raises labour costs.
• Optimum: install 200 cables.
• Good: install 300 cables, and incur higher capital and labour costs.
• Okay: install 500 cables, and incur even higher labour and capital costs.

Avoiding under-provisioning or over-provisioning is challenging given that many data centre components interact with one another. The demands on the structure from the refresh cycle alone are complex. Consider that a 30 per cent increase in server and storage capacity does not necessarily mean 30 per cent more space or power and cooling. Newer technologies typically require less space, less power and therefore less cooling. These effects should be factored into the space and capacity planning process, which requires the involvement of both the ICT and facilities teams.

Fit out

The fit out of the building supports the distribution of cables, air and water through the building. Generally, the use of semi-permanent installations is proven value for money in large, purpose built data centres. The merit of using conduits and ducts becomes more difficult to show when the data centre is very small or temporary.

Semi-permanent installations such as conduits, ducts and cable trays bring order to the cables and efficiency to moves and changes. This reduces ongoing operations costs. Poor cabling practices can mean that doors do not close, that cooling air is blocked, and that extra cables are ordered unnecessarily when existing cables were available for use. Other consequences include extended troubleshooting time when resolving connectivity problems, and slower equipment relocations, which are a common activity.
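
The cable example above (80 cables now, 180 within five years) can be worked through numerically. The following is a minimal sketch only: every dollar figure, the 20-cable top-up batch and the name `CableCostModel` are hypothetical assumptions chosen for illustration, and only the 100 / 200 / 300 / 500 options and the "100 costs about the same labour as 200" premise come from the guide.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class CableCostModel:
    """Hypothetical cost figures; replace with real quotes before relying on them."""
    labour_per_visit: int = 10_000     # assumed flat labour for any visit up to flat_rate_cables
    flat_rate_cables: int = 200        # guide's premise: 100 or 200 cables need similar labour
    extra_labour_per_cable: int = 40   # assumed labour for each cable beyond the flat rate
    material_per_cable: int = 30       # assumed capital cost per cable
    topup_batch: int = 20              # assumed cables added per later "as required" visit

    def install_cost(self, cables: int) -> int:
        """Cost of the initial installation."""
        extra = max(0, cables - self.flat_rate_cables)
        return (self.labour_per_visit
                + extra * self.extra_labour_per_cable
                + cables * self.material_per_cable)

    def whole_of_life_cost(self, installed_now: int, peak_demand: int) -> int:
        """Initial installation plus any later top-up visits to reach peak demand."""
        shortfall = max(0, peak_demand - installed_now)
        later_visits = ceil(shortfall / self.topup_batch)
        return (self.install_cost(installed_now)
                + later_visits * self.labour_per_visit
                + shortfall * self.material_per_cable)


model = CableCostModel()
options = {"Just": 100, "Optimum": 200, "Good": 300, "Okay": 500}
costs = {label: model.whole_of_life_cost(n, peak_demand=180)
         for label, n in options.items()}
for label, cost in costs.items():
    print(f"{label:8s} install {options[label]:3d} cables -> ${cost:,}")
```

Under these assumed figures the 200-cable option is cheapest, because the repeated labour of top-up visits outweighs the small saving in materials. With different labour or material rates the ranking could change, which is why the assessment should be repeated with the agency's own costs.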

Conduits can have higher security features that provide assurance that there is no tampering with the cables. These features include enclosing and sealing the conduits, and/or installing motion sensors.

Ducting for air can help or hinder the efficiency of the cooling system. Without ducting, the cooling air is released into a general space and left to drift. In earlier data centre designs with raised floors, this method was effective. Ducting, commonly used in designs with slab floors, directs the cooling air flow to the ICT equipment. This is often required with more recent server technology, which generates significant heat in a much smaller space. While ducting typically improves cooling effectiveness, the design must be considered carefully; for example, each bend in a duct causes inefficiencies.

Housing ICT and Other Equipment

The ICT equipment’s physical characteristics are typically:

• Weight: how much does all of the equipment, racks, cables etc weigh?
• Space: what is the volume of space needed for the racks, free standing equipment and the clearances around the equipment?
• Cabling: how many cables, of what type, are to be connected to the ICT equipment? Is the cabling design optimised to reduce the amount of cabling required while providing flexibility for expected changes? Does the cabling design consider the trade-offs between copper and optic fibre?
• Cooling: what amount of cooling is required, and how is this cooling being achieved (air or liquid, underfloor or overhead)?
• Power: how many power cables are to be connected to the ICT equipment? How is the power connected from the ICT equipment to the PDU? For redundant power supply, are the power cables active / active, or active / passive?
• Future flexibility: what is the forecast for each of the above? Will there be an increase or reduction in any of these?
• Security: who can access what equipment and under what circumstances? How is unauthorised access recorded and notified?
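
One way to capture the characteristics listed above at rack level is a simple record per rack that can be totalled for the data hall. This is a hedged sketch, not a prescribed schema: the class `RackRequirement`, its field names, the definition of footprint and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RackRequirement:
    """Hypothetical per-rack statement of physical requirements."""
    rack_id: str
    weight_kg: float       # equipment, rack and cables
    footprint_m2: float    # floor area the weight bears on (assumed definition)
    copper_cables: int
    fibre_cables: int
    heat_load_kw: float    # cooling required
    power_feeds: int
    power_mode: str        # "active/active" or "active/passive"

    @property
    def floor_load_kg_per_m2(self) -> float:
        return self.weight_kg / self.footprint_m2


def summarise(racks: Iterable[RackRequirement]) -> dict:
    """Total the requirements and report the heaviest floor loading."""
    racks = list(racks)
    return {
        "total_weight_kg": sum(r.weight_kg for r in racks),
        "total_footprint_m2": sum(r.footprint_m2 for r in racks),
        "total_data_cables": sum(r.copper_cables + r.fibre_cables for r in racks),
        "total_heat_load_kw": sum(r.heat_load_kw for r in racks),
        "peak_floor_load_kg_per_m2": max(
            (r.floor_load_kg_per_m2 for r in racks), default=0.0),
    }


# Sample values are invented for illustration only.
racks = [
    RackRequirement("R01", 900, 1.5, 48, 12, 8.0, 2, "active/active"),
    RackRequirement("R02", 1800, 1.5, 24, 24, 20.0, 2, "active/passive"),
]
summary = summarise(racks)
print(summary)
```

Keeping the record per rack, rather than only as data hall totals, preserves the peak floor loading and heat load, which are the figures most likely to hit a structural limit first.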

This is the minimum set of information that is needed to describe what the structure is to provide. The greater the granularity of the information, down to the physical rack level, the more likely it is that the selected data centre will meet an agency’s needs. Note that this is a list of physical requirements only, and other essential requirements such as power and cooling are omitted from this list.

Security requirements will be specified in the first instance by the agency’s security management plan. The PSPF also has specific requirements for data centres.

The other data centre equipment, notably the mechanical and electrical systems that provide the power and cooling, has a similar list of requirements, including weight, space, cooling, cabling and security.

Controlling Movement

The structure should be assessed for the ability to control the movement of people and goods through the building. The key points are capability, security and safety.

For people, the assessment should consider how to identify and grant access to people, how to know that they go only to the approved places in the building, and how they can be evacuated safely during an emergency. There are many suitable technologies, listed in the PSPF, which should be used in combination to secure the structure cost-effectively. Examples include biometric access controls, multi level access based on roles, anti-passback, tag-along prevention, RFID pass tracking, closed circuit video, and motion sensors linked to alarms and/or cameras. The areas to be secured include parking areas, hallways, entryways, loading docks, the ICT area, and racks holding sensitive equipment.

Goods have a similar list, with the addition of considering the weight and volume of the goods. Computer equipment, when in its protective packaging, can be very heavy and bulky. Access paths, lifts and door clearances should all be assessed.
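
The assessment of access paths, lifts and door clearances can be sketched as a comparison of the largest packed item against each segment of the route from the loading dock to the data hall. The class names, dimensions and load ratings below are hypothetical, and the check is deliberately simplified: it ignores turning space in corridors and the depth of lift cars.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PackedItem:
    name: str
    width_mm: int
    depth_mm: int
    height_mm: int
    weight_kg: int


@dataclass(frozen=True)
class PathSegment:
    name: str
    clear_width_mm: int
    clear_height_mm: int
    max_load_kg: int


def blocking_segments(item: PackedItem,
                      path: List[PathSegment]) -> List[Tuple[str, List[str]]]:
    """Return each segment the item cannot pass, with the reasons."""
    # Assume the item can be turned so its narrower side faces the opening.
    narrow_side = min(item.width_mm, item.depth_mm)
    problems = []
    for seg in path:
        reasons = []
        if narrow_side > seg.clear_width_mm:
            reasons.append("too wide")
        if item.height_mm > seg.clear_height_mm:
            reasons.append("too tall")
        if item.weight_kg > seg.max_load_kg:
            reasons.append("too heavy")
        if reasons:
            problems.append((seg.name, reasons))
    return problems


# Invented example values.
item = PackedItem("packed storage array", width_mm=900, depth_mm=1300,
                  height_mm=2100, weight_kg=1100)
path = [
    PathSegment("loading dock door", 1800, 2400, 5000),
    PathSegment("goods lift", 1100, 2200, 1000),
    PathSegment("data hall door", 850, 2050, 3000),
]
problems = blocking_segments(item, path)
for name, reasons in problems:
    print(f"{name}: {', '.join(reasons)}")
```

Running the check before equipment is ordered gives time to plan a response, such as unpacking in the loading dock or delivering the equipment in sections.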

Geography

The data centre location can influence cooling efficiency, security, reliability and telecommunication costs. These influences can substantially change the operating costs.

Free air cooling uses the external air instead of air conditioning to remove heat from the data centre. This can reduce a data centre’s operating costs by over 40%. Free air cooling is effective when the climate has a mean annual temperature below 23°C (±1°C) and moderate humidity levels (mean annual relative humidity of 50% ±5%).

Adequate air quality is also important to efficiency. Particles from pollution, smoke and dust can interfere with the ICT equipment. While commercial grade data centre equipment is usually designed to maintain air quality, domestic equipment is not. Domestic equipment often fails to maintain the air quality standards needed for ICT equipment.

The data centre’s proximity to power generation stations and distribution paths in the national electricity grid affects the price and reliability of the electricity supply. Making sure that the power supply to a major zone is connected to two or more distribution paths significantly reduces the risk of power failure. The ACT received its second connection in 2012. Data centres in locations with a single connection will place a greater importance on backup power supplies.

The data centre’s proximity to major telecommunications networks should reduce network costs, and connecting to several networks should improve reliability.

Labour costs can be more easily controlled in locations with proximity to larger populations. Service costs and times are also likely to be improved by being closer to major population centres.

There are other risk factors that can be identified and assessed, including earthquake, flood and transport accidents. When considering these risks, it is necessary to consider the impact on the surrounding neighbourhood, not only the building.
In Australia, large scale flooding and fires have occurred with unfortunate frequency. These events have disrupted power supply, telecommunication services and the movement of people and goods in the affected areas for many days.

Perimeter

The building’s surroundings have features that influence the operating costs. The assessment should consider the ease of movement of people and goods, and how the perimeter contributes to the overall site security.

Limits and Trends

An ongoing challenge for data centre managers is that the data centre structure lasts far longer than the ICT equipment that it holds. The constant churn of equipment means constantly changing requirements and expectations.

It is better practice to develop considered responses to each of the limits, together with reporting that advises when the limits are about to be reached. The planned responses could involve altering the ICT equipment, upgrading the structure, or moving to another data centre. The better practice is that ICT and facilities staff develop the responses jointly.

The current trend in ICT equipment is to heavier, hotter ICT equipment in a significantly smaller footprint. The result is more devices, greater weight and greater heat to be handled in this smaller footprint. This affects power, cooling, cabling and floor loadings.

The ability of the data centre’s floors to carry weight is often a fixed limit that can be exceeded only with difficulty and temporarily. The floor carrying capacity can be exceeded relatively easily in office buildings and older data centres. These typically have a carrying capacity of around 750 to 1000 kg/m². Popular models of blade servers, data warehouses and storage area networks can all exceed 1500 kg/m². While it is possible to install weight distribution solutions to spread weight more evenly, these should be considered as temporary measures. Replacing the raised floor to carry greater weight in a data centre is possible, and may give a good long term result.
Raised floor carrying capacity has risen from 300 kg/m² in 1965 to around 3000 kg/m² in 2013. This type of project requires thorough planning, as it can be very expensive and risky to the agency’s business operations.

The capability to remove more heat from a smaller location can be another point requiring significant investment. The volume of cooling air that can be delivered to a specific rack is a complex function of the power of the air conditioning unit, the size of the ducts, the volume to be cooled and the rate at which the heated air can be drawn from the data centre. Once the limit of this configuration has been reached, further investment is required. Point solutions, affecting only one or two racks, can be successful and relatively inexpensive. Switching to liquid cooling will provide significantly greater cooling capacity, and a dramatic drop in power consumption because computer room air conditioners (CRACs) are not required to move cooling air. However, this will require substantial changes to the structure. Typically, the switch to liquid cooling is cost effective in zones of high electricity consumption, which is generally over 20 to 30 kW per rack.

Operational Considerations

Achieving the lowest cost of ownership for the data centre structure is largely due to good planning and maintaining order in key resources.

The whole of life planning is principally about knowing the limits of the structure and having plans to manage when a proposed change will breach those limits. For example, if the floor carrying capacity is 1500 kg/m², and new equipment has been ordered that will weigh 2000 kg/m², then the planned action may be to upgrade part of the computer room floor.

In a well run data centre structure the purpose and capacity of every cable, pipe and duct is documented. One test for the quality of the documentation is that changes can be planned using the documentation and executed without failures.
This disciplined approach minimises execution time and disruptions to production systems. The trade-off is that each move is not complete until the documentation has been updated. However, the industry consensus is that this is time well spent. There are many commercial software tools that will assist the process of managing every duct, pipe and cable. The key is to create a process whereby all changes use the tool, so that an accurate audit trail is always available.

Agencies may consider creating a role of “resource manager”: an individual, either in ICT or facilities, who is responsible for linking the two groups together from a process point of view. At its simplest this could be a very junior ICT person who is responsible for all equipment placement on the floor (the long term capacity plan). Their role is to ensure facilities staff understand what will happen, and to work with them to assess the impact of changes on the infrastructure before they occur. As time goes on this individual will develop a strong understanding of both the ICT issues and the facilities issues, and can help bring these teams closer together.

Generally, a structure will continue to be used until the economic benefits of moving to a new data centre exceed the project costs of the relocation. The structure’s integrity must be maintained, by ensuring that all works are planned by a qualified engineer.

Maintaining a high level of order in the data centre offers ongoing benefits. There will be higher upfront costs, but the reliability will be greater due to fewer errors and quicker changes, and this will lead to lower whole of life costs.

For example, consider the effort involved in replacing half the ICT equipment in the two racks shown in Figure 1. The organised cable layout shown in the right hand picture means that the equipment is accessible. The task will be completed with less effort and time, and the risk of failure due to moving the wrong cable is greatly reduced.
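
The limit planning described under Operational Considerations (a floor capacity of 1500 kg/m2 against ordered equipment of 2000 kg/m2) can be sketched as a routine check of each proposed change against the documented limits. This is an illustrative sketch only: the `StructureLimit` name, the 80% warning threshold and the cooling and rack figures are assumptions; only the floor loading figures come from the guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureLimit:
    """One documented limit of the structure and the load a change would place on it."""
    name: str
    unit: str
    limit: float
    proposed: float

    def status(self, warn_at: float = 0.8) -> str:
        """Classify the proposed load; warn_at is an assumed reporting threshold."""
        utilisation = self.proposed / self.limit
        if utilisation > 1.0:
            return "breached"
        if utilisation >= warn_at:
            return "approaching"
        return "ok"


checks = [
    StructureLimit("floor loading", "kg/m2", limit=1500, proposed=2000),  # guide's example
    StructureLimit("cooling per rack", "kW", limit=10, proposed=9),       # invented figures
    StructureLimit("rack positions", "racks", limit=60, proposed=30),     # invented figures
]
report = {c.name: c.status() for c in checks}
for c in checks:
    print(f"{c.name}: {c.proposed} of {c.limit} {c.unit} -> {report[c.name]}")
```

A "breached" result would trigger the planned response (for example, upgrading part of the computer room floor), while "approaching" gives the ICT and facilities teams early notice to prepare one.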

Figure 1: Equipment racks before and after organising cables. (Before and after pictures taken by Cloned Milkmen, www.flickr.com.)

Another operational task is cleaning the data centre. Dust and other particles can interfere with the fans and, rarely, the electronics. While the risk is very low, any faults that arise can be very difficult to diagnose, and can lead to a series of equipment failures. Cleaning the data centre is more important in an office environment, as many carpets shed fibres. Older data centres need cleaning to remove ‘zinc whiskers’. These tiny fragments of zinc can be carried into the ICT equipment, causing electrical faults and equipment failures.

The standard ISO 14644 describes air quality in clean rooms. Most government data centres should not seek to meet this standard, as it is very unlikely to provide any advantages. The exceptions are those agencies that have ICT equipment identified by the manufacturer to be susceptible to air particles, and data centres that have experienced faults due to air quality.

Any significant changes to the ICT equipment or configuration should consider the limits of the structure. Altering the structure has a long lead time, typically weeks to months, and is potentially costly. Therefore, it is best that this consideration begins as the business case is being developed. The capabilities of the structure must be reflected in any tender material.

A systematic approach to communicating procedures to new and visiting staff is needed to maintain standards. In larger data centres this may become a formal, tested training process. In smaller data centres, a high standard of documentation that is easily and continually referenced may be sufficient.

Conclusion

The limits imposed by the data centre structure should be carefully considered when selecting a data centre and during the operating life. It is possible to over invest in the structure, purchasing capacity in excess of requirements and never reaching the limits. Equally, under-investing may mean that the limits are exceeded early in the life of the data centre, forcing a move to another data centre.

Typically, obtaining data centre space that has adequate capacity but which has been designed to be upgraded easily is the optimum solution. The advantage is that the capital investment is distributed over multiple years. So far, data centre technology price/performance has been improving steadily.

It is essential that the impact of ongoing changes does not exceed the structure’s capacities in an unplanned manner. Agencies should ensure regular communication between ICT, operations and facilities staff, across change, capacity and asset management processes.

Agencies must decide how much to invest in their data centres to obtain value for money and to support achieving the agency outcomes. The better practices will assist agencies to reach these objectives.

If an agency decides that the data centre performance is inadequate, the first point to review is the data centre operations. If the operations are satisfactory (that is, delivering the full capability of the design), then the data centre design must change. Generally, changing the data centre design means moving to commercial data centre facilities obtained from the data centre facilities panel.

3. Better practices

Planning

• The better practice is a plan that identifies the limits and proposed responses for key features of the structure. These key features include the floor loading, the space and fit out. This plan, and the funding impacts, should be reviewed with the senior responsible officer.
• All cables, pipes and ducts are documented. These items are labelled (or equivalent) accurately. The documentation is always current.
• All planning and assessment work involves the ICT, facilities and property teams.

Operations

• There is routine cleaning, sufficient to maintain air quality consistent with the highest standard of all the equipment in the data centre. The effects of changes in environmental air quality, such as dust storms or fires, must be considered.
• All movement of people and goods through the building is consistent with safety and the security policy.
• All cables, pipes and ducts are identifiable and documented.
• There is a method for ensuring that the defined procedures and documentation are followed. This method may be consistent with ISO 9000. There may be formal, evaluated training for new staff.

Fundamental

⊠ Agency has statement of requirements for current ICT equipment that describes weight, space, cooling, cabling and security.
⊠ Agency has forecast over next five years of requirements for ICT equipment that describes weight, space, cooling, cabling and security.
⊠ Agency can identify the weight and volume of largest item of ICT equipment when packed.
⊠ The data centre has a path from the loading dock to the equipment rack / data hall that can allow the movement of this equipment.
⊠ Agency has identified the security protections required as per the PSPF.
⊠ Agency has identified the risks and controls posed by the building integrity, location and surrounds.
⊠ The agency has identified the capacity limits of the current data centre structure with regard to weight, space, cooling, cabling and security.
⊠ Agency has developed plans to respond to changes that exceed one or more of these limits.
⊠ All cables in the structure are labelled and recorded.
⊠ The building is cleaned regularly.
⊠ The ICT, operations and facilities staff have good communications. This includes:
   o Shared processes for change, capacity planning and asset management.
   o Regular (bi-annual for larger agencies) planning meetings.

4. Conclusion

Agencies that use better practices in their data centres can expect lower costs, better reliability and improved safety. Implementing the better practices will give managers more information about the data centre structure, enabling better decisions. Overall, the data centre will become more efficient, and better aligned to the agency’s strategic objectives.

Agencies will also find it simpler and easier to report against the mandatory objectives of the data centre strategy. The key metric is avoided costs, that is, the costs that agencies did not incur as a result of improvements in their data centres. Capturing avoided costs is most effective when done by an agency in the context of a completed project that has validated the original business case.

Summary of Better Practices

• The data centre structure is verified against business expectations.
• The capacity plan outlines the demand and various limits, and how these limits can be extended.
• The security remediation is identified.
• The work health and safety plans with respect to risks from the data centre structure have a goal of zero injuries.
• The agency is routinely:
   o Reviewing whether the data centre is fit for purpose and making planned changes.
   o Cleaning the data centre.
   o Maintaining documentation about the data centre fit out.
   o Maintaining forecasts for future data centre needs, and identifying trends in agency plans that may exceed the data centre capacity.
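
The forecasting practice in the summary above can be illustrated with a small calculation of when growing demand would first exceed a fixed capacity. This is a rough sketch under stated assumptions: it assumes steady compound annual growth, which real demand rarely follows, and the function name, the sample figures and the 25-year horizon (taken from the upper end of the operational life quoted in this guide) are illustrative choices rather than a prescribed method.

```python
from typing import Optional


def first_year_exceeded(current_demand: float,
                        capacity: float,
                        annual_growth: float,
                        horizon_years: int = 25) -> Optional[int]:
    """Return the first year (0 = now) in which demand exceeds capacity.

    Returns None if capacity is not exceeded within horizon_years.
    Assumes demand compounds at a constant annual_growth rate (0.15 = 15%).
    """
    demand = current_demand
    for year in range(horizon_years + 1):
        if demand > capacity:
            return year
        demand *= 1.0 + annual_growth
    return None


# Invented example: 60 racks in use, room for 100, growing 15% a year.
year = first_year_exceeded(current_demand=60, capacity=100, annual_growth=0.15)
print(f"Capacity first exceeded in year {year}")
```

Comparing the result with the three to five year planning horizon discussed in this guide, and with the lead time for structural changes, indicates whether a response needs to be planned now or can be deferred.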