This video is part of the Microsoft® Virtual Academy. 1 In this module, we are going to learn best practices for configuring private cloud infrastructures. The first module of this course covered private cloud planning and deployment considerations. This module focuses on configuration, requirements, and design guidance for deploying networking, storage, and a highly available fabric. The third module in this course will look at management practices for the private cloud. 2 Let us discuss the key configuration requirements of servers, software, storage, networking, and high availability. These build the foundation and the fabric for a virtualization platform, and they are designed to ensure high availability and virtualization in the data center. The best practices we discuss apply to an integrated and distributed private cloud model, and typically would not apply to a purely isolated, standalone host type of environment. 3 First, let's take a look at some of the key considerations when deploying your servers. 4 The first tool that we recommend using is Windows® Deployment Services. It is built into the Windows Server® SKUs and provides remote operating system installation that can be fully automated and customized using unattended installation scripts and files. This gives us the ability to deploy at large scale across an enterprise. Essentially, the process is to build an ideal or golden image that we want to use in our environment, clone that image, run Sysprep against it as necessary, and make sure that it receives unique settings such as computer names and domain membership. Following this, we can install it. Once the installation process has started, the installer can look at the unattend file. This is an XML file containing a list of mappings for certain key settings of that particular server, such as the name of the machine, which domain it should be a part of, user names, passwords, administrators, features, programs that should be added, and server roles such as Hyper-V®. 5 The Microsoft Deployment Toolkit (MDT) helps execute fully automated zero touch installation deployments using Microsoft System Center Configuration Manager 2007 Service Pack 2 and the Windows deployment tools. For those without a System Center Configuration Manager 2007 infrastructure, MDT uses the Windows deployment tools for light touch installation deployments. Windows Deployment Services can be extended by using one of these solution accelerators, which are free downloads from Microsoft. The Microsoft Deployment Toolkit 2010 provides even greater flexibility and management options and can be added on top of Windows Deployment Services. We can now deploy operating systems, drivers, and applications, in addition to Windows updates. This gives us a broad deployment baseline: rather than just deploying the operating system, we get to deploy additional components within the operating system as well. Using this technology will help reduce the overall deployment time needed. The Microsoft Deployment Toolkit supports Windows Server 2008 R2 and Windows Server 2003. On the client side, it also supports Windows 7, Windows Vista®, and Windows XP. Furthermore, the deployment toolkit will deploy other Microsoft software such as Microsoft Office. Thus if you need to deploy Microsoft Word and Microsoft Excel® spreadsheet software to client machines, the deployment toolkit can do this as well. 
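As a rough, hypothetical illustration of the pre-staging described above (not a procedure from this course), the sketch below uses the wdsutil.exe command-line tool that ships with the Windows Deployment Services role, run from an elevated PowerShell prompt on the WDS server; the computer name, MAC address, and unattend file path are placeholders.

    # Answer PXE requests only from pre-staged (known) clients
    wdsutil /Set-Server /AnswerClients:Known
    # Pre-stage a host by its MAC address and point it at a client unattend file stored on the WDS server
    wdsutil /Add-Device /Device:HOST01 /ID:00-15-5D-01-0A-3B /WdsClientUnattend:WdsClientUnattend\HostUnattend.xml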
6 Another consideration when configuring your private cloud is the management packs that you will need when using System Center Operations Manager. System Center Operations Manager gives us the ability to monitor the entire environment, not just the physical hardware but also the virtual machines and the applications running on the servers or in the virtual machines. However, for System Center Operations Manager to work it needs management packs, which provide the alerts that allow Operations Manager to detect whether there is a problem. The management pack is installed on the central Operations Manager server, which receives alerts from the servers in the environments that we are monitoring. To do this, we deploy agents on all of the servers and the VMs. Thus when building your private cloud, consider deploying the Operations Manager agents at the same time. This will simplify management of the environment and get the System Center Operations Manager infrastructure up and running much quicker. 7 8 Let us look at the different mechanisms for adding server roles and features. The first and most common is the Server Manager graphical user interface. This utility has Add Roles and Add Features wizards that provide the ability to add entire roles and role services. For example, you could add, or decide not to add, the File Server Resource Manager or DFS Replication. If you are thinking of a broad private cloud deployment, consider using automation so all of the server roles and features can be deployed from a script. You can use either the command line with DISM or OCSetup, or PowerShell with the Add-WindowsFeature cmdlet. 9 Hyper-V Server is a free download that supports host-based virtualization. It is not branded Windows Server because it does not include all of those additional Windows Server features and roles, such as Microsoft Active Directory® directory services or file serving. It is purely a virtualization platform; however, it still supports high availability. Configuring Hyper-V Server is a different operation than traditional server configuration. As it is built on Windows Server core, it has no built-in graphical user interface. If a GUI is required, it is still possible to manage it remotely using tools such as the Remote Server Administration Tools, Hyper-V Manager, Failover Cluster Manager, and System Center Virtual Machine Manager. If you are working with Hyper-V Server directly on the server, you will see a command-line based menu. Using this command-line only interface, it is still relatively easy to configure the Hyper-V server. Using the numerical menu you can easily join domains and workgroups, change the computer name, add or remove administrators, manage remote connections, apply updates, change update settings, and enable Remote Desktop. Further, you can change more granular settings such as the network configuration and the time on the device. Finally, you have the ability to enable or disable the failover clustering feature. Because Hyper-V Server is designed purely as a virtualization platform, the Hyper-V role is automatically enabled. One of the biggest benefits of Hyper-V Server is that it does not include many of the additional features and roles of a standard Windows Server, such as the Windows Internet Explorer® Internet browser. That not only gives Hyper-V Server a smaller installation footprint, but it has a smaller attack surface as well. Because there are fewer components exposed to the public, there are fewer components that are vulnerable, and fewer components which need patching. Fewer patches mean fewer updates, helping to maintain higher availability than most traditional operating systems, including Windows Server core. 
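Referring back to the scripted Server Manager deployment mentioned above, here is a minimal sketch, assuming a full Windows Server 2008 R2 installation (rather than Hyper-V Server) and the common feature names, which you can verify with Get-WindowsFeature, of adding the Hyper-V role and the failover clustering feature from PowerShell:

    # The Server Manager cmdlets live in this module on Windows Server 2008 R2
    Import-Module ServerManager
    # Review which roles and features are available and confirm their exact names
    Get-WindowsFeature
    # Add the Hyper-V role and the Failover Clustering feature; restart if the installation requires it
    Add-WindowsFeature Hyper-V, Failover-Clustering -Restart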
10 Configuration through group policy requires Active Directory, as it allows us to centrally manage and deploy settings across the entire environment. When a server comes into the domain, a group policy update can be forced on it. This group policy update will communicate with the local Active Directory server and pull down the appropriate settings based on that computer's role. For example, the server could be placed in a test organization or a production environment, and based on that placement it will have different security configurations, network policies, or other types of policies applied to it. It is a best practice to use group policy whenever you are deploying a broad private cloud infrastructure because it provides consistency across all of the server components, instead of having to alter the settings on a server-by-server basis. Using group policy ensures that you get consistent, ideal settings, and avoids having settings be missed. 11 There are some configuration considerations for storage as well. With a private cloud, we are abstracting the physical resources from the services and applications. This means we will be using a storage area network to host the storage for this private cloud. 12 There are four different types of storage area networks that are supported; these include Fibre Channel, serial attached SCSI, Fibre Channel over Ethernet, and iSCSI. In this module we are going to focus on iSCSI as the primary solution. Originally, the iSCSI Software Target was only available with Windows Storage Server. However, it has recently been made publicly available as a free download from the Microsoft website. The advantage of iSCSI is that it communicates over Ethernet and allows you to manage everything over your existing network. This avoids configuring a separate storage area network, getting customized HBAs, and building a brand new fabric. However, storage traffic over your traditional networking infrastructure could affect regular network communication. Thus it is a best practice to isolate these networks when you are deploying your fabric, giving you dedicated networks for storage and dedicated networks for networking, even if they are running over the same Ethernet protocols. 13 Internet Small Computer System Interface (iSCSI) has become a viable alternative to Fibre Channel SANs as organizations have deployed networks that support multi-gigabit per second connections. In many cases, iSCSI SANs are cheaper than Fibre Channel based SANs, and they typically do not require the same level of specialized skills to manage the SAN. The iSCSI target is the component on the server containing the storage that receives the connections. To connect to the iSCSI target, you need to use an iSCSI initiator. The iSCSI initiator ships in all versions of Windows Server and can be used with any type of iSCSI target, not just the Microsoft iSCSI target. The iSCSI initiator tells the server which iSCSI target it must connect to. It is recommended to use a dedicated NIC, and when you configure the connection to the target, you have the option to enable automatic reconnections. It is a best practice to select this check box to ensure that when there is interference with your storage or networking path and you lose contact with the iSCSI target, you can immediately reconnect and avoid any long-term interruption of service. 
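On Windows Server 2008 R2 these initiator steps are normally done in the iSCSI Initiator control panel (where adding the connection to the favorite targets list gives you the automatic reconnection behavior); purely as an illustration, on Windows Server 2012 and later the same connection could be scripted with the inbox iSCSI cmdlets, as in this sketch where the portal address and target IQN are placeholders.

    # Register the portal that the iSCSI target server listens on (placeholder address)
    New-IscsiTargetPortal -TargetPortalAddress 192.168.10.50
    # List the targets that the portal exposes
    Get-IscsiTarget
    # Log on to the target; IsPersistent makes the connection reconnect automatically after an interruption
    Connect-IscsiTarget -NodeAddress "iqn.1991-05.com.microsoft:storage01-cluster-target" -IsPersistent $true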
14 When configuring iSCSI there is an operation that has to happen between the iSCSI target and the iSCSI initiator. First you start by creating the virtual disks on the iSCSI target. The iSCSI target can sit on a server which has direct attached storage, and you can use this as your simulated storage area network. However, that can be a single point of failure if that server becomes unavailable. Thus it is recommended to use an iSCSI target with remote storage which is accessible through multiple paths. Once the virtual disks are created on the iSCSI target, regardless of where they are hosted, go to each initiator and request access to the disks by specifying the target's IP address or friendly name. Then, go back to the target and accept the requests from the initiators. It is easiest to do this all at once for multiple initiators: you can write a script, get all of them to request access to the disks at once, then go to the target and accept all of the connections at once as well. Once the target has accepted the connections, you go back to the initiator, and if you click Refresh Configuration, you can see that it is connected and that it can see the target. For each initiator, you have to explicitly log on to the target to enable the automatic reconnections. At this point you have the iSCSI target and the iSCSI initiators speaking to each other; they can communicate and see all of the disks. However, if this is a fresh deployment, do not forget that the disks will be in a raw state and will not be formatted. Thus, before you do anything with the disks, go back into Disk Management on one of the initiator nodes, initialize those disks as a simple disk or spanned disk, for example, and format them. Even though multiple iSCSI initiators can see the disks, you only need to initialize the storage from one of those initiators. As this is shared storage, any operation you do from the first node, such as formatting the first disk, will be seen by all of the other nodes, so you only have to do this once. After all of your disks have been formatted, named, brought online, and initialized, you can then use them for your VMs or for part of your cluster. 
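As a small, hypothetical sketch of that one-time initialization step: on Windows Server 2008 R2 you would typically do this in Disk Management or diskpart, while on Windows Server 2012 and later the inbox Storage cmdlets let you script it as below (the disk number and volume label are placeholders).

    # Find the newly presented raw disks from one node
    Get-Disk | Where-Object { $_.PartitionStyle -eq "RAW" }
    # Bring the disk online, initialize it, and create and format a single partition (only needed from one node)
    Set-Disk -Number 3 -IsOffline $false
    Initialize-Disk -Number 3 -PartitionStyle GPT
    New-Partition -DiskNumber 3 -UseMaximumSize -AssignDriveLetter |
        Format-Volume -FileSystem NTFS -NewFileSystemLabel "ClusterData"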
15 Let's discuss high availability and planning considerations. At this point, we assume that we have our servers deployed, our storage is configured, and we are going to build our cluster. 16 One of the first considerations when planning a cluster is cluster membership. Cluster membership is designed around a concept called a quorum. The idea of cluster membership is to avoid a "split brain," or partition. For example, in a three-node cluster, one of the nodes gets partitioned and cannot speak to the other two nodes in the cluster. Thus we do not know whether the first node should run a VM, or whether the other nodes should run the VM. The challenge is we want to make sure that both subsets of nodes do not run the VM, think that they are the owner, start connecting clients, and write data to the back-end VM VHD, as this can cause corruption. If we have uncontrolled and simultaneous writing to a disk from multiple writers, this could mean that we're writing data over other pieces of data. Thus we want to make sure that we access the disk in a coordinated fashion. Therefore, the way we get around this concept of having a split brain with multiple active partitions in a cluster is that we always enforce one subset of nodes to stay online, and one subset of nodes must always shut down. A simple way to do this is by using a voting algorithm where we say that only the partition that has a majority of nodes stays online. In the scenario where we had a partition of one node, and a partition with two of the three nodes, the partition with two of the three nodes stays online. It serves the VMs and keeps the cluster up and running. Meanwhile that single isolated node is going to go into standby mode where it will try to join the cluster, but it will not start any VMs until it can communicate with the rest of the cluster and understand what it should be hosting. With a quorum you always need more than a 50 percent majority, so to make sure that we never get a partition which has 50 percent on both sides with any type of single networking interruption, we say we always want an odd number of voters contributing towards the quorum. Thus it is recommended to have three, five, or seven nodes in our cluster, so that any time we have a single partition, one of those subsets will always have more than 50 percent of the voters. However, always having an odd number of servers is sometimes not practical. In fact we see about 70 to 80 percent of all cluster deployments still only using two nodes. Over time we are going to see larger clusters as people move towards the private cloud, but even with the current maximum in Windows Server 2008 R2, we have a 16-node cluster. This again leads to the question of how we can have an odd number of nodes if we are bound by hardware constraints. To get around this, we provide the ability to have other components within your infrastructure provide one of these votes as well. Up to now we said that every node in the cluster has one vote, and you're going to see this as the most traditional quorum type, which we call "node majority". Here every node has a vote, and a partition must have more than half of the votes to stay online. Thus as long as we have two of three votes the cluster can stay up and running. However, let's say we only have a two-node cluster. One way that we can get around this scenario is that we can actually provide a vote to a disk, or any cluster disk, which can be accessible by all of the nodes; this is called node and disk majority. This means that if the first node can see this cluster disk as part of its partition, it has two votes: it has the vote from the disk, and it has the vote from the node. The isolated node in the second partition will stay offline because it knows it does not have majority. Now, if that disk or partition changes and the second subset can now see the disk, and the first subset cannot, then that partition now has majority, so it will stay online and the first one will shut down. Extending this concept further, into a private cloud with distributed servers around the world, is node and file share majority. This scenario is similar to node and disk majority, except rather than giving a cluster disk a vote, we actually place a vote in a remote file share. This remote file share just has to be accessible by every node in the cluster, so it doesn't need to be in the same data center. This is going to give you the ability to have distribution worldwide. You might have a few nodes in your first data center; let's say four nodes in data center one. You have four nodes in data center two, and then you can have this final vote, the file share witness, in a third remote location. Under this scenario you can lose your first data center, and so long as the file share witness has a vote and your second data center is running, you still have majority and you stay online. Likewise, you can lose the second data center, but so long as that first data center is online and has access to the file share witness, you have majority. Therefore, the way we know whether the first partition or the second partition should own this vote is that one of the nodes in the cluster will actually have an open file handle to a file inside the file share witness. Remember, only one item at a time can have this open file handle, so one of those nodes in the cluster has access to this file share and has an open file handle there; essentially, that node and the partition which contains that node is going to have that additional vote. The final quorum model that we have is what we call disk only, and in this case we give a cluster disk the one and only vote. However, this is not recommended because it is a single point of failure; if you lose that one disk you've lost all of your votes and then you've lost your entire cluster. Really the only scenario where we see this being used is what we call a last man standing model: so long as any one of the nodes in the cluster can access the disk, the cluster will stay online. So if I have eight nodes in my cluster I can actually lose seven of those nodes and still keep the cluster and its services up and running, so long as that final node has access to that final disk. Nevertheless this might not be practical, because if you're now trying to run eight nodes of capacity on a single node you're probably going to run into some major performance issues. So the best practice here is to use node majority wherever possible, but consider using node and disk or node and file share majority if you have an even number of servers, to get that additional vote. Now once we understand the quorum model and where we are going to place our nodes in the cluster, the next step is to start creating it. As part of the cluster creation process there is a built-in best practice analyzer called 'validate a cluster configuration'. This tool will test every part of the cluster environment. It's going to not only test the nodes to make sure that they have the correct software, the correct patches, and the correct updates, but it's also going to test the network to make sure that you have redundant networking paths between all of the nodes. It's then going to test the storage to make sure it's visible by all your nodes and meets some of the requirements for storage, such as supporting persistent reservations. It's going to inventory all of the different components, and if you've actually deployed the cluster already, it will give you additional tests. Therefore, it's important to note that this is a requirement when you actually deploy the cluster. If you don't have a validated cluster, Microsoft support will not help; they will tell you to go get a validated configuration, and then you have a supported cluster. 
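To tie the quorum and validation discussion to something concrete, here is a minimal PowerShell sketch using the FailoverClusters module that ships with Windows Server 2008 R2; the node names and witness share path are placeholders, not values from this course.

    # Load the failover clustering cmdlets
    Import-Module FailoverClusters
    # Run the full validation suite against the intended nodes (required for a supported configuration)
    Test-Cluster -Node NODE1, NODE2, NODE3, NODE4
    # Example: on an even-node cluster, add a file share witness vote hosted in a third location
    Set-ClusterQuorum -NodeAndFileShareMajority "\\WITNESS01\MVA-Witness"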
Now, when you actually want to know whether you're going to have a supported cluster, it is often best to test the configuration ahead of time, before you try to put anything on it in production. So keep in mind you can run this validation tool before you deploy the cluster, so long as the failover clustering feature has been installed. You can run it when you deploy the cluster, and you can run it while the cluster is in production. It is also a great troubleshooting tool, because it helps the product support team understand exactly what is running in the cluster at that time. 18 In this demo we'll show cluster validation as well as a cluster deployment. To launch failover clustering and deploy your first cluster, we'll go to the administrative tools and launch Failover Cluster Manager. At this point it's assumed that failover clustering has been installed as a feature on every node, or every server, that you want to make part of the cluster. As we launch it you see a traditional MMC 3.0 snap-in, with the navigation pane on the left, the management pane in the center, and an actions pane on the right. First we'll validate our configuration to make sure that everything is suitable for clustering. Here we get to specify all of the servers that will be part of the cluster; in this case I'm going to specify four servers and create a four-node cluster. I've already installed the failover clustering feature on all of these nodes, which will be part of the cluster. Now that I've selected my four nodes, it's going to ask me whether I want to run all of the validation tests or just a subset of tests. To have a fully supported cluster it is required to run the full suite of tests. However, this can take between five minutes and a few hours, depending on the number of nodes in the cluster and the number of storage disks. The reason for this is that every single disk is brought online against every single cluster node to ensure that it functions correctly. Therefore, as you increase the number of nodes and the number of disks, the overall duration will scale up considerably. So, for the sake of time, we're just going to select a few tests. Here we can see that there are tests around the inventory, the network, the storage, and the system configuration. Let's collapse these and only run the system configuration tests. Here we get a confirmation about the tests which will be run, and we can see the tests being executed. We'll check whether all of the drivers are signed and check the memory dump settings. Now, what you're going to see here is that several of the drivers are unsigned, and this is expected, so this is going to throw what's called a warning. There are three different results that you can get from each of these tests: the first result is a green check saying everything works perfectly; the second result is a warning saying the cluster will work, but perhaps there's a best practice that you're not following and you should investigate further. Alternatively, you could get a red X, which says that there is actually a problem with this individual component and you need to fix it to have the cluster supported. While this is running in the background, I'm actually going to open one of the validation reports that I ran on this cluster yesterday, and we can get a quick overview of all of the information that's provided here. We can see we have a few warnings under networking, and we have a few warnings under our system configuration, which is actually validating all the signed drivers. 
So as we look here we see a few drivers aren't signed; it's a best practice to have them signed for security reasons. Let's take a quick look at some of the network warnings that we have in the cluster. It's validating the network communication, and as we scroll down we can quickly see that the warning says we can only access each of these nodes through one network interface. By having only a single network interface you're introducing a single point of failure, and it is a best practice to have multiple network paths to each of your server nodes. So as you can see, the cluster will function correctly, but there are a few best practices I'm not following which have been flagged here and which I would probably want to fix before putting this in production. As we see, the report has completed with a few warnings, and we already know what those warnings are, so let's click finish and now let's move on to actually deploying the cluster. I've run my full suite of validation tests and I'm sure that my cluster will work correctly. I'm going to launch the create a cluster wizard; it's going to ask me for my server names again, so I'm going to specify the same four names of the nodes that I've just tested. And then we'll see just how quick and simple it is to create a failover cluster. If you've done clustering in the past, back in the Windows NT days or even Windows Server 2003, you will know that it was a somewhat complicated process that required following a white paper. Here you're going to see that just a few short configuration steps get the cluster up and running. Now, you'll notice a warning here, and this is because I did not just run the full suite of validation tests, so it reminds me that I should run the full suite of tests before I deploy the cluster. However, I've done this already; I know that everything's looking good, so I'm going to ignore this warning for now. After it's asked me what servers I want, it's going to ask me for the actual name of the cluster that I'm going to use; this will be a friendly name used by the cluster, a name that we can always use to access the cluster regardless of which nodes are up and running. So this network name is actually made highly available and it can fail over between cluster nodes. Therefore, as the cluster administrator, when I want to connect to the running cluster I can simply use this friendly name and I will get connected to one of the active nodes, regardless of which nodes in the cluster are up or down at that time. In this case we'll call this our disaster recovery cluster and we'll name it MVA. Now we'll quickly validate that this object does not already exist in Active Directory, and we get a confirmation screen which contains all the information we want. Because I'm using DHCP in my environment, we detect this automatically and give the cluster a DHCP address, so there's no worrying about this additional IP address management. Because we're in a private cloud, we want everything to be dynamically provisioned automatically to reduce this management overhead. So we've extended this model into the cluster, and any time you create a cluster, a clustered group, or a clustered workload, if DHCP is available we will automatically get one of those IP addresses for you. We'll confirm, and we'll see the cluster created. In addition, we can see that in just a minute we have now created a highly available cluster. We now get an additional report that we can view, which keeps track of all of the information about what happened in the cluster. Now, one of the nice things about failover clustering is that any time you do something on a cluster, a report is created and it's automatically stored on every node in the cluster. This adds a lot of flexibility for administration and compliance, because rather than having the administrator always produce this inventory and these reports, all of the information is automatically saved. Now we've created our cluster, and we can see from the left navigation pane that we now have a cluster named MVA available. As we navigate here we see a services and applications node; this is where all of our VMs will be hosted. As we expand the nodes, we see the four nodes in our cluster. As we look at the storage, we can see all of the clustered storage; this was automatically added to the cluster when the cluster was created, because we saw that all the storage was available to every node in the cluster, hence making it a logical candidate for our clustered shared storage. As we look at the networks we see two different cluster networks, and cluster events will provide us with some information about what has actually been happening in the cluster. As you can see, creating a highly available cluster for your private cloud fabric is relatively simple, and a little later we'll jump to a Hyper-V cluster and take a look at what we need to do to deploy highly available VMs. But before that, we'll jump back to the presentation and discuss more of the planning and deployment concepts for your private cloud. Now that we have our cluster up and running, let's start to think about some of the workloads that we will deploy on the cluster and understand how they will be viewed and accessed in the cluster. 19 Any time you deploy a highly available workload, it gets placed in a cluster group. A cluster group is a single logical unit of failover, so when a group needs to move from one node to another, for planned maintenance or because of a crash, everything in that group will move over. As we look at virtual machines, the components inside the group are the virtual machine itself, the virtual machine configuration file, and access to the cluster shared volume disk. This model assumes we're using cluster shared volumes, our distributed file system for failover clustering, which only supports Hyper-V. So if you're deploying this private cloud fabric for other workloads such as a file server or DHCP, and you're using traditional cluster storage, you will have a cluster disk that's bound within that group. Thus when the group moves from one node to another node, that disk will actually be dismounted from the first node and remounted onto the second node. The other major difference with non-virtual-machine groups is that there's also a network name associated with them. This is an easy way to access those workloads. Under each network name are multiple IP addresses, and these can be IPv4 or IPv6 addresses. Additionally, in this group we have a highly available workload such as a print spooler or a file server. The concept of these groups is the same: things can move from one node to another node, and everything that's contained within that group moves between the nodes. However, a virtual machine's friendly name is included in the virtual machine itself, so we don't have this additional network name requirement, and the IP addresses can be configured inside the virtual machine configuration file. 
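For reference, a rough PowerShell equivalent of the cluster creation shown in the demo, plus a quick look at the resulting cluster groups, might look like the sketch below; the cluster name mirrors the demo, while the node names are placeholders.

    Import-Module FailoverClusters
    # Create the cluster from the validated nodes; if DHCP is available, an IP address is obtained automatically
    New-Cluster -Name MVA -Node NODE1, NODE2, NODE3, NODE4
    # List the clustered groups (services, applications, and virtual machines), their owner nodes, and their state
    Get-ClusterGroup -Cluster MVA | Format-Table Name, OwnerNode, State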
20 As we deploy cluster shared volumes (CSV), keep in mind that CSV is a critical component in enabling live migration for Hyper-V. Once the cluster is up and running, you'll see an option to enable cluster shared volumes on your cluster. This step is required because there is an end user license agreement which must be accepted before using cluster shared volumes. This agreement simply says that cluster shared volumes can only be used for Hyper-V, and not for anything else. The reason for this is that cluster shared volumes were designed and optimized for the Hyper-V workload. Using this logic, we can figure out which traffic should be routed through the node that handles changes to the file system, versus which type of traffic can be sent directly to the disk. Because we're assuming this is a Hyper-V workload, all of the algorithms are based around distributing the traffic into these two categories. Thus, cluster shared volumes will not function correctly for non-Hyper-V workloads. After you have accepted the end user license agreement, and assuming that you've deployed your SAN fabric and added the disks to the cluster, you can enable CSV and add disks to it. Adding a disk to cluster shared volumes is simple: you select the disk, add it to the cluster shared volumes, and it will be available and accessible by every node in your cluster. CSV and live migration are complementary technologies: you can have a live migration without cluster shared volumes, and you can use cluster shared volumes and never use live migration. However, the benefit of CSV is that it reduces the amount of time it takes to do a failover and to move components from one node to another. If we use a traditional clustered disk and we have a failover, we have to dismount the disk from the first node and then remount it on the second node. This operation can take an extended amount of time, and so any clients trying to connect during that time most likely will be lost. However, using cluster shared volumes, every node can simultaneously access the CSV disk. This means that when reconnecting a client from one node to another, such as during a live migration, we don't need to dismount the disk. We can simply update the routing so that instead of sending that client to our first node, it is sent to the second node; there's no additional operation of dismounting and remounting a disk. This means that from the client's perspective, they can move from one server to another with no downtime. They simply get connected to a different back end. However, if we're trying to dismount and remount a disk as part of this process, there will be some longer downtime and the client will be disconnected. 
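As a minimal sketch of that step, assuming the CSV end user license agreement has already been accepted and that the disk shown (a placeholder name) is already clustered in Available Storage, the FailoverClusters module can add it to cluster shared volumes like this:

    Import-Module FailoverClusters
    # Add a clustered disk from Available Storage to cluster shared volumes
    Add-ClusterSharedVolume -Name "Cluster Disk 2"
    # Every node can now reach the volume under the common C:\ClusterStorage path
    Get-ClusterSharedVolume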
21 Let's talk further about networking and connections. 22 When deploying a cluster in our private cloud environment, we need to consider the firewall rules for connection security purposes. For most of the components we've discussed, the firewall rules will be configured automatically. When the Hyper-V role is installed, all of the ports that are needed get enabled. When failover clustering is installed, the appropriate ports get enabled as well, so it's relatively simple to manage. However, because the actual ports being used by each service are well known, there is a security best practice to change them in your organization. This makes it harder for attackers to know which port they should be trying to hit. For example, instead of always using port 80 or port 443 as your default, consider using port 90 or port 453 to mix things up. The one exception with firewall rules is that any time you have a clustered file server, and you need to manage file shares remotely from other servers, you should enable the remote volume management firewall setting on every server and on every node. This is a requirement if you're trying to use a clustered file server, or to make Virtual Machine Manager highly available. Remember, the Virtual Machine Manager library is simply a clustered file share. Thus if you're making it highly available, you're making a clustered file share, and hence you need to enable the remote volume management firewall setting. 23 We want to design cluster networks so that they are separated by function, and within each function you can have multiple redundant networks. For example, one type of function that we like to isolate is cluster traffic; we call these cluster networks. This includes communication such as health checks between the nodes, updates to the cluster database or quorum model, other types of internal communication, and cluster shared volume traffic. Within this category of cluster networks, it's a best practice to have at least two networks for redundancy. Thus, if one of these cluster networks is unavailable, all of the communication can fall back to a second cluster network. With public networks we allow communication with our clients or with applications. The cluster will be hosting a service or a VM, yet we still need end users to connect to that cluster, and then connect to that VM or to that service. Thus, we want to have dedicated networks for the public, for efficiency but also for security. An example would be to protect against any type of denial of service attack, where a client is flooding this public network; we don't want that to have any impact on our back-end networks or other infrastructure. Additionally, a best practice is to separate the storage networks. Any time you're using a storage protocol over an Ethernet network, such as with iSCSI or Fibre Channel over Ethernet, having a separate network for this function is a best practice to help isolate the traffic. A reason to separate and isolate these networks is that we don't want the use of one network to affect how the other networks behave. For example, if we have a lot of storage traffic being sent over a network, we don't want that to flood the network and prevent health checks from going through. If a health check is unable to go through between nodes, it could potentially trigger a false failover because one cluster node thinks another cluster node is down. By isolating these networks by function, we achieve higher availability by ensuring that none of them affects the others. 
24 As we extend the model to think about virtualization, it is a best practice to have one or more dedicated networks for Hyper-V. The first network is for Hyper-V management. This essentially provides an isolated network for the host administrator to do virtual machine management tasks. An example of this is deploying a new virtual machine that requires copying an ISO file from a VMM library. The file copy will be large, but by providing a dedicated network to do the file copy, we ensure faster performance, and we ensure that the new VM can be provisioned as quickly as possible. Additionally, a dedicated network for live migration traffic is recommended, as live migration involves pushing a lot of memory from one server to another as quickly as possible. Every time a live migration happens, we flood the network, and we don't want this affecting other cluster functionality by causing us to miss heartbeats and trigger a false failover. Having these networks isolated will ensure they're used correctly for each function. When we configure these networks on the cluster, there are settings that you can change. Some settings can be changed from the cluster network properties, and there is a network for live migration setting which can be designated on the VM. Beyond that, you can get more detailed granularity by using a feature called network prioritization. While most of these roles can be configured using the GUI, if you really want to go to a granular level you can give each cluster network a metric value from one to 65,000, and based on the numerical value of that network, it will be assigned a different function. The network with the lowest value will be used for internal cluster communication, the second lowest for live migration traffic, and the network with the very highest value will be used for public traffic. By default we are going to assign a high value to any network that has a default gateway, meaning that if we find a cluster network that has access to the outside world, we assume that it is going to be used for your public communication. Likewise, when we see networks that do not have a default gateway, we assume that these will be used for internal cluster communication and we give them a lower value. The order in which they initially get assigned is based on the order in which the cluster sees these networks when the cluster is first started up. Additionally, you have the ability to change these settings using the network prioritization feature. 
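The prioritization described above can also be inspected and adjusted from PowerShell with the FailoverClusters module; in this minimal sketch the network name is a placeholder and the metric value is only an example.

    Import-Module FailoverClusters
    # List the cluster networks along with their automatically assigned metrics
    Get-ClusterNetwork | Format-Table Name, Role, AutoMetric, Metric
    # Give a dedicated network the lowest metric so it is preferred for internal cluster and CSV traffic
    (Get-ClusterNetwork "Cluster Network 1").Metric = 900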
25 26 You can create either a private network or an internal network to isolate the virtual machines from the physical network. Virtual networks are different from the networks we assign for the host. Up to now we've deployed the private cloud infrastructure, got the cluster up and running, and set up the networking between the hosts. Now we want to extend these networks from networks that go host-to-host to networks within a host. When we are deploying lots of VMs on a particular host, we need to manage how all of those VMs interact with the networking components on that host. With virtual machines there are three primary types of virtual networks, which can be configured using the Virtual Network Manager. The most common is the external network, which is where a VM can actually communicate with the rest of the enterprise, with end users, and with customers. Basically, with the external network the VM can speak to other components through the physical NIC. In addition, each physical NIC on a host can back only one virtual network, but you can have many VMs on a single virtual network. The second type is called an internal network. This is where the VMs can communicate with other VMs on that same host, and with the host itself. This is useful when you want to isolate a domain, or isolate an environment onto one particular host. The third type is a private network. In this case, the VMs can only talk to each other, and not to the host. This is most often used when testing in a secure environment, when there is pre-released code, or to isolate what these VMs are doing from the rest of the world for security or compliance reasons. 27 As we look at some of the additional virtual network management utilities, we also need to think about configuring MAC addresses. A MAC address is a unique identifier for a machine, but the whole approach to MAC addressing has changed with virtualization. Physical machines were bound to a single MAC address, yet virtual machines can be created at any time and can be assigned any MAC address. While these used to be globally unique, with the world of virtualization quickly provisioning and de-provisioning VMs, different MAC address management techniques had to be adopted. MAC addresses for virtual machines are managed through a pool of addresses assigned on a particular host, and every time a VM is brought up, it is automatically assigned a dynamic MAC address from that pool. This pool gives us the ability to manage what every VM on that particular host will use. However, if you start to use multiple hosts, you need to consider that if you move a VM from your first host over to your second host, you could end up with a MAC address conflict, because there is no multi-server MAC address management on the hosts themselves. However, System Center Virtual Machine Manager has global MAC address management. This will keep the MAC addresses across the entire environment separated and isolated. 28 Next, we need to address using virtual local area network (VLAN) tags. A VLAN gives us the ability to expand and virtualize any type of logical network, to spread it out across a group of machines or even across multiple data centers, by essentially abstracting the physical networking requirements to the virtual layer. We have a lot of flexibility with how we allow machines or different servers to communicate within a virtualized environment. Using VLAN tags, we have the ability to assign a property to a particular VLAN and give it a unique number within the environment. Using this, we can say that certain virtual machines are only able to function on particular VLANs, so if I have a VM that uses the VLAN tag of eight, it is only able to function on my VLAN number eight. Beyond the ability to isolate the host, this gives us additional functionality and flexibility to designate unique networks not just for particular VMs, but also for host management or VM management, as well as for VMs which are allowed to connect to external networks versus VMs that are only allowed to connect to internal networks. Configuring this has to be done not only on the actual network adaptor, but also on the virtual machines. 29 A primary usage for VLANs is security. We've already discussed isolating the host and the VM networks, but also consider using a dedicated network adaptor for host management. With an external network type, VMs can communicate not just with each other, but also with the outside world. We also need to think about whether the management operating system should be able to use that networking adaptor. Generally, we want to separate usage: we want the Hyper-V host administrator to have a dedicated network adaptor and a dedicated network. However, sometimes this is not possible. Sometimes a certain blade chassis might only have two or four NICs, limiting the number of network connections that you have. In that case, a setting on the virtual network allows the management operating system to share the network adaptor, and you can toggle whether or not host management uses the same network as your virtual machines. 
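On Windows Server 2008 R2 these virtual networks are created through the Virtual Network Manager GUI or through System Center Virtual Machine Manager; purely as an illustration of the three network types and of the management operating system sharing toggle, the sketch below uses the Hyper-V module cmdlets available on Windows Server 2012 and later, with placeholder names.

    # External switch bound to a physical NIC; here the management OS does not share the adaptor
    New-VMSwitch -Name "External-VMs" -NetAdapterName "Ethernet 2" -AllowManagementOS $false
    # Internal switch: VMs can talk to each other and to the host, but not to the physical network
    New-VMSwitch -Name "Internal-Test" -SwitchType Internal
    # Private switch: VMs can only talk to each other, not to the host
    New-VMSwitch -Name "Private-Isolated" -SwitchType Private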
30 Now that we've got our network and fabric deployed, we will talk about clustered VMs. At this point we've deployed our operating system, set up the networking, configured the storage, and created our cluster, and now we want to put virtual machines on the cluster. When you deploy a clustered virtual machine you can do it straight from Failover Cluster Manager. There is an integrated experience into which most of the wizards and management functionality from Hyper-V Manager has been pulled. Deploying a new, highly available VM is as simple as completing the traditional new VM wizard, where you specify a VM name, location, memory, networking, and virtual hard disk. 31 Now I've switched over to another cluster, and as you can see this cluster has several virtual machines deployed on it as well as several other workloads; in fact, nearly every other inbox workload is running on this cluster. Now, this is just for demo purposes, as I am not following a best practice here. The best practice is to have dedicated Hyper-V clusters and then clusters for everything else. But as you can see, I've mixed file server, DHCP, print, and a whole lot more with my virtual machines, simply because this is a demo environment. The reason why it's a best practice to separate your Hyper-V hosts from other types of clusters is because of the memory requirements VMs place on the host. Remember, you're now running lots of servers on top of a single server when you're using virtualization, so you want to limit what else that host cluster is doing. In addition, by removing other types of services, roles, or activities from that host, such as a file server cluster workload, you're reducing the workload on that host, so it's going to be more responsive in how it handles the virtual machines that it's running. Nevertheless, let's actually take a look at a Hyper-V virtual machine deployment from a cluster. Now, as we navigate the storage here, you'll see that I have several disks which are listed as cluster shared volume disks. This is because I've enabled cluster shared volumes and added disks to CSV. If I look at my CSV disk node, I see these same four disks. Now, one of the things that you'll notice with how CSV is deployed is that the file path is different. You can see that it's listed as C:\ClusterStorage plus a volume number, and this differs from more traditional cluster storage, which uses a drive letter. But remember, cluster shared volumes allow data to be accessed from every node in the cluster, not just a single node, and the way this technology works is that a reparse point is placed at this file location under C:\ClusterStorage. What this means is that node one can always access the disk by going to C:\ClusterStorage, and so can node two, node three, all the way to node 16. By having this consistent file path to the cluster shared volume disk, every VM always knows how to access it. So we've reduced the complexity of managing all of these different drive letters, which could differ as a disk moves between different nodes. Because we're no longer dependent on drive letters, we have the ability to have a lot more CSV disks than just the 24 traditional cluster disks. 
Now, as I browse to my computer and open C:\ClusterStorage, I can see these same four disks as volumes 2, 3, 4, and 5. I've already done some work ahead of time and deployed a VHD in one of these; you can see my Windows XP Pro VHD file, and when I create a new VM, I'm going to point to this VHD file which is already sitting on my cluster disk. Now let's take a look at creating one of these highly available virtual machines. I select to configure a new service or application, and from here you see the traditional new virtual machine option that you would see in Hyper-V Manager. As I launch it, I get the same new virtual machine wizard that you would traditionally see in Hyper-V Manager. It's going to ask me to specify a name, so I'm going to call this VM "demo for MVA", and then it's going to ask me where I actually want to store this virtual machine. Remember, I'm on a cluster, and I need to store all the application data on shared storage, so I'm going to select one of my CSV disks. It asks me the amount of memory I want to assign, and whether I want to connect it to any virtual networks. Then it asks me if I want to create a VHD, use an existing one, or attach one later. In this case I'm going to browse to the disk on volume 5 that I already have, select that, confirm, click next, and then I get my confirmation page. What you've seen up to this point is still the traditional new virtual machine wizard, but once I click finish you're going to see another wizard launch. You're not only going to see the virtual machine created, but you're also going to see the high availability wizard, and we are now creating a new highly available virtual machine. Through this integration we have not just created a VM; we immediately added it to the cluster and made it highly available. Again, we can see a report of everything we did if we wish to later, or for now we'll just click finish. And we can see that this new VM is now available in my cluster. By default, all of the VMs get placed in a stopped state. The reason why we don't bring them immediately online is because there is usually configuration that needs to be done, but through Failover Cluster Manager I can still connect to my VM if I want, and I can change all of the settings just as I could through Hyper-V Manager. I can change the storage, the memory, the networking, the integration services, the snapshot location; everything I want is available through Failover Cluster Manager. Once that's configured I can start the VM, connect to it if I need to, and I now have my highly available virtual machine. Now, one caveat here: any time you manage a virtual machine that's on a cluster, make sure you do it through Failover Cluster Manager. The reason for this is that Hyper-V Manager is not cluster aware. If you manage a clustered VM through Hyper-V Manager, the cluster will not detect those changes and they will not be replicated to the rest of the cluster. But if you do it through Failover Cluster Manager or through System Center Virtual Machine Manager, which are cluster aware, those changes will get pushed to every node in the cluster. With that, we've deployed our highly available virtual machine. 
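If you prefer scripting the same steps, a minimal sketch using the FailoverClusters module on Windows Server 2008 R2 might look like the following; it assumes a VM named "demo for MVA" already exists on one of the cluster nodes, and the node name is a placeholder.

    Import-Module FailoverClusters
    # Make an existing virtual machine highly available (the equivalent of the wizard shown in the demo)
    Add-ClusterVirtualMachineRole -VMName "demo for MVA"
    # Start the clustered VM (the group name here is assumed to match the VM name)
    Start-ClusterGroup "demo for MVA"
    # Later, move the running VM to another node with live migration
    Move-ClusterVirtualMachineRole -Name "demo for MVA" -Node NODE2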
32 As we wrap up the configuration portion of the private cloud infrastructure, you should now have a better understanding of the best practices for deploying and configuring your private cloud environment. There are many tools that you can use, from Windows Deployment Services and the Microsoft Deployment Toolkit, to iSCSI targets, to the failover clustering validation wizard. Always keep these best practices in mind and get things right when you first set up the fabric, rather than having to adjust it later once the resources are already in production. As a final tip, don't forget that any time you're managing your highly available virtual machines, make sure that you do it through Failover Cluster Manager or System Center Virtual Machine Manager to ensure that the changes are understood and reflected across all nodes in the cluster. We hope that you will join the third part of this series, where we are going to talk about the management of the private cloud infrastructure and look at some of the more advanced techniques and best practices to keep the infrastructure that you've just deployed up and running. 33 This video is part of the Microsoft Virtual Academy. © 2011 Microsoft Corporation. All rights reserved. Microsoft, Active Directory, BitLocker, BizTalk, Excel, Forefront, Hyper-V, Internet Explorer, Lync, Microsoft Dynamics, PerformancePoint, SQL Server, Visual Studio, Windows, Windows Server, and Windows Vista are registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. 34