Private Cloud Infrastructure Module 2 - Configuration

This video is part of the Microsoft® Virtual Academy.
1
In this module, we are going to learn best practices for configuring private cloud infrastructures. The
first module of this course covered private cloud planning and deployment considerations. This
module focuses on configuration, requirements, and design guides for deploying networking,
storage, and a highly available fabric. The third module in this course will look at management
practices for the private cloud.
2
Let us discuss the key configuration requirements of servers, software, storage, networking, and high
availability. This builds the foundation and the fabric for a virtualization platform.
These components are designed to ensure high availability and virtualization in the data center. The
best practices we discussed apply to an integrated and distributed private cloud model, and typically
would not work in a purely isolated, standalone host type of environment.
3
First, let’s take a look at some of the key considerations when deploying your server.
4
The first tool that we recommend using is Windows® Deployment Services. It is built into the Windows Server® SKUs and provides remote operating system installation that can be fully automated and customized using unattended installation scripts and files.
This gives us the ability to deploy at enterprise scale. Essentially, the process is to build an ideal or golden image that we want to use in our environment, clone that image, sysprep it as necessary, and make sure that it has unique components such as names and domain membership. Following this, we can install it.
Once the installation process has started, the installer can look at the unattend file. This is an XML file containing a list of mappings for certain key components of that particular server, such as the name of that machine, which domain it should be a part of, user names, passwords, administrators, features, programs that should be added, and server roles such as Hyper-V®.
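For illustration only, a minimal fragment of such an unattend file might look like the following sketch. The element names follow the Windows unattend schema, but the values shown (computer name, domain, credentials) are placeholders, and the exact set of components depends on the image being deployed.

    <!-- Illustrative sketch of an unattend.xml fragment; all values are placeholders -->
    <unattend xmlns="urn:schemas-microsoft-com:unattend">
      <settings pass="specialize">
        <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS">
          <ComputerName>HV-NODE-01</ComputerName>
        </component>
        <component name="Microsoft-Windows-UnattendedJoin" processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS">
          <Identification>
            <JoinDomain>contoso.local</JoinDomain>
            <Credentials>
              <Domain>contoso.local</Domain>
              <Username>DeployAdmin</Username>
              <Password>PlaceholderPassword</Password>
            </Credentials>
          </Identification>
        </component>
      </settings>
    </unattend>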
5
The Microsoft Deployment Toolkit (MDT) helps execute fully automated zero touch installation
deployments using Microsoft System Center Configuration Manager 2007 Service Pack 2, and
Windows deployment tools. For those without a System Center Configuration Manager 2007
infrastructure, MDT uses Windows deployment tools for light touch installation deployments.
Windows Deployment Services can be extended by using one of these solution accelerators, which are free downloads from Microsoft. The Microsoft Deployment Toolkit 2010 provides even greater flexibility and management options and can be added on top of Windows Deployment Services. We can now deploy operating systems, drivers, applications, and Windows updates.
This gives us a broad deployment baseline rather than just deploying the operating system; we get to
deploy additional components within the operating system as well. Using this technology will help
reduce the overall deployment time needed. The Microsoft Deployment Toolkit supports Windows
Server 2008 R2 and Windows Server 2003. On the client side, it also supports Windows 7, Windows
Vista®, and Windows XP. Furthermore, the deployment toolkit will deploy other Microsoft software
such as Microsoft Office. Thus if you need to deploy Microsoft Word and Microsoft Excel®
spreadsheet software to client machines, the deployment toolkit can do this as well.
6
Another consideration when configuring your private cloud is the management packs that you will
need when using System Center Operations Manager. System Center Operations Manager gives us
the ability to monitor the entire environment, not just the physical hardware but also virtual
machines and applications running on the servers or running in the virtual machines.
However, for System Center Operations Manager to work it needs management packs, which define the alerts Operations Manager uses to detect whether there is a problem.
The management packs are installed on the central Operations Manager server, which receives alerts from the servers in the environment that we are monitoring. To do this, we deploy agents on all of the servers and the VMs.
Thus when building your private cloud, consider deploying the Operations Manager agents at the same time. This will simplify management of the environment and get the System Center Operations Manager infrastructure up and running much more quickly.
7
8
Let us look at the different mechanisms for adding Server Manager features and roles.
The first and most common is the Server Manager graphical user interface. This utility has Add Roles and Add Features wizards that provide the ability to add entire roles and their role services. For example, you could add, or decide not to add, File Server Resource Manager or DFS Replication.
If you are planning a broad private cloud deployment, consider using automation so all of the Server Manager roles and features can be deployed from a script. You can use either the command line, with DISM or OCSetup, or PowerShell, with the Add-WindowsFeature cmdlet.
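As a hedged sketch (the role and feature names below are examples for a virtualization host, not a prescribed set), scripting this with the ServerManager module might look like the following:

    # Sketch only: adjust the feature list to the roles your design calls for
    Import-Module ServerManager

    # Add the Hyper-V role and the Failover Clustering feature from a script
    Add-WindowsFeature Hyper-V, Failover-Clustering -Restart

    # A roughly equivalent command-line alternative:
    #   dism /online /enable-feature /featurename:Microsoft-Hyper-V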
9
Hyper-V Server is a free download that supports host-based virtualization. This is actually not
branded Windows Server because it does not include all of those additional Windows Server
features and roles such as Microsoft Active Directory® directory services or file serving. It is purely a
virtualization platform; however, it still supports high availability.
Configuring Hyper-V Server is a different operation than traditional server configuration. As it is built on Windows Server core, it has no built-in graphical user interface. If a GUI is required, it is still possible to manage it remotely using tools such as Remote Server Administration Tools, Hyper-V Manager, Failover Cluster Manager, and System Center Virtual Machine Manager. If you are working with Hyper-V Server directly on the server, you will see a command-line based menu.
Using a command-line only interface, it is still relatively easy to configure the Hyper-V server. Using
the numerical system you can easily join domains and workgroups, change the computer name, add
or remove administrators, manage the remote connections, apply updates, change update settings,
and enable remote desktop. Further, you can change more granular settings such as network
configuration and the time on the device. Finally you have the ability to enable or disable the failover
clustering feature.
Because the Hyper-V server is designed purely as a virtualization platform, the Hyper-V role is
automatically enabled. One of the biggest benefits of Hyper-V Server is that it does not include many
of the additional features and roles of a standard Windows Server, such as the Windows Internet
Explorer® Internet browser. That not only gives Hyper-V Server a smaller installation footprint, but it
has a smaller attack surface as well. Because there are fewer components exposed to the public,
there are fewer components that are vulnerable, and fewer components which need patching. With
fewer patches that means there will be fewer updates, helping to maintain higher availability than
most traditional operating systems, including Windows Server core.
10
Configuration through group policy requires Active Directory, as it allows us to centrally manage and
deploy settings across the entire environment. When a server comes into the domain, a group policy
update can be forced on it. This group policy update will communicate to the local Active Directory
server, and pull down the appropriate settings based on what that computer’s role is. For example,
the server could be placed in a test organization or a production environment, and based on that
placement it will have different security configurations, network policies, or other types of policies applied
to it.
It is a best practice to use group policy whenever you are deploying a broad private cloud infrastructure because it provides consistency across all of the server components, instead of having to alter the settings on a server-by-server basis. Using group policy ensures that you get consistent, ideal settings, and avoids anything being missed.
11
There are some configuration considerations for storage as well. When using a private cloud, we are
abstracting the physical resources from the services and applications. This means we will be using a
storage area network to host the storage for this private cloud.
12
There are four different types of storage area networks that are supported: Fibre Channel, serial attached SCSI, Fibre Channel over Ethernet, and iSCSI. In this module we are going to focus on iSCSI as the primary solution. Originally, the iSCSI Software Target was only available with Windows Storage Server. However, it has recently been made publicly available as a free download from the Microsoft website.
The advantage to iSCSI is that it communicates over Ethernet, and allows you to manage everything
over your existing network. This avoids configuration of a storage area network, getting customized
HBAs, and building a brand new fabric.
However, storage traffic over your traditional networking infrastructure could affect regular network
communication. Thus it is a best practice to isolate these networks when you are deploying your
fabric, giving you dedicated networks for storage and dedicated networks for networking, even if
they are running over the same Ethernet protocols.
13
Internet Small Computer System Interface (iSCSI) has become a viable alternative to Fibre Channel
SANs as organizations have deployed networks that support multi-gigabit per second connections.
In many cases, iSCSI SANs are cheaper than Fibre Channel based SANs, and they typically do not
require the same level of specialized skills to manage the SAN.
The iSCSI target is the component which receives the information on the server that contains the
storage. To connect to the iSCSI target, you need to use an iSCSI initiator. The iSCSI initiator ships on
all versions of Windows Server and can be used with any type of iSCSI target, not just the Microsoft
iSCSI target.
The iSCSI initiator tells the server to connect to an iSCSI target. It is recommended to use a dedicated NIC, and when you configure the connection to the target, you have the option to enable automatic reconnection. It is a best practice to select this check box so that if there is an interruption in your storage or networking path and you lose contact with the iSCSI target, the initiator immediately reconnects and avoids any long-term interruption of service.
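As a sketch of what this looks like when scripted, the cmdlets below come from the iSCSI initiator PowerShell module included with newer Windows Server releases; the portal address and target IQN are placeholders.

    # Sketch: point this node's initiator at a target portal and log on persistently
    New-IscsiTargetPortal -TargetPortalAddress "192.168.10.50"

    # Discover the targets exposed by that portal
    Get-IscsiTarget

    # Log on to a discovered target; -IsPersistent enables automatic reconnection
    Connect-IscsiTarget -NodeAddress "iqn.1991-05.com.microsoft:storage-target01" -IsPersistent $true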
14
When configuring iSCSI, there is a sequence of operations between the iSCSI target and the iSCSI initiator. First, you start by creating the virtual disk on the iSCSI target. The iSCSI target can sit on a
server, which has direct attached storage, and you can use this as your simulated storage area
network. However, that can be a single point of failure if that server becomes unavailable. Thus it is
recommended to use an iSCSI target and remote storage, which is accessible through multiple paths.
Once the virtual disks are created on the iSCSI target, regardless of where they are hosted, go to the
initiator and request access to the disk by specifying the target's IP address or friendly name. Then,
go back to the target and accept the requests from the initiators. It is best to do this all at once for
multiple initiators. You can write a script, get all of them to request access to the disk at once, then
go to the target and accept all of the connections at once as well. Once the target has accepted the
connections, you go back to the initiator, and if you click Refresh Configuration, you can see that it is
connected and that it can see the target.
For each initiator, you have to explicitly log onto the target to enable the automatic reconnections.
At this point you have the iSCSI target and the iSCSI initiators speaking to each other, they can
communicate and see all of the other disks.
However, if this is a fresh deployment, do not forget that the disks will be in a raw state and they won't be formatted. Thus, before you do anything with the disks, go back into Disk Management on one of the initiators, change those disks to a simple or spanned volume, for example, and format them.
Even though multiple iSCSI initiators can see a disk, you only need to initialize the storage from one of those initiators. As this is shared storage, any operation you do from the first node, such as formatting the first disk, will be seen by all of the other nodes, so you only have to do this once.
After all of your disks have been formatted, named, brought online, and initialized, you can then use
them for your VMs or for part of your cluster.
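On newer Windows Server releases that include the Storage PowerShell module, that one-time initialization from a single node might be scripted along these lines (the disk selection and volume label are placeholders):

    # Sketch: bring the first raw shared disk online, initialize it, and format it once
    $disk = Get-Disk | Where-Object PartitionStyle -Eq 'RAW' | Select-Object -First 1

    Set-Disk -Number $disk.Number -IsOffline $false
    Set-Disk -Number $disk.Number -IsReadOnly $false
    Initialize-Disk -Number $disk.Number -PartitionStyle MBR
    New-Partition -DiskNumber $disk.Number -UseMaximumSize -AssignDriveLetter |
        Format-Volume -FileSystem NTFS -NewFileSystemLabel "ClusterDisk1" -Confirm:$false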
15
Let’s discuss high availability and planning considerations. At this point, we assume that we have our
servers deployed, our storage is configured, and we are going to build our cluster.
16
One of the first considerations when planning a cluster is cluster membership. The way cluster membership is designed is a concept called a quorum. The idea of cluster membership is to avoid a "split brain" or partition. For example, in a three-node cluster, one of the nodes gets partitioned and cannot speak to the other two nodes in the cluster. Thus we do not know if the first node should run a VM, or whether the other nodes should run a VM.

The challenge is we want to make sure that both subsets of nodes do not run a VM, think that they are the owner, start connecting clients, and write data to a backend VM VHD, as this can cause corruption. If we have uncontrolled and simultaneous writing to a disk from multiple writers, this could mean that we're writing data over other pieces of data. Thus we want to make sure that we access the disk in a coordinated fashion. Therefore, the way we get around this concept of having a split brain with multiple active partitions in a cluster is that we always enforce one subset of nodes to stay online, and one subset of nodes must always shut down.

A simple way to do this is by using a voting algorithm where we say that only the partition that has a majority of nodes stays online. In this scenario where we had a partition of one node, and a partition with two of the three other nodes, that partition with two of the three other nodes stays online. It serves the VMs and keeps the cluster up and running. Meanwhile, that single isolated node is going to go into standby mode, where it will try to join the cluster, but it will not start any VMs until it can communicate with the rest of the cluster and understand what it should be hosting.

With a quorum you always need more than a 50 percent majority, so to make sure that we never get a partition which has 50 percent on both sides with any type of single networking interruption, we say we always want an odd number of voters that contribute towards the quorum. Thus it is recommended to have three, five, or seven nodes in our cluster, so that any time we have a single partition, one of those subsets will always have more than 50 percent of the voters.

However, always having an odd number of servers is sometimes not practical. In fact, we see about 70 to 80 percent of all cluster deployments still only using two nodes. Over time we are going to see larger clusters as people move towards the private cloud, but even with the current maximum in Windows Server 2008 R2, we have a 16-node cluster. This again leads to the question of how we can have an odd number of nodes if we are bound by hardware constraints.

To get around this, we provide the ability to have other components within your infrastructure provide one of these votes as well. Up to now we said that every node in the cluster has one vote, and you're going to see this as the most traditional quorum type, which we call "node majority". Here every node has a vote, and a partition must have more than half of the votes to stay online. Thus as long as we have two of three votes, the cluster can stay up and running.

However, let's say we only have a two-node cluster. One way that we can get around this scenario is that we can actually provide a vote to a disk, or any cluster disk, which can be accessible by all of the nodes; this is what we call node and disk majority. This means that if the first node can see this cluster disk as part of its partition, it has two votes. It has the vote from the disk, and it has the vote from the node. The isolated node in the second partition will stay offline because it knows it does not have majority. Now, if that disk or partition changes and the second subset can now see the disk, and the first subset cannot, then that partition now has majority; it will stay online and the first one will shut down.

Extending this concept further into a private cloud with distributed servers around the world is node and file share majority. In this scenario it's similar to the node and disk majority, except rather than giving a cluster disk a vote, we actually place a vote in a remote file share. This remote file share just has to be accessible by every node in the cluster, so it doesn't need to be in a data center.

This is going to give you the ability to have distribution worldwide. You might have a few nodes in your first data center; let's say four nodes in data center one. You have four nodes in data center two, and then you can have this final vote, this file share witness, in a third remote location. Under this scenario you can lose your first data center, and so long as the file share witness has a vote and your second data center is running, you still have majority and you stay online. Likewise, you can lose the second data center, but so long as that first data center is online and has access to the file share witness, you have majority.

Therefore, the way we know whether one partition or the second partition should own this vote is that one of the nodes in the cluster will actually have an open file handle to the file which is inside the file share witness. Remember, only one item at a time can have this open file handle, so one of those nodes in the cluster has access to this file share and holds an open file handle on it, and essentially that node, and the partition which contains that one node, is going to have that additional vote.

The final quorum model that we have is what we call disk only, and in this case we give a cluster disk the one and only vote. However, this is not recommended because it is a single point of failure; if you lose that one disk you've lost all of your votes, and then you've lost your entire cluster. Really the only scenario where we see this being used is what we call a last man standing model with the cluster, and what this means is, so long as any one of the nodes in the cluster can access the disk, the cluster will stay online. So if I have eight nodes in my cluster, I can actually lose seven of those nodes and still keep the cluster and its services up and running, so long as that final node has access to that final disk. Nevertheless, this might not be practical, because if you're now trying to run eight nodes of capacity on a single node, you're probably going to run into some major performance issues. So the best practice here is to use node majority wherever possible, but consider using node and disk or node and file share majority if you have an even number of servers, to get that additional extra vote.
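As a brief sketch with the Failover Clustering PowerShell module (the witness disk name and file share path are placeholders; choose the one model that fits your node count), the quorum model can also be viewed and changed from a script:

    # Sketch: inspect and set the cluster quorum model
    Import-Module FailoverClusters

    Get-ClusterQuorum                                          # show the current model and witness

    Set-ClusterQuorum -NodeMajority                            # odd number of nodes
    Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 1"    # even nodes, disk witness
    Set-ClusterQuorum -NodeAndFileShareMajority "\\Witness01\ClusterWitness"   # file share witness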
Now once we understand the quorum model and where we are going to place our nodes in the cluster, the next step is to start creating it. As part of the cluster creation process there is a built-in best practice analyzer called Validate a Configuration. This tool will test every part of the cluster environment. It's not only going to test the nodes to make sure that they have the correct software, patches, and updates, but it's also going to test the network to make sure that you have redundant networking paths and access between all of the nodes.
It's then going to test the storage, make sure it's visible to all of your nodes, and make sure it meets the storage requirements, such as supporting persistent reservations. It's going to inventory all of the different components, and if you've actually deployed the cluster already, it will give you additional tests. It's important to note that this is a requirement when you actually deploy the cluster. If you don't have a validated cluster, Microsoft support will not help; they will tell you to go get a validated configuration, and then you have a supported cluster.
Now when you actually want to know whether you're going to have a supported cluster, sometimes it's best to test the configuration ahead of time, before you try to put anything on it in production. So keep in mind you can run this validation tool before you deploy the cluster, as long as the failover clustering feature has been installed. You can run it when you deploy the cluster, and you can run it while the cluster is in production. It's also a great troubleshooting tool, because it helps the product support team understand exactly what is running in the cluster at that time.
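As a hedged sketch with the Failover Clustering PowerShell module (node and cluster names are placeholders), validation and cluster creation can also be scripted:

    # Sketch: run the full validation suite, then create the cluster
    Import-Module FailoverClusters

    # Runs the same tests as the Validate a Configuration wizard and saves a report
    Test-Cluster -Node "Node1", "Node2", "Node3", "Node4"

    # Create the cluster with its friendly name; with DHCP available the cluster
    # IP address is assigned automatically, otherwise add -StaticAddress
    New-Cluster -Name "MVA" -Node "Node1", "Node2", "Node3", "Node4"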
18
In this demo we'll show cluster validation as well as a cluster deployment. To launch failover clustering and deploy your first cluster, we'll go to Administrative Tools and launch Failover Cluster Manager. At this point it's assumed that failover clustering has been installed as a feature on every node or every server that you want to make part of the cluster. As we launch it you see a traditional MMC 3.0 snap-in, with the navigation pane on the left, the management pane in the center,
and an actions pane on the right. First we’ll validate our configuration to make sure
that everything is suitable for clustering. Here we get to specify all of the servers
that will be part of the cluster, in this case I’m going to specify four servers and
create a four-node cluster. I’ve already installed the failover clustering feature on all
of these nodes, which will be part of the cluster. Now that I’ve selected my four
nodes it’s going to ask me whether I want to run all of the validation tests or just a
subset of tests. To have a fully supported cluster it is required to run the full suite of
tests. However, this can take between five minutes and a few hours, depending on
the number of nodes in the cluster and the number of storage disks. The reason for
this is that every single disk is brought online against every single cluster node to
ensure that it functions correctly. Therefore, as you increase the number of nodes
and the number of disks, the overall duration will scale up exponentially. Therefore,
for the sake of time we’re just going to select a few tests.
Here we get to see that there are tests around inventory, the network, the storage,
and the system configuration. Let’s compress these and only run the system
configuration tests. Here we get a confirmation about the tests which will be run
and we can see the tests are being executed. We’ll check whether all of the drivers
are assigned and check the memory dump settings. Now what you’re going to see
here is that several of the drivers are unsigned and this is expected, and so this is
going to throw what’s called a warning. There are three different results that you
can get from each of these tests; the first result is a check saying everything works
perfectly, the second result is a warning saying the cluster will work but perhaps
there’s a best practice that you’re not following and you should investigate further.
Alternatively, you could get a red X which says that there is actually a problem with
this individual component and you need to fix it to have the cluster supported.
While this is running in the background, I’m actually going to open one of the
validation tests that I ran on this cluster yesterday and we can get a quick overview
of all of the information that’s provided here. We can see we have a few warnings
under networking and we have a few warnings under our system configuration,
which is actually validating all the signed drivers. So as we look here we see a few
drivers aren’t signed, it’s a best practice to have them signed for security reasons.
Let’s take a quick look at some of the network warnings that we have in the cluster.
It’s validating the network communication, as we scroll down we could quickly see
that the warning says we can only access each of these nodes by one network
interface. By having only a single network interface you’re introducing a single point
of failure and it is a best practice to have multiple network paths to each of your
server nodes. So as you can see the cluster will function correctly but there are a
few best practices I’m not following which have been flagged here and I would
probably want to fix before putting this in production.
As we see the report has been completed but you have a few warnings, and we
already know what those warnings are, so let’s click finish and now let’s move on to
actually deploying the cluster.
I've run my full suite of validation tests and I'm sure that my cluster will work correctly. I'm going to launch the Create a Cluster wizard; it's going to ask me for my server names again, so I'm going to specify the same four names of the nodes that I've just tested. Then we'll see just how quick and simple it is to create a failover cluster. If you've done clustering in the past, back in the Windows NT days or even Windows Server 2003, you will know that it was a somewhat complicated process that came with a white paper. Here you're going to see that just a few short configuration steps get the cluster up and running. Now, you'll notice a warning here, and this is because I did not just run the full suite of validation tests, so it reminds me that I should run the full suite of tests before I deploy the cluster. However, I've done this already; I know that everything's looking good, so I'm going to ignore this warning for now. After it's asked me what servers I want, it's going to ask me for the actual name of the cluster, which will be a friendly name used by the cluster, a name that we can always use to access the cluster regardless of which nodes are up and running. This network name is actually made highly available and it can fail over between cluster nodes. Therefore, as the cluster administrator, when I want to connect to the cluster I can simply use this friendly name and I will get connected to one of the active nodes, regardless of which nodes in the cluster are up or down at that time. In this case this is our disaster recovery cluster, and we'll call it MVA.

Now we'll quickly validate that this object does not already exist in Active Directory, and we get a confirmation screen, which contains all the information we want. Because I'm using DHCP in my environment, we detect this automatically and give it a DHCP address, so there's no worrying about this additional IP address management. Because we're in a private cloud, we want everything to be dynamically provisioned automatically to reduce this management overhead. So we've extended this model into the cluster, and any time you create a cluster or a clustered group, a clustered workload, if DHCP is available we will automatically get one of those IP addresses for you. We'll confirm and we'll see the cluster created. In addition, we can see that in just a minute we have now created a highly available cluster.

We now get an additional report, which keeps track of all of this information about what happened in the cluster. Now, one of the nice things about failover clustering is that any time you do something on a cluster, a report is created and it's automatically stored on every node in the cluster. This adds a lot of flexibility with administration and compliance, because rather than having the administrator always do this inventory and these reports, all of the information is automatically saved.

Now we've created our cluster and we can see from the left navigation pane that we now have a cluster, MVA, available. As we navigate here we see a Services and Applications node; this is where all of our VMs will be hosted. As we expand the nodes we see the four nodes in our cluster. As we look at the storage we can see all of the clustered storage, and this was automatically added to the cluster when the cluster was created, because we saw that all the storage was available to every node in the cluster, hence making it a logical candidate for our clustered shared storage. As we look at the networks we see two different cluster networks, and cluster events will provide us with some information about what has actually been running in the cluster.

As you can see, creating a highly available cluster for your private cloud fabric is relatively simple, and a little later we'll jump to a Hyper-V cluster and take a look at what we need to do to deploy highly available VMs. But before that we'll jump back to the presentation and discuss more of the planning and deployment concepts for your private cloud.

Now that we have our cluster up and running, let's start to think about some of the workloads that we will deploy on the cluster and understand how they will be viewed and accessed in the cluster.
19
Any time you deploy a highly available workload, it gets placed in a cluster group. A cluster group is a
single logical unit of failover, so when a group needs to move from one node to another, for planned
maintenance or because of a crash, everything in that group will move over. As we look at virtual
machines, the components inside them are the virtual machine itself, the virtual machine
configuration file, and access to the cluster shared volume disk.
This model assumes we're using cluster shared volumes, the distributed-access storage for failover clustering, which only supports Hyper-V. So if you're deploying this private cloud fabric for other workloads such as a file server or DHCP, and you're using traditional cluster storage, you will have a cluster disk
that’s bound within that group. Thus when the group moves from one node to another node, that
disk will actually be dismounted from the first node and remounted onto the second node.
The other major difference with non-virtual-machine groups is that there is also a network name associated with them. This is an easy way to access those workloads. Under each network name are multiple IP addresses, and these could be IPv4 or IPv6 addresses. Additionally, such a group contains the highly available workload itself, such as a print spooler or a file server.
The concept of these groups is the same for virtual machines; a group can move from one node to another node, and everything contained within that group moves between the nodes. However, a virtual machine's friendly name is part of the virtual machine itself, so we don't have this additional network name requirement, and the IP addresses can be configured inside the virtual machine.
20
Cluster shared volumes (CSV) are a critical component that helps enable Hyper-V live migration. Once the cluster is up and running, you'll see an option to enable cluster shared volumes on your cluster. Enabling CSV requires accepting an end user license agreement, which simply says that cluster shared volumes can only be used for Hyper-V, and not for anything else. The reason for this is that cluster shared volumes were designed and optimized for the Hyper-V workload.
Using this logic, we can figure out how traffic should be routed through a node which handles changes to the
file system, versus what type of traffic can be sent directly to a disk. Because we’re assuming this is a Hyper-V
workload, all of the algorithms are based around distributing the traffic into these two categories. Thus, the
cluster shared volumes will not function correctly for non-Hyper-V workloads.
After you have accepted the end user license agreement, assuming that you’ve deployed your SAN fabric and
you’ve added the disk to the cluster, you can enable CSV and add disks to the CSV. Adding a disk to a cluster
shared volume is simple: you select the disk, add it to the cluster shared volume, and it will be available and
accessible by every node in your cluster.
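A short sketch of the same steps with the Failover Clustering PowerShell module (the disk resource name is a placeholder):

    # Sketch: add an available cluster disk to Cluster Shared Volumes
    Import-Module FailoverClusters

    Add-ClusterSharedVolume -Name "Cluster Disk 2"

    # Confirm the CSV and its C:\ClusterStorage\VolumeN mount path
    Get-ClusterSharedVolume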
CSV and live migration are complementary technologies. You can have a live migration without cluster-shared
volumes, and you can use cluster shared volumes and never use live migration. However, the benefit of CSV is that it reduces the amount of time it takes to do a failover and to move components from one node to
another. If we use a traditional clustering disk and we have a failover, we have to dismount the disk from the
first node and then remount it to the second node. This operation can take an extended amount of time and
so any clients trying to connect during that time most likely will be lost. However, using cluster shared
volumes every node can simultaneously access the CSV disk. This means that when reconnecting a client from
one node to another, such as during a live migration, we don’t need to dismount the disk. We can simply
update the routing table on the closest router. Instead of sending that client to our first node, it is sent to the
second node and there’s no additional operation of dismounting and remounting a disk.
This means from the clients’ perspective, they can move from one server to another with no downtime. They
simply get connected to a different back end. However, if we’re trying to dismount and remount a disk as part
of this process, there will be some longer downtime and this client will be disconnected.
21
Let’s talk further about the networking and connections.
22
When deploying a cluster in our private cloud environment, we need to consider the firewall rules
for connection security purposes. For most of the components we’ve discussed, the firewall rules
will be configured automatically. When the Hyper-V role is installed, all of the ports that are needed
get enabled. When failover clustering is installed, the appropriate ports get enabled as well, so it’s
relatively simple to manage it. However, because the actual ports which are being used by each
service are well known, there is a security best practice to change them in your organization. This
makes it harder for attackers to know which port they should be trying to hit.
For example, instead of always using port 80 or port 443 as your default, consider using port 90 or
port 453 to mix things up. The one exception with firewall rules is anytime you have a clustered file
server, and you need to manage file shares remotely from other servers, you should enable the
remote volume management firewall setting on every server and on every node. This would be a
requirement if you’re trying to use a clustered file server, or to make Virtual Machine Manager highly
available. Remember the Virtual Machine Manager library is simply a clustered file share. Thus if
you’re making it highly available, you’re making a clustered file share, and hence you need to enable
the remote volume management firewall setting.
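As an illustrative sketch (assuming a release that includes the NetSecurity PowerShell module; the rule group name is as it appears in Windows Firewall with Advanced Security), run on every node:

    # Sketch: enable the Remote Volume Management rule group so clustered file
    # shares and disks can be managed remotely from other servers
    Enable-NetFirewallRule -DisplayGroup "Remote Volume Management"

    # Verify the rules in that group are now enabled
    Get-NetFirewallRule -DisplayGroup "Remote Volume Management" | Format-Table DisplayName, Enabled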
23
We want to design cluster networks so that they are separated by function. If you have networks
that are separated by function, within that function you can have multiple redundant networks. For
example, one type of function that we like to isolate is for cluster traffic. We call these cluster
networks. This is going to include communication such as health checks between the nodes, updates to the cluster database or quorum model, and other types of communication. Cluster shared volume traffic also falls within this category of cluster networks.
Within this category, it's a best practice to have at least two networks for redundancy. Thus, if one of these cluster networks is unavailable, all of the communication can fall back to a second cluster network.
With public networks we allow communication with our clients or with applications. The cluster will
be hosting a service or a VM, yet we still need end users to connect to that cluster, and then connect
to that VM or to that service. Thus, we want to have dedicated networks for the public for efficiency
but also for security. An example would be to protect against any type of denial of service attacks,
where a client is flooding this public network, and we don’t want to have any other impact on our
back end network or other infrastructure.
Additionally, a best practice is to separate the storage networks. Anytime you’re using a storage
protocol over an Ethernet network, such as iSCSI or Fibre Channel over Ethernet, having
separate functionality for this network is a best practice to help isolate the traffic. A reason to
separate and isolate these networks is that we don’t want the functionality, or the use of one
network to affect how the other networks behave.
For example, if we have a lot of storage traffic being sent over a network, we don’t want that to flood
the network and prevent health checks from going through. If a health check is unable to go through
between nodes, it could potentially trigger a false failover because one cluster node thinks another
cluster node is down. By isolating these networks by function, we have the ability to have higher
availability by ensuring that none of them affect the other.
24
As we extend the model to think about virtualization, it is a best practice to have one or more
dedicated networks for Hyper-V. The first network is for Hyper-V management. This is essentially
providing an isolated network for the host administrator to do virtual machine management tasks.
An example of this is deploying a new virtual machine that requires copying an ISO file from a VMM
library. The file copy will be large, but by providing this dedicated network to do the file copy, we
ensure faster performance, and we ensure that that new VM can be provisioned as quickly as
possible.
Additionally, a dedicated network for live migration traffic is recommended as this involves pushing a
lot of memory from one server to another as quickly as possible. Every time a live migration
happens, we flood that network, and we don't want this to affect other cluster functionality by causing us to miss heartbeats and triggering a false failover. Having these networks isolated will
ensure they’re used correctly for each function.
When we configure these networks on the cluster, there are settings that you can change. You have settings that can be changed from the cluster network properties, and you have a network for live migration setting which is designated on the VM. Beyond that, you can get more detailed granularity by using a feature called network prioritization. While most of these roles can be configured using
the GUI, if you really want to go to a granular level you can give each cluster network a value from
one to 65,000, and based on the numerical value of that network, it can be assigned a different
function. The lowest network that you assign will be used for internal and cluster communication,
the second lowest for live migration traffic, and the very highest network that you can assign will be
used for the public traffic.
By default we are going to assign the public traffic network a high number if we find any network
that has a default gateway. Meaning if we find a cluster network that has access to the outside
world, we assume that that’s going to be used for your public communication. Likewise, when we
see networks that do not have a default gateway, we assume that these will be used for internal
cluster communication and we give them a lower value. The actual order that they initially get
assigned is based on the order the cluster sees these networks when the cluster is first started up.
Additionally, you have the ability to change the settings using the network prioritization feature.
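A brief sketch of inspecting and overriding these values with the Failover Clustering PowerShell module (the network name and metric value are placeholders):

    # Sketch: view the automatically assigned metrics, then override one so that
    # network is preferred for internal cluster and CSV traffic
    Import-Module FailoverClusters

    Get-ClusterNetwork | Format-Table Name, AutoMetric, Metric, Role

    # Lower metric = internal cluster traffic; higher metric = public traffic
    (Get-ClusterNetwork "Cluster Network 1").Metric = 900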
25
26
You can create either a private network or an internal network to isolate the virtual machines from
the physical network. Virtual networks are different than the networks we assign for the host. Up to
now we’ve deployed the private cloud infrastructure, got the cluster up and running, and the
network set up between the hosts. Now we want to extend these networks from networks that go
host-to-host, to networks within a host. When we are deploying lots of VMs on a particular host, we
need to manage how all of those VMs interact with the networking components on that host.
With virtual machines there are three primary types of virtual networks which can be configured
using the virtual network manager. The most common is external networks, and this is where a VM
can actually communicate with the rest of the enterprise, with end users, and with customers.
Basically what happens with the external network is the VM can speak to other components through
the physical NIC. In addition, each physical NIC on a host can access one virtual network, but you can
have many VMs on a single virtual network.
The second type is called an internal network. This is where we have the ability for the VMs to
communicate with other VMs on that same host, and with the host itself. This is useful when you
want to isolate a domain, or isolate an environment onto one particular host.
The third type is a private network. In this case, the VMs can only talk to each other, and not to the
host. This is most often used when testing in a secure environment, when there is pre-released code,
or to isolate what these VMs are doing from the rest of the world for security or compliance reasons.
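On versions of Hyper-V that include the Hyper-V PowerShell module, the three virtual network types might be created along these lines (the switch and adapter names are placeholders):

    # Sketch: create one virtual network (virtual switch) of each type
    New-VMSwitch -Name "External-VMs"  -NetAdapterName "Ethernet 2"   # bound to a physical NIC
    New-VMSwitch -Name "Internal-Lab"  -SwitchType Internal           # VMs and the host only
    New-VMSwitch -Name "Private-Test"  -SwitchType Private            # VMs only, not the host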
27
As we look at some of the additional virtual network management utilities, we also need to think
about configuring the MAC addresses. A MAC address is a unique identifier for a network adapter, but the whole concept of MAC addressing has changed with virtualization. This is because physical
machines were bound by a single MAC address, yet virtual machines can be created at any time and
can be assigned any type of MAC address. While these used to be globally unique, with the world of
virtualization quickly provisioning and de-provisioning VMs, different MAC address management
techniques had to be adopted.
MAC addresses are managed with virtual machines through a pool of addresses assigned on a
particular host, and every time a VM is brought up, it is automatically assigned a dynamic MAC
address. These pools give us the ability to control which MAC addresses the VMs on that particular host will use. However, if you start to use multiple hosts, you need to consider that if you move a VM from your first host over to your second host you could have a MAC address conflict, because the pools only prevent conflicts within a particular host; there is no multi-server MAC address management. However, System Center Virtual Machine Manager has
global MAC address management. This will keep the MAC addresses across the entire environment
separated and isolated.
28
Next, we need to address using virtual local area network (VLAN) tags. A VLAN gives us the
ability to expand and virtualize any type of logical network to spread it out across a group of
machines, or even spread it out across multiple data centers, by essentially abstracting the
physical networking requirements to the virtual layer. We have a lot of flexibility with how we
allow machines or different servers to communicate within a virtualized environment.
Using VLAN tags we have the ability to assign a property to a particular VLAN and give it a
unique number within the environment. Using this we can say that certain virtual machines are
only able to function on particular VLANs, so if I have a VM that uses the VLAN tag of eight it is only
able to function on my VLAN number eight. Beyond the ability to isolate the host, this gives us
some additional functionality and additional flexibility to designate unique networks not just for
particular VMs, but to have unique networks for host management or for VM management. As
well, for VMs which are allowed to connect to external networks, or just VMs that are allowed
to connect to internal networks. So configuring this not only has to be done on the actual
network adaptor, but also on the virtual machines.
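On versions of Hyper-V with the Hyper-V PowerShell module, applying the VLAN tag described above might look like this sketch (the VM name and VLAN ID are placeholders):

    # Sketch: tag a VM's network adaptor so it communicates only on VLAN 8
    Set-VMNetworkAdapterVlan -VMName "TestVM01" -Access -VlanId 8

    # Confirm the VLAN configuration for that VM's adaptors
    Get-VMNetworkAdapterVlan -VMName "TestVM01"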
29
A primary usage for VLANs is for security. We’ve already discussed isolating the host and the VM
network, but also consider using a dedicated network adaptor for host management. With an external network type, VMs can communicate not just with each other but also with the outside world, so we also need to think about whether the management operating system should be able to use that networking adaptor. Generally, we want to separate usage. We want to say the
Hyper-V host administrator has a dedicated network adaptor and a dedicated network. However,
sometimes this is not possible. Sometimes a certain blade chassis might only have two or four NICs,
limiting the number of network connections that you have. Using the setting that allows the management operating system to share a network adaptor, you can toggle whether or not you want host management using the same network as your virtual machines.
30
Now that we’ve got our network and fabric deployed, we will talk about clustered VMs. At this point
we’ve deployed our operating system, we’ve set up the networking, we configured the storage, we
created our cluster, but now we want to put virtual machines on the cluster.
When you deploy a clustered virtual machine you can do it straight from Failover Cluster Manager. There is an integrated experience that pulls most of the wizards and management functionality from Hyper-V Manager into Failover Cluster Manager. Deploying a new, highly available VM is as simple as
completing the traditional new VM wizard where you specify a VM name, location, memory,
networking, and virtual hard disk.
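With the Failover Clustering PowerShell module, making an existing VM whose files already sit on cluster storage highly available can be scripted along these lines (the VM name is a placeholder, and the parameter name varies slightly between releases):

    # Sketch: add an existing VM to the cluster as a highly available role
    Import-Module FailoverClusters

    Add-ClusterVirtualMachineRole -VMName "VMdemoMVA"

    # The VM now appears under Services and Applications and can fail over
    Get-ClusterGroup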
31
Now I’ve switched over to another cluster, and as you can see this
cluster has several virtual machines deployed on it as well as
several other workloads; in fact, every other inbox workload that's available on the cluster. Now this is just for demo purposes
as I am not following a best practice here. The best practice is to
have dedicated Hyper-V clusters and then clusters for everything
else. But as you can see I’ve mixed file server, DHCP, print, and a
whole lot more with my virtual machines simply because this is a
demo environment. The reason why it’s a best practice to
separate your Hyper-V hosts from other types of clusters is
because of the memory requirements VMs place on the host.
Remember you’re now running lots of servers on top of a single
server when you’re using virtualization so you want to limit what
else that host cluster is doing. In addition, by removing other
types of services, roles, or activities that that host is doing such as
having a file server cluster workload with it, you’re reducing the
workload on that host so it’s going to be more responsive with
how it handles the virtual machines that it’s running.
Nevertheless, let’s actually take a look at a Hyper-V virtual
machine deployment from a cluster. Now as we actually navigate
the storage here you’ll see that I actually have several disks which
are listed as cluster shared volume disks. This is because I’ve
enabled cluster shared volumes and I’ve added disks to CSV. If I
look at my CSV disk node, I see these same four disks. Now one of
the things that you’ll notice with how CSV is deployed is that my
file path is different. You can see that it's listed as c:\clusterstorage plus a volume number, and this differs from the more traditional cluster storage, which uses a drive letter. But remember, cluster shared volumes allow data to be accessed from every node in the cluster, not just a single node, and the way this technology works is that a reparse point is placed in this file location at c:\clusterstorage. What this means is that node
one can always access the disk by going to c:\clusterstorage, so
can node two, node three, all the way to node 16. By having this
consistent file path to the cluster shared volume disk, every VM
always knows how to access it. So we’ve reduced the complexity
of managing all of these different drive letters which could be
different as a disk is between different nodes. Because we’re no
longer dependent on the drive letters, we have the ability to have
a lot more CSV disks than just the 24 traditional cluster disks.
Now as I actually browse to My Computer and c:\clusterstorage, I can see these same four volumes: 2, 3, 4, and 5. I've already done this
ahead of time and I’ve actually gone and deployed a VHD in one
of these. You can see my Windows XP probe VHD file and when I
created a new VM, I’m going to point to this VHD file which is
already sitting on my cluster disk.
Now let’s take a look at creating one of these highly available
virtual machines. I select New Service or Application and from here,
you see the traditional VHD or new virtual machine option that
you would from Hyper-V Manager. In addition, as I launch this I
actually get the new virtual machine wizard that you would
traditionally see from Hyper-V Manager. Therefore, it’s going to
ask me to specify a name so I’m going to call this VM demo for
MVA and then it’s going to ask me where do I actually want to
store this virtual machine. Remember, I’m on a cluster, I need to
store all the application data in shared storage so I’m going to
select one of my CSV disks.
It asks me the amount of memory I want to assign, and if I want to
connect it to any virtual networks. Then it asks me if I want to
create a VHD, use an existing one, or attach one later. In this case
I’m going to browse to the disk on volume 5 that I already have,
select that, confirm, click next and then I get my confirmation
page. Therefore, what you’ve seen up to this point is still the
traditional new virtual machine wizard but once I click finish
you're going to see another wizard launch.
You’re not only going to see the virtual machine created but now
you’re going to see the new highly available wizard and we are
now creating a new highly available virtual machine. Therefore,
you can see through this integration that we not just created a
VM, but we immediately added it to the cluster and made it
highly available. Again, we can see a report of everything we did if
we wish to later, or for now we’ll just click finish. And we can see
that this new VM is now available in my cluster. By default all of
the VMs get placed in a stopped state. The reason why we don’t
bring them immediately online is because there is usually
configuration that needs to be done, but through Failover Cluster
Manager I can still connect my VM if I wanted and I can change all
of the settings just like through Hyper-V Manager.
Therefore, I can change the storage, the memory, the networking,
the integration services, the snapshot location, everything I want
is through Failover Cluster Manager. Once that’s configured I can
start the VM, I can connect to it if I need, and I now have my
highly available virtual machine. Now, one caveat here is any time
you actually manage a virtual machine that’s on a cluster, make
sure you do it through Failover Cluster Manager. The reason for
this is that Hyper-V Manager is not cluster aware. Therefore, if you manage a clustered VM through Hyper-V Manager, the cluster will not detect the change and will not replicate those changes to the
rest of the cluster. But if you do it through Failover Cluster
Manager or through System Center Virtual Machine Manager it is
cluster aware and so those changes will get pushed to every node
in the cluster.
With that, we’ve deployed our highly available virtual machine.
32
As we wrap up the configuration portion of the private cloud infrastructure, we believe that you
should have a better understanding of best practices to deploy and configure your private cloud
environment. There are many tools that you can use, from Windows Deployment Services and the Microsoft Deployment Toolkit, to iSCSI targets, to the failover cluster validation wizard.
Always keep the best practices in mind for deploying things and getting them right when you first set
up the fabric, rather than having to adjust it later once the resources are already in production.
As the final tip, don’t forget any time you’re managing your highly available virtual machines make
sure that you do that through Failover Cluster Manager or System Center Virtual Machine Manager
to ensure that the changes are understood and reflected across all nodes in the cluster.
We hope that you will join the third part of this series where we are going to talk about the
management of the private cloud infrastructure, and look at some of the more advanced techniques
and best practices to keep the infrastructure that you’ve just deployed up and running.
33
This video is part of the Microsoft Virtual Academy.
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Active Directory, BitLocker,
BizTalk, Excel, Forefront, Hyper-V, Internet Explorer, Lync, Microsoft Dynamics,
PerformancePoint, SQL Server, Visual Studio, Windows, Windows Server, and Windows Vista
are registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of
Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to
changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE
INFORMATION IN THIS PRESENTATION.
34