Private Cloud - Architecture - addon slides

Additional Info (some are still draft)
Tech notes that you may find useful as input to the design.
A lot more material can be found at the Design Workshop
Internal Cloud: Gartner model and VMware model
Gartner take:
• Virtual infrastructure
• On-demand, elastic, automated/dynamic
• Improves agility and business continuity
Management layers (from the model diagram, top to bottom): life cycle management; configuration and change management; performance management; virtual infrastructure management; virtual infrastructure; physical infrastructure.
Components (from the model diagram): orchestrator; capacity management; external cloud connector; chargeback system; enterprise service management; identity and access management; service catalog; service governor/infrastructure authority; self-service provisioning portal; master/slave concept.
Cluster: Settings
For the 3 sample sizes, here are my personal recommendations:
• DRS fully automated. Sensitivity: Moderate.
• Use anti-affinity or affinity rules only when needed.
• They are more things for you to remember.
• They give DRS less room to maneuver.
• DPM enabled. Choose hosts that support DPM.
• Do not rely on WOL; use iLO or IPMI instead.
• VM Monitoring enabled.
• VM monitoring sensitivity: Medium.
• HA will restart the VM if the heartbeat between the host and the VM has not been received within a 60-second interval.
• EVC enabled. Enables you to add newer hosts in the future.
• Prevent VMs from being powered on if they violate availability constraints → better availability.
• Host isolation response: Shut down VM.
• See http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
• Compared with "Leave VM powered on", this prevents data/transaction integrity risk. The risk is rather low, as the VM itself holds a lock.
• Compared with "Power off VM", this allows a graceful shutdown. Some applications need to run a consistency check after a sudden power-off.
DRS, DPM, EVC
In our 3 sizes, here are the settings:
• DRS: Fully Automated.
• DRS sensitivity: leave it at the default (middle; 3-star migrations).
• EVC: turn on.
• It does not reduce performance.
• It is a simple CPU feature mask.
• DPM: turn on, unless the HW vendor shows otherwise.
• VM affinity: use sparingly. It adds complexity, as we are using group affinity.
• Group affinity: use (as per the diagram in the design).
Why turn on DPM
• Power cost is a real concern.
• Singapore example: S$0.24 per kWh x (600 W + 600 W) x 24 hours x 365 days x 3 years / 1000 ≈ S$7,570 (see the sketch below).
• That is quite close to the price of buying 1 server.
• For every 1 W of power consumed, we need a minimum of 1 W of power for aircon + UPS + lighting.
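A minimal Python sketch of that arithmetic (the tariff, wattage, overhead and 3-year term are the assumed inputs from above):

    # Rough 3-year power cost for one server, assuming 1 W of
    # aircon/UPS/lighting overhead per 1 W the server draws.
    tariff = 0.24                  # S$ per kWh (assumed Singapore tariff)
    total_watts = 600 + 600        # server draw + facility overhead
    hours = 24 * 365 * 3           # three years of continuous operation
    cost = tariff * total_watts / 1000 * hours
    print(f"3-year power cost: S${cost:,.0f}")   # ~S$7,570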
VMware VMmark
Use VMmark as the basis for CPU selection only, not entire box selection.
• It is the official benchmark for VMware, and it uses multiple workloads.
• Other benchmarks are not run on vSphere, and typically test 1 workload.
• VMmark does not include TCO. Consider the entire cost when choosing a HW platform.
Use it as a guide only
• Your environment is not the same.
• You need headroom and HA.
How it's done
• VMmark 2.0 uses 1 - 4 vCPU per VM.
• Workloads: MS Exchange, MySQL, Apache, J2EE, file server, plus an idle VM.
Result page:
• VMmark 2.0 results are not comparable with 1.x results.
• www.vmware.com/products/vmmark/results.html
This slide needs update
VMware VMmark
VMmark: sample benchmark result (HP only)
I’m only showing results from 1 vendor, as vendor comparison is more than just a VMmark result.
IBM, Dell, HP, Fujitsu, Cisco, Oracle and NEC all have VMmark results.
Notes on reading the result (from the slide's callouts):
• 20 tiles = 100 active VMs.
• Compare the score only against results with the same number of tiles. ±10% is OK for real-life sizing; this is a benchmark.
• Sample CPUs shown: Opteron 8439 (24 cores), Xeon 5570 (8 cores), Opteron 2435 (12 cores), Xeon 5470 (8 cores).
• This tells us that the Xeon 5500 can run 17 tiles at 100% utilisation.
• Each tile has 6 VMs, but 1 is idle. 17 x 5 VMs = 85 active VMs in 1 box.
• At 80% peak utilisation, that's roughly 65-68 VMs.
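That tile-to-VM arithmetic as a small Python sketch (tile count, VMs per tile and the 80% target are taken from the callouts above):

    # Translate a VMmark tile count into a rough VM-per-host estimate.
    tiles = 17                # tiles sustained at ~100% utilisation
    active_per_tile = 5       # each tile has 6 VMs, 1 of them idle
    peak_target = 0.80        # size for 80% peak, not 100%
    vms = tiles * active_per_tile * peak_target
    print(f"~{vms:.0f} active VMs per host")   # ~68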
Fault Tolerance
Workload types and application specifics:
• Databases: the most popular workloads on FT. Small to medium instances, mostly SQL Server.
• MS Exchange and messaging: BES, Exchange. A gaming company has 750 mailboxes on 1 FT VM. See the FT load test at blogs.vmware.com.
• Web and file servers: a file server might be stateless, but an application using it may be sensitive to denial of service and may be very costly to lose. A simulation relying on a file server might have to be restarted if the file server fails.
• Manufacturing and custom applications: these workloads keep production lines moving; breaks result in loss of productivity and material. Examples: propeller factory, meat factory, pharma line.
• SAP: SAP ECC 6.0 system based on the SAP NetWeaver 7.0 platform. ASCS, a message and transaction locking service, is a SPOF.
• BlackBerry: BlackBerry Enterprise Server 4.1.6 (BES). A 1 vCPU BES can support 200 users at 100-200 emails/day.
MS Clustering
ESX Port Group properties
• Notify Switches = No
• Forged Transmits = Accept
Windows 2008 clustering does not support NFS.
Storage Design
• Virtual SCSI adapter
• LSI Logic Parallel for Windows Server 2003
• LSI Logic SAS for Windows Server 2008
ESXi changes
• ESXi 5.0 uses a different technique to determine whether RDM LUNs are used for MSCS cluster devices: a configuration flag marks each device participating in an MSCS cluster as "perennially reserved".
Why Notify Switches = No: unicast NLB mode reassigns the station (MAC) address of the network adapter for which it is enabled, and all cluster hosts are assigned the same MAC address. You therefore cannot have ESX send ARP or RARP to update the physical switch port with the actual MAC address of the NICs, as this breaks unicast NLB communication.
Symantec ApplicationHA
Can install the agent on multiple VMs simultaneously.
Additional roles for security.
It does not cover Oracle yet.
Presales contact for ASEAN: Vic
VMware HA and DRS
Read Duncan’s Yellow Bricks blog first.
• Done? Read it again. This time, try to internalise it. See the speaker notes below for an example.
vSphere 4.1
• Primary nodes
• Primary nodes hold cluster settings and all "node states", which are synchronized between primaries. Node states hold, for instance, resource usage information. If vCenter is not available, the primary nodes will have a rough estimate of the resource occupation and can take this into account when a fail-over needs to occur.
• Primary nodes send heartbeats to primary nodes and secondary nodes.
• HA needs at least 1 primary because the "fail-over coordinator" role is assigned to a primary; this role is also described as the "active primary".
• If all primary hosts fail simultaneously, no HA-initiated restart of the VMs will take place. HA needs at least one primary host to restart VMs. This is why you can only take four host failures into account when configuring the "host failures" HA admission control policy. (Remember: 5 primaries…)
• The first 5 hosts that join the VMware HA cluster are automatically selected as primary nodes. All the others are automatically selected as secondary nodes. A cluster of 5 will be all primaries.
• When you do a "reconfigure for HA", the primary and secondary nodes are selected again, at random. The vSphere client does not show which host is a primary and which is not.
• Secondary nodes
• Secondary nodes send their state info & heartbeats to the primary nodes only.
• HA does not know whether a host is isolated or completely unavailable (down).
• The VM lock file is the safety net. In VMFS, the file is not visible. In NFS, it is the .lck file.
Nodes send a heartbeat every 1 second; this is the mechanism to detect possible outages.
vSphere 4.1: HA and DRS
Best Practices
• Avoid using advanced settings to decrease slot size, as it might lead to longer downtime. Admission control does not take fragmentation of slots into account when slot sizes are manually defined with advanced settings.
What can go wrong in HA: three scenarios (failed component → result)
• VM network failed (HA and storage networks fine): users can't access the VM; if there are active users, they will complain. HA does nothing, as this is not within the scope of HA in vSphere 4.1.
• HA network failed (VM and storage networks fine): it depends, split brain or partitioned? If the host is isolated, it executes the isolation response (shut down VM). The lock is released; another host gains the lock and then starts the VM.
• Storage network failed (the others do not matter): the VM probably crashes, as it can't access its disk. The lock expires and the host loses its connection to the array. Another host (the first one to get the lock?) boots the VM.
VMware HA and DRS
Split Brain vs Partitioned Cluster
• A large cluster that spans racks might experience partitioning. Each partition will think it is the full cluster. As long as there is no loss of the storage network, each partition will happily run its own VMs.
• Split brain is when 2 hosts want to run the same VM.
• Partitioning can happen when the cluster is separated by multiple switches. (The original diagram shows a cluster of 4 ESX hosts.)
HA: Admission Control Policy (% of Cluster)
Specify a percentage of capacity that needs to be reserved for failover.
• You need to manually set it so it is at least equal to 1 host failure.
• E.g. you have an 8-node cluster and want to handle 2 node failures: set the percentage to 25% (see the sketch below).
Complexity arises when nodes are not equal
• Different RAM or CPU.
• But this also impacts the other admission control options. So always keep node sizes equal, especially in Tier 1.
Admission check: the total reserved resources of powered-on VMs must stay below (available resources − failover reservation).
If no reservation is set, a default of 256 MHz is used for CPU, and 0 MB + overhead for memory.
Monitor the thresholds with vCenter on the cluster's "Summary" tab.
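A small Python sketch of the percentage calculation (cluster size and tolerated host failures are the inputs; equal-sized hosts assumed):

    # Reserve enough capacity for N host failures in an M-host cluster.
    hosts = 8
    failures_to_tolerate = 2
    reserve_pct = failures_to_tolerate / hosts * 100
    print(f"Set failover capacity to {reserve_pct:.0f}%")   # 25%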
Snapshot
Keep a snapshot for 1-3 days at most.
• Delete or commit as soon as you are done.
• A large snapshot may cause issues when committing/deleting.
For high-transaction VMs, delete/commit as soon as you are done verifying.
• E.g. databases, emails.
3rd-party tools
• Snapshots taken by third-party software (called via the API) may not show up in the vCenter Snapshot Manager. Routinely check for snapshots via the command line.
Increasing the size of a disk with snapshots present can lead to corruption of the snapshots and potential data loss.
• Check for snapshots via the CLI before you increase the disk size.
vMotion
Can be encrypted, at a cost certainly. If the vMotion network is isolated, there is no need.
May lose 1 ping.
Inter-cluster vMotion is not the same as intra-cluster:
• It involves additional calls into vCenter, so there is a hard limit.
• The VM loses cluster properties (HA restart priority, DRS settings, etc.).
ESXi: Network configuration with UCS
If you are using Cisco UCS blades
• 2x 10G or 4x 10G, depending on the blade model and mezzanine card.
All mezzanine card models support FCoE
• Unified I/O
• Low latency
The Cisco Virtualized Adapter (VIC) supports
• Multiple virtual adapters per physical adapter
• Ethernet & FC on the same adapter
• Up to 128 virtual adapters (vNICs)
• High performance: 500K IOPS
• Ideal for FC, iSCSI and NFS
Once you decide on Cisco, discuss the details with Cisco.
What Is Auto Deploy
Without Auto Deploy…
• Host image tied to the physical server: each host needs a full install and config; not easy to recover a host; redundant boot disks/dedicated LUN.
• A lot of time/effort building hosts: deploying hosts is repetitive and tedious; heavy reliance on scripting; need to update for each new release.
• Configuration drift between hosts: config drift is always a concern; it compromises HA/DR; managing drift consumes admin resources.
With Auto Deploy…
• Host image decoupled from the server: run on any server with matching hardware; config stored in a Host Profile; no boot disk.
• Agile deployment model: deploy many hosts quickly and efficiently; no pre/post-install scripts; no need to update with each release.
• Host state guaranteed: a single boot image shared across hosts; every reboot provides a consistent image; eliminates the need to detect/correct drift.
Auto Deploy Components
• PXE boot infrastructure (DHCP server, TFTP server): set up independently; the gPXE file comes from vCenter; can use the Auto Deploy appliance.
• Auto Deploy server (rules engine, PowerCLI snap-in, web server): build/manage rules; match a server to an Image Profile and Host Profile; deploy the server.
• Image Builder (Image Profiles, PowerCLI snap-in): combine the ESXi image with 3rd-party VIBs to create custom Image Profiles.
• vCenter Server (stores rules, Host Profiles, Answer Files): provides the store for rules; host configs are saved in Host Profiles; custom host settings are saved in Answer Files.
Storage DRS and DRS
Interactions:
• Storage DRS placement may impact VM-host compatibility for DRS.
• DRS placement may impact VM-datastore compatibility for Storage DRS.
Solution: datastore and host co-placement
• Done at provisioning time by Storage DRS.
• Based on an integrated metric for space, I/O, CPU and memory resources.
• Overcommitted resources get more weight in the integrated metric.
• DRS placement proceeds as usual.
Example (from the slide's table):
Datastore | Space | I/O | Connected CPU | Connected Memory | Integrated Metric
1 | High | High | Low | Low | Low
2 | Low | Medium | Medium | Medium | Medium
3 | High | Medium | High | High | High
But it is easier to architect it properly: map the ESX cluster to the datastore cluster manually.
Unified Fabric with Fabric Extender
End-of-row deployment: blade switches; multiple points of management; separate FC and Ethernet; fiber between racks, copper in racks; high cable count.
Unified fabric with fabric extender: single point of management; reduced cables.
Storage IO Control
Suggested congestion threshold values (storage media → threshold):
• Solid-state disks: 10 - 15 milliseconds
• Fibre Channel: 20 - 30 milliseconds
• SAS: 20 - 30 milliseconds
• SATA: 30 - 50 milliseconds
• Auto-tiered storage, full-LUN auto-tiering: vendor-recommended value; if none is provided, use the recommended threshold from above for the slowest storage.
• Auto-tiered storage, block-level/sub-LUN auto-tiering: vendor-recommended value; if none is provided, use a combination of the thresholds from above for the fastest and the slowest media types.
Avoid different settings for datastores sharing underlying resources (e.g. datastores A and B on the same physical drives, each with SIOC enabled):
• Use the same congestion threshold on A and B.
• Use comparable share values (e.g. use Low/Normal/High everywhere).
NAS & NFS
Two key NAS protocols:
• NFS (the "Network File System"). This is what we support.
• SMB (Windows networking, also known as "CIFS").
Things to know about NFS
• "Simpler" for people who are not familiar with SAN complexity.
• Removing a VM lock is simpler, as the lock is visible.
• When an ESX server accesses a VM disk file on an NFS-based datastore, a special .lck-XXX lock file is generated in the same directory as the disk file, to prevent other ESX hosts from accessing this virtual disk file.
• Don't remove the .lck-XXX lock file; otherwise the running VM will not be able to access its virtual disk file.
• No SCSI reservations. This is a minor issue.
• 1 datastore will only use 1 path. (Does Load Based Teaming work with it?)
• For 1 GbE, throughput will peak at around 100 MB/s. At 16 KB block size, that is roughly 6,400 IOPS (see the sketch below).
• The VMkernel in vSphere 5 only supports NFS v3, not v4, over TCP only; there is no support for UDP.
• MSCS (Microsoft Clustering) is not supported with NAS.
• NFS traffic by default is sent in clear text, since ESX does not encrypt it.
• Use NAS storage only over trusted networks. Layer 2 VLANs are another good choice here.
• 10 Gb NFS is supported. So are jumbo frames; configure them end to end.
• Deduplication can save a sizeable amount. See the speaker notes.
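The throughput-to-IOPS conversion as a Python sketch (the 100 MB/s ceiling and the 16 KB block size are the assumed inputs):

    # IOPS achievable over one 1 GbE link at a given block size.
    link_mb_per_s = 100          # practical 1 GbE ceiling (assumed)
    block_kb = 16
    iops = link_mb_per_s * 1024 / block_kb
    print(f"~{iops:.0f} IOPS at {block_kb} KB blocks")   # ~6400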
iSCSI
Use a virtual port storage system instead of a plain active/active one.
• I'm not sure if they cost much more.
iSCSI has 1 additional array type over traditional FC: the virtual port storage system.
• It allows access to all available LUNs through a single virtual port.
• These are active-active arrays, but they hide their multiple connections behind a single port. ESXi multipathing cannot detect the multiple connections to the storage; ESXi does not see multiple ports on the storage and cannot choose which storage port to connect to. These arrays handle port failover and connection balancing transparently. This is often referred to as "transparent failover".
• The storage system uses this technique to spread the load across the available ports.
iSCSI
Limitations
• ESX/ESXi does not support iSCSI-connected tape devices.
• You cannot use virtual-machine multipathing software to perform I/O load balancing to a single physical LUN.
• A host cannot access the same LUN when it uses dependent and independent hardware iSCSI adapters simultaneously.
• Broadcom iSCSI adapters do not support IPv6 and jumbo frames. [e1: still true in vSphere 5??]
• Some storage systems do not support multiple sessions from the same initiator name or endpoint. Multiple sessions to such targets can result in unpredictable behavior.
Dependent and independent
• A dependent hardware iSCSI adapter is a third-party adapter that depends on VMware networking and on the iSCSI configuration and management interfaces provided by VMware. This type of adapter can be a card, such as a Broadcom 5709 NIC, that presents a standard network adapter and iSCSI offload functionality for the same port. The iSCSI offload functionality appears on the list of storage adapters as an iSCSI adapter.
Error correction
• To protect the integrity of iSCSI headers and data, the iSCSI protocol defines error correction methods known as header digests and data digests. These digests pertain to the header and SCSI data being transferred between iSCSI initiators and targets, in both directions.
• Both parameters are disabled by default, but you can enable them. They impact CPU. Nehalem processors offload the iSCSI digest calculations, reducing the impact on performance.
Hardware iSCSI
• When you use a dependent hardware iSCSI adapter, performance reporting for a NIC associated with the adapter might show little or no activity, even when iSCSI traffic is heavy. This occurs because the iSCSI traffic bypasses the regular networking stack.
Best practice
• Configure jumbo frames end to end.
• Use NICs with TCP segmentation offload (TSO).
iSCSI & NFS: caveat when used together
Avoid using them together.
iSCSI and NFS have different HA models:
• iSCSI uses vmknics with no Ethernet failover, using MPIO instead.
• The NFS client relies on vmknics using link aggregation/Ethernet failover.
• NFS relies on the host routing table.
• NFS traffic would use the iSCSI vmknic, resulting in links without redundancy.
• Use of multiple-session iSCSI together with NFS is not supported by NetApp.
• EMC supports it, but the best practice is to have separate subnets and virtual interfaces.
NPIV
What it is
• Allows a single Fibre Channel HBA port to register with the Fibre Channel fabric using several worldwide port names (WWPNs). This makes the HBA port appear as multiple virtual ports, each having its own ID and virtual port name. Virtual machines can then claim each of these virtual ports and use them for all RDM traffic.
• Note that this is WWPN, not WWNN:
• WWPN – World Wide Port Name
• WWNN – World Wide Node Name
• A single-port HBA typically has a single WWNN and a single WWPN (which may be the same).
• A dual-port HBA may have a single WWNN to identify the HBA, but each port will typically have its own WWPN. However, it could also have an independent WWNN per port.
• (In the slide's screenshot, the first value is the WW node name and the second is the WW port name.)
Design considerations
• Only applicable to RDM.
• The VM does not get its own HBA, and no FC driver is required. It just gets an N_Port, so it is visible from the fabric.
• The HBA and SAN switch must support NPIV.
• Cannot perform Storage vMotion, or vMotion between datastores, when NPIV is enabled. All RDM files must be in the same datastore.
• These restrictions are still in place in v5.
2 TB VMDK barrier
Scenario: you need a disk larger than 2 TB within a VM.
• There are some solutions, each with pros and cons.
• Say you need a 5 TB disk in 1 Windows VM.
• RDM (even with physical compatibility) and DirectPath I/O do not increase the virtual disk limit.
Solution 1: VMFS or NFS
• Create a datastore of 5 TB.
• Create 3 VMDKs and present them to Windows.
• Windows then combines the 3 disks into 1 volume.
• Limitation: certain low-level storage software may not work, as it needs 1 disk (not a volume combined by the OS).
Solution 3: iSCSI within the guest
• Configure the iSCSI initiator in Windows.
• Configure a 5 TB LUN. Present the LUN directly to Windows, bypassing the ESX layer. You can't monitor it from vSphere.
• By default, it will only have 1 GbE. NIC teaming requires a driver from Intel; not sure if this is supported.
Storage: Queue Depth
When should you adjust the queue depth?
• If a VM generates more commands to a LUN than the LUN queue depth allows, adjust the device/LUN queue.
• Generally, with fewer, very high-IO VMs on a host, larger queues at the device driver will improve performance.
• If the VM's queue depth is lower than the HBA's, adjust the VMkernel setting.
Be cautious when setting queue depths
• With device queues that are too large, the storage array can easily be overwhelmed, and its performance may suffer with high latencies.
• The device driver queue depth is global and set per LUN.
• Change the device queue depth on all ESX hosts in the cluster.
Calculating the queue depth
• To verify that you are not exceeding the queue depth of an HBA, use the following formula (see the sketch below):
• Max queue depth of the HBA = device queue setting x number of LUNs on the HBA
Queues exist at multiple levels
• A LUN queue for each LUN at the ESXi host; if that queue is full, the kernel queue fills up.
• A LUN queue at the array level for each LUN; if this queue does not exist, the array writes straight to disk.
• The disk queue: the queue at the disk level, if there is no LUN queue.
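A Python sketch of that check (the queue setting, LUN count and HBA limit below are illustrative values, not recommendations):

    # Verify the HBA queue depth is not oversubscribed:
    #   required = per-LUN device queue setting * number of LUNs on the HBA
    device_queue_depth = 32      # per-LUN device queue setting (assumed)
    luns_on_hba = 16
    hba_queue_depth = 4096       # adapter limit from the HBA vendor (assumed)
    required = device_queue_depth * luns_on_hba
    if required > hba_queue_depth:
        print(f"Oversubscribed: need {required}, HBA supports {hba_queue_depth}")
    else:
        print(f"OK: {required} <= {hba_queue_depth}")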
Sizing the Storage Array
• For RAID 1 (it has an IO penalty of 2):
• 60 drives ≈ ((7,000 x 2 x 30%) + (7,000 x 70%)) / 150 IOPS, i.e. a 7,000 IOPS workload with 30% writes on 150 IOPS drives.
• Why does RAID 5 have an IO penalty of 4?
RAID level → IO penalty:
• RAID 1: 2
• RAID 5: 4
• RAID 6: 6
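The same drive-count arithmetic as a Python sketch, extended to the RAID penalties in the list above (7,000 IOPS, 30% writes and 150 IOPS per drive are the assumed inputs):

    # Front-end IOPS -> back-end IOPS -> spindle count, per RAID write penalty.
    workload_iops = 7000
    write_ratio = 0.30
    iops_per_drive = 150                 # e.g. one 15K spindle (assumed)
    raid_penalty = {1: 2, 5: 4, 6: 6}    # from the list above
    for level, penalty in raid_penalty.items():
        backend = (workload_iops * write_ratio * penalty
                   + workload_iops * (1 - write_ratio))
        print(f"RAID {level}: {backend / iops_per_drive:.0f} drives")
    # RAID 1 gives ~61 drives, matching the ~60-drive example above.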
Storage: Performance Monitoring
Get a baseline of your environment during a "normal" IO time frame.
• Capture as many data points as possible for analysis.
• Capture data from the SAN fabric, the storage array, and the hosts.
Which statistics should be captured?
• Max and average read/write IOPS
• Max and average read/write latency (ms)
• Max and average throughput (MB/sec)
• Read and write percentages
• Random vs. sequential
• Capacity – total and used
SCSI Architecture Model (SAM)
Fibre Channel Multi-Switch Fabric
(Diagram: two fabric switches joined by E_Ports form a multi-switch fabric; nodes A-H connect their N_Ports to F_Ports on the switches, and every port has a transmitter (TR) and receiver (RC) pair.)
Backup: VADP vs Agent-based
The ESX host has 23 VMs; each VM is around 40 GB.
• All VMs are idle, so this CPU/disk load is purely from backup.
• CPU peak is >10 GHz (just above 4 cores).
• But disk peak is >1.4 Gbps of IO, almost 50% of a 4 Gb HBA.
After VADP, both CPU and disk drop to negligible levels.
VADP: Adoption Status
This is as of June 2010.
Always check with the vendor for the most accurate data.
Partner / product / version / integration status:
• CA ArcServe 12.5 w/patch – Released
• Commvault Simpana 8.0 SP5 – Released
• EMC Avamar 5.0 – Released
• EMC Networker 7.6.x – Not yet
• HP Data Protector 6.1.1 with patch – Not yet
• IBM Tivoli Storage Manager 6.2.0 – Released
• Symantec Backup Exec 2010 – Released
• Symantec Backup Exec System Recovery 2010 – Released
• Symantec NetBackup 7.0 – Released
• Vizioncore vRanger Pro 4.2 – Released
• Veeam Backup & Replication 4.0 – Released
Partition alignment
Affects every protocol and every storage array:
• VMFS on iSCSI, FC, & FCoE LUNs
• NFS
• VMDKs & RDMs with NTFS, EXT3, etc.
VMware VMFS partitions that align to 64 KB track boundaries give reduced latency and increased throughput.
• Check with the storage vendor if there are any recommendations to follow.
• If no recommendations are made, use a starting block that is a multiple of 8 KB (see the check sketched below).
Responsibility of the Storage team,
• not the vSphere team.
On NetApp:
• VMFS partitions are automatically aligned, with a starting block in multiples of 4 KB.
• MBRscan and MBRalign tools are available to detect and correct misalignment.
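A Python sketch of the alignment check (the 8 KB boundary is the guideline above; the sample offsets are illustrative):

    # Check whether a partition's starting offset is aligned.
    def is_aligned(start_offset_bytes: int, boundary_kb: int = 8) -> bool:
        return start_offset_bytes % (boundary_kb * 1024) == 0

    print(is_aligned(32768))   # True: 64 sectors * 512 B, an aligned start
    print(is_aligned(32256))   # False: 63 sectors * 512 B, classic misalignment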
(Diagram: filesystem clusters of 4 KB-1 MB sit on VMFS blocks of 1 MB-8 MB, which sit on array chunks of 4 KB-64 KB; all three layers must align.)
Tools: Array-specific integration
The example below is from NetApp. Other storage partners have integration capabilities too.
Always check with the respective product vendor for the latest information.
Tools: Array-specific integration
Management of the array can be done from the vSphere client; the example below is from NetApp.
Ensure storage access is not accidentally given to vSphere admins; use RBAC.
Data Recovery
No integration with tape
• Manual tape-out can be done.
If a third-party solution is being used to back up the deduplication store, those backups must not run while the Data Recovery service is running. Do not back up the deduplication store without first powering off the Data Recovery backup appliance, or stopping the datarecovery service using the command "service datarecovery stop".
Some limits
• 8 concurrent jobs on the appliance at any time (backup & restore).
• An appliance can have at most 2 dedupe store destinations, due to the overhead involved in deduping.
• VMDK- or RDM-based deduplication stores of up to 1 TB, or CIFS-based deduplication stores of up to 500 GB.
• No IPv6 addresses.
• No multiple backup appliances on a single host.
VDR cannot back up VMs
• that are protected by VMware Fault Tolerance;
• with 3rd-party multi-pathing enabled where shared SCSI buses are in use;
• with raw device mapped (RDM) disks in physical compatibility mode.
• Data Recovery can back up VMware View linked clones, but they are restored as unlinked clones.
Using Data Recovery to back up Data Recovery backup appliances is not supported.
• This should not be an issue. The backup appliance is a stateless device, so there is not the same need to back it up as for other types of VMs.
VMware Data Recovery
We assume the following requirements:
• Back up to an external array, not the same array.
• The external array can be used for other purposes too, so the 2 arrays back each other up.
• How do we ensure write performance, as the array is shared?
• 1 backup a day; no need for multiple backups per day of the same VM.
Considerations
• Bandwidth: need a dedicated NIC to the Data Recovery VM.
• Performance: need to reserve CPU/RAM for the VM?
• Group like VMs together; it maximises dedupe.
• Destination: an RDM LUN presented via iSCSI to the appliance (hard disk 2 in the original screenshot).
• Not using VMDK format, to enable LUN-level operations.
• Not using CIFS/SMB, as the deduplication store is then limited to 0.5 TB vs 1 TB on RDM/VMDK.
• Space calculation: need to find a tool to help estimate the disk requirements.
Mapping: Datastore – VM
Criteria to use when placing a VM into a tier:
• How critical is the VM? Importance to the business.
• What are its performance and availability requirements?
• What are its point-in-time restoration requirements?
• What are its backup requirements?
• What are its replication requirements?
Have a document that lists which VM resides on which datastore group.
• The content can be generated using PowerCLI or Orchestrator, showing datastores and their VMs (see the sketch below).
• Example tool: Quest PowerGUI.
• While it rarely happens, you can't rule out datastore metadata corruption.
• When that happens, you want to know which VMs are affected.
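As an alternative sketch to PowerCLI, the same list could be generated with pyVmomi; a minimal example (the vCenter hostname and credentials are placeholders):

    # List each datastore and the VMs it holds (pyVmomi sketch).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect

    ctx = ssl._create_unverified_context()           # lab use only
    si = SmartConnect(host="vcenter.example.com",    # placeholder vCenter
                      user="administrator", pwd="secret", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        for dc in content.rootFolder.childEntity:
            for ds in getattr(dc, "datastore", []):
                names = [vm.name for vm in ds.vm]
                print(f"{ds.name}: {len(names)} VMs -> {', '.join(names)}")
    finally:
        Disconnect(si)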
A VM normally changes tiers throughout its life cycle.
• Criticality is relative and might change for a variety of reasons, including changes in the organization, operational processes, regulatory requirements, disaster planning, and so on.
• Be prepared to do Storage vMotion.
• Always test it first, so you know how long it takes in your specific environment.
• VAAI is critical; without it, the traffic will impact your other VMs.
Example mapping (columns: datastore group, VM name, size in GB, IOPS). Total: 12 VMs, 1 TB, 1,400 IOPS.
RDM
Use sparingly.
• VMDK is more portable, easier to manage, and easier to resize.
• VMDK and RDM have similar performance.
Physical RDM
• Can't take snapshots.
• No Storage vMotion, but vMotion works.
• Physical mode specifies minimal SCSI virtualization of the mapped device, allowing the greatest flexibility for SAN management software.
• The VMkernel passes all SCSI commands to the device, with one exception: the REPORT LUNS command is virtualized, so that the VMkernel can isolate the LUN to the owning virtual machine.
Virtual RDM
• Specifies full virtualization of the mapped device. Features like snapshots etc. work.
• The VMkernel sends only READ and WRITE to the mapped device. The mapped device appears to the guest operating system exactly the same as a virtual disk file in a VMFS volume. The real hardware characteristics are hidden.
Human Experts vs Storage DRS
2 VMware performance engineers compete with Storage DRS to balance the following:
• 13 VMs: 3 DVD Store, 2 Swingbench, 4 mail servers, 2 OLTP, 2 web servers.
• 2 ESX hosts and 3 storage devices (different FC LUNs, shown in shades of blue in the original charts).
Storage DRS provides the lowest average latency, while maintaining similar throughput. Why did the human experts lose?
• Too many numbers to crunch, too many dimensions to the analysis. The humans took a couple of hours to think this through. Why bother anyway?
(Charts: IOPS and average latency in ms for five placement strategies: space-balanced, BASIL, Storage DRS, Expert 1, Expert 2; green bars show average latency.)
Alternative Backup Method
The VMware ecosystem may provide new ways of doing backup.
• The example below is from NetApp.
NetApp SnapManager for Virtual Infrastructure (SMVI)
• In a large cloud, the SMVI server should sit on a separate VM from vCenter. While it has no performance requirement, this is best from a segregation-of-duty point of view. Best practice is to keep vCenter clean & simple; vCenter plays a much more critical role in larger environments, where plug-ins rely on vCenter uptime.
• Allows consistent array snapshots & replication.
• Combine with other SnapManager products (SM for Exchange, SM for Oracle, etc.) for application consistency.
• Exchange and SQL work with VMDK; Oracle, SharePoint and SAP require RDM.
• Can be combined with SnapVault for vaulting to disk.
• 3 levels of data protection:
• On-disk array snapshots for fast backup (seconds) & recovery (up to 255 snapshot copies of any datastore can be kept with no performance impact).
• Vaulting to a separate array for better protection, with slightly slower recovery.
• SnapMirror to an offsite array for DR purposes.
• Serves to minimize the backup window (and the frozen VMDK while changes are applied).
• Option to skip the VM snapshot and create crash-consistent array snapshots.
NFS multipathing decision (reconstructed from the flowchart): does the switch support multi-switch link aggregation?
• Yes: use one VMkernel port & IP subnet; use multiple links with IP-hash load balancing on the NFS client (ESX) and on the NFS server (array); the storage needs multiple sequential IP addresses.
• No: use multiple VMkernel ports & IP subnets; path selection uses the ESX routing table; the storage needs multiple sequential IP addresses.
vMotion Performance on 1 GbE Vs 10 GbE
Test scenarios (CPU %USED, vMotion network traffic):
• Idle VM: 0%, 0 Gbps
• Moderately loaded VM (web traffic): 140%, 2.5 Gbps
• Heavily loaded VM: 325%, 6 Gbps
Idle/moderately loaded VM scenarios
• Reductions in duration when using 10 GbE vs 1 GbE, on both vSphere 4.1 and vSphere 5.
Heavily loaded VM scenario
• Reductions in duration when using 10 GbE vs 1 GbE.
• 1 GbE on vSphere 4.1: memory-copy convergence issues lead to network connection drops.
• 1 GbE on vSphere 5: SDPS kicks in, resulting in zero connection drops.
• vMotion in vSphere 5 never fails due to memory-copy convergence issues.
Consider switching from a 1 GbE to a 10 GbE vMotion network.
(Chart: duration of vMotion; lower is better.)
Impact on Database Server Performance During vMotion
(Charts: orders per second over time while vMotioning a database server. On vSphere 4.1 the vMotion took 23 seconds, with visible dips during the guest-trace and switch-over periods. On vSphere 5 it took 15 seconds, with much smaller dips during the same periods.)
The performance impact is minimal during the memory-trace phase in vSphere 5.
Throughput never dropped to zero in vSphere 5 (switch-over time < half a second).
The time to resume the normal level of performance is about 2 seconds better in vSphere 5.
vMotion Network Bandwidth Usage During Evacuation
Network Settings
Load-Based Teaming
• We will not use it, as we are using 1 GbE in this design.
• If you use 10 GbE, the default settings are a good starting point. They give VMs 2x the shares versus the hypervisor.
NIC Teaming
• If the physical switch can support it, use IP-hash.
• This needs a stacked switch: switches that can be managed as if they were 1 bigger switch. "Multi-chassis EtherChannel switch" is another name.
• IP-hash does not help if the source and destination addresses are constant. For example, vMotion always uses 1 path only, as the source-destination pair is constant. The connection from the VMkernel to an NFS server is constant too.
• If the physical switch can't support it, use Source Port.
• You then need to balance manually, so that not all VMs go via the same port.
VLAN
• We are using VST. The physical switch must support VLAN trunking.
PVLAN
• Not used in this design. Most physical switches are PVLAN-aware already.
• Packets will be dropped, or security can be compromised, if the physical switch is not PVLAN-aware.
Beacon Probing
• Not enabled, as my design only has 2 NICs per vSwitch. ESXi will flood both NICs if it has only 2.
Review the default settings
• Change Forged Transmits to Reject.
• Change MAC Address Changes to Reject.
VLAN
Native VLAN
• Native VLAN means the switch can receive and transmit untagged packets.
• VLAN hopping occurs when an attacker with authorized access to one VLAN creates packets that trick physical switches into transmitting the packets to another VLAN that the attacker is not authorized to access. The attacker forms an ISL or 802.1Q trunk port to the switch by spoofing DTP messages, getting access to all VLANs; or the attacker can send double-tagged 802.1Q packets to hop from one VLAN to another, sending traffic to a station it would otherwise not be able to reach.
• This vulnerability usually results from a switch being misconfigured for native VLAN, as it can then receive untagged packets.
Local vSwitches do not support native VLAN; the Distributed vSwitch does.
• All data passed on these switches is appropriately tagged. However, because physical switches in the network might be configured for native VLAN, VLANs configured with standard switches can still be vulnerable to VLAN hopping.
• If you plan to use VLANs to enforce network security, disable the native VLAN feature for all switches, unless you have a compelling reason to operate some of your VLANs in native mode. If you must use native VLAN, see your switch vendor's configuration guidelines for this feature.
VLAN 0: the port group sees only untagged (non-VLAN) traffic.
VLAN 4095: the port group sees traffic on any VLAN, leaving the VLAN tags intact.
Distributed Switch
Design consideration
• Version upgrade
• ?? Upgrade procedure
Feature Comparison Among Switches (partial)
Feature: vSS / vDS / Cisco N1K
• VLAN: yes / yes / yes
• Port security: yes / yes / yes
• Multicast support: yes / yes / yes
• Link aggregation: static / static / LACP
• Traffic management: limited / yes / yes
• Private VLAN: no / yes / yes
• SNMP, etc.: no / no / yes
• Management interface: vSphere Client / vSphere Client / Cisco CLI
• NetFlow: no / yes / yes
vNetwork Standard Switch: A Closer Look
A vSS is defined on a per-host basis, from Home > Inventory > Hosts and Clusters.
(Diagram callouts: uplinks (physical NICs) are attached to the vSwitch; port groups are policy definitions for a set or group of ports, e.g. VLAN membership, port security policy, teaming policy, etc.)
vNetwork Distributed Switch: A Closer Look
A vDS operates off a local cache; there is no operational dependency on the vCenter server.
• The host local cache is under /etc/vmware/dvsdata.db and /vmfs/volumes/<datastore>/.dvsdata
• The local cache is a binary file. Do not hand-edit it.
(Diagram callouts: the DV uplink port group defines uplink policies; DV uplinks abstract the actual physical NICs (vmnics) on hosts; DV port groups span all hosts covered by the vDS and are groups of ports defined with the same policy, e.g. VLAN; the vmnics on each host are mapped to dvUplinks.)
Nexus 1000V: VSM
VM properties
• Each VSM requires 1 vCPU and 2 GB RAM, which must be reserved, so it impacts the cluster slot size.
• Use "Other Linux 64-bit" as the guest OS.
• Each needs 3 vNICs.
• Requires the Intel e1000 network driver. Because no VMware Tools are installed?
Availability
• 2 VSMs are deployed in an active-standby configuration, with the first VSM functioning in the primary role and the other VSM in a secondary role.
• If the primary VSM fails, the secondary VSM takes over.
• They do not use the VMware HA mechanism.
Unlike crossbar-based modular switching platforms, the VSM is not in the data path.
• General data packets are not forwarded to the VSM to be processed, but switched by the VEM directly.
Nexus 1000V: the VSM has 3 interfaces for "mgmt"
Control interface
• VSM-VEM communication, and VSM-VSM communication.
• Handles low-level control packets such as heartbeats, as well as any configuration data that needs to be exchanged between the VSM and VEM. Because of the nature of the traffic carried over the control interface, it is the most important interface in Nexus 1000V.
• Requires very little bandwidth (<10 KBps) but demands absolute priority.
• Always the first interface on the VSM. Usually labeled "Network Adapter 1" in the VM network properties.
Management interface
• VSM-vCenter communication.
• Appears as the mgmt0 port on a Cisco switch. As with the management interfaces of other Cisco switches, an IP address is assigned to mgmt0.
• Does not necessarily require its own VLAN. In fact, you could use the same VLAN as vCenter.
Packet interface
• Carries the network packets that need to be coordinated across the entire Nexus 1000V. Only two types of control traffic: Cisco Discovery Protocol and Internet Group Management Protocol (IGMP) control packets.
• Always the third interface on the VSM; usually labeled "Network Adapter 3" in the VM network properties.
• The bandwidth required for the packet interface is extremely low, and its use is very intermittent. If the Cisco Discovery Protocol and IGMP features are turned off, there is no packet traffic at all. The importance of this interface is directly related to the use of IGMP; if IGMP is not deployed, this interface is used only for Cisco Discovery Protocol, which is not considered a critical switch function.
vNetwork Distributed Portgroup Binding
Port binding: the association of a virtual adapter with a dvPort.
• Static binding (default configuration): the port is bound when the vNIC connects to the portgroup; the dvPort is created on the host's proxy switch and bound to the vNIC.
• Dynamic binding: use when the number of VM adapters exceeds the number of dvPorts in a portgroup and not all VMs are active.
• Ephemeral binding: use when the number of VMs exceeds the number of dvPorts and port history is not relevant; Max Ports is not enforced.
Use static binding for best performance and scale.
Network Stack Comparison
Good attributes of FCoE
• Has less overhead than FCIP or iSCSI (see the stack comparison below).
• FCoE is managed like FC at the initiator, target, and switch level.
• Maps FC frames over an Ethernet transport.
• Enables Fibre Channel to run over a lossless Ethernet medium.
• Single adapter: less device proliferation, lower power consumption.
• No gateways required.
• NAS certification: FCoE CNAs can be used to certify NAS storage. Existing NAS devices listed on the VMware SAN Compatibility Guide do not require recertification with FCoE CNAs.
Mixing technologies always increases complexity.
(Diagram: protocol stacks on the physical wire. FCP is SCSI over FC; FCIP is FC tunneled through TCP/IP; iSCSI is SCSI over TCP/IP; FCoE is FC directly over Ethernet.)
Physical Switch Setup
Spanning Tree Protocol
• A vSwitch won't create loops:
• vSwitches can't be linked to each other.
• A vSwitch does not take an incoming packet from one pNIC and forward it out another pNIC.
Recommendations
1. Leave STP on in the physical network.
2. Use "portfast" on ESX-facing ports.
3. Use "bpduguard" to enforce the STP boundary.
(Diagram: VMs with MACs a, b and c on two vSwitches, uplinked to the physical switches.)
1 GE switch
Sample from Dell.com (US site, not Singapore):
• Around US$5K; you need a pair.
• 48 ports.
• Each ESXi host needs around 7 - 13 ports (inclusive of the iLO port).
10 GE switch
Sample from Dell.com (US site, not Singapore):
• Around US$10 - 11K; you need a pair.
• 24 ports.
• Each ESXi host needs only 2 ports.
• The iLO port can connect to an existing GE/FE switch.
Compared with a 1 GE switch, the price is very close; it might even be cheaper in TCO.
Multi security zones (w/ vShield Edge to protect vApp Network)
vCD "logical" view vs vSphere "operational" view. Reminder: this is self-service (UI / API).
(Diagram: a vApp connects to a vApp network, which connects to an org network inside the organization, which connects to an external network; on the vSphere side these map to port groups in the vNetwork, and vCD deploys them.)
Two-tier application (w/ vShield App to protect backend)
vCD "logical" view vs vSphere "operational" view. Reminder: this is NOT self-service (today); it is vShield admin configuration (today).
(Diagram: a vApp with a front-end enclave and a back-end enclave on an org network inside the organization, connected to an external network; these map to port groups in the vSphere vNetwork.)
vShield Edge in short
(Diagram: vShield Edge is a virtual appliance sitting between two security zones, i.e. L2 networks A and B, each a port group in the vSphere vNetwork. It provides NAT, DHCP, load balancing, firewall, routing, and VPN between its two vNICs.)
vShield App in short
(Diagram: vShield App is a firewall backed by a kernel module, filtering traffic between security zones 1 and 2 on the same L2 network/port group.)
Security Compliance: PCI DSS
PCI applies to all systems "in scope"
• Segmentation defines scope.
• What is within scope? All systems that store, process, or transmit cardholder data, and all system components that are in or connected to the cardholder data environment (CDE).
The DSS is vendor agnostic
• It does not seem to cover virtualisation.
Relevant statements from PCI DSS
• "If network segmentation is in place and will be used to reduce the scope of the PCI DSS assessment, the assessor must verify that the segmentation is adequate to reduce the scope of the assessment." (PCI DSS p. 6)
• "Network segmentation can be achieved through internal network firewalls, routers with strong access control lists or other technology that restricts access to a particular segment of a network." (PCI DSS p. 6)
• "At a high level, adequate network segmentation isolates systems that store, process, or transmit cardholder data from those that do not. However, the adequacy of a specific implementation of network segmentation is highly variable and dependent upon such things as a given network's configuration, the technologies deployed, and other controls that may be implemented." (PCI DSS p. 6)
• "Documenting cardholder data flows via a dataflow diagram helps fully understand all cardholder data flows and ensures that any network segmentation is effective at isolating the cardholder data environment." (PCI DSS p. 6)
Security Compliance: PCI DSS
Added complexity from Virtualisation
• System boundaries are not as clear as their non-virtual counterparts
• Even the simplest network is rather complicated
• More components, more complexity, more areas for risk
• Digital forensic risks are more complicated
• More systems are required for logging and monitoring
• More access control systems
• Memory can be written to disk
• VM Escape?
• Mixed Mode environments
Sample Virtualized CDE
PCI: Virtualization Risks by Requirement
Requirement → unique risks in virtual environments → how you can address them:
Requirement 3 – Protect stored cardholder data.
• Risk: memory that was previously only volatile may now be written to disk (e.g. when taking snapshots of systems). How are memory resources and other shared resources protected from access? How do you know there are no remnants of stored data?
• Address: apply the data retention and disposal policy to CDE VMs, snapshots, and any other components which could store CHD, encryption keys, passwords, etc. Document the storage configuration and SAN implementation. Document any encryption process, encryption keys, and encryption key management used to protect stored CHD. Fully isolate the vMotion network, so that as VMs move from one physical server to another, memory and other sensitive running data cannot be sniffed or logged.
Requirement 7 – Restrict access to cardholder data by business need-to-know.
• Risk: access controls are more complicated. In addition to hosts, there are now additional applications, virtual components, and storage of these components (i.e. what protects access to them while they are waiting to be provisioned?). Organizations should carefully document all the access controls in place, and ensure there are separate access controls for different "security zones".
• Address: document all the types of Role-Based Access Control (RBAC) used for access to physical hosts, virtual hosts, physical infrastructure, virtual infrastructure, logging systems, IDS/IPS, multi-factor authentication, and console access. Ensure that physical hosts do not rely on virtual RBAC systems that they themselves host.
Requirement 9 – Restrict physical access to cardholder data.
• Risk: risks are greater, since physical access to the hypervisor could lead to logical access to every component.
• Address: ensure that you are considering physical protection at your D/R site. Address the risk that physical access to a single server or SAN can result in logical access to hundreds of servers.
Requirement 10 – Track and monitor all access to network resources and cardholder data.
• Risk: some virtual components do not have the robust logging capabilities of their physical counterparts. Many systems are designed for troubleshooting and not for creating detailed event and system logs with sufficient detail to meet PCI logging requirements and assist a digital forensic investigation. PCI requires logs to be stored in a central location that is independent of the systems being logged.
• Address: establish unified, centralized log management which cannot be altered or disabled by access to the hypervisor. ESX logs should not be stored on a virtual host on the same ESX server, as compromising the ESX server could compromise the logs. Be prepared to demonstrate that the logs are forensically sound.
vNetwork Appliances
Advantages
• Flexible deployment.
• Scales naturally as more ESX hosts are deployed.
Architecture
• A fastpath agent filters packets in the datapath, transparent to the vSwitch.
• It can optionally forward packets to a VM (the slowpath agent).
Solutions
• VMware vShield, Reflex, Altor, Checkpoint, etc.
(Diagram: heavyweight filtering happens in the "slow path" agent; lightweight filtering in the "fast path" agent.)
vShield
(Diagram: an Org vDC with DMZ, APP and DB zones plus shared services. vShield Edge faces the Internet; vShield App instances protect each zone; everything runs on a virtual distributed switch spanning several vSphere hosts.)
Setup perimeter services
• Install vShield Edge (external - internal).
• Provision services: firewall; NAT, DHCP; VPN; load balancer.
Setup internal trust zones
• Install vShield App (vDS / dvfilter setup).
• Secure access to shared services.
• Create interior zones: segment the internal net; wire up the VMs.
vShield and fail-safe behaviour: http://www.virtualizationpractice.com/blog/?p=9436
Security
Steps to delete "Administrator" from vCenter
• Move it to the "No Access" role. Protect it with an alarm in case this is modified.
• All other plug-ins or management products that use Administrator will break.
Steps to delete "root" from ESX
• Replace it with another ID. Can't be tied to AD?
• The manual warns against removing this user.
• Create another ID with root group membership.
• vSphere 4.1 now supports MS AD integration.
VCM’s Free vSphere Compliance Checker (Download)
(Screenshot: the checker validates ESX-related hardening rules and VM shell-related hardening rules across 5 ESX hosts.)
http://www.vmware.com/products/datacenter-virtualization/vsphere-compliance-checker/overview.html
Security P2V issue – loss of physical control
Physical security (static) vs cloud security in the virtual data center (dynamic):
• Perimeter security was achieved using physical firewalls, IPS and VPN → due to the mobility of VMs, this is no longer sufficient.
• Interior security was achieved using VLAN- or subnet-based policies → leads to VLAN sprawl and complex policies.
• Endpoints are protected with AV agents → results in more AV agents, one in each VM, impacting the host and other VMs.
• Physical organizational boundaries or security zones can be achieved easily with physical appliances → can be achieved only with different subnets, resulting in VLAN sprawl; sharing of the same physical hosts by multiple VMs results in complex multi-tenancy policies to enable logical boundaries.
• Greater transparency & visibility, given tools that are virtualization-aware → otherwise opaque, with poor visibility.
Windows VM monitoring
Use the new Perfmon counters provided.
The counters built into Windows are misleading in a virtual environment.
Time Keeping and Time Drift
It is critical to have the same time on all ESX hosts and VMs.
All VMs & ESX hosts should get time from the same 1 internal NTP server.
• Synchronize the NTP server with an external stratum-1 time source.
The internal NTP server should get time from a reliable external server or a real atomic clock.
• There should be 2 sources.
Do not virtualise the NTP server.
• As a VM, it may experience time drift if the ESXi host is under resource constraint.
Physical candidates for the NTP server:
• Backup server (with the vStorage API for Data Protection).
• A Cisco switch.
See the MS AD slide for MS AD-specific impact.
Linux
New features in the ext4 filesystem:
• Extents reduce fragmentation
• Persistent preallocation
• Delayed allocation
• Journal checksumming
• fsck is much faster
RHEL 6 & ext4 properly align filesystems.
Tips: use the latest OS
• Constant improvements
• Built-in paravirtual drivers
• Better timekeeping
• Tickless kernel: on-demand timer interrupts; systems stay totally idle
• Hot-add capabilities
• Reduces the need to oversize "just in case"
• Might need to tweak udev. See VMware KB 1015501
• Watch for jobs that happen at the same time (across VMs)
• Monitoring (every 5 minutes)
• Log rotation (4 AM)
• You don't need sysstat & sar running. Use vCenter metrics instead
Guest Optimization: Swap File Location
The swap file for Windows guests should be on a separate dedicated drive.
• Cons:
• This requires another VMDK file, and management overhead, as it has to be resized when RAM changes too.
• Pros:
• No need to back it up.
• Keeps the application traffic and OS disk traffic separate from the page file traffic, thereby increasing performance.
Swap partition equal to 1.5x RAM
• 1.5x is the default recommendation for best performance (knowing nothing about the application).
• Monitor the page file usage to see how much of it is actually being used. In the old days, whatever memory was installed was what they were committed to, and making a change was an act of Congress; look to leverage the virtual flexibility and modify for best usage.
• Microsoft's limits on page files: http://support.microsoft.com/kb/889654
Microsoft's memory recommendations and definition of Physical Address Extension are explained at http://support.microsoft.com/?kbid=555223
Infrastructure VM
Purpose / CPU / RAM / Remarks:
• Admin client (Win 7 32-bit) / 1 / 2 GB: dedicated for vSphere management/administration. The vSphere Client has plug-ins, so it's more convenient to have a ready-made client. Higher security than a typical administrator's personal notebook/desktop, which serves many other purposes (email, internet browsing, MS Office, iTunes, etc.). For higher security, it can be placed in the management LAN; from your laptop, do an RDP jump to this VM (suitable for SSLF). Useful when covering during leave, etc., but do not use shared IDs. Software installed: Microsoft PowerShell (no need to install the CLI, as it's in vMA), VMware Orchestrator.
• vCenter (Win08 R2 64-bit Ent Edition) / 2 / 4 GB: 1 CPU is not sufficient; 2 vCPU, 4 GB RAM and a 5 GB data drive are enough for 50 ESX hosts and 500 VMs. No need to over-allocate, especially vCPU and RAM. Ensure MS IIS is removed prior to vCenter installation. Figure on ~1.5 MB of RAM per VM and ~3 MB of RAM per managed host. Avoid installing vCenter on a domain controller, but deploy it on a system that is part of the AD domain; this facilitates security and flexibility in setting up VC roles, permissions and DB authentication.
• IT database server (Win08 R2 64-bit Ent Edition) / 2 / 4 GB: SQL Server 2005 64-bit. See the next slide; need to plan carefully.
• IT database server (Win08 R2 64-bit Ent Edition) / 2 / 4 GB: SQL Server 2008 64-bit. See the next slide; need to plan carefully.
• Update Manager (Win08 R2 64-bit?) / 1 / 4 GB: 50 GB of D:\ drive for the patch store is sufficient; use thin provisioning. See the "VMware Update Manager Performance Best Practices" VMworld session.
• vShield / 1 / 1 GB: Tier 1, as traffic goes through it. 1 per ESXi host vSwitch (serving VMs, not the VMkernel).
• vShield Manager / 1 / 1 GB: management console only.
• Patch management server / 1 / 4 GB: I'm assuming the client has a tool in place and wants to continue with it.
Infrastructure VM
Purpose / CPU / RAM / RP tier / Remarks:
• vMA / 1 / 1 GB / 3: management console only.
• SRM 5 / 2 / 2 GB / 2: recommended to separate it from VC.
• Converter / 1 / 2 GB / 1: if possible, do not run it in the production cluster, so it does not impact ESX utilisation. Not set to Tier 3, as you want the conversion process to complete as soon as possible.
• vShield security VM from partner / 2 / 2 GB / 1: Tier 1, as it's in the data path; 1 per ESXi host. Requires 100% reservation, so it impacts the cluster slot size.
• Cisco Nexus VSM, deployed as an HA pair (if you use Nexus) / 1 / 2 GB / 3: management console only, not in the data path. The HA is managed by Cisco Nexus itself, not by VMware.
Database support matrix (Update Mgr / SRM / vCenter / Orchestrator / View 5):
• SQL Server 2008 Std Ed (not SP1), 64-bit: No / Yes / Yes / Yes / Needs SP1
• SQL Server 2008 Ent Ed (not SP1), 64-bit: Yes / Yes / Yes / Yes / Needs SP1
• Oracle 10g Enterprise Edition R2, 64-bit: Yes / Yes / Yes / Yes / Yes
• Oracle 11g Standard Edition R1 (not R2), 32-bit: Yes / Yes / Yes / Yes / Yes
Capacity Planner
Version 2.8 does not yet have the full feature set for desktop capacity planning. Wait for the next upgrade.
• But you can use it on a case-by-case basis, to collect data on demanding desktops.
The default paging threshold does not take the server's RAM into account.
• Best practice for the paging threshold is 200 pages/sec/GB. So with 48 GB of RAM you have 48 x 200 = 9,600 pages/sec (see the sketch below).
• The reason is that this paging value provides the lowest-latency access to memory pages.
• You might get high paging while backup jobs run.
Create a project if you need to separate results (e.g. per data center).
Win08 has the firewall on; you need to turn it off using the command line.
To be verified in 2.8: you can't change prime time. It's based on the local time zone.
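The same scaling as a Python sketch (200 pages/sec/GB is the best-practice figure quoted above):

    # Scale the Capacity Planner paging threshold to the server's RAM.
    pages_per_sec_per_gb = 200    # best-practice figure from above
    ram_gb = 48
    threshold = pages_per_sec_per_gb * ram_gb
    print(f"Paging threshold: {threshold} pages/sec")   # 9600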
P2V
Avoid if possible. Best practice is to install from a template (which was optimised for virtual machines).
• Remove unneeded devices after P2V.
MS does not support P2V of an AD domain controller.
Static servers are good candidates for P2V:
• Web servers, print servers.
Servers with retail licences/keys will require Windows reactivation: too many hardware changes.
Resize
• Do a relative CPU comparison.
• MS domain controller: 1 vCPU, 2 GB is enough.
Many Solutions Depend on vCenter Server
(Diagram: Site Recovery Manager, vCloud Director, Operations, Configuration Manager, View Server and Composer, Chargeback, and CapacityIQ all depend on vCenter Server.)
Orchestrator Integrated Workflow Environment
Automation: a way to perform a frequently repeated process without manual intervention.
• Basic building blocks: a shell script, a Perl script, a PowerShell script
• Example: given a list of hostnames, add the ESX hosts to VC (see the sketch below).
Orchestration: a way to manage multiple automated processes across and among heterogeneous systems.
• Example: add ESX hosts from a list to VC, update the CMDB with the successfully added hosts, then send an email notification.
Example
• If a datastore on a host is more than 95% utilised, open a change-control ticket, then perform Storage vMotion and send an email notification
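A hedged sketch of the "automation building block" example, using pyVmomi to add ESX hosts from a list to vCenter. The vCenter address, credentials and cluster name are placeholders; error handling and certificate validation are omitted. An orchestrator would wrap a step like this together with the CMDB update and email notification.

```python
import ssl
from pyVim.connect import SmartConnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; validate certs in production
si = SmartConnect(host="vc.example.com", user="admin", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

# Find the target cluster by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "IT-Cluster")

for hostname in ["esx01.example.com", "esx02.example.com"]:
    spec = vim.host.ConnectSpec(hostName=hostname, userName="root",
                                password="secret", force=True)
    WaitForTask(cluster.AddHost_Task(spec=spec, asConnected=True))
```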
vCenter Chargeback Manager deployment options – cont.
For the vCD and VSM data collectors
• Deploy at least 2 data collectors each for vCD and VSM, for high availability
A CBM instance can be installed/upgraded at vCD install/upgrade time, or later
[Diagram: a Chargeback load balancer in front of multiple vCenter Chargeback Servers exposing the vCenter Chargeback web interface; Chargeback data collectors link the vCenter Servers and their vCenter databases to the vCenter Chargeback database.]
VR Framework
[Diagram: primary and secondary sites, each with its own VC, SRM and VRMS, paired at the site level and driven from the SRM UI. On the primary site, a VR filter on each ESX host intercepts VM writes and the VR Service ships them to the VR Server at the secondary site, which writes to the target ESX hosts through the NFC Service.]
VR = vSphere Replication Server; VRMS = vSphere Replication Management System
SRM Architecture with vSphere Replication
[Diagram: Protected Site and Recovery Site, each with a vSphere Client (SRM plug-in), vCenter Server, SRM Server and vRMS. vRAs on the protected site's ESX hosts replicate VMFS storage to a vRS at the recovery site, which writes to the recovery site's storage.]
Service Provider (DRaaS)
[Diagram: Customer A and Customer B each run vCenter, SRM Server and vRMS, with vRAs on their ESX hosts replicating from local storage (NFS or VMFS) to vRSs at the DRaaS provider, which runs its own vCenter, SRM Server, vRMS, ESX hosts and storage.]
Branch Office
[Diagram: a Central Office with SRM Server, vCenter, vRMS and a vRS; Remote Site A and Remote Site B each have ESX hosts with vRAs replicating their local storage (NFS/VMFS) to the Central Office. An annotation on the original slide asks "Why is … talking to this VRMS?"]
Decision Trees
Develop decision trees that are tailored to the organisation. Below are two examples.
vSphere Replication Performance
One vSphere Replication "replication server" appliance can process up to 1 Gbps of sustained throughput, using approximately 95% of one vCPU.
• 1 Gbps is much larger than most WAN bandwidths
For a VM protected by VR, the impact on application performance is a 2 - 6% throughput loss
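A quick sanity check built purely on the figures above (the 1 Gbps per-appliance ceiling is the number quoted on this slide; the workloads are illustrative):

```python
import math

def vr_appliances(total_replication_mbps, per_appliance_mbps=1000):
    """VR appliances needed to sustain the given aggregate replication rate."""
    return max(1, math.ceil(total_replication_mbps / per_appliance_mbps))

print(vr_appliances(300))   # 1 -- a 300 Mbps WAN fits well inside one appliance
print(vr_appliances(2500))  # 3
```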
MS SQL Server 2008: Licensing
Always refer to the official statement on the vendor's web site.
• Emails, spoken words or SMS from staff (e.g. a Sales Manager or SE) are not legally binding
Licensing a portion of the physical processors
• If you choose not to license all of the physical processors, you will need to know the number of virtual processors supporting each virtual OSE (data point A) and the number of cores per physical processor/socket (data point B).
• Typically, each virtual processor is the equivalent of one core.
vSphere 4.1 introduced multi-core vCPUs. Will you save more money? Check with your MS reseller and official MS documents.
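A sketch of the per-OSE arithmetic described above; this is my reading of the data point A / data point B rule, so confirm the actual licence counting with your MS reseller:

```python
import math

def licenses_for_voses(vcpus_per_vose, cores_per_socket):
    """Data point A: vCPUs per virtual OSE. Data point B: cores per socket.
    Each virtual processor counts as one core, so each OSE needs
    ceil(A / B) processor licences."""
    return sum(math.ceil(a / cores_per_socket) for a in vcpus_per_vose)

# Three VMs with 4, 2 and 8 vCPUs on quad-core sockets: 1 + 1 + 2 = 4 licences
print(licenses_for_voses([4, 2, 8], cores_per_socket=4))
```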
SQL Server 2008 R2
Get the Express from http://www.microsoft.com/express/Database/
In most cases, the Standard edition will be sufficient.
vCenter 4.1 and Update Manager 4.1 do not support the Express edition.
• Hopefully Update 1 will?
Windows Support
http://www.windowsservercatalog.com/default.aspx
Interesting: it is the other way around. vSphere 4.1 passed the certification for Win08 R2, so Microsoft supports Win03 too.
It is version specific. Check for vSphere 5.
SQL Server: General Best Practices
Follow Microsoft Best Practices for SQL Server deployments
Defrag SQL Database(s) – http://support.microsoft.com/kb/943345
Preferably 4-vCPU, 8+GB RAM for medium/larger deployments
Design back-end to support required workload (IOPS)
Monitor database & log disks: disk reads/writes and disk queues
Separate Data, Log, TempDB etc. I/O
Use dual Fibre Channel paths to storage
• Not possible with a vmdk
Use RAID 5 for the database & RAID 1 for logs in read-intensive deployments
Use RAID 10 for the database & RAID 1 for logs in larger deployments
SQL 2005 TempDB (needs updating for 2008)
• Move TempDB files to a dedicated LUN
• Use RAID 10
• # of TempDB files = # of CPU cores (consolidation)
• All TempDB files should be equal in size
• Pre-allocate TempDB space to accommodate the expected workload
• Set the file growth increment large enough to minimise TempDB expansions; Microsoft recommends setting the TempDB files' FILEGROWTH increment to 10%
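To make the TempDB checklist concrete, here is a sketch that emits the ALTER DATABASE statements for one equally sized data file per core. The path and sizes are illustrative assumptions, and it assumes the default tempdev file (file 1) already exists at the target size:

```python
def tempdb_layout(cores, size_mb, growth_pct=10, path="T:\\TempDB\\"):
    """Emit T-SQL adding one equally sized TempDB data file per CPU core."""
    stmts = []
    for i in range(2, cores + 1):  # file 1 (tempdev) already exists
        stmts.append(
            f"ALTER DATABASE tempdb ADD FILE (NAME = tempdev{i}, "
            f"FILENAME = '{path}tempdb{i}.ndf', "
            f"SIZE = {size_mb}MB, FILEGROWTH = {growth_pct}%);")
    return "\n".join(stmts)

print(tempdb_layout(cores=4, size_mb=2048))
```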
What is SQL Database Mirroring?
Database-level replication over IP; no shared storage requirement
Same advantages as failover clustering (service availability, patching, etc.)
At least two copies of the data, giving protection from data corruption (unlike failover clustering)
Automatic failover for supported applications (a DNS alias is required for legacy ones)
Works with SRM too; VMs recover according to the SRM recovery plan
VMware HA with Database Mirroring for Faster Recovery
Highlights:
• Can use Standard Windows and SQL Server editions
• Does not require Microsoft clustering
• Protection against HW/SW failures and DB corruption
• Storage flexibility (FC, iSCSI, NFS)
• RTO of a few seconds (High Safety mode)
• vMotion, DRS, and HA are fully supported!
Note:
• Must use High Safety Mode for automatic failover
• Client applications must be Mirror-aware or use a DNS alias
MS SharePoint 2010
Go for 1 VM = 1 Role
Java Application
RAM best practice
• Size the virtual machine’s memory to leave adequate space
• For the Java heap
• For the other memory demands of the Java Virtual Machine code
• For any other concurrently executing process that needs memory from the same guest operating system
• To prevent swapping in the guest OS
• Do not reserve 100% of RAM unless your HA cluster admission control is not based on host failures.
• A full reservation impacts the HA slot size
• Consider VMware vFabric as it takes advantage of vSphere.
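The sizing rule in the list above boils down to simple addition; a sketch with illustrative overhead figures (the default values are my assumptions, not VMware numbers):

```python
def java_vm_memory_mb(heap_mb, jvm_native_mb=512, other_procs_mb=512,
                      guest_os_mb=1024):
    """VM memory = Java heap + JVM code/native memory + other processes
    + guest OS headroom, so the guest never swaps."""
    return heap_mb + jvm_native_mb + other_procs_mb + guest_os_mb

print(java_vm_memory_mb(4096))  # 6144 MB for a 4 GB heap under these assumptions
```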
Others
• Use the Java features for lower resolution timing as supplied by your JVM (Windows/Sun JVM example: -XX:+ForceTimeHighResolution)
• Use as few virtual CPUs as are practical for your application
• Avoid using /pmtimer in boot.ini for Windows with SMP HAL
SAP
No new benchmark data on Xeon 5600.
• Need to check latest Intel data.
Regarding the vSphere benchmark.
• It’s a standard SAP SD 2-tier benchmark. In real life, we should split DB and CI instance, hence cater for more users
• vSphere 4.0, not 4.1
• SLES 10 with MaxDB
• Xeon 5570, not 5680 or Xeon 7500 series.
VM sizes for the benchmark: 4 vCPU and 8 vCPU (results in the table below, after the sizing sketch)
• SAP ERP 6.0 (Unicode) with Enhancement Package 4
• Around 1500 SAPS per core
• Virtual runs at 93% to 95% of native performance. For sizing, we can take 90% of the physical result.
• Older UNIX servers (2006 - 2007) are good candidates for migration to x64 due to low SAPS per core.
Central Instance can be considered for FT.
• 1 vCPU is enough for most cases
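A sketch of the sizing arithmetic, using the ~1500 SAPS per core and "take 90% of the physical result" figures from this slide; the workload number is illustrative:

```python
import math

def vcpus_for_saps(required_saps, saps_per_core=1500, virtual_factor=0.90):
    """vCPUs needed for a SAPS requirement on a virtualised x64 host."""
    return math.ceil(required_saps / (saps_per_core * virtual_factor))

print(vcpus_for_saps(6250))  # 5 vCPUs: the benchmark's 4-vCPU load plus headroom
```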
                           | 4 vCPU | 8 vCPU
SD Users                   | 1144   | 2056
Response Time (s)          | 0.97   | 0.98
SAPS                       | 6250   | 11230
VM CPU Utilization         | 98%    | 97%
ESX Server CPU Utilization | <30%   | <80%

SAP 3-Tier SD Benchmark
MS AD
Good candidate.
• 1 vCPU and 2 GB RAM are sufficient. Use the UP HAL.
• 100,000 users require up to 2.75 GB of memory to cache the directory (x86)
• 3 million users require up to 32 GB of memory to cache the entire directory (x64)
• Disk requirements are rather small
• Disk 2 (D:) for the database: around 16 GB, or greater for larger directories
• Disk 3 (L:) for log files: around 25% of the database LUN size
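A linear interpolation between the two cache data points above gives a rough RAM estimate for intermediate directory sizes; the linear fit is my assumption, not a Microsoft formula:

```python
def ad_cache_gb(users):
    """Interpolate directory-cache RAM between 100k users (2.75 GB, x86)
    and 3M users (32 GB, x64)."""
    return 2.75 + (users - 100_000) * (32 - 2.75) / (3_000_000 - 100_000)

print(round(ad_cache_gb(500_000), 1))  # ~6.8 GB for 500k users
```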
Changes in MS AD design once all AD servers are virtualised
• A VM is not a reliable time source; time drift may happen inside a VM.
• Instead of synchronising with the Forest PDC emulator or the "parent" AD, synchronise with an internal NTP server.
Best practices
• Set the VM to auto boot.
• Boot Order
• vShield VM
• AD
• vCenter DB
• vCenter App
• Regularly monitor Active Directory replication
• Perform regular system state backups as these are still very important to your recovery plan
MS Exchange
Exchange has become leaner and more scalable:

               | Exchange 2003         | Exchange 2007        | Exchange 2010
Windows        | 32-bit                | 64-bit               | 64-bit
Database cache | 900 MB                | 32+ GB               |
Block size     | 4 KB                  | 8 KB                 | 32 KB
I/O profile    | High read/write ratio | 1:1 read/write ratio | I/O pattern optimization
Disk I/O       |                       | 70% reduction        | Further 50% reduction
Building block CPU and RAM sizing for a 150 sent/received profile
• http://technet.microsoft.com/en-us/library/ee712771.aspx

Building Block | Profile                 | Megacycle Requirement | vCPU           | Cache Requirement | Total Memory Size
1000 mailboxes | 150 sent/received daily | 3,000                 | 2 (1.3 actual) | 9 GB              | 16 GB
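A sketch of the building-block arithmetic in the table above. The per-core megacycle rating is an assumption derived from the "2 vCPU (1.3 actual)" row; plug in your own CPU's rating:

```python
import math

def mailbox_vcpus(mailboxes, megacycles_per_mailbox=3.0, core_megacycles=2300):
    """vCPUs for an Exchange 2010 mailbox role at 150 msgs sent/received/day.
    1000 mailboxes ~ 3,000 megacycles per the building block above."""
    actual = mailboxes * megacycles_per_mailbox / core_megacycles
    return math.ceil(actual), round(actual, 1)

print(mailbox_vcpus(1000))  # (2, 1.3) -- matches the 2 vCPU (1.3 actual) row
```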
Database Availability Group (DAG)
• The DAG feature in Exchange 2010 necessitates a different approach to sizing the Mailbox Server role, forcing the administrator to account for both active and passive mailboxes.
• Mailbox Servers that are members of a DAG can host one or more passive databases in addition to any active databases for which they may be responsible.
• Combining DAG with hypervisor-level HA is not supported by MS (next slide).
VMware HA + DAGs (no MS support)
Protects from hardware and application failure
• Immediate failover (~3 to 5 secs)
• HA decreases the time the database is in an 'unprotected state'
No passive servers.
Windows Enterprise edition.
Exchange Standard or Enterprise editions
Complex configuration and capacity planning
2x or more storage needed
Not officially supported by Microsoft
Realtime Applications
Overall: Extremely Latency Sensitive
• All apps are somewhat latency sensitive
• RT apps break with extra latency
“Hard Realtime Systems”
• Financial trading systems
• Pacemakers
“Soft Realtime Systems”
• Telecom: Voice over IP
• Technically challenging, but possible. Mitel and Cisco both provide official support. Needs 100% reservation.
• Not life-or-death risky
Financial Desktop Apps (need hardware PCoIP)
• Market News
• Live Video
• Stock Quotes
• Portfolio Updates
File Server
Why virtualise?
• Cheaper
• Simpler.
Why not virtualise?
• You already have an NFS server
• You don't want an additional layer
Upgrade to vSphere 5
Upgrade Best Practices
Turn Upgrade into Migrate
• Much lower risk: you keep the ability to roll back, and the project is much simpler.
• Fewer stages: 3 stages become 1
• Upgrade + new features + re-architecture in one clean stage.
• Faster overall project
• Need to do server tech refresh for older ESXi
Think of both data centers
• vCenter 5 can't run in Linked Mode with vCenter 4.
Involve App Team
• Successful upgrade should result in faster performance
Involve Network and Storage team
• Their cooperation is required to take advantage of vSphere 5
Compare Before and After
• ... and document your success!
110
Confidential
Migrate: Overall Approach
Document the Business Drivers and Technical Goals
• Upgrade is not simple, and you're not doing it for fun
• If you are going to support larger VMs, you might need to change servers
Check compatibility
• Array to ESXi 5
• Is it supported?
• You may need a firmware upgrade to take advantage of the new vStorage APIs
• Backup software to vCenter 5
• Products that integrate with vCenter 5
• VMware "integration" products: SRM, View, vCloud Director, vShield, vCenter Heartbeat
• Partner integration products: TrendMicro DS, Cisco Nexus
• VMware management products, partner management products
• All these products should be upgraded first
Assuming all of the above is compatible, proceed to the next step
Read the Upgrade Guide
Plan and Design the new architecture
• Based on vSphere 5 + SRM 5 + vShield 5 + others
• Decide which architectural changes you are going to implement. Examples:
• vSwitch to vDS?
• Datastore Cluster?
• Auto-deploy?
• vCenter appliance? Note its limitations (View, VCM, Linked Mode, etc.)
• What improvements are you implementing? Examples:
• Datastore clean up or consolidation.
• SAN: fabric zoning, multi-pathing, 8 Gb, FCoE
• Chargeback? This will impact your design
Migrate: Overall Approach
Upgrade vCenter
Create the first ESXi cluster
• Start with IT cluster
Migrate the first 4.x cluster into vCenter 5
• 1 cluster at a time.
• Follow the VMs' scheduled downtime.
• Capture 'before' performance, for comparison or proof.
• Back up each VM, then migrate it.
• Once the last VM is migrated, the hosts are freed for reuse or decommissioning.
Repeat until the last cluster is migrated
Upgrade VMs to the latest hardware version and upgrade VMware Tools; a sketch follows.
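The last step can itself be automated. A hedged pyVmomi sketch (Tools first, then virtual hardware; "vmx-08" is the vSphere 5 hardware version, and power-state handling and error checking are omitted):

```python
from pyVim.task import WaitForTask

def upgrade_vm(vm):
    """vm is a pyVmomi vim.VirtualMachine already looked up in vCenter."""
    WaitForTask(vm.UpgradeTools_Task())               # VM must be powered on
    # ...power the VM off before the hardware upgrade, then:
    WaitForTask(vm.UpgradeVM_Task(version="vmx-08"))  # latest HW version on vSphere 5
```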
New features that impact design
New features with major design impact
• Storage Cluster
• Auto Deploy
• You need infrastructure to support it
• vCenter appliance
• VMFS-5
• Larger datastores, so your datastore strategy might change to a "fewer but larger" one.
Other new features can wait until after the upgrade.
• For example, Network I/O Control can be turned on after the upgrade.
Over Time The DMZ Evolved
Increased technological and operational complexity
[Timeline, 1995 to 2005: simple security zones give way to UTM as the number of systems and complexity increase over time. VMworld session SEC 1880.]
Design Consideration for DMZ Zone
5-dimensional decision model
SEC 1880
vDMZ Operations is different
Physical DMZ Operations
• Network, network-security & Unix know-how only
• Disparate silos
• Manual operations
• No integration into "internal ops"
Virtual DMZ Operations
• Highly dynamic & agile
• Additional systems (vSphere, Windows)
• Additional hardware (blades, converged networking)
• Server sprawl inside the DMZ
• Needs VMware, Windows, hardware and application know-how
• Needs security automation
• Needs organizational integration
DMZ Operations covers:
• Maintenance: upgrading, updating and troubleshooting
• Service changes: changing existing services
• Innovation: introduction of new services
• Monitoring: keeping things "in the green" & "secure"
Chargeback
[Diagram: the Chargeback Server talks JDBC to the Chargeback database, the vCenter database and the vCloud Director database; the vCloud data collector talks REST to vCloud Director, the VSM data collector talks REST to vShield Manager, and the Chargeback data collector talks JDBC to the vCenter database.]
Automation impacts 8 areas of IT Excellence
Service Design
• Supplier Management
• Service Level Management
• Service Catalog Management
• Availability Management
Continual Process Improvement
Transformation Planning
Organization and Skill Development
Life Cycle Management
Systems Management
Capacity Management
Financial Management
Configuration Management
Security Management
Internal Cloud Maturity Model
Layer | Technologically proficient | Operationally ready | Application-centric | Service-oriented | Cloud-enabled
Governance | Include virtualization in software procurement | Update procurement and change management | Update audit/accounting practices | Define HIaaS standard models | Plan IA policy requirements
Service automation | Assess and deploy lab automation | Automate VM provisioning | Automate application provisioning | Automate service provisioning | Automate cloud bursting
Service management | Define service tiers | Implement service pools | Implement showback, update data protection | Implement or update service catalog | Define IA service management requirements
Cloud infrastructure management | Define standard templates | Deploy essential management services | Enforce QoS | Deploy virtual infrastructure appliances | Deploy virtual datacenters
HIaaS infrastructure | Consolidate physical to virtual | Deploy HA services, tier-3 apps | Deploy load balancing, tier-2 apps | Optimize for tier-1 apps, multi-tenants | Optimize for cloud portability