Brian Guarnieri SAN TOPOLOGY 101 - Fort Worth SQL Server Users

D.A.R.Y.L. – Data Analyzing Robot Youth Lifeform: can fly jets, eats Cheetos, says "Memowee," runs on the x86 instruction set and RAM. Not sure?
S.A.N.
EVERYTHING THEY
TOLD ME
ABOUT MY SAN IS A LIE!
SCSI
Small Computer System Interface (SCSI, /ˈskʌzi/ SKUZ-ee)[1] is a set of
standards for physically connecting and transferring data between
computers and peripheral devices. The SCSI standards
define commands, protocols, and electrical and optical interfaces. SCSI
is most commonly used for hard disks and tape drives, but it can
connect a wide range of other devices, including scanners
and CD drives, although not all controllers can handle all devices. The
SCSI standard defines command sets for specific peripheral device
types; the presence of "unknown" as one of these types means that in
theory it can be used as an interface to almost any device, but the
standard is highly pragmatic and addressed toward commercial
requirements.
Common Flavors of SCSI
SCSI-1
SCSI-2
Fast SCSI
Fast Wide SCSI
Ultra SCSI
Ultra Wide SCSI
Ultra2 SCSI
Ultra2 Wide SCSI
Ultra3 SCSI
Ultra-320 SCSI
Ultra-640 SCSI
FILE ACCESS PROTOCOLS (LAYER 7)
NFS (Network File System) – Standard originally developed by Sun in 1984 for remote server file access, i.e., mounting a remote drive.
NFSv2 – 1989. Operated predominantly on UDP. Only the first 2 GB of a file could be read.
NFSv3 – 1995. Support for 64-bit file offsets (can handle files larger than 2 GB) and async writes to the server; Sun officially added TCP as a transport for data.
NFSv4 – 2000. Protocol support for clustered server deployments, influenced by AFS and CIFS; mandates strong security and transport improvements.
NFSv4.1 – Adds support for parallelism in data distribution across servers.
SMB (Server Message Block) & CIFS (Common Internet File System) – Originally developed
by IBM. Microsoft picked up the protocol with its implementation of LAN Manager for
OS/2, co-developed with 3Com.
1. Was integrated into Windows for Workgroups in 1992.
2. SMB morphed into CIFS with direct connections to TCP port 139.
3. "Samba" is the SMB implementation used to connect to (and serve from) Unix servers.
4. CIFS added NTLM and NTLMv2 support for locking, stronger encryption, and later Active Directory support.
5. Initially NetBIOS was used for transmission, but later versions became less and less chatty.
SEE ALSO: WebDAV for further file sharing protocols.
DATA TRANSMISSION PROTOCOLS

Fibre Channel
  Advantages: Das uber fast. Multiple Gb/s and rising!
  Definition: Communication protocol used to move data to and from computer storage devices such as hard drives and tape drives. Runs over fiber or Ethernet media (Fibre Channel over Ethernet connects to a storage medium via the Fibre Channel protocol). Topologies: Arbitrated Loop, Point-to-Point, and Switched Fabric.

iSCSI host adapter / RAID controller
  Advantages: Good for long distances. Geo-located SANs.
  Definition: Small Computer System Interface, the old-school standard, carried over IP (IP-based SCSI). CPU independent.

e-SATA (external Serial ATA disk enclosures)
  Advantages: Cheap.
  Definition: Attachment for external hard drives.

InfiniBand
  Advantages: High throughput, low latency, quality of service, and failover; designed to be scalable.
  Definition: Switched-fabric link used in high-end data centers. Low network latency. Superset of the Virtual Interface Architecture. (Whatever that is?)

SAS (Serial Attached SCSI)
  Advantages: Backward compatible with SATA.
  Definition: Point-to-point serial protocol, the successor to the parallel SCSI of the '80s.

Remember: Fiber = fiber optics, Fibre = the protocol.
Engineers get bored, so they have to make up
silliness so they'll have something to whine about.
THE SAN
THE HBA
HBA (MISC. HOST BUS ADAPTERS)
(Pictured: external SCSI bus connectors, a QLogic HBA, a Cisco SCSI HBA, and an ISA SCSI HBA circa the early '90s.)
WHAT IS AN HBA?
A host bus adapter (HBA) is a component that connects a host
system (the computer) to other network and storage devices via
an external port.
FOR THIS CONVERSATION ONLY:
Today, the term host bus adapter (HBA) is most often used
to refer to a Fibre Channel interface card. Fibre Channel
HBAs are available for all major open systems, computer
architectures, and buses, including PCI and SBus (obsolete
today). Each HBA has a unique World Wide Name (WWN),
which is similar to an Ethernet MAC address in that it uses
an OUI assigned by the IEEE. However, WWNs are longer (8
bytes). There are two types of WWNs on an HBA: a node
WWN (WWNN), which is shared by all ports on a host bus
adapter, and a port WWN (WWPN), which is unique to
each port. There are HBA models of different speeds:
1 Gbit/s, 2 Gbit/s, 4 Gbit/s, 8 Gbit/s, 10 Gbit/s, and 20 Gbit/s.
The major Fibre Channel HBA manufacturers
are QLogic and Emulex. As of mid-2009, these vendors
shared approximately 90% of the market.[1][2] Other
manufacturers include Agilent, ATTO, Brocade, and LSI.
HBA is also sometimes interpreted as "High Bandwidth
Adapter" in the case of Fibre Channel controllers.
OTHER TYPES OF HBA
eSATA
InfiniBand
ATA
SAS
SATA
SCSI
HBA = HARDWARE
TRANSLATOR or
DISK CONTROLLER
MPIO DRIVERS & YOUR HBA:
The Microsoft Multipath I/O (MPIO) framework provides support for
multiple data paths to storage to improve fault tolerance of
connection to storage, and may in some cases provide greater
aggregate throughput by using multiple paths at the same time.
• Dynamic load balancing
• Traffic shaping
• Automatic path management
• Dynamic reconfiguration
Included in the OS since Windows Server 2008; Windows Server 2003 and 2000 required a
download and configuration.
• Microsoft’s full discussion is at
http://technet.microsoft.com/en-us/library/ee619734(v=WS.10).aspx
•
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=9787
•
http://download.microsoft.com/download/3/0/4/304083f1-11e7-44d9-92b92f3cdbf01048/mpio.doc
FIBER – It's GLASS
Lots of breaky stuff. Lots of connectors and
types of cables out there, for everything
from industrial lasers to transatlantic
cables.
http://en.wikipedia.org/wiki/Optical_fiber_cable
http://en.wikipedia.org/wiki/Optical_fiber_connector
FIBER CONFIGURATIONS
Arbitrated Loop:
Data has to pass through every
node on the loop to travel
between nodes, so the end of the
loop can be slow. Mid-2000s;
now less often used. Think TOKEN
RING.
Switched Fabric:
Fibre Channel used in a SWITCHED
environment, similar to switching in
TCP/IP networks. Data can be shared
through switches from multiple end
sources. Used to truly create a
Storage Area Network. HBAs can
have redundant cabling.
Point to Point: Everything old is new again.
Direct-attached storage devices like drive cabinets
or RAM SANs. Nice because the device becomes
dedicated to servers with high performance needs.
THE DISK (SUBSYSTEM, aka JBOD)
YEAH, there are lots of disks. YEAH!
JUST A BUNCH OF DISKS
DISK SUBSYSTEMS BY LEVEL OF COMPLEXITY
1. JBOD: Just a Bunch Of Disks
2. RAID ARRAYS (SEE NEXT SLIDE)
3. Intelligent disk subsystems
   1. Instant copies
   2. Remote mirroring & consistency groups (iSCSI is a great candidate protocol.) –
      Remote copying of data either synchronously or asynchronously.
   3. LUN masking (Logical Unit Number)
      1. Port based: the poor man's masking, limiting certain ports to specific servers.
      2. Full LUN masking means presenting certain disks to the server so that only
         that server has access to a specific stripe or set of disks.
Don't confuse LUN masking with ZONING – SEE SAN VOLUME CONTROLLER:
Many devices and nodes can be attached to a SAN. When data is stored in a single cloud, or
storage entity, it is important to control which hosts have access to specific devices. Zoning
controls access from one node to another. Zoning lets you isolate a single server to a group
of storage devices or a single storage device, or associate a grouping of multiple servers
with one or more storage devices, as might be needed in a server cluster deployment.
Zoning is implemented at the hardware level (by using the capabilities of Fibre Channel
switches) and can usually be done either on a port basis (hard zoning) or on a World-Wide
Name (WWN) basis (soft zoning). WWNs are 64-bit identifiers for devices or ports.
http://technet.microsoft.com/en-us/library/cc758640(v=ws.10).aspx
R.A.I.D.
NEW RAID Redundant Array of Independent Disks,
OLD SCHOOL: Redundant Array of Inexpensive Disks
A spanned volume is a formatted partition in which data is stored on more
than one hard disk, yet appears as one volume.
RAID 0+1: Even number of drives creates a second striped set to mirror a
primary striped set. The array continues to operate with one or more drives
failed in the same mirror set, but if drives fail on both sides of the mirror the
data on the RAID system is lost.
RAID 1+0: (a.k.a. RAID 10) mirrored sets in a striped set (minimum four drives;
even number of drives) provides fault tolerance and improved performance
but increases complexity.
RAID 5+3: mirrored striped set with distributed parity (some manufacturers
label this as RAID 53).
My Experience: SANs combine striping with hybrid RAID. Each manufacturer
is different, and some older SANs use more traditional RAID 1, 5, and 10 –
for example, an HP SAN I used back in the Northeast.
What is a SAN VOLUME CONTROLLER?
Indirection or mapping from virtual LUN to physical LUN : Servers access SVC as if it were a storage controller. The SCSI LUNs they see
represent virtual disks (volumes) allocated in SVC from a pool of storage made up from one or more managed disks (MDisks). A managed disk is simply
a storage LUN provided by one of the storage controllers that SVC is virtualizing.
Data migration : SVC can move volumes from MDisk group to MDisk group, whilst maintaining I/O access to the data. MDisk groups can be shrunk
or expanded by removing or adding hardware LUNs, while maintaining I/O access to the data. Both features can be used for seamless hardware
migration. Migration from an old SVC model to the most recent model is also seamless and implies no copying of data.
Importing existing LUNs via a feature called Image Mode: "Image mode" is a one-to-one representation of an MDisk (managed
LUN) that contains existing client data; such an MDisk can be seamlessly imported into or removed from an SVC cluster.
Fast-write cache: Writes from hosts are acknowledged once they have been committed into the SVC mirrored cache, but prior to being destaged
to the underlying storage controllers. Data is protected by being replicated to the other node in an I/O group (node pair). Cache size depends on
the model of SVC used. Fast-write cache is also used to increase performance in midrange storage configurations.
Auto tiering (Easy Tier): SVC automatically selects the best storage hardware for each chunk of data, according to its access patterns. Cache-unfriendly "hot" data is dynamically moved to solid state drives (SSDs), whereas cache-friendly "hot" data and any "cold" data is moved to economical spinning
disks. Makes use of HOT ZONES on disks by moving data to the outer (faster I/O) or inner (slower I/O) sections of spindles.
Solid state drive SSD support: SVC can use any supported external SSD storage device or provide its own internal SSD slots, up to 32 per cluster. Easy
Tiering is automatically active when mixing SSDs with spinning disks in hybrid MDisk groups.
Space-efficient features: LUN capacity is only used when new data is written to a LUN. Also known as Thin Provisioning. Data blocks that equal zero
are not physically allocated, unless previous nonzero data exists. Thin provisioning is typically combined with the FlashCopy features detailed
below to provide space-efficient snapshots.
Virtual Disk Mirroring: Provides the ability to make two copies of a LUN, implicitly on different storage controllers
Stretched Cluster, also called Split IOgroup: A geographically distributed cluster layout leveraging the virtual disk mirroring feature across
datacenters within 300 km distance. A stretched cluster presents one logical storage layer over synchronous distances for increased high availability.
Unlike in classical mirroring, logical LUNs are writable on both sides (tandem) at the same time, removing the need for "failover", "role switch", or "site
switch". The feature can be combined with Live Partition Mobility or VMotion to avoid any data transport (storage mobility or storage VMotion). Each
side's SVC nodes also have access to the other side's physical storage hardware, removing the need for data rebuilds in case of simple node failures.
Licensed IBM SVC FEATURES:
Licensed Features
The payment for base license is per TB of MDisks or per number of physical disk drives in the
underlying layer. There are some optional features, separately licensed per TB:[1]
Metro Mirror - synchronous remote replication
This allows a remote disaster recovery site at a distance of up to about 300km[5]
Global Mirror - asynchronous remote replication
This allows a remote disaster recovery site at a distance of thousands of kilometres. Each Global Mirror
relationship can be configured for high latency / low bandwidth or for high latency / high bandwidth connectivity,
the latter allowing a consistent recovery point objective RPO below 1 sec.
FlashCopy (FC)
This is used to create a disk snapshot for backup, or application testing of a single volume. Snapshots require only
the "delta" capacity unless created with full-provisioned target volumes. FlashCopy comes in three flavours:
Snapshot, Clone, Backup volume. All are based on optimized copy-on-write technology, and may or may not
remain linked to their source volume.
One source volume can have up to 256 simultaneous targets. Targets can be made incremental, and cascaded
tree like dependency structures can be constructed. Targets can be re-applied to their source or any other
appropriate volume, also of different size (e.g. resetting any changes from a resize command).
Copy-on-write is based on a bitmap with a configurable grain size, as opposed to a journal.[1]
SAN 2.0 SOLID STATE
Why should you care about NAND FLASH PRODUCTS?
PLUS SIDE
1. NAND IS FUN TO SAY (stay away from ML-FLASH)
2. It's fast.
3. Can be integrated into a SAN with a low form factor.
4. Did I say it's fast?
5. A single card can take up less space than JBOD. Good for off-site disaster recovery
servers.
6. It's more reliable than disks, and it's easier to tell when a catastrophic failure occurs.
7. Lower power consumption.
8. Bad parts of a drive can be sequestered and disabled when chips fail.
Why should you care about NAND FLASH PRODUCTS?
DOWN SIDE
1. Drives can degrade because of the way they write and erase data.
2. Data can leak out of memory sectors at the "bit level."
3. They're not cheap (but getting cheaper).
4. Cheap write controllers can reuse the same chips, causing unnecessary wear.
5. Not all external drives will interface with a specific switch. OEMs can get picky.
ioDrive/ioDrive Duo Reliability
• No moving parts = Fewer points of failure
• Has internal N+1 redundancy with self-healing technology
• 4 levels of data parity protection and/or verification.
  – 10⁻³⁰ probability of undetected bad data!
• 5 levels of data redundancy.
  – 10⁻²⁰ probability of uncorrectable data!
• 5 methods for ensuring NAND longevity and durability:
  – 10 years for SLC (@ 100% write duty cycle)
  – 8 years for MLC (@ 100% write duty cycle)
• HDD: 5-year component design life span, 350,000 to 600,000-hour MTBF
• ioDrive: 1+ million hours MTBF
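The SLC-vs-MLC endurance gap above comes down to program/erase cycles. A rough back-of-the-envelope estimate can be sketched as follows (the cycle counts, capacity, and write rate below are illustrative assumptions, not Fusion-io's published parameters):

```python
def lifetime_years(capacity_gb: float, pe_cycles: int,
                   write_mb_per_sec: float, write_amplification: float = 1.0) -> float:
    """Rough NAND lifetime: total erasable bytes divided by the sustained write rate."""
    total_bytes = capacity_gb * 1e9 * pe_cycles / write_amplification
    seconds = total_bytes / (write_mb_per_sec * 1e6)
    return seconds / (365 * 24 * 3600)

# Illustrative numbers: ~100k P/E cycles (SLC-class) vs ~10k (MLC-class)
print(lifetime_years(320, 100_000, 500))  # SLC lasts roughly 10x longer...
print(lifetime_years(320, 10_000, 500))   # ...than MLC at the same write rate
```

Wear leveling and write amplification in the controller move these numbers around in practice, which is why vendors quote endurance at a stated write duty cycle.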
FUSION-IO: Will that be One Card or Two?
Texas
Memory
Systems
EMC
FUSION IO
Texas Memory Systems RamSan-70
PCI Express slot differing by
size: Everything old is new
again.
Who remembers AGP, ISA, VESA?
Who Remembers the 80’s?
Some Texas Memory Systems NAND Flash RamSan Products:
RamSan-70:      SLC Flash,      900 GB,   1.2M IOPS,      2.5 GB/s,  full-height, half-length PCIe x8 2.0 card
RamSan-710/810: SLC/eMLC Flash, 5/10 TB,  400K/320K IOPS, 5/4 GB/s,  1U rackmount, 4x IB or FC ports
RamSan-720/820: SLC/eMLC Flash, 12/24 TB, 500K/450K IOPS, 5/4 GB/s,  1U rackmount, 4x IB or FC ports
RamSan-630:     SLC Flash,      10 TB,    1M IOPS,        10 GB/s,   3U rackmount, 10x IB or FC ports
RamSan-710/-810 internals: 1-2 interface modules; 4-20 Flash modules plus 1 "Active Spare"; management control processor; redundant power supplies; redundant fans; N+1 batteries; 1U chassis.
RamSan-720/820 internals: redundant pairs of interface modules, management modules, switch/RAID controllers, and power modules.
NAND FLASH: THE MICROSCOPIC VIEW
(Diagram: voltage gate on top, voltage source and voltage drain on either side, floating gate in between.)
1. Charges are held in the floating gate.
2. Cells are lined up in parallel.
3. When electrons are present in the floating gate, no current flows.
4. Erase blocks are 8 to 32 KB in size.
5. A write operation in any type of flash device can only be performed on an
   empty or erased unit, so in most cases a write operation must be preceded
   by an erase operation.
6. Multi-level cell flash stores more than one bit per memory cell rather
   than a single bit, doubling the capacity of the memory.
ERASE OPERATIONS, "Fowler-Nordheim (F-N) Tunneling":
1. A voltage ranging between -9 V and -12 V is applied to the control gate.
2. A voltage of around 6 V is applied to the source.
3. The electrons in the floating gate are pulled off and transferred to the
   source by quantum tunneling (a tunnel current). Electrons tunnel from the
   floating gate to the source and substrate. The floating gate then carries no
   charge, making it neutral: a bit value of 1 and a state of "empty."
(Diagram: 1. -9 V at the control gate; 2. 6 V at the source; 3. electrons
leave the floating gate via F-N tunneling.)
WRITE OPERATIONS:
1. 7 V is applied to the bit line (drain terminal); bit 0 is stored in the cell.
2. The channel is now turned on, so electrons can flow from the source to the
   drain. Electrons move through the thin oxide layer to the floating gate.
3. The source-drain current is sufficiently high to cause some high-energy
   electrons to jump through the insulating layer onto the floating gate, via a
   process called hot-electron injection.
(Diagram: 1. 7 V at the drain; electrons injected into the floating gate in steps 2 and 3.)
READ OPERATIONS
1. Apply a voltage of around 5 V to the control gate.
2. Apply 1 V to the drain. The state of the memory cell is distinguished by the
   current flowing between the drain and the source.
To read the data, a voltage is applied to the control gate, and the MOSFET
(metal-oxide-semiconductor field-effect transistor) channel will either
conduct or remain insulating, based on the threshold voltage of the cell,
which is in turn controlled by the charge on the floating gate. The current
flow through the MOSFET channel is sensed and forms a binary code,
reproducing the stored data. Floating-gate MOSFETs are known as FGMOS.
Flash Problems and TMS Solutions
Each company handles this in their own way.

Problem: Limited write-erase cycles                    Solution: Wear leveling
Problem: Bit errors                                    Solution: ECC
Problem: Block/plane/device failures                   Solution: Block remapping, RAID, Variable Stripe RAID™
Problem: Disturb errors                                Solution: Voltage and timing adjustments (read, write, erase)
Problem: Erases need big blocks and take a long time   Solution: Overprovisioning
Solid State Solutions: DDR on the go versus FLASH
• 600,000 IOPS*
• Latency = 15 microseconds = 0.000015 s = 15 × 10⁻⁶ s
• Internal NAND Flash for backup; IBM Chipkill for failed chips.
• RAIDed RAM boards
• Redundant power
• One internal controller
• Certified with IBM fabrics/SVCs
The RamSan-420 uses RAID protected Flash memory modules to rapidly back
up the RAM-based data and ensure non-volatility for the system. In Active
Backup mode, the RamSan-420 continuously backs up data to the internal
redundant Flash modules without impacting system performance. The
RamSan-420 can back up or restore the entire 256 GB of data in just three
minutes. Texas Memory Systems’ patented IO2 technology further improves
system availability by making user or application-requested data instantly
accessible after the system is powered on.
Storage
Capacity (Usable): 512 GB
LUNs: 1,024 LUNs
Storage Medium: RAM
Performance
Maximum Bandwidth: 4,500 MB/s
Read IOPS: 600 K IOPS
Read Latency: 15 microseconds
Write IOPS: 600 K IOPS
Write Latency: 15 microseconds
Interfaces
Expansion Slots: 4 slots
Fibre Channel Ports per Card: 2
ports/card
Fibre Channel Speed per Port: 4 Gb/s
Management
Supports Browser Management
Supports SNMP Management
Supports SSH Management
Supports Telnet Management
Power
Typical Power Consumption: 650 W
Mechanical
Dimensions: 7" (4U) x 24"
Form Factor: 4U
Weight: 90 lbs
DDR-RAM BOARD RAM SAN
ADVANTAGES:
1. LUNs can be spanned across multiple devices,
creating multipath processing and load balancing.
2. Load balancing enables a single application to
fully benefit from the RamSan-440's technical
capabilities. Multipathing also enables path
failover so that an application's availability is not
dependent on a single link.
3. Dual HBAs create 8 fiber ports.
RamSan firmware has been tested and is
compatible with nearly all host multipathing
drivers. Texas Memory Systems has developed an
OEM MPIO module for Microsoft Windows and an
OEM MPIO module for IBM AIX 5.3.
http://www.ramsan.com/files/f000245.pdf
RAMSAN 400 Internals
RULE: NEVER TRUST WHAT ANYONE SAYS AND ALWAYS KEEP YOUR
LASER HANDY! – PARANOIA ROLEPLAYING GAME, WEST END GAMES.
OEM SERVER TOOLS ARE YOUR FRIEND!
When starting a new job, make sure you know how to get
into the server and access the OEM server tools. It
makes you look smart for very low-hanging fruit. In a
month they'll call you Scotty because the other DBAs
are scared to touch it.
Know how to access your HBA’s CONFIGURATION TOOLS
In this case search for the executable sansurfer.exe on the C:\*.* drive.
Look Ma’ it’s got Diagnostics
Things to look for: 1. Pirates, 2. Physical Disk IO rates, 3. Cache Utilization (Write %, Read
%), 4. Write Pending %, 5. Storage Controller Utilization (Disk side), 6. Bandwidth
Utilization
boooYEAH!
DELL’s FUSION IO MANAGER For FUSION IO DUO
IO FOR A FILE COPY SESSION
HOW DO I SMACK MY SAN SO IT KNOWS WHO'S BOSS?
HOW DO I KNOW YOU'RE LYING?
Tools I like:
What to focus on?
• LATENCY
• BANDWIDTH
• IOPS
SQLIO
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=20163
SQL IO SIMULATOR:
http://support.microsoft.com/kb/231619
IOMETER
http://sourceforge.net/projects/iometer/
http://www.iometer.org/doc/downloads.html
CrystalDiskMark
Haven't used it, but it's another alternative.
Performance Monitor
(PERFMON)
C:\Windows\System32\perfmon.exe
RESOURCE MONITOR
C:\Windows\System32\perfmon.exe /res
WHAT DO I MEASURE?
1. LATENCY: Perfmon counter: (Writes/Second)^-1
2. IOPS: Input/output operations per second (sequential or random)
   IOPS × Transfer size in bytes = Bytes per second
Remember: not all LUNs are SAN disks! Distinguish local disks from SAN disks!
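The IOPS-to-bandwidth relationship above is just multiplication; a quick sanity-check sketch (the 32,000 IOPS and 8 KB figures here are hypothetical):

```python
def bytes_per_second(iops: float, transfer_size_bytes: int) -> float:
    """Throughput = IOPS x transfer size in bytes."""
    return iops * transfer_size_bytes

# Hypothetical workload: 32,000 IOPS at an 8 KB transfer size
bps = bytes_per_second(32_000, 8 * 1024)
print(bps / (1024 * 1024))  # 250.0 MB/s
```

The same identity works in reverse: divide measured bytes/sec by the transfer size to recover IOPS, which is handy for cross-checking perfmon counters against vendor claims.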
Per-drive counters:
\<Drive:>\Disk Reads/sec
\<Drive:>\Disk Writes/sec
\<Drive:>\Disk Read Bytes/sec
\<Drive:>\Disk Write Bytes/sec
\<Drive:>\Avg. Disk Bytes/Read
\<Drive:>\Avg. Disk Bytes/Write
\<Drive:>\Split IO/Sec

BRENT OZAR'S FAVORITE PERFMON COUNTERS FOR SANS
(These are listed OBJECT first, then COUNTER.)
Memory – Available MBytes
Paging File – % Usage
Physical Disk – Avg. Disk sec/Read
Physical Disk – Avg. Disk sec/Write
Physical Disk – Disk Reads/sec
Physical Disk – Disk Writes/sec
Processor – % Processor Time
SQLServer: Buffer Manager – Buffer cache hit ratio
SQLServer: Buffer Manager – Page life expectancy
SQLServer: General Statistics – User Connections
SQLServer: Memory Manager – Memory Grants Pending
SQLServer: SQL Statistics – Batch Requests/sec
SQLServer: SQL Statistics – Compilations/sec
SQLServer: SQL Statistics – Recompilations/sec
System – Processor Queue Length

The Disk Queue Length counter should be avoided on SANs.
CPU and memory counters can be important for measuring general bottlenecks.
For SAN troubleshooting I like using Activity Monitor to get a qualitative
health check on memory and CPU pressure.
DMVs useful for checking for slowness. Remember some stats are based on
ticks: convert using @@TIMETICKS (microseconds per tick) or use cumulative ticks:
(select cpu_ticks_in_ms, CPU_Ticks from sys.dm_os_sys_info)
Tool: sys.dm_os_wait_stats
  Monitors: PAGEIOLATCH waits | Granularity: SQL Server instance level
Tool: sys.dm_io_virtual_file_stats
  Monitors: latency, number of I/Os | Granularity: hot database files
Tool: sys.dm_exec_query_stats
  Monitors: number of reads (logical or physical), number of writes (logical) | Granularity: query or batch stats
Tool: sys.dm_db_index_usage_stats
  Monitors: number of I/Os and type of access (seek, scan, lookup, write) | Granularity: index or table hits
Tool: sys.dm_db_index_operational_stats
  Monitors: PAGEIOLATCH waits | Granularity: index or table
Tool: sys.dm_os_io_pending_ios
  Monitors: pending I/O requests at any given point in time | Granularity: file (per I/O request)
REMEMBER STATS ARE CUMULATIVE. IF YOU REBOOT OR FAIL OVER YOU
RESET THEM.
How do I use SQLIO to measure performance bottlenecks?
1. SQLIO IS COMPLETELY COMMAND LINE BASED.
2. USE >> IN A BAT FILE TO SAVE OUT STATS.
3. Read the SQLIO.RTF file that comes with it. Very useful.
4. Edit the C:\Program Files (x86)\SQLIO\param.txt file to specify where your test files
are located.
5. Configure your .BAT file to run once, or run all day to gather info.
6. Use SSIS or some app to import the data. Doing it by hand can take a while.
[options] may include any of the following:
-k<R|W>                  kind of IO (R=reads, W=writes)
-t<threads>              number of threads
-s<secs>                 number of seconds to run
-d<drv_A><drv_B>..       use same filename on each drive letter given
-R<drv_A/0>,<drv_B/1>..  raw drive letters/number for I/O
-f<stripe factor>        stripe size in blocks, random, or sequential
-p[I]<cpu affinity>      cpu number for affinity (0 based)(I=ideal)
-a[R[I]]<cpu mask>       cpu mask for (R=roundrobin (I=ideal)) affinity
-o<#outstanding>         depth to use for completion routines
-b<io size(KB)>          IO block size in KB
-i<#IOs/run>             number of IOs per IO run
-m<[C|S]><#sub-blks>     do multi blk IO (C=copy, S=scatter/gather)
-L<[S|P][i|]>            latencies from (S=system, P=processor) timer
-B<[N|Y|H|S]>            set buffering (N=none, Y=all, H=hdwr, S=sfwr)
-S<#blocks>              start I/Os #blocks into file
-v1.1.1                  I/Os runs use same blocks, as in version 1.1.1
-F<paramfile>            read parameters from <paramfile>
CONFIGURING SQLIO FOR TESTING
Edit C:\Program Files (x86)\SQLIO\PARAM.TXT
Add or delete lines as needed.
Format is PATH | number of threads per file | MASK | size of file in MB
c:\sqlio_test.dat 4 0x0 100
d:\sqlio_test.dat 4 0x0 100
CREATING A BAT FILE FOR TESTING
sqlio -kW -s10 -frandom -o8 -b8 -LS -Fparam.txt
sqlio -kW -s360 -frandom -o8 -b64 -LS -Fparam.txt
sqlio -kW -s360 -frandom -o8 -b128 -LS -Fparam.txt
sqlio -kW -s360 -frandom -o8 -b256 -LS -Fparam.txt
sqlio -kW -s360 -fsequential -o8 -b8 -LS -Fparam.txt
sqlio -kW -s360 -fsequential -o8 -b64 -LS -Fparam.txt
sqlio -kW -s360 -fsequential -o8 -b128 -LS -Fparam.txt
sqlio -kW -s360 -fsequential -o8 -b256 -LS -Fparam.txt
Write | 360 seconds | random or sequential | number of outstanding IO requests | IO
block size in KB | which param file to use. You can also use -F<drive>:\testfile.dat
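If you test many block-size combinations, generating the batch file is less error-prone than typing each line (a minimal sketch; the flag spellings follow the examples above):

```python
def sqlio_commands(kinds=("W",), patterns=("random", "sequential"),
                   block_kb=(8, 64, 128, 256), secs=360, outstanding=8):
    """Build one sqlio invocation per kind/pattern/block-size combination."""
    return [
        f"sqlio -k{k} -s{secs} -f{p} -o{outstanding} -b{b} -LS -Fparam.txt"
        for k in kinds for p in patterns for b in block_kb
    ]

# Print the lines, ready to redirect into a .BAT file
for line in sqlio_commands():
    print(line)
```

Add "R" to `kinds` to emit the matching read tests as well.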
READING THE RESULTS:
sqlio -kR -s360 -fsequential -o8 -b8 -LS –Fparam.txt >> C:\SQLIOLOG.TXT
sqlio v1.5.SG
using system counter for latency timings, 2208056 counts per second
parameter file used: param.txt
file U:\testfile.dat with 10 threads (0-9) using mask 0x0 (0)
10 threads reading for 360 secs from file U:\testfile.dat
using 8KB sequential IOs
enabling multiple I/Os per thread with 8 outstanding
using specified size: 10000 MB for file: U:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 32282.80
MBs/sec: 252.20
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 2
Max_Latency(ms): 567
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 58 15 8 5 3 2 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
PARSING THE RESULTS:
PERL: http://sqlblog.com/blogs/linchi_shea/archive/2007/02/21/parse-the-sqlio-exe-output.aspx
POWERSHELL: http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/05/25/parsing-sqlio-output-to-excel-charts-using-regex-in-powershell.aspx
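Beyond the Perl and PowerShell parsers linked above, the cumulative metrics are easy to pull out with a few lines of Python (a minimal sketch; the label spellings follow the sample output shown earlier):

```python
import re

def parse_sqlio(log_text: str) -> dict:
    """Pull the cumulative throughput and latency metrics out of SQLIO output."""
    metrics = {}
    for label in ("IOs/sec", "MBs/sec", "Min_Latency(ms)",
                  "Avg_Latency(ms)", "Max_Latency(ms)"):
        match = re.search(re.escape(label) + r":\s*([\d.]+)", log_text)
        if match:
            metrics[label] = float(match.group(1))
    return metrics

sample = """CUMULATIVE DATA:
IOs/sec: 32282.80
MBs/sec:   252.20
Min_Latency(ms): 0
Avg_Latency(ms): 2
Max_Latency(ms): 567"""
print(parse_sqlio(sample))
```

Point it at the >> log file from your .BAT run and you can dump the results straight into a CSV for SSIS to import.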
IOMETER: Don't be your SAN administrator's "B@#$%^"
BIG BUCKS NO WHAMMIES!!
Standard Files in Directory
LOW HANGING FRUIT
TROUBLESHOOTING 101
And how not to look like an Idiot.
SAN-A-GEDDON!
What do you do when your SAN’s PRIMARY and SECONDARY CONTROLLER GOES?
Never trust the guy servicing your expensive Hardware. Grill him properly. Securely
fastening his hands and legs to limit motion. Inject Sodium Pentothal, and have your DR
SOLUTION READY!
1. Create a temporary Cluster in case of a “Mega Whoops” like when the tech copies
the bad config on the bad card to the good card.
2. Validate and never believe it when they say it’s only going to be 30 Minutes.
3. If you have a DR solution, make sure it's up to date, and possibly perform a
non-failover recovery plan to time the outage.
4. In e-commerce solutions, TIME = MUCHO DINERO! Time is money. Outages are
doubly so. Cost = hourly wage + lost revenue + loss of customer confidence.
5. Remember the guy working on your SAN is being sent out as a Contractor for the
bigger company and that company is most likely lowest bidder.
6. Always have a Mega Whoops plan.
7. Contact your local mafia Don to find out what body disposal options are available if
and when the IBM tech goes “missing” after he brings your enterprise to a
screeching halt.
8. Remember 1:10,000 can happen to you!
I AM A SAN I AM AND DON’T LIKE GREEN EGGS AND HAM
QA: BLAME THE SAN WHEN IT'S THE SAN…….CHECK YOUR SERVER TOOLS FIRST.
The best way I've found to compare a disk on the server
with a virtual disk is by size.
RESOURCE MONITOR IS YOUR FRIEND
SOME TIMES YOUR TOOLS LIE!
•
Remember Trust No one and always keep your laser handy!
DELL LIES….THEY
ALWAYS LIE
THEY LIE! The TEMPDB is on the same HD span as your page file……
NO ES MUY BUENO! Always check that the guy who built it didn't make a mistake.
Notice the drive
says DELL! A dead
giveaway you're not
on the SAN. You
can confirm with
the location ID.
Further Reading:
1. http://www.brentozar.com/sql/sql-server-san-best-practices/
http://technet.microsoft.com/library/Cc966412
2. http://www.hds.com/assets/pdf/best-practices-for-microsoft-sql-server-on-hitachiuniversal-storage-platform-vm.pdf (HITACHI SANS)
SQL CAT (Customer Advisory Team) links:
1. http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-componentspostattachments/00-09-45-2765/Mike_5F00_Ruthruff_5F00_SQLServer_5F00_on_5F00_SAN_5F00_SQLCAT.zip
Books:
Troppens, Ulf, Rainer Erkens, Wolfgang Mueller-Friedt, Rainer Wolafka, and Nils Haustein.
Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI,
InfiniBand and FCoE, Second Edition. John Wiley & Sons, 2009.
W. Curtis Preston. Using SANs and NAS: Help for Storage Administrators. O'Reilly, 2002.
This PRESENTATION IS
BROUGHT TO YOU BY:
An
Angry
Squirrel
DOCTOR STEEL