Future Hardware Platform Technologies
Presented by:
Alexander Sack, Senior Developer
Agenda
State of the Industry
Future Technologies
Multi-Core
SATA
SAS
PCI Express
iSCSI
Virtualization
New SCO Hardware Support & New Hard Drive Types
Nearline Hard Drive Considerations
Future Hardware Support from SCO & Current Projects
Q & A
State of the Industry
32-bit single- or dual-processor servers
I/O bandwidth bound:
Devices are faster than the interconnect or fabric they attach to
Complicated cabling schemes to attach devices
Network and storage separately managed
Platform dependent
Separate servers for separate platforms
Separate platforms for separate applications
Mixed architectural environments
State of the Industry
Where are we heading?
64-bit Multi-Core Architectures
In order to keep pace with Moore’s Law, Intel and AMD are concentrating on parallelism, not clock speed.
The “Serial” Age
Next-generation protocols have moved back to serial links: it’s easier to move one bit faster than several bits simultaneously!
Wired to Wireless
In an age of mobile devices, access to the network must be unfettered!
Networked Storage
Network vs. Storage: What’s really the difference?
A Virtual World
Virtualization is no longer just an enterprise feature on big iron.
State of the Industry
Faster processing, networking and I/O power
Compact design, easy assembly
Designed for power management
Easier access to the network
Heterogeneous environments managed on one platform
Mixed platform and application environments on a single managed piece of hardware!
Future Technologies: Multi-Core
What is it?
A multi-core CPU combines independent processors or cores onto a single silicon chip.
Intel: Distinguishes between logical and physical processors
Logical refers to the Hyper-Threading side; physical means a core.
An Intel Dual-Core processor has two physical processors in the same chip package.
Dual-Core Pentium 4 Xeon chips (“Paxville”) are due out 1Q06 or sooner!
AMD: Uses the concept of logical processor count to refer to multiple cores existing within the same chip package.
Shipping dual-core Opteron and AMD64 (X2) today
Future Technologies: Multi-Core
Just like SMP, when a system boots, the kernel is loaded on one processor or core, called the bootstrap processor (BSP).
The kernel queries the BSP via the CPUID instruction to determine how many logical and physical processors exist.
Based on the number of logical, physical, and licensed processors, the kernel will attempt to initialize each processor.
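As a rough sketch of that query, here is the same CPUID check done from user space with GCC’s <cpuid.h> helper (the kernel performs the equivalent reads during MP initialization; the variable names below are mine):

```c
#include <stdio.h>
#include <cpuid.h>   /* GCC helper for issuing the CPUID instruction */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1 returns the feature flags and, when the HTT bit
     * is set, the logical processor count for this package. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;

    int htt     = (edx >> 28) & 1;     /* EDX bit 28: HTT flag */
    int logical = (ebx >> 16) & 0xff;  /* EBX[23:16]: logical CPUs per package */

    printf("HTT: %d, logical processors per package: %d\n",
           htt, logical ? logical : 1);
    return 0;
}
```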
Future Technologies: Multi-Core
The OS scheduler will typically run processes in a round-robin fashion and due to our ordering scheme, usually select a physical processor before a logical processor.
The kernel relies on the ACPI CA layer if no MPS Tables are detected to route interrupts correctly.
ACPI CA is based on the Intel reference stack
Will be updated to support the newer ACPI 3.0 specification
Interrupts are load-balanced across multiple Local APICs.
The APIC has a limitation of 8 CPUs since it uses 3 bits to address a processor on a separate APIC-specific bus.
The latest Pentium architecture uses the xAPIC, which reuses the system bus to address processors, thus supporting configurations larger than 8 CPUs.
Future Technologies: Multi-Core
Multi-Core processors are licensed the same way as Hyperthreading technology.
SCO is ready to fully support multi-core architectures today!
OSR6 MP1 fully supports current Intel Dual-Core offerings
OSR6 and UW7 must be installed in single processor mode before enabling multi-core technology.
Working with OEM partners to have certifications ready in time for dual-core toward the end of the year.
AMD?
Future Technologies: SATA
What is it?
Serial ATA is the next generation of IDE designed to replace the older parallel interconnect.
Serial ATA is composed of several specifications:
Serial ATA Core Specification
Gives general protocol and physical layer descriptions
Protocol pieces are slowly being folded into the T13 AT Attachment standards.
Initially based on ATA/ATAPI-6 standard
Serial ATA II Specification
An addendum to Serial ATA Core spec
Serial ATA II DOES NOT MEAN 300 MB/s!
Adds additional features to the existing specs
Port Multiplier Specification
Specifies how port multiplier devices are incorporated into a SATA topology
Port Selector Specification
Adds redundant path capabilities to a SATA device
Future Technologies: SATA
Gen I: 150 MB/s
Gen II: 300 MB/s
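Those figures follow directly from the serial line rates once the 8B/10B encoding mentioned later is accounted for: every data byte travels as 10 bits on the wire, so

$$\frac{1.5\ \text{Gbit/s}}{10\ \text{bits/byte}} = 150\ \text{MB/s}, \qquad \frac{3.0\ \text{Gbit/s}}{10\ \text{bits/byte}} = 300\ \text{MB/s}$$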
Serial Protocol means point-to-point topology.
The SATA II specs added some key features:
Native Command Queuing (NCQ)
Similar to SCSI tagged command queuing, with a maximum queue depth of 32
First Party DMA (FPDMA) READ and WRITE
Allows the target to reschedule I/O commands for quicker access times.
Hot-Plug
SATA was designed with native Hot-Plug in mind
Devices can be added to and removed from a port without stopping the bus.
Staggered Spin-Up
Allows devices to be spun up sequentially to reduce peak power requirements
Future Technologies: SATA
How does it work?
Some SATA controllers implement a Shadow Register Block
Remember, SATA devices do not have hardware registers.
Legacy Mode means that the controller is being programmed like an IDE chipset.
Allows legacy drivers to just work
Awkward, as some aspects of ATA cannot be fully emulated, such as PIO mode.
Native SATA chipsets require a driver to send down Frame Information Structures (FIS).
Allows driver to use extended features like NCQ and Hot-Plug
Uses fast and efficient DMA or FPDMA mechanisms for I/O transfers
There are several kinds of FISes:
D2H – Register
H2D – Register
Set Device Bits
DMA Activate
DMA Setup
PIO Setup
DATA
BIST Activate
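To make the FIS list concrete, here is the 20-byte Host-to-Device Register FIS sketched as a C struct, following the published FIS layout (field names are illustrative, not taken from any particular driver):

```c
#include <stdint.h>

/* Host-to-Device Register FIS (FIS type 0x27): what a native SATA
 * driver builds to issue an ATA command.  On real hardware this must
 * be byte-exact (a packed 20-byte structure). */
struct sata_fis_reg_h2d {
    uint8_t fis_type;          /* 0x27 = Register FIS, host to device */
    uint8_t pm_flags;          /* bits 3:0 PM port; bit 7 C: 1 = command */
    uint8_t command;           /* ATA command, e.g. READ FPDMA QUEUED */
    uint8_t features;          /* features register, bits 7:0 */
    uint8_t lba0, lba1, lba2;  /* LBA bits 23:0 */
    uint8_t device;            /* device register */
    uint8_t lba3, lba4, lba5;  /* LBA bits 47:24 */
    uint8_t features_ext;      /* features register, bits 15:8 */
    uint8_t count0, count1;    /* sector count, bits 15:0 */
    uint8_t icc;               /* isochronous command completion */
    uint8_t control;           /* control register */
    uint8_t rsvd[4];           /* reserved, set to zero */
};
```

For the NCQ commands (READ/WRITE FPDMA QUEUED), the transfer length travels in the features fields and the queue tag in the upper bits of count0, per the SATA II definition.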
Future Technologies: SATA
Impact?
Not all controllers are Native SATA
Some controllers just use a SATA PHY to either increase interconnect speed (133 to 150 MB/s) or to reuse existing fab designs
Most hardware SATA RAID cards are really SCSI or SCSI-like
Advanced Host Controller Interface (AHCI) is an Intel sponsored open specification for a native SATA chipset.
Supports the full SATA II feature list
Already has multi-vendor support
OSR6 ships with an AHCI driver
Deploy SATA with confidence!
Hardware SATA RAID is supported by the existing drivers that ship today with OSR6!
OSR6 will be further enhanced to support other chipsets in the future.
SATA will be the primary interconnect for lower cost storage solutions.
Future Technologies: SAS
Serial Attached SCSI (SAS) is the next generation of the venerable SCSI protocol
The SAS specification is maintained by the T10 Technical Committee:
ANSI standard as of 2003
Works on top of the existing SCSI Architecture Model-3 (SAM-3)
Roadmap:
Gen I: 3 Gb/s (shipping now!)
Gen II: 6 Gb/s (2006)
Gen III: 12 Gb/s (2010)
Future Technologies: SAS
SAS has incorporated the best of various storage technologies:
Data frame based on FCP
Leverages existing SATA PHY for compatibility
Supports OOB signals SATA COMRESET, COMINIT, and COMWAKE
Adds COMSAS which is used to distinguish between a SATA and SAS PHY
Both sides must assert COMSAS to establish a SAS link
Scalability with wide ports
Dual-ported by design for high availability
World-Wide Names
8B/10B Encoding Scheme
Spread Spectrum Clocking to address EMI requirements
Future Technologies: SAS
Architecture Overview (SAS-1.1 Section 4.1.1):
A SAS domain contains one or more SAS devices and a service delivery subsystem.
A SAS device contains one or more SAS ports.
A SAS port contains one or more phys.
A service delivery subsystem in a SAS domain may contain expander devices:
Expander devices contain expander ports and a SMP port.
An expander port contains one or more phys.
An expander device may share its phy with the SAS device contained within the expander device.
Future Technologies: SAS
Expanders act as hubs or routers to expand a SAS bus
Support up to 128 ports
Two types of expanders:
Edge Expanders
May not be connected to more than one fanout expander
Only two edge expander device sets in a single SAS domain
Fan-out Expanders
Used to connect multiple edge expanders
No more than one fanout expander per SAS domain
Future Technologies: SAS
Three Main Protocols:
Serial SCSI Protocol (SSP)
1KB max frame length
Full duplex
Frame types:
COMMAND
TASK
XFER_RDY
DATA
RESPONSE
Serial Management Protocol (SMP)
Used to manage routing characteristics of expanders
Not implemented by target SAS devices
SATA Tunneling Protocol (STP)
Used by expanders to offer SATA compatibility
Follows SATA rules for connections, e.g. 8 KB frame size
Future Technologies: SAS
SAS is expected to become mainstream in 2006.
Vendors are starting to ship their SAS solutions now!
Working with partners to prepare for the SAS invasion:
OSR6 is already certified with HP’s P600 SAS controller
LSI working on full support for their MPT and MegaRAID SAS controllers
Working with Adaptec to deliver full HostRAID SAS support by end of year
A Maintenance Pack update will occur to update OSR6/UW7 to handle SAS
Future Technologies: PCI Express
PCI Express is the next generation of PCI
Initially developed by Intel as 3GIO
Handed over to PCI SIG to become an official standard
Ratified as a standard in 2002
Designed to look like PCI in order to smooth transition but overcome a lot of the initial limitations of PCI/PCI-X:
Point-to-point interconnections are faster
Wide parallel bus increases cost
Reference CLK signal validation is slow due to settling times
With one bus transaction in one direction at a time, arbitration for bus ownership slows down system performance
Future Technologies: PCI Express
Power Management
Quality of Service
Isochronous connections (time based)
Hot-Plug and Hot Swap
Multi-hierarchy and advanced peer-to-peer communications
Data Integrity
Error Handling
Process Technology Independence
Built-in standards for electrical compliance
Future Technologies: PCI Express
How does it work?
PCI Express devices functionally overlay on top of PCI/PCI-X devices:
Root Complex = HOST/PCI Express Bridge
Switches = PCI/PCI Bridges
Endpoints = PCI bus master targets
Bridge = PCI Express/PCI Bridge
Basic connection between two devices is a link
A link must support at least one lane; each lane represents a set of differential signal pairs
Links can be aggregated:
x1 250 MB/s
x2 500 MB/s
x4 1000 MB/s
x8 2000 MB/s
x12 3000 MB/s
x16 4000 MB/s
x32 8000 MB/s
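A minimal C sketch that reproduces the table above from first principles: each PCI Express 1.x lane signals at 2.5 Gbit/s, and 8b/10b encoding leaves 250 MB/s of usable bandwidth per lane per direction (function and constant names are mine):

```c
#include <stdio.h>

/* Each PCI Express 1.x lane signals at 2.5 Gbit/s; 8b/10b encoding
 * leaves 8 usable bits per 10 line bits, i.e. 250 MB/s per lane per
 * direction.  Reproduces the aggregated-link table above. */
static unsigned int pcie_link_mbytes(unsigned int lanes)
{
    const unsigned int line_rate_mbit = 2500;    /* 2.5 Gbit/s */
    return lanes * line_rate_mbit * 8 / 10 / 8;  /* MB/s */
}

int main(void)
{
    const unsigned int widths[] = { 1, 2, 4, 8, 12, 16, 32 };
    for (unsigned int i = 0; i < sizeof widths / sizeof widths[0]; i++)
        printf("x%-2u -> %u MB/s\n", widths[i], pcie_link_mbytes(widths[i]));
    return 0;
}
```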
Future Technologies: PCI Express
PCI Express protocol stack:
Transaction Layer
Basic unit of communication is the Transaction Layer Packet (TLP)
Pipelined full split-transaction protocol
Uses credit-based flow control
Optional support for end-to-end data integrity detection
Data Link Layer
Transports TLPs across a link to another component’s Transaction Layer
Exchanges its own Data Link Layer Packets (DLLPs) for link management
Establishes flow control and sequence numbers
Physical Layer
Broken down into two sub-blocks
Logical Sub-Block: uses 8B/10B encoding scheme
Electrical Sub-Block: contains transmitter and receiver
Future Technologies: PCI Express
Impact?
OSR6 already supports most existing PCI Express topologies:
Vendors have certified some PCI Express cards and chipsets
Working with each partner to understand PCI Express roadmap
There are changes necessary to support all of PCI Express:
PCI Express is backwards compatible because it implements PCI/PCI-X address spaces (I/O, Memory, and Configuration)
Add new Message address space support
Native PCI Express uses Message Signaled Interrupts (MSI)
Currently, Root Complexes emulate level-sensitive interrupts to mimic INTx-style interrupts for compatibility
Drivers today will be updated to support future PCI Express versions
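With MSI, a device signals an interrupt by posting a memory write to an address/data pair the OS programs into the device’s MSI capability structure. A sketch of that structure’s 64-bit-address layout (field names are illustrative; real code finds it by walking the PCI capability list):

```c
#include <stdint.h>

/* PCI MSI capability structure (capability ID 0x05), 64-bit address
 * variant, as it appears in a function's configuration space. */
struct pci_msi_cap {
    uint8_t  cap_id;       /* 0x05 = MSI */
    uint8_t  next_ptr;     /* config-space offset of the next capability */
    uint16_t msg_control;  /* MSI enable, multiple-message, 64-bit flags */
    uint32_t msg_addr_lo;  /* address the device posts its write to */
    uint32_t msg_addr_hi;  /* upper 32 bits when the 64-bit flag is set */
    uint16_t msg_data;     /* data value identifying the interrupt */
};
```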
Future Technologies: iSCSI
Internet SCSI (iSCSI) uses existing TCP/IP to transfer SCSI packets over a network.
Managed by the IETF IP Storage Group:
RFC3720 – iSCSI Standard
RFC3721 – iSCSI Naming and Discovery
RFC3723 – Securing Block Storage Protocols over IP
Internet Storage Name Service (iSNS) draft
iSCSI Extensions for RDMA Specification
iSCSI Implementer’s Guide
iSCSI is positioned primarily as a low-cost alternative to Fibre Channel (FC)
Future Technologies: iSCSI
How does it work?
iSCSI Architecture
iSCSI Node: target or initiator
iSCSI Session: A group of TCP connections that link an initiator with a target (“SCSI I-T Nexus”)
SCSI Command Descriptor Blocks (CDBs) are encapsulated into iSCSI Protocol Data Units (PDUs):
Each PDU contains a 48-byte Basic Header Segment (BHS)
Optional header and data digests provide integrity checking; IPsec can provide encryption
Opcodes:
NO-OP
SCSI Command
SCSI Task Management Function request
Login Request
Text Request
SCSI Data-Out
Logout Request
SNACK Request
Vendor Specific Opcodes
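For reference, the fixed 48-byte BHS can be sketched as a C struct (per RFC 3720; field names are illustrative, all multi-byte fields are big-endian on the wire, and a real driver would declare this packed):

```c
#include <stdint.h>

/* iSCSI Basic Header Segment (BHS): the fixed 48-byte header that
 * begins every PDU (RFC 3720). */
struct iscsi_bhs {
    uint8_t  opcode;           /* bit 6 = immediate, bits 5:0 = opcode */
    uint8_t  flags[3];         /* opcode-specific flags */
    uint8_t  total_ahs_len;    /* additional header segments, in 4-byte words */
    uint8_t  data_seg_len[3];  /* 24-bit data segment length */
    uint8_t  lun[8];           /* LUN or opcode-specific field */
    uint32_t init_task_tag;    /* initiator task tag */
    uint8_t  opcode_specific[28];
};                             /* 1+3+1+3+8+4+28 = 48 bytes */
```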
Future Technologies: iSCSI
The Login phase is used to establish a new session or a new connection within an existing session.
Optionally allows initiator and target to negotiate parameters, security exchange, and establish what stage the initiator is ready to enter.
Once login has completed, the session enters the Full Feature phase.
The initiator can now send SCSI CDBs via iSCSI PDUs to the various logical units that are part of the session.
Future Technologies: iSCSI
Hardware Based
TCP/IP and iSCSI ASICs on the board
The entire iSCSI protocol is done in firmware
Looks like a regular SCSI HBA to the OS
Software Based
iSCSI protocol done in software
Requires kernel changes to support iSCSI
Protocol overhead is very CPU intensive:
Use NIC cards that feature TCP Offload Engines (TOE)
Intel has announced its I/O Acceleration Technology (I/OAT) initiative to help network performance.
Future Technologies: iSCSI
iSCSI is a viable low-cost alternative to FC deployments
Technology is still maturing:
Older iSCSI targets may not be fully compliant with the latest RFC
Mixed approaches of both software and hardware solutions are prevalent
Working with IHVs to establish some form of iSCSI solution:
Talking to Adaptec to support their hardware based iSCSI cards
Investigating whether Intel I/OAT technology can be leveraged for a software-based iSCSI stack solution
Estimating need based on customer feedback
Future Technologies: Virtualization
Virtualization abstracts the hardware from the software.
A guest OS runs within a virtual machine and is managed by a virtual machine monitor (VMM)
First implemented in hardware on IBM System/360 mainframes in the late 1960s.
Used to be an enterprise feature catered to big iron:
Both Intel and AMD have announced their intention to bring virtualization technology to the x86 architecture
Intel Vanderpool
AMD Pacifica
Future Technologies: Virtualization
Full vs. Para
Historically, most implementations use full virtualization:
Hardware is completely abstracted
Guest software has no idea it’s running within a VM
Needed custom hardware in order to scale well
Requires special hardware drivers to act as middleware
Examples: VMware and MergePRO
Paravirtualization
Guest software is aware of the fact that it’s running under a VM
Requires core software changes to use Paravirtualization API
Scales better
Finer use of privilege rings on the processor
Leverages existing drivers
Examples: Xen
Future Technologies: Virtualization
How does it work?
Intel Vanderpool adds a new mode of processor operation called VMX:
VMX root operation
Used to run VMM
VMX non-root operation
Used by guest software (guest OS)
VMX transitions
VM entries: used to instantiate and enter guest software
VM exits: used to tear down a VM and transfer control back to the VMM (see the sketch after this list)
Transitions are controlled by the Virtual Machine Control Structure (VMCS)
AMD Pacifica is a superset of Vanderpool
Offers same functionality
Not binary compatible
Adds AMD64 architectural enhancements
More modes for the onboard memory controller
Device Exclusion Vector (DEV) manages devices that need DMA
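A schematic of the resulting VMM control loop, with hypothetical vmx_*/vmcs_* helpers standing in for the raw VMXON/VMPTRLD/VMLAUNCH/VMRESUME/VMREAD instructions (real code must also program the VMCS host-state area so that every VM exit lands back in this loop):

```c
#include <stdint.h>

extern void     vmx_on(void *vmxon_region);   /* VMXON: enter VMX root operation */
extern void     vmcs_load(void *vmcs);        /* VMCLEAR + VMPTRLD a VMCS region */
extern void     vmx_enter(int first);         /* VMLAUNCH if first, else VMRESUME */
extern uint32_t vmcs_exit_reason(void);       /* VMREAD of the exit-reason field */
extern int      handle_exit(uint32_t reason); /* emulate; nonzero = stop the VM */

void run_guest(void *vmxon_region, void *vmcs)
{
    vmx_on(vmxon_region);     /* the VMM now runs in VMX root operation */
    vmcs_load(vmcs);          /* this guest's state lives in the VMCS */

    int first = 1;
    for (;;) {
        vmx_enter(first);     /* VM entry: guest runs in non-root operation */
        first = 0;
        /* A VM exit has occurred; control is back in the VMM. */
        uint32_t reason = vmcs_exit_reason();
        if (handle_exit(reason))  /* e.g. emulate I/O or a privileged op */
            break;                /* tear down the VM */
    }
}
```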
Future Technologies: Virtualization
Virtualization will be a viable choice to manage a heterogeneous application platform on one piece of hardware!
To see real-world applications, please stop by the “Windows Interoperability and Tools” breakout session given by Sandy Gupta on Tuesday.
Tracking technology as it matures and becomes more prominent in the field.
Selecting the Right Hard Drive Means Much Less Hassle
Reference: http://www.seagate.com/docs/pdf/whitepaper/TP-540-Nearline-Storage-Requirements.pdf
The Nearline Challenge
Meeting the Nearline Challenge
While nearline applications don’t require the high level of data availability and IOPS demanded by online applications, they do share the need for around-the-clock data accessibility. And though nearline data activity is far less frequent than online activity, both are highly random in nature. These random reads/writes force drive heads to rapidly and repeatedly traverse a drive’s discs.
To deliver the enterprise-class reliability standard of 1.0 million hours MTBF, nearline-ready SATA drives are specifically designed to withstand the rigors of random reads/writes and 24x7, always-on operation. In contrast, the typical 600,000 hours MTBF rating of desktop-class SATA drives is obtained in the mild environment of sequential reads/writes and 8x5 power-on hours, and thus has no relevance when considering the use of such drives in nearline applications.
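A rough sanity check of why those two ratings are not comparable: dividing the yearly power-on hours implied by each duty cycle into the MTBF rating gives an approximate annualized failure rate (AFR):

$$\mathrm{AFR} \approx \frac{\text{power-on hours/year}}{\mathrm{MTBF}}:\qquad \frac{8760}{1{,}000{,}000} \approx 0.9\%\ (24{\times}7), \qquad \frac{2080}{600{,}000} \approx 0.35\%\ (8{\times}5)$$

The desktop drive’s rating only holds at roughly 2,080 power-on hours a year; run 24x7, that rating simply does not apply, which is the whitepaper’s point.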
But nearline reliability goes beyond MTBF ratings. Nearline-ready SATA drives also incorporate a Workload Management feature to dynamically protect them from excessive peak workloads. To further safeguard reliability, nearline-ready drives perform “offline scans” during drive idle time to periodically test the media surface for defects.
Reference: Seagate Corp.
Seagate Nearline Hard Drive Types
Seagate NL35 Series
Nearline Fibre Channel and Serial ATA disc drives
PRODUCT OVERVIEW
KEY FEATURES AND BENEFITS
• Reduces storage infrastructure costs by providing high-capacity, low-cost storage for data applications that don’t require the performance of mainstream enterprise disc drives
• Lowers capacity costs while still meeting the stringent requirements of the data center, ensuring application performance and availability are not compromised
• Integrates easily with existing storage infrastructures, enabling efficient use of multiple tiers of storage to meet varied application needs
• Supports the full range of nearline applications with options for both Serial ATA and Fibre Channel infrastructures
KEY SPECIFICATIONS
Seagate NL35 Fibre Channel Disc Drive
• 400-Gbyte capacity
• Dual-port, full-duplex 2 Gbit/sec Fibre Channel interface
• 1 million hours MTBF (networked nearline workloads)
• Optimized for Fibre Channel tiered storage
Seagate NL35 SATA Disc Drive
• 250-Gbyte and 400-Gbyte capacities
• SATA 1.5 Gb/s serial interface
• 1 million hours MTBF (direct-attach nearline workloads)
• Optimized for SAS/SATA tiered storage
Reference: http://www.seagate.com/docs/pdf/marketing/Seagate_NL35.pdf
Other Nearline Hard Drive Vendors
Reference: http://www.seagate.com/docs/pdf/whitepaper/IDC_final_04C4285.pdf
Future Hardware Support from SCO & Current Projects
Dialogic Support - Telephony
Based on demand, SCO will be releasing new Dialogic Springware device drivers for OpenServer 6, UnixWare 7.1.4 and OpenServer 5.0.7 in Fall 2005
All Dialogic devices will be listed in the SCO Hardware Database as “ ” due to licensing requirements
Reproducible issues submitted to support will be escalated for a fix to engineering
Dialogic Support - Telephony
Dialogic Device Support
VOICE WITH NETWORK INTERFACE BOARDS
D/240PCI-T1, D/240JCT-T1, D/300PCI-E1, D/300JCT-E1
LINE VOICE WITH FAX
VFX/41JCT-LS, D/41JCT-LS
VOICE WITH DUAL NETWORK INTERFACE BOARDS
D/240SC-2T1, D/300SC-2E1, D/480SC-2T1, D/600SC-2E1,
D/600JCT-2T1, D/480JCT-2T1
HIGH DENSITY FAX
D/120JCT-LS
NETWORK INTERFACE BOARDS
DTI/480SC, DTI/481SC, DTI/600SC, DTI/601SC, D/600JCT-2T1, D/240JCT-T1, D/480JCT-2T1, D/300JCT-E1
CONFERENCING
DCB/320SC, DCB/960SC
MEDIA BOARDS
D/80SC, D/160SC, D/240SC, D/320SC, D/640SC, D/160JCT,
D/320JCT, D/240PCI-T1, D/300PCI-E1, D/480JCT-2T1,
D/600JCT-2E1
Device driver support for the following Intel/Dialogic telephony products may work, but will not be tested or officially supported.
LOW DENSITY BOARDS AND PBX INTEGRATION
D/21E, D/41E, D/41ESC, D/21D, D/41D, D/21H, D/41H,
D/42NS
MEDIA WITH LOOP START INTERFACE
D/120JCT-LS, D/160SC-LS, LSI/81SC, LSI/161SC
LINE VOICE WITH FAX
VFX/40ESC plus, VFX/40ESC, VFX/40, VFX/40SC, VFX40E
LOW DENSITY BOARDS AND PBX INTEGRATION
All devices in this category currently in production will be supported by the driver set (examples are listed below)
D/4-PCI, D/4-PCIU, D/41EPCI, D/82JCTU, D/41JCT,
D/41JCT-LS, D/42JCT-U
MEDIA WITH LOOP START INTERFACE
LSI/81SC, LSI/161SC
CONFERENCING
DCB/640SC
USB Update
Some inherent issues were found in the USB subsystem that made printing and serial (modem) class drivers unreliable, or hard to configure
The USB infrastructure is undergoing an overhaul for OpenServer 6 and UnixWare 7.1.4
Printing and serial (modem) functionality will be available in Fall 2005 as a download and made available on the next media release
Adaptec SAS/SATA & HostRAID Support
A new device driver (adp94xx) for Adaptec SAS/SATA and HostRAID (software RAID) support is being developed now.
This new driver will cover the entire line of Adaptec products
This driver will be available in winter 2005
Parallel Card Support
Support for standard PCI Parallel cards will be available in OpenServer 6 Maintenance Pack 2 scheduled for late October.
SCO System Certification Tests (SCT)
A revised version of the SCT adding functionality omitted from the original release will be made available in the fall of 2005.
Your SCO Hardware Team
Alex Sack
Mike Drangula
David Wood
Dean Flamberg
Dean Zimmerman
Doug Souders
Mohammed Ali
Nigel Simpson
Paul Hurford
Evan Hunt
Richard Harry
Jim Bonnet
Kerri Wallach
Kurt Hutchinson
Robert Lipe
Roger Vortman
Simon Bordman
Maxine McCarthy
Steve Williamson
Q&A
Alexander Sack, alexs@sco.com
– Senior Developer
Paul Hurford, paulhu@sco.com
– Director, Hardware Strategies
Mike Drangula, drangula@sco.com
– Senior Developer