Investigating Serial Attached SCSI (SAS) over TCP (tSAS)
Master’s Project Report
by
Deepti Reddy
As part of the requirements for the degree of
Master of Science in Computer Science
University of Colorado, Colorado Springs
Committee Members and Signatures
Project Advisor: Dr. Edward Chow     Approved by: ______________________________     Date: _______________

Member: Dr. Xiaobo Zhou              Approved by: ______________________________     Date: _______________

Member: Dr. Chuan Yue                Approved by: ______________________________     Date: _______________
Acknowledgements
I would like to express my sincere thanks and appreciation to Dr. Chow. His support and
encouragement as my advisor helped me learn a lot and fuelled my enthusiasm for this project.
He has been an excellent professor, mentor and advisor throughout my Master’s program at
UCCS.
Appreciation and thanks also to my committee members Dr. Zhou and Dr. Yue for their
guidance and support. I would also like to thank Patricia Rea for helping me with all the
required paper work during my Master’s program.
Investigating Serial Attached SCSI (SAS) over TCP (tSAS)
1. Abstract
2. Background on SCSI, iSCSI & SAS
   2.1 SCSI (Small Computer Systems Interface)
       2.1.1 SCSI Architecture Model
       2.1.2 SCSI Command Descriptor Block
       2.1.3 Typical SCSI IO Transfer
       2.1.4 Limitations of SCSI
   2.2 iSCSI (Internet Small Computer System Interface)
       2.2.1 iSCSI Session and Phases
       2.2.2 iSCSI PDU
       2.2.3 Data Transfer between Initiator and Target(s)
       2.2.4 Read/Write command sequence in iSCSI
   2.3 Serial Attached SCSI (SAS)
       2.3.1 Protocols used in SAS
       2.3.2 Layers of the SAS Standard
       2.3.3 SAS Ports
       2.3.4 Primitives
       2.3.5 SSP frame format
       2.3.6 READ/WRITE command sequence
3.0 tSAS (Ethernet SAS)
   3.1 Goal, Motivation and Challenges of the Project
   3.2 Project Implementation
       3.2.0 tSAS Topology and Command flow sequence
       3.2.1 Software and Hardware solutions for tSAS implementations
       3.2.2 Primitives
       3.2.3 Discovery
       3.2.4 Task Management
       3.2.5 tSAS mock application to compare with an iSCSI mock application
   3.3 Performance evaluation
       3.3.0 Measuring SAS performance using IOMeter in Windows and VDbench in Linux
       3.3.1 Measuring iSCSI performance using IOMeter in Windows
       3.3.2 Measuring tSAS performance using the client and server mock application written and comparing it to the iSCSI client/server mock application as well as to legacy SAS and legacy iSCSI
4.0 Similar Work
5.0 Future Direction
6.0 Conclusion (Lessons learned)
7.0 References
8.0 Appendix
   8.1 How to run the tSAS and iSCSI mock initiator (client) and target (server) application
   8.2 How to run iSCSI Server and iSCSI Target Software
   8.3 How to run LeCroy SAS Analyzer Software
   8.4 WireShark to view the WireShark traces
   8.5 VDBench for Linux
   8.6 IOMeter for Windows
Investigating Serial Attached SCSI (SAS) over TCP (tSAS)
Project directed by Professor Edward Chow
1. Abstract
Serial Attached SCSI (SAS) [1], the successor of SCSI, is rapidly gaining popularity in enterprise storage systems. SAS is reliable, cheaper, faster and more scalable than its predecessor, SCSI. One limiting feature of SAS is distance: a single point-to-point SAS cable connection can cover only around 8 meters. To scale topologies to support a large number of devices beyond the native port count, expanders are used in SAS topologies [2]. With the zoning [2] capabilities introduced in SAS2 expanders, SAS is gaining popularity in Storage Area Networks. With the growing demand for SAS in large topologies arises the need to investigate SAS over TCP (tSAS) to increase the distance and scalability of SAS. The iSCSI protocol [3] today provides similar functionality in that it sends SCSI commands over TCP. However, SAS drives and SAS expanders cannot be used in an iSCSI topology in such a way that the iSCSI HBA talks directly in-band to SAS devices, which makes the iSCSI back-end less scalable than tSAS. The iSCSI specification is leveraged heavily for the design of tSAS.
The goal of this project is to provide research results for future industry specification for tSAS
and iSCSI. The project involves understanding the iSCSI protocol as well as the SAS protocol and
providing guidance on how tSAS can be designed. The project also involves investigating
sending a set of SAS commands and responses over TCP/IP (tSAS) to address scalability and the
distance limitations of legacy SAS. A client prototype application will be implemented to
send/receive a small set of commands. A server prototype application will be implemented that
receives a set of tSAS commands from the client and sends tSAS responses. The server
application mocks a tSAS Initiator while the client application mocks a tSAS target. The
performance of tSAS will be compared to legacy SAS and iSCSI to determine the speed and
scalability of tSAS. To compare fairly with legacy iSCSI, a client and server prototype that mock
an iSCSI Initiator and iSCSI target will also be implemented.
2. Background on SCSI, iSCSI & SAS
2.1 SCSI (Small Computer Systems Interface)
Since work first began in 1981 on an I/O technology that was later named the Small Computer
System Interface, this set of standard electronic interfaces has evolved to keep pace with a
storage industry that demands more performance, manageability, flexibility, and features for
high-end desktop/server connectivity each year [4]. SCSI allows connectivity with up to seven
devices on a narrow bus and 15 devices on a wide bus, plus the controller [5].
The SCSI protocol is an application layer storage protocol. It is a standard for connecting peripherals to a computer via a standard hardware interface using standard SCSI commands. The primary motivation for SCSI was to provide a way to logically address blocks. Logical addresses eliminate the need to physically address data blocks in terms of cylinder, head, and sector. The advantage of logical addressing is that it frees the host from having to know the physical organization of a drive [6][7][8][9]. The version of the SCSI protocol currently in use is SCSI-3. The SCSI standard defines the data transfer process over a SCSI bus, arbitration policies, and even device addressing [10].
Below is a snapshot of SCSI history:
Type/Bus                       | Approx. Speed | Mainly used for
SCSI-2 (8-bit narrow)          | 10 MB/Sec     | Scanners, Zip-drives, CDROMs
UltraSCSI (8-bit narrow)       | 20 MB/Sec     | CD-Recorders, Tape Drives, DVD drives
Ultra Wide SCSI (16-bit wide)  | 40 MB/Sec     | Lower end Hard Disk Drives
Ultra2 SCSI (16-bit wide)      | 80 MB/Sec     | Mid range Hard Disk Drives
Ultra 160-SCSI (16-bit wide)   | 160 MB/Sec    | High end Hard Disk Drives and Tape Drives
Ultra-320 SCSI (16-bit wide)   | 320 MB/Sec    | State-of-the-art Hard Disk Drives, RAID backup applications
Ultra-640 SCSI (16-bit wide)   | 640 MB/Sec    | High end Hard Disk Drives, RAID applications, Tape Drives
Figure 2.1.0 – Snapshot of SCSI History [10]
The SCSI protocol emerged as the predominant protocol inside host servers because of its well-standardized and clean message-based interface [11]. Moreover, in later years, SCSI supported command queuing at the storage devices and also allowed for overlapping commands [11]. In particular, since the storage was local to the server, the preferred SCSI transport was Parallel SCSI, where multiple storage devices were connected to the host server using a cable-based bus [11].
2.1.1 SCSI Architecture Model
The SCSI architecture model is a client-server model. The initiator (Host Bus Adapter) initiates commands and acts like the client, while the target (hard disk drives, tape drives, etc.) responds to commands initiated by the initiator and therefore acts as a server. Figures 2.1.1.0 and 2.1.1.1 show the SCSI architecture model [9][12].
Figure 2.1.1.0: SCSI Standards Architecture Model [9][12]
Figure 2.1.1.1: Basic SCSI Architecture[9]
2.1.2 SCSI Command Descriptor Block
Protocol Data Units (PDUs) are passed between the initiator and target to send commands
between a client and server. A PDU in SCSI is known as a Command Descriptor Block (CDB). It is
used to communicate a command from a SCSI application client to a SCSI device server. In other
words, the CDB defines the operation to be performed by the server. A CDB may have a fixed
length of 16 bytes or a variable length between 12 and 260 bytes. A typical 10 byte CDB format
is shown below in Figure 2.1.2.0 [9] [13] [14].
Figure 2.1.2.0: 10 byte SCSI CDB
SCSI Common CDB Fields:
Operation Code:
The first byte of the CDB consists of the operation code (opcode) and it identifies the operation
being requested by the CDB. The two main Opcodes of interest for this project are Read and
Write opcodes. The opcode for a Read operation is 0x28 and the opcode for a Write operation
is 0x2A [9] [13] [14].
Logical block address:
The logical block addresses on a logical unit or within a volume/partition begin with block zero and are contiguous up to the last logical block of that logical unit or volume/partition [9][13][14].
Transfer length:
The transfer length field specifies the amount of data to be transferred for each IO. This is
usually the number of blocks. Some commands use transfer length to specify the requested
number of bytes to be sent as defined in the command description. A transfer length of zero
implies that no data will be transferred for the particular command. A command without any
data and simply a response (non-DATA command) will have the transfer length set to a value of
zero [9][13][14][15].
Logical Unit Numbers:
The SCSI protocol defines how to address the various units to which the CDB is to be delivered. Each SCSI device (target) can be subdivided into one or more logical units (LUNs). A logical
unit is simply a virtual controller that handles SCSI communications on behalf of storage devices
in the target. Each logical unit has an address associated with it which is referred to as the
logical unit number. Each target must have at least one LUN. If only one LUN is present, it is
assigned as LUN0 [9][13][14][15].
For more details on these fields, please refer to the SCSI spec [12].
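To make these fields concrete, the sketch below builds a 10-byte READ(10) CDB in Python using the opcode, logical block address, and transfer length described above. This is a minimal illustration for clarity, not code from the project; the field positions follow the common READ(10) layout (opcode in byte 0, a 4-byte LBA in bytes 2-5, a 2-byte transfer length in blocks in bytes 7-8).

```python
import struct

READ_10 = 0x28   # opcode for READ(10), as noted above
WRITE_10 = 0x2A  # opcode for WRITE(10)

def build_read10_cdb(lba: int, transfer_length_blocks: int) -> bytes:
    """Build a 10-byte READ(10) CDB.

    Byte 0: operation code, bytes 2-5: logical block address (big-endian),
    bytes 7-8: transfer length in blocks; remaining bytes are left at zero.
    """
    return struct.pack(">BBIBHB",
                       READ_10,                 # operation code
                       0,                       # flags byte (unused here)
                       lba,                     # logical block address
                       0,                       # group number (unused here)
                       transfer_length_blocks,  # transfer length in blocks
                       0)                       # control byte

if __name__ == "__main__":
    cdb = build_read10_cdb(lba=0, transfer_length_blocks=8)  # read 8 blocks from LBA 0
    assert len(cdb) == 10
    print(cdb.hex())  # 28000000000000000800
```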
2.1.3 Typical SCSI IO Transfer
The three main phases of an IO transfer are the command phase, the data phase and the status
phase. The initiator sends the command to a target. Data is then exchanged between the target
and initiator. Finally, the target sends the status completion for the command to the initiator.
Certain commands known as non-DATA commands do not have a data phase. Figure 2.1.3.0
shows a SCSI IO transfer for a non-data command while Figure 2.1.3.1 shows a SCSI IO transfer
for a data command [7][8][9][10].
Figure 2.1.3.0: Non-Data Command Transfer[9]
Figure 2.1.3.1: Data I/O Operation[9]
2.1.4 Limitations of SCSI
Although the SCSI protocol has been successfully used for many years, it has limited capabilities
in terms of the realization of storage networks due the limitations of the SCSI bus [11]. As the
need for storage and servers grew over the years, the limitations of SCSI as a technology
became seemingly obvious [14]. Initially, the use of parallel cables limited the number of
storage devices and the distance capability of the storage devices from the host server. The
length of the bus limits the distance over which SCSI may operate (maximum of around 25
meters)[9][14]. The limits imply that adding additional storage devices means the need to
purchase a host server for attaching the storage [14]. Second, the concept of attaching storage
to every host server in the topology means that the storage has to be managed on a per-host
basis. This is a costly solution for centers with a large number of host servers. Finally, the
technology doesn’t allow for a convenient sharing of storage between several host servers, nor
typically does the SCSI technology allow for easy addition or removal of storage without host
server downtime [16].
Despite these limitations, the SCSI protocol is still of importance since it can be used with other protocols simply by replacing the SCSI bus with a different interconnect such as Fibre Channel or IP networks [9][16]. The availability of high-bandwidth, low-latency network interconnects such as Fibre Channel (FC) and Gigabit Ethernet (GbE), along with the complexity of managing dispersed islands of data storage, led to the development of Storage Area Networks (SANs) [16]. Lately, the Internet Protocol (IP) has been advocated as an alternative for transporting SCSI traffic over long distances [11]. Proposals like iSCSI standardize the encapsulation of SCSI data in TCP/IP (Transmission Control Protocol/Internet Protocol) packets [11][17]. Once
the data is in IP packets, it can be carried over a range of physical network connections. Today,
GbE is widely used for local area networks (LANs) and campus networks [11].
2.2 iSCSI (Internet Small Computer System Interface)
The advantages of IP networks are clear. Well-tested and established protocols like TCP/IP give IP networks both wide-area connectivity and proven bandwidth-sharing capabilities. The emergence of Gigabit Ethernet indicates that the bandwidth requirements of serving storage over a network should not be an issue [15].
The limitations of the SCSI bus identified in the previous section, and the increased desire for IP storage, led to the development of iSCSI. iSCSI was developed as an end-to-end protocol to enable transportation of storage I/O block data over IP networks, thus dispensing with the physical bus as the transport mechanism [7][20][21]. iSCSI works by mapping SCSI functionality onto the TCP/IP protocol. By utilizing TCP flow control, congestion control, segmentation mechanisms, IP addressing, and discovery mechanisms, iSCSI facilitates remote backup, storage, and data mirroring [7][20][22]. The iSCSI protocol standard defines, amongst other things, the way SCSI commands can be carried over the TCP/IP protocol [7][23].
2.2.1 iSCSI Session and Phases
Data is transferred between an initiator and target via an iSCSI session. An iSCSI session is a physical or logical link between an initiator and target that carries TCP/IP traffic and iSCSI PDUs. The PDUs in turn carry SCSI commands and data in the form of SCSI CDBs [7][23].
There are four phases in a session, where the first phase, login, starts with the establishment of
the first TCP connection [19]. The four phases are:
1) Initial login phase: In this phase, an initiator sends the name of the initiator and target, and
specifies the authentication options. The target then responds with the authentication options
the target selects[19].
2) Security authentication phase: This phase is used to exchange authentication information (ID, password, certificate, etc.) based on the agreed authentication methods to make sure each party is actually talking to the intended party. Authentication can occur both ways: a target can authenticate an initiator, and an initiator can also request authentication of the target. This phase is optional [19].
3) Operational negotiating phase: The operational negotiating phase is used to exchange certain operational parameters such as protocol data unit (PDU) length and buffer size. This phase is also optional [19].
4) Full featured phase: This is the normal phase of an iSCSI session where iSCSI commands, and
data messages are transferred between an initiator and a target(s)[19].
2.2.2 iSCSI PDU
The iSCSI PDU is the equivalent of the SCSI CDB. It is used to encapsulate the SCSI CDB and any associated data. The general format of a PDU is shown in Figure 2.2.2.0. It is comprised of a number of segments, one of which is the basic header segment (BHS). The BHS is mandatory and is the segment that is most commonly used. The BHS layout is shown in Figure 2.2.2.1. It has a fixed length of 48 bytes. The Opcode, TotalAHSLength, and DataSegmentLength fields in the BHS are mandatory fields in all iSCSI PDUs. The Additional Header Segment (AHS) begins with 4-byte Type-Length-Value (TLV) information that specifies the length of the actual AHS following the TLV. The Header and Data digests are optional. The purpose of these fields is to protect the integrity and authenticity of the header and data. The digest types are negotiated during the login phase [9].
Figure 2.2.2.0 – iSCSI PDU Structure
Figure 2.2.2.1 – Basic Header Segment (BHS)
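To make the BHS layout concrete, here is a small Python sketch that packs a simplified 48-byte BHS for a SCSI Command PDU carrying a CDB. It is illustrative only and simplifies the layout in the iSCSI specification [3] (opcode in byte 0, TotalAHSLength in byte 4, a 3-byte DataSegmentLength in bytes 5-7, the LUN in bytes 8-15, the Initiator Task Tag in bytes 16-19 and the CDB starting at byte 32); sequence numbers, digests and additional header segments are omitted.

```python
import struct

ISCSI_OP_SCSI_CMD = 0x01  # SCSI Command PDU opcode

def build_scsi_cmd_bhs(lun: int, task_tag: int, expected_data_len: int,
                       cdb: bytes, read: bool = True) -> bytes:
    """Pack a simplified 48-byte Basic Header Segment for a SCSI Command PDU."""
    assert len(cdb) <= 16, "the BHS carries at most a 16-byte CDB"
    flags = 0x80 | (0x40 if read else 0x20)   # Final bit plus Read or Write bit
    bhs = bytearray(48)
    bhs[0] = ISCSI_OP_SCSI_CMD                # opcode
    bhs[1] = flags
    bhs[4] = 0                                # TotalAHSLength (no AHS)
    bhs[5:8] = (0).to_bytes(3, "big")         # DataSegmentLength (no immediate data)
    bhs[8:16] = struct.pack(">Q", lun)        # LUN field
    bhs[16:20] = struct.pack(">I", task_tag)  # Initiator Task Tag
    bhs[20:24] = struct.pack(">I", expected_data_len)  # Expected Data Transfer Length
    bhs[32:32 + len(cdb)] = cdb               # CDB, zero-padded to 16 bytes
    return bytes(bhs)
```

The CDB placed in bytes 32 onward could be the READ(10) CDB sketched in section 2.1.2, which is exactly how the PDU "encapsulates" the SCSI command.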
2.2.3 Data Transfer between Initiator and Target(s)
Once the full feature phase of the normal session has been established, data can be exchanged
between the initiator and the target(s). The normal session is used to allow transfer of data
to/from the initiator and target.
Let us assume that an application on the initiator wishes to perform storage I/O to/from the
target. This can be broken down into two stages:
1. Progression of the SCSI command through the initiator, and
2. Progression of the SCSI command through the target.
To help assist in understanding the progression of the commands, the iSCSI protocol layering
model is shown in Figure 2.2.3.0 [9].
14
Figure 2.2.3.0 – iSCSI protocol layering model
Progression of a SCSI Command through the Initiator
1. The user/kernel application on the initiator issues a system call for an I/O operation, which is sent to the SCSI layer.
2. On receipt at the SCSI layer, the system call is converted into a SCSI command and a CDB containing this information is constructed. The SCSI CDB is then passed to the iSCSI initiator protocol layer [9].
3. At the iSCSI protocol layer, the SCSI CDB and any SCSI data are encapsulated into a PDU and the PDU is forwarded to the TCP/IP layer [9].
4. At the TCP layer, a TCP header is added. The IP layer encapsulates the TCP segment by adding an IP header before the TCP header [9].
5. The IP datagram is passed to the Ethernet Data Link Layer where it is framed with Ethernet headers and trailers. The resulting frame is finally placed on the network [9].
Progression of a SCSI Command through the Target
1. At the target, the Ethernet frame is stripped off at the Data Link Layer. The IP datagram
is passed up to the TCP/IP layer [9].
2. The IP and TCP layers each check and strip off their headers and pass the iSCSI PDU up to the iSCSI layer [9].
3. At the iSCSI layer, the SCSI CDB is extracted from the iSCSI PDU and passed along with
the data to the SCSI layer [9].
4. Finally, the SCSI layer sends the SCSI request and data to the upper layer application [9].
2.2.4 Read/Write command sequence in iSCSI
Read Operation Example

+------------------+-----------------------+----------------------+
|Initiator Function|       PDU Type        |   Target Function    |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command (READ)>>> |                      |
| (read)           |                       |                      |
+------------------+-----------------------+----------------------+
|                  |                       |Prepare Data Transfer |
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data-In    |      Send Data       |
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data-In    |      Send Data       |
+------------------+-----------------------+----------------------+
|   Receive Data   |   <<< SCSI Data-In    |      Send Data       |
+------------------+-----------------------+----------------------+
|                  |   <<< SCSI Response   |Send Status and Sense |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+

Figure 2.2.4.1 Read Operation Example [3]
Write Operation Example

+------------------+-----------------------+---------------------+
|Initiator Function|       PDU Type        |  Target Function    |
+------------------+-----------------------+---------------------+
| Command request  |SCSI Command (WRITE)>>>| Receive command     |
| (write)          |                       | and queue it        |
+------------------+-----------------------+---------------------+
|                  |                       | Process old commands|
+------------------+-----------------------+---------------------+
|                  |        <<< R2T        | Ready to process    |
|                  |                       | WRITE command       |
+------------------+-----------------------+---------------------+
|    Send Data     |   SCSI Data-Out >>>   |    Receive Data     |
+------------------+-----------------------+---------------------+
|                  |        <<< R2T        | Ready for data      |
+------------------+-----------------------+---------------------+
|                  |        <<< R2T        | Ready for data      |
+------------------+-----------------------+---------------------+
|    Send Data     |   SCSI Data-Out >>>   |    Receive Data     |
+------------------+-----------------------+---------------------+
|    Send Data     |   SCSI Data-Out >>>   |    Receive Data     |
+------------------+-----------------------+---------------------+
|                  |   <<< SCSI Response   |Send Status and Sense|
+------------------+-----------------------+---------------------+
| Command Complete |                       |                     |
+------------------+-----------------------+---------------------+

Figure 2.2.4.2 Write Operation Example [3]
To learn more about the SCSI command PDU, the Ready To Transfer (R2T) PDU, SCSI Data-In
PDU and the SCSI Data-Out PDU, please refer to the iSCSI specification [3].
2.3 Serial Attached SCSI (SAS)
SAS is the successor of SCSI technology and is becoming widespread as performance and addressability requirements exceed what legacy SCSI supports.
SAS interfaces were initially introduced at 3 Gbps in 2004. Currently supporting 6 Gbps and moving to 12 Gbps by 2012, SAS interfaces have significantly increased the bandwidth available from legacy SCSI storage systems. Though Fibre Channel is more scalable, it is a costly solution for use in a SAN. Table 2.3.0 compares SCSI, SAS and Fibre Channel technologies.
                  | SCSI           | SAS                                                        | Fibre Channel
Topology          | Parallel Bus   | Full Duplex                                                | Full Duplex
Speed             | 3.2 Gbps       | 3 Gbps, 6 Gbps, moving to 12 Gbps                          | 2 Gbps, 4 Gbps, moving to 8 Gbps
Distance          | 1 to 12 meters | 8 meters                                                   | 10 km
Devices           | SCSI only      | SAS & SATA                                                 | Fibre Channel only
Number of Targets | 14 devices     | 128 devices per expander; >16,000 with cascaded expanders  | 127 devices in a loop; switched fabric can go to millions of devices
Connectivity      | Single-port    | Dual-port                                                  | Dual-port
Drive Form Factor | 3.5”           | 2.5”                                                       | 3.5”
Cost              | Low            | Medium                                                     | High
Table 2.3.0 – Comparing SCSI, SAS and Fibre Channel
An initiator, also called a Host Bus Adapter (HBA) or controller, is used to send commands to SAS targets. SAS controller devices have a limited number of ports. A single physical link in SAS is referred to as a PHY, and a narrow port consists of a single PHY [1].
Expander devices in a SAS domain facilitate communication between multiple SAS devices. Expanders typically have 12 to 36 ports, while SAS controllers typically have 4 to 16 ports. Expanders can also be cascaded to increase scalability. One of the most significant SAS features is the transition from 3.5” drives to 2.5” drives, which helps reduce floor space and power consumption [2]. Another advantage of using SAS targets is that a SAS hard drive is dual-ported, providing a redundant path to each hard drive in case of an initiator/controller fail-over. Also, unlike SCSI, SAS employs a serial means of data transfer, like Fibre Channel [25]. Serial interfaces are known to reduce crosstalk and related signal integrity issues.
Figure 2.3.0 shows an example of a typical SAS topology. SAS commands originate from the HBA
driver and are eventually sent to the HBA. The SAS controller/HBA sends commands to the disk
drives through the expander for expander attached targets/drives. The target replies to the
command through the expander. The expander simply acts like a switch and routes the
commands to the appropriate target and routes the responses from a particular target to the
Controller.
Figure 2.3.0 – A typical SAS topology
2.3.1 Protocols used in SAS
The three protocols used in SAS are Serial Management protocol, Serial SCSI Protocol and SATA
Tunnel Protocol. Serial Management Protocol (SMP) [1] is used to discover the SAS topology
and to perform system management. The Serial SCSI Protocol (SSP) [1] is used to send SCSI
commands and receive responses from SAS targets. SATA Tunnel Protocol (STP) [1] is used to
communicate with SATA targets in a SAS topology.
2.3.2 Layers of the SAS Standard
Below is the organization and layers of the SAS standard:
Figure 2.3.2.0 – Layers of the SAS Standard
As can be seen from the above Figure 2.3.2.0, the SAS Physical layer consists of:
a) Passive interconnect (e.g., connectors and cable assemblies); and
b) Transmitter and receiver device electrical characteristics.
The phy layer state machines interface between the link layer and the physical layer to keep
track of dword synchronization [2]. The link layer defines primitives, address frames, and
connections. Link layer state machines interface to the port layer and the phy layer and
perform the identification and hard reset sequences, connection management, and SSP, STP,
and SMP specific frame transmission and reception [2]. The port layer state machines interface
with one or more SAS link layer state machines and one or more SSP, SMP, and STP transport
layer state machines to establish port connections and disconnections. The port layer state
machines also interpret or pass transmit data, receive data, commands, and confirmations
between the link and transport layers. The transport layer defines frame formats. Transport
layer state machines interface to the application layer and port layer and construct and parse
frame contents [2]. The application layer defines SCSI, ATA, and management specific features
[2].
2.3.3 SAS Ports
A port contains one or more phys. Ports in a device are associated with physical phys based on the identification sequence. A port is a wide port if there is more than one phy in the port. A port is a narrow port if there is only one phy in the port. In other words, a port is a group of phys with the same SAS address attached to another group of phys with the same SAS address [2]. Each device in the topology has a unique SAS address. For example, if an HBA is connected to expander A using PHYs 0, 1, 2 and 3 and to expander B using PHYs 4, 5, 6 and 7, then PHYs 0, 1, 2 and 3 of the HBA form one wide port and PHYs 4, 5, 6 and 7 form another wide port.
Figure 2.3.3.0 – Wide Ports in SAS
2.3.4 Primitives
Primitives are special DWords mainly used to manage the link and flow control. Some of the common primitives are:
1. ALIGN(s) – Used during speed negotiation of a link, rate matching of connections, etc.
2. AIP(s) (Arbitration in Progress) – Transmitted by an expander device after a connection request to indicate that the connection request is being processed and to report its status.
3. BREAK(s) – A phy aborts a connection request or breaks a connection by transmitting the BREAK primitive sequence.
4. CLOSE – Used to close a connection.
5. OPEN ACCEPT – Indicates that a connection request has been accepted.
6. OPEN REJECT – Indicates that a connection request has been rejected and specifies the reason for the rejection.
7. ACK – Acknowledges an SSP frame.
8. NAK – Negative acknowledgement of an SSP frame.
9. RRDY – Advertises SSP frame credit.
10. BROADCAST(s) – Used to notify SAS ports of events such as a change in topology [1]
To learn more about the other primitives and the primitives mentioned above, please refer
to the SAS Specification.
2.3.5 SSP frame format
In this project, we primarily work with SSP Read/Write commands. A typical SSP frame format is
below:
Figure 2.3.5.0 – SSP Frame Format
The Information Unit is a DATA frame, XFER_RDY frame, COMMAND frame, RESPONSE frame or
TASK frame. For SSP requests of interest for this project, the information unit is either a
COMMAND frame, XFER_RDY frame, DATA frame or a RESPONSE frame [2].
Command frame:
The COMMAND frame is sent by an SSP initiator port to request that a command be processed.
The command frame consists of the logical unit number the command is intended for as well as
the SCSI CDB that contains the type of command, transfer length etc [2].
Figure 2.3.5.1 – SSP Command Frame
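To make the frame contents concrete, the sketch below assembles a simplified COMMAND information unit in Python: an 8-byte LUN, a task attribute byte, an additional-CDB-length byte and a 16-byte CDB field. The field offsets are a simplification of the SSP COMMAND frame definition in the SAS specification and are shown here only for illustration; this is not the project's code.

```python
import struct

def build_ssp_command_iu(lun: int, cdb: bytes, task_attribute: int = 0) -> bytes:
    """Assemble a simplified SSP COMMAND information unit.

    Simplified layout: 8-byte LUN, a reserved byte, a byte carrying the task
    attribute, a reserved byte, the additional CDB length (zero here), then a
    16-byte CDB field padded with zeros.
    """
    assert len(cdb) <= 16, "additional CDB bytes are not handled in this sketch"
    iu = bytearray(28)
    iu[0:8] = struct.pack(">Q", lun)       # logical unit number
    iu[9] = task_attribute & 0x07          # task attribute (simple = 0)
    iu[11] = 0                             # additional CDB length (none)
    iu[12:12 + len(cdb)] = cdb             # the SCSI CDB, e.g. READ(10) from 2.1.2
    return bytes(iu)
```

On the SAS wire this IU follows the SSP frame header shown in Figure 2.3.5.0 and is followed by fill bytes and a CRC; section 3.2.0 discusses how the same header and IU can instead be carried over TCP.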
XFER_RDY frame:
The XFER_RDY frame is sent by an SSP target port to request write data from the SSP initiator
port during a write command or a bidirectional command [2].
Figure 2.3.5.2 – SSP XFER_RDY Frame
The REQUESTED OFFSET field contains the application client buffer offset of the segment of
write data in the data-out buffer that the SSP initiator port may transmit to the logical unit
using write DATA frames [2].
The WRITE DATA LENGTH field contains the number of bytes of write data the SSP initiator port
may transmit to the logical unit using write DATA frames from the application client data-out
buffer starting at the requested offset [2].
DATA frame:
Figure 2.3.5.3 – SSP Data Frame
A typical DATA frame in SAS is limited to 1024 bytes (1K) [2].
Response Frame:
The RESPONSE frame is sent by an SSP target port in response to an SSP command from an initiator [2].
Figure 2.3.5.4 – SSP Response Frame
A successful write/read completion will not contain any sense data. In this project, we work
with successful read/write completions and therefore sense data won’t be returned by the
target.
2.3.6 READ/WRITE command sequence
SSP Read Sequence [20]
Figure 2.3.6.0 – SSP Read Sequence
SSP Write Sequence[20]
Figure 2.3.6.1 – SSP Write Sequence
3.0. tSAS (Ethernet SAS)
3.1 Goal, Motivation and Challenges of the Project
The goal of this project is to investigate sending a set of SAS commands, data and responses over TCP/IP and to investigate, as fairly as possible, how tSAS performs against legacy iSCSI and legacy SAS. Since Ethernet provides its own physical layer, SAS over TCP (tSAS) eliminates the need for the SAS physical layer, overcoming the distance limitation of the Serial Attached SCSI (SAS) physical interface so that the SAS storage protocol may be used for communication between host systems and storage controllers in a Storage Area Network (SAN) [21]. SANs allow sharing of data storage over long distances and still permit centralized control and management [16]. More particularly, such SAN embodiments can comprise at least one host computer system and at least one storage controller that are physically separated by more than around 8 meters, which is the physical limitation of a SAS cable. An Ethernet fabric can connect the host computer system(s) and storage controller(s) [21]. The SAS storage protocol over TCP can also be used to communicate between storage controllers/hosts and SAS expanders, as explained later in this section.
Using high-speed Ethernet (10G/40G/100G) [32], tSAS also overcomes the 6G and 12G link-rate limitations of SAS2 (6G) and SAS3 (12G) respectively. As mentioned earlier in this paper, the main challenge of developing a tSAS client/server application is that there is no standard specification for tSAS. We leverage Michael Ko's patent [21] on SAS over Ethernet [27] to help define the tSAS protocol required for this project.
Similar to iSCSI, TCP was chosen as the transport for tSAS. TCP has many features that are utilized by iSCSI, and the same features and reasoning lie behind the choice of TCP for tSAS as well:
• TCP provides reliable in-order delivery of data.
• TCP provides automatic retransmission of data that was not acknowledged.
• TCP provides the necessary flow control and congestion control to avoid overloading a
congested network.
• TCP works over a wide variety of physical media and inter-connect topologies. [23]
3.2 Project Implementation
3.2.0 tSAS Topology and Command flow sequence
Figure 3.2.0.0 below shows a typical usage of tSAS to expand the scalability, speed and distance of legacy SAS by using a tSAS HBA. In Figure 3.2.0.0, tSAS is the protocol of communication used between a remote tSAS HBA and a tSAS controller. The tSAS controller is connected to the back-end expander and drives using legacy SAS cables.
Figure 3.2.0.0 – Simple tSAS Topology
All SSP frames are encapsulated in an Ethernet frame. Figure 3.2.0.1 shows how an Ethernet frame with the SSP frame data encapsulated in it looks. The tSAS header is the SSP frame header and the tSAS data is the SSP information unit (refer to Figure 2.3.5.0 for the SSP frame format).
Figure 3.2.0.1 – tSAS header and data embedded in an Ethernet frame
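Because there is no standard wire format for tSAS, the sketch below shows one hypothetical way a tSAS initiator could carry an SSP frame (frame header plus information unit, without the SAS CRC) over a TCP connection. The 4-byte length prefix is an assumption made only for this sketch so the receiver can delimit frames within the TCP byte stream; it is not part of the SSP frame or of any specification.

```python
import socket
import struct

def send_tsas_frame(sock: socket.socket, ssp_header: bytes, ssp_iu: bytes) -> None:
    """Send one tSAS frame as [4-byte length][SSP frame header][SSP information unit].

    The SAS CRC is omitted because TCP and Ethernet already provide their own
    integrity checks, as discussed in this section. The length prefix is our
    own framing convention (an assumption), since TCP delivers a byte stream.
    """
    payload = ssp_header + ssp_iu
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_tsas_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed tSAS frame from the TCP stream."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping over partial TCP reads."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return buf
```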
The back-end of tSAS is a tSAS HBA that can receive tSAS commands, strip off the TCP header and pass the SAS command on to the expander and drives. The back-end of tSAS talks in-band to the SAS expanders and drives. The remote tSAS initiator communicates with the tSAS target by sending tSAS commands. Figure 3.2.0.2 shows the typical SSP request/response read data flow. The tSAS SSP request is initially sent by the tSAS initiator to the tSAS target over TCP. The tSAS target strips off the TCP header and sends the SSP request, using the SAS initiator block on the tSAS target, to the SAS expander. The SAS expander sends the data frames and the SSP response to the tSAS target. Finally, the tSAS target embeds the SSP data frames and response frame over TCP and sends the frames to the tSAS initiator. A write (Figure 3.2.0.3) looks the same, with the tSAS SSP request sent by the initiator, followed by the XFER_RDY (similar to the R2T in iSCSI) sent by the target, followed by the DATA sent by the initiator and finally the tSAS response from the target.
Figure 3.2.0.2 – tSAS Read SSP Request & Response Sequence Diagram.
This figure doesn't show all the SAS primitives exchanged on the SAS wire within a connection after the Open Accept
Figure 3.2.0.3 – tSAS WRITE SSP Request & Response Sequence Diagram.
This figure doesn’t show all the SAS primitives exchanged on the SAS wire within a
connection after the Open Accept
SAS over Ethernet can also be used for a SAS controller to communicate with a SAS expander. In SAS1, expanders had no support for receiving SAS commands out-of-band. SAS1 controllers/HBAs needed to send commands to an expander in-band even for expander diagnosis and management. SAS HBAs/controllers have much more complex functionality than expanders, and diagnosing issues by sending commands in-band to expanders made it harder and more time-consuming to root-cause where a problem lies in the SAS topology. Managing expanders in-band also lacked the advantage of managing them remotely out-of-band over Ethernet. With the growing popularity of zoning, expander vendors have implemented support for a limited set of SMP zoning commands out-of-band via Ethernet in SAS2 [1]. A client management application is used to send a limited set of SMP commands out-of-band to the expander. The expander processes the commands and sends the SMP responses out-of-band to the management application. Figure 3.2.0.4 shows the communication between the client management application and the expander during an SMP command.
Figure 3.2.0.4 – SMP Request & Response Sequence Diagram.
This figure doesn’t show the SAS primitives exchanged on the SAS wire within a connection
after the Open Accept
This already existing functionality on a SAS expander can be leveraged to design tSAS functionality on an expander so that it communicates via TCP with a SAS controller/HBA. Figure 3.2.0.5 shows a topology where the tSAS protocol is also used for communication between the tSAS controller and the back-end expander. Michael Ko's patent doesn't cover using tSAS to talk to expanders; however, expanders can also be designed to send commands, data and responses via TCP.
Figure 3.2.0.5 -Topology where tSAS is used to communicate with an expander
3.2.1 Software and Hardware solutions for tSAS implementations
Similar to iSCSI, tSAS can be implemented in both hardware and software. This is one of the benefits of iSCSI and tSAS, since each organization can customize its SAN configuration based on budget and the performance needed [23][24].
Software based tSAS solution:
This solution is cheaper than a hardware based tSAS solution since no extra hardware is needed. All tSAS processing is done by the host processor, and TCP/IP operations are also executed by the CPU. The NIC is merely an interface to the network, so this implementation requires a great deal of CPU cycles, hurting the overall performance of the system [23][24].
TCP/IP Offload Engine tSAS solution:
As network infrastructures have reached Gigabit speeds, network resources are becoming more abundant and the bottleneck is moving from the network to the processor. Since TCP/IP processing requires a large portion of CPU cycles, a software tSAS implementation may be used along with specialized network interface cards with TCP offload engines (TOEs) on board. NICs with integrated TOEs have hardware built into the card that allows the TCP/IP processing to be done at the interface. This keeps the TCP/IP processing off the CPU, freeing the system processor to spend its resources on other applications [23][24].
Figure 3.2.1.0 – TCP/IP Offload Engine [23] [24]
Hardware Based tSAS solution:
In a hardware-based tSAS environment, the initiator and target machines contain a host bus
adapter (HBA) that is responsible for both TCP/IP and tSAS processing. This will free the CPU
from both TCP/IP and tSAS functions. This dramatically increases performance in those settings
where the CPU may be burdened with other tasks [23][24].
Figure 3.2.1.2 – tSAS implementations: software tSAS, software tSAS with TCP offload, and hardware tSAS with TCP offload [25]
3.2.2 Primitives
In the conventional SAS storage protocol, the SAS link layer uses a construct known as primitives. Primitives are special 8b/10b encoded characters that are used as frame delimiters, for out-of-band signaling, control sequencing, etc. Primitives were explained in section 2.3.4. These SAS primitives are defined to work in conjunction with the SAS physical layer [21]. As far as primitives go, ALIGN(s), OPEN REJECT(s), OPEN(s), CLOSE, DONE, BREAK, HARD RESET, NAK, RRDY, etc. can simply be ignored on the tSAS protocol side, since these are link layer primitives required only on the SAS side. For example, if an IO on the SAS side times out or fails due to NAKs, BREAKs, OPEN timeouts or OPEN REJECTs, the IO will simply time out on the tSAS side to the tSAS initiator. The primitives of interest are the BROADCAST primitives, especially the BROADCAST (CHANGE) primitive, as this primitive tells an initiator that the topology has changed and that it should re-discover the topology using SMP commands. However, since, as discussed above, the SAS physical layer is unnecessary, an alternate means of conveying these SAS primitives is needed. In one embodiment, this can be accomplished by defining a SAS primitive to be encapsulated in an Ethernet frame [21].
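As a purely hypothetical illustration of conveying the one primitive of interest over the network, a tSAS target could forward a BROADCAST (CHANGE) notification to the remote initiator as a small message of its own, prompting the initiator to re-run discovery. The message type value below is invented for this sketch; it does not come from the SAS specification or from [21].

```python
import struct

# Hypothetical tSAS message type for a forwarded BROADCAST (CHANGE); the value
# 0x81 is invented for this sketch, not taken from any specification.
TSAS_MSG_BROADCAST_CHANGE = 0x81

def build_broadcast_change_msg(expander_sas_address: int) -> bytes:
    """Build a small notification telling the remote tSAS initiator that the
    topology changed behind the given expander, so it should re-discover."""
    return struct.pack(">BxxxQ", TSAS_MSG_BROADCAST_CHANGE, expander_sas_address)

def handle_tsas_message(msg: bytes) -> None:
    """Initiator side: trigger SMP discovery when a topology change is reported."""
    if msg[0] == TSAS_MSG_BROADCAST_CHANGE:
        (sas_addr,) = struct.unpack(">Q", msg[4:12])
        print(f"BROADCAST (CHANGE) from expander {sas_addr:#018x}: re-running discovery")
```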
The SAS trace in Figure 3.2.2.0 below shows the primitives exchanged on the wire between the initiator and target for a READ command. The lower panel shows the primitives such as RRDYs, ACKs, DONEs and CLOSEs exchanged during a READ command sequence. These primitives are not required in the tSAS protocol. Please refer to Appendix Section 8.0 for information on SAS trace capturing.
Figure 3.2.2.0 – Primitives on the SAS side [25]
3.2.3 Discovery
Discovery in tSAS will be similar to SAS and will be accomplished by sending Serial Management Protocol (SMP) commands over TCP to the initiators and expanders downstream to learn the topology.
The SMP request frame will be embedded in an Ethernet frame and sent to the expander/initiator. The expander/initiator will reply to the SMP request by sending an SMP response frame embedded in an Ethernet frame. Figure 3.2.0.4 in section 3.2.0 shows how the SMP commands are communicated in tSAS. For more information on SMP commands and discovery, please refer to the SAS Specification [1].
For example, for a Discover List command, which is used to return information on attached devices/PHYs, the SMP Discover List request is sent by the initiator via TCP and the SMP Discover List response is returned via TCP.
Figure 3.2.3.0 – SMP Discover List Request Frame
Figure 3.2.3.1 – SMP Discover List Response Frame
Since Ethernet frames are assembled such that they include a cyclic redundancy check (CRC) for
providing data protection, a SMP frame that is encapsulated in an Ethernet frame can rely on
the data protection afforded by this same cyclic redundancy check [21]. In other words, the SAS
side CRC on the request and response SMP frame need not be transmitted.
Please refer to the SAS Specification [1] for information on these SMP commands.
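Reusing the hypothetical length-prefixed TCP framing sketched in section 3.2.0, an out-of-band discovery exchange could look like the sketch below: the client sends an SMP request frame (without the SAS CRC, per the note above) and reads back the SMP response. The frame type and function code values shown (0x40 for an SMP request, 0x10 for DISCOVER) follow the SAS specification, but the rest of the request body is a simplified placeholder rather than a faithful encoding, and the framing itself is an assumption of this sketch.

```python
import socket
import struct

SMP_FRAME_TYPE_REQUEST = 0x40  # SMP request frame type
SMP_FUNC_DISCOVER = 0x10       # DISCOVER function code

def smp_discover_over_tcp(host: str, port: int, phy_id: int) -> bytes:
    """Send a simplified SMP DISCOVER request over TCP and return the raw response.

    The request body below is a placeholder: real DISCOVER requests carry
    additional fields defined in the SAS specification, and the SAS CRC is
    intentionally omitted because the TCP/Ethernet layers protect the frame.
    """
    request = struct.pack(">BBxx", SMP_FRAME_TYPE_REQUEST, SMP_FUNC_DISCOVER)
    request += bytes(5) + bytes([phy_id]) + bytes(2)   # simplified request body
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack(">I", len(request)) + request)  # length-prefix framing
        (length,) = struct.unpack(">I", _recv_exact(sock, 4))
        return _recv_exact(sock, length)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the TCP stream."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return buf
```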
3.2.4 Task Management
Similar to SAS and iSCSI, a TASK frame can be sent by the initiator to another initiator/expander in the topology. A task management request may be sent to manage a target. For instance, when IOs to a target fail, the host may issue a Task Management Target Reset command in the hope that the target recovers and cooperates after being reset. A host may issue a Task Management LUN Reset to reset an entire LUN, causing all outstanding IOs to that LUN to be failed.
To learn more about the various task management commands, please refer to the SAS [1] and iSCSI specifications [3].
3.2.5 tSAS mock application to compare with an iSCSI mock application
For the purpose of investigating iSCSI vs tSAS, client and server applications that communicate using iSCSI and tSAS were written. The tSAS client application sends read/write tSAS commands to the tSAS server application, which processes them and sends responses back to the client. Similarly, the iSCSI client application sends read/write iSCSI commands to the iSCSI server application, which processes them and sends responses back to the client. Commands are sent single-threaded such that the queue depth (number of outstanding commands) is one. The algorithm used for the tSAS application and the iSCSI application is similar, which helps us compare the two protocols fairly.
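The measurement loop in both mock applications can be pictured as in the sketch below: with a queue depth of one, the client issues a read request, waits for the complete response (header, data and status) and only then records the latency and issues the next command. This is a simplified sketch of the approach described here, not the project's actual code; the request encoding and the length-prefixed framing are assumptions carried over from the earlier sketches.

```python
import socket
import struct
import time

def run_read_benchmark(host: str, port: int, transfer_len_bytes: int, iterations: int) -> float:
    """Issue `iterations` read commands one at a time (queue depth 1) and return
    the average command-to-completion latency in milliseconds."""
    latencies = []
    with socket.create_connection((host, port)) as sock:
        for tag in range(iterations):
            # Hypothetical request encoding: opcode, tag, requested transfer length.
            request = struct.pack(">BIQ", 0x28, tag, transfer_len_bytes)
            start = time.perf_counter()
            sock.sendall(struct.pack(">I", len(request)) + request)
            # Wait for the full response (data plus status) before the next command.
            (resp_len,) = struct.unpack(">I", _recv_exact(sock, 4))
            _recv_exact(sock, resp_len)
            latencies.append((time.perf_counter() - start) * 1000.0)
    return sum(latencies) / len(latencies)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping over partial TCP reads."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf
```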
Initially, the tSAS application was written such that each REQUEST, RESPONSE and DATA frame was encapsulated in an independent Ethernet frame. Revisiting the SSP format in Figure 2.3.5.0, the entire SSP frame excluding the CRC is encapsulated in an Ethernet frame. Since Ethernet frames are assembled such that they include a cyclic redundancy check (CRC) for data protection, a SAS/SSP frame that is encapsulated in an Ethernet frame can rely on the data protection afforded by this same cyclic redundancy check [21].
In SAS, each data frame is at most 1K in length. In the initial design, each Ethernet frame that carried a data frame therefore carried only 1K of data. This made the time to complete an IO significantly higher than when the data per frame is not limited to 1K, so performance was poor. The application was then revised to send more than 1K of data in each frame by filling each Ethernet frame with as much data as it can carry. Below are the results from both implementations.
Below are the results from running the tSAS application when each REQUEST, RESPONSE and DATA frame is individually encapsulated in an Ethernet frame and sent across. The test bench used for this experiment is a Windows Server 2008 setup running the client and server applications, with the client sending requests and the server replying to them. A Netgear ProSafe 5-port Gigabit switch (model GS105) is used in between, such that the client and server auto-negotiate to 1 Gbps.
1 Gbps:
READ
Transfer Length (KB) | Average Time from Read Command to Completion (ms) | IOPS (I/Os per second)
1 KB                 | 0.249 ms  | 4016.064
2 KB                 | 0.206 ms  | 4854.368
4 KB                 | 0.216 ms  | 4629.62
8 KB                 | 0.368 ms  | 2717.391
16 KB                | 0.495 ms  | 2020.20
32 KB                | 0.28 ms   | 3571.428
64 KB                | 1.616 ms  | 618.811
128 KB               | 2.711 ms  | 368.867
256 KB               | 4.913 ms  | 203.541
512 KB               | 5.954 ms  | 167.954
1024 KB              | 7.681 ms  | 130.191
2048 KB              | 16.111 ms | 62.069
Table 3.2.5.0 - Average Time from Read Command to Completion (milliseconds)
in tSAS where each DATA frame is encapsulated in an Ethernet frame
Below are the results from running the tSAS application when the REQUEST, RESPONSE and DATA frames are encapsulated in Ethernet frames and sent across, but the DATA is not limited to 1K per Ethernet frame. DATA frames are combined to use each Ethernet frame to maximum capacity.
Transfer Length (KB) | Average Time from Read Command to Completion (ms) | IOPS (I/Os per second)
1 KB                 | 0.199 ms  | 5025.125
2 KB                 | 0.114 ms  | 8771.929
4 KB                 | 0.280 ms  | 3571.428
8 KB                 | 0.258 ms  | 3875.968
16 KB                | 0.174 ms  | 5747.124
32 KB                | 0.455 ms  | 2197.802
64 KB                | 0.828 ms  | 1207.729
128 KB               | 1.418 ms  | 705.218
256 KB               | 2.714 ms  | 368.459
512 KB               | 3.000 ms  | 333.333
1024 KB              | 3.756 ms  | 266.240
2048 KB              | 7.854 ms  | 127.323
Table 3.2.5.1 - Average Time from Read Command to Completion (milliseconds)
in tSAS where each Ethernet frame containing SSP Data is used efficiently
Below are the results from running the iSCSI application when the REQUEST, RESPONSE and DATA frames are encapsulated in Ethernet frames and sent across. In this implementation, the DATA is not limited to 1K per Ethernet frame. The iSCSI protocol itself doesn't pack each piece of SCSI data into a separate Ethernet frame; it allows data to be combined so that more than a single data segment is sent at a time. Therefore, in our implementation as well, data is combined to use each Ethernet frame to maximum capacity.
Transfer Length (KB) | Average Time from Read Command to Completion (ms) | IOPS (I/Os per second)
1 KB                 | 0.189 ms  | 5291.005
2 KB                 | 0.261 ms  | 3831.417
4 KB                 | 0.205 ms  | 4878.048
8 KB                 | 0.501 ms  | 2996.007
16 KB                | 0.327 ms  | 3058.104
32 KB                | 0.454 ms  | 2202.643
64 KB                | 0.898 ms  | 1113.585
128 KB               | 1.421 ms  | 703.729
256 KB               | 3.311 ms  | 302.023
512 KB               | 3.138 ms  | 318.674
1024 KB              | 4.955 ms  | 201.816
2048 KB              | 8.942 ms  | 111.831
Table 3.2.5.2 - Average Time from Read Command to Completion (milliseconds)
in iSCSI where each Ethernet frame containing SCSI Data is Maxed out
As can be seen from Tables 3.2.5.0, 3.2.5.1 and 3.2.5.2:
1. A tSAS implementation where each DATA frame is encapsulated in a separate Ethernet frame is not an efficient implementation.
2. A tSAS implementation where more than just 1K of DATA (a single DATA frame) is encapsulated in an Ethernet frame is more efficient. This matches the behavior of iSCSI implementations in the market as well as our iSCSI client/server application. Therefore, for the rest of this project we use this tSAS implementation (see the sketch below for the difference between the two approaches).
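The difference between the two implementations comes down to how the server returns the read data: one send per 1 KB DATA frame versus handing the whole payload to TCP and letting the stack fill each Ethernet frame. The sketch below is illustrative only, not the project's code, and simply contrasts the two send strategies described above.

```python
import socket

SAS_DATA_FRAME_SIZE = 1024  # a SAS SSP DATA frame carries at most 1 KB of data

def send_data_per_frame(sock: socket.socket, data: bytes) -> None:
    """Initial design: issue one send per 1 KB DATA frame, mirroring the
    implementation in which each Ethernet frame carried only about 1 KB of
    payload (see Table 3.2.5.0)."""
    for offset in range(0, len(data), SAS_DATA_FRAME_SIZE):
        sock.sendall(data[offset:offset + SAS_DATA_FRAME_SIZE])

def send_data_coalesced(sock: socket.socket, data: bytes) -> None:
    """Revised design: hand the whole payload to TCP in one call and let the
    stack pack each Ethernet frame to capacity (see Table 3.2.5.1)."""
    sock.sendall(data)
```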
3.3 Performance evaluation
3.3.0 Measuring SAS performance using IOMeter in Windows and
VDbench in Linux
3.3.0.1 SAS Performance using IOMeter
Iometer is an I/O subsystem measurement and characterization tool that can be used in both single and clustered systems [32]. Iometer is both a workload generator, in that it performs I/O operations in order to stress the system being tested, and a measurement tool, in that it examines and records the performance of its I/O operations and their impact on the system under test. It can be configured to emulate the disk or network I/O load of any program, or to generate entirely synthetic I/O loads. It can generate and measure loads on single or multiple networked systems [32].
Iometer can be used to measure and characterize the:
• Performance of network controllers.
• Performance of disk controllers.
• Bandwidth and latency capabilities of various buses.
• Network throughput to attached drive targets.
• Shared bus performance.
• System-level performance of a hard drive.
• System-level performance of a network [32].
Iometer consists of two programs, namely Iometer and Dynamo. Iometer is the controlling program. Using its graphical user interface, a user can configure the workload, set the operating parameters, and start and stop tests. Iometer tells Dynamo what to do, collects the resulting data, and summarizes the results into output files. Only one copy of Iometer should be running at a time; it is typically run on the server machine. Dynamo is the I/O workload generator. It has no user interface. At Iometer's command, Dynamo performs I/O operations, records the performance information and finally returns the data to Iometer [32].
In this project, IOMeter is used to measure performance of a SAS topology/drive.
The test bench used to measure SAS performance via IOMeter is:
1. The Operating System used is Windows Server 2008.
2. The server used was a Super Micro server.
3. A SAS 6 Gbps HBA in a PCIe slot.
4. The HBA attached to the 6 Gbps SAS expander.
5. The 6G SAS expander attached downstream to a 6G SAS drive.
6. A LeCroy SAS Analyzer placed between the target and expander.
7. IOMeter was set to have a maximum number of outstanding IOs of 1. In other words, the queue depth is set to 1. This makes IOs single-threaded. This option was used since the mock server and client iSCSI and tSAS applications also have a queue depth of 1.
8. For the maximum I/O rate (I/O operations per second), the Percent Read/Write
Distribution was set to 100% Read while testing the read performance and was set to
100% write while testing the write performance. The Percent Random/Sequential
Distribution was set to 100% Sequential while testing the read and write performance.
9. For measurements taken without an expander, the SAS drive was directly attached to
the SAS analyzer and the SAS analyzer was attached to the HBA.
A SAS Protocol Analyzer can be used to capture SSP/STP/SATA traffic between various
components in a SAS topology. For example, a SAS Protocol Analyzer can be placed between an
Initiator and an Expander to capture the IO traffic between the Initiator and the Expander.
Similarly, a SAS protocol analyzer may be placed between drives and an Expander helping the
user to capture IO traffic between the drives and an Expander. A capture using the SAS Protocol
Analyzer is commonly known as a SAS trace.
Figure 3.3.0.1.0 – SAS Trace using Le Croy SAS Protocol Analyzer
Timings on READ and WRITE commands with transfer sizes of 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1024K and 2048K are captured. The first two tables, Table 3.3.0.1.0 and Table 3.3.0.1.1, capture READ performance. Table 3.3.0.1.0 shows READ performance when a SAS drive is direct-attached to the HBA. Table 3.3.0.1.1 shows READ performance when a SAS drive is connected to the HBA via an expander.
Performance of READ10 command using Direct-Attached drive:
Transfer Length (KB) | Average Time from Read Command to Completion (ms) using IOMeter – Direct-Attached | Average time from READ command completion on the SAS trace from the drive | Average IOPS (I/Os per Second)
1 KB    | 0.0644 ms | 0.0365 ms | 15527.950
2 KB    | 0.0768 ms | 0.0389 ms | 13020.833
4 KB    | 0.0800 ms | 0.0563 ms | 12500
8 KB    | 0.0916 ms | 0.0508 ms | 10917.030
16 KB   | 0.112 ms  | 0.0675 ms | 8928.571
32 KB   | 0.219 ms  | 0.180 ms  | 4566.21
64 KB   | 0.438 ms  | 0.376 ms  | 2283.105
128 KB  | 0.861 ms  | 0.788 ms  | 1161.440
256 KB  | 1.706 ms  | 1.579 ms  | 586.166
512 KB  | 3.409 ms  | 3.264 ms  | 293.341
1024 KB | 6.896 ms  | 6.693 ms  | 145.011
2048 KB | 30.972 ms | 21.653 ms | 46.182
Table 3.3.0.1.0 – Direct-Attached SSP READ performance
In Table 3.3.0.1.0, the average time for the READ command to complete using IOMeter is the value calculated by IOMeter. The average time for the READ command to complete using the SAS analyzer is the time it takes for the drive to respond to the command once the HBA sends it. As can be seen, the drive is the bottleneck in this topology. The I/Os per second is not always the exact reciprocal of the average IO completion time, due to delays at the HBA, hardware, etc. However, it is close enough, and in this project we assume that IOPS = 1000 ms / (average time in milliseconds for one IO to complete).
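As a quick check of this assumption against Table 3.3.0.1.0: an average completion time of 0.0644 ms for 1 KB reads gives 1000 / 0.0644 ≈ 15528 IOPS, which matches the measured 15527.950. The one-liner below captures the conversion used throughout the remaining tables.

```python
def iops_from_latency_ms(avg_latency_ms: float) -> float:
    """Approximate I/Os per second from the average per-IO completion time,
    assuming queue depth 1 (IOPS = 1000 ms / average latency in ms)."""
    return 1000.0 / avg_latency_ms

print(iops_from_latency_ms(0.0644))  # ~15527.95, matching the 1 KB row above
```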
Transfer Length (KB) | Average Time from Read Command to Completion (ms) using IOMeter – Expander-Attached | Average time from READ command completion on the SAS trace from the drive | Average IOPS (I/Os per second) | Average time for READ completion without including delay from the drive
1 KB    | 0.0649 ms | 0.0365 ms | 15408.320 | 0.0284 ms
2 KB    | 0.0709 ms | 0.0389 ms | 14104.372 | 0.032 ms
4 KB    | 0.0810 ms | 0.0563 ms | 12345.679 | 0.0247 ms
8 KB    | 0.0840 ms | 0.0508 ms | 11904.761 | 0.0332 ms
16 KB   | 0.113 ms  | 0.0675 ms | 8849.557  | 0.0455 ms
32 KB   | 0.225 ms  | 0.180 ms  | 4444.444  | 0.045 ms
64 KB   | 0.416 ms  | 0.376 ms  | 2403.846  | 0.04 ms
128 KB  | 0.872 ms  | 0.788 ms  | 1146.788  | 0.084 ms
256 KB  | 1.716 ms  | 1.579 ms  | 582.750   | 0.137 ms
512 KB  | 3.418 ms  | 3.264 ms  | 292.568   | 0.154 ms
1024 KB | 7.022 ms  | 6.693 ms  | 142.409   | 0.329 ms
2048 KB | 31.344 ms | 21.653 ms | 31.904    | 9.691 ms
Table 3.3.0.1.1 – Expander-Attached READ performance
As can be seen from the above tables, the performance numbers on READ commands of various
transfer lengths are very similar whether the SAS target is directly connected to the HBA or
sits behind an expander. In other words, the added time on the wire between the HBA and the
expander is less than 1 millisecond for transfer sizes between 1K and 2048K. The HBA and the
expander are generally designed such that hardware does most of the heavy lifting on the IO
path/transfers.
Performance of WRITE10 command:
Timings on WRITE commands of sizes 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1024K
and 2048K are captured below. Table 3.3.0.1.2 shows WRITE performance when a SAS drive is
direct-attached to the HBA. Table 3.3.0.1.3 shows WRITE performance when a SAS drive is
connected to the HBA via an expander.
Transfer Length (KB) | Average Time from Write Command to Completion (ms), IOMeter – Direct-Attached
1 KB | 6.014
2 KB | 6.020
4 KB | 6.030
8 KB | 6.059
16 KB | 6.111
32 KB | 6.216
64 KB | 6.424
128 KB | 6.836
256 KB | 7.672
512 KB | 9.338
1024 KB | 12.824
2048 KB | 37.346
Table 3.3.0.1.2– Direct-Attached WRITE performance
Transfer Length (KB) | Average Time from Write Command to Completion (ms), IOMeter – Expander-Attached | Average Time from WRITE Command to Completion on the SAS Trace from the Drive (ms) | Average IOPS (I/Os per Second) | Average Time for WRITE Completion without the Delay from the Drive (ms)
1 KB | 6.012 | 5.957 | 166.334 | 0.055
2 KB | 6.020 | 5.964 | 166.112 | 0.056
4 KB | 6.032 | 5.990 | 165.782 | 0.042
8 KB | 6.059 | 6.011 | 165.047 | 0.048
16 KB | 6.110 | 6.054 | 163.666 | 0.056
32 KB | 6.215 | 6.157 | 160.901 | 0.058
64 KB | 6.424 | 6.378 | 155.666 | 0.046
128 KB | 6.839 | 6.782 | 146.220 | 0.057
256 KB | 7.672 | 7.573 | 130.344 | 0.099
512 KB | 9.337 | 9.204 | 107.101 | 0.133
1024 KB | 12.665 | 12.466 | 78.957 | 0.199
2048 KB | 37.345 | 27.751 | 26.777 | 9.74
Table 3.3.0.1.3 – Expander-Attached WRITE performance
3.3.0.2 SAS Performance using VDBench in Linux
Vdbench is a disk and tape I/O workload generator used for testing and benchmarking existing
and future storage products. Vdbench generates a wide variety of controlled storage I/O
workloads by allowing the user to set workload parameters such as I/O rate, transfer sizes,
read and write percentages, and random or sequential access [37].
The test bench used to measure SAS performance via the VDBench is:
1. The Operating System used is Red Hat Enterprise Linux 5.4.
2. The server used was a Super Micro server.
3. A SAS 6 Gbps HBA in a PCIe slot.
4. The HBA attached to the 6 Gbps SAS Expander.
5. The 6G SAS expander attached downstream to a 6G SAS drive.
6. A LeCroy SAS Analyzer placed between the target and expander.
7. VDBench was set to have a maximum number of outstanding IOs of 1. In other words,
the queue depth is set to 1. This makes IOs single-threaded. This option was used since
the mock server and client iSCSI and tSAS applications also have a queue depth of 1.
8. For the maximum I/O rate (I/O operations per second), the Percent Read/Write
Distribution was set to 100% Read while testing the read performance and was set to
100% Write while testing the write performance. The Percent Random/Sequential
Distribution was set to 100% Sequential while testing the read and write performance.
Please refer to Appendix in Section 8 to learn more about VDBench and the scripts used.
Performance of READ10 command using VDBench
Transfer Length (KB) | Average Time from Read Command to Completion (ms), VDBench | Average Time from READ Command to Completion on the SAS Trace from the Drive (ms) | Average IOPS (I/Os per Second) | Average Time for READ Completion without the Delay from the Drive (ms)
1 KB | 0.0780 | 0.034 | 12820.51 | 0.044
2 KB | 0.0850 | 0.039 | 11764.705 | 0.046
4 KB | 0.100 | 0.056 | 10000 | 0.044
8 KB | 0.132 | 0.085 | 7575.75 | 0.047
16 KB | 0.264 | 0.215 | 3787.87 | 0.049
32 KB | 0.528 | 0.476 | 1893.94 | 0.052
64 KB | 1.058 | 0.998 | 945.18 | 0.060
128 KB | 2.117 | 2.051 | 472.366 | 0.066
256 KB | 4.235 | 4.162 | 236.127 | 0.073
512 KB | 8.572 | 8.367 | 116.658 | 0.205
1024 KB | 18.317 | 12.576 | 54.594 | 5.741
2048 KB | 34.323 | 20.983 | 29.135 | 13.34
Table 3.3.0.2.0 – SSP READ performance using VDBench
Performance of WRITE10 command using VDBench
Transfer Length (KB) | Average Time from Write Command to Completion (ms), VDBench | Average Time from WRITE Command to Completion on the SAS Trace from the Drive (ms) | Average IOPS (I/Os per Second) | Average Time for WRITE Completion without the Delay from the Drive (ms)
1 KB | 6.006 | 5.965 | 166.500 | 0.041
2 KB | 6.025 | 5.978 | 165.975 | 0.047
4 KB | 6.058 | 6.012 | 165.070 | 0.046
8 KB | 6.121 | 6.068 | 163.371 | 0.053
16 KB | 6.252 | 6.203 | 159.95 | 0.049
32 KB | 6.518 | 5.638 | 153.421 | 1.15
64 KB | 4.086 | 4.023 | 244.738 | 0.063
128 KB | 5.142 | 5.078 | 194.476 | 0.064
256 KB | 7.290 | 7.199 | 137.174 | 0.091
512 KB | 14.511 | 11.895 | 68.913 | 2.616
1024 KB | 23.083 | 19.738 | 43.321 | 3.345
2048 KB | 40.105 | 35.078 | 24.935 | 5.027
Table 3.3.0.2.1 – SSP WRITE performance using VDBench
The following conclusions can be drawn from the tests above using IOMeter and VDBench:
1. Looking at the performance numbers above, one notices that performance drops
drastically for a 2048K Read/Write compared to a 1024K Read/Write. After analyzing the SAS
traces collected for the transfer sizes of 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K,
1024K and 2048K on Reads and Writes, one finds that up to a 1024K transfer size the HBA
sends a single command to the target requesting all the data in one IO. On 2048K and larger
transfer sizes, however, the HBA sends commands of varying transfer sizes to the target. In
other words, a single IO does not fetch the entire 2048K of data on a read, and a single IO is
not used to write 2048K of data to the drive. Multiple IOs of smaller transfer sizes are used to
read or write 2048K of data from or to the disk, causing the performance to drop suddenly.
This is most likely an optimization or limitation of the driver.
2. One also notices that READ performance is better than WRITE performance. This is
expected, since the frame sizes on READs are smaller, and the number of frames transmitted
and the number of handshakes that occur on a READ command are lower than on a WRITE
command. It also takes a drive more time to write DATA than to read it from disk. The
IOMeter user guide likewise states: "For the maximum I/O rate (I/O operations per second),
try changing the Transfer Request Size to 512 bytes, the Percent Read/Write Distribution to
100% Read, and the Percent Random/Sequential Distribution to 100% Sequential."
3. At smaller transfer sizes, the performance difference between successive transfer sizes is
not very apparent. At larger transfer sizes (above 256K), however, the drop in performance
and the increase in IO completion time are much more visible.
4. The results obtained via VDBench are slightly poorer than the results obtained via
IOMeter. A different SAS drive was used for each test, and the drive used for the VDBench
testing performs worse than the drive used for the IOMeter testing. Timings can also vary
because the OS and driver differ between Windows and Red Hat Linux.
Note: The SAS Analyzer traces, performance results, VDBench scripts, etc. are located in the
SASAnalyzerTraces folder of the project folder where all the deliverables are located. Refer to
the Appendix in Section 8.
3.3.1 Measuring iSCSI performance using IOMeter in Windows
The following measurements are taken on the following test bench:
1. An iSCSI software Initiator running on a windows system. The Starwind iSCSI Initiator
was used as the iSCSI Initiator. Please refer to Appendix in Section 8 to learn more about
the StarWind iSCSI Initiator.
2. An iSCSI software Target emulated on a windows system. The KernSafe iSCSI Target was
used to create an iSCSI target and talk to it. Please refer to Appendix in Section 8 to
learn more about the KernSafe iSCSI Target.
3. The iSCSI target was created using a SCSI USB flash drive.
4. The iSCSI Initiator and iSCSI Target systems are connected to each other via a NetGear
Pro Safe Gigabit Switch at a connection rate of 1 Gbps.
5. READs/WRITEs of transfer lengths/sizes 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K,
512K, 1024K and 2048K are issued by the iSCSI Initiator.
6. A WireShark analyzer is also running on the Initiator system to view the data passed
between the iSCSI Initiator and iSCSI Target. Please refer to Appendix in Section 8 to
learn more about the WireShark Network Protocol Analyzer.
7. IOMeter is used to view the performance of these transfer sizes
8. The number of outstanding IOs (queue Depth) is set to 1 in the IOMeter.
9. On each READ, the test is set to 100% sequential READS. On each WRITE, the test is set
to 100% sequential WRITEs.
1 Gbps
Read
Table 3.3.1.0 shows the iSCSI Read Completion timings.
Transfer Length (KB) | Average Time from Read Command to Completion (ms), IOMeter with iSCSI Device | IOPS (I/Os per Second)
1 KB | 1.208 | 827.810
2 KB | 1.423 | 702.740
4 KB | 2.377 | 845.308
8 KB | 2.252 | 444.049
16 KB | 3.251 | 307.597
32 KB | 4.550 | 219.780
64 KB | 5.683 | 175.963
128 KB | 14.640 | 68.306
256 KB | 28.505 | 35.081
512 KB | 164.172 | 6.091
1024 KB | 415.445 | 2.407
2048 KB | 913.563 | 1.094
Table 3.3.1.0 – iSCSI Read Completion Timings at 1 Gbps
Write
Table 3.3.1.1 shows the iSCSI Write Completion timings.
Transfer Length (KB) | Average Time from Write Command to Completion (ms), IOMeter with iSCSI Device | IOPS (I/Os per Second)
1 KB | 1.077 | 928.505
2 KB | 1.890 | 529.101
4 KB | 2.220 | 450.450
8 KB | 2.593 | 385.653
16 KB | 4.867 | 205.465
32 KB | 7.942 | 125.912
64 KB | 13.083 | 76.435
128 KB | 27.028 | 36.998
256 KB | 50.340 | 19.685
512 KB | 225.698 | 4.430
1024 KB | 593.711 | 1.684
2048 KB | 1059.284 | 0.944
Table 3.3.1.1 – iSCSI Write Completion Timings at 1 Gbps
The above timings include the delay at the USB flash drive. Since USB flash drives are slow,
IOMeter was also run directly on the machine with the USB flash drive attached to obtain the
read/write timings when IOs are issued to the SCSI device directly.
The following measurements are taken on the following test bench:
1. A virtual SCSI target (USB flash drive) was used as the SCSI target.
2. READs/WRITEs of transfer lengths/sizes 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K,
512K, 1024K and 2048K are issued to the SCSI device.
3. IOMeter is used to benchmark the SCSI device.
READ
Table 3.3.1.2 shows the SCSI Read Completion timings.
Transfer Length (KB) | Average Time from Read Command to Completion (ms), IOMeter with SCSI Device | IOPS (I/Os per Second)
1 KB | 0.919 | 1088.139
2 KB | 1.073 | 931.966
4 KB | 1.194 | 837.520
8 KB | 1.453 | 688.231
16 KB | 1.984 | 504.032
32 KB | 3.448 | 290.023
64 KB | 4.455 | 224.466
128 KB | 7.044 | 141.964
256 KB | 13.205 | 75.728
512 KB | 25.885 | 38.632
1024 KB | 51.234 | 19.518
2048 KB | 102.571 | 9.749
Table 3.3.1.2 – SCSI Read Completion Timings
WRITE
Table 3.3.1.3 shows the SCSI Write Completion timings.
Transfer Length (KB) | Average Time from Write Command to Completion (ms), IOMeter with SCSI Device | IOPS (I/Os per Second)
1 KB | 0.679 | 1472.754
2 KB | 1.075 | 930.232
4 KB | 1.207 | 828.500
8 KB | 1.775 | 563.380
16 KB | 3.527 | 283.527
32 KB | 5.914 | 169.090
64 KB | 8.821 | 113.365
128 KB | 17.041 | 58.682
256 KB | 32.053 | 31.198
512 KB | 62.021 | 16.123
1024 KB | 120.193 | 8.319
2048 KB | 244.561 | 4.088
Table 3.3.1.3 – SCSI Write Completion Timings
To get the performance of iSCSI without including the time it takes for IOs to complete at the
SCSI target itself (the bottleneck), the SCSI performance timings measured with IOMeter are
subtracted from the iSCSI performance timings measured with IOMeter. These results make it
more feasible to compare the iSCSI numbers here to the mock client/server iSCSI application
written for this project.
Read
Table 3.3.1.4 shows the iSCSI Read Completion timings without including the time it takes for
IOs to complete at the SCSI target itself.
Transfer Length (KB) | Average Time from Read Command to Completion (ms) without the Time for IOs to Complete at the SCSI Target
1 KB | 1.208 – 0.919 = 0.289
2 KB | 0.35
4 KB | 0.585
8 KB | 0.799
16 KB | 1.267
32 KB | 1.102
64 KB | 1.228
128 KB | 7.596
256 KB | 15.3
512 KB | 138.287
1024 KB | 264.211
2048 KB | 810.992
Table 3.3.1.4 – iSCSI Read Completion Timings without including delay at the drive
Write
Table 3.3.1.5 shows the iSCSI Write Completion timings without including the time it takes for
IOs to complete at the SCSI target itself.
Transfer Length (KB) | Average Time from Write Command to Completion (ms) without the Time for IOs to Complete at the SCSI Target
1 KB | 0.398
2 KB | 0.815
4 KB | 1.013
8 KB | 0.818
16 KB | 1.34
32 KB | 2.028
64 KB | 4.262
128 KB | 9.987
256 KB | 18.287
512 KB | 163.677
1024 KB | 473.518
2048 KB | 814.723
Table 3.3.1.5 – iSCSI Write Completion Timings without including delay at the drive
Note: The IOMeter data collected as well as the WireShark traces are located in the project
folder where all the deliverables are located. Refer to the Appendix in Section 8.
3.3.2 Measuring tSAS performance using the client and server mock application written for this project and comparing it to the iSCSI client/server mock application as well as to legacy SAS and legacy iSCSI
A. The tSAS performance was measured by running the client/server application written for
this project. The test bench used to test the tSAS applications is two Windows 2008 Server
systems connected using a NetGear switch at connection rates of 10 Mbps, 100 Mbps and
1 Gbps. One Windows machine runs the client application while the other runs the
server application.
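To make the setup concrete, below is a minimal sketch in C/Winsock of how a mock client of this kind might connect to the server and issue a single READ exchange over TCP. The frame layout (tsas_request), the opcode encoding, the port number (5000) and the hard-coded server address are illustrative assumptions for this report; they are not taken from the actual project code or from any tSAS specification.

#include <winsock2.h>
#include <stdio.h>
#include <string.h>
#pragma comment(lib, "ws2_32.lib")

/* Hypothetical, simplified request frame used for this sketch. */
typedef struct {
    unsigned char opcode;          /* 0 = READ, 1 = WRITE (assumed encoding)  */
    unsigned int  transfer_len_kb; /* requested transfer size in kilobytes    */
} tsas_request;

int main(void)
{
    WSADATA wsa;
    SOCKET s;
    struct sockaddr_in server;
    tsas_request req;
    char buf[4096];
    int received, total = 0, want = 2 * 1024;   /* 2 KB READ as an example */

    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return 1;

    s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (s == INVALID_SOCKET) { WSACleanup(); return 1; }

    memset(&server, 0, sizeof(server));
    server.sin_family      = AF_INET;
    server.sin_addr.s_addr = inet_addr("192.168.1.10"); /* server/target IP entered by the user */
    server.sin_port        = htons(5000);               /* assumed port used by the mock server */

    if (connect(s, (struct sockaddr *)&server, sizeof(server)) == SOCKET_ERROR) {
        closesocket(s); WSACleanup(); return 1;
    }

    /* Send one READ request and drain the data returned by the server. */
    req.opcode = 0;
    req.transfer_len_kb = 2;
    send(s, (const char *)&req, sizeof(req), 0);

    while (total < want && (received = recv(s, buf, sizeof(buf), 0)) > 0)
        total += received;

    printf("READ complete: received %d bytes\n", total);

    closesocket(s);
    WSACleanup();
    return 0;
}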
10 Mbps:
READ:
Transfer Length (KB) | Average Time from Read Command to Completion (ms), iSCSI Mock Application | IOPS (I/Os per Second)
1 KB | 2.786 | 358.937
2 KB | 5.968 | 167.560
4 KB | 7.541 | 132.608
8 KB | 11.002 | 90.892
16 KB | 18.258 | 54.770
32 KB | 175.630 | 5.693
64 KB | 197.788 | 5.055
128 KB | 255.342 | 3.916
256 KB | 601.288 | 1.663
512 KB | 741.555 | 1.348
1024 KB | 2228.483 ms (~2.228 sec) | 0.448
2048 KB | 3863.979 ms (~3.863 sec) | 0.259
Table 3.3.2.0 – READ Command Timings iSCSi Mock app at 10 Mbps
Transfer Length (KB) | Average Time from Read Command to Completion (ms), tSAS Mock Application | IOPS (I/Os per Second)
1 KB | 2.543 | 393.236
2 KB | 5.933 | 168.548
4 KB | 6.896 | 145.011
8 KB | 10.902 | 91.726
16 KB | 18.152 | 55.090
32 KB | 153.126 | 6.530
64 KB | 192.224 | 5.202
128 KB | 192.103 | 5.205
256 KB | 576.096 | 1.736
512 KB | 996.854 | 1.003
1024 KB | 1614.082 ms (~1.614 sec) | 0.619
2048 KB | 3615.275 ms (~3.615 sec) | 0.276
Table 3.3.2.1 – READ Command Timings tSAS Mock app at 10 Mbps
[Chart: iSCSI vs tSAS READ Completion Time at 10 Mbps; x-axis: Transfer Size (Kilobytes), y-axis: Time (Milliseconds); series: tSAS READ Completion Time, iSCSI READ Completion Time.]
Figure 3.3.2.0– iSCSI vs tSAS Read Completion Time at 10 Mbps.
Looking at the chart above, tSAS performs better than iSCSI. One also observes that at small
READ transfer sizes, iSCSI and tSAS perform similarly at 10 Mbps. At larger transfer sizes,
however, tSAS is visibly faster than iSCSI.
WRITE
Transfer Length (KB) | Average Time from Write Command to Completion (ms), iSCSI Mock Application | IOPS (I/Os per Second)
1 KB | 13.968 | 71.592
2 KB | 14.909 | 67.073
4 KB | 16.867 | 59.287
8 KB | 20.078 | 49.805
16 KB | 27.365 | 36.543
32 KB | 505.044 | 1.980
64 KB | 710.429 | 1.407
128 KB | 1572.559 ms (~1.572 sec) | 0.636
256 KB | 3380.042 ms (~3.380 sec) | 0.256
512 KB | 6886.112 ms (~6.886 sec) | 0.145
1024 KB | 1431.612 ms (~14.316 sec) | 0.698
2048 KB | 1977.700 ms (~19.777 sec) | 5.056
Table 3.3.2.2 – WRITE Command Timings iSCSi Mock app at 10 Mbps
Transfer Length (KB) | Average Time from Write Command to Completion (ms), tSAS Mock Application | IOPS (I/Os per Second)
1 KB | 4.892 | 204.415
2 KB | 5.849 | 170.969
4 KB | 8.054 | 91.089
8 KB | 10.979 | 91.083
16 KB | 17.984 | 55.605
32 KB | 233.573 | 4.281
64 KB | 614.819 | 1.626
128 KB | 1584.924 ms (~1.584 sec) | 0.631
256 KB | 3540.684 ms (~3.540 sec) | 0.282
512 KB | 6684.609 ms (~6.684 sec) | 0.149
1024 KB | 1245.677 ms (~12.456 sec) | 0.803
2048 KB | 1772.838 ms (~17.728 sec) | 0.564
Table 3.3.2.3 – WRITE Command Timings tSAS Mock app at 10 Mbps
[Chart: tSAS vs iSCSI Write at 10 Mbps; x-axis: Transfer Size (Kilobytes), y-axis: Time (Milliseconds); series: tSAS Write 10 Mbps, iSCSI Write 10 Mbps.]
Figure 3.3.2.1– iSCSI vs tSAS Write Completion Time at 10 Mbps.
Looking at the chart above, tSAS performs better than iSCSI. One also observes that at small
WRITE transfer sizes, iSCSI and tSAS perform similarly. At larger transfer sizes, however,
tSAS is visibly faster than iSCSI.
100 Mbps
Transfer Length (KB) | Average Time from Read Command to Completion (ms), iSCSI Mock Application | IOPS (I/Os per Second)
1 KB | 1.996 | 501.002
2 KB | 2.692 | 371.471
4 KB | 2.579 | 387.747
8 KB | 3.093 | 323.310
16 KB | 3.802 | 263.092
32 KB | 15.001 | 66.662
64 KB | 17.193 | 58.163
128 KB | 35.913 | 27.845
256 KB | 82.172 | 12.169
512 KB | 115.905 | 8.627
1024 KB | 311.735 | 3.208
2048 KB | 577.684 | 1.731
Table 3.3.2.4 – READ Command Timings iSCSI Mock app at 100 Mbps
Transfer Length (KB) | Average Time from Read Command to Completion (ms), tSAS Mock Application | IOPS (I/Os per Second)
1 KB | 1.984 | 504.032
2 KB | 2.595 | 385.256
4 KB | 2.543 | 393.236
8 KB | 2.979 | 225.683
16 KB | 3.968 | 252.016
32 KB | 14.330 | 69.783
64 KB | 17.302 | 57.796
128 KB | 41.583 | 24.048
256 KB | 74.161 | 13.484
512 KB | 106.293 | 9.408
1024 KB | 251.228 | 3.980
2048 KB | 569.294 | 1.756
Table 3.3.2.5 – READ Command Timings tSAS Mock app at 100 Mbps
[Chart: tSAS vs iSCSI Read at 100 Mbps; x-axis: Transfer Length (Kilobytes), y-axis: Time (Milliseconds); series: tSAS Read 100 Mbps, iSCSI Read 100 Mbps.]
Figure 3.3.2.2– iSCSI vs tSAS Read Completion Time at 100 Mbps.
Looking at the chart above in Figure 3.3.2.2, tSAS performs better than iSCSI for all transfer
sizes captured.
Transfer Length (KB) | Average Time from Write Command to Completion (ms), iSCSI Mock Application | IOPS (I/Os per Second)
1 KB | 14.559 | 68.686
2 KB | 15.203 | 65.776
4 KB | 14.716 | 67.953
8 KB | 15.030 | 66.533
16 KB | 22.011 | 45.431
32 KB | 25.735 | 38.857
64 KB | 55.918 | 17.883
128 KB | 110.481 | 9.051
256 KB | 193.932 | 5.156
512 KB | 272.651 | 3.667
1024 KB | 350.924 | 2.849
2048 KB | 772.876 | 1.294
Table 3.3.2.6 – WRITE Command Timings iSCSI Mock app at 100 Mbps
Transfer Length (KB) | Average Time from Write Command to Completion (ms), tSAS Mock Application | IOPS (I/Os per Second)
1 KB | 2.699 | 0.370
2 KB | 2.864 | 349.162
4 KB | 2.647 | 377.786
8 KB | 3.285 | 304.414
16 KB | 3.832 | 260.960
32 KB | 5.480 | 182.481
64 KB | 40.484 | 24.701
128 KB | 66.802 | 14.969
256 KB | 161.243 | 6.201
512 KB | 272.125 | 3.674
1024 KB | 394.083 | 2.537
2048 KB | 761.236 | 1.313
Table 3.3.2.7 – WRITE Command Timings tSAS Mock app at 100 Mbps
[Chart: tSAS vs iSCSI Write at 100 Mbps; x-axis: Transfer Length (KB), y-axis: Time (Milliseconds); series: tSAS Write 100 Mbps, iSCSI Write 100 Mbps.]
Figure 3.3.2.3– iSCSI vs tSAS Write Completion Time at 100 Mbps.
Looking at the chart above, tSAS performs better than iSCSI overall.
1 Gbps:
Transfer Length (KB) | Average Time from Read Command to Completion (ms), iSCSI Mock Application | IOPS (I/Os per Second)
1 KB | 1.999 | 500.250
2 KB | 1.231 | 812.347
4 KB | 1.227 | 814.995
8 KB | 1.436 | 696.378
16 KB | 1.338 | 747.384
32 KB | 1.795 | 557.103
64 KB | 2.401 | 416.493
128 KB | 4.264 | 234.521
256 KB | 7.072 | 141.402
512 KB | 12.395 | 80.677
1024 KB | 24.880 | 40.193
2048 KB | 44.383 | 22.531
Table 3.3.2.8 – READ Command Timings iSCSI Mock app at 1000 Mbps (1 Gbps)
Transfer Length (KB) | Average Time from Read Command to Completion (ms), tSAS Mock Application | IOPS (I/Os per Second)
1 KB | 1.976 | 506.073
2 KB | 1.507 | 663.570
4 KB | 1.695 | 589.970
8 KB | 1.251 | 799.360
16 KB | 1.247 | 801.924
32 KB | 1.708 | 585.480
64 KB | 2.627 | 380.662
128 KB | 4.467 | 223.863
256 KB | 7.755 | 128.949
512 KB | 13.054 | 76.605
1024 KB | 23.683 | 42.224
2048 KB | 40.627 | 24.614
Table 3.3.2.9 – READ Command Timings tSAS Mock app at 1000 Mbps (1 Gbps)
[Chart: tSAS vs iSCSI Read at 1 Gbps; x-axis: Transfer Length (KB), y-axis: Time (Milliseconds); series: tSAS Read 1 Gbps, iSCSI Read 1 Gbps.]
Figure 3.3.2.4– iSCSI vs tSAS Read Completion Time at 1000 Mbps.
Looking at the chart above, tSAS performs better than iSCSI. At smaller transfer sizes tSAS and
iSCSI perform similarly, while at larger transfer sizes tSAS is visibly faster than iSCSI.
Transfer Length (KB) | Average Time from Write Command to Completion (ms), iSCSI Mock Application | IOPS (I/Os per Second)
1 KB | 1.469 | 680.735
2 KB | 1.503 | 665.335
4 KB | 1.462 | 683.994
8 KB | 11.528 | 86.745
16 KB | 12.212 | 81.886
32 KB | 13.088 | 76.406
64 KB | 15.727 | 63.584
128 KB | 16.928 | 63.584
256 KB | 18.630 | 53.676
512 KB | 27.883 | 35.864
1024 KB | 48.535 | 20.603
2048 KB | 75.057 | 13.323
Table 3.3.2.10 – WRITE Command Timings iSCSI Mock app at 1000 Mbps (1 Gbps)
Transfer Length (KB) | Average Time from Write Command to Completion (ms), tSAS Mock Application | IOPS (I/Os per Second)
1 KB | 1.347 | 742.390
2 KB | 1.406 | 711.237
4 KB | 1.497 | 668.002
8 KB | 1.772 | 564.334
16 KB | 1.523 | 656.598
32 KB | 2.418 | 413.564
64 KB | 2.654 | 376.789
128 KB | 3.754 | 266.382
256 KB | 6.882 | 145.306
512 KB | 14.168 | 70.581
1024 KB | 27.538 | 36.313
2048 KB | 46.258 | 21.617
Table 3.3.2.11 – WRITE Command Timings tSAS Mock app at 1000 Mbps (1 Gbps)
[Chart: tSAS vs iSCSI Write at 1 Gbps; x-axis: Transfer Size (KB), y-axis: Time (Milliseconds); series: tSAS Write 1 Gbps, iSCSI Write 1 Gbps.]
Figure 3.3.2.5– iSCSI vs tSAS Write Completion Time at 1000 Mbps.
tSAS performs visibly better than iSCSI on both small and large transfer sizes for WRITEs at
1 Gbps.
From the data collected on the tSAS mock application and the iSCSI mock application, the
following conclusions can be drawn:
1. tSAS performs better than iSCSI overall at all transfer sizes, regardless of the speed of the
connection between the initiator and the target. The reason for this can be attributed to the
fact that the REQUEST, TRANSFER READY (XFER_RDY for SAS) and RESPONSE frame sizes are
smaller in tSAS than the REQUEST, TRANSFER READY (R2T) and RESPONSE frame sizes in iSCSI.
In other words, the overhead in tSAS is smaller than the overhead in iSCSI.
2. At smaller transfer sizes, the performance of iSCSI and tSAS is very comparable, with tSAS
performing slightly better than iSCSI. However, as transfer sizes get larger, tSAS performs
visibly better than iSCSI.
3. Overall, WRITE performance is poorer than READ performance in both tSAS and iSCSI. This
can be attributed to the fact that more handshaking occurs for WRITEs than for READs: on
WRITEs, the initiator needs to wait for the transfer ready (XFER_RDY or R2T) frame before
sending data (a sketch of this exchange is shown after this list).
4. For better performance, it may be best to use smaller transfer sizes, since at larger transfer
sizes the error rate and retransmission of packets on TCP is higher, based on the Wireshark
traces collected.
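To illustrate the extra round trip on WRITEs mentioned in point 3, the sketch below shows a possible client-side WRITE flow in C/Winsock. The frame layout and encoding are the same illustrative assumptions used in the earlier connection sketch and are not taken from the project code or any specification; the point is only that the wait for the transfer-ready frame adds a round trip that a READ does not have.

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

/* Same hypothetical request frame as in the earlier connection sketch. */
typedef struct {
    unsigned char opcode;          /* 0 = READ, 1 = WRITE (assumed encoding) */
    unsigned int  transfer_len_kb;
} tsas_request;

/* Hypothetical WRITE flow for the mock client; s is an already connected socket. */
static int mock_write_io(SOCKET s, const char *data, int len)
{
    tsas_request req;
    char frame[64];
    int n;

    req.opcode = 1;                              /* WRITE                        */
    req.transfer_len_kb = (unsigned int)(len / 1024);
    send(s, (const char *)&req, sizeof(req), 0); /* 1. send the WRITE request    */

    n = recv(s, frame, sizeof(frame), 0);        /* 2. wait for transfer ready   */
    if (n <= 0)                                  /*    (XFER_RDY / R2T) - the    */
        return -1;                               /*    extra round trip          */

    send(s, data, len, 0);                       /* 3. send the write data       */
    n = recv(s, frame, sizeof(frame), 0);        /* 4. wait for final RESPONSE   */
    return (n > 0) ? 0 : -1;
}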
B. Next, we will look at how tSAS performs at different connection speeds for a fixed transfer
size.
The below graph in Figure 3.3.2.6 compares tSAS READ performance at varying
connection speeds for a 2K Transfer size.
[Chart: Time for READ Completion for a transfer size of 2K at 10 Mbps, 100 Mbps and 1 Gbps; x-axis: Connection Rate (Mbps), y-axis: Time (Milliseconds).]
Figure 3.3.2.6–tSAS READ Completion Time for a transfer size of 2K at 10 Mbps, 100
Mbps and 1 Gbps.
The below graph in Figure 3.3.2.7 compares tSAS READ performance at varying connection
speeds for a 16K Transfer size.
[Chart: Time for READ Completion with a transfer size of 16K at 10 Mbps, 100 Mbps and 1 Gbps; x-axis: Connection Rate (Mbps), y-axis: Time (Milliseconds).]
Figure 3.3.2.7–tSAS READ Completion Time for a transfer size of 16K at 10 Mbps, 100
Mbps and 1 Gbps.
The below graph in Figure 3.3.2.8 compares tSAS READ performance at varying connection
speeds for a 512K Transfer size.
[Chart: Time for READ Completion with a transfer size of 512K at 10 Mbps, 100 Mbps and 1 Gbps; x-axis: Connection Rate (Mbps), y-axis: Time (Milliseconds).]
Figure 3.3.2.8–tSAS READ Completion Time for a transfer size of 512K at 10 Mbps, 100
Mbps and 1 Gbps.
As can be seen from the graphs above, performance improves drastically from 10 Mbps to 1
Gbps. With 40 Gbps and 100 Gbps Ethernet soon to be available [36], tSAS performance should
eventually outperform SAS. From a performance analysis done by NetApp on 1 Gbps and 10 Gbps
Ethernet server scalability [35], one can infer that 10 Gbps can perform 4.834 times better than
1 Gbps on the wire. Therefore, 40/100 Gigabit Ethernet is recommended for a tSAS solution to
obtain faster speeds and better performance.
C. Next, we will compare tSAS to legacy iSCSI and legacy SAS.
Comparing tSAS results at 1 Gbps to legacy SAS by looking at performance numbers between
the HBA and the expander:
As mentioned in section 3.3.0.1 and as seen in Tables 3.3.0.1.0 and 3.3.0.1.1, the delay between
the HBA and the expander is on the order of microseconds (less than a millisecond for all
transfer sizes between 1K and 2048K). Comparing this to our tSAS mock application performance,
we can easily see that tSAS is much slower than legacy SAS between an HBA and an expander.
Since tSAS could be used between an HBA and an expander, this is in principle a valid comparison
of tSAS to legacy SAS. However, without a solution where tSAS is implemented in hardware in a
tSAS HBA, it is not fair to compare our tSAS results to legacy SAS between the HBA and the
expander. Therefore, it is best to stick with the comparison of tSAS with the iSCSI mock
application itself.
Comparing tSAS results at 1 Gbps to legacy iSCSI without the delay at the SCSI drive:
It is not fair to compare the tSAS numbers with the iSCSI numbers we got using the StarWind
iSCSI Initiator and KernSafe iSCSI target (Tables 3.3.1.4 and 3.3.1.5). tSAS outperforms the
numbers we got using legacy iSCSI, but our tSAS implementation is not a full implementation of
a tSAS software Initiator or Target. Therefore, it is best to stick with the comparison of tSAS
with the iSCSI mock application itself.
Comparing tSAS to legacy SAS and legacy iSCSI will be left as future work when a tSAS solution is
implemented in a SAS HBA.
4.0 Similar Work
1. Michael Ko's patent on Serial Attached SCSI over Ethernet proposes a very similar solution
to the tSAS solution provided in this project.
2. The iSCSI specification (SCSI over TCP) itself is similar to a tSAS solution (SAS over TCP).
The iSCSI solution can be heavily leveraged for a tSAS solution.
3. The Fibre Channel over TCP/IP specification can also be leveraged to design and implement
a tSAS solution [31].
5.0 Future Direction
1. The tSAS mock application can be run using a faster switch with a connection rate of
10 Gbps to get more data points.
2. The tSAS mock application can be designed to use piggy-backing, where the SSP Read
Response frame from the target is piggy-backed with the last DATA frame sent by the target.
Also, a DATA frame can be piggy-backed with an SSP Write Request. This may slightly improve
READ and WRITE performance.
3. Jumbo frames can be used to increase the amount of DATA that is passed between the
initiator and target per Ethernet packet, improving the performance results.
4. Using an existing Generation 3 SAS HBA and expanders that have an Ethernet port,
read/write commands can be implemented on the expander and the HBA such that they are
sent via TCP. This can be used to benchmark tSAS and further assess its feasibility. An
embedded TCP/IP stack such as lwIP can be used to implement this [33].
5. The Storage Associations can be motivated with the results of this project to work on a
tSAS specification.
6.0 Conclusion (Lessons learned)
Overall, tSAS is a viable solution. tSAS will be faster than a similar iSCSI implementation due to
the frame sizes (Request, Response and Transfer Ready) in tSAS being smaller than frame sizes
(Request, Response and Ready to Transfer) in iSCSI. In other words, the overhead in tSAS is
smaller than the overhead in iSCSI. Also, in a tSAS topology the back-end will always be a legacy
SAS drive as opposed to iSCSI where the back-end may be a SCSI drive which is much slower
than a SAS drive.
At smaller transfer sizes, the performance of a tSAS and iSCSI solution may be very similar with
tSAS performing slightly better than iSCSI. However, at larger transfer sizes, tSAS should be a
better solution improving the overall performance of a storage system.
For tSAS to outperform a typical SAS solution today, a hardware (HBA) implementation of tSAS
should be used to increase performance. A software solution of tSAS may not be a good choice
if the aim is to beat the performance of legacy SAS. However, with 40G/100G Ethernet on the
horizon [36], a software solution of tSAS could provide both good performance and a cheaper
solution. tSAS can also make use of jumbo frames to increase performance.
From a pure interest of overcoming the distance limitation of legacy SAS, tSAS is an excellent
solution since it sends SAS packets over TCP.
7.0 References
[1] T10/1760-D Information Technology – Serial Attached SCSI – 2 (SAS-2),
T10, 18 April 2009,
Available from http://www.t10.org/drafts.htm#SCSI3_SAS
[2] Harry Mason,
Serial attached SCSI Establishes its Position in the Enterprise,
LSI Corporation,
available from http://www.scsita.org/aboutscsi/sas/6GbpsSAS.pdf
[3] http://www.scsilibrary.com/
[4] http://www.scsifaq.org/scsifaq.html
[5] Kenneth Y. Yun ; David L. Dill;
A High-Performance Asynchronous SCSI Controller,
available from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=528789
[6] http://www.t10.org/scsi-3.htm
[7] Sarah Summers,
Secure asymmetric iScsi system for online storage, 2008,
University of Colorado, Colorado Springs,
available from http://www.cs.uccs.edu/~gsc/pub/master/sasummer/doc/
[8] SCSI Architecture Model - 5 (SAM-5), Revision 21, T10, 2011/05/12,
available from
http://www.t10.org/members/w_sam5.htm
[9] SCSI Primary Commands - 4 (SPC-4), Revision 31, T10, 2011/06/13,
available from http://www.t10.org/members/w_spc4.htm
[10] Marc Farley, Storage Networking Fundamentals: An Introduction to Storage Devices,
Subsystems, Applications,Management, and File Systems, Cisco Press, 2005, ISBN 1-587051621
[11] Huseyin Simitci; Chris Malakapalli; Vamsi Gunturu;
Evaluation of SCSI Over TCP/IP and SCSI Over Fibre Channel Connections,
XIOtech Corporation,
available from
http://www.computer.org/portal/web/csdl/abs/proceedings/hoti/2001/1357/00/13570087abs.htm
[12] Harry Mason,
SCSI, the Industry Workhorse, Is Still Working Hard, Dec 2000,
SCSI Trade Association
available from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=889098&tag=1
[13] Mark S. Kolich, Basics of SCSI: Firmware Applications and Beyond,
Computer Science Department, Loyola Marymount University, Los Angeles,
available from http://mark.koli.ch/2008/10/25/CMSI499_MarkKolich_SCSIPaper.pdf
[14] Prasenjit Sarkar; Kaladhar Voruganti, IP Storage: The Challenge Ahead,
IBM Almaden Research Center, San Jose, CA,
available from http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.8984
[15] Prasenjit Sarkar; Sandeep Uttamchandani; Kaladhar Voruganti, Storage over IP: When Does
Hardware Support help?, 2003
IBM Almaden Research Center, San Jose, California
available from http://dl.acm.org/citation.cfm?id=1090723
[16] A. Benner, "Fibre Channel: Gigabit Communications and I/O for Computer
Networks", McGraw-Hill, 1996.
[17] Infiniband Trade Association
available from http://www.infinibandta.org
[18] K.Voruganti; P. Sarkar, An Analysis of Three Gigabit Networking Protocols for
Storage Area Networks’, 20th IEEE International Performance, Computing, and
Communications Conference”, April 2001,
available from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=918661&tag=1
[19] Kalmath Meth; Julian Satran, Features of the iSCSI Protocol, August 2003, IBM Haifa
Research Lab
available from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1222720
[20] Yingping Lu; David H. C. Du, Performance Study of iSCSI-Based Storage Subsystems, IEEE
Communications Magazine, August 2003, pp 76-82.
[21] Integration Scenarios for iSCSI and Fibre Channel,
available from
http://www.snia.org/forums/ipsf/programs/about/isci/iSCSI_FC_Integration_IPS.pdf
[22] Irina Gerasimov; Alexey Zhuravlev; Mikhail Pershin; Dennis V. Gerasimov, Design and
Implementation of a Block Storage Multi-Protocol Converter, Proceedings of the 20th
IEEE/11th NASA Goddard Conference of Mass Storage Systems and Technologies (MSS‟03)
available from http://storageconference.org/2003/papers/26-Gerasimov-Design.pdf
[23] Internet Small Computer Systems Interface (iSCSI), http://www.ietf.org/rfc/rfc3720.txt
[24] Yingping Lu; David H. C. Du, Performance Study of iSCSI-Based
Storage Subsystems, University of Minnesota, Aug 2003,
available from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1222721
[25] Cai, Y.; Fang, L.; Ratemo, R.; Liu, J.; Gross, K.; Kozma, M.; A test case for 3Gbps serial
attached SCSI (SAS) Test Conference, 2005. Proceedings. ITC 2005. IEEE International, February
2006, available from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1584027
[26] Rob Eliot, Serial Attached SCSI, HP Industry Standard Servers, Server Storage Advanced
Technology , 30 September 2003
available from
http://www.scsita.org/sas_library/tutorials/SAS_General_overview_public.pdf
[27] Michael A. Ko, LAYERING SERIAL ATTACHED SMALL COMPUTER SYSTEM INTERFACE (SAS)
OVER ETHERNET, United States Patent Application 20080228897, 09/18/2008
available from http://www.faqs.org/patents/app/20080228897
[28] Mathew R. Murphy, iSCSI-based Storage Area Networks for Disaster
Recovery Operations, The Florida State University, College of engineering, 2005,
available from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.8245
[29] “Increase Performance of Network-Intensive Applications with TCP/IP
Offload Engines (TOEs),” Adaptec, Inc. White Paper, May 2003
available from
http://www.probsolvesolutions.co.uk/solutions/white_papers/adaptec/NAC_TechPaper2.pdf
[30] IEEE P802.3ba 40Gb/s and 100Gb/s Ethernet Task Force
available from http://www.ieee802.org/3/ba/
[31] M. Rajagopal; E. Rodriguez; R. Weber; Fibre Channel Over TCP/IP, Network Working Group,
July 2004, available from http://rsync.tools.ietf.org/html/rfc3821
[32] IOMeter Users Guide, Version 2003.12.16 available from
http://www.iometer.org/doc/documents.html
[33] The lwIP TCP/IP stack,
available from http://www.sics.se/~adam/lwip/
[34] 29West Messaging Performance on 10-Gigabit Ethernet, September 2008,
available from
http://www.cisco.com/web/strategy/docs/finance/29wMsgPerformOn10gigtEthernet.pdf
[35] 1Gbps and 10Gbps Ethernet
Server Scalability, NetApp,
available from http://partners.netapp.com/go/techontap/matl/downloads/redhatneterion_10g.pdf
[36] John D. Ambrosia, 40 gigabit Ethernet and 100 Gigabit Ethernet: The development of a
flexible architecture
available from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4804384
[37] Henk Vandenbergh, VDBench Users Guide, Version 5.00, October 2008
available from
http://iweb.dl.sourceforge.net/project/vdbench/vdbench/Vdbench%205.00/vdbench.pdf
8.0 Appendix
8.1 How to run the tSAS and iSCSI mock initiator (client) and target (server) application
Since the applications were written in C using Microsoft Visual Studio 2008 Professional Edition,
you will need to have Visual Studio 2008 installed on the system where you would like to run
these applications. Client.exe and Server.exe files are provided for both tSAS and iSCSI in the
code directory of this project.
Please run the Server.exe and Client.exe programs for either iSCSI or tSAS after installing the
tSAS and iSCSI client and server applications using the Windows Installer Packages provided for
both the tSAS and iSCSI mock applications:
1. You will see the following screens when you run the Server.exe and Client.exe files
respectively.
2. Enter the IP address of your server/target to see the following output screens
3. Select if you would like to test READs/WRITEs along with the transfer size to see the
output of the test results
8.2 How to run the iSCSI Initiator and iSCSI Target Software
1. The StarWind iSCSI Initiator was used for this project.
a. You may download the StarWind iSCSI Initiator software for free from
http://www.starwindsoftware.com/iscsi-initiator
b. After installing the software, please refer to the "Using as iSCSI Initiator" PDF file
included at http://www.starwindsoftware.com/iscsi-initiator.
2. The KernSafe iSCSI Target was used to create an iSCSI Target.
a. You may download the iSCSI target software (KernSafe iStorage Server) from
http://www.kernsafe.com/product/istorage-server.aspx.
b. After installing and running it, please click on Create Target to create a target and
specify the type of target you would like to create, as well as its security
specifications.
8.3 How to run LeCroy SAS Analyzer Software
The LeCroy SAS Analyzer software can be downloaded from
http://lecroy.ru/protocolanalyzer/protocolstandard.aspx?standardID=7
You can open the SAS Analyzer Traces provided in the SAS Analyzer Traces folder with this
software.
Running the menu Report->Statistical Report will give you the Average Completion time of IOs
and other useful information.
The SAS Analyzer traces are located in the project deliverable folder.
8.4 WireShark to view the WireShark traces
The Wireshark Network analyzer Software can be downloaded from http://www.wireshark.org/
This software will let you capture and view the WireShark traces provided with this project. The
WireShark traces are located in the project deliverable folder.
8.5 VDBench for Linux
VDBench can be downloaded from http://sourceforge.net/projects/vdbench/
After installing VDBench on Linux, you may use a script similar to the one below to run IOs and
look at the performance results.
* Storage definition: the target LUN under test
sd=s1,lun=/dev/sdb,align=4096,openflags=o_direct
* Workload definition: 2048 KB sequential (seekpct=0) writes (rdpct=0)
wd=wd1,sd=(s1),xfersize=2048KB,seekpct=0,rdpct=0
* Run definition: maximum I/O rate, queue depth of 1 (forthreads=1), 300-second run, 1-second reporting interval
rd=rd1,wd=wd1,iorate=max,forthreads=1,elapsed=300,interval=1
lun=/dev/sdb simply states the target you are testing.
xfersize is used to change the transfer size [37].
seekpct=0 states that all IOs are sequential [37].
forthreads=1 states that the queue depth, or number of outstanding IOs, is 1 [37].
interval=1 will simply display/update the performance results on the screen every second
[37].
For additional information on each field and on additional fields, please refer to the VDBench
user guide [37].
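For the READ measurements, a similar parameter file can be used. The sketch below is an assumed read-side variant of the script above (same LUN and run settings); it simply sets rdpct=100 so that 100% of the IOs are sequential reads [37].

* Storage definition: the same target LUN
sd=s1,lun=/dev/sdb,align=4096,openflags=o_direct
* Workload definition: 2048 KB sequential (seekpct=0) reads (rdpct=100)
wd=wd1,sd=(s1),xfersize=2048KB,seekpct=0,rdpct=100
* Run definition: maximum I/O rate, queue depth of 1, 300-second run, 1-second reporting interval
rd=rd1,wd=wd1,iorate=max,forthreads=1,elapsed=300,interval=1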
8.6 IOMeter for Windows
IOMeter can be downloaded from http://www.iometer.org/
Please refer to the user guide at http://www.iometer.org/doc/documents.html to use IOMeter.