Supported by NSF ITR-0312376, NSF EIN-0335190,
and DOE DE-FG02-04ER25640 grants
A Study of Applications for Optical Circuit-Switched Networks
Xiuduan Fang
May 1, 2006

Outline
Introduction
CHEETAH Background
― CHEETAH concept and network
― CHEETAH end-host software
Analytical Models of GMPLS Networks
Application (App) I: Web Transfer App
App II: Parallel File Transfers
Summary and Conclusions

Introduction
Many optical connection-oriented (CO) testbeds
― E.g., CANARIE's CA*net 4, UKLight, and CHEETAH
― Primarily designed for e-Science apps
Use Generalized Multiprotocol Label Switching (GMPLS)
― Immediate request, call blocking
Motivation: extend these GMPLS networks to millions of users
Problem Statement
― What apps are well served by GMPLS networks?
― Design apps to use GMPLS networks efficiently

Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH)
Designed as an "add-on" service to the Internet and leverages the services of the Internet
[Figure: CHEETAH concept. Each end host has two NICs: NIC I connects through an IP router to the packet-switched Internet, and NIC II connects through an Ethernet-SONET gateway to the optical circuit-switched CHEETAH network.]

CHEETAH Network
[Figure: CHEETAH network topology. Site labels: NYC, WASH, NC, ORNL (TN), Atlanta (GA), UVa, NCSU, CUNY, MCNC. Equipment labels: Sycamore SN16000 switches, HOPI Force10, Abilene T640, UVa Catalyst 4948, NCSU M20, CUNY Foundry, Centuar FastIron FESX448, MCNC Catalyst 7600. Host labels: mvstu6, CUNY host, wukong, zelda1 through zelda5. Link legend: direct fibers, VLANs, MPLS tunnels, OC-192 lambda, 1G.]

CHEETAH End-host Software
[Figure: each end host runs an application over TCP/IP and C-TCP, with NIC 1 facing the Internet and NIC 2 facing the CHEETAH network. The CHEETAH software on each host comprises an OCS client, a routing decision module, and an RSVP-TE client.]
OCS: Optical Connectivity Service
RD: Routing Decision
RSVP-TE: ReSerVation Protocol-Traffic Engineering
C-TCP: Circuit-TCP

Analytical Models of GMPLS Networks
Problem: what apps are suitable for GMPLS networks?
― Measure of suitability:
   ― Call-blocking probability, Pb
   ― Link utilization, U
― App properties:
   ― Per-circuit BW
   ― Call-holding time, 1/μ
Assumptions:
― Call arrival rate, λ (Poisson process)
― Single link
― Single class: all apps are of the same type
A link of capacity C; m circuits; per-circuit BW = C/m
m is a measure of high-throughput vs. moderate-throughput
For high-throughput (e.g., e-Science apps), m is small

BW sharing models
Two kinds of apps: whether 1/μ is dependent on C/m
― 1/μ is independent of C/m
― 1/μ is dependent on C/m
The Erlang-B formula
File size distribution: α: shape, k: scale
Crossover file size
[Figure: for each case, N sources (1 … N) share link L of capacity C; the diagram also shows a routing decision (RD) element.]

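For reference, the Erlang-B formula named on this slide is usually written as below, with offered load ρ = λ/μ in Erlangs and m circuits. The utilization expression is the standard carried-load definition and is an assumption here, since the slide text does not show its own equations. The summation index i is introduced only for illustration.

```latex
\[
P_b \;=\; \frac{\rho^{m}/m!}{\sum_{i=0}^{m} \rho^{i}/i!},
\qquad \rho = \frac{\lambda}{\mu},
\qquad
U \;=\; \frac{\rho\,(1 - P_b)}{m}
\]
```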
Numerical Results: 1/μ is independent of C/m
Two equations, four variables
Fix U and m, compute Pb and ρ

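The "two equations, four variables" step can be sketched numerically. The sketch below assumes the two equations are the Erlang-B formula and U = ρ(1 − Pb)/m, with ρ the offered load in Erlangs. The function names and the target U = 0.8 are illustrative choices, not taken from the slides.

```python
def erlang_b(rho: float, m: int) -> float:
    """Blocking probability for offered load rho (Erlangs) on m circuits.

    Uses the standard numerically stable recursion instead of factorials.
    """
    b = 1.0
    for n in range(1, m + 1):
        b = rho * b / (n + rho * b)
    return b


def utilization(rho: float, m: int) -> float:
    """Carried load per circuit (assumed definition): U = rho * (1 - Pb) / m."""
    return rho * (1.0 - erlang_b(rho, m)) / m


def solve_for_load(target_u: float, m: int, tol: float = 1e-9):
    """Fix U and m, then return (Pb, rho) by bisection on rho.

    Relies on utilization() increasing with rho for a fixed m.
    """
    if not 0.0 < target_u < 1.0:
        raise ValueError("target_u must be strictly between 0 and 1")
    lo, hi = 0.0, float(m)
    while utilization(hi, m) < target_u:  # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if utilization(mid, m) < target_u:
            lo = mid
        else:
            hi = mid
    rho = (lo + hi) / 2.0
    return erlang_b(rho, m), rho


if __name__ == "__main__":
    for m in (10, 100, 1000):
        pb, rho = solve_for_load(0.8, m)
        print(f"m={m:5d}  rho={rho:9.3f}  Pb={pb:.4%}")
```

Bisection is used because it needs no derivative and the bracket is easy to find; any root finder would serve equally well.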
Numerical Results: 1/μ is independent of C/m
[Plot annotations: m = 10, Pb = 23.62%; axis label 1/μ]
Conclusions: to get high U
― Small m (~10): high Pb, thus book-ahead or call queuing
― Large m (~1000): high ρ (ρ = N × per-host call arrival rate / μ), thus large N
― Intermediate m (~100): large 1/μ is preferred

Numerical Results: 1/μ is dependent on C/m, when α = 1.1, k = 1.25 MB
Conclusions: to get high U
― Small m (~10): high Pb, thus book-ahead or call queuing
― As m increases, N does not increase
― m = 100, to get U > 80%, Pb < 5%: crossover file size between 6 MB and 29 MB, thus 0.5 s < 1/μ < 2.3 s

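The link between the file-size bounds and the holding-time bounds can be checked with a short calculation. The sketch below assumes a link capacity of C = 10 Gb/s and decimal megabytes; neither is stated on the slide, and the function name is illustrative. Under those assumptions the results land close to the 0.5 s and 2.3 s figures quoted above.

```python
def holding_time_s(file_size_bytes: float, link_capacity_bps: float, m: int) -> float:
    """Time to move a file over one circuit of rate C/m, ignoring setup delay."""
    per_circuit_bps = link_capacity_bps / m
    return file_size_bytes * 8 / per_circuit_bps


if __name__ == "__main__":
    C = 10e9   # assumed link capacity in bit/s (not given on the slide)
    m = 100    # number of circuits, as on the slide
    for size_mb in (6, 29):
        t = holding_time_s(size_mb * 1e6, C, m)
        print(f"{size_mb} MB over C/m = {C / m / 1e6:.0f} Mb/s takes {t:.2f} s")
```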
Conclusions for Analysis
Ideal apps require BW on the order of one-hundredth the link capacity as per-circuit rate
Apps where 1/μ is independent of C/m
― long call-holding time is preferred
Apps where 1/μ is dependent on C/m
― need short call-holding time

APP I: Web Transfer App on CHEETAH
Why web transfer?
― Web-based apps are ubiquitous
― Based on the previous analysis, m = 100 is suitable for CHEETAH
Consists of a software package WebFT
― Leverages CGI for deployment without modifying web client and web server software
― Integrated with CHEETAH end-host software APIs to allow use of the CHEETAH network in a mode transparent to users

WebFT Architecture
[Figure: WebFT architecture. The web client side shows a web browser (e.g., Mozilla) and the WebFT receiver; the web server side shows a web server (e.g., Apache), CGI scripts (download.cgi and redirection.cgi), and the WebFT sender. The browser sends a URL and gets a response. WebFT sender and receiver use the CHEETAH end-host software APIs and daemons (OCS, RD, RSVP-TE, C-TCP). Control messages go via the Internet; data transfers go via a circuit.]

Experimental Testbed for WebFT
[Figure: zelda3 and wukong each connect through NIC I and IP routers to the Internet, and through NIC II to the CHEETAH network via Sycamore SN16000 switches at Atlanta, GA and MCNC, NC. Site labels: Atlanta, GA; NCSU.]
zelda3 and wukong: Dell machines running Linux FC3 and ext2/3, with RAID-0 SCSI disks
RTT between them: 24.7 ms on the Internet path, and 8.6 ms for the CHEETAH circuit
Apache HTTP Server 2.0 is loaded on zelda3

Experimental Results for WebFT
[Figure: the web page used to test WebFT]
Test parameters:
― Test.rm: 1.6 GB, circuit rate: 1 Gbps
Test results:
― Throughput: 680 Mbps, delay: 19 s

APP II: Parallel File Transfers on CHEETAH
Motivation: e-Science projects need to share large volumes of data (TB or PB)
Goal: achieve multi-Gb/s throughput
Two factors limit throughput
― TCP's congestion-control algorithm
― End-host limitations
Solutions to relieve end-host limitations
― Single-host solution
― Cluster solution, which has two variations
   ― General case: non-split source file
   ― Special case: split source file

General-Case Cluster Solution
[Figure: the original source file is split across hosts 1 … n; each host i transfers its part to host i' on the sink side (hosts 1' … n'); the parts are then assembled into the original file at the sink.]

Software Tools: GridFTP and PVFS2
GridFTP: a data-transfer protocol on the Grid
― Extends FTP by adding features for partial file transfer, multi-streaming and striping
― We mainly use the GridFTP striped transfer feature
PVFS: Parallel Virtual File System
― An open source implementation of a parallel file system
― Stripes a file across multiple I/O servers like RAID-0
― A second version: PVFS2

GridFTP striped transfer
[Figure: globus-url-copy drives a sending GridFTP server front end and a receiving GridFTP server front end. Sending data nodes S1 … Sn and receiving data nodes R1 … Rn each sit on a parallel file system holding blocks (Block 1, Block n+1, …). Sending data nodes initiate data connections to receiving nodes.]

General-Case Cluster Solution: Design
Steps | Approach | Pros. | Cons.
Splitting & Assembling | GridFTP partial file transfer | | Wastes disk space; performance overhead
Splitting & Assembling | Socket program | Avoids wasting disk space | Performance overhead
Splitting & Assembling | pvfs2-cp | Avoids wasting disk space |
Transferring | GridFTP partial file transfer | | Many independent transfers incurring much overhead to set up and release connections
Transferring | GridFTP striped transfer | A single file transfer |

General-Case Cluster Solution: Implementation
To get a high throughput, we need to make data nodes responsible for data blocks in their local disks
― Make PVFS2 and GridFTP have the same stripe pattern
Problems:
― PVFS2 1.0.1 does not provide a utility to inspect data distribution
― Data connections between sending and receiving nodes are random
[Figure: sending data nodes S1 … Sn and receiving data nodes R1 … Rn, each holding PVFS2 blocks (Block 1, Block n+1, …).]

Random data connections
[Figure: random data connections between sending data nodes S1 … Sn and receiving data nodes R1 … Rn, each holding PVFS2 blocks (Block 1, Block n+1, …).]

Implementation - Modifications to PVFS2
Goal: know a priori how a file is striped in PVFS2
Use the strace command to trace system calls made by pvfs2-cp
― pvfs2-fs-dump gives the (non-deterministic) I/O server order of file distribution
― pvfs2-cp ignores the -s option for configuring stripe size
Modify PVFS2 code
― For load balance, PVFS2 stripes files starting with a random server: jitter = (rand() % num_io_servers);
― Set jitter = -1 to get a fixed order of data distribution
― Change the default stripe size (original: 64 KBytes)

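The effect of the random starting server can be illustrated with a toy model of round-robin striping. This is not PVFS2 code: the function name and the choice to model the fixed order as starting at server 0 are assumptions made for illustration.

```python
import random


def stripe_layout(num_blocks: int, num_io_servers: int, start=None, rng=random):
    """Return the I/O server index that holds each block under round-robin striping.

    start=None models the original behaviour (a random first server, in the
    spirit of jitter = rand() % num_io_servers). Passing a fixed start models
    the modified, deterministic order; start=0 is an assumed choice.
    """
    if start is None:
        start = rng.randrange(num_io_servers)
    return [(start + block) % num_io_servers for block in range(num_blocks)]


if __name__ == "__main__":
    print("random start:", stripe_layout(8, 5))
    print("fixed start: ", stripe_layout(8, 5, start=0))
```

With a random start the layout is always a rotation of the fixed one, which is why the block-to-server mapping cannot be known in advance without either inspecting the file or pinning the start.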
Implementation - Modifications to GridFTP
Goal: use a deterministic matching sequence between sending and receiving data nodes
Method: modify the implementation of SPAS and SPOR commands
― SPAS: sort the list of host-port pairs based on the IP-address order for receiving data nodes
― SPOR: request sending data nodes to initiate data connections sequentially to receiving data nodes

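A minimal sketch of the deterministic matching idea follows. It is not GridFTP code. It assumes that "sequentially" means the i-th sending node connects to the i-th receiving node in the sorted list, and the function names and addresses are hypothetical.

```python
import ipaddress


def sort_receivers(host_ports):
    """Sort (ip, port) pairs by numeric IP address, then by port."""
    return sorted(host_ports, key=lambda hp: (ipaddress.ip_address(hp[0]), hp[1]))


def match_nodes(senders, receivers):
    """Pair the i-th sender with the i-th receiver in IP-address order."""
    ordered = sort_receivers(receivers)
    if len(senders) != len(ordered):
        raise ValueError("sketch assumes equal numbers of senders and receivers")
    return list(zip(senders, ordered))


if __name__ == "__main__":
    senders = ["S1", "S2", "S3"]
    receivers = [("10.0.0.10", 50002), ("10.0.0.6", 50002), ("10.0.0.9", 50002)]
    for sender, (ip, port) in match_nodes(senders, receivers):
        print(f"{sender} -> {ip}:{port}")
```

Sorting on parsed addresses rather than strings matters: a plain string sort would place 10.0.0.10 before 10.0.0.6. All addresses must be of the same IP version for the comparison to work.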
Experimental Results
Conducted on a 22-node cluster, sunfire
Reduced network-and-disk contention
Performance of PVFS2 implementation was poor

Summary and Conclusions
Analytical Models of GMPLS Networks
― Ideal apps require BW on the order of one-hundredth the link capacity as per-circuit rate
Application I: Web Transfer Application
― Provided deterministic data services to CHEETAH clients on dedicated end-to-end circuits
― No modifications to the web client and web server software by leveraging CGI
Application II: Parallel File Transfers
― Implemented a general-case cluster solution by using PVFS2 and GridFTP striped transfer
― Modified PVFS2 and GridFTP code to reduce network-and-disk contention

Publication Lists
M. Veeraraghavan, X. Fang, and X. Zheng, On the suitability of applications for GMPLS networks, submitted to IEEE Globecom 2006
X. Fang, X. Zheng, and M. Veeraraghavan, Improving web performance through new networking technologies, IEEE ICIW'06, February 23-25, 2006, Guadeloupe, French Caribbean

Future Work
Analytical Models of GMPLS Networks
― Multi-class
― Multiple links and network models
Application I: Web Transfer Application
― Design a Web partial CO transfer to enable non-CHEETAH hosts to use CHEETAH
― Connect multiple CO networks to further reduce RTT
Application II: Parallel File Transfers
― Test the general-case cluster solution on CHEETAH
― Work on PVFS2 or try GPFS to get a high I/O throughput

A Classification of Networks that Reflects Sharing Modes

The flow chart for the WebFT sender
Can the client be reached via the CHEETAH network (OCS)? No: return failure. Yes: continue.
Request a CHEETAH circuit (Routing Decision)? No: return failure. Yes: continue.
Set up a circuit (RSVP-TE client). Fail: return failure. Succeed: continue.
Send the file via C-TCP.
Release the circuit (RSVP-TE client).
Return success.

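The sender's decision sequence can be sketched as plain control flow. The callables below are hypothetical stand-ins for the OCS, routing decision, RSVP-TE, and C-TCP steps; they are not the CHEETAH end-host software APIs, whose signatures are not shown in these slides.

```python
from typing import Callable


def webft_send(
    reachable_via_cheetah: Callable[[], bool],  # stand-in for the OCS lookup
    request_circuit: Callable[[], bool],        # stand-in for the routing decision
    setup_circuit: Callable[[], bool],          # stand-in for RSVP-TE setup
    send_via_ctcp: Callable[[], None],          # stand-in for the C-TCP transfer
    release_circuit: Callable[[], None],        # stand-in for RSVP-TE release
) -> bool:
    """Return True on success and False on any of the three failure exits."""
    if not reachable_via_cheetah():
        return False
    if not request_circuit():
        return False
    if not setup_circuit():
        return False
    try:
        send_via_ctcp()
    finally:
        # Design choice of this sketch: release the circuit even if the send raises.
        release_circuit()
    return True


if __name__ == "__main__":
    ok = webft_send(lambda: True, lambda: True, lambda: True,
                    lambda: print("sending file"), lambda: print("circuit released"))
    print("success" if ok else "failure")
```

The try/finally is an addition beyond the flow chart, which shows only the success path after setup; it keeps a failed transfer from leaving a circuit held.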
The WebFT Receiver
Integrates with the CHEETAH end-host software modules similar to the WebFT sender.
Runs as a daemon in the background on the client host to avoid manual intervention.
Also provides the WebFT sender a desired circuit rate.

Experimental Results for WebFT
PVFS2 Architecture
Experimental Configuration
Configuration of PVFS2 I/O servers
― The 1st PVFS2: sunfire1 through sunfire5
― The 2nd PVFS2: sunfire10, and sunfire6 through sunfire9
Configuration of GridFTP servers
― Sending front end: sunfire1 with data nodes sunfire1 through sunfire5
― Receiving front end: sunfire10 with data nodes sunfire10, and sunfire6 through sunfire9
GridFTP striped transfer
globus-url-copy -vb -dbg -stripe ftp://sunfire1:50001/pvfs2/test_1G ftp://sunfire10:50002/pvfs2/test_1G1 2>dbg1.txt

Four Conditions to Avoid Unnecessary Network-and-disk Contention
Know a priori how data are striped in PVFS2
PVFS2 I/O servers and GridFTP servers run on the same hosts
GridFTP stripes data across data nodes in the same sequence as PVFS2 does across PVFS2 I/O servers
GridFTP and PVFS2 have the same stripe size

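A toy model makes the last two conditions concrete: it measures what fraction of a file each GridFTP data node could read from its own local disk. The sketch assumes simple round-robin striping on both sides and identical host names for PVFS2 I/O servers and GridFTP data nodes; the function names are illustrative and nothing here calls PVFS2 or GridFTP.

```python
def node_for_offset(offset: int, stripe_size: int, node_order):
    """Node responsible for the byte at `offset` under round-robin striping."""
    return node_order[(offset // stripe_size) % len(node_order)]


def local_fraction(file_size, pvfs2_stripe, pvfs2_order, gridftp_block, gridftp_order):
    """Fraction of bytes whose GridFTP data node is also the PVFS2 server holding them."""
    local = 0
    offset = 0
    while offset < file_size:
        # Advance to the next boundary of either striping pattern.
        next_cut = min(
            (offset // pvfs2_stripe + 1) * pvfs2_stripe,
            (offset // gridftp_block + 1) * gridftp_block,
            file_size,
        )
        if (node_for_offset(offset, pvfs2_stripe, pvfs2_order)
                == node_for_offset(offset, gridftp_block, gridftp_order)):
            local += next_cut - offset
        offset = next_cut
    return local / file_size


if __name__ == "__main__":
    nodes = ["sunfire1", "sunfire2", "sunfire3", "sunfire4", "sunfire5"]
    shifted = nodes[1:] + nodes[:1]
    size = 10 * 65536
    print("same order, same stripe:", local_fraction(size, 65536, nodes, 65536, nodes))
    print("shifted order:          ", local_fraction(size, 65536, nodes, 65536, shifted))
```

When the sequence or the stripe size differs, part of every transfer has to be fetched from another node's disk over the cluster network, which is the contention the four conditions are meant to avoid.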
The Specific Cluster Solution for TSI
[Figure: labels include orbitty at NCSU (controller-0 (rudi), controller-1 (orbitty), compute0-0 through compute0-19, disk-0-0 through disk-4-0, Dell 5424 and Dell 5224 switches); zelda at ORNL (zelda1 through zelda5); CHEETAH; X1E at ORNL; LAN; X1E monitoring host.]

Numerical Results: 1/μ is dependent on C/m
Conclusions:
― Large m (~1000): does not increase N