High Energy Physics: Networks
& Grids Systems for Global Science
Harvey B. Newman
California Institute of Technology
AMPATH Workshop, FIU
January 31, 2003
Next Generation Networks for
Experiments: Goals and Needs
Large data samples explored and analyzed by thousands of
globally dispersed scientists, in hundreds of teams
 Providing rapid access to event samples, subsets
and analyzed physics results from massive data stores
 From Petabytes by 2002, ~100 Petabytes by 2007,
to ~1 Exabyte by ~2012.
 Providing analyzed results with rapid turnaround, by
coordinating and managing the large but LIMITED computing,
data handling and NETWORK resources effectively
 Enabling rapid access to the data and the collaboration
 Across an ensemble of networks of varying capability
 Advanced integrated applications, such as Data Grids,
rely on seamless operation of our LANs and WANs
 With reliable, monitored, quantifiable high performance
LHC: Higgs Decay into 4 Muons (Tracker Only); 1000X LEP Data Rate (+30 Minimum Bias Events)
[Event display: all charged tracks with pt > 2 GeV; reconstructed tracks with pt > 25 GeV]
10^9 events/sec; selectivity: 1 in 10^13 (1 person in 1000 world populations)
Transatlantic Net WG (HN, L. Price)
Bandwidth Requirements [*] (in Mbps)

            2001     2002     2003     2004     2005     2006
CMS          100      200      300      600      800     2500
ATLAS         50      100      300      600      800     2500
BaBar        300      600     1100     1600     2300     3000
CDF          100      300      400     2000     3000     6000
D0           400     1600     2400     3200     6400     8000
BTeV          20       40      100      200      300      500
DESY         100      180      210      240      270      300
CERN BW  155-310      622     2500     5000    10000    20000

[*] BW Requirements Increasing Faster Than Moore’s Law
See http://gate.hep.anl.gov/lprice/TAN
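A quick sanity check of the [*] footnote, as a hedged sketch: the compound annual growth implied by the CERN BW row is compared with Moore's Law read as a doubling every 18 months. The Moore's Law pace and the use of the midpoint of the 2001 range are assumptions for this illustration, not figures from the slide.

```python
# Check of the "[*] BW Requirements Increasing Faster Than Moore's Law" footnote.
# Assumptions (not from the slide): the 2001 CERN BW entry is taken at the midpoint
# of its 155-310 Mbps range, and "Moore's Law" is read as a doubling every 18 months.

cern_bw_2001_mbps = (155 + 310) / 2      # midpoint of the 2001 table entry
cern_bw_2006_mbps = 20000                # 2006 table entry
years = 2006 - 2001

bw_growth_per_year = (cern_bw_2006_mbps / cern_bw_2001_mbps) ** (1 / years)
moore_growth_per_year = 2 ** (1 / 1.5)   # doubling every 18 months

print(f"CERN BW requirement growth: ~{bw_growth_per_year:.1f}x per year")
print(f"Moore's Law pace:           ~{moore_growth_per_year:.1f}x per year")
```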
HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps

Year   Production                Experimental              Remarks
2001   0.155                     0.622-2.5                 SONET/SDH
2002   0.622                     2.5                       SONET/SDH; DWDM; GigE Integ.
2003   2.5                       10                        DWDM; 1 + 10 GigE Integration
2005   10                        2-4 X 10                  λ Switch; λ Provisioning
2007   2-4 X 10                  ~10 X 10; 40 Gbps         1st Gen. λ Grids
2009   ~10 X 10 or 1-2 X 40      ~5 X 40 or ~20 X 10       40 Gbps λ Switching
2011   ~5 X 40 or ~20-50 X 10    ~25 X 40 or ~100 X 10     2nd Gen λ Grids; Terabit Networks
2013   ~Terabit                  ~MultiTbps                ~Fill One Fiber

Continuing the Trend: ~1000 Times Bandwidth Growth Per Decade;
We are Rapidly Learning to Use and Share Multi-Gbps Networks
DataTAG Project
[Map: DataTAG transatlantic testbed linking STARLIGHT (Chicago) and GENEVA (CERN), with connections to Abilene, ESnet, CALREN-2, STAR-TAP and New York in the US, and to GEANT, SuperJANET4 (UK), GARR-B (IT), SURFnet (NL), and INRIA/VTHD/Atrium (FR) in Europe]
 EU-Solicited Project. CERN, PPARC (UK), Amsterdam (NL), and INFN (IT);
and US (DOE/NSF: UIC, NWU and Caltech) partners
 Main Aims:
 Ensure maximum interoperability between US and EU Grid Projects
 Transatlantic Testbed for advanced network research
 2.5 Gbps Wavelength Triangle from 7/02; to 10 Gbps Triangle by Early 2003
Progress: Max. Sustained TCP Thruput
on Transatlantic and US Links
 8-9/01    105 Mbps in 30 Streams: SLAC-IN2P3; 102 Mbps in 1 Stream: CIT-CERN
 11/5/01   125 Mbps in One Stream (modified kernel): CIT-CERN
 1/09/02   190 Mbps for One Stream shared on 2 155 Mbps links
 3/11/02   120 Mbps Disk-to-Disk with One Stream on 155 Mbps link (Chicago-CERN)
 5/20/02   450-600 Mbps SLAC-Manchester on OC12 with ~100 Streams
 6/1/02    290 Mbps Chicago-CERN One Stream on OC12 (mod. kernel)
 9/02      850, 1350, 1900 Mbps Chicago-CERN with 1, 2, 3 GbE Streams, OC48 Link
 11-12/02  FAST: 940 Mbps in 1 Stream SNV-CERN; 9.4 Gbps in 10 Flows SNV-Chicago
Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/;
and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
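One reason single-stream rates at this level required kernel modifications is the TCP window needed to keep a long transatlantic path full. A minimal bandwidth-delay-product sketch follows; the ~120 ms round-trip time is an assumed, illustrative value rather than a measurement quoted here.

```python
# Bandwidth-delay product: the TCP window needed to keep a long fat pipe full.
# The 120 ms RTT is an assumed, illustrative transatlantic value.

def required_window_bytes(throughput_bps: float, rtt_s: float) -> float:
    """Bytes a single TCP stream must keep in flight to sustain the given rate."""
    return throughput_bps * rtt_s / 8

rtt = 0.120  # seconds, assumed round-trip time
for mbps in (125, 290, 940):
    window = required_window_bytes(mbps * 1e6, rtt)
    print(f"{mbps:4d} Mbps over {rtt*1000:.0f} ms RTT needs ~{window/1e6:.1f} MB in flight")
```

The resulting windows, from a few MB up to ~14 MB, are far beyond the 64 KB ceiling of an unscaled TCP window, hence the modified kernels and large-window tuning noted above.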
FAST (Caltech): A Scalable, “Fair” Protocol
for Next-Generation Networks: from 0.1 To 100 Gbps

[Plot: SC2002 (11/02) FAST TCP runs with 1, 2 and 10 flows, compared with the Internet2 Land Speed Records of 29.3.00 (multiple streams), 9.4.02 (1 flow) and 22.8.02 (IPv6)]
[Map inset: Sunnyvale, Chicago, Baltimore and Geneva test paths, with segment lengths of ~1000 km, ~3000 km and ~7000 km]

Highlights of FAST TCP (SC2002):
 Standard Packet Size
 940 Mbps single flow/GE card: 9.4 petabit-m/sec, 1.9 times the LSR
 9.4 Gbps with 10 flows: 37.0 petabit-m/sec, 6.9 times the LSR
 22 TB in 6 hours, in 10 flows
Implementation:
 Sender-side (only) mods
 Delay (RTT) based
 Stabilized Vegas
[Diagram: the Internet as a distributed feedback system, with TCP sources and AQM links coupled through forward/backward delays Rf(s), Rb’(s) and prices p; theory compared with experiment]
Next: 10GbE; 1 GB/sec disk to disk
C. Jin, D. Wei, S. Low; FAST Team & Partners
URL: netlab.caltech.edu/FAST
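The sender-side, delay-based character of FAST can be illustrated with a toy window-update loop in the spirit of the published FAST/Stabilized-Vegas rule; this is a simplified sketch with illustrative parameter values, not the implementation distributed at netlab.caltech.edu/FAST.

```python
# Toy sketch of a delay-based (FAST/Vegas-style) congestion window update.
# Simplified from the published FAST TCP rule; gamma and alpha values are
# illustrative assumptions, not tuned production parameters.

def fast_window_update(w: float, base_rtt: float, rtt: float,
                       alpha: float = 200.0, gamma: float = 0.5) -> float:
    """One periodic update of the congestion window (in packets).

    base_rtt: minimum RTT observed (propagation delay estimate)
    rtt:      current RTT, which grows as queues build up
    alpha:    target number of packets the flow keeps queued in the network
    """
    target = (base_rtt / rtt) * w + alpha        # equilibrium-seeking term
    w_new = (1 - gamma) * w + gamma * target     # smoothed move toward it
    return min(2 * w, w_new)                     # never more than double per update

# Example: growth slows as queueing delay (rtt - base_rtt) appears.
w = 1000.0
for rtt_ms in (100.0, 100.0, 102.0, 105.0, 110.0):
    w = fast_window_update(w, base_rtt=100.0, rtt=rtt_ms)
    print(f"RTT {rtt_ms:5.1f} ms -> window {w:7.1f} packets")
```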
FAST TCP: Aggregate Throughput
 Standard MTU
 Utilization averaged over > 1hr
[Plot: aggregate FAST throughput for 1, 2, 7, 9 and 10 flows; average utilization ~95%, 92%, 90%, 90% and 88% respectively]
netlab.caltech.edu
HENP Lambda Grids:
Fibers for Physics
 Problem: Extract “Small” Data Subsets of 1 to 100 Terabytes
from 1 to 1000 Petabyte Data Stores
 Survivability of the HENP Global Grid System, with
hundreds of such transactions per day (circa 2007)
requires that each transaction be completed in a
relatively short time.
 Example: Take 800 secs to complete the transaction. Then:

   Transaction Size (TB)    Net Throughput (Gbps)
   1                        10
   10                       100
   100                      1000 (Capacity of Fiber Today)
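The table follows from simple arithmetic, sketched below for the 800-second example (with 1 TB taken as 10^12 bytes).

```python
# Net throughput needed to move a transaction of a given size in 800 seconds.
# 1 TB is taken here as 1e12 bytes.

TRANSACTION_TIME_S = 800

for size_tb in (1, 10, 100):
    bits = size_tb * 1e12 * 8
    gbps = bits / TRANSACTION_TIME_S / 1e9
    print(f"{size_tb:4d} TB in {TRANSACTION_TIME_S} s -> {gbps:,.0f} Gbps")
```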
 Summary: Providing Switching of 10 Gbps wavelengths
within ~3-5 years; and Terabit Switching within 5-8 years
would enable “Petascale Grids with Terabyte transactions”,
as required to fully realize the discovery potential of major HENP
programs, as well as other data-intensive fields.
Data Intensive Grids Now:
Large Scale Production
Efficient sharing of distributed heterogeneous compute
and storage resources
Virtual Organizations and Institutional resource sharing
Dynamic reallocation of resources to target specific problems
Collaboration-wide data access and analysis environments
Grid solutions NEED to be scalable & robust
 Must handle many petabytes per year
 Tens of thousands of CPUs
 Tens of thousands of jobs
Grid solutions presented here are supported in part by the GriPhyN, iVDGL, PPDG, EDG, and DataTAG projects
 We are learning a lot from these current efforts
 For Example: 1M Events Processed using VDT, Oct.-Dec. 2002
Grids NOW
Beyond Production: Web Services for
Ubiquitous Data Access and Analysis
by a Worldwide Scientific Community
 Web Services: easy, flexible,
platform-independent access to data
(Object Collections in Databases)
 Well-adapted to use by individual
physicists, teachers & students
 SkyQuery Example
JHU/FNAL/Caltech with Web Service
based access to astronomy surveys
 Can be individually or
simultaneously queried via Web
interface
 Simplicity of interface hides
considerable server power
(from stored procedures etc.)
 This is a “Traditional” Web Service,
with no user authentication required
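A “traditional” web service of this kind can be exercised with nothing more than an HTTP POST of a SOAP envelope; the endpoint, SOAPAction and query below are hypothetical placeholders for illustration, not the actual SkyQuery interface.

```python
# Minimal sketch of calling an unauthenticated SOAP-style web service over HTTP.
# The endpoint, SOAPAction and query payload are hypothetical placeholders,
# not the real SkyQuery interface.

import urllib.request

ENDPOINT = "http://example.org/skyservice.asmx"    # placeholder URL
SOAP_ACTION = "http://example.org/SubmitQuery"     # placeholder action

envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitQuery xmlns="http://example.org/">
      <query>SELECT TOP 10 ra, dec FROM PhotoObj</query>
    </SubmitQuery>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": SOAP_ACTION},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))   # raw SOAP response XML
```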
COJAC: CMS ORCA Java Analysis Component:
Java3D Objectivity JNI Web Services
Demonstrated Caltech-Rio
de Janeiro and Chile in 2002
LHC Distributed CM: HENP Data
Grids Versus Classical Grids
 Grid projects have been a step forward for HEP and
LHC: a path to meet the “LHC Computing” challenges
 “Virtual Data” Concept: applied to large-scale automated
data processing among worldwide-distributed regional centers
 The original Computational and Data Grid concepts are largely
stateless, open systems: known to be scalable
 Analogous to the Web
 The classical Grid architecture has a number of implicit
assumptions
 The ability to locate and schedule suitable resources,
within a tolerably short time (i.e. resource richness)
 Short transactions; Relatively simple failure modes
 HEP Grids are data-intensive and resource constrained
 Long transactions; some long queues
 Schedule conflicts; policy decisions; task redirection
 A Lot of global system state to be monitored+tracked
Current Grid Challenges: Secure
Workflow Management and Optimization
Maintaining a Global View of Resources and System State
Coherent end-to-end System Monitoring
Adaptive Learning: new algorithms and strategies
for execution optimization (increasingly automated)
Workflow: Strategic Balance of Policy Versus
Moment-to-moment Capability to Complete Tasks
Balance High Levels of Usage of Limited Resources
Against Better Turnaround Times for Priority Jobs
Goal-Oriented Algorithms; Steering Requests
According to (Yet to be Developed) Metrics
Handling User-Grid Interactions: Guidelines; Agents
Building Higher Level Services, and an Integrated
Scalable User Environment for the Above
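One way to read the balance of policy versus moment-to-moment capability is as a scoring problem over candidate sites; the toy ranking below is an illustrative sketch (site names, weights and metric values are invented for the example), not a Grid scheduler.

```python
# Toy "goal-oriented" steering: rank candidate sites for a task by mixing a
# long-term policy share with the site's momentary capability for fast turnaround.
# Site names, weights and metric values are invented for illustration only.

POLICY_WEIGHT = 0.4      # weight of long-term fair-share policy
CAPABILITY_WEIGHT = 0.6  # weight of current ability to turn the task around

sites = {
    "SiteA": {"policy_share": 0.5, "free_cpus": 20,  "queue_wait_h": 6.0},
    "SiteB": {"policy_share": 0.3, "free_cpus": 200, "queue_wait_h": 0.5},
    "SiteC": {"policy_share": 0.2, "free_cpus": 5,   "queue_wait_h": 12.0},
}

def capability(site):
    """Crude turnaround metric: more free CPUs and shorter queues score higher."""
    return site["free_cpus"] / (1.0 + site["queue_wait_h"])

best = max(capability(s) for s in sites.values())

def score(site):
    return POLICY_WEIGHT * site["policy_share"] + CAPABILITY_WEIGHT * capability(site) / best

for name in sorted(sites, key=lambda n: score(sites[n]), reverse=True):
    print(f"{name}: steering score {score(sites[name]):.2f}")
```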
Distributed System Services Architecture
(DSSA): CIT/Romania/Pakistan
 Agents: Autonomous, Auto-discovering, self-organizing, collaborative
 “Station Servers” (static) host mobile “Dynamic Services”
 Servers interconnect dynamically; form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks
 Adaptable to Web services: OGSA; and many platforms
 Adaptable to Ubiquitous, mobile working environments
[Diagram: Station Servers interconnected in a dynamic fabric, each registered with Lookup/Discovery Services]
Managing Global Systems of Increasing Scope and Complexity,
In the Service of Science and Society, Requires A New Generation
of Scalable, Autonomous, Artificially Intelligent Software Systems
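The station-server / lookup-service pattern can be sketched in a few lines of pseudo-service code; the class names and the task payload below are illustrative stand-ins, not the DSSA (Jini-based) implementation itself.

```python
# Illustrative sketch of the DSSA pattern: static "station servers" register with
# a lookup service, and a mobile agent carries an analysis task from station to
# station. Class and task names are stand-ins, not the actual implementation.

class LookupService:
    """Registry that station servers join and agents query for peers."""
    def __init__(self):
        self.stations = []

    def register(self, station):
        self.stations.append(station)

    def discover(self):
        return list(self.stations)

class StationServer:
    """Static host that accepts visiting agents and runs their task payloads."""
    def __init__(self, name, lookup):
        self.name = name
        lookup.register(self)       # self-announce on startup

    def host(self, agent):
        return agent.payload(self.name)

class MobileAgent:
    """Carries a task payload and travels across the discovered stations."""
    def __init__(self, payload):
        self.payload = payload

    def travel(self, lookup):
        return [station.host(self) for station in lookup.discover()]

# Example: an "analysis" payload that just reports where it ran.
lookup = LookupService()
for name in ("Station-1", "Station-2", "Station-3"):
    StationServer(name, lookup)

agent = MobileAgent(payload=lambda where: f"analysis task ran at {where}")
print(agent.travel(lookup))
```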
MonALISA: A Globally Scalable Grid Monitoring System
 By I. Legrand (Caltech)
 Deployed on the US CMS Grid
 Agent-based dynamic information / resource discovery mechanism
 Talks w/ Other Monitoring Systems
 Implemented in:
  Java/Jini; SNMP
  WSDL / SOAP with UDDI
 Part of a Global “Grid Control Room” Service
MONARC SONN: 3 Regional Centres “Learning” to Export Jobs (Day 9)
[Simulation snapshot: CERN (30 CPUs), CALTECH (25 CPUs) and NUST (20 CPUs), linked at 1 MB/s with 150 ms RTT; mean efficiencies <E> = 0.83, 0.73 and 0.66; Optimized: Day = 9]
NSF ITR: Globally Enabled
Analysis Communities
 Develop and build
Dynamic Workspaces
 Build Private Grids to support
scientific analysis communities
 Using Agent Based
Peer-to-peer Web Services
 Construct Autonomous
Communities Operating
Within Global Collaborations
 Empower small groups of
scientists (Teachers and
Students) to profit from and
contribute to int’l big science
 Drive the democratization of
science via the deployment of
new technologies
Private Grids and P2P SubCommunities in Global CMS
14600 Host Devices;
7800 Registered Users in
64 Countries
45 Network Servers
Annual Growth 2 to 3X
An Inter-Regional Center for Research, Education
and Outreach, and CMS CyberInfrastructure
Foster FIU and Brazil (UERJ) Strategic Expansion
Into CMS Physics Through Grid-Based “Computing”
 Development and Operation for Science of International
Networks, Grids and Collaborative Systems
 Focus on Research at the High Energy frontier
 Developing a Scalable Grid-Enabled Analysis Environment
 Broadly Applicable to Science and Education
 Made Accessible Through the Use of Agent-Based (AI)
Autonomous Systems; and Web Services
 Serving Under-Represented Communities
 At FIU and in South America
 Training and Participation in the Development
of State of the Art Technologies
 Developing the Teachers and Trainers
 Relevance to Science, Education and Society at Large
 Develop the Future Science and Info. S&E Workforce
 Closing the Digital Divide