Data-driven Demands for Better Languages GALT at NeSI 16

GALT at NeSI
Data-driven Demands for Better Languages
Prof. Malcolm Atkinson
Director, National e-Science Centre
www.nesc.ac.uk
16th October 2003
Outline
What are the Common Factors for High-Level Workflow Languages (HL WFLs)?
Uniform to Rich Type Transitions
Mobile Computation needs Safety
Dynamic re-factoring to optimise enactment
Sloan Digital Sky Survey Production System
(Slide from Ian Foster’s SSDBM 2003 keynote)
Global Knowledge Communities Often Driven by Data: e.g., Astronomy
[Figure: number and sizes of data sets as of mid-2002, grouped by wavelength]
• 12-waveband coverage of large areas of the sky
• Total of about 200 TB of data
• Doubling every 12 months
• Largest catalogues near 1 billion objects
Data and images courtesy Alex Szalay, Johns Hopkins
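The total and the doubling rate above can be combined into a simple projection. This is a minimal sketch that assumes the 12-month doubling continues unchanged from the mid-2002 baseline of about 200 TB; the slide itself makes no such forecast, and the function names are hypothetical.

```python
import math

# Figures from the slide (mid-2002): about 200 TB in total, doubling every 12 months.
# ASSUMPTION: the doubling rate stays constant, which the slide does not claim.
START_TB = 200.0
DOUBLING_MONTHS = 12.0


def projected_tb(months: float) -> float:
    """Total volume after `months`, assuming steady exponential doubling."""
    return START_TB * 2 ** (months / DOUBLING_MONTHS)


def months_to_reach(target_tb: float) -> float:
    """Months until the total reaches `target_tb` under the same assumption."""
    return DOUBLING_MONTHS * math.log2(target_tb / START_TB)


if __name__ == "__main__":
    for years in range(4):
        print(f"mid-{2002 + years}: ~{projected_tb(12 * years):,.0f} TB")
    print(f"1 PB (1000 TB) after ~{months_to_reach(1000):.1f} months")
```

Under that assumption the petabyte mark arrives roughly 28 months after mid-2002 (12 × log2 5).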
Architecture of Service Interaction
• Packaging to avoid round trips
• Unit for data movement services to handle
[Figure: Client API and Requestor Stub; Data Set 1 and Data Set 2 exchanged via dr]
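Why packaging matters can be shown with a back-of-envelope latency model: every round trip pays the wide-area latency once, while the bytes cost the same either way. The numbers below (100 ms round-trip time, 10 MB/s effective bandwidth, 20 queries of 2 KB each) are illustrative assumptions, not figures from the slides.

```python
# Latency model: each round trip pays the RTT once; moving the bytes costs
# total_bytes / bandwidth regardless of how the requests are packaged.
# ASSUMPTIONS (illustrative only): 100 ms RTT, 10 MB/s, 20 queries of 2 KB.


def elapsed_seconds(round_trips: int, total_bytes: float,
                    rtt_s: float, bytes_per_s: float) -> float:
    """Latency paid once per round trip, plus time to move the bytes."""
    return round_trips * rtt_s + total_bytes / bytes_per_s


QUERIES = 20
BYTES_PER_QUERY = 2_000
RTT_S = 0.1
BANDWIDTH = 10_000_000  # bytes per second

# One request per query versus all queries packaged into a single request.
chatty = elapsed_seconds(QUERIES, QUERIES * BYTES_PER_QUERY, RTT_S, BANDWIDTH)
packaged = elapsed_seconds(1, QUERIES * BYTES_PER_QUERY, RTT_S, BANDWIDTH)

print(f"one request per query: {chatty:.3f} s")
print(f"one packaged request:  {packaged:.3f} s")
```

With these assumptions the chatty version spends almost all of its two seconds waiting on latency; packaging removes 19 of the 20 waits.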
Architecture of Service Interaction
[Figure: Client API and Requestor Stub; Data Sets 1 and 2 exchanged via dr, each Data Set shown as a sequence of Ident / Type / Value items]
Architecture of Service Interaction
[Figure: Data Set 1 is now a Request, a document typed by PerformRequestDocument.xsd (<performRequest> … </performRequest>); the other Data Sets remain sequences of Ident / Type / Value items]
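A request document lets one message carry several operations at once. The sketch below builds such a document with Python's standard ElementTree; only the root element name `performRequest` comes from the slide, while the child elements (`sqlQuery`, `xpathQuery`, `deliverTo`), their attributes, the table name and the URL are hypothetical, since the contents of PerformRequestDocument.xsd are not shown.

```python
import xml.etree.ElementTree as ET

# Package several operations into ONE request document.
# ASSUMPTION: everything below the <performRequest> root is hypothetical;
# the real PerformRequestDocument.xsd is not reproduced in the slides.


def build_perform_request(sql: str, xpath: str, deliver_to: str) -> str:
    root = ET.Element("performRequest")
    # Hypothetical activity 1: a relational query.
    ET.SubElement(root, "sqlQuery", name="targets").text = sql
    # Hypothetical activity 2: an XML query.
    ET.SubElement(root, "xpathQuery", name="annotations").text = xpath
    # Hypothetical activity 3: where to send the result of activity 1.
    ET.SubElement(root, "deliverTo", source="targets", url=deliver_to)
    return ET.tostring(root, encoding="unicode")


doc = build_perform_request(
    "SELECT id, ra, dec FROM galaxies WHERE z > 0.1",
    "//galaxy[@type='spiral']",
    "http://example.org/consumer",
)
print(doc)
```

ElementTree escapes the `>` in the SQL text as `&gt;`, so the document stays well-formed; a parser on the service side gets the original query back.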
Architecture of Service Interaction
[Figure: Data Set 1 is now TableOfTargetGalaxies, a document typed by WebRowSet.xsd (<table> … </table>); the other Data Sets remain sequences of Ident / Type / Value items]
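Once a data set's schema declares column types, each cell can become an (Ident, Type, Value) item carrying a rich value, not a uniform string. The `<table>` layout below is a simplified format invented for illustration (the real WebRowSet.xsd format is considerably richer), and the values are made up.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a simplified, hypothetical <table> layout; the real WebRowSet
# format differs. Column metadata declares each column's type so every cell
# can be converted into an (Ident, Type, Value) item with a rich Python type.
SAMPLE = """
<table name="TableOfTargetGalaxies">
  <column ident="id" type="int"/>
  <column ident="ra" type="float"/>
  <column ident="name" type="string"/>
  <row><v>1</v><v>150.25</v><v>G1</v></row>
  <row><v>2</v><v>187.70</v><v>G2</v></row>
</table>
"""

CONVERTERS = {"int": int, "float": float, "string": str}


def parse_table(xml_text):
    """Return the table name and a list of rows of (ident, type, value) items."""
    root = ET.fromstring(xml_text.strip())
    columns = [(c.get("ident"), c.get("type")) for c in root.findall("column")]
    rows = []
    for row in root.findall("row"):
        cells = [v.text for v in row.findall("v")]
        rows.append([(ident, typ, CONVERTERS[typ](text))
                     for (ident, typ), text in zip(columns, cells)])
    return root.get("name"), rows


name, rows = parse_table(SAMPLE)
print(name)
for r in rows:
    print(r)
```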
Architecture (2)
[Figure: Client API and Requestor Stub; Data Sets 1, 2 and 3 exchanged via dr; a separate Client Consumer API and Stub with Data Set 4]
Tera → Peta Bytes (May 2003, approximately correct)

                         Terabyte                  Petabyte
RAM time to move         15 minutes                2 months
1 Gb/s WAN move time     10 hours ($1000)          14 months ($1 million)
Disk cost                7 disks = $5000 (SCSI)    6800 disks + 490 units + 32 racks = $7 million
Disk power               100 Watts                 100 Kilowatts
Disk weight              5.6 kg                    33 tonnes
Disk footprint           Inside machine            60 m²
See also: Jim Gray, “Distributed Computing Economics”, Microsoft Research, MSR-TR-2003-24.
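The WAN rows of the table can be re-derived with a little arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes), that "1Gb" means a 1 gigabit-per-second link, and 30-day months; the implied link efficiency is our inference, not a figure from the table.

```python
# Re-deriving the table's WAN figures.
# ASSUMPTIONS: decimal units (1 TB = 10**12 bytes), a 1 Gb/s link,
# and 30-day months. The implied efficiency is an inference.
LINK_BPS = 1e9
TB = 1e12
PB = 1e15


def transfer_hours(n_bytes, link_bps, efficiency=1.0):
    """Hours to move n_bytes over a link running at `efficiency` of its rated speed."""
    return n_bytes * 8 / (link_bps * efficiency) / 3600


ideal_tb = transfer_hours(TB, LINK_BPS)    # wire-speed lower bound for 1 TB
implied_eff = ideal_tb / 10.0              # the table quotes about 10 hours per TB
pb_months = transfer_hours(PB, LINK_BPS, implied_eff) / 24 / 30

print(f"1 TB at full line rate: {ideal_tb:.1f} h")
print(f"implied link efficiency: {implied_eff:.0%}")
print(f"1 PB at that efficiency: {pb_months:.1f} months")
```

At full line rate a terabyte would take about 2.2 hours, so the table's 10 hours corresponds to roughly 22% effective utilisation; holding that rate, a petabyte takes about 10,000 hours, which matches the table's 14 months.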
Data Access & Integration Services
1a. Client sends a request to the Registry for sources of data about “x”.
1b. Registry responds with a Factory handle.
2a. Client sends a request to the Factory for access to the database.
2b. Factory creates a GridDataService (GDS) to manage access.
2c. Factory returns the handle of the GDS to the client.
3a. Client queries the GDS with XPath, SQL, etc.
3b. GDS interacts with the XML / relational database.
3c. Results of the query are returned to the client as XML.
(Legend: SOAP/HTTP; service creation; API interactions)
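The three phases (registry lookup, factory creation, querying) can be modelled in-process to show who holds which handle at each step. This is a toy sketch: the class and method names are hypothetical, the calls are local rather than SOAP/HTTP, and an in-memory SQLite table stands in for the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy, in-process model of the registry / factory / GDS interaction.
# ASSUMPTION: all names are hypothetical; real services are remote.


class GridDataService:
    def __init__(self, conn):
        self.conn = conn  # the GDS manages access to one database

    def query(self, sql):
        """Steps 3a-3c: run a query and return the results as XML."""
        cur = self.conn.execute(sql)
        cols = [d[0] for d in cur.description]
        root = ET.Element("results")
        for row in cur:
            r = ET.SubElement(root, "row")
            for col, val in zip(cols, row):
                ET.SubElement(r, col).text = str(val)
        return ET.tostring(root, encoding="unicode")


class Factory:
    def __init__(self, conn):
        self.conn = conn

    def create_gds(self):
        """Steps 2b-2c: create a GDS to manage access and return its handle."""
        return GridDataService(self.conn)


class Registry:
    def __init__(self):
        self.entries = {}  # topic -> factory handle

    def register(self, topic, factory):
        self.entries[topic] = factory

    def lookup(self, topic):
        """Steps 1a-1b: find a factory for sources of data about `topic`."""
        return self.entries[topic]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER, flux REAL)")
conn.executemany("INSERT INTO x VALUES (?, ?)", [(1, 0.5), (2, 1.25)])

registry = Registry()
registry.register("x", Factory(conn))

gds = registry.lookup("x").create_gds()   # client now holds the GDS handle
print(gds.query("SELECT id, flux FROM x ORDER BY id"))
```

The client never talks to the database directly: it only ever holds the factory handle and then the GDS handle, which is the separation the numbered steps describe.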
Future DAI Services
1a. Request to the Data Registry for sources of data about “x” & “y”.
1b. Registry responds with a Factory handle.
2a. Request to the Factory for access and integration from resources Sx and Sy.
2b. Factory creates a network of GridDataServices.
2c. Factory returns the handle of the GDS to the client.
3a. Client submits a sequence of scripts, each with a set of queries, to the GDS with XPath, SQL, etc.
3b. Client tells “scientific” analyst.
3c. Sequences of result sets are returned to the analyst as formatted binary, described in a standard XML notation.
[Figure components: Data Registry; Data Access & Integration master; Semantic Meta data; Client Problem Solving Environment with Client Application, Application Code and the Analyst, who codes scientific insights; GDS1, GDS2, GDS3 and GDTS1, GDTS2 linking Sx (XML database) and Sy (Relational database)]
(Legend: SOAP/HTTP; service creation; API interactions)
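Steps 2a and 3c (integrating Sx and Sy, then returning results as binary described in XML) can be sketched together. Everything here is an assumption for illustration: Sx is modelled as a small XML document, Sy as an in-memory SQLite table, and the join key, element names, record layout and data values are hypothetical.

```python
import sqlite3
import struct
import xml.etree.ElementTree as ET

# ASSUMPTIONS: Sx is a small XML document, Sy an in-memory SQLite table;
# names, the join key, the record layout and all values are hypothetical.
SX = "<objects><obj id='1' cls='spiral'/><obj id='2' cls='elliptical'/></objects>"


def load_sy():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photometry (id INTEGER, mag REAL)")
    conn.executemany("INSERT INTO photometry VALUES (?, ?)",
                     [(1, 14.2), (2, 12.9), (3, 16.0)])
    return conn


def integrate(sx_xml, sy_conn):
    """Join Sx and Sy on id, keeping only objects present in both resources."""
    classes = {int(o.get("id")): o.get("cls")
               for o in ET.fromstring(sx_xml).iter("obj")}
    rows = sy_conn.execute("SELECT id, mag FROM photometry ORDER BY id").fetchall()
    return [(i, mag, classes[i]) for i, mag in rows if i in classes]


# Fixed-size binary record: little-endian int32 id, float64 mag, 10-byte class.
RECORD = struct.Struct("<id10s")


def encode(rows):
    """Return (XML descriptor, binary payload) for a result set."""
    desc = ET.Element("resultSet", format="binary",
                      recordSize=str(RECORD.size), count=str(len(rows)))
    ET.SubElement(desc, "field", name="id", type="int32")
    ET.SubElement(desc, "field", name="mag", type="float64")
    ET.SubElement(desc, "field", name="cls", type="ascii", length="10")
    payload = b"".join(RECORD.pack(i, mag, cls.encode("ascii"))
                       for i, mag, cls in rows)
    return ET.tostring(desc, encoding="unicode"), payload


rows = integrate(SX, load_sy())
descriptor, payload = encode(rows)
print(descriptor)
print(len(payload), "bytes of binary data")
```

The design choice being illustrated is that the bulky values travel as compact binary while the small XML descriptor tells any consumer how to decode them.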
Take Home Message
Information Grids
• How do we describe components / services / data?
    • Economic generation of those descriptions
    • Reliable, and at the right level
• How do we describe data & compute processes?
    • Characterising code behaviour
    • Characterising message content
    • Safe assembly of services and data operations
• How do we provide Integrated Enactment?
    • Safe code movement
    • Optimisation
• Plenty of Challenges
    • Face “dirty complexity”; deliver performance & safety
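One reading of "safe assembly of services" is that each service declares the type of message it consumes and produces, and a pipeline is only assembled when adjacent declarations agree. This is a minimal sketch under that assumption: types are plain names here rather than rich descriptions such as XML Schema types, and the `Service` and `assemble` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# ASSUMPTION: message types are plain names; a real system would compare
# richer descriptions (e.g. XML Schema types). All names are hypothetical.


@dataclass(frozen=True)
class Service:
    name: str
    consumes: str
    produces: str
    run: Callable


def assemble(*services):
    """Check every adjacent pair before building the pipeline."""
    for up, down in zip(services, services[1:]):
        if up.produces != down.consumes:
            raise TypeError(f"{up.name} produces {up.produces!r}, "
                            f"but {down.name} consumes {down.consumes!r}")

    def pipeline(msg):
        for s in services:
            msg = s.run(msg)
        return msg
    return pipeline


query = Service("query", "SQL", "RowSet", lambda sql: [(1, 0.5), (2, 1.25)])
total = Service("total", "RowSet", "Number", lambda rows: sum(v for _, v in rows))
render = Service("render", "Document", "Text", str)

pipe = assemble(query, total)
print(pipe("SELECT id, flux FROM x"))  # 1.75

try:
    assemble(query, render)
except TypeError as err:
    print("rejected:", err)
```

Checking at assembly time means a mismatched composition is rejected before any data moves, rather than failing part-way through enactment.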