Using OGSA-DAI in a commercial environment Terry Sloan EPCC

Using OGSA-DAI in a commercial
environment
Terry Sloan
EPCC
Telephone: +44 131 650 5155
Email: tsloan@epcc.ed.ac.uk
Overview
FirstDIG
INWA
Outstanding issues raised by these projects
First Data Investigation on the Grid:
FirstDIG
http://www.epcc.ed.ac.uk/firstdig/
Motivation
Few UK e-Science projects involve service
companies such as First plc
First plc
– Operates worldwide in a variety of transport sectors
– Over 10000 vehicles in the UK, 23% of the market
– UK’s largest operator
The challenge for First
– Meeting the needs of the travelling public whilst making money
– Data integration and mining may help, but there is a huge range of
fragmented data sources
Data Sources in the Bus
Industry
Many different kinds of data involved with running
a bus company
– Mileage, revenue, customer contact, schedule, fuel consumption,
vehicle maintenance, routes…
Many means to collect data
– Manually entered data at depot
– Data collected on buses from ticket machines
– Data collected on buses from GPS systems
– GPS system notes when bus passes through a predefined
“footprint” and records the time at which this happens
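The footprint idea can be illustrated with a minimal sketch. It assumes a circular footprint on a local x/y grid in metres and a time-ordered list of position fixes; the names `Footprint` and `footprint_times` are hypothetical and are not taken from First's systems.

```python
from dataclasses import dataclass
from datetime import datetime
from math import hypot


@dataclass
class Footprint:
    """A predefined circular area (assumed shape) on a local x/y grid in metres."""
    name: str
    x: float
    y: float
    radius: float

    def contains(self, x: float, y: float) -> bool:
        return hypot(x - self.x, y - self.y) <= self.radius


def footprint_times(fixes, footprint):
    """Return the timestamp of each entry into the footprint.

    `fixes` is an iterable of (timestamp, x, y) tuples in time order.
    Only the transition from outside to inside is recorded.
    """
    times = []
    inside = False
    for timestamp, x, y in fixes:
        now_inside = footprint.contains(x, y)
        if now_inside and not inside:
            times.append(timestamp)
        inside = now_inside
    return times


# Illustrative data only: one bus driving along the x axis past one stop.
stop = Footprint("Example stop", x=100.0, y=0.0, radius=25.0)
fixes = [
    (datetime(2004, 1, 5, 8, 0, 0), 0.0, 0.0),
    (datetime(2004, 1, 5, 8, 0, 30), 60.0, 0.0),
    (datetime(2004, 1, 5, 8, 1, 0), 90.0, 0.0),    # enters the footprint
    (datetime(2004, 1, 5, 8, 1, 30), 110.0, 0.0),  # still inside
    (datetime(2004, 1, 5, 8, 2, 0), 200.0, 0.0),   # has left the footprint
]
passes = footprint_times(fixes, stop)
print(passes)
```

Comparing the recorded times with the scheduled times would then give a lateness figure per service.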
Answering Business Questions
Want to combine data from more than one
source:
– Complaints versus Lateness
– Revenue versus Lost Miles
– Complaints versus Lost Miles
Want data aggregated in some way:
– By Service
– By Day
Want to consider subsets of the data
– e.g. weekdays only
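As a rough illustration of such a question, the sketch below combines complaints with lost miles, aggregates by service and keeps weekdays only. It uses an in-memory SQLite database purely as a stand-in; the table names, column names and figures are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table per data source.
conn.executescript("""
CREATE TABLE complaints (service TEXT, day TEXT, complaints INTEGER);
CREATE TABLE mileage (service TEXT, day TEXT, lost_miles REAL);
""")
conn.executemany("INSERT INTO complaints VALUES (?, ?, ?)", [
    ("52", "2004-01-05", 3),   # Monday
    ("52", "2004-01-06", 1),   # Tuesday
    ("52", "2004-01-10", 4),   # Saturday, excluded by the weekday filter
    ("120", "2004-01-05", 2),  # Monday
])
conn.executemany("INSERT INTO mileage VALUES (?, ?, ?)", [
    ("52", "2004-01-05", 12.5),
    ("52", "2004-01-06", 0.0),
    ("52", "2004-01-10", 20.0),
    ("120", "2004-01-05", 5.0),
])

# strftime('%w') returns the day of week, 0 = Sunday ... 6 = Saturday.
query = """
SELECT c.service, SUM(c.complaints), SUM(m.lost_miles)
FROM complaints c
JOIN mileage m ON m.service = c.service AND m.day = c.day
WHERE CAST(strftime('%w', c.day) AS INTEGER) BETWEEN 1 AND 5
GROUP BY c.service
ORDER BY c.service
"""
rows = conn.execute(query).fetchall()
for service, complaints, lost_miles in rows:
    print(service, complaints, lost_miles)
```

Grouping by `c.day` instead of `c.service` would give the "By Day" view of the same data.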
Disparate Databases
Data is typically stored in disparate databases
– Various reasons for this, e.g. incremental construction of systems
– Not a problem for day-to-day running and querying but…
Introduces challenges for Data Analysis
– Systems introduced at different times
– Different database engines
– Different front-ends
– Different operating systems
– Different physical locations
– Different ways of representing data
These issues are NOT unique to buses
OGSA-DAI
OGSA-DAI
– Open Grid Services Architecture Data Access and Integration
– Potentially provides a solution
– Needs business users in order to make the transition from science to commerce
Grid middleware:
– Assists with the access and integration of data from separate data
sources via the Grid
– Represents databases as Grid Services
– Enables access from other machines in a secure manner
FirstDIG Achievements
Deployment at First South Yorkshire
Combined two databases to answer real
business questions
– The Customer Contact System
• Microsoft Access
• Information on customer complaints e.g. time, service, nature
– The Mileage database
• dBASE IV
• Information on bus mileage e.g. lost miles
Produced generic Grid Data Service Browser
– SQL access including joins across the databases
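A join across two separate databases can be sketched as follows. This is not the FirstDIG browser's implementation and does not use the OGSA-DAI API; two independent in-memory SQLite connections merely stand in for the Access and dBASE IV sources, and all table names and values are invented.

```python
import sqlite3


def make_db(schema, insert, rows):
    """Create an independent in-memory database holding one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert, rows)
    return conn


# Stand-in for the Customer Contact System.
contacts = make_db(
    "CREATE TABLE contact (service TEXT, nature TEXT)",
    "INSERT INTO contact VALUES (?, ?)",
    [("52", "late"), ("52", "late"), ("120", "driver")],
)
# Stand-in for the Mileage database.
mileage = make_db(
    "CREATE TABLE mileage (service TEXT, lost_miles REAL)",
    "INSERT INTO mileage VALUES (?, ?)",
    [("52", 12.5), ("52", 7.5), ("120", 0.0), ("7", 3.0)],
)

# Query each source separately, aggregated by service.
complaints = dict(contacts.execute(
    "SELECT service, COUNT(*) FROM contact GROUP BY service"))
lost = dict(mileage.execute(
    "SELECT service, SUM(lost_miles) FROM mileage GROUP BY service"))

# Inner join on the shared key, performed on the client side.
joined = {s: (complaints[s], lost[s]) for s in complaints.keys() & lost.keys()}
for service in sorted(joined):
    print(service, joined[service])
```

Service "7" has mileage data but no complaints, so the inner join drops it; an outer join would keep it with a complaint count of zero.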
First Grid Data Service
Browser
Informing Business & Regional Policy:
Grid-enabled fusion of global data &
‘local’ knowledge
INWA
http://www.epcc.ed.ac.uk/~inwa/
INWA
An e-Social Science demonstrator
– Demonstrates how grid technologies can improve business
– Combining private and public data sources
– Finance and Telecommunications
Uses many grid technologies
– TOG from Sun DCG provides access to remote HPC resource
– OGSA-DAI provides access control and discovery of distributed
heterogeneous data resources
– FirstDIG grid data service browser provides SQL access to
OGSA-DAI enabled resources
– Globus Toolkit 2 and 3
INWA Grid Infrastructure
[Diagram: INWA grid infrastructure. Labels: User@Edinburgh, User@Curtin, FirstDIG, Grid Engine, TOG, Globus, Grid, EPCC, Curtin, Bank data, Telco data, UK Property data service, Australian Property data service]
References
EPCC
– http://www.epcc.ed.ac.uk/
FirstDIG
– http://www.epcc.ed.ac.uk/firstdig/
OGSA-DAI
– http://www.ogsadai.org.uk
INWA
– http://www.epcc.ed.ac.uk/~inwa
Sun Data & Compute Grids
– http://www.epcc.ed.ac.uk/sungrid/
Transfer-queue Over Globus (TOG)
– http://gridengine.sunsource.net/project/gridengine/tog.html
Outstanding issues raised by FirstDIG & INWA
Outstanding Issues:
Usability
OGSA-DAI is middleware; the client toolkit helps
Incorporating the demo First browser is somewhat helpful
But really want …
Interfaces to real data analysis & DBMS packages, e.g. SPSS
Otherwise users could end up building applications that
replicate these, e.g. the First Grid Data Service Browser
Want to be able to point Access, Excel, etc. at a grid data
source and examine it
Outstanding issues:
Data
CSV (Comma separated value) data sources
– are common but current JDBC-ODBC drivers do not have
sufficient functionality (NOT an OGSA-DAI issue per se)
No support for BIT type fields
– Nor for others, e.g. BOOLEAN, BINARY, etc.
Certain characters (e.g. &, >) are not handled by
the OGSA-DAI XML parser
– Company names often have & in them
Dates from certain sources not handled properly
– First Grid Data Service has to handle this internally
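The ampersand problem is a general XML one: a bare & in text makes a document ill-formed unless it is escaped. The sketch below shows this with Python's standard library only. It does not reproduce OGSA-DAI's own parser or response format; the `company` element and the company name are invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def company_element(name, escape_text):
    """Build a tiny XML fragment, with or without escaping the text."""
    text = escape(name) if escape_text else name
    return "<company>" + text + "</company>"


name = "Smith & Sons"  # hypothetical company name

# Unescaped: the bare ampersand is not well-formed XML.
try:
    ET.fromstring(company_element(name, escape_text=False))
    raw_parsed = True
except ET.ParseError:
    raw_parsed = False

# Escaped: the parser accepts it and returns the original text.
round_tripped = ET.fromstring(company_element(name, escape_text=True)).text

print(raw_parsed, round_tripped)
```

`escape` rewrites `&`, `<` and `>` as entities, so escaping values before they are embedded in XML is one way a client could work around the issue.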
Outstanding issues:
Miscellaneous
Security
– Rolemap file is not encrypted
– If one GDS (Grid Data Service) accesses another GDS, the user's security
credentials are not passed on, so the access fails
Installation & Testing
– Install & Set-up
• Well-explained but still a fair amount of user effort involved
– Lack of an example OGSA-DAI site to point at to test that your
OGSA-DAI installation works
Outstanding Issues:
Miscellaneous
Large result sets
– Can increase the JVM size but this does not scale
– This occurred with most datasets
Integration
– DQP is a start ….(Linux, OQL)
Why use OGSA-DAI?
– Easysoft etc
– http://www.easysoft.com/products/2001/main.phtml
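On the large result sets point above, a common alternative to enlarging memory is to consume results in fixed-size batches so that only one batch is held at a time. The sketch below shows that general pattern with Python's `sqlite3` module; it is not an OGSA-DAI feature, and the table and the helper `iter_batches` are hypothetical.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most `batch_size` rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journeys (id INTEGER, lost_miles REAL)")
conn.executemany(
    "INSERT INTO journeys VALUES (?, ?)",
    [(i, 0.5) for i in range(10)],  # illustrative rows only
)

cursor = conn.execute("SELECT id, lost_miles FROM journeys ORDER BY id")
total = 0.0
batch_sizes = []
for batch in iter_batches(cursor, 4):
    batch_sizes.append(len(batch))
    # Process the batch, then let it go out of scope before fetching the next.
    total += sum(miles for _, miles in batch)

print(batch_sizes, total)
```

Memory use then depends on the batch size rather than on the size of the full result set.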
Why use OGSA-DAI?
‘a RDBMS engine that appears to client apps as a fully conformant ODBC 3.5 data source….can be used to provide real-time, heterogeneous access to multiple target data sources.’