LCoEeSS - National e

LCoEeSS
The Main e-Social Science Issues
• Applications: Many large-scale research questions in
the social sciences may only be answered fully using a
multi-disciplinary computationally-intensive analysis;
• Data: The complexity of observational social science
data can make data curation, data management and
the subsequent analysis particularly difficult;
• Methodologies: Much of the quantitative technology
presently used in the social sciences dates back to the
1960s and 1970s. Many assumptions in this technology
were made in order to minimise computation;
• Computational Culture: Currently, most social
scientists in the UK perform their analyses using
standard packages or software written for single
processors, limiting the scope of the substantive
research questions.
Lancaster’s Infrastructure
• Lancaster’s HPC
• NW Trunk
Lancaster’s HPC
• Funded by the ESRC, EPSRC and HEFCE (£1.2M), it
consists of an array of 103 dual-processor SunBlade
workstations, each with between 1 and 8 gigabytes of
memory.
• Fileserver with 1300 gigabytes of disk storage.
• Sixteen of the workstations have "Myrinet" cards
installed to allow very high speed communication
between them, supporting parallel programs which
distribute large amounts of data.
• Jobs are submitted to the array from the HPC frontend
machine through the Sun Grid Engine/Codine queuing
system or via Globus.
• This in turn distributes each submitted job to one of
the many execution hosts, or holds it until a host
becomes available.
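The dispatch behaviour described above (send a job to a free execution host, or hold it until one becomes available) can be sketched as a toy first-in, first-out queue. This is only an illustration of the idea, not how Sun Grid Engine/Codine or Globus actually schedule work; the class name `ToyQueue`, the host names and the job names are all hypothetical.

```python
from collections import deque


class ToyQueue:
    """Hypothetical FIFO dispatcher: at most one job per execution host."""

    def __init__(self, hosts):
        self.free_hosts = deque(hosts)  # hosts with nothing running
        self.pending = deque()          # jobs held until a host frees up
        self.running = {}               # job name -> host it runs on

    def submit(self, job):
        """Queue a job, then start whatever can be started."""
        self.pending.append(job)
        self._dispatch()

    def finish(self, job):
        """Mark a job as done and offer its host to the next held job."""
        host = self.running.pop(job)
        self.free_hosts.append(host)
        self._dispatch()

    def _dispatch(self):
        # Start held jobs in submission order while free hosts remain.
        while self.pending and self.free_hosts:
            job = self.pending.popleft()
            self.running[job] = self.free_hosts.popleft()


# Made-up example: three jobs submitted to two execution hosts.
queue = ToyQueue(hosts=["node01", "node02"])
for name in ["job-a", "job-b", "job-c"]:
    queue.submit(name)
```

With two hosts and three jobs, the third job stays in `queue.pending` until `queue.finish(...)` is called for one of the first two. A real queuing system also weighs priorities, resource requests and parallel environments, none of which is modelled here.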
Rob Allan’s HPCGrid InfoPortal web page:
http://esc.dl.ac.uk/InfoPortal/
We are normally visible here, but it is not picking us up at
the moment, as there seem to be Monitoring and Discovery
Service (MDS) registration problems at grid-support.
Lancaster’s HPC details can be found at:
http://giis.globus.org/ldapbrowser/login.php
Lancaster’s HPC
• Running Globus 2.4 with the following
enhancements:
- Andrew McNab's GridPP Pool Account patch
(http://www.gridpp.ac.uk/authz/gridmapdir/) to
accommodate external job submissions from
users without a local HPC account
- a modified version of the original release of
Marko Krznaric's SGE Integration Package
(new version is at
http://www.lesc.ic.ac.uk/projects/epic-gtsge.html)
• Currently investigating adding GT3 functionality
to HPC services
• The new LESC EPIC package adds SGE jobmanager functionality to GT3
NW Trunk
• Funded by NWDA (£1.77M)
• Four 10GbE links
– 10GbE Carlisle to Lancaster
– 2x 10GbE Lancaster to SJIV C-PoP at
Warrington
– 10GbE Lancaster to Daresbury Labs
• Eight 1GbE links
– Carlisle-Lancaster, Carlisle-Penrith
– Penrith-Kendal, Kendal-Lancaster
– Lancaster-Preston, Lancaster-Chorley
– Lancaster-SJIV C-PoP at Warrington
Existing Lancaster Projects
• A Training and Support Environment for
Advanced Quantitative Methods in the Social
Sciences
• An OGSA Component Based Approach to
Middleware for Statistical Modelling
• JISC-funded e-Social Science ReDRESS
portal
A Training and Support Environment for
Advanced Quantitative Methods in the Social
Sciences (ESRC)
• Short Courses and Masterclasses (£154k over
2 yrs).
1. Courses cover the main methods of data
collection, fundamental aspects of research
design, and statistical methods of data
analysis;
2. Courses viewed on-line via web browser;
3. Software courses cover packages ranging from
PC to HPC-specific software, such as SAS,
SPSS, GAUSS and LIMDEP, and programming
topics such as C++, FORTRAN and parallel
programming.
• National Consultancy Service (£39k over 2
yrs)
An OGSA Component Based Approach to
Middleware for Statistical Modelling (£100k)
• SABRE: Statistical software written in Fortran designed
to model recurrent events. Standard generalised linear
models can be fitted as well as various mixture models
with random effects
• R: A free-to-use language and environment for
statistical computing and graphics providing a wide
variety of statistical and graphical techniques
• Middleware for e-Social Science: Development of a
parallel, multilevel, multiprocess (OGSA)
implementation of SABRE as an R object to enable
Social Scientists to disentangle the full stochastic
complexity of socio-economic processes
Multilevel, multiprocess models
• Most random effect models are for responses of a single
type, either dichotomous, ordinal or count. A single link
function and family are specified. (Take 2 days on a 2GHz
0.5MB RAM P4)
• Multi-process models are models with two or more
substantively different outcomes and correlated random
effects.
• Some examples of two-process models include health
status and mortality, or getting pregnant and finishing
school. Each process may, but need not, include repeated
outcomes.
• The models can also be used when the data possess a
hierarchical structure, e.g. multi-stage cluster sample,
where the responses at the lower levels are more
correlated than those higher up, e.g. responses on
individual pupils in the same class are more correlated
than those between classes at the same school.
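The hierarchical point above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the SABRE model: it uses only two levels (classes and pupils, with the school level omitted), a normal response and a single random intercept per class. The function names, variances and sample size are all made up for illustration. Under these assumptions the correlation between two pupils in the same class is the intraclass correlation, the class variance divided by the total variance.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_pairs(n_classes, sd_class, sd_pupil, seed=0):
    """Draw two pupils per class from a random-intercept model."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_classes):
        u = rng.gauss(0.0, sd_class)  # effect shared by the whole class
        first.append(u + rng.gauss(0.0, sd_pupil))
        second.append(u + rng.gauss(0.0, sd_pupil))
    return first, second


sd_class, sd_pupil = 1.0, 1.0  # hypothetical standard deviations
expected_icc = sd_class**2 / (sd_class**2 + sd_pupil**2)

pupil_a, pupil_b = simulate_pairs(5000, sd_class, sd_pupil)

# Pupils paired within the same class share the class effect.
same_class_corr = pearson(pupil_a, pupil_b)

# Shifting one list by a position pairs pupils from different classes.
other_class_corr = pearson(pupil_a, pupil_b[1:] + pupil_b[:1])
```

In this sketch `same_class_corr` should land near `expected_icc`, while `other_class_corr` should land near zero, since pupils in different classes share nothing here. Adding a school-level random effect would make pupils in different classes of the same school weakly correlated, which is the three-level case the slide describes.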
ReDRESS Portal (Content)
• Introductory material from roadshows
• Specific material from the Agenda Setting Workshops
• On-line demonstrators
• Course timetables/notes
• Video/audio material
• Associated reference material and FAQs
• Links to JISC national collections
• Links to partner institutions in Social Science
• World wide links
• E-mail for students/staff
• Additional help for self learners
• Examination and monitoring results
ReDRESS Portal (functionality)
• Single sign-on/certificate-based
authentication (same as the Grid and Athens)
• Role-based authorisation (students, staff,
managers, developers etc.)
• Database back end for managing users and
resources (OGSA-DAI)
• Content management for staff and developers
• Active portal services for Grid-based
demonstrations (OGSA, Web services)
• Active monitoring suite to capture workflow
and mine for enhanced requirements
• XML/XSLT-driven dynamic pages
• uPortal or Jetspeed framework with services
based on BlackBoard, HPCPortal and
DataPortal
ReDRESS will use/contribute to this
technology
ReDRESS Content: (Existing Tools)
• Nesstar is a web-based facility that allows 66
major datasets to be explored online, with
simple sub-setting and simple analyses.
• Only uses one data set at a time;
• Has very limited facilities for sub-setting and
none for fusing;
• Restricted statistical facilities, e.g. descriptive
analysis, linear regression;
• No facilities for handling missing data;
• Not currently Grid enabled.
ReDRESS Content: (Existing Tools)
• Rweb: a free web-based service using R, allowing
users to submit R jobs and get output back to
their web session
• Rweb needs more menus; R has a very extensive
statistical library that is not used in Rweb;
• Rweb uses R and not Rmpi. For use in a Grid
environment we would need these hooks to
extend functionality;
• R also lacks some of the key
multiprocess/multilevel and selection model
frameworks appropriate to social science data;
these are being developed.
Content: New Tools / Middleware
1. Social scientists have much less experience
and expertise in the use of the Grid than
those typically from other research council
areas;
2. There is a significant intellectual gap
between such disciplines and computer
science;
3. Distributed systems are also inherently
complex and associated middleware products
are not easy to use;
4. The Open Middleware Infrastructure Institute
(OMII) will provide (open-source)
middleware and associated services, but not
specifically targeted for the social science
community;
5. Need to build a more computer-literate
collaborative culture for Social Science.
Content: New Tools / Middleware
We propose:
1. To promote the use of component-based
software development and visual composition
tools and scripting languages for ease of use;
2. To offer a middleware consultancy service for
application developers;
3. To exploit local expertise to develop bespoke
middleware solutions for customers;
4. To develop exemplar e-Social Science
demonstrators for end users;
5. To exploit state-of-the-art software
development technologies such as
aspect-oriented programming to enhance flexibility.
New Tools : Ex 1. VIDGRID
Multiple video streams can be delivered
into an AG or portlet environment
[Diagram: Video Corpus; Researcher A; Researcher B; Researcher C]
New Tools : Ex2. The Analysis Cycle
[Diagram: Main ESDS Data Sets; TTWA Data, NOMIS; Select Data Set and Appropriate Variables; Merge Files: Add Variables; Working Data; Results; Contextual Data]
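The steps named on this slide (select a data set and variables, merge files, add variables, arrive at working data) can be sketched in a few lines. This is a hedged illustration only: the records, field names and values are invented, and the helper functions `select`, `merge` and `add_variable` are hypothetical and do not come from any ESDS or NOMIS tool.

```python
# All records, field names and values below are made up for illustration.
survey = [
    {"person": 1, "ttwa": "A", "age": 34, "employed": 1, "notes": "x"},
    {"person": 2, "ttwa": "B", "age": 51, "employed": 0, "notes": "y"},
    {"person": 3, "ttwa": "A", "age": 27, "employed": 1, "notes": "z"},
]

# Area-level (contextual) data keyed by a hypothetical area code.
contextual = {
    "A": {"unemployment_rate": 4.2},
    "B": {"unemployment_rate": 7.9},
}


def select(rows, variables):
    """Keep only the chosen variables from each record."""
    return [{v: row[v] for v in variables} for row in rows]


def merge(rows, context, key):
    """Attach area-level variables to each individual record."""
    return [{**row, **context[row[key]]} for row in rows]


def add_variable(rows, name, func):
    """Derive a new variable from the ones already present."""
    return [{**row, name: func(row)} for row in rows]


# Select, merge, then add a derived variable to get the working data.
working = select(survey, ["person", "ttwa", "age", "employed"])
working = merge(working, contextual, key="ttwa")
working = add_variable(working, "over_40", lambda r: int(r["age"] > 40))
```

Each step returns a new list rather than changing its input, so any stage of the cycle can be rerun with a different selection or a different contextual file without rebuilding the earlier ones.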
New Tools : Ex2. Linking Components
[Diagram: Data Management A, B and C; Analysis A, B and C; Middleware]
The ReDRESS Community
Lancaster/Daresbury
Other Contributors/Steering
Committee
… plus other contributors in the UK,
from the USA & Europe
Key components will be accessible on the GRID and linked into the portal and demonstrators
New Lancaster Projects
• NWDA NW-GRID (400K kit, 4 staff over 3
years, starts Dec 2003)
A collaboration between Lancaster (£1.0M),
Daresbury (£1.0M), Liverpool and
Manchester. Staff and equipment (Grids) at
each site.
Projects at Lancaster in Env. Science,
Physics, Computing, Sociology, Economics,
Applied Statistics and Grid Training
The e-Social Science Future
• Our existing quantitative tools rely heavily on
assumptions; they come out of a technology that was
formed in the 60s and 70s, when computers were
10^9 times slower.
• What new research agendas are now relevant? The 3
exponentials will change everything.
• The new opportunities for collaboration and
evidence-based research will lead to new (e)science,
not just to making legacy approaches faster.
• We can now move away from the assumption-ridden
technologies and develop robust nonparametric
procedures for decomposing the complexity of
socio-economic processes.
• There will be amazing opportunities to make a
difference/test policy instruments and address some
grand challenges be they in reducing drug abuse,
crime and poverty, or improving educational
attainment.