ReDReSS
Case Study in e-Social Science
Rob Allan (CCLRC Daresbury Laboratory)
Rob Crouchley (University of Lancaster)
Building Collaborative e-Research Environments
JISC Consultation Workshops, 23/2/04 and 5/3/04
Specific Problems for Social Scientists
1. They have much less experience and expertise in the use
of the Grid than researchers typically have in other research
council areas;
2. There is a significant intellectual gap between such
disciplines and computer science;
3. Distributed systems are also inherently complex and
associated middleware products are not easy to use;
4. The Open Middleware Infrastructure Institute (OMII) is
likely to provide generic (open-source) middleware and
associated services.
E-Science middleware is not currently targeted specifically at
the social science community.
Social Scientists Need
1. Help to develop a more computer-literate collaborative
culture;
2. Help to develop component-based software, visual
composition tools and scripting languages which are easy to
use;
3. Help to exploit state-of-the-art software development
technologies such as aspect-oriented programming to
enhance flexibility.
Middleware could be the catalyst for re-use and sharing in
the e-Social Sciences. Some examples and ideas follow.
Some Features of Social Science Research
• Research is motivated by a desire to determine causality.
• It involves:
1. identifying the various factors which influence the
behaviour or outcome of interest and quantifying their
effects;
2. controlling for all the different confounding factors
which would otherwise result in spurious relationships and
misleading results.
• Randomised experiments are not feasible; for example, we cannot
randomly allocate individuals to different levels of training in
order to evaluate training programmes.
• We rely on observational data, i.e. data obtained from surveys
and censuses.
This differs from “exact sciences” such as physics and chemistry,
where repeatable experiments can be performed.
Three Related Aspects of Soc. Sci. Research
• Observational data, usually full of holes:
– missing data
– measurement error
– dropout
• Substantive theory:
– what determines what
– not comprehensive
– often contradictory
• Methodology:
– only partially developed
Soc. Sci. needs Comprehensive Models
• Interdependent sub-models: we need joint models for the
data complexities and the core processes we want to
understand.
• Models are not linear in the parameters; they require special
procedures and are highly computationally intensive due to
the high dimensionality and the interdependent sub-models.
• Simple analyses are usually very misleading about the role of
the controls (ethnicity, sex, etc.).
Soc. Sci. research is complex: a large parameter space, many
interpretations and many models which need to be tested. It
cannot be done in isolation.
There is an increasing need to link components and to access
large computers and data sets from the desktop.
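To illustrate why a simple analysis can mislead about the role of a control variable, here is a minimal sketch assuming a binary outcome, a training/no-training comparison and a single confounding grouping variable. The counts are invented for illustration (they follow the pattern known as Simpson's paradox) and are not survey data; the group and status names are hypothetical.

```python
# Hypothetical counts (successes, total) by group and training status.
# Invented for illustration only; not real survey data.
COUNTS = {
    ("group_a", "trained"): (81, 87),
    ("group_a", "untrained"): (234, 270),
    ("group_b", "trained"): (192, 263),
    ("group_b", "untrained"): (55, 80),
}


def rate(successes: int, total: int) -> float:
    """Proportion of successful outcomes."""
    return successes / total


def stratified_rates(counts):
    """Outcome rate for each (group, status) cell, i.e. controlling for group."""
    return {key: rate(*value) for key, value in counts.items()}


def pooled_rates(counts):
    """Outcome rate per status, ignoring group (the 'simple analysis')."""
    totals = {}
    for (_, status), (s, n) in counts.items():
        prev_s, prev_n = totals.get(status, (0, 0))
        totals[status] = (prev_s + s, prev_n + n)
    return {status: rate(s, n) for status, (s, n) in totals.items()}


if __name__ == "__main__":
    for (group, status), r in sorted(stratified_rates(COUNTS).items()):
        print(f"{group:8s} {status:10s} {r:.3f}")
    for status, r in sorted(pooled_rates(COUNTS).items()):
        print(f"pooled   {status:10s} {r:.3f}")
```

With these made-up numbers the trained rate is higher within each group but lower once the groups are pooled, because group membership is related both to who receives training and to the outcome.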
E-Science Technology can link Components!
Diagram: data management components A, B and C and analysis
components A, B and C, all linked through middleware.
New Tools: The Analysis Cycle
Diagram: select a data set and the appropriate variables from the
main ESDS data sets; merge files to add variables from contextual
data (e.g. TTWA data, NOMIS); analyse the resulting working data
to obtain results.
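The merge step of the analysis cycle (adding contextual, area-level variables to individual survey records) can be sketched as a left join on an area code. This is a minimal stdlib sketch, not an ESDS or NOMIS interface: the field names (`person_id`, `area_code`, `age`, `unemployment_rate`) and the values are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def select_variables(records: List[Record], variables: List[str]) -> List[Record]:
    """Keep only the chosen variables from each survey record (missing -> None)."""
    return [{v: r.get(v) for v in variables} for r in records]


def merge_contextual(records: List[Record],
                     contextual: Dict[str, Record],
                     key: str) -> List[Record]:
    """Left join: attach area-level variables to each individual record.

    Records whose area code has no contextual entry get None for every
    contextual variable, so the gap stays visible as missing data rather
    than the record being silently dropped.
    """
    context_vars = sorted({v for row in contextual.values() for v in row})
    working: List[Record] = []
    for r in records:
        row = dict(r)
        ctx = contextual.get(r.get(key), {})
        for v in context_vars:
            row[v] = ctx.get(v)
        working.append(row)
    return working


if __name__ == "__main__":
    # Hypothetical records and area codes, for illustration only.
    survey = [
        {"person_id": 1, "area_code": "A01", "age": 34, "interviewer": 7},
        {"person_id": 2, "area_code": "A02", "age": 51, "interviewer": 3},
    ]
    contextual = {"A01": {"unemployment_rate": 5.2}}
    working = merge_contextual(
        select_variables(survey, ["person_id", "area_code", "age"]),
        contextual, key="area_code")
    for row in working:
        print(row)
```

Keeping unmatched records with explicit gaps, rather than using an inner join, reflects the point that observational data are usually full of holes: the working data should show where contextual information is missing.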
New Tools: Simultaneous Analysis
Example: research in educational attainment.
Diagram: psychologists', geographers' (locational), educationalists'
and economists' analyses all drawing simultaneously on the
National Pupil Database.
E-Science can enhance Collaboration!
• Particularly important in qualitative research;
• Enables comparison of different markup/interpretation;
• Direct access to data sets for validation;
• Direct input of data from fieldwork involving
questionnaires, photography, etc.;
• Delivery/input devices (some mobile) may include portals,
Access Grid, tablet PCs, PDAs, cameras, phones, etc.
New Tools: Collaboration in Video Markup
VIDGRID: multiple video streams can be
delivered into an AG or portlet
environment.
Diagram: researchers A, B and C working together on a shared
video corpus.
Training and Awareness in e-Social Science!
Project ReDReSS: Resource Discovery for Researchers in e-Social Science
“to accelerate the development and awareness of a new kind of
computing and data infrastructure for the Social Sciences,
and to support the increasingly national and global
collaborations emerging in many areas of Social Science”
– To help illustrate appropriate methodologies and software
that admit the full complexity of substantive problems;
– To help articulate the middleware needs of social
researchers;
– To help nurture and support a community of social
researchers;
– To help provide critical mass and improve the
efficiency of interactions between interested
researchers, thus reducing the number of lost
opportunities for social science.
ReDReSS
We will use/contribute to existing
technologies:
• Resource discovery
• Sharing tools
• Personalised workspaces
• Flexible delivery
E-Science enabling a Virtual Research Environment!
“to make the use of e-Science technologies, methodologies
and resources easier and more transparent than simply
developing bespoke applications on an infrastructure toolkit
(such as Globus GT2 or OGSI/WSRF).”
We need to:
• Bridge the gap between different types of technology
(database management, computational methods, data
collection, networks, Condor resources, visualization
systems, collaborative working, Access Grid, etc.);
• Build on pilot projects and take input from other disciplines;
• Link to core JCSR clusters and to resources at other e-Science Centres;
• Provide an environment that enhances the programmability and
usability of such a Grid by integrating work from a number
of ongoing projects and encouraging community input.
The Grid “Client Problem”
Many clients want to access a few Grid-enabled resources.
Diagram: Grid core middleware (e.g. Globus) serving many kinds of
client: workplace desktop clients; portable clients (phones,
laptops, PDAs, data collection devices); consumer clients (PC,
TV, video, AG).
Some VRE Functions
• Authentication, Authorisation and Accounting – use
Shibboleth and Permis in line with JISC proposals;
• Community development of content - Content Management
and Editing tools:
– Access to middleware resources and documentation;
– Access to training materials and resources;
– Enable shared development of services/applications;
– Access to a consultancy/support service;
• Application Management Services - user access via predefined tools and applications to the UK e-Science Grid;
• Data Management Services – discovery, authorisation,
transfer, replication, upload, validation, curation;
• Access to Broadcasts - on the Access Grid network;
• Management Functions - for experts to maintain the
system and guide non-experts, e.g. via expert systems and
workflow.
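The AAA function separates authentication (who the user is) from authorisation (what they may do), and the role-based authorisation that PERMIS addresses can be illustrated with a generic role-based access control check. This is a minimal plain-Python sketch: it does not use Shibboleth or PERMIS, accounting is omitted, and the roles and actions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical role -> permitted actions policy. In a real VRE this policy
# would be supplied by the authorisation infrastructure, not hard-coded.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "researcher": {"read_dataset", "run_analysis"},
    "data_curator": {"read_dataset", "upload_dataset", "validate_dataset"},
    "administrator": {"manage_users"},
}


@dataclass
class User:
    name: str
    authenticated: bool  # result of the (separate) authentication step
    roles: Set[str] = field(default_factory=set)


def is_authorised(user: User, action: str) -> bool:
    """Allow an action only for an authenticated user holding a role that grants it."""
    if not user.authenticated:
        return False
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


if __name__ == "__main__":
    alice = User("alice", authenticated=True, roles={"researcher"})
    print(is_authorised(alice, "run_analysis"))
    print(is_authorised(alice, "upload_dataset"))
```

Granting permissions to roles rather than to individuals means that adding a researcher to a project only requires assigning a role, which suits a VRE with a changing community of users.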
Functionality/Content of the VRE
Diagram: the JISC VRE portal linked to Semantic Grid services; a
middleware/software library; portal management and portal access;
Grid security (authentication and authorisation); other portals
(e.g. VLE and JISC portals); UK Grid services; text-mining/data
services; workshops; awareness raising; and resources.
Sanity Check
However, a number of areas significant for a production Grid
environment have hardly been tackled yet. Issues include:
• Grid information systems, service registration, discovery and
definition of facilities;
• Security, in particular role-based authorisation;
• Portable parallel job specifications;
• Meta-scheduling, resource reservation and ‘on demand’ access;
• Dynamic linking and interacting with remote data sources;
• Wide-area computational/experimental steering;
• Workflow composition and optimisation for complex procedures;
• Distributed user and application management;
• Data management and replication services;
• Grid programming environments, PSEs and user interfaces;
• Auditing, advertising and billing in a Grid-based resource
market;
• Semantic and autonomic tools;
• Usability issues, ethics, etc.
Human Factors
Customised delivery may be key to long-term uptake:
• Use an environment familiar to the researchers, e.g.:
– Web portals for training, awareness and search tools (search
engines are popular);
– Libraries, e.g. in C for programmers;
– Programming environments, e.g. R for statistical
analysis with well-known packages;
– Sound and video for virtual collaboration (TV is a popular
medium).
Bottom line:
There is a lot we can and need to do, but
Social Science is already hard – the scientists need tools that
do not make it harder!
UK E-Social Science Programme
There is currently a growing body of work and projects in
this area:
• Pilot projects - ESRC
• ReDReSS: Resource Discovery for Researchers in e-Social
Science – JISC
• UK National Grid Service + e-Science Grid - JCSR and
DTI Core Programme
• NCeSS: National Centre for e-Social Science - ESRC
• CQeSSS: Centre for Quantitative e-Social Science
Support - ESRC (+ future NCeSS nodes)
• …