ReDReSS Case Study in e-Social Science

Rob Allan (CCLRC Daresbury Laboratory)
Rob Crouchley (University of Lancaster)

Building Collaborative e-Research Environments
JISC Consultation Workshops, 23/2/04 and 5/3/04

Specific Social Scientists' Problems
1. They have much less experience and expertise in using the Grid than researchers from other Research Council areas typically have;
2. There is a significant intellectual gap between these disciplines and computer science;
3. Distributed systems are inherently complex, and the associated middleware products are not easy to use;
4. The Open Middleware Infrastructure Institute (OMII) is likely to provide generic (open-source) middleware and associated services.
E-Science middleware is currently not specifically targeted at the social science community.

Social Scientists Need
1. Help to develop a more computer-literate, collaborative culture;
2. Help to develop component-based software, visual composition tools and scripting languages that are easy to use;
3. To exploit state-of-the-art software development technologies, such as aspect-oriented programming, to enhance flexibility.
Middleware could be the catalyst for re-use and sharing in the e-Social Sciences. Some examples and ideas follow.

Some Features of Social Science Research
• Research is motivated by a desire to determine causality.
• It involves:
  1. identifying the various factors that influence the behaviour or outcome of interest, and quantifying their effects;
  2. controlling for all the different confounding factors that would otherwise result in spurious relationships and misleading results.
• Randomised experiments are not feasible: we cannot, for example, randomly allocate individuals to different levels of training in order to evaluate programs.
• We rely on observational data, i.e. data obtained from surveys and censuses. This differs from “exact sciences” such as physics and chemistry, where repeatable experiments can be performed.
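To make the point about confounding concrete, here is a minimal Python sketch on hypothetical, constructed data. The variables Z (prior qualification), T (training) and Y (outcome), and all the counts, are assumptions for illustration only: Z drives both who gets trained and the outcome, while training itself has no effect. Stratifying on Z is just one simple way of "controlling" for it; real analyses would use multivariate models rather than a single binary stratifier.

```python
"""Sketch: an uncontrolled confounder produces a spurious 'effect'.

All data are hypothetical. Z (prior qualification) raises both the
chance of training T and the outcome Y; T has no effect on Y at all.
"""
from statistics import mean


def make_hypothetical_data():
    """Return (z, t, y) records with y = 5 + 10*z, independent of t."""
    # Number of individuals in each (z, t) cell: trained people are
    # mostly those with z = 1, which is what creates the confounding.
    counts = {(1, 1): 80, (1, 0): 20, (0, 1): 20, (0, 0): 80}
    records = []
    for (z, t), n in counts.items():
        records.extend([(z, t, 5 + 10 * z)] * n)
    return records


def naive_difference(records):
    """Mean outcome of trained minus untrained, ignoring Z."""
    trained = [y for _, t, y in records if t == 1]
    untrained = [y for _, t, y in records if t == 0]
    return mean(trained) - mean(untrained)


def stratified_difference(records):
    """Within-stratum differences, weighted by each stratum's share."""
    total = len(records)
    estimate = 0.0
    for z in {z for z, _, _ in records}:
        stratum = [r for r in records if r[0] == z]
        estimate += naive_difference(stratum) * len(stratum) / total
    return estimate


data = make_hypothetical_data()
print(naive_difference(data))       # 6: a spurious "training effect"
print(stratified_difference(data))  # 0.0 once Z is controlled for
```

The naive comparison attributes the whole effect of Z to training; comparing like with like inside each Z stratum removes it.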
3 Related Aspects of Soc. Sci. Research
• Observational data, usually full of holes:
  – missing data
  – measurement error
  – dropout
• Substantive theory:
  – what determines what
  – not comprehensive
  – often contradictory
• Methodology:
  – only partially developed

Soc. Sci. Needs Comprehensive Models
• Interdependent sub-models: we need joint models for the data complexities and for the core processes we want to understand.
• The models are not linear in the parameters; they require special procedures and are highly computationally intensive because of their high dimensionality and the interdependent sub-models.
• Simple analyses are usually very misleading about the role of the controls (ethnicity, sex, etc.).
Soc. Sci. research is complex: a large parameter space, with many interpretations and models that need to be tested. It cannot be done in isolation. There is an increasing need to link components and to access large computers and data sets from the desktop.

E-Science Technology Can Link Components!
[Diagram: Data Management A, B and C and Analysis A, B and C, connected through Middleware]

New Tools: The Analysis Cycle
[Diagram: Main ESDS Data Sets; Contextual Data (TTWA Data, NOMIS); Select Data Set and Appropriate Variables; Merge Files; Add Variables; Working Data; Results]

New Tools: Simultaneous Analysis
Example: research in educational attainment
[Diagram: the National Pupils Database analysed simultaneously by Psychologists, Geographers, Educationalists and Economists]

E-Science Can Enhance Collaboration!
• Particularly important in qualitative research;
• Enables comparison of different markup/interpretation;
• Direct access to data sets for validation;
• Direct input of data from fieldwork involving questionnaires, photography, etc.;
• Delivery/input devices (some mobile) may include: portals, Access Grid, tablet PCs, PDAs, cameras, phones, etc.
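The Analysis Cycle slide above names the steps that turn a main data set plus contextual data into working data: select a data set and appropriate variables, merge files, and add variables. The following is a minimal sketch of one pass in Python, assuming plain in-memory records. Every field name, area code, value and the threshold used are hypothetical; they are not real ESDS, TTWA or NOMIS schemas or figures.

```python
"""One pass through the analysis cycle: select, merge, add variables.

All field names (person_id, ttwa, age, employed, hh_size, unemp_rate),
area codes and values are hypothetical illustrations.
"""

# Hypothetical individual-level survey extract.
survey = [
    {"person_id": 1, "ttwa": "A01", "age": 34, "employed": 1, "hh_size": 3},
    {"person_id": 2, "ttwa": "A02", "age": 51, "employed": 0, "hh_size": 1},
    {"person_id": 3, "ttwa": "A09", "age": 27, "employed": 1, "hh_size": 2},
]

# Hypothetical area-level contextual data, keyed by travel-to-work area.
contextual = {"A01": {"unemp_rate": 4.2}, "A02": {"unemp_rate": 7.9}}


def select(records, variables):
    """Keep only the variables needed for the analysis."""
    return [{v: r[v] for v in variables} for r in records]


def merge(records, context, key, fields):
    """Left join: every record is kept; unmatched areas get None (missing)."""
    merged = []
    for r in records:
        extra = context.get(r[key], {})
        merged.append({**r, **{f: extra.get(f) for f in fields}})
    return merged


def add_variables(records, threshold=6.0):
    """Derive an indicator, propagating missing values rather than guessing.

    The 6.0 threshold is an arbitrary, hypothetical choice.
    """
    for r in records:
        rate = r["unemp_rate"]
        r["high_unemp"] = None if rate is None else int(rate > threshold)
    return records


working = add_variables(
    merge(select(survey, ["person_id", "ttwa", "age", "employed"]),
          contextual, key="ttwa", fields=["unemp_rate"])
)
for row in working:
    print(row)
```

Record 3 has no matching area, so it is kept with explicit missing values rather than silently dropped; this reflects the point that observational data are usually full of holes.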
New Tools: Collaboration in Video Markup
VIDGRID: multiple video streams can be delivered into an Access Grid (AG) or portlet environment.
[Diagram: a Video Corpus shared by Researcher A, Researcher B and Researcher C]

Training and Awareness in e-Social Science!
Project ReDReSS: Resource Discovery for Researchers in e-Social Science
“to accelerate the development and awareness of a new kind of computing and data infrastructure for the Social Sciences, and to support the increasingly national and global collaborations emerging in many areas of Social Science”
– To help illustrate appropriate methodologies and software that admit the full complexity of substantive problems;
– To help articulate the middleware needs of social researchers;
– To help nurture and support a community of social researchers;
– To help provide critical mass and improve the efficiency of interactions between interested researchers, thus reducing the number of lost opportunities for social science.

We will use/contribute to existing technologies:
• Resource discovery
• Sharing tools
• Personalised workspaces
• Flexible delivery

Samples showing use of the CHEF framework for ReDReSS and delivery of lecture material by video

E-Science Enabling a Virtual Research Environment!
“to make the use of e-Science technologies, methodologies and resources easier and more transparent than simply developing bespoke applications on an infrastructure toolkit (such as Globus GT2 or OGSI/WSRF).
”

We need to:
• Bridge the gap between different types of technology (database management, computational methods, data collection, networks, Condor resources, visualization systems, collaborative working, Access Grid, etc.);
• Build on pilot projects and take input from other disciplines;
• Link to core JCSR clusters and to resources at other e-Science Centres;
• Provide an environment that enhances the programmability and usability of such a Grid by integrating work from a number of ongoing projects and encouraging community input.

The Grid “Client Problem”
Many clients want to access a few Grid-enabled resources.
[Diagram: Grid core middleware (e.g. Globus) accessed by workplace desktop clients; portable clients (phones, laptops, PDAs, data collection devices); and consumer clients (PC, TV, video, AG)]

Some VRE Functions
• Authentication, Authorisation and Accounting – use Shibboleth and PERMIS in line with JISC proposals;
• Community development of content – content management and editing tools:
  – access to middleware resources and documentation,
  – access to training materials and resources,
  – enabling shared development of services/applications,
  – access to a consultancy/support service;
• Application Management Services – user access via predefined tools and applications to the UK e-Science Grid;
• Data Management Services – discovery, authorisation, transfer, replication, upload, validation, curation;
• Access to Broadcasts – on the Access Grid network;
• Management Functions – for experts to maintain the system and guide non-experts, e.g. via expert systems and workflow.

Functionality/Content of the VRE
[Diagram: VRE Portal linking Semantic Grid Services; Portal Management; Middleware/Software Library; Access Grid; JISC Portal; VLE Portal; UK Grid Services; Security (Authorisation, Authentication); Text Mining/Data Services; Workshops; Awareness Raising; Resources]

Sanity Check
However, a number of areas significant for a production Grid environment have hardly yet been tackled.
Issues include:
• Grid information systems, service registration, discovery and definition of facilities;
• Security, in particular role-based authorisation;
• Portable parallel job specifications;
• Meta-scheduling, resource reservation and ‘on demand’ access;
• Dynamic linking and interaction with remote data sources;
• Wide-area computational/experimental steering;
• Workflow composition and optimisation for complex procedures;
• Distributed user and application management;
• Data management and replication services;
• Grid programming environments, PSEs and user interfaces;
• Auditing, advertising and billing in a Grid-based resource market;
• Semantic and autonomic tools;
• Usability issues, ethics, etc. (human factors).

Customised delivery may be key to long-term uptake:
• Use an environment familiar to the researchers, e.g.:
  – Web portals: training, awareness, search tools (search engines are popular);
  – Libraries: e.g. C libraries for programmers;
  – Programming environments: e.g. R for statistical analysis, with well-known packages;
  – Sound and video for virtual collaboration (TV is a popular medium).
Bottom line: there is a lot we can and need to do, but social science is already hard; the scientists need tools that do not make it harder!

UK e-Social Science Programme
There is a growing body of work and projects in this area:
• Pilot projects – ESRC
• ReDReSS: Resource Discovery for Researchers in e-Social Science – JISC
• UK National Grid Service + e-Science Grid – JCSR and DTI Core Programme
• NCeSS: National Centre for e-Social Science – ESRC
• CQeSSS: Centre for Quantitative e-Social Science Support – ESRC (+ future NCeSS nodes)
• …
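"Workflow composition ... for complex procedures" appears among the open issues above. As a minimal sketch of the composition half only, an analysis can be described as a dependency graph of steps and ordered with a topological sort; steps that become ready at the same time could, in principle, be dispatched in parallel to different resources. The step names are hypothetical, the code uses Python's standard `graphlib` module (Python 3.9+), and no optimisation or Grid scheduling is attempted.

```python
"""Sketch: composing an analysis workflow as a dependency graph.

Step names are hypothetical. Each step maps to the set of steps it
depends on; graphlib.TopologicalSorter yields a valid execution order.
"""
from graphlib import TopologicalSorter

# Hypothetical workflow: step -> set of prerequisite steps.
WORKFLOW = {
    "select_variables": {"fetch_survey"},
    "merge_context": {"select_variables", "fetch_context"},
    "add_variables": {"merge_context"},
    "fit_model": {"add_variables"},
    "summarise": {"fit_model"},
}


def parallel_stages(graph):
    """Group steps into stages whose members could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose inputs are done
        stages.append(ready)
        sorter.done(*ready)  # mark them finished, unlocking successors
    return stages


for i, stage in enumerate(parallel_stages(WORKFLOW)):
    print(i, stage)
```

Here only the two fetch steps can run together; a wider workflow, such as several disciplines analysing the same data set at once, would produce larger stages.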