Highlights from Day 3* in the Big Data House
* ±1

Wednesday’s theme
• It's not just the scale and volume of data that characterises data-intensive research, but also the complexity within and across datasets
• May be in one discipline or across many

My motivation: understanding the scholarly data ecosystem
• Data collections are growing in number, volume and complexity
• Overall there is growing heterogeneity
• The scholarly process seems to be making people more and more expert in smaller and smaller areas
• Grand challenges need researchers to cut across the silos:
  – Data
  – Technology
  – Community
  – Funding

Before
• I know people want to do data integration – linkage – different info about same thing/place/person/time
• e.g. Google maps
• e.g. Longitudinal studies
• I wanted to know what it really means, inside and across disciplines
• NAR 1000+ databases
• e.g. Climate change

MapReduce
Where is it applicable?
http://isabel-drost.de/hadoop/slides/fosdem2010.pdf

BBC Look East: Anti-Social Behaviour
July, August, September 2008
6,902 responses
http://www.maptube.org/lookeast
Mike Batty

Ideas on the future of social science research data
• Enduring challenges of documentation for replication, and coordination
• More and more comparative analysis
• Harmonisation and standardisation
• Data linkage and data enhancement
• Models for complex multiprocess systems
• Fluency – increasing uptake by more users
Paul Lambert, 17/MAR/2010, DIR workshop: Handling Social Science Data

Andrey Rzhetsky

Linked Open Data

Linked data
• Lightweight
• Doesn’t mandate a technology
• Small investment, potential big return
• Sometimes misunderstood
  – Hugh Glaser didn’t use the O-word or the I-word
• Well positioned for effect in the ecosystem
• I’m worried about handling data that changes over time
• “Publish and be damned” can be a cultural obstacle

What we didn’t discuss enough (or I wasn’t in the room)
• Provenance working across silos
• Map-Reduce
• Arts and humanities
• ...

Carole Goble

SysMO summary
• Providing an environment where every data-driven researcher will thrive
• Reality is messy.
  – Extreme Technology Determinism vs Voluntarist Sociocultural shaping
• Extreme and continuous partnership with users.
  – Act Local Think Global
• Agile development environment facilitated stream of features to tackle pain points.
  – Leverage other e-Laboratories, Maintaining scientists’ buy-in.
• Socio-Political Axis dominates the Technical Axis.
  – Collaboration evolutions, Confidence in exchange.

Socio-technical perspective strong
• Carole’s talk:
  – Reputation, incentives, sharing
• New forms of data for digital social research
  – Loyalty cards
  – Traffic cameras
  – Smart electricity meters
  – Facebook
• Privacy vs. inference
• Sociology of digital entities?
• Social simulation
• Crowd sourcing and citizen-sensing
• Citation

Digging into Data
• Structural Analysis of Large Amounts of Music Information
  – University of Illinois, Urbana-Champaign; University of Southampton; McGill University
• Digging Into the Enlightenment: Mapping the Republic of Letters
  – University of Oklahoma; University of Oxford; Stanford University
• Data Mining with Criminal Intent
  – George Mason University; University of Alberta; University of Hertfordshire
• Towards Dynamic Variorum Editions
  – Mount Allison University; Imperial College, London; Tufts University
• Digging into Image Data to Answer Authorship Related Questions
  – Michigan State University; University of Illinois, Urbana-Champaign; University of Sheffield
• Harvesting Speech Datasets for Linguistic Research on the Web
  – McGill University; Cornell University
• Railroads and the Making of Modern America–Tools for Spatio-Temporal Correlation, Analysis, and Visualization
  – University of Portsmouth; University of Nebraska-Lincoln
• Mining a Year of Speech
  – University of Oxford; University of Pennsylvania

Thanks to everyone!