Data Mining with Criminal Intent

Highlights from Day 3* in the Big Data House
* ±1
Wednesday’s theme
• It's not just the scale and volume of data that
characterises data-intensive research, but also
the complexity within and across datasets
• May be in one discipline or across many
My motivation: understanding the
scholarly data ecosystem
• Data collections are growing in number, volume and
complexity
• Overall there is growing heterogeneity
• The scholarly process seems to be making people
more and more expert in smaller and smaller areas
• Grand challenges need researchers to cut across the silos:
– Data
– Technology
– Community
– Funding
Before
• I know people want to do data integration –
linkage – different info about same
thing/place/person/time
• e.g. Google maps
• e.g. Longitudinal studies
• I wanted to know what it really means, inside
and across disciplines
• NAR 1000+ databases
• e.g. Climate change
MapReduce
Where is it applicable?
http://isabel-drost.de/hadoop/slides/fosdem2010.pdf
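As a rough illustration of the pattern the question refers to, here is a minimal single-process sketch of the map, shuffle and reduce phases, assuming a simple word-count task. It is not taken from the linked slides and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, map_reduce) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(record):
    # Map: emit a (key, value) pair for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the values for one key into a single result.
    return key, sum(values)


def map_reduce(records):
    # Assumption: everything runs in one process; a real framework would
    # distribute the map and reduce calls across many machines.
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data house", "big data"])
```

The pattern tends to fit when each record can be mapped independently and the per-key results can be combined without reference to other keys.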
BBC Look East: Anti-Social Behaviour
July, August, September 2008
6,902 responses
http://www.maptube.org/lookeast
Mike Batty
Ideas on the future of social science
research data
• Enduring challenges of documentation for
replication, and coordination
• More and more comparative analysis
• Harmonisation and standardisation
• Data linkage and data enhancement
• Models for complex multiprocess systems
• Fluency – increasing uptake by more users
Paul Lambert
17/MAR/2010
DIR workshop: Handling Social Science Data
Andrey Rzhetsky
Linked Open Data
Linked data
• Lightweight
• Doesn’t mandate a technology
• Small investment, potential big return
• Sometimes misunderstood
– Hugh Glaser didn’t use the O-word or the I-word
• Well positioned for effect in the ecosystem
• I’m worried about handling data that changes over
time
• “Publish and be damned” can be cultural obstacle
What we didn’t discuss enough
(or I wasn’t in the room)
• Provenance working across silos
• Map-Reduce
• Arts and humanities
• ...
Carole Goble
SysMO summary
• Providing an environment where every data-driven
researcher will thrive
• Reality is messy.
– Extreme Technology Determinism vs Voluntarist Sociocultural
shaping
• Extreme and continuous partnership with users.
– Act Local Think Global
• Agile development environment facilitated stream of
features to tackle pain points.
– Leverage other e-Laboratories, Maintaining scientists’ buy-in.
• Socio-Political Axis dominates the Technical Axis.
– Collaboration evolutions, Confidence in exchange.
Socio-technical perspective strong
• Carole’s talk:
– Reputation, incentives, sharing
• New forms of data for digital social research
– Loyalty cards
– Traffic cameras
– Smart electricity meters
– Facebook
• Privacy vs. inference
• Sociology of digital entities?
• Social simulation
• Crowd sourcing and citizen-sensing
• Citation
Digging into Data
• Structural Analysis of Large Amounts of Music Information
– University of Illinois, Urbana-Champaign; University of Southampton; McGill University
• Digging Into the Enlightenment: Mapping the Republic of Letters
– University of Oklahoma; University of Oxford; Stanford University
• Data Mining with Criminal Intent
– George Mason University; University of Alberta; University of Hertfordshire
• Towards Dynamic Variorum Editions
– Mount Allison University; Imperial College London; Tufts University
• Digging into Image Data to Answer Authorship Related Questions
– Michigan State University; University of Illinois, Urbana-Champaign; University of Sheffield
• Harvesting Speech Datasets for Linguistic Research on the Web
– McGill University; Cornell University
• Railroads and the Making of Modern America–Tools for Spatio-Temporal Correlation, Analysis, and Visualization
– University of Portsmouth; University of Nebraska-Lincoln
• Mining a Year of Speech
– University of Oxford; University of Pennsylvania
Thanks to everyone!