Working collaboratively with big data
Jane Winters, Professor of Digital History, Institute of
Historical Research
Sussex Research Hive seminar, Sussex, 12 March 2015
Three big data projects
• Big UK Domain Data for the Arts and Humanities
(AHRC, 15 months)
• Traces through Time: Prosopography in Practice
across Big Data (AHRC, 15 months)
• Digging into Linked Parliamentary Data (Digging into
Data Challenge, 2 years)
Multiple types of collaboration
• Cross-sectoral – National Archives, British Library,
• Cross-institutional – Brighton, King’s College London,
Oxford, Sussex
• International – Aarhus, Amsterdam, Leiden, Toronto
• Interdisciplinary – media and communications,
information science, computational linguistics,
computing, history, social science, political science etc.
• All of the partners had worked together before
• Built on a successful pilot project
• Main partners geographically close, allowing regular
face-to-face meetings
• Very clearly defined remit for each of the partners
BUDDAH bursary holders
• 10 bursaries to postgraduate students and early
career researchers
• Mentoring arrangements in place, Google Groups,
meetings every six weeks
• Working with the research team to refine the search
interface and the indexing of data throughout the project
Traces through Time
• Only some of the partners had worked together before
• Quite discrete work packages, focusing on the building of
separate but interlinked tools
• Separation of the data (by ownership and by time)
• But regular communication, meetings and events to
bring everything together
Digging into Linked Parliamentary Data
• Project put together in just under two months to a very tight deadline
• None of the partners had worked together before
• Large distances involved, making meetings difficult
• Funded by individual national funding bodies, even though
part of the same overall project
• But we are reaching the point where ‘the cool stuff’ happens
• Differing vocabularies and assumptions (and native language can
cause difficulties)
• Different approaches to project management – agile v. PRINCE2
• Division of big data into smaller data, which can then be worked on
• Tendency to work in parallel, with only occasional points of
• So much initial work to wrangle the data that research can get
• Learning from other people, disciplines and approaches
• Developing networks and ideas for new projects
• Producing tools and research that would be impossible
without large-scale collaboration
• Being able to ask (and answer) genuinely new research questions
• It’s challenging!
Big UK Domain Data for the Arts and Humanities – Jonathan Blaney, Niels
Brügger, Josh Cowls, Helen Hockx-Yu, Andy Jackson, Eric Meyer, Ralph
Schroeder, Peter Webster
Digging into Linked Parliamentary Data – Kaspar Beelen, Jonathan Blaney,
Luke Blaxill, Chris Cochrane, Richard Gartner, Graeme Hirst, Jaap Kamps,
Maarten Marx, Nona Naderi, Paul Seaward, Martin Steer
Traces through Time – Emma Bayne, Mark Bell, Susannah Baccardax, Lynne
Cahill, Roger Evans, Kleanthi Georgala, Matthew Hillyard, Arno Knobbe,
Karthikeyan Nagaraj, Sonia Ranade, Tony Russell-Rose, Benjamin van der