Working collaboratively with big data
Jane Winters, Professor of Digital History, Institute of Historical Research
Sussex Research Hive seminar, Sussex, 12 March 2015

Three big data projects
• Big UK Domain Data for the Arts and Humanities (AHRC, 15 months)
• Traces through Time: Prosopography in Practice across Big Data (AHRC, 15 months)
• Digging into Linked Parliamentary Data (Digging into Data Challenge, 2 years)

Multiple types of collaboration
• Cross-sectoral – National Archives, British Library, Parliament
• Cross-institutional – Brighton, King’s College London, Oxford, Sussex
• International – Aarhus, Amsterdam, Leiden, Toronto
• Interdisciplinary – media and communications, information science, computational linguistics, computing, history, social science, political science etc.

BUDDAH
• All of the partners had worked together before
• Built on a successful pilot project
• Main partners geographically close, allowing regular face-to-face meetings
• Very clearly defined remit for each of the partner organisations

BUDDAH bursary holders
• 10 bursaries to postgraduate students and early career researchers
• Mentoring arrangements in place, Google Groups, meetings every six weeks
• Working with the research team throughout the project to refine the search interface and the indexing of data

Traces through Time
• Only some of the partners had worked together before
• Quite discrete work packages, focusing on the building of separate but interlinked tools
• Separation of the data (by ownership and by time period)
• But regular communication, meetings and events to bring everything together

Digging into Linked Parliamentary Data
• Project put together in just under two months to a very tight deadline
• None of the partners had worked together before
• Large distances involved, making meetings difficult
• Funded by individual national funding bodies, even though part of the same overall project
• But we are reaching the point where ‘the cool stuff’ happens

Problems?
• Differing vocabularies and assumptions (and differences in native language can cause difficulties)
• Different approaches to project management – agile v. PRINCE2
• Division of big data into smaller data, which can then be worked on separately
• Tendency to work in parallel, with only occasional points of intersection
• So much initial work to wrangle the data that research can get squeezed

Benefits?
• Learning from other people, disciplines and approaches
• Developing networks and ideas for new projects
• Producing tools and research that would be impossible without large-scale collaboration
• Being able to ask (and answer) genuinely new research questions
• It’s challenging!

Acknowledgements
Big UK Domain Data for the Arts and Humanities – Jonathan Blaney, Niels Brügger, Josh Cowls, Helen Hockx-Yu, Andy Jackson, Eric Meyer, Ralph Schroeder, Peter Webster
Digging into Linked Parliamentary Data – Kaspar Beelen, Jonathan Blaney, Luke Blaxill, Chris Cochrane, Richard Gartner, Graeme Hirst, Jaap Kamps, Maarten Marx, Nona Naderi, Paul Seaward, Martin Steer
Traces through Time – Emma Bayne, Mark Bell, Susannah Baccardax, Lynne Cahill, Roger Evans, Kleanthi Georgala, Matthew Hillyard, Arno Knobbe, Karthikeyan Nagaraj, Sonia Ranade, Tony Russell-Rose, Benjamin van der Burgh