
The Epistemology of Data-Intensive Science
Sabina Leonelli
Exeter Centre for the Study of Life Sciences (Egenis) & Department of Sociology, Philosophy and Anthropology
University of Exeter
@sabinaleonelli

Outline
• Data in the philosophy of science: the representational view
• Unresolved questions and contemporary developments
• My empirical work: data journeys
• An alternative: the relational framework
• Epistemological implications

Data in the philosophy of science
Data interesting insofar as they are visualised/interpreted/analysed through models/theories (‘data models’)
Central themes:
• data production and interpretation as central to discovery
• theory-ladenness and problems with inductive method
• tension between data as “given” and “made”
• “evidential relationship as a purely formal, logical, or a priori matter” (Woodward 2000, S172-3)
Data as embarrassing dent in scientific legitimacy: how can intrinsically local, situated, idiosyncratic, theory-laden results serve as confirmation for universal truths about nature?

Result: the Representational View
• Little work on data handling/dissemination practices and on data themselves as research components → impossible to analyse data independently from modelling
• Key assumptions:
  – data represent an aspect of the world;
  – they document nature in some way which is mind-independent and which can be surprising to investigators;
  – what exactly they document (the information they contain) needs to be uncovered through analysis and interpretation processes
Exceptions: philosophy of experiment and classification (Griesemer, Hacking, Rheinberger), data-phenomena distinction (Bogen, Woodward, McAllister), Wylie’s study of diversity of data in archaeology

So what? Unresolved questions..
• Representational power: if data document aspects of given target,
  – what about cases where it is not clear what is being represented and/or what phenomena, or claims about phenomena, data may be evidence for?
  – and where the representational power attributed to the same dataset shifts dramatically depending on research context and questions asked?
• The ‘raw’ factor: data can be viewed as immediate outputs of scientific instruments (Hacking’s ‘marks’), but
  – what about observations, fieldnotes, simulation results?
  – and data modified for further investigation (Rheinberger’s distinction between ‘traces’ and ‘data’)?
• What constitutes “good”, “reliable” data? (not just an empirical question)
• The relation between data and models
• The relation between data and information
  – are data information, or do they contain information in some sense?
  – does the processing of data (their formatting, the media through which they are disseminated, etc.) affect their representational power?

Data in biology: lots of different formats, media and degrees of abstraction from target system

..and contemporary developments: data-intensive science
• High-throughput technologies for producing and disseminating data
• Large-scale, global networks: assemblage of data from various sources is not only possible, but crucial to future advances
• Open Science movement & emphasis on data as research outputs in and of themselves; ‘big data’ discourse
However:
• How can data be interpreted when removed from production context? Especially when contexts vary enormously?
• Is pattern extraction an inductive process (“data-driven”)?
• If not induction, what forms of reasoning are involved? What role do theories and models play?
• Does this differ from other big data assemblages in history of science? (natural history, astronomy, demography, meteorology)

Data Journeys and the Key Role of Databases
• Data journeys: the conceptual/material/institutional labor involved in making data travel from sites of production to sites of re-use
• Databases as key to the collection, dissemination and interpretation of data about organisms
• Study of work involved in developing and maintaining databases provides a window on circumstances under which data travel to new sites, and are integrated and re-used therein

Observations from study of data journeys
• Data are highly diverse (not just computable quantities or symbols) and mutable in format and medium
• Crucial role of decisions about what counts as data and metadata
• Standards for what is viewed as data, and/or reliable data, can change dramatically even within the same field
• Data have multiple, sometimes overlapping values (not only as evidence: currencies, tokens of personal identity, commodities)
• Data are handled differently by individuals and groups with diverse expertise: as a result, the evidential value and representational power attributed to data can vary widely
• Data journeys are affected by ethos of relevant communities, conditions for data access/use, perceptions of data ownership
• Data journeys affect the ways in which data are defined, handled and interpreted: organisation, formatting and visualisation of data at every stage are relevant to subsequent analysis and information ‘found’ therein (databases maximise ways in which users can search and visualise data and metadata)

So How to Define Data? A Relational Approach
• ‘What counts as data?’ makes sense only in relation to specific research situations
• I propose to give up on a definition of data based on
  – the degree to which they are manipulated
  – intrinsic properties regardless of context of use
• Data = any product of research activities, ranging from artifacts such as photographs to symbols such as letters or numbers, which is collected, stored and disseminated in order to be used as evidence for knowledge claims
• Any object can be considered as a datum as long as (1) it is treated as potential evidence for one or more claims about phenomena, and (2) it is possible to circulate it among individuals/groups.
• Relational framework: data are defined in terms of their function within specific processes of inquiry (who uses them, how and for which purposes)

Epistemological Implications
• The same objects may or may not be functioning as data, depending on the situation
• Evidential value is what distinguishes data from other objects and research components
• Defining characteristics:
  – Portability
  – Materiality
  – Mutability
• Variable medium..
• .. where no clear distinction is possible between medium and content (token and type)

Conclusions
• Diversity of interpretations and under-determination are not ‘problems’, but rather the motor of data-intensive research
• Data retain representational value, but it is variable rather than intrinsic (in conjunction with models and theories used at the time of analysis)
• Crucial relevance of institutional, political, economic factors to data journeys and therefore to data epistemology
• Assessing what counts as ‘raw’ data, data manipulation, reliable data: relative to research setting and to the availability and processing of information about data journeys (e.g. meta-data, knowledge of labels and formats used in databases, curators’ efforts to assess data quality)
• Data-intensive science is not inductive in a naïve sense: data dissemination and analysis involve the careful development of situations in which data can be interpreted as informative
• Information technologies play a key role by helping to:
  – articulate and multiply the situations in which data can be organised and interpreted
  – allow users to track and critically assess the history of data (their provenance and the ways in which their journey has affected them)

“Even the simple communication of an item of knowledge can by no means be compared with the translocation of a rigid body in an Euclidean space. Communication never occurs without a transformation, and indeed always involves a stylized remodeling, which intracollectively achieves corroboration, and which intercollectively yields fundamental alteration.”
Ludwig Fleck, 1979 [1935], 111

This research is funded by the European Research Council under the European Union's 7th Framework Programme (FP7/2007-2013) / ERC grant agreement n° 335925.