Journalism in an Era of Big Data

Seth C. Lewis
•
Proposed special issue in Digital Journalism
•
Page 1
Proposal for a Special Issue of Digital Journalism
‘Journalism in an Era of Big Data’
Guest Editor: Dr. Seth C. Lewis, Assistant Professor, School of Journalism & Mass
Communication, University of Minnesota–Twin Cities, USA
Introduction and overview
The term “Big Data” is often invoked to describe the overwhelming volume of
information produced by and about human activity, made possible by the growing
ubiquity of mobile devices, tracking tools, always-on sensors, and cheap computing
storage. “In a digitized world, consumers going about their day—communicating,
browsing, buying, sharing, searching—create their own enormous trails of data”
(Manyika et al., 2011, p. 1). Technological advances have made it easier than ever to
harness, organize, analyze, and visualize massive repositories of these digital traces, not
only for corporations, governments, and researchers but also for any individual with an
internet connection (Manovich, 2011). These streams of data, stored and structured in
reusable ways, increasingly provide the “raw material of journalism” (Bell, 2012, p.
48)—the potential for new forms of gathering, filtering, and disseminating news
information, refined by automation and algorithms. As Bell suggests, “One of the most
important questions for journalism’s sustainability will be how individuals and
organizations respond to this availability of data” (2012, p. 48).
Contours of this response are beginning to emerge in the journalism field. The
role of computer scientists, working as “programmer-journalists,” has grown
substantially since the mid-2000s, as several leading news organizations have assembled
teams dedicated to “news applications” and software-driven variations on traditional
computer-assisted reporting (Parasie & Dagiral, 2012; Powers, 2012). Overall, the work
of collecting, processing, assessing, and visualizing data sets as news can be seen as part
of a larger development, what several scholars have referred to as computational
journalism. Anderson (2012) describes this as “increasingly ubiquitous forms of
algorithmic, social scientific, and mathematical forms of newswork adopted by many
21st-century newsrooms and touted by many educational institutions as ‘the future of
news’” (p. 1). Computational journalism, also called data journalism, can be seen as an
outgrowth of and a departure from the computer-assisted reporting (CAR) of decades past
(Flew, Spurgeon, Daniel, & Swift, 2012; Powers, 2012), leveraged on the growing
abundance of Big Data and its appropriation by computer programmers.
These developments raise all manner of questions for journalism and its role in
public life. To the extent that Big Data represents a major social, cultural, and
technological phenomenon (boyd & Crawford, 2012), one that challenges traditional
modes of information organization and expression (Manovich, 2011), what are the
implications of Big Data for journalism? How are journalists learning to cope with and
take advantage of new data sets? Are there emerging sets of routines, values, and norms
associated with Big Data in journalism? What forms and functions does data journalism
serve, and with what kinds of consequences for the overall character of news production?
What are the consequences of data for the epistemology of news, and for the nature of
professional identity and expertise in journalism (see Parasie & Dagiral, 2012)? How
might scholars come to understand the sociology of data/computational/programmer
journalism; which theoretical frameworks or conceptual approaches are most fruitful in
helping to explain and predict the processes and outcomes of such journalistic activities
(see Anderson, 2012)? Beyond the traditional journalism field, how are other actors, such
as technology entrepreneurs and information scientists, transforming “information” into
novel forms of “journalism” (Lewis, 2012, p. 851)? While some scholars, like those cited
above, are beginning to address these questions, the academic literature, thus far, is rather
thin.
Perhaps most critically, there are key normative questions that need to be
examined. The Big Data phenomenon raises serious concerns about consumer privacy,
data accuracy, and the overall quantification of social life (boyd & Crawford, 2012;
Oboler, Welsh, & Cruz, 2012). “This data layer,” one observer noted, “is a shadow. It’s
part of how we live. It is always there but seldom observed” (quoted in Bell, 2012, p. 48).
Like corporations and governments, journalism is just beginning to learn how to navigate
this shadow layer of public life. What constitutes ethical practice in handling personal
data when it is so easily gathered and analyzed? What special obligations might
journalists have with regard to Big Data tools and techniques, and how should those be
understood within existing or emerging sets of journalistic norms?
One example: The transition from stories to data
These descriptive, conceptual, and ethical questions point to the need for research
to explore the implications of Big Data for journalism. Below, I offer an example from
my own ongoing work that illustrates why such research is pressing.
Manovich (1999) famously argued that if narrative (first the novel, then the
cinema) was the defining form of cultural expression in the modern age, the computer
age has provided its correlate—the database. Unlike the cause-and-effect trajectory of
linear items (events) in narrative, the database “represents a world as a list of items which
it refuses to order” (Manovich, 1999, p. 85)—without beginning or end, without narrators
or key actors, but rather a repository of spreadsheet-like information that reveals different
permutations according to the actions that database users take. Commenting on
Manovich’s “oversimplified but brilliant and provocative formulation,” Schudson (2010)
acknowledged, “This idea implies quite a lot, I believe, about the future of news” (p.
100).
Journalists traditionally informed the public with episodic, narrative accounts. But
in an era of Big Data, journalists are fast adopting the tools of computation, statistics, and
social science to explain difficult problems through databases (Cohen, Hamilton, &
Turner, 2011; Flew et al., 2012). These databases present public data that users can
search, scrutinize, and visualize in novel ways, adding layers of interactivity and
customization that upend the traditional one-way model of news storytelling. In essence,
the character of news information is undergoing a pivotal shift from “stories” to “data”—
from linear, qualitative, narrator-guided modes of expression, to nonlinear, quantitative,
and user-guided modes of exploration.
While not replacing the traditional focus on “stories,” this data journalism
phenomenon is sweeping through the profession, as major news organizations such as the
Guardian, The New York Times, and NPR deploy “data teams” composed of reporters
working alongside data scientists, information designers, and computer programmers
(Weber & Rall, 2012). A few questions naturally arise: How does this shift from stories
to data affect the professional culture, ethics, and values of journalism? And, by
extension, how does this shift affect the overall nature of news that is produced—the
news on which people rely for knowledge about public affairs? The transition from story
to data is more than a technical shift in news format; it speaks to a larger socio-cultural
transformation occurring in the information environment, as databases become a critical
mode through which “stories are told” about public affairs.
Such questions remain unanswered even as some researchers have heralded the
potential of data journalism, arguing that computer scientists can strengthen the
democratic functions of journalism at a time when many professional news staffs are
shrinking. “Could computing technology—which has played no small part in the decline
of the traditional news media—turn out to be a savior of journalism’s watchdog tradition?”
(Cohen, Li, Yang, & Yu, 2011, p. 1). But these aspirations are based on scattered
anecdotes, and have yet to be examined empirically in the academic literature.
The proposed special issue
Against that backdrop, the proposed special issue aims to make space for
scholarship that interrogates the evolving nature of journalism in an era of Big Data,
seeking to understand the implications of this social, cultural, and technological
phenomenon for news and its role in public life. The special issue would thus explore a
range of phenomena at the junction between journalism and the social, computer, and
information sciences; these phenomena are organized around the contexts of digital
information technologies being used in contemporary newswork—such as algorithms,
applications, sophisticated mapping, real-time analytics, automated information services,
dynamic visualizations, and other innovations that rely on massive data sets and their
maintenance. What are the implications of such Big Data phenomena for journalism’s
professional norms, routines, and ethics? For its modes of production, distribution, and
audience reception? For the overall social organization, epistemology, and normative
character of news in democratic society?
The special issue would not be merely a descriptive, reductionist catalog of these
various tools and their impact in journalism, nor would it be focused heavily on “best
practices.” Instead, the special issue would follow Anderson’s (2012) lead in encouraging
a “sociological approach to computational journalism” (p. 13, emphasis in original). This
approach means undertaking a research program that brackets, at least temporarily,
questions of practical utility to maintain a healthy skepticism—keeping in mind the tradeoffs, embedded values, and power dynamics associated with technological change. This
special issue thus would emphasize a range of critical engagements with the issues
surrounding Big Data and journalism.
The special issue would welcome papers drawing on a variety of theoretical and
methodological approaches, with a preference for empirically driven or conceptually rich
accounts. These papers might touch on a range of themes, including but not limited to the
following:
• The history (or histories) of computational forms of journalism;
• The epistemological ramifications of “data” in contemporary newswork;
• Norms, routines, and values associated with emerging forms of data-driven
  journalism (e.g., data visualizations, applications, interactives, and
  alternative forms of storytelling);
• The sociology of new actors connected to computational forms of
  journalism, within and beyond newsrooms (e.g., news application teams,
  programmer-journalists, tech entrepreneurs, web developers, and hackers);
• The social, cultural, and technological roles of algorithms, automation,
  real-time analytics, and other forms of mechanization in contemporary
  newswork, and the implications of such for journalistic roles and routines;
• The ethics of journalism in the context of Big Data;
• Approaches for conceptualizing the distinct nature of emerging
  journalisms (e.g., computational journalism, data journalism, algorithmic
  journalism, and programmer journalism);
• The blurring boundaries between “news” and other types of information,
  and the role of Big Data and its related implications in that process.
Proposed timeline
• Deadline for abstracts: May 1, 2013
• Completed papers: November 1, 2013
• Final revised papers due: March 1, 2014
Possible contributors
• C. W. Anderson, City University of New York (USA); he is working on a
  cultural history of “data” in journalism, and likely could add a historical
  dimension to this discussion.
• Mike Ananny, University of Southern California (USA); he is conducting
  research on the sociology of news algorithms.
• Nick Diakopoulos; an independent researcher trained in computer
  science, he has done pathbreaking work in studying the particulars of
  computational journalism practice.
• Terry Flew, Queensland University of Technology (Australia); he and his
  colleagues published an important conceptual article on computational
  journalism; perhaps there is an opportunity for a follow-up empirical
  investigation?
• Susan McGregor, Columbia University (USA); she has written about data
  management in journalism.
• Gunnar Nygren, Södertörn University (Sweden); he and colleagues have
  an ongoing research project examining data journalism at news
  organizations in Sweden.
• Sylvain Parasie, University of Paris-Est Marne-la-Vallée (France); the
  work of this French sociologist (and his colleagues) has focused on the
  development of news applications and data journalism in news
  organizations.
• Cindy Royal, Texas State University (USA); hers is one of the first
  scholarly accounts of “programmer-journalists,” and she could build on
  that work for this special issue.
• Nikki Usher, George Washington University (USA); in collaboration with
  the guest editor, she has done substantial research on the growing
  integration of computer programmers, web developers, and “hackers” (in
  the pro-social sense of the term) into journalism, both in newsrooms and
  beyond.
• Farida Vis, The University of Sheffield (UK); a social media researcher,
  she could speak to data visualizations, both as a practitioner in this
  area and as a scholar.
References
Anderson, C. W. (2012). Towards a sociology of computational and algorithmic
journalism. New Media & Society. doi:10.1177/146144481246513
Bell, E. (2012). Journalism by numbers. Columbia Journalism Review, 51(3), 48-49.
boyd, d., & Crawford, K. (2012). Critical questions for Big Data: Provocations for a
cultural, technological, and scholarly phenomenon. Information, Communication
& Society (iFirst online), 1–18.
Cohen, S., Hamilton, J. T., & Turner, F. (2011). Computational journalism.
Communications of the ACM, 54(10), 66-71.
Cohen, S., Li, C., Yang, J., & Yu, C. (2011). Computational journalism: A call to arms to
database researchers. In Proceedings of the 5th Biennial Conference on
Innovative Data Systems Research.
Flew, T., Spurgeon, C., Daniel, A., & Swift, A. (2012). The promise of computational
journalism. Journalism Practice, 6(2), 157–171.
Lewis, S. C. (2012). The tension between professional control and open participation:
Journalism and its boundaries. Information, Communication & Society, 15(6),
836-866.
Manovich, L. (1999). Database as symbolic form. Convergence: The International
Journal of Research Into New Media Technologies, 5(2), 80-99.
Manovich, L. (2011). Trending: The promises and the challenges of big social data. In M.
K. Gold (Ed.), Debates in the digital humanities (pp. 460-475). Minneapolis, MN:
The University of Minnesota Press.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H.
(2011). Big Data: The next frontier for innovation, competition, and productivity.
McKinsey Global Institute, 1-137.
Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as
computational social science. First Monday, 17(7-2). Retrieved from
http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3993/3269
Parasie, S., & Dagiral, E. (2012). Data-driven journalism and the public good:
“Computer-assisted-reporters” and “programmer-journalists” in Chicago. New
Media & Society. doi:10.1177/1461444812463345
Powers, M. (2012). "In forms that are familiar and yet-to-be invented": American
journalism and the discourse of technologically specific work. Journal of
Communication Inquiry, 36(1), 24-43. doi:10.1177/0196859911426009
Schudson, M. (2010). Political observatories, databases & news in the emerging ecology of
public information. Dædalus, 139(2), 100–109.
Weber, W., & Rall, H. (2012). Data visualization in online journalism and its
implications for the production process. In 16th International Conference on
Information Visualisation (IV) (pp. 349-56). IEEE.