Data Journalism Handbook ppt

DATA JOURNALISM
HANDBOOK
datajournalismhandbook.org
DATA JOURNALISM HANDBOOK
 The Data Journalism Handbook was born at a 48-hour
workshop at MozFest 2011 in London.
 Hundreds of contributors
WHAT IS DATA JOURNALISM?
 Journalism done with data
 New possibilities that open up when you combine the
traditional ‘nose for news’ and ability to tell a compelling
story, with the sheer scale and range of digital information
now available.
 Can help a journalist tell a complex story through engaging
infographics
 Can help explain how a story relates to an individual
 Data can be the source of data journalism, or it can be the
tool with which the story is told — or it can be both
WHY SHOULD JOURNALISTS USE DATA?
 Today news stories flow in as they happen, from multiple
sources, eyewitnesses and blogs, and what has happened is filtered
through a vast network of social connections, being ranked,
commented on and, more often than not, ignored.
 Right now, a few pioneering journalists already demonstrate how data
can be used to create deeper insights into what is happening around us
and how it might affect us.
 Data analysis can reveal “a story’s shape” (Sarah Cohen), or provide
us with a “new camera” (David McCandless).
 Becoming knowledgeable in searching, cleaning, and visualizing data is
transformative for the profession of information gathering.
 Companies and institutions around the world are looking for
“sensemakers” and professionals, who know how to dig through data
and transform it into something tangible.
 There is one barrier keeping journalists from using this potential:
training to learn how to work with data through all the steps
from a first question to a big data-driven scoop.
SURVEY
 The European Journalism Centre conducted a survey to find
out more about training needs of journalists.
 Found there is a big willingness to get out of the comfort zone
of traditional journalism and to invest time to master the new
skills.
 The results from the survey showed that journalists see the
opportunity, but need a bit of support to cut through the
initial problems keeping them from working with data.
WHY IS DATA JOURNALISM IMPORTANT?
 Filtering the Flow of Data
 New Approaches to Storytelling (Pilhofer)
 Like Photo Journalism with a Laptop (Boyer)
 Data Journalism is the Future (TBL)
 Number-Crunching Meets Word-Smithing
 Updating Your Skills Set
 A Remedy for Information Asymmetry
 An Answer to Data-driven PR
 Providing Independent Interpretations of Official Information
 Dealing with the Data Deluge
 Our Lives are Data
 A Way to Save Time
 An Essential Part of the Journalists' Toolkit
 Adapting to Changes in Our Information Environment
 A Way to See Things You Might Not Otherwise See
 A Way To Tell Richer Stories
EXAMPLES
PERSPECTIVES
 August 2010 – one of the first data journalism conferences, in
Amsterdam
 The way that media organizations like the Guardian and the New
York Times handled the large amounts of data released by
Wikileaks is one of the major steps that brought the term into
prominence.
 One of the earliest formulations of what we now recognise as
data journalism was in 2006 by Adrian Holovaty, founder of
EveryBlock — an information service which enables users to find
out what has been happening in their area, on their block.
 He argues that journalists should publish structured, machine-readable
data alongside the traditional ‘big blob of text’.
 Using data to improve reportage and delivering structured (if not
machine-readable) information to the public has a long history:
computer-assisted reporting (CAR)
PERSPECTIVES
 In the early 1970s the term ‘precision journalism’ was coined
to describe this type of news-gathering: “the application of
social and behavioral science research methods to the
practice of journalism.”
 Early examples: Manchester Schools (1821) and Mortality of the
British Army (1858)
 CAR is a technique for gathering and analyzing data as a way
of enhancing (usually investigative) reportage, whereas data
journalism pays attention to the way that data sits within the
whole journalistic workflow.
 Data journalism is about mass data literacy
IN THE NEWSROOM
 How does data journalism sit within newsrooms around the
world?
 How did leading data journalists convince their colleagues
that it is a good idea to publish datasets or launch data-driven
news apps?
 Should journalists learn how to code, or work in tandem with
talented developers?
 Look at the role of data and data journalism at the Australian
Broadcasting Corporation, the BBC, the Chicago Tribune, the
Guardian and Zeit Online.
THE ABC’S DATA JOURNALISM PLAY
 Australia’s public broadcaster
 Content makers at the ABC have been encouraged to be, as the
corporate mantra puts it, ‘agile’.
 Inspired by work of The Guardian
 Coal Seam Gas by the Numbers - five pages of interactive
maps, data visualizations and text
 The jewel was an interactive map showing coal seam gas
wells and leases in Australia.
THEIR TEAM
 A web developer and designer
 A lead journalist
 A part-time researcher with expertise in data extraction, Excel
spreadsheets and data cleaning
 A part-time junior journalist
 A consultant executive producer
 An academic consultant with expertise in data mining, graphic
visualization and advanced research skills
 The services of a project manager and the administrative
assistance of the ABC’s multi-platform unit
 Importantly we also had a reference group of journalists and
others whom we consulted on an as-needed basis
WHAT THEY LEARNED
 Co-location of the team is vital. Our developer and designer
were off-site and came in for meetings. This is definitely not
optimal! Place them in the same room as the journalists.
 Our consultant EP was also on another level of the building.
We needed to be much closer, just for the drop-by factor.
 Choose a story that is solely data-driven.
 Big media organizations need to engage in capacity building
to meet the challenges of data journalism
 Data journalism is interdisciplinary
DATA JOURNALISM AT BBC
 School league tables
 Every death on the road
 The world at seven billion
 BBC Budget Calculator
 Simple tools
 Mining the data
 Understanding an issue
 Team is 20 journalists, designers and developers
 Don’t have people who are specifically identified as ‘data’
journalists, but all editorial staff on the team have to be
proficient at using basic spreadsheet applications such as
Excel and Google Docs to analyze data.
HOW THE NEWS APPS TEAM AT CHICAGO
TRIBUNE WORKS
 We work closely with editors and reporters to help:
(1) research and report stories,
(2) illustrate stories online,
(3) build evergreen web resources for the fine people of
Chicagoland.
 Important that they sit in the newsroom
 Founded by technologists for whom journalism was a career
change
 Work in Agile fashion: daily standup meeting, pair
programming, short projects, fail fast, hack iteratively and on
deadline
 App ideas come from the newsroom
GUARDIAN DATABLOG
 Offers full datasets from behind news stories
 Now consists of a front page (guardian.co.uk/data);
searches of world government and global development data;
data visualizations by Guardian graphic artists and from
around the web; and tools for exploring public spending data
 Members of Parliament expense scandal - crowdsourced
458,000 documents relating to MPs' expenses and analyzed
the detailed data of which MPs had claimed what
 Wikileaks
 Sit next to newsdesks




PROCESS
ZEIT ONLINE
 The PISA-based Wealth Comparison project is an interactive
visualization that enables comparison of standards of living in
different countries.
 Analyze and visualize this data to provide a unique way of
comparing standards of living in different countries
HOW TO HIRE A HACKER
 Organization may have people with these skills
 Post on job websites
 Contact relevant mailing lists
 Contact relevant organizations
 Join relevant groups/networks
 Local interest communities
 Hackathons and competitions
 Ask a geek
 They code the full stack
 They see the whole picture
 They tell a good story
 They talk things through
 They teach themselves
 But they need technical management
HACKATHONS
 Find stories
 Connect people
 Host a social event
 Question or data?
 Little technical knowledge
 Is it news?
 What’s the routine?
OUR STORIES COME AS CODE
 Data projects don’t date
 You can build on your past work
 Data journalism pays for itself
BUSINESS MODELS
 Many journalists seem to be unaware of the size of the
revenue that is already generated through data collection,
data analytics and visualization
 The big, worldwide market that is currently opening up is all
about transformation of publicly available data into
something that we can process: making data visible and
making it human
 Bloomberg
 Reuters
 The Economist
 Startups
 Increase importance of role in society; trust
 Non-profit funding
GETTING DATA
 Many databases on the web are indexed by search engines,
whether the publisher intended this or not.
 Include search terms relating to content and format
 Search by part of URL: ‘inurl:downloads filetype:xls’
 Search sites where data may be available: ‘site:agency.gov
Directory Listing’
 Browse data sites and services
 Ask on a forum or mailing list
 Join Hacks/Hackers; ask an expert
 Learn about government IT
 Search again with new phrases
 Write an FOI request
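The search-operator tips above can be sketched in a few lines of Python. This is only an illustration: the helper name build_query and its parameters are hypothetical, and the operators (site:, inurl:, filetype:) are the ones quoted on this slide. How a given search engine interprets them may vary.

```python
def build_query(terms="", site=None, inurl=None, filetype=None):
    """Compose a search string from content terms and format operators.

    build_query is a hypothetical helper, not part of any library.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict to one website
    if inurl:
        parts.append(f"inurl:{inurl}")        # match part of the URL
    if filetype:
        parts.append(f"filetype:{filetype}")  # match a file format
    if terms:
        parts.append(terms)                   # free-text content terms
    return " ".join(parts)


# The two example searches from the slide.
print(build_query(inurl="downloads", filetype="xls"))
print(build_query("Directory Listing", site="agency.gov"))
```

Keeping the operators as separate arguments makes it easy to "search again with new phrases" by changing one piece at a time.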
YOUR RIGHT TO DATA
 Freedom of Information
 Plan ahead
 Know fees
 Know your rights and communicate that
 Keep it simple
 Keep it focused
 Be specific
 Submit multiple requests/international requests
 Do a test run
 Anticipate exceptions
 Ask for access
 Keep a record
 Make it public
 Ask for raw data
WOBBING EXAMPLES
 Farm Subsidy
 Side Effects
 Smuggling Death
GETTING DATA FROM THE WEB
 Web-based APIs
 Extract from PDF
 Scrape it
 ScraperWiki
 Chrome Scraper Extension
 Scrapers are small pieces of code that automate getting data
from a website that is not in CSV or another easy-to-use format.
 Based on the HTML structure of the page
 We will discuss in coming weeks
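As a rough illustration of a scraper that relies on the HTML structure of a page, here is a minimal sketch using only Python's standard library. It assumes the data sits in a plain HTML table. The class name TableScraper, the function table_to_csv and the sample page are all made up for this example, and a real scraper would first download the page and cope with messier markup.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside <tr>
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample page, standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>School</th><th>Pupils</th></tr>
  <tr><td>North Street</td><td>120</td></tr>
  <tr><td>Mill Lane</td><td>85</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

The scraper depends entirely on the tags it expects to find, so it needs updating whenever the page layout changes.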






THE WEB AS A DATA SOURCE
 Whois
 Blekko
 Compete.com
 Google site search – narrow search
 Bit.ly
 Twitter
 Google’s cache
 Wayback Machine at archive.org
 View Source
 TinEye – image search
 YouTube statistics
 Emails
 Wikipedia article traffic
 Google Insights
OTHER
 Crowdsourcing data
 Sharing - Ethical issues, restrictions, licenses
BECOMING DATA LITERATE IN 3 STEPS
 How was the data collected?
 What’s in there to learn?
 How reliable is the information? Sample size,
intervening variables, accounting for all relevant
information…
 These are basic research methods techniques with
which you should be familiar.
TIPS FOR WORKING WITH NUMBERS
 Enjoy yourself; handle as you would other evidence
 Don’t confuse skepticism with cynicism
 Be objective and open-minded about what the data means
 Uncertainty is ok
 The investigation is a story
 Use prompts to think about the data. Is it too big or small?
Does it make sense? Where did it come from?
BASIC STEPS FOR WORKING WITH DATA
 Data requests should begin with a list of questions you want
to answer.
 Data often is messy and needs to be cleaned.
 Data may have undocumented features
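What "cleaning" might involve can be sketched as follows. The helper names and the sample rows are invented for illustration, and the rules shown (collapsing whitespace, fixing capitalisation, parsing numbers with thousands separators, treating placeholders as missing, dropping exact duplicates) are only some of the steps a real dataset may need.

```python
MISSING = {"", "-", "n/a", "na"}  # assumed placeholders for "no value"


def clean_name(raw):
    """Collapse repeated whitespace and normalise capitalisation."""
    return " ".join(raw.split()).title()


def parse_number(raw):
    """Turn '1,250' into 1250.0; return None for blank or placeholder cells."""
    text = raw.strip().replace(",", "")
    if text.lower() in MISSING:
        return None
    return float(text)


def clean_rows(rows):
    """Clean (name, value) pairs and drop rows that become exact duplicates."""
    seen = set()
    cleaned = []
    for name, value in rows:
        record = (clean_name(name), parse_number(value))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


# Invented messy input: the first two rows describe the same school.
RAW_ROWS = [
    ("  north   STREET school", "1,250"),
    ("North Street School", "1250"),
    ("mill lane school ", "n/a"),
]

for row in clean_rows(RAW_ROWS):
    print(row)
```

Missing values are kept as None rather than turned into zero, so a gap in the data is not mistaken for a measurement.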
DATA STORIES
 Start with data, finish with a story
 Measurement – counting or totaling something
 Proportion
 Internal comparison
 External comparison
 Change over time
 League tables – must take size into account
 Analysis by category
 Association - correlation
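Three of the story types above reduce to simple arithmetic, sketched here in Python. The function names and all figures are invented for illustration. The league table ranks by a rate per 1,000 people rather than by raw counts, which is one way of taking size into account.

```python
def proportion(part, whole):
    """Share of the whole, as a fraction between 0 and 1."""
    return part / whole


def percent_change(old, new):
    """Change over time, as a percentage of the starting value."""
    return (new - old) / old * 100


def league_table(counts, populations, per=1000):
    """Rank areas by rate per `per` people instead of by raw count."""
    rates = {area: counts[area] * per / populations[area] for area in counts}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Invented figures: the bigger town has more incidents but a lower rate.
counts = {"Bigtown": 500, "Smallville": 60}
populations = {"Bigtown": 100000, "Smallville": 5000}

print(proportion(1, 4))
print(percent_change(200, 250))
print(league_table(counts, populations))
```

Ranked by raw counts, Bigtown would come first. Ranked by rate, Smallville does, which is why a league table that ignores size can mislead.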
TOOLS OF CHOICE
 Google Spreadsheets and Fusion Tables
 Excel
 Django and Ruby on Rails
 Statistical tools like SPSS, Google Refine and R
 Mapping tools – QGIS, ArcGIS, TileMill
 Tableau Public
 A good text editor, like TextWrangler, BBEdit, Komodo Edit
VISUALIZE DATA
 Use it to provide insight
 Tables for small amounts of data
 Charts with visual properties
 Maps
 Graphs
 Analyze and interpret what you see
 Document your insights and steps
 Why have I created this chart?
 What have I done to the data to create it?
 What does this chart tell me?
 Transform data – zoom, filter, outlier removal
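One possible way to handle the "outlier removal" step is sketched below. The slide does not say which rule to use. This example assumes the common interquartile-range (IQR) rule with a 1.5 multiplier, and the function name and sample values are invented. It separates outliers rather than deleting them, so they can still be checked as possible stories or errors.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the IQR rule.

    A value is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile. The rule and k=1.5 are
    assumptions of this sketch, not something the slides prescribe.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Invented sample: one value is far from the rest.
sample = [10, 12, 11, 13, 12, 95]
kept, outliers = split_outliers(sample)
print("kept:", kept)
print("check these:", outliers)
```

Documenting which values were set aside, and why, answers the question "What have I done to the data to create it?" for anyone reading the chart later.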
PRESENTING DATA TO THE PUBLIC
 There are times when data can tell a story better than words or
photos
 New tools and technologies help in visual storytelling
 The question facing journalists is now less about whether you can
turn your dataset into a visualization, but whether you should –
Aron Pilhofer
 Use of motion and animation when appropriate
 Excel, news desk, visualize, post. Then promote via Twitter –
Simon Rogers
 Time on data projects is greater than on other Guardian items
 Visualize and provide download; increases transparency
 Human element – location;
 Open source – Alastair Dant
 Partner, design, usability and user experience – Chrys Wu
HOW TO BUILD A NEWS APP
 Enduring; help users solve problems, e.g. ProPublica’s
Dialysis Facility Tracker
 Audience? What are their needs?
 How much time should I spend?
 Take things to the next level by developing productivity
tools
 News Apps 2.0 – combining storytelling and public service
– Chase Davis
 ProPublica – data, national in scope, yet granular enough to
expose details – Scott Klein
 Need headline, byline, lead, nut graph
 Generative – generate more reporting and stories
 ProPublica’s Dollars for Docs
 Mapping LA
 Why? It’s great journalism, it’s hugely popular, and if we don’t do
it somebody else will
VISUALIZATION
 Help you identify themes and questions for the rest of your
reporting
 Identify outliers: good stories, or perhaps errors, in your data
 Help you find typical examples
 Show you holes in your reporting
Visualizations also play multiple roles in publishing:
 Illustrate a point made in a story in a more compelling way
 Remove unnecessarily technical information from prose
 Particularly when they are interactive and allow exploration,
provide transparency about your reporting process to your
readers
TIPS
 Use small multiples to quickly orient yourself in a large
dataset
 Look at your data upside down and sideways
 Don’t assume
 Avoid obsessing over precision
 Create chronologies of cases and events
 Meet with your graphics department early and often
 Match the effort of the data collection with the interactive
graphic
 Design for two types of readers
 Convey one idea – then simplify
WHEN TO USE VISUALIZATION
 To show change over time
 To compare values
 To show connections
 To trace flows
DESIGNING WITH DATA
 To show hierarchy
 To browse large databases
 To envision alternate outcomes
When not to use
 When your story can be better told through text or multimedia
 When you have very few data points
 When you have little variability in your data
 When a map is not a map
 Consider a table
TOP TOOLS
 Fusion Tables
 Tableau Public
 Google Spreadsheet Charts
 ManyEyes
 Color Brewer
 Chartsbin
 iCharts
 GeoCommons
 Piktochart.com
3 KINDS OF USERS
 Occasional Users. These are users who want information
simply and quickly. They are interested in getting a picture of
the data, not detailed analytics. We can engage them via
Tweets or interactive graphics.
 Active Users. Users who stimulate discussion, and use the
data to increase their knowledge of a given area or challenge
the assumptions of the data. For these users we want to
provide feedback mechanisms and the possibility to share
insights with their peers via social networks.
 Data Hogs. These users want raw data for visualization or
analysis. We simply give them the data for their purposes.
 Engage people with social media