DATA JOURNALISM HANDBOOK
datajournalismhandbook.org
The Data Journalism Handbook was born at a 48-hour workshop at MozFest 2011 in London.
Hundreds of contributors

WHAT IS DATA JOURNALISM?
Journalism done with data
New possibilities that open up when you combine the traditional ‘nose for news’ and ability to tell a compelling story with the sheer scale and range of digital information now available
Can help a journalist tell a complex story through engaging infographics
Can help explain how a story relates to an individual
Data can be the source of data journalism, or it can be the tool with which the story is told, or it can be both

WHY SHOULD JOURNALISTS USE DATA?
Today news stories flow in as they happen from multiple sources, eyewitnesses and blogs, and what has happened is filtered through a vast network of social connections, being ranked, commented on and, more often than not, ignored.
Right now, a few pioneering journalists already demonstrate how data can be used to create deeper insights into what is happening around us and how it might affect us.
Data analysis can reveal “a story’s shape” (Sarah Cohen), or provide us with a “new camera” (David McCandless).
Becoming knowledgeable in searching, cleaning, and visualizing data is transformative for the profession of information gathering.
Companies and institutions around the world are looking for “sensemakers” and professionals who know how to dig through data and transform it into something tangible.
There is one barrier keeping journalists from using this potential: training to learn how to work with data through all the steps, from a first question to a big data-driven scoop.

SURVEY
The European Journalism Centre conducted a survey to find out more about the training needs of journalists.
It found a strong willingness to get out of the comfort zone of traditional journalism and to invest time in mastering the new skills.
The results from the survey showed that journalists see the opportunity, but need a bit of support to cut through the initial problems keeping them from working with data.

WHY IS DATA JOURNALISM IMPORTANT?
Filtering the Flow of Data
New Approaches to Storytelling (Pilhofer)
Like Photo Journalism with a Laptop (Boyer)
Data Journalism is the Future (TBL)
Number-Crunching Meets Word-Smithing
Updating Your Skills Set
A Remedy for Information Asymmetry
An Answer to Data-driven PR
Providing Independent Interpretations of Official Information
Dealing with the Data Deluge
Our Lives are Data
A Way to Save Time
An Essential Part of the Journalists' Toolkit
Adapting to Changes in Our Information Environment
A Way to See Things You Might Not Otherwise See
A Way To Tell Richer Stories

EXAMPLES

PERSPECTIVES
August 2010: one of the first data journalism conferences, in Amsterdam
The way that media organizations like the Guardian and the New York Times handled the large amounts of data released by Wikileaks is one of the major steps that brought the term into prominence.
One of the earliest formulations of what we now recognise as data journalism was in 2006 by Adrian Holovaty, founder of EveryBlock, an information service which enables users to find out what has been happening in their area, on their block.
He argues that journalists should publish structured, machine readable data, alongside the traditional ‘big blob of text’.
Using data to improve reportage and delivering structured (if not machine readable) information to the public has a long history.
(CAR)

PERSPECTIVES
In the early 1970s the term ‘precision journalism’ was coined to describe this type of news-gathering: “the application of social and behavioral science research methods to the practice of journalism.”
Early examples: Manchester Schools (1821) and Mortality of British Army (1858)
Computer-assisted reporting (CAR) is a technique for gathering and analyzing data as a way of enhancing (usually investigative) reportage, whereas data journalism pays attention to the way that data sits within the whole journalistic workflow.
Data journalism is about mass data literacy

IN THE NEWSROOM
How does data journalism sit within newsrooms around the world?
How did leading data journalists convince their colleagues that it is a good idea to publish datasets or launch data-driven news apps?
Should journalists learn how to code, or work in tandem with talented developers?
Look at the role of data and data journalism at the Australian Broadcasting Corporation, the BBC, the Chicago Tribune, the Guardian and Zeit Online.

THE ABC’S DATA JOURNALISM PLAY
Australia’s public broadcaster
Content makers at the ABC have been encouraged to be, as the corporate mantra puts it, ‘agile’.
Inspired by the work of The Guardian
Coal Seam Gas by the Numbers: five pages of interactive maps, data visualizations and text
The jewel was an interactive map showing coal seam gas wells and leases in Australia.

THEIR TEAM
A web developer and designer
A lead journalist
A part-time researcher with expertise in data extraction, Excel spreadsheets and data cleaning
A part-time junior journalist
A consultant executive producer
An academic consultant with expertise in data mining, graphic visualization and advanced research skills
The services of a project manager and the administrative assistance of the ABC’s multi-platform unit
Importantly, we also had a reference group of journalists and others whom we consulted on an as-needed basis

WHAT THEY LEARNED
Co-location of the team is vital.
Our developer and designer were off-site and came in for meetings. This is definitely not optimal! Place them in the same room as the journalists.
Our consultant EP was also on another level of the building. We needed to be much closer, just for the drop-by factor
Choose a story that is solely data driven.
Big media organizations need to engage in capacity building to meet the challenges of data journalism
Data journalism is interdisciplinary

DATA JOURNALISM AT BBC
School league tables
Every death on the road
The world at seven billion
BBC Budget Calculator
Simple tools
Mining the data
Understanding an issue
Team is 20 journalists, designers and developers
The team has no one specifically identified as a ‘data’ journalist, but all editorial staff on the team have to be proficient at using basic spreadsheet applications such as Excel and Google Docs to analyze data.

HOW THE NEWS APPS TEAM AT CHICAGO TRIBUNE WORKS
We work closely with editors and reporters to help: (1) research and report stories, (2) illustrate stories online, (3) build evergreen web resources for the fine people of Chicagoland.
Important that they sit in the newsroom
Founded by technologists for whom journalism was a career change
Work in Agile fashion: daily standup meeting; pair programming; short projects; fail fast; hack iteratively and on deadline
App ideas come from the newsroom

GUARDIAN DATABLOG
Offers full datasets from behind news stories
Now consists of a front page (guardian.co.uk/data); searches of world government and global development data; data visualizations from around the web and by Guardian graphic artists; and tools for exploring public spending data
Members of Parliament expenses scandal: crowdsourced 458,000 documents relating to MPs' expenses and analyzed the detailed data of which MPs had claimed what
Wikileaks
Sit next to newsdesks

PROCESS

ZEIT ONLINE
The PISA based Wealth Comparison project is an interactive visualization that enables comparison of standards of living in different countries.
Analyze and visualize this data to provide a unique way of comparing standards of living in different countries

HOW TO HIRE A HACKER
Your organization may already have people with these skills
Post on job websites
Contact relevant mailing lists
Contact relevant organizations
Join relevant groups/networks
Local interest communities
Hackathons and competitions
Ask a geek
They code the full stack
They see the whole picture
They tell a good story
They talk things through
They teach themselves
But they need technical management

HACKATHONS
Find stories
Connect people
Host a social event
Question or data?
Little technical knowledge
Is it news?
What’s the routine?
OUR STORIES COME AS CODE
Data projects don’t date
You can build on your past work
Data journalism pays for itself

BUSINESS MODELS
Many journalists seem to be unaware of the size of the revenue that is already generated through data collection, data analytics and visualization
The big, worldwide market that is currently opening up is all about transformation of publicly available data into something that we can process: making data visible and making it human
Bloomberg
Reuters
The Economist
Startups
Increase importance of role in society; trust
Non-profit funding

GETTING DATA
Many databases on the web are indexed by search engines, whether the publisher intended this or not.
Include search terms relating to content and format
Search by part of URL: ‘inurl:downloads filetype:xls’
Search sites where data may be available: ‘site:agency.gov Directory Listing’
Browse data sites and services
Ask on a forum or mailing list
Join Hacks/Hackers
Ask an expert
Learn about government IT
Search again with new phrases
Write an FOI request

YOUR RIGHT TO DATA
Freedom of Information
Plan ahead
Know fees
Know your rights and communicate that
Keep it simple
Keep it focused
Be specific
Submit multiple requests/international requests
Do a test run
Anticipate exceptions
Ask for access
Keep a record
Make it public
Ask for raw data

WOBBING EXAMPLES
Farm Subsidy
Side Effects
Smuggling Death

GETTING DATA FROM THE WEB
Web-based APIs
Extract from PDF
Scrape it
Scraperwiki
Chrome Scraper Extension
Scrapers are small pieces of code that automate getting data from a website that is not in CSV or another easy-to-use format.
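To make the idea of a scraper concrete, here is a minimal sketch using only Python's standard library. It assumes the data sits in a plain HTML table; the sample page, the `TableScraper` class and the `table_to_csv` helper are hypothetical names invented for illustration, and a real page would first have to be downloaded and would usually need more careful handling.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Turn the table cells found in an HTML string into CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Region</th><th>Wells</th></tr>
  <tr><td>North</td><td>12</td></tr>
  <tr><td>South</td><td>7</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

The sketch depends entirely on the page's tag layout, so if the site changes its HTML the scraper has to change with it.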
Scrapers are based on the HTML structure of the page
We will discuss this in the coming weeks

THE WEB AS A DATA SOURCE
Whois
Blekko
Compete.com
Google site search – narrow search
Bit.ly
Twitter
Google’s cache
Wayback Machine at archive.org
View Source
TinEye – image search
YouTube statistics
Emails
Wikipedia article traffic
Google Insights

OTHER
Crowdsourcing data
Sharing: ethical issues, restrictions, licenses

BECOMING DATA LITERATE IN 3 STEPS
How was the data collected?
What’s in there to learn?
How reliable is the information?
Sample size, intervening variables, accounting for all relevant information… These are basic research-method techniques with which you should be familiar.

TIPS FOR WORKING WITH NUMBERS
Enjoy yourself; handle numbers as you would other evidence
Don’t confuse skepticism with cynicism
Be objective and open-minded about what the data means
Uncertainty is ok
The investigation is a story
Use prompts to think about the data. Is it too big or small? Does it make sense? Where did it come from?

BASIC STEPS FOR WORKING WITH DATA
Data requests should begin with a list of questions you want to answer.
Data is often messy and needs to be cleaned.
Data may have undocumented features

DATA STORIES
Start with data, finish with a story
Measurement – counting or totaling something
Proportion
Internal comparison
External comparison
Change over time
League tables – must take size into account
Analysis by category
Association – correlation

TOOLS OF CHOICE
Google Spreadsheets and Fusion Tables
Excel
Django and Ruby on Rails
Statistical tools like SPSS, Google Refine and R
Mapping tools – QGIS, ArcGIS, TileMill
Tableau Public
A good text editor, like TextWrangler, BBEdit, Komodo Edit

VISUALIZE DATA
Use it to provide insight
Tables for small amounts of data
Charts with visual properties
Maps
Graphs
Analyze and interpret what you see
Document your insights and steps
Why have I created this chart?
What have I done to the data to create it?
What does this chart tell me?
Transform data – zoom, filter, outlier removal

PRESENTING DATA TO THE PUBLIC
There are times when data can tell a story better than words or photos
New tools and technologies help in visual storytelling
The question facing journalists now is less whether you can turn your dataset into a visualization than whether you should – Aron Pilhofer
Use of motion and animation when appropriate
Excel, news desk, visualize, post. Then promote via Twitter – Simon Rogers
Time on data projects is greater than on other items on the Guardian
Visualize and provide download; increases transparency
Human element – location; Open source – Alastair Dant
Partner, design, usability and user experience – Chrys Wu

HOW TO BUILD A NEWS APP
Enduring, help users solve problems, e.g. ProPublica’s Dialysis Facility Tracker
Audience? What are their needs?
How much time should I spend?
Take things to the next level by developing productivity tools
News Apps 2.0 – combining storytelling and public service – Chase Davis
ProPublica – data, national in scope, yet granular enough to expose details – Scott Klein
Need headline, byline, lead, nut graph
Generative – generate more reporting and stories
ProPublica’s Dollars for Docs
Mapping LA
Why?
It’s great journalism, it’s hugely popular, and if we don’t do it somebody else will

VISUALIZATION
Help you identify themes and questions for the rest of your reporting
Identify outliers: good stories, or perhaps errors, in your data
Help you find typical examples
Show you holes in your reporting
Visualizations also play multiple roles in publishing:
Illustrate a point made in a story in a more compelling way
Remove unnecessarily technical information from prose
Particularly when they are interactive and allow exploration, provide transparency about your reporting process to your readers

TIPS
Use small multiples to quickly orient yourself in a large dataset
Look at your data upside down and sideways
Don’t assume
Avoid obsessing over precision
Create chronologies of cases and events
Meet with your graphics department early and often
Match the effort of the data collection with the interactive graphic
Design for two types of readers
Convey one idea – then simplify

WHEN TO USE VISUALIZATION
To show change over time
To compare values
To show connections
To trace flows

DESIGNING WITH DATA
To show hierarchy
To browse large databases
To envision alternate outcomes
When not to use
When your story can be better told through text or multimedia
When you have very few data points
When you have little variability in your data
When a map is not a map
Consider a table

TOP TOOLS
Fusion Tables
Tableau Public
Google Spreadsheet Charts
ManyEyes
Color Brewer
Chartsbin
iCharts
GeoCommons
Piktochart.com

3 KINDS OF USERS
Occasional Users. These are users who want information simply and quickly. They are interested in getting a picture of the data, not detailed analytics. We can engage them via Tweets or interactive graphics.
Active Users. Users who stimulate discussion, and use the data to increase their knowledge of a given area or challenge the assumptions of the data.
For these users we want to provide feedback mechanisms and the possibility to share insights with their peers via social networks.
Data Hogs. These users want raw data for visualization or analysis. We simply give them the data for their purposes.
Engage people with social media