VALUING PUBLIC DOMAIN IMAGES ON WIKIPEDIA AND WHY IT MATTERS
Paul J. Heald
Richard W. & Marie L. Corman Research Professor, College of Law, University of Illinois
University of Glasgow, CREATe (RCUK Centre for Copyright and New Business Models in the Creative Economy)

PART I: THE PROBLEM AND WHY IT MATTERS
Retroactive extension of the copyright term
Nothing has fallen into the public domain in the U.S. due to expiration since 1998. 1923!
Justification? “Bad things happen when works fall into the public domain.” [allegations of nonuse and over-use]
Reality? The Problem of the Missing Works . . .
Empirical evidence as relevant to the policy debate.

PART II: VALUATION OF PUBLIC DOMAIN WORKS
Measuring what the creative industries lose when works don’t fall into the public domain and disappear.
Valuing public domain images on Wikipedia as a positive example
Possible application for thinking about valuation in litigation and transactions too . . .

[Figure: 2317 New Editions from Amazon by Decade. Series: Fiction & Non-Fiction Books. X-axis: decades 1800 to 2000. Y-axis: 0 to 400.]

[Figure: Estimated Amazon Titles by Percent Per Decade. Series: Fiction & Non-Fiction Books. X-axis: decades 1800 to 2000. Y-axis: 0 to 0.3.]

[Figure: Estimated Amazon Book Titles Adjusted for Total Number of Books Published Per Decade. Series: WorldCat Adjusted, CopyReg Adjusted. X-axis: decades 1800 to 2000. Y-axis: 0 to 0.25.]

AVAILABILITY OF BESTSELLERS PUBLISHED 1913-22 (IN PD) AND 1923-32 (COPYRIGHTED)
[Figure: Availability of Works. Y-axis: percent available (1=100%), 0 to 1. Bars: Books Published 1907-22, Books Published 1923-32.]

DO EBOOKS SOLVE THE PROBLEM?
In 2014, 94% of 165 public domain bestsellers from 1913-22 were available in eBook format, up from 48% in 2006.
Of 167 bestsellers from 1923-32 still under copyright, only 27% (45/167) had been made available as eBooks by publishers by 2014.
And of those 45 copyrighted eBooks, only one was out-of-print in hard copy format. Market failure?!!

[Figure: Percentage of 800 NYT Reviewed Books in eBook Format by Decade. Series: 2014 eBooks. X-axis: 1930's to 2000's. Y-axis: 0 to 0.8.]

EVIDENCE OF DEMAND FOR MISSING WORKS
Smith, Telang, and Zhang, “Analysis of the Potential Market for Out-of-Print eBooks,” http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2141422 (2012)
The authors used matched pairs analysis to estimate a $740 million eBook market for out-of-print titles.
Why do publishers seem to leave money on the table?

EVIDENCE OF DEMAND FOR MISSING WORKS
[Figure: Initial Publication Dates of New (Amazon) and Used Books (Abe Books) for Sale 2012-2013. Series: Used Books, New Books. X-axis: 1800's to 2000's. Y-axis: 0 to 10,000,000.]

EXPLOITATION LEVELS OF AUDIO BOOKS
What Is Available in Audio Book Version?
33% public domain titles
16% copyrighted titles
80% of top 20 copyrighted titles
100% of top 20 public domain titles

WHAT ELSE IS AT STAKE: PENGUIN CLASSICS PRICING DATA
48 Copyrighted Books:
--Average price per book ($14.60)
--Average length (310 pages)
--Average price per page ($.047)
48 Public Domain Books:
--Average price per book ($11.10)
--Average length (374 pages)
--Average price per page ($.03)

AUDIO BOOK PRICING DATA
Average price per minute of playing time, based on the lowest price version at audible.com
For the top 20 PD titles: (CD) (MP3)
For the top 20 copyrighted titles: (CD) (MP3)

DOES LACK OF OWNERSHIP CAUSE OVERUSE?
Copyright owners license compositions once every 3.3 years.
Public domain compositions are used once every 3.8 years.
No evidence of over-grazing when looking at the songs as a group.

OVER-USE?, CONT’D
The two most exploited PD songs are:
Danny Boy (9 movies from 1993-2001)
After You’ve Gone (9 movies from 1996-2006)
Copyrighted songs in the 1930’s:
Sweet Georgia Brown (15 movies)
Am I Blue? (17 movies)
Happy Days Are Here Again (34 movies)
Copyrighted songs more recently:
Blue Skies (10 movies from 1994-2004)
Stardust (10 movies in the 1990’s)
Dream a Little Dream of Me (10 movies from 1995-2005)

WIKIPEDIA RESEARCH . . . CALCULATING THE VALUE OF THE PUBLIC DOMAIN

CONTEXT
UK Intellectual Property Office wants to know!
Debate over retroactive extension of the copyright term.
Evaluating the benefits of orphan works legislation.
Exercise in valuation with applicability to damage calculation in cases of image infringement on the web, e.g. how much is an infringer unjustly enriched by appropriating an image?
Prior research on the cost of © protection: how to evaluate the positive benefit of the lack of © protection?

DATA SOURCE: WHY WIKIPEDIA?
Everyone agrees that Wikipedia is a valuable resource.
Public domain photos add value to pages.
Data about use of photos is transparent and accessible.
See http://en.wikipedia.org/wiki/Amy_Tan
See http://stats.grok.se/

VALUING WHAT?
How to calculate the value of a copyright in a photo to its owner?
How to calculate the value of a copyright in a photo to the public?
How to calculate the value of the absence of legal protection to the public?
Private value ≠ public welfare

POLLOCK HYPO
A copyrighted book sells for $10 in the book shop.
It falls into the public domain and now sells in the shop for $5 and is available for free on the internet.
Has the value of the book changed?
Less valuable to the former copyright owner
More valuable to the public (cheaper)
Should policymakers encourage the change in legal status?
As long as the book remains accessible, we see an increase in consumer surplus of $5-$10 per copy.
So why ever protect a work with copyright?

RESEARCH QUESTIONS
Is a sample of Wikipedia web pages more likely to contain an image when a public domain work is available?
To what extent does the availability of public domain images lower the cost of web page building?
To what extent does the addition of an image to a web page increase traffic to that page?
Can the total value of both cost savings and increased traffic due to the use of public domain images on Wikipedia be quantified by reference to the characteristics of the sample of Wikipedia pages?

PHASE I: BESTSELLING AUTHORS
Identify 365 authors with New York Times year-end bestselling novels in the United States from 1895 to 1965 and collect data for each author:
Number of bestsellers, date of first bestseller, birth and death date of author;
Wikipedia URL of author page and date image of author (if any) added;
Copyright status of any author image and legal justification for any image in the public domain;
Number of Amazon reviews of most popular book for each author;
Number of page views in March, April, and May of 2009 and 2014;
Word count on author page as of June 2009 and June 2014.

OLDER AUTHORS = MORE IMAGES
Public domain effect means that older authors (counter-intuitively) have more images:

Bestselling Authors by Date of Birth: Percent with Image on Wiki Page
Born     Percent with image   n
<1850    0.93                 15
<1860    0.92                 25
<1870    0.82                 46
<1880    0.81                 52
<1890    0.61                 68
<1900    0.58                 53
<1910    0.52                 49
<1920    0.46                 35
<1940    0.54                 28

362 Bestselling Authors by Date of Death: Percent with Image on Wiki Page
Died     Percent with image   n
<1910    0.90                 10
<1920    0.94                 16
<1930    0.80                 30
<1940    0.80                 39
<1950    0.76                 49
<1960    0.69                 45
<1970    0.56                 52
<1980    0.63                 28
<1990    0.60                 34
<2000    0.53                 26
<2014    0.31                 33

SOURCE OF IMAGES?
[Figure: Legal Status of Author Images (percent). Categories: Copyrighted, Public Domain. Plotted values: 0.79, 0.21.]
[Figure: Justification for Image Use (percent). Plotted values: 0.54, 0.13, 0.07, 0.12, 0.13.]

PRELIMINARY CONCLUSION
The public domain clearly increases the number of photos on Wiki web pages.
This adds value, but how much?
Direct value might be measured in costs saved to page builders.
Indirect value might be measured in terms of increased traffic to web sites with images.
http://www.koozai.com/blog/searchmarketing/content-marketingseo/increase-traffic-with-images/

COSTS SAVED: KIPLING ET AL . . .
Free on Wikimedia Commons
License for 1 year: $105 on Corbis and $117 on Getty Images

COSTS SAVED . . .
25 authors have public domain images exactly the same as those licensed by Corbis or Getty.
104 more have public domain images similar to those licensed by Corbis or Getty.
Average yearly license = $120
Page builders saved approximately $77,400 over a five-year period (129 public domain images x $120/year x 5 years).

INCREASED TRAFFIC?
Authors with images had a total of 6.8 million views during March, April, and May of 2014.
Authors without images had a total of 386,000 views during March, April, and May of 2014.
Suggests serious need to adjust for author popularity, but . . .
Adjusting for a page’s word count seems unnecessary. (From June 2009 to June 2014, word count for authors with images went up 68%, while over the same period word count for authors without images went up 67%.)

BIG PROBLEM . . .
How would you adjust for differences in traffic caused by the popularity of the author?
Ernest Hemingway is very popular.
Maarten Maartens, not so much . . .

ADJUSTING FOR POPULARITY #1
As a measure of popularity, the number of Amazon reviews for each author’s most reviewed book was counted.
Authors were grouped according to the Amazon review number: 0-9, 10-29, 30-99, 100-200.
Authors with more than 200 customer reviews were omitted: 47 with images; 5 without.
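
The grouping step on the ADJUSTING FOR POPULARITY #1 slide can be sketched in a few lines of Python. This is a minimal illustration, not the study's own procedure: the function names and the sample records are hypothetical, and the upper bound of 200 in the last group is read as inclusive, which is an assumption about how the slide's ranges were applied.

```python
from statistics import median

# Review-count groups as listed on the slide. Treating 200 as inclusive
# is an assumption; authors above 200 reviews are omitted.
BUCKETS = [(0, 9), (10, 29), (30, 99), (100, 200)]


def bucket_for(reviews):
    """Return the group label for a review count, or None if omitted."""
    for low, high in BUCKETS:
        if low <= reviews <= high:
            return f"{low}-{high}"
    return None


def median_views_by_bucket(authors):
    """Median page views per (group label, has_image) pair.

    authors: iterable of (review_count, has_image, page_views) tuples.
    """
    groups = {}
    for reviews, has_image, views in authors:
        label = bucket_for(reviews)
        if label is None:
            continue  # more than 200 reviews: left out of the comparison
        groups.setdefault((label, has_image), []).append(views)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical records invented for illustration (not the study's data).
sample = [
    (3, True, 900), (5, True, 1100), (7, False, 700),
    (15, True, 2500), (20, False, 1500), (250, True, 90000),
]
result = median_views_by_bucket(sample)
for key in sorted(result):
    print(key, result[key])
```

Comparing medians within each group, rather than totals across all authors, keeps a handful of very popular authors from dominating the comparison between pages with and without images.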
[Figure: Median Page Views: March, April, & May 2014. Series: Authors with Image, Authors without Image. Groups: 0-9 Reviews (N=76/57), 10-29 Reviews, 30-99 Reviews, 100-199 Reviews (N=32/14); the two middle groups carry N=36/21 and N=43/21. Plotted values: 11575, 5224, 2590, 1326, 758, 1595, 5168, 2436.]

ADJUSTING FOR POPULARITY #2
40 pairs of authors without images on June 1, 2009 were matched together based on a similar or identical number of page views counted during the months of March, April, and May 2009.
This created a set of pairs of authors of similar popularity at a time when none of them had images on their web pages.
Half of the authors received an image before March 1, 2014, and half did not.

MATCHED PAIRS METHODOLOGY
In March, April, & May of 2009, the Gwen Davis page [no image] had 544 views.
In March, April, & May of 2009, the James Will page [no image] had 542 views.
In March, April, & May of 2014, the Gwen Davis page [image added 2011] had 675 page views.
In March, April, & May of 2014, the James Will page [no image] had 525 page views.

[Figure: Percent Traffic Increase from June 2009 to June 2014. Bars: Authors with Images, Authors without Images. Y-axis: traffic increase, 0 to 0.35. Callout: 6%.]

ADJUSTING FOR POPULARITY #3
Identified the lowest traffic month for each author in the year prior to June 2009 and June 2014.
42 tightly matched pairs of authors with and without images based on lowest traffic month in the year prior to 2009.
Authors with images showed a 36% increase in traffic from 2009-2014, while authors without images showed a 19% increase.
Net increase associated with image use = 17%

COMPOSERS AND LYRICISTS: ADJUSTING FOR POPULARITY #4
Matched 77 pairs and compared the number of page views during March, April, and May of 2009, before any composer or lyricist page acquired an image, with the number of page views in March, April, and May of 2014, after half of the pages acquired an image.
Tightly matched.
Pages that never acquired an image had 209,116 aggregate page views in March, April, and May of 2009, while pages that later acquired an image had 209,294.
Between 2009 and 2014, the traffic to pages with images increased 56% while the traffic to pages without images increased only 34%, resulting in a net increase in traffic to pages with images of 22%.

COMPOSERS AND LYRICISTS FOR MORE DATA POINTS: ADJUSTING FOR POPULARITY #5
68 tightly matched pairs based on the lowest traffic month for each composer and lyricist in 2009 before any sample page contained an image.
Over the five-year period, traffic to pages with images increased 40% while the traffic to pages without images increased only 21%, resulting in a net increase of 19%.

INCREASED TRAFFIC DUE TO IMAGES ON WIKIPEDIA PAGES?
Amazon Review Adjustment = 100%
Matched Pairs #1 (authors) = 6%
Matched Pairs #2 (authors) = 17%
Matched Pairs #3 (composers) = 22%
Matched Pairs #4 (composers) = 19%

EXTRAPOLATING FROM RANDOM PAGES
300 random pages studied
50% contain images
87% of images are in the public domain
The pages can be categorized: 25% (Places), 27% (Biographical), 5% (Events), and 43% (Things)

EXTRAPOLATING COSTS SAVED . . .
4,560,201 [total Wikipedia pages as of July 18, 2014] x .50 x .87 = approximately 2,000,000 pages with public domain images
Given that Corbis and Getty routinely charge $105 and $117 respectively to license a photographic image for a year on the internet, this suggests a net savings of $208 million to $232 million per year.

EXTRAPOLATING INCREASED TRAFFIC
4,560,021 [total Wiki pages as of 6/14] x .5 [percentage of pages with images] x .87 [percentage of pages with public domain images] x 18,966 [average page views per year] x .0053 [average value of a Wikipedia page view] x .19 [percent of traffic due to public domain image] = $37,884,478.77 per year traffic value

ROBUSTNESS CHECK: WILLINGNESS TO PAY?
240 authors with images received approximately 28 million page views in 2014.
Hypothetical cost of licenses = approximately $28,000 (240 x $120/year).
Per page view cost = 1/10 of a penny.
If the 19% traffic increase figure is correct, then images drove 5,320,000 of our authors’ page views in 2014.
If the WebInDetail estimate of a $.0053 value for each Wikipedia page view is also correct, then the advertising value of the images on our author web pages is $28,196.

OH, AND BUY MY BOOK!

SOURCES
Buccafusco, Christopher & Paul Heald. 2013. “Do Bad Things Happen When Works Fall into the Public Domain?: Empirical Tests of Copyright Term Extension,” 28 Berkeley Technology Law Journal 1-43.
Brooks, Tim. 2005. Survey of Reissues of U.S. Recordings. Washington, D.C.: Library of Congress, available at http://www.clir.org/pubs/reports/pub133.
Crook, John R. 2013. “U.S. Supports New Treaty to Facilitate Visually Impaired Persons’ Access to Book,” 107 American Journal of Int’l Law 933-34.
David, Paul & Jared Rubin. 2008. “Restricting Access to Books on the Internet: Some Unanticipated Effects of U.S. Copyright Legislation,” 5 Review of Economic Research on Copyright Issues 23-53.
Erickson, Christopher, Paul J. Heald, and Martin Kretschmer. 2015. “The Valuation of Unprotected Works: A Case Study of Public Domain Images on Wikipedia,” 28 Harvard Journal of Law & Technology ___.
Favale, Marcella, et al. 2013. Copyright and the Regulation of Foreign Works: A Comparative Review of Seven Jurisdictions and a Rights Clearance Simulation. London: Intellectual Property Office.
Ginsburg, Jane. 2000. “From Having Copies to Experiencing Works: The Development of an Access Right in U.S. Copyright Law,” in Hugh Hansen (ed.), U.S. Intellectual Property: Law & Policy. Sweet & Maxwell: London.
Heald, Paul J. 2008a. “Property Rights and the Efficient Exploitation of Copyrighted Works: An Empirical Analysis of Public Domain and Copyrighted Fiction Bestsellers,” 93 Minnesota Law Review 1031-63.
Heald, Paul J. 2008b. “Optimal Remedies for Patent Infringement: A Transactions Cost Approach,” 45 Houston Law Review 1165-1200.
Heald, Paul J. 2014a. “How Secondary Liability Rules Create a Market for Music on YouTube,” 82 University of Missouri-Kansas City Law Review 313-26.
Heald, Paul J. 2014b. “How Copyright Keeps Works Disappeared,” 11 Journal of Empirical Legal Studies 829-66.
Landes, William & Richard Posner. 2003. The Economic Structure of Intellectual Property Law. Boston: Belknap Press.
Liebowitz, Stan & Stephen Margolis. 2005. “17 Famous Economists Weigh in on Copyright: The Role of Theory, Empirics, and Network Effects,” 18 Harvard Journal of Law and Technology 435-57.
Liebowitz, Stan J. 2009. “The Myth of Copyright Inefficiency,” 32 Journal of Regulation 28-34.
Loren, Lydia. 2007. “Building a Reliable Semi-Commons of Creative Works: Enforcement of Creative Commons License and the Limited Abandonment of Copyright,” 14 George Mason Law Review 271-328.
Lunney, Glynn. 1996. “Reexamining Copyright’s Incentives-Access Paradigm,” 49 Vanderbilt Law Review 483-656.
Mueller-Langer, Frank & Richard Watt. 2010. “Copyright and Open Access for Copyrighted Works,” 7 Review of Economic Research on Copyright Issues 45-65.
Schonwetter, Tobias, et al. 2009-2010. “Copyright and Education: Lessons from African Copyright and Access to Knowledge,” African Journal of Information and Communication 37-52.
Smith, Michael, Rahul Telang, and Yi Zhang. 2012. “Analysis of the Potential Market for Out-of-Print eBooks,” available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2141422.
Suzor, Nicholas. 2013. “Access, Progress, and Fairness: Rethinking Exclusivity in Copyright,” 15 Vanderbilt Journal of Entertainment & Technology Law 297-342.
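
WORKED ARITHMETIC
The figures on the COSTS SAVED, EXTRAPOLATING COSTS SAVED, EXTRAPOLATING INCREASED TRAFFIC, and ROBUSTNESS CHECK slides can be rechecked with a short Python script. This is a sketch for readers who want to rerun the numbers, not the authors' own calculation: every input is copied from the slides, while the function and variable names are invented here for illustration.

```python
# All numeric inputs come from the slides; the names are illustrative only.
SHARE_WITH_IMAGES = 0.50     # share of random pages that contain images
SHARE_PUBLIC_DOMAIN = 0.87   # share of those images in the public domain
VALUE_PER_VIEW = 0.0053      # estimated value of one Wikipedia page view ($)
TRAFFIC_SHARE = 0.19         # share of traffic attributed to the image


def public_domain_pages(total_pages):
    """Estimated number of pages carrying a public domain image."""
    return total_pages * SHARE_WITH_IMAGES * SHARE_PUBLIC_DOMAIN


# COSTS SAVED: 129 public domain images x $120/year x 5 years
author_savings = 129 * 120 * 5

# EXTRAPOLATING COSTS SAVED: yearly license fees avoided across Wikipedia
pd_pages = public_domain_pages(4_560_201)
savings_low = pd_pages * 105     # at the Corbis price
savings_high = pd_pages * 117    # at the Getty price

# EXTRAPOLATING INCREASED TRAFFIC: yearly value of image-driven page views
traffic_value = (public_domain_pages(4_560_021) * 18_966
                 * VALUE_PER_VIEW * TRAFFIC_SHARE)

# ROBUSTNESS CHECK: license cost versus advertising value for 240 authors
license_cost = 240 * 120
views_from_images = TRAFFIC_SHARE * 28_000_000
ad_value = views_from_images * VALUE_PER_VIEW

print(f"Author page savings over five years: ${author_savings:,}")
print(f"Wikipedia-wide savings per year: ${savings_low:,.0f} to ${savings_high:,.0f}")
print(f"Traffic value per year: ${traffic_value:,.2f}")
print(f"License cost: ${license_cost:,}; image-driven views: {views_from_images:,.0f}")
print(f"Advertising value: ${ad_value:,.0f}")
```

Run as written, the script gives $77,400 for the author pages, roughly $208 million to $232 million for Wikipedia-wide license savings, and $37,884,478.77 for yearly traffic value, matching the slides. The savings range comes from the unrounded page count rather than the rounded 2,000,000, and 240 x $120 works out to $28,800, which the robustness slide states as approximately $28,000.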