Local Big Data: The Role of Libraries in Building Community Data Infrastructures

ABSTRACT
Communities face opportunities and challenges in many areas, including education, health and wellness, workforce and economic development, housing, and the environment [21]. At the same time, governments have significant fiscal constraints on their ability to address these challenges and opportunities. Through a combination of open government, open data, and civic engagement, however, governments, citizens, civil society groups, and others are reinventing the relationship between governments and the governed by developing crowdsourced and other innovative solutions for community advancement. Underlying this reinvention and innovation is data – particularly local data about housing, air quality, graduation rates, literacy rates, poverty, disease, and more. And yet, not all communities have the capacity to create, work with, or leverage data at the local level. Using a case study approach in a medium-sized U.S. city, this paper focuses on the issues that smaller communities face when seeking to create local data infrastructures and the extent to which libraries can develop their capabilities, capacity, and abilities to work with community information and data to facilitate community engagement and high-impact, locally relevant analytics.

General Terms
Data management, communities, libraries.

Keywords
Big Data, Community engagement, Data infrastructure, Data curation.

1. INTRODUCTION
Communities face opportunities and challenges in many areas, including education, health and wellness, workforce and economic development, housing, and the environment [21]. At the same time, governments have fiscal constraints which limit their ability to directly address these challenges and opportunities.
Through a combination of open government, open data, and civic engagement, however, governments, citizens, civil society groups, and others are reinventing the relationship between governments and the governed by developing civic crowdsourcing initiatives and other innovative solutions for community advancement. Underlying this reinvention and innovation is data – particularly local data about housing, employment, air quality, graduation rates, literacy rates, poverty, business activity, and disease.

Data have existed in many key domain areas for some time, often in the form of large-scale national datasets, such as those created by the U.S. Census Bureau, Bureau of Labor Statistics, Environmental Protection Agency, and the Centers for Disease Control. All of the data from these agencies have varying levels of local granularity, and often have more localized (e.g., block, neighborhood, city, county, region) components. Emerging data integration capabilities and analytic techniques, however, enable novel ways of viewing and analyzing data. This, in turn, has supported new strategies for informing policy-makers, decision-makers, stakeholders, and citizens about their communities. The ability to harness geo-spatial data, chronic disease data, literacy data, and other sources to create data visualizations, interactive map-based analyses, and more is often referred to as Big Data, and it can shed light on critical community needs, gaps, and solutions [1, 5, 7].
But to engage in these data science efforts, create analytic tools, and foster civic engagement, underlying infrastructure needs must be met. Critical elements of community data infrastructures include, but are not limited to [8, 10, 14]:
Central data repositories, where data are stored, maintained, and catalogued;
Data standards, to which collected data adhere;
Data communities, which will collect, maintain, and curate data;
Effective information structure/ecology, through which to foster data communities, engagement, and use; and
Awareness, at the organizational, neighborhood, and individual levels, that data affect their daily well-being and functioning.

In short, while data – and their analyses – are increasingly central to better understanding and improving the communities in which we live, realizing this potential requires infrastructure, organization, and skills that many communities are just now developing.

Over the past several years there has been a steady increase in media and scholarly attention to data analytics undertaken to strengthen communities. However, much of this attention to Big Data and Smart Cities has focused on the efforts of large metropolitan areas, the use of vast data sets, and large-scale open data initiatives [20]. While important, this work overlooks the fact that many communities operate on a much smaller, “local”, scale. In the US alone, there are over 18,000 cities, towns, and villages [22], many of which lack the population or capacity to engage in data initiatives using the strategies used by larger cities, national governments, and international NGOs. For every San Francisco there are thousands of smaller cities and towns, each of which has a range of local data (what we might call Local Big Data) – agricultural, cultural, community, historical – or the need for localized data drawn from larger national and international datasets [15].
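As a rough illustration of how two of the infrastructure elements named earlier in this section might fit together (a central, catalogued repository and a data standard to which records adhere), the following minimal Python sketch may be useful. It is not drawn from the study: the required fields, the `Catalog` class, and the sample record are all hypothetical, and a real community would likely adopt an established metadata standard rather than this four-field stand-in.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a "data standard": the descriptive fields
# that every catalogued dataset is assumed to need.
REQUIRED_FIELDS = ("title", "publisher", "geography", "updated")


@dataclass
class Catalog:
    """A toy central repository that only accepts records meeting the standard."""

    records: list = field(default_factory=list)

    def add(self, record: dict) -> list:
        """Store the record if complete; return the names of any missing fields."""
        missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
        if not missing:
            self.records.append(record)
        return missing

    def find(self, geography: str) -> list:
        """Return titles of catalogued datasets available at a given geography."""
        return [r["title"] for r in self.records if r["geography"] == geography]


# Illustrative, made-up record.
catalog = Catalog()
problems = catalog.add(
    {
        "title": "Housing permits",
        "publisher": "City planning office",
        "geography": "neighborhood",
        "updated": "2013-09-01",
    }
)
print(problems)                      # an empty list means the record was accepted
print(catalog.find("neighborhood"))  # titles available at neighborhood level
```

The point of the sketch is only that a shared standard lets a repository reject incomplete records at intake and lets users search by geography afterwards; the curation work of data communities is what keeps such records accurate.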
While smaller communities often lack the resources, personnel, and infrastructure to fully realize the potential of Local Big Data using the same strategies employed by larger cities, it would be incorrect to assume that they have no information institutions that could facilitate local data initiatives. There are over 16,700 public library buildings across the U.S., many of which are in small and rural communities [23]. Although libraries are not the first organizations that come to mind in discussions of Big Data, they have a long history of working with community members to make use of information resources to meet their individual and community needs. This, coupled with the growing role for libraries in the dissemination of government data and provision of public services, raises critical questions about how (and if) libraries might help their communities realize the potential of Local Big Data.

To explore the local aspects of Big Data, and reported gaps between need and capacity within smaller communities, this paper presents preliminary findings from a case study conducted in a medium-sized U.S. city that focused on the ways in which community data can be leveraged through public libraries. In particular, the study explored the ways in which libraries can co-develop their capabilities, capacity, and abilities – and those of their small/mid-sized communities – to work with community information and data to better meet community challenges and opportunities. The paper concludes with a call for additional research that explores the challenges of Local Big Data and how smaller communities might leverage existing institutions and capacities to create robust local data ecosystems.

2. LITERATURE REVIEW
When Barack Obama first ran for the presidency in 2008, his campaign focused on issues related to information and technology.
The Obama campaign not only used technology – particularly social media – in new ways to raise money, target and contact voters, and get out the vote; it also devoted considerable attention to the ways in which an Obama presidency would use technology in governance and the policies it would support regarding information and technology [13]. The promise and challenges of harnessing the potential of large amounts of data were an area of emphasis in the campaign literature, an intended focus of his administration, and a critical factor in the success of his campaign’s fundraising and voter organization efforts [13]. While Big Data is not a new idea, candidate Obama was the first major presidential contender to express real interest in its potential at a time when technological advances made working with large amounts of data increasingly practicable.

After initial efforts to promote open and transparent government through Executive Orders, the Obama Administration advanced openness through the Open Government Partnership (http://www.opengovpartnership.org/) [17] and in policies that required the release of machine-readable datasets [16]. The overarching technology focus of the Obama Administration has been on the use of technology to increase government transparency, or at least increase the volume of government information that is generally available [5, 11]. This follows an overall trend in recent years toward using e-government for greater access to government records and increased focus on proactive release [9].

The efforts of the Obama administration to promote access, openness, and transparency have centered around two main technologies – social media and open data [4, 6, 11, 12, 18]. These approaches both promote access, openness, and transparency and allow members of the public, organizations, communities, and others to generate new and innovative insights from preexisting, but previously inaccessible, data.
At their most advanced, Big Data initiatives from the government are based on open data; they offer the potential to democratize work with data and provide innumerable opportunities to generate new value and insights from large amounts of existing information. For researchers, the release of large-scale government data has prompted new scientific breakthroughs across disciplines. For companies, greater availability of large amounts of government data can make their products more effective, such as commercial weather forecasting services that rely on access to National Weather Service data. For community members, access to large amounts of government data offers opportunities to create new ways to understand, navigate, and develop their communities. For community institutions, like libraries, this type of data can provide insights into community needs and community composition at a previously impossible level.

However, while there is significant potential benefit created by making government data accessible, none of the opportunities described above can be realized unless communities, organizations, and individuals have the necessary infrastructure, skills, and knowledge. As a result, to date, most Local Big Data initiatives have been the province of the largest cities such as New York (https://data.cityofnewyork.us/), San Francisco (https://data.sfgov.org/), and Chicago (https://data.cityofchicago.org/). As reported by MeriTalk [15], there is often a range of capacity and skills gaps that smaller municipalities face when trying to promote engagement with community data. Thus, although there is a recognition that managing and leveraging local data sets is important, there is often an inability to do so in smaller jurisdictions.

Often ignored in this discussion, however, is the role and centrality of public libraries in the local data infrastructure domain.
With over 16,700 public library buildings in the US, libraries are in almost every community – small and large – and bring information management, data curation, public access technology infrastructure, and digital literacy skills that are essential to working with community data [3, 19]. In particular, libraries and librarians can [2]:
Provide data curation and management expertise. Big Data require management, curation, preservation, metadata schemes, and structures for access and availability. Libraries are well positioned to provide this expertise to the communities that they serve.
Develop data analytics skills within libraries to foster and promote the use of data within the communities that the libraries serve, which can facilitate policy development and decision making.
Serve as facilitators of open data in order to enhance transparency and openness of government. By working with the open and big data communities, libraries have an opportunity to promote democratic processes within the communities that they serve.
Host a range of data events such as hackathons that promote the use of data for community engagement.

While the involvement of public libraries in community efforts to realize the potential of Local Big Data is in its infancy, libraries such as Chattanooga Public Library (http://opendata.chattlibrary.org/) and Hartford Public Library (http://hartfordinfo.org/) have begun leading the way. Yet, at this time the best strategies for blending Big Data at the local level, public libraries, and communities remain unclear. The potential, however, is substantial. Thus this study sought to explore the topic through a case study approach as described below.

3. METHODOLOGY
The study used a case study methodology [24] to explore the data needs of community organizations and the roles of public libraries in meeting those needs within a medium-sized US city between August and October 2013. Site selection was purposeful, as the researchers had knowledge of the information and data landscape of the city, the civil society community, and the public library community (in and around the city), and had support from the state library agency that coordinates public library initiatives throughout the state in which the city is located. In addition, the state library agency was willing to fund the initial research to promote a discussion regarding the roles that public libraries might play in developing community data infrastructures.

A critical focus of the study, therefore, was to identify current data needs, efforts, uses, and activities; to focus on existing gaps, future directions, and potential realization; and to determine the extent to which libraries might be able to help communities develop their data infrastructure and facilitate its use, with the overall goal of exploring how (and if) to position libraries at the center of these efforts. The following research questions guided the study:
What are the local data needs of community organizations, libraries, and other community stakeholders?
How do these stakeholders identify and select data of interest?
How do these stakeholders currently manage the data that they use?
Are there data that would be of use but are currently out of the reach of these stakeholders?
How are these stakeholders using community data, and what are the gaps in skills regarding data use?
What roles can libraries play in the collection, management, and use of data within local communities?
What challenges do libraries face in assuming data infrastructure roles in their communities?

The study focused on central issues of building critical data capabilities within communities, such as data infrastructure; organization of data and data communities; identification of key data sources and resources; assessing and improving data frequency and quality; data curation; facilitating data use; and measuring the impact of data-related investments.

The case study involved preliminary interviews with civil society and community groups, discussions with state library agency staff, an analysis of the city’s neighborhoods, an analysis of the library branches in the communities and surrounding areas, and a culminating workshop with a range of stakeholders (libraries, community organizations, researchers) intended to discuss and further identify issues associated with Local Big Data. More specifically, the study team:
Documented the current practice and need regarding activities, events, and services of the community to better assess the community infrastructure for disseminating information.
Identified current practices in libraries and community organizations for collecting and using activities, events, and services data.
Reviewed how the community organizations, including libraries, create, collect, manage, and use the information and data about activities, events, and services.
Explored practices used by community organizations to disseminate data and information services and resources to community members.

These efforts provided baseline data that informed the interviews conducted with community organizations, which then informed the culminating workshop. The study team identified and contacted 44 community organizations (e.g., civil society and non-profit organizations) for interviews and succeeded in interviewing representatives from 14 of them.
Phone interviews were recorded, and interviewer notes and subject responses were captured using a Qualtrics survey to create an interview record. The interviewees were asked to describe how their organization engaged in annual planning of its activities; the types of information that they typically used in the planning process; the types of data that they used to communicate with funding agencies; additional data that they see would be beneficial to their organizations; and challenges that they face regarding the gathering, analysis, and use of data.

The study team also conducted an exploratory analysis of library websites informed by a series of searches via Google. The searches were designed to identify libraries that might be engaging in local data initiatives, with a particular emphasis on leveraging local data, building a community of practice around data, and data engagement events. The search yielded several libraries that had aspects of data practices, but two in particular: Chattanooga Public Library (http://opendata.chattlibrary.org/) and Hartford Public Library (http://hartfordinfo.org/). The study team analyzed the websites of these libraries to better understand the roles that these libraries were playing in the local data ecosystem. Both libraries sought to be a community data platform, with Chattanooga engaging in a number of data events (i.e., hackathons) intended to foster community data use and development. Hartford Public Library has to date focused more on the gathering and repository aspects of community data.

The data from the interviews and library website analysis informed the study team’s September 2013 workshop entitled “All Data is Local: The Role of Libraries in Local Data Ecosystems.” The event brought together 12 representatives of civil society, community organizations, researchers from the study team and local university, and the library community.
The study team presented its findings from study activities, and facilitated discussion around community data needs, data usage, data challenges, data literacy skills, possible library roles in facilitating and meeting community data needs, and proposed strategies to address ways in which communities can leverage existing resources to build data infrastructure capacities.

The next section presents key findings from the study. It is important to note, however, that this exploratory study has several limitations, including a focus on a single community, the limited number of participants, and the challenges experienced in gaining access to a broader cross section of community organizations. These limitations constrain the generalizability of the findings. However, while subject to significant caveats, the findings of this preliminary study offer important insights into the challenges small and mid-sized communities face in developing the capacity to engage with and use Local Big Data.

4. FINDINGS
Based on the background data gathering, interviews, website assessment, and workshop, findings emerged in four key areas: 1) Data needs; 2) Building capacity; 3) Demonstration; and 4) Building community.

4.1 Data Needs
The study found that non-profit institutions not only need more data, they need more meaningful data. Although the organizations were often aware of general data sources that were available, such as U.S. Census Bureau data, their use of them was limited. Because many of the free sources of data are broad in scope and vary in granularity, it is hard for these institutions to get targeted information that is relevant to their institutional goals. The types of data these organizations reported needing were:
Targeted demographic data;
Neighborhood-level data;
Service supply and demand assessments; and
Information on individuals who would be likely to donate funds.
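Two of the needs listed above, neighborhood-level data and targeted demographic data, usually require re-aggregating small-area counts from a broader source into the areas an organization actually serves. The following Python sketch shows one hedged way to do this. The tract identifiers, neighborhood names, counts, and the `aggregate_to_neighborhoods` helper are all invented for illustration and do not come from the study or from any real dataset.

```python
from collections import defaultdict


def aggregate_to_neighborhoods(tract_counts, tract_to_neighborhood):
    """Sum small-area counts into locally defined neighborhoods.

    Areas with no entry in the crosswalk (i.e., outside the
    organization's service area) are left out of the result.
    """
    totals = defaultdict(int)
    for tract, count in tract_counts.items():
        neighborhood = tract_to_neighborhood.get(tract)
        if neighborhood is not None:
            totals[neighborhood] += count
    return dict(totals)


# Made-up counts of residents under 18, keyed by a hypothetical small-area ID.
youth_by_tract = {"T1": 120, "T2": 80, "T3": 45, "T4": 300}

# Hypothetical crosswalk from small areas to locally defined neighborhoods.
crosswalk = {"T1": "Eastside", "T2": "Eastside", "T3": "Riverfront"}

print(aggregate_to_neighborhoods(youth_by_tract, crosswalk))
# {'Eastside': 200, 'Riverfront': 45}
```

The crosswalk is the locally valuable piece: it encodes an organization's own definition of its neighborhoods, which national sources do not provide, and it is the kind of artifact a shared local data infrastructure could maintain once rather than each organization rebuilding it.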

Institutions with a narrower scope, such as housing support and early literacy programs, need demographic information specific to their audience, such as youth or people with disabilities. Organizations focused on addressing local issues at the micro-level need data for specific, often idiosyncratically defined, regions. Many respondents recognized that while access to data was important, having the time, knowledge, and skills necessary to map the available data to specific decisions, actions, or needs was a critical gap.

Some of the challenges inherent in obtaining data for small, locally focused organizations include obtaining the initial data and keeping it updated, reaching the right audience, selecting and applying appropriate analytic methods, and making the best use of limited time and staff available to process and interpret the available data. Although these institutions reported having access to pools of public data, these sources were often not targeted to the institution’s needs. Therefore, many organizations collected their own data, whether through community meetings, budget or strategic plans, their own historical data and demographic studies, visits to similar centers, or benchmark studies. As a result, there is a need to add value and relevance to existing available datasets to better meet the data needs of the institutions.

4.2 Building Capacity
A common theme that emerged was the need to develop the ability of communities in general, and libraries in particular, to build, use, and maintain their capability to make use of local data. Specifically, respondents noted that their organizations and communities would benefit from efforts to enhance their:
Data infrastructure. There is a need to create, curate, and manage local datasets. These can be subsets of national datasets (e.g., Census, Centers for Disease Control, Education, etc.)
that are disaggregated at local levels, local datasets that focus on domain areas (e.g., housing, health), or regional/state datasets. Data infrastructure needs to include coordinating mechanisms, metadata standards, and other features that facilitate access and use of these datasets.
Data portals. These can take a number of forms, but there is a need to create coordinated and centralized data portals that provide stakeholders with ready access to datasets, data dictionaries, information on metadata standards, and other features that ensure a place to store data, access to data, and data management techniques to ensure currency and reliability of datasets.
Workshops. There is a clear need to host a range of workshops to bring stakeholder communities together and develop a range of skills such as data management and curation, data development and collection, data use, data analytics, visualization, and the like. These may evolve into hackathons and connect to larger hacking events such as the National Day of Civic Hacking (http://hackforchange.org/).

The presence of these capabilities was also influenced by the size of the organization, the size of its network, and the number of different programs/services it offered. However, no matter what their size and position in the community, respondents consistently identified the need for additional capacity in these areas.

4.3 Demonstration
Participants and interviewees identified the need for data, but in some cases were unsure of what data they needed, how it could be used effectively, and how to demonstrate the impact of the data collected. From these responses, it was clear that there is a need for:
Identification of best practices.
The collection of examples and best practices of community organizations’, libraries’, and other stakeholders’ use of data, data tools, and the building of local data infrastructures would provide clear paths for libraries and communities to follow as they engage in local data infrastructure development.
Demonstration projects. Pilot projects in differing communities, combined with workshops, can facilitate and promote dialog among key constituencies and further develop libraries as central figures in the building of local data infrastructures.
Seed funding. There is an opportunity to create small amounts of funds for which community organizations and libraries can qualify in order to focus on local data infrastructure projects.

In combination, these efforts can foster innovation while simultaneously building capacity and community data infrastructures.

4.4 Building Community
The study identified the need for community building. More specifically, the study brought to light the need to:
Bring together stakeholder communities. Civil society, researchers, non-profit organizations, and libraries often intersect and collaborate. They often do not do so, however, in the area of data infrastructures. Each of these constituencies can play important roles in data development, use, and impact, but each often operates independently. Creating a holistic community around data can create a much more robust local data infrastructure, and libraries can play a pivotal role as facilitator and convener.
Identify critical roles and capabilities. By bringing together key data and community stakeholders, libraries can help map local data infrastructures, capabilities, and needs. Doing so can help address and develop community needs for coordinated data collection, management, storage, availability, and use.
Create a central coordinating mechanism.
It was clear that none of the institutions that participated in the study had the capacity to “do it all” – that is, create foundational data infrastructure, build data curation capacity, and develop the skills required for high-impact analytics. Having a central and neutral party for housing and coordinating local data was seen as an important community need – and one that at least some participants thought that libraries could fulfill.

Thus the findings suggest that collaboration and coordination can benefit individual stakeholders and the community at large.

5. CONCLUSION
Although much attention is paid to Big Data initiatives at the national level and in large cities, there are many questions regarding how to create Big Data activities in smaller jurisdictions. Although subject to important caveats and limitations, these findings show the need to build a community of practice and advocacy focused on local data, as well as the important challenges such communities face in engaging in data initiatives. However, additional research is necessary to explore the topic of local data infrastructures and the roles of libraries in community data. Specifically, there is a need to gather more data from a wider variety of organizations in different geographic areas in order to see if the results can be generalized. This would also include establishing a standard lexicon that everyone could use to refer to this type of information. This research can be used to help create a framework of events that libraries could host to be more like data platforms, and in essence be their community’s one-stop shop. In other words, the ultimate goal is to create a working theoretical framework for these communities to use data, and also to create a model for data infrastructure that could be universally applied.

References
[1] Bertot, J.C., & Choi, H. (2013). Big data and e-government: issues, policies, and recommendations.
In Proceedings of the 14th Annual International Conference on Digital Government Research (dg.o '13). ACM, New York, NY, USA, 1-10.
[2] Bertot, J. C., Gorham, U., Jaeger, P. T. & Sarin, L.C. (forthcoming). Big Data, Libraries, and the Information Policies of the Obama Administration. The Bowker Annual.
[3] Bertot, J. C., Jaeger, P. T., Gorham, U., Taylor, N. G., & Lincoln, R. (2013). Delivering e-government services and transforming communities through innovative partnerships: Public libraries, government agencies, and community organizations. Information Polity, 18, 127-138.
[4] Bertot, J. C., Jaeger, P. T., & Grimes, J. M. (2012). Promoting transparency and accountability through ICTs, social media, and collaborative e-government. Transforming Government: People, Process and Policy, 6(1), 78-91.
[5] Bertot, J. C., Jaeger, P. T., & Grimes, J. M. (2010). Using ICTs to create a culture of transparency?: E-government and social media as openness and anticorruption tools for societies. Government Information Quarterly, 27(3), 264-271.
[6] Bertot, J. C., Jaeger, P. T., Munson, S., & Glaisyer, T. (2010). Engaging the public in open government: The policy and government application of social media technology for government transparency. IEEE Computer, 43(11), 53-59.
[7] Bollier, D. (2010). The Promise and Peril of Big Data. Washington, DC: Aspen Institute. Available at http://ilmresource.com.
[8] Boyd, D., & Crawford, K. (2012). Critical questions for Big Data. Information, Communication & Society, 15(5), 662-679.
[9] Cuillier, D., & Piotrowski, S. J. (2009). Internet information-seeking and its relation to support for access to government records. Government Information Quarterly, 26(3), 441-449.
[10] Frederikson, L. (2012). Big Data. Public Services Quarterly, 8(4), 345-349.
[11] Jaeger, P. T., & Bertot, J. C. (2010). Transparency and technological change: Ensuring equal and sustained public access to government information.
Government Information Quarterly, 27(4), 371-376.
[12] Jaeger, P. T., Bertot, J. C., & Shilton, K. (2012). Information policy and social media: Framing government-citizen Web 2.0 interactions. In C. G. Reddick & S. K. Aikins (Eds.), Web 2.0 Technologies and Democratic Governance: Political, Policy and Management Implications (pp. 11-25). London: Springer.
[13] Jaeger, P. T., Paquette, S., & Simmons, S. N. (2010). Information policy in national political campaigns: A comparison of the 2008 campaigns for President of the United States and Prime Minister of Canada. Journal of Information Technology & Politics, 7(1), 1-16.
[14] Little, G. (2012). Managing the data deluge. Journal of Academic Librarianship, 38, 263-264.
[15] MeriTalk. (2013). The State and Local Big Data Gap. Alexandria, VA: MeriTalk. Available at: http://www.meritalk.com/state-and-local-big-data.php.
[16] Obama, B.H. (2013, May 9). Executive Order 13642: Making Open and Machine Readable the New Default for Government Information. Washington, DC: Office of the Executive. Available at: http://www.gpo.gov/fdsys/pkg/FR2013-05-14/pdf/2013-11533.pdf.
[17] Office of Science and Technology Policy. (2011). The Open Government Partnership: National action plan for the United States of America. Washington, DC: Office of Science and Technology Policy. Available at: http://www.whitehouse.gov/sites/default/files/us_national_action_plan_final_2.pdf.
[18] Paquette, S., Jaeger, P. T., & Wilson, S. C. (2010). Identifying the risks associated with governmental use of cloud computing. Government Information Quarterly, 27(3), 245-253.
[19] Shuler, J.A., Jaeger, P.T., & Bertot, J.C. (2014). E-government without government. Government Information Quarterly, 31(1), 1-3.
[20] The Economist. (2012, October 27). Special report on technology and geography: A sense of place. The Economist, 405(8808), 1-22.
[21] The Seattle Foundation. (2006). A Healthy Community: What You Need to Know to Give Strategically.
Seattle, WA: The Seattle Foundation. Available at: http://www.seattlefoundation.org/aboutus/Documents/10029170_HCReport_web.pdf.
[22] U.S. Census Bureau. (2011). City and Town Totals: Vintage 2011. Available at: http://www.census.gov/popest/data/cities/totals/2011/index.html.
[23] U.S. Institute of Museum and Library Services. (2013). Public Libraries in the United States Survey: Fiscal Year 2010. Washington, DC: U.S. Institute of Museum and Library Services. Available at: http://www.imls.gov/assets/1/AssetManager/PLS2010.pdf.
[24] Yin, R. (2013). Case Study Research: Design and Methods (5th Ed). Thousand Oaks, CA: Sage Publications, Inc.