ERDC/CERL TR-08-DRAFT

Best-practice Methods for Open-source Human Geography Data Compilation and Integration

Azerbaijan and Turkey: Data Development Efforts

Construction Engineering Research Laboratory

Marina V. Drigo and Lynndee A. Kemmet

Approved for public release; distribution is unlimited.

September 2013

Center Directed Research

Best-practice Methods for Open-source Human Geography Data Compilation and Integration

95th Civil Affairs Brigade Data Development Efforts

Dr. Charles R. Ehlschlaeger and Mr. Jeffrey A. Burkhalter
Construction Engineering Research Laboratory
U.S. Army Engineer Research and Development Center
2902 Newmark Drive
Champaign, IL 61822

Ms. Marina V. Drigo
The PERTAN Group
44 East Main Street, Suite 403
Champaign, IL 61820

Ms. Lynndee A. Kemmet
Network Science Center at West Point
Thayer Hall Room 119
West Point, NY 10996

Final report

Approved for public release; distribution is unlimited.

Prepared for U.S. Army Corps of Engineers
Washington, DC 20314-1000

Under Work Unit D34502

Monitored by Construction Engineering Research Laboratory
U.S. Army Engineer Research and Development Center
2902 Newmark Drive, Champaign, IL 61822

Abstract: Development of human geography data for stability operations around the world is one of the primary interests of Civil Affairs units. Civil Affairs teams need consistent and reliable tools and methods for compiling and integrating open-source human geography data to support mission planning before and during deployment. ERDC researchers are working with the 95th Civil Affairs Brigade to develop best-practice methods for compiling and integrating human geography data down to the neighborhood scale, using Azerbaijan and Turkey as case studies. This paper describes open-source online and public tools suitable for data collection and integration.
The data collection and integration follows the methodology of the sixteen (16) data collection themes set by the National Geospatial Intelligence Agency's Human Geography Working Group (HGWG).

DISCLAIMER: The contents of this report are not to be used for advertising, publication, or promotional purposes. Citation of trade names does not constitute an official endorsement or approval of the use of such commercial products. All product names and trademarks cited are the property of their respective owners. The findings of this report are not to be construed as an official Department of the Army position unless so designated by other authorized documents.

DESTROY THIS REPORT WHEN NO LONGER NEEDED. DO NOT RETURN IT TO THE ORIGINATOR.

Table of Contents

Preface
1 Introduction
2 Dataset Types and Sources
  2.1 Overview of Datasets by Type
  2.2 Overview of Dataset Sources
3 Data Search Methodology
  3.1 Search for Baseline and Foundational Datasets
  3.2 Search for Specialized Datasets
4 Data Integration from Disparate Sources
5 Global Sources for Human Geography Data
  5.1 Collection of Four or More Human Geography Themes
  5.2 Communications and Media
  5.3 Demographic and Human Population Measures
  5.4 Economy
  5.5 Education
  5.6 Ethnicity
  5.7 Language
  5.8 Land: Cultural Terrain
  5.9 Land: Ownership
  5.10 Land: Use and Cover
  5.11 Medical and Health
  5.12 Organizations
  5.13 Religion
  5.14 Significant Events
  5.15 Social Groups
  5.16 Transportation Use
  5.17 Water Supply and Control
6 COCOM Sources for Human Geography Data Themes
  6.1 AFRICOM Sources
    6.1.1 Collection of Four or More Human Geography Themes
    6.1.2 Communications and Media
    6.1.3 Demographic and Human Population Measures
    6.1.4 Economy
    6.1.5 Education
    6.1.6 Ethnicity
    6.1.7 Language
    6.1.8 Land: Cultural Terrain
    6.1.9 Land: Ownership
    6.1.10 Land: Use and Cover
    6.1.11 Medical and Health
    6.1.12 Organizations
    6.1.13 Religion
    6.1.14 Significant Events
    6.1.15 Social Groups
    6.1.16 Transportation Use
    6.1.17 Water Supply and Control
  6.2 CENTCOM Sources
    6.2.1 Collection of Four or More Human Geography Themes
    6.2.2 Communications and Media
    6.2.3 Demographic and Human Population Measures
    6.2.4 Economy
    6.2.5 Education
    6.2.6 Ethnicity
    6.2.7 Language
    6.2.8 Land: Cultural Terrain
    6.2.9 Land: Ownership
    6.2.10 Land: Use and Cover
    6.2.11 Medical and Health
    6.2.12 Organizations
    6.2.13 Religion
    6.2.14 Significant Events
    6.2.15 Social Groups
    6.2.16 Transportation Use
    6.2.17 Water Supply and Control
  6.3 EUCOM Sources
    6.3.1 Collection of Four or More Human Geography Themes
    6.3.2 Communications and Media
    6.3.3 Demographic and Human Population Measures
    6.3.4 Economy
    6.3.5 Education
    6.3.6 Ethnicity
    6.3.7 Language
    6.3.8 Land: Cultural Terrain
    6.3.9 Land: Ownership
    6.3.10 Land: Use and Cover
    6.3.11 Medical and Health
    6.3.12 Organizations
    6.3.13 Religion
    6.3.14 Significant Events
    6.3.15 Social Groups
    6.3.16 Transportation Use
    6.3.17 Water Supply and Control
  6.4 PACOM Sources
    6.4.1 Collection of Four or More Human Geography Themes
    6.4.2 Communications and Media
    6.4.3 Demographic and Human Population Measures
    6.4.4 Economy
    6.4.5 Education
    6.4.6 Ethnicity
    6.4.7 Language
    6.4.8 Land: Cultural Terrain
    6.4.9 Land: Ownership
    6.4.10 Land: Use and Cover
    6.4.11 Medical and Health
    6.4.12 Organizations
    6.4.13 Religion
    6.4.14 Significant Events
    6.4.15 Social Groups
    6.4.16 Transportation Use
    6.4.17 Water Supply and Control
  6.5 SOUTHCOM Sources
    6.5.1 Collection of Four or More Human Geography Themes
    6.5.2 Communications and Media
    6.5.3 Demographic and Human Population Measures
    6.5.4 Economy
    6.5.5 Education
    6.5.6 Ethnicity
    6.5.7 Language
    6.5.8 Land: Cultural Terrain
    6.5.9 Land: Ownership
    6.5.10 Land: Use and Cover
    6.5.11 Medical and Health
    6.5.12 Organizations
    6.5.13 Religion
    6.5.14 Significant Events
    6.5.15 Social Groups
    6.5.16 Transportation Use
    6.5.17 Water Supply and Control
7 Azerbaijan Sources
  7.1 Collection of Four or More Human Geography Themes
  7.2 Communications and Media
  7.3 Demographic and Human Population Measures
  7.4 Economy
  7.5 Education
  7.6 Ethnicity
  7.7 Language
  7.8 Land: Cultural Terrain
  7.9 Land: Ownership
  7.10 Land: Use and Cover
  7.11 Medical and Health
  7.12 Organizations
  7.13 Religion
  7.14 Significant Events
  7.15 Social Groups
  7.16 Transportation Use
  7.17 Water Supply and Control
8 Turkey Sources
  8.1 Collection of Four or More Human Geography Themes
  8.2 Communications and Media
  8.3 Demographic and Human Population Measures
  8.4 Economy
  8.5 Education
  8.6 Ethnicity
  8.7 Language
  8.8 Land: Cultural Terrain
  8.9 Land: Ownership
  8.10 Land: Use and Cover
  8.11 Medical and Health
  8.12 Organizations
  8.13 Religion
  8.14 Significant Events
  8.15 Social Groups
  8.16 Transportation Use
  8.17 Water Supply and Control
9 Conclusions

Preface

This study was conducted for the Director, Engineering Research and Development Center under Project D34502, “Rapid Model Prototyping for Infrastructure and Essential Services.” The technical monitor was [T.M. Name]. The work was performed by the Land Heritage and Resource Conservation Branch (CN-C) of the Environmental Division (CN), U.S. Army Engineer Research and Development Center – Construction Engineering Research Laboratory (ERDC-CERL).
At the time of publication, Dr. Christopher M. White was Chief, CEERD-CN-C; Dr. Michelle Hanson was Chief, CEERD-CN; and Dr. Bert Davis was the Technical Director for Geospatial Research and Engineering. The Deputy Director of ERDC-CERL was Dr. Kirankumar Topudurti and the Director was Dr. Ilker Adiguzel. The Commander and Executive Director of ERDC was COL Kevin J. Wilson and the Director was Dr. Jeffery P. Holland.

1 Introduction

Development of human geography data for stability operations around the world is one of the primary interests of Civil Affairs units. Civil Affairs teams need consistent and reliable tools and methods for compiling and integrating open-source human geography data to support mission planning before and during deployment. ERDC researchers are working with the 95th Civil Affairs Brigade to develop best-practice methods for compiling and integrating human geography data down to the neighborhood scale, using Azerbaijan and Turkey as case studies. This paper describes open-source online and public tools suitable for data collection and integration. The data collection and integration follow the methodology of the sixteen (16) data collection themes set by the National Geospatial-Intelligence Agency's Human Geography Working Group (HGWG).

This paper is organized as follows. Section 2 describes dataset types as defined by NGA (baseline, foundational, and specialized) and briefly outlines data sources. Section 3 describes the methodology for searching for and collecting open-source data. Section 4 (forthcoming) outlines problems and provides recommendations for integrating data from disparate sources into a common dataset. Section 5 describes global data sources (i.e., those covering most of the world) and classifies them according to the 16 Human Geography themes.
Section 6 describes data sources covering the areas of responsibility of each of the COCOMs (AFRICOM, CENTCOM, EUCOM, PACOM, and SOUTHCOM) and also classifies them according to the 16 Human Geography themes. Sections 7 and 8 apply the same approach to Azerbaijan and Turkey, respectively. Although every effort was made to organize all data sources according to the 16 Human Geography themes, many data sources cover multiple themes. To avoid unnecessary repetition, a special category, “Collection of Four or More Human Geography Themes,” precedes the list of Human Geography themes in sections 5 through 8.

2 Dataset Types and Sources

In recent years, the growth of free online data sources has increased public access to, and control of, economic, demographic, and other types of spatial and tabular data. In developing countries, increased investment in capacity has narrowed traditional gaps in technical capacity, statistical sophistication, and public transparency, but large gaps in data and analysis remain compared with what is readily available in the United States. In particular, spatial data, including geocoded surveys or surveys with highly resolved geographic identifiers, are often unavailable, especially in less developed regions and states. In general, data are more plentiful for major cities. Developing countries often lack an internal spatial data infrastructure, that is, the coordinated framework of data, metadata, users, and tools needed to create and use spatial data. Where digital capability does exist, local government agencies often consider high-resolution GIS data a security risk and choose not to release it. Also significant for future data gathering are differences in national statistical capacity, which vary by region and country.
The World-Wide Human Geography Data (WWHGD) Working Group, https://wwhgd.org/, is currently organizing descriptions of the most important data layers for Social Cultural Analysis (SCA). These SCA data layers will be organized within a Human Geography Data Dictionary (HGDD) and a Human Geography Entity Catalog. The data layers are organized by themes, of which 16 exist as of May 8, 2013. These themes, which the WWHGD Working Group calls sub-models, are Communications and Media, Demographic and Human Population Measures, Economy, Education, Ethnicity, Social Groups, Organizations, Language, Land: Use and Cover, Land: Cultural Terrain, Land: Ownership, Medical and Health, Religion, Significant Events, Transportation Use, and Water Supply and Control. The sub-models can be downloaded, after subscribing to the WWHGD Working Group, at http://wwhgd.org/content/human-geographystandards-working-group-hgwg-sub-models.

2.1 Overview of Datasets by Type

Availability of high-quality spatial data is critical to developing human geography datasets. NGA has identified three types of spatial data: foundational, baseline, and specialized.

Foundational data layers contain information spread throughout the spatial and temporal domain. Examples of foundational data layers include population density maps, isopleth weather maps, and similar layers that represent information across geographic extents.

Baseline data layers are those that locate information in geographic space. Examples of useful baseline data include administrative boundaries, transportation networks, names and locations of settlements, street networks, and essentially anything that can be pinpointed on a map and used as a baseline.

Specialized data contain detailed, mission-specific information. Much of this information is in tabular or narrative form. Because specialized data are often neither geotagged nor time-stamped, they require baseline data for geo-referencing.
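The geo-referencing step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical baseline gazetteer that maps settlement names to an administrative unit and coordinates, and specialized records (for example, hospital equipment listings) that name their settlement. The function names, record fields, and the simple case-folding match rule are illustrative assumptions, not part of the NGA definitions. Records that fail to match are returned separately for review, and matched records can be tallied per administrative unit so that the counts can be joined to boundary data.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical baseline gazetteer: settlement name -> (admin unit, lat, lon).
Gazetteer = Dict[str, Tuple[str, float, float]]


def normalize(name: str) -> str:
    """Case-fold and collapse internal whitespace so trivial variants match."""
    return " ".join(name.casefold().split())


def georeference(records: List[dict], gazetteer: Gazetteer,
                 key: str = "settlement") -> Tuple[List[dict], List[dict]]:
    """Attach admin unit and coordinates to specialized records by place name.

    Returns (matched, unmatched). Unmatched records need manual review, e.g.
    for spelling or transliteration variants missing from the gazetteer.
    """
    lookup = {normalize(n): v for n, v in gazetteer.items()}
    matched: List[dict] = []
    unmatched: List[dict] = []
    for rec in records:
        hit = lookup.get(normalize(rec[key]))
        if hit is None:
            unmatched.append(rec)
        else:
            admin_unit, lat, lon = hit
            matched.append({**rec, "admin_unit": admin_unit, "lat": lat, "lon": lon})
    return matched, unmatched


def count_by_admin_unit(matched: List[dict]) -> Counter:
    """Aggregate matched records to administrative units for a tabular join."""
    return Counter(rec["admin_unit"] for rec in matched)
```

Exact matching after normalization is deliberately conservative: a transliteration variant would likely land in the unmatched list for review rather than being silently attached to the wrong settlement.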
Examples of specialized data include lists of hospital equipment by hospital, counts of schoolchildren by district or city, type of crop grown by parcel, population censuses, and various surveys. When specialized data are combined with baseline data, geo-referenced maps can be created on topics such as population density, nutrition, childhood mortality, poverty, and basic needs. Note that population censuses are expensive and, in most countries, occur at most once a decade. Surveys use much smaller samples and, even when correctly designed, cannot give precise estimates for small areas, particularly rural and sparsely populated regions. To produce timely, high-resolution spatial analyses, census data can be combined with smaller, more topical surveys using small area estimation (SAE) statistical techniques to create region- or country-wide high-resolution estimates of demographic factors. Data as supplied can also be aggregated to regional units and joined with baseline data.

2.2 Overview of Dataset Sources

Many private and public initiatives collect, create, and disseminate spatial and tabular data products. International bodies, such as the Food and Agriculture Organization of the United Nations (FAO), were among the first organizations to collate international spatial information to improve access to and use of spatial data. Other organizations that collect and/or disseminate data for many countries include the U.S. Geological Survey (USGS) and the World Bank. These organizations generally provide free access to their data through their websites, although it is frequently necessary to request the data specifically and obtain approval first.

Local agencies, i.e., agencies located in the study area, also collect and disseminate spatial and tabular data.
Local agencies fall into two categories: public agencies and private organizations. Examples of private organizations include banks, schools, churches, and power companies. Examples of public agencies include local equivalents of U.S. federal bodies such as the Department of Defense, the Department of Education, and the U.S. Census Bureau. As noted earlier, local agencies in developing countries do not always have the capability to collect and disseminate high-quality, high-resolution data; they may withhold such data for security reasons or may not make the data easily accessible. For example, a city government may publish an interactive map on its website but provide no user-friendly way to download the displayed data in GIS or any other format.

Crowd-sourced or citizen-collected spatial data have become increasingly popular in recent years owing to the wider availability of the Internet and mobile phones. OpenStreetMap is one example; it is increasingly used by many organizations (e.g., for developing transportation applications). Citizen-collected data may be problematic because of a lack of standardized metadata and because of device-introduced and human-introduced errors, but they are often the most up-to-date source and provide the freshest insight into spatial infrastructure. Another problem is that coverage of non-Western countries may be very incomplete, even for urban areas.

Finally, media (e.g., newspapers) and social media (e.g., Twitter) are valuable sources of data, frequently geocoded and sometimes available in real time. As with other citizen-collected data, collection may be limited by Internet availability and users' online activity, as well as by the fact that most of the content is likely to be in the local language rather than English.
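The metadata and error problems noted for citizen-collected and social media data suggest screening each geocoded record before use. The following Python sketch assumes hypothetical records with `lat`, `lon`, `timestamp` (ISO 8601 text), and `source` fields and a hypothetical study-area bounding box. The specific checks, including treating (0, 0) as a device placeholder rather than a real location, are illustrative assumptions, not a prescribed standard.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical study-area bounding box: (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


def screen_record(rec: dict, bbox: BBox) -> List[str]:
    """Return the problems found in one geocoded record (empty list = passes)."""
    problems: List[str] = []
    lat, lon = rec.get("lat"), rec.get("lon")
    if lat is None or lon is None:
        problems.append("missing coordinates")
    elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")
    elif lat == 0 and lon == 0:
        # Assumption: (0, 0) is treated as a device default, not a real location.
        problems.append("null-island (0, 0) placeholder")
    else:
        min_lon, min_lat, max_lon, max_lat = bbox
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            problems.append("outside study area")
    ts = rec.get("timestamp")
    if ts is None:
        problems.append("missing timestamp")
    else:
        try:
            datetime.fromisoformat(ts)  # expects ISO 8601 text
        except ValueError:
            problems.append("unparseable timestamp")
    if not rec.get("source"):
        problems.append("missing source metadata")
    return problems


def partition(records: Iterable[dict], bbox: BBox):
    """Split records into clean ones and (record, problems) pairs to review."""
    clean: List[dict] = []
    flagged: List[Tuple[dict, List[str]]] = []
    for rec in records:
        problems = screen_record(rec, bbox)
        if problems:
            flagged.append((rec, problems))
        else:
            clean.append(rec)
    return clean, flagged
```

Returning a list of named problems, rather than a single pass/fail flag, keeps the reason for rejection visible when records are reviewed by hand.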
3 Data Search Methodology

This section describes methods used to search the Internet for open-source geographic baseline, foundational, and specialized data. The search was conducted primarily with the Google and Yahoo search engines. Google served as the primary engine and Yahoo as a secondary one, because the two engines may rank results differently. Additionally, Google Scholar was used to search for academic publications and references, primarily on religion, politics, ethnicity, and linguistics.

3.1 Search for Baseline and Foundational Datasets

Baseline GIS data (administrative boundaries, transportation and river networks, and others) can be found by typing appropriate search terms into a search engine, for example, “Azerbaijan GIS data”. It is also useful to include more specific terms, for example, “Azerbaijan transportation GIS data”, and to alternate search terms, for example, “geographic” instead of “GIS”. Adding words such as “free” or “open-source” helps exclude commercial GIS sources. Global datasets containing baseline data for the entire world can be quickly identified in this manner. A search for free geographic data turns up not only individual websites with data but also websites that serve as references to external data sources. Many universities maintain spatial data portals through a library or individual departments (search terms such as “university GIS spatial data” usually turn up the necessary information). Reference websites may also be created by individual citizen enthusiasts or as part of a larger research project. Such websites may or may not be updated regularly, and many of the data sources they reference overlap.
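Reference websites are not always maintained, so it can help to screen their links before relying on them. The following is a minimal sketch that uses only Python's standard library. It sends an HTTP HEAD request to each URL and flags unreachable links or error responses. Some servers reject HEAD requests, so a flagged link should be rechecked manually rather than discarded. The `fetch_status` parameter is an illustrative hook, added here so that the check can be exercised without network access.

```python
import http.client
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the final HTTP status for a HEAD request, or None if unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # redirects are followed automatically
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error (4xx/5xx)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def find_broken_links(urls, fetch_status=http_status):
    """Return (url, status) pairs for links that appear broken."""
    broken = []
    for url in urls:
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


# Example (requires network access):
# print(find_broken_links(["http://www.gadm.org", "http://www.naturalearthdata.com"]))
```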
Reference websites are convenient because they provide a ready-to-use list of geographic data. The list is usually organized by theme (e.g., ecology, human geography, land use) and location (world, country, and/or city datasets). As already noted, reference websites are not necessarily kept up to date, and links to the data may be broken or outdated. Additionally, referenced data may be of questionable quality, outdated, or lacking metadata. The search for foundational data themes (e.g., population density maps, land use maps) is similar to the search for baseline data themes described above. Here too, it is recommended to run searches with a variety of search terms in different search engines. Reference websites almost always include links to foundational data themes at a global scale. Global foundational themes can be in raster or vector format; raster data are usually available at a resolution of at least one kilometer.

3.2 Search for Specialized Datasets

The search for specialized data themes (e.g., data on traffic accidents) can be more cumbersome for international data sources. Non-Western countries often either do not collect specialized data or do not make such data freely available to the public. In most cases, specialized data collected at the neighborhood scale are made available only in aggregate form at the first or second administrative level because of privacy concerns. Specialized data may be available in English, but they are usually published in the language of the country in question. Searching for specialized data themes with web search engines is generally unproductive unless the search is conducted in that language, which gives a better chance of returning relevant results.
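The advice in Sections 3.1 and 3.2 is to vary search terms and, for specialized data, to search in the country's own language. This can be made systematic by generating every combination of terms up front, as in the sketch below. The sketch only builds query strings and does not call any search engine. The English term lists are illustrative assumptions. Local-language lists (e.g., Azerbaijani or Turkish) would have to be supplied by someone who knows the language, and the word order may need adjusting.

```python
from itertools import product


def build_queries(country, themes, data_terms, qualifiers):
    """Return one query per combination of qualifier, theme, and data term.

    Empty strings in `qualifiers` or `themes` produce broader queries
    such as "Azerbaijan GIS data".
    """
    queries = []
    for qualifier, theme, data_term in product(qualifiers, themes, data_terms):
        words = [country, qualifier, theme, data_term]
        queries.append(" ".join(word for word in words if word))
    return queries


english_queries = build_queries(
    country="Azerbaijan",
    themes=["", "transportation"],
    data_terms=["GIS data", "geographic data"],
    qualifiers=["", "free", "open-source"],
)
print(len(english_queries))  # 3 qualifiers x 2 themes x 2 data terms = 12
print(english_queries[:3])
# ['Azerbaijan GIS data', 'Azerbaijan geographic data',
#  'Azerbaijan transportation GIS data']
```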
It is recommended to begin the search for specialized data themes with the major government agency responsible for collecting statistical information in a given country. Such an agency can be identified through an Internet search. Additionally, the United States Bureau of Labor Statistics provides a list of international statistical agencies at http://www.bls.gov/bls/other.htm/. There may be concerns about the trustworthiness of government-collected data, but government statistics are frequently the major source of information. Statistics from the major government agency usually cover a wide variety of topics, ranging from the environment to socio-economic factors, and are usually available at the first, second, and third administrative levels.

State statistical agencies typically provide economic and demographic data in the form of censuses, household surveys, health/HIV surveys, and commodity and price surveys at a variety of administrative levels. Data availability and acquisition processes differ by country; frequently, the raw files and datasets are made publicly accessible. State statistical agencies in developing countries often lack robust data search tools. This can make navigation and data search less efficient than on comparable sites in the developed world. The major statistical agency also typically serves as a hub that points to other sources that may have data. However, it is often useful to consult other government agencies and departments as well, since they occasionally provide additional data in the form of reports and maps on their websites. Examples include the equivalents of the Department of Health, the Department of Defense, the Department of Education, the Department of Agriculture, and other departments in the United States.
A list of government agencies can be compiled by running a web search or consulting Wikipedia (http://www.wikipedia.org/), which is a good source of basic information on a country’s political and administrative organization. Specialized data at the neighborhood level can also be found by searching the official websites of cities and of administrative areas at level 2 or finer. A list of these sites can be obtained with the help of Wikipedia, which often publishes links to the official websites of cities and other administrative entities. However, this method of searching for data can be time-consuming and unproductive unless it is automated with web-scraping software. Official websites of cities and other administrative entities vary greatly in the quantity and quality of information, but some publish scanned maps, links to interactive maps, reports, and other kinds of data. The primary challenge with these sources is that official websites tend to provide information only in the native language. Even when an English version is available, it is usually much less complete than the native-language version.

Specialized data also include various surveys (e.g., surveys on political attitudes) conducted by individual researchers, research groups, and institutions. Survey reports are generally produced at the country level. Raw data are usually available free of charge, but in most cases special permission is required to use them because of confidentiality concerns. Many surveys can be found on the web by using various combinations of search terms. Google Scholar is useful for finding academic publications that analyze a country’s issues (e.g., religion). Academic studies based on a survey usually cite their data source, and individual researchers may be willing to share their data.
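Automating the search of official city and district websites, as suggested above, can start with a simple step. The first task is to collect the links on a page (for example, a Wikipedia list of a country's cities, or a municipal site) and keep only those that look relevant. The sketch below applies Python's built-in `html.parser` to an HTML string already in memory. Downloading pages, choosing local-language keywords, and respecting each site's terms of use are left out. The sample page, URLs, and keywords are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a href="..."> tags."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_matching(html, keywords, base_url=""):
    """Return links whose URL or text contains any keyword (case-insensitive)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    keywords = [k.lower() for k in keywords]
    return [
        (href, text)
        for href, text in parser.links
        if any(k in href.lower() or k in text.lower() for k in keywords)
    ]


sample_html = """
<ul>
  <li><a href="/statistics/population.xls">Population by district</a></li>
  <li><a href="http://www.example.org/map">Interactive map</a></li>
  <li><a href="/news">News</a></li>
</ul>
"""
for href, text in links_matching(sample_html, ["statistic", "map"],
                                 base_url="http://www.example.org"):
    print(href, "|", text)
```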
Note that smaller surveys are usually conducted within a city or a smaller administrative area and hence are not nationally representative. Besides Google Scholar (which usually provides references only), academic publications can be searched and obtained through university libraries or other libraries with access to academic databases. University libraries usually have access to the ProQuest dissertation database; unpublished dissertations and theses are another useful source of data references. Specialized data can also be obtained from newspapers, blogs, forums, and other social media. These sources can be found through web searches and by following the references published on their sites. One example of specialized data from media sites is the number of protesters at an event, as reported by official news outlets and as estimated by independent experts. Analysis of the media environment can provide useful information about events and the accompanying attitudes of the population in near real time. Besides Facebook and Twitter, data can be retrieved from news agencies, blogs, and forums. News articles almost always include information on the geographic location of an event, and specific geographic information may also be found in blogs and forums.

4 Data Integration from Disparate Sources

<forthcoming>

5 Global Sources for Human Geography Data

5.1 Collection of Four or More Human Geography Themes

5.1.1 The Economist

The Economist (http://www.economist.com) covers political and other news, as well as blogs and debates.

5.1.2 Topix

Topix (http://www.topix.com) aggregates and delivers updated news from various sources, including forums.

5.1.3 The New York Times

The New York Times (http://topics.nytimes.com) provides current news, as well as archived articles and commentaries.
5.1.4 World Bank

The World Bank’s data catalog (http://data.worldbank.org/data-catalog) provides a variety of national-level thematic indicators. Access to raw data requires registration, but national-level reports are freely available. Seven surveys are available for Azerbaijan:
- Azerbaijan - Global Financial Inclusion (Global Findex) Database 2011 (by the Development Research Group, Finance and Private Sector Development Unit, World Bank)
- Azerbaijan - Enterprise Survey 2002, 2005, and 2009 (by the World Bank and the European Bank for Reconstruction and Development)
- Azerbaijan - Financial Literacy Survey 2009 (by the Azerbaijan Micro-finance Association)
- Azerbaijan - Multiple Indicator Cluster Survey 2000 (by the State Statistical Committee of the Azerbaijan Republic, UNICEF Multiple Indicator Cluster Surveys)
- Azerbaijan - Survey of Living Conditions 1995 (by the Social Studies Center, Institute of Sociology and Political Science (SORGU), and the World Bank)

5.1.5 United Nations

The United Nations’ data portal (http://data.un.org/) provides country-level data from its constituent agencies on population, the Millennium Development Goals, mortality, and other social and economic topics.

5.1.6 Internal Displacement Monitoring Centre

The Internal Displacement Monitoring Centre (http://www.internaldisplacement.org) provides information and analysis on the situation and background of internally displaced persons (IDPs) worldwide. Social and economic data on IDPs are most readily available at the country level.

5.1.7 WikiMapia

WikiMapia (www.wikimapia.org) is a collaborative project in which users can create new map data or update existing map data worldwide. WikiMapia’s data may be extracted through an application programming interface (API) as a Google Earth file with a .kml extension. Data coverage differs by country, and urban areas are typically covered more extensively.
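WikiMapia objects, or any other layer, can be saved as a .kml file. Their point placemarks can then be read with Python's standard `xml.etree.ElementTree`. The sketch below assumes KML 2.2 and handles only placemarks whose geometry is a single Point. Polygons, extended data, and the parameters of WikiMapia's API are not covered, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

KML = "{http://www.opengis.net/kml/2.2}"  # KML 2.2 namespace prefix


def read_point_placemarks(kml_text):
    """Return (name, lon, lat) for each Placemark that has Point geometry."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iter(f"{KML}Placemark"):
        coords = placemark.findtext(f"{KML}Point/{KML}coordinates")
        if coords is None:
            continue  # polygons and lines are skipped in this sketch
        # KML stores "longitude,latitude[,altitude]".
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        name = placemark.findtext(f"{KML}name", default="").strip()
        points.append((name, lon, lat))
    return points


sample_kml = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Hypothetical school</name>
      <Point><coordinates>49.8671,40.4093,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Hypothetical park</name>
      <Polygon/>
    </Placemark>
  </Document>
</kml>"""
print(read_point_placemarks(sample_kml))
# [('Hypothetical school', 49.8671, 40.4093)]
```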
5.1.8 OpenStreetMap

OpenStreetMap (http://www.openstreetmap.org/) creates and distributes free geographic data for the world. Users can edit the maps and add new content by uploading GPS data. Available data layers may include settlements, railway stations, transportation networks, water features, various points of interest (schools, hotels, banks, ATMs, etc.), land use, natural reserves, and vegetation.

5.1.8.1 GeoFabrik.de

GeoFabrik.de (http://download.geofabrik.de/openstreetmap/) creates extracts of OpenStreetMap data.

5.1.8.2 BBBike.org

BBBike.org (http://download.bbbike.org/osm/) creates extracts of OpenStreetMap data.

5.1.8.3 GIS-Lab

GIS-Lab (http://gis-lab.info/projects/osm_shp/region) creates extracts of OpenStreetMap data.

5.1.9 USGS Earth Explorer

The USGS Earth Explorer (http://earthexplorer.usgs.gov/) offers a variety of aerial, satellite, and radar data products, including digital elevation data and water body data for different uses.

5.1.10 EDENext Data Portal

The EDENext Data Portal (http://www.edenextdata.com) provides datasets on a variety of topics, primarily related to climate change, biodiversity, and agriculture. Links to additional global datasets are available at http://www.edenextdata.com/?q=content/global-gis-datasets-links-0.

5.1.11 Global Administrative Areas Database

The Global Administrative Areas Database (GADM, http://www.gadm.org) provides access to administrative boundary data at multiple administrative levels.

5.1.12 Natural Earth

Natural Earth (http://www.naturalearthdata.com) provides access to administrative boundaries and to hydrologic, road, railroad, port, airport, and populated-place data.

5.1.13 DIVA-GIS

DIVA-GIS (http://www.diva-gis.org/Data) provides access to administrative boundaries and to hydrologic, road, railroad, port, airport, and populated-place data.
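Boundary layers from GADM, Natural Earth, or DIVA-GIS are commonly used to assign point data, such as OpenStreetMap points of interest, to administrative units. The sketch below shows the underlying point-in-polygon test, using the even-odd ray-casting rule, for simple polygons given as lists of (longitude, latitude) vertices. It is a simplified illustration. It does not read shapefiles or handle polygons with holes or multipolygons, and points lying exactly on a boundary may be assigned either way. The square "districts" are hypothetical. For real work, a GIS package or geometry library is the safer choice.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: True if (lon, lat) lies inside `polygon`.

    `polygon` is a list of (lon, lat) vertices; the closing edge back to
    the first vertex is implied. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_points(points, units):
    """Map each (name, lon, lat) point to the first unit containing it, or None."""
    return {
        name: next(
            (unit for unit, polygon in units.items()
             if point_in_polygon(lon, lat, polygon)),
            None,
        )
        for name, lon, lat in points
    }


units = {  # hypothetical square districts
    "District A": [(47.0, 40.0), (48.0, 40.0), (48.0, 41.0), (47.0, 41.0)],
    "District B": [(48.0, 40.0), (49.0, 40.0), (49.0, 41.0), (48.0, 41.0)],
}
points = [("School 1", 47.5, 40.5), ("School 2", 48.5, 40.2), ("School 3", 50.0, 40.5)]
print(assign_points(points, units))
# {'School 1': 'District A', 'School 2': 'District B', 'School 3': None}
```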
5.1.14 Second Administrative Level Boundaries Database

The Second Administrative Level Boundaries (SALB, http://www.unsalb.org/) database provides access to standardized maps of subnational administrative boundaries, which are widely used in other research projects.

5.1.15 Food and Agriculture Organization (FAO) of the United Nations GeoNetwork

FAO GeoNetwork (http://www.fao.org/geonetwork/srv/en/main.home) provides access to georeferenced databases, interactive maps, and satellite imagery. The Global Administrative Units Database (http://www.fao.org/geonetwork/srv/en/metadata.show?id=12691) provides boundaries for first- and second-level administrative units, and for lower levels where available. FAO also provides access to the Relational World Database II (RWDB2), which can be found through the GeoNetwork search. RWDB2 is a collection of accurate second-level (and in some cases third- and fourth-level) administrative unit shapefiles, together with rivers, roads, and other data.

5.1.16 International Center for Tropical Agriculture

The International Center for Tropical Agriculture (CIAT, www.ciat.cgiar.org) focuses on agriculture, food security, and climate change research in Asia, Africa, Latin America, and the Caribbean. It provides access to data, models, and web mapping tools (see the Tools and Resources tabs at http://dapa.ciat.cgiar.org/).

5.1.17 World Values Survey

The World Values Survey (www.worldvaluessurvey.org) provides access to surveys in 87 societies, including some major cities. Questions cover socioeconomic status, demographics, and values related to religion, race, gender, government, politics, and other topics. Surveys have been conducted in waves from 1981 to 2014, and data from the most recent surveys are scheduled for release in 2014.

5.1.18 PreventionWeb

PreventionWeb (http://www.preventionweb.net/) is a project of the UN Office for Disaster Risk Reduction.
It provides access to vector and raster data on cyclones, droughts, earthquakes, fires, floods, landslides, tsunamis, and volcanoes.

5.1.19 Pew Research Center

The Pew Research Center (http://www.pewresearch.org/) conducts annual surveys on various topics, e.g., religion, inequality, corruption, freedom, and attitudes toward current political leaders. The following projects cover countries other than the United States: the Global Attitudes Project and the Religion and Public Life Project.

5.1.20 SETA Foundation for Political, Economic and Social Research

SETA (http://setav.org/) is a non-profit research agency working on national, regional, and interregional issues. Its reports may be useful for assessing situations at the national and sub-national levels.

5.1.21 Carbon Monitoring for Action (CARMA)

CARMA (http://carma.org/) is associated with the Confronting Climate Change Initiative at the Center for Global Development (http://www.cgdev.org). It is a global database with the best available estimates of CO2 emissions from power plants around the world and the identities of the companies that own them.

5.1.22 The Guardian

In addition to news, The Guardian (http://www.theguardian.com/) makes a variety of data available for many countries. The data can be found through its Datastore (http://www.theguardian.com/data) and Datablog (http://www.theguardian.com/news/datablog). The data are most detailed for the UK, but other countries are covered as well, frequently in an interactive map format.

5.2 Communications and Media

5.2.1 The Electoral Knowledge Network

The Electoral Knowledge Network (http://aceproject.org/) provides data on the electoral process in countries around the world. The database lists its data sources and makes an effort to verify the data.
The data cover a wide array of categories related to electoral systems, including voting regulations, regulations pertaining to political parties and campaigns, and regulations relating to the media in elections.

5.2.2 Reporters without Borders

Reporters without Borders (http://en.rsf.org/) has developed a press freedom ranking of 179 countries. It provides information on freedom of the press and access to uncensored information.

5.2.3 The Committee to Protect Journalists

The Committee to Protect Journalists (http://www.cpj.org/) also publishes reports on the state of press freedom worldwide.

5.2.4 The DIMES Project

The DIMES Project (http://www.netdimes.org/) is an open-source distributed scientific research project studying the connectivity, structure, and topology of the Internet. The data are collected with the help of volunteers. A volunteer installs the DIMES software on a computer, which then collects data and sends them to the project, much as Berkeley’s SETI@home project works. The data are published monthly; the latest release dates to April 2012. The data can be used to map the density of Internet connectivity.

5.3 Demographic and Human Population Measures

5.3.1 SEDAC

Columbia University’s Socio-Economic Data and Applications Center (SEDAC, http://sedac.ciesin.columbia.edu) provides grids of local population, population density, and population change, as well as urban extents.

5.4 Economy

5.4.1 SEDAC

Columbia University’s Socio-Economic Data and Applications Center (SEDAC, http://sedac.ciesin.columbia.edu) provides data on unsatisfied basic needs, poverty, and food security.

5.5 Education

5.6 Ethnicity

5.6.1 Joshua Project

The Joshua Project (http://www.joshuaproject.net) provides descriptions and statistical summaries of ethnic groups, languages spoken, and types of religions. The purpose of the project is to highlight the groups with the fewest followers of Christianity.
5.6.2 International Conflict Research

The International Conflict Research group (www.icr.ethz.ch) provides access to the geo-referenced ethnic groups (GREG) and geo-referenced ethnic power relations (GeoEPR) datasets (http://www.icr.ethz.ch/data/other).

5.6.3 People Groups

People Groups (http://peoplegroups.org/) is similar to the Joshua Project. The purpose of the project is to identify the groups with the largest and fewest followers of evangelical Christianity. The project provides statistical summaries on people groups, including approximate location, language spoken, religion, and ethnic affiliation.

5.7 Language

5.7.1 UNESCO

The UNESCO Atlas of the World’s Languages in Danger (http://www.unesco.org/culture/languages-atlas/index.php) provides the number of speakers of each language. It classifies languages as safe, vulnerable, definitely endangered, severely endangered, critically endangered, or extinct.

5.7.2 Ethnologue

The Ethnologue: Languages of the World project (http://www.ethnologue.com) provides the number of speakers of each language, lists dialect names, and describes language use. It also gives statistical summaries and language maps (for selected regions).

5.7.3 Joshua Project

The Joshua Project (http://www.joshuaproject.net) provides descriptions and statistical summaries of ethnic groups, languages spoken, and types of religions. The purpose of the project is to highlight the groups with the fewest followers of Christianity.

5.7.4 Lingvarium Project

The Lingvarium Project (http://lingvarium.org/index.shtml) provides data on linguistic geography as well as the historical distribution of linguistic groups.

5.7.5 People Groups

People Groups (http://peoplegroups.org/) is similar to the Joshua Project. The purpose of the project is to identify the groups with the largest and fewest followers of evangelical Christianity. The project provides statistical summaries on people groups, including approximate location, language spoken, religion, and ethnic affiliation.
5.8 Land: Cultural Terrain

5.9 Land: Ownership

5.10 Land: Use and Cover

5.10.1 IUCN Red List of Threatened Species

The IUCN Red List of Threatened Species (www.iucnredlist.org) provides assessments for almost 70,000 species, about 40,000 of which are mapped at a global scale. The IUCN Red List provides data on the distribution of sea grasses, amphibians, reptiles, mammals, and marine fish, as well as mangroves and coral reefs.

5.10.2 BirdLife International

BirdLife International (http://www.birdlife.org/datazone/info/spcdownload) provides data on the distribution of threatened bird species. The data can be accessed with permission only.

5.10.3 Lincoln Institute of Land Policy

The Lincoln Institute of Land Policy (http://www.lincolninst.edu/) conducted a study of land use and land use change in major cities worldwide. The resulting dataset, ‘Atlas of Urban Expansion’, is available for download with an accompanying report and includes land use raster files for selected cities.

5.10.4 Project Quicksilver

Project Quicksilver (http://forecast.io/quicksilver/) features a real-time map of global temperature. According to the authors, this is an experimental project that may have unresolved issues (e.g., temperatures over the oceans have the lowest resolution and accuracy). Data can be downloaded hourly in TIFF format at a resolution of 0.05 degrees.

5.11 Medical and Health

5.11.1 HIV Spatial Resource Repository

The HIV Spatial Resource Repository (http://www.hivspatialdata.net/?page=data) provides spatially explicit HIV-related data.

5.11.2 Demographic and Health Surveys

USAID’s Measure Demographic and Health Surveys (DHS) program (http://www.measuredhs.com/) provides survey data on health and health services. The DHS program includes 67 surveys from 36 countries, including latitude and longitude coordinates of surveyed communities. An additional resource is the HIV Spatial Resource Repository at http://www.hivspatialdata.net/?page=data.
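Point coordinates, such as those of DHS surveyed communities, can be linked to a global raster at 0.05-degree resolution, such as the Project Quicksilver TIFF files in Section 5.10.4. To do so, each latitude/longitude pair must be converted to a row and column index. The sketch below assumes a north-up grid that covers the whole globe, starts at the north-west corner (longitude -180, latitude 90), and has square 0.05-degree cells (7,200 columns by 3,600 rows). The actual georeferencing of any file should be confirmed from its metadata. The sample point is hypothetical.

```python
import math


def lonlat_to_cell(lon, lat, res=0.05):
    """Return (row, col) of the cell containing lon/lat in a global north-up grid.

    Assumes the grid origin is the north-west corner (-180, 90) and cells are
    `res` degrees square: 7,200 columns x 3,600 rows at 0.05 degrees.
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("longitude/latitude out of range")
    n_cols = int(round(360.0 / res))
    n_rows = int(round(180.0 / res))
    col = math.floor((lon + 180.0) / res)
    row = math.floor((90.0 - lat) / res)
    # Points on the east or south edge would fall one cell past the grid.
    return min(row, n_rows - 1), min(col, n_cols - 1)


def cell_center(row, col, res=0.05):
    """Return the (lon, lat) of a cell's centre, useful for checking results."""
    return -180.0 + (col + 0.5) * res, 90.0 - (row + 0.5) * res


# A hypothetical surveyed community near Baku:
print(lonlat_to_cell(49.8671, 40.4093))  # (991, 4597)
```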
5.11.3 SEDAC

Columbia University’s Socio-Economic Data and Applications Center (SEDAC, http://sedac.ciesin.columbia.edu) provides data on infant mortality rates and the prevalence of child malnutrition.

5.11.4 World Health Organization

The World Health Organization (http://www.who.int/) collects data on various health-related topics. Sources include the World Health Survey (http://www.who.int/healthinfo/survey/en/index.html), the Global Tobacco Surveys (http://www.who.int/tobacco/surveillance/survey/en/index.html), and the Global School-based Student Health Survey (http://www.cdc.gov/gshs/index.htm).

5.12 Organizations

5.13 Religion

5.13.1 Joshua Project

The Joshua Project (http://www.joshuaproject.net) provides descriptions and statistical summaries of ethnic groups, languages spoken, and types of religions. The purpose of the project is to highlight the groups with the fewest followers of Christianity.

5.13.2 People Groups

People Groups (http://peoplegroups.org/) is similar to the Joshua Project. The purpose of the project is to identify the groups with the largest and fewest followers of evangelical Christianity. The project provides statistical summaries on people groups, including approximate location, language spoken, religion, and ethnic affiliation.

5.14 Significant Events

5.14.1 Amnesty International

Amnesty International (http://www.amnesty.org/) provides annual reports on human rights conditions in individual countries. These are short, readable reports that highlight the significant human rights events of each year.

5.14.2 GDELT Event Database

The Global Database of Events, Language and Tone (GDELT, http://gdelt.utdallas.edu/) provides georeferenced worldwide data on societal-scale human behavior and beliefs, as extracted from news and social media archives. The entire dataset has over a quarter-billion records dating back to January 1979 and is updated daily.
5.14.3 Electoral Geography 2.0

Electoral Geography 2.0 (http://www.electoralgeography.com/new/en/) is a blog dedicated to collecting and mapping data on elections worldwide. The data come from many sources, including newspapers and Wikipedia. The website also provides links to similar projects.

5.15 Social Groups
5.16 Transportation Use
5.17 Water Supply and Control

6 COCOM Sources for Human Geography Data Themes

6.1 AFRICOM Sources

6.1.1 Collection of Four or More Human Geography Themes
6.1.2 Communications and Media
6.1.3 Demographic and Human Population Measures
6.1.4 Economy
6.1.5 Education
6.1.6 Ethnicity

6.1.6.1 Gulf2000 Project

The Gulf2000 Project (www.gulf2000.columbia.edu) provides linguistic, ethnic, religious, and cultural maps.

6.1.7 Language

6.1.7.1 Gulf2000 Project

The Gulf2000 Project (www.gulf2000.columbia.edu) provides linguistic, ethnic, religious, and cultural maps.

6.1.8 Land: Cultural Terrain
6.1.9 Land: Ownership
6.1.10 Land: Use and Cover
6.1.11 Medical and Health
6.1.12 Organizations
6.1.13 Religion

6.1.13.1 Gulf2000 Project

The Gulf2000 Project (www.gulf2000.columbia.edu) provides linguistic, ethnic, religious, and cultural maps.

6.1.14 Significant Events
6.1.15 Social Groups
6.1.16 Transportation Use
6.1.17 Water Supply and Control

6.2 CENTCOM Sources

6.2.1 Collection of Four or More Human Geography Themes

6.2.1.1 Radio Free Europe/Radio Liberty

Radio Free Europe/Radio Liberty (http://www.rferl.org) describes itself as an organization that works in countries without a free press, providing access to uncensored news and debate.

6.2.2 Communications and Media
6.2.3 Demographic and Human Population Measures
6.2.4 Economy
6.2.5 Education
6.2.6 Ethnicity

6.2.6.1 Gulf2000 Project

The Gulf2000 Project (www.gulf2000.columbia.edu) provides linguistic, ethnic, religious, and cultural maps.

6.2.7 Language

6.2.7.1 Gulf2000 Project

The Gulf2000 Project (www.gulf2000.columbia.edu) provides linguistic, ethnic, religious, and cultural maps.

6.2.8 Land: Cultural Terrain
6.2.9 Land: Ownership
6.2.10 Land: Use and Cover

6.2.10.1 The Interactive Agricultural Ecological Atlas of Russia and Neighboring Countries

The Interactive Agricultural Ecological Atlas of Russia and Neighboring Countries (www.agroatlas.ru) is funded by the USDA Agricultural Research Service and Office of International Research Programs. It provides spatial data on crops and crop wild relatives, as well as diseases, pests, weeds, and environment (climate, soils, vegetation). The data are in MapInfo format, which can be converted into ESRI shapefiles using the ArcGIS Data Interoperability extension.

6.2.11 Medical and Health
6.2.12 Organizations
6.2.13 Religion

6.2.13.1 Gulf2000 Project

The Gulf2000 Project (www.gulf2000.columbia.edu) provides linguistic, ethnic, religious, and cultural maps.

6.2.14 Significant Events
6.2.15 Social Groups
6.2.16 Transportation Use
6.2.17 Water Supply and Control

6.3 EUCOM Sources

6.3.1 Collection of Four or More Human Geography Themes

6.3.1.1 Osservatorio Balcani e Caucaso

Osservatorio Balcani e Caucaso (http://www.balcanicaucaso.org) provides news and analysis of social and political changes in South-East Europe, Turkey, and the Caucasus.

6.3.1.2 European Union External Action

European Union External Action (http://eeas.europa.eu) delivers news on the European Union’s relations with other countries.

6.3.1.3 Portal on Central Eastern and Balkan Europe

The Portal on Central Eastern and Balkan Europe by IECOB & AIS (PECOB, http://www.pecob.eu) is primarily a collection of printed and online news resources.

6.3.1.4 Marilisa Lorusso's Blog

Marilisa Lorusso's Blog (http://marilisalorusso.blogspot.com/) describes the main events in Georgia, Armenia, and Azerbaijan, primarily political but also economic and social.
6.3.1.5 The Caucasus Research Resource Centers

The Caucasus Research Resource Centers (www.crrccenters.org) is a program of the Eurasia Foundation that conducts research in Armenia, Azerbaijan, and Georgia; it is funded by the Carnegie Corporation of New York. Its research methods include desk reports and surveys on issues such as corruption, religious beliefs, household skills, social cohesion, and political attitudes. Two studies include Azerbaijan. The first is the Caucasus Barometer, an annual household survey on social and economic issues and political attitudes; the second is the Social Capital, Media and Gender Survey. The data are nationally representative and can be aggregated to larger geographic regions (e.g., southwest, northeast).

6.3.1.6 Eurofound

Eurofound (http://www.eurofound.europa.eu/index.htm) is the European Union agency conducting research on social and economic change. It conducts the European Quality of Life Survey, the European Working Conditions Survey, and the European Company Survey. The surveys cover EU member and candidate countries, are nationally representative, and are conducted in multiple waves. They cover a broad range of indicators, both objective and subjective.

6.3.1.7 European Social Survey

The European Social Survey (http://www.europeansocialsurvey.org/) is conducted biennially and covers people’s attitudes, beliefs, and behaviors. Example questions include those on politics and government, social life, terrorism, religion, and the economy. The survey for Turkey was conducted in 2008.

6.3.1.8 Eurobarometer

The Eurobarometer programme (http://ec.europa.eu/public_opinion/index_en.htm; http://www.gesis.org/en/eurobarometer/home/) conducts surveys on topics such as the social situation, health, culture, information technology, the environment, the euro, and defense.
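Surveys such as those above are usually distributed as respondent-level files with sampling weights. Aggregating them to larger regions, as suggested for the Caucasus Barometer, should therefore use those weights. The sketch below computes, for each region, the weighted share of respondents who gave a particular answer. The record layout, region labels, and weights are hypothetical. Variance estimation for complex survey designs is not addressed, and each survey's own documentation should govern how weights are applied.

```python
from collections import defaultdict


def weighted_share_by_region(records, answer):
    """Return {region: weighted share of respondents who gave `answer`}.

    Each record is a dict with "region", "weight" and "response" keys, a
    hypothetical layout; real survey files follow their own codebooks.
    """
    total_weight = defaultdict(float)
    answer_weight = defaultdict(float)
    for record in records:
        total_weight[record["region"]] += record["weight"]
        if record["response"] == answer:
            answer_weight[record["region"]] += record["weight"]
    return {
        region: answer_weight[region] / weight
        for region, weight in total_weight.items()
        if weight > 0
    }


records = [  # hypothetical respondents
    {"region": "Northeast", "weight": 1.5, "response": "yes"},
    {"region": "Northeast", "weight": 0.5, "response": "no"},
    {"region": "Southwest", "weight": 1.0, "response": "yes"},
    {"region": "Southwest", "weight": 3.0, "response": "no"},
]
print(weighted_share_by_region(records, "yes"))
# {'Northeast': 0.75, 'Southwest': 0.25}
```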
6.3.1.9 European Environment Agency

The European Environment Agency (http://www.eea.europa.eu/) is the European Union agency responsible for conducting research and disseminating information on the environment. Available datasets include national emissions, water quantity and quality, natural protected areas, land cover, and others. Some datasets cover parts of the countries adjacent to Europe.

6.3.2 Communications and Media
6.3.3 Demographic and Human Population Measures
6.3.4 Economy
6.3.5 Education
6.3.6 Ethnicity
6.3.7 Language
6.3.8 Land: Cultural Terrain
6.3.9 Land: Ownership
6.3.10 Land: Use and Cover

6.3.10.1 The Interactive Agricultural Ecological Atlas of Russia and Neighboring Countries

The Interactive Agricultural Ecological Atlas of Russia and Neighboring Countries (www.agroatlas.ru) is funded by the USDA Agricultural Research Service and Office of International Research Programs. It provides spatial data on crops and crop wild relatives, as well as diseases, pests, weeds, and environment (climate, soils, vegetation). The data are in MapInfo format, which can be converted into ESRI shapefiles using the ArcGIS Data Interoperability extension.

6.3.10.2 The European Soil Data Centre

The European Soil Data Centre (http://eusoils.jrc.ec.europa.eu/) provides a thematic data infrastructure for soils. Soil data are available in digital format (the European Soil Databases), but they cover only the 27 European Union member countries.
6.3.11 Medical and Health
6.3.12 Organizations
6.3.13 Religion
6.3.14 Significant Events
6.3.15 Social Groups
6.3.16 Transportation Use
6.3.17 Water Supply and Control

6.4 PACOM Sources

6.4.1 Collection of Four or More Human Geography Themes
6.4.2 Communications and Media
6.4.3 Demographic and Human Population Measures
6.4.4 Economy
6.4.5 Education
6.4.6 Ethnicity
6.4.7 Language
6.4.8 Land: Cultural Terrain
6.4.9 Land: Ownership
6.4.10 Land: Use and Cover
6.4.11 Medical and Health
6.4.12 Organizations
6.4.13 Religion
6.4.14 Significant Events
6.4.15 Social Groups
6.4.16 Transportation Use
6.4.17 Water Supply and Control

6.5 SOUTHCOM Sources

6.5.1 Collection of Four or More Human Geography Themes
6.5.2 Communications and Media
6.5.3 Demographic and Human Population Measures
6.5.4 Economy
6.5.5 Education
6.5.6 Ethnicity
6.5.7 Language
6.5.8 Land: Cultural Terrain
6.5.9 Land: Ownership
6.5.10 Land: Use and Cover
6.5.11 Medical and Health
6.5.12 Organizations
6.5.13 Religion
6.5.14 Significant Events
6.5.15 Social Groups
6.5.16 Transportation Use
6.5.17 Water Supply and Control

7 Azerbaijan Sources

7.1 Collection of Four or More Human Geography Themes

7.1.1 State Statistical Committee of the Republic of Azerbaijan

The State Statistical Committee of the Republic of Azerbaijan (http://www.stat.gov.az) covers three groups of topics. The first is demographics (population, gender, labor market, education, science, culture, health, and crime). The second is the economy (agriculture, forestry, fishery, industry, energy, construction, trade, transport, telecommunications and postal services, finance, and tourism). The third is other topics (food, entrepreneurship, environmental protection, and the information society). Data are available at the national, district (rayon), economic region (an aggregation of several districts), and urban/rural levels.
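Economic regions are aggregations of districts. District-level counts can therefore be rolled up to economic regions, or used to check published regional totals, with a district-to-region lookup table. The sketch below sums a count variable this way. The district names, region assignments, and figures are hypothetical placeholders; a real lookup table would have to be compiled from official sources. Only additive counts (e.g., population or number of schools) should be summed this way, since rates and averages need to be re-weighted.

```python
from collections import defaultdict


def aggregate_to_regions(district_values, district_to_region):
    """Sum additive district-level counts into their parent regions."""
    missing = sorted(set(district_values) - set(district_to_region))
    if missing:
        # Fail loudly so unmatched names (e.g., spelling or
        # transliteration differences) are not silently dropped.
        raise KeyError("no region assigned for: " + ", ".join(missing))
    totals = defaultdict(int)
    for district, value in district_values.items():
        totals[district_to_region[district]] += value
    return dict(totals)


population = {"District A": 120_000, "District B": 80_000, "District C": 50_000}
district_to_region = {  # hypothetical lookup table
    "District A": "Region 1",
    "District B": "Region 1",
    "District C": "Region 2",
}
print(aggregate_to_regions(population, district_to_region))
# {'Region 1': 200000, 'Region 2': 50000}
```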
Although data are most readily available at the national and economic region levels, district-level information makes up a significant share of the available data. The State Committee provides some historical data (usually general population data) as well as more recent data extrapolated to 2010-2012 from the 2009 Census. The English and Azerbaijani sections of the website are nearly identical, but the Azerbaijani section has additional files with data for each district (the Nakhchivan economic region, however, is not broken down into districts). The site also features an electronic library with reports on selected topics (e.g., children). Some reports are freely available, while others are for sale only. Most of the reports are in English, though some are in Azerbaijani. Because the State Committee publishes statistical yearbooks, these should also be available through university libraries in the US.
7.1.2 City of Baku
The City of Baku statistical website (http://www.baku.azstat.org) provides the same kinds of statistical information as the State Statistical Committee of the Republic of Azerbaijan.
7.1.3 Ministry of Culture and Tourism of the Republic of Azerbaijan
The Ministry of Culture and Tourism of the Republic of Azerbaijan (http://mct.gov.az) provides information in several languages, including English. It links to the online navigator GoMap (www.gomap.az), developed by the commercial company SINAM (www.sinam.net). Because GoMap is intended to promote tourism, it features a wide variety of data: administrative buildings, hotels and lodging, educational and medical institutions, entertainment, points of interest, industrial sites, natural features, and others. Although the navigator is useful for trip planning, there is no easy way to download and save the data.
7.1.4 Ministry of Labor and Social Protection of the Republic of Azerbaijan
The Ministry of Labor and Social Protection of the Republic of Azerbaijan (http://mlspp.gov.az/) features an interactive map of districts (http://inforoom.mlspp.gov.az/). The map covers most districts (excluding the Nakhchivan region) and includes the following information for each district: general information (e.g., the number of enterprises and of primary, secondary, and higher education schools), population (including the number of internally displaced persons), land area (including the total area of cultivated land and of pastures), the names of the main agricultural and economic crops, and poultry production. The publication year of the data is not listed, but the data are presumably recent.
7.1.5 Gov.az
The website www.gov.az provides links to the local government website of each administrative district. All of these websites are in Azerbaijani, are built on the same template, and feature the same categories: economy, education, health, culture, and sports. However, the amount of information differs by district, and in several cases none is provided. Information on education and health usually consists of the name and address of each educational institution, hospital, or clinic. Using other sources, such as detailed street maps and/or Google Earth imagery, it may be possible to determine the geographic coordinates of each institution and place it on a map. Economic information ranges from purely descriptive (e.g., stating that a given region specializes in meat production) to quantitative. Some websites present quantitative information in an easy-to-read format (e.g., a table), but many simply scatter it throughout the body of the text.
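Placing the institutions listed on the district websites on a map amounts to matching each record's settlement name against a gazetteer. The Python sketch below is one minimal way to do this, not the authors' procedure; it assumes the records have already been transcribed by hand and that a gazetteer is available as a mapping from settlement name to (latitude, longitude). The record fields, institution names, and coordinates are hypothetical illustrations. Names are compared after casefolding, mapping the dotless ı to i, and stripping diacritics, so that "Şamaxı" matches "Samaxi"; other spelling variants would still need a manual alias table.

```python
import json
import unicodedata


def normalize(name: str) -> str:
    """Casefold a place name, map dotless ı to i, and drop diacritics (ş -> s, ğ -> g)."""
    folded = name.strip().casefold().replace("ı", "i")
    decomposed = unicodedata.normalize("NFKD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def geocode(records, gazetteer):
    """Return (features, unmatched): GeoJSON point features for records whose
    settlement is in the gazetteer, plus the records that need manual review."""
    lookup = {normalize(place): coords for place, coords in gazetteer.items()}
    features, unmatched = [], []
    for record in records:
        coords = lookup.get(normalize(record["settlement"]))
        if coords is None:
            unmatched.append(record)  # check against street maps or imagery instead
            continue
        lat, lon = coords
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(record),
        })
    return features, unmatched


# Hypothetical inputs; the coordinates are approximate and for illustration only.
gazetteer = {"Şamaxı": (40.63, 48.64), "Ağsu": (40.57, 48.40)}
records = [
    {"name": "Hospital No. 1", "kind": "hospital", "settlement": "Samaxi"},
    {"name": "School No. 3", "kind": "school", "settlement": "Agsu"},
    {"name": "Clinic No. 2", "kind": "clinic", "settlement": "Unlisted village"},
]
features, unmatched = geocode(records, gazetteer)
# Serialize as a GeoJSON FeatureCollection, a format most GIS packages can open.
geojson_text = json.dumps({"type": "FeatureCollection", "features": features}, indent=2)
```

Returning the unmatched records separately, rather than dropping them, keeps a list of institutions that still have to be located by hand.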
7.1.6 Forum azeri.net
An Azerbaijani-language forum (http://forum.azeri.net/) hosts general discussion topics such as dating, the Internet (e.g., how to earn money online), cooking, culture, and others.
7.1.7 Ans Press
Ans Press is a Russian-language news site (http://www.anspress.com/index.php?lng=ru).
7.1.8 Apa News Agency
Apa News Agency is an English-language news site (http://en.apa.az/).
7.1.9 Azertag
Azertag is a national news agency (http://azertag.com/en) specializing in official government news.
7.1.10 Day.az
Day.az/Today.az is an English-language news site (http://today.az/).
7.1.11 Novosti Azerbaijan
Novosti Azerbaijan (Azerbaijan News) is a news site (http://novosti.az/) in Russian and Azerbaijani.
7.2 Communications and Media
7.3 Demographic and Human Population Measures
7.3.1 State Social Protection Fund of the Republic of Azerbaijan
The State Social Protection Fund of the Republic of Azerbaijan (www.sspf.gov.az) features an interactive map showing the number of pension recipients by district. The data date to the beginning of 2012 and are broken into three categories: 1) old-age pensions; 2) disability pensions; and 3) pensions labeled “ABI” (rendered as “OBI” by Google Translate; the meaning of this abbreviation is unclear).
7.4 Economy
7.4.1 State Social Protection Fund of the Republic of Azerbaijan
The State Social Protection Fund of the Republic of Azerbaijan (www.sspf.gov.az) features an interactive map showing the number of pension recipients by district. The data date to the beginning of 2012 and are broken into three categories: 1) old-age pensions; 2) disability pensions; and 3) pensions labeled “ABI” (rendered as “OBI” by Google Translate; the meaning of this abbreviation is unclear).
7.4.2 Centralized Information System on Mass Payments
The Centralized Information System on Mass Payments of the Central Bank of the Republic of Azerbaijan provides information on banks (http://info.apus.az/?p=banks): the name of the bank; the name, code, and address of each branch; and the number of operators at each branch. It also provides information on service providers (http://info.apus.az/?p=merchants), including the name of the provider, the names and codes of its branches, and the numbers of individual and business subscribers. Additional financial data (in Azerbaijani manats) are available for each bank and service provider: average daily cash and payment-card transactions by month for 2008-2013. Combined, this information can serve as an indicator of economic development at the district level.
7.4.3 “Azerenergy” JSC
“Azerenergy” JSC (www.azerenerji.gov.az) is the largest power producer in Azerbaijan and provides a map of the country’s current and planned power lines in .jpg format.
7.5 Education
7.5.1 Azerbaijan Republic Education Portal
The Azerbaijan Republic Education Portal (http://portal.edu.az/index.php?r=article/item&id=222&mid=6&lang=en) features an interactive map of schools by district (http://portal.edu.az/index.php?r=schoolmap&lang=en#list). For each district, it lists the schools and the settlement in which each is located. This information can be used to map school locations, as well as school types (e.g., primary vs. secondary), by settlement. No additional information is currently available for individual schools, though the Education Portal may add it in the future.
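Because the Education Portal gives only a settlement name for each school, the practical mapping unit is the settlement. The Python sketch below shows one minimal way to aggregate the lists into per-settlement counts by school type; it assumes the per-district lists have already been transcribed into (district, settlement, school type) rows, and the settlement names and rows are hypothetical. The resulting CSV table can then be joined to settlement points, for example points located with a gazetteer.

```python
import csv
import io
from collections import Counter, defaultdict


def schools_by_settlement(rows):
    """Count schools per (district, settlement), broken down by school type."""
    counts = defaultdict(Counter)
    for district, settlement, school_type in rows:
        counts[(district, settlement)][school_type] += 1
    return counts


def to_table(counts, types):
    """Flatten the counts into rows with one column per requested type.
    The total counts every school, including types not listed in `types`."""
    table = [["district", "settlement", *types, "total"]]
    for (district, settlement), by_type in sorted(counts.items()):
        row = [district, settlement] + [by_type.get(t, 0) for t in types]
        table.append(row + [sum(by_type.values())])
    return table


# Hypothetical transcribed rows, for illustration only.
rows = [
    ("Agsu", "Settlement A", "primary"),
    ("Agsu", "Settlement A", "secondary"),
    ("Agsu", "Settlement A", "secondary"),
    ("Agsu", "Settlement B", "primary"),
]
table = to_table(schools_by_settlement(rows), ["primary", "secondary"])

# Write CSV text that a GIS can join to a settlement point layer.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
```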
7.6 Ethnicity
7.7 Language
7.8 Land: Cultural Terrain
7.9 Land: Ownership
7.10 Land: Use and Cover
7.10.1 State Land Surveying Institute
The State Land Surveying Institute (http://www.dyli.az/en/) provides detailed topographic maps of the municipalities in two districts: Agsu (47 municipalities) and Samaxi (48 municipalities). The maps are in .jpg format and must be converted into a GIS format.
7.10.2 Real Estate Cadastre and Technical Inventory Center of the State Committee on Property of the Republic of Azerbaijan
The Real Estate Cadastre and Technical Inventory Center of the State Committee on Property of the Republic of Azerbaijan (http://kadastr.az) links to several maps of Baku City districts, but these maps are small and suitable only for general reference.
7.10.3 State Committee for Architecture and Urban Planning of Azerbaijan Republic
The State Committee for Architecture and Urban Planning of Azerbaijan Republic (http://www.arxkom.gov.az) links to cadastral/topographic maps of major cities in 24 districts in .jpg format. These digital maps are detailed and of high enough resolution to be converted into georeferenced maps.
7.10.4 Baku Cartographic Factory
Baku Cartographic Factory (http://bkf.az) provides excerpts of digital atlases with information on ecology, topography, the distribution of flora and fauna, historical landmarks, and other topics. The digital atlas maps can be used to create corresponding georeferenced maps, and paper versions of the same atlases may be available through university libraries in the US.
7.11 Medical and Health
7.12 Organizations
7.13 Religion
7.13.1 Ministry of Culture and Tourism of the Republic of Azerbaijan
The Ministry of Culture and Tourism of the Republic of Azerbaijan also provides a list of 510 religious communities officially registered by the State Committee for Work with Religious Communities (http://www.scwra.gov.az/).
The data on each religious community include its name, a description (mosque, church, or other), and a location (usually, but not always, down to the village level). With the help of a gazetteer, this list can be turned into a detailed map of religious communities.
7.13.2 State Committee for Work with Religious Communities
The State Committee for Work with Religious Communities (http://www.scwra.gov.az/) has launched an interactive map of religious institutions with information such as physical and historical descriptions. The map is of limited use because there is no option to download and save the entire dataset to a file.
7.14 Significant Events
7.15 Social Groups
7.16 Transportation Use
7.17 Water Supply and Control
7.17.1 Azersu OJSC
Azersu OJSC (www.azersu.az) supplies drinking water and sanitation services. For each administrative district, it provides the number of residential areas served; the number of subscribers served (population and non-population); the names of the water sources; the number of water reservoirs; and the lengths of the pipelines and of the sewerage network.
7.17.2 Azerbaijan Amelioration and Water Management Open Joint Stock Company
Azerbaijan Amelioration and Water Management Open Joint Stock Company (www.mst.gov.az) provides information in Azerbaijani, English, and Russian, but most of the information is in the Azerbaijani section. A survey of this site found that the only data useful for conversion into a geographic format is a melioration (land improvement) map of the country in .jpg format.
8 Turkey Sources
8.1 Collection of Four or More Human Geography Themes
8.1.1 MetroPOLL Strategic and Social Research Center
MetroPOLL Strategic and Social Research Center (http://www.metropoll.com.tr/) conducts public opinion surveys on topics such as trust in the government, political attitudes, and perceptions of Turkey’s problems. Reports are freely available and summarize the results at the national level.
The English version of the website does not provide details on the sampling methodology or the availability of microdata (the Turkish version may be more informative).
8.1.2 BiLGESAM | Wise Men Center for Strategic Studies
BiLGESAM | Wise Men Center for Strategic Studies (http://www.bilgesam.org/en/index.php?option=com_content&view=frontpage&Itemid=1) is a research center that addresses global problems as they relate to Turkey. Several reports based on two national surveys are available. The reports mainly cover topics such as attitudes toward Kurds and expectations for a new constitution.
8.1.3 Turkish Statistical Institute
The Turkish Statistical Institute (http://www.turkstat.gov.tr/Start.do) is the major provider of official statistics on numerous topics, such as demography, crime, and the economy. Many statistics are available at sub-national levels, such as regions, sub-regions, provinces, and districts (http://tuikapp.tuik.gov.tr/Bolgesel/menuAction.do?dil=en).
8.1.4 Türkiye.gov.tr
Türkiye.gov.tr (https://www.turkiye.gov.tr/) is the Turkish government’s portal to a wide array of electronic data sources. However, the site is in Turkish. In addition, accessing data through official government sources often requires a Turkish ID number, similar to a Social Security Number in the U.S.
8.1.5 Hurriyet Daily News
Hurriyet Daily News (http://www.hurriyetdailynews.com/) covers domestic (Turkish) as well as world news. It is the oldest English-language daily newspaper currently published in Turkey.
8.1.6 Posta
Posta (http://www.posta.com.tr/) is a Turkish-language daily newspaper covering domestic and international news.
8.1.7 General Command for Mapping
The General Command for Mapping (http://www.hgk.msb.gov.tr/english/index.php) is responsible for producing maps related to astronomy, topography, cadaster, geology, and other areas.
However, all but the most basic maps must be purchased, and some maps may be available only to government agencies (presumably Turkish ones).
8.2 Communications and Media
8.2.1 ICTA (Information and Communications Technologies Authority)
ICTA (http://btk.gov.tr/) is the national communications regulatory authority. The website is in both English and Turkish, though the Turkish version appears to be more complete and provides links to statistical data and reports. None of the data are in a GIS-ready format; however, province-level communications statistics for 2007-2012 are available in .xls format and can be joined to an appropriate administrative layer in GIS.
8.3 Demographic and Human Population Measures
8.3.1 Institute of Population Studies at Hacettepe University
The Institute of Population Studies at Hacettepe University (http://www.hips.hacettepe.edu.tr/eng/index.html) conducts research on the demographic, social, economic, cultural, and medical aspects of population. It conducts the Turkey Demographic and Health Survey every five years. It also conducts other surveys, the most recent being the Turkey National Maternal Mortality Study (2005), the Turkey Migration and Internally Displaced Persons Survey (2005), and the National Research on Domestic Violence Against Women in Turkey Survey (2008). Summary reports are freely available and can be used to extract some georeferenced data, but microdata must be requested from the Institute.
8.3.2 General Directorate of Prisons and Detention Houses
The General Directorate of Prisons and Detention Houses (http://www.cte.adalet.gov.tr/) provides only general annual statistics on the prison population. The site is in Turkish only. It also links to other justice- and crime-related institutions, which may be useful.
In particular, it links to the Department of Probation (http://www.cte-ds.adalet.gov.tr/), which has an interactive map showing the locations and addresses of probation offices. The data are not readily available for GIS input but can be collected from the interactive map.
8.4 Economy
8.4.1 Republic of Turkey Ministry of Development
The Republic of Turkey Ministry of Development (http://www.dpt.gov.tr/) is the former State Planning Organization (reorganized in 2011). The agency conducts studies and develops policies on social, economic, and cultural development, and it provides national-level statistical data in these fields (in Turkish and English). The former State Planning Organization website (http://www.devplan.org/) may offer additional data in the form of reports; for example, a report on building construction and parcel statistics provides sub-national data for 2009. The report is in PDF format, and the data must be prepared for GIS input.
8.4.2 Turkish Patent Institute
The Turkish Patent Institute (http://www.turkpatent.gov.tr/; http://www.tpe.gov.tr/) is an intellectual property organization. It provides data on patents and related information, primarily in the form of reports. Some national-level statistics are available as tables (.csv) and can easily be integrated into GIS.
8.4.3 Ministry of Labor and Social Security
The Ministry of Labor and Social Security (http://www.csgb.gov.tr/) provides reports and statistics on workers, union members, strikes, and wages. Most of the data are at the national level, but some may be available at the sub-national level. The site requires knowledge of Turkish. The data are in .pdf format, which requires preprocessing before GIS input.
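Several of the Turkish sources above, such as the ICTA province-level .xls statistics (8.2.1) and the sub-national Turkish Statistical Institute tables (8.1.3), deliver plain tables that must be joined to an administrative boundary layer. Province names are the natural join key, but depending on the source they may be written with or without Turkish characters and in different letter case. The Python sketch below shows one way to normalize names before an attribute join; it assumes the table has been saved as CSV with a hypothetical `province` column and that the boundary layer's attribute records are available as dictionaries with a hypothetical `NAME` field. The values are illustrative, not real statistics.

```python
import csv
import io
import unicodedata


def province_key(name: str) -> str:
    """Normalize a Turkish province name to an ASCII upper-case join key,
    so that 'Şanlıurfa', 'SANLIURFA' and 'Sanliurfa' all give 'SANLIURFA'."""
    name = name.strip().replace("ı", "i")  # dotless ı has no NFKD decomposition
    decomposed = unicodedata.normalize("NFKD", name)  # İ -> I + dot, Ş -> S + cedilla
    return decomposed.encode("ascii", "ignore").decode("ascii").upper()


def join_table(features, csv_text, layer_field="NAME", table_field="province"):
    """Copy table columns onto matching layer records (in place) and
    return the table's province names that matched no layer record."""
    rows = {province_key(row[table_field]): row
            for row in csv.DictReader(io.StringIO(csv_text))}
    matched = set()
    for feature in features:
        key = province_key(feature[layer_field])
        row = rows.get(key)
        if row is not None:
            # Values arrive as strings; convert to numbers before analysis.
            feature.update({k: v for k, v in row.items() if k != table_field})
            matched.add(key)
    return [row[table_field] for key, row in rows.items() if key not in matched]


# Hypothetical boundary-layer attributes and table, for illustration only.
layer = [{"NAME": "İzmir"}, {"NAME": "Şanlıurfa"}, {"NAME": "Ankara"}]
csv_text = "province,value\nIZMIR,100\nSANLIURFA,50\nKaraman,7\n"
unjoined = join_table(layer, csv_text)
```

Reporting the unjoined rows, rather than silently dropping them, makes spelling mismatches visible; in a desktop GIS the same normalized key could be stored as a new field on both the table and the layer and used in an ordinary attribute join.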
8.4.4 Social Security Agency of the Republic of Turkey
The Social Security Agency of the Republic of Turkey (http://www.sgk.gov.tr/) provides information in several languages, but the Turkish section is the most informative. It features an interactive map of provinces with information such as the number of employees and the population receiving different types of social security assistance. The data are not readily available for GIS input and must be collected from the interactive map. Additional data may be available in reports and other publications.
8.4.5 General Directorate of Petroleum Affairs
The General Directorate of Petroleum Affairs (http://www.pigm.gov.tr/) is a petroleum sector agency; the Turkish section of its site is the most informative. It provides information on oil and natural gas exploration at the national level, and some information may be available by province. The data are in .xls format and can easily be used for GIS input.
8.4.6 Ministry of Environment and Urbanization, Air Quality Monitoring Network
The Ministry of Environment and Urbanization’s Air Quality Monitoring Network (http://www.havaizleme.gov.tr/) provides air quality data recorded by monitoring stations on a daily, weekly, and monthly basis. Data can be obtained from an interactive map or generated as a report for each station. Because the information for each station includes its longitude and latitude, the station data can be converted into a GIS point layer.
8.4.7 General Directorate of Electrical Power Survey and Development
The General Directorate of Electrical Power Survey and Development (http://www.eie.gov.tr/) provides data on energy-related topics (e.g., consumption in kWh) at the 3rd administrative level for 2010-2011, presented as an interactive map (http://www.eie.gov.tr/il_enerji.aspx). Other projects include solar, wind, and hydroelectric energy potential atlases.
Data from these atlases are presented as interactive maps and as published reports at the third administrative level. None of these data are readily available for GIS input.
8.5 Education
8.5.1 Republic of Turkey Ministry of National Education
The Republic of Turkey Ministry of National Education (http://www.meb.gov.tr/english/indexeng.htm) publishes a statistical bulletin on educational indicators. The latest available bulletin covers 2012-2013. Most data are available at the 3rd administrative level (provinces). The website has Turkish and English sections; the Turkish section is more informative.
8.5.2 Republic of Turkey Ministry of National Education, General Directorate of Secondary Education
The Republic of Turkey Ministry of National Education, General Directorate of Secondary Education (http://ogm.meb.gov.tr/) provides statistics on secondary schools (in Turkish only). The data are presented as an online table; clicking a school’s name displays that school’s statistics for selected years (2010 is the most recent). The same information can be downloaded as a Microsoft Access database, though the database’s macros appear to be broken and must be fixed before the data can be viewed. Once the database is repaired, it should be possible to create geographic data for GIS input from the school locations (each school’s location appears to be given as an address and a province, i.e., the 3rd administrative level).
8.6 Ethnicity
8.6.1 The Kurdish Institute of Paris
The Kurdish Institute of Paris (http://www.institutkurde.org/) is an independent organization that supports activities aimed at expanding knowledge of the Kurdish community, its language, and its culture. The website links to online publications, conference proceedings, and other publications.
None of the data are directly suitable for GIS input, but the publications and reports provide general background, and some information extracted from them may be useful in GIS.
8.7 Language
8.8 Land: Cultural Terrain
8.9 Land: Ownership
8.9.1 Deed Inquiry: TAKBIS
The Land Registry and Cadastre deed inquiry service for land purchases (TAKBIS, http://www.takbis.org/) provides data on land ownership in Turkey. It appears to be geared toward providing information to potential land purchasers. As with many Turkish data sources, access to the search features requires paying a fee, having a Turkish ID, or both. Land ownership data in Turkey are also incomplete. However, with funding support from the World Bank, Turkey has undertaken a project that will eventually map all land parcels and their ownership.
8.9.2 General Directorate of Land Registry and Cadastre
The General Directorate of Land Registry and Cadastre (http://www.tkgm.gov.tr/) provides data on land ownership in Turkey. As with TAKBIS (8.9.1), access to the search features requires paying a fee, having a Turkish ID, or both, and the ownership data remain incomplete pending the World Bank-supported project to map all land parcels and their ownership.
8.10 Land: Use and Cover
8.10.1 Ministry of Forestry and Water Affairs of Republic of Turkey
The Ministry of Forestry and Water Affairs of the Republic of Turkey (http://cbs.ormansu.gov.tr/) has launched a geoportal that provides a variety of forestry-related data. Presumably, the data can be downloaded in ESRI shapefile, raster, and/or geodatabase format. However, the site requires a working knowledge of Turkish (the English version is less complete, and many of its links are broken). In September 2013, several attempts to download data from the geoportal were unsuccessful.
It is unclear whether the requests failed, whether the geoportal was not yet fully operational, or whether users must first register.
8.10.2 General Directorate of Combating Desertification and Erosion
The General Directorate of Combating Desertification and Erosion (http://www.cem.gov.tr/) is part of the Ministry of Forestry and Water Affairs. The agency is dedicated to soil conservation, flood control, and the protection and development of natural resources. The website is in Turkish only. It provides data in the form of reports and graphs, many at the national level. Some data may be available at the sub-national level but must be extracted from the reports. No GIS-ready data are available; any data extracted from the reports must be converted into an appropriate format.
8.10.3 General Directorate of Forestry
The General Directorate of Forestry (http://web.ogm.gov.tr/) provides statistics on forest conditions, forest fires, and similar topics in published national-level reports. Some data may be available at the sub-national level, but they must be extracted from the reports and prepared for GIS input. An English version of the site is available, but it is less complete than the Turkish version.
8.11 Medical and Health
8.11.1 Institute of Population Studies at Hacettepe University
The Institute of Population Studies at Hacettepe University (http://www.hips.hacettepe.edu.tr/eng/index.html) conducts research on the demographic, social, economic, cultural, and medical aspects of population. It conducts the Turkey Demographic and Health Survey every five years. It also conducts other surveys, the most recent being the Turkey National Maternal Mortality Study (2005), the Turkey Migration and Internally Displaced Persons Survey (2005), and the National Research on Domestic Violence Against Women in Turkey Survey (2008).
National-level summary reports are freely available and can be used to extract some georeferenced data, but microdata must be requested from the Institute.
8.11.2 Northern Cyprus Ministry of Health of the Republic of Turkey
The Northern Cyprus Ministry of Health of the Republic of Turkey (http://www.saglikbakanligi.com/) provides information in Turkish only, and the website’s design makes it difficult to use Google Translate. The website provides health-related statistics from 2002 through 2012, in formats that vary by year (e.g., published reports and .xls/.csv files).
8.12 Organizations
8.13 Religion
8.13.1 The Presidency of Religious Affairs
The Presidency of Religious Affairs (http://www.diyanet.gov.tr/) is the branch of government responsible for regulating religious services. Only a Turkish version of the website is available. It provides several statistical tables (mostly at the national level, though some cover the third administrative level) on the number of religious organizations, people attending religious schools, the number of religious and non-religious personnel, and other topics.
8.14 Significant Events
8.14.1 The Official Gazette of Turkey
The Official Gazette of Turkey (http://www.resmigazete.gov.tr/) documents government actions such as legislation, treaties, executive orders, judicial decisions, and other official announcements. The content mostly covers actions of the executive branch; editions are available back to 1921, and the website has a search feature. Using the website requires a working knowledge of Turkish.
8.14.2 The Grand National Assembly of Turkey
The Grand National Assembly of Turkey (http://www.tbmm.gov.tr/) provides documentation of its activities, with some data going back to 1920.
Much of the data consists of Turkish-language PDF records of Assembly meetings, similar to the Congressional Record of the United States.
8.14.3 Indiegogo
In June 2013, a protest movement in Turkey against the demolition of Gezi Park in Istanbul grew into a broader movement against the ruling AK Party that united Turks across society. A documentary film in progress (http://www.indiegogo.com/projects/istanbul-united-the-movie) examines how this political movement united even the fans of three rival football teams. The documentary explores how the political situation in Turkey made it possible for “ultra” football fans to set aside their sporting rivalries and work together for political change. This is an unusual but potentially insightful data source.
8.14.4 BELGEnet
BELGEnet (http://www.belgenet.net/) provides district-level election data from 1954 onward; the latest dataset is from 2007.
8.15 Social Groups
8.16 Transportation Use
8.17 Water Supply and Control
8.17.1 State Hydraulic Works
State Hydraulic Works is a state agency responsible for Turkey’s water resources. The website is in Turkish only. Some basic statistics appear to be available (e.g., the amount of groundwater at a given reservoir). In addition, an interactive map of surface water is available at http://rasatlar.dsi.gov.tr/. None of the data appear to be easy to download and integrate into GIS.
9 Conclusions
Developing countries often lack an internal spatial data infrastructure, that is, a coordinated framework of data, metadata, users, and tools that interact to support the creation and use of spatial data. In some cases, developing countries choose not to disclose their data, especially at finer spatial scales. Open-source data for Azerbaijan and Turkey are often unavailable below the district level (third administrative level).
However, Azerbaijan and Turkey are actively working to improve their digital mapping capabilities. Crowd-sourced spatial data from projects such as OpenStreetMap are becoming a valuable tool in geospatial research for quickly assessing on-the-ground conditions; however, the potential flaws of such data must be understood, and a hybrid approach to data integration is needed. The statistical agencies of Azerbaijan and Turkey publish a variety of social and economic data, but users should keep in mind that many developing countries lack full transparency and may alter statistics for political purposes. For this reason, it is useful to also consult independent sources when possible. Occasionally, high-quality data may be obtained from independent researchers and scholars, though this requires establishing working relationships with these individuals and/or institutions.