Assessing the feasibility of micro-data access
Atle Alvheim, Assistant Director
Norwegian Social Science Data Services (NSD)
Luxembourg, 26-27 October 2006
www.nsd.uib.no
nsd@nsd.uib.no

There is a lack of “Tools for thought”
“Much more time went into finding or obtaining information than into digesting it.” (Dr. J.C.R. Licklider)
Maximize: (time spent on digesting and thinking) / (time spent on finding and accessing)

Situation today
• Only a fraction of data resources are available online
• Lack of standardization
• Poor integration between data and metadata
• Institutional, legal, and commercial obstacles

Situation tomorrow?
• All empirical data available online
• An integrated gateway for locating and integrating relevant resources
• The ability to browse, visualize, and analyze data online
• Hyperlinks from data to relevant scientific publications and resources
• An empirical feedback system to build the collective memory of a data collection

What are we looking for?
• Data
• Data sharing
• Metadata
• Tools

Why are metadata important?
(Illustration: unlabeled stuff vs. labeled stuff)
The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pd

The functions of metadata
• Finding
• Sharing
• Understanding
• Assessing

Data Documentation Initiative (DDI)
An international XML-based standard for the content, presentation, transport, and preservation of documentation for social science data.

Benefits of the DDI approach
• Interoperability: codebooks can be exchanged and transported seamlessly, and applications can be written to work with these homogeneous documents.
• Richer content: providing the data analyst with broader knowledge about a given collection.
• Single document, multiple purposes: DDI documents are easily imported into online analysis systems, rendering datasets more readily usable for a wider audience.
• Online subsetting and analysis
• Precision in searching: the codebook contains all of the information necessary to produce several different types of output, and field-specific searches across documents and studies are enabled.

A life-cycle model of data
(Figure: combined life-cycle model with the stages Study Concept, Data Collection, Data Processing, Data Archiving, Data Distribution, Data Discovery, Data Analysis, and Repurposing)

A common European data portal
• Metadata is all about communication
• Madiera: a set of tools plus an idea: data is a kernel that facilitates a “discussion”
• Maybe future libraries will consist of datasets with linked or derived knowledge products: books, papers, tables, wikis, etc.
• Could we imagine libraries of hypotheses?
• Libraries of questions and discussions more than of answers?

What were the specified MADIERA objectives?
• The development of an integrated and effective distributed social science portal to facilitate access to a range of data archives and disparate resources (WP3)
• The development of specific add-ons to existing virtual data library technologies, in particular data location technology (WP4)
• The use of a multilingual thesaurus to break down language barriers to the discovery of key resources (WP5)
• An extensive programme to add content at both the data/information and the knowledge levels (WP6)
• Extensive training of data providers and users to encourage the continuous growth of the infrastructure (WP7)

A Web of the Social Sciences
• Building on a distributed model where data and resources are stored and maintained locally
• For the end user, the system will appear as an integrated system
• A virtual data library offering global access to locally supported data holdings
(Diagram labels: data providers, end users)

What, then, is necessary to develop useful procedures?
• Metadata standards: lift data from digits to research information
• Technical solutions and software: information and access systems, in addition to analysis and download possibilities
• Political agreements: conditions for data access
• Economic agreements: logging, audits

Example: a common resource
European Social Survey (ESS)
europeansocialsurvey.org
ess.nsd.uib.no
An academically driven social survey designed to chart and explain the interaction between Europe's changing institutions and the attitudes, beliefs and behaviour patterns of its diverse populations.

Another example: aggregate data
The determinants of active civic participation at European and national level (CIVICACTIVE)
nsd.uib.no/civicactive

A third example: a common entry point
madiera.net
The MADIERA project has developed an effective infrastructure for the European social science community by integrating data with other tools, resources and products of the research process.

A scheme
(Diagram: a Finnish researcher, a Swiss researcher, and an Irish researcher)
• A registration procedure: register with your home archive
• Look up and access data across holdings, e.g. data on Finland (a geographic area), Eurobarometer (a data collection), attitudes towards immigrants (a problem area)

A “Data-archive Political Context” for 20+ national archives
I. There may be money involved
Is the data a free or a commercial good?
There are categories of users: what about non-academic use, or non-CESSDA use?
Who is to set the prices?
II. Varying access rules: the crossing of national borders
Which laws apply? Who sets the rules?
Who is responsible?
What sanctions are available?
III. There are some “common good” data
Eurobarometers, value studies, ISSP, ESS, comparative collections
Could these best be provided from one single point (?)
Charging? Access conditions? Double storage?
IV. It is a good thing to have national archives: they increase the amount of data available and improve accessibility.
National archives need justification and visibility.

(Diagram: national archives NSD, FSD, SSD, ZA, DANS, DDA and UKDA linked through a common portal)
• All use the “NESSTAR Publisher” / DDI: standardised software and standardised documentation
• ELSST: translation possibilities
• Madiera: a common portal for all of Europe, ++
  Portal functionality: link many local servers; search and browse possibilities
• Politics: coordinated access rules
• Money
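Shared DDI documentation is what lets a portal that links many local servers offer field-specific searches across studies. The following is a minimal Python sketch, not the NESSTAR or Madiera implementation. It assumes each archive's codebooks are available as heavily simplified DDI-Codebook-style XML (real DDI 2.x documents declare an XML namespace and contain many more elements); the archive contents, study titles, and the function search_variable_labels are hypothetical. It searches one field, the variable label, across every archive's holdings and reports where each match lives.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified DDI-Codebook-style documents, keyed by archive.
# Real DDI 2.x codebooks declare a namespace and carry far more elements;
# only <titl>, <var name="..."> and <labl> are used in this sketch.
ARCHIVES = {
    "NSD": """<codeBook>
  <stdyDscr><citation><titlStmt><titl>Example survey A</titl></titlStmt></citation></stdyDscr>
  <dataDscr>
    <var name="V1"><labl>Attitudes towards immigrants</labl></var>
    <var name="V2"><labl>Age of respondent</labl></var>
  </dataDscr>
</codeBook>""",
    "FSD": """<codeBook>
  <stdyDscr><citation><titlStmt><titl>Example survey B</titl></titlStmt></citation></stdyDscr>
  <dataDscr>
    <var name="Q7"><labl>Trust in parliament</labl></var>
    <var name="Q8"><labl>Immigrants make the country a better place to live</labl></var>
  </dataDscr>
</codeBook>""",
}


def search_variable_labels(archives, term):
    """Field-specific search: match `term` (case-insensitive) against the
    <labl> of each <var> only, and return
    (archive, study title, variable name, label) for every hit."""
    hits = []
    needle = term.lower()
    for archive, xml_text in archives.items():
        root = ET.fromstring(xml_text)
        # The study title comes from the study description section.
        title = root.findtext("stdyDscr/citation/titlStmt/titl", default="")
        for var in root.iter("var"):
            label = var.findtext("labl", default="")
            if needle in label.lower():
                hits.append((archive, title, var.get("name"), label))
    return hits


for hit in search_variable_labels(ARCHIVES, "immigrant"):
    print(hit)
```

In the distributed model described in the slides, each archive would keep its codebooks on its own server and the portal would query them over the network; the in-memory dictionary here only stands in for those local servers. Restricting the match to the variable label element, rather than scanning each document as free text, is what makes the search field-specific.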