Zoo 955: Information Management in Ecology

Zoo 955, http://limnology.wisc.edu/courses/zoo955
Spring 2008
Course Information
Instructors:
Paul Hanson, Center for Limnology, pchanson@wisc.edu
Barbara Benson, Center for Limnology, bjbenson@wisc.edu
Meeting time: 9:55 – 11:50 Wednesdays; two blocks (A, B) per week with a 10-minute break
between blocks. Students are expected to attend both blocks.
Locations: Most class periods will meet in the Center for Limnology Conference room (rm 210).
One or more labs may meet at computer facilities elsewhere on campus.
Course goals: In this seminar you will learn about information management issues spanning a broad
range of research models, from single-investigator projects to large, international research
collaborations. As a group, we will investigate the relationships between information and the
research process. We will have practical activities that use tools and technologies required for
managing ecological data. As part of this seminar, students will create their own well-designed
database, using their data, and tailored to their needs.
Student responsibilities: (a) participate in class discussions and laboratory exercises; (b) read
assigned materials before class; (c) lead or co-lead a one-hour discussion; (d) present a project.
More information on “c” and “d” follows.
Activities:
• Lectures and guest speakers: Instructors or guest speakers will provide lectures on the
topics listed and will lead a discussion. Readings may be provided and should be read
before the lecture.
• Labs: Instructors will guide students through hands-on information management
activities. Computers and software are supplied, although students may choose to use
their own computers. Software will be open source whenever possible.
• Discussions: Each student will choose a discussion topic to lead during one block.
Depending on the number of students, some may need to work in pairs. Sample
discussion topics are listed following the syllabus.
• Student projects: Each student will create a database, using data of her/his choosing.
At the end of the semester, each student will have an opportunity to present the database,
including the model, technology, metadata, etc.
Resources: http://limnology.wisc.edu/courses/Zoo955: Some materials for the course are
available at this Web site.
Syllabus
Date    Block  Activity       Person(s)                             Topic
23 Jan  A      Lecture        Benson                                Introduction
23 Jan  B      Lab            Benson                                Exercise on data structures
30 Jan  A      Lecture        Hanson                                Introduction
30 Jan  B      Lab            Benson, Hanson                        Exercise on data structures (cont.)
06 Feb  A      Lab            Luke Winslow, CFL                     Create a database
06 Feb  B      Lab            Luke Winslow, CFL                     Create a database
13 Feb  A, B   Work day                                             GLEON 6, instructors away
20 Feb  A      Discussion     Amy Kamarainen                        Metadata
20 Feb  B      Lab            Hanson, Winslow                       More database tools
27 Feb  A      Lecture        Jeff Maxted, CFL                      Spatial data
27 Feb  B      Lecture        Jeff Maxted, CFL                      Spatial data
05 Mar  A      Discussion     Hanson                                Sensor networks
05 Mar  B      Guest speaker  Michael Hamilton, James Reserve, CA   Embedded ecological sensor networks
19 Mar  A, B   Spring break
23 Apr  A      Student proj.  Katrina, Lucas                        Project presentations
23 Apr  B      Cancelled                                            CFL Field planning meeting
30 Apr  A      Student proj.  Steve, Noah, Amy                      Project presentations
30 Apr  B      Student proj.  Matt, Sarah, Ann                      Project presentations
07 May  A      Student proj.  Hanson, Benson                        Summary
07 May  B      Student proj.  Hanson, Benson                        Summary

12 Mar, 26 Mar, 02 Apr, 09 Apr, 16 Apr (blocks A and B): the row alignment for these ten
blocks is not recoverable; the entries for each column are listed in the order they appear.
Activity: Discussion; Discussion; Discussion; Discussion; Guest speaker; Discussion;
Guest speaker; Discussion; Discussion; Guest speaker
Person(s): Katrina Butkas; Matt van de Bogert; Lucas Mayer-Horner; Deana Pennington, LTER
Network Office; Steve Powers; Peter McCartney, NSF; Sarah Johnson; Ann Busche; Noah Lottig;
Alain Roy and Todd Tannenbaum, UW Computer Science
Topic: Current practices of documenting flow of scientific analyses; Scientific workflows;
Hierarchical collaborations using a long-term dataset: a local case study; Collaboration
technologies; Structure and function of cross-disciplinary collaborations and the flow of
information; Young data; A perspective from the National Science Foundation on IM in
biological sciences; IM at the organizational level; Top 20 IM needs of students;
Open Science Grid – International collaborative networks
Readings by Date/Topic:
20 Feb 2008: Metadata
Michener, W. K., J. W. Brunt, J. J. Helly, T. B. Kirchner, and S. G. Stafford. 1997.
Non-geospatial metadata for the ecological sciences. Ecol. Appl. 7:330-42.
05 Mar 2008: Sensor Networks
Collins, Scott L., Bettencourt, Luis M.A., Hagberg, Aric, Brown, Renee F., Moore,
Douglas I., Bonito, Greg, D. 2006. New opportunities in ecological sensing using
wireless sensor networks. Frontiers in Ecology and the Environment 4(8): 402-407
<Collins et al 2006.pdf>
Estrin, Debra et al. 2003. Environmental Cyberinfrastructure Needs for Distributed
Sensor Networks: A Report from a National Science Foundation Sponsored
Workshop. (Introduction; p.25 Box 6 on ENS; p.29 Box 7 on CLEANR;
Chapter 6; Chapter 8) <Estrin et al. 2003>
Porter, J. et al. 2005. Wireless sensor networks for ecology. BioScience 55(7):
561-572 <Porter et al 2005.pdf>
05 Mar 2008: Sensor Networks
Altintas, I., C. Berkeley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock. 2004.
Kepler: an extensible system for design and execution of scientific workflows.
Proc. 16th Int. Conf. Sci. Stat. Database Manag.
Deelman, E.W, and Y. Gil. Workshop on the Challenges of Scientific Workflows.
National Science Foundation. http://www.isi.edu/nsf-workflows06
Pennington, D.D., D. Higgins, A. Townsend Peterson, M.B. Jones, B. Ludascher,
and S. Bowers. Ecological Niche Modeling Using the Kepler Workflow System.
Report.
02 Apr 2008: Large, Collaborative Networks
09 Apr 2008: National Science Foundation
NSF Cyberinfrastructure Council. 2007. Cyberinfrastructure Vision for 21st Century
Discovery. National Science Foundation. http://www.nsf.gov/pubs/2007/nsf0728
Sample Discussion Topics
1. Data models: A data model is the way in which data are organized, including data
types, relationships between variables, variable grouping, and the relationships between
metadata and data. What data models exist? Which are most commonly used in ecology? How do
they differ as a function of the system they represent, e.g., can bird data be organized
differently from water chemistry data?
2. Metadata: Metadata is the contextual information needed to use a data set (data about the
data). What metadata are important for scientific reuse of data and why? What metadata
standards have been developed for ecology? What incentives might be provided to
researchers to generate good quality metadata for their data sets?
3. Data discovery: Exploring data repositories and searching across data archives requires data
to be “exposed” to the world and tools for accessing those data. How do the data model,
data structure, and metadata facilitate this? What standards exist, such as EML, to help this
process? What tools are being developed to facilitate discovery across multiple IM systems?
What are the techniques for visualizing discovered data?
4. Sensor networks: Tremendous resources have been invested in sensor networks designed to
automatically monitor the environment. The huge volumes of data and the effort required to
deploy and maintain these systems require automation and standardization at many steps.
What are the implications of continuously streaming data for IM? What are the unique
requirements for data models, QA/QC, and data discovery?
5. Semantic mediation, controlled vocabularies, ontologies: Integrating ecological data from
multiple sources can be challenging due to heterogeneity in content, format, scale, semantics,
etc. What are some of the approaches to semantic mediation of data sets? How can a
controlled vocabulary facilitate data integration? What are some examples from ecology of
use of controlled vocabularies? What roles can ontologies play in data integration? Why are
ontologies difficult to create?
6. QA/QC: Ensuring data quality (quality assurance/ quality control) is a critical, though often
underappreciated, component of IM. What are the standards of QA/QC? What tools are
available? Who should be responsible for this – field technicians, database administrators or
researchers? What algorithms are used to perform QA/QC, and how is data quality
represented in the database?
7. IM in large, collaborative networks: LTER, NEON, GLEON, GEON, WATERS, and
CLIME are examples of large collaborative networks. Big science requires big investment in
IM. How do these organizations do it? What resources are required? At what level are their
IMSs compatible?
8. Scientific workflows: IM and data analysis often require repetitive tasks executed in a
defined, though sometimes complicated, sequence. Scientific workflows formalize, automate
and document these tasks. What are the components of a workflow? What tasks lend
themselves to workflows? What tools exist to develop workflows, and who is using them?
9. Collaboration technologies: People have to communicate, and scientific collaborations
increasingly involve geographically dispersed researchers. What are some of
the technologies available that enable scientists in these collaborations to analyze, discuss,
annotate, and view data?
10. Document management: The lifecycle of the research process produces a variety of
information, ranging from proposals to data to manuscripts. New technologies organize and
serve these documents for consumption by people inside the organization, as well as the
general public. What technologies exist for document management, and how do they differ
from traditional file serving? What are the implications for duties and responsibilities for IM
within research organizations? How does this type of technology improve the competitive
advantage of research organizations?
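
Example sketch for topics 1, 2, and 6: The questions on data models, metadata, and QA/QC can
be made concrete with a very small database. The following is a minimal sketch only, written
in Python with the standard-library sqlite3 module. It is not part of the course materials;
the table names, columns, site name, valid range, method description, and flag values are all
hypothetical and chosen purely for illustration. It keeps variable-level metadata (unit,
method, valid range) in one table, observations in another, and records data quality as a
flag stored with each observation.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with a hypothetical two-table data model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Metadata about each measured variable (hypothetical columns).
        CREATE TABLE variable (
            variable_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE,
            unit        TEXT NOT NULL,
            method      TEXT,
            valid_min   REAL,
            valid_max   REAL
        );
        -- One row per measurement; qc_flag stores the data quality.
        CREATE TABLE observation (
            obs_id      INTEGER PRIMARY KEY,
            variable_id INTEGER NOT NULL REFERENCES variable(variable_id),
            site        TEXT NOT NULL,
            sampled_at  TEXT NOT NULL,
            value       REAL,
            qc_flag     TEXT NOT NULL DEFAULT 'unchecked'
        );
    """)
    conn.execute(
        "INSERT INTO variable (name, unit, method, valid_min, valid_max) "
        "VALUES (?, ?, ?, ?, ?)",
        ("water_temp", "degC", "hypothetical thermistor", -1.0, 40.0),
    )
    # Made-up observations: two plausible, one out of range, one missing.
    rows = [
        (1, "lake_A", "2008-02-20T10:00", 3.9),
        (1, "lake_A", "2008-02-20T11:00", 4.1),
        (1, "lake_A", "2008-02-20T12:00", 99.0),
        (1, "lake_A", "2008-02-20T13:00", None),
    ]
    conn.executemany(
        "INSERT INTO observation (variable_id, site, sampled_at, value) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def apply_range_check(conn):
    """Flag each observation using the valid range stored in the metadata table."""
    conn.execute("""
        UPDATE observation
        SET qc_flag = CASE
            WHEN value IS NULL THEN 'missing'
            WHEN value < (SELECT valid_min FROM variable v
                          WHERE v.variable_id = observation.variable_id)
              OR value > (SELECT valid_max FROM variable v
                          WHERE v.variable_id = observation.variable_id)
            THEN 'out_of_range'
            ELSE 'ok'
        END
    """)
    conn.commit()


def flag_counts(conn):
    """Return the number of observations per QC flag as a dict."""
    cursor = conn.execute(
        "SELECT qc_flag, COUNT(*) FROM observation GROUP BY qc_flag"
    )
    return dict(cursor)


if __name__ == "__main__":
    db = build_example_db()
    apply_range_check(db)
    print(flag_counts(db))
```

A range check is only one of many possible QA/QC rules. Because the valid range lives in the
metadata table rather than in the code, the same check applies unchanged to any variable
added later, which is one way the data model and the metadata can support each other.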