A DCO Data Legacy? - Deep Carbon Observatory

Deep-Time Data Infrastructure:
A DCO Legacy Program
Robert M. Hazen—Geophysical Lab, Carnegie Institution
DCO Data Science Day—RPI—June 5, 2014
Conclusions
Vast, largely untapped, data resources
inform our view of Earth’s dynamic
history over 4.5 billion years.
Combining those deep-time data
resources into a single infrastructure
represents an opportunity for
accelerated “abductive” discovery.
Deep-Time Data Collaborators
Carnegie Institution
Robert Hazen
Xiaoming Liu
Anat Shahar
Rutgers
Paul Falkowski
RPI
Peter Fox
Univ. of Arizona
Robert Downs
Mihai Ducea
Grethe Hystad
Barbara Lafuente
Hexiong Yang
Alex Pires
Joaquin Ruiz
Joshua Golden
Melissa McMillan
Shaunna Morrison
Johns Hopkins Univ.
Dimitri Sverjensky
Charlene Estrada
John Ferry
Namhey Lee
CalTech
Ralph Milliken
Univ. of Maine
Edward Grew
Smithsonian Inst.
Timothy McCoy
Harvard University
Andrew Knoll
Univ. of Manitoba
Andrey Bekker
Indiana University
David Bish
MINDAT.ORG
Jolyon Ralph
Univ. of Michigan
Rodney Ewing
Colorado State
Holly Stein
Aaron Zimmerman
Univ. of Maryland
James Farquhar
John Nance
Univ. of Tennessee
Linda Kah
Univ. of Wisconsin
John Valley
Univ College London
Dominic Papineau
Geol. Survey Canada
Wouter Bleeker
George Mason Univ.
Stephen Elmore
Deep-Time Data Resources
Mineralogy and petrology data:
Mineral species and assemblages
Compositions (including isotopes)
Age (ages)
Geographic location; tectonic setting
Crystal size; morphology; twinning
Solid and fluid inclusions; defects;
Magnetic domains; zoning; exsolution
Surface properties; grain boundaries
Paleobiology data
Fossil species and assemblages
Age
Biominerals; isotopic composition
Molecular biomarkers
Host lithology
Geological/tectonic context
Proteomics data
Enzyme structure and function
Age (from phylogenetics)
Active site composition
Microbial context
Geochemistry data and modeling
Thermochemical data
Equilibrium and reaction path models
Paleotectonic & Paleomagnetic Data
Age
This is the IMA Mineral
Database website, with a
direct link to the Mineral
Evolution Database.
This map displays the localities. The popup
shows the metadata for a given locality.
The Potential of Deep-Time Data
The Premise: Rocks, minerals,
fossils, and life’s biochemistry hold
clues to significant changes in
Earth’s near-surface environment
through 4.5 billion years of history.
The Rise of Atmospheric Oxygen
Lyons et al. (2014) Nature 506, 307-314.
D.E. Canfield (2014) Oxygen. Princeton Univ. Press.
The Rise of Atmospheric Oxygen
?
Kump (2008) Nature 451, 277-278.
The Rise of Atmospheric Oxygen
D.E. Canfield (2014) Oxygen. Princeton Univ. Press.
Lyons et al. (2014) Nature 506, 307-314.
The Rise of Oxygen: Evidence
from redox-sensitive elements
[Figure legend: major metal element; major non-metal element; trace element]
The Rise of Subsurface Oxygen
Geochemical modeling is key.
log fO2 ~ -72
The Rise of Subsurface Oxygen
log fO2 < -68
Siderite
FeCO3
The Rise of Subsurface Oxygen
log fO2 > -43
Azurite
&
Malachite
The Rise of Subsurface Oxygen:
Basalt weathering before/after the GOE
Reaction path calculations reveal changes in mineralogy
as fluids and rocks not in equilibrium react with each
other. Data from Sverjensky et al. (in prep)
What minerals won’t form before the
Great Oxidation Event?
598 of 643 Cu minerals
Chrysocolla
202 of 220 U minerals
319 of 451 Mn minerals
Piemontite
47 of 56 Ni minerals
582 of 790 Fe minerals
Garnierite
Xanthoxenite
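The counts on this slide are easier to compare as percentages. The following is a minimal sketch that only restates the slide's numbers; the dictionary layout and the function name post_goe_share are illustrative choices, not part of the talk.

```python
# Counts transcribed from the slide:
# (minerals that won't form before the GOE, total known minerals of that element).
counts = {
    "Cu": (598, 643),
    "U": (202, 220),
    "Mn": (319, 451),
    "Ni": (47, 56),
    "Fe": (582, 790),
}


def post_goe_share(element):
    """Percentage of an element's minerals that won't form before the GOE."""
    excluded, total = counts[element]
    return 100.0 * excluded / total


for element in counts:
    print(f"{element}: {post_goe_share(element):.1f}%")
```

For every element listed, well over half of the known species fall in the post-GOE group.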
Co-evolution of the geosphere
and biosphere
Biologically mediated changes in
Earth’s atmospheric composition
at ~2.4 to 2.2 Ga represent the
single most significant factor in
Earth’s mineralogical diversity.
Enzymes reveal Earth’s
geochemical history.
Ferredoxin (before the GOE)
Enzymes reveal Earth’s
geochemical history.
Nitrogenase (after the GOE)
The Rise of Subsurface Oxygen
Golden et al. (2013), EPSL
SE HERE
GOE HERE
The Rise of Subsurface Oxygen
Kump (2008) Nature 451, 277-278.
The Rise of Subsurface Oxygen
Hypothesis: There was a
protracted “Great Subsurface
Oxidation Interval” that
postdated the GOE by a billion
years. This interval was the single
most significant factor in Earth’s
mineralogical diversification.
Data-Driven Discovery
Most of what scientists do most
of the time is start with a known
phenomenon, and then collect
relevant data and develop
explanatory hypotheses.
Deduction
Earth’s atmospheric oxidation
influenced the partitioning of
redox-sensitive elements.
Mo, Re, Ni, and Co are
redox-sensitive elements.
Therefore, we deduce that
atmospheric oxidation influenced the
partitioning of Mo, Re, Ni, and Co.
RESULTS: Molybdenite (MoS2) through Time
Golden et al. (2013) EPSL 366:1-5.
SE HERE
GOE HERE
RESULTS: Cu/Ni in carbonates vs. time
Xiaoming Liu et al. (2013)
[Plot: Cu/Ni (0 to 25) versus Age (Ma, 0 to 4000) for Siderite/Ankerite and Calcite/Dolomite, with the GOE and SE marked.]
Induction
Each of the last 5 supercontinent cycles led
to episodes of enhanced mineralization
during intervals of continental convergence.
Mo, Be, B, and Hg are
mineral-forming elements.
Therefore, we predict by induction that
Mo, Be, B, and Hg minerals will display
enhanced mineralization during intervals
of continental convergence.
The Supercontinent Cycle
SUPERCONTINENT        STAGE     INTERVAL (Ga)   DURATION (Myr)
Kenorland (Superia)   Assembly  2.8-2.5         300
                      Stable    2.5-2.4         100
                      Breakup   2.4-2.0         400
Columbia (Nuna)       Assembly  2.0-1.8         200
                      Stable    1.8-1.6         200
                      Breakup   1.6-1.2         400
Rodinia               Assembly  1.2-1.0         200
                      Stable    1.0-0.75        250
                      Breakup   0.75-0.6        150
Pannotia              Assembly  0.6-0.56        40
                      Stable    0.56-0.54       20
                      Breakup   0.54-0.43       110
Pangaea               Assembly  0.43-0.25       180
                      Stable    0.25-0.175      75
                      Breakup   0.175-present   175
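The duration column follows directly from the interval column if the intervals are read in billions of years (Ga) and the durations in millions of years. That reading is an assumption, but it reproduces every listed duration. A short sketch, with CYCLES, duration_myr, and total_myr as illustrative names:

```python
# Intervals transcribed from the supercontinent table, assumed to be in Ga.
# "present" is written as 0.0.
CYCLES = {
    "Kenorland (Superia)": [("Assembly", 2.8, 2.5), ("Stable", 2.5, 2.4), ("Breakup", 2.4, 2.0)],
    "Columbia (Nuna)": [("Assembly", 2.0, 1.8), ("Stable", 1.8, 1.6), ("Breakup", 1.6, 1.2)],
    "Rodinia": [("Assembly", 1.2, 1.0), ("Stable", 1.0, 0.75), ("Breakup", 0.75, 0.6)],
    "Pannotia": [("Assembly", 0.6, 0.56), ("Stable", 0.56, 0.54), ("Breakup", 0.54, 0.43)],
    "Pangaea": [("Assembly", 0.43, 0.25), ("Stable", 0.25, 0.175), ("Breakup", 0.175, 0.0)],
}


def duration_myr(start_ga, end_ga):
    """Length of an interval in millions of years, rounded to the nearest Myr."""
    return round((start_ga - end_ga) * 1000)


def total_myr(name):
    """Sum of the assembly, stable, and breakup durations for one supercontinent."""
    return sum(duration_myr(start, end) for _, start, end in CYCLES[name])


for name, stages in CYCLES.items():
    parts = ", ".join(f"{stage} {duration_myr(s, e)}" for stage, s, e in stages)
    print(f"{name}: {parts} (total {total_myr(name)} Myr)")
```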
RESULTS: The
Supercontinent
CYCLE
The distribution of zircon
crystals through time
correlates with the
supercontinent cycle
over the past 3 billion
years.
(Condie & Aster 2010;
Hawkesworth et al. 2010)
RESULTS: Mo Mineral Evolution
Temporal distribution of molybdenite (MoS2)
Golden et al. (2013) EPSL 366:1-5.
Hg Mineral Evolution
The distribution of mercury
(Hg) minerals through time
correlates with the supercontinent
cycle over the past 3 billion
years, but there is a gap
during Rodinia assembly.
Hazen et al. (2012) Amer. Mineral. 97:1013.
Abduction
Abduction is a form of logical
inference that goes from reliable data
(i.e., observations), to a hypothesis
that seeks to explain those data.
(Paraphrased from Wikipedia)
Abduction
Observations lead to new hypotheses.
We have vast amounts of data on mineral
species, compositions, isotopes, petrologic
context, thermochemical parameters,
tectonic settings, and the co-evolving
biosphere through deep time.
Previously unrecognized patterns and
correlations will emerge from the
integration and evaluation of those data.
Data-Driven Discovery
THE CHALLENGE: Recognizing
statistically meaningful patterns in
large data resources:
1. Correlations among many variables
DATA-DRIVEN DISCOVERY
Large integrated data resources can be
explored with multivariate techniques
(e.g., principal component analysis).
Search for highly
correlated patterns
among linear
combinations of
many different
variables.
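As a rough illustration of searching for correlated linear combinations, the sketch below finds the leading principal component of a small table by power iteration on the covariance matrix. The data are synthetic and the function name is hypothetical; a real analysis would more likely standardize the variables first and use an established numerical library.

```python
import math
import random


def first_principal_component(rows, iterations=200):
    """Return (unit vector, fraction of variance explained) for the leading component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered variables.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the dominant eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Synthetic table: variables 0 and 1 share a hidden driver, variable 2 is pure noise.
rng = random.Random(0)
rows = []
for _ in range(200):
    driver = rng.gauss(0, 1)
    rows.append([driver + rng.gauss(0, 0.1),
                 2 * driver + rng.gauss(0, 0.1),
                 rng.gauss(0, 0.1)])

component, explained = first_principal_component(rows)
print("weights:", [round(w, 2) for w in component])
print("variance explained:", round(explained, 3))
```

The component weights show which variables move together; here the first two dominate and the noise variable gets a weight near zero.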
Data-Driven Discovery
THE CHALLENGE: Recognizing
statistically meaningful patterns in
large data resources:
2. Meaningful trends in data vs. time
RESULTS: Molybdenite (MoS2) through Time
Golden et al. (2013) EPSL 366:1-5.
432 molybdenite samples
Are these trends
statistically significant?
• Analyze equal-sized bins.
• Apply statistical tests:
linear regression of log Re
content vs. time.
(Montgomery et al. 2006)
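The two steps above can be sketched on synthetic data. Nothing below is the Golden et al. dataset: the ages, the trend, and the scatter are invented, and the choice of six bins is an assumption made only because 432 divides evenly by six. The significance check shown is the usual ratio of the fitted slope to its standard error.

```python
import math
import random


def equal_sized_bins(samples, n_bins):
    """Sort (age, value) pairs by age and split them into bins of equal sample count."""
    ordered = sorted(samples)
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ordered[(n_bins - 1) * size:])  # last bin absorbs any remainder
    return bins


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, with the slope's standard error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(residual / (n - 2) / sxx)
    return slope, intercept, slope_se


# Synthetic stand-in for 432 samples: log Re declines with age, plus scatter.
rng = random.Random(42)
samples = []
for _ in range(432):
    age = rng.uniform(0, 3000)  # Ma
    samples.append((age, 2.0 - 0.0005 * age + rng.gauss(0, 0.5)))

bins = equal_sized_bins(samples, 6)
for b in bins:
    mean_age = sum(a for a, _ in b) / len(b)
    mean_log_re = sum(v for _, v in b) / len(b)
    print(f"{len(b)} samples, mean age {mean_age:.0f} Ma, mean log Re {mean_log_re:.2f}")

slope, intercept, slope_se = linear_fit([a for a, _ in samples], [v for _, v in samples])
t_ratio = slope / slope_se
print(f"slope {slope:.5f} +/- {slope_se:.5f}, t ratio {t_ratio:.1f}")
```

A t ratio well beyond about 2 in magnitude suggests the trend is unlikely to be noise alone.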
Data-Driven Discovery
THE CHALLENGE: Recognizing
statistically meaningful patterns
in large data resources:
3. Peak-to-noise problem
Peaks in ages of ~40,000 zircon crystals
Condie & Aster (2010) Precambrian Research 180:227-236.
Monte Carlo Mean Kernel Density Analysis
Condie & Aster (2010) Precambrian Research 180:227-236.
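One way to approach the peak-to-noise problem is to average many kernel density estimates built from resampled, perturbed ages, so that only peaks surviving the resampling stand out. The sketch below is a generic version of that idea on synthetic ages. It is not claimed to reproduce the procedure of Condie & Aster (2010), and the bandwidth, uncertainties, and cluster ages are all invented.

```python
import math
import random


def kernel_density(ages, grid, bandwidth):
    """Gaussian kernel density estimate of `ages`, evaluated at each grid point."""
    norm = 1.0 / (len(ages) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - a) / bandwidth) ** 2) for a in ages)
            for g in grid]


def monte_carlo_mean_density(ages, sigmas, grid, bandwidth, n_trials=50, seed=0):
    """Average the density over trials that resample the ages and jitter them by their errors."""
    rng = random.Random(seed)
    total = [0.0] * len(grid)
    for _ in range(n_trials):
        # Bootstrap: draw sample indices with replacement.
        picks = [rng.randrange(len(ages)) for _ in ages]
        # Perturb each drawn age within its (assumed Gaussian) uncertainty.
        trial = [rng.gauss(ages[i], sigmas[i]) for i in picks]
        for k, density in enumerate(kernel_density(trial, grid, bandwidth)):
            total[k] += density
    return [t / n_trials for t in total]


# Synthetic ages (Ma): two invented clusters, each with a 20 Myr uncertainty.
rng = random.Random(1)
ages = ([rng.gauss(1000, 50) for _ in range(100)]
        + [rng.gauss(2700, 50) for _ in range(100)])
sigmas = [20.0] * len(ages)
grid = list(range(0, 4001, 50))

mean_density = monte_carlo_mean_density(ages, sigmas, grid, bandwidth=50.0)
peak_age = grid[mean_density.index(max(mean_density))]
print("highest mean density at", peak_age, "Ma")
```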
Data-Driven Discovery
THE CHALLENGE: Recognizing
statistically meaningful patterns
in large data resources:
4. Visualization opportunities
Why Do We See the Minerals We See?
Element abundances versus numbers of mineral species
(Hazen, Grew, Downs et al.)
[Plot: Log Number of Minerals versus Log Crustal Abundance (ppm). Linear fit: y = 0.2185x + 1.6926, R² = 0.3295. Too many species: As, Hg, Sb, U. Too few species: Ga, Rb, Hf.]
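The fitted line quoted on this slide (y = 0.2185x + 1.6926, with x the log of crustal abundance in ppm and y the log of the number of mineral species) can be turned into a quick calculator. This is a sketch under the assumption that both logs are base 10; the function names and the example inputs are hypothetical, and with R² = 0.3295 the line is only a loose guide.

```python
import math

# Coefficients transcribed from the slide's linear fit.
SLOPE = 0.2185
INTERCEPT = 1.6926


def predicted_species(abundance_ppm):
    """Number of mineral species the fitted line predicts for a given crustal abundance."""
    return 10 ** (SLOPE * math.log10(abundance_ppm) + INTERCEPT)


def species_excess(abundance_ppm, observed_species):
    """log10(observed / predicted): positive means 'too many species', negative 'too few'."""
    return math.log10(observed_species) - math.log10(predicted_species(abundance_ppm))


# Hypothetical inputs, for illustration only.
print(round(predicted_species(1.0), 1))
print(round(species_excess(1.0, 100), 2))
```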
Why Do We See the Minerals We See?
Island area versus numbers of biological species
(MacArthur and Wilson, 1967)
Why Do We See the Minerals We See?
Cobalt minerals that also incorporate arsenic
What percentage of minerals incorporating element X,
also incorporates element Y? (Hazen, Fox, Downs et al.)
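The question on this slide is a conditional percentage, which can be computed from any table that maps minerals to their constituent elements. The sketch below uses a six-mineral toy catalog, so its percentages are illustrative only and say nothing about the full mineral list; the function name is hypothetical.

```python
# Toy catalog: mineral name -> set of essential elements (a tiny illustrative subset).
minerals = {
    "cobaltite": {"Co", "As", "S"},
    "skutterudite": {"Co", "As"},
    "erythrite": {"Co", "As", "O", "H"},
    "linnaeite": {"Co", "S"},
    "spherocobaltite": {"Co", "C", "O"},
    "arsenopyrite": {"Fe", "As", "S"},
}


def percent_also_containing(catalog, x, y):
    """Of the minerals that incorporate element x, what percentage also incorporates y?"""
    with_x = [elements for elements in catalog.values() if x in elements]
    if not with_x:
        return 0.0
    return 100.0 * sum(1 for elements in with_x if y in elements) / len(with_x)


print(percent_also_containing(minerals, "Co", "As"))  # Co minerals that also hold As
print(percent_also_containing(minerals, "As", "Co"))  # As minerals that also hold Co
```

The measure is not symmetric: swapping X and Y changes the denominator, so the two directions generally give different percentages.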
Why Do We See the Minerals We See?
Frequency distributions of 4933 mineral species: 22% of
mineral species are known from only one locality.
Therefore:
(1) Numerous additional minerals exist on Earth
but as yet remain undescribed.
(2) Numerous other plausible minerals do not
now exist on Earth, but might have in the past,
or might occur on other Earth-like planets.
(3) If we “played the tape over again,” then the
first 4933 minerals to be found would likely
differ by ~1000 mineral species.
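The single-locality statistic quoted above comes from a frequency distribution: for each number of localities, count how many species are known from exactly that many. A minimal sketch of the tally follows, with invented species names and locality counts; only the 4933 and 22% figures are taken from the slide.

```python
from collections import Counter


def locality_frequency(locality_counts):
    """Map 'number of localities' to 'number of species known from that many localities'."""
    return Counter(locality_counts.values())


def single_locality_share(locality_counts):
    """Percentage of species known from exactly one locality."""
    frequency = locality_frequency(locality_counts)
    return 100.0 * frequency[1] / len(locality_counts)


# Invented example: species name -> number of known localities.
toy_counts = {"species_a": 1, "species_b": 1, "species_c": 3,
              "species_d": 12, "species_e": 250}
print(locality_frequency(toy_counts))
print(single_locality_share(toy_counts))

# Slide figures: 22% of 4933 species are known from only one locality.
single_locality_species = round(0.22 * 4933)
print(single_locality_species)
```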
Conclusions
Vast, largely untapped, data resources
inform our view of Earth’s dynamic
history over 4.5 billion years.
Combining those deep-time data
resources into a single infrastructure
represents an opportunity for
accelerated “abductive” discovery.
Data-Driven Discovery
CONCLUSIONS
We are poised to make fundamental
discoveries about our planetary home
through development, integration, and
exploration of deep-time data resources.
Please join this effort:
• Archive your data
• Release “dark data”
• Help us build this resource
Are these trends
statistically significant?
Statistical tests: linear regression of
log Re content vs. time
(Montgomery et al. 2006):
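$\log(\mathrm{Re}) = \beta_0 + \beta_1 t + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$
[$t$ = time; $\beta_i$ = regression parameters;
$x_i$ = indicator variables]
$\beta_0 = 0$; $\beta_1 = 0.0059(8)$; $\beta_2 = 4.6(7)$; $\beta_3 = 12(2)$;
$\beta_4 = 15(2)$; $\beta_5 = 18(2)$; $\beta_6 = 19(2)$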
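To see how the fitted model is evaluated, the sketch below plugs the quoted coefficients into the regression equation. It assumes that each indicator variable equals 1 for exactly one group of samples and 0 otherwise, and that at most one indicator is set at a time. The slide does not say what the groups are or what units time is in, so the group argument is only a label, and the parenthesized uncertainties are ignored.

```python
# Coefficients transcribed from the slide (uncertainties in parentheses omitted).
BETA_0 = 0.0
BETA_TIME = 0.0059
BETA_INDICATOR = {2: 4.6, 3: 12.0, 4: 15.0, 5: 18.0, 6: 19.0}


def predicted_log_re(t, group=None):
    """Evaluate beta_0 + beta_1 * t + beta_k, where k is the group whose indicator is 1.

    With group=None every indicator is 0, leaving only the time term.
    """
    offset = BETA_INDICATOR[group] if group is not None else 0.0
    return BETA_0 + BETA_TIME * t + offset


print(predicted_log_re(100))           # time term only
print(predicted_log_re(100, group=2))  # time term plus the offset for indicator 2
```

In this kind of model the indicator coefficients shift the intercept for each group while all groups share the same slope in time.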
Enzymes reveal Earth’s geochemical history.
David & Alm (2011) “Rapid evolutionary innovation during
an Archean genetic expansion.” Nature 469, 93-96.