Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Big Ideas in Big Data?
French-British Workshop on Big Data - London, November 2012
Monica Marinucci
Director of Research, Oracle Global Education & Research Industry Unit
Big Data in Research: Volume
Exponential growth in data and the ability to access critical information
The volume of worldwide climate data
is expanding rapidly, creating
challenges for physical archiving
and sharing, and for easy access to
relevant information in a
multidisciplinary environment
Volume
Very large
quantities of data
The volume of earth-observation
data from the European Space Agency's
satellites passed 3PB in 2007 and
the projection for 2020 is a seven-fold increase
[Figure: Evolution of ESA's EO Data Archives between 1986-2007 and future estimates (up to 2020). Y-axis: Total Archive in Terabytes (TB), 0 to 22000. X-axis: Year, 1986 to 2020.]
Series shown in the figure:
• Future Data Estimates
• LANDSAT 2-4 MSS (75-Dec 93)
• AQUA Modis (April 03-today)
• ENVISAT LR (March 02-today)
• ENVISAT HR (March 02-today)
• TERRA Modis (June 01-today)
• QUICKSCATT (01-today) / PROBA (May 02-today)
• LANDSAT 7 ETM (April 99-Dec 03)
• SEASTAR SeaWifs (Apr 98-today)
• ERS 2 HR (May 95-today)
• ERS 2 LBR (May 95-today)
• JERS SAR/OPS VNIR (92-Sep 98)
• ERS 1 HR (Jul 91-Mar 00)
• ERS 1 LBR (Jul 91-Mar 00)
• SPOT 1-4 HRV (87-today)
• MOS 1, 1b MESSR (87-Oct 93)
• NOAA 9-17 AVHRR (86-today)
• LANDSAT 5 TM (April 84-today)
• NIMBUS 7 (Nov 78-May 86), SEASAT (Jun-Oct 78)
Big Data in Research: Velocity
Rapid growth in speed of data generation
The LOFAR radio interferometer
is producing 1.6TB/sec, setting
new frontiers for radio astronomy
In high energy physics, the Large
Hadron Collider generates 60TB of
data per day
Velocity
Extremely
fast streams of data
© CERN
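The two rates quoted on this slide use different time bases (per second and per day), so a quick conversion helps compare them. The sketch below is only illustrative: it assumes decimal units (1 TB = 10^12 bytes) and treats both figures as sustained averages, neither of which is stated on the slide. The function and variable names are hypothetical.

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.
TB = 10**12
GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def per_second_to_per_day(bytes_per_second):
    """Convert a sustained byte rate per second into bytes per day."""
    return bytes_per_second * SECONDS_PER_DAY


def per_day_to_per_second(bytes_per_day):
    """Convert a daily byte volume into an average bytes-per-second rate."""
    return bytes_per_day / SECONDS_PER_DAY


# Figures quoted on the slide, assumed to be sustained averages.
lofar_tb_per_day = per_second_to_per_day(1.6 * TB) / TB
lhc_gb_per_second = per_day_to_per_second(60 * TB) / GB

print(f"LOFAR: {lofar_tb_per_day:,.0f} TB per day")
print(f"LHC:   {lhc_gb_per_second:.2f} GB per second")
```

Under these assumptions, 1.6TB/sec corresponds to roughly 138,000 TB per day, while 60TB per day averages out to about 0.7 GB per second.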
Big Data in Research: Variety
Enterprise infrastructure ability to quickly accommodate new data sources
© CERN
The proposed Large Synoptic Survey
Telescope will record 30 trillion bytes
of image data every day
© CERN
Variety
Wide range of
data type characteristics
In genomics, scientists can on average
fully sequence 167 individuals
per week, generating 250GB of
images (or 200 movie files)
Big Data in Research: Value
Ability to translate raw data into information and knowledge
In genomics the cost of
sequencing is dropping
by 50% every 5 months
“… analysis, not
sequencing, will be the
main expense hurdle”
Value
High potential value
if harnessed correctly
(Chris Ponting, University of Oxford,
UK, in the Feb 2011 article “Will
Computers Crash Genomics?”)
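The "dropping by 50% every 5 months" claim describes exponential decay, which can be made concrete with a small calculation. This is a minimal sketch that assumes the halving rate stays constant over time, which the slide does not claim; `relative_cost` is a hypothetical helper name.

```python
def relative_cost(months, halving_period_months=5.0):
    """Cost after `months`, as a fraction of the starting cost.

    Assumes the cost halves every `halving_period_months` at a constant rate.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    return 0.5 ** (months / halving_period_months)


for months in (5, 10, 12, 24):
    print(f"after {months:2d} months: {relative_cost(months):.1%} of the starting cost")
```

If the trend held, the cost after one year would be a little under a fifth of the starting cost, which is consistent with the quoted view that analysis rather than sequencing becomes the main expense.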
New Frontiers in silico
Materials Science: Nanotube composites
Nature 447
http://www.bcu.ac.uk/elss
• (Extremely) Large Data Volumes
Storage
Metadata
Access
Exascale computing
The Carleton Wind Turbine
http://onlyhdwallpapers.com
• Global Collaborations
Data sets integration
Large scale simulations & modeling
Context based
Visualisation
• Cross-Discipline Research
Cross-breeding of technology and innovative methods inspired by new
collaborations and exchange of methods and approaches
http://compbio.cs.toronto.edu/l
Oracle Labs
• To look for novel approaches and methodologies
• To focus on real-world outcomes: to develop
technologies that will someday play a significant role in
the evolution of technology and society.
• 4 main areas:
• Exploratory research
• Directed research
• Consulting
• Product incubation
Challenges
Erasmus Medical Centre
• Complex data processing and analysis.
• Ability to
• load huge volumes of data in minimum time
• store these data and the genomic DNA research results on disk
• have an efficient system that delivers good query performance
Results
Thanks to an Exadata-based solution, Erasmus Medical Centre achieved:
• A query that previously took 11 minutes ran in 1 second on Exadata, a major advantage for researchers who need immediate results
• Smart Scan and Flash Card: improved performance when analysing data
• Hybrid Columnar Compression: the ability to manipulate TB of data (compression from 133 GB to 11 GB) with increased performance
• Oracle Database 11g features such as partitioning: further performance gains in manipulating and quantifying data obtained through the study of various genomes
More information in the Press Release: Erasmus Medical Center employs Oracle Exadata for DNA research
https://emeapressoffice.oracle.com/Press-Releases/Erasmus-Medical-Center-employs-Oracle-Exadata-for-DNA-research-1a0e.aspx
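The headline figures above can be restated as ratios. The sketch below only does arithmetic on the numbers quoted on this slide; it does not model Exadata itself, and the helper name `ratio` is hypothetical.

```python
def ratio(before, after):
    """Return how many times larger `before` is than `after`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Query time: 11 minutes down to 1 second (figures from the slide).
query_speedup = ratio(11 * 60, 1)

# Hybrid Columnar Compression: 133 GB down to 11 GB (figures from the slide).
compression_ratio = ratio(133, 11)
space_saved = 1 - 11 / 133

print(f"Query speedup:     {query_speedup:.0f}x")
print(f"Compression ratio: {compression_ratio:.1f}x")
print(f"Space saved:       {space_saved:.1%}")
```

Expressed this way, the query improvement is a 660-fold speedup, and the compression figure corresponds to roughly a 12:1 ratio, or about 92% less storage for that data set.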
Visualisation
Courtesy of Prof. Peter van der Spek, Erasmus Medical Centre
Panels:
• PCA CLUSTERS
• HEATMAP
• CHEMICAL STRUCTURES
• CHROMOSOMES
• BRAIN ATLAS
• PATIENT CORRELATION
• PATHWAY NETWORKS
• DNA, RNA & PROTEIN SEQUENCING DATA (Ref, Allele1, Allele2)
Questions addressed by the panels:
• How is every record related to every other?
• What is the range and distribution of values?
• What are the major themes or concepts?
• How are the numeric attributes correlated?
• What are the supported regulatory relationships?
• What is the underlying natural sequence variation?
Innovating with …
© CERN
… however …
Q&A
Thank you