ppt - Department of Computer Science

Big Data: Big Challenges for Computer Science
Henri Bal
Vrije Universiteit Amsterdam
Multiple types of data explosions
High-volume data
10-100x global internet traffic per year (by 2018)
Complex data
Graphics Processing Units (GPUs)
Differences between CPUs and GPUs
● CPU: minimize latency of 1 activity (thread)
  ● Must be good at everything
  ● Big on-chip caches
  ● Sophisticated control logic
[Figure: chip layout showing ALUs, control logic, and cache]
● GPU: maximize throughput of all threads using large-scale parallelism
Example: NVIDIA Maxwell
● 16 independent streaming multiprocessors
● 2048 compute cores
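
The two figures above can be turned into a quick back-of-the-envelope calculation. The Python sketch below derives the number of cores per multiprocessor and shows how a data-parallel workload might be divided over all cores. The workload size and the even-split scheme are assumptions for illustration only; they are not taken from the slides.

```python
import math

# Figures from the slide (NVIDIA Maxwell example).
NUM_SMS = 16       # independent streaming multiprocessors
NUM_CORES = 2048   # compute cores in total

# Cores available inside each streaming multiprocessor.
cores_per_sm = NUM_CORES // NUM_SMS


def elements_per_core(num_elements: int, num_cores: int = NUM_CORES) -> int:
    """Upper bound on elements per core under an even split (assumed scheme)."""
    return math.ceil(num_elements / num_cores)


if __name__ == "__main__":
    n = 1_000_000  # hypothetical workload size
    print(f"{cores_per_sm} cores per multiprocessor")
    print(f"at most {elements_per_core(n)} elements per core for {n} elements")
```

With many more elements than cores, every core stays busy, which is the throughput-oriented design the slide contrasts with the latency-oriented CPU.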
Ongoing GPU work at VU
● Applications
  ● Multimedia data
  ● Digital forensics data
  ● Climate modelling
  ● Radio astronomy data
● Methodologies
  ● Hadoop on accelerators
  ● Programming methods for accelerators
● Teaching GPUs (with UvA)
● National ICT research infrastructure
COMMIT/
Complex data
● Still smaller in volume than astronomy etc.
● Much more complicated, semantically rich data
● Growing fast ...
Semantic web
● Make the Web smarter by injecting meaning so that machines can reason about it
  ● Initial idea by Tim Berners-Lee in 2001
  ● Has since attracted the interest of big IT companies
WebPIE: a Web-scale Parallel Inference Engine
● Web-scale parallel reasoner doing full materialization
  ● Orders of magnitude faster than previous work by using smart parallel algorithms
  ● Jacopo Urbani + Frank van Harmelen (VU)
  ● Christiaan Huygens nomination for Urbani's PhD thesis
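
To make "full materialization" concrete: the reasoner derives every triple that follows from the input and stores it explicitly. The sketch below is a naive, single-machine fixpoint loop over two RDFS-style rules (subclass transitivity and type inheritance). It is not WebPIE's parallel algorithm; the rule selection, the function names, and the `ex:` example data are assumptions chosen for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def apply_rules(known: Set[Triple]) -> Set[Triple]:
    """Apply both rules once and return every triple they derive."""
    derived: Set[Triple] = set()
    subclass_pairs = [(s, o) for (s, p, o) in known if p == SUBCLASS]
    for (a, b) in subclass_pairs:
        for (s, p, o) in known:
            # Rule 1: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
            if p == SUBCLASS and s == b:
                derived.add((a, SUBCLASS, o))
            # Rule 2: (x type a), (a subClassOf b) => (x type b)
            if p == TYPE and o == a:
                derived.add((s, TYPE, b))
    return derived


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Repeat the rules until a fixpoint is reached (no new triples appear)."""
    closure = set(triples)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    # Hypothetical example data.
    data = {
        ("ex:Student", SUBCLASS, "ex:Person"),
        ("ex:Person", SUBCLASS, "ex:Agent"),
        ("ex:alice", TYPE, "ex:Student"),
    }
    for triple in sorted(materialize(data)):
        print(triple)
```

Every pass rescans the whole data set, which is why a naive loop like this does not scale and why smart parallel algorithms matter at Web scale.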
Reasoning on changing data
● WebPIE must recompute everything if data changes
  ● Takes on the order of 1 day on a 64-node compute cluster
● Challenge: real-time incremental reasoning, combining new (streaming) data & historic data
  ● Nanopublications (http://nanopub.org)
  ● Handling 2 million news articles per day (Piek Vossen, VU)
  ● Data streams from (health) sensors & smart phones
● Exploit massive parallel computing and GPUs
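
One way to picture incremental reasoning is delta-driven (semi-naive) evaluation: when new triples arrive, only rule instances that involve at least one new triple are evaluated, instead of recomputing the whole closure. The sketch below assumes the same two RDFS-style rules (subclass transitivity and type inheritance) and handles additions only; deletions are considerably harder and are not covered. All names and the example data are hypothetical, and this is not a description of any system named on the slide.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def derive(delta: Set[Triple], known: Set[Triple]) -> Set[Triple]:
    """Derive triples from rule instances that use at least one delta triple.

    `known` must already contain every triple in `delta`.
    """
    out: Set[Triple] = set()
    for (s, p, o) in delta:
        for (s2, p2, o2) in known:
            if p == SUBCLASS and p2 == SUBCLASS:
                if o == s2:   # delta triple is the first premise
                    out.add((s, SUBCLASS, o2))
                if o2 == s:   # delta triple is the second premise
                    out.add((s2, SUBCLASS, o))
            if p == TYPE and p2 == SUBCLASS and o == s2:
                out.add((s, TYPE, o2))
            if p == SUBCLASS and p2 == TYPE and o2 == s:
                out.add((s2, TYPE, o))
    return out


def add_triples(closure: Set[Triple], new: Iterable[Triple]) -> Set[Triple]:
    """Extend an existing closure in place; return the triples that were added."""
    delta = set(new) - closure
    added: Set[Triple] = set()
    while delta:
        closure |= delta
        added |= delta
        # Only consequences of the latest delta can be new.
        delta = derive(delta, closure) - closure
    return added


if __name__ == "__main__":
    # Historic data, already fully materialized (hypothetical example).
    closure = {
        ("ex:Student", SUBCLASS, "ex:Person"),
        ("ex:alice", TYPE, "ex:Student"),
        ("ex:alice", TYPE, "ex:Person"),
    }
    # One new triple arrives from a stream.
    for triple in sorted(add_triples(closure, [("ex:Person", SUBCLASS, "ex:Agent")])):
        print(triple)
```

The work done per update depends on the size of the delta and its matches in the historic data, not on a full recomputation of the historic closure.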
Other work on complex data
● Use semantic web to describe and reason about computer infrastructure (Cees de Laat, UvA)
● Machine learning using GPUs (Hadoop)
  ● Joint work with Max Welling (UvA)
● Business applications
  ● With Frans Feldberg (VU, Economy)
Discussion
● We can process peta-scale (10^15, LHC) simple data with cluster and grid technology
● Exascale (10^18, SKA) may be feasible with GPUs, but requires new parallel programming methodologies
● Processing complex data is vastly more complicated, even at smaller scales
  ● Complex data is also escalating in size
  ● Dynamic (streaming) data will be next
  ● Processing exa-scale dynamic complex data?