CS 4390/5390 Fall 2014
Shirley Moore, Instructor svmoore.pbworks.com
August 25, 2014
The Industrial Revolution of Big Data (Joe Hellerstein, 2008)
Image credit: http://www.uberb2b.com/b4b-presents-the-first-industry-4-0-miniconference/
Image credit: http://www.opentracker.net/solutions/big-data-analytics
The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades…
Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.
Hal Varian, Google’s Chief Economist
The McKinsey Quarterly, Jan 2009
Data-Intensive Science
Image credit: Synergistic Challenges in Data-Intensive Science and Exascale Computing, DOE
ASCAC Data Subcommittee Report, March 2013, available on science.energy.gov
Slide courtesy of Kathy Yelick
• ATLAS and CMS experiments at the Large Hadron Collider generate data at rates of petabytes per second, running round the clock for a large fraction of the year.
• Climate simulations generating several petabytes of data per year.
• Computational biology and genomics producing extreme volumes of data for
– biophysical simulations of cellular environments
– cracking the code of the genome across species
– correlating observational ecology and models of population dynamics
– reverse engineering the brain
Image credit: wiki.creativecommons.org/Case_Studies/CERN
1. Cognition is limited.
2. Memory is limited.
• The Door Study https://www.youtube.com/watch?v=FWSxSQsspiQ
• The Invisible Gorilla www.invisiblegorilla.com
• Mueller and Krummenacher, “Visual search and selective attention”, Visual Cognition 14: 389-410, 2006.
• Test your working memory capacity
– http://www.gocognitive.net/demo/working-memory-capacity
• The Magical Number Seven Plus or Minus Two
– George Miller, “The magical number seven, plus or minus two: Some limits on our capacity for processing information”, The Psychological Review 63: 81-97, 1956.
• Using perception to point out interesting things
• Using pictures to enhance working memory
MTHIVLWYADCEQGHKILKMTWYN
ARDCAIREQGHLVKMFPSTWYARN
GFPSVCEILQGKMFPSNDRCEQDIFP
SGHLMFHKMVPSTWYACEQTWRN
V
MTHI V LWYADCEQGHKILKMTWYN
ARDCAIREQGHL V KMFPSTWYARN
GFPS V CEILQGKMFPSNDRCEQDIFP
SGHLMFHKM V PSTWYACEQTWRN
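The contrast between the two letter grids above can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original slides: it sets the target letter apart with spaces so that it pops out, as in the second grid. The names `ROWS` and `mark_target` are illustrative assumptions.

```python
# Letter rows transcribed from the first (unmarked) grid.
ROWS = [
    "MTHIVLWYADCEQGHKILKMTWYN",
    "ARDCAIREQGHLVKMFPSTWYARN",
    "GFPSVCEILQGKMFPSNDRCEQDIFP",
    "SGHLMFHKMVPSTWYACEQTWRN",
]


def mark_target(row, target="V"):
    """Surround every occurrence of `target` with spaces so it stands out."""
    return row.replace(target, f" {target} ")


if __name__ == "__main__":
    for row in ROWS:
        print(mark_target(row))
    # Counting is trivial for the machine; the marking is for the human reader.
    print("Occurrences of V:", sum(row.count("V") for row in ROWS))
```

On a real display, color or weight would be a stronger cue than spacing; spacing is used here only because it survives in plain text.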
Which Number Appears Most Often?
15 19 60
33 11 75
57 34 79
18 51 92
73 22 13
71 60 22
17 10 68
73 18 55
65 46 29
Which Number Appears Most Often?
(cont.)
60 73 22
46 92 97
10 58 46
57 17 83
26 99 33
88 92 60
91 29 57
96 12 47
• Can you devise a visualization that makes this task easier?
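One possible answer, offered as a sketch rather than the intended solution: count each value and draw one text bar per distinct number, sorted so the longest bar comes first. The Python below assumes the numbers are transcribed correctly from the two slides; `NUMBERS` and `frequency_bars` are names introduced here for illustration.

```python
from collections import Counter

# Numbers transcribed row by row from the two
# "Which Number Appears Most Often?" slides.
NUMBERS = [
    15, 19, 60, 33, 11, 75, 57, 34, 79,
    18, 51, 92, 73, 22, 13, 71, 60, 22,
    17, 10, 68, 73, 18, 55, 65, 46, 29,
    60, 73, 22, 46, 92, 97, 10, 58, 46,
    57, 17, 83, 26, 99, 33, 88, 92, 60,
    91, 29, 57, 96, 12, 47,
]


def frequency_bars(numbers):
    """Return one text bar per distinct value, most frequent first."""
    counts = Counter(numbers)
    # Sort by descending count, then by ascending value to break ties.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{value:>3} {'#' * count}" for value, count in ordered]


if __name__ == "__main__":
    for line in frequency_bars(NUMBERS):
        print(line)
```

Encoding the count as bar length turns a memory-intensive scanning task into a comparison of lengths that can be read at a glance.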
• http://www.merriam-webster.com/dictionary/visualization
1. formation of mental visual images
2. the act or process of interpreting in visual terms or of putting into visible form
“Computer-based visualization systems provide visual representations of datasets intended to help people carry out some task more effectively.”
--Tamara Munzner
• How does triglyceride level vary by age and income level?
TRIGLYCERIDE LEVEL
Males Females
Income
0-$24,999
$25,000+
Under 65 65 or Over Under 65 65 or Over
250
320
200
150
275
400
450
200
• Can you devise a visualization that enables seeing the answer at a glance?
• Answer questions
• Generate hypotheses
• Make decisions
• See data in context
• Expand memory
• Support computational analysis
• Find patterns
• Tell a story
• Inspire
• What is the purpose of the visualization?
• What techniques are used?
• Are the techniques used effectively?
• Does the visualization focus our attention on important aspects of the data?
• Does the visualization accomplish its purpose?
• How could the visualization be improved?
• Hans Rosling TED Talk
– https://www.youtube.com/watch?v=usdJgEwMinM
• Homework assignment #1
– Write a critique of a visualization from this video
– Bring to class to share on Wednesday, Sept 3
• Course website: http://svmoore.pbworks.com/
(click on Data Visualization)
• Instructor: Shirley Moore
– Office: CCSB 3.0422
– Office hours: MW 3:00-4:00pm, others by appointment
• Teaching assistant: Henry Moncada
– Office: CCSB 3.1202H
– Office hours:
• Grades will be posted on Blackboard
• Approximate breakdown
– 35% homework and lab assignments
– 15% class preparation and participation
– 25% course exam
– 25% project
• Visualization Design and Analysis: Abstractions, Principles, and Methods by Tamara Munzner, AK Peters, 2014 (to appear). Draft available at http://www.cs.ubc.ca/~tmm/courses/533/book/vispmp-draft.pdf
• Visual Thinking for Design by Colin Ware, Morgan Kaufmann, 2008.
• Visualizing Data: Exploring and Explaining Data with the Processing Environment by Ben Fry, O’Reilly, 2007.
• The ParaView Tutorial, Version 4.1, by Kenneth Moreland, Sandia National Lab, 2013. http://www.paraview.org/Wiki/The_ParaView_Tutorial
• Processing
– processing.org
– Programming language and development environment for information visualization applications
– Download and install on your laptop
• ParaView
– www.paraview.org
– Open-source data analysis and visualization application
– Developed to analyze extremely large datasets using distributed computing resources
• Others to be determined
• Implementation of a visualization for a significant dataset
– You may choose your own dataset or use one provided by the instructor
• Report describing background and design decisions
• Presentation during last week of class or final exam period (or special poster/demo session?)
• Work individually or in teams of up to three
• CPS 5310 Simulation and Modeling
• CS 5334 Parallel & Concurrent Programming
• CS 4342 Database Management
• CS 4390/5339 Web-based Systems
• CS 4317/5317 Human-Computer Interaction
• CS 3370 Computer Graphics
• Graphic Design
• Psychology
• Build more complex and insightful visualizations than your current skills and tools allow
• Learn how to effectively communicate information about complex data to others
• Ask questions about, explore, and understand data for your job or research
• The set of people who need to be able to visualize data is growing beyond experts in visualization.
Opportunity: Vizzies Visualization Challenge
• http://www.nsf.gov/news/special_reports/scivis/
• Sponsored by National Science Foundation and Popular Science
• Deadline September 30, 2014
• Cash prizes
• Categories
– Photography
– Illustration
– Posters and graphics
– Games and apps
– Videos
• Extra credit and/or use as basis for project for this class
• Read Munzner Chapter 1
• Readings on Visualization Techniques
• Download Processing software and install on your computer (see me if you don’t have a computer you can use for this)
– Will start using Processing second week of class
– Will also get Processing installed on lab computers
• Start on Homework Assignment 1