Data Science at the University of Washington

Data Science Incubator
This morning
• Context: A Data Science Environment
• Data Science Studio
• Pilot Incubator Program
• Discussion
A 5-year, $37.8 million cross-institutional collaboration
Establish a virtuous cycle
• 6 working groups, each with 3-6 faculty from each institution
Pilot Program Organizers
• Andrew Whitaker, Research Scientist
• Dan Halperin, Director of Research, Scalable Data Analytics
• Jake Vanderplas, Director of Research, Physical Sciences
• Bill Howe, Associate Director
The Data Science Studio
• An open collaborative research space
• A resident data science team
– Permanent staff of ~5 data scientists – applied research and development
– ~15-20 data science fellows (research scientists, visitors, postdocs, students)
• How to Engage:
– Drop-in open workspace
– Studio “Office Hours”
– Incubation Program
…plus seminars, sponsored lunches, workshops, bootcamps, joint proposals...
A partnership among …
• Provost
• UW Libraries
• Physics, Astronomy, Arts & Sciences
• eScience Institute
6th floor Physics Astronomy Building
Estimated Timeline:
• Design Phase Jan-June
• Construction June – Sep
• Target: October 1, 2014
Incubator Program Overview
• Goal: Create watercooler opportunities and scale our efforts by co-locating collaborations from different fields in the studio
• Protocol: ~1-page proposals for 1-quarter, on-site data science collaborations with us
• What we're looking for: Projects that offer fruitful collaboration, potential for significant impact, and sustained engagement
• This meeting: Pilot program for Spring Quarter to inform the full launch in Fall 2014.
http://data.uw.edu/incubator
Spring Incubator Pilot Program Logistics
• Applications due online 3/10
• Each proposal identifies a Project Lead (PL)
– The person doing the work, not the thesis advisor
• Incubator participants join the studio 2 days/week
– Days decided collectively by participants and team
• Pilot program operates out of Sieg 326
• Milestones at 3, 6, 9 weeks
– blog posts + demo, visualization, IPython notebook, dataset, GitHub repo, preliminary results, etc.
• Networking/poster session during 9th week
Areas of interest
• scalable data management and analytics
• learning and predictive models
• interactive visualization
• parallel algorithms
• code review, publishing, and reproducibility
• online teaching materials, tutorials
A Live SeaFlow Dashboard
Francois Ribalet, Jarred Swalwell, Ginger Armbrust
[Figure: SeaFlow optical layout: laser, lens, nozzle, microscope objective, and pinhole, with detectors for forward scatter (FSC), red fluorescence, and orange fluorescence]
SeaFlow Ambitions
• SeaFlow is a huge success! NSF wants one on every R/V (research vessel)
SeaFlow Ambitions
• Underway biology should enable adaptive sampling, a sort of “holy grail”:
  “Wait! We saw a population change between P3 and P4!”
  “Let’s go back!”
• How can remote collaborators participate?
• What about citizen science?
A Live SeaFlow Dashboard
Where is the ship?
What is it doing?
Is the instrument working?
What phytoplankton populations are we seeing?
The AscotDB Project
• A multi-year collaboration between UW Astronomy and UW Computer Science researchers and students
• ASCOT = the AStronomy COllaborative Toolkit
• Goal: Provide an interactive and collaborative environment for analysis of astronomical data.
[Figure: AscotDB architecture diagram. Components: user query input, Python interface, SciDB, SCALRR, astronomical image repository, ResultArray / NumPy array outputs, AstroJSFITS viewer, time-series plot, SciDB co-add with sigma clipping, and FITS file output]
The AscotDB Project
• Interactive browser-based widgets for generating database queries and the associated visualizations
• The resulting visualizations can be shared with collaborators through a browser URL
Pilot cohort desiderata
• good clustering
• alignment with sponsor and program goals
• new directions, new questions
• availability, engagement, commitment
• “do only what we can only do together”
  – with apologies to Dijkstra
• clarity and shovel-readiness
• capacity for measurable outcomes
Spring Schedule
• 3/10: Proposals due
• 3/14: Follow-up requests
• 3/21: Pilot participants notified
• 3/31: Spring program start date
• 4/21: First milestone
• 5/12: Second milestone
• 6/2: Third milestone
• 6/6: Poster/networking event