Data Science Incubator

This morning
• Context: A Data Science Environment
• Data Science Studio
• Pilot Incubator Program
• Discussion

A 5-year, $37.8 million cross-institutional collaboration

Establish a virtuous cycle
• 6 working groups, each with
  – 3-6 faculty from each institution

Pilot Program Organizers
• Andrew Whitaker, Research Scientist
• Dan Halperin, Director of Research, Scalable Data Analytics
• Jake Vanderplas, Director of Research, Physical Sciences
• Bill Howe, Associate Director

The Data Science Studio
• An open collaborative research space
• A resident data science team
  – Permanent staff of ~5 data scientists doing applied research and development
  – ~15-20 data science fellows (research scientists, visitors, postdocs, students)
• How to engage:
  – Drop-in open workspace
  – Studio “Office Hours”
  – Incubation Program
…plus seminars, sponsored lunches, workshops, bootcamps, joint proposals...

A partnership among…
• Provost
• UW Libraries
• Physics, Astronomy, Arts & Sciences
• eScience Institute
Location: 6th floor, Physics Astronomy Building

Estimated Timeline
• Design phase: Jan-June
• Construction: June-Sep
• Target: October 1, 2014

Incubator Program Overview
• Goal: Create watercooler opportunities and scale our efforts by co-locating collaborations from different fields in the studio
• Protocol: ~1-page proposals for 1-quarter, on-site data science collaborations with us
• What we're looking for: Projects where fruitful collaboration is possible, that have potential for significant impact, and that promise sustained engagement
• This meeting: Pilot program for Spring Quarter to inform the full launch in Fall 2014
http://data.uw.edu/incubator

Spring Incubator Pilot Program Logistics
• Applications due online 3/10
• Each proposal identifies a Project Lead (PL)
  – The person doing the work, not the thesis advisor
• Incubator participants join the studio 2 days/week
  – Days decided collectively by participants and team
• Pilot program operates out of Sieg 326
• Milestones at 3, 6, and 9 weeks
  – Blog posts plus a demo, visualization, IPython notebook, dataset, GitHub repo, preliminary results, etc.
• Networking/poster session during the 9th week

Areas of interest
• scalable data management and analytics
• learning and predictive models
• interactive visualization
• parallel algorithms
• code review, publishing, and reproducibility
• online teaching materials, tutorials

A Live SeaFlow Dashboard
Francois Ribalet, Jarred Swalwell, Ginger Armbrust
[Figure: SeaFlow instrument schematic. Labels: laser, lens, nozzle, microscope objective, pinhole, detectors d1 and d2; signals: FSC (forward scatter), red fluorescence, orange fluorescence]

SeaFlow Ambitions
• SeaFlow is a huge success! NSF wants one on every research vessel (R/V)

SeaFlow Ambitions
• Underway biology should enable adaptive sampling, a sort of “holy grail”:
  – “Wait! We saw a population change between P3 and P4!”
  – “Let’s go back!”
• How can remote collaborators participate?
• What about citizen science?

A Live SeaFlow Dashboard
• Where is the ship? What is it doing?
• Is the instrument working?
• What phytoplankton populations are we seeing?

The AscotDB Project
• A multi-year collaboration between UW Astronomy and UW Computer Science researchers and students
• ASCOT = the AStronomy COllaborative Toolkit
• Goal: Provide an interactive and collaborative environment for analysis of astronomical data
[Figure: AscotDB architecture. Labels: query input from user, Python interface, SciDB, SCALRR, astro images repository, ResultArray, NumPy array, time-series plot, SciDB coadd with sigma clipping, FITS file, AstroJS FITS viewer]

The AscotDB Project
• Interacting browser-based widgets for generating database queries and the associated visualizations
• The resulting visualizations can be shared with collaborators through a browser URL

Pilot cohort desiderata
• good clustering
• alignment with sponsor and program goals
• new directions, new questions
• availability, engagement, commitment
• “do only what we can only do together” (with apologies to Dijkstra)
• clarity and shovel-readiness
• capacity for measurable outcomes

Spring Schedule
• 3/10: Proposals due
• 3/14: Follow-up requests
• 3/21: Pilot participants notified
• 3/31: Spring program start date
• 4/21: First milestone
• 5/12: Second milestone
• 6/2: Third milestone
• 6/6: Poster/networking event
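The milestone dates in this schedule follow from the logistics slide's "milestones at 3, 6, and 9 weeks", counted from the 3/31 start date. A minimal Python sketch of that arithmetic, assuming the year is 2014 (taken from the Fall 2014 launch) and that each milestone falls exactly a whole number of weeks after the start; `milestone_dates` is a hypothetical helper for illustration, not part of any program tooling:

```python
from datetime import date, timedelta

# Assumption: the pilot starts on 3/31/2014 and milestones fall exactly
# 3, 6 and 9 weeks after that start date.
START = date(2014, 3, 31)
MILESTONE_WEEKS = (3, 6, 9)

def milestone_dates(start, weeks):
    """Hypothetical helper: return the date `w` weeks after `start` for each w."""
    return [start + timedelta(weeks=w) for w in weeks]

for n, d in enumerate(milestone_dates(START, MILESTONE_WEEKS), start=1):
    print(f"Milestone {n}: {d:%m/%d}")
```

Running it prints 04/21, 05/12, and 06/02, matching the three milestones listed in the schedule.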