Prepared for the Census-MIT Big Data Workshop Series, MIT, November 2015

“Google Now” for Official Statistics -- A Data Source Hypothetical
Micah Altman
Director of Research, MIT Libraries

Overview of “Google Now”

“Google Now” for Official Statistics -- Hypothetical

What is Google Now?
- A personal digital assistant -- compare to Siri, Cortana
- Popular Science 2012 “Innovation of the Year”
- Integrated into the Android mobile operating system
- Usage:
  - 1.4 billion Android users worldwide
  - >52% of U.S. mobile phone users (187.5 million)
  - Use of component services unknown
[Credit: Wikimedia Commons]

What does it do?
- Monitor
  - Track: time, place, activities
  - Listen: when screen is on
- Respond
  - Commands
  - Phone queries
  - In-application queries
- Suggest / Alert
  - Weather
  - Travel
  - News
  - Application-specific

Application Integration
- Google Integration
  - Search: alerts to news based on search patterns
  - Calendar: alerts and travel times to appointments
  - Maps: travel times, traffic
  - Gmail: alerts and tracking based on scanning e-mail for appointments, flights, purchases
  - Others: … Google Fit, Waze
- Third Party Apps
  - App control
  - Integrated alerts/cards
  - Walmart, TripAdvisor, WhatsApp, Lyft, Kayak, RunKeeper ...
- Websites and Email
  - Integrated schema.org markup
  - Reservation schemas

“Google Now” As Data…

Data Collected by Google Now
- Google Now/Android
  - Search history
  - Location history
  - Voice searches & clips
  - Voice models
  - GPS / wifi / cell towers
  - Map / direction interactions
  - Travel integration interactions
- Usage
  - Devices
  - Google services
  - Apps

Data Integrated into Now
- Communications
  - Gmail
  - Instant Message/SMS
  - Call history made through Voice, Hangouts
  - Voicemail
- Personal information
  - Birthdate
  - Email
  - Telephone
  - Address
- Web
  - Search history, Chrome browser history
- Created Content
  - Maps, YouTube videos, blogs, notes, tasks, Drive, photos
- Social
  - Google+
  - YouTube
  - Email graph
  - Proximity graph? - location, Bluetooth
- Transactions
  - Play store
  - Wallet
- Physical Activity
  - Phone accelerometer
  - Linked fitness devices

Some things Google “knows” -- of potential interest to Social Scientists & Census
- About a person
  - Habits: travel, physical activity, web activity, purchasing, communications
  - People: with whom they talk, visit, communicate, share, and share physical locations
  - Current activity: Where are they? What are they doing? With whom?
  - Interests: searches, apps used, news consumed, places visited, products purchased
- About a location
  - Who visits
  - Who occupies & when
  - Traffic patterns related to location
  - Image-derived knowledge -- tags, annotations, lighting, building density (?)

Inferential Challenges
- Sampling
  - Mobile Device Owners < Android Users < Google Service Users … are not a random sample of the population
  - Mobile devices are not always carried/on/connected: observations are a nonrandom sample of activities
- Measurement
  - Accuracy of sensors introduces uncertainty
  - Observed measures, not designed for target inference
- Variability
  - Sensor characteristics vary across devices, conditions
  - Sampling and measurement characteristics change over time
  - New policies, services and algorithms are introduced… changes may be discontinuous
- Treatment
  - Observational data -- selection towards/against treatment possible
  - Unit level contagion and interference

Questions?
E-mail: escience@mit.edu
Web: informatics.mit.edu
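
Illustrative sketch (not part of the original slides): the sampling challenge listed under Inferential Challenges says that device and service users are not a random sample of the population. One common, partial response is post-stratification, which reweights observed units so that known population shares are matched. The sketch below assumes hypothetical age strata, made-up population shares, and made-up "daily trips" values; none of these numbers come from Google, the Census Bureau, or the talk, and the function names are invented for illustration.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per stratum: population share / sample share."""
    total = sum(sample_counts.values())
    weights = {}
    for stratum, count in sample_counts.items():
        if count == 0:
            # A stratum with no observations cannot be reweighted.
            raise ValueError(f"no sample observations in stratum {stratum!r}")
        sample_share = count / total
        weights[stratum] = population_shares[stratum] / sample_share
    return weights


def weighted_mean(values_by_stratum, weights):
    """Weighted mean of unit-level values, using one weight per stratum."""
    numerator = 0.0
    denominator = 0.0
    for stratum, values in values_by_stratum.items():
        w = weights[stratum]
        numerator += w * sum(values)
        denominator += w * len(values)
    return numerator / denominator


# Toy, made-up inputs: younger users are over-represented in the sample.
sample_counts = {"18-34": 6, "35-64": 3, "65+": 1}
population_shares = {"18-34": 0.3, "35-64": 0.5, "65+": 0.2}  # hypothetical
daily_trips = {
    "18-34": [4, 4, 4, 4, 4, 4],
    "35-64": [2, 2, 2],
    "65+": [1],
}

weights = poststratification_weights(sample_counts, population_shares)
n_units = sum(len(v) for v in daily_trips.values())
raw_mean = sum(sum(v) for v in daily_trips.values()) / n_units
adjusted_mean = weighted_mean(daily_trips, weights)

print("unweighted mean:", raw_mean)
print("post-stratified mean:", adjusted_mean)
```

Reweighting of this kind only corrects for the variables used to define the strata. It does not address selection within a stratum (for example, devices that are off or left at home), nor the measurement, variability, and treatment issues listed on the slide.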