
Prepared for
Census-MIT Big Data Workshop Series
MIT
November 2015
“Google Now” for Official Statistics
-- A Data Source Hypothetical
Micah Altman
Director of Research
MIT Libraries
Overview of “Google Now”
“Google Now” for Official Statistics -- Hypothetical
What is Google Now?
A personal digital assistant
-- compare to Siri, Cortana
Popular Science 2012 “Innovation of the Year”
Integrated into the Android mobile operating system
Usage:
  1.4 billion Android users worldwide
  >52% of U.S. mobile phone users (187.5 million)
  Use of component services unknown
[Credit: Wikimedia Commons]
What does it do?
Monitor
  Track: time, place, activities
  Listen: when screen is on
Respond
  Commands
  Phone queries
  In-application queries
Suggest / Alert
  Weather
  Travel
  News
  Application-specific
Application Integration
Google Integration
  Search: alerts to news based on search patterns
  Calendar: alerts and travel times to appointments
  Maps: travel times, traffic
  Gmail: alerts and tracking based on scanning e-mail for appointments, flights, purchases
  Others: … Google Fit, Waze
Third Party Apps
  App control
    Walmart; TripAdvisor; WhatsApp ...
  Integrated alerts/cards
    Lyft, Kayak, RunKeeper ...
Websites and Email
  Integrated schema.org markup
  Reservation schemas
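
To make "reservation schemas" concrete, here is a minimal sketch of the kind of structured record a sender could embed in a confirmation e-mail or web page. It uses the public schema.org vocabulary (FlightReservation, Flight, Airline, Airport) as I understand it. The function name, the sample values, and the choice of fields are illustrative assumptions; the slides do not say which properties Google Now actually reads.

```python
import json


def flight_reservation_jsonld(reservation_id, passenger, airline_name, airline_iata,
                              flight_number, dep_iata, arr_iata, dep_time, arr_time):
    """Build a schema.org FlightReservation as a JSON-LD dictionary.

    Hypothetical helper for illustration only; the field selection is an assumption.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FlightReservation",
        "reservationId": reservation_id,
        "reservationStatus": "https://schema.org/ReservationConfirmed",
        "underName": {"@type": "Person", "name": passenger},
        "reservationFor": {
            "@type": "Flight",
            "flightNumber": flight_number,
            "provider": {"@type": "Airline", "name": airline_name, "iataCode": airline_iata},
            "departureAirport": {"@type": "Airport", "iataCode": dep_iata},
            "arrivalAirport": {"@type": "Airport", "iataCode": arr_iata},
            "departureTime": dep_time,  # ISO 8601 timestamps with UTC offset
            "arrivalTime": arr_time,
        },
    }


# Made-up example values; none of these come from the presentation.
example = flight_reservation_jsonld(
    "ABC123", "Jane Doe", "Example Air", "XX", "110",
    "BOS", "SFO", "2015-11-20T08:00:00-05:00", "2015-11-20T11:30:00-08:00",
)
print(json.dumps(example, indent=2))
```

The point for official statistics is that such markup turns free-text e-mail into typed, machine-readable events (who, where, when), which is what makes it usable as data.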
“Google Now” as Data…
Data Collected by Google Now
Google Now/Android
  Search history
    Voice searches & clips
    Voice models
  Location history
    GPS / Wi-Fi / cell towers
    Map / direction interactions
    Travel integration interactions
  Usage
    Devices
    Google services
    Apps
Data Integrated into Now
Communications
  Gmail
  Instant Message/SMS
  Call history for calls made through Voice, Hangouts
  Voicemail
Personal information
  Birthdate
  Email
  Telephone
  Address
Web
  Search history, Chrome browser history
Transactions
  Play store
  Wallet
Created Content
  Maps, YouTube videos, blogs, notes, tasks, drive, photos
Social
  Google+
  YouTube
  Email graph
  Proximity graph?
  - location, bluetooth
Physical Activity
  Phone accelerometer
  Linked fitness devices
Some things Google “knows” – of potential interest to Social Scientists & Census
About a person
  Habits: travel, physical activity, web activity, purchasing, communications
  People: whom they talk with, visit, communicate with, share with, and share physical locations with
  Current activity: Where are they? What are they doing? With whom?
  Interests: searches, apps used, news consumed, places visited, products purchased
About a location
  Who visits
  Who occupies & when
  Traffic patterns related to location
  Image-derived knowledge
  -- tags, annotations, lighting, building density (?)
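
As a rough illustration of how "who occupies & when" could be derived from location history, the sketch below counts distinct devices observed at a place in each hour of the day. The record layout (device id, place label, ISO timestamp) and the function name are assumptions made for this example, not a description of Google's data model.

```python
from collections import defaultdict
from datetime import datetime


def occupancy_by_hour(pings):
    """Count distinct devices seen at each (place, hour-of-day).

    `pings` is an iterable of (device_id, place, iso_timestamp) tuples;
    this layout is hypothetical.
    """
    seen = defaultdict(set)
    for device_id, place, timestamp in pings:
        hour = datetime.fromisoformat(timestamp).hour
        seen[(place, hour)].add(device_id)  # a set ignores repeat pings from one device
    return {key: len(devices) for key, devices in seen.items()}


# Made-up pings for illustration.
sample_pings = [
    ("dev-a", "library", "2015-11-02T09:05:00"),
    ("dev-a", "library", "2015-11-02T09:40:00"),
    ("dev-b", "library", "2015-11-02T09:10:00"),
    ("dev-b", "library", "2015-11-02T14:00:00"),
]
print(occupancy_by_hour(sample_pings))
```

Counts like these describe observed devices, not people, so they inherit every sampling and measurement caveat that applies to the underlying location data.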
Inferential Challenges
Sampling
  Mobile Device Owners < Android Users < Google Service Users
  … are not a random sample of the population
  Mobile devices are not always carried/on/connected: observations are a nonrandom sample of activities
Measurement
  Limited sensor accuracy introduces uncertainty
  Observed measures, not designed for target inference
Variability
  Sensor characteristics vary across devices, conditions
  Sampling and measurement characteristics change over time
  New policies, services and algorithms are introduced… changes may be discontinuous
Treatment
  Observational data – selection towards/against treatment possible
  Unit-level contagion and interference
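
The sampling point can be illustrated with a small post-stratification sketch: if device users over-represent some groups, a naive sample mean is biased, and reweighting each group to its known population share is one standard partial correction. This is a swapped-in textbook technique, not a method proposed in the slides. The age groups, counts, means, and population shares below are invented for the example.

```python
def poststratify(sample_means, sample_counts, population_shares):
    """Return (naive_mean, poststratified_mean) for a stratified sample.

    sample_means:      stratum -> mean of the measure among sampled units
    sample_counts:     stratum -> number of sampled units
    population_shares: stratum -> share of the target population (sums to 1)
    """
    total = sum(sample_counts.values())
    # Naive estimate: every sampled unit counts equally.
    naive = sum(sample_means[g] * sample_counts[g] for g in sample_counts) / total
    # Post-stratified estimate: each stratum counts by its population share.
    weighted = sum(sample_means[g] * population_shares[g] for g in population_shares)
    return naive, weighted


# Hypothetical numbers: daily trips per person, by age group.
means = {"18-34": 4.0, "35-64": 3.0, "65+": 1.0}
counts = {"18-34": 600, "35-64": 350, "65+": 50}      # device sample skews young
shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}  # assumed census shares

naive, weighted = poststratify(means, counts, shares)
print(round(naive, 2), round(weighted, 2))
```

With these made-up numbers the naive mean is 3.5 trips and the reweighted mean is about 2.9. The correction only addresses differences between observed strata; it assumes device users resemble non-users within each stratum, which the slide's other points (devices not always carried, selection into treatment) give reason to doubt.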
Questions?
E-mail: escience@mit.edu
Web: informatics.mit.edu