Henry Kautz

Department of Computer Science

Director, Institute for Data Science

Fungibility of Data and Meta-Data

Data Mining Twitter

Fungible

[fuhn-juh-buhl] (adjective) Being of such nature or kind as to be freely exchangeable or replaceable, in whole or in part, for another of like nature or kind.

Related forms: fungibility (noun)

Data

[dey-tuh] (noun) Individual facts, statistics, or items of information.

Related forms: metadata

Metadata

[me-ta-day-tuh] (noun) A set of data that describes and gives information about other data.

Related forms: data

Data and Metadata

Data: Phone call audio

Metadata: Phone call numbers, time, date

Data: Email body

Metadata: Email headers: to, from, date, received-from, …

Data: YouTube videos

Metadata: User ID, number of views, comments, subscribers

Data: Twitter text (140 characters)

Metadata: GPS, sender ID, sender profile, followers / following

Organic Sensor Networks

52% of adults use online social networks

Real-time, location-aware smartphone access

Detailed measurements at a population scale

No active user participation

Fine granularity

Inference & Prediction

What will happen in the future?

What factors (places/events/actions) influence behavior?

Predicting Social Ties

GPS location + Tweet contents

Friend (mutual follows) relationship

Example of fungibility of data and metadata

Your privacy settings cannot hide your friendships!
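To make the idea concrete, here is a minimal sketch of how location and tweet text could be turned into a friendship score for a pair of users. It is an illustration only: the function names, the (cell, hour) visit encoding, and the fixed weights are assumptions, and the hand-weighted score is a simple stand-in, not the learned model from the talk.

```python
import math
from collections import Counter


def colocation_score(visits_a, visits_b):
    """Jaccard overlap of the (cell, hour) slots visited by two users."""
    a, b = set(visits_a), set(visits_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def text_similarity(tweets_a, tweets_b):
    """Cosine similarity between bag-of-words counts of two users' tweets."""
    counts_a = Counter(w for t in tweets_a for w in t.lower().split())
    counts_b = Counter(w for t in tweets_b for w in t.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def friendship_score(visits_a, visits_b, tweets_a, tweets_b, w_loc=0.5, w_text=0.5):
    """Hypothetical hand-weighted combination of the two signals (weights are assumed)."""
    return (w_loc * colocation_score(visits_a, visits_b)
            + w_text * text_similarity(tweets_a, tweets_b))


# Made-up example data: two users who share one place and time and some vocabulary.
alice_visits = [("cell_12", 9), ("cell_40", 18)]
bob_visits = [("cell_12", 9), ("cell_77", 20)]
alice_tweets = ["coffee downtown"]
bob_tweets = ["Coffee downtown"]
score = friendship_score(alice_visits, bob_visits, alice_tweets, bob_tweets)
```

Neither input requires access to the follower list itself, which is the point of the slide: the signal comes from where people are and what they write.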

The Data

New York City & Los Angeles

26M tweets over a month

1.2M unique users

7.6M geo-tagged tweets

Predicting Social Ties

[Figure labels: 0.8, 0.7, 0.9, ?; Twitter Feed; Location; Text; Network Structure; Learn Decision Tree; Belief Propagation; Predicted Friendships]

Predicting Social Ties (2010)

Predicting Location

Friends’ GPS locations

Your location

Your friends are noisy sensors of your location (and other things about you)

Your privacy settings cannot hide your location!
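One simple way to picture "friends as noisy sensors" is a weighted vote over the places where a user's friends were last seen. The sketch below assumes discrete location cells and a per-friend weight (for example, how often that friend has been co-located with the user before); both are hypothetical, and the vote is a stand-in rather than the model used in the talk.

```python
from collections import defaultdict


def estimate_location(friend_reports):
    """Return the cell with the highest total weight, or None without reports.

    friend_reports: iterable of (cell_id, weight) pairs, one per friend sighting.
    """
    votes = defaultdict(float)
    for cell, weight in friend_reports:
        votes[cell] += weight
    if not votes:
        return None
    # On ties, max() keeps the first cell that reached the top weight.
    return max(votes, key=votes.get)


# Made-up sightings: two friends near "midtown", one in "brooklyn".
reports = [("midtown", 0.6), ("brooklyn", 0.9), ("midtown", 0.5)]
guess = estimate_location(reports)
```

The user being located contributes nothing here; only the friends' public GPS tags are used.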

Predicting Location (2011)

[Figure label: Most Frequent]

Mining Deeper…

Once we have access to even sporadic data about your location, messages, and social network, we can find many other signals about people's state and behavior

Individual (noisy, but useful)

Aggregate (can be highly accurate)
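The gap between individual and aggregate accuracy can be illustrated with a small simulation. It assumes each individual signal is the true value plus independent, unbiased Gaussian noise, which is an idealization; the parameter values are arbitrary and chosen only for illustration.

```python
import random
import statistics


def simulate(true_value=0.3, noise_sd=0.5, n_individuals=10_000, seed=0):
    """Compare the typical error of one noisy reading with the error of the average."""
    rng = random.Random(seed)
    readings = [true_value + rng.gauss(0, noise_sd) for _ in range(n_individuals)]
    # Mean absolute error of a single individual's reading.
    individual_error = statistics.mean(abs(r - true_value) for r in readings)
    # Absolute error of the population-level average.
    aggregate_error = abs(statistics.mean(readings) - true_value)
    return individual_error, aggregate_error


individual_error, aggregate_error = simulate()
```

Under these assumptions the aggregate error shrinks roughly with the square root of the number of individuals; correlated or biased noise would not average out this way.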

Twitter Health

Aggregate accuracy comparable with:

Google Flu Trends

(R = 0.73)

CDC statistics

Plus, we can model fine-grained interactions between specific individuals
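A correlation such as the R value quoted above can be computed as follows, assuming R denotes Pearson's correlation coefficient between two aligned weekly series (the slide does not say which statistic was used). The two series in the example are made up for illustration and are not data from the talk.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical weekly counts: tweet-based estimate vs. an official series.
twitter_estimate = [12, 15, 21, 30, 26, 18]
official_series = [10, 14, 25, 28, 29, 15]
r = pearson_r(twitter_estimate, official_series)
```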

Fungibility of Science, Commerce, & Government

Nathan Eagle (MIT, Harvard, Northeastern)

“My research involves engineering computational tools, designed to explore how the petabytes of data generated about human movements, financial transactions, and communication patterns can be used for social good.”
