Creating Dynamic Social Network
Models from Sensor Data
Tanzeem Choudhury
Intel Research / Affiliate Faculty CSE
Dieter Fox
Henry Kautz
CSE
James Kitts
Sociology
What are we doing?
Why are we doing it?
How are we doing it?
Social Network Analysis

Researchers across the social & physical
sciences are increasingly studying the
structure of human interaction
o 1967 – Stanley Milgram – 6 degrees of separation
o 1973 – Mark Granovetter – strength of weak ties
o 1977 – International Network for Social Network
Analysis
o 1992 – Ronald Burt – structural holes: the social
structure of competition
o 1998 – Watts & Strogatz – small world graphs
Social Networks

Social networks are naturally
represented and analyzed as graphs
Example Network Properties

Degree of a node

Eigenvector centrality
o global importance of a node

Average clustering coefficient
o degree to which graph decomposes into cliques

Structural holes
o opportunities for gain by bridging disconnected
subgraphs
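As an illustration (not part of the original slides), the properties above can be computed directly from an adjacency matrix. The 5-node graph below is hypothetical: two triangles joined at node 2, so node 2 bridges two otherwise disconnected cliques.

```python
import numpy as np

# Hypothetical 5-node graph: two triangles joined at node 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Degree of a node: number of incident edges
degree = A.sum(axis=1)

# Eigenvector centrality: principal eigenvector of A, via power iteration
x = np.ones(n)
for _ in range(200):
    x = A @ x
    x /= np.linalg.norm(x)

# Clustering coefficient of a node: fraction of its neighbor pairs
# that are themselves connected; average over all nodes
def clustering(v):
    nbrs = np.flatnonzero(A[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = A[np.ix_(nbrs, nbrs)].sum() / 2
    return links / (k * (k - 1) / 2)

avg_clustering = np.mean([clustering(v) for v in range(n)])
print(degree, int(x.argmax()), avg_clustering)
```

Node 2 has the highest degree and eigenvector centrality, matching the structural-holes intuition: it gains by bridging the two subgraphs.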
Applications

Many practical applications
o Business – discovering organizational bottlenecks
o Health – modeling spread of communicable
diseases
o Architecture & urban planning – designing spaces
that support human interaction
o Education – understanding impact of peer group
on educational advancement

Much recent theory on finding random
graph models that fit empirical data
The Data Problem

Traditionally data comes from manual
surveys of people’s recollections
o Very hard to gather
o Questionable accuracy
o Few published data sets
o Almost no longitudinal (dynamic) data

1990’s – social network studies based
on electronic communication
Social Network Analysis of Email

Science, 6 Jan 2006
Limits of E-Data

Email data is cheap and
accurate, but misses
o Face-to-face speech – the vast
majority of human interaction,
especially complex communication
o The physical context of
communication – useless for
studying the relationship between
environment and interaction

[Figure: proportion of contacts that are face-to-face versus telephone
for high-complexity information, within a floor, within a building,
within a site, and between sites]

• Can we gather data on face-to-face
communication automatically?
Research Goal
Demonstrate that we can…

Model social network dynamics by gathering large
amounts of rich face-to-face interaction data
automatically
o using wearable sensors
o combined with statistical machine learning techniques

Find simple and robust measures derived from sensor
data
o that are indicative of people’s roles and relationships
o that capture the connections between physical environment and
network dynamics
Questions we want to investigate:

Changes in social networks over time:
o How do interaction patterns dynamically relate to
structural position in the network?
o Why do people sharing relationships tend to be similar?
o Can one predict formation or break-up of communities?

Effect of location on social networks
o What are the spatio-temporal distributions of interactions?
o How do locations serve as hubs and bridges?
o Can we predict the popularity of a particular location?
Support

Human and Social Dynamics – one of
five new priority areas for NSF
o $800K award to UW / Intel / Georgia Tech team
o Intel participating at no cost

Intel Research donating hardware and
internships

Leveraging work on sensors &
localization from other NSF & DARPA
projects
Procedure

Test group
o 32 first-year incoming CSE graduate students
o Units worn 5 working days each month
o Collect data over one year

Units record
o Wi-Fi signal strength, to determine location
o Audio features adequate to determine when conversation is
occurring

Subjects answer short monthly survey
o Selective ground truth on # of interactions
o Research interests

All data stored securely
o Indexed by code number assigned to each subject
Privacy

UW Human Subjects Division
approved procedures after 6 months of
review and revisions

Major concern was privacy, addressed
by
o Procedure for recording audio features without
recording conversational content
o Procedures for handling data afterwards
Data Collection

[Diagram: Intel multi-modal sensor board → real-time audio feature
extraction; audio features, WiFi strength, and code identifier flow
into the coded database]
Data Collection

Multi-sensor board sends sensor data stream
to iPAQ

iPAQ computes audio features and WiFi
node identifiers and signal strength

iPAQ writes audio and WiFi features to SD
card

Each day, the subject uploads data, tagged with
his or her code number, to the coded database
Older Procedure

Because the real-time feature extraction
software was not finished in time, the
Autumn 2005 data collections used a
different process (also approved)
o Raw data was encrypted on the SD card
o The upload program simultaneously decrypted
the data and extracted features
o Only the features were uploaded
Speech Detection

From the audio signal, we want to
extract features that can be used to
determine
o Speech segments
o Number of different participants (but not identity
of participants)
o Turn-taking style
o Rate of conversation (fast versus slow speech)

But the features must not allow the
audio to be reconstructed!
Speech Production

[Diagram: the source-filter model of speech production, with the vocal
tract acting as the filter]

Fundamental frequency (F0/pitch) and formant frequencies (F1, F2, …) are the
most important components for speech synthesis
Speech Production

Voiced sounds: Fundamental frequency (i.e.
harmonic structure) and energy in lower
frequency component

Un-voiced sounds: No fundamental frequency
and energy focused in higher frequencies

Our approach: Detect speech by reliably
detecting voiced regions

We do not extract or store any formant
information. At least three formants are required
to produce intelligible speech*
* 1. Donovan, R. (1996). Trainable Speech Synthesis. PhD thesis, Cambridge University.
  2. O'Shaughnessy, D. (1987). Speech Communication: Human and Machine. Addison-Wesley.
Goal: Reliably Detect Voiced Chunks in
Audio Stream
Speech Features Computed
1. Spectral entropy
2. Relative spectral entropy
3. Total energy
4. Energy below 2kHz (low frequencies)
5. Autocorrelation peak values and number of
peaks
6. High order MEL frequency cepstral coefficients
Features used: Autocorrelation

[Figure: autocorrelation of (a) an un-voiced frame and (b) a voiced frame]

Voiced chunks have a higher non-initial autocorrelation peak and fewer
peaks
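A minimal sketch of this feature (illustrative only; frame length, sample rate, and the synthetic test signals are assumptions, not the project's actual parameters). A periodic, voiced-like signal produces a strong non-initial autocorrelation peak and few local maxima; noise does the opposite.

```python
import numpy as np

def autocorr_features(frame):
    """Return (max non-initial peak, number of local maxima) of the
    normalized autocorrelation of one audio frame."""
    a = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a /= a[0]                      # normalize so lag 0 == 1
    # local maxima at lags >= 1
    peaks = [i for i in range(1, len(a) - 1) if a[i - 1] < a[i] > a[i + 1]]
    max_peak = max((a[i] for i in peaks), default=0.0)
    return max_peak, len(peaks)

# Periodic ("voiced-like") frame vs. white noise ("unvoiced-like"),
# assuming an 8 kHz sample rate and 1024-sample frames
t = np.arange(1024)
voiced = np.sin(2 * np.pi * t * 120 / 8000)      # 120 Hz tone
unvoiced = np.random.default_rng(0).standard_normal(1024)
print(autocorr_features(voiced))    # high peak, few peaks
print(autocorr_features(unvoiced))  # low peak, many peaks
```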
Features used: Spectral Entropy

[Figure: FFT magnitude of (a) an un-voiced frame (spectral entropy 4.21)
and (b) a voiced frame (spectral entropy 3.74)]

Voiced chunks have lower entropy than un-voiced chunks, because
voiced chunks have more structure
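Spectral entropy can be sketched as follows (illustrative; the normalization and synthetic signals are assumptions): treat the normalized FFT magnitude as a probability distribution and compute its entropy. A harmonic, voiced-like signal concentrates energy in few bins and so has lower entropy than noise.

```python
import numpy as np

def spectral_entropy(frame):
    """Entropy (bits) of the normalized FFT magnitude spectrum."""
    mag = np.abs(np.fft.rfft(frame))
    p = mag / mag.sum()
    p = p[p > 0]                      # avoid log(0)
    return -np.sum(p * np.log2(p))

t = np.arange(1024)
tone = np.sin(2 * np.pi * t * 200 / 8000)   # structured, "voiced-like"
noise = np.random.default_rng(1).standard_normal(1024)
print(spectral_entropy(tone), spectral_entropy(noise))
# the structured signal yields the lower entropy
```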
Features used: Energy
Energy in voiced chunks is concentrated in the lower frequencies
Higher order MEL cepstral coefficients contain pitch (F0) information.
The lower order coefficients are NOT stored
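The low-frequency energy feature can be sketched like this (illustrative; the 8 kHz sample rate and the test tones are assumptions): sum spectral energy below 2 kHz and divide by total energy, so voiced frames score near 1.

```python
import numpy as np

def low_freq_energy_ratio(frame, sr=8000, cutoff=2000):
    """Fraction of the frame's spectral energy below `cutoff` Hz."""
    mag2 = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return mag2[freqs < cutoff].sum() / mag2.sum()

t = np.arange(1024)
low_tone = np.sin(2 * np.pi * t * 150 / 8000)    # voiced-like, 150 Hz
high_tone = np.sin(2 * np.pi * t * 3000 / 8000)  # energy above 2 kHz
print(low_freq_energy_ratio(low_tone), low_freq_energy_ratio(high_tone))
```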
Segmenting Speech Regions
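One simple way to turn per-frame voiced/unvoiced decisions into contiguous speech regions is to merge runs of voiced frames and bridge short unvoiced gaps. This is a hedged sketch of the idea, not necessarily the smoothing the project actually uses; `max_gap` is an assumed parameter.

```python
def segment_voiced(frames, max_gap=3):
    """frames: sequence of truthy flags (True/1 = voiced).
    Returns (start, end) frame-index pairs of voiced segments,
    merging unvoiced gaps of at most max_gap frames."""
    segments = []
    for i, v in enumerate(frames):
        if not v:
            continue
        if segments and i - segments[-1][1] <= max_gap:
            segments[-1][1] = i          # extend the previous segment
        else:
            segments.append([i, i])      # start a new segment
    return [tuple(s) for s in segments]

print(segment_voiced([1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]))
# -> [(0, 6), (11, 11)]  (the 2-frame gap is bridged; the 4-frame gap is not)
```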
Attributes Useful for Inferring Interaction

Attributes that can be reliably extracted from
sensors:
o Total number of interactions between people
o Conversation styles – e.g. turn-taking, energy-level
o Location where interactions take place – e.g. office,
lobby etc.
o Daily schedule of individuals – e.g. early birds,
late-nighters
Locations

Wi-Fi signal strength can be used to
determine the approximate location of
each speech event
o 5 meter accuracy
o Location computation done off-line

Raw locations are converted to nodes
in a coarse topological map before
further analysis
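The conversion to the topological map can be sketched as nearest-node snapping (the map, its node names, and coordinates below are entirely hypothetical, and the real pipeline may use a more sophisticated assignment):

```python
import math

# Hypothetical topological map: node name -> representative coordinate (m)
topo_map = {
    "hallway":      (0.0, 0.0),
    "breakout":     (10.0, 0.0),
    "meeting_room": (0.0, 12.0),
}

def snap_to_node(x, y):
    """Assign a raw Wi-Fi location estimate to the nearest map node."""
    return min(topo_map, key=lambda n: math.dist((x, y), topo_map[n]))

print(snap_to_node(1.2, 0.8))   # -> "hallway"
```

With roughly 5-meter localization accuracy, a coarse map like this is the natural resolution at which to associate conversations with area types.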
Topological Location Map

Nodes in map are identified
by area types
o Hallway
o Breakout area
o Meeting room
o Faculty office
o Student office

Detected conversations
are associated with their
area type
Social Network Model

Nodes
o Subjects (wearing sensors, have given consent)
o Public places (e.g., a particular breakout area)
o Regions of private locations (e.g., hallway of
faculty offices)
o Instances of conversations

Edges
o Between subjects and conversations
o Between places or regions and conversations
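The node/edge scheme above amounts to a bipartite graph in which subjects and places connect only through conversation instances. A minimal sketch (the subject codes, place names, and conversations are invented for illustration):

```python
# Hypothetical conversations linking coded subjects and places
conversations = {
    "conv1": {"subjects": ["S03", "S17"], "place": "breakout_2"},
    "conv2": {"subjects": ["S03", "S21", "S17"], "place": "faculty_hall"},
}

edges = []
for cid, c in conversations.items():
    edges += [(s, cid) for s in c["subjects"]]   # subject-conversation edges
    edges.append((c["place"], cid))              # place-conversation edges

# A subject's degree counts the conversations they took part in
deg_S03 = sum(1 for u, _ in edges if u == "S03")
print(deg_S03)   # -> 2
```

Because every edge passes through a conversation node, conversation counts, shared locations, and co-participation patterns all fall out of simple degree and path queries on this one graph.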
Non-instrumented Subjects

We may recruit additional subjects who do
not wear sensors

Such subjects would allow us to infer
information about their behavior indirectly;
they would appear (coded) as nodes in our
network model
o E.g., based on their particular office locations

Only people who have provided written
consent appear as entities in our network
models
Disabling Sensor Units

As a courtesy, subjects will disable their
units in particular classrooms or offices
Access to the Data

Publications about this project will
include summary statistics about the
social network, e.g.:
o Clustering coefficient
o Motifs (temporal patterns)

We will not release the actual graph
o This is prohibited by our HSD approval

We welcome additional collaborators