Labeling Personal Characteristics from Mobile Phone Traces

Poster Abstract: Labeling Personal Characteristics from
Mobile Phone Traces
Yang Yue1,2, Jia Chen1,2, Bo Hu1,2, Rong Xie3, Xiao-Qing Zuo4, Xing Xie5
1 State Key Lab of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University
2 Engineering Research Center for Smart Acquisition and Applications of Spatiotemporal Data, Ministry of
Education, Wuhan University
3 International School of Software, Wuhan University
4 Faculty of Land Resource Engineering, Kunming University of Science and Technology
5 Microsoft Research Asia, Beijing
yueyang@whu.edu.cn
ABSTRACT
We present our ongoing work on labeling personal characteristics
using mobile phone trace data, POIs (Points of Interest), and real
estate price data. POIs and real estate data are used to extract
semantic features of the regions where mobile phone users are
active. Based on these regional features, a group of personal
labels can be attached to the user.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval
models. J.4 [Social and Behavioral Sciences]: Sociology.
General Terms
Algorithms, Experimentation.
Keywords
Mobile Phone Data, Semantic Label, Trajectory Data Analysis.
1. INTRODUCTION
Where a person lives, works, and hangs out reflects, to a great
extent, the person’s characteristics, social status, and hobbies.
The ability to automatically label personal characteristics is
important to many customized applications, advanced business
analyses, and privacy protection.
Mobile phone trace data has the potential to provide insight into
the personal characteristics of phone users. However, unlike
continuous and fine-grained GPS data, mobile phone traces are
generated only when a voice call, a text message, or any other
form of communication, such as access to the Web (hereafter
called an event), occurs. Therefore, mobile phone trace data is
coarse in space (at the granularity of the cellular tower coverage
radius), sparse in time (recorded only when an event happens),
less precise, and to a certain extent incomplete [1], [2]. In most
circumstances, the accuracy ranges from 100-500 m (urban areas)
to several kilometers [3]. This makes mobile phone-based
trajectory data analysis more challenging.
Some studies have been carried out based on mobile phone trace
data, such as analyzing user mobility [4], predicting user
movement [5], and extracting home and work locations [1]. This
study goes one step further and attempts to infer user
characteristics from where he/she lives, works, and hangs out.
Copyright is held by the author/owner(s).
IPSN’12, April 16-20, 2012, Beijing, China.
ACM 978-1-4503-1227-1/12/04.
2. PROPOSED METHODOLOGY
The basic assumption of this study is that the regions where
people live, work, and hang out are those that meet their
preferences. Therefore, the characteristics of those regions also
reflect, to a great extent, the characteristics of the person. We first
identify the regions closely attached to a user, i.e., ROIs (Regions
of Interest), by spatially aggregating the mobile phone traces.
Then, we extract the features of these ROIs using data crawled
from POI review websites and real estate websites, together with
map data. Last, the regional features are associated with the user
as his/her personal labels. Figure 1 shows the label matching
method. Here, we assume that people work during the daytime
and stay home at night.
[Figure 1: flowchart. Input: % of timestamps of trace points in a
clustered region. Decisions (Y/N): Night, Workday, On work.
Region types: Home, Workplace, Leisure. Features: housing price;
top k POI features in the region. Outputs: income labels, job
labels, leisure & expenditure labels.]
Figure 1. Labeling personal characteristics
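The label matching in Figure 1 starts from the share of a region's trace timestamps that fall at night or during working time. The following is a minimal sketch of such a rule, not the authors' implementation. The hour ranges, the `min_share` cut-off, and the function name `classify_region` are assumptions made for illustration; the source states only that people are assumed to work during the daytime and stay home at night.

```python
from datetime import datetime

# Hypothetical hour ranges; the source does not state which hours it uses.
NIGHT_HOURS = set(range(0, 7)) | set(range(21, 24))
WORK_HOURS = set(range(9, 18))


def classify_region(timestamps, min_share=0.5):
    """Return 'Home', 'Workplace' or 'Leisure' for one clustered region.

    timestamps: datetime objects of the trace points inside the region.
    min_share:  hypothetical cut-off on the fraction of timestamps.
    """
    if not timestamps:
        raise ValueError("a region needs at least one trace point")
    n = len(timestamps)
    # Share of events recorded at night (any day of the week).
    night = sum(t.hour in NIGHT_HOURS for t in timestamps) / n
    # Share of events recorded during working hours on Monday to Friday.
    work = sum(t.weekday() < 5 and t.hour in WORK_HOURS for t in timestamps) / n
    if night >= min_share:
        return "Home"
    if work >= min_share:
        return "Workplace"
    return "Leisure"
```

A region typed this way would then be passed on to the feature step: housing price for a home region, and top k POI features for workplace and leisure regions. Public holidays, which the paper also considers, are left out of this sketch.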
We use an off-the-shelf clustering approach, ST-DBSCAN [6], to
identify the ROIs attached to a person, considering time of day,
day of the week, and public holidays. A threshold radius of 500 m
is used. Since an ROI represents an area where a person spends a
significant amount of time and/or visits frequently, a cluster is
generated in this study only if the person had at least one call,
text message, or data communication within the region.
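To make the clustering step concrete, here is a simplified sketch in Python. It is plain spatial DBSCAN with a 500 m radius, used as a stand-in: it is not ST-DBSCAN [6] and ignores the temporal dimension, which could be approximated by filtering the trace points by time of day or day type before clustering. The `min_pts` value and the function names are assumptions; the source gives only the 500 m threshold.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_m=500.0, min_pts=3):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    eps_m follows the 500 m threshold in the text; min_pts is hypothetical.
    """
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps_m of point i (including i itself).
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point; keep growing
    return labels
```

Each resulting cluster id corresponds to one candidate ROI; points labeled -1 are isolated events that do not form a region. The quadratic neighbour search is acceptable for a single user's sparse traces but would need a spatial index for larger inputs.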
The next step is generating semantic features for each ROI. In
this process, word frequency analysis is performed on POI
category data to extract the features of these regions, such as bar,
shopping, and park. Real estate price data is also used to label the
positioning of the region, by assuming that regions with high real
estate prices are associated with high expenditure, and vice versa.
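The feature step above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the whitespace tokenization of category strings, the use of the median price, and the two label strings are all hypothetical. The source says only that word frequencies of POI categories are used and that a high real estate price is taken to indicate high expenditure.

```python
from collections import Counter
from statistics import median


def top_k_features(poi_categories, k=10):
    """Return the k most frequent words among a region's POI category strings."""
    words = Counter()
    for category in poi_categories:
        # Assumed tokenization: lower-case and split on whitespace.
        words.update(category.lower().split())
    return [word for word, _ in words.most_common(k)]


def price_label(region_prices, area_average):
    """Compare a region's median price with the study-area average.

    Hypothetical two-level rule; the source does not give its price bands.
    """
    if not region_prices:
        raise ValueError("no price records for this region")
    if median(region_prices) > area_average:
        return "high-expenditure"
    return "low-expenditure"
```

For example, a region whose POI categories are dominated by "bar" and "park" would receive those words as its leading features, and its price label would depend on where its listings fall relative to the study-area average.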
3. CASE STUDY
In this study, mobile phone trace data of a group of individuals,
covering 1 to 26 August 2010, is analyzed. Only anonymous
records associated with a cell tower ID are used. Such records are
sporadic samples of the approximate locations of the phone users.
Due to the page limit, only one user is illustrated here, one who
did not have very typical work, off-work, and leisure patterns.
This serves as a test of our proposed algorithm. We use this user
as an example for two reasons: 1) to illustrate the real pattern
detection ability of our algorithm; and 2) to raise more discussion
about human mobility and behavior patterns.
Figure 2 presents the spatial distribution of the user’s mobile
phone traces (blue points), the three ROIs (R1-R3, in orange)
attached to him/her, and the associated time distribution (workday
only).
Figure 2. A Person’s Mobile Phone Trace and ROIs
Although this person’s traces spread over a wide area, he/she has
only three important regions attached. It can be observed that R1
and R3 are the regions where the person spent most of the time:
R1 is associated with working time, and R3 is more related to
off-work time. Thus, it is very likely that the person works at
R1 and lives at R3. In R3, around 60% of the housing prices are
between 8,000 and 10,000 RMB/m2, which is above the average
price of the study area. A label possibly associated with the
person is therefore “Middle-high income”.
As to R2, most of the trace points were generated in the daytime,
on both working days and weekends (Figure 3a). We further
examined the top 10 POI categories in this region, as shown in
Figure 3b. It can be inferred that this area is highly related to
building and decoration materials. Since the time pattern of R2 is
very similar to that of R1, our algorithm labels it as “Workplace”.
It is not very common for a person to be associated with more
than one workplace, but some people, such as a boss with two
shops, do have such a pattern. Although there may be other
possibilities, for the time being this is the preliminary result
generated by our algorithm using the existing dataset. Further
work may be necessary to refine or validate the result. In general,
the labels generated for this user are: middle-high income,
building material.
(a) Time distribution of traces
(b) Top 10 POI categories
Figure 3. Features associated with R3
4. CONCLUDING REMARKS
Personal characteristics are important to many applications and
to privacy protection. In this study, we first identify important
regions from a user’s traces considering spatiotemporal patterns,
and then extract semantic information about the regions from
POI and real estate data, which is further used to generate a group
of labels that reflect the personal characteristics. Although these
are coarse-level labels, they are valuable for most applications.
Further studies will consider how to validate the labeling results
with ground truth data. Because of privacy concerns, the mobile
phone data used in this study may have limited usage. However,
the proposed approach can be used in a similar manner on the
increasingly available “check-in” data.
5. ACKNOWLEDGMENTS
This project is partially supported by China NSFC 41171348 and
Microsoft Research Asia. Our thanks to Prof. Xiao-Qing Zou for
providing the experiment data (NSFC 41061043).
6. REFERENCES
[1] S. Isaacman, R. Becker, R. Cáceres, S. Kobourov, M.
Martonosi, J. Rowland, and A. Varshavsky, "Identifying
Important Places in People's Lives from Cellular Network
Data," Lecture Notes in Computer Science, Vol. 6696, pp.
133-151, 2011.
[2] C. Licoppe, D. Diminescu, Z. Smoreda, and C. Ziemlicki,
"Using Mobile Phone Geolocalisation for 'Socio-geographical'
Analysis of Co-ordination, Urban Mobilities, and Social
Integration Patterns," Tijdschrift voor economische en sociale
geografie, Vol. 99, pp. 584-601, 2008.
[3] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship
network structure by using mobile phone data," Proceedings
of the National Academy of Sciences of the United States of
America, Vol. 106, pp. 15274-15278, Sep 8 2009.
[4] M. A. Bayir, M. Demirbas, and A. Cosar, "A Web-Based
Personalized Mobility Service for Smartphone Applications,"
Computer Journal, Vol. 54, pp. 800-814, May 2011.
[5] M. A. Bayir, M. Demirbas, and N. Eagle, "Discovering
SpatioTemporal Mobility Profiles of Cellphone Users," 2009
IEEE International Symposium on a World of Wireless,
Mobile and Multimedia Networks & Workshops, pp. 119-127,
2009.
[6] D. Birant and A. Kut, "ST-DBSCAN: An algorithm for
clustering spatial-temporal data," Data & Knowledge
Engineering, Vol. 60, pp. 208-221, 2007.