Poster Abstract: Labeling Personal Characteristics from Mobile Phone Traces

Yang Yue1,2, Jia Chen1,2, Bo Hu1,2, Rong Xie3, Xiao-Qing Zuo4, Xing Xie5

1 State Key Lab of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University
2 Engineering Research Center for Smart Acquisition and Applications of Spatiotemporal Data, Ministry of Education, Wuhan University
3 International School of Software, Wuhan University
4 Faculty of Land Resource Engineering, Kunming University of Science and Technology
5 Microsoft Research Asia, Beijing
yueyang@whu.edu.cn

ABSTRACT
We present our ongoing work on labeling personal characteristics using mobile phone trace data, POIs (points of interest), and real estate price data. POIs and real estate data are used to extract semantic features of the regions where mobile phone users are actively involved. Based on these regional features, a group of personal labels can be attached to the user.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval models. J.4 [Social and Behavioral Sciences]: Sociology.

General Terms
Algorithms, Experimentation.

Keywords
Mobile Phone Data, Semantic Label, Trajectory Data Analysis.

1. INTRODUCTION
Where a person lives, works, and hangs out reflects, to a great extent, the person’s characteristics, social status, and hobbies. The ability to automatically label personal characteristics is important to many customized applications, advanced business analyses, and privacy protection.

Mobile phone trace data has the potential to provide insight into the personal characteristics of phone users. However, unlike continuous and fine-grained GPS data, mobile phone traces are generated only when a voice call, a text message, or any other form of communication, such as access to the Web, occurs (hereafter, an event). Therefore, mobile phone trace data is coarse in space (at the granularity of the cellular tower coverage radius), sparse in time (recorded only when an event happens), less precise, and to a certain extent incomplete [1], [2]. In most circumstances, the accuracy ranges from 100-500 m (urban areas) to several kilometers [3]. This makes mobile phone-based trajectory data analysis more challenging.

Some studies have been carried out based on mobile phone trace data, such as analyzing user mobility [4], predicting user movement [5], and extracting home and work locations [1]. This study goes one step further and attempts to infer user characteristics from where he/she lives, works, and hangs out.

Copyright is held by the author/owner(s). IPSN’12, April 16-20, 2012, Beijing, China. ACM 978-1-4503-1227-1/12/04.

2. PROPOSED METHODOLOGY
The basic assumption of this study is that the regions where people live, work, and hang out are those that meet their preferences. Therefore, the characteristics of those regions, to a great extent, also reflect the characteristics of the person.

We first identify regions closely attached to a user, i.e., ROIs (regions of interest), by spatially aggregating the mobile phone traces. Then, we extract the features of these ROIs using data crawled from POI review websites and real estate websites, together with map data. Last, the regional features are associated with the user as his/her personal labels. Figure 1 shows the label matching method. Here, it is assumed that people work during the daytime and stay home at night.

Figure 1. Labeling personal characteristics (flowchart; node labels: % of timestamps of trace points in a clustered region; Night; Workday; On work; Home; Workplace; Leisure; Housing price; Top k POI features in the region; Income labels; Job labels; Leisure & expenditure labels)

We use the off-the-shelf clustering approach ST-DBSCAN [6] to identify the ROIs attached to a person, considering time of day, day of the week, and public holidays.
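As a rough illustration of the ROI identification step, the sketch below clusters trace points with a plain spatial DBSCAN. It is a simplified stand-in and not ST-DBSCAN [6] itself: it ignores the temporal dimension (time of day, day of the week, public holidays). The names haversine_m, cluster_rois, eps_m, and min_pts are hypothetical, the sample coordinates are made up, and min_pts = 3 is an assumed value that is not taken from this study; only the 500 m radius comes from the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def cluster_rois(points: List[Point], eps_m: float = 500.0,
                 min_pts: int = 3) -> List[int]:
    """Plain spatial DBSCAN: one cluster label per point, -1 marks noise.

    eps_m defaults to the 500 m radius mentioned in the text;
    min_pts is an assumption made for this sketch.
    """
    n = len(points)
    # Brute-force neighbourhoods (each point is its own neighbour).
    neighbours = [
        [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    labels = [-1] * n
    visited = [False] * n
    cluster_id = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
        cluster_id += 1
    return labels


# Made-up trace points: two dense groups and one isolated point.
sample = [
    (30.530, 114.350), (30.531, 114.350), (30.530, 114.351),
    (30.600, 114.300), (30.601, 114.300), (30.600, 114.301),
    (30.700, 114.500),
]
print(cluster_rois(sample))  # -> [0, 0, 0, 1, 1, 1, -1]
```

Each non-negative label corresponds to one candidate ROI. A fuller version would add a temporal neighbourhood test, as ST-DBSCAN does, before treating two trace points as neighbours.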
A threshold radius of 500 m is used for the clustering. Since an ROI represents an area where a person spends a significant amount of time and/or visits frequently, a cluster is not generated in this study unless the person had at least one call, text message, or data communication within the region.

The next step is generating semantic features for each ROI. In this process, word frequency analysis is performed on POI category data to extract the features of these regions, such as bar, shopping, and park. Real estate price data is also used to label the positioning of the region, by assuming that regions with high real estate prices are associated with high expenditure, and vice versa.

3. CASE STUDY
In this study, mobile phone trace data of a group of individuals from 1 to 26 August 2010 are analyzed. Only anonymous records associated with a cell tower ID are used. These records are sporadic samples of the approximate locations of the phone users.

Due to the page limit, only one user is illustrated here, one who did not have very typical work, off-work, and leisure patterns. This serves as a test of our proposed algorithm. We use this example for two reasons: 1) to illustrate the ability of our algorithm to detect real patterns; and 2) to raise more discussion about human mobility and behavior patterns. Figure 2 presents the spatial distribution of the user’s mobile phone traces (blue points), the three ROIs (R1-R3, in orange) attached to him/her, and the associated time distribution (workdays only).

Figure 2. A person’s mobile phone traces and ROIs

Although this person’s traces spread over a wide area, he/she has only three important regions attached. It can be observed that R1 and R3 are the regions where the person spent most of the time; R1 is associated with working time, and R3 is more related to off-work time. Thus, it is very likely that the person works at R1 and lives at R3. In R3, around 60% of the housing prices are between 8,000 and 10,000 RMB/m2, which is above the average price of the study area. A label possibly associated with the person is therefore “Middle-high income”.

As to R2, most of the trace points were generated in the daytime, on both working days and weekends (Figure 3a). We further examined the top 10 POI categories in this region, as shown in Figure 3b. It can be inferred that this area is highly related to building and decoration materials. Since the time pattern of R2 is very similar to that of R1, our algorithm labels it as “Workplace”. It is not very common for a person to be associated with more than one workplace, but some people, such as a boss with two shops, do have such features. Although there may be other possibilities, for the time being this is the preliminary result generated by our algorithm using the existing dataset. Further work may be necessary to refine or validate the result. In general, the labels generated for this user are: middle-high income, building material.

Figure 3. Features associated with R3: (a) time distribution of traces; (b) top 10 POI categories

4. CONCLUDING REMARKS
Personal characteristics are important to many applications and to privacy protection. In this study, we first identify important regions from a user’s traces considering spatiotemporal patterns, and then extract semantic information about the regions from POI and real estate data, which is further used to generate a group of labels that reflect the personal characteristics. Although these are coarse-level labels, they are valuable for most applications. Further studies will consider how to validate the labeling results with ground truth data. Concerning privacy issues, the mobile phone data used in this study may have limited usage. However, the proposed approach can be used in a similar manner on the increasingly available “check-in” data.

5. ACKNOWLEDGMENTS
This project is partially supported by China NSFC 41171348 and Microsoft Research Asia. Our thanks to Prof. Xiao-Qing Zou for providing the experiment data (NSFC 41061043).

6. REFERENCES
[1] S. Isaacman, R. Becker, R. Cáceres, S. Kobourov, M. Martonosi, J. Rowland, and A. Varshavsky, "Identifying Important Places in People's Lives from Cellular Network Data," Lecture Notes in Computer Science, Vol. 6696, pp. 133-151, 2011.
[2] C. Licoppe, D. Diminescu, Z. Smoreda, and C. Ziemlicki, "Using Mobile Phone Geolocalisation for 'Socio-geographical' Analysis of Co-ordination, Urban Mobilities, and Social Integration Patterns," Tijdschrift voor economische en sociale geografie, Vol. 99, pp. 584-601, 2008.
[3] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences of the United States of America, Vol. 106, pp. 15274-15278, Sep 8 2009.
[4] M. A. Bayir, M. Demirbas, and A. Cosar, "A Web-Based Personalized Mobility Service for Smartphone Applications," Computer Journal, Vol. 54, pp. 800-814, May 2011.
[5] M. A. Bayir, M. Demirbas, and N. Eagle, "Discovering SpatioTemporal Mobility Profiles of Cellphone Users," 2009 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks & Workshops, pp. 119-127, 2009.
[6] D. Birant and A. Kut, "ST-DBSCAN: An algorithm for clustering spatial-temporal data," Data & Knowledge Engineering, Vol. 60, pp. 208-221, 2007.