Sparse Real Estate Ranking with Online User Reviews and Offline Moving Behaviors

Yanjie Fu, Hui Xiong, Yu Zheng, Yong Ge, Zijun Yao, Yanchi Liu, Jing Yuan
Rutgers, the State University of New Jersey
ICDM2014@Shenzhen, China

Agenda
• Background and Motivation
• Problem Statement
• Methodology
• Evaluation
• Conclusions

Why Housing Matters
• A house is a sign of your ability to settle down with your family.
• Without a house, you might… "Darling, I am so sorry. You have no house." "Young man, go home, and buy a new house."
• With a house, you might… "Yes. I do!"
• A house can also make extra money as an investment option.

Real Estate Investment Value
• Market value
  • The price at which an estate would trade in the marketplace
• Investment value
  • The growth potential of resale value
  • Motivation to enter the estate market
• We don't predict future price! We predict investment potential!

Quantifying Investment Value
• Estate investment return rate of a given market period
• Prepare the ground-truth investment values of estates for training data
  • Identify the rising market period and the falling market period of Beijing
  • Calculate the investment return of each real estate during the rising market period and the falling market period
  • Estate grading (5>4>3>2>1) by finding inflection points

Existing Housing Analysis Methods
• Housing indexes
  • FHFA/OFHEO, S&P/Case-Shiller Indices, FNC Residential Price Index
  • Suited to regional housing analysis rather than a specific house
• Financial time series analysis
  • Trend, periodicity, and volatility of housing price time series
  • Noisy: speculative demand and government policies affect prices
• Correlating estate value to the static statistics of urban infrastructure
  • The number of and distance to bus stops, subway stations, road network entries, and POIs
  • Physical facilities have both positive and negative effects
  • Train stations bring noise and pollution and degrade estate value
  • Lacks dynamics and can hardly reflect the changing pulse of a city

What Better Reflects Estate Investment Value?
• Consumer psychology
• People's opinions and estate investment value
  • If people have better opinions of an estate, the demand for this estate is higher and its investment value will be higher
• Uncovering people's opinions of an estate from user-generated, estate-related dynamic data

Online User Reviews
• Yelp, Foursquare, Yahoo Local Listing, Google Local for Business
• These show the explicit opinions of mobile users about places surrounding an estate

Offline Moving Behaviors (1)
• Cell-tower traces, taxicab GPS traces, check-in traces, bus traces
• These data sense the dynamic pulse of a city better than static statistics of urban infrastructure

Offline Moving Behaviors (2)
[Figure panels: taxi drop-off points, bus drop-off points, check-ins]
• Taxi transits
  • Fast and expensive
  • Central business district and financial areas
• Bus transits
  • Slow and cheap
  • Information technology and education areas
• Check-ins
  • Walking portion of mobility
  • Areas full of attractions, entertainment, and POIs
• Encode the static statistics of urban infrastructure
• Reflect the implicit "opinions" of mobile users about a neighborhood

Problem Definition
• Given
  • Estates with locations and historical prices
  • Online user reviews (ratings and comments for business venues/points of interest)
  • Offline moving behaviors (taxi traces, bus traces, mobile check-ins)
• Objective
  • Rank estates based on their investment values
• Core tasks
  • Extract discriminative features that reflect residents' opinions of estates
  • Learn an estate ranking predictor by combining a pairwise ranking objective and a sparsity regularization

Methodology Overview
[Figure: framework overview. Recoverable labels, by stage:
 Estate Grading: estates, prices, estate grades, rising market data, falling market data.
 Estate Feature Extraction: business reviews and moving behaviors (bus/taxi/check-in); business review feature extraction; human mobility feature extraction; neighborhood profiling (direction, volume, popularity, velocity, heterogeneity, topic, density, contrast); business review features; mobility features; profile vectors of real estates.
 Sparse Estate Ranking: mixture objective, pairwise consistency, sparsity regularization; rising market model, falling market model; estate ranking predictor.
 Estate Ranking: R1, R2, R3, R4; synthetic ranking.]

Features from Online User Reviews
• Overall Satisfaction
• Service Quality
• Environment Class
• Consumption Cost
• Functionality Planning

Features from Offline Moving Behaviors
• Taxi Arriving Volume
• Taxi Leaving Volume
• Taxi Transition Volume
• Taxi Driving Velocity
• Taxi Commute Distance
• Bus Arriving Volume
• Bus Leaving Volume
• Bus Transition Volume
• Bus Stop Density
• Popularity of Check-in
  • Propagating word-of-mouth from POI to neighborhood
• Topic Profile of Check-in
  • Textual profiling from words to topics

Predicting Estate Investment Value
[Equation not recoverable. Labels: estate investment value; features of user reviews; features of taxi trajectories; features of smart card transactions; features of check-ins]

Modeling Ranking Objective
• Prediction accuracy
  • Maximizing the likelihood ≈ minimizing the square loss
• Ranking consistency (a>b>c>d)
  • A ranked list of estates is viewed as a directed graph
  • Nodes are real estates
  • A directed edge A → B means A ranks higher than B
  • Our model generates edges with a certain probability
  • Maximizing the likelihood ≈ minimizing the ranking loss of the graph-based ranking structure

Incorporating Sparsity Regularization into Estate Ranking
• Extracting a large number of estate-related features
  • Features are correlated and redundant
  • A small number of good features can determine the ranking of estates based on investment values
• Two steps in the classic method
  • Feature selection
  • Fit the selected features with a ranking model
  • The selected feature subset may not be optimal for ranking because the two steps are modeled separately
• Combining sparsity and ranking in a unified model
  • Enforce sparse representations during learning by setting some feature weights to zero, which also helps avoid overfitting

Solving the Ranking Objective
• Log of the posterior
• Maximum a posteriori (MAP) estimation

Experimental Data
• Beijing real-world data
  • Beijing estate data: 2851 estates with transaction records from 04/2011 to 09/2012
  • Falling market (04/2011 to 02/2012) and rising market (02/2012 to 09/2012)
  • Beijing taxi traces
  • Beijing smart card transactions
  • Beijing check-ins
  • Beijing business reviews

Evaluation Methods and Metrics
• Baseline algorithms
  • MART: a boosted tree model, specifically a linear combination of the outputs of a set of regression trees
  • RankBoost: a boosted pairwise ranking method that trains multiple weak rankers and combines their outputs as the final ranking
  • LambdaMART: the boosted tree version of LambdaRank, which is based on RankNet
  • Coordinate Ascent: uses domination loss and applies coordinate descent for optimization
  • FenchelRank: designed for solving the sparse ranking problem with an L1 constraint
• Evaluation metrics
  • Normalized Discounted Cumulative Gain (NDCG)
  • Precision
  • Recall
  • Kendall's Tau Coefficient

Correlation Analysis
• If the heterogeneity of functional planning is too high or too low, the house will be low-valued
• If the commute distance is short, the house is close to important places and business venues

Feature Evaluation on Different Sources
• Business reviews and check-ins perform better than taxi and bus traces
  • Check-ins and reviews represent the attending phase
  • Taxi and bus traces represent the moving phase
• Taxi features perform better than bus features in the falling market
  • Taxi mobility represents white-collar and business people
  • Bus mobility represents middle-class people

Feature Evaluation on Different Radiuses
• We recommend setting the neighborhood radius to 0.75±0.25 km, rather than too short (<0.25 km) or too long (>2 km)
• 0.75 km is not only a comfortable walking distance to bus and taxi stops, but also sufficient to capture the outdoor activities of estate neighborhoods

Model Evaluation
• Our method and FenchelRank achieve the best and second-best ranking accuracy in top-k ranking
• Our method keeps a balance between top-k and overall ranking

Conclusions
• High-value house discovery
  • Investment-value-based real estate ranking
  • A novel geo-business problem
  • Features from online user reviews capture explicit opinions of POIs near an estate
  • Features from offline moving behaviors capture implicit geographical preferences of mobile users
  • Real estate ranking by combining prediction accuracy, ranking consistency, and sparsity regularization
• Benefits
  • Online user reviews and offline moving behaviors better sense people's up-to-date geo-preferences toward real estates, and more cheaply
  • All aspects of the feature engineering are of interest to industry
  • Joint modeling of prediction accuracy, ranking consistency, and sparsity regularization in a unified framework
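
Appendix: a toy sketch of joint accuracy, ranking, and sparsity

The slides name three ingredients of the ranking objective (prediction accuracy, ranking consistency, sparsity regularization) but do not reproduce the formula. The sketch below is only a rough illustration of how such terms can be combined. It is not the authors' MAP formulation. It substitutes a squared loss, a pairwise logistic loss, and an L1 penalty, minimized by proximal gradient descent. All function names, the synthetic data, and the hyperparameters (lam, alpha, lr, epochs) are assumptions made for this example.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_sparse_ranker(X, y, lam=0.05, alpha=0.5, lr=0.05, epochs=300):
    """Minimize (hypothetical stand-in objective, not the paper's):
    mean squared error + alpha * mean pairwise logistic loss + lam * ||w||_1
    """
    n, d = len(X), len(X[0])
    # Every ordered pair (i, j) where estate i should rank above estate j.
    pairs = [(i, j) for i in range(n) for j in range(n) if y[i] > y[j]]
    n_pairs = max(len(pairs), 1)
    w = [0.0] * d
    for _ in range(epochs):
        scores = [dot(w, x) for x in X]
        grad = [0.0] * d
        # Prediction accuracy: gradient of the mean squared error.
        for i in range(n):
            r = 2.0 * (scores[i] - y[i]) / n
            for k in range(d):
                grad[k] += r * X[i][k]
        # Ranking consistency: gradient of log(1 + exp(-(s_i - s_j))).
        for i, j in pairs:
            margin = min(scores[i] - scores[j], 500.0)  # avoid overflow
            g = -alpha / (1.0 + math.exp(margin)) / n_pairs
            for k in range(d):
                grad[k] += g * (X[i][k] - X[j][k])
        # Sparsity: gradient step, then soft-thresholding (L1 proximal step).
        w = [soft_threshold(wk - lr * gk, lr * lam) for wk, gk in zip(w, grad)]
    return w


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Synthetic data (made up): only features 0 and 1 drive the "investment value".
random.seed(0)
X = [[random.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(30)]
y = [2.0 * x[0] - 1.0 * x[1] + random.gauss(0.0, 0.1) for x in X]

w = fit_sparse_ranker(X, y)
tau = kendall_tau([dot(w, x) for x in X], y)
print("weights:", [round(wk, 3) for wk in w])
print("Kendall's tau on the training data:", round(tau, 3))
```

In this toy setting the L1 step shrinks the weights of the four irrelevant features toward zero, while the pairwise term rewards orderings that agree with the ground truth. Kendall's tau, one of the evaluation metrics listed in the deck, is computed here on the training data only, so it says nothing about generalization.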