
Sparse Real Estate Ranking with Online User Reviews and Offline Moving Behaviors
Yanjie Fu
Hui Xiong, Yu Zheng, Yong Ge, Zijun Yao, Yanchi Liu, Jing Yuan
Rutgers, the State University of New Jersey
ICDM2014@Shenzhen, China
Agenda
Background and Motivation
Problem Statement
Methodology
Evaluation
Conclusions

Why Housing Matters
A house is a sign of your ability to settle down with your family.
Without a house, you might…
Darling, I am so sorry.
You have no house.
Young man, go home,
and buy a new house.
With a house,
you might…
Yes. I do!
Make extra money as an investment option.
Real Estate Investment Value
Market value
• The price at which an estate would trade in the marketplace
Investment value
• The growth potential of resale value
• The motivation to enter the estate market
We don’t predict future price!
We predict investment potential!
Quantifying Investment Value
Estate investment return rate of a given market period
Prepare the ground-truth investment values of estates for training data
• Identify the rising market period and the falling market period of Beijing
• Calculate the investment return of each real estate during the rising market period and the falling market period
• Estate grading (5>4>3>2>1) by finding inflection points
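The grading steps above could be sketched roughly as follows. This is a minimal illustration, not the talk's exact procedure: the return rate is assumed to be the relative price change over the period, and the "inflection points" are approximated by cutting the sorted returns at their largest gaps. The function names `return_rate` and `grade_by_largest_gaps` are hypothetical.

```python
from typing import Dict


def return_rate(start_price: float, end_price: float) -> float:
    """Relative price change over one market period (assumed definition)."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return (end_price - start_price) / start_price


def grade_by_largest_gaps(returns: Dict[str, float], n_grades: int = 5) -> Dict[str, int]:
    """Assign grades n_grades..1 by cutting the sorted returns at the largest gaps.

    This is a stand-in for the 'inflection points' on the slide; the highest
    grade goes to the group with the highest returns.
    """
    if len(returns) < n_grades:
        raise ValueError("need at least n_grades estates")
    ordered = sorted(returns, key=returns.get, reverse=True)
    # gaps[i] is the drop in return between ordered[i] and ordered[i + 1]
    gaps = [returns[ordered[i]] - returns[ordered[i + 1]] for i in range(len(ordered) - 1)]
    largest = sorted(range(len(gaps)), key=gaps.__getitem__, reverse=True)[: n_grades - 1]
    cuts = set(largest)
    grades: Dict[str, int] = {}
    grade = n_grades
    for i, estate in enumerate(ordered):
        grades[estate] = grade
        if i in cuts:  # a large gap follows this estate, so the next group is one grade lower
            grade -= 1
    return grades
```

Running the grading once on rising-market returns and once on falling-market returns would give one set of ground-truth grades per market period.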

Existing Housing Analysis Methods
Housing indexes
• FHFA/OFHEO, S&P/Case-Shiller Indices, FNC Residential Price Index
• Suited to regional housing analysis rather than a specific house
Financial time series analysis
• Trend, periodicity, and volatility of housing price time series
• Noisy: speculative demand and government policies affect prices
Correlating estate value to the static statistics of urban infrastructure
• The number and distance of bus stops, subway stations, road network entries, and POIs
• Physical facilities have both positive and negative effects
• Train stations bring noise and pollution and degrade estate value
• Lacks dynamics and can hardly reflect the changing pulse of a city
What Better Reflects Estate Investment Value?
Consumer psychology
People’s opinions and estate investment value
• If people have a better opinion of an estate, the demand for this estate is higher and its investment value will be higher
Uncovering people’s opinions of an estate from user-generated, estate-related dynamic data
Online User Reviews
Yelp
Foursquare
Yahoo Local Listing
Google Local for Business
These sites show the explicit opinions of mobile users about places surrounding an estate
Offline Moving Behaviors (1)
Cell-Tower Traces
Taxicab GPS Traces
Check-in Traces
Bus Traces
These data better sense the dynamic pulse of the city, compared with the static statistics of urban infrastructure
Offline Moving Behaviors (2)
Taxi transits
• Fast and expensive
• Central business district and financial areas
Bus transits
• Slow and cheap
• Information technology and education areas
Checkins
• Walking portion of mobility
• Areas full of attractions, entertainment, and POIs
Encode the static statistics of urban infrastructure
Reflect the implicit “opinions” of mobile users for a neighborhood
[Figure panels: Taxi drop-off points, Bus drop-off points, Checkins]
Problem Definition
Given
• Estates with locations and historical prices
• Online User Reviews (ratings and comments for business venues/points of interest)
• Offline Moving Behaviors (taxi traces, bus traces, mobile check-ins)
Objective
• Rank estates based on their investment values
Core tasks
• Extract discriminative features that reflect residents’ opinions of estates
• Learn an estate ranking predictor by combining a pairwise ranking objective and a sparsity regularization

Methodology Overview
[Figure: framework overview in three stages. Estate Feature Extraction: business reviews and bus/taxi/checkin moving behaviors feed business review feature extraction and human mobility feature extraction, giving the profile vectors of real estates (neighborhood profiling: direction, volume, popularity, velocity, heterogeneity, topic, density, contrast). Estate Grading: prices of estates give estate grades for rising market data and falling market data. Sparse Estate Ranking: a mixture objective of pairwise consistency and sparsity regularization yields the estate ranking predictor, with a rising market model and a falling market model.]
Features from Online User Reviews
Overall Satisfaction
Service Quality
Environment Class
Consumption Cost
Functionality Planning
Features from Offline Moving Behaviors
Taxi Arriving Volume
Taxi Leaving Volume
Taxi Transition Volume
Taxi Driving Velocity
Taxi Commute Distance
Bus Arriving Volume
Bus Leaving Volume
Bus Transition Volume
Bus Stop Density
Popularity of Checkin
• Propagating word-of-mouth from POI to neighborhood
Topic Profile of Checkin
• Textual profiling from words to topics
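As an illustration of how a volume feature in this list might be computed, the sketch below counts taxi drop-offs within a fixed radius of an estate. It assumes drop-off points are available as (latitude, longitude) pairs and uses the haversine great-circle distance; the names `haversine_km` and `arriving_volume` are hypothetical, and the default radius of 0.75 km follows the radius recommended in the evaluation part of this talk.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def arriving_volume(estate: Point, dropoffs: Iterable[Point], radius_km: float = 0.75) -> int:
    """Number of drop-off points that fall inside the estate's neighborhood."""
    return sum(1 for p in dropoffs if haversine_km(estate, p) <= radius_km)
```

The leaving volume and the bus counterparts would follow the same pattern with pick-up points or bus boarding and alighting records in place of taxi drop-offs.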
Predicting Estate Investment Value
[Figure: estate investment value linked to features of user reviews, features of taxi trajectories, features of smart card transactions, and features of checkins]
Modeling Ranking Objective
Prediction Accuracy
• Maximizing the likelihood ≈ minimizing the squared loss
Ranking Consistency
• A ranked list of estates (e.g., a>b>c>d) is viewed as a directed graph
• Nodes are real estates
• A directed edge A → B means A ranks higher than B
• Our model generates edges with a certain probability
• Maximizing the likelihood ≈ minimizing the ranking loss of the graph-based ranking structure
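The two objective terms above can be sketched as follows. The slides do not show the exact edge probability, so this sketch assumes a logistic form, P(A → B) = sigmoid(score_A - score_B), which is a common choice for pairwise ranking; the function names are hypothetical.

```python
import math
from typing import Sequence


def squared_loss(scores: Sequence[float], targets: Sequence[float]) -> float:
    """Prediction accuracy term: sum of squared errors."""
    return sum((s - t) ** 2 for s, t in zip(scores, targets))


def pairwise_nll(scores: Sequence[float], grades: Sequence[float]) -> float:
    """Ranking consistency term: negative log-likelihood of the ground-truth edges.

    An edge i -> j exists whenever estate i has a higher grade than estate j.
    Its probability is assumed to be sigmoid(scores[i] - scores[j]), so its
    negative log-likelihood is log(1 + exp(-(scores[i] - scores[j]))).
    """
    loss = 0.0
    for i, grade_i in enumerate(grades):
        for j, grade_j in enumerate(grades):
            if grade_i > grade_j:
                loss += math.log1p(math.exp(-(scores[i] - scores[j])))
    return loss
```

Scores that agree with the ground-truth order give a smaller `pairwise_nll` than scores that reverse it, which is the sense in which maximizing the likelihood minimizes the ranking loss.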

Incorporating Sparsity Regularization into Estate Ranking
Extracting a large number of estate-related features
• Features are correlated and redundant
• A small number of good features can determine the ranking of estates based on investment values
Two steps in the classic method
• Feature selection
• Fit the selected features with a ranking model
• The selected feature subset may not be optimal for ranking because the two steps are modelled separately
Combining sparsity and ranking in a unified model
• Enforce sparse representations during learning by setting some feature weights to zero, which avoids overfitting
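To show how learning can set feature weights exactly to zero, here is a minimal sketch of one proximal gradient step with an L1 penalty (soft thresholding). This is a stand-in technique chosen for illustration; the slides do not specify the sparsity regularizer used in the talk, and the names `soft_threshold` and `proximal_step` are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def proximal_step(
    weights: Sequence[float],
    gradients: Sequence[float],
    learning_rate: float,
    l1_strength: float,
) -> List[float]:
    """One gradient step on the ranking loss followed by the L1 proximal operator."""
    return [
        soft_threshold(w - learning_rate * g, learning_rate * l1_strength)
        for w, g in zip(weights, gradients)
    ]
```

Because the thresholding happens inside every update, feature selection and ranking are fitted together rather than in two separate steps.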
Solving The Ranking Objective
Log of posterior
Maximum a posteriori (MAP) estimation
Experimental Data
Beijing real-world data
Beijing estate data
• 2851 estates with transaction records from 04/2011 to 09/2012
• Falling market (04/2011 to 02/2012) and rising market (02/2012 to 09/2012)
Beijing Taxi Traces
Beijing Smart Card Transactions
Beijing Check-Ins
Beijing Business Review
Evaluation Methods and Metrics
Baseline algorithms
• MART: a boosted tree model, specifically a linear combination of the outputs of a set of regression trees
• RankBoost: a boosted pairwise ranking method that trains multiple weak rankers and combines their outputs into the final ranking
• LambdaMART: the boosted tree version of LambdaRank, which is based on RankNet
• Coordinate Ascent: uses domination loss and applies coordinate descent for optimization
• FenchelRank: designed for solving the sparse ranking problem with an L1 constraint
Evaluation metrics
• Normalized Discounted Cumulative Gain (NDCG)
• Precision
• Recall
• Kendall’s Tau Coefficient
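Two of the metrics listed above can be sketched in a few lines. The sketch assumes the common exponential-gain form of NDCG, with gain 2^grade - 1 and a log2 position discount, and the tau-a variant of Kendall's coefficient, which ignores ties; the talk may use different variants, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(grades_in_ranked_order: Sequence[int], k: int) -> float:
    """Discounted cumulative gain of the top-k items of a ranked list."""
    return sum(
        (2 ** grade - 1) / math.log2(position + 2)
        for position, grade in enumerate(grades_in_ranked_order[:k])
    )


def ndcg_at_k(grades_in_ranked_order: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (grade-sorted) ranking."""
    ideal = dcg_at_k(sorted(grades_in_ranked_order, reverse=True), k)
    return dcg_at_k(grades_in_ranked_order, k) / ideal if ideal > 0 else 0.0


def kendall_tau(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (predicted[i] - predicted[j]) * (truth[i] - truth[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Here `ndcg_at_k` takes the ground-truth grades listed in the order the model ranked the estates, so it measures top-k quality, while `kendall_tau` compares the whole predicted order with the ground truth.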
Correlation Analysis
If the heterogeneity of functional planning is too high or too low, the house will be low-valued
If the commute distance is short, the house is close to important places and business venues
Feature Evaluation on Different Sources
Business reviews and checkins perform better than taxi and bus traces
• Checkins and reviews represent the attending phase
• Taxi and bus traces represent the moving phase
Taxi features perform better than bus features in the falling market
• Taxi mobility represents white-collar and business people
• Bus mobility represents the middle class
Feature Evaluation on Different Radiuses
We recommend setting the neighborhood radius to 0.75±0.25 km, rather than too short (<0.25 km) or too long (>2 km)
0.75 km is not only a comfortable walking distance to bus and taxi stops, but also sufficient to capture the outdoor activities of estate neighborhoods
Model Evaluation
Our method and FenchelRank achieve the best and second-best ranking accuracy, respectively, in top-k ranking
Our method keeps a balance between top-k and overall ranking
Conclusions
High-value house discovery
• A novel geo-business problem
Investment-value based real estate ranking
• Features from online user reviews to capture explicit opinions for POIs near an estate
• Features from offline moving behaviors to capture implicit geographical preferences of mobile users
• Real estate ranking by combining prediction accuracy, ranking consistency, and sparsity regularization
Benefits
• Online user reviews and offline moving behaviors better sense the up-to-date geo-preferences of people toward real estates in a cheaper way
• All aspects of feature engineering are of interest to industry
• Joint modeling of prediction accuracy, ranking consistency, and sparsity regularization in a unified framework