giasan.vn real-estate analytics: a Vietnam case study
-
1.
Real-estate analytics: a Vietnam case study
Viet-Trung Tran
School of Information and Communication Technology
Hanoi University of Science and Technology
-
2.
Outline
• Problem
• Where big data analytics might help
• Geographically weighted regression for
property evaluation
• Conclusion
-
3.
Problem
• A nationwide database is needed to support investors and home buyers.
– "After more than twenty years of international integration and development, information on Vietnam's real-estate market is still rated as lacking transparency."
-
4.
Where’s my data?
• The good
– Property listings are mostly public on the web
• The bad
– Thousands of sites
– Semi-structured text, needs NLP
• The ugly
– Spam/duplication
– Fake, incorrect, low-quality data
-
5.
There is a boom in real-estate trading floors, and many of them use methods similar to those adopted by multi-level marketing companies, such as sending messages to customers and supplying misleading information about property products, causing price bubbles.
-
6.
News site ABC
News site XYZ
-
7.
Vietnam real estate vs. stock market
• Real estate
– 300 billion USD (FPT Securities, 2015)
– Lack of high-quality data, lots of scams
– Weak governmental control
– No national database
• Stock market
– 33 billion USD (quandl.com)
– Clear reports and plots, curated data
– Strong governmental control
– Centralized, real-time monitoring
-
8.
Vietnam real estate vs. goods e-commerce
• Real estate
– High value, high return on investment
– Immobile
• Goods
– Low value, no return on investment
– Mobile, depreciate over time
Vietnamese property listings are marketed in the same way as fridges and televisions.
-
9.
Where big data analytics might help
• Index the whole housing market
– 8.5 million listings so far (02/2017)
• Deliver real-time market insights
– Powered by machine learning and Vietnamese language processing
• Market data transparency for all
• Save time and avoid overpaying, for buyers
-
10.
Big data systems
[Architecture diagram: crawlers; QC (filters/deduplication); natural language processing; distributed database; report; chatbot; website]
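The deduplication part of the QC step could start from something like the following. This is a minimal sketch, assuming that duplicates differ only in case, punctuation, and spacing; the names `fingerprint` and `deduplicate` are hypothetical and are not taken from the giasan.vn system. Near-duplicates with reworded text would need a fuzzier technique.

```python
import hashlib
import re
import unicodedata


def fingerprint(listing_text: str) -> str:
    """Hash the listing text after light normalization."""
    text = unicodedata.normalize("NFC", listing_text).lower()
    text = re.sub(r"[^\w\s]", " ", text)      # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def deduplicate(listings):
    """Keep the first listing seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for item in listings:
        key = fingerprint(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Hashing a normalized form keeps the check to a set lookup per listing, which matters when the same advertisement is reposted across thousands of sites.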
-
11.
Vietnamese language processing
• Tasks
– Named Entity Recognition (NER)
– Vietnamese address normalization (Critical!)
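To illustrate why address normalization is critical, here is a minimal sketch that expands a few common Vietnamese address abbreviations and tidies separators. The abbreviation table and the function name `normalize_address` are assumptions made for illustration; a production normalizer would also need a gazetteer of streets, wards, and districts.

```python
import re
import unicodedata

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "tp": "thành phố",  # city
    "q": "quận",        # urban district
    "p": "phường",      # ward
    "đ": "đường",       # street
}

# An abbreviation counts only when followed by a dot ("Q. 1") or a digit ("Q1").
_ABBREV_RE = re.compile(r"\b(tp|q|p|đ)(?:\.\s*|(?=\d))")


def normalize_address(raw: str) -> str:
    """Lowercase, expand abbreviations, and clean up comma-separated parts."""
    text = unicodedata.normalize("NFC", raw).lower()
    text = _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1)] + " ", text)
    parts = [re.sub(r"\s+", " ", part).strip() for part in text.split(",")]
    result = ", ".join(part for part in parts if part)
    return unicodedata.normalize("NFC", result)
```

With this sketch, `"Q1"` and `"Q. 1"` both become `"quận 1"`, so listings written in different styles can be grouped under the same district.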
-
12.
Big data systems
• Tasks
– Price timelines for each street, ward, district, and city
– Automatic property evaluation
– More analytics in the future
• About our data
– 8.5 million listings (so far)
– Stored on HBase
– Processed on Spark
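The price-timeline task can be pictured as a group-by over location and month. The sketch below is a single-machine illustration only: the field names (`district`, `date`, `price`, `area_m2`) and the choice of the median price per square metre are assumptions, not the schema or statistic used by the authors.

```python
from collections import defaultdict
from statistics import median


def price_timeline(listings, level="district"):
    """Median price per square metre for each (location, month) pair."""
    buckets = defaultdict(list)
    for item in listings:
        month = item["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        buckets[(item[level], month)].append(item["price"] / item["area_m2"])
    return {key: median(values) for key, values in sorted(buckets.items())}
```

Passing `level="ward"` or `level="street"` would produce the finer-grained timelines, provided the listings carry those fields after address normalization.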
-
13.
Prototype (so far)
-
14.
Automatic property evaluation
• Tran, Hung Tien, Hiep Tuan Nguyen, and Viet-Trung Tran. "Large-scale geographically weighted regression on Spark." Knowledge and Systems Engineering (KSE), 2016 Eighth International Conference on. IEEE, 2016.
GWR + Spark =
– Large-scale spatial data
– Improved performance
– Distributed
First Law of Geography – Waldo Tobler:
“Everything is related to everything else, but near things are more related than distant things.”
-
15.
Background
• First Law of Geography – Waldo Tobler:
“Everything is related to everything else, but near things are more related than distant things.”
• GWR model
$$y_i(u) = \beta_{0i}(u) + \beta_{1i}(u)\,x_{1i} + \beta_{2i}(u)\,x_{2i} + \dots + \beta_{mi}(u)\,x_{mi}$$
– The weighted least-squares estimator takes the form
$$\hat{\beta}(u) = \left(X^{T} W(u) X\right)^{-1} X^{T} W(u) Y$$
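The estimator can be evaluated directly for a single regression point u once the diagonal of W(u) is known. The following is a minimal pure-Python sketch of that computation, not the authors' Spark implementation; the function names `solve_linear_system` and `local_estimate` are hypothetical, and an intercept column is added as an assumption.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_estimate(features, targets, weights):
    """Return beta_hat(u) = (X^T W X)^-1 X^T W Y for one regression point u.

    `weights` holds the diagonal of W(u), one value per training point.
    """
    rows = [[1.0] + list(x) for x in features]  # prepend the intercept column
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)
```

Because W(u) changes with every regression point, this solve has to be repeated once per point, which is what makes GWR expensive on large datasets.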
-
16.
Background
• Kernel function
– Gaussian function
• Bandwidth
– Fixed bandwidth
– Adaptive bandwidth
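A Gaussian kernel turns distances from the regression point into weights, and the bandwidth controls how quickly those weights decay. The sketch below assumes the common form exp(-0.5 (d/h)^2) and, for the adaptive case, takes the bandwidth to be the distance to the k-th nearest training point; both choices are assumptions, since the slides do not give the exact formulas.

```python
import math


def gaussian_weights(distances, bandwidth):
    """Gaussian kernel weight for each distance, given bandwidth h > 0."""
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]


def adaptive_bandwidth(distances, k):
    """Distance to the k-th nearest training point (k >= 1).

    If the regression point is itself a training point, its zero
    distance counts as the first neighbour.
    """
    return sorted(distances)[k - 1]
```

With a fixed bandwidth the same h is used everywhere; with an adaptive bandwidth h grows in sparse areas and shrinks in dense ones, so each local model sees a similar number of influential points.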
-
17.
Problem
• Estimating a local model
• Bandwidth selection
– Which bandwidth is good?
• Evaluation model
– Choose the kernel function
$$\hat{\beta}(u) = \left(X^{T} W(u) X\right)^{-1} X^{T} W(u) Y$$
Source: http://rose.bris.ac.uk
Computational cost: $O(n^3)$
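One common answer to the bandwidth question in the GWR literature is leave-one-out cross-validation. The slides do not say which criterion the authors use, so the score below is an assumption, written with h for the bandwidth and $\hat{y}_{\neq i}(h)$ for the prediction at point i from a model fitted without that point:

```latex
\mathrm{CV}(h) = \sum_{i=1}^{n} \left( y_i - \hat{y}_{\neq i}(h) \right)^2,
\qquad
h^{*} = \arg\min_{h} \mathrm{CV}(h)
```

Each evaluation of CV(h) requires fitting a local model at every data point, so the search over h multiplies the cost of an already expensive estimator.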
-
18.
Problem
• How can the model be applied to large-scale data?
– Data points
– Features
– Regression points
-
19.
Large-Scale GWR on Spark
• Why Spark?
– In-memory cluster-computing platform
– Parallel programming
– Resilient distributed datasets
-
20.
Large-Scale GWR on Spark
• We propose three approaches to scaling GWR
– Scaling Weighted Linear Regression (WLR)
– Parallel Multiple WLR models
– Parallel Geographically Weighted Regression
(combines the first two approaches)
-
21.
Scalable GWR on Spark
• Naïve approach – Scaling Weighted Linear Regression
[Diagram: for each regPoint, compute weight, fit Weighted Linear Regression, summary model; the weight computation and the WLR model computation each run in parallel]
-
22.
Scalable GWR on Spark
• Parallel Multiple WLR models
[Diagram: regression dataset and training dataset; compute weight; multiple WLR models computed in parallel; summary]
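The idea of fitting one WLR model per regression point in parallel can be sketched on a single machine. The code below is not the authors' Spark code: it uses a Python thread pool as a stand-in for Spark tasks, restricts the model to one feature plus an intercept so the fit has a closed form, and assumes a Gaussian kernel. The names `fit_local_model` and `fit_all` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_local_model(u, training, bandwidth):
    """Weighted simple linear regression y = b0 + b1 * x around point u.

    `training` is a list of (coordinate, x, y) tuples.
    """
    w = [math.exp(-0.5 * (math.dist(u, c) / bandwidth) ** 2)
         for c, _, _ in training]
    total = sum(w)
    mean_x = sum(wi * x for wi, (_, x, _) in zip(w, training)) / total
    mean_y = sum(wi * y for wi, (_, _, y) in zip(w, training)) / total
    sxy = sum(wi * (x - mean_x) * (y - mean_y)
              for wi, (_, x, y) in zip(w, training))
    sxx = sum(wi * (x - mean_x) ** 2 for wi, (_, x, _) in zip(w, training))
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)


def fit_all(reg_points, training, bandwidth, workers=4):
    """Fit one independent local model per regression point, in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda u: fit_local_model(u, training, bandwidth), reg_points))
```

The local models do not depend on each other, which is why the work splits cleanly across workers. CPython threads only show the structure here; real speed-ups need separate processes or a cluster framework such as Spark.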
-
23.
Scalable GWR on Spark
• Parallel Geographically Weighted Regression
[Diagram: partitions of the regression dataset (R) and the training dataset (T) are combined into a combined dataset (RT), on which the distributed GWR computation runs]
-
24.
Experiments
• Environment
– Cluster: 8 nodes on Amazon Web Services
• 4 cores, Intel Xeon E5-2670 v2, 2.5 GHz
• 16 GB RAM, 2×40 GB SSD
• Hadoop 2.7.2 and Spark 1.6.1
– Dataset
|-- x: double (nullable = false)
|-- y: double (nullable = false)
|-- label: double (nullable = false)
|-- features: vector (nullable = false)
-
25.
Large training dataset
[Figure: running time (sec) vs. number of training points (10,000 to 5,000,000) for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]
-
26.
Large regression dataset
[Figure: running time (sec) vs. number of regression points (1,000 to 50,000) for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]
-
27.
Cluster performance
[Figure: running time (sec) on 2-node, 4-node, and 8-node clusters for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]
-
28.
Land value prediction (GWR)
-
29.
Land value heat map
-
30.
-
31.
Conclusion
• Vietnam real-estate analytics just works!
– Large-scale crawlers
– Big data systems
– Specialized NLP for the listing corpus
• However
– A lot of undiscovered value in the data
– A lot of room to improve and to research
Call for collaboration!
-
32.
Thank you for your attention!
trungtv@soict.hust.edu.vn
Source: https://slideshare.net/microlife/