Giasan.vn real-estate analytics: a Vietnam case study


  1. 1.

    Real-estate analytics: a Vietnam case study

    Viet-Trung Tran

    School of Information and Communication Technology

    Hanoi University of Science and Technology

  2. 2.

    Outline

    • Problem

    • Where big data analytics might help

    • Geographically weighted regression for property evaluation

    • Conclusion

  3. 3.

    Problem

    • A nationwide database is required to support investors and residential buyers.

    – "After more than twenty years of international integration and development, Vietnam's housing market is still rated as lacking transparency"

  4. 4.

    Where’s my data?

    • The good

    – Property listings are almost all public on the web

    • The bad

    – Thousands of sites

    – Semi-structured text that requires NLP

    • The ugly

    – Spam and duplication

    – Fake, incorrect, low-quality data

  5. 5.

    There is a boom in real-estate trading floors, and many of them use methods similar to those adopted by multi-level marketing companies, such as sending messages to customers and supplying misleading details about property products, which causes price bubbles.

  6. 6.

    News site ABC

    News site XYZ

  7. 7.

    Vietnam real estate vs. stock market

    • Real estate

    – 300 billion USD (FPT Securities, 2015)

    – Lack of high-quality data, lots of scams

    – Under weak governmental control

    – No national databases

    • Stock market

    – 33 billion USD (quandl.com)

    – Clear reports and plots, curated data

    – Strong governmental control

    – Centralized, real-time monitoring

  8. 8.

    Vietnam real estate vs. e-commerce goods

    • Real estate

    – High value, high return on investment

    – Immobile

    • E-commerce goods

    – Low value, no return on investment

    – Mobile, disappear over time

    Vietnam property listings are marketed in the same manner as fridges and televisions

  9. 9.

    Where big data analytics might help

    • Index the whole housing market

    – 8.5 million listings so far (02/2017)

    • Deliver real-time market insights

    – Powered by machine learning and Vietnamese language processing

    • Market data transparency for all

    • Save time and avoid overpaying, for buyers

  10. 10.

    Big data systems

    • Natural language processing

    • Crawlers

    • QC: filters/deduplication

    • Distributed database

    • Report

    • Chatbot

    • Website

  11. 11.

    Vietnamese language processing

    • Tasks

    – Named Entity Recognition (NER)

    – Vietnamese address normalization (Critical!)
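As a rough illustration of why address normalization is marked critical, the sketch below expands a few common administrative abbreviations and tidies punctuation so that listings for the same place compare equal. The abbreviation table and the function name `normalize_address` are assumptions made for this example; the slides do not describe how Giasan.vn actually implements this step.

```python
import re
import unicodedata

# Hypothetical abbreviation table (an assumption for this sketch); a real
# system would need a much larger, curated gazetteer of administrative units.
ABBREVIATIONS = {
    "tp": "Thành phố",  # city
    "q": "Quận",        # urban district
    "p": "Phường",      # ward
    "h": "Huyện",       # rural district
    "x": "Xã",          # commune
    "đ": "Đường",       # street
}

# An abbreviation is a standalone token followed by a dot, e.g. "Q." or "TP.".
_ABBR = re.compile(r"\b(tp|q|p|h|x|đ)\.\s*", flags=re.IGNORECASE)


def normalize_address(raw: str) -> str:
    """Return a canonical form of a free-text Vietnamese address."""
    # Use one Unicode form so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", raw)
    # Expand abbreviations such as "Q." -> "Quận ".
    text = _ABBR.sub(lambda m: ABBREVIATIONS[m.group(1).lower()] + " ", text)
    # Normalize spacing around commas, then collapse repeated whitespace.
    text = re.sub(r"\s*,\s*", ", ", text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_address("12 Đ. Láng ,Q.Đống Đa,  TP. Hà Nội")` yields a single comma-separated form with every unit spelled out, which can then be matched against a list of known streets, wards, and districts.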


  12. 12.

    Big data systems

    • Tasks

    – Price timelines for each street, ward, district, and city

    – Automatic property evaluation

    – More analytics in the future

    • About our data

    – 8.5 million listings (so far)

    – Stored on HBase

    – Processed on Spark
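A price timeline can be pictured as a grouped aggregate over listings. The sketch below is plain Python rather than Spark, and the record fields (`district`, `posted`, `price`, `area_m2`) are hypothetical names chosen for the example, not the actual Giasan.vn schema. The median is used here only because it is less sensitive to the spam and outliers mentioned earlier; the slides do not say which statistic the system reports.

```python
from collections import defaultdict
from statistics import median


def price_timeline(listings):
    """Return the median unit price per (district, month).

    Each listing is a dict with hypothetical keys:
    "district", "posted" (ISO date string), "price", and "area_m2".
    """
    buckets = defaultdict(list)
    for item in listings:
        month = item["posted"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        unit_price = item["price"] / item["area_m2"]
        buckets[(item["district"], month)].append(unit_price)
    # Sort keys so the timeline reads chronologically within each district.
    return {key: median(values) for key, values in sorted(buckets.items())}
```

The same grouping could be written for streets, wards, or cities by changing the first component of the key.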

  13. 13.

    Prototype (so far)

  14. 14.

    Automatic property evaluation

    • Tran, Hung Tien, Hiep Tuan Nguyen, and Viet-Trung Tran. "Large-scale geographically weighted regression on Spark." Knowledge and Systems Engineering (KSE), 2016 Eighth International Conference on. IEEE, 2016.

    • GWR + Spark =

    – Large-scale spatial data

    – Improved performance

    – Distributed

    • First Law of Geography (Waldo Tobler): "Everything is related to everything else, but near things are more related than distant things."

  15. 15.

    Background

    • First Law of Geography (Waldo Tobler): "Everything is related to everything else, but near things are more related than distant things."

    • GWR model

    – The model takes the form

    $y_i(u) = \beta_{0i}(u) + \beta_{1i}(u)\,x_{1i} + \beta_{2i}(u)\,x_{2i} + \dots + \beta_{mi}(u)\,x_{mi}$

    – The weighted least-squares estimator at location $u$ is

    $\hat{\beta}(u) = (X^{T} W(u) X)^{-1} X^{T} W(u) Y$
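To make the estimator concrete, here is a minimal pure-Python sketch that evaluates (XᵀWX)⁻¹XᵀWY for a diagonal weight matrix W. The helper names `solve` and `weighted_least_squares` are introduced for this example only. It solves the normal equations by Gaussian elimination instead of forming an explicit inverse, which is a choice made here for brevity and is not taken from the slides.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def weighted_least_squares(X, y, w):
    """Return beta = (X^T W X)^{-1} X^T W y, where W = diag(w)."""
    k = len(X[0])
    xtwx = [
        [sum(wi * row[a] * row[b] for row, wi in zip(X, w)) for b in range(k)]
        for a in range(k)
    ]
    xtwy = [sum(wi * row[a] * yi for row, yi, wi in zip(X, y, w)) for a in range(k)]
    return solve(xtwx, xtwy)
```

In GWR this fit is repeated once per regression point, each time with the weights W(u) recomputed for that location u.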

  16. 16.

    Background

    • Kernel function

    – Gaussian function

    • Bandwidth

    – Fixed bandwidth

    – Adaptive bandwidth
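The difference between a fixed and an adaptive bandwidth can be sketched as follows. The Gaussian form exp(-0.5 (d/h)²) is one common convention, and setting the adaptive bandwidth to the distance of the k-th nearest data point is likewise a common choice; both are assumptions for this example, since the slides do not give the exact definitions used.

```python
import math


def gaussian_weights(point, data_points, bandwidth):
    """Weight of each data point relative to `point` under a Gaussian kernel."""
    return [
        math.exp(-0.5 * (math.dist(point, p) / bandwidth) ** 2)
        for p in data_points
    ]


def adaptive_bandwidth(point, data_points, k):
    """Distance from `point` to its k-th nearest data point (k >= 1).

    The bandwidth shrinks where data are dense and grows where they are
    sparse. Choose k so that the result is strictly positive.
    """
    distances = sorted(math.dist(point, p) for p in data_points)
    return distances[k - 1]
```

With a fixed bandwidth, the same `bandwidth` value is passed for every regression point; with an adaptive one, `adaptive_bandwidth(point, data_points, k)` is computed first and its result is passed instead.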

  17. 17.

    Problem

    • Estimating a local model

    • Bandwidth selection

    – Which bandwidth is best?

    • Model evaluation

    – Choose kernel function

    $\hat{\beta}(u) = (X^{T} W(u) X)^{-1} X^{T} W(u) Y$

    Source: http://rose.bris.ac.uk

    Complexity: $O(n^3)$
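One criterion often used in the GWR literature to decide which bandwidth is best is the leave-one-out cross-validation score. The slides do not say which criterion the authors used, so the formula below is given only as a common example; $h$ denotes the bandwidth and $\hat{y}_{\neq i}(h)$ the prediction at point $i$ from a model fitted without observation $i$.

```latex
\mathrm{CV}(h) = \sum_{i=1}^{n} \bigl( y_i - \hat{y}_{\neq i}(h) \bigr)^{2},
\qquad
h^{*} = \arg\min_{h} \, \mathrm{CV}(h)
```

Evaluating this score requires refitting the local models for every candidate $h$, which is one reason bandwidth selection dominates the running time on large datasets.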

  18. 18.

    Problem

    • How can the model be applied to large-scale data?

    – Data points

    – Features

    – Regression points

  19. 19.

    Large-Scale GWR on Spark

    • Why Spark?

    – In-memory cluster-computing platform

    – Parallel programming

    – Resilient distributed datasets

  20. 20.

    Large-Scale GWR on Spark

    • We propose three methods for scaling GWR

    – Scaling Weighted Linear Regression

    – Parallel Multiple WLR models

    – Parallel Geographically Weighted Regression (combines the first two approaches)

  21. 21.

    Scalable GWR on Spark

    • Naïve approach – Scaling Weighted Linear Regression

    – For each regPoint:

    – Compute weights (in parallel)

    – Fit Weighted Linear Regression (compute the WLR model in parallel)

    – Summarize the model

  22. 22.

    Scalable GWR on Spark

    • Parallel Multiple WLR models

    – Inputs: regression dataset and training dataset

    – Compute weights, then fit WLR

    – Compute multiple WLR models in parallel

    – Summary
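The idea of fitting many WLR models at once can be sketched on a single machine. In the example below, each regression point gets its own locally weighted fit, and a thread pool stands in for Spark's distribution of those fits across a cluster. Everything here is an assumption for illustration: the one-feature model, the Gaussian weights, and the names `fit_local_model` and `fit_all` are not taken from the paper. In CPython, threads will not speed up this CPU-bound loop; the pool only shows the structure of the computation.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_local_model(reg_point, training, bandwidth):
    """Fit y = b0 + b1 * x around reg_point using Gaussian distance weights.

    `training` is a list of (coord, x, y) tuples, where coord is the location.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for coord, x, y in training:
        w = math.exp(-0.5 * (math.dist(reg_point, coord) / bandwidth) ** 2)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    # Closed-form solution of the 2x2 weighted normal equations.
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def fit_all(reg_points, training, bandwidth, workers=4):
    """Fit one independent local model per regression point."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda r: fit_local_model(r, training, bandwidth), reg_points)
        )
```

Because the local models do not depend on one another, the regression points can be split across workers without any communication until the results are collected.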

  23. 23.

    Scalable GWR on Spark

    • Parallel Geographically Weighted Regression

    – Regression dataset (partitions R)

    – Training dataset (partitions T)

    – Combined dataset (partitions RT)

    – Distributed GWR computation
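One way to read the combined RT dataset is as the set of all pairings between regression points and training partitions. The sketch below illustrates why such a combination can be processed in pieces: the weighted sums that make up the normal equations are additive, so partial sums computed on each pairing can be merged per regression point. This is an interpretation offered as an assumption; the slides do not spell out the combine step, and the function names and the one-feature model are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def pair_stats(reg_point, row, bandwidth):
    """Weighted sufficient statistics contributed by one training row."""
    coord, x, y = row
    w = math.exp(-0.5 * (math.dist(reg_point, coord) / bandwidth) ** 2)
    return (w, w * x, w * y, w * x * x, w * x * y)


def distributed_gwr(reg_points, training_partitions, bandwidth):
    """Fit y = b0 + b1 * x at every regression point from partitioned data."""
    totals = defaultdict(lambda: [0.0] * 5)
    # "Map": visit every (regression point, training partition) combination.
    for reg_point, partition in product(reg_points, training_partitions):
        for row in partition:
            # "Reduce by key": add partial sums for this regression point.
            for i, value in enumerate(pair_stats(reg_point, row, bandwidth)):
                totals[reg_point][i] += value
    models = {}
    for reg_point, (sw, swx, swy, swxx, swxy) in totals.items():
        b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
        b0 = (swy - b1 * swx) / sw
        models[reg_point] = (b0, b1)
    return models
```

Regression points are passed as tuples so they can serve as dictionary keys, mirroring how a keyed reduce would group partial results in a cluster framework.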

  24. 24.

    Experiments

    • Environment

    – Cluster: 8 nodes on Amazon Web Services

    • 4 cores, Intel Xeon E5-2670 v2, 2.5 GHz

    • 16 GB RAM, 2×40 GB SSD

    • Hadoop 2.7.2 and Spark 1.6.1

    – Dataset

    |-- x: double (nullable = false)

    |-- y: double (nullable = false)

    |-- label: double (nullable = false)

    |-- features: vector (nullable = false)

  25. 25.

    Large training dataset

    [Figure: running time (sec) versus number of training points (10,000; 100,000; 1,000,000; 2,000,000; 5,000,000) for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]

  26. 26.

    Large regression dataset

    [Figure: running time (sec) versus number of regression points (1,000; 5,000; 10,000; 20,000; 50,000) for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]

  27. 27.

    Cluster performance

    [Figure: running time (sec) on 2-node, 4-node, and 8-node clusters for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]

  28. 28.

    Land value prediction (GWR)

  29. 29.

    Land value heat map


  30. 30.


  31. 31.

    Conclusion

    • Vietnam real-estate analytics just works!

    – Large-scale crawlers

    – Big data systems

    – Specialized NLP for the listing corpus

    • However

    – A lot of undiscovered value in the data

    – A lot of room for improvement and further research

    Call for collaboration!

  32. 32.

    Thank you for your attention!

    trungtv@soict.hust.edu.vn

Source: https://slideshare.net/microlife/
