Can machine learning be used to predict where the property market is heading? The only way to find out is to try it. These are the tools I used.
1) Ubuntu
2) IPython notebook
3) Keras
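Step 1: Data collection
The Hong Kong government publishes property price and rent data online, with flats grouped into 5 classes by floor area:
Class A: under 40 square metres
Class B: 40 to 69.9 square metres
Class C: 70 to 99.9 square metres
Class D: 100 to 159.9 square metres
Class E: 160 square metres or above
(Monthly price and rent data can be downloaded from http://www.rvd.gov.hk/doc/en/statistics/)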
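The size bands above can be written as a small lookup. This is only a sketch: `size_class` is a hypothetical helper that is not part of the original notebook, and treating areas between 69.9 and 70 (and similar gaps) as belonging to the lower band is my assumption.

```python
def size_class(area_m2: float) -> str:
    """Return the class letter (A to E) for a floor area in square metres.

    Hypothetical helper for illustration; band edges follow the list above.
    """
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    if area_m2 < 40:
        return "A"
    if area_m2 < 70:
        return "B"
    if area_m2 < 100:
        return "C"
    if area_m2 < 160:
        return "D"
    return "E"


for area in (35, 40, 69.9, 120, 160):
    print(area, size_class(area))
```

The downloaded CSVs already carry a Class column, so a function like this would only be needed when classifying individual flats.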
This time I mainly use an LSTM to predict prices for Class A (Hong Kong).
Features: monthly price and monthly rent
Label: the price 3 months later
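One way to read the feature and label definition above is to pair each month's price and rent with the price three months later. The sketch below does that with NumPy; `make_supervised` and `horizon` are hypothetical names, and the original notebook may build its training windows differently.

```python
import numpy as np


def make_supervised(price, rent, horizon=3):
    """Pair each month's (price, rent) with the price `horizon` months later.

    Hypothetical helper for illustration; not taken from the original notebook.
    """
    price = np.asarray(price, dtype=float)
    rent = np.asarray(rent, dtype=float)
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if price.shape != rent.shape:
        raise ValueError("price and rent must have the same length")
    if len(price) <= horizon:
        raise ValueError("need more rows than the forecast horizon")
    # Drop the last `horizon` months from the features: they have no label yet
    features = np.column_stack([price[:-horizon], rent[:-horizon]])
    labels = price[horizon:]
    return features, labels


# First five Class A (Hong Kong) rows from the data preview in this post
price = [42663.0, 43068.0, 42683.0, 43223.0, 43316.0]
rent = [190, 196, 199, 191, 191]
X, y = make_supervised(price, rent, horizon=3)
print(X)  # features for January and February 1999
print(y)  # labels: the April and May 1999 prices
```

With five months of data only two training pairs remain, because the last three months have no known label.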
First, import the libraries
import os
import math
import datetime as dt
import numpy as np # linear algebra
import pandas
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from pandas import read_csv
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import quandl
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
Set the parameters
os.environ['TF_CPP_MIN_LOG_LEVEL']='2' #Hide TensorFlow info and warning logs
look_back = 3 #Label is the price 3 months later
batch_size = 1 #Batch size
np.random.seed(8) #Random seed
trainSplit=88 #Train/test split ratio
epochsValue=1000 #Number of training epochs
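For a time series, the train/test split is usually chronological rather than shuffled, so the model is tested on months it has never seen. Here is a minimal sketch, assuming trainSplit=88 is a percentage (this section does not show how the value is used); `chronological_split` is a hypothetical helper.

```python
def chronological_split(values, train_percent=88):
    """Split a sequence into an earlier training part and a later test part.

    Hypothetical helper; assumes `train_percent` is a percentage of the rows.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    n_train = len(values) * train_percent // 100
    return values[:n_train], values[n_train:]


# 222 months per series: 216 (1999 to 2016) plus 6 (January to June 2017)
months = list(range(222))
train, test = chronological_split(months, train_percent=88)
print(len(train), len(test))  # 195 27
```

Under that assumption, roughly the last two years of data would be held back for testing.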
Import the CSV files into DataFrames
#Read Data
data=pd.read_csv("source/DomesticPrice.csv")
data["Year"] = data["Year"].astype(str)
data["Month"] = data["Month"].astype(str)
data["Date"] = [dt.datetime.strptime(d,'%Y%m').date() for d in data["Date"].astype(str)]
#Strip stray spaces inside each value before converting to float
data["Value"] = data["Value"].astype(str).str.replace(' ','').astype(float)
rent=pd.read_csv("source/DomesticRent.csv")
rent["Year"] = rent["Year"].astype(str)
rent["Month"] = rent["Month"].astype(str)
#rent['Date'] = rent[['Year', 'Month']].apply(lambda x: ''.join(x), axis=1)
rent["Date"] = [dt.datetime.strptime(d,'%Y%m').date() for d in rent["Date"].astype(str)]
#Copy rent into the price frame (relies on both CSVs listing rows in the same order)
data['Rent'] = rent['Value']
#data.dtypes
print('Average Price by Class')
print(data.head(),'\n')
Average Price by Class
   Year Month        Date Class      Place    Value  Rent
0  1999     1  1999-01-01     A  Hong Kong  42663.0   190
1  1999     2  1999-02-01     A  Hong Kong  43068.0   196
2  1999     3  1999-03-01     A  Hong Kong  42683.0   199
3  1999     4  1999-04-01     A  Hong Kong  43223.0   191
4  1999     5  1999-05-01     A  Hong Kong  43316.0   191
#Check whether there are any null values
print(data.isnull().any())
Year False
Month False
Date False
Class False
Place False
Value False
Rent False
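Note that isnull() only flags missing values (NaN). The summary tables further down show a min of 0.0 for some Class E rows, which may mean that months with no data were recorded as 0; that reading is my assumption, not something the source files confirm. A sketch of a complementary check follows, where `count_zero_values` is a hypothetical helper and the toy rows are made up for illustration.

```python
import pandas as pd


def count_zero_values(df, column="Value"):
    """Count rows per (Class, Place) where `column` is exactly 0.

    Hypothetical helper; expects the Class and Place columns used in this post.
    """
    zeros = df[df[column] == 0]
    return zeros.groupby(["Class", "Place"]).size()


# Made-up rows, only to show the shape of the result
toy = pd.DataFrame({
    "Class": ["A", "E", "E", "E"],
    "Place": ["Hong Kong", "Kowloon", "Kowloon", "Hong Kong"],
    "Value": [42663.0, 0.0, 127965.0, 0.0],
})
zero_counts = count_zero_values(toy)
print(zero_counts)
```

Running the same function on the real data frame would show which class and district combinations contain zeros before they distort the statistics or the model.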
Before making any predictions, get to know the data you have.
#Histogram for average price by Class from 1999 to 2016
%matplotlib inline
%pylab inline
pylab.rcParams['figure.figsize'] = (20, 20)
print('\n\nAverage price from Year 1999 to 2016 - 每平方米售價')
print(round(data.where(data.Year != '2017').dropna()['Value'].groupby([data['Class'], data['Place']]).describe(percentiles=[])))
sns.distplot(data.where(data.Year != '2017').dropna()['Value'])
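Price distribution
1) Mean / std / min / max of prices by class and district, 1999 to 2016 (每平方米售價 in the output means selling price per square metre)
Taking Class A (Hong Kong) as an example, the max is about 6.5 times the min (151,462 vs 23,363), and the mean is 70,183.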
Average price from Year 1999 to 2016 - 每平方米售價
                        count      mean       std       min       50%       max
Class Place
A     Hong Kong         216.0   70183.0   40555.0   23363.0   51771.0  151462.0
      Kowloon           216.0   54120.0   32292.0   19768.0   38178.0  124574.0
      New Territories   216.0   49493.0   27801.0   19724.0   36409.0  114705.0
B     Hong Kong         216.0   76817.0   38650.0   27661.0   61530.0  153461.0
      Kowloon           216.0   60796.0   33152.0   19834.0   47356.0  128291.0
      New Territories   216.0   47274.0   23222.0   20193.0   36118.0   99686.0
C     Hong Kong         216.0   96861.0   45127.0   36005.0   82718.0  181770.0
      Kowloon           216.0   83604.0   44138.0   23706.0   73494.0  166294.0
      New Territories   216.0   55656.0   23820.0   24439.0   45115.0  107064.0
D     Hong Kong         216.0  117856.0   54801.0   40724.0  107832.0  215879.0
      Kowloon           216.0  100424.0   50509.0   32407.0   90700.0  202067.0
      New Territories   216.0   59254.0   20859.0   25747.0   54932.0  106307.0
E     Hong Kong         216.0  158109.0   76408.0       0.0  145088.0  351027.0
      Kowloon           216.0  127965.0   77787.0       0.0  113680.0  557678.0
      New Territories   216.0   62025.0   20291.0   27461.0   62935.0  131290.0
Price distribution chart, 1999 to 2016
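The gap between the lowest and highest Class A (Hong Kong) price can be checked with a couple of lines of arithmetic. The figures are copied from the 1999 to 2016 summary table; the variable names are mine.

```python
# Class A (Hong Kong), 1999 to 2016, price per square metre
class_a_hk = {"min": 23363.0, "max": 151462.0, "mean": 70183.0}

# How many times larger the highest monthly price is than the lowest
ratio = class_a_hk["max"] / class_a_hk["min"]
# The same gap expressed as a percentage increase over the lowest price
increase_pct = (class_a_hk["max"] - class_a_hk["min"]) / class_a_hk["min"] * 100

print(round(ratio, 2))         # 6.48
print(round(increase_pct, 1))  # 548.3
```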
#Histogram for average price by Class from 2017 January to June
print('\n\nAverage Price from Year 2017 January - June - 每平方米售價')
print(round(data.where(data.Year == '2017').dropna()['Value'].groupby([data['Class'], data['Place']]).describe(percentiles=[])))
sns.distplot(data.where(data.Year == '2017').dropna()['Value'])
2) Mean / std / min / max of prices by class and district, January to June 2017
In 2017, the mean for Class A (Hong Kong) is 154,314.
Average Price from Year 2017 January - June - 每平方米售價
                        count      mean       std       min       50%       max
Class Place
A     Hong Kong           6.0  154314.0    6058.0  145930.0  154687.0  160842.0
      Kowloon             6.0  126572.0    4304.0  121875.0  127037.0  132952.0
      New Territories     6.0  119037.0    4179.0  112391.0  120082.0  123121.0
B     Hong Kong           6.0  158427.0    9004.0  149245.0  157392.0  173099.0
      Kowloon             6.0  127969.0    5543.0  118786.0  128000.0  135443.0
      New Territories     6.0  104152.0    3749.0   98651.0  105234.0  109003.0
C     Hong Kong           6.0  183396.0    5055.0  177600.0  183794.0  191130.0
      Kowloon             6.0  161993.0    9420.0  149225.0  164392.0  171789.0
      New Territories     6.0  108778.0    1432.0  107401.0  108334.0  111360.0
D     Hong Kong           6.0  210984.0   11912.0  191646.0  215463.0  220623.0
      Kowloon             6.0  169772.0    9997.0  155837.0  170111.0  183973.0
      New Territories     6.0  104136.0    8928.0   92255.0  104143.0  114054.0
E     Hong Kong           6.0  242705.0   28044.0  199533.0  252744.0  275942.0
      Kowloon             6.0  248105.0  146072.0       0.0  258817.0  438692.0
      New Territories     6.0   96008.0   12629.0   80038.0   94472.0  113930.0
Price distribution chart, 2017
Rent distribution
#Histogram for rent by Class from 1999 to 2016
%matplotlib inline
pylab.rcParams['figure.figsize'] = (20, 20)
print('\n\nAverage rent from Year 1999 to 2016 - 每平方米租金')
#Filter on rent.Year (not data.Year) so the mask comes from the frame being filtered
print(round(rent.where(rent.Year != '2017').dropna()['Value'].groupby([rent['Class'], rent['Place']]).describe(percentiles=[])))
sns.distplot(rent.where(rent.Year != '2017').dropna()['Value'])
1) Mean / std / min / max of rent by class and district, 1999 to 2016 (每平方米租金 in the output means rent per square metre)
Average rent from Year 1999 to 2016 - 每平方米租金
                        count   mean    std    min    50%    max
Class Place
A     Hong Kong         216.0  262.0   90.0  146.0  236.0  470.0
      Kowloon           216.0  202.0   69.0  118.0  176.0  367.0
      New Territories   216.0  159.0   58.0   87.0  134.0  287.0
B     Hong Kong         216.0  252.0   77.0  139.0  229.0  409.0
      Kowloon           216.0  198.0   64.0  112.0  166.0  334.0
      New Territories   216.0  144.0   47.0   83.0  121.0  249.0
C     Hong Kong         216.0  299.0   69.0  181.0  281.0  437.0
      Kowloon           216.0  237.0   65.0  139.0  220.0  400.0
      New Territories   216.0  164.0   45.0   91.0  147.0  261.0
D     Hong Kong         216.0  331.0   72.0  200.0  320.0  454.0
      Kowloon           216.0  249.0   59.0  146.0  241.0  379.0
      New Territories   216.0  203.0   44.0  114.0  192.0  321.0
E     Hong Kong         216.0  382.0   78.0  244.0  372.0  565.0
      Kowloon           216.0  243.0   81.0    0.0  228.0  488.0
      New Territories   216.0  211.0   50.0    0.0  210.0  343.0
#Histogram for rent by Class from 2017 January to June
print('Average rent from Year 2017 January - June - 每平方米租金')
print(round(rent.where(rent.Year == '2017').dropna()['Value'].groupby([rent['Class'], rent['Place']]).describe(percentiles=[])))
sns.distplot(rent.where(rent.Year == '2017').dropna()['Value'])
2) Mean / std / min / max of rent by class and district, January to June 2017
Average rent from Year 2017 January - June - 每平方米租金
                        count   mean    std    min    50%    max
Class Place
A     Hong Kong           6.0  435.0   10.0  424.0  434.0  453.0
      Kowloon             6.0  342.0    9.0  331.0  340.0  357.0
      New Territories     6.0  285.0    6.0  278.0  284.0  293.0
B     Hong Kong           6.0  398.0    7.0  386.0  399.0  406.0
      Kowloon             6.0  324.0   10.0  314.0  320.0  336.0
      New Territories     6.0  248.0    9.0  240.0  244.0  262.0
C     Hong Kong           6.0  424.0    5.0  419.0  422.0  432.0
      Kowloon             6.0  353.0   14.0  334.0  355.0  366.0
      New Territories     6.0  256.0    8.0  248.0  254.0  266.0
D     Hong Kong           6.0  438.0    6.0  430.0  441.0  444.0
      Kowloon             6.0  343.0   17.0  322.0  342.0  366.0
      New Territories     6.0  248.0   11.0  235.0  244.0  267.0
E     Hong Kong           6.0  432.0   23.0  395.0  436.0  455.0
      Kowloon             6.0  381.0   53.0  331.0  364.0  465.0
      New Territories     6.0  234.0   10.0  221.0  233.0  251.0