An introduction to Machine Learning applied to Real Estate Estimation

I’m currently looking for a house in Montpellier, and I keep wondering whether the displayed prices can be negotiated. So instead of thinking for myself (I’m very bad at business…), I prefer to let the computer do it for me.

To get a rough estimate of a property’s price, you can take the average price per square meter and multiply it by the property’s living area. But many other criteria influence the final price: for example, the land area in the case of houses. Let’s see how machine learning can help us refine our estimate by taking into account more than just the living area.

Building our Model

We will suppose, for this very simple model, that real estate prices mainly depend on living area and land area. Our goal is to have a program that takes those 2 values as input and gives us the price of the house:

house_price = compute_price([living_area, land_area])

By convention, the inputs of the model are called the “Xs”, and the outputs are called the “Ys”. The model is a function that computes a Y from a given X.

Unfortunately, there is no such thing as “The Model to rule them all”. There are dozens and dozens of models, more or less complicated, more or less easy to use, and more or less suited to certain categories of problems. For example, assuming a “price per square meter” is already a model in itself: you assume the price grows linearly with living area. That may not be the case. It may grow exponentially, or plateau at some point, or do some crazy things like “a house under 100 m² costs €300.000, a house over 100 m² costs €500.000, and that’s it”. But you assume a linear variation is a good approximation of what happens.
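
To make this concrete, here is a minimal sketch of two such hand-written models: the price-per-square-meter rule and the “crazy” step rule quoted above. The function names and the 4500 €/m² figure are placeholders chosen for illustration, not real market data.

```python
# Hypothetical average price per square meter (illustrative value only).
PRICE_PER_M2 = 4500


def per_square_meter_price(living_area):
    """Linear model: the price grows proportionally with living area."""
    return PRICE_PER_M2 * living_area


def step_model_price(living_area):
    """Step model using the two prices from the text.

    The text does not say what happens at exactly 100 m²;
    this sketch arbitrarily puts it in the upper bracket.
    """
    return 300_000 if living_area < 100 else 500_000


for area in (80, 120):
    print(area, per_square_meter_price(area), step_model_price(area))
```

Both functions take an X and return a Y, so both are models in the sense used here. They simply encode different assumptions about how price reacts to living area.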

We can assume the final price also varies linearly with land area: the bigger the land, the more expensive the house. So let’s create a simple model that assumes the house price is a linear function of living_area and land_area.
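
Written as a formula, this assumption looks roughly like the following. The symbols a, b and c are notation introduced here for illustration: a is the price of one extra square meter of living area, b the price of one extra square meter of land, and c a constant base amount.

```latex
\text{house\_price} \approx a \cdot \text{living\_area} + b \cdot \text{land\_area} + c
```

Training the model then means finding the values of a, b and c that best match the known sales.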

In Python, we will use the linear_model module of the sklearn package (scikit-learn):

 from sklearn import linear_model

Training our Model

To train this model, we need data. There are several ways to get real estate transaction data. You can get it, for example, from https://www.immo-data.fr/. I won’t detail how to get the data here, but let’s assume you managed to collect a few dozen real estate transactions from the last year in a given town. You know, for each house, its living_area, its land_area, and its price. For each pair [living_area, land_area], you have a corresponding price. This gives us 2 lists:

  • The list of [living_area, land_area] pairs, which are the inputs of the model, the “Xs”.
  • The list of prices, which are the outputs of the model, the “Ys”.

# [living_area, land_area]
X = [[137, 800], 
     [120, 400],
     [125, 425],
     [130, 1500],
     [250, 1825],
     [351, 1612],
     [203, 4135],
     [173, 1690],
     [115, 622],
     [91, 388],
     [160, 2854],
     [106, 913],
     [86, 471],
     [80, 767],
     [135, 1497],
     [217, 2000],
     [167, 1796],
     [98, 3544],
     [120, 1007],
     [101, 3286],
     [134, 1108],
     [180, 1854],
     [137, 1784],
     [117, 1570],
     [176, 1743],
]
     
# House price
Y = [629000, 
     599000, 
     594000, 
     650000, 
     840000,
     1074648,
     590000,
     625000,
     700540,
     465170,
     1000000,
     430000,
     503900,
     370000,
     632000,
     1200000,
     700000,
     655750,
     564000,
     600000,
     568400,
     815000,
     650000,
     485800,
     724700,
]
 

We train our model on this data. It will learn how the price varies as a function of living area and land area in this particular town.

 model = linear_model.LinearRegression().fit(X, Y)
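
Once fitted, a LinearRegression model exposes what it learned through its coef_ and intercept_ attributes. The sketch below illustrates this on a tiny made-up dataset (X_demo and Y_demo are invented and follow an exact linear rule, so the learned values can be checked by eye). The same attributes can be read on the model trained above.

```python
from sklearn import linear_model

# Made-up inputs: [living_area, land_area]. Not real transactions.
X_demo = [[100, 500], [120, 400], [150, 1000], [200, 800], [90, 1500]]

# Made-up prices built from a known rule:
# price = 3000 * living_area + 100 * land_area + 50000
Y_demo = [3000 * living + 100 * land + 50000 for living, land in X_demo]

demo_model = linear_model.LinearRegression().fit(X_demo, Y_demo)

# One coefficient per input column, in the same order as the columns of X.
living_coef, land_coef = demo_model.coef_
base_price = demo_model.intercept_

print(f"Price per m² of living area: {living_coef:.0f}")
print(f"Price per m² of land: {land_coef:.0f}")
print(f"Base price: {base_price:.0f}")
```

Because the demo prices follow the rule exactly, the fit recovers the three numbers used to build them. With real transactions the fit is only approximate, and the coefficients are the best compromise across all the sales.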

Using our Model

When a new house for sale appears in that city, you can run your model to see if the price displayed is consistent with the local market.

living_area = 110
land_area = 880
house_price = model.predict([[living_area, land_area]])

In this case, the model gives a price of €589.010. If this house is listed at €590.000, that seems fair. If it’s listed at €650.000, you probably have room for negotiation.
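
To put a number on “room for negotiation”, one option is to express the difference between the asking price and the estimate as a percentage. The helper name price_gap is hypothetical, and the estimate is simply the figure quoted above.

```python
def price_gap(asking_price, estimated_price):
    """Relative gap between asking price and estimate (0.10 means 10% above)."""
    return (asking_price - estimated_price) / estimated_price


estimated = 589_010  # estimate quoted in the text

for asking in (590_000, 650_000):
    gap = price_gap(asking, estimated)
    print(f"Asking price {asking:,}: {gap:+.1%} compared to the estimate")
```

The first asking price is within a fraction of a percent of the estimate, while the second is roughly 10% above it. How large a gap is worth negotiating over remains a judgment call, especially with a model this simple.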

Conclusion

Machine learning is a huge topic that existed long before the deep learning trend, and there are models that are much simpler and require much smaller training sets than deep neural networks.

House prices depend on various things like the condition of the house, its location, its year of construction, its energy rating, etc. It could be a good exercise to build a more complex model that takes those criteria into account. 😉
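
As a starting point for that exercise, adding a numeric criterion only means adding a column to each X. The sketch below adds the year of construction. All rows and prices are made up (they follow an arbitrary linear rule), so the result says nothing about a real market. Non-numeric criteria such as location or condition would first need to be turned into numbers, which is not shown here.

```python
from sklearn import linear_model

# Made-up rows: [living_area, land_area, construction_year]
X_ext = [
    [100, 500, 1975],
    [120, 400, 1990],
    [150, 1000, 2005],
    [200, 800, 1982],
    [90, 1500, 2015],
    [130, 700, 1998],
]

# Made-up prices following an arbitrary linear rule, for demonstration only.
Y_ext = [
    3000 * living + 100 * land + 1000 * (year - 1950)
    for living, land, year in X_ext
]

ext_model = linear_model.LinearRegression().fit(X_ext, Y_ext)

# predict() expects a list of rows and returns one value per row.
estimate = ext_model.predict([[110, 880, 2000]])[0]
print(f"Estimated price: {estimate:,.0f}")
```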

I’ll leave you with the full Python code. To run it, copy/paste it into a run.py file and type:

python run.py

You may have to install scikit-learn (the package that provides the sklearn module) with:

pip install scikit-learn

from sklearn import linear_model

X = [[137, 800], 
     [120, 400],
     [125, 425],
     [130, 1500],
     [250, 1825],
     [351, 1612],
     [203, 4135],
     [173, 1690],
     [115, 622],
     [91, 388],
     [160, 2854],
     [106, 913],
     [86, 471],
     [80, 767],
     [135, 1497],
     [217, 2000],
     [167, 1796],
     [98, 3544],
     [120, 1007],
     [101, 3286],
     [134, 1108],
     [180, 1854],
     [137, 1784],
     [117, 1570],
     [176, 1743],
]
     
Y = [629000, 
     599000, 
     594000, 
     650000, 
     840000,
     1074648,
     590000,
     625000,
     700540,
     465170,
     1000000,
     430000,
     503900,
     370000,
     632000,
     1200000,
     700000,
     655750,
     564000,
     600000,
     568400,
     815000,
     650000,
     485800,
     724700,
]
     
model = linear_model.LinearRegression().fit(X, Y)

living_area = 110
land_area = 880
house_price = model.predict([[living_area, land_area]])

# predict() returns an array with one value per input row; print the single estimate
print(f"Estimated price: {house_price[0]:,.0f}")

PS: I would like to thank the Odysee team, who introduced me to the field of machine learning. “It was short but intense” 😉
