Use of Data Mining for Analysis of Czech Real Estate Market

Acta Informatica Pragensia, 2023-01, Vol.12 (2), p.275-295 [Peer Reviewed Journal]

ISSN: 1805-4951 ;EISSN: 1805-4951 ;DOI: 10.18267/j.aip.215

  • Title:
    Use of Data Mining for Analysis of Czech Real Estate Market
  • Author: Tsakunov, Ilya ; Chudán, David
  • Subjects: data mining ; exploratory analysis ; real estate market ; web scraping
  • Is Part Of: Acta Informatica Pragensia, 2023-01, Vol.12 (2), p.275-295
  • Description: This paper analyses data from the Czech real estate market. The data were scraped from the bezrealitky.cz portal and cover both sales and rentals; a total of 3546 records with 54 attributes were obtained. Exploratory data analysis gave a basic overview of the data, identifying characteristics such as the average price of sold and rented flats. More specific results were obtained by applying data mining methods: regression (linear, lasso and ridge regression) to predict flat prices and payments for utilities, and classification (support vector machines, KNN, Gaussian naïve Bayes, decision tree and random forest) to estimate the PENB class (building energy performance certificate) and building condition. Lasso regression performed best in predicting the rent price (R² = 0.76). Among the classification tasks, the best result was achieved with random forest, which had an accuracy over 80% in some cases. Other tasks included clustering (k-means and k-modes) and anomaly detection (isolation forest). The main focus was on descriptive data mining, especially clustering. Clustering flats by geographic coordinates with the k-means algorithm (silhouette score of 0.78) showed that the most expensive flats are on average in Bohemian regions, followed by Silesia, and that the cheapest are in central Moravia. Another clustering task identified flats in the Moravian-Silesian region with very high payments for utilities (silhouette score of 0.56). The models can help estimate the value of flats based on their attributes as well as location.
  • Publisher: Prague University of Economics and Business
  • Language: English;Czech
  • Identifier: ISSN: 1805-4951
    EISSN: 1805-4951
    DOI: 10.18267/j.aip.215
  • Source: ROAD: Directory of Open Access Scholarly Resources
    DOAJ Directory of Open Access Journals
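
The description above reports silhouette scores for k-means clusters built on geographic coordinates. As a rough illustration of what that metric measures, the following is a minimal standard-library sketch, not the authors' implementation. The six coordinate pairs are made-up values, the function names `kmeans` and `silhouette_score` are hypothetical, and plain Euclidean distance on latitude/longitude is a simplifying assumption.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic start (evenly spaced input points)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other points in the same cluster.
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up (latitude, longitude) pairs forming two well-separated groups.
flats = [
    (50.08, 14.43), (50.10, 14.40), (50.05, 14.45),
    (49.20, 16.61), (49.18, 16.60), (49.22, 16.63),
]
labels, centroids = kmeans(flats, k=2)
score = silhouette_score(flats, labels)
print(labels, round(score, 2))
```

Scores close to 1 indicate compact, well-separated clusters, and scores near 0 indicate overlapping ones, which is how the reported values of 0.78 and 0.56 can be read.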