Journal of Geographical Sciences, Volume 30, Issue 5, 794 (2020)

A comparative study of land price estimation and mapping using regression kriging and machine learning algorithms across Fukushima prefecture, Japan

Ahmed DERDOURI1,* and Yuji MURAYAMA2
Author Affiliations
  • 1Division of Spatial Information Science, Graduate School of Life and Environmental Sciences, University of Tsukuba, Tennodai, Tsukuba, Ibaraki, Japan
  • 2Faculty of Life and Environmental Sciences, University of Tsukuba, Tennodai, Tsukuba, Ibaraki, Japan
    Figures & Tables(20)
    Fukushima prefecture and its administrative boundaries, topographic features, transportation lines, and evacuation zones after the Fukushima Daiichi Nuclear Plant disaster (as of September 2015)
    Changes in land prices averaged by land type in Fukushima prefecture (2005-2018)
    Methodological framework of the study
    The distribution of land price samples in the study area
    Fitted semi-variograms for the kriging models for the year 2015: (a) Exp: Exponential (b) Gau: Gaussian (c) Sph: Spherical. The nugget, range, and sill values and the mathematical models are shown in the bottom right corner
    The results of the regression kriging for the year 2015 using the exponential model (upper), Gaussian model (middle), and spherical model (lower). On the left are the estimated log-transformed land prices using regression kriging. On the right are the validation errors in the training samples. Capital letters denote major cities within Fukushima prefecture, which are A: Fukushima, B: Koriyama, C: Iwaki, D: Aizuwakamatsu, and E: Shirakawa
    Land price maps for the year 2015 predicted from officially published land price observations using regression kriging based on three mathematical models (ordered from left to right): (1) Krig.EXP: Exponential model, (2) Krig.GAU: Gaussian model, and (3) Krig.SPH: Spherical model
    Boxplots of the performance of the machine learning methods in terms of MAE, RMSE, and R² for the year 2015
    Observed land prices vs. predicted land prices for the year 2015 in the testing samples by different machine learning methods (ordered from left to right, top to bottom): (1) GLM: generalized linear model, (2) GAMS: generalized additive model using splines, (3) SVMLinear: support vector machines with linear kernel, (4) MARS: multivariate adaptive regression spline, (5) kNN: k-nearest neighbors, (6) SVMRadial: support vector machines with radial basis function kernel, (7) Cubist, (8) GBM: stochastic gradient boosting, and (9) RF: random forest
    Land price maps for the year 2015 predicted from officially published land price observations using machine learning algorithms (ordered from left to right, top to bottom): (1) GLM: generalized linear model, (2) GAMS: generalized additive model using splines, (3) SVMLinear: support vector machines with linear kernel, (4) MARS: multivariate adaptive regression spline, (5) kNN: k-nearest neighbors, (6) SVMRadial: support vector machines with radial basis function kernel, (7) Cubist, (8) GBM: stochastic gradient boosting, and (9) RF: random forest
    Maps of differences in the 2015 land prices between the kriging exponential model and the best-performing machine learning algorithms: (1) RF: random forest, (2) Cubist, (3) MARS: multivariate adaptive regression spline, and (4) GAMS: generalized additive model using splines. A1, A2, A3, and A4 show zoomed-in maps of Koriyama city and its outskirts
    Area percentage of RF- and krig.EXP-based estimated land price for the year 2015 distributed by predefined ranges in Fukushima prefecture and its subregions
    • Table 1.

      Descriptive list of reviewed literature regarding land price estimation/mapping grouped by estimation approach: (1) hedonic models, (2) geostatistical methods, (3) machine learning algorithms, and (4) comparison of various approaches

      | Estimation approach | Study | Study area | Method(s) | Mapping | Objective | Highlighted results |
      | Hedonic models | (Löchl, 2006) | Canton Zurich, Switzerland | Hedonic regression | Yes | Developing an estimation model of rent and land prices | Two classified maps of land prices for residential and commercial uses |
      | | (Kim and Kim, 2016) | Seoul, South Korea | OLS and spatial regression models | No | Estimation of land value using OLS and generalized regression models | Spatial error model (SEM) found to be the best of the tested models |
      | | (Hilal et al., 2016) | Côte-d’Or, France | OLS | No | Estimation of the price of agricultural lands at cadastral levels based on previous real estate transactions | Hedonic prices were calculated based on a range of attributes influencing agricultural lands, most notably time effects |
      | Geostatistical methods | (Luo and Wei, 2004) | Milwaukee, Wisconsin, USA | Kriging | No | Predicting urban land values of different land use categories using kriging models | Overall average standard error of 2% |
      | | (Chica-Olmo, 2007) | City of Granada, Spain | Kriging and cokriging | Yes | Estimating and mapping housing prices using kriging and cokriging approaches | Cokriging has a lower standard error compared with that of kriging |
      | | (Inoue et al., 2007) | Tokyo 23 wards, Japan | Kriging | Yes | Mapping estimated land prices in Tokyo’s 23 wards from 1975 to 2004 | Kriging model-based results were more accurate than those for OLS, with the average error ranging from 2% to 10% |
      | | (Tsutsumi et al., 2011) | Tokyo metropolitan area, Japan | Regression kriging | Yes | Developing a system to estimate and map residential land price in the Tokyo metropolitan area | 10% was the average error ratio for the exponential model but 18.3% for the Gaussian model |
      | | (Kuntz and Helbich, 2014) | Metropolitan area of Vienna, Austria | Kriging and cokriging | Yes | Mapping predicted real estate prices | Universal cokriging showed better results in terms of cross-validation results |
      | | (Chica-Olmo et al., 2019) | City of Granada, Spain | Regression and universal cokriging | Yes | Spatiotemporally estimating housing price variations 1988-2005 | Regression cokriging was found to be slightly better |
      | | (Palma et al., 2019) | Italy | Jackknife kriging | No | Predicting real estate prices based on socioeconomic factors for the period 2014-2016 | Accuracy of the model improved when considering the spatio-temporal correlation |
      | Machine learning algorithms | (Gu et al., 2011) | A district of Tangshan city, China | Hybrid genetic algorithm and support vector machine model (G-SVM), Grey Model (GM) | No | Forecasting housing prices | G-SVM outperformed GM in many aspects |
      | | (Antipov and Pokryshevskaya, 2012) | Saint Petersburg, Russia | Machine learning algorithms | No | Estimating the value of residential apartments | Random forest was found to be the most robust among all methods |
      | | (Wang et al., 2014) | Chongqing city, China | SVM optimized by particle swarm optimization (PSO), BP neural network | No | Forecasting real estate prices with a PSO-optimized SVM compared to a BP neural network | PSO-SVM showed higher forecasting accuracy than the BP neural network |
      | | (Park and Bae, 2015) | Fairfax County, Virginia, USA | Machine learning algorithms (C4.5, RIPPER, Naïve Bayesian, and AdaBoost) | No | Prediction of housing prices using different machine learning methods | RIPPER model outperformed all selected methods |
      | Comparison of various approaches | (Bourassa et al., 2010) | Jefferson County, Kentucky, USA | OLS, nearest neighbors, geostatistical and trend surface models | No | Comparing the outcomes of several methods estimating house prices | The geostatistical model showed better results in terms of prediction errors |
      | | (Sampathkumar et al., 2015) | Chennai metropolitan area, India | Multiple regression and neural network | No | Modeling and estimation of land prices based on economic and social factors | Neural network and multiple regression performed well with a slight superiority of the former |
      | | (Hu et al., 2016) | Wuhan city, China | Empirical Bayesian kriging (EBK), GWR, OLS | Yes | Modeling and visualizing dependency of urban residential land price and the influential variables | Estimated coefficients of variables impacting land prices depend on the location based on GWR results, which outperformed OLS |
      | | (Schernthanner et al., 2016) | Potsdam, Germany | Hedonic regression, kriging, and random forest | Yes | Comparing estimated rental prices by three methods and visualizing the outcome | RF found to be the most accurate method |
    • Table 2.

      The three mathematical models used for kriging and their abbreviations

      | Category | Model | Abbreviation | R package |
      | Geostatistical | Universal kriging, Exponential | krig.EXP | gstat (Pebesma, 2004) |
      | | Universal kriging, Gaussian | krig.GAU | |
      | | Universal kriging, Spherical | krig.SPH | |
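
The three variogram models named in Table 2 are usually written in terms of a nugget, a partial sill, and a range parameter. The following is a minimal sketch of one common textbook parametrisation, not the authors' code: the study fitted its models with the R package gstat, and the function names and the parameter values used below are illustrative assumptions.

```python
import math


def exponential(h, nugget, psill, rng):
    """Exponential model: rises quickly and approaches the sill asymptotically."""
    if h == 0:
        return 0.0  # semivariance is zero at zero lag by definition
    return nugget + psill * (1.0 - math.exp(-h / rng))


def gaussian(h, nugget, psill, rng):
    """Gaussian model: parabolic near the origin, also asymptotic to the sill."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-((h / rng) ** 2)))


def spherical(h, nugget, psill, rng):
    """Spherical model: reaches the sill exactly at the range, flat beyond it."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameters (NOT the fitted values from the paper).
NUGGET, PSILL, RANGE_M = 0.01, 0.03, 5000.0

for lag in (0.0, 1000.0, 5000.0, 20000.0):
    print(
        lag,
        round(exponential(lag, NUGGET, PSILL, RANGE_M), 4),
        round(gaussian(lag, NUGGET, PSILL, RANGE_M), 4),
        round(spherical(lag, NUGGET, PSILL, RANGE_M), 4),
    )
```

With the same nugget, partial sill, and range parameter, the three curves differ mainly in how fast they climb at short lags, which is why the choice of model can change the kriging weights given to nearby samples.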
    • Table 3.

      Summary of spatial prediction models used in this study: Linear, nonlinear, and regression trees models are grouped as proposed by Kuhn and Johnson (2013). Abbreviations are used to refer to each method in the manuscript

      | Category | Model | Abbreviation | R package |
      | Linear | Generalized linear model | GLM | base |
      | | Generalized additive model using splines | GAMS | mgcv |
      | | Support vector machines with linear kernel | SVMLinear | kernlab |
      | Nonlinear | Multivariate adaptive regression spline | MARS | earth |
      | | k-nearest neighbors | kNN | base |
      | | Support vector machines with radial basis function kernel | SVMRadial | kernlab |
      | Regression trees | Cubist | Cubist | Cubist |
      | | Stochastic gradient boosting | GBM | gbm (Ridgeway, 2005) |
      | | Random forest | RF | randomForest (Breiman, 2001) |
    • Table 4.

      List of explanatory variables selected in this study with their data sources and the related abbreviations

      | Explanatory variables | Data | GIS function | Variable description | Abbreviation |
      | Distance to the nearest railway station (m) | Railway stations | Near | Calculated using the railway stations layer | Distance |
      | Area of rice fields (m²) | Land uses within a square kilometer | Spatial Join | The areas of different land uses within one square kilometer, classified according to the National Land Numerical Information | Paddy |
      | Area of other agricultural land (m²) | | | | Agricultural |
      | Area of forests (m²) | | | | Forests |
      | Area of uncultivated land (m²) | | | | Uncultivated |
      | Area of roads (m²) | | | | Roads |
      | Area of railways (m²) | | | | Railways |
      | Area of other land uses (m²) | | | | Other uses |
      | Area of water bodies (m²) | | | | Water |
      | Area of seashore (m²) | | | | Seashore |
      | Area of the surface of the sea (m²) | | | | Sea |
      | Area of golf courses (m²) | | | | Golf |
      | Dummy variable for urbanization promoting area | Promoted urbanization areas | Spatial Join | A dummy variable: 1 if the point location falls inside the area, otherwise 0 | Promotion |
      | Population density (persons/km²) | Population | Spatial Join | Calculated using the population data of 2015 for every minor municipal district | Density |
      | Number of enterprises | Enterprises | Spatial Join | Statistical GIS data of 2015 for every minor municipal district | Enterprises |
      | Number of employees | Employees | | | Employees |
      | Elevation (m) | DEM | Extract Multi Values to Points | Elevation of the point location | Elevation |
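
The "Near" operation listed in Table 4 assigns each sample point the distance to its closest railway station. A minimal sketch of that idea follows, assuming planar coordinates in metres (for example, a projected coordinate system). The function name and all coordinates are hypothetical, and the study itself used GIS tools rather than this code.

```python
import math


def nearest_station_distance(point, stations):
    """Return the straight-line distance from `point` to the closest station.

    `point` is an (x, y) pair and `stations` a non-empty list of (x, y) pairs,
    all in the same planar coordinate system (assumed here to be in metres).
    """
    return min(math.dist(point, station) for station in stations)


# Hypothetical coordinates, for illustration only.
stations = [(3000.0, 4000.0), (10000.0, 0.0), (-2000.0, 9000.0)]
sample_point = (0.0, 0.0)

print(nearest_station_distance(sample_point, stations))
```

A brute-force scan like this is fine for roughly a thousand sample points; a spatial index would only matter for much larger layers.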
    • Table 5.

      Overview of datasets used in the study, their sources, and the year of release

      | Data layers | Source | Year |
      | Land price observations (published and prefectural) | National Land Numerical Information | 2015 |
      | Railway stations | | 2015 |
      | Land uses within 1 km² area and their areas | | 2014 |
      | Promoted urbanization areas | | 2011 |
      | Population of every minor municipal district | Statistics Bureau of Japan | 2015 |
      | Number of enterprises and employees of every minor municipal district | | |
      | DEM | USGS | - |
    • Table 6.

      Regression results with detailed explanatory variables and their estimated coefficients

      | Variables | Unit | Coefficients’ estimate |
      | Intercept | - | 4.439*** |
      | Distance to the nearest railway station | m | -2.09 × 10⁻⁵*** |
      | Population density | persons/km² | 3.104 × 10⁻⁵*** |
      | Area of rice fields | m² | -3.935 × 10⁻⁷*** |
      | Area of other agricultural land | m² | -4.731 × 10⁻⁷*** |
      | Area of forests | m² | -2.733 × 10⁻⁷*** |
      | Area of uncultivated land | m² | -7.437 × 10⁻⁷. |
      | Area of roads | m² | 7.211 × 10⁻⁷** |
      | Area of railways | m² | -3.301 × 10⁻⁸ |
      | Area of other land uses | m² | -8.97 × 10⁻⁸ |
      | Area of water bodies | m² | -3.086 × 10⁻⁷*** |
      | Area of seashore | m² | -1.922 × 10⁻⁶ |
      | Area of the surface of the sea | m² | -1.25 × 10⁻⁷ |
      | Area of golf courses | m² | -5.843 × 10⁻⁸ |
      | Dummy variable for urbanization promoting area | - | 1.819 × 10⁻¹*** |
      | Elevation | m | -1.556 × 10⁻⁴** |
      | Number of enterprises | - | 3.363 × 10⁻⁴** |
      | Number of employees | - | -2.951 × 10⁻⁵* |
      Number of samples = 1092; residual standard error = 0.1683; multiple R² = 0.7408; adjusted R² = 0.7349; F-statistic = 125.7; p-value < 2.2 × 10⁻¹⁶. *** = significant at the 1% level; ** = significant at the 5% level
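
Because the regression in Table 6 is fitted to log-transformed land prices, each coefficient gives the change in log price per unit change in its variable, with the other variables held fixed. The sketch below illustrates that reading for three of the reported coefficients. It is an illustration, not the authors' procedure: the dictionary keys and function names are hypothetical, and the base of the logarithm (10 by default here) is an assumption, since this listing does not state it.

```python
# Coefficients copied from Table 6; the dictionary keys are illustrative names.
COEFFICIENTS = {
    "distance_to_station_m": -2.09e-5,
    "population_density_per_km2": 3.104e-5,
    "elevation_m": -1.556e-4,
}


def log_price_change(variable, delta):
    """Change in predicted log price when `variable` changes by `delta`."""
    return COEFFICIENTS[variable] * delta


def price_ratio(variable, delta, log_base=10.0):
    """Multiplicative change in price implied by the same shift.

    `log_base` is an ASSUMPTION: pass math.e instead of 10.0 if the
    transformation was a natural logarithm.
    """
    return log_base ** log_price_change(variable, delta)


# Example: a site 1 km farther from the nearest railway station.
print(log_price_change("distance_to_station_m", 1000.0))
print(price_ratio("distance_to_station_m", 1000.0))
```

A ratio below 1 means the predicted price falls as the variable increases; the size of the fall depends on the assumed logarithm base.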
    • Table 7.

      Prediction errors of validation and cross-validation tests for the three kriging models

      | Mathematical models | Validation: RMSEV (%) | Cross-validation: RMSECV (%) |
      | Exponential | 15.32 | 15.1 |
      | Gaussian | 15.86 | 15.57 |
      | Spherical | 15.57 | 15.5 |
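
Table 7 compares the kriging models by root mean square error (RMSE). As a reminder of what that statistic measures, here is a minimal sketch. How the study converts the error to a percentage is not stated in this listing, so the helper below simply returns the RMSE in the units of its inputs; the sample values are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up log-price values, for illustration only.
observed = [4.0, 4.5, 5.0]
predicted = [4.1, 4.4, 5.2]

print(rmse(observed, predicted))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.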
    • Table 8.

      Prediction errors and accuracy of machine learning methods

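      | Method | | 10-fold cross-validation | | | Testing samples | Difference |
      | | | MAE (%) | RMSE (%) | R²CV (%) | R²test (%) | R²CV (%) - R²test (%) |
      | Linear | GLM | 13.50 | 17.29 | 72.47 | 59.94 | +12.53 |
      | | GAMS | 12.03 | 15.37 | 78.13 | 68.72 | +9.41 |
      | | SVMLinear | 13.38 | 17.25 | 72.73 | 59.12 | +13.61 |
      | Nonlinear | MARS | 12.11 | 15.52 | 77.90 | 70.78 | +7.12 |
      | | kNN | 13.38 | 17.35 | 72.24 | 68.03 | +4.21 |
      | | SVMRadial | 12.55 | 16.27 | 75.53 | 70.02 | +5.51 |
      | Regression tree | Cubist | 12.19 | 15.60 | 77.72 | 72.74 | +4.98 |
      | | GBM | 12.16 | 15.68 | 77.40 | 70.83 | +6.57 |
      | | RF | 11.39 | 14.97 | 79.17 | 77.68 | +1.49 |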
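
The last column of Table 8 is the gap between cross-validated and test-set R², a rough indicator of how well each method generalises to unseen samples. The sketch below recomputes that gap from the table's values and ranks the methods by it; the variable and function names are illustrative.

```python
# (R2 from 10-fold cross-validation, R2 on the testing samples), in percent,
# copied from Table 8.
R2_SCORES = {
    "GLM": (72.47, 59.94),
    "GAMS": (78.13, 68.72),
    "SVMLinear": (72.73, 59.12),
    "MARS": (77.90, 70.78),
    "kNN": (72.24, 68.03),
    "SVMRadial": (75.53, 70.02),
    "Cubist": (77.72, 72.74),
    "GBM": (77.40, 70.83),
    "RF": (79.17, 77.68),
}


def generalisation_gaps(scores):
    """Return R2_CV minus R2_test for each method, rounded to two decimals."""
    return {method: round(cv - test, 2) for method, (cv, test) in scores.items()}


gaps = generalisation_gaps(R2_SCORES)

# Methods ordered from the smallest to the largest drop on the testing samples.
ranked_by_gap = sorted(gaps, key=gaps.get)

# Method with the highest R2 on the testing samples.
best_on_test = max(R2_SCORES, key=lambda method: R2_SCORES[method][1])

print(ranked_by_gap)
print(best_on_test)
```

On these figures, random forest has both the highest test-set R² and the smallest gap, while the two linear-kernel methods (GLM and SVMLinear) lose the most between cross-validation and testing.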

    Ahmed DERDOURI, Yuji MURAYAMA. A comparative study of land price estimation and mapping using regression kriging and machine learning algorithms across Fukushima prefecture, Japan[J]. Journal of Geographical Sciences, 2020, 30(5): 794

    Paper Information

    Category: Research Articles

    Received: Feb. 19, 2019

    Accepted: Sep. 9, 2019

    Published Online: Sep. 30, 2020

    DOI:10.1007/s11442-020-1756-1
