Journal of Infrared and Millimeter Waves, Volume. 44, Issue 2, 197(2025)

Urban tree species classification based on multispectral airborne LiDAR

Pei-Lun HU1...2, Yu-Wei CHEN1,*, Mohammad IMANGHOLILOO2, Markus HOLOPAINEN2, Yi-Cheng WANG3,** and Juha HYYPPÄ1
Author Affiliations
  • 1 Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute, Espoo 02150, Finland
  • 2 Department of Forest Sciences, University of Helsinki, Helsinki 00014, Finland
  • 3 Advanced Laser Technology Laboratory of Anhui Province, Hefei 230037, China

    Urban tree species provide various essential ecosystem services in cities, such as regulating urban temperatures, reducing noise, capturing carbon, and mitigating the urban heat island effect. The quality of these services is influenced by species diversity, tree health, and the distribution and composition of trees. Traditionally, data on urban trees has been collected through field surveys and manual interpretation of remote sensing images. In this study, we evaluated the effectiveness of multispectral airborne laser scanning (ALS) data in classifying 24 common urban roadside tree species in Espoo, Finland. Tree crown structure information, intensity features, and spectral data were used for classification. Eight different machine learning algorithms were tested, with the extra trees (ET) algorithm performing the best, achieving an overall accuracy of 71.7% using multispectral LiDAR data. This result highlights that integrating structural and spectral information within a single framework can improve classification accuracy. Future research will focus on identifying the most important features for species classification and developing algorithms with greater efficiency and accuracy.

    Introduction

    Today, approximately 56% of the world's population (4.4 billion people) lives in cities. Urban trees play a significant role in mitigating global climate change [1] and are uniquely susceptible to climate change impacts. The Urban Forest Effects model [2,3] is widely used in urban areas globally to estimate urban forest structure, species diversity, and ecosystem functions. However, conducting urban forest inventories is labor-intensive, especially on private properties, and the results are often not spatially detailed. While remotely sensed data are commonly used in forest applications, traditional optical remote sensing methods struggle to capture three-dimensional forest structures, especially in unevenly aged, mixed-species forests with multiple canopy layers [4].

    Airborne laser scanning (ALS) is effective for extracting biophysical variables and revising forest inventory maps. The successful use of ALS data has been demonstrated for various applications. For example, ALS has been used to estimate tree height [5,6], identify tree species [7-9], and estimate tree volume and biomass [10,11] and growth [12,13]. Tree species information at the individual tree level is particularly useful in growth and yield estimates and has been studied primarily for forest applications, such as updating forest inventories. Because single-wavelength ALS lacks spectral information, tree species classification using ALS has been studied less intensively than the mapping of other forest attributes.

    Previous studies have also revealed that combining multispectral information with 3D ALS data can improve the accuracy of tree extraction and tree species classification, because the strengths of both datasets can be exploited. However, several factors limit the effective operational use of the fused datasets [14,15]. For example, geometric and radiometric registration between two datasets is demanding because the data are normally acquired at different times using different sensors. The recently developed multispectral laser scanning technique is becoming an attractive option for forest mapping because it provides not only a dense point cloud but also spectral information, which can simplify data processing and facilitate the interpretation of data.

    Given the limitations of traditional optical remote sensing in capturing three-dimensional forest structures, it is essential to explore the potential of multispectral laser scanning for urban tree inventories, particularly for species classification. This study aims to assess the feasibility of using multispectral ALS data for urban tree species classification and to analyze the information content of features derived from point clouds and intensity data.

    1 Materials and methods

    1.1 Study area and establishment of sample plots

    The multispectral ALS datasets used in this study were acquired in a suburban area in Espoolahti, southern Finland (60°9′18″N, 24°38′24″E), in the southern Boreal Forest Zone. We selected 822 trees in this area as our field dataset. The land area is approximately 5 km². In our research, we concentrated solely on the vegetated areas, excluding the sea using a water mask created from topographic map data. The area included a diverse range of boreal tree species.

    The reference points were updated through visual interpretation of Titan data and open datasets from the City of Espoo, the National Land Survey of Finland, Google Maps, and Google Street View. Field checks validated the analysis and resolved uncertainties. The attributes of each reference point included species, geographic location, living conditions, tree height, and planting date.

    Figure 1. Map of the study area and tree samples in the research area.

    1.2 Multispectral ALS data

    Multispectral Optech Titan data (Teledyne Optech, Toronto, ON, Canada) for the study area were collected in May and June 2016 in collaboration with TerraTec Oy (Helsinki, Finland) from a flight height of 650 m. The data acquisition was carried out using a fixed-wing aircraft flying at a constant altitude. The sensor comprises three channels: shortwave infrared (1550 nm, Channel 1), near-infrared (1064 nm, Channel 2), and green (532 nm, Channel 3). Each channel provided a separate point cloud. In our preprocessed dataset, the point densities over land areas were approximately 9 points/m² for Channel 1, 9 points/m² for Channel 2, and 8 points/m² for Channel 3.

    TerraScan (TerraSolid Oy, Helsinki, Finland) was used to preprocess the ALS data and differentiate between ground and non-ground points using a standardized procedure. This procedure involved removing noise, such as points detected below the ground level or above the canopy. Subsequently, the point clouds were height-normalized: the ground elevation was subtracted from the point heights using a digital terrain model. To eliminate potential discrepancies between channels, this model was created from the classified ground points of all three channels.
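
    The height normalization described above can be illustrated with a minimal sketch. It is not the TerraScan procedure: the function name `normalize_heights`, the raster layout of the terrain model, and the `max_height` noise cut-off are assumptions made only for this example, and every point is assumed to fall inside the grid.

```python
def normalize_heights(points, dtm, cell_size, origin=(0.0, 0.0), max_height=50.0):
    """Subtract ground elevation from each point and drop obvious noise.

    points     : iterable of (x, y, z) tuples
    dtm        : 2D list of ground elevations, dtm[row][col] (hypothetical layout)
    cell_size  : DTM cell size in metres
    origin     : (x, y) of the lower-left corner of the DTM (assumed)
    max_height : assumed upper bound used to discard points above the canopy
    """
    x0, y0 = origin
    normalized = []
    for x, y, z in points:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        height = z - dtm[row][col]  # height above ground
        # Discard points below ground level or implausibly high (noise).
        if 0.0 <= height <= max_height:
            normalized.append((x, y, height))
    return normalized


dtm = [[10.0, 11.0], [12.0, 13.0]]
points = [(0.5, 0.5, 15.0), (1.5, 0.5, 10.5), (0.5, 1.5, 100.0)]
result = normalize_heights(points, dtm, cell_size=1.0)
```

    In this toy input, the second point lies below the terrain model and the third lies far above any plausible canopy, so only the first point is kept.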

    Radiometric calibration of ALS intensity is crucial to ensure successful classification. Therefore, in this study, we implemented relative radiometric calibration. We observed that the intensity values were higher in the middle of the flight path compared to other areas and decreased with scanning height. A range correction was applied to mitigate such effects.

    $$I_c = I \times \frac{D_i^2}{D_{\mathrm{ref}}^2},$$

    where $I_c$ is the corrected intensity, $I$ is the original intensity, $D_i$ is the distance from the LiDAR sensor to the point, and $D_{\mathrm{ref}}$ is the flying altitude (650 m).
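
    As a minimal sketch of the range correction above, the function below scales an intensity by the squared ratio of the measured range to the reference range. The function name `range_correct` and the sample values are illustrative assumptions, not part of the original processing chain.

```python
def range_correct(intensity, distance, ref_distance=650.0):
    """Range-correct an intensity: I_c = I * D_i^2 / D_ref^2.

    intensity    : original intensity I
    distance     : sensor-to-point distance D_i in metres
    ref_distance : reference distance D_ref (the 650 m flying altitude)
    """
    return intensity * distance ** 2 / ref_distance ** 2


# A point measured at the reference range keeps its intensity,
# while a point at a longer range is scaled up.
at_reference = range_correct(100.0, 650.0)
at_longer_range = range_correct(100.0, 700.0)
```

    Points recorded at a range longer than the reference distance receive a higher corrected intensity, which compensates for the range-dependent loss in the returned signal.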

    1.3 Creating canopy height model and single tree detection

    Individual trees were detected using a minimum curvature-based algorithm, which started with creating a canopy height model (CHM). In this study, first-return points from all three channels were used to generate the CHM. Based on the field-recorded coordinates of each tree, we limited the potential crown area to 5 m². A local maximum filtering algorithm was used to find the treetops in this area. Subsequently, the watershed segmentation method was used to delineate tree crown boundaries in the CHM without setting a flow threshold. In the segmentation process, the shape and position of individual tree crowns were identified using the segment boundaries and the location of the highest point within each segment. Finally, the point cloud of each tree was extracted from the multispectral ALS dataset.
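
    The treetop search can be sketched as a simple local maximum filter over a rasterized CHM. This is only an illustration under assumed settings (a square search window and a 1 m minimum height); the function name `find_treetops` is hypothetical, and the watershed delineation of the crowns is not shown.

```python
def find_treetops(chm, window=1, min_height=1.0):
    """Return (row, col, height) for cells that are strict local maxima.

    chm        : 2D list of canopy heights in metres
    window     : half-width of the square search window in cells (assumed)
    min_height : cells lower than this are ignored (assumed threshold)
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbours = [
                chm[rr][cc]
                for rr in range(max(0, r - window), min(rows, r + window + 1))
                for cc in range(max(0, c - window), min(cols, c + window + 1))
                if (rr, cc) != (r, c)
            ]
            # Strict comparison: flat plateaus produce no treetop in this sketch.
            if all(h > n for n in neighbours):
                tops.append((r, c, h))
    return tops


chm = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 5.0, 0.0, 0.0],
    [0.0, 5.5, 8.0, 3.0, 0.0],
    [0.0, 0.0, 3.0, 0.0, 6.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
treetops = find_treetops(chm)
```

    In this toy CHM the filter finds two treetops, one at 8.0 m and one at 6.0 m. In practice, the detected treetops would then seed the watershed segmentation.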

    1.4 Multispectral ALS data feature extraction

    In this experiment, the features were divided into two main types: intensity features and geometric features. The maximum height (Hmax) of each tree was taken as the height of the highest point in the tree segment.

    We also derived 137 features from each channel of the multispectral ALS data.

      Table 1. List of all features from multispectral ALS data (i refers to the channel number, and subscript F denotes the single-channel intensity feature used)

      | Feature | Definition |
      | --- | --- |
      | Single-channel intensity (SCI) features | |
      | Imax | Maximum intensity |
      | Imin | Minimum intensity |
      | Imean | Mean intensity |
      | Istd | Standard deviation of intensity |
      | Icov | Coefficient of variation (i.e., relative standard deviation) of intensity |
      | Isk | Skewness of intensity |
      | Irange | Range of intensity |
      | Ikut | Kurtosis of intensity |
      | I5 to I95 | Percentiles of intensity values of points above the ground threshold, from 5% to 95% in 5% increments |
      | Multi-channel intensity (MCI) features | |
      | RiF = IiF/(I1F+I2F+I3F) | Ratios of intensity features in each channel |
      | gNDVIF = (I1F-I3F)/(I2F+I3F) | Green normalized differential vegetation index (gNDVI) |
      | gSRF = I2F/I3F | Green simple ratio vegetation index (gSR) |
      | Geometric features | |
      | Hmax | Maximum height of all points |
      | Hmean | Arithmetic mean of the height of all points above the 1 m threshold |
      | Hstd | Standard deviation of the height of all points above the 1 m threshold |
      | Hrange | Range of the normalized height of all points above the 1 m threshold |
      | P | Penetration, the ratio between the number of returns below 1 m and the total number of returns |
      | CA | Crown area, the area of the 2D convex hull |
      | CV | Crown volume, the volume of the 3D convex hull |
      | CD | Crown diameter, calculated from the crown area by treating the crown as a circle |
      | HP10 to HP90 | Height percentiles of the points above 1 m, from 10% to 90% in 10% increments |
      | D1 to D10 | Di = Ni/Ntotal, where i = 1 to 10, Ni is the number of points within the ith layer when the tree height is divided into 10 intervals starting from 1 m, and Ntotal is the number of all points |
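
    A few of the features in Table 1 can be sketched as follows. The function names are hypothetical, the layers are assumed to be ten equal intervals between 1 m and Hmax, a point at exactly 1 m is counted as "below", and the tree is assumed to be taller than 1 m. The sketch illustrates the definitions and is not the feature extraction code used in the study.

```python
from statistics import mean, pstdev


def geometric_features(heights, threshold=1.0, n_layers=10):
    """Compute Hmax, Hmean, Hstd, Hrange, P and D1..D10 for one tree.

    heights : normalized heights (m) of all points in the tree segment
    """
    total = len(heights)
    above = [h for h in heights if h > threshold]
    h_max = max(heights)
    layer_height = (h_max - threshold) / n_layers  # assumes h_max > threshold
    counts = [0] * n_layers
    for h in above:
        # The highest point falls into the last layer.
        idx = min(int((h - threshold) / layer_height), n_layers - 1)
        counts[idx] += 1
    return {
        "Hmax": h_max,
        "Hmean": mean(above),
        "Hstd": pstdev(above),
        "Hrange": max(above) - min(above),
        "P": (total - len(above)) / total,   # share of returns below 1 m
        "D": [n / total for n in counts],    # D1..D10 = Ni / Ntotal
    }


def channel_ratios(i1, i2, i3):
    """MCI ratio RiF = IiF / (I1F + I2F + I3F) for one intensity feature F."""
    s = i1 + i2 + i3
    return (i1 / s, i2 / s, i3 / s)


features = geometric_features([0.2, 0.5, 2.0, 6.0, 11.0])
ratios = channel_ratios(10.0, 30.0, 60.0)
```

    Because each ratio divides one channel's feature by the sum over all three channels, the three ratios always add up to one, which makes them insensitive to a common scaling of the intensities.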

    1.5 Tree species classification and accuracy evaluation

    In this study, we compared eight machine learning algorithms for tree species classification: extra trees (ET), random forest (RF), K-nearest neighbour (KNN), logistic regression (LR), linear discriminant analysis (LDA), classification and regression tree (CART), naive Bayes (NB), and support vector machine (SVM). For correctly detected trees, species were predicted by models built with each of the eight algorithms, using tree features as predictors and tree species as the response.
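
    A comparison of this kind could be set up with scikit-learn as sketched below. The synthetic data from `make_classification`, the default hyperparameters, and the 5-fold cross-validation are all assumptions made for the illustration; the original text does not state the validation scheme or the settings used.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the tree feature table (rows: trees, columns: features).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)

classifiers = {
    "ET": ExtraTreesClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "SVM": SVC(),
}

# Mean cross-validated overall accuracy for each algorithm.
scores = {
    name: cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    for name, clf in classifiers.items()
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

    With strongly imbalanced classes such as those in this study, a stratified split and per-class metrics would be worth reporting alongside overall accuracy.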

    2 Results

    2.1 Accuracy of classification

    As presented in Fig. 2, using all the intensity and geometric features, the extra trees algorithm gave the best tree species classification, with an overall accuracy of 71.7%. Using only Channel 1 features yielded an overall accuracy of 65.7%, using only Channel 2 features yielded 68.3%, and using only Channel 3 features yielded 64.8%. The accuracy of all the classifications for each species is shown in Fig. 3.

    Figure 2. Titan intensity image of the study area in Espoolahti (Red: Channel 1; Green: Channel 2; Blue: Channel 3).

    Figure 3. Comparison of the classification accuracy of the 24 tree species for ET, RF, KNN, LR, LDA, CART, NB, and SVM.

    The confusion matrix analysis (Table 2 and Fig. 4) reveals a model that performs well for most classes but struggles with a few, particularly Quercus and Sorbus. Certain classes, such as Acer, Larix, and Thuja, exhibit high accuracy (≥93%), indicating that the model classifies instances of these classes correctly. Addressing the weaker classes through feature refinement, data augmentation, and model optimization could improve the overall classification accuracy. Future work should focus on integrating domain-specific knowledge to enhance feature representation and reduce class overlap.
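
    Overall and per-class accuracy can be read from a confusion matrix as in the following minimal sketch. The 3 × 3 matrix is hypothetical and only illustrates the arithmetic; rows are assumed to hold the reference species and columns the predicted species.

```python
def overall_accuracy(cm):
    """Share of all samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_accuracy(cm):
    """Correctly classified share of each reference class (row-wise)."""
    return [
        row[i] / sum(row) if sum(row) else float("nan")
        for i, row in enumerate(cm)
    ]


# Hypothetical confusion matrix for three species (not data from this study).
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
oa = overall_accuracy(cm)
class_acc = per_class_accuracy(cm)
```

    Because overall accuracy weights every sample equally, abundant species dominate it, whereas the per-class values expose weak performance on rare species.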

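      Table 2. List of tree samples

      | Tree species | Index number | Number of trees |
      | --- | --- | --- |
      | Pinta-ala | 1 | 2 |
      | Abies | 2 | 13 |
      | Acer | 3 | 249 |
      | Alnus | 4 | 5 |
      | Betula | 5 | 26 |
      | Fallopia | 6 | 1 |
      | Fraxinus | 7 | 2 |
      | Juglans | 8 | 5 |
      | Larix | 9 | 11 |
      | Malus | 10 | 8 |
      | Picea | 11 | 15 |
      | Pinus | 12 | 84 |
      | Populus | 13 | 16 |
      | Prunus | 14 | 10 |
      | Quercus | 15 | 23 |
      | Ribes | 16 | 5 |
      | Salix | 17 | 4 |
      | Sambucus | 18 | 1 |
      | Sorbus | 19 | 84 |
      | Syringa | 20 | 1 |
      | Taxus | 21 | 4 |
      | Thuja | 22 | 2 |
      | Tilia | 23 | 88 |
      | Ulmus | 24 | 163 |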

    Figure 4. Confusion matrix of the classification with geometric and intensity features for each species.

    2.2 Feature importance analysis

    We also investigated which input features and channels are most relevant for tree species classification, based on the feature importance measure provided by the RF algorithm. If a feature influences the prediction, permuting its values should increase the model error. If a feature is not influential, permuting its values should have little or no effect on the model error. Table 3 lists the top three features in the classifications based on different combinations of the features. The most important features in the classification based on point cloud features were penetration and the higher height percentiles. Two density-related features from the higher and middle layers scored as highly as the higher percentiles. In the case of classification using single-channel features, the 1064 nm wavelength (Channel 2) appears to provide the most valuable information for distinguishing between pine, spruce, and birch species. It is followed by the 1550 nm wavelength (Channel 1) and then the 532 nm wavelength (Channel 3).
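
    The permutation idea described above can be sketched in a model-agnostic way. The toy threshold classifier and the data below are assumptions made for the illustration; this is not the RF importance implementation used in the study.

```python
import random


def accuracy(model, X, y):
    """Share of samples for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only feature j; all other features stay untouched.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model: the class depends only on feature 0; feature 1 is ignored.
def toy_model(row):
    return int(row[0] > 0.5)


X = [[i / 10, ((i * 7) % 10) / 10] for i in range(10)]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
```

    Shuffling the ignored feature leaves the accuracy unchanged, so its importance is zero, while shuffling the feature the model relies on lowers the accuracy.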

      Table 3. Features with the most predictive power in different classification scenarios

      | Cases | Top 3 features |
      | --- | --- |
      | All features | I2min, P1.5, I3min |

    3 Conclusions

    Using features from all three channels improved the overall classification accuracy by approximately 3 to 7 percentage points compared with using any single channel. This supports our hypothesis about the value of mALS features in classification. An overall accuracy of 71.7% was obtained with the all-channel multispectral LiDAR data, whereas accuracies of 65.7%, 68.3%, and 64.8% were achieved when using only Channel 1, Channel 2, and Channel 3, respectively (gains of 6.0, 3.4, and 6.9 percentage points). Our findings demonstrate the advantage of combining multichannel features over single-channel data in classifying urban trees. However, the sample size of each tree species in this experiment was uneven, which likely reduced classification accuracy to some extent. Consequently, a larger and more representative sample will be used in future research, and addressing this imbalance will be a key focus in subsequent studies.

    In this study, eight machine learning algorithms were evaluated for their classification performance, each demonstrating distinct strengths and limitations. The selection of an appropriate classification algorithm depends on the specific characteristics of the dataset, including size, dimensionality, and the underlying relationship between features and class labels. Extra trees (ET) and random forests (RF) proved effective in our study due to their ability to handle large, high-dimensional datasets and their robustness against overfitting, which suited the conditions of our dataset. Naive Bayes (NB) was efficient and scalable, especially for high-dimensional data, but its assumption of feature independence limited its applicability in cases with high feature correlation.

    It is also important to note that overall accuracy (OA) is influenced by factors such as species composition, stand structure, age, and the methods used to select the best features, which vary among studies. In this research, however, the intensity of laser returns was only range-corrected and was not absolutely calibrated. Two points follow from this limitation. First, future studies can investigate whether calibrated intensity affects the classification results. Second, the use of MCI features in this study mitigated potential variations in intensity.

    In conclusion, the ability of mALS compared to single-channel ALS (SCI-Ch) data to characterize tree species in urban areas was assessed in this study. Our classification results indicate that mALS data provided more accurate results than single-channel ALS data for urban tree species classification.

    Pei-Lun HU, Yu-Wei CHEN, Mohammad IMANGHOLILOO, Markus HOLOPAINEN, Yi-Cheng WANG, Juha HYYPPÄ. Urban tree species classification based on multispectral airborne LiDAR[J]. Journal of Infrared and Millimeter Waves, 2025, 44(2): 197

    Paper Information

    Category: Infrared Spectroscopy and Remote Sensing Technology

    Received: Jun. 26, 2024

    Accepted: --

    Published Online: Mar. 14, 2025

    The Author Email: CHEN Yu-Wei (chinaway.fgi@gmail.com), WANG Yi-Cheng (skl_wyc@163.com)

    DOI:10.11972/j.issn.1001-9014.2025.02.008
