Acta Optica Sinica, Volume 45, Issue 12, 1228005 (2025)

Change Detection Method of Remote Sensing Buildings Based on Spatiotemporal Fusion and Multi-Feature Relationships Network

Xue Li, Dong Li*, Jiandong Fang, and Xueying Feng
Author Affiliations
  • School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China

    Objective

    Building change detection has attracted wide attention as an important research direction as change detection technology for remote sensing images continues to advance. Accurate building change detection is crucial for land utilization assessment, urban development monitoring, and disaster damage assessment. Although traditional change detection methods can provide some assistance for building change detection, they usually rely on spectral information or simple pixel-level differences and have clear limitations, yielding low accuracy especially on high-resolution remote sensing images of complex scenes. With the rise of deep learning, especially convolutional neural networks (CNNs), change detection in remote sensing images has improved significantly. However, CNN-based methods usually employ simple fusion operations as the final step in producing detection results and pay insufficient attention to extracting effective change information. In addition, existing feature extraction methods tend to ignore the feature interactions between the two spatiotemporal images and usually focus only on features at isolated time points, which restricts their ability to capture change information and to recognize the dynamic feature interactions between the two spatiotemporal images. High-resolution remote sensing images also present challenges such as complex spatial features and rich scale information; in particular, when extracting the relationship between the target of interest and other targets in the changing region, Transformer-based methods cannot fully capture the long-distance dependencies between different areas, resulting in limited performance improvement. To this end, we propose a new change detection method for high-resolution remote sensing images based on spatiotemporal fusion and multi-feature relationships, termed SFMRNet.

    Methods

    The proposed SFMRNet adopts an encoder-decoder architecture. A two-branch weight-sharing encoder processes the dual-temporal images, with feature extraction in each branch performed by ResNet18, and a feature exchange module (FEM) is applied after stage 1 and stage 3 of ResNet18 to efficiently extract key information related to building changes. The dual-temporal features extracted at each layer are processed by the spatiotemporal fusion module (STM) to capture important information between the different temporal features. The fused output is then fed into the multi-feature relationship module (MFA), which leverages self-attention and cross-attention mechanisms to capture intra-class relationships and to parse the interaction information between the changing region and the environment, respectively. Next, a multi-layer perceptron (MLP) optimizes the channel-wise global information in the feature map and generates the attention map. During decoding, the attention map is restored to the original spatial resolution by step-by-step up-sampling, which reduces the spatial information lost from deep features and ensures full utilization of multi-scale information. Finally, the difference feature maps restored to their original size are processed by a pixel classifier to generate the final change prediction map.
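
    For illustration, the sketch below shows one way such a pipeline could be organized in PyTorch. It is a minimal sketch only, assuming standard ResNet18 channel widths (64/128/256/512); the FEM, STM, and MFA bodies here are simplified placeholders that mimic the described roles (channel exchange, temporal fusion, self/cross attention plus MLP) and are not the authors' implementations, and names such as SFMRNetSketch are hypothetical.

```python
# Minimal PyTorch sketch of the described encoder-decoder layout (illustrative only).
import torch
import torch.nn as nn
from torchvision.models import resnet18


class FEM(nn.Module):
    """Feature exchange between the two branches via a learned per-pixel gate (simplified)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels * 2, channels, 1), nn.Sigmoid())

    def forward(self, f1, f2):
        g = self.gate(torch.cat([f1, f2], dim=1))              # exchange weights in [0, 1]
        return g * f2 + (1 - g) * f1, g * f1 + (1 - g) * f2    # exchanged dual-temporal features


class STM(nn.Module):
    """Spatiotemporal fusion of the two temporal features (simplified)."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(channels * 2, channels, 3, padding=1),
                                  nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, f1, f2):
        return self.fuse(torch.cat([f1, f2], dim=1))


class MFA(nn.Module):
    """Self-attention (intra-class relations) + cross-attention (region vs. context), then an MLP."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(channels, channels * 2), nn.GELU(),
                                 nn.Linear(channels * 2, channels))

    def forward(self, fused, context):
        b, c, h, w = fused.shape
        q = fused.flatten(2).transpose(1, 2)                   # (B, HW, C) token sequence
        k = context.flatten(2).transpose(1, 2)
        x, _ = self.self_attn(q, q, q)                         # intra-class relationships
        x, _ = self.cross_attn(x, k, k)                        # interaction with the environment
        x = self.mlp(x)                                        # refine channel-wise global information
        return x.transpose(1, 2).reshape(b, c, h, w)


class SFMRNetSketch(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        r = resnet18(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        chans = [64, 128, 256, 512]
        self.fem1, self.fem3 = FEM(chans[0]), FEM(chans[2])    # exchange after stages 1 and 3
        self.stm = nn.ModuleList([STM(c) for c in chans])
        self.mfa = nn.ModuleList([MFA(c) for c in chans])
        self.squeeze = nn.ModuleList([nn.Conv2d(c, 64, 1) for c in chans])
        self.classifier = nn.Conv2d(64, num_classes, 1)        # pixel classifier

    def forward(self, t1, t2):
        f1, f2 = self.stem(t1), self.stem(t2)                  # weight-sharing branches
        maps = []
        for i, stage in enumerate(self.stages):
            f1, f2 = stage(f1), stage(f2)
            if i == 0:
                f1, f2 = self.fem1(f1, f2)
            elif i == 2:
                f1, f2 = self.fem3(f1, f2)
            fused = self.stm[i](f1, f2)                        # temporal fusion
            attn = self.mfa[i](fused, f1 + f2)                 # multi-feature relationships
            maps.append(self.squeeze[i](attn))
        # Decoder: progressively up-sample and accumulate multi-scale maps.
        out = maps[-1]
        for m in reversed(maps[:-1]):
            out = nn.functional.interpolate(out, size=m.shape[-2:], mode="bilinear",
                                            align_corners=False) + m
        out = nn.functional.interpolate(out, size=t1.shape[-2:], mode="bilinear",
                                        align_corners=False)
        return self.classifier(out)                            # change prediction map
```

    For a pair of 3-channel image tensors of equal size, this sketch outputs a two-channel map at the input resolution, matching the pixel-classification formulation described above.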

    Results and Discussions

    We conduct experiments on two public datasets (LEVIR-CD and WHU-CD) to validate the model’s effectiveness. The results show that SFMRNet achieves 91.54%, 90.32%, 81.54%, and 89.80% on the WHU-CD dataset for the precision (Pr), F1 score, intersection over union (IoU), and recall (Rc) metrics, respectively, improvements of 1.90, 2.34, 3.56, and 4.32 percentage points over BIT, the second-best method in the composite ranking. In particular, SFMRNet reaches an F1 score of 90.32%, which is 0.82 percentage points higher than the second-ranked SNUNet (89.50%), and its overall accuracy (OA) is 0.30 percentage points higher than that of BIT. On the LEVIR-CD dataset, SFMRNet achieves 90.32% and 81.54% for F1 and IoU, respectively. Compared with FC-EF, FC-Siam-Diff, FC-Siam-Conc, DTCDSCN, SNUNet, BIT, and STANet, the F1 score of SFMRNet is improved by 7.71, 4.80, 7.42, 3.44, 2.95, 1.80, and 4.34 percentage points, respectively. In addition, SFMRNet achieves an OA of 99.14%, which is 0.22 percentage points higher than the second-ranked BIT (98.92%). The visualization results further demonstrate the effectiveness of SFMRNet, showing that the model effectively avoids interference from shadows and environmental factors. The generated change maps retain the continuous boundaries of the changing buildings and exhibit high internal compactness, making them closer to the real labels. To validate the effectiveness of the proposed FEM, STM, and MFA modules, we conduct a series of ablation experiments on the two datasets, with the results shown in Table 3. These experiments further confirm the effectiveness of each of the FEM, STM, and MFA and show their synergistic effect in improving change detection performance.
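
    For reference, the reported Pr, Rc, F1, IoU, and OA values follow the standard binary change detection definitions. The short sketch below (not the authors' evaluation code) computes them from binary prediction and label maps, assuming both changed and unchanged pixels are present so no denominator is zero.

```python
# Standard binary change detection metrics from the pixel-level confusion matrix.
import numpy as np


def change_metrics(pred, label):
    """pred, label: same-shape binary arrays (1 = changed, 0 = unchanged)."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)                # changed pixels correctly detected
    fp = np.sum(pred & ~label)               # false alarms
    fn = np.sum(~pred & label)               # missed changes
    tn = np.sum(~pred & ~label)              # unchanged pixels correctly kept
    pr = tp / (tp + fp)                      # precision (Pr)
    rc = tp / (tp + fn)                      # recall (Rc)
    f1 = 2 * pr * rc / (pr + rc)             # harmonic mean of Pr and Rc
    iou = tp / (tp + fp + fn)                # intersection over union of the change class
    oa = (tp + tn) / (tp + fp + fn + tn)     # overall accuracy (OA)
    return {"Pr": pr, "Rc": rc, "F1": f1, "IoU": iou, "OA": oa}
```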

    Conclusions

    We propose a remote sensing change detection network, SFMRNet, that integrates spatiotemporal fusion and multi-feature relationships. The network employs the FEM to enhance feature interactions between dual-temporal images and filter out irrelevant information, thereby improving building change detection. The STM dynamically identifies important features by fusing temporal information, thus enhancing the integration of dual-temporal features and ensuring that key information is retained. Additionally, the MFA utilizes self-attention and cross-attention mechanisms to capture the varying levels of intrinsic relationships between features, which enhances the segmentation accuracy of changing regions. We validate the superiority of SFMRNet via qualitative and quantitative comparisons across multiple remote sensing image datasets. Ablation experiments further confirm the contribution of each module to overall performance, demonstrating SFMRNet’s capability to capture subtle change information and reduce background noise interference. These results indicate that SFMRNet provides an innovative and efficient solution for change detection, thereby facilitating performance improvement in practical applications.

    Xue Li, Dong Li, Jiandong Fang, Xueying Feng. Change Detection Method of Remote Sensing Buildings Based on Spatiotemporal Fusion and Multi-Feature Relationships Network[J]. Acta Optica Sinica, 2025, 45(12): 1228005

    Paper Information

    Category: Remote Sensing and Sensors

    Received: Nov. 15, 2024

    Accepted: Jan. 2, 2025

    Published Online: Jun. 13, 2025

    The Author Email: Dong Li (lidong@imut.edu.cn)

    DOI:10.3788/AOS241759

    CSTR:32393.14.AOS241759
