Acta Optica Sinica, Volume 45, Issue 12, 1228005 (2025)

Change Detection Method of Remote Sensing Buildings Based on Spatiotemporal Fusion and Multi-Feature Relationships Network

Xue Li, Dong Li*, Jiandong Fang, and Xueying Feng
Author Affiliations
  • School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China

    Objective

    Building change detection has attracted wide attention as an important research direction as change detection technology for remote sensing images continues to advance. Accurate building change detection is crucial for land utilization assessment, urban development monitoring, and disaster damage assessment. Although traditional change detection methods can provide some assistance for building change detection, they usually rely on spectral information or simple pixel-level differences and have clear limitations, yielding low accuracy especially on high-resolution remote sensing images of complex scenes. With the rise of deep learning, especially convolutional neural networks (CNNs), change detection in remote sensing images has improved significantly. However, CNN-based methods usually employ simple fusion operations as the final step in producing detection results and pay insufficient attention to extracting effective change information. In addition, existing feature extraction methods tend to ignore the feature interactions between the two spatiotemporal images and usually focus only on features at isolated time points, which restricts their ability to capture change information and to recognize the dynamic feature interactions between the two spatiotemporal images. High-resolution remote sensing images also present challenges such as complex spatial features and rich scale information; in particular, when extracting the relationship between the target of interest and other targets in the changing region, Transformer-based methods cannot fully capture the long-distance dependencies between different areas, resulting in limited performance improvement. To this end, we propose a new change detection method for high-resolution remote sensing images based on spatiotemporal fusion and multi-feature relationships, termed SFMRNet.

    Methods

    The proposed SFMRNet adopts an encoder-decoder architecture. A two-branch weight-sharing encoder processes the dual-temporal images, with feature extraction in each branch performed by ResNet18, and a feature exchange module (FEM) is applied after stage 1 and stage 3 of ResNet18 to efficiently extract key information related to building changes. The dual-temporal features extracted at each layer are processed by the spatiotemporal fusion module (STM) to capture important information between the different temporal features. The fused output is then fed into the multi-feature relationship module (MFA), which leverages self-attention and cross-attention mechanisms to capture intra-class relationships and to parse the interaction information between the changing region and the environment, respectively. Next, a multi-layer perceptron (MLP) optimizes the channel-wise global information in the feature map and generates the attention map. During decoding, the attention map is restored to the original spatial resolution by step-by-step up-sampling, which reduces the spatial information lost from deep features and ensures full utilization of multi-scale information. Finally, the difference feature maps restored to their original size are processed by a pixel classifier to generate the final change prediction map.
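
    For illustration, the sketch below shows one way such a pipeline could be organized in PyTorch. It is a minimal sketch only, assuming standard ResNet18 channel widths (64/128/256/512); the FEM, STM, and MFA bodies here are simplified placeholders that mimic the described roles (channel exchange, temporal fusion, self/cross attention plus MLP) and are not the authors' implementations, and names such as SFMRNetSketch are hypothetical.

```python
# Minimal PyTorch sketch of the described encoder-decoder layout (illustrative only).
import torch
import torch.nn as nn
from torchvision.models import resnet18


class FEM(nn.Module):
    """Feature exchange between the two branches via a learned per-pixel gate (simplified)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels * 2, channels, 1), nn.Sigmoid())

    def forward(self, f1, f2):
        g = self.gate(torch.cat([f1, f2], dim=1))              # exchange weights in [0, 1]
        return g * f2 + (1 - g) * f1, g * f1 + (1 - g) * f2    # exchanged dual-temporal features


class STM(nn.Module):
    """Spatiotemporal fusion of the two temporal features (simplified)."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(channels * 2, channels, 3, padding=1),
                                  nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, f1, f2):
        return self.fuse(torch.cat([f1, f2], dim=1))


class MFA(nn.Module):
    """Self-attention (intra-class relations) + cross-attention (region vs. context), then an MLP."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(channels, channels * 2), nn.GELU(),
                                 nn.Linear(channels * 2, channels))

    def forward(self, fused, context):
        b, c, h, w = fused.shape
        q = fused.flatten(2).transpose(1, 2)                   # (B, HW, C) token sequence
        k = context.flatten(2).transpose(1, 2)
        x, _ = self.self_attn(q, q, q)                         # intra-class relationships
        x, _ = self.cross_attn(x, k, k)                        # interaction with the environment
        x = self.mlp(x)                                        # refine channel-wise global information
        return x.transpose(1, 2).reshape(b, c, h, w)


class SFMRNetSketch(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        r = resnet18(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        chans = [64, 128, 256, 512]
        self.fem1, self.fem3 = FEM(chans[0]), FEM(chans[2])    # exchange after stages 1 and 3
        self.stm = nn.ModuleList([STM(c) for c in chans])
        self.mfa = nn.ModuleList([MFA(c) for c in chans])
        self.squeeze = nn.ModuleList([nn.Conv2d(c, 64, 1) for c in chans])
        self.classifier = nn.Conv2d(64, num_classes, 1)        # pixel classifier

    def forward(self, t1, t2):
        f1, f2 = self.stem(t1), self.stem(t2)                  # weight-sharing branches
        maps = []
        for i, stage in enumerate(self.stages):
            f1, f2 = stage(f1), stage(f2)
            if i == 0:
                f1, f2 = self.fem1(f1, f2)
            elif i == 2:
                f1, f2 = self.fem3(f1, f2)
            fused = self.stm[i](f1, f2)                        # temporal fusion
            attn = self.mfa[i](fused, f1 + f2)                 # multi-feature relationships
            maps.append(self.squeeze[i](attn))
        # Decoder: progressively up-sample and accumulate multi-scale maps.
        out = maps[-1]
        for m in reversed(maps[:-1]):
            out = nn.functional.interpolate(out, size=m.shape[-2:], mode="bilinear",
                                            align_corners=False) + m
        out = nn.functional.interpolate(out, size=t1.shape[-2:], mode="bilinear",
                                        align_corners=False)
        return self.classifier(out)                            # change prediction map
```

    For a pair of 3-channel image tensors of equal size, this sketch outputs a two-channel map at the input resolution, matching the pixel-classification formulation described above.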

    Results and Discussions

    We conduct experiments on two public datasets (LEVIR-CD and WHU-CD) to validate the model’s effectiveness. The results show that SFMRNet achieves 91.54%, 90.32%, 81.54%, and 89.80% on the WHU-CD dataset for the precision (Pr), F1 score, intersection over union (IoU), and recall (Rc) metrics, respectively, improvements of 1.90, 2.34, 3.56, and 4.32 percentage points over BIT, the second-best method in the composite ranking. In particular, SFMRNet reaches an F1 score of 90.32%, which is 0.82 percentage points higher than the second-ranked SNUNet (89.50%), and its overall accuracy (OA) is 0.30 percentage points higher than that of BIT. On the LEVIR-CD dataset, SFMRNet achieves 90.32% and 81.54% for F1 and IoU, respectively. Compared with FC-EF, FC-Siam-Diff, FC-Siam-Conc, DTCDSCN, SNUNet, BIT, and STANet, the F1 score of SFMRNet is improved by 7.71, 4.80, 7.42, 3.44, 2.95, 1.80, and 4.34 percentage points, respectively. In addition, SFMRNet achieves an OA of 99.14%, which is 0.22 percentage points higher than the second-ranked BIT (98.92%). The visualization results further demonstrate the effectiveness of SFMRNet, showing that the model effectively avoids interference from shadows and environmental factors. The generated change maps retain the continuous boundaries of the changing buildings and exhibit high internal compactness, making them closer to the real labels. To validate the effectiveness of the proposed FEM, STM, and MFA modules, we conduct a series of ablation experiments on the two datasets, with the results shown in Table 3. These experiments further confirm the effectiveness of each of the FEM, STM, and MFA and show their synergistic effect in improving change detection performance.
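
    For reference, the reported Pr, Rc, F1, IoU, and OA values follow the standard binary change detection definitions. The short sketch below (not the authors' evaluation code) computes them from binary prediction and label maps, assuming both changed and unchanged pixels are present so no denominator is zero.

```python
# Standard binary change detection metrics from the pixel-level confusion matrix.
import numpy as np


def change_metrics(pred, label):
    """pred, label: same-shape binary arrays (1 = changed, 0 = unchanged)."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)                # changed pixels correctly detected
    fp = np.sum(pred & ~label)               # false alarms
    fn = np.sum(~pred & label)               # missed changes
    tn = np.sum(~pred & ~label)              # unchanged pixels correctly kept
    pr = tp / (tp + fp)                      # precision (Pr)
    rc = tp / (tp + fn)                      # recall (Rc)
    f1 = 2 * pr * rc / (pr + rc)             # harmonic mean of Pr and Rc
    iou = tp / (tp + fp + fn)                # intersection over union of the change class
    oa = (tp + tn) / (tp + fp + fn + tn)     # overall accuracy (OA)
    return {"Pr": pr, "Rc": rc, "F1": f1, "IoU": iou, "OA": oa}
```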

    Conclusions

    We propose a remote sensing change detection network, SFMRNet, that integrates spatiotemporal fusion and multi-feature relationships. The network employs the FEM to enhance feature interactions between dual-temporal images and filter out irrelevant information, thereby improving building change detection. The STM dynamically identifies important features by fusing temporal information, thus enhancing the integration of dual-temporal features and ensuring that key information is retained. Additionally, the MFA utilizes self-attention and cross-attention mechanisms to capture the varying levels of intrinsic relationships between features, which enhances the segmentation accuracy of changing regions. We validate the superiority of SFMRNet via qualitative and quantitative comparisons across multiple remote sensing image datasets. Ablation experiments further confirm the contribution of each module to overall performance, demonstrating SFMRNet’s capability to capture subtle change information and reduce background noise interference. These results indicate that SFMRNet provides an innovative and efficient solution for change detection, thereby facilitating performance improvement in practical applications.

    Xue Li, Dong Li, Jiandong Fang, Xueying Feng. Change Detection Method of Remote Sensing Buildings Based on Spatiotemporal Fusion and Multi-Feature Relationships Network[J]. Acta Optica Sinica, 2025, 45(12): 1228005

    Paper Information

    Category: Remote Sensing and Sensors

    Received: Nov. 15, 2024

    Accepted: Jan. 2, 2025

    Published Online: Jun. 13, 2025

    The Author Email: Dong Li (lidong@imut.edu.cn)

    DOI:10.3788/AOS241759

    CSTR:32393.14.AOS241759
