Acta Photonica Sinica, Volume 54, Issue 1, 0110003 (2025)
Spatial-spectral Collaborative Unrolling Network for Pansharpening
To address the limitations inherent in physical device acquisition, pansharpening offers a computational alternative. This process aims to enhance the spatial resolution of Low-Resolution Multispectral (LRMS) images by integrating textural information from Panchromatic (PAN) images, thereby generating High-Resolution Multispectral (HRMS) images. Recently, a growing number of deep learning-based methods, leveraging their strong feature extraction capabilities, have been introduced, demonstrating exceptional results in improving fusion quality. However, many of these methods still exhibit two notable shortcomings. First, the universally adopted black-box design limits model interpretability. Second, existing deep learning-based methods fail to efficiently capture local and global dependencies simultaneously, inevitably limiting overall performance. By combining the merits of nonlinear network architectures and interpretable optimization schemes, Deep Unfolding Networks (DUNs) have shed new light on pansharpening. However, current DUNs lack a dedicated design both for estimating the degradation matrices and for extracting intricate information within the proximal operator. To address these issues, we propose a novel Spatial-Spectral Collaborative Unrolling Network (SCUN). An alternating optimization based on Half-Quadratic Splitting (HQS) is employed to solve the resulting model, giving rise to an elementary iteration mechanism. Guided by iterative optimization theory, the network achieves Adaptive Degradation Matrix Estimation (ADME) and spatial-spectral prior operator learning through multi-scale cascade strategies, pointwise convolutions, and Transformer techniques. In the ADME step, the estimation is carried out by an end-to-end iterative block, allowing for adaptive modeling of complex spatial and spectral structures.
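The HQS-based alternating optimization described above can be illustrated with a toy numerical sketch. This is a hypothetical simplification, not the paper's implementation: the spatial degradation is approximated by average-pool downsampling, the spectral response by a channel mean, and the learned prior by plain soft-thresholding.

```python
import numpy as np

def hqs_unroll(lrms, pan, n_stages=3, mu=0.5, lam=0.01):
    """Toy HQS iteration for pansharpening (illustrative only).

    Each stage takes one gradient step on the spatial data term,
    one on the spectral data term, then a soft-threshold proximal
    step standing in for the learned spatial-spectral prior.
    """
    scale = pan.shape[0] // lrms.shape[0]
    lh, lw, c = lrms.shape
    # initialize the HRMS estimate with nearest-neighbour upsampled LRMS
    x = np.repeat(np.repeat(lrms, scale, axis=0), scale, axis=1)
    for _ in range(n_stages):
        # spatial data term: pull the downsampled estimate toward the LRMS
        down = x.reshape(lh, scale, lw, scale, c).mean(axis=(1, 3))
        x = x - mu * np.repeat(np.repeat(down - lrms, scale, axis=0), scale, axis=1)
        # spectral data term: pull the channel mean toward the PAN observation
        x = x - mu * (x.mean(axis=2, keepdims=True) - pan[..., None]) / c
        # proximal step: plain soft-thresholding as a prior placeholder
        x = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
    return x
```

In SCUN the hand-crafted operators above are replaced by learned, per-iteration degradation estimates (ADME) and a learned proximal network.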
On that basis, we employ customized multiscale convolution and pointwise convolution to simulate the spatial and spectral degradation processes, respectively. Moreover, the estimated degradation operators are re-learned in each unfolding iteration, endowing the model with a highly adaptive capability. To address the limitations of prior operators, we propose a collaborative complementary mechanism that approximates the proximal operator and jointly explores global-local and spatial-spectral features, achieved through a combination of convolutional layers and attention mechanisms. The entire prior module is designed as a U-shaped network, following an "embedding → encoder → bottleneck layer → decoder → de-embedding" pipeline to extract refined feature representations. Initially, the intermediate variables pass through an embedding layer, which partitions them into non-overlapping patch tokens. These tokens are then fed into two Spatial-Spectral Collaborative Modules (SSCMs) and a bottleneck layer consisting of a single SSCM to explore comprehensive properties. Each SSCM comprises three key components: Spatial-Spectral Collaborative Attention (SSCA), Scale-Aware Channel Collaboration (SACC), and a Mixed-Scale Feed-forward Layer (MSFL). Specifically, the SSCA subassembly includes two Transformer blocks: a Spatial Transformer Block, which primarily transfers high-frequency texture features from the PAN image to the HRMS, and a Spectral Transformer Block, which focuses on transferring spectral features from the LRMS to the HRMS. After these two attention features are extracted, a multi-head self-attention mechanism is further applied to deeply fuse the spatial and spectral information, thereby achieving enhanced collaboration and complementarity of the target information.
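The two attention axes in SSCA, over spatial tokens versus over spectral channels, can be sketched as follows. This is a hypothetical single-head simplification for illustration, not the paper's Transformer blocks.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(q, k, v):
    """Attention over N spatial tokens: affinity matrix is (N, N)."""
    a = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return a @ v

def spectral_attention(q, k, v):
    """Transposed attention over C channels: affinity matrix is (C, C),
    so cost scales with channel count rather than spatial size."""
    a = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)
    return v @ a.T
```

Computing the affinity along channels keeps the spectral branch cheap for large images, which is why channel-wise (transposed) attention is a common choice for modeling spectral dependencies.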
Within SACC, we dynamically fuse features from receptive fields of varying sizes via multiscale convolution, while channel attention is introduced to model the spectral dependency of multispectral images. Similarly, to strengthen the nonlinear feature transformation following the attention layers, our MSFL adopts a mixed-scale strategy, and a cross-complementary mechanism is subsequently introduced to emphasize the important components of the multiscale convolutions. With all modules organically assembled, the proposed network represents the first attempt to systematically capture local-global and spatial-spectral information during model unfolding, guaranteeing appealing pansharpening performance. Experimental results on multiple remote sensing datasets demonstrate that the proposed method outperforms comparative methods, achieving a PSNR gain of 0.798 dB on the GF-2 dataset.
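The channel-attention branch of SACC can be sketched in squeeze-and-excitation style; the weights and reduction ratio below are hypothetical stand-ins, not the trained SACC parameters.

```python
import numpy as np

def channel_attention(x, reduction=2):
    """SE-style channel attention on an (H, W, C) feature map.

    Global average pooling squeezes spatial information into a C-vector,
    a tiny bottleneck MLP produces per-channel sigmoid gates, and the
    input channels are rescaled by those gates.
    """
    h, w, c = x.shape
    rng = np.random.default_rng(0)                 # fixed random weights for the sketch
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    s = x.mean(axis=(0, 1))                        # squeeze: (C,)
    g = 1.0 / (1.0 + np.exp(-(np.maximum(s @ w1, 0.0) @ w2)))  # excite: gates in (0, 1)
    return x * g                                   # rescale each channel
```

In SACC this gating would act on the concatenated multiscale features, letting the network weight each scale's contribution per spectral channel.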
Jianwei ZHENG, Hongyi XIA, Honghui XU. Spatial-spectral Collaborative Unrolling Network for Pansharpening[J]. Acta Photonica Sinica, 2025, 54(1): 0110003
Received: Jul. 2, 2024
Accepted: Sep. 2, 2024
Published Online: Mar. 5, 2025
The Author Email: XU Honghui (xhh@zjut.edu.cn)