Acta Photonica Sinica, Volume 54, Issue 1, 0110003 (2025)
Spatial-spectral Collaborative Unrolling Network for Pansharpening
To address the limitations inherent in physical device acquisition, pansharpening offers a computational alternative. This process aims to enhance the spatial resolution of Low-Resolution Multispectral (LRMS) images by integrating textural information from Panchromatic (PAN) images, thereby generating High-Resolution Multispectral (HRMS) images. Recently, a growing number of deep learning-based methods, leveraging their strong feature extraction capabilities, have been introduced, demonstrating exceptional results in improving fusion quality. However, many of these methods still exhibit two notable shortcomings. First, the universally adopted black-box design limits model interpretability. Second, existing deep learning-based methods fail to efficiently capture local and global dependencies simultaneously, inevitably limiting overall performance. By combining the merits of nonlinear network architectures and interpretable optimization schemes, Deep Unfolding Networks (DUNs) have shed new light on pansharpening. However, current DUNs lack a dedicated design both for estimating the degradation matrices and for extracting intricate information within the proximal operator. To address these issues, we propose a novel Spatial-Spectral Collaborative Unrolling Network (SCUN). An alternating optimization based on Half-Quadratic Splitting (HQS) is employed to solve the resulting model, giving rise to an elementary iteration mechanism. Guided by iterative optimization theory, the network achieves Adaptive Degradation Matrix Estimation (ADME) and spatial-spectral prior operator learning through multi-scale cascade strategies, pointwise convolutions, and Transformer techniques. In the ADME step, the estimation is carried out by an end-to-end iterative block, allowing for adaptive modeling of complex spatial and spectral structures.
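The HQS-based alternating optimization described above can be illustrated with a toy numerical sketch. This is a hypothetical simplification, not the paper's implementation: the spatial degradation is approximated by average-pool downsampling, the spectral response by a channel mean, and the learned prior by plain soft-thresholding.

```python
import numpy as np

def hqs_unroll(lrms, pan, n_stages=3, mu=0.5, lam=0.01):
    """Toy HQS iteration for pansharpening (illustrative only).

    Each stage takes one gradient step on the spatial data term,
    one on the spectral data term, then a soft-threshold proximal
    step standing in for the learned spatial-spectral prior.
    """
    scale = pan.shape[0] // lrms.shape[0]
    lh, lw, c = lrms.shape
    # initialize the HRMS estimate with nearest-neighbour upsampled LRMS
    x = np.repeat(np.repeat(lrms, scale, axis=0), scale, axis=1)
    for _ in range(n_stages):
        # spatial data term: pull the downsampled estimate toward the LRMS
        down = x.reshape(lh, scale, lw, scale, c).mean(axis=(1, 3))
        x = x - mu * np.repeat(np.repeat(down - lrms, scale, axis=0), scale, axis=1)
        # spectral data term: pull the channel mean toward the PAN observation
        x = x - mu * (x.mean(axis=2, keepdims=True) - pan[..., None]) / c
        # proximal step: plain soft-thresholding as a prior placeholder
        x = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
    return x
```

In SCUN the hand-crafted operators above are replaced by learned, per-iteration degradation estimates (ADME) and a learned proximal network.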
On that basis, we employ customized multiscale convolution and pointwise convolution to simulate the spatial and spectral degradation processes, respectively. Moreover, the estimated degradation operators are re-learned in each unfolding iteration, endowing the model with a highly adaptive capability. To address the limitations of prior operators, we propose a collaborative complementary mechanism that approximates the proximal operator and jointly explores global-local and spatial-spectral features, achieved through a combination of convolutional layers and attention mechanisms. The entire prior module is designed as a U-shaped network, following an "embedding → encoder → bottleneck layer → decoder → de-embedding" pipeline to extract refined feature representations. Initially, the intermediate variables pass through an embedding layer, which partitions them into non-overlapping patch tokens. These tokens are then fed into two Spatial-Spectral Collaborative Modules (SSCMs) and a bottleneck layer consisting of a single SSCM to explore comprehensive properties. Each SSCM comprises three key components: Spatial-Spectral Collaborative Attention (SSCA), Scale-Aware Channel Collaboration (SACC), and a Mixed-Scale Feed-forward Layer (MSFL). Specifically, the SSCA subassembly includes two Transformer blocks: a Spatial Transformer Block, which primarily transfers high-frequency texture features from the PAN image to the HRMS, and a Spectral Transformer Block, which focuses on transferring spectral features from the LRMS to the HRMS. After these two attention features are extracted, a multi-head self-attention mechanism is further applied to deeply fuse the spatial and spectral information, thereby achieving enhanced collaboration and complementarity of the target information.
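The two attention axes in SSCA, over spatial tokens versus over spectral channels, can be sketched as follows. This is a hypothetical single-head simplification for illustration, not the paper's Transformer blocks.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(q, k, v):
    """Attention over N spatial tokens: affinity matrix is (N, N)."""
    a = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return a @ v

def spectral_attention(q, k, v):
    """Transposed attention over C channels: affinity matrix is (C, C),
    so cost scales with channel count rather than spatial size."""
    a = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)
    return v @ a.T
```

Computing the affinity along channels keeps the spectral branch cheap for large images, which is why channel-wise (transposed) attention is a common choice for modeling spectral dependencies.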
Within SACC, we dynamically fuse features from receptive fields of varying sizes via multiscale convolution, while channel attention is introduced to model the spectral dependency of multispectral images. Similarly, to strengthen the nonlinear feature transformation following the attention layers, our MSFL adopts a mixed-scale strategy, and a cross-complementary mechanism is subsequently introduced to emphasize the important components of the multiscale convolutions. With all modules organically assembled, the proposed network represents the first attempt to systematically capture local-global and spatial-spectral information during model unfolding, guaranteeing appealing pansharpening performance. Experimental results on multiple remote sensing datasets demonstrate that the proposed method outperforms comparative methods, achieving a PSNR gain of 0.798 dB on the GF-2 dataset.
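The channel-attention branch of SACC can be sketched in squeeze-and-excitation style; the weights and reduction ratio below are hypothetical stand-ins, not the trained SACC parameters.

```python
import numpy as np

def channel_attention(x, reduction=2):
    """SE-style channel attention on an (H, W, C) feature map.

    Global average pooling squeezes spatial information into a C-vector,
    a tiny bottleneck MLP produces per-channel sigmoid gates, and the
    input channels are rescaled by those gates.
    """
    h, w, c = x.shape
    rng = np.random.default_rng(0)                 # fixed random weights for the sketch
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    s = x.mean(axis=(0, 1))                        # squeeze: (C,)
    g = 1.0 / (1.0 + np.exp(-(np.maximum(s @ w1, 0.0) @ w2)))  # excite: gates in (0, 1)
    return x * g                                   # rescale each channel
```

In SACC this gating would act on the concatenated multiscale features, letting the network weight each scale's contribution per spectral channel.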
Jianwei ZHENG, Hongyi XIA, Honghui XU. Spatial-spectral Collaborative Unrolling Network for Pansharpening[J]. Acta Photonica Sinica, 2025, 54(1): 0110003
Received: Jul. 2, 2024
Accepted: Sep. 2, 2024
Published Online: Mar. 5, 2025
The Author Email: XU Honghui (xhh@zjut.edu.cn)