Objective
Dunhuang mural restoration is a vital task in cultural heritage preservation, yet it faces significant challenges owing to the intrinsically complex degradation patterns of these ancient artworks. The murals often exhibit multi-scale cracks, localized missing regions, and chromatic fading, which not only obscure subtle textures and structural details but also make it extremely difficult to balance high-fidelity restoration with computational efficiency. Existing methods, constrained by the quadratic computational complexity of standard attention and susceptible to interference from irrelevant features, perform inadequately on high-resolution murals with intricate structural details. To address these challenges, this study proposes a joint gated attention and residual dense Transformer model (MGRT) designed specifically for mural inpainting. The approach aims to: 1) significantly reduce the quadratic complexity of traditional attention mechanisms while preserving essential global feature interactions; 2) mitigate the interference of invalid features during restoration through adaptive gating; and 3) enhance structural coherence via hierarchical feature fusion.
Methods
This paper proposes a novel framework that integrates three key components. First, a Focused Linear Attention Module (FLAM) replaces the conventional Softmax attention mechanism. By approximating the exponential with a first-order Taylor expansion, FLAM achieves linear computational scaling while still accurately modeling long-range dependencies. Second, a joint gated attention mechanism fuses spatial-channel gating with mask-aware positional encoding; it dynamically suppresses irrelevant features and amplifies critical structural and textural cues in damaged regions. Third, a Residual Dense Transformer Block (RDTB) is embedded within a U-Net architecture to establish cross-scale feature-propagation pathways through densely connected self-attention layers, enabling progressive refinement of intricate mural details. Together, these components deliver improved computational efficiency and restoration quality, providing a robust solution for high-fidelity mural inpainting.
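The first-order Taylor linearization underlying FLAM can be illustrated with a minimal sketch: approximating exp(q·k) by 1 + q·k lets the similarity kernel factorize into a feature map φ(x) = [1, x], so matrix associativity reduces the cost from O(n²) to O(n). This is an illustrative sketch of the general technique, not the paper's exact FLAM formulation; the normalization step and variable names here are assumptions.

```python
import numpy as np

def taylor_linear_attention(Q, K, V, eps=1e-6):
    """Linear-complexity attention via exp(q.k) ~= 1 + q.k (first-order Taylor).

    Illustrative sketch only; the actual FLAM design may add focusing
    functions and other components beyond this basic linearization.
    """
    # Normalize rows so q.k lies in [-1, 1], keeping 1 + q.k non-negative.
    Q = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    K = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    # Feature map phi(x) = [1, x] factorizes the kernel:
    # phi(q) . phi(k) = 1 + q.k
    phi_q = np.concatenate([np.ones((Q.shape[0], 1)), Q], axis=1)
    phi_k = np.concatenate([np.ones((K.shape[0], 1)), K], axis=1)
    # Associativity: compute (phi_k^T V) once -> O(n*d*d_v), not O(n^2).
    kv = phi_k.T @ V                    # (d+1, d_v)
    z = phi_q @ phi_k.sum(axis=0)       # per-query normalizer, shape (n,)
    return (phi_q @ kv) / (z[:, None] + eps)
```

Because the kernel factorizes, the n-by-n attention map is never materialized, which is what makes the approach viable for high-resolution feature maps.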
Results and Discussions
Extensive experiments on the Dunhuang mural dataset demonstrate the significant advantages of the proposed method, with similarly promising results observed on the FFHQ dataset (Fig. 8). Quantitative evaluations show that our approach outperforms the compared methods in both subjective and objective assessments. Specifically, ablation studies confirm that, relative to the baseline algorithms, the model improves the Structural Similarity Index (SSIM) by 0.001 to 0.012, increases the Peak Signal-to-Noise Ratio (PSNR) by 0.053 to 0.215 dB, reduces the L1 error by 0.001 to 0.007, and decreases the LPIPS metric by 0.007 to 0.009 (Tab. 2). These results highlight the robustness and effectiveness of the method in achieving high-fidelity mural restoration.
Conclusions
The proposed Mural Restoration via Joint Gated Attention and Residual Dense Transformer (MGRT) model successfully addresses the challenges of long-range dependency modeling and detail preservation in mural restoration. The framework uses a Focused Linear Attention Module (FLAM) to reduce computational complexity from quadratic to linear while effectively capturing critical texture and structural features. In addition, the Residual Dense Transformer Block (RDTB) enhances feature reuse by establishing dense residual connections between shallow and deep layers, improving the fusion of high-frequency details such as edges and textures. Experiments on the Dunhuang mural dataset demonstrate that MGRT achieves superior structural coherence and detail fidelity compared with existing methods, with both subjective and objective evaluations indicating a closer resemblance to the original murals. This work offers a practical solution for the high-fidelity restoration of cultural heritage artifacts with complex degradation patterns.
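The dense residual connectivity attributed to the RDTB (each layer seeing all earlier features, with a residual path from the block input) can be sketched abstractly as follows. This is a hypothetical minimal sketch: the real RDTB uses densely connected self-attention layers and learned fusion, whereas here `layers` and `fuse` are placeholder callables introduced only for illustration.

```python
import numpy as np

def residual_dense_block(x, layers, fuse):
    """Dense connectivity with a final residual, in the spirit of RDTB.

    Each callable in `layers` receives the concatenation of the block
    input and all earlier layer outputs; `fuse` projects the stacked
    features back to the input width before the residual add.
    (Hypothetical sketch; the paper's block uses self-attention layers.)
    """
    feats = [x]
    for layer in layers:
        # Dense connection: every layer sees all preceding features.
        feats.append(layer(np.concatenate(feats, axis=-1)))
    # Fuse shallow and deep features, then add the residual from the input.
    return x + fuse(np.concatenate(feats, axis=-1))
```

The dense concatenation is what lets shallow, high-frequency features (edges, textures) reach deep layers directly, while the outer residual keeps the block easy to optimize.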