Acta Optica Sinica, Volume. 45, Issue 7, 0717001(2025)
LTDA‐Mamba: Retinal Vessel Segmentation Based on a Hybrid CNN‐Mamba Network
Retinal vessel segmentation is a crucial task in ophthalmology, as it aids in the early detection and monitoring of various eye diseases, such as glaucoma, diabetic retinopathy, and hypertension-related retinopathy. Accurate segmentation can provide valuable insights into the microvasculature of the eye, which is essential for diagnosing and managing these conditions. However, retinal vessel segmentation remains challenging due to the complexity and variability of retinal images, including factors like low contrast, illumination variations, and vessel thickness discrepancies. Therefore, the objective of this study is to develop a robust and accurate segmentation algorithm that can effectively address these challenges.
To achieve this objective, we propose a novel CNN-Mamba network that integrates local intensity order transformation (LIOT) and dual cross-attention mechanisms. The proposed architecture consists of three main components: a convolutional neural network (CNN) encoder for feature extraction; a series of Mamba blocks that incorporate dual cross-attention to capture complex dependencies between distant regions of the image; and a segmentation head that produces the final vessel segmentation mask.

In the preprocessing stage, LIOT is applied to the input retinal image to enhance its contrast and detail. LIOT encodes the relative intensity order of each pixel with respect to its neighbors within a local window, so that the resulting representation reflects the underlying structure of the vessels. This step highlights the edges and contours of the vessels and thereby facilitates feature extraction by the CNN encoder.

The CNN encoder extracts local features from the preprocessed image and consists of a series of convolutional layers, batch normalization layers, and ReLU activation functions. Its output is a set of feature maps that capture various aspects of the retinal image, such as texture, edges, and shapes.

The Mamba blocks form the core of the proposed network. Each block contains two parallel branches: a pixel-level selective structured state space model (PiM) and a patch-level selective structured state space model (PaM). The PiM branch processes local features and captures neighboring-pixel information, while the PaM branch models long-range dependencies and global patch interactions. The dual cross-attention mechanisms within the Mamba blocks allow the network to capture complex dependencies between distant regions of the image, improving its ability to segment fine vascular structures.
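The LIOT preprocessing can be sketched as follows. This is a minimal NumPy illustration of the general idea only: each pixel's intensity is compared with its neighbors along four directions, and the comparisons are packed into bits, yielding a four-channel encoding of the local intensity order. The comparison range of 8 pixels per direction and the bit-packing scheme are assumptions based on the common LIOT formulation, not details taken from this abstract.

```python
import numpy as np

def liot(image: np.ndarray, radius: int = 8) -> np.ndarray:
    """Local intensity order transformation (sketch).

    For each pixel, compare its intensity with the `radius` pixels along
    each of four directions (right, left, down, up); each comparison sets
    one bit, giving a 4-channel 8-bit encoding of the local intensity order.
    """
    h, w = image.shape
    padded = np.pad(image.astype(np.int32), radius, mode="edge")
    center = padded[radius:radius + h, radius:radius + w]
    out = np.zeros((4, h, w), dtype=np.uint8)
    # Direction offsets: right, left, down, up.
    dirs = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    for c, (dy, dx) in enumerate(dirs):
        for d in range(1, radius + 1):
            neighbor = padded[radius + dy * d : radius + dy * d + h,
                              radius + dx * d : radius + dx * d + w]
            # Set bit (d - 1) wherever the neighbor is darker than the center.
            out[c] |= (neighbor < center).astype(np.uint8) << (d - 1)
    return out
```

Because the encoding depends only on intensity *order*, not absolute values, it is robust to the illumination variations mentioned above.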
Finally, the segmentation head consists of a series of convolutional layers and a sigmoid activation function, which produce the final vessel segmentation mask.
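The final step of such a head, mapping per-pixel logits to a binary vessel mask, might look like the following sketch. The convolutional layers are omitted, and the 0.5 decision threshold is an assumption, not a value stated in the abstract.

```python
import numpy as np

def segmentation_head(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Map per-pixel logits to a binary vessel mask.

    Applies a sigmoid to obtain vessel probabilities, then thresholds
    them (assumed threshold: 0.5) to produce the binary mask.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid activation
    return (probs >= threshold).astype(np.uint8)    # binary vessel mask
```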
Experimental results on benchmark retinal vessel segmentation datasets demonstrate the effectiveness of the proposed CNN-Mamba network, which achieves superior accuracy, sensitivity, and specificity compared with state-of-the-art methods. In particular, the integration of LIOT and dual cross-attention significantly improves the network's ability to segment fine vascular structures, even in challenging cases with low contrast or highly variable vessel thickness.

We also conduct ablation studies to analyze the contributions of LIOT and the dual cross-attention mechanisms to the overall performance. The results show that both components are essential for optimal segmentation: LIOT enhances the contrast and detail of the input image, facilitating feature extraction by the CNN encoder, while the dual cross-attention mechanisms within the Mamba blocks enable the network to capture complex dependencies between distant regions of the image, which is crucial for segmenting fine vascular structures. LTDA-Mamba demonstrates excellent vessel segmentation and vessel-pixel identification capabilities, reducing the subjectivity associated with manual labeling. Overall, LTDA-Mamba outperforms other state-of-the-art methods, with notably high sensitivity. On the DRIVE, CHASE_DB1, and STARE datasets, the accuracy is 0.9689, 0.9741, and 0.9792, the sensitivity is 0.7868, 0.7697, and 0.7488, and the F1 score is 0.8151, 0.8043, and 0.8219, respectively.
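The reported metrics follow the standard confusion-matrix definitions, which can be computed from a predicted binary mask and its ground truth as in this sketch (a generic illustration, not code from the paper; it assumes both vessel and background pixels are present):

```python
import numpy as np

def vessel_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Accuracy, sensitivity, specificity, and F1 from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # vessel pixels correctly detected
    tn = np.sum(~pred & ~gt)    # background correctly rejected
    fp = np.sum(pred & ~gt)     # background mislabeled as vessel
    fn = np.sum(~pred & gt)     # vessel pixels missed
    se = tp / (tp + fn)                       # sensitivity (recall)
    sp = tn / (tn + fp)                       # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * se / (precision + se)
    return {"acc": acc, "se": se, "sp": sp, "f1": f1}
```

The emphasis on sensitivity in the results above is natural for this task: thin vessels occupy few pixels, so a segmenter can score high accuracy while still missing them, which sensitivity and F1 expose.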
In conclusion, the proposed CNN-Mamba network, incorporating LIOT and dual cross-attention mechanisms, represents a significant advancement in retinal vessel segmentation. The network demonstrates the ability to accurately and consistently segment fine vascular structures, even in challenging cases. This capability suggests its potential for early disease detection, patient monitoring, and treatment planning in ophthalmology. The integration of LIOT and dual cross-attention mechanisms further enhances the network’s robustness and accuracy, which makes it a powerful tool for ophthalmic image analysis. Future work will focus on optimizing the network architecture and exploring additional preprocessing steps to further strengthen segmentation performance.
Yuanyuan Peng, Haoyang Li, Wen Li, Yuejin Zhang. LTDA‐Mamba: Retinal Vessel Segmentation Based on a Hybrid CNN‐Mamba Network[J]. Acta Optica Sinica, 2025, 45(7): 0717001
Category: Medical optics and biotechnology
Received: Dec. 13, 2024
Accepted: Jan. 16, 2025
Published Online: Mar. 19, 2025
The Author Email: Peng Yuanyuan (pengmi467347713@126.com)