| Category | Approach | Advantages | Limitations | Model | Key techniques |
|---|---|---|---|---|---|
| Network structure and feature optimization | Multi-scale feature extraction | Effectively captures tumor features at different scales, improving segmentation accuracy for small tumors and boundary regions | High computational cost and model complexity, which burden training and inference | RAPNet[22] | Multi-scale dilated convolutions, attention mechanism |
| | | | | HA-RUnet[23] | Attention mechanism, SE modules |
| | | | | Ga-U-Net[24] | Gabor convolutions, attention mechanisms |
| | | | | U-shaped encoder-decoder model[25] | Compact split attention, enhanced feature extraction |
| | Residual and skip connections | Increases network depth and width while avoiding vanishing gradients; improves feature extraction and classification accuracy | May not fully resolve vanishing gradients; limited ability to capture small tumor boundaries | IRDNU-Net[27] | Residual-Inception modules |
| | | | | Residual learning U-Net[29] | Residual learning, feature extraction |
| | | | | dResU-Net[30] | Skip connections between residual and convolutional blocks, which optimize training and enhance feature extraction |
| | | | | MMGAN[31] | Residual learning, reduced parameters |
| | Lightweight design | Reduces model parameters, lowers computational cost, improves robustness and efficiency | May lose detail in complex tumor regions; performs poorly on highly complex data | SEDNet[36] | Hierarchical convolution, feature learning, optimized architecture, fewer parameters |
| | | | | GA-UNet[38] | Lightweight design, GhostV2 bottleneck, attention module |
| Contextual information and attention mechanism | Attention mechanisms | Improve tumor localization and boundary recognition, reduce background interference | Limited ability to capture complex boundaries and fine details; performance may be affected by background complexity | MMS-Net[39] | Triple attention modules, multi-modal MRI segmentation |
| | | | | TDPC-Net[40] | 3D attention, decoupled convolution units |
| | | | | Dual attention U-Net[43] | Dual attention mechanism, iterative feature aggregation |
| | | | | 3D U-Net with attention[44] | Residual network, attention mechanisms, adaptive learning |
| | Transformer fusion | Strong global-context capture; improved segmentation accuracy, especially on multi-modal data | High computational and memory overhead; long training times lower processing efficiency | UNETR[53] | Transformer encoder, global context modeling |
| | | | | TransMVU[55] | Transformer combined with U-Net, multi-view processing |
| | | | | Swin-UNet[58] | Swin Transformer, global context modeling |
| | Spatial pyramid pooling | Expands the receptive field, enables multi-scale feature extraction, preserves fine details | Performance may degrade on very high-resolution images; less effective for very fine details | Attention-UNet with ASPP[61] | Attention mechanism, ASPP, multi-scale feature extraction, expanded receptive field |
| Training strategy and performance improvement | Loss design and transfer learning | Significantly improves segmentation of small tumors and imbalanced data; improves boundary and overlap accuracy | Less effective for large, complex tumor structures | Weighted loss + Dice loss[62] | Generalized Dice loss, attention mechanism |
| | | | | MUNet[63] | mIoU loss, Dice loss, and boundary loss for small tumor regions, overlap, and similarity |
| | | | | SBTC-Net[64] | Transfer learning, joint segmentation and classification |
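The multi-scale rows (dilated convolutions in RAPNet[22], ASPP in Attention-UNet[61]) share one core idea: applying the same kernel at several dilation rates enlarges the receptive field without adding parameters. A minimal NumPy sketch of that idea, assuming a square odd-sized kernel; the function names and rates are illustrative, not taken from the cited models:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """'Same'-padded 2D filtering with a dilated kernel.
    Dilation d spreads a 3x3 kernel over a (2d+1)x(2d+1) window,
    enlarging the receptive field at no extra parameter cost."""
    kh, kw = kernel.shape            # assumes square, odd-sized kernel
    pad = dilation * (kh // 2)
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i * dilation : i * dilation + h,
                                     j * dilation : j * dilation + w]
    return out

def multi_scale_features(x, kernel, rates=(1, 2, 4)):
    """Stack responses at several dilation rates (ASPP-style branch)."""
    return np.stack([dilated_conv2d(x, kernel, d) for d in rates])
```

Frameworks expose this directly (e.g. a `dilation` argument on 2D convolution layers), so the explicit loops here are purely expository.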
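The residual and skip-connection rows (IRDNU-Net[27] through MMGAN[31]) all rest on the same mechanism: adding the block's input back to its output gives gradients an identity path around the transformation, which is why depth can grow without vanishing gradients. A stripped-down sketch with dense weights standing in for convolutions; the shapes and names are assumptions for illustration only:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x). The identity skip lets the gradient
    bypass F entirely, mitigating vanishing gradients in deep
    encoder-decoder networks."""
    h = relu(x @ w1)          # F's first transformation
    return relu(h @ w2 + x)   # skip connection adds the input back
```

The design trade-off noted in the table follows directly: the skip eases optimization but does not by itself add capacity for resolving small tumor boundaries.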
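The SE modules listed for HA-RUnet[23] are one concrete form of the attention mechanisms that dominate the middle of the table: pool each channel to a scalar, pass the result through a small bottleneck, and rescale channels by the learned weights so informative channels are emphasized. A minimal sketch of that squeeze-and-excitation pattern, with hypothetical weight shapes (`C x C/r` and `C/r x C`), not the cited models' actual layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_attention(x, w1, w2):
    """Squeeze-and-Excitation style channel attention.
    x: feature map (C, H, W); w1: (C, C//r); w2: (C//r, C).
    Squeeze: global average pool per channel.
    Excite: bottleneck MLP + sigmoid -> per-channel gates in (0, 1)."""
    s = x.mean(axis=(1, 2))          # squeeze -> (C,)
    z = np.maximum(s @ w1, 0.0)      # reduce + ReLU
    g = sigmoid(z @ w2)              # channel gates
    return x * g[:, None, None]      # rescale each channel
```

Spatial and "dual" attention variants (e.g. [43]) apply the same gating idea over spatial positions as well as channels.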
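The loss-design row ([62], [63]) targets class imbalance: tumor voxels are vastly outnumbered by background, so overlap-based losses such as Dice are preferred over plain cross-entropy. A minimal soft Dice loss sketch (the epsilon smoothing term is a common convention, assumed here rather than taken from the cited papers):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|).
    pred: predicted probabilities; target: binary ground truth.
    Because it is a ratio of overlap to total mass, it is far less
    dominated by the background class than voxel-wise cross-entropy."""
    inter = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)
```

Weighted combinations, as in [63], simply sum this with terms such as mIoU or boundary losses to sharpen small-region and contour accuracy.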