| Category | Approach | Advantages | Limitations | Model | Key techniques |
|---|---|---|---|---|---|
| Network structure and feature optimization | Multi-scale feature extraction | Effectively captures tumor features at different scales, improving segmentation accuracy for small tumors and boundary regions | High computational cost and model complexity, which burden training and inference | RAPNet[22] | Multi-scale dilated convolutions, attention mechanism |
| | | | | HA-RUnet[23] | Attention mechanism, SE modules |
| | | | | Ga-U-Net[24] | Gabor convolutions, attention mechanisms |
| | | | | U-shaped encoder-decoder model[25] | Compact split attention, enhanced feature extraction |
| | Residual and skip connections | Increases network depth and width while avoiding vanishing gradients; improves feature extraction and classification accuracy | May not fully resolve vanishing gradients; limited ability to capture small tumor boundaries | IRDNU-Net[27] | Residual-Inception modules |
| | | | | Residual learning U-Net[29] | Residual learning, feature extraction |
| | | | | dResU-Net[30] | Skip connections between residual and convolutional blocks, which optimize training and enhance feature extraction |
| | | | | MMGAN[31] | Residual learning, reduced parameters |
| | Lightweight design | Reduces model parameters, lowers computational cost, improves robustness and efficiency | May lose detail in complex tumor regions; performs poorly on highly complex data | SEDNet[36] | Hierarchical convolution, feature learning, optimized architecture, fewer parameters |
| | | | | GA-UNet[38] | Lightweight design, GhostV2 bottleneck, attention module |
| Contextual information and attention mechanism | Attention mechanisms | Improve tumor localization and boundary recognition, reduce background interference | Limited ability to capture complex boundaries and fine details; performance may be affected by background complexity | MMS-Net[39] | Triple attention modules, multi-modal MRI segmentation |
| | | | | TDPC-Net[40] | 3D attention, decoupled convolution units |
| | | | | Dual attention U-Net[43] | Dual attention mechanism, iterative feature aggregation |
| | | | | 3D U-Net with attention[44] | Residual network, attention mechanisms, adaptive learning |
| | Transformer fusion | Strong global-context capture; improved segmentation accuracy, especially on multi-modal data | High computational and memory overhead; long training times lower processing efficiency | UNETR[53] | Transformer encoder, global context modeling |
| | | | | TransMVU[55] | Transformer combined with U-Net, multi-view processing |
| | | | | Swin-UNet[58] | Swin Transformer, global context modeling |
| | Spatial pyramid pooling | Expands the receptive field, enables multi-scale feature extraction, preserves fine details | Performance may degrade on very high-resolution images; less effective for very fine details | Attention-UNet with ASPP[61] | Attention mechanism, ASPP, multi-scale feature extraction, expanded receptive field |
| Training strategy and performance improvement | Loss design and transfer learning | Significantly improves segmentation of small tumors and imbalanced data; improves boundary and overlap accuracy | Less effective for large, complex tumor structures | Weighted loss + Dice loss[62] | Generalized Dice loss, attention mechanism |
| | | | | MUNet[63] | mIoU loss, Dice loss, and boundary loss for small tumor regions, overlap, and similarity |
| | | | | SBTC-Net[64] | Transfer learning, joint segmentation and classification |
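The multi-scale rows (dilated convolutions in RAPNet[22], ASPP in Attention-UNet[61]) share one core idea: applying the same kernel at several dilation rates enlarges the receptive field without adding parameters. A minimal NumPy sketch of that idea, assuming a square odd-sized kernel; the function names and rates are illustrative, not taken from the cited models:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """'Same'-padded 2D filtering with a dilated kernel.
    Dilation d spreads a 3x3 kernel over a (2d+1)x(2d+1) window,
    enlarging the receptive field at no extra parameter cost."""
    kh, kw = kernel.shape            # assumes square, odd-sized kernel
    pad = dilation * (kh // 2)
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i * dilation : i * dilation + h,
                                     j * dilation : j * dilation + w]
    return out

def multi_scale_features(x, kernel, rates=(1, 2, 4)):
    """Stack responses at several dilation rates (ASPP-style branch)."""
    return np.stack([dilated_conv2d(x, kernel, d) for d in rates])
```

Frameworks expose this directly (e.g. a `dilation` argument on 2D convolution layers), so the explicit loops here are purely expository.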
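The residual and skip-connection rows (IRDNU-Net[27] through MMGAN[31]) all rest on the same mechanism: adding the block's input back to its output gives gradients an identity path around the transformation, which is why depth can grow without vanishing gradients. A stripped-down sketch with dense weights standing in for convolutions; the shapes and names are assumptions for illustration only:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x). The identity skip lets the gradient
    bypass F entirely, mitigating vanishing gradients in deep
    encoder-decoder networks."""
    h = relu(x @ w1)          # F's first transformation
    return relu(h @ w2 + x)   # skip connection adds the input back
```

The design trade-off noted in the table follows directly: the skip eases optimization but does not by itself add capacity for resolving small tumor boundaries.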
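The SE modules listed for HA-RUnet[23] are one concrete form of the attention mechanisms that dominate the middle of the table: pool each channel to a scalar, pass the result through a small bottleneck, and rescale channels by the learned weights so informative channels are emphasized. A minimal sketch of that squeeze-and-excitation pattern, with hypothetical weight shapes (`C x C/r` and `C/r x C`), not the cited models' actual layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_attention(x, w1, w2):
    """Squeeze-and-Excitation style channel attention.
    x: feature map (C, H, W); w1: (C, C//r); w2: (C//r, C).
    Squeeze: global average pool per channel.
    Excite: bottleneck MLP + sigmoid -> per-channel gates in (0, 1)."""
    s = x.mean(axis=(1, 2))          # squeeze -> (C,)
    z = np.maximum(s @ w1, 0.0)      # reduce + ReLU
    g = sigmoid(z @ w2)              # channel gates
    return x * g[:, None, None]      # rescale each channel
```

Spatial and "dual" attention variants (e.g. [43]) apply the same gating idea over spatial positions as well as channels.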
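The loss-design row ([62], [63]) targets class imbalance: tumor voxels are vastly outnumbered by background, so overlap-based losses such as Dice are preferred over plain cross-entropy. A minimal soft Dice loss sketch (the epsilon smoothing term is a common convention, assumed here rather than taken from the cited papers):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|).
    pred: predicted probabilities; target: binary ground truth.
    Because it is a ratio of overlap to total mass, it is far less
    dominated by the background class than voxel-wise cross-entropy."""
    inter = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)
```

Weighted combinations, as in [63], simply sum this with terms such as mIoU or boundary losses to sharpen small-region and contour accuracy.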