Computer Engineering, Volume 51, Issue 8, 16 (2025)

Review of Application of SAM and Its Improved Models in Image Segmentation

Mayilamu Musideke, GAO Yuxin, ZHANG Situo, FENG Ke, Abudukelimu Abulizi, and Halidanmu Abudukelimu*
Author Affiliations
  • College of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, Xinjiang, China
References (94)

[1] BOMMASANI R, HUDSON D A, ADELI E, et al. On the opportunities and risks of foundation models[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2108.07258v3.

[2] DUBEY A, JAUHRI A, PANDEY A, et al. The Llama 3 herd of models[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2407.21783.

[3] ACHIAM J, ADLER S, AGARWAL S, et al. GPT-4 technical report[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2303.08774.

[5] WANG H Y, GUO S Z, YE J, et al. SAM-Med3D: towards general-purpose segmentation models for volumetric medical images[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2310.15161v3.

[6] PANDEY S, CHEN K F, DAM E B. Comprehensive multimodal segmentation in medical imaging: combining YOLOv8 with SAM and HQ-SAM models[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Washington D.C., USA: IEEE Press, 2023: 2584-2590.

[7] PARULEKAR B, SINGH N, RAMIYA A M. Evaluation of Segment Anything Model (SAM) for automated labelling in machine learning classification of UAV geospatial data[J]. Earth Science Informatics, 2024, 17(5): 4407-4418.

[8] HETANG C R, XUE H R, LE C, et al. Segment anything model for road network graph extraction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Washington D.C., USA: IEEE Press, 2024: 2556-2566.

[9] ZHAO X Q, WU Z, CHEN Y B, et al. Fine-grained high-resolution remote sensing image change detection by SAM-U-Net change detection model[J]. Remote Sensing, 2024, 16(19): 3620.

[10] ZHANG J J, BAI C J, HE H R, et al. SAM-E: leveraging visual foundation model with sequence imitation for embodied manipulation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2405.19586v1.

[11] CHENG Y M, LI L L, XU Y Y, et al. Segment and track anything[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2305.06558v1.

[12] AHMADI M, LONBAR A G, NAEINI H K, et al. Application of segment anything model for civil infrastructure defect assessment[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2304.12600v2.

[13] KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 3992-4003.

[14] ZHANG Y C, SHEN Z R, JIAO R S. Segment anything model for medical image segmentation: current applications and future directions[J]. Computers in Biology and Medicine, 2024, 171: 108238.

[17] ALI M, WU T, HU H J, et al. A review of the Segment Anything Model (SAM) for medical image analysis: accomplishments and perspectives[J]. Computerized Medical Imaging and Graphics, 2025, 119: 102473.

[18] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16 × 16 words: Transformers for image recognition at scale[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2010.11929.

[19] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[M]. Berlin, Germany: Springer International Publishing, 2015.

[20] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/1706.05587v3.

[21] HE K M, CHEN X L, XIE S N, et al. Masked autoencoders are scalable vision learners[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 15979-15988.

[22] ZHAO X, DING W C, AN Y Q, et al. Fast segment anything[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2306.12156v1.

[23] ZHANG C N, HAN D S, QIAO Y, et al. Faster segment anything: towards lightweight SAM for mobile applications[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2306.14289v2.

[24] ZHANG C N, HAN D S, ZHENG S, et al. MobileSAMv2: faster segment anything to everything[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2312.09579v1.

[25] XIONG Y Y, VARADARAJAN B, WU L M, et al. EfficientSAM: leveraged masked image pretraining for efficient segment anything[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2024: 16111-16121.

[26] ZHANG Z Y, CAI H, HAN S. EfficientViT-SAM: accelerated segment anything model without performance loss[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Washington D.C., USA: IEEE Press, 2024: 7859-7863.

[27] ZHOU C, LI X T, LOY C C, et al. EdgeSAM: prompt-in-the-loop distillation for on-device deployment of SAM[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2312.06660v2.

[28] KE L, YE M, DANELLJAN M, et al. Segment anything in high quality[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2306.01567.

[29] SONG Y, ZHOU Q, LI X, et al. BA-SAM: scalable bias-mode attention mask for segment anything model[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2024: 3162-3173.

[30] LI F, ZHANG H, SUN P Z, et al. Semantic-SAM: segment and recognize anything at any granularity[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2307.04767v1.

[31] FENG Z S, ZHANG Y L, CHEN Y H, et al. SwinSAM: fine-grained polyp segmentation in colonoscopy images via segment anything model integrated with a Swin Transformer decoder[J]. Biomedical Signal Processing and Control, 2025, 100: 107055.

[32] CHEN W T, VONG Y J, KUO S Y, et al. RobustSAM: segment anything robustly on degraded images[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2406.09627v1.

[33] ZHANG L, LIANG Y, ZHANG R, et al. BLO-SAM: bi-level optimization based finetuning of the segment anything model for overfitting-preventing semantic segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2402.16338.

[34] JIANG M Z, ZHOU J Y, WU J D, et al. Uncertainty-Aware Adapter: adapting Segment Anything Model (SAM) for ambiguous medical image segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2403.10931v2.

[35] WU J D, JI W, LIU Y P, et al. Medical SAM adapter: adapting segment anything model for medical image segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2304.12620v7.

[36] MA J, HE Y, LI F, et al. Segment anything in medical images[J]. Nature Communications, 2024, 15(1): 654.

[37] ZHANG K D, LIU D. Customized segment anything model for medical image segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2304.13785v2.

[38] GAO Y F, XIA W, HU D D, et al. DeSAM: decoupled segment anything model for generalizable medical image segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2306.00499v2.

[39] CHEN T R, ZHU L Y, DING C T, et al. SAM-Adapter: adapting segment anything in underperformed scenes[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Washington D.C., USA: IEEE Press, 2023: 3359-3367.

[40] HUANG Y, LAI W B, JI J Y, et al. HRSAM: efficient interactive segmentation in high-resolution images[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2407.02109v2.

[41] LI B, XIAO H K, TANG L. ASAM: boosting segment anything model with adversarial tuning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2024: 3699-3710.

[43] CHEN K Y, LIU C Y, CHEN H, et al. RSPrompter: learning to prompt for remote sensing instance segmentation based on visual foundation model[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4701117.

[44] YUE W X, ZHANG J, HU K, et al. SurgicalSAM: efficient class promptable surgical instrument segmentation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 6890-6898.

[45] SUN Y P, CHEN J H, ZHANG S, et al. VRP-SAM: SAM with visual reference prompt[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2024: 23565-23574.

[46] MO S T, TIAN Y P. AV-SAM: segment anything model meets audio-visual localization and segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2305.01836v1.

[47] ZHANG Y X, CHENG T H, ZHU L H, et al. EVF-SAM: early vision-language fusion for text-prompted segment anything model[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2406.20076v5.

[48] RAJIČ F, KE L, TAI Y W, et al. Segment anything meets point tracking[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2307.01197v2.

[49] CHEN P F, XIE L X, HUO X Y, et al. SAM-CP: marrying SAM with composable prompts for versatile segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2407.16682v1.

[50] ZHANG R R, JIANG Z K, GUO Z Y, et al. Personalize segment anything model with one shot[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2305.03048v2.

[51] ZHOU C P, NING K J, SHEN Q Q, et al. SAM-SP: self-prompting makes SAM great again[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2408.12364v1.

[52] XU Y S, TANG J Q, MEN A D, et al. EviPrompt: a training-free evidential prompt generation method for segment anything model in medical images[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2311.06400v1.

[53] CHEN Z, XU Q, LIU X Y, et al. UN-SAM: universal prompt-free segmentation for generalized nuclei images[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2402.16663v1.

[55] LENG T A, ZHANG Y M, HAN K, et al. Self-sampling meta SAM: enhancing few-shot medical image segmentation with meta-learning[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2024: 7910-7920.

[56] QI X Y, WU Y F, MAO Y Q, et al. Self-guided few-shot semantic segmentation for remote sensing imagery based on large vision models[M]. Berlin, Germany: Springer, 2024.

[57] HE C, LI K, ZHANG Y, et al. Weakly-supervised concealed object segmentation with SAM-based pseudo labeling and multi-scale feature grouping[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2305.11003.

[58] HU M Z, LI Y H, YANG X F. SkinSAM: empowering skin cancer segmentation with segment anything model[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2304.13973v1.

[59] CAO Y K, XU X H, SUN C, et al. Segment any anomaly without training via hybrid prompt regularization[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2305.10724v1.

[60] CUI C, DENG R N, LIU Q, et al. All-in-SAM: from weak annotation to pixel-wise nuclei segmentation with prompt-based finetuning[J]. Journal of Physics: Conference Series, 2024, 2722(1): 012012.

[61] DAI H X, MA C, YAN Z L, et al. SAMAug: point prompt augmentation for segment anything model[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2307.01187v4.

[62] WU K, ZHANG J N, PENG H W, et al. TinyViT: fast pretraining distillation for small vision transformers[M]. Berlin, Germany: Springer, 2022.

[63] ZHANG H J, SU Y Y, XU X, et al. Improving the generalization of segmentation foundation model under distribution shift via weakly supervised adaptation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2024: 23385-23395.

[64] SAHOO P, SINGH A K, SAHA S, et al. A systematic survey of prompt engineering in large language models: techniques and applications[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2402.07927v2.

[65] ANTONIOU A, EDWARDS H, STORKEY A. How to train your MAML[EB/OL]. [2024-10-11]. https://arxiv.org/abs/1810.09502.

[66] SUN W X, LIU Z Y, ZHANG Y H, et al. An alternative to WSSS? An empirical study of the Segment Anything Model (SAM) on weakly-supervised semantic segmentation problems[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2305.01586v2.

[67] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2006.11239.

[68] XU Q, LI J X, HE X J, et al. ESP-MedSAM: efficient self-prompting SAM for universal domain-generalized medical image segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2407.14153v4.

[69] YILDIZ Z, GU H, ZHANG J, et al. SegmentWithSAM: 3D slicer extension for Segment Anything Model (SAM)[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2408.15224.

[70] WANG D, ZHANG J, DU B, et al. SAMRS: scaling-up remote sensing segmentation dataset with segment anything model[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2305.02034.

[72] ZHANG J, YANG X B, JIANG R, et al. RSAM-Seg: a SAM-based approach with prior knowledge integration for remote sensing image semantic segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2402.19004v1.

[73] LEE H, KIM K, LEE K. Application of Geo-Segment Anything Model (SAM) scheme to water body segmentation: an experiment study using CAS500-1 images[J]. Korean Journal of Remote Sensing, 2024, 40(4): 343-350.

[74] ZHANG X, LIU Y, LIN Y M, et al. UV-SAM: adapting segment anything model for urban village identification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 22520-22528.

[75] XI L D, YU J C, GE D Q, et al. SAM-CFFNet: SAM-based cross-feature fusion network for intelligent identification of landslides[J]. Remote Sensing, 2024, 16(13): 2334.

[76] GIANNAKIS I, BHARDWAJ A, SAM L, et al. Deep learning universal crater detection using Segment Anything Model (SAM)[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2304.07764v1.

[77] ZHANG S M, LU Q H. Innovative integration of visual foundation model with a robotic arm on a mobile platform[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2404.18720v1.

[78] MOENCK K, WENDT A, PRÜNTE P, et al. Industrial segment anything—a case study in aircraft manufacturing, intralogistics, maintenance, repair, and overhaul[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2307.12674v1.

[79] LIANG W, MA X G. Group-Mix SAM: lightweight solution for industrial assembly line applications[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2403.10053v1.

[80] LI Z S, HUO D, MEURER M, et al. Efficient cutting tool wear segmentation based on segment anything model[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2407.01211.

[81] YANG Y H, WU X Y, HE T, et al. SAM3D: segment anything in 3D scenes[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2306.03908v1.

[82] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 3213-3223.

[83] NEUHOLD G, OLLMANN T, BULÒ S R, et al. The Mapillary Vistas dataset for semantic understanding of street scenes[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2017: 5000-5009.

[84] LAKHANI P, MONGAN J, SINGHAL C, et al. The 2021 SIIM-FISABIO-RSNA machine learning COVID-19 challenge: annotation and standard exam classification of COVID-19 chest radiographs[J]. Journal of Digital Imaging, 2023, 36(1): 365-372.

[85] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[M]. Berlin, Germany: Springer, 2014.

[86] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.

[87] ZHOU B L, ZHAO H, PUIG X, et al. Semantic understanding of scenes through the ADE20K dataset[J]. International Journal of Computer Vision, 2019, 127(3): 302-321.

[88] MARTIN D, FOWLKES C, TAL D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings of the 8th IEEE International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2001: 416-423.

[89] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2009: 248-255.

[90] WANG L J, LU H C, WANG Y F, et al. Learning to detect salient objects with image-level supervision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2017: 3796-3805.

[91] WANG J J, ZHENG Z, MA A L, et al. LoveDA: a remote sensing land-cover dataset for domain adaptive semantic segmentation[EB/OL]. [2024-10-11]. https://arxiv.org/abs/2110.08733v6.

[92] LECLERC S, SMISTAD E, PEDROSA J, et al. Deep learning for segmentation using an open large-scale dataset in 2D echocardiography[J]. IEEE Transactions on Medical Imaging, 2019, 38(9): 2198-2210.

[93] ZHANG J, FAN D P, DAI Y C, et al. RGB-D saliency detection via cascaded mutual information minimization[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 4318-4327.

[94] TU Z Z, XIA T, LI C L, et al. RGB-T image saliency detection via collaborative graph learning[J]. IEEE Transactions on Multimedia, 2020, 22(1): 160-173.

[95] QIN X B, DAI H, HU X B, et al. Highly accurate dichotomous image segmentation[M]. Berlin, Germany: Springer, 2022.

[96] FAN D P, JI G P, SUN G L, et al. Camouflaged object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 2777-2787.

[97] VICENTE T F Y, HOU L, YU C P, et al. Large-scale training of shadow detectors with noisily-annotated shadow examples[M]. Berlin, Germany: Springer International Publishing, 2016.

[98] FAN D P, JI G P, XU P, et al. Advances in deep concealed scene understanding[J]. Visual Intelligence, 2023, 1(1): 16.

[99] TAJBAKHSH N, GURUDU S R, LIANG J M. Automated polyp detection in colonoscopy videos using shape and context information[J]. IEEE Transactions on Medical Imaging, 2016, 35(2): 630-644.

[100] SHUMAILOV I, SHUMAYLOV Z, ZHAO Y R, et al. AI models collapse when trained on recursively generated data[J]. Nature, 2024, 631(8022): 755-759.

Citation

Mayilamu Musideke, GAO Yuxin, ZHANG Situo, FENG Ke, Abudukelimu Abulizi, Halidanmu Abudukelimu. Review of Application of SAM and Its Improved Models in Image Segmentation[J]. Computer Engineering, 2025, 51(8): 16.

    Paper Information

    Received: Nov. 18, 2024

    Accepted: Aug. 26, 2025

    Published Online: Aug. 26, 2025

Corresponding Author Email: Halidanmu Abudukelimu (abdklmhldm@gmail.com)

DOI: 10.19678/j.issn.1000-3428.0070619
