Remote Sensing Technology and Application, Vol. 40, Issue 1, p. 1 (2025)

Remote Sensing Large Models: Review and Future Prospects

Shuaihao ZHANG and Zhigang PAN*
Author Affiliations
  • Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
    Figures & Tables (9)
    Flowchart of foundation models for vision tasks on remote sensing images
    ViT architecture
    CLIP pretrained model
    Adapter-Tuning architecture diagram
    Summary of Self-Supervised Methods and Classical Models
    Diagram of Remote Sensing Foundation Model Applications
    • Table 2. Summary of Vision Foundation Models

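      Model | Architecture | Parameters | Training data | Release date
      ViT [9] | Transformer-based | 1.843 B | JFT-3B | 2021/6/8
      Swin-T [10] | ViT-based, adds shifted windows | 3 B | ImageNet, COCO | 2021/11/8
      CoAtNet [11] | Transformer-based | 24.4 B | ImageNet, JFT | 2021/6/9
      CoCa | Combines a contrastive loss with a captioning loss | 21 B | JFT-3B, ALIGN | 2022/6/14
      DALL·E 3 | Improves on DALL·E 2 with higher image fidelity and quality | Not disclosed | Not disclosed | 2023/10/19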
    • Table 3. Summary of Vision-Language Foundation Models

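      Model | Architecture | Pretraining approach | Training data | Release date
      CLIP [5] | Based on ResNets, ViT, and Transformer | Dual-stream image-text contrastive pretraining | WIT (WebImageText) | 2021/2/26
      ALBEF [13] | Based on BERT and ViT | Dual-stream image-text pretraining that aligns before fusing; adds momentum distillation to handle noisy data | COCO and Visual Genome | 2021/10/7
      BLIP [14] | Based on BERT and ViT | Multimodal mixture of encoder-decoder; adds CapFilt to improve the quality of the text corpus | COCO, Visual Genome, Conceptual Captions | 2022/2/15
      Flamingo [15] | Based on NFNet and Transformer | Multimodal fusion pretraining on interleaved image-text data; a Perceiver Resampler reduces the number of visual tokens | COCO, OKVQA, VQAv2, MSVDQA | 2022/11/15
      LLaVA [16] | Based on Vicuna and the CLIP visual encoder | Feature-alignment pretraining, followed by end-to-end instruction fine-tuning | ScienceQA, CC-595K, LLaVA-Instruct-158K | 2023/12/11
      Kosmos-1 [17] | Based on MAGNETO | Aligns perception with the language model; uses xPos relative position encoding for better long-context modeling | The Pile, Common Crawl, LAION-2B, LAION-400M | 2023/3/1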
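Several models in Table 3, CLIP in particular, rely on dual-stream image-text contrastive pretraining. A minimal sketch of a CLIP-style symmetric contrastive loss follows, written in plain Python so it runs without dependencies. It assumes the image and text encoders have already produced one embedding per item; the function names are hypothetical, and the fixed temperature of 0.07 is an illustrative assumption (CLIP learns its temperature during training). Real implementations compute the same quantity over large batches with a tensor library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against the index of the correct item."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of N matched image-text pairs.

    Hypothetical helper: pair i is (image_embs[i], text_embs[i]).
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: row i should give the highest score to text i.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: column j should give the highest score to image j.
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
print(clip_style_loss(images, matched_texts) < clip_style_loss(images, swapped_texts))  # True
```

Correctly paired embeddings give a loss near zero, while swapping the captions drives it up sharply; minimizing this loss pulls matched image-text pairs together and pushes mismatched pairs apart in the shared embedding space.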
    • Table 4. Summary of large models in remote sensing

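      Model | Training strategy | Dataset | Base network | Data modality
      RingMo [68] | MAE + PIMask | Multiple datasets | ViT | Unimodal
      RSPrompter [69] | Freezing + fine-tuning | WHU, NWPU, SSDD | SAM | Unimodal
      SpectralGPT [70] | MAE + 3D mask | fMoW / BigEarthNet-S2 | ViT | Unimodal
      RemoteCLIP [52] | MAE | 10 datasets | ViT-14 | Multimodal
      GeoChat [71] | LoRA fine-tuning | RS dataset of 318k instruction pairs | LLaVA-1.5 | Multimodal
      RSGPT [63] | Text supervision | Images + text descriptions + instructions | InstructBLIP | Multimodal
      EarthPT [46] | Autoregressive | ClearSky | Unsupervised Multitask Learners | Multimodal
      DINO-MM [72] | Distillation + contrastive learning | Multiple datasets | Self-supervised Multitask Learners | Multimodal
      SkySense [47] | Freezing + fine-tuning | Datasets covering 16 different tasks | ViT | Multimodal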
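RingMo and SpectralGPT in Table 4 build on masked autoencoder (MAE) pretraining. The sketch below shows only the plain uniform random patch masking step of a standard MAE, assuming a 224×224 image split into 16×16 patches (196 patches) and a 75% mask ratio. It does not implement RingMo's PIMask or SpectralGPT's 3D masking, and the function name is hypothetical.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets, MAE-style.

    Hypothetical helper: standard uniform random masking only.
    """
    rng = random.Random(seed)
    # Number of patches the encoder keeps (25% when mask_ratio is 0.75).
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


# Assumed setting: 224x224 image, 16x16 patches -> 14 * 14 = 196 patches.
num_patches = (224 // 16) ** 2
visible, masked = random_masking(num_patches, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In MAE the encoder processes only the visible patches and a lightweight decoder reconstructs the masked ones, so a high mask ratio keeps the encoder input short while still forcing it to learn useful image representations.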
    Citation

    Shuaihao ZHANG, Zhigang PAN. Remote Sensing Large Models: Review and Future Prospects[J]. Remote Sensing Technology and Application, 2025, 40(1): 1

    Paper Information

    Received: Jul. 21, 2024

    Accepted: --

    Published Online: May. 22, 2025

    Corresponding author email: Zhigang PAN (zgpan@mail.ie.ac.cn)

    DOI: 10.11873/j.issn.1004-0323.2025.1.0001
