Journal of Electronic Science and Technology, Volume 23, Issue 1, 100301 (2025)

On large language models safety, security, and privacy: A survey

Ran Zhang, Hong-Wei Li*, Xin-Yuan Qian, Wen-Bo Jiang, and Han-Xiao Chen
Author Affiliations
  • School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
    Figures & Tables (4)
      • Definition of safety, security, and privacy in LLMs.
      • Training and inference phases of LLMs.
      • Overview of safety, security, and privacy issues and their defense methods.
      • Table 1. Classification of safety, security, and privacy issues in LLMs.


      Table 1. Classification of safety, security, and privacy issues in LLMs.

      Safety
        • Toxicity and bias: Inappropriate or prejudiced content that may be generated or amplified by LLMs due to biased training data, which can result in harmful or discriminatory outcomes [12,13].
        • Hallucination: Generation of text that is nonsensical, irrelevant, or factually incorrect, typically arising from the model’s inability to accurately understand or process the context of the input data [14,15].
        • Jailbreak: Exploitation of vulnerabilities to bypass the model’s intended constraints and generate content that violates its operational guidelines or ethical safeguards [16,17].
      Security
        • Backdoor & poisoning attacks: Malicious insertion of hidden triggers or corrupted data during the training process, which can cause the model to produce harmful or targeted outputs when prompted with specific inputs [6,18] (a minimal illustrative sketch follows the table).
        • Adversarial attack: Carefully crafted inputs designed to deceive the model into making errors or generating unintended outputs, often by exploiting the model’s weaknesses or vulnerabilities [19,20].
      Privacy
        • Privacy leakage: Inadvertent disclosure of sensitive or personal information through the model’s responses, due to the model’s exposure to or training on data containing such information [7,21].
        • Inference attack: Exploitation of model responses to deduce sensitive information about the training data or the underlying algorithms, potentially compromising privacy or security [22,23].
        • Extraction attack: Attempts to reverse-engineer or illicitly obtain proprietary information, such as training data or model parameters, by interacting with the model’s outputs [24,25].
    Citation

    Ran Zhang, Hong-Wei Li, Xin-Yuan Qian, Wen-Bo Jiang, Han-Xiao Chen. On large language models safety, security, and privacy: A survey[J]. Journal of Electronic Science and Technology, 2025, 23(1): 100301

    Paper Information

    Received: Sep. 25, 2024

    Accepted: Jan. 9, 2025

    Published Online: Apr. 7, 2025

    Corresponding author: Hong-Wei Li (hongweili@uestc.edu.cn)

    DOI: 10.1016/j.jnlest.2025.100301
