Journal of the Chinese Ceramic Society, Vol. 53, Issue 7, p. 1844 (2025)

Large Language Models Extract Synthesis Information of Lithium-ion Battery Solid-state Electrolytes from Literature

WEI Shihao, LI Shuyuan, WANG Yaxin, and SUN Shaorui
Author Affiliations
  • College of Materials Science and Engineering, Beijing University of Technology, Beijing 100124, China

Introduction

Lithium-ion batteries are widely used in electric vehicles because of their high energy density and environmental compatibility, but their safety and endurance are limited by liquid electrolytes. Solid-state electrolytes have recently attracted attention for their high safety and energy density, yet their research still faces several challenges. Artificial intelligence techniques such as machine learning can accelerate the discovery and optimization of solid-state electrolyte materials, but extracting synthesis information from the literature by hand is time-consuming and error-prone. Large language models (LLMs) have become popular in natural language processing. This paper proposes using LLMs to automatically extract solid-state electrolyte synthesis information from the literature by building datasets to train and fine-tune the models.

Methods

The process of extracting synthesis information for solid-state electrolytes consisted of five parts: literature downloading, parsing and processing, paragraph classification, information extraction, and synthesis process visualization. First, the downloaded XML documents were parsed to extract all paragraphs under headings such as "Experimental" or "Method". A classification dataset was constructed from these paragraphs; the training set was used to train or fine-tune a naive Bayes (NB) model, a BERT model, and three different LLMs, and the test set was used to evaluate the classification performance of each model.
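The parsing step described above can be sketched as follows. This is a minimal illustration using a hypothetical, simplified XML layout (real publisher schemas differ in tag names and nesting); it collects paragraph text from sections whose title contains "Experimental" or "Method":

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document; real publisher XML schemas differ.
DOC = """
<article>
  <section>
    <title>Introduction</title>
    <p>Background text.</p>
  </section>
  <section>
    <title>Experimental</title>
    <p>LLZO was synthesized by solid-state reaction.</p>
    <p>Precursors were ball-milled for 12 h.</p>
  </section>
</article>
"""

def extract_method_paragraphs(xml_text, keywords=("experimental", "method")):
    """Collect <p> text from sections whose title matches a keyword."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for section in root.iter("section"):
        title = (section.findtext("title") or "").lower()
        if any(k in title for k in keywords):
            paragraphs.extend(p.text for p in section.findall("p") if p.text)
    return paragraphs

print(extract_method_paragraphs(DOC))
```

The matched paragraphs then become candidates for the classification step, which filters out experimental text that does not describe solid-state electrolyte synthesis.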
The three LLMs were then fine-tuned on a dataset previously built for extracting synthesis information of inorganic catalysts, enabling them to extract synthesis information for solid-state electrolytes and to output four kinds of content: product, raw materials, methods, and steps. They were tested and evaluated on the synthesis paragraphs filtered by the classification models.

Results and discussion

The NB model slightly underperforms the other models in precision, recall, and F1 score. This may be attributed to the pre-training of the LLMs and BERT on extensive text data and their fine-tuning on the paragraph classification data, which endow them with strong language comprehension. The comparable overall performance of the 3B and 8B models may be explained by the 3B model being derived from the 8B model through pruning and knowledge distillation, which yields similar performance characteristics. Because the classification task in this paper is limited to solid-state electrolyte synthesis paragraphs and the training and test data are relatively scarce, the superior performance of the 7B model does not imply that it is always superior to other models. The comparison between the 3B and 8B models shows that, in certain scenarios, smaller-parameter models can outperform their larger-parameter counterparts. It is therefore not always necessary to pursue models with more parameters; instead, a reasonable choice should be made according to the specific requirements of the task. Among the misclassifications, label 0 is more often predicted as 1, possibly because paragraphs about non-solid-state electrolytes include descriptions of electrode preparation and battery assembly that mention electrolytes, leading the model to misjudge these paragraphs as solid-state electrolyte paragraphs.
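As a rough illustration of the NB baseline, the sketch below trains a TF-IDF plus multinomial naive Bayes classifier on a few invented paragraphs (the texts, labels, and features here are illustrative assumptions; the paper's actual dataset and preprocessing are not given in this abstract):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training paragraphs; the paper builds its dataset from parsed literature.
train_texts = [
    "Li6PS5Cl solid electrolyte was synthesized by ball milling and annealing.",
    "The garnet electrolyte pellet was sintered at 1200 C for 12 h.",
    "The cathode slurry was cast on aluminum foil and dried.",
    "Cells were assembled in an argon-filled glovebox with liquid electrolyte.",
]
train_labels = [1, 1, 0, 0]  # 1 = solid-state electrolyte synthesis paragraph

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

pred = clf.predict(["The sulfide electrolyte was annealed at 550 C."])
print(pred)
```

On a realistic dataset, such a bag-of-words baseline cannot use context the way pre-trained models do, which is consistent with the NB model's lower scores reported above.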
To further enhance the classification performance of the LLMs, the way the prompts and the dataset are constructed can be adjusted so that the models better capture the subtle differences between paragraphs.

The three LLMs perform exceptionally well in the task of extracting synthesis information for solid-state electrolytes, demonstrating that the combination of pre-training and fine-tuning can fully leverage the text understanding and instruction-following capabilities of large models, making them competent for the crucial task of extracting important information from solid-state electrolyte synthesis descriptions. Moreover, although the fine-tuning dataset originates from the field of inorganic catalysts, the models perform excellently in the field of solid-state electrolytes, indicating a strong generalization ability: they learn to extract structured information from unstructured text, rather than merely extracting information about fixed materials and synthesis processes. The extraction results of the 3B, 7B, and 8B models are comparable, indicating that model size is not the sole factor determining performance; model design and training strategies are equally important.

Conclusions

In the paragraph classification task, the LLMs and BERT, benefiting from pre-training on large-scale text data and fine-tuning for the specific classification task, demonstrated classification performance surpassing that of the naive Bayes model. Among the LLMs of different sizes, the 3B model, derived from the 8B model through pruning and knowledge distillation, showed classification effectiveness similar to that of the 8B model, while the 7B model achieved an F1 score exceeding 0.9, highlighting its superior performance on this task. In the synthesis information extraction task, the LLMs were fine-tuned on a dataset of inorganic catalyst synthesis information.
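A single fine-tuning record for the extraction task might look like the following minimal sketch. The chat-message layout, field names, and prompt wording are illustrative assumptions; the abstract specifies only that the models output four kinds of content (product, raw materials, methods, and steps):

```python
import json

def build_finetune_example(paragraph, product, raw_materials, methods, steps):
    """Pack one synthesis paragraph and its labels into a chat-style record."""
    target = {
        "product": product,
        "raw materials": raw_materials,
        "methods": methods,
        "steps": steps,
    }
    return {
        "messages": [
            {"role": "system",
             "content": "Extract the product, raw materials, methods, and "
                        "steps of the synthesis described in the paragraph. "
                        "Answer in JSON."},
            {"role": "user", "content": paragraph},
            {"role": "assistant", "content": json.dumps(target)},
        ]
    }

record = build_finetune_example(
    "Li7La3Zr2O12 was prepared from LiOH, La2O3 and ZrO2 by solid-state "
    "reaction: the powders were ball-milled, pressed, and sintered.",
    product="Li7La3Zr2O12",
    raw_materials=["LiOH", "La2O3", "ZrO2"],
    methods=["solid-state reaction"],
    steps=["ball-mill powders", "press pellets", "sinter"],
)
print(record["messages"][2]["content"])
```

Because the target is structured text rather than material-specific labels, a model fine-tuned on such records for catalysts can transfer to electrolytes, which matches the generalization behavior described above.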
The results showed that the F1 scores of all three models exceeded 0.9, demonstrating the strong generalization capability of LLMs. This indicated that LLMs could learn to extract structured information from unstructured text, rather than merely extracting information about fixed materials and synthesis processes. The extraction performance of the 3B model was slightly better than that of the 7B and 8B models, again indicating that the number of parameters is not the sole factor determining performance; the design and training strategies of the models can be equally important.
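The precision, recall, and F1 scores used throughout the evaluation can be computed from paired labels as in the sketch below (the toy labels are invented for illustration):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from paired labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels: 1 = solid-state electrolyte synthesis paragraph.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]
p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.6 0.75 0.667
```

Note that F1 is the harmonic mean of precision and recall, so an F1 above 0.9 requires both to be high; neither over-extraction nor missed fields alone can be traded away.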


    WEI Shihao, LI Shuyuan, WANG Yaxin, SUN Shaorui. Large Language Models Extract Synthesis Information of Lithium-ion Battery Solid-state Electrolytes from Literature[J]. Journal of the Chinese Ceramic Society, 2025, 53(7): 1844

    Paper Information

    Received: Jan. 2, 2025

    Accepted: Aug. 12, 2025

    Published Online: Aug. 12, 2025

    DOI:10.14062/j.issn.0454-5648.20240845
