Journal of the Chinese Ceramic Society, Vol. 51, Issue 2, 427 (2023)

A Data Quality and Quantity Governance for Machine Learning in Materials Science

LIU Yue1,2, MA Shuchang1, YANG Zhengwei1, ZOU Xinxin1, and SHI Siqi3,4
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3[in Chinese]
  • 4[in Chinese]

    Data-driven machine learning is widely used in materials property prediction and structure-activity relationship research because of its accurate and efficient predictions. Data determine the upper limit of machine learning performance. However, materials data often suffer from quality and quantity problems (e.g., multiple sources, high noise, small samples, and high dimensionality) that hinder the application of machine learning in the materials field. In this paper, by analyzing these data quality and quantity problems and the related governance work, we find that data quality and data quantity jointly determine this upper limit. Following this, we propose a data quality and quantity governance framework that embeds materials domain knowledge throughout the whole process of materials machine learning. We define twelve dimensions to characterize what materials data quality and quantity mean. A life cycle model of data quality and quantity governance is constructed to ensure that governance activities are carried out in an orderly manner. To manage data quality and quantity accurately and comprehensively, a series of corresponding governance processing models are established from both domain-knowledge and data-driven perspectives, providing technical support for implementing the life cycle model. This framework enables overall evaluation and improvement of materials data quality and quantity, offers theoretical guidance and candidate solutions for acquiring high-quality data in appropriate quantity, and accelerates the in-depth application of machine learning in materials research and development.

    LIU Yue, MA Shuchang, YANG Zhengwei, ZOU Xinxin, SHI Siqi. A Data Quality and Quantity Governance for Machine Learning in Materials Science[J]. Journal of the Chinese Ceramic Society, 2023, 51(2): 427

    Paper Information

    Received: Nov. 18, 2022

    Published Online: Mar. 11, 2023

    DOI:10.14062/j.issn.0454-5648.20220991
