Journal of the Chinese Ceramic Society, Vol. 51, Issue 2, 427 (2023)
A Data Quality and Quantity Governance for Machine Learning in Materials Science
Data-driven machine learning is widely used in materials property prediction and structure-activity relationship research because of its accurate and efficient predictions. Data determines the upper limit of machine learning. However, materials data often suffer from quality and quantity problems (multiple sources, heavy noise, small samples, and high dimensionality) that hinder the application of machine learning in the materials field. In this paper, by analyzing these data quality and quantity problems and the related governance work, we find that data quality and data quantity jointly determine that upper limit. We therefore propose a data quality and quantity governance framework that embeds materials domain knowledge throughout the entire materials machine learning process. We define twelve dimensions to characterize what materials data quality and quantity mean. A life cycle model of data quality and quantity governance is constructed to ensure that governance activities are carried out in an orderly manner. To manage data quality and quantity accurately and comprehensively, a series of corresponding governance processing models are established from both domain-knowledge and data-driven perspectives, providing technical support for implementing the life cycle model. The framework enables overall evaluation and improvement of materials data quality and quantity, offers theoretical guidance and candidate solutions for acquiring data of high quality and appropriate quantity, and accelerates the in-depth application of machine learning in materials research and development.
LIU Yue, MA Shuchang, YANG Zhengwei, ZOU Xinxin, SHI Siqi. A Data Quality and Quantity Governance for Machine Learning in Materials Science[J]. Journal of the Chinese Ceramic Society, 2023, 51(2): 427
Received: Nov. 18, 2022
Published Online: Mar. 11, 2023