Journal of the Chinese Ceramic Society, Volume 51, Issue 2, 427 (2023)
A Data Quality and Quantity Governance for Machine Learning in Materials Science
LIU Yue, MA Shuchang, YANG Zhengwei, ZOU Xinxin, SHI Siqi. A Data Quality and Quantity Governance for Machine Learning in Materials Science[J]. Journal of the Chinese Ceramic Society, 2023, 51(2): 427
Received: Nov. 18, 2022
Published Online: Mar. 11, 2023