Meta-learning for estimating the energy gaps of aromatic molecules
L. S. PETROSYAN1,2, I. P. KOSKIN1, M. S. KAZANTSEV1
1 Vorozhtsov Novosibirsk Institute of Organic Chemistry, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
2 Novosibirsk State University, Novosibirsk, Russia
Keywords: machine learning, energy gap, conjugated molecules, meta-learning
Abstract
A novel predictive model based on meta-learning is presented for estimating the energy gap between the frontier molecular orbitals of aromatic π-conjugated molecules. The aim of the study was to develop an accurate and robust model capable of replacing computationally expensive quantum chemical calculations, in particular those based on density functional theory, when screening organic compounds for optoelectronic applications. A filtered subset of the publicly available PubChemQC PM6 database served as the primary dataset. Molecular structures were encoded as Morgan fingerprints, which were used as input for training three base models: Random Forest, Gradient Boosted Trees, and a fully connected Neural Network. Of these, the Gradient Boosted Trees model performed best, with a mean absolute error (MAE) of 0.1795 eV. To improve prediction accuracy, a meta-model was trained on the outputs of the three base models. This approach reduced the MAE to 0.1744 eV, which is 8% lower than that obtained by simple averaging of the base-model predictions and 3% lower than that of the best individual model. The approach can be further improved by expanding the dataset and incorporating additional base models, paving the way for more accurate and efficient prediction of the properties of organic conjugated molecules for optoelectronics.
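The idea of training a meta-model on base-model outputs can be illustrated with a minimal sketch. The abstract does not state the form of the meta-model, so the example below assumes a linear least-squares combiner with an intercept. The base-model predictions are synthetic numbers with hypothetical noise levels and biases, standing in for the outputs of the Random Forest, Gradient Boosted Trees, and Neural Network models; none of the values come from the study, and all function names are illustrative.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_meta_model(base_preds, targets):
    """Least-squares weights (one per base model, plus an intercept)."""
    rows = [p + [1.0] for p in base_preds]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def predict_meta(weights, base_preds):
    """Apply the fitted linear combiner to rows of base-model predictions."""
    return [sum(w * v for w, v in zip(weights, p + [1.0])) for p in base_preds]


def average_predictions(base_preds):
    """Baseline: unweighted mean of the base-model predictions."""
    return [sum(p) / len(p) for p in base_preds]


def mean_absolute_error(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mean_squared_error(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


# Synthetic stand-in data (assumption): "true" gaps in eV and three base
# models that differ only in hypothetical noise level and systematic bias.
rng = random.Random(0)
true_gaps = [rng.uniform(1.0, 6.0) for _ in range(600)]
NOISE = [(0.25, 0.00), (0.20, 0.05), (0.30, -0.10)]  # (sigma, bias) per model
base_preds = [
    [g + bias + rng.gauss(0.0, sigma) for sigma, bias in NOISE]
    for g in true_gaps
]

# Fit the combiner on one split and evaluate on molecules it has not seen.
fit_preds, fit_gaps = base_preds[:400], true_gaps[:400]
eval_preds, eval_gaps = base_preds[400:], true_gaps[400:]
weights = fit_linear_meta_model(fit_preds, fit_gaps)

mae_average = mean_absolute_error(average_predictions(eval_preds), eval_gaps)
mae_meta = mean_absolute_error(predict_meta(weights, eval_preds), eval_gaps)
print(f"simple averaging MAE: {mae_average:.4f} eV")
print(f"linear meta-model MAE: {mae_meta:.4f} eV")
```

Simple averaging is the special case of this combiner with equal weights and zero intercept, so a fitted combiner cannot have a larger squared error on the data it was fitted to. Whether it also improves on unseen molecules depends on how differently the base models err, which is why the comparison is made on a separate split.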