Analysis, Simplification, and Improvement of the Deep Learning Model Schnet
Date
2022
Abstract
In research applying deep learning to chemistry, predicting specific molecular properties with high accuracy and low computational cost has long attracted considerable attention. Schnet, the model used in this study, is a relatively mature deep learning model developed for this purpose. It predicts properties of molecules in the QM9 dataset, including the HOMO energy (EH), the LUMO energy (EL), and the gap between them (EG), with mean absolute errors (MAEs) close to or below 1 kcal/mol, at a computational cost far lower than that of classical DFT calculations. To understand Schnet's strong predictive performance, this study analyzes the main part of its architecture and finds data correlations that resemble the inductive effect. Next, the molecular SMILES in the QM9 dataset are used to generate bond-step information, which replaces part of the Schnet architecture and thereby simplifies the input; the resulting MAEs are roughly 1 to 2 times those of the original Schnet. As an architectural modification, the simplified Schnet is then tested on a molecular dataset from Deep4Chem for its ability to predict fluorescence emission wavelengths, absorption wavelengths, and quantum yields. An additional embedding layer that labels the molecular environment is then added, so that environment information is fed into the model to improve its predictions. In these subsequent improvements, the MAEs for the absorption and emission wavelengths of the fluorescent molecules in the Deep4Chem dataset were 0.131 eV and 0.087 eV, respectively; after the embedding layer was added to Schnet, they fell to 0.083 eV and 0.082 eV. For quantum yield, the MAEs of the two models were 0.336 and 0.292, respectively.
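The abstract does not define "bond-step information" in detail. As a rough illustration only, the sketch below assumes it means the smallest number of bonds separating each pair of atoms in the molecular graph. The function name `bond_step_matrix` and the hand-written bond list are hypothetical and are not taken from the thesis; in practice the bond list would be derived from a SMILES string with a cheminformatics toolkit.

```python
from collections import deque


def bond_step_matrix(num_atoms, bonds):
    """Return matrix[i][j] = fewest bonds between atoms i and j (-1 if unreachable).

    Assumption: "bond steps" is read here as the shortest-path bond count.
    """
    # Build an undirected adjacency list from the bond list.
    neighbors = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    matrix = []
    for start in range(num_atoms):
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        steps = [-1] * num_atoms
        steps[start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbors[atom]:
                if steps[nxt] == -1:
                    steps[nxt] = steps[atom] + 1
                    queue.append(nxt)
        matrix.append(steps)
    return matrix


# Heavy-atom graph of ethanol (SMILES "CCO"), written out by hand:
# atom 0 = C, atom 1 = C, atom 2 = O; bonds C0-C1 and C1-O2.
ethanol_bonds = [(0, 1), (1, 2)]
ethanol_steps = bond_step_matrix(3, ethanol_bonds)
```

Unlike interatomic distances, a matrix of this kind depends only on molecular connectivity, so it needs no 3D geometry. That is one plausible reading of the input simplification described above.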
Keywords
Cheminformatics, Machine learning, Deep learning, Neural network