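Predicting the Effect of Solvents on the Emission Wavelengths of Organic Fluorescent Molecules Using Machine Learning Methods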
Date
2023
Abstract
Quantitative structure-activity relationship (QSAR) methods developed gradually over the 19th and 20th centuries, and research that uses machine learning to predict the biological activity and drug properties of chemical molecules has grown steadily. Many software packages can compute molecular descriptors, which represent the physicochemical properties of molecules. With machine learning methods, we can predict the emission wavelengths of organic fluorescent molecules and examine how different molecular descriptors and solvent effects influence them. This study uses scikit-learn (SKlearn) as the machine learning toolkit. Models were trained with three regression methods (linear regression, LASSO, and random forest), combined with K-means clustering and agglomerative hierarchical clustering, to examine training performance. For 11146 molecules represented as SMILES, 8 solvent descriptors were added, and models were trained in three ways: random forest regression alone, random forest regression based on K-means clustering and LASSO regression, and random forest regression based on Ward's method and LASSO regression. In each case the coefficient of determination (R^2) improved by 0.01 to 0.02. The 8 solvent descriptors appear among the important features of each model, as do descriptors related to conjugated π bonds, which contribute significantly to predicting the emission wavelength; this interpretation is consistent with the results in the references.
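The clustering-plus-regression workflow named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual pipeline: the abstract does not say how the clustering, LASSO, and random forest steps are chained, so the sketch assumes K-means groups the samples, LASSO selects features within each cluster, and a random forest is then trained per cluster on the selected features. The data are synthetic stand-ins (random numbers in place of real molecular and solvent descriptors), and the cluster count, `alpha`, and forest size are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 20 "molecular" descriptor columns
# followed by 8 "solvent" descriptor columns. Real descriptors would come
# from a descriptor-calculation package applied to SMILES strings.
n_samples, n_mol, n_solv = 600, 20, 8
X = rng.normal(size=(n_samples, n_mol + n_solv))
# Made-up target standing in for emission wavelength (nm): it depends on
# two molecular columns and one solvent column, plus noise.
y = (450 + 30 * X[:, 0] - 20 * X[:, 1] + 10 * X[:, n_mol]
     + rng.normal(scale=5, size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Standardize so that K-means distances and the LASSO penalty treat
# all descriptors on the same scale.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 1: group the training samples with K-means (3 clusters is arbitrary).
n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_train_s)
train_labels = kmeans.labels_
test_labels = kmeans.predict(X_test_s)

y_pred = np.empty_like(y_test)
selected = {}
for c in range(n_clusters):
    tr = train_labels == c
    te = test_labels == c

    # Step 2: LASSO as a feature selector; keep columns with nonzero weight.
    lasso = Lasso(alpha=1.0).fit(X_train_s[tr], y_train[tr])
    keep = np.flatnonzero(lasso.coef_)
    if keep.size == 0:  # fall back to all columns if LASSO drops everything
        keep = np.arange(X.shape[1])
    selected[c] = keep

    # Step 3: random forest regression on the selected columns only.
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X_train_s[tr][:, keep], y_train[tr])
    if te.any():
        y_pred[te] = rf.predict(X_test_s[te][:, keep])

r2 = r2_score(y_test, y_pred)
print(f"Test R^2 over all clusters: {r2:.3f}")
print({c: cols.tolist() for c, cols in selected.items()})
```

To approximate the Ward's-method variant, `sklearn.cluster.AgglomerativeClustering(linkage="ward")` could replace `KMeans` for the training labels. It has no `predict` method, so assigning test samples to clusters would need an extra step, such as a nearest-centroid rule.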
Keywords
Quantitative structure-activity relationship (QSAR), Machine learning, Fluorescent molecules, Solvent effect