A Study on Applying Deep Learning Language Models to Detect Suicide-Related Messages in Calls to the 安心專線 (Lifeline) Hotline
Date
2022
Abstract
Suicide is a leading cause of death worldwide, and many studies approach suicide prevention through early detection. Identifying people at risk of suicide is an important but difficult task. This study uses deep learning natural language processing techniques, which have developed rapidly in recent years, to build a suicide risk prediction model and to detect two types of suicide-related information in sentences: "expression of suicidal ideation/suicide attempt" and "mention of a suicide method."

The data consist of 866 call recordings from the 安心專線 (Lifeline) hotline, which were transcribed verbatim and then processed and analyzed as text. To build the suicide risk prediction model for callers, Sentence-BERT semantic similarity was used to compare transcript sentences with the items of two suicide scales and select the sentences most useful for prediction. Sentence-BERT was then used again to extract features from the selected sentences, and these features were used to build a classification model. In addition, sentences were manually annotated for suicide-related information. The annotations were used to train sentence-level models that predict suicide-related information: a fine-tuned BERT model and two types of machine learning models based on sentence embeddings. Finally, the study examined whether this suicide-related information adds predictive value for suicide risk.

The results show that sentence embeddings extracted with Sentence-BERT can effectively predict suicide risk: principal component analysis combined with a random forest reached a classification accuracy of 83.9%. For detecting suicide-related information in sentences, the fine-tuned BERT model outperformed the two machine learning models based on sentence embeddings, with classification accuracies of 95.8% for "expression of suicidal ideation/suicide attempt" and 99.1% for "mention of a suicide method." The two types of suicide-related information did not provide any additional gain in predicting suicide risk.
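The sentence-selection step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not say how similarity scores were turned into a selection, so the cosine measure, the "best match over all scale items" rule, the threshold value, and the function names below are all assumptions made for illustration. The toy 3-dimensional vectors stand in for real Sentence-BERT embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def select_similar_sentences(sentence_vecs, item_vecs, threshold):
    """Return indices of sentences whose best match among the scale items
    reaches the threshold (hypothetical selection rule)."""
    selected = []
    for i, sentence in enumerate(sentence_vecs):
        best = max(cosine_similarity(sentence, item) for item in item_vecs)
        if best >= threshold:
            selected.append(i)
    return selected


# Toy vectors standing in for embeddings of transcript sentences (made up).
sentence_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.0, 0.7],
]
# Toy vectors standing in for embeddings of suicide-scale items (made up).
item_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]

# The threshold of 0.5 is an arbitrary illustrative value.
selected = select_similar_sentences(sentence_vecs, item_vecs, threshold=0.5)
print(selected)
```

In a full pipeline, the embeddings of the selected sentences would then be passed to the dimensionality-reduction and classification stage (principal component analysis and a random forest in the reported best configuration).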
Keywords
suicide prediction, natural language processing, BERT, Sentence-BERT