A Study on Keyword Extraction and Structuring of Medical Examination Reports (醫療檢驗報告關鍵字擷取與結構化之研究)
Date
2017
Abstract
Rapid advances in medical technology allow patients to undergo increasingly accurate and detailed examinations. Many examination reports, however, do not consist of numerical data; they are text documents in which the medical examiner describes the observations obtained from instruments and biochemical tests. Converting these unstructured text reports into a structured form would help diagnosing physicians understand a patient's status across different examination items more efficiently. It would also allow association analysis on the structured data to identify potential factors that affect a disease. Working with pathology examination reports from nephrology, this thesis uses part-of-speech (POS) tagging from natural language analysis to design a method that automatically extracts keyword phrases. From these phrases, a medical vocabulary dictionary is built for each paragraph of the examination report, and it serves as the basis for retrieving terms when structuring the report. In addition, topic probability modeling is applied to automatically extract the keywords of the main examination items from the reports. Finally, using the medical vocabulary dictionary, a system is implemented that structures each paragraph of an examination report according to that paragraph's characteristics. Experimental results show that the proposed techniques structure examination reports effectively and extract the keywords of common examination items, which supports the automatic processing and analysis of medical text reports.
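The two dictionary-related steps named in the abstract, extracting keyword phrases from POS-tagged text and using a per-paragraph vocabulary to structure a report, can be illustrated with a minimal sketch. This is not the thesis's implementation: the Penn Treebank-style tags, the adjective/noun phrase pattern, the function names `extract_phrases` and `structure_section`, and the sample vocabulary are all assumptions made for illustration, and the input is assumed to be English text that some POS tagger has already tagged.

```python
import re
from typing import Dict, List, Tuple

# Assumed tag set (Penn Treebank style); any POS tagger could supply these.
ADJ_TAGS = {"JJ"}
NOUN_TAGS = {"NN", "NNS"}


def extract_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Collect maximal runs of adjectives and nouns that end in a noun."""
    phrases: List[str] = []
    run: List[Tuple[str, str]] = []
    # The sentinel token flushes a run that reaches the end of the input.
    for word, tag in tagged + [("", "END")]:
        if tag in ADJ_TAGS or tag in NOUN_TAGS:
            run.append((word, tag))
            continue
        # Drop trailing adjectives so every phrase ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            phrases.append(" ".join(w.lower() for w, _ in run))
        run = []
    return phrases


def structure_section(text: str,
                      vocabulary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each vocabulary field to the dictionary terms found in the text."""
    lowered = text.lower()
    record: Dict[str, List[str]] = {}
    for field, terms in vocabulary.items():
        # Whole-word, case-insensitive match for each dictionary term.
        record[field] = [
            t for t in terms
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", lowered)
        ]
    return record


# Hypothetical sentence and vocabulary, not taken from the thesis data.
tagged_sentence = [
    ("Mild", "JJ"), ("interstitial", "JJ"), ("fibrosis", "NN"),
    ("is", "VBZ"), ("noted", "VBN"), (".", "."),
]
vocabulary = {
    "finding": ["interstitial fibrosis", "tubular atrophy", "crescent"],
    "severity": ["mild", "moderate", "severe"],
}

phrases = extract_phrases(tagged_sentence)
record = structure_section(
    "Mild interstitial fibrosis and tubular atrophy are noted.", vocabulary
)
print(phrases)
print(record)
```

In a setup like this, the phrases returned by `extract_phrases` would be candidates for the vocabulary, and `structure_section` would be run once per report paragraph with that paragraph's own vocabulary.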
Keywords
keyword extraction, structuring of medical examination reports, construction of a medical vocabulary dictionary