Exploiting Various Discriminative Models and Information Cues for Spoken Document Summarization (使用多種鑑別式模型以及特徵資訊於語音文件摘要之研究)

dc.contributor: 陳柏琳 [zh_TW]
dc.contributor: Berlin Chen [en_US]
dc.contributor.author: 張鈺玫 [zh_TW]
dc.contributor.author: Yu Mei Chang [en_US]
dc.date.accessioned: 2019-09-05T11:33:55Z
dc.date.available: 2010-8-25
dc.date.available: 2019-09-05T11:33:55Z
dc.date.issued: 2010
dc.description.abstract: 已有許多機器學習的摘要方法被應用於語音文件摘要,它們通常將文件摘要視為分類問題(分兩類),嘗試從文件中挑選重要的語句做為摘要結果;然而,訓練語料不平衡的問題有時會影響這些摘要方法的效能。另一方面,藉由以增進分類正確率而訓練的摘要方法並不見得擁有較好的摘要結果。鑑於此種現象,本論文首先探討使用兩個不同的訓練準則的摘要方法,以減輕上述問題所造成的負面影響,並且得以提高摘要效能。其一為將訓練文件中成對語句之間的重要性排序資訊,做為摘要方法訓練之依據;另一則以直接最大化其摘要評估分數為準則,做為摘要方法訓練之依據。另外,一些訓練語句和特徵選取的方法也在本論文被廣泛地研究與比較。摘要實驗是在中文廣播新聞上進行;我們發現所使用的兩種訓練準則皆能夠展現出比基礎實驗方法較好的結果,但訓練語句以及特徵選取方法似乎並不能顯著地改善摘要效能。 [zh_TW]
dc.description.abstract: Many existing machine-learning approaches to speech summarization cast important-sentence selection as a two-class classification problem; however, imbalanced training data sometimes yields a trained summarizer with unsatisfactory performance. Moreover, training the summarizer to improve classification accuracy does not always lead to better summarization evaluation performance. In view of these phenomena, this thesis investigates two training criteria that alleviate these negative effects and boost the summarizer’s performance. The first trains the summarizer on the pair-wise ordering of sentences in a training document according to their degree of importance. The second trains the summarizer by directly maximizing the associated evaluation score. In addition, several methods for training-sentence selection and feature selection are extensively studied and compared. Experimental results on a Chinese broadcast news summarization task show that both training criteria outperform the baseline summarization system, while training-sentence and feature selection show mixed effectiveness. [en_US]
dc.description.sponsorship: 資訊工程學系 (Department of Computer Science and Information Engineering) [zh_TW]
dc.identifier: GN0697470133
dc.identifier.uri: http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0697470133%22.&%22.id.&
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106772
dc.language: Chinese (中文)
dc.subject: 語音文件 [zh_TW]
dc.subject: 摘錄式摘要 [zh_TW]
dc.subject: 逐點式方法 [zh_TW]
dc.subject: 成對式方法 [zh_TW]
dc.subject: 序列式方法 [zh_TW]
dc.subject: 訓練語料不平衡 [zh_TW]
dc.subject: 貪婪演算法 [zh_TW]
dc.subject: Spoken document [en_US]
dc.subject: Extractive Summarization [en_US]
dc.subject: Point-wise Approach [en_US]
dc.subject: Pair-wise Approach [en_US]
dc.subject: List-wise Approach [en_US]
dc.subject: Unbalance Training Data [en_US]
dc.subject: Greedy Algorithm [en_US]
dc.title: 使用多種鑑別式模型以及特徵資訊於語音文件摘要之研究 [zh_TW]
dc.title: Exploiting Various Discriminative Models and Information Cues for Spoken Document Summarization [en_US]
