A Study of Prosody in Modern Poetry as Rendered by Humans and Machines
Date
2022
Abstract
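This study examines the prosody exhibited by humans and machines when reading and generating Mandarin modern poetry. Previous literature has mostly examined the prosodic features of Tang poetry or classical metrical verse; this study aims to analyze the prosody of Mandarin modern poems read aloud and, further, to examine how text layout and gender affect prosodic features. We focus on the temporal features of prosody: the number of prosodic units in each recording, prosodic unit duration, the number of pauses, pause location, and pause duration. To compare the prosody of human and machine speech, the study uses one text-to-speech system to generate speech and recruits a group of native Mandarin speakers to read the same texts for comparison. Two modern poems with different structures served as reading materials, each presented to readers in four different text layouts. The readers' recordings and the speech files generated by the online speech system were collected and downloaded, then analyzed according to prosodic-unit labeling principles.
The results show that human speakers produced more and shorter prosodic units, more variable pause locations, and longer pauses. The machine produced relatively fewer but longer prosodic units, predictable pause locations, and shorter pauses. The study also found effects of text layout and gender on prosodic features. Compared with text without punctuation, both human and machine speech showed more pauses in text with punctuation. In text divided into stanzas or paragraphs, however, only the machine showed longer pause durations. The effect of gender on pausing strategy appeared as more pauses in female readers' speech and more omitted pauses in male readers' speech.
The results provide prosodic features that can be applied to speech systems reading modern poetry, and confirm that text layout and gender are closely tied to how prosody is rendered in human speech. To improve the prosodic performance of speech synthesis systems across more text types and speaking styles, the features examined in this experiment, such as pause location and pause duration, can serve as important factors in the development of speech systems.
Keywords: prosodic features, modern poetry reading, pause, text-to-speech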
This study investigates the prosodic performance of human speech and machine-generated speech in reading and generating Mandarin poems. Previous research has addressed the prosodic features of Tang poetry and of poems in classical metrical formats. This study therefore explores prosody in contemporary Mandarin poems and further investigates the effects of text layout and gender. We focused on the durational features of prosody, namely the number of prosodic units (PUs), PU duration, the number of pauses, pause location, and pause duration in each speech type. To examine the prosodic differences between human and machine-generated speech, we used one Text-to-Speech (TTS) system and recruited a group of native Mandarin speakers to read the poems. The two selected poems, which differ in structure, were presented in four different text layouts as reading materials. We downloaded the machine-generated speech from the online TTS website and recorded the human speech, then analyzed each speech file according to the PU-labeling principles. Regarding the differences between human and machine, our results showed that human speakers produced more and shorter PUs, more flexible pause locations, and longer pauses, while the machine produced relatively fewer and longer PUs, predictable pause locations, and shorter pauses. An effect of layout was also evident. Both human and machine speech showed more pauses in layouts with punctuation than in text without punctuation, and the machine-generated speech showed longer pauses in text with stanza breaks. Additionally, a gender effect was observed in pausing strategy: female speakers produced more pauses, and male speakers omitted more pauses. These findings shed light on how prosodic features can be applied to TTS systems for a poetry-reading style, and demonstrate that text layout and gender differences are encoded in the prosody of human speech. To extend TTS development to more text types and speaking styles, durational features such as pause location and pause duration may be influential factors in improving machine speech.
Keywords: prosodic features, poem reading, durational features, pause, Text-to-speech synthesis
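As an illustration only, the durational features named in the abstract (number of PUs, PU duration, number of pauses, pause location, and pause duration) could be computed from a labeled recording roughly as follows. This is a minimal sketch, not the study's actual procedure: the `Interval` type, the `"PAUSE"` label, the `durational_features` function, and the sample timings are all hypothetical, and the abstract does not specify the labeling tool or data format that was used.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Interval:
    """One labeled stretch of a recording (hypothetical format)."""
    label: str    # "PAUSE" for silence; any other label marks a prosodic unit
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def durational_features(intervals, pause_label="PAUSE"):
    """Summarize PU and pause counts, mean durations, and pause locations."""
    pus = [iv for iv in intervals if iv.label != pause_label]
    pauses = [iv for iv in intervals if iv.label == pause_label]
    return {
        "pu_count": len(pus),
        "mean_pu_duration": mean([iv.duration for iv in pus]) if pus else 0.0,
        "pause_count": len(pauses),
        "mean_pause_duration": mean([iv.duration for iv in pauses]) if pauses else 0.0,
        # Pause location is represented here by the pause onset time.
        "pause_locations": [iv.start for iv in pauses],
    }


# Made-up timings for a single recording, for demonstration only.
sample = [
    Interval("PU", 0.0, 1.5),
    Interval("PAUSE", 1.5, 2.0),
    Interval("PU", 2.0, 3.0),
    Interval("PAUSE", 3.0, 4.0),
    Interval("PU", 4.0, 6.0),
]
features = durational_features(sample)
print(features)
```

Running the same summary over a human recording and a TTS recording of the same layout would give per-recording values that could then be compared, under the assumption that both were labeled with the same conventions.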
Keywords
prosodic features, modern poetry reading, poem reading, durational features, pause, text-to-speech, Text-to-speech synthesis