Visual- and Audio-Based Content Analysis and Classification of Lecture Videos (基於視覺和聽覺的教學影片內容分析與分類)
Date
2012
Abstract
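Most classrooms today still use blackboards, and lecture videos of blackboard-based teaching are widespread, yet such videos are highly challenging for multimedia semantic analysis and have rarely been discussed. For blackboard-based lecture videos, this thesis proposes a method based on visual and auditory cues that examines the speaker's body behavior and speech content, in order to indicate to students how much attention to invest in different periods of a lecture video. In the visual analysis, the various postures the speaker adopts while teaching are analyzed to identify the meaning each posture conveys. In the auditory analysis, this study proposes a model based on speech emotion recognition that classifies the speaker's speech into five vocal emotions (happy, angry, bored, sad, and normal) and then analyzes the speaker's teaching state from the changes in speech emotion.
Combining the results of the visual and auditory analyses, we can estimate the importance of each period of the speaker's teaching, which also reflects semantic intensity. Learners can devote an appropriate amount of attention according to the importance of the speaker's teaching in each period, allowing them to learn more efficiently from lecture videos.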
Most classrooms are equipped with blackboards, and blackboards are widely used as a teaching aid in lecture video recordings. However, blackboard-based lecture videos have received very little attention in multimedia semantic analysis. This study uses a visual- and audio-based method to examine the speaker's body language and tone of speech in blackboard lecture recordings, and to indicate how much attention students should pay to different segments of a recording to enhance their learning. The visual analysis focuses on the semantics implied by the speaker's postures. The audio analysis focuses on variations in the speaker's speech emotion over the course of teaching. The study proposes a speech emotion recognition model that divides speech emotions into five categories: happy, angry, bored, sad, and normal. The results of the analysis indicate the semantic intensity of the speaker and the importance of the speaker's teaching in different segments, so that students can learn more effectively by varying their attention according to that importance throughout a lecture video recording.
Keywords
lecture video analysis, speech emotion recognition, gesture recognition