A Study of Multi-Instrument Automatic Music Transcription (多重樂器自動採譜之探討)
Date
2020
Abstract
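Automatic music transcription (AMT) is one of the most important tasks in music information retrieval (MIR), and because of the complexity of the signals involved it is regarded as one of the most challenging areas of signal processing. Among the many AMT tasks, multi-instrument transcription is a key step toward a general-purpose transcription system, yet it has received little research attention. A model must identify multiple instruments and their corresponding pitches simultaneously within a single piece of music; the varied timbres and rich harmonics of the different instruments can interfere with one another and complicate the problem further. Compared with conventional single-instrument transcription, multi-instrument transcription is therefore a more advanced and complex problem. Beyond the inherent technical difficulty, integrating and coordinating transcription problems at different levels and handling their complex interactions also calls for a clearer and more precise problem definition, along with an effective method for evaluating the final results.
In this study, we propose a method for multi-instrument automatic transcription. By developing an end-to-end pipeline that runs from signal-level feature engineering to the final evaluation, we integrate several techniques to better handle this complex problem. The pipeline combines signal processing techniques that clearly expose pitch features, novel deep learning models, and concepts inspired by multi-object recognition, instance segmentation, and image-to-image translation in computer vision, and it further integrates a newly developed post-processing algorithm. The proposed system is general, flexible, and highly efficient across all sub-tasks of multi-instrument transcription. In comprehensive evaluations of the different sub-tasks, it achieves the best results to date on every metric, including multi-instrument note-level transcription, a task that had not been studied before.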
Automatic music transcription (AMT), one of the most important tasks in music information retrieval (MIR), is regarded as one of the most challenging fields in signal processing because of the inherent complexity of its signals. Among the many AMT tasks, multi-instrument transcription is a critical step toward a general transcription system, yet it remains little investigated. It requires identifying multiple instruments and their corresponding pitches in music performances, where various timbres and rich harmonic content can interfere with each other, which makes it a more advanced problem than conventional single-instrument AMT. Beyond the technical difficulties, orchestrating the different levels of this complex problem scope also requires a clear definition of problem scenarios and effective evaluation approaches. In this research, we propose a multi-instrument AMT approach with a complete end-to-end flow from signal-level feature engineering to final evaluation. It combines signal processing techniques capable of specifying pitch saliency, novel deep learning methods, and concepts inspired by multi-object recognition, instance segmentation, and image-to-image translation in computer vision, and it is integrated with a newly developed post-processing algorithm. The proposed system is flexible and efficient for all the sub-tasks in multi-instrument AMT. Comprehensive evaluations on different sub-tasks show state-of-the-art performance, including on multi-instrument note tracking, a task that has not been investigated before.
Keywords
automatic music transcription, multi-pitch estimation, multi-pitch streaming, deep learning, self-attention