Building and Applying a Generative Dialogue System with Humorous Styles

Date

2021

Abstract

This study builds and applies a generative dialogue system with a humorous style. An emotional conversation system is first implemented with GPT-2 and BERT, using the 1.7 million dialogues in the corpus of the 2019 Chinese Emotional Conversation Generation (CECG) evaluation task. The 6.8 million dialogues in the base version of the Large-scale Cleaned Chinese Conversation corpus (LCCC-base) are then added to give the system richer conversational content. Finally, the model is fine-tuned on a small corpus of 156 humorous flirting sentences retrieved from the Internet, and response generation is controlled through prefix-tuning (adjusting the leading sentences).

The effectiveness of the system is evaluated as follows: (1) Two dialogue systems are built: one is trained on the CECG and LCCC-base corpora and fine-tuned on the flirting corpus; the other is trained on the CECG and LCCC-base corpora only. (2) Each conversation opens with a customized sentence containing flirting words, and this kind of conversation is tested 50 times. (3) Each conversation is judged on whether it is coherent and fluent, and on whether the closing response of the final round has a humorous, flirting-like style. (4) Each conversation lasts at most 3 rounds.

Four human annotators conversed with the systems and judged the results. For the system that was not fine-tuned on the flirting corpus, 29% of the generated responses were judged to have a flirting effect; for the system that was fine-tuned on the flirting corpus, the figure was 62%.

The main contributions of this study are: (1) integrating the emotion into the post string as the condition for computing probability, so that GPT-2 can be trained and applied in its original way; (2) applying BERT to predict the coherence of response sentences as the basis for response ranking; (3) fine-tuning a pre-trained language model on a small corpus to change the style of the generated text; (4) implementing a multi-turn dialogue system with a humorous style via prefix-tuning.
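Contributions (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the bracketed tag format, the function names, and the character-overlap scorer are all assumptions made for illustration. In the study, the conditioned string would be fed to GPT-2 and the scorer would be a BERT coherence predictor.

```python
from typing import Callable, List


def build_conditioned_input(post: str, emotion: str) -> str:
    """Prepend an emotion tag to the post string.

    A left-to-right language model that reads this string conditions its
    response on both the emotion and the post, without any change to its
    training objective. The "[emotion] post" format is hypothetical.
    """
    return f"[{emotion}] {post}"


def rank_by_coherence(
    post: str,
    candidates: List[str],
    score: Callable[[str, str], float],
) -> List[str]:
    """Return the candidate responses sorted from most to least coherent."""
    return sorted(candidates, key=lambda response: score(post, response), reverse=True)


def toy_overlap_score(post: str, response: str) -> float:
    """Stand-in scorer (assumption): share of response characters found in the post.

    A real system would replace this with a model that predicts whether
    the response coherently follows the post.
    """
    if not response:
        return 0.0
    post_chars = set(post)
    return sum(ch in post_chars for ch in response) / len(response)


if __name__ == "__main__":
    print(build_conditioned_input("你好", "like"))
    print(rank_by_coherence("abc", ["xyz", "abz", "abc"], toy_overlap_score))
```

Keeping the scorer as a plain callable means the ranking step does not depend on which coherence model is plugged in.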

Keywords

Conversational system, Text generation, Text understanding, Deep learning, Artificial intelligence
