Web Injection Attack Detection Based on Retrieval-Augmented Classification: A Case Study of Cross-Site Scripting Attacks
Date
2025
Abstract
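As web applications grow more complex, cross-site scripting (XSS) attack techniques continue to evolve. Traditional detection methods that rely on keyword matching or syntax rules tend to fail under statement obfuscation and semantic perturbation, which undermines their effectiveness in real-world defense. To address this problem, this study proposes a hybrid architecture that combines semantic retrieval with classification inference, called Retrieval-Augmented Classification (RAC). It uses Sentence-BERT to build semantic sentence embeddings and a FAISS vector database for semantic matching and classification inference, yielding an XSS detection system with semantic understanding. The study designs several sets of experiments specifically for data-sparse settings, simulating semantic database compositions that range from no attack samples at all to high-density attack scenarios, in order to observe the recognition stability and classification performance of semantic retrieval under different sample conditions. The experimental results show that, even with very little sample support, the architecture quickly reaches high classification accuracy, and that in comparisons with several language models it shows advantages in accuracy, resource efficiency, and sample utilization. The overall system architecture is modular and highly extensible, is suited to recognizing many types of malicious statements, and can serve as a basis for future work that incorporates multi-context information or extends to defenses against other injection attacks such as SQLi and command injection. In sum, this study provides an XSS detection solution that balances semantic understanding with implementation feasibility, offering a practical reference and theoretical support for applying language models in the security domain.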
Cross-site scripting (XSS) attacks have evolved to bypass traditional detection methods, which often rely on keyword matching or syntax rules. These conventional approaches struggle to detect obfuscated or semantically modified inputs. To address this, we propose a Retrieval-Augmented Classification (RAC) framework that integrates Sentence-BERT for semantic embedding with FAISS-based vector retrieval. The system judges whether an input query is legitimate or malicious through semantic similarity comparison. We design experiments under data-scarce conditions, simulating scenarios that range from zero to high-density malicious samples in the semantic database. Results show that the proposed method maintains high classification accuracy even with minimal labeled data and outperforms baseline language models in accuracy and sample efficiency. The system’s modular and extensible design supports broader malicious input detection and can be extended to other injection attacks such as SQLi and command injection. This research demonstrates a practical and semantically aware approach to XSS detection, offering a viable strategy for applying language models in cybersecurity.
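The retrieve-then-classify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: to stay dependency-free, it substitutes normalised character trigram counts for Sentence-BERT embeddings and a brute-force cosine search for the FAISS index. The names `embed`, `SemanticStore`, `classify`, the parameters `k` and `min_similarity`, the `"unknown"` fallback, and the six sample strings are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def embed(text, n=3):
    """Toy stand-in for a sentence encoder (assumption, not Sentence-BERT):
    a sparse, L2-normalised vector of character n-gram counts."""
    s = text.lower()
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {gram: c / norm for gram, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(gram, 0.0) for gram, w in a.items())


class SemanticStore:
    """Brute-force nearest-neighbour store standing in for a vector database."""

    def __init__(self):
        self.entries = []  # list of (vector, label) pairs

    def add(self, text, label):
        self.entries.append((embed(text), label))

    def search(self, text, k):
        """Return the k most similar stored entries as (score, label) pairs."""
        query = embed(text)
        scored = [(cosine(query, vec), label) for vec, label in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def classify(store, text, k=3, min_similarity=0.1):
    """Similarity-weighted vote over the k nearest labelled neighbours.
    Returns "unknown" when nothing in the store is close enough
    (a hypothetical fallback, not taken from the source)."""
    neighbours = store.search(text, k)
    if not neighbours or neighbours[0][0] < min_similarity:
        return "unknown"
    votes = defaultdict(float)
    for score, label in neighbours:
        votes[label] += score
    return max(votes, key=votes.get)


# Hypothetical labelled samples playing the role of the semantic database.
store = SemanticStore()
for sample, label in [
    ("<script>alert(1)</script>", "malicious"),
    ("<img src=x onerror=alert(1)>", "malicious"),
    ("<svg onload=alert(document.domain)>", "malicious"),
    ("cheap flights to tokyo in march", "benign"),
    ("how to bake sourdough bread", "benign"),
    ("winter hiking boots review", "benign"),
]:
    store.add(sample, label)

print(classify(store, "<ScRiPt>alert(document.cookie)</ScRiPt>"))
print(classify(store, "hiking boots for winter"))
```

In a fuller version, `embed` would presumably be replaced by a Sentence-BERT encoder and `SemanticStore.search` by a FAISS index lookup, while the voting step in `classify` could stay largely the same. Varying how many malicious samples are added to the store roughly mirrors the data-scarcity experiments the abstract describes.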
Keywords
XSS attack, semantic retrieval, sentence classification, cybersecurity, Sentence-BERT