表單文件手寫資料欄位擷取之研究 (Form Field and Filled-in Data Extraction from Printed Documents)

dc.contributor李忠謀zh_TW
dc.contributorGreg C. Leeen_US
dc.contributor.author楊淑雅zh_TW
dc.contributor.authorShu-Ya Yangen_US
dc.date.accessioned2019-08-29T07:55:56Z
dc.date.available2007-09-01
dc.date.available2019-08-29T07:55:56Z
dc.date.issued2007
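dc.description.abstractThis study addresses automated form document processing and proposes methods for classifying and extracting handwritten fill-in fields and for extracting the handwritten data itself. In the fill-in field extraction stage, the size, aspect ratio, global structural characteristics, and directional structural features of the objects in a form are used as classification features. To make these structural features easier to obtain, the blank form image is converted into a simplified structure map by means of image coding. To distinguish description fields from fill-in fields that contain descriptive text, horizontal and vertical pixel projections of each field region are analyzed together with the distribution, size, and spacing of the descriptive text. In the handwritten data extraction stage, a filled-in form image is matched against known blank form samples, and the handwritten field data are then extracted using the fill-in field information of the blank form of the same class. Because removing the frame lines breaks handwritten strokes that cross them, a method is proposed to identify the segments where strokes intersect the frame lines and to reconstruct the strokes in those segments, repairing the broken handwriting. The test images fall into two categories: forms with simple layouts and composite forms with complex layouts. The experimental results show that the proposed methods perform well on both types of form image.zh_TW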
dc.description.abstractForm document analysis is one of the most essential tasks in document analysis and recognition. Form field extraction and filled-in data extraction are two important parts of form document analysis. For form field extraction, the first major task is to classify the preprinted text, lines, check boxes, text boxes, and tables of a form. This thesis proposes a method based on direction-invariant global structural features and direction-dependent structural features to classify the form fields and then extract the fill-in spaces of a form document. Since tables can contain both name fields and data fields, the second task uses a method based on horizontal and vertical color histogram distribution features to segment the fields and extract the data fields. For filled-in data extraction, we propose a method based on a run-based algorithm and interpolation to detect character strokes overlapped by the printed form frame and to reconstruct the broken strokes after the frame lines are removed. Experiments on different types of form documents showed a 99% recognition rate for form field extraction and a 91% success rate for filled-in data extraction.en_US
dc.description.sponsorshipGraduate Institute of Information and Computer Educationzh_TW
dc.identifierGN0693080289
dc.identifier.urihttp://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0693080289%22.&%22.id.&
dc.identifier.urihttp://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/92918
dc.languageChinese
dc.subject表單文件辨識zh_TW
dc.subject表單手寫欄位擷取zh_TW
dc.subject手寫資料萃取zh_TW
dc.subject破碎字修補zh_TW
dc.subjectRun-Based 演算法zh_TW
dc.subjectForm document analysis and recognitionen_US
dc.subjectForm field extractionen_US
dc.subjectFilled-in data extractionen_US
dc.subjectBroken stroke reconstructionen_US
dc.subjectRun-based Algorithmen_US
dc.title表單文件手寫資料欄位擷取之研究zh_TW
dc.titleForm Field and Filled-in Data Extraction from Printed Documentsen_US

Files

Original bundle

Showing 1 - 5 of 7 files

Name               Size       Format
n069308028901.pdf  182.62 KB  Adobe Portable Document Format
n069308028902.pdf  192.7 KB   Adobe Portable Document Format
n069308028903.pdf  365.72 KB  Adobe Portable Document Format
n069308028904.pdf  1.7 MB     Adobe Portable Document Format
n069308028905.pdf  641.83 KB  Adobe Portable Document Format
