A Study on a Deep Neural Network-Based Recognition System for Automatic Harvesting of Ellipsoidal Crops
Date
2025
Abstract
As agriculture faces an increasingly severe labor shortage, smart agriculture has become a key development direction, and within automated harvesting, visual recognition and pose estimation remain open problems. Existing deep learning methods are limited by high variability in crop appearance, occlusion, and dependence on large annotated datasets. In addition, commercial RGB-D cameras produce quantization errors at close range (about 1 meter), causing step-like distortions in point cloud data that severely degrade pose estimation accuracy. To address these issues, this study targets small tomatoes, which are economically valuable and approximately ellipsoidal in shape, and proposes an innovative vision system, EllipsoidParamsNet, combining Mask R-CNN and EPNet to recognize the fruit and estimate its 3D position and spatial orientation. In this system, Mask R-CNN performs instance segmentation, while EPNet is a quantization error correction module comprising a Spatial Transformer Network, an autoencoder, and point cloud compensation, paired with a least-squares ellipsoid pose estimator and an ellipsoid parameter refiner used during harvesting. Results show that, compared with conventional Gaussian filtering and RANSAC methods, EPNet is nearly 190 times faster in a comparison experiment over 10,000 point clouds, with higher accuracy and stability. In greenhouse harvesting experiments, the success rate for small tomatoes at various tilt angles exceeded 70%. The system was also successfully applied to cucumber harvesting, demonstrating cross-crop applicability.
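The close-range quantization effect described above can be illustrated with a minimal sketch. The step size and depth values here are hypothetical, not the thesis's measurements; real RGB-D sensors quantize depth with a range-dependent step, which is what turns a smooth surface into a stair-stepped point cloud:

```python
import numpy as np

def quantize_depth(z_mm, step_mm=2.0):
    """Round true depth to the sensor's discrete depth levels.
    step_mm is a hypothetical quantization step for illustration."""
    return np.round(z_mm / step_mm) * step_mm

# A gently sloped surface sampled every 1 mm of depth (in mm).
z_true = np.linspace(800.0, 810.0, 11)
z_meas = quantize_depth(z_true)
# Many distinct true depths collapse onto a few discrete levels,
# producing the step-like distortion the abstract refers to.
```

Because neighboring true depths snap to the same discrete level, `z_meas` contains fewer unique values than `z_true`; fitting a surface or ellipsoid directly to such data biases the estimated pose, which motivates a correction module like EPNet.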
Motivated by the growing labor shortage in agriculture, this study addresses the challenges of visual recognition and pose estimation in automated harvesting. Current deep learning methods are limited by high crop appearance variability, occlusions, and reliance on large annotated datasets. Moreover, commercial RGB-D cameras often produce quantization errors at close range (within ~1 meter), causing step-like distortions in point cloud data that significantly impair pose estimation. To address these issues, this study targets small tomatoes—economically valuable and ellipsoid-like in shape—and proposes an innovative visual system to accurately identify and estimate their 3D position and orientation. The proposed system, EllipsoidParamsNet, integrates instance segmentation via Mask R-CNN with EPNet—a quantization error correction module incorporating Spatial Transformer Networks, autoencoders, and point cloud compensation—together with least-squares ellipsoid pose estimation and a parameter refinement module tailored for harvesting. Compared with conventional methods such as Gaussian filtering and RANSAC, EPNet achieves nearly 190 times faster processing in a comparison experiment over 10,000 point clouds while offering superior accuracy and stability. Greenhouse experiments confirm the system's robustness under various tilt angles, with harvesting success rates exceeding 70%. While multi-object scenes remain challenging, the system has also been successfully applied to cucumbers, demonstrating promising cross-crop applicability and laying a foundation for future developments in multi-target and generalizable harvesting systems.
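The least-squares ellipsoid pose estimation step can be sketched as follows. This is a generic algebraic quadric fit, not the thesis's exact EPNet implementation; it assumes the segmented point cloud roughly covers the fruit surface and recovers the center, semi-axes, and orientation:

```python
import numpy as np

def fit_ellipsoid(points):
    """Least-squares ellipsoid fit: solve x^T Q x + b^T x = 1 for the
    quadric coefficients, then recover center, semi-axes, orientation."""
    x, y, z = points.T
    # Design matrix for the 9 quadric coefficients.
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z, x, y, z])
    p, *_ = np.linalg.lstsq(D, np.ones(len(points)), rcond=None)
    A, B, C, Dxy, Exz, Fyz, G, H, I = p
    Q = np.array([[A,     Dxy/2, Exz/2],
                  [Dxy/2, B,     Fyz/2],
                  [Exz/2, Fyz/2, C]])
    b = np.array([G, H, I])
    center = -0.5 * np.linalg.solve(Q, b)
    k = 1.0 + center @ Q @ center     # after re-centering: y^T Q y = k
    if k < 0:                         # the quadric's overall sign is
        Q, k = -Q, -k                 # arbitrary; make Q positive definite
    evals, evecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
    semi_axes = np.sqrt(k / evals)    # so semi-axes come out descending
    return center, semi_axes, evecs   # evecs columns = principal axes

# Synthetic check: noise-free points on a known axis-aligned ellipsoid.
rng = np.random.default_rng(0)
u = rng.uniform(0, np.pi, 500)
v = rng.uniform(0, 2 * np.pi, 500)
axes = np.array([3.0, 2.0, 1.0])
ctr = np.array([0.5, -0.3, 1.2])
pts = np.stack([axes[0] * np.sin(u) * np.cos(v),
                axes[1] * np.sin(u) * np.sin(v),
                axes[2] * np.cos(u)], axis=1) + ctr
center, semi, _ = fit_ellipsoid(pts)
```

On the noise-free synthetic cloud above, the recovered center and semi-axes match the ground truth to machine precision; on real quantized depth data, the step-like distortions are exactly what degrade such a fit, motivating the upstream correction module.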
Keywords
Agricultural Harvesting Robot, Machine Vision, Pose Estimation, Ellipsoid Fitting, Quantization Error Correction