Lower Partial Moment Proximal Policy Optimization: Enhancing Robot Stability on Balance Boards
(下偏矩近端策略最佳化：提升機器人在平衡板上的穩定性)

Jacky Baltes (包傑奇)
Liao, Yi-Cheng (廖翊承)
2025-12-09, 2025-06-30, 2025
https://etds.lib.ntnu.edu.tw/thesis/detail/56bc277e0be6270e6a86aad3a13f69d1/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125044

This study proposes LPM-PPO, an improved version of the Proximal Policy Optimization (PPO) algorithm that incorporates the Lower Partial Moment (LPM) method. The added loss term penalizes low advantage values, with the aim of improving both the policy’s robustness to noise and its performance. LPM-PPO is compared with leading methods such as SAC, DDPG, TRPO, and RPO across multiple Isaac Gym simulation environments to verify its effectiveness. For Sim2Real transfer, the balance board task is applied to a real-world humanoid robot, a process that must account for complex physical factors such as friction, inertia, mass distribution, and motor dynamics. To collect observations accurately, the study uses OpenCV for vision-based tracking and forward kinematics for position estimation, and adds noise during training to mimic real-world sensor errors, improving the robot’s real-world adaptability and robustness.

Keywords: Humanoid Robots, LPM-PPO, Reinforcement Learning, Sim2Real, Balance Board
Type: Academic thesis
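
The abstract says only that an added loss term penalizes low advantage values; it gives no formula. The following is a minimal sketch of what such a term could look like, not the thesis's actual implementation. It uses the standard definition of a lower partial moment, the mean of max(threshold - x, 0) raised to a chosen order, applied to a batch of advantage estimates, and adds it to the usual clipped PPO surrogate. The function names, the threshold of 0, the order of 2, the weight `lpm_coef`, and the way the two terms are combined are all assumptions made for illustration.

```python
from typing import Sequence


def lower_partial_moment(values: Sequence[float],
                         threshold: float = 0.0,
                         order: int = 2) -> float:
    """Mean of max(threshold - v, 0) ** order over the batch.

    Only values that fall below the threshold contribute.
    """
    if not values:
        raise ValueError("values must not be empty")
    shortfalls = (max(threshold - v, 0.0) ** order for v in values)
    return sum(shortfalls) / len(values)


def ppo_clip_loss(ratios: Sequence[float],
                  advantages: Sequence[float],
                  clip_eps: float = 0.2) -> float:
    """Standard clipped PPO surrogate, negated so that lower is better."""
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(ratios)


def lpm_ppo_loss(ratios: Sequence[float],
                 advantages: Sequence[float],
                 lpm_coef: float = 0.5,
                 threshold: float = 0.0,
                 order: int = 2,
                 clip_eps: float = 0.2) -> float:
    """Hypothetical combination: clipped PPO loss plus a weighted LPM penalty.

    ASSUMPTION: the penalty is taken over ratio-weighted advantages so that it
    depends on the current policy; the thesis may define it differently.
    """
    weighted = [r * a for r, a in zip(ratios, advantages)]
    penalty = lower_partial_moment(weighted, threshold, order)
    return ppo_clip_loss(ratios, advantages, clip_eps) + lpm_coef * penalty


if __name__ == "__main__":
    batch_ratios = [1.0, 1.1, 0.9, 1.3]
    batch_advantages = [0.5, -1.0, 2.0, -0.2]
    print(lpm_ppo_loss(batch_ratios, batch_advantages))
```

In a real training loop the same expressions would be written with an automatic-differentiation library so that gradients flow through the probability ratios. The plain-Python version is meant only to make the shape of the penalty concrete.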