TY - JOUR
T1 - Fast Autolearning for Multimodal Walking in Humanoid Robots with Variability of Experience
AU - Figueroa, Nicolas F.
AU - Tafur, Julio C.
AU - Kheddar, Abderrahmane
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Recent advancements in reinforcement learning (RL) and humanoid robotics are rapidly addressing the challenge of adapting to complex, dynamic environments in real time. This letter introduces a novel approach that integrates two key concepts: experience variability (a criterion for detecting changes in loco-manipulation) and experience accumulation (an efficient method for storing acquired experiences based on a selection criterion). These elements are incorporated into the development of RL agents and humanoid robots, with an emphasis on stability. This focus enhances adaptability and efficiency in unpredictable environments. Our approach enables more sophisticated modeling of such environments, significantly improving the system's ability to adapt to real-world complexities. By combining this method with advanced RL techniques, such as Proximal Policy Optimization (PPO) and Model-Agnostic Meta-Learning (MAML), and incorporating self-learning driven by stability, we improve the system's generalization capabilities. This facilitates rapid learning from novel and previously unseen scenarios. We validate our algorithm through both simulations and real-world experiments on the HRP-4 humanoid robot, utilizing an intrinsically stable model predictive controller.
KW - Deep Learning Methods and Reinforcement Learning
KW - Humanoid and Bipedal Locomotion
UR - http://www.scopus.com/inward/record.url?scp=85218934204&partnerID=8YFLogxK
DO - 10.1109/LRA.2025.3546168
M3 - Article
AN - SCOPUS:85218934204
SN - 2377-3766
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
ER -