Abstract or Introduction
Robotic agents can be made to learn various tasks through simulating many years of robotic interaction with the environment which cannot be made in case of real robots. With the abundance of a large amount of replay data and the increasing fidelity of simulators to implement complex physical interaction between the robots and the environment, we can make them learn various tasks that would require a lifetime to master. But, the real benefits of such training are only feasible, if it is transferable to the real machines. Although simulations are an effective environment for training agents, as they provide a safe manner to test and train agents, often in robotics, the policies trained in simulation do not transfer well to the real world. This difficulty is compounded by the fact that oftentimes the optimization algorithms based on deep learning exploit simulator flaws to cheat the simulator in order to reap better reward values. Therefore, we would like to apply some commonly used reinforcement learning algorithms to train a simulated agent modelled on the Aldebaran NAO humanoid robot.
The problem of transferring the simulated experience to real life is called the reality gap. In order to bridge the reality gap between the simulated and real agents, we employ a Difference model which will learn the difference between the state distributions of the real and simulated agents. The robot is trained on two basic tasks of navigation and bipedal walking. Deep Reinforcement Learning algorithms such as Deep Q-Networks (DQN) and Deep Deterministic Policy Gradients(DDPG) are used to achieve proficiency in these tasks. We then evaluate the performance of the learned policies and transfer them to a real robot using a Difference model based on an addition to the DDPG algorithm.
- Quote paper
- Suman Deb (Author), 2019, Humanoid robot control policy and interaction design- a study on simulation to machine deployment, Munich, GRIN Verlag, https://www.grin.com/document/493652