Theodorou E.A.,University of Southern California |
Buchli J.,University of Southern California |
Schaal S.,University of Southern California |
Schaal S.,ATR Computational Neuroscience Laboratories
Journal of Machine Learning Research | Year: 2010
With the goal of generating more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper proposes using the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model-free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition for why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI²) currently offers one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs. © 2010 Evangelos Theodorou, Jonas Buchli and Stefan Schaal.
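As a rough illustration of why an update built from trajectory roll-outs can avoid both matrix inversions and gradient learning rates, the sketch below applies a cost-weighted average of exploration noise to a parameter vector. It is a simplified, episodic variant written for this listing, not the algorithm from the paper: the function names, the quadratic toy cost, the `TARGET` vector, and the sensitivity constant `h` are all assumptions, and the per-time-step weighting of the full method is omitted.

```python
import math
import random

# Hypothetical toy problem: the "policy" is a parameter vector and the
# roll-out cost is the squared distance to TARGET (an assumption).
TARGET = [1.0, -2.0]


def quadratic_cost(theta):
    """Toy roll-out cost: squared distance between theta and TARGET."""
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def softmin_weights(costs, h=10.0):
    """Map roll-out costs to probabilities; lower cost gets higher weight.

    Costs are rescaled to [0, 1] before exponentiation so that h (an
    assumed sensitivity constant) behaves the same at any cost scale.
    """
    lo, hi = min(costs), max(costs)
    span = hi - lo
    if span == 0.0:
        # All roll-outs are equally good: weight them uniformly.
        return [1.0 / len(costs)] * len(costs)
    exps = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_update(theta, noises, costs, h=10.0):
    """Add the probability-weighted average of the exploration noise."""
    weights = softmin_weights(costs, h)
    return [
        t + sum(w * eps[i] for w, eps in zip(weights, noises))
        for i, t in enumerate(theta)
    ]


def run(iterations=200, rollouts=10, sigma=0.1, seed=0):
    """Repeat: sample noisy roll-outs, score them, apply the update."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        noises = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(rollouts)]
        costs = [
            quadratic_cost([t + e for t, e in zip(theta, eps)]) for eps in noises
        ]
        theta = weighted_update(theta, noises, costs)
    return theta
```

In this sketch the only quantity left to tune besides `h` is the exploration noise `sigma`, which loosely mirrors the abstract's remark about open parameters; calling `run()` and comparing `quadratic_cost` before and after gives a quick check that the parameters move towards `TARGET`.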
Sugimoto N.,Japan National Institute of Information and Communications Technology |
Morimoto J.,ATR Computational Neuroscience Laboratories
IEEE-RAS International Conference on Humanoid Robots | Year: 2011
In this study, we introduce a phase-dependent trajectory optimization method for Central Pattern Generator (CPG)-based biped walking controllers. By exploiting the synchronization property of the CPG controller, many legged locomotion studies have shown that the CPG-based walking controller is robust against external perturbations and works well in real environments. However, due to the nonlinear dynamic property of the coupled oscillator system composed of the CPG controller and the robot, analytically designing the biped trajectory to satisfy the requirements of a target walking pattern is rather difficult. Therefore, using a nonlinear optimization method is reasonable to improve the walking trajectory. To optimize the walking trajectory, a model-free optimal control method is preferable because precise modeling of the ground contact is difficult. On the other hand, model-free trajectory optimization methods have been considered quite computationally demanding. However, because of recent advances in nonlinear trajectory optimization methods, using a model-free optimization method is now a realistic approach for biped trajectory optimization. We use a path integral reinforcement learning method to improve the biped walking trajectory for CPG-based walking controllers. © 2011 IEEE.
Izawa J.,Johns Hopkins University |
Izawa J.,ATR Computational Neuroscience Laboratories |
Pekny S.E.,Johns Hopkins University |
Marko M.K.,Johns Hopkins University |
And 5 more authors.
Autism Research | Year: 2012
The brain builds an association between action and sensory feedback to predict the sensory consequence of self-generated motor commands. This internal model of action is central to our ability to adapt movements and may also play a role in our ability to learn from observing others. Recently, we reported that the spatial generalization patterns that accompany adaptation of reaching movements were distinct in children with autism spectrum disorder (ASD) as compared with typically developing (TD) children. To test whether the generalization patterns are specific to ASD, here, we compared the patterns of adaptation with those in children with attention deficit hyperactivity disorder (ADHD). Consistent with our previous observations, we found that in ASD, the motor memory showed greater than normal generalization in proprioceptive coordinates compared with both TD children and children with ADHD; children with ASD also showed slower rates of adaptation compared with both control groups. Children with ADHD did not show this excessive generalization to the proprioceptive target, but they did show excessive variability in the speed of movements with an increase in the exponential distribution of responses (τ) as compared with both TD children and children with ASD. The results suggest that a slower rate of adaptation and an anomalous bias towards proprioceptive feedback during motor learning are characteristics of autism, whereas increased variability in execution is a characteristic of ADHD. © 2012 International Society for Autism Research, Wiley Periodicals, Inc.
Choi K.,ATR Computational Neuroscience Laboratories
European Journal of Applied Physiology | Year: 2012
The aim of this study was to construct and evaluate a novel wheelchair system that can be freely controlled via electroencephalogram signals in order to allow people paralyzed from the neck down to interact with society more freely. A brain-machine interface (BMI) wheelchair control system was constructed using effective signal processing methods, and subjects were trained by a feedback method to decrease the training time and improve accuracy. The implemented system was evaluated through experiments on controlling bars and avoiding obstacles using three subjects. Furthermore, the effectiveness of the feedback training method was evaluated by comparison with an imaginary movement experiment without any visual feedback for two additional subjects. In the bar-controlling experiment, two subjects achieved a 95.00% success rate, and the third had a 91.66% success rate. In the obstacle avoidance experiment, all three achieved a success rate over 90% and required almost the same amount of time as when driving with a joystick. In the experiment on imaginary movement without visual feedback, the two additional subjects adapted to the experiment far more slowly than they did with visual feedback. In this study, the feedback training method allowed subjects to easily and rapidly gain accurate control over the implemented wheelchair system. These results show the importance of the feedback training method using neuroplasticity in BMI systems. © 2011 Springer-Verlag.
Kruger V.,University of Aalborg |
Herzog D.,CVMI |
Ude A.,ATR Computational Neuroscience Laboratories |
Ude A.,Jozef Stefan Institute
IEEE Robotics and Automation Magazine | Year: 2010
In the area of imitation learning, one of the important research problems is action representation. There has been a growing interest in expressing actions as a combination of meaningful subparts called action primitives. Action primitives could be thought of as elementary building blocks for action representation. In this article, we present a complete concept of learning action primitives to recognize and synthesize actions. One of the main novelties in this work is the detection of primitives in a unified framework, which takes into account objects and actions being applied to them. As the first major contribution, we propose an unsupervised learning approach for action primitives that makes use of human movements as well as object state changes. As the second major contribution, we propose using parametric hidden Markov models (PHMMs) for representing the discovered action primitives. PHMMs represent movement trajectories as a function of their desired effect on the object, and we will discuss 1) how these PHMMs can be trained in an unsupervised manner, 2) how they can be used for synthesizing movements to achieve a desired effect, and 3) how they can be used to recognize an action primitive and the effect from an observed acting human. © 2006 IEEE.