Demonstration-Guided Motion Planning
- Authors: Ye G., Alterovitz R.
- Venue: The 15th International Symposium ISRR (2017)
- Year: 2017
- Reviewed by: Josh Ashley, Daniel Kennedy, Landon Clark, Huitao Guan
Broad area/overview
This paper addresses motion planning that takes demonstrations as input. It applies statistics to the demonstrations to determine which aspects of a trajectory are critical to the task, then plans a path that preserves those aspects while also avoiding novel obstacles.
Notation
None of the equations are necessary for discussing the novelty of this paper.
Specific Problem
The paper addresses how to extract meaningful information about a trajectory from demonstrations, even when those demonstrations come from differing environments.
Solution Ideas
The paper uses the variance of the manipulator's kinematics across demonstrations, as a function of time, to determine the significant aspects of the demonstrations. Aspects with low variance correspond to actions the manipulator will try to reproduce in the new environment.
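To make the variance idea concrete, here is a minimal sketch that computes a per-time-step mean and variance of each feature across several demonstrations. It assumes the demonstrations are already time-aligned and of equal length, which the review does not discuss. The function name `time_indexed_stats`, the data layout, and the sample values are all hypothetical.

```python
from statistics import fmean, pvariance

def time_indexed_stats(demos):
    """Per-time-step mean and variance of each feature across demonstrations.

    demos[d][t][k] is feature k at time step t of demonstration d.
    Assumption: all demonstrations are time-aligned and have equal length.
    """
    n_steps = len(demos[0])
    n_feats = len(demos[0][0])
    means, variances = [], []
    for t in range(n_steps):
        # Collect the values of each feature at time t over all demonstrations.
        cols = [[demo[t][k] for demo in demos] for k in range(n_feats)]
        means.append([fmean(c) for c in cols])
        variances.append([pvariance(c) for c in cols])
    return means, variances

# Hypothetical data: 3 demonstrations, 3 time steps, 2 features
# (feature 0 = spoon tilt, feature 1 = hand height).
demos = [
    [[0.00, 0.10], [0.01, 0.40], [0.00, 0.20]],
    [[0.01, 0.10], [0.00, 0.70], [0.01, 0.21]],
    [[0.00, 0.11], [0.01, 0.55], [0.00, 0.19]],
]
means, variances = time_indexed_stats(demos)
```

In this made-up data the tilt feature barely varies at any time step, so it would be treated as something to reproduce, while the hand height at the middle time step varies widely and would be left free for the planner to change.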
For obstacle avoidance, the manipulator uses a sampling-based trajectory planning algorithm called Multi-Component Rapidly-Exploring Roadmap (MC-RRM).
The implementation starts by transferring the guide path into the current environment. If the guide path collides with an obstacle, MC-RRM is used to reconnect the resulting disjoint segments of the path.
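The splitting step can be sketched as follows: walk along the guide path, drop the configurations that collide, and keep the collision-free runs as separate segments for the planner to reconnect. This is only an illustration under simple assumptions (2-D point configurations and a single disc obstacle). The names `collision_free_segments` and `in_disc` are hypothetical and do not come from the paper.

```python
def collision_free_segments(guide_path, in_collision):
    """Split a guide path into maximal runs of collision-free configurations."""
    segments, current = [], []
    for q in guide_path:
        if in_collision(q):
            # A collision ends the current run, if one is open.
            if current:
                segments.append(current)
                current = []
        else:
            current.append(q)
    if current:
        segments.append(current)
    return segments

def in_disc(q, center=(0.5, 0.0), radius=0.15):
    """Hypothetical collision check: is point q inside a disc obstacle?"""
    dx, dy = q[0] - center[0], q[1] - center[1]
    return dx * dx + dy * dy <= radius * radius

# Hypothetical guide path: a straight line along the x axis.
guide_path = [(i / 10, 0.0) for i in range(11)]
segments = collision_free_segments(guide_path, in_disc)
```

Here the obstacle removes the points around x = 0.5, leaving one segment before the obstacle and one after it. A sampling-based planner would then have to find a detour that joins the end of the first segment to the start of the second.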
The lowest cost path of the MC-RRM is computed using Dijkstra's algorithm.
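For reference, a minimal sketch of Dijkstra's algorithm on a small weighted graph is shown here. The graph, its node names, and its edge costs are made up for illustration. In the planner, the nodes would be sampled configurations and the edge costs would come from the demonstration-derived cost metric.

```python
import heapq

def dijkstra(graph, start, goal):
    """Return (total_cost, path) for the cheapest route, or (inf, []) if unreachable.

    graph maps each node to a list of (neighbor, edge_cost) pairs, edge_cost >= 0.
    """
    best = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            # Walk the parent links back to the start to recover the path.
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    return float("inf"), []

# Hypothetical roadmap with four nodes.
roadmap = {
    "start": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.0), ("goal", 5.0)],
    "b": [("goal", 1.0)],
    "goal": [],
}
total_cost, path = dijkstra(roadmap, "start", "goal")
```

On this toy roadmap the cheapest route goes through both intermediate nodes rather than taking either direct edge, which mirrors how the planner can prefer a longer detour when it stays closer to the demonstrated behavior.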
Additionally, a cost metric representing the constraints derived from the demonstrations is applied to the MC-RRM path planner.
The cost metric is a function of time and joint configuration: the farther the manipulator deviates from the mean of the demonstrations, the higher the cost becomes.
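One simple way to realize such a cost is a variance-weighted squared distance from the time-indexed mean, sketched below. This is an assumed form chosen for illustration, not necessarily the paper's exact metric. It treats features as independent (a diagonal covariance), and `configuration_cost`, `eps`, and the sample numbers are hypothetical.

```python
def configuration_cost(q, t, means, variances, eps=1e-6):
    """Variance-weighted squared deviation of configuration q from the mean at time t.

    Low-variance features are penalized heavily when they deviate, while
    high-variance features are penalized lightly.
    eps avoids division by zero when a feature never varied in the demonstrations.
    """
    return sum(
        (qi - mu) ** 2 / (var + eps)
        for qi, mu, var in zip(q, means[t], variances[t])
    )

# Hypothetical statistics for one time step and two features:
# feature 0 (spoon tilt) has low variance, feature 1 (hand height) has high variance.
means = [[0.0, 0.5]]
variances = [[0.0001, 0.04]]

on_mean_cost = configuration_cost((0.0, 0.5), 0, means, variances)
tilt_cost = configuration_cost((0.1, 0.5), 0, means, variances)
height_cost = configuration_cost((0.0, 0.6), 0, means, variances)
```

The same deviation of 0.1 is far more expensive on the low-variance tilt feature than on the high-variance height feature. That is the behavior the review describes: consistent aspects of the demonstrations act as soft constraints.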
Comments
The algorithm was successful in creating new trajectories while maintaining constraints conveyed only by the demonstrations. In particular, it could avoid novel obstacles while continuing to hold a spoonful of sugar upright.
Simple statistical modelling of the kinematics might not be enough to convey certain more complex constraints. Additionally, time may not be the only variable on which a constraint in a demonstration depends. For instance, people intuitively move more slowly and precisely when close to an obstacle to minimize risk; if the obstacles are novel, the demonstrations would not convey this behavior.
Multivariate analysis with more complex feature extraction methods could have great potential for analyzing the demonstrations, particularly for learning policy networks from the demonstrations to produce the cost.
Recent Papers
Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control - As suggested in the comments, this work develops better feature extraction from demonstrations through machine learning.
Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach - The same concept as the previous paper, but more focused and applied.
© Hasan Poonawala. Last modified: March 17, 2021.