Master Thesis MSTR-2017-118

Bibliography: Zaheri, Hamid Reza: Hierarchical manipulation learning from demonstration.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 118 (2017).
41 pages, English.
Abstract

Despite many efforts in the field of robotics, the human-robot interface still suffers from unnecessary complexities that in turn restrict the application of robots. A plausible solution is an intuitive interface for teaching robots a variety of tasks via demonstration. In this work, we design such a framework that allows the robot to learn multi-step object manipulation tasks from only a handful of demonstrations. In particular, we are interested in the problem of pick-and-place. To make the learning process tractable and allow the skills to generalize, we break down monolithic policies into hierarchical skills. In other words, the robot acquires different sets of skills, such as grasping different objects and a variety of trajectories, in separate learning sessions, and can then combine those skills to accomplish a novel task that may require a combination of the previously learned skills.

To accomplish this, we define a notion of action types and use them as labels to train our segmentation model. The feature extraction of the segmentation model is then merged with the training of our controller model, where it helps the controller to learn more carefully the relevant features associated with the preconditions and postconditions of each segment. To efficiently extract visual features from the cameras mounted on the robot, we propose a novel method of training an encoder to transform RGB stereo images into their corresponding depth maps and, in doing so, to extract the most important visual features relevant to object positions and their distances from the camera. This is implemented using convolutional neural networks with a bottleneck (feature vector) in the middle to reduce the dimensionality of the data. The extracted visual features are then combined with the kinematic features of the demonstrations, i.e. the pose of the robot's end-effector, as input to our segmentation and controller modules.

Moreover, we introduce a novel method of grasp prediction using a two-step prediction model. Our algorithm uses a coarse-to-fine approach to predict the position and orientation of the free grasping points on different objects. Grasp predictions are also based on user-predefined positions on each object, which is one of the practical requirements of robotic grasping tasks in industrial applications. Only simulated scenes have been used in this work, which might raise concerns about the transferability of the learned skills to real-world applications. We address this issue by suggesting different ways to mitigate the problem, especially for the grasp prediction skill.
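As a minimal illustrative sketch of the stereo-to-depth encoder with a bottleneck feature vector described above (not the thesis implementation; layer widths, the 64x64 input resolution, and the bottleneck size of 128 are assumptions chosen for illustration only):

```python
# Sketch of a stereo-RGB-to-depth encoder-decoder with a bottleneck feature vector.
# All architecture details here are illustrative assumptions, not the thesis's configuration.
import torch
import torch.nn as nn

class StereoDepthAutoencoder(nn.Module):
    def __init__(self, bottleneck_dim=128):
        super().__init__()
        # Encoder: stereo pair stacked along channels (2 x RGB = 6 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=4, stride=2, padding=1),    # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, bottleneck_dim),                  # bottleneck feature vector
        )
        # Decoder: reconstruct a single-channel depth map from the bottleneck.
        self.decoder_fc = nn.Linear(bottleneck_dim, 128 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),    # 32 -> 64
        )

    def forward(self, stereo_rgb):
        # stereo_rgb: (batch, 6, 64, 64) -- left and right RGB images stacked.
        features = self.encoder(stereo_rgb)               # (batch, bottleneck_dim)
        x = self.decoder_fc(features).view(-1, 128, 8, 8)
        depth = self.decoder(x)                           # (batch, 1, 64, 64)
        return depth, features


if __name__ == "__main__":
    model = StereoDepthAutoencoder()
    stereo = torch.randn(4, 6, 64, 64)        # placeholder stereo batch
    target_depth = torch.randn(4, 1, 64, 64)  # placeholder ground-truth depth (e.g. from a simulator)
    pred_depth, features = model(stereo)
    loss = nn.functional.mse_loss(pred_depth, target_depth)
    loss.backward()
    print(pred_depth.shape, features.shape, loss.item())
```

Under this reading, the compact `features` vector would be what gets concatenated with the end-effector pose features before being fed to the segmentation and controller modules.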

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Machine Learning and Robotics
Supervisor(s): Toussaint, Prof. Marc
Entry date: May 9, 2022