Master Thesis MSTR-2018-114

Bibliography: Kim, Hangbeom: Neural network user intent prediction for robot teleoperation.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 114 (2018).
49 pages, English.
Abstract

The focus of this thesis is to design a machine learning framework that predicts user intent during robot teleoperation from eye and hand gesture tracking devices. We concentrate on a natural pick-and-place task: a simple virtual environment simulates the operation of a gripper and collects input data from hand gestures and eye movements. Teleoperation allows robotic systems located in hostile or inaccessible environments to be operated remotely. Such environments are often incompletely known and therefore unsuitable for fully autonomous robots. In these situations, teleoperation combines the problem-solving capabilities of the human with the precision and durability of the robot: robots are extremely good at executing accurate, precise motions, while humans are proficient at complex planning and decision-making. Recent systems succeed by first identifying the operator's intentions, typically by analyzing the user's direct input through a joystick or a keyboard; to enhance this approach, the user input is combined with semi-autonomous control in shared autonomy. In particular, indirect inputs such as eye gaze data are a plentiful source of information for assessing operator intention: when people perform manipulation tasks, their gaze tends to fixate on goal objects before the movement towards them begins and to glance at the objects during the task. In this thesis, we present an intent predictor and compare it with six different models developed from two low-dimensional time series inputs (hand motion tracking and eye movement tracking). These models are trained on hand motion tracking, on eye gaze tracking, and on a combination of the two, respectively. From this implicit information, models based on the long short-term memory (LSTM) architecture for recurrent neural networks sequentially learn the goal region of the manipulation task with respect to the user's intent; LSTMs are both general and effective at capturing long-term temporal dependencies. In our study, the prediction model based on both eye gaze and hand motion identifies the salient region faster and more accurately. The mean and standard deviation of the distance error between the reference goal position and the predicted position with the highest probability are less than one cell on the 28x28 grid, and the probability distributions around the goal position in the reference data and the predicted data have similar shapes, with a KL divergence of 0.4. Our findings underline the value of the intent predictor for achieving efficient human-robot interaction.
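
For illustration only (the thesis code is not part of this record), the following is a minimal sketch of the kind of model the abstract describes: an LSTM that consumes a low-dimensional time series of eye-gaze and hand-motion samples and outputs a probability distribution over a 28x28 goal grid, together with a KL-divergence check between a reference and a predicted distribution. The feature layout, network size, and training details are assumptions, not the thesis implementation.

# Hedged sketch, assuming a (gaze x, gaze y, hand x, hand y, hand z) feature
# layout and arbitrary hyperparameters; not the thesis code.
import torch
import torch.nn as nn
import torch.nn.functional as F

GRID = 28  # goal positions discretized on a 28x28 grid

class IntentLSTM(nn.Module):
    def __init__(self, input_dim=5, hidden_dim=64, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, GRID * GRID)

    def forward(self, x):
        # x: (batch, time, input_dim) sequence of gaze + hand samples
        out, _ = self.lstm(x)
        logits = self.head(out[:, -1])        # prediction from the last time step
        return F.log_softmax(logits, dim=-1)  # log-probabilities over 784 grid cells

def kl_divergence(reference, predicted, eps=1e-9):
    """KL(reference || predicted) between two 28x28 probability maps."""
    p = reference.flatten() + eps
    q = predicted.flatten() + eps
    p, q = p / p.sum(), q / q.sum()
    return torch.sum(p * torch.log(p / q))

if __name__ == "__main__":
    model = IntentLSTM()
    seq = torch.randn(8, 120, 5)                # 8 sequences, 120 time steps each
    log_probs = model(seq)                      # (8, 784)
    target = torch.randint(0, GRID * GRID, (8,))
    loss = F.nll_loss(log_probs, target)        # cross-entropy style training loss
    kl = kl_divergence(torch.rand(GRID, GRID),
                       log_probs[0].exp().view(GRID, GRID))
    print(loss.item(), kl.item())

In this sketch the final hidden state summarizes the observed gaze and hand trajectory so far, so the same network can be queried at any point during the motion to obtain a current estimate of the goal cell.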

Full text and other links: Volltext (full text)
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Machine Learning and Robotics
Supervisor(s): Toussaint, Prof. Marc; Mainprice, Dr. Jim; Oh, Yoojin
Entry date: February 15, 2022