Bachelor Thesis BCLR-2017-105

BibliographyBöpple, Teresa: Relational regression methods to speed up Monte-Carlo planning.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 105 (2017).
47 pages, english.
Abstract

Monte-Carlo Tree Search is a planning algorithm that tries to find the best possible next action by using random simulations and estimating the return by them. One big advantage of Monte-Carlo Tree Search is that it can be used with very little domain knowledge, is implemented easily and is applicable for many problems. This thesis shows how Monte-Carlo planning is speed up by applying Relational Regression. A relational domain is given with states that consist of facts. In order to use regression on relation states, features have to be created that map the state to a vector that can be used by regression. Therefore, training data are created that contain states, possible actions and an estimated return of these state-action pairs. These data are created by a standard Monte-Carlo planner. All facts that occur in any state from the training data are written into a factlist. With the help of that factlist, features are created. Every row of the feature stands for a fact. If this fact occurs in a state, the feature of this state contains a 1 in the corresponding row. If not, the row contains a 0. Other features that are created also consider combinations of facts or actions. With the help of regression, these features can be mapped to a real value that corresponds to the expected return of the state or state-action pair. This is evaluated by testing it on a test dataset. The results of this test are that for a big enough and accurate training dataset the return calculated by regression is very close to the one calculated by the planner for the test data. Because of this promising results, the regression is actually integrated into a Monte-Carlo planner. For a well chosen set of training data, that contain a wide range of both terminal- and dead-end states, the planner improves in many ways. First, in average the modified planner needs less steps to reach a terminal state than the original planner. Second, the modified planner reaches a terminal state more often than the original planner, because the planner gets into a dead-end state less often now. With this the actual goal of this work is achieved and it is demonstrated that the planning process speeds up.

Full text and
other links
Volltext
Department(s)University of Stuttgart, Institute of Parallel and Distributed Systems, Machine Learning und Robotics
Superviser(s)Toussaint, Prof. Marc; Ngo, Hung; Ngo, Ph.D. Vien
Entry dateMay 9, 2022
New Report   New Article   New Monograph   Computer Science