Date: 10.09.2024, 15:30-17:00

Room: 26-25/105

Speakers: Theresa Eimer, André Biedenkapp

Motivation

AutoML as a field continues to focus predominantly on supervised learning as a target domain, even though other fields within machine learning offer just as much potential for impact. This tutorial serves as an introduction to one such field: Automated Reinforcement Learning (AutoRL). Since Reinforcement Learning (RL) requires a significant amount of configuration, including the selection of algorithms, hyperparameters, and task sequencing, all of which is currently done predominantly by hand, AutoML methods can contribute to making RL algorithms more efficient and easier to apply. We will introduce the specific challenges that come with RL as a target domain, focusing on the dynamic configuration ideas that are especially important in this inherently nonstationary setting. Furthermore, we will discuss the current state of the art in AutoRL categories such as hyperparameter optimization (HPO), meta-learning, and task design to enable AutoML researchers to apply their ideas to AutoRL.

Applying RL to a novel problem is challenging for many reasons, including choosing the best approach from a wealth of available algorithms, (meta-)algorithmic extensions, and low-level design decisions such as hyperparameters and task sequencing. So far, this has been done largely by hand, leading to significant cost overheads in RL papers for, e.g., hyperparameter search. This cost, combined with the poor experimental and reporting practices often associated with it, is a significant barrier to entry for RL research and for the application of RL to real-world problems. The AutoML community has considerable expertise that could help solve these issues, but standard AutoML methods do not capture the unique challenges of the RL setting, such as the dynamic nature of the problem. This tutorial will enable AutoML researchers to apply their ideas to RL by introducing them to the most prominent problem settings and state-of-the-art AutoRL methods, opening an avenue for AutoML to impact a thriving research field.

The tutorial will follow our survey article on AutoRL (Parker-Holder et al., 2022), including topics such as (dynamic) configuration and environment design that have become increasingly important in the RL community. In addition, we will extend this with recent progress in hyperparameter optimization for different RL paradigms (e.g., offline or multi-objective RL) and discuss how AutoRL interacts with experimental design in RL.

Outline

Part 1: Introduction and algorithmic part on AutoRL

  1. Motivation: Why does AutoRL matter?
  2. Formal definition of AutoRL
  3. Categories of AutoRL approaches (e.g., learning to learn, environment design)
  4. Properties of AutoRL landscapes
  5. What are AutoRL-specific challenges compared to AutoML for supervised learning?
  6. Why are dynamic configuration approaches important for RL, and how do we learn them? (A minimal sketch follows this list.)

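To make item 6 above concrete, here is a minimal Python sketch of what dynamic configuration means in practice: a hyperparameter is adapted while the agent trains instead of being fixed up front. The Gym-style env interface and the agent's act and update methods and epsilon attribute are illustrative assumptions, not part of any specific library; an AutoRL method would learn such a schedule rather than hand-code it.

    # Hand-written schedule standing in for a learned configuration policy.
    def exploration_schedule(step, total_steps, eps_start=1.0, eps_end=0.05):
        """Linearly anneal the exploration rate over the training run."""
        frac = min(step / total_steps, 1.0)
        return eps_start + frac * (eps_end - eps_start)

    def train(agent, env, total_steps=100_000):
        # Hypothetical Gym-style training loop for illustration only.
        obs = env.reset()
        for step in range(total_steps):
            # Dynamic configuration: the hyperparameter changes during
            # training, matching the nonstationarity of the RL process.
            agent.epsilon = exploration_schedule(step, total_steps)
            action = agent.act(obs)
            obs, reward, done, info = env.step(action)
            agent.update(obs, reward, done)
            if done:
                obs = env.reset()
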
Part 2: Practical guidelines and case study of hyperparameters

  1. Examples of successful AutoRL, dynamic algorithm configuration (DAC), and online approaches
  2. Evaluation and Generalization of AutoRL
  3. HPO for RL
  4. Hyperparameters and experimental design
  5. Forms of optimization with their pros and cons (algorithm configuration methods, population-based training (PBT), heuristics, meta-gradients, etc.); a minimal PBT sketch follows this list
  6. Combining HPO with other AutoRL domains and why this is important for RL generally
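
As a taste of item 5 above, below is a minimal Python sketch of population-based training (PBT), one of the listed optimization forms. The helpers make_agent, train_for, and evaluate, as well as the agent's set_lr method, are hypothetical placeholders for a real RL training stack; the point is the exploit/explore loop.

    import copy
    import random

    def pbt(make_agent, train_for, evaluate, pop_size=8, generations=20):
        # Each member trains with its own learning rate, sampled log-uniformly.
        population = []
        for _ in range(pop_size):
            lr = 10 ** random.uniform(-5, -3)
            population.append({"agent": make_agent(lr), "lr": lr})

        for _ in range(generations):
            for member in population:
                train_for(member["agent"], steps=10_000)
                member["score"] = evaluate(member["agent"])
            population.sort(key=lambda m: m["score"], reverse=True)
            quartile = max(1, pop_size // 4)
            # Exploit: the bottom quartile copies weights and hyperparameters
            # from a top-quartile member. Explore: the copied learning rate is
            # perturbed, so each agent effectively follows a schedule rather
            # than a fixed value.
            for loser in population[-quartile:]:
                winner = random.choice(population[:quartile])
                loser["agent"] = copy.deepcopy(winner["agent"])
                loser["lr"] = winner["lr"] * random.choice([0.8, 1.25])
                loser["agent"].set_lr(loser["lr"])

        return population[0]["agent"]

Unlike static, one-shot HPO, the hyperparameter values PBT discovers can differ across training phases, which is exactly the dynamic behavior the RL setting rewards.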

Speakers

Theresa Eimer

Theresa Eimer is a senior PhD student at Leibniz University Hannover, supervised by Prof. Dr. Marius Lindauer since 2019. She completed her Master’s degree at the University of Freiburg, studying automated curriculum learning for Reinforcement Learning. In particular, her focus is on automated meta-reinforcement learning (Parker-Holder et al., 2022) to improve generalization across different task settings (Eimer et al., 2021b; Benjamins et al., 2023), as well as on understanding the effects of hyperparameters in reinforcement learning (Eimer et al., 2021a, 2023). She served as the Diversity Chair of AutoML Conf 2022 and co-organized the 2022 DAC4AutoML competition on learning hyperparameter schedules for Computer Vision and Reinforcement Learning. She was a teaching assistant for the Reinforcement Learning lecture in Hannover from 2021 to 2023, co-led the Social Responsibility in ML course from 2021 to 2023, and designed the new Advanced Topics in Deep RL lecture in 2024.

André Biedenkapp

André Biedenkapp is a Postdoctoral Researcher at the University of Freiburg. He received his Ph.D. from the University of Freiburg (Germany) in 2022 under the guidance of Prof. Dr. Frank Hutter and Prof. Dr. Marius Lindauer. He specializes in automated machine learning and dynamic algorithm configuration via meta-learned RL policies. His work has repeatedly led him to study AutoRL in model-free (Parker-Holder et al., 2022) and model-based (Zhang et al., 2021) settings, including several new AutoRL methods (Franke et al., 2021; Zhang et al., 2021; Biedenkapp et al., 2021; Eimer et al., 2021b; Shala et al., 2023) as well as benchmarks (Shala et al., 2022; Rajan et al., 2023). He received the Best Paper Award at GECCO’22 and was part of the team that won the NeurIPS 21 Black-Box Optimization Challenge. He has been serving as the general chair of the COSEAL (COnfiguration and SElection of ALgorithms) group since 2022; in this role, he is involved in organizing the yearly COSEAL workshops. He served as the Online Experience Chair of the AutoML Conference in 2023 and 2024, was a local organizer of the 2nd AutoML Fall School in 2022, and organized weekly ELLIS unit Freiburg meetups until December 2022. Together with Prof. Dr. Marius Lindauer, he gave a tutorial on “Algorithm Configuration: Challenges, Methods, and Perspectives” at IJCAI 2020 and PPSN 2020. Since 2018, he has been a teaching assistant for the Machine Learning for Automated Algorithm Design and Automated Machine Learning graduate courses, and he has also given a guest lecture on “Meta-Algorithmics & AutoML” for the undergraduate course “Artificial Intelligence Practice”.