Editor’s note: Maggie is a speaker for ODSC APAC 2021. Check out her talk, “How to Get Started With Deep Reinforcement Learning on a Variety of Use Cases?” there!

I, like many others, first heard of reinforcement learning (RL) in the context of games. I watched a documentary about AlphaGo beating Lee Sedol, one of the world’s strongest Go players, which, I must admit, made me shed a tear or two.


DeepMind’s AlphaGo playing Go.

However, when the opportunity emerged to apply RL in my work, a whole bunch of new questions flew into my mind. Which algorithm should we use? Do we need to implement them all? How is it going to work with our customers’ use cases? Will they be okay waiting for it to train? Do they want a lot of flexibility, which comes with higher complexity, or would they prefer us to handle the machine learning details so they can focus on what they do best: solving the specific problems they ultimately care about?


A lot of the art of applying research lies in being creative. How can I redefine my problem to fit the RL structure of an agent taking actions in an environment and receiving rewards, so that I can use the available tools? As researchers, we try to add value with algorithms that make the fewest requirements and assumptions, but as applied machine learning practitioners we need to look at our problems with fresh eyes and some fundamental intuition, which often involves quite a bit of trial and error.
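One way to make “fitting a problem to the RL structure” concrete is to wrap it in an environment object with `reset()` and `step()` methods. The sketch below assumes a hypothetical problem (steering a “dial” to a target position) and a made-up class name, `SetpointEnv`; neither comes from the talk. The return values mimic the convention used by the Gymnasium library (`reset` returns `(observation, info)`, `step` returns `(observation, reward, terminated, truncated, info)`), but the code is plain Python with no dependencies.

```python
import random
from typing import Optional


class SetpointEnv:
    """Hypothetical toy problem recast as an RL environment.

    Observation: an integer "dial" position in [0, n_positions).
    Actions: 0 = turn the dial down, 1 = turn it up.
    Reward: -1 per step until the dial reaches `target`, then 0, so
    reaching the target in fewer steps earns a higher return.
    """

    def __init__(self, n_positions: int = 10, target: int = 7, max_steps: int = 50):
        self.n_positions = n_positions
        self.target = target
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self, seed: Optional[int] = None):
        """Start a new episode from a random non-target position."""
        rng = random.Random(seed)
        self.position = rng.choice(
            [p for p in range(self.n_positions) if p != self.target]
        )
        self.steps = 0
        return self.position, {}

    def step(self, action: int):
        """Apply one action; return (obs, reward, terminated, truncated, info)."""
        delta = 1 if action == 1 else -1
        # Clip so the dial stays within its physical range.
        self.position = min(max(self.position + delta, 0), self.n_positions - 1)
        self.steps += 1
        terminated = self.position == self.target  # task solved
        truncated = self.steps >= self.max_steps   # time limit reached
        reward = 0.0 if terminated else -1.0
        return self.position, reward, terminated, truncated, {}


if __name__ == "__main__":
    # Roll out one episode with a purely random policy.
    env = SetpointEnv()
    obs, info = env.reset(seed=0)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(random.randrange(2))
        total += reward
        done = terminated or truncated
    print("return of a random policy:", total)
```

Most of the creative work lives in three choices: what the agent observes, which actions it may take, and how the reward encodes success. Here, a reward of -1 per step means that a faster route to the target earns a higher return.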


OpenAI solving a Rubik’s Cube with a robot hand.

Ideally, we’d want to iterate fast and be very clear about how we define success, since that tells us which metrics to compare alternative tools against. For instance, how bad is it if training takes too long? Are customers okay with leaving the optimization running overnight? How often do they need to run it? How slow or costly is it for the reinforcement learning agent to interact with the environment at each step of training? Where do our users stand on the speed-performance tradeoff? How bad is it if some of the actions the agent tries are very inefficient? Can they be dangerous, as in some robotics applications?

If our problem is ultimately set in the real world, can we do most of the training in simulation and then fine-tune or finish training in real life? How can we create such a simulation? Building a realistic one can be an immensely difficult task in itself, so can we use an ensemble of simpler, imperfect simulations instead?
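One way to read “an ensemble” is to train against several cheap, imperfect simulators and pick one at random at the start of each episode, a practice closely related to what is often called domain randomization. The sketch below assumes a hypothetical one-dimensional “dial” simulator whose response gain and noise level are uncertain; `NoisyDialSim` and `SimEnsemble` are illustrative names, not an existing library API.

```python
import random
from typing import List, Optional


class NoisyDialSim:
    """One hypothetical, imperfect simulator: each action moves the dial by
    `gain * action` plus Gaussian noise with standard deviation `noise`."""

    def __init__(self, gain: float, noise: float, target: float = 5.0, max_steps: int = 50):
        self.gain = gain
        self.noise = noise
        self.target = target
        self.max_steps = max_steps
        self.rng = random.Random()
        self.x = 0.0
        self.steps = 0

    def reset(self, seed: Optional[int] = None):
        self.rng.seed(seed)
        self.x = 0.0
        self.steps = 0
        return self.x, {}

    def step(self, action: float):
        self.x += self.gain * action + self.rng.gauss(0.0, self.noise)
        self.steps += 1
        error = abs(self.x - self.target)
        terminated = error < 0.5                  # close enough to the target
        truncated = self.steps >= self.max_steps  # time limit reached
        return self.x, -error, terminated, truncated, {}


class SimEnsemble:
    """Presents several simulators as one environment, sampling a different
    member at every reset so a policy cannot overfit one model's quirks."""

    def __init__(self, sims: List[NoisyDialSim], seed: Optional[int] = None):
        self.sims = sims
        self.rng = random.Random(seed)
        self.active = sims[0]

    def reset(self, seed: Optional[int] = None):
        self.active = self.rng.choice(self.sims)
        return self.active.reset(seed)

    def step(self, action: float):
        return self.active.step(action)


if __name__ == "__main__":
    # Hypothetical ensemble: the true gain is believed to lie around 0.8-1.2.
    ensemble = SimEnsemble(
        [NoisyDialSim(0.8, 0.1), NoisyDialSim(1.0, 0.2), NoisyDialSim(1.2, 0.3)],
        seed=0,
    )
    obs, info = ensemble.reset()
    print("training episode uses gain", ensemble.active.gain)
```

How widely the ensemble’s parameters are spread is a design choice: too narrow and the policy may still overfit the simulators, too wide and the task can become harder than the real one.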


Agent learning to walk in the MuJoCo physics engine.

After we’ve given these questions some thought, we can start trying out the algorithms that are available. Bear in mind that new algorithms come out often; however, it usually takes a while for a new paper to be turned into a benchmark open-source implementation. But don’t worry: many people have seen great results with even the most vanilla and older algorithms. Newer algorithms also tend to be far more complex to understand, implement, and debug, which hurts when our implementation eventually underperforms for some cryptic reason. So start small, because that might be all you need in the end. As we say in software engineering, premature optimization is the root of all evil.
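As an illustration of how small “starting small” can be, here is tabular Q-learning, one of the oldest and simplest value-based RL algorithms, on a hypothetical six-state chain where the agent must learn to walk right to reach a goal. This is an illustrative choice of baseline, not an algorithm prescribed in this article; the environment, the hyperparameters, and the function names `chain_step`, `q_learning`, and `greedy_policy` are all assumptions made for the sketch.

```python
import random
from collections import defaultdict


def chain_step(state: int, action: int, n_states: int = 6):
    """Hypothetical toy task: states 0..n_states-1 on a line; action 0 moves
    left, action 1 moves right. Each step costs -1; reaching the last state
    ends the episode with reward 0."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), n_states - 1)
    done = next_state == n_states - 1
    return next_state, (0.0 if done else -1.0), done


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.99,
               epsilon: float = 0.1, n_states: int = 6, max_steps: int = 100,
               seed: int = 0):
    rng = random.Random(seed)
    # Q-table: state -> [Q(s, left), Q(s, right)], initialised to zero.
    q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < max_steps:
            # Epsilon-greedy: explore at random, otherwise act greedily
            # (ties are broken at random).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = chain_step(state, action, n_states)
            # One-step Q-learning target: no bootstrapping from terminal states.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q


def greedy_policy(q, n_states: int = 6):
    """Greedy action for every non-terminal state (ties resolve to 'right')."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(n_states - 1)]


if __name__ == "__main__":
    print("greedy policy (1 = move right):", greedy_policy(q_learning()))
```

The fairly large step size `alpha=0.5` is tolerable here because the toy environment is deterministic; noisy environments usually call for smaller values. Only when a simple baseline like this (or a well-tested library implementation of a similarly simple method) clearly falls short is it worth reaching for something more complex.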

Most resources online tend to describe the general RL framework, give you some theory, show you an example applied to a gaming context and then leave you hanging. How do we go from there to trying this out on our own use cases?

About the author/ODSC APAC 2021 speaker on Deep Reinforcement Learning:

Maggie Liuzzi is a Machine Learning Engineer and AI Researcher who has been studying the application of deep reinforcement learning to quantum control problems at Q-CTRL, and then turning research insights into useful product features for customers to enjoy. She also has experience working with deep learning for robotic applications and LiDAR sensors for autonomous vehicles.