*Editor’s Note: Oliver is speaking at ODSC West 2019. See his talk “Reinforcement Learning with TF Agents & TensorFlow 2.0: Hands On” there.*

Have a look at our friend Orso the bear.

Orso lives in his cave and knows his area and where he can typically find some honey. The honey places are scattered all over his home turf and connected by pathways. Some connections are easy, like walking over a grassy field (light green, very low cost) or through a forest (dark green, low cost). Some are a bit strenuous, like walking over a hill (brown, higher cost). And some are very arduous, like swimming through a lake (blue, very high cost).

There are other bears around competing for the honey pots and sometimes the bees just do not deliver the honey our bear requires. So some spots have honey and others don’t.
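One way to picture Orso’s world in code is as a weighted graph. The following is only a minimal sketch: the place names, the numeric terrain costs, and the 50% chance of honey are assumptions made up for illustration, since the description above only ranks the terrains qualitatively.

```python
import random

# Hypothetical effort per terrain type (only the ordering comes from the text).
TERRAIN_COST = {"field": 1, "forest": 2, "hill": 4, "lake": 8}

# Hypothetical map: each pathway connects two places over one terrain type.
PATHWAYS = [
    ("cave", "A", "field"),
    ("cave", "B", "forest"),
    ("A", "B", "hill"),
    ("A", "C", "lake"),
    ("B", "C", "field"),
]


def build_graph(pathways):
    """Return an adjacency map: place -> {neighbor: cost}."""
    graph = {}
    for a, b, terrain in pathways:
        cost = TERRAIN_COST[terrain]
        # Pathways can be walked in both directions at the same cost.
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def place_honey(graph, rng, probability=0.5):
    """Randomly decide which places (never the cave) hold honey today."""
    return {p for p in graph if p != "cave" and rng.random() < probability}


GRAPH = build_graph(PATHWAYS)
HONEY_TODAY = place_honey(GRAPH, random.Random(0))
```

Calling `place_honey` again with a fresh random generator gives a different day, which mirrors honey turning up in some spots and not in others.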

*[Related article: Deep Learning with Reinforcement Learning]*

Every morning, Orso awakes and starts looking for some honey. But every day, the honey pots are in new places. So Orso climbs a tree near his cave to find out where the honey is. Even with this knowledge, it is not clear which route he should take. Being a bear also means being lazy, so Orso wants to get as much honey as possible at the lowest possible cost. At each spot, he has to decide where to go next. His daily journey ends when he’s back in his own cave.

But what is the best path from honey pot to honey pot, and eventually back to his cave, for a lazy bear who wants to minimize his effort? While you can solve such problems programmatically, there is a different approach: we simulate the bear’s environment and run a set of controlled experiments in it. Using the results of these experiments, we train a neural network that gradually learns how to navigate the bear through his turf, just as a real bear would, only with a lot more experiments. Such an approach to machine learning is called “reinforcement learning”.
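To make the idea of a simulated environment concrete, here is a minimal plain-Python sketch. It is not the TF-Agents API: the class `OrsoWorld`, its `reset`/`step` methods, the map, and the honey reward of 10 are all hypothetical choices for illustration. One episode corresponds to one day and ends when Orso is back in his cave.

```python
import random

# Hypothetical map: place -> {neighbor: walking cost}.
GRAPH = {
    "cave": {"A": 1, "B": 2},
    "A": {"cave": 1, "B": 4, "C": 8},
    "B": {"cave": 2, "A": 4, "C": 1},
    "C": {"A": 8, "B": 1},
}
HONEY_REWARD = 10  # assumed value of one honey pot


class OrsoWorld:
    """Toy episodic environment: an episode ends when Orso is back home."""

    def __init__(self, graph, seed=None):
        self.graph = graph
        self.rng = random.Random(seed)

    def reset(self):
        """Start a new day: Orso is in the cave, honey is placed at random."""
        self.position = "cave"
        spots = [p for p in self.graph if p != "cave"]
        self.honey = {p for p in spots if self.rng.random() < 0.5}
        return self.position, frozenset(self.honey)

    def step(self, target):
        """Walk to a neighboring place; return (observation, reward, done)."""
        cost = self.graph[self.position][target]  # KeyError if not a neighbor
        reward = -cost
        if target in self.honey:
            self.honey.remove(target)  # each pot can only be eaten once
            reward += HONEY_REWARD
        self.position = target
        done = target == "cave"
        return (self.position, frozenset(self.honey)), reward, done


def run_random_episode(env, max_steps=50):
    """One experiment with a purely random bear; returns the total reward."""
    env.reset()
    total = 0
    for _ in range(max_steps):
        target = env.rng.choice(sorted(env.graph[env.position]))
        _, reward, done = env.step(target)
        total += reward
        if done:
            break
    return total
```

The random bear in `run_random_episode` is only a baseline. A learning agent would replace the random choice with a decision based on what it has learned so far.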

TensorFlow 2 and TF-Agents are two powerful libraries that help you tackle these kinds of problems. TF-Agents provides the reinforcement learning strategies, while TensorFlow implements the neural network that learns which path is best. During the learning process, the data used to train the neural network is generated by the experiments that the agent, our bear Orso, conducts. This is not done by brute force: which step is chosen next depends on the prediction of the neural network.
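The interplay between experiments and predictions can be sketched without any library. In the example below, a lookup table stands in for the neural network and tabular Q-learning with epsilon-greedy exploration stands in for the TF-Agents strategy; this is a deliberate simplification, not what the libraries do internally. The map, the fixed honey layout, and all hyperparameters are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical map (place -> {neighbor: cost}) and a fixed honey layout.
GRAPH = {
    "cave": {"A": 1, "B": 2},
    "A": {"cave": 1, "B": 4, "C": 8},
    "B": {"cave": 2, "A": 4, "C": 1},
    "C": {"A": 8, "B": 1},
}
HONEY = frozenset({"B", "C"})
HONEY_REWARD = 10


def step(state, target):
    """Simulate one move; a state is (position, remaining honey spots)."""
    position, honey = state
    reward = -GRAPH[position][target]
    if target in honey:
        reward += HONEY_REWARD
        honey = honey - {target}
    return (target, honey), reward, target == "cave"


def choose_action(q, state, epsilon, rng):
    """Mostly follow the current prediction, sometimes explore at random."""
    actions = sorted(GRAPH[state[0]])
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=1.0):
    """Move the prediction toward the observed reward plus future estimate."""
    if done:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, a)] for a in GRAPH[next_state[0]])
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def train(episodes=500, epsilon=0.2, seed=0, max_steps=20):
    """Run many simulated days and learn from each step taken."""
    rng = random.Random(seed)
    q = defaultdict(float)  # the table that stands in for the neural network
    for _ in range(episodes):
        state = ("cave", HONEY)
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            q_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q
```

The key point is in `choose_action`: the experiments are steered by the current predictions rather than enumerated exhaustively, and each experiment in turn improves those predictions.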

You might have already noticed that our hungry bear Orso is just a fun example of something people do in serious applications like operations research, robotics, or advanced gameplay. Often exact algorithms exist but have exponential complexity in the worst case. In contrast, strategies learned with reinforcement learning are typically only approximate, but their runtime complexity is linear in the size of the problem. Reinforcement learning can thus often be an alternative when either no solution exists at all or you want to trade exactness for linear runtime complexity.

*[Related article: Watch: Introduction to Reinforcement Learning]*

In our half-day hands-on training at ODSC West in San Francisco, we will show you in more detail how you can use reinforcement learning in practice. Using Colab notebooks, we will model problems as a simulation environment (Orso’s world) and train your agent (Orso himself) to learn a good strategy. Hope to see you soon!

**More on the second speaker:** Christian Hidber

Christian is a consultant at bSquare with a focus on machine learning and .NET development. He has a PhD in computer algebra from ETH Zurich and did a postdoc at UC Berkeley, where he researched online data mining algorithms. Currently, he applies reinforcement learning to industrial hydraulics simulations.

More info at https://www.linkedin.com/in/christian-hidber/