Humans learn from past experiences (or, you know, at least they should). You didn’t get so charming by accident. Years of positive compliments as well as negative criticism have all helped shape who you are today. This chapter is all about designing a machine learning system driven by criticisms and rewards.
Consider the following examples. You learn what makes people happy by interacting with friends, family, or even strangers, and you figure out how to ride a bike by trying out different muscle movements until it just clicks. When you perform actions, you’re sometimes rewarded immediately. For example, finding a restaurant nearby might yield instant gratification. Other times, the reward doesn’t appear right away, such as travelling a long distance to find an exceptional place to eat. Reinforcement learning is all about making the right actions given any state.
- Concept 1: Reinforcement learning
- Listing 1-10:
rl.py