forked from rasbt/python-machine-learning-book-3rd-edition
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Added ch18/readme * Fixed the notebook links
- Loading branch information
Showing
1 changed file
with
78 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
Python Machine Learning - Code Examples | ||
|
||
|
||
## Chapter 18: Reinforcement Learning for Decision Making in Complex Environments | ||
|
||
|
||
### Chapter Outline | ||
|
||
- Introduction: learning from experience | ||
- Understanding reinforcement learning | ||
- Defining the agent-environment interface of a reinforcement learning system | ||
- The theoretical foundations of RL | ||
- Markov decision processes | ||
- The mathematical formulation of Markov decision processes | ||
- Visualization of a Markov process | ||
- Episodic versus continuing tasks | ||
- RL terminology: return, policy, and value function | ||
- The return | ||
- Policy | ||
- Value function | ||
- Dynamic programming using the Bellman equation | ||
- Reinforcement learning algorithms | ||
- Dynamic programming | ||
- Policy evaluation – predicting the value function with dynamic programmin | ||
- Improving the policy using the estimated value function | ||
- Policy iteration | ||
- Value iteration | ||
- Reinforcement learning with Monte Carlo | ||
- State-value function estimation using MC | ||
- Action-value function estimation using MC | ||
- Finding an optimal policy using MC control | ||
- Policy improvement – computing the greedy policy from the action-value function | ||
- Temporal difference learning | ||
- TD prediction | ||
- On-policy TD control (SARSA) | ||
- Off-policy TD control (Q-learning) | ||
- Implementing our first RL algorithm | ||
- Introducing the OpenAI Gym toolkit | ||
- Working with the existing environments in OpenAI Gym | ||
- A grid world example | ||
- Implementing the grid world environment in OpenAI Gym | ||
- Solving the grid world problem with Q-learning | ||
- Implementing the Q-learning algorithm | ||
- A glance at deep Q-learning | ||
- Training a DQN model according to the Q-learning algorithm | ||
- Replay memory | ||
- Determining the target values for computing the loss | ||
- Implementing a deep Q-learning algorithm | ||
- Chapter and book summary | ||
|
||
### A note on using the code examples | ||
|
||
The recommended way to interact with the code examples in this book is via Jupyter Notebook (the `.ipynb` files). Using Jupyter Notebook, you will be able to execute the code step by step and have all the resulting outputs (including plots and images) all in one convenient document. | ||
|
||
![](../ch02/images/jupyter-example-1.png) | ||
|
||
|
||
|
||
Setting up Jupyter Notebook is really easy: if you are using the Anaconda Python distribution, all you need to install jupyter notebook is to execute the following command in your terminal: | ||
|
||
conda install jupyter notebook | ||
|
||
Then you can launch jupyter notebook by executing | ||
|
||
jupyter notebook | ||
|
||
A window will open up in your browser, which you can then use to navigate to the target directory that contains the `.ipynb` file you wish to open. | ||
|
||
**More installation and setup instructions can be found in the [README.md file of Chapter 1](../ch01/README.md)**. | ||
|
||
**(Even if you decide not to install Jupyter Notebook, note that you can also view the notebook files on GitHub by simply clicking on them: [`ch18.ipynb`](ch18.ipynb))** | ||
|
||
In addition to the code examples, I added a table of contents to each Jupyter notebook as well as section headers that are consistent with the content of the book. Also, I included the original images and figures in hope that these make it easier to navigate and work with the code interactively as you are reading the book. | ||
|
||
![](../ch02/images/jupyter-example-2.png) | ||
|
||
|
||
When I was creating these notebooks, I was hoping to make your reading (and coding) experience as convenient as possible! However, if you don't wish to use Jupyter Notebooks, I also converted these notebooks to regular Python script files (`.py` files) that can be viewed and edited in any plaintext editor. |