Python Machine Learning - Code Examples


## Chapter 18: Reinforcement Learning for Decision Making in Complex Environments


### Chapter Outline

- Introduction: learning from experience
- Understanding reinforcement learning
- Defining the agent-environment interface of a reinforcement learning system
- The theoretical foundations of RL
- Markov decision processes
- The mathematical formulation of Markov decision processes
- Visualization of a Markov process
- Episodic versus continuing tasks
- RL terminology: return, policy, and value function
- The return
- Policy
- Value function
- Dynamic programming using the Bellman equation
- Reinforcement learning algorithms
- Dynamic programming
  - Policy evaluation – predicting the value function with dynamic programming
- Improving the policy using the estimated value function
- Policy iteration
- Value iteration
- Reinforcement learning with Monte Carlo
- State-value function estimation using MC
- Action-value function estimation using MC
- Finding an optimal policy using MC control
- Policy improvement – computing the greedy policy from the action-value function
- Temporal difference learning
- TD prediction
- On-policy TD control (SARSA)
- Off-policy TD control (Q-learning)
- Implementing our first RL algorithm
- Introducing the OpenAI Gym toolkit
- Working with the existing environments in OpenAI Gym
- A grid world example
- Implementing the grid world environment in OpenAI Gym
- Solving the grid world problem with Q-learning
- Implementing the Q-learning algorithm
- A glance at deep Q-learning
- Training a DQN model according to the Q-learning algorithm
- Replay memory
- Determining the target values for computing the loss
- Implementing a deep Q-learning algorithm
- Chapter and book summary
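
As a rough preview of the Q-learning material listed in this outline, the sketch below shows a minimal, generic version of the tabular Q-learning update. It is not the chapter's implementation; it only assumes an environment that follows the classic Gym-style `reset()`/`step()` interface with discrete state and action spaces, and the hyperparameter values are purely illustrative:

```python
import numpy as np


def q_learning(env, num_episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, done, _ = env.step(action)
            # Off-policy TD (Q-learning) update toward the greedy target
            td_target = reward + gamma * np.max(q_table[next_state])
            q_table[state, action] += alpha * (td_target - q_table[state, action])
            state = next_state
    return q_table
```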

### A note on using the code examples

The recommended way to interact with the code examples in this book is via Jupyter Notebook (the `.ipynb` files). Using Jupyter Notebook, you will be able to execute the code step by step and have all the resulting outputs (including plots and images) in one convenient document.

![](../ch02/images/jupyter-example-1.png)



Setting up Jupyter Notebook is easy: if you are using the Anaconda Python distribution, all you need to do to install Jupyter Notebook is to execute the following command in your terminal:

    conda install jupyter notebook

Then you can launch Jupyter Notebook by executing

    jupyter notebook

A window will open up in your browser, which you can then use to navigate to the target directory that contains the `.ipynb` file you wish to open.
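
Alternatively, if you already know which notebook you want to work with, you can pass its filename directly when launching Jupyter Notebook, for example (from within this chapter's directory):

    jupyter notebook ch18.ipynb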

**More installation and setup instructions can be found in the [README.md file of Chapter 1](../ch01/README.md)**.

**(Even if you decide not to install Jupyter Notebook, note that you can also view the notebook files on GitHub by simply clicking on them: [`ch18.ipynb`](ch18.ipynb))**

In addition to the code examples, I added a table of contents to each Jupyter notebook, as well as section headers that are consistent with the content of the book. Also, I included the original images and figures in the hope that these make it easier to navigate and work with the code interactively as you are reading the book.

![](../ch02/images/jupyter-example-2.png)


When I was creating these notebooks, I was hoping to make your reading (and coding) experience as convenient as possible! However, if you prefer not to use Jupyter Notebook, I have also converted these notebooks to regular Python script files (`.py` files) that can be viewed and edited in any plain text editor.
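
For instance, assuming the converted script in this folder follows the same naming as the notebook (`ch18.py`), you could run it from the terminal with:

    python ch18.py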
