ch08 rewrites (rasbt#18)
rasbt authored Jul 27, 2017
1 parent 88bf92b commit ff05533
Showing 1 changed file with 27 additions and 2 deletions.
29 changes: 27 additions & 2 deletions code/ch08/ch08.ipynb
@@ -87,7 +87,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Obtaining the IMDb movie review dataset](#Obtaining-the-IMDb-movie-review-dataset)\n",
"- [Preparing the IMDb movie review data for text processing](#Preparing-the-IMDb-movie-review-data-for-text-processing)\n",
" - [Obtaining the IMDb movie review dataset](#Obtaining-the-IMDb-movie-review-dataset)\n",
" - [Preprocessing the movie dataset into more convenient format](#Preprocessing-the-movie-dataset-into-more-convenient-format)\n",
"- [Introducing the bag-of-words model](#Introducing-the-bag-of-words-model)\n",
" - [Transforming words into feature vectors](#Transforming-words-into-feature-vectors)\n",
" - [Assessing word relevancy via term frequency-inverse document frequency](#Assessing-word-relevancy-via-term-frequency-inverse-document-frequency)\n",
@@ -113,7 +115,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Obtaining the IMDb movie review dataset"
"# Preparing the IMDb movie review data for text processing "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Obtaining the IMDb movie review dataset"
]
},
{
@@ -130,6 +139,13 @@
"B) If you are working with Windows, download an archiver such as [7Zip](http://www.7-zip.org) to extract the files from the download archive."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing the movie dataset into more convenient format"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -902,6 +918,15 @@
" n_jobs=-1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Important Note**\n",
"\n",
"It is highly recommended to use `n_jobs=-1` (instead of `n_jobs=1`) in the previous code example to utilize all available cores on your machine and speed up the grid search. However, some Windows users reported issues when running the previous code with `n_jobs=-1`; these are related to pickling the `tokenizer` and `tokenizer_porter` functions for multiprocessing on Windows. One workaround is to fall back to `n_jobs=1`. Another is to replace those two functions, `[tokenizer, tokenizer_porter]`, with `[str.split]`; note, however, that the simple `str.split` does not support stemming.\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
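For readers of the **Important Note** cell added in this commit, the two fallbacks it describes can be sketched as a small helper. This is only an illustration, not code from the notebook: `grid_options` and its `workaround` labels are hypothetical names, the `tokenizer` below is a stand-in for the notebook's function, and reading `n_jobs=1` as the first fallback is an interpretation of the note's wording.

```python
def tokenizer(text):
    """Stand-in for the notebook's tokenizer: split on whitespace."""
    return text.split()


def grid_options(workaround=None, tokenizers=(tokenizer,)):
    """Return (tokenizer_list, n_jobs) for the grid search (hypothetical helper).

    workaround=None        -> recommended setup: custom tokenizers, all cores
    workaround="single"    -> keep the custom tokenizers, run in one process
    workaround="str_split" -> keep all cores, swap in str.split (no stemming)
    """
    if workaround is None:
        return list(tokenizers), -1
    if workaround == "single":
        return list(tokenizers), 1
    if workaround == "str_split":
        # str.split is a built-in, so it does not depend on functions
        # defined in the notebook session being picklable.
        return [str.split], -1
    raise ValueError(f"unknown workaround: {workaround!r}")


# Example: choose the str.split fallback described in the note.
tokenizer_list, n_jobs = grid_options("str_split")
print(tokenizer_list[0]("this movie was great"), n_jobs)
```

The returned list would replace `[tokenizer, tokenizer_porter]` in the parameter grid, and `n_jobs` would be passed to the grid search call shown in the hunk above; the exact parameter-grid key is not visible in this diff, so it is left out here.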
