Change capitalization for Head of Data role
I changed all occurrences of "head of data" to "company's Head of Data" to avoid confusion. Given the data science context of the notebook, I first read the lowercase "head of data" as the first few rows of a dataset, but it actually refers to a person at the company. This commit also fixes a typo: "going with" is now "going on with".
allardbrain authored Sep 14, 2017
1 parent dd0f1eb commit 50b2859
Showing 1 changed file with 7 additions and 7 deletions.
@@ -126,7 +126,7 @@
"\n",
"For the purposes of this exercise, let's pretend we're working for a startup that just got funded to create a smartphone app that automatically identifies species of flowers from pictures taken on the smartphone. We're working with a moderately-sized team of data scientists and will be building part of the data analysis pipeline for this app.\n",
"\n",
-"We've been tasked by our head of data science to create a demo machine learning model that takes four measurements from the flowers (sepal length, sepal width, petal length, and petal width) and identifies the species based on those measurements alone.\n",
+"We've been tasked by our company's Head of Data Science to create a demo machine learning model that takes four measurements from the flowers (sepal length, sepal width, petal length, and petal width) and identifies the species based on those measurements alone.\n",
"\n",
"<img src=\"images/petal_sepal.jpg\" />\n",
"\n",
@@ -163,15 +163,15 @@
"\n",
">Did you define the metric for success before beginning?\n",
"\n",
-"Let's do that now. Since we're performing classification, we can use [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision) — the fraction of correctly classified flowers — to quantify how well our model is performing. Our head of data has told us that we should achieve at least 90% accuracy.\n",
+"Let's do that now. Since we're performing classification, we can use [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision) — the fraction of correctly classified flowers — to quantify how well our model is performing. Our company's Head of Data has told us that we should achieve at least 90% accuracy.\n",
"\n",
">Did you understand the context for the question and the scientific or business application?\n",
"\n",
"We're building part of a data analysis pipeline for a smartphone app that will be able to classify the species of flowers from pictures taken on the smartphone. In the future, this pipeline will be connected to another pipeline that automatically measures from pictures the traits we're using to perform this classification.\n",
"\n",
">Did you record the experimental design?\n",
"\n",
-"Our head of data has told us that the field researchers are hand-measuring 50 randomly-sampled flowers of each species using a standardized methodology. The field researchers take pictures of each flower they sample from pre-defined angles so the measurements and species can be confirmed by the other field researchers at a later point. At the end of each day, the data is compiled and stored on a private company GitHub repository.\n",
+"Our company's Head of Data has told us that the field researchers are hand-measuring 50 randomly-sampled flowers of each species using a standardized methodology. The field researchers take pictures of each flower they sample from pre-defined angles so the measurements and species can be confirmed by the other field researchers at a later point. At the end of each day, the data is compiled and stored on a private company GitHub repository.\n",
"\n",
">Did you consider whether the question could be answered with the available data?\n",
"\n",
@@ -1302,7 +1302,7 @@
"source": [
"Our data is normally distributed for the most part, which is great news if we plan on using any modeling methods that assume the data is normally distributed.\n",
"\n",
-"There's something strange going with the petal measurements. Maybe it's something to do with the different `Iris` types. Let's color code the data by the class again to see if that clears things up."
+"There's something strange going on with the petal measurements. Maybe it's something to do with the different `Iris` types. Let's color code the data by the class again to see if that clears things up."
]
},
{
@@ -1487,7 +1487,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"With our data split, we can start fitting models to our data. Our head of data is all about decision tree classifiers, so let's start with one of those.\n",
+"With our data split, we can start fitting models to our data. Our company's Head of Data is all about decision tree classifiers, so let's start with one of those.\n",
"\n",
"Decision tree classifiers are incredibly simple in theory. In their simplest form, decision tree classifiers ask a series of Yes/No questions about the data — each time getting closer to finding out the class of each entry — until they either classify the data set perfectly or simply can't differentiate a set of entries. Think of it like a game of [Twenty Questions](https://en.wikipedia.org/wiki/Twenty_Questions), except the computer is *much*, *much* better at it.\n",
"\n",
@@ -1953,7 +1953,7 @@
"source": [
"(This classifier may look familiar from earlier in the notebook.)\n",
"\n",
-"Alright! We finally have our demo classifier. Let's create some visuals of its performance so we have something to show our head of data."
+"Alright! We finally have our demo classifier. Let's create some visuals of its performance so we have something to show our company's Head of Data."
]
},
{
@@ -2301,7 +2301,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"There we have it: We have a complete and reproducible Machine Learning pipeline to demo to our head of data. We've met the success criteria that we set from the beginning (>90% accuracy), and our pipeline is flexible enough to handle new inputs or flowers when that data set is ready. Not bad for our first week on the job!"
+"There we have it: We have a complete and reproducible Machine Learning pipeline to demo to our company's Head of Data. We've met the success criteria that we set from the beginning (>90% accuracy), and our pipeline is flexible enough to handle new inputs or flowers when that data set is ready. Not bad for our first week on the job!"
]
},
{