
Chicago Taxi Example

The Chicago Taxi example demonstrates the end-to-end workflow and the steps required to analyze, validate, and transform data, train a model, analyze its performance, and serve it. This example uses the following TFX components:

  • ExampleGen ingests and splits the input dataset.
  • StatisticsGen calculates statistics for the dataset.
  • SchemaGen examines the statistics and creates a data schema.
  • ExampleValidator looks for anomalies and missing values in the dataset.
  • Transform performs feature engineering on the dataset.
  • Trainer trains the model using native Keras.
  • Evaluator performs deep analysis of the training results.
  • InfraValidator checks that the model is actually servable on the serving infrastructure, and prevents bad models from being pushed.
  • Pusher deploys the model to a serving infrastructure.
  • BulkInferrer performs batch inference on the model with unlabelled examples.

The dataset

This example uses the Taxi Trips dataset released by the City of Chicago.

Note: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.

You can read more about the dataset in Google BigQuery. Explore the full dataset in the BigQuery UI.

Local prerequisites

Install dependencies

Development for this example will be isolated in a Python virtual environment. This allows us to experiment with different versions of dependencies.

There are many ways to install virtualenv; see the TensorFlow install guides for different platforms. Here are a couple:

  • For Linux:
sudo apt-get install python-pip python-virtualenv python-dev build-essential
  • For Mac:
sudo easy_install pip
pip install --upgrade virtualenv

Create a Python 3.6 virtual environment for this example and activate the virtualenv:

virtualenv -p python3.6 taxi_pipeline
source ./taxi_pipeline/bin/activate

Configure common paths:

export AIRFLOW_HOME=~/airflow
export TAXI_DIR=~/taxi
export TFX_DIR=~/tfx
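The exports above only define where Airflow, the taxi data, and TFX artifacts will live; the directories themselves can also be created up front. A minimal stdlib sketch (the `prepare_dirs` helper is an illustration, not part of TFX):

```python
import os

def prepare_dirs(home):
    """Create the Airflow, taxi-data, and TFX working directories under
    `home` and return their paths (mirrors the AIRFLOW_HOME, TAXI_DIR,
    and TFX_DIR exports above)."""
    paths = {name: os.path.join(home, name) for name in ("airflow", "taxi", "tfx")}
    for path in paths.values():
        os.makedirs(path, exist_ok=True)  # no-op if the directory already exists
    return paths
```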

Next, install the dependencies required by the Chicago Taxi example:

pip install apache-airflow==1.10.9
pip install -U tfx[examples]

Next, initialize Airflow:

airflow initdb

Copy the pipeline definition to Airflow's DAG directory

The benefit of the local example is that you can edit any part of the pipeline and experiment very quickly with various components. First let's download the data for the example:

mkdir -p $TAXI_DIR/data/simple
wget -O $TAXI_DIR/data/simple/data.csv https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/data/simple/data.csv?raw=true
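After downloading, it can be useful to sanity-check the CSV before running the pipeline. A small stdlib sketch (the `peek_csv` helper is an illustration, not part of TFX):

```python
import csv

def peek_csv(path, n=3):
    """Return the header row and the first `n` data rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [row for _, row in zip(range(n), reader)]
    return header, rows
```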

Next, copy the TFX pipeline definition to Airflow's DAGs directory ($AIRFLOW_HOME/dags) so it can run the pipeline. To find the location of your TFX installation, use this command:

pip show tfx

Use the location shown when setting the TFX_EXAMPLES path below.

export TFX_EXAMPLES=~/taxi_pipeline/lib/python3.6/site-packages/tfx/examples/chicago_taxi_pipeline
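If you prefer to script this step, the `Location:` line of the `pip show tfx` output can be parsed and joined with the example subdirectory. A rough sketch (the helper name is hypothetical):

```python
import os

def examples_dir_from_pip_show(pip_show_output):
    """Extract the `Location:` line from `pip show tfx` output and return
    the path of the Chicago Taxi example inside the installed package."""
    for line in pip_show_output.splitlines():
        if line.startswith("Location:"):
            location = line.split(":", 1)[1].strip()
            return os.path.join(location, "tfx", "examples", "chicago_taxi_pipeline")
    raise ValueError("no Location line found in pip show output")
```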

Copy the Chicago Taxi example pipeline into the Airflow DAG folder.

mkdir -p $AIRFLOW_HOME/dags/
cp $TFX_EXAMPLES/taxi_pipeline_simple.py $AIRFLOW_HOME/dags/

The module file taxi_utils.py used by the Trainer and Transform components will reside in $TAXI_DIR. Copy it there.

cp $TFX_EXAMPLES/taxi_utils.py $TAXI_DIR

Run the local example

Start Airflow

Start the Airflow webserver (in 'taxi_pipeline' virtualenv):

airflow webserver

Open a new terminal window:

source ./taxi_pipeline/bin/activate

and start the Airflow scheduler:

airflow scheduler

Open a browser to 127.0.0.1:8080 and click on the chicago_taxi_simple example. It should look like the image below if you click the Graph View option.

Pipeline view

Run the example

If you were looking at the graph above, click on the DAGs button to get back to the DAGs view.

Enable the chicago_taxi_simple pipeline in Airflow by toggling the DAG to On. Now that it is schedulable, click the Trigger DAG button (the triangle inside a circle) to start a run. You can view the run's status by clicking the started job in the Last run column. This process takes several minutes.

Serve the TensorFlow model

Once the pipeline completes, the model will be copied by the Pusher to the directory configured in the example code:

ls $TAXI_DIR/serving_model/chicago_taxi_simple

Now serve the created model with TensorFlow Serving. For this example, run the server locally in a Docker container. Instructions for installing Docker locally are found in the Docker install documentation.

In the terminal, run the following script to start a server:

bash $TFX_EXAMPLES/serving/start_model_server_local.sh \
$TAXI_DIR/serving_model/chicago_taxi_simple

This script pulls a TensorFlow Serving image and listens for gRPC requests on localhost port 9000. The model server loads the latest model exported by the Pusher at the above path.

To send a request to the server for model inference, run:

bash $TFX_EXAMPLES/serving/classify_local.sh \
$TAXI_DIR/data/simple/data.csv \
$TFX_DIR/pipelines/chicago_taxi_simple/SchemaGen/output/CHANGE_TO_LATEST_DIR/schema.pbtxt
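The CHANGE_TO_LATEST_DIR placeholder stands for the most recent SchemaGen execution directory. One way to resolve it is to pick the newest subdirectory of the SchemaGen output directory, e.g. by modification time. A minimal sketch (the `latest_schema_path` helper is an illustration, not part of TFX):

```python
import os

def latest_schema_path(schemagen_output_dir):
    """Return the schema.pbtxt path inside the most recently modified
    subdirectory of the SchemaGen output directory."""
    subdirs = [os.path.join(schemagen_output_dir, d)
               for d in os.listdir(schemagen_output_dir)
               if os.path.isdir(os.path.join(schemagen_output_dir, d))]
    if not subdirs:
        raise FileNotFoundError("no SchemaGen execution directories found")
    latest = max(subdirs, key=os.path.getmtime)
    return os.path.join(latest, "schema.pbtxt")
```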

For a Google Cloud AI Platform serving example, use start_model_server_aiplatform.sh and classify_aiplatform.sh in the same way as the local example above, changing the local directory to gs://YOUR_BUCKET.

For more information, see TensorFlow Serving.

Chicago Taxi Kubeflow Orchestrator Example

To use Kubeflow as the orchestrator, check here for details.

Chicago Taxi Native Keras Example (tfx 0.21.1)

Instead of an Estimator, this example uses native Keras in the user module file taxi_utils_native_keras.py.

Learn more

Please see the TFX User Guide to learn more.