TensorFlow Mechanics 101
Code: tensorflow/g3doc/tutorials/mnist/
The goal of this tutorial is to show how to use TensorFlow to train and evaluate a simple feed-forward neural network for handwritten digit classification using the (classic) MNIST data set.
Tutorial Files
This tutorial references the following files:
File                        Purpose
mnist.py                    The code to build a fully-connected MNIST model.
fully_connected_feed.py     The main code, to train the built MNIST model using a feed dictionary.

Simply run the fully_connected_feed.py file directly to start training:

python fully_connected_feed.py
For more information, refer to Yann LeCun's MNIST page or Chris Olah's visualizations of
MNIST.
Download
At the top of the run_training() method, the input_data.read_data_sets()
function will ensure that the correct data has been downloaded to your local training folder
and then unpack that data to return a dictionary of DataSet instances.
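As a rough sketch, that call might look like this (FLAGS.train_dir and FLAGS.fake_data are the flags the tutorial script defines for the training folder and the unit-test switch):

data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)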
NOTE: The fake_data flag is used for unit-testing purposes and may be safely ignored by
the reader.
Dataset                  Purpose
data_sets.train          55000 images and labels, for primary training.
data_sets.validation     5000 images and labels, for iterative validation of training accuracy.
data_sets.test           10000 images and labels, for final testing of trained accuracy.
For more information about the data, please read the Download tutorial.
Further down, in the training loop, the full image and label datasets are sliced to fit the
batch_size for each step, matched with these placeholder ops, and then passed into the
sess.run() function using the feed_dict parameter.
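For reference, a minimal sketch of such placeholder ops (assuming IMAGE_PIXELS is the flattened image size and batch_size is the number of examples per step):

images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))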
Inference
The inference() function builds the graph as far as needed to return the tensor that
would contain the output predictions.
It takes the images placeholder as input and builds on top of it a pair of fully connected
layers with ReLU activation followed by a ten node linear layer specifying the output logits.
Each layer is created beneath a unique tf.name_scope that acts as a prefix to the items
created within that scope.
Within the defined scope, the weights and biases to be used by each of these layers are
generated into tf.Variable instances, with their desired shapes:
weights = tf.Variable(
    tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                        stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
    name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]),
                     name='biases')
When, for instance, these are created under the hidden1 scope, the unique name given to
the weights variable would be "hidden1/weights".
Each variable is given an initializer op as part of its construction.
In this most common case, the weights are initialized with the tf.truncated_normal
and given their shape of a 2d tensor with the first dim representing the number of units in
the layer from which the weights connect and the second dim representing the number of
units in the layer to which the weights connect. For the first layer, named hidden1, the
dimensions are [IMAGE_PIXELS, hidden1_units] because the weights are
connecting the image inputs to the hidden1 layer. The tf.truncated_normal initializer generates values from a truncated normal distribution with a given mean and standard deviation.
Then the biases are initialized with tf.zeros to ensure they start with all zero values, and
their shape is simply the number of units in the layer to which they connect.
The graph's three primary ops -- two tf.nn.relu ops wrapping tf.matmul for the hidden
layers and one extra tf.matmul for the logits -- are then created, each in turn, with their
tf.Variable instances connected to the input placeholder or the output tensor of the
layer beneath each.
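As a sketch, those three ops look roughly like this, where each weights and biases pair is the tf.Variable created under the corresponding layer's scope:

hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
logits = tf.matmul(hidden2, weights) + biases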
Finally, the logits tensor that will contain the output is returned.
Loss
The loss() function further builds the graph by adding the required loss ops.
First, the values from the labels_placeholder are encoded as a tensor of 1-hot values. For example, if the class identifier is '3', the value is converted to:
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(
    concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
A tf.nn.softmax_cross_entropy_with_logits op is then added to compare the output logits from the inference() function to the 1-hot labels.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                        onehot_labels,
                                                        name='xentropy')
It then uses tf.reduce_mean to average the cross entropy values across the batch
dimension (the first dimension) as the total loss.
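A sketch of that reduction:

loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')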
The tensor that will contain the loss value is then returned.
Training
The training() function adds the operations needed to minimize the loss via gradient
descent.
Firstly, it takes the loss tensor from the loss() function and hands it to a
tf.scalar_summary, an op for generating summary values into the events file when used
with a SummaryWriter (see below). In this case, it will emit the snapshot value of the loss
every time the summaries are written out.
tf.scalar_summary(loss.op.name, loss)

Next, we instantiate a tf.train.GradientDescentOptimizer responsible for applying gradients with the requested learning rate.

optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
We then generate a single variable to contain a counter for the global training step and the
minimize() op is used to both update the trainable weights in the system and increment
the global step. This is, by convention, known as the train_op and is what must be run by
a TensorFlow session in order to induce one full step of training (see below).
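A sketch of those two pieces, the global step counter and the train_op derived from minimize():

global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)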
The Graph
At the top of the run_training() function is a Python with statement that indicates all of the built ops are to be associated with the default global tf.Graph instance.
with tf.Graph().as_default():
The Session
Once all of the build preparation has been completed and all of the necessary ops
generated, a tf.Session is created for running the graph.
sess = tf.Session()
The empty parameter to session indicates that this code will attach to (or create if not yet
created) the default local session.
Immediately after creating the session, all of the tf.Variable instances are initialized by
calling sess.run() on their initialization op.
init = tf.initialize_all_variables()
sess.run(init)
The sess.run() method will run the complete subset of the graph that corresponds to the
op(s) passed as parameters. In this first call, the init op is a tf.group that contains only
the initializers for the variables. None of the rest of the graph is run here, that happens in the
training loop below.
Train Loop
After initializing the variables with the session, training may begin.
The user code controls the training per step, and the simplest loop that can do useful
training is:
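# A sketch of that loop; FLAGS.max_steps is assumed to hold the desired step count.
for step in xrange(FLAGS.max_steps):
    sess.run(train_op)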
However, this tutorial is slightly more complicated in that it must also slice up the input data
for each step to match the previously generated placeholders.
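For instance, a batch-sized slice can be pulled with the DataSet's next_batch() method (a sketch of what the tutorial's feed-filling helper does):

images_feed, labels_feed = data_sets.train.next_batch(FLAGS.batch_size)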
A Python dictionary object is then generated with the placeholders as keys and the
representative feed tensors as values.
feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}
This is passed into the sess.run() function's feed_dict parameter to provide the input
examples for this step of training.
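The run call for one training step might therefore look like this sketch, fetching both the train_op and the loss tensor:

_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)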
Because two values are fetched, the return from sess.run() is a tuple with two items. Each fetched Tensor corresponds to a numpy array in that tuple, filled with the value of that tensor during this step of training.
The value of the train_op is actually None and, thus, discarded. But the value of the loss
tensor may become NaN if the model diverges during training.
Assuming that the training runs fine without NaNs, the training loop also prints a simple
status text every 100 steps to let the user know the state of training.
if step % 100 == 0:
    print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)
Visualize the Status

In order to emit the events files used by TensorBoard, all of the summaries (in this case, only one) are collected into a single op during the graph building phase.

summary_op = tf.merge_all_summaries()

Then, after the session is created, a tf.train.SummaryWriter may be instantiated to output into the given directory the events files, containing the Graph itself and the values of the summaries.
summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
                                        graph_def=sess.graph_def)
Lastly, the events file will be updated with new summary values every time the summary_op
is run and the output passed to the writer's add_summary() function.
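A sketch of that periodic update, assuming it runs inside the training loop with the current feed_dict and step:

summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)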
When the events files are written, TensorBoard may be run against the training folder to
display the values from the summaries.
NOTE: For more info about how to build and run TensorBoard, please see the accompanying tutorial TensorBoard: Visualizing Your Training.
Save a Checkpoint
In order to emit a checkpoint file that may be used to later restore a model for further
training or evaluation, we instantiate a tf.train.Saver.
saver = tf.train.Saver()
In the training loop, the saver.save() method will periodically be called to write a
checkpoint file to the training directory with the current values of all the trainable variables.
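A sketch of that periodic call, assuming step is the current training step:

saver.save(sess, FLAGS.train_dir, global_step=step)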
At some later point in the future, training might be resumed by using the
saver.restore() method to reload the model parameters.
saver.restore(sess, FLAGS.train_dir)
Build the Eval Graph

Before entering the training loop, the Eval op should have been built by calling the evaluation() function from mnist.py with the same logits/labels parameters as the loss() function.
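A sketch of that call, assuming logits comes from inference() and labels_placeholder is the labels placeholder created earlier:

eval_correct = mnist.evaluation(logits, labels_placeholder)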
Eval Output
One can then create a loop for filling a feed_dict and calling sess.run() against the
eval_correct op to evaluate the model on the given dataset.
The true_count variable simply accumulates all of the predictions that the in_top_k op
has determined to be correct. From there, the precision may be calculated by simply dividing by the total number of examples.
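A sketch of that loop, assuming a fill_feed_dict() helper that builds the feed dictionary for the given dataset and the eval_correct op built above:

true_count = 0  # Counts the number of correct predictions.
steps_per_epoch = data_set.num_examples // FLAGS.batch_size
num_examples = steps_per_epoch * FLAGS.batch_size
for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set, images_placeholder, labels_placeholder)
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
precision = float(true_count) / num_examples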
print '  Num examples: %d  Num correct: %d  Precision @ 1: %0.02f' % (
    num_examples, true_count, precision)