This is a companion notebook for the book Deep Learning with Python, Second Edition.
For
readability, it only contains runnable code blocks and section titles, and omits everything
else in the book: text paragraphs, figures, and pseudocode.
If you want to be able to follow what's going on, I recommend reading the notebook
side by side with your copy of the book.
This notebook was generated for TensorFlow 2.6.
Fundamentals of machine learning
Generalization: The goal of machine learning
Underfitting and overfitting
Noisy training data
Ambiguous features
Rare features and spurious correlations
Adding white-noise channels or all-zeros channels to MNIST
from tensorflow.keras.datasets import mnist
import numpy as np
(train_images, train_labels), _ = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
train_images_with_noise_channels = np.concatenate(
    [train_images, np.random.random((len(train_images), 784))], axis=1)
train_images_with_zeros_channels = np.concatenate(
    [train_images, np.zeros((len(train_images), 784))], axis=1)
Training the same model on MNIST data with noise channels or all-zero channels
from tensorflow import keras
from tensorflow.keras import layers
def get_model():
    model = keras.Sequential([
        layers.Dense(512, activation="relu"),
        layers.Dense(10, activation="softmax")
    ])
    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
model = get_model()
history_noise = model.fit(
    train_images_with_noise_channels, train_labels,
    epochs=10,
    batch_size=128,
    validation_split=0.2)
model = get_model()
history_zeros = model.fit(
    train_images_with_zeros_channels, train_labels,
    epochs=10,
    batch_size=128,
    validation_split=0.2)
Plotting a validation accuracy comparison
import matplotlib.pyplot as plt
val_acc_noise = history_noise.history["val_accuracy"]
val_acc_zeros = history_zeros.history["val_accuracy"]
epochs = range(1, 11)
plt.plot(epochs, val_acc_noise, "b-",
         label="Validation accuracy with noise channels")
plt.plot(epochs, val_acc_zeros, "b--",
         label="Validation accuracy with zeros channels")
plt.title("Effect of noise channels on validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
The nature of generalization in deep learning
Fitting an MNIST model with randomly shuffled labels
(train_images, train_labels), _ = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
# Copy the labels before shuffling: slicing a NumPy array returns a view,
# so shuffling a slice would also shuffle train_labels itself.
random_train_labels = train_labels.copy()
np.random.shuffle(random_train_labels)
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, random_train_labels,
          epochs=100,
          batch_size=128,
          validation_split=0.2)
The manifold hypothesis
Interpolation as a source of generalization
Why deep learning works
Training data is paramount
Evaluating machine-learning models
Training, validation, and test sets
Simple hold-out validation
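The book presents hold-out validation in pseudocode only; here is a minimal NumPy sketch, using hypothetical random arrays in place of a real dataset (the shapes and the 20% split ratio are illustrative assumptions):

```python
import numpy as np

num_samples = 1000
# Hypothetical stand-ins for real inputs and targets.
data = np.random.random((num_samples, 8))
labels = np.random.randint(0, 2, size=num_samples)

# Shuffle, then hold out the last 20% of samples for validation.
indices = np.random.permutation(num_samples)
data, labels = data[indices], labels[indices]

num_validation_samples = int(0.2 * num_samples)
train_data = data[:-num_validation_samples]
train_labels = labels[:-num_validation_samples]
val_data = data[-num_validation_samples:]
val_labels = labels[-num_validation_samples:]
```

You would then train on `train_data` and tune hyperparameters against scores measured on `val_data`.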
K-fold validation
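K-fold validation is likewise described in pseudocode in the book; a runnable sketch of the fold-splitting logic follows, with a dummy statistic standing in for the per-fold model score (the data shape and `k=4` are illustrative assumptions):

```python
import numpy as np

k = 4
num_samples = 1000
data = np.random.random((num_samples, 8))

num_validation_samples = num_samples // k
validation_scores = []
for fold in range(k):
    start = num_validation_samples * fold
    stop = num_validation_samples * (fold + 1)
    val_data = data[start:stop]
    train_data = np.concatenate([data[:start], data[stop:]], axis=0)
    # In a real run: create a fresh, untrained model here, train it on
    # train_data, and evaluate it on val_data. A dummy score stands in:
    score = float(val_data.mean())
    validation_scores.append(score)

# The final score is the average of the k per-fold validation scores.
validation_score = float(np.mean(validation_scores))
```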
Iterated K-fold validation with shuffling
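The iterated variant wraps K-fold validation in an outer loop of P iterations, reshuffling the data before each round, so P * K models are trained in total. A sketch under the same illustrative assumptions as above:

```python
import numpy as np

p = 3   # number of iterations
k = 4   # number of folds per iteration
num_samples = 1000
data = np.random.random((num_samples, 8))

num_validation_samples = num_samples // k
all_scores = []
for iteration in range(p):
    # Reshuffle the data before each round of K-fold validation.
    data = data[np.random.permutation(num_samples)]
    for fold in range(k):
        start = num_validation_samples * fold
        stop = num_validation_samples * (fold + 1)
        val_data = data[start:stop]
        train_data = np.concatenate([data[:start], data[stop:]], axis=0)
        # In a real run: train a fresh model and record its validation score.
        all_scores.append(float(val_data.mean()))

# Average over all p * k per-fold scores.
validation_score = float(np.mean(all_scores))
```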
Beating a common-sense baseline
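One common-sense baseline for classification is always predicting the most frequent class; a sketch with hypothetical binary labels (any model worth keeping should beat this accuracy):

```python
import numpy as np

# Hypothetical binary labels: 7 positives, 3 negatives.
labels = np.array([0, 1, 1, 0, 1, 1, 0, 1, 1, 1])

# Accuracy of always predicting the majority class.
majority_class = np.bincount(labels).argmax()
baseline_accuracy = float(np.mean(labels == majority_class))
```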
Things to keep in mind about model evaluation
Improving model fit
Tuning key gradient descent parameters
Training an MNIST model with an incorrectly high learning rate
(train_images, train_labels), _ = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
model.compile(optimizer=keras.optimizers.RMSprop(1.),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels,
          epochs=10,
          batch_size=128,
          validation_split=0.2)
The same model with a more appropriate learning rate
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
model.compile(optimizer=keras.optimizers.RMSprop(1e-2),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels,
          epochs=10,
          batch_size=128,
          validation_split=0.2)
Leveraging better architecture priors
Increasing model capacity
A simple logistic regression on MNIST
model = keras.Sequential([layers.Dense(10, activation="softmax")])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history_small_model = model.fit(
    train_images, train_labels,
    epochs=20,
    batch_size=128,
    validation_split=0.2)
import matplotlib.pyplot as plt
val_loss = history_small_model.history["val_loss"]
epochs = range(1, 21)
plt.plot(epochs, val_loss, "b--",
         label="Validation loss")
plt.title("Effect of insufficient model capacity on validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
model = keras.Sequential([
    layers.Dense(96, activation="relu"),
    layers.Dense(96, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history_large_model = model.fit(
    train_images, train_labels,
    epochs=20,
    batch_size=128,
    validation_split=0.2)
Improving generalization
Dataset curation
Feature engineering
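The book's running illustration of feature engineering is reading the time from a clock face: raw hand-tip pixel coordinates are a poor representation, while the angle of each hand is far easier to learn from. A minimal sketch of that conversion, using hypothetical coordinates with the clock center at the origin:

```python
import numpy as np

# Hypothetical (x, y) coordinates of a clock hand's tip, clock center
# at the origin: here the hand points straight up (12 o'clock).
x, y = 0.0, 1.0

# The angle is a much better-engineered feature than the raw coordinates.
angle = np.arctan2(y, x)  # radians, counterclockwise from 3 o'clock
```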
Using early stopping
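In Keras, early stopping is implemented with the `EarlyStopping` callback; a sketch follows, with the `fit()` call commented out because it assumes a compiled model and data that are not defined here:

```python
from tensorflow import keras

# Stop training once val_loss has stopped improving for 2 epochs,
# and roll back to the weights of the best epoch seen.
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=2,
        restore_best_weights=True,
    )
]
# model.fit(train_data, train_labels,
#           epochs=100, batch_size=512,
#           validation_split=0.4, callbacks=callbacks)
```

With `restore_best_weights=True` you can set a generous epoch budget and let the callback decide when to stop.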
Regularizing your model
Reducing the network's size
Original model
from tensorflow.keras.datasets import imdb
(train_data, train_labels), _ = imdb.load_data(num_words=10000)
def vectorize_sequences(sequences, dimension=10000):
    # Multi-hot encode: set index j to 1. for every word index j in a review.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
train_data = vectorize_sequences(train_data)
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_original = model.fit(train_data, train_labels,
                             epochs=20, batch_size=512, validation_split=0.4)
Version of the model with lower capacity
model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_smaller_model = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)
Version of the model with higher capacity
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_larger_model = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)
Adding weight regularization
Adding L2 weight regularization to the model
from tensorflow.keras import regularizers
model = keras.Sequential([
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_l2_reg = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)
Different weight regularizers available in Keras
from tensorflow.keras import regularizers
regularizers.l1(0.001)
regularizers.l1_l2(l1=0.001, l2=0.001)
Adding dropout
Adding dropout to the IMDB model
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_dropout = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)
Summary