This module implements common building blocks for larger neural network models in a Keras-like style. It does not implement a general autograd system, in order to emphasize conceptual understanding over flexibility.
Activations. Common activation nonlinearities. Includes:
- Rectified linear units (ReLU) (Hahnloser et al., 2000)
- Leaky rectified linear units (Maas, Hannun, & Ng, 2013)
- Hyperbolic tangent (tanh)
- Logistic sigmoid
- Affine
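For reference, the forward passes of these nonlinearities each reduce to a line or two of NumPy. The following is a minimal, standalone sketch; the function names and signatures are illustrative, not the module's actual API:

```python
import numpy as np

def relu(z):
    # max(0, z), applied elementwise
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.3):
    # like ReLU, but with a small slope `alpha` for negative inputs
    return np.where(z > 0, z, alpha * z)

def tanh(z):
    return np.tanh(z)

def sigmoid(z):
    # logistic sigmoid: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def affine(z, slope=1.0, intercept=0.0):
    # reduces to the identity when slope=1 and intercept=0
    return slope * z + intercept
```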
Losses. Common loss functions. Includes:
- Squared error
- Categorical cross entropy
- VAE Bernoulli loss (Kingma & Welling, 2014)
- Wasserstein loss with gradient penalty (Gulrajani et al., 2017)
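As a quick illustration, the two simplest losses above can be written directly in NumPy. This sketch assumes one-hot targets and is not the module's exact interface:

```python
import numpy as np

def squared_error(y, y_pred):
    # 0.5 * ||y - y_pred||^2, summed over all output dimensions
    return 0.5 * np.sum((y - y_pred) ** 2)

def cross_entropy(y, y_pred, eps=1e-12):
    # categorical cross entropy for one-hot targets `y` and predicted
    # class probabilities `y_pred`; `eps` guards against log(0)
    return -np.sum(y * np.log(y_pred + eps))
```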
Wrappers. Layer wrappers. Includes:
- Dropout (Srivastava, et al., 2014)
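One common formulation is "inverted" dropout, sketched below, which rescales surviving activations at training time so the forward pass needs no change at test time. This is a minimal sketch of the idea, not necessarily how the wrapper here is implemented:

```python
import numpy as np

def dropout_forward(X, p_keep=0.5, train=True):
    # zero each activation with probability (1 - p_keep) during training
    # and rescale the survivors by 1 / p_keep ("inverted" dropout)
    if not train:
        return X
    mask = (np.random.rand(*X.shape) < p_keep) / p_keep
    return X * mask
```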
Layers. Common layers / layer-wise operations that can be composed to create larger neural networks. Includes:
- Fully-connected
- Sparse evolutionary (Mocanu et al., 2018)
- Dot-product attention (Luong, Pham, & Manning, 2015; Vaswani et al., 2017)
- 1D and 2D convolution (with stride, padding, and dilation) (van den Oord et al., 2016; Yu & Koltun, 2016)
- 2D "deconvolution" (with stride and padding) (Zeiler et al., 2010)
- Restricted Boltzmann machines (with CD-n training) (Smolensky, 1986; Carreira-Perpiñán & Hinton, 2005)
- Elementwise multiplication
- Summation
- Flattening
- Softmax
- Max & average pooling
- 1D and 2D batch normalization (Ioffe & Szegedy, 2015)
- 1D and 2D layer normalization (Ba, Kiros, & Hinton, 2016)
- Recurrent (Elman, 1990)
- Long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997)
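To give a flavor of how such layers are typically structured without autograd, here is a minimal fully-connected layer with explicit forward and backward passes; the class and attribute names are illustrative only:

```python
import numpy as np

class FullyConnected:
    """Minimal dense layer: out = X @ W + b."""

    def __init__(self, n_in, n_out):
        # small random weights; see the Initializers section for better schemes
        self.W = 0.01 * np.random.randn(n_in, n_out)
        self.b = np.zeros(n_out)

    def forward(self, X):
        self.X = X  # cache the input for the backward pass
        return X @ self.W + self.b

    def backward(self, dLdY):
        # gradients of the loss w.r.t. the parameters and the layer input
        self.dW = self.X.T @ dLdY
        self.db = dLdY.sum(axis=0)
        return dLdY @ self.W.T
```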
Optimizers. Common modifications to stochastic gradient descent. Includes:
- SGD with momentum (Rumelhart, Hinton, & Williams, 1986)
- AdaGrad (Duchi, Hazan, & Singer, 2011)
- RMSProp (Tieleman & Hinton, 2012)
- Adam (Kingma & Ba, 2015)
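The update rules themselves are short; for example, SGD with momentum and Adam can be sketched as pure functions over (parameter, gradient, state) triples. The names and defaults below are illustrative assumptions, not the module's API:

```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    # classical momentum: v <- momentum * v - lr * grad; param <- param + v
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected running estimates of the first and second moments
    # of the gradient; `t` is the 1-based timestep
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```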
Learning Rate Schedulers. Common learning rate decay schedules.
- Constant
- Exponential decay
- Noam/Transformer scheduler (Vaswani et al., 2017)
- King/Dlib scheduler (King, 2018)
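For example, the exponential and Noam schedules can be written as simple functions of the step count. The parameter names and defaults below are illustrative, not the module's:

```python
import numpy as np

def exponential_decay(step, lr0=0.01, decay=0.1, stage_length=500):
    # lr = lr0 * exp(-decay * floor(step / stage_length))
    return lr0 * np.exp(-decay * (step // stage_length))

def noam_schedule(step, d_model=512, warmup_steps=4000, scale=1.0):
    # the Transformer schedule from Vaswani et al. (2017): linear warmup
    # followed by inverse-square-root decay
    step = max(step, 1)
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```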
Initializers. Common weight initialization strategies.
- Glorot/Xavier uniform and normal (Glorot & Bengio, 2010)
- He/Kaiming uniform and normal (He et al., 2015)
- Standard normal
- Truncated normal
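The Glorot and He schemes differ only in how they scale the sampling distribution by the layer's fan-in/fan-out. A minimal sketch for 2D weight matrices (illustrative signatures, not the module's API):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, gain=1.0):
    # U(-b, b) with b = gain * sqrt(6 / (fan_in + fan_out))
    b = gain * np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-b, b, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    # N(0, sqrt(2 / fan_in)), suited to ReLU-family activations
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
```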
Modules. Common multi-layer blocks that appear across many deep networks. Includes:
- Bidirectional LSTMs (Schuster & Paliwal, 1997)
- ResNet-style "identity" (i.e.,
same
-convolution) residual blocks (He et al., 2015) - ResNet-style "convolutional" (i.e., parametric) residual blocks (He et al., 2015)
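Structurally, the identity residual block is just "transform, add the input back, apply the nonlinearity." The sketch below captures that skeleton with arbitrary shape-preserving callables standing in for the batchnorm/same-convolution pairs; it illustrates the pattern rather than the block's actual implementation:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def identity_residual_block(X, transform1, transform2):
    # apply two shape-preserving transforms, then add the skip connection
    Y = relu(transform1(X))
    Y = transform2(Y)
    return relu(X + Y)

# toy usage: dense maps stand in for the conv layers of a real block
X = np.random.randn(2, 8)
W1, W2 = np.random.randn(8, 8), np.random.randn(8, 8)
out = identity_residual_block(X, lambda z: z @ W1, lambda z: z @ W2)
```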
- WaveNet-style residual block with dilated causal convolutions (van den Oord et al., 2016)
- Transformer-style multi-headed dot-product attention (Vaswani et al., 2017)
Models. Well-known network architectures. Includes:
- vae.py: Bernoulli variational autoencoder (Kingma & Welling, 2014)
- wgan_gp.py: Wasserstein generative adversarial network with gradient penalty (Gulrajani et al., 2017; Goodfellow et al., 2014)
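A key ingredient of the VAE is the reparameterization trick, which keeps the latent sampling step differentiable. A minimal sketch, assuming a diagonal Gaussian posterior (the function name is illustrative):

```python
import numpy as np

def reparameterize(mu, log_var):
    # sample z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I),
    # so gradients can flow through mu and log_var
    eps = np.random.randn(*mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```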
Utils. Common helper functions, primarily for dealing with CNNs. Includes:
- im2col
- col2im
- conv1D
- conv2D
- dilate
- deconv2D
- minibatch
- Various weight initialization utilities
- Various padding and convolution arithmetic utilities
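As a small example of what such helpers do, a zero-insertion dilation routine and a minibatch index generator might look like the following; the shapes and signatures here are assumptions for illustration, not the module's exact interface:

```python
import numpy as np

def dilate(X, d):
    # insert `d` zeros between adjacent rows/columns of each feature map;
    # X is assumed to have shape (n_samples, height, width, channels)
    n, h, w, c = X.shape
    out = np.zeros((n, h + (h - 1) * d, w + (w - 1) * d, c), dtype=X.dtype)
    out[:, :: d + 1, :: d + 1, :] = X
    return out

def minibatch(X, batchsize=32, shuffle=True):
    # yield index arrays that partition the rows of X into minibatches
    N = X.shape[0]
    idx = np.arange(N)
    if shuffle:
        np.random.shuffle(idx)
    for i in range(0, N, batchsize):
        yield idx[i : i + batchsize]
```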