Optimizers

class Optimizer(ABC)

All optimizers defined here inherit from Optimizer.

summary()
Returns: A short one-line string describing the optimizer.
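
For example (the module path in the import below is an assumption; it is not part of this documentation):

  from optimizers import SGD, Adam  # assumed import path

  for optimizer in (SGD(momentum=0.9), Adam()):
      print(optimizer.summary())  # one-line description of each optimizer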

Stochastic Gradient Descent (SGD)

class SGD

Gradient descent (with momentum) optimizer.

__init__(self, learning_rate: float = 0.01, momentum: float = 0.0, name: str = 'SGD')
Parameters:
  • learning_rate – The learning rate. Defaults to 0.01.
  • momentum – float hyperparameter between [0, 1) that accelerates gradient descent in the relevant direction and dampens oscillations. Defaults to 0, i.e., vanilla gradient descent.
  • name – Optional name for the operations created when applying gradients. Defaults to “SGD”.
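
As a rough illustration (a toy sketch on a quadratic objective, not necessarily the exact bookkeeping used by this class), SGD with momentum keeps a per-parameter velocity and steps along it:

  import numpy as np

  # Toy sketch of SGD with momentum on f(x) = x**2; variable names are illustrative.
  learning_rate, momentum = 0.01, 0.9
  param = np.array([5.0])
  velocity = np.zeros_like(param)
  for _ in range(200):
      grad = 2.0 * param                                    # gradient of f(x) = x**2
      velocity = momentum * velocity - learning_rate * grad # accumulate velocity
      param = param + velocity                              # momentum = 0.0 gives vanilla SGD
  print(param)  # close to the minimum at 0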

RMSprop

class RMSprop

Optimizer that implements the RMSprop algorithm.

The gist of RMSprop is to:

  • Maintain a moving (discounted) average of the square of gradients
  • Divide the gradient by the root of this average

This implementation of RMSprop uses plain momentum, not Nesterov momentum.

The centered version additionally maintains a moving average of the gradients, and uses that average to estimate the variance.
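
Putting those pieces together, one update step might look like the sketch below (illustrative only; the variable names, function name, and exact order of operations in this implementation are assumptions):

  import numpy as np

  def rmsprop_step(param, grad, mean_sq, velocity,
                   learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-07):
      """Illustrative single RMSprop update; not necessarily this class's exact code."""
      # Moving (discounted) average of the squared gradients.
      mean_sq = rho * mean_sq + (1 - rho) * grad ** 2
      # Divide the gradient by the root of that average; epsilon avoids division by zero.
      scaled = learning_rate * grad / (np.sqrt(mean_sq) + epsilon)
      # Plain (non-Nesterov) momentum on top of the scaled gradient.
      velocity = momentum * velocity + scaled
      param = param - velocity
      return param, mean_sq, velocity

The centered variant additionally tracks a moving average of the gradients themselves and subtracts its square from mean_sq before taking the root, which estimates the gradient variance.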

__init__(self, learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-07, name="RMSprop")
Parameters:
  • learning_rate – The learning rate. Defaults to 0.001.
  • rho – Discounting factor for the moving average of squared gradients. Defaults to 0.9.
  • momentum – Momentum factor applied to the scaled gradient update; 0.0 gives plain RMSprop without momentum. Defaults to 0.0.
  • epsilon – A small constant for numerical stability. Default 1e-07.
  • name – Optional name for the operations created when applying gradients. Defaults to “RMSprop”.
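
For instance (the import path is an assumption):

  from optimizers import RMSprop  # assumed import path

  optimizer = RMSprop(learning_rate=0.001, rho=0.9, momentum=0.0)
  print(optimizer.summary())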

Adam (ADAptive Moment estimation)

Warning

The Adam implementation appears to yield worse results than the tf.keras.optimizers.Adam implementation. Further debugging is required.

class Adam

Optimizer that implements the Adam algorithm.

Reference: [KINGMA2015]

Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.
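
In outline, following [KINGMA2015], a single update step could look like the sketch below (illustrative only; the function and variable names are assumptions, and the exact code in this class may differ, which is what the warning above refers to):

  import numpy as np

  def adam_step(param, grad, m, v, t,
                learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07):
      """Illustrative single Adam update at step t (t starts at 1)."""
      # Exponentially decayed estimates of the first and second gradient moments.
      m = beta_1 * m + (1 - beta_1) * grad
      v = beta_2 * v + (1 - beta_2) * grad ** 2
      # Bias correction for the zero-initialized moment estimates.
      m_hat = m / (1 - beta_1 ** t)
      v_hat = v / (1 - beta_2 ** t)
      # Parameter update.
      param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
      return param, m, v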

__init__(self, learning_rate: float = 0.001, beta_1: float = 0.9, beta_2: float = 0.999, epsilon: float = 1e-07, name="Adam")
Parameters:
  • learning_rate – The learning rate. Defaults to 0.001.
  • beta_1 – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
  • beta_2 – The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.
  • epsilon – A small constant for numerical stability. Default 1e-07.
  • name – Optional name for the operations created when applying gradients. Defaults to “Adam”.
[KINGMA2015] Diederik P. Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 [cs.LG], 2015. Available: https://arxiv.org/abs/1412.6980