TY - GEN
T1 - Ideal Step Size Estimation for the Multinomial Logistic Regression
AU - Ramirez, Gabriel
AU - Rodriguez, Paul
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - At the core of deep learning optimization problems reside algorithms such as Stochastic Gradient Descent (SGD), which employs a subset of the data per iteration to estimate the gradient in order to minimize a cost function. Adaptive algorithms based on SGD are well known for effectively using gradient information from past iterations, generating momentum or memory that enables a more accurate prediction of the true gradient slope in future iterations, thus accelerating convergence. Nevertheless, these algorithms still need an initial (scalar) learning rate (LR) as well as an LR scheduler. In this work, we propose a new SGD algorithm that estimates the initial (scalar) LR via an adaptation of the ideal Cauchy step size for multinomial logistic regression; furthermore, the LR is recursively updated up to a given number of epochs, after which a decaying LR scheduler is used. The proposed method is assessed on several well-known multiclass classification architectures and compares favorably against other well-tuned (scalar and spatially) adaptive alternatives, including the Adam algorithm.
AB - At the core of deep learning optimization problems reside algorithms such as Stochastic Gradient Descent (SGD), which employs a subset of the data per iteration to estimate the gradient in order to minimize a cost function. Adaptive algorithms based on SGD are well known for effectively using gradient information from past iterations, generating momentum or memory that enables a more accurate prediction of the true gradient slope in future iterations, thus accelerating convergence. Nevertheless, these algorithms still need an initial (scalar) learning rate (LR) as well as an LR scheduler. In this work, we propose a new SGD algorithm that estimates the initial (scalar) LR via an adaptation of the ideal Cauchy step size for multinomial logistic regression; furthermore, the LR is recursively updated up to a given number of epochs, after which a decaying LR scheduler is used. The proposed method is assessed on several well-known multiclass classification architectures and compares favorably against other well-tuned (scalar and spatially) adaptive alternatives, including the Adam algorithm.
KW - adaptive step size
KW - Deep learning
KW - multinomial logistic regression
KW - stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85192270361&partnerID=8YFLogxK
U2 - 10.1109/LASCAS60203.2024.10506124
DO - 10.1109/LASCAS60203.2024.10506124
M3 - Conference contribution
AN - SCOPUS:85192270361
T3 - LASCAS 2024 - 15th IEEE Latin American Symposium on Circuits and Systems, Proceedings
BT - LASCAS 2024 - 15th IEEE Latin American Symposium on Circuits and Systems, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE Latin American Symposium on Circuits and Systems, LASCAS 2024
Y2 - 27 February 2024 through 1 March 2024
ER -
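
Background note (not part of the record above): the abstract refers to the ideal Cauchy step size, i.e. the exact line-search step along the negative gradient of a local quadratic model. The sketch below illustrates only that classical formula on a toy quadratic loss; it is not the authors' adaptation to multinomial logistic regression, and the names cauchy_step_size and hess_vec are hypothetical placeholders.

import numpy as np

def cauchy_step_size(g, hess_vec):
    """Cauchy (exact line-search) step along -g, where hess_vec(v) returns H @ v.

    For a quadratic model f(w + d) ~ f(w) + g.T d + 0.5 * d.T H d, the minimizer
    along d = -g is alpha* = (g.T g) / (g.T H g).
    """
    Hg = hess_vec(g)
    curvature = float(g @ Hg)
    if curvature <= 0.0:
        # Fall back to a unit step when the local quadratic model is not convex.
        return 1.0
    return float(g @ g) / curvature

# Toy usage on the quadratic loss f(w) = 0.5 * w.T A w - b.T w (illustration only).
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
w = np.zeros(2)
g = A @ w - b                          # gradient of the toy quadratic loss
alpha = cauchy_step_size(g, lambda v: A @ v)
w_next = w - alpha * g                 # one steepest-descent step using the Cauchy LR
print(alpha, w_next)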