TY - GEN
T1 - Assessment of a Two-step Integration Method as an Optimizer for Deep Learning
AU - Rodriguez, Paul
N1 - Publisher Copyright:
© 2023 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2023
Y1 - 2023
N2 - It is a known fact that accelerated (non-stochastic) optimization methods can be understood as multi-step integration ones: e.g. Polyak's heavy-ball and Nesterov accelerations can be derived as particular instances of a two-step integration method applied to the gradient flow. However, in the stochastic context, to the best of our knowledge, multi-step integration methods have not been exploited as such, only as particular instances, i.e. SGD (stochastic gradient descent) with momentum or with the Nesterov acceleration. In this paper we propose to directly use a two-step (TS) integration method in the stochastic context. Furthermore, we assess the computational effectiveness of selecting the TS method's weights after considering its lattice representation. Our experiments include several well-known multiclass classification architectures (AlexNet, VGG16 and EfficientNetV2) as well as several established stochastic optimizers, e.g. SGD with momentum/Nesterov acceleration and ADAM. The TS-based method attains a better test accuracy than the first two, and is competitive with a well-tuned (ε/learning rate) ADAM.
AB - It is a known fact that accelerated (non-stochastic) optimization methods can be understood as multi-step integration ones: e.g. Polyak's heavy-ball and Nesterov accelerations can be derived as particular instances of a two-step integration method applied to the gradient flow. However, in the stochastic context, to the best of our knowledge, multi-step integration methods have not been exploited as such, only as particular instances, i.e. SGD (stochastic gradient descent) with momentum or with the Nesterov acceleration. In this paper we propose to directly use a two-step (TS) integration method in the stochastic context. Furthermore, we assess the computational effectiveness of selecting the TS method's weights after considering its lattice representation. Our experiments include several well-known multiclass classification architectures (AlexNet, VGG16 and EfficientNetV2) as well as several established stochastic optimizers, e.g. SGD with momentum/Nesterov acceleration and ADAM. The TS-based method attains a better test accuracy than the first two, and is competitive with a well-tuned (ε/learning rate) ADAM.
KW - gradient flow
KW - stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85178342520&partnerID=8YFLogxK
U2 - 10.23919/EUSIPCO58844.2023.10289761
DO - 10.23919/EUSIPCO58844.2023.10289761
M3 - Conference contribution
AN - SCOPUS:85178342520
T3 - European Signal Processing Conference
SP - 1245
EP - 1249
BT - 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 31st European Signal Processing Conference, EUSIPCO 2023
Y2 - 4 September 2023 through 8 September 2023
ER -