Characterizing Adaptive Optimizer in CNN by Reverse Mode Differentiation from Full-Scratch
Abstract
Recently, datasets have been reported for which adaptive optimizers perform no better than non-adaptive methods such as SGD. Moreover, no evaluation criteria have been established for deciding which optimization algorithm is appropriate for a given task. In this paper, we propose a characterization method: we implement reverse-mode automatic differentiation from scratch and characterize the optimizer by tracking, at each epoch, the gradients and the values of the signals flowing into the output layer. The proposed method was applied to a CNN (Convolutional Neural Network) recognizing CIFAR-10, and experiments were conducted comparing Adam (adaptive moment estimation) and SGD (stochastic gradient descent). The experiments revealed that, for batch sizes of 50, 100, 150, and 200, SGD and Adam differ significantly in the characteristics of the time series of signals sent to the output layer. This shows that the Adam optimizer can be clearly characterized from the input signal series at each batch size.
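For illustration, the following is a minimal sketch, not the authors' implementation, of the kind of instrumentation the abstract describes: a small two-layer network with a hand-written reverse-mode (backpropagation) pass, trained with either SGD or Adam while the signal entering the output layer and its gradient are recorded at every epoch. The network size, learning rates, synthetic data, and full-batch training loop are illustrative assumptions; the paper itself uses a CNN on CIFAR-10 with batch sizes of 50 to 200.

```python
# Minimal sketch (assumed setup, not the paper's code): hand-written reverse-mode
# pass for a two-layer network, with per-epoch logging of the output-layer signal.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32))      # toy inputs (stand-in for CIFAR-10 features)
y = rng.integers(0, 10, size=200)       # toy labels, 10 classes

W1 = rng.standard_normal((32, 64)) * 0.1
W2 = rng.standard_normal((64, 10)) * 0.1
params = [W1, W2]

def forward(X, params):
    W1, W2 = params
    h = np.maximum(X @ W1, 0.0)         # ReLU hidden layer
    z = h @ W2                          # pre-softmax signal at the output layer
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # softmax
    return h, z, p

def backward(X, y, params, h, p):
    """Reverse-mode pass written out by hand (no autodiff library)."""
    W1, W2 = params
    n = X.shape[0]
    dz = p.copy()
    dz[np.arange(n), y] -= 1.0          # dL/dz for softmax + cross-entropy
    dz /= n
    dW2 = h.T @ dz
    dh = dz @ W2.T
    dh[h <= 0.0] = 0.0                  # ReLU gate
    dW1 = X.T @ dh
    return [dW1, dW2], dz

def sgd_step(params, grads, lr=0.1):
    for p_, g in zip(params, grads):
        p_ -= lr * g                    # plain SGD update

def make_adam(params, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = [np.zeros_like(p_) for p_ in params]
    v = [np.zeros_like(p_) for p_ in params]
    t = [0]
    def step(params, grads):
        t[0] += 1
        for i, (p_, g) in enumerate(zip(params, grads)):
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            m_hat = m[i] / (1 - b1 ** t[0])   # bias-corrected first moment
            v_hat = v[i] / (1 - b2 ** t[0])   # bias-corrected second moment
            p_ -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return step

adam_step = make_adam(params)
log = []                                # per-epoch record of the output-layer signal
for epoch in range(20):
    h, z, p = forward(X, params)
    grads, dz = backward(X, y, params, h, p)
    adam_step(params, grads)            # swap in sgd_step(params, grads) to compare
    log.append((epoch, float(np.mean(np.abs(z))), float(np.mean(np.abs(dz)))))

for epoch, sig, grad in log[-3:]:
    print(f"epoch {epoch}: |output-layer signal|={sig:.4f}, |gradient|={grad:.4f}")
```

Swapping adam_step for sgd_step in the training loop yields the comparison underlying the experiments; plotting the logged values over epochs gives the time series whose characteristics distinguish the two optimizers.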
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht: The Marginal Value of Adaptive Gradient Methods in Machine Learning. CoRR abs/1705.08292 (2017)
PyTorch. https://github.com/pytorch/pytorch
Martin Abadi et al.: TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016: 265-283
R. Ando, Y. Takefuji: A Constrained Recursion Algorithm for Batch Normalization of Tree-Structured LSTM. https://arxiv.org/abs/2008.09409
Andreas Veit, Michael J. Wilber, Serge J. Belongie: Residual Networks Behave Like Ensembles of Relatively Shallow Networks. NIPS 2016: 550-558
David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams: Learning representations by back-propagating errors. Nature 323: 533-536 (1986)
B. T. Polyak: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5): 1-17 (1964)
Geoffrey Hinton: Neural Networks for Machine Learning, online course. https://www.coursera.org/learn/neural-networks/home/welcome
Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian J. Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio: Theano: new features and speed improvements. CoRR abs/1211.5590 (2012)
Y-Lan Boureau, Nicolas Le Roux, Francis R. Bach, Jean Ponce, Yann LeCun: Ask the locals: Multi-way local pooling for image recognition. ICCV 2011: 2651-2658
Y-Lan Boureau, Jean Ponce, Yann LeCun: A Theoretical Analysis of Feature Pooling in Visual Recognition. ICML 2010: 111-118
Jason Brownlee: A Gentle Introduction to the Rectified Linear Unit (ReLU). Machine Learning Mastery, 2021
John Duchi, Elad Hazan, Yoram Singer: Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12: 2121-2159 (2011)
Nicholas Frosst, Geoffrey Hinton: Distilling a Neural Network into a Soft Decision Tree. https://arxiv.org/abs/1711.09784
Sergey Ioffe, Christian Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167 (2015)
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama: Caffe: Convolutional Architecture for Fast Feature Embedding. CoRR abs/1408.5093 (2014)
Diederik P. Kingma, Jimmy Ba: Adam: A Method for Stochastic Optimization. ICLR (Poster) 2015
Yann LeCun, Lawrence D. Jackel, Bernhard E. Boser, John S. Denker, Hans Peter Graf, Isabelle Guyon, Don Henderson, Richard E. Howard, Wayne E. Hubbard: Handwritten digit recognition: applications of neural network chips and automatic learning. IEEE Commun. Mag. 27(11): 41-46 (1989)
Kyung Soo Kim, Yong Suk Choi: HyAdamC: A New Adam-Based Hybrid Optimization Algorithm for Convolution Neural Networks. Sensors 21(12): 4054 (2021)