Influence of Digital Fluctuations on Behavior of Neural Networks
Abstract
This paper studies the effect of digital noise on the numerical stability of neural networks. Digital noise arises from the inexactness of floating-point operations, and the accumulated errors ultimately lead to a loss of significance. Experiments show that more redundant networks are more strongly affected by this noise. The effect is tested on both synthetic and real-world samples. As a consequence, network results produced after fluctuations begin should be excluded. The experimental results allow us to hypothesize that the minimal values of the loss function that still preserve significance are achieved by networks whose size is close to the complexity of the dataset. This is a reason to choose the sizes of network layers according to the complexity of a particular dataset, rather than universally for an architecture and a general problem statement without reference to the data. In the case of fine-tuning, this suggests that pruning network layers can improve prediction accuracy and reliability by reducing the influence of numerical noise. The results of this article are based on the analysis of numerical experiments in which more than 50,000 neural networks were trained for thousands of epochs each; almost all of the networks eventually begin to fluctuate.
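As a purely illustrative sketch (not taken from the paper), the following Python fragment shows the mechanism the abstract refers to: a float32 accumulator of many small terms drifts away from a float64 reference because every addition rounds, so any quantity whose true change per step is smaller than this accumulated error only fluctuates and carries no significance.

import numpy as np

# Hypothetical illustration: accumulate one million small terms in float32.
# Each addition rounds to the nearest representable value, and the rounding
# errors accumulate until the result visibly departs from the exact sum.
n = 1_000_000
term = np.float32(1e-4)

acc32 = np.float32(0.0)
for _ in range(n):
    acc32 += term                # float32 accumulation: error grows with each step

exact = n * 1e-4                 # reference sum in float64 (essentially 100.0)
print(f"float32 sum      : {float(acc32):.6f}")
print(f"float64 reference: {exact:.6f}")
print(f"absolute error   : {abs(float(acc32) - exact):.6f}")

The printed absolute error is the kind of purely numerical discrepancy that, once it exceeds the genuine epoch-to-epoch improvement of a loss function, shows up as the fluctuations studied in the paper.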