ITMO

ISSN: 1023-5086

Scientific and technical journal

Opticheskii Zhurnal

A full-text English translation of the journal is published by Optica Publishing Group under the title “Journal of Optical Technology”


UDC: 004.931'1

Investigation of the generalizing capabilities of convolutional neural networks in forming rotation-invariant attributes

For Russian citation (Opticheskii Zhurnal):

Малашин Р.О., Кадыков А.Б. Исследование обобщающих способностей сверточных нейронных сетей при формировании признаков, инвариантных к вращению // Оптический журнал. 2015. Т. 82. № 8. С. 24–32.

 

Malashin R.O., Kadykov A.B. Investigation of the generalizing capabilities of convolutional neural networks in forming rotation-invariant attributes [in Russian] // Opticheskii Zhurnal. 2015. V. 82. № 8. P. 24–32.

For citation (Journal of Optical Technology):

R. O. Malashin and A. B. Kadykov, "Investigation of the generalizing capabilities of convolutional neural networks in forming rotation-invariant attributes," Journal of Optical Technology. 82(8), 509-515 (2015). https://doi.org/10.1364/JOT.82.000509

Abstract:

This paper presents the results of a study of the ability of convolutional neural networks to generalize knowledge of primitive geometric image transformations when recognizing handwritten digits. The experiments examined how the recognition of patterns in arbitrary orientations is affected by broadening the training sample with rotated images. Results are presented for convolutional neural networks of two architectures, showing that rotation-invariant recognition requires all image classes to be present in the training sample across the entire range of rotations.
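The augmentation protocol the abstract describes, broadening the training sample with rotated copies of each digit image, can be sketched as follows. This is a minimal illustration using NumPy and SciPy, assuming MNIST-style 28×28 images; the function `augment_with_rotations` and the toy data are assumptions for illustration, not the authors' code:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_with_rotations(images, labels, angles):
    """Broaden a training sample with rotated copies of every image.

    images: array of shape (N, H, W); labels: array of shape (N,).
    For each angle (in degrees) one rotated copy of every image is added,
    with the class label left unchanged.
    """
    out_imgs = [images]
    out_lbls = [labels]
    for angle in angles:
        # Rotate all images at once in the (H, W) plane; reshape=False
        # keeps the 28x28 frame, order=1 uses bilinear interpolation.
        rotated = rotate(images, angle, axes=(1, 2), reshape=False, order=1)
        out_imgs.append(rotated)
        out_lbls.append(labels)
    return np.concatenate(out_imgs), np.concatenate(out_lbls)

# Toy stand-in for MNIST-style digit images (10 images, one per class).
imgs = np.random.rand(10, 28, 28).astype(np.float32)
lbls = np.arange(10)

# Broaden the sample with rotations every 30 degrees over the full circle.
aug_imgs, aug_lbls = augment_with_rotations(imgs, lbls, angles=range(0, 360, 30))
print(aug_imgs.shape)  # (130, 28, 28): 10 originals + 12 angles x 10 images
```

The abstract's conclusion corresponds to training on such a broadened sample: unless every class appears at every orientation, the networks studied did not generalize rotation invariance to unseen angles.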

Keywords:

convolutional neural networks, generalizing capability, rotation-invariant image recognition

Acknowledgements:

This work was carried out with the support of the Ministry of Education and Science of the Russian Federation and with the partial state support of the leading universities of the Russian Federation (Subsidy 074-U01).

OCIS codes: 100.4996, 100.5760

References:

1. “Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) results,” http://image-net.org/challenges/LSVRC/2014/results.
2. A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” Adv. Neural Inform. Proc. 25, 1097 (2012).
3. P. Felzenszwalb, R. Girshick, and D. McAllester, “Cascade object detection with deformable part models,” in Proceedings of the IEEE CVPR, June 13–18, 2010, San Francisco, CA, pp. 2241–2248.
4. X. Wang, M. Yang, S. Zhu, and Y. Lin, “Regionlets for generic object detection,” in ICCV, December 1–8, 2013, Sydney, Australia, pp. 17–24.
5. V. Lutsiv, A. Potapov, T. Novikova, and N. Lapina, “Hierarchical 3D structural matching in the aerospace photographs and indoor scenes,” Proc. SPIE 5807, 455 (2005).
6. A. S. Potapov, “Image matching with the use of the minimum-description-length approach,” Proc. SPIE 5426, 164 (2004).
7. R. Malashin, “Matching of aerospace photographs with the use of local features,” J. Phys. Conf. Ser. 536, 012018 (2014).
8. A. Jerebko, N. Barabanov, V. Lutsiv, and N. Allinson, “Neural-net-based image matching,” Proc. SPIE 3962, 128 (2000).
9. Y. Bengio, M. Monperrus, and H. Larochelle, “Non-local estimation of manifold structure,” Neur. Comput. 81, 2509 (2006).
10. “Caffe: Deep learning framework by the BVLC,” http://caffe.berkeleyvision.org/.
11. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the ACM International Conference on Multimedia, Orlando, FL, November 3–7, 2014, pp. 675–678.
12. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86, 2278 (1998).
13. “The MNIST Database of handwritten digits,” http://yann.lecun.com/exdb/mnist/.
14. “Cuda-convnet: High-performance C++/CUDA implementation of convolutional neural networks,” https://code.google.com/p/cuda-convnet.
15. “Learning multiple layers of features from tiny images,” Tech. Rep., April 8, 2009, http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
16. B. Russell, A. Torralba, K. Murphy, and W. Freeman, “Labelme: A data-base and web-based tool for image annotation,” Int. J. Comput. Vis. 77, 157 (2008).
17. A. Potapov, V. Batishcheva, and M. Peterson, “Limited generalization capabilities of autoencoders with logistic regression on training sets of small sizes,” IFIP Adv. Inform. Commun. Technol. 436, 256 (2014).