Dynamic termination of computations in computer vision systems

Malashin, R.O.

Publication in Journal of Optical Technology

For Russian citation (Opticheskii Zhurnal):

Малашин Р.О. Динамическая остановка вычислений в системах компьютерного зрения // Оптический журнал. 2022. Т. 89. № 8. С. 54–63. http://doi.org/10.17586/1023-5086-2022-89-08-54-63

Malashin R.O. Dynamic termination of computations in computer vision systems [in Russian] // Opticheskii Zhurnal. 2022. V. 89. № 8. P. 54–63. http://doi.org/10.17586/1023-5086-2022-89-08-54-63

For citation (Journal of Optical Technology):

R. O. Malashin, "Dynamic termination of computations in computer vision systems," Journal of Optical Technology 89(8), 469-475 (2022). https://doi.org/10.1364/JOT.89.000469

Abstract:

Subject of study. Two classes of dynamically configurable computer vision systems trained with reinforcement learning algorithms were considered. The first class comprises models of visual attention that recognize images by successively viewing their fragments. The second class comprises least action classifiers that analyze images indirectly by successively calling pretrained convolutional neural networks.

Aim of study. This study investigated whether adding a computation-termination action to such systems lets the models spend more resources on complex images than on simpler ones.

Method. A stop network was added to the investigated architectures; it receives the hidden state vector of the system as input and returns a signal to stop or continue computations. Three-stage curriculum training of the individual network modules was used, and the resulting strategies of image viewing and classifier selection were analyzed.

Main results. The proposed model of visual attention with dynamic termination of computations significantly surpassed existing solutions on the MNIST database in both recognition accuracy and the average number of image fragments viewed by the agent. The importance of curriculum learning was demonstrated. The agent was shown to use a similar attention control strategy for different images while adapting it to specific images. A similar effect was observed for a common model of visual attention trained on ImageNet. For least action classifiers, dynamic termination of computations also reduced the average number of actions required for image analysis at a specified recognition accuracy, although the gain in efficiency was less pronounced.

Practical significance. The visual attention methods developed in this study can be useful for designing optoelectronic systems with intelligent control of a camera with a narrow-field lens for target recognition. The technique used in the least action classifiers can be applied to reduce computations in solutions obtained by bagging, which averages several models.
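The stop mechanism described in the Method part of the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the single logistic unit standing in for the stop network, the fixed decision threshold, the step cap, and all names (`stop_probability`, `run_with_dynamic_stop`, `make_step`) are assumptions made for illustration. In the paper the modules are trained with reinforcement learning, whereas the weights here are set by hand.

```python
import math


def stop_probability(hidden, weights, bias):
    """Hypothetical stop network: one logistic unit over the hidden state."""
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def run_with_dynamic_stop(step_fn, hidden, weights, bias,
                          max_steps=8, threshold=0.5):
    """Update the hidden state until the stop unit fires or max_steps is hit.

    step_fn stands in for one action of the system (viewing a fragment or
    calling a classifier). Returns the final state and the number of steps.
    """
    steps = 0
    while steps < max_steps:
        hidden = step_fn(hidden)
        steps += 1
        # Assumption: a deterministic threshold instead of a sampled action.
        if stop_probability(hidden, weights, bias) >= threshold:
            break
    return hidden, steps


def make_step(gain):
    """Toy update: the first coordinate accumulates evidence at rate `gain`."""
    return lambda h: [h[0] + gain] + h[1:]


# Hand-set stop unit: fires once accumulated evidence reaches about 2.
weights, bias = [1.0, 0.0], -2.0
_, easy_steps = run_with_dynamic_stop(make_step(1.0), [0.0, 0.0], weights, bias)
_, hard_steps = run_with_dynamic_stop(make_step(0.3), [0.0, 0.0], weights, bias)
print(easy_steps, hard_steps)  # prints "2 7"
```

In this toy setting the "easy" input gathers evidence quickly and stops after two steps, while the "hard" input needs seven, which mirrors the stated aim of spending more computation on complex images than on simple ones.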

Keywords:

recurrent model of attention control, classifier of least action, dynamically configurable systems, stop of calculations

Acknowledgements:

This research was supported by Russian Science Foundation (RSF) grant No. 19-71-00146.

OCIS codes: 150.1135
