
ISSN: 1023-5086



Scientific and technical

Opticheskii Zhurnal

A full-text English translation of the journal is published by Optica Publishing Group under the title “Journal of Optical Technology”


DOI: 10.17586/1023-5086-2022-89-08-08-23

UDC: 004.93.12, 004.932.72.1, 004.832.2

Training a dynamically configurable classifier with deep Q-learning

For Russian citation (Opticheskii Zhurnal):

Малашин Р.О., Бойко А.А. Обучение динамически конфигурируемого классификатора с использованием глубокого Q-обучения // Оптический журнал. 2022. Т. 89. № 8. С. 8–23.


Malashin R.O., Boiko A.A. Training a dynamically configurable classifier with deep Q-learning [in Russian] // Opticheskii Zhurnal. 2022. V. 89. № 8. P. 8–23.

For citation (Journal of Optical Technology):

R. O. Malashin and A. A. Boiko, "Training a dynamically configurable classifier with deep Q-learning," Journal of Optical Technology 89(8), 437–447 (2022).


Subject of study. We studied dynamic networks that adapt their computations to the input data.

Aim. We investigated whether deep Q-learning can be used to construct dynamic computer vision networks.

Methods. Modern dynamically configurable image analysis systems are typically trained with a policy gradient algorithm. We propose a hybrid Q-learning method for training an image classification agent under constraints on the available computational resources. The agent is trained to recognize images using a set of pretrained classifiers, and the resulting dynamically configurable system constructs a computational graph that respects the limit on the number of operations while following the trajectory with the maximum expected accuracy. The agent receives a reward only when the image is correctly recognized within the limit on the number of actions it may take. Experiments were performed on the CIFAR-10 image database with a set of six external classifiers that the agent was trained to control. The experiments showed that standard deep Q-learning (Deep Q-Network, DQN) does not enable the agent to learn strategies that outperform random ones in recognition accuracy. We therefore propose a Q-least-action classifier that approximates the classifier selection function by reinforcement learning and the label prediction function by supervised learning.

Main results. The trained agent outperformed random strategies in recognition accuracy, reducing the error by 9.65%. We show that the agent makes explicit use of information from several classifiers, since its accuracy increases as the number of permitted actions increases.

Practical significance. Our research shows that deep Q-learning can extract information from sparse classifier responses as effectively as a least-action classifier trained by the policy gradient method. In addition, the proposed method does not require the design of special loss functions.
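The sparse reward and the hybrid objective described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several pieces are hypothetical assumptions: the `SelectionEpisode` class, the averaging rule that stands in for the learned label-prediction function, the rule that an episode ends once the action budget is spent (no early stopping), the undiscounted target (`gamma = 1.0` by default), and the `ce_weight` coefficient. What it shows is that the agent gets a nonzero reward only when its final prediction, made within the budget, is correct; that the selection values are fitted to an ordinary one-step Q-learning target; and that a label head can be trained with plain cross-entropy alongside it, without a specially designed loss.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class SelectionEpisode:
    """Hypothetical episode for one image: the agent may query at most
    `budget` pretrained classifiers (assumption: no early stopping)."""
    probs: Sequence[Sequence[float]]  # probs[k][c]: classifier k's probability for class c (assumed precomputed)
    true_label: int
    budget: int
    used: List[int] = field(default_factory=list)

    def available_actions(self) -> List[int]:
        # An action selects one classifier that has not been queried yet.
        return [k for k in range(len(self.probs)) if k not in self.used]

    def predict_label(self) -> int:
        # Stand-in for the learned label-prediction head: average the
        # probability vectors of the classifiers queried so far.
        n_classes = len(self.probs[0])
        avg = [sum(self.probs[k][c] for k in self.used) / len(self.used)
               for c in range(n_classes)]
        return max(range(n_classes), key=avg.__getitem__)

    def step(self, action: int) -> Tuple[float, bool]:
        if action not in self.available_actions():
            raise ValueError(f"classifier {action} is unavailable")
        self.used.append(action)
        done = len(self.used) >= self.budget or not self.available_actions()
        # Sparse reward: 1 only if the final prediction, made within the
        # action budget, is correct; intermediate steps always give 0.
        reward = float(done and self.predict_label() == self.true_label)
        return reward, done


def td_target(reward: float, next_q: Sequence[float], done: bool,
              gamma: float = 1.0) -> float:
    """One-step Q-learning (DQN-style) target: r + gamma * max_a' Q(s', a')."""
    if done or not next_q:
        return reward
    return reward + gamma * max(next_q)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Numerically stable log-softmax, then negative log-likelihood of `label`.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def hybrid_loss(q_sa: float, target: float, label_logits: Sequence[float],
                true_label: int, ce_weight: float = 1.0) -> float:
    """Squared TD error for the selection values plus cross-entropy for the
    label head (hypothetical weighting via `ce_weight`)."""
    return (q_sa - target) ** 2 + ce_weight * cross_entropy(label_logits, true_label)


# Toy usage: 3 classifiers, 2 classes, budget of 2 actions.
episode = SelectionEpisode([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]], true_label=1, budget=2)
print(episode.step(0))  # (0.0, False): intermediate steps are unrewarded
print(episode.step(1))  # (1.0, True): averaged probs [0.4, 0.6] -> class 1
```

Because the reward is zero on every intermediate step, the value of an early classifier choice reaches the agent only through the bootstrapped `max(next_q)` term, which is the sparse-feedback setting the abstract refers to.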


Keywords: dynamically configurable calculations, least-action principle, reinforcement learning, ensemble of methods, deterministic planning, image analysis, image classification


The research was supported by the Russian Science Foundation (project No. 19-71-00146).

OCIS codes: 150.1135, 100.4996

