ISSN: 1023-5086

Opticheskii Zhurnal (scientific and technical journal)

A full-text English translation of the journal is published by Optica Publishing Group under the title “Journal of Optical Technology”


DOI: 10.17586/1023-5086-2023-90-01-37-48

UDC: 004.93

Categorization of objects and scenes by a neural network whose input modules are pre-trained to decode spatial texture inhomogeneities

For Russian citation (Opticheskii Zhurnal):

Явна Д.В., Бабенко В.В., Горбенкова О.А., Плавельский И.В., Вороная В.Д., Столетний А.С. Категоризация объектов и сцен нейронной сетью, входы которой предварительно обучены декодированию пространственных неоднородностей текстуры // Оптический журнал. 2023. Т. 90. № 1. С. 37–48. http://doi.org/10.17586/1023-5086-2023-90-01-37-48

 

Yavna D.V., Babenko V.V., Gorbenkova O.A., Plavelsky I.V., Voronaya V.D., Stoletniy A.S. Categorization of objects and scenes by a neural network whose input modules are pretrained to decode spatial texture inhomogeneities [in Russian] // Opticheskii Zhurnal. 2023. V. 90. № 1. P. 37–48. http://doi.org/10.17586/1023-5086-2023-90-01-37-48

For citation (Journal of Optical Technology):

D. V. Yavna, V. V. Babenko, O. A. Gorbenkova, I. V. Plavelsky, V. D. Voronaya, and A. S. Stoletniy, "Classification of objects and scenes by a neural network with pretrained input modules to decode spatial texture inhomogeneities," Journal of Optical Technology. 90(1), 20-25 (2023). https://doi.org/10.1364/JOT.90.000020

Abstract:

Scope of research. The possibility of using neural-network models of second-order visual mechanisms as input modules for neural-network classifiers is investigated. Second-order visual mechanisms detect spatial inhomogeneities of contrast, orientation, and spatial frequency in an image. Vision researchers traditionally regard these mechanisms as one of the stages of early visual processing, and their role in texture perception is well studied. The purpose of the work is to establish whether classifier input modules pretrained to demodulate spatial modulations of brightness gradients improve the categorization of objects and scenes.

Method. Neural-network modeling was the main method. At the first stage of the study, a set of texture images was generated and used to train neural-network models of second-order visual mechanisms. At the second stage, samples of objects and scenes were prepared, and classifier networks were trained on them; the previously trained models of second-order visual mechanisms, with frozen weights, were placed at the inputs of these networks.

Main results. Second-order information, represented as maps of the instantaneous values of the contrast, orientation, and spatial-frequency modulation functions of an image, may be sufficient to identify only some classes of scenes. On the whole, within the proposed neural-network architectures, using modulation-function values to classify objects proved ineffective. Thus, the hypothesis that second-order visual filters encode features that allow an object to be identified was not confirmed. This result makes it necessary to test the alternative hypothesis that the role of second-order filters is limited to participation in constructing saliency maps, the filters themselves being windows through which information arrives from the outputs of first-order filters.

Practical significance. The possibility of using models of second-order visual mechanisms in computer vision systems was assessed.
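
To illustrate the second stage described above, the following is a minimal sketch, not the authors' code: it assumes a Keras/TensorFlow implementation, and every name, shape, and layer size is hypothetical. Pretrained modules that output modulation-function maps are frozen and placed at the input of a classifier network, so that only the classifier head is trained on the object and scene samples.

```python
# Minimal sketch (assumptions, not the authors' implementation): frozen
# "second-order mechanism" modules feed an object/scene classifier head.
from tensorflow import keras

NUM_CLASSES = 10             # hypothetical number of object/scene categories
INPUT_SHAPE = (224, 224, 1)  # hypothetical grayscale input size


def make_modulation_decoder(name: str) -> keras.Model:
    """Stand-in for a pretrained model of one second-order mechanism.

    In the study such a model would be trained beforehand on synthetic
    textures to output a map of instantaneous values of one modulation
    function (contrast, orientation, or spatial frequency); here it is a
    placeholder convolutional encoder.
    """
    inp = keras.Input(shape=INPUT_SHAPE)
    x = keras.layers.Conv2D(16, 5, padding="same", activation="relu")(inp)
    out = keras.layers.Conv2D(1, 5, padding="same", activation="sigmoid")(x)
    return keras.Model(inp, out, name=name)


# One decoder per modulation dimension; freezing the weights keeps the
# pretrained input modules fixed while the classifier head is trained.
decoders = [make_modulation_decoder(n)
            for n in ("contrast", "orientation", "spatial_frequency")]
for d in decoders:
    d.trainable = False

inputs = keras.Input(shape=INPUT_SHAPE)
# Stack the three modulation-function maps into one feature tensor.
features = keras.layers.Concatenate()([d(inputs, training=False)
                                       for d in decoders])

# Small convolutional classifier head on top of the frozen modules.
x = keras.layers.Conv2D(32, 3, activation="relu")(features)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Conv2D(64, 3, activation="relu")(x)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

classifier = keras.Model(inputs, outputs)
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```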

 

Acknowledgment: The study was financially supported by the Russian Foundation for Basic Research, project no. 18-29-22001 MK, "An investigation of neurocognitive technologies of attentional control and formation of mental representations of visual web content".

Keywords:

visual processing mechanisms, texture, convolutional neural network, classifier neural network, machine vision

OCIS codes: 100.4996, 330.5370
