DOI: 10.17586/1023-5086-2023-90-01-37-48
UDC: 004.93
Categorization of objects and scenes by a neural network whose input modules are pre-trained to decode spatial texture inhomogeneities
Yavna D.V., Babenko V.V., Gorbenkova O.A., Plavelsky I.V., Voronaya V.D., Stoletniy A.S. Categorization of objects and scenes by a neural network whose input modules are pretrained to decode spatial texture inhomogeneities [in Russian] // Opticheskii Zhurnal. 2023. V. 90. № 1. P. 37–48. http://doi.org/10.17586/1023-5086-2023-90-01-37-48
D. V. Yavna, V. V. Babenko, O. A. Gorbenkova, I. V. Plavelsky, V. D. Voronaya, and A. S. Stoletniy, "Classification of objects and scenes by a neural network with pretrained input modules to decode spatial texture inhomogeneities," Journal of Optical Technology. 90(1), 20-25 (2023). https://doi.org/10.1364/JOT.90.000020
Scope of research. The possibility of using neural network models of second-order visual mechanisms as input modules of neural network classifiers is investigated. Second-order visual mechanisms detect spatial inhomogeneities of contrast, orientation, and spatial frequency in an image. Vision researchers traditionally regard these mechanisms as one of the stages of early visual processing, and their role in texture perception is well studied. The purpose of the work is to determine whether classifier input modules pretrained to demodulate spatial modulations of brightness gradients contribute to the categorization of objects and scenes.

Method. Neural network modeling was the main method. At the first stage of the study, a set of texture images was generated and used to train neural network models of second-order visual mechanisms. At the second stage, samples of objects and scenes were prepared, and classifier networks were trained on them. The pretrained models of second-order visual mechanisms, with frozen weights, were placed at the input of these networks.

Main results. Second-order information, presented as maps of the instantaneous values of the contrast, orientation, and spatial-frequency modulation functions of the image, proved sufficient for identifying only some classes of scenes. On the whole, within the proposed neural network architectures, using the modulation-function values for object classification turned out to be ineffective. Thus, the hypothesis that second-order visual filters encode features that allow an object to be identified was not confirmed. This result calls for testing the alternative hypothesis that the role of second-order filters is limited to participating in the construction of saliency maps, the filters themselves being windows through which information arrives from the outputs of first-order filters.

Practical significance. The possibility of using models of second-order visual mechanisms in computer vision systems was assessed.
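To illustrate the second stage of the described method, below is a minimal Keras sketch: pretrained models of second-order mechanisms are loaded, their weights frozen, and their output modulation maps concatenated and passed to a trainable classifier head. This is not the authors' code; the file names, input size, layer widths, and class count are assumptions made only for illustration.

import tensorflow as tf
from tensorflow import keras

N_CLASSES = 10                # hypothetical number of object/scene categories
INPUT_SHAPE = (128, 128, 1)   # hypothetical grayscale input size

# Stage-1 models of second-order mechanisms, pretrained on synthetic
# modulated textures, one per modulation dimension (hypothetical file names).
module_paths = ["contrast_demod.h5", "orientation_demod.h5", "sf_demod.h5"]
modules = [keras.models.load_model(p) for p in module_paths]
for m in modules:
    m.trainable = False       # freeze the pretrained weights

inputs = keras.Input(shape=INPUT_SHAPE)
# Each frozen module returns a map of instantaneous values of its
# modulation function (contrast, orientation, or spatial frequency).
mod_maps = [m(inputs, training=False) for m in modules]
x = keras.layers.Concatenate()(mod_maps)

# Trainable classifier head operating only on the second-order maps.
x = keras.layers.Conv2D(32, 3, activation="relu")(x)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Conv2D(64, 3, activation="relu")(x)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(N_CLASSES, activation="softmax")(x)

classifier = keras.Model(inputs, outputs)
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

Freezing the input modules ensures that classification performance reflects only the information carried by the modulation maps, which is the hypothesis under test.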
Acknowledgment: The study was financially supported by the Russian Foundation for Basic Research, project № 18-29-22001 MK, "An investigation of neurocognitive technologies of attentional control and the formation of mental representations of visual web content".
Keywords: visual processing mechanisms, texture, convolutional neural network, classifier neural network, machine vision
OCIS codes: 100.4996, 330.5370
References:
- Treisman A.M., Gelade G. A feature-integration theory of attention // Cognitive Psychology. 1980. V. 12. № 1. P. 97–136.
- Sutter A., Beck J., Graham N.V. Contrast and spatial variables in texture segregation: testing a simple spatial-frequency channels model // Percept. Psychophys. 1989. V. 46. № 4. P. 312–332.
- Mareschal I., Baker C.L. Temporal and spatial response to second-order stimuli in cat area 18 // J. Neurophysiol. 1998. V. 80. № 6. P. 2811–2823. https://doi.org/10.1152/jn.1998.80.6.2811
- Landy M.S., Oruç I. Properties of second-order spatial frequency channels // Vision Res. 2002. V. 42. № 19. P. 2311–2329. https://doi.org/10.1016/s0042-6989(02)00193-1
- Derrington A. Second-order visual processing // Optics & Photonics News. 2001. V. 12. № 1. P. 18. https://doi.org/10.1364/OPN.12.1.000018
- Huang P.C., Chen C.C. A comparison of pedestal effects in first- and second-order patterns // J. Vision. 2014. V. 14. № 1. P. 9. https://doi.org/10.1167/14.1.9
- Sutter A., Sperling G., Chubb C. Measuring the spatial frequency selectivity of second-order texture mechanisms // Vision Res. 1995. V. 35. № 7. P. 915–924. https://doi.org/10.1016/0042-6989(94)00196-s
- Shelepin Yu.E., Chikhman V.N., Vakhrameeva O.A., Pronin S.V., Foreman N., Pasmore P. Invariance of visual perception [in Russian] // Experimental Psychology (Russia). 2008. V. 1. № 1. P. 7–33. https://elibrary.ru/item.asp?id=13019577
- Graham N.V. Beyond multiple pattern analyzers modeled as linear filters (as classical V1 simple cells): useful additions of the last 25 years // Vision Res. 2011. V. 51. № 13. P. 1397–1430. https://doi.org/10.1016/j.visres.2011.02.007
- Babenko V.V., Ermakov P.N. Specificity of brain reactions to second-order visual stimuli // Vis Neurosci. 2015. V. 32. P. E011. https://doi.org/10.1017/S0952523815000085
- Ellemberg D., Allen H.A., Hess R.F. Second-order spatial frequency and orientation channels in human vision // Vision Res. 2006. V. 46. № 17. P. 2798–2803. https://doi.org/10.1016/j.visres.2006.01.028
- Kingdom F.A.A., Prins N., Hayes A. Mechanism independence for texture-modulation detection is consistent with a filter-rectify-filter mechanism // Vis Neurosci. 2003. V. 20. № 1. P. 65–76. https://doi.org/10.1017/s0952523803201073
- Schofield A., Cruickshank A. Transfer of tilt aftereffects between second-order cues // Spatial Vis. 2005. V. 18. № 4. P. 379–397. https://doi.org/10.1163/1568568054389624
- Wolfe J.M. Visual search // Attention. Hove, England: Psychology Press/Erlbaum (UK) Taylor & Francis, 1998. P. 13–73.
- Babenko V.V., Yavna D.V. Competition for attention among spatial modulations of brightness gradients [in Russian] // Russian Psychological J. 2018. V. 15. № 3. P. 160–189. https://doi.org/10.21702/rpj.2018.3.8
- Yavna D.V., Babenko V.V., Ikonopistseva K.A. Neural network models of second order visual filters // Neural Networks and Neurotechnologies. St. Petersburg, Russia: ВВМ, 2019. P. 198–203.
- Yavna D.V., Babenko V.V., Stoletniy A.S., Shchetinina D.P., Alekseeva D.S. Differentiation and decoding of the spatial modulations of textures by the multilayer convolutional neural networks [in Russian] // Russian Foundation for Basic Research J. 2019. V. 4(104). P. 94–104. https://doi.org/10.22204/2410-4639-2019-104-04-94-104
- Frey H.P., König P., Einhäuser W. The role of first- and second-order stimulus features for human overt attention // Percept Psychophys. 2007. V. 69. № 2. P. 153–161. https://doi.org/10.3758/bf03193738
- Johnson A., Zarei A. Second-order saliency predicts observer eye movements when viewing natural images // J. Vision. 2010. V. 10. № 7. P. 526. https://doi.org/10.1167/10.7.526
- Gavrikov P. Visualkeras software. URL: https://github.com/paulgavrikov/visualkeras
- Prins N., Kingdom F.A.A. Detection and discrimination of texture modulations defined by orientation, spatial frequency, and contrast // JOSA A. 2003. V. 20. № 3. P. 401. https://doi.org/10.1364/JOSAA.20.000401
- Sandler M., Howard A., Zhu M., et al. MobileNetV2: Inverted residuals and linear bottlenecks // arXiv:1801.04381 [cs]. 2019. Retrieved from http://arxiv.org/abs/1801.04381
- Yu F., Seff A., Zhang Y., et al. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop // arXiv:1506.03365 [cs]. 2016. Retrieved from http://arxiv.org/abs/1506.03365
- Xiao J., Hays J., Ehinger K.A., et al. SUN database: Large-scale scene recognition from abbey to zoo // 2010 IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition. 2010. P. 3485–3492. https://doi.org/10.1109/CVPR.2010.5539970
- Victor J.D., Conte M.M., Chubb C.F. Textures as probes of visual processing // Annu. Rev. Vis. Sci. 2017. V. 3. № 1. P. 275–296. https://doi.org/10.1146/annurev-vision-102016-061316
- Uejima T., Niebur E., Etienne-Cummings R. Proto-object based saliency model with second-order texture feature // 2018 IEEE Biomedical Circuits and Systems Conf. (BioCAS). Cleveland, OH: IEEE, 2018. P. 1–4. https://doi.org/10.1109/BIOCAS.2018.8584749
- Williams C.C., Castelhano M.S. The changing landscape: High-level influences on eye movement guidance in scenes // Vision. 2019. V. 3. № 3. P. 33. https://doi.org/10.3390/vision3030033