DOI: 10.17586/1023-5086-2023-90-01-37-48
UDC: 004.93
Categorization of objects and scenes by a neural network whose input modules are pre-trained to decode spatial texture inhomogeneities
Yavna D.V., Babenko V.V., Gorbenkova O.A., Plavelsky I.V., Voronaya V.D., Stoletniy A.S. Categorization of objects and scenes by a neural network whose input modules are pretrained to decode spatial texture inhomogeneities [in Russian] // Opticheskii Zhurnal. 2023. V. 90. № 1. P. 37–48. http://doi.org/10.17586/1023-5086-2023-90-01-37-48
D. V. Yavna, V. V. Babenko, O. A. Gorbenkova, I. V. Plavelsky, V. D. Voronaya, and A. S. Stoletniy, "Classification of objects and scenes by a neural network with pretrained input modules to decode spatial texture inhomogeneities," Journal of Optical Technology. 90(1), 20-25 (2023). https://doi.org/10.1364/JOT.90.000020
Scope of research. The possibility of using neural network models of second-order visual mechanisms as input modules of neural network classifiers is investigated. Second-order visual mechanisms detect spatial inhomogeneities of contrast, orientation, and spatial frequency in an image. Vision researchers traditionally regard these mechanisms as one of the stages of early visual processing, and their role in texture perception is well studied. The purpose of the work is to determine whether classifier input modules pretrained to demodulate spatial modulations of brightness gradients contribute to the categorization of objects and scenes.

Method. Neural network modeling was the main method. At the first stage of the study, a set of texture images was generated and used to train neural network models of second-order visual mechanisms. At the second stage, samples of objects and scenes were prepared, and classifier networks were trained on them. The pretrained models of second-order visual mechanisms, with frozen weights, were placed at the input of these networks.

Main results. Second-order information, presented as maps of the instantaneous values of the contrast, orientation, and spatial-frequency modulation functions of the image, proved sufficient for identifying only some classes of scenes. On the whole, within the proposed neural network architectures, using the modulation-function values for object classification turned out to be ineffective. Thus, the hypothesis that second-order visual filters encode features that allow an object to be identified was not confirmed. This result calls for testing the alternative hypothesis that the role of second-order filters is limited to participating in the construction of saliency maps, the filters themselves being windows through which information arrives from the outputs of first-order filters.

Practical significance. The possibility of using models of second-order visual mechanisms in computer vision systems was assessed.
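To illustrate the second stage of the described method, below is a minimal Keras sketch: pretrained models of second-order mechanisms are loaded, their weights frozen, and their output modulation maps concatenated and passed to a trainable classifier head. This is not the authors' code; the file names, input size, layer widths, and class count are assumptions made only for illustration.

import tensorflow as tf
from tensorflow import keras

N_CLASSES = 10                # hypothetical number of object/scene categories
INPUT_SHAPE = (128, 128, 1)   # hypothetical grayscale input size

# Stage-1 models of second-order mechanisms, pretrained on synthetic
# modulated textures, one per modulation dimension (hypothetical file names).
module_paths = ["contrast_demod.h5", "orientation_demod.h5", "sf_demod.h5"]
modules = [keras.models.load_model(p) for p in module_paths]
for m in modules:
    m.trainable = False       # freeze the pretrained weights

inputs = keras.Input(shape=INPUT_SHAPE)
# Each frozen module returns a map of instantaneous values of its
# modulation function (contrast, orientation, or spatial frequency).
mod_maps = [m(inputs, training=False) for m in modules]
x = keras.layers.Concatenate()(mod_maps)

# Trainable classifier head operating only on the second-order maps.
x = keras.layers.Conv2D(32, 3, activation="relu")(x)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Conv2D(64, 3, activation="relu")(x)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(N_CLASSES, activation="softmax")(x)

classifier = keras.Model(inputs, outputs)
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

Freezing the input modules ensures that classification performance reflects only the information carried by the modulation maps, which is the hypothesis under test.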
Acknowledgment: The study was financially supported by the Russian Foundation for Basic Research, project № 18-29-22001 MK, "An investigation of neurocognitive technologies of attentional control and the formation of mental representations of visual web content".
Keywords: visual processing mechanisms, texture, convolutional neural network, classifier neural network, machine vision
OCIS codes: 100.4996, 330.5370
References:
- Treisman A.M., Gelade G. A feature-integration theory of attention // Cognitive Psychology. 1980. V. 12. № 1. P. 97–136.
- Sutter A., Beck J., Graham N.V. Contrast and spatial variables in texture segregation: testing a simple spatial-frequency channels model // Percept. Psychophys. 1989. V. 46. № 4. P. 312–332.
- Mareschal I., Baker C.L. Temporal and spatial response to second-order stimuli in cat area 18 // J. Neurophysiol. 1998. V. 80. № 6. P. 2811–2823. https://doi.org/10.1152/jn.1998.80.6.2811
- Landy M.S., Oruç I. Properties of second-order spatial frequency channels // Vision Res. 2002. V. 42. № 19. P. 2311–2329. https://doi.org/10.1016/s0042-6989(02)00193-1
- Derrington A. Second-order visual processing // Optics & Photonics News. 2001. V. 12. № 1. P. 18. https://doi.org/10.1364/OPN.12.1.000018
- Huang P.C., Chen C.C. A comparison of pedestal effects in first- and second-order patterns // J. Vision. 2014. V. 14. № 1. P. 9. https://doi.org/10.1167/14.1.9
- Sutter A., Sperling G., Chubb C. Measuring the spatial frequency selectivity of second-order texture mechanisms // Vision Res. 1995. V. 35. № 7. P. 915–924. https://doi.org/10.1016/0042-6989(94)00196-s
- Shelepin Yu.E., Chikhman V.N., Vakhrameeva O.A., Pronin S.V., Foreman N., Pasmore P. Invariance of visual perception [in Russian] // Experimental Psychology (Russia). 2008. V. 1. № 1. P. 7–33. https://elibrary.ru/item.asp?id=13019577
- Graham N.V. Beyond multiple pattern analyzers modeled as linear filters (as classical V1 simple cells): useful additions of the last 25 years // Vision Res. 2011. V. 51. № 13. P. 1397–1430. https://doi.org/10.1016/j.visres.2011.02.007
- Babenko V.V., Ermakov P.N. Specificity of brain reactions to second-order visual stimuli // Vis Neurosci. 2015. V. 32. P. E011. https://doi.org/10.1017/S0952523815000085
- Ellemberg D., Allen H.A., Hess R.F. Second-order spatial frequency and orientation channels in human vision // Vision Res. 2006. V. 46. № 17. P. 2798–2803. https://doi.org/10.1016/j.visres.2006.01.028
- Kingdom F.A.A., Prins N., Hayes A. Mechanism independence for texture-modulation detection is consistent with a filter-rectify-filter mechanism // Vis Neurosci. 2003. V. 20. № 1. P. 65–76. https://doi.org/10.1017/s0952523803201073
- Schofield A., Cruickshank A. Transfer of tilt aftereffects between second-order cues // Spatial Vis. 2005. V. 18. № 4. P. 379–397. https://doi.org/10.1163/1568568054389624
- Wolfe J.M. Visual search // Attention. Hove, England: Psychology Press/Erlbaum (UK) Taylor & Francis, 1998. P. 13–73.
- Babenko V.V., Yavna D.V. Competition for attention among spatial modulations of brightness gradients [in Russian] // Russian Psychological J. 2018. V. 15. № 3. P. 160–189. https://doi.org/10.21702/rpj.2018.3.8
- Yavna D.V., Babenko V.V., Ikonopistseva K.A. Neural network models of second order visual filters // Neural Networks and Neurotechnologies. St. Petersburg, Russia: ВВМ, 2019. P. 198–203.
- Yavna D.V., Babenko V.V., Stoletniy A.S., Shchetinina D.P., Alekseeva D.S. Differentiation and decoding of the spatial modulations of textures by the multilayer convolutional neural networks [in Russian] // Russian Foundation for Basic Research J. 2019. V. 4(104). P. 94–104. https://doi.org/10.22204/2410-4639-2019-104-04-94-104
- Frey H.P., König P., Einhäuser W. The role of first- and second-order stimulus features for human overt attention // Percept Psychophys. 2007. V. 69. № 2. P. 153–161. https://doi.org/10.3758/bf03193738
- Johnson A., Zarei A. Second-order saliency predicts observer eye movements when viewing natural images // J. Vision. 2010. V. 10. № 7. P. 526. https://doi.org/10.1167/10.7.526
- Gavrikov P. Visualkeras software. URL: https://github.com/paulgavrikov/visualkeras
- Prins N., Kingdom F.A.A. Detection and discrimination of texture modulations defined by orientation, spatial frequency, and contrast // JOSA A. 2003. V. 20. № 3. P. 401. https://doi.org/10.1364/JOSAA.20.000401
- Sandler M., Howard A., Zhu M., et al. MobileNetV2: Inverted residuals and linear bottlenecks // arXiv:1801.04381 [cs]. 2019. Retrieved from http://arxiv.org/abs/1801.04381
- Yu F., Seff A., Zhang Y., et al. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop // arXiv:1506.03365 [cs]. 2016. Retrieved from http://arxiv.org/abs/1506.03365
- Xiao J., Hays J., Ehinger K.A., et al. SUN database: Large-scale scene recognition from abbey to zoo // 2010 IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition. 2010. P. 3485–3492. https://doi.org/10.1109/CVPR.2010.5539970
- Victor J.D., Conte M.M., Chubb C.F. Textures as probes of visual processing // Annu. Rev. Vis. Sci. 2017. V. 3. № 1. P. 275–296. https://doi.org/10.1146/annurev-vision-102016-061316
- Uejima T., Niebur E., Etienne-Cummings R. Proto-object based saliency model with second-order texture feature // 2018 IEEE Biomedical Circuits and Systems Conf. (BioCAS). Cleveland, OH: IEEE, 2018. P. 1–4. https://doi.org/10.1109/BIOCAS.2018.8584749
- Williams C.C., Castelhano M.S. The changing landscape: High-level influences on eye movement guidance in scenes // Vision. 2019. V. 3. № 3. P. 33. https://doi.org/10.3390/vision3030033