Opticheskii Zhurnal

Scientific and technical journal

ISSN: 1023-5086

A full-text English translation of the journal is published by Optica Publishing Group under the title “Journal of Optical Technology”.

DOI: 10.17586/1023-5086-2021-88-11-46-55

UDC: 004.932.4, 004.81, 004.052.42

Stability investigation of the Pix2Pix conditional generative adversarial network with respect to input semantic image labeling data distortion

For Russian citation (Opticheskii Zhurnal):

Ячная В.О., Луцив В.Р. Исследование устойчивости условной генеративно-состязательной сети Pix2Pix к искажению входных данных разметки изображений // Оптический журнал. 2021. Т. 88. № 11. С. 46–55. http://doi.org/10.17586/1023-5086-2021-88-11-46-55

Yachnaya V.O., Lutsiv V.R. Stability investigation of the Pix2Pix conditional generative adversarial network with respect to input semantic image labeling data distortion [in Russian] // Opticheskii Zhurnal. 2021. V. 88. № 11. P. 46–55. http://doi.org/10.17586/1023-5086-2021-88-11-46-55

For citation (Journal of Optical Technology):

V. O. Yachnaya and V. R. Lutsiv, "Stability investigation of the Pix2Pix conditional generative adversarial network with respect to input semantic image labeling data distortion," Journal of Optical Technology 88(11), 647–653 (2021). https://doi.org/10.1364/JOT.88.000647

Abstract:

The peculiarities of image generation by a pretrained conditional generative adversarial network on the basis of semantic scene labeling are investigated. Semantic labeling can be inaccurate and can contain defects that result, for example, from transformations of the graphic file formats in which the labeling was stored or transmitted. Cases of image generation from such incorrect data are discussed, in which the hue, saturation, and brightness of the color labels of various object classes are modified. It is found that changing the hue of a label has an especially strong negative effect on image generation and can even alter the class of the labeled object. The uniformity with which the label color parameters are distributed throughout the color space should therefore be taken into account, and additional requirements should be imposed on the accuracy with which the color labels are represented.
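
The distortion studied in the article can be illustrated with a short sketch. The Python/OpenCV example below is not the authors' code; it merely shows one plausible way to shift the hue, saturation, and value (brightness) of a color-coded semantic label map of the kind Pix2Pix consumes, for instance a Cityscapes-style label image. The file names and shift magnitudes are assumptions made for the example.

```python
# A minimal sketch of HSV distortion of a semantic label map (assumed setup,
# not the procedure from the article itself).
import cv2
import numpy as np

def distort_labels(label_bgr: np.ndarray,
                   hue_shift: int = 0,
                   sat_shift: int = 0,
                   val_shift: int = 0) -> np.ndarray:
    """Shift the H, S, and V channels of a color-coded label image.

    For 8-bit images OpenCV stores hue in [0, 179] and saturation/value in
    [0, 255], so the hue shift wraps modulo 180 while the other two
    channels are clipped.
    """
    hsv = cv2.cvtColor(label_bgr, cv2.COLOR_BGR2HSV).astype(np.int16)
    hsv[..., 0] = (hsv[..., 0] + hue_shift) % 180            # hue wraps around
    hsv[..., 1] = np.clip(hsv[..., 1] + sat_shift, 0, 255)   # saturation is clipped
    hsv[..., 2] = np.clip(hsv[..., 2] + val_shift, 0, 255)   # brightness is clipped
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

if __name__ == "__main__":
    labels = cv2.imread("cityscapes_label.png")   # hypothetical input file
    distorted = distort_labels(labels, hue_shift=10)
    cv2.imwrite("cityscapes_label_distorted.png", distorted)
    # The distorted map would then be fed to a pretrained Pix2Pix generator,
    # e.g. (hypothetically) a Keras model along the lines of reference 16:
    #   gen = keras.models.load_model("pix2pix_generator.h5")
    #   fake = gen.predict(distorted[np.newaxis].astype(np.float32) / 127.5 - 1.0)
```

A hue shift is the case the abstract reports as most damaging, since it can move a label's color toward the color assigned to a different object class, whereas saturation and brightness shifts leave the hue intact.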

Keywords:

artificial intelligence, computer vision, artificial neural network, synthetic data, conditional generative adversarial neural network, semantic labeling

OCIS codes: 110.0110, 110.2960, 100.0100, 100.2000, 100.4994

References:

1. H. Yamauchi, J. Haber, and H.-P. Seidel, “Image restoration using multiresolution texture synthesis and image inpainting,” in Proceedings of Computer Graphics International, Tokyo, Japan, 2003, pp. 120–125.
2. M. Roberts, J. Ramapuram, A. Ranjan, A. Kumar, M. A. Bautista, N. Paczan, R. Webb, and J. M. Susskind, “Hypersim: a photorealistic synthetic dataset for holistic indoor scene understanding,” arXiv:2011.02523 (2021).
3. S. I. Nikolenko, “Synthetic data for deep learning,” arXiv:1909.11512 (2019).
4. H. Zhang, T. Xu, H. Li, and S. Zhang, “StackGAN++: realistic image synthesis with stacked generative adversarial networks,” IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1947–1962 (2019).
5. L. Shapiro and G. Stockman, Computer Vision (Prentice-Hall, New Jersey, 2001).
6. “The task of image translation,” https://neerc.ifmo.ru/wiki/index.php?title=.
7. D. Guo, Y. Pei, K. Zheng, H. Yu, Y. Lu, and S. Wang, “Degraded image semantic segmentation with dense-gram networks,” IEEE Trans. Image Process. 29, 782–795 (2019).
8. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, June 7–12, 2015, pp. 3431–3440.
9. V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: a deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017).
10. V. O. Yachnaya, M. A. Mikhalkova, E. N. Yablokov, and V. R. Lutsiv, “Noise model effect upon the GAN-synthesized images,” in Proceedings of IEEE Wave Electronics and Its Application in Information and Telecommunication Systems (WECONF-2020), St. Petersburg, Russia, 2020.
11. “Know your data,” https://knowyourdata.withgoogle.com/.
12. M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv:1411.1784 (2014).
13. O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI-2015), Munich, October 5–9, 2015, pp. 234–241.
14. P. Isola, J.-Y. Zhu, T. Zhou, and A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, July 21–26, 2017, pp. 5967–5976.
15. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proceedings of the 3rd International Conference on Learning Representations, San Diego, May 7–9, 2015.
16. J. Brownlee, “How to implement Pix2Pix GAN models from scratch with Keras,” 2019, https://machinelearningmastery.com/how-to-implement-pix2pix-gan-models-from-scratch-with-keras/.
17. M. Cordts, M. Omran, S. Ramos, and T. Rehfeld, “The Cityscapes Dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-2016), Las Vegas, June 27–30, 2016, pp. 3213–3223.
18. A. V. Gaĭer, A. V. Sheshkus, and Yu. S. Chernyshova, “Augmentation of the training sample ‘on the fly’ for training neural networks,” Tr. Inst. Sist. Anal. Russ. Akad. Nauk 68(S1), 150–157 (2018).
19. A. Borji, “Pros and cons of GAN evaluation measures,” Comput. Vision Image Understanding 179, 41–65 (2019).
20. Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE Signal Process. Lett. 9(3), 81–84 (2002).
21. P. Jagalingam and A. Hegde, “A review of quality metrics for fused image,” Aquat. Procedia 4, 133–142 (2015).
22. H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Trans. Image Process. 15(2), 430–444 (2006).