ISSN: 1023-5086

Opticheskii Zhurnal (scientific and technical journal)

A full-text English translation of the journal is published by Optica Publishing Group under the title “Journal of Optical Technology”


DOI: 10.17586/1023-5086-2019-86-09-49-59

UDC: 004.93

A real-time DeepLabv3+ for pedestrian segmentation

For Russian citation (Opticheskii Zhurnal):

W. Yang, J. L. Zhang, Z. Y. Xu, and K. Hu. A real-time DeepLabv3+ for pedestrian segmentation (Сегментация сцен с пешеходами в реальном времени на основе метода DeepLabv3+) [in English] // Opticheskii Zhurnal. 2019. V. 86. № 9. P. 49–59. http://doi.org/10.17586/1023-5086-2019-86-09-49-59

For citation (Journal of Optical Technology):

W. Yang, J. L. Zhang, Z. Y. Xu, and K. Hu, "Real-time DeepLabv3+ for pedestrian segmentation," Journal of Optical Technology 86(9), 570–578 (2019). https://doi.org/10.1364/JOT.86.000570

Abstract:

In this paper, we propose a real-time pedestrian segmentation method built on the structure of the semantic segmentation method DeepLabv3+. We design a shallow network as the backbone of DeepLabv3+ and propose a new convolution block to fuse multi-level and multi-type features. We first train our DeepLabv3+ on the Cityscapes dataset to segment objects of 19 classes, and then fine-tune it with the person and rider classes of Cityscapes and COCO as foreground and all other classes as background to obtain our pedestrian segmentation model. The experimental results show that our DeepLabv3+ achieves 89.0% mean Intersection-over-Union pedestrian segmentation accuracy on the Cityscapes validation set. Our method also reaches a speed of 33 frames per second on images with a resolution of 720×1280 on a GTX 1080Ti GPU. These results demonstrate that our method can be applied to various scenes at high speed.
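The two-stage training scheme described in the abstract (train on all 19 Cityscapes classes, then fine-tune with person and rider as the only foreground classes) can be illustrated with a short PyTorch fragment. The sketch below is only an illustration under our own assumptions, not the authors' implementation: torchvision's off-the-shelf DeepLabv3 with a ResNet-50 backbone stands in for the shallow custom backbone and fusion block proposed in the paper, the Cityscapes train ids 11 (person) and 12 (rider) are assumed, and the weight file name and optimizer settings are hypothetical.

# Minimal sketch (not the authors' implementation): stage-2 fine-tuning of a
# pretrained 19-class segmentation model for binary pedestrian segmentation.
# torchvision's DeepLabv3 (ResNet-50) stands in for the paper's shallow
# DeepLabv3+ backbone; Cityscapes train ids: person = 11, rider = 12.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

FOREGROUND_IDS = (11, 12)  # Cityscapes train ids for person and rider

def to_pedestrian_mask(label: torch.Tensor) -> torch.Tensor:
    """Collapse a 19-class Cityscapes label map to {0: background, 1: pedestrian}."""
    mask = torch.zeros_like(label)
    for cls in FOREGROUND_IDS:
        mask[label == cls] = 1
    return mask

# Stage 1: a 19-class model trained on Cityscapes (weights assumed to exist).
model = deeplabv3_resnet50(num_classes=19)
# model.load_state_dict(torch.load("cityscapes_19cls.pth"))  # hypothetical file
# Stage 2: swap the classifier head for 2 classes (background/pedestrian) and fine-tune.
model.classifier[4] = nn.Conv2d(256, 2, kernel_size=1)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(image: torch.Tensor, label: torch.Tensor) -> float:
    """One fine-tuning step on a batch of images and 19-class label maps."""
    model.train()
    optimizer.zero_grad()
    logits = model(image)["out"]                        # (N, 2, H, W)
    loss = criterion(logits, to_pedestrian_mask(label))
    loss.backward()
    optimizer.step()
    return loss.item()

In the paper, the fine-tuning set additionally draws person instances from COCO; extending the sketch would only require mapping those annotations to the same binary mask format.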

Keywords:

pedestrian segmentation, semantic segmentation, convolution

OCIS codes: 100.4996, 100.2000

References:

1. Zhao T., Nevatia R. Bayesian human segmentation in crowded situations // 2003 IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition. Madison, 2003. P. 459.
2. Hernández A., Reyes M., Escalera S., Radeva P. Spatio-temporal GrabCut human segmentation for face and pose recovery // 2010 IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition. San Francisco, 2010. P. 33–40.
3. Hernández-Vela A., Reyes M., Ponce V., Escalera S. GrabCut-based human segmentation in video sequences // Sensors. 2012. V. 12. № 11. P. 15376–15393.
4. Rother C., Kolmogorov V., Blake A. GrabCut: Interactive foreground extraction using iterated graph cuts // ACM Trans. Graph. 2004. V. 23. № 3. P. 309–314.
5. Long J., Shelhamer E., Darrell T. Fully convolutional networks for semantic segmentation // 2015 IEEE Conf. Computer Vision and Pattern Recognition. Boston, 2015. P. 3431–3440.
6. Ronneberger O., Fischer P., Brox T. U-Net: Convolutional networks for biomedical image segmentation // Internat. Conf. Medical Image Computing and Computer-Assisted Intervention. Munich, 2015. P. 234–241.
7. Paszke A., Chaurasia A., Kim S., Culurciello E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147 [cs.CV].
8. Romera E., Alvarez J.M., Bergasa L.M., Arroyo R. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation // IEEE Trans. Intell. Transp. Syst. 2018. V. 19. № 1. P. 263–272.
9. Badrinarayanan V., Kendall A., Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation // IEEE Trans. Pattern Anal. Mach. Intell. 2017. V. 39. № 12. P. 2481–2495.
10. Zhao H., Shi J., Qi X., Wang X., Jia J. Pyramid scene parsing network // 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR). Hawaii, 2017. P. 2881–2890.
11. Chen L.C., Papandreou G., Kokkinos I., Murphy K., Yuille A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs // IEEE Trans. Pattern Anal. Mach. Intell. 2018. V. 40. № 4. P. 834–848.
12. Chen L.C., Papandreou G., Schroff F., Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587 [cs.CV].
13. Chen L.C., Zhu Y., Papandreou G., Schroff F., Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation // European Conf. Computer Vision (ECCV). Munich, 2018. P. 833–851.
14. Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition // ICLR. San Diego, 2015.
15. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition // 2016 IEEE Conf. Computer Vision and Pattern Recognition. Las Vegas, 2016. P. 770–778.
16. Song C., Huang Y., Wang Z., Wang L. 1000 fps human segmentation with deep convolutional neural networks // 2015 3rd IAPR Asian Conf. Pattern Recognition (ACPR). Kuala Lumpur, 2015. P. 474–478.
17. Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., Huang Z., Karpathy A., Khosla A., Bernstein M., Berg A.C., Li F. ImageNet large scale visual recognition challenge // Int. J. Comput. Vis. 2015. V. 115. № 3. P. 211–252.
18. Cordts M., Omran M., Ramos S., Scharwächter T., Enzweiler M., Benenson R., Franke U., Roth S., Schiele B. The cityscapes dataset for semantic urban scene understanding // 2016 IEEE Conf. Computer Vision and Pattern Recognition (CVPR). Las Vegas, 2016. P. 3213–3223.
19. Lin T.-Y., Maire M., Belongie S., Bourdev L., Girshick R., Hays J., Perona P., Ramanan D., Zitnick C.L., Dollár P. Microsoft COCO: Common objects in context. arXiv:1405.0312 [cs.CV].
20. Chollet F. Xception: Deep learning with depthwise separable convolutions // 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR). Hawaii, 2017. P. 1800–1807.
21. Ioffe S., Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift // 2015 JMLR Internat. Conf. on Machine Learning (ICML). Lille, 2015. P. 448–456.
22. Lin T., Goyal P., Girshick R., He K., Dollár P. Focal loss for dense object detection // Int. Conf. on Computer Vision (ICCV). Venice, 2017. P. 2999–3007.
23. Paszke A., Gross S., Chintala S., Chanan G., Yang E., DeVito Z., Lin Z., Desmaison A., Antiga L., Lerer A. Automatic differentiation in PyTorch // NIPS 2017 Autodiff Workshop: The Future of Gradient-based Machine Learning Software and Techniques. Long Beach, 2017.