{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T14:07:39Z","timestamp":1762956459615,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T00:00:00Z","timestamp":1600041600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Pedestrian detection through Computer Vision is a building block for a multitude of applications. Recently, there has been an increasing interest in convolutional neural network-based architectures to execute such a task. One of these supervised networks\u2019 critical goals is to generalize the knowledge learned during the training phase to new scenarios with different characteristics. A suitably labeled dataset is essential to achieve this purpose. The main problem is that manually annotating a dataset usually requires a lot of human effort, and it is costly. To this end, we introduce ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected with the highly photo-realistic graphical engine of the video game GTA V (Grand Theft Auto V), where annotations are automatically acquired. However, when training solely on the synthetic dataset, the model experiences a Synthetic2Real domain shift leading to a performance drop when applied to real-world images. To mitigate this gap, we propose two different domain adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection. Experiments show that the network trained with ViPeD can generalize over unseen real-world scenarios better than the detector trained over real-world data, exploiting the variety of our synthetic dataset. Furthermore, we demonstrate that with our domain adaptation techniques, we can reduce the Synthetic2Real domain shift, making the two domains closer and obtaining a performance improvement when testing the network over the real-world images.<\/jats:p>","DOI":"10.3390\/s20185250","type":"journal-article","created":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T20:51:12Z","timestamp":1600116672000},"page":"5250","update-policy":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["Virtual to Real Adaptation of Pedestrian Detectors"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-6985-0439","authenticated-orcid":false,"given":"Luca","family":"Ciampi","sequence":"first","affiliation":[{"name":"Institute of Information Science and Technologies, National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0003-3011-2487","authenticated-orcid":false,"given":"Nicola","family":"Messina","sequence":"additional","affiliation":[{"name":"Institute of Information Science and Technologies, National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy"}]},{"given":"Fabrizio","family":"Falchi","sequence":"additional","affiliation":[{"name":"Institute of Information Science and Technologies, National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-3715-149X","authenticated-orcid":false,"given":"Claudio","family":"Gennaro","sequence":"additional","affiliation":[{"name":"Institute of Information Science and Technologies, National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy"}]},{"given":"Giuseppe","family":"Amato","sequence":"additional","affiliation":[{"name":"Institute of Information Science and Technologies, National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2260","DOI":"10.1109\/TCSVT.2016.2581660","article-title":"A low-complexity pedestrian detection framework for smart video surveillance systems","volume":"27","author":"Bilal","year":"2016","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1007\/s12652-016-0369-0","article-title":"Robust real-time pedestrian detection in surveillance videos","volume":"8","author":"Varga","year":"2017","journal-title":"J. Ambient. Intell. Humaniz. Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1007\/s11263-006-9038-7","article-title":"Multi-cue pedestrian detection and tracking from a moving vehicle","volume":"73","author":"Gavrila","year":"2007","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","unstructured":"Shashua, A., Gdalyahu, Y., and Hayun, G. (2004, January 14\u201317). Pedestrian detection for driving assistance systems: Single-frame classification and system level performance. Proceedings of the IEEE Intelligent Vehicles Symposium, Parma, Italy."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Shao, L., Han, J., Kohli, P., and Zhang, Z. (2014). RGB-D sensor-based computer vision assistive technology for visually impaired persons. Computer Vision and Machine Learning with RGB-D Sensors, Springer.","DOI":"10.1007\/978-3-319-08651-4"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L., Li, K., and Fei-Fei, L. (2009, January 22\u201324). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Fabbri, M., Lanzi, F., Calderara, S., Palazzi, A., Vezzani, R., and Cucchiara, R. (2018, January 8\u201314). Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01225-0_27"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Richter, S.R., Vineet, V., Roth, S., and Koltun, V. (2016, January 8\u201316). Playing for data: Ground truth from computer games. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_7"},{"key":"ref_11","unstructured":"Ros, G., Sellart, L., Materzynska, J., Vazquez, D., and Lopez, A.M. (July, January 26). The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Torralba, A., and Efros, A.A. (2011, January 20\u201325). Unbiased look at dataset bias. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995347"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Amato, G., Ciampi, L., Falchi, F., Gennaro, C., and Messina, N. (2019, January 09\u201313). Learning Pedestrian Detection from Virtual Worlds. Proceedings of the 20th International Conference of Image Analysis and Processing (ICIAP), Trento, Italy.","DOI":"10.1007\/978-3-030-30642-7_27"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Benenson, R., Omran, M., Hosang, J., and Schiele, B. (2014, January 06\u201312). Ten Years of Pedestrian Detection, What Have We Learned?. Proceedings of the 13th European Conference on Computer Vision (ECCV) Workshops, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-16181-5_47"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, S., Bauckhage, C., and Cremers, A.B. (2014, January 24\u201327). Informed Haar-like Features Improve Pedestrian Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.126"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, S., Benenson, R., and Schiele, B. (2015, January 7\u201312). Filtered channel features for pedestrian detection. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298784"},{"key":"ref_17","unstructured":"Zhang, S., Benenson, R., Omran, M., Hosang, J., and Schiele, B. (July, January 26). How Far Are We From Solving Pedestrian Detection?. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_18","unstructured":"Nam, W., Dollar, P., and Han, J.H. (2014, January 8\u201313). Local Decorrelation For Improved Pedestrian Detection. Proceedings of the 2014 Neural Information Processing Systems Conference (NIPS), Quebec, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tian, Y., Luo, P., Wang, X., and Tang, X. (2015, January 11\u201318). Deep Learning Strong Parts for Pedestrian Detection. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.221"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, F., Choi, W., and Lin, Y. (July, January 26). Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.234"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Cai, Z., Fan, Q., Feris, R.S., and Vasconcelos, N. (2016, January 8\u201316). A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sermanet, P., Kavukcuoglu, K., Chintala, S., and Lecun, Y. (2013, January 23\u201328). Pedestrian Detection with Unsupervised Multi-stage Feature Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.465"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","article-title":"Pedestrian Detection: An Evaluation of the State of the Art","volume":"34","author":"Dollar","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_25","unstructured":"Milan, A., Leal-Taix\u00e9, L., Reid, I., Roth, S., and Schindler, K. (2016). MOT16: A benchmark for multi-object tracking. arXiv."},{"key":"ref_26","unstructured":"Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., Roth, S., Schindler, K., and Leal-Taixe, L. (2019). CVPR19 Tracking and Detection Challenge: How crowded can it get?. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, S., Benenson, R., and Schiele, B. (2017). CityPersons: A Diverse Dataset for Pedestrian Detection. arXiv.","DOI":"10.1109\/CVPR.2017.474"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.neucom.2018.05.083","article-title":"Deep visual domain adaptation: A survey","volume":"312","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_29","unstructured":"Ciampi, L., Santiago, C., Costeira, J.P., Gennaro, C., and Amato, G. (2020). Unsupervised Vehicle Counting via Multiple Camera Domain Adaptation. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Tsai, Y.H., Hung, W.C., Schulter, S., Sohn, K., Yang, M.H., and Chandraker, M. (2018, January 18\u201322). Learning to adapt structured output space for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00780"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Kaneva, B., Torralba, A., and Freeman, W.T. (2011, January 6\u201313). Evaluation of image features using a photorealistic virtual world. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126508"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Mar\u00edn, J., V\u00e1zquez, D., Ger\u00f3nimo, D., and L\u00f3pez, A.M. (2010, January 13\u201318). Learning appearance in virtual scenarios for pedestrian detection. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540218"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bochinski, E., Eiselein, V., and Sikora, T. (2016, January 23\u201326). Training a convolutional neural network for multi-class object detection using solely virtual world data. Proceedings of the 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2016), Colorado Springs, CO, USA.","DOI":"10.1109\/AVSS.2016.7738056"},{"key":"ref_34","unstructured":"Leal-Taix\u00e9, L., Milan, A., Reid, I., Roth, S., and Schindler, K. (2015). Motchallenge 2015: Towards a benchmark for multi-target tracking. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Johnson-Roberson, M., Barto, C., Mehta, R., Sridhar, S.N., Rosaen, K., and Vasudevan, R. (2016). Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?. arXiv.","DOI":"10.1109\/ICRA.2017.7989092"},{"key":"ref_36","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_37","unstructured":"Ros, G., Stent, S., Alcantarilla, P.F., and Watanabe, T. (2016). Training constrained deconvolutional networks for road scene semantic segmentation. arXiv."},{"key":"ref_38","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_39","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Yu, F., Li, W., Li, Q., Liu, Y., Shi, X., and Yan, J. (2016, January 8\u201316). Poi: Multiple object tracking with high performance detection and appearance feature. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_3"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Lin, C., Lu, J., Wang, G., and Zhou, J. (2018, January 8\u201314). Graininess-Aware Deep Feature Learning for Pedestrian Detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_45"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/www.mdpi.com\/1424-8220\/20\/18\/5250\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:09:57Z","timestamp":1760177397000},"score":1,"resource":{"primary":{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/www.mdpi.com\/1424-8220\/20\/18\/5250"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,14]]},"references-count":41,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["s20185250"],"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.3390\/s20185250","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2020,9,14]]}}}