{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:54:27Z","timestamp":1774630467064,"version":"3.50.1"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,10,18]],"date-time":"2019-10-18T00:00:00Z","timestamp":1571356800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Australia-Germany Joint Research Co-operation Scheme"},{"name":"CMCRC scholarship"},{"DOI":"10.13039\/501100001655","name":"German Academic Exchange Service","doi-asserted-by":"crossref","award":["57388068"],"award-info":[{"award-number":["57388068"]}],"id":[{"id":"10.13039\/501100001655","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>\n            The computational complexity of neural networks for large-scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, permitting usage in a large number of applications. This article demonstrates, for the case where the neural network architecture and ternary weight values are known\n            <jats:italic>a priori<\/jats:italic>\n            , that extremely high throughput implementations of neural network inference can be made by customising the datapath and routing to remove unnecessary computations and data movement. This approach is ideally suited to FPGA implementations as a specialized implementation of a trained network improves efficiency while still retaining generality with the reconfigurability of an FPGA. 
A VGG-style network with ternary weights and fixed point activations is implemented for the CIFAR10 dataset on Amazon\u2019s AWS F1 instance. This article demonstrates how to remove 90% of the operations in convolutional layers by exploiting sparsity and compile-time optimizations. The implementation in hardware achieves 90.9 \u00b1 0.1% accuracy and 122k frames per second, with a latency of only 29\u00b5s, which is the fastest CNN inference implementation reported so far on an FPGA.\n          <\/jats:p>","DOI":"10.1145\/3359983","type":"journal-article","created":{"date-parts":[[2019,10,18]],"date-time":"2019-10-18T12:58:45Z","timestamp":1571403525000},"page":"1-23","update-policy":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Unrolling Ternary Neural Networks"],"prefix":"10.1145","volume":"12","author":[{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-5884-2417","authenticated-orcid":false,"given":"Stephen","family":"Tridgell","sequence":"first","affiliation":[{"name":"The University of Sydney, NSW, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Kumm","sequence":"additional","affiliation":[{"name":"Fulda University of Applied Sciences, Fulda, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Hardieck","sequence":"additional","affiliation":[{"name":"University of Kassel, Kassel, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Boland","sequence":"additional","affiliation":[{"name":"The University of Sydney, NSW, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Duncan","family":"Moss","sequence":"additional","affiliation":[{"name":"The University of Sydney, NSW, 
Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Zipf","sequence":"additional","affiliation":[{"name":"University of Kassel, Kassel, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philip H. W.","family":"Leong","sequence":"additional","affiliation":[{"name":"The University of Sydney, NSW, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2682138"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228584"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2018.00032"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/SiPS.2017.8110021"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1984.1164433"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_2_1_7_1","unstructured":"Matthieu Courbariaux Itay Hubara Daniel Soudry Ran El-Yaniv and Yoshua Bengio. 2016. Binarized neural networks. In Advances in Neural Information Processing Systems. 4107--4115."},{"key":"e_1_2_1_8_1","volume-title":"Leong","author":"Faraone Julian","year":"2017","unstructured":"Julian Faraone, Nicholas Fraser, Giulio Gambardella, Michaela Blott, and Philip H. W. Leong. 2017. 
Compressing low precision deep neural networks using sparsity-induced regularization in ternary networks. In Proceedings of the International Conference on Neural Information Processing. Springer, 393--404."},{"key":"e_1_2_1_9_1","unstructured":"Linux Foundation. 2015. Data Plane Development Kit (DPDK). Retrieved from https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/www.dpdk.org."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Nicholas J. Fraser Yaman Umuroglu Giulio Gambardella Michaela Blott Philip Leong Magnus Jahre and Kees Vissers. 2017. Scaling binarized neural networks on reconfigurable logic. In Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms. ACM 25--30.","DOI":"10.1145\/3029580.3029586"},{"key":"e_1_2_1_11_1","volume-title":"Fractional max-pooling","author":"Graham Ben","year":"2014","unstructured":"Ben Graham. 2014. Fractional max-pooling (2014). 
arXiv preprint arXiv:1412.6071 (2014)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICECS.2018.8617860"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2005.859052"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 30th IEEE International System-on-Chip Conference (SOCC'17)","author":"Kim Jin Hee","unstructured":"Jin Hee Kim, Brett Grady, Ruolong Lian, John Brothers, and Jason H. Anderson. 2017. FPGA-based CNN inference accelerator synthesized from multi-threaded C software. In Proceedings of the 30th IEEE International System-on-Chip Conference (SOCC'17). IEEE, 268--273."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2017.2701365"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2012.6272072"},{"key":"e_1_2_1_19_1","volume-title":"Deep learning. Nature 521, 7553 (5","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (5 2015), 436--444. DOI:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1038\/nature14539"},{"key":"e_1_2_1_20_1","volume-title":"Ternary weight networks. arXiv preprint arXiv:1605.04711","author":"Li Fengfu","year":"2016","unstructured":"
Fengfu Li, Bo Zhang, and Bin Liu. 2016. Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021786"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.09.046"},{"key":"e_1_2_1_23_1","volume-title":"Ternary neural networks with fine-grained quantization. arXiv preprint arXiv:1705.01462","author":"Mellempudi Naveen","year":"2017","unstructured":"Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, and Pradeep Dubey. 2017. Ternary neural networks with fine-grained quantization. arXiv preprint arXiv:1705.01462 (2017)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284357"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 27th International Conference on Field Programmable Logic and Applications (FPL\u201917)","author":"Moss Duncan J. M.","unstructured":"Duncan J. M. Moss, Eriko Nurvitadhi, Jaewoong Sim, Asit Mishra, Debbie Marr, Suchit Subhaschandra, and Philip H. W. Leong. 2017. High performance binary neural networks on the Xeon+ FPGA\u2122 platform. In Proceedings of the 27th International Conference on Field Programmable Logic and Applications (FPL\u201917). 
IEEE, 1--4."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056850"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847265"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952679"},{"key":"e_1_2_1_31_1","volume-title":"Reconfigurable processor for deep learning in autonomous vehicles. ITU Journal: ICT Discoveries 1","author":"Wang Yanshu","year":"2017","unstructured":"Yanshu Wang, Shuang Liang, Song Yao, Yi Shan, Song Han, Junjie Peng, and Hong Xia Luo. 2017. Reconfigurable processor for deep learning in autonomous vehicles. ITU Journal: ICT Discoveries 1 (2017)."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the World Congress on Engineering and Computer Science","volume":"2","author":"Wu Ning","year":"2013","unstructured":"Ning Wu, Xiaoqiang Zhang, Yunfei Ye, and Lidong Lan. 2013. Improving common subexpression elimination algorithm with a new gate-level delay computing method. In Proceedings of the World Congress on Engineering and Computer Science, Vol. 2."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021727"},{"key":"e_1_2_1_34_1","volume-title":"DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. 
arXiv preprint arXiv:1606.06160","author":"Zhou Shuchang","year":"2016","unstructured":"Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/dl.acm.org\/doi\/10.1145\/3359983","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/dl.acm.org\/doi\/pdf\/10.1145\/3359983","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:27Z","timestamp":1750202007000},"score":1,"resource":{"primary":{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/dl.acm.org\/doi\/10.1145\/3359983"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,18]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3359983"],"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/3359983","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,18]]},"assertion":[{"value":"2018-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}