{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,9,12]],"date-time":"2023-09-12T07:15:44Z","timestamp":1694502944702},"reference-count":23,"publisher":"Wiley","issue":"4","license":[{"start":{"date-parts":[[2012,7,4]],"date-time":"2012-07-04T00:00:00Z","timestamp":1341360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2013,2]]},"abstract":"<jats:title>SUMMARY<\/jats:title><jats:p>The block Wiedemann (BW) algorithm is frequently used to solve sparse linear systems over GF(2). Iterative sparse matrix\u2013vector multiplication is the most time\u2010consuming operation. The necessity to accelerate this step is motivated by the application of BW to very large matrices used in the linear algebra step of the number field sieve (NFS) for integer factorization. In this paper, we derive an efficient CUDA implementation of this operation by using a newly designed hybrid sparse matrix format. This leads to speedups between 4 and 8 on a single graphics processing unit (GPU) for a number of tested NFS matrices compared with an optimized multicore implementation. We further present a GPU cluster implementation of the full BW for NFS matrices. A small\u2010sized GPU cluster is able to outperform CPU clusters of larger size for large matrices such as the one obtained from the Kilobit special NFS factorization. Copyright \u00a9 2012 John Wiley &amp; Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.2896","type":"journal-article","created":{"date-parts":[[2012,7,4]],"date-time":"2012-07-04T21:58:49Z","timestamp":1341439129000},"page":"586-603","source":"Crossref","is-referenced-by-count":4,"title":["Iterative sparse matrix\u2013vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi\u2010graphics processing unit systems"],"prefix":"10.1002","volume":"25","author":[{"given":"Bertil","family":"Schmidt","sequence":"first","affiliation":[{"name":"Institut f\u00fcr Informatik University of Mainz  55128 Mainz Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hans","family":"Aribowo","sequence":"additional","affiliation":[{"name":"Institut f\u00fcr Informatik University of Mainz  55128 Mainz Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hoang\u2010Vu","family":"Dang","sequence":"additional","affiliation":[{"name":"Institut f\u00fcr Informatik University of Mainz  55128 Mainz Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2012,7,4]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.2307\/2153413"},{"key":"e_1_2_10_3_1","first-page":"106","article-title":"A block Lanczos algorithm for finding dependencies over GF(2)","author":"Montgomery PL","year":"1995","journal-title":"Theory and Application of Cryptographic Techniques"},{"key":"e_1_2_10_4_1","doi-asserted-by":"crossref","unstructured":"AokiK FrankeJ KleinjungT LenstraAK OsvikDA.A kilobit special number field sieve factorization.ASIACRYPT Kuching Malaysia 2007;1\u201312.","DOI":"10.1007\/978-3-540-76900-2_1"},{"key":"e_1_2_10_5_1","unstructured":"KleinjungTet\u2009al.Factorization of a 768\u2010Bit RSA Modulus.International Crytology Conference Santa Barbara CA USA 2010;333\u2013350. DOI:10.1007\/978\u20103\u2010642\u201014623\u20107_18."},{"key":"e_1_2_10_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-75651-4_5"},{"key":"e_1_2_10_7_1","doi-asserted-by":"crossref","unstructured":"HwangW KimD.Load balanced block Lanczos algorithm over GF(2) for factorization of large keys.13th International Conference on High Performance Computing Bangalore India 2006;375\u2013386.","DOI":"10.1007\/11945918_38"},{"key":"e_1_2_10_8_1","doi-asserted-by":"crossref","unstructured":"KleinjungT NussbaumL Thom\u00e9E.Using a grid platform for slving large sparse linear systems over GF(2).11th ACM\/IEEE International Conference on Grid Computing (Grid 2010) Brussels Belgique 2010;161\u2013168.","DOI":"10.1109\/GRID.2010.5697952"},{"key":"e_1_2_10_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-010-0149-0"},{"key":"e_1_2_10_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.41"},{"key":"e_1_2_10_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_2_10_12_1","unstructured":"BellN GarlandM.Efficient sparse matrix\u2013vector multiplication on CUDA.NVIDIA Technical Report NVR\u20102008\u2010004 NVIDIA Corporation 2008."},{"key":"e_1_2_10_13_1","article-title":"Exact sparse matrix\u2013vector multiplication on GPU's and multicore architectures","volume":"1203","author":"Boyer B","year":"2010","journal-title":"Computing Research RepositoryComputation"},{"key":"e_1_2_10_14_1","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1145\/1693453.1693471","article-title":"Model\u2010driven autotuning of sparse matrix\u2013vector multiply on GPUs","volume":"45","author":"Choi JW","year":"2010","journal-title":"ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"MonakovA LokhmotovA AvetisyanA.Automatically tuning sparse matrix\u2013vector multiplication for GPU architectures.5th International Conference on High Performance Embedded Architectures and Compilers Pisa Italy 2010;111\u2013125.","DOI":"10.1007\/978-3-642-11515-8_10"},{"key":"e_1_2_10_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23397-5_41"},{"key":"e_1_2_10_17_1","doi-asserted-by":"publisher","DOI":"10.1006\/jsco.2002.0533"},{"key":"e_1_2_10_18_1","unstructured":"NVIDIA.CUDA C Programming Guide 2011. Available from:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/developer.download.nvidia.com\/compute\/DevZone\/docs\/html\/C\/doc\/CUDA_C_Programming_Guide.pdf[Accessed on June 21 2012]."},{"key":"e_1_2_10_19_1","unstructured":"BellN GarlandM.Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations. (2010). Available from:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/cusp\u2010library.googlecode.com[Accessed on June 21 2012] version 0.1.0."},{"key":"e_1_2_10_20_1","unstructured":"HarrisM.Optimizing parallel reduction in CUDA 2007. Available from:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/developer.download.nvidia.com\/assets\/cuda\/files\/reduction.pdf[Accessed on June 21 2012]."},{"key":"e_1_2_10_21_1","unstructured":"GaudryPet\u2009al.CADO\u2010NFS 2010. Available from:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/cado\u2010nfs.gforge.inria.fr\/[Accessed on June 21 2012]."},{"key":"e_1_2_10_22_1","unstructured":"BonenbergerD KroneM.2010.Factorization of RSA\u2010170.Technical Report Ostfalia University of Applied Sciences. Available from:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/public.fh\u2010wolfenbuettel.de\/~kronema\/pdf\/rsa170.pdf[Accessed on June 21 2012]."},{"key":"e_1_2_10_23_1","unstructured":"PapadopoulosJ.Msieve 2010. Available from:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/sourceforge.net\/projects\/msieve\/[Accessed on June 21 2012]."},{"key":"e_1_2_10_24_1","unstructured":"RSA190 factored 2010. Available from:https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/www.mersenneforum.org\/showthread.php?t=14177[Accessed on June 21 2012]."}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.2896","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.2896","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,11]],"date-time":"2023-09-11T23:45:04Z","timestamp":1694475904000},"score":1,"resource":{"primary":{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.2896"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,7,4]]},"references-count":23,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,2]]}},"alternative-id":["10.1002\/cpe.2896"],"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1002\/cpe.2896","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"value":"1532-0626","type":"print"},{"value":"1532-0634","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,4]]}}}