{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T23:26:25Z","timestamp":1768692385512,"version":"3.49.0"},"reference-count":64,"publisher":"Wiley","issue":"8","license":[{"start":{"date-parts":[[2022,2,10]],"date-time":"2022-02-10T00:00:00Z","timestamp":1644451200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"publisher","award":["001"],"award-info":[{"award-number":["001"]}],"id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Software Testing Verif & Rel"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Summary<\/jats:title><jats:p>This paper proposes <jats:sc>TRANSMUT\u2010Spark<\/jats:sc> for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics\/processing that hides the inherent complexity of parallel big data programming. Nonetheless, programmers must cleverly combine Spark built\u2010in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by big data processing and avoid substantial production losses. Many programming details in Spark data processing code are prone to false statements that must be correctly and automatically tested. This paper explores the application of mutation testing in Spark programs, a fault\u2010based testing technique that relies on fault simulation to evaluate and design test sets. 
The paper introduces <jats:sc>TRANSMUT\u2010Spark<\/jats:sc> for testing Spark programs by automating the most laborious steps of the process and fully executing the mutation testing process. The paper describes how the <jats:sc>TRANSMUT\u2010Spark<\/jats:sc> automates the mutant generation, test execution and adequacy analysis phases of mutation testing. It also discusses the results of experiments to validate the tool and argues its scope and limitations.<\/jats:p>","DOI":"10.1002\/stvr.1809","type":"journal-article","created":{"date-parts":[[2022,2,11]],"date-time":"2022-02-11T03:45:23Z","timestamp":1644551123000},"update-policy":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["TRANSMUT\u2010Spark: Transformation mutation for Apache Spark"],"prefix":"10.1002","volume":"32","author":[{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-8142-2525","authenticated-orcid":false,"given":"Jo\u00e3o Batista","family":"de Souza Neto","sequence":"first","affiliation":[{"name":"Department of Informatics and Applied Mathematics (DIMAp) Federal University of Rio Grande do Norte (UFRN) Natal Brazil"},{"name":"Department of Informatics, Management and Design (DIGD\u2010DV) Federal Center for Technological Education of Minas Gerais Divin\u00f3polis Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-7707-8469","authenticated-orcid":false,"given":"Anamaria","family":"Martins Moreira","sequence":"additional","affiliation":[{"name":"Computer Science Department (DCC) Federal University of Rio de Janeiro (UFRJ) Rio de Janeiro 
Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0001-9545-1821","authenticated-orcid":false,"given":"Genoveva","family":"Vargas\u2010Solar","sequence":"additional","affiliation":[{"name":"French Council of Scientific Research (CNRS), LIRIS Lyon France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0001-5589-3895","authenticated-orcid":false,"given":"Martin A.","family":"Musicante","sequence":"additional","affiliation":[{"name":"Department of Informatics and Applied Mathematics (DIMAp) Federal University of Rio Grande do Norte (UFRN) Natal Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2022,2,10]]},"reference":[{"key":"e_1_2_11_2_1","unstructured":"Hadoop.Apache Hadoop documentation 2019.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/hadoop.apache.org\/docs\/r2.7.3\/"},{"issue":"4","key":"e_1_2_11_3_1","first-page":"28","article-title":"Apache Flink: stream and batch processing in a single engine","volume":"38","author":"Carbone P","year":"2015","journal-title":"IEEE Data Eng Bull"},{"key":"e_1_2_11_4_1","unstructured":"YuY IsardM FetterlyD BudiuM ErlingssonU GundaPK CurreyJ.DryadLINQ: a system for general\u2010purpose distributed data\u2010parallel computing using a high\u2010level language. 
InProceedings of the 8th USENIX Conference on Operating Systems Design and Implementation OSDI'08.USENIX Association:Berkeley CA USA;2008 p.1\u201314.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/dl.acm.org\/citation.cfm?id=1855741.1855742"},{"key":"e_1_2_11_5_1","unstructured":"BeamA.Apache Beam: an advanced unified programming model 2016.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/beam.apache.org\/"},{"key":"e_1_2_11_6_1","unstructured":"ZahariaM ChowdhuryM FranklinMJ ShenkerS StoicaI.Spark: cluster computing with working sets. InProceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing HotCloud'10.USENIX Association:Berkeley CA USA;2010 p.10.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/dl.acm.org\/citation.cfm?id=1863103.1863113"},{"key":"e_1_2_11_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2016.05.285"},{"key":"e_1_2_11_8_1","doi-asserted-by":"publisher","DOI":"10.1080\/08982112.2014.846119"},{"key":"e_1_2_11_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2015.11.006"},{"key":"e_1_2_11_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/C-M.1978.218136"},{"key":"e_1_2_11_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-49435-3_30"},{"key":"e_1_2_11_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-5939-6_19"},{"key":"e_1_2_11_13_1","unstructured":"DelamaroME MaldonadoJC.Proteum\u2014a tool for the assessment of test adequacy for C programs. 
InProceedings of the Conference on Performability in Computing Systems (PCS'96).New Brunswick New Jersey;1996 p.79\u201395."},{"key":"e_1_2_11_14_1","unstructured":"Spark.Apache Spark documentation 2019.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/spark.apache.org\/docs\/2.2.0\/"},{"key":"e_1_2_11_15_1","unstructured":"ZahariaM ChowdhuryM DasT DaveA MaJ McCauleyM FranklinMJ ShenkerS StoicaI.Resilient distributed datasets: a fault\u2010tolerant abstraction for in\u2010memory cluster computing. InProceedings of the 9th USENIX Conference on Networked Systems Design and Implementation.NSDI'12.USENIX Association:Berkeley CA USA;2012 p.2.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/dl.acm.org\/citation.cfm?id=2228298.2228301"},{"key":"e_1_2_11_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/9781119254805"},{"key":"e_1_2_11_17_1","volume-title":"Introduction to software testing","author":"Ammann P","year":"2017"},{"key":"e_1_2_11_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2010.62"},{"key":"e_1_2_11_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0164-1212(96)00154-9"},{"key":"e_1_2_11_20_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-024X(199602)26:2<165::AID-SPE5>3.0.CO;2-K"},{"key":"e_1_2_11_21_1","unstructured":"WalshPJ.A measure of test case completeness.Ph.D. Thesis State University of New York at Binghamton Binghamton NY USA;1985 p. AAI8514636."},{"key":"e_1_2_11_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.250104"},{"key":"e_1_2_11_23_1","unstructured":"DeanJ GhemawatS.MapReduce: simplified data processing on large clusters. InOSDI'04: Sixth Symposium on Operating System Design and Implementation:San Francisco CA 2004 p.137\u2013150."},{"key":"e_1_2_11_24_1","unstructured":"CamargoLC VergilioSR.MapReduce program testing: a systematic mapping study. 
InChilean Computer Science Society (SCCC) 32nd International Conference of the Computation 2013."},{"key":"e_1_2_11_25_1","doi-asserted-by":"publisher","DOI":"10.1002\/smr.2120"},{"key":"e_1_2_11_26_1","doi-asserted-by":"crossref","unstructured":"CsallnerC FegarasL LiC.New ideas track: testing MapReduce\u2010style programs. InProceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering.ESEC\/FSE '11.ACM:New York NY USA;2011 p.504\u2013507.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/2025113.2025204","DOI":"10.1145\/2025113.2025204"},{"key":"e_1_2_11_27_1","doi-asserted-by":"crossref","unstructured":"LiK ReichenbachC SmaragdakisY DiaoY CsallnerC.SEDGE: symbolic example data generation for dataflow programs. In2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE) 2013 p.235\u2013245.","DOI":"10.1109\/ASE.2013.6693083"},{"key":"e_1_2_11_28_1","doi-asserted-by":"crossref","unstructured":"OlstonC ReedB SrivastavaU KumarR TomkinsA.Pig Latin: a not\u2010so\u2010foreign language for data processing. InProceedings of the 2008 ACM SIGMOD International Conference on Management of Data.SIGMOD '08.ACM:New York NY USA;2008 p.1099\u20131110.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/1376616.1376726","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_11_29_1","doi-asserted-by":"crossref","unstructured":"XuZ HirzelM RothermelG WuK.Testing properties of dataflow program operators. In2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE) 2013 p.103\u2013113.","DOI":"10.1109\/ASE.2013.6693071"},{"key":"e_1_2_11_30_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2013.2243535"},{"key":"e_1_2_11_31_1","doi-asserted-by":"crossref","unstructured":"Mor\u00e1nJ de laRivaC TuyaJ.MRTree: functional testing based on MapReduce's execution behaviour. 
In2014 International Conference on Future Internet of Things and Cloud 2014 p.379\u2013384.","DOI":"10.1109\/FiCloud.2014.67"},{"key":"e_1_2_11_32_1","doi-asserted-by":"crossref","unstructured":"Mor\u00e1nJ RivaC TuyaJ.Testing data transformations in MapReduce programs. InProceedings of the 6th International Workshop on Automating Test Case Design Selection and Evaluation.A\u2010TEST 2015.ACM:New York NY USA;2015 p.20\u201325.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/2804322.2804326","DOI":"10.1145\/2804322.2804326"},{"key":"e_1_2_11_33_1","unstructured":"MattosAJ.Test data generation for testing MapReduce systems.M.Sc. Thesis Universidade Federal do Paran\u00e1 2011."},{"key":"e_1_2_11_34_1","doi-asserted-by":"crossref","unstructured":"LiN EscalonaA GuoY OffuttJ.A scalable big data test framework. In2015 IEEE 8th International Conference on Software Testing Verification and Validation (ICST) 2015 p.1\u20132.","DOI":"10.1109\/ICST.2015.7102619"},{"key":"e_1_2_11_35_1","unstructured":"ChenY\u2010F HongC\u2010D SinhaN WangB\u2010Y.Commutativity of reducers. InTools and algorithms for the construction and analysis of systems BaierC TinelliC(eds).Springer Berlin Heidelberg:Berlin Heidelberg;2015 p.131\u2013146."},{"key":"e_1_2_11_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-59647-1_31"},{"key":"e_1_2_11_37_1","doi-asserted-by":"crossref","unstructured":"D\u00f6rreJ ApelS LengauerC.Static type checking of Hadoop MapReduce programs. 
InProceedings of the Second International Workshop on MapReduce and Its Applications.MapReduce '11.Association for Computing Machinery:New York NY USA;2011 p.17\u201324.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/1996092.1996096","DOI":"10.1145\/1996092.1996096"},{"key":"e_1_2_11_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24690-6_24"},{"key":"e_1_2_11_39_1","volume-title":"Interactive theorem proving and program development: Coq'Art: the calculus of inductive constructions","author":"Bertot Y","year":"2010"},{"key":"e_1_2_11_40_1","doi-asserted-by":"crossref","unstructured":"BrilloutA HeN MazzucchiM KroeningD PurandareM R\u00fcmmerP WeissenbacherG.Mutation\u2010based test case generation for Simulink models. InInternational Symposium on Formal Methods for Components and Objects. Springer;2009 p.208\u2013227.","DOI":"10.1007\/978-3-642-17071-3_11"},{"key":"e_1_2_11_41_1","unstructured":"MovvaV.Automatic test suite generation for scientific MATLAB code.M.Sc. Thesis University of Minnesota 2015."},{"key":"e_1_2_11_42_1","doi-asserted-by":"crossref","unstructured":"XuZ HirzelM RothermelG WuK\u2010L.Testing properties of dataflow program operators. 
InProceedings of the 28th IEEE\/ACM International Conference on Automated Software Engineering ASE'13.IEEE Press;2013 p.103\u2013113.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1109\/ASE.2013.6693071","DOI":"10.1109\/ASE.2013.6693071"},{"key":"e_1_2_11_43_1","unstructured":"KarauH.Spark testing base 2015.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/github.com\/holdenk\/spark-testing-base"},{"key":"e_1_2_11_44_1","unstructured":"Otto Group.Flinkspector 2016.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/github.com\/ottogroup\/flink-spector"},{"key":"e_1_2_11_45_1","unstructured":"RiescoA Rodr\u00edguez\u2010Hortal\u00e1J.sscheck: ScalaCheck for Spark 2015.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/github.com\/juanrh\/sscheck"},{"key":"e_1_2_11_46_1","doi-asserted-by":"crossref","unstructured":"ClaessenK HughesJ.QuickCheck: a lightweight tool for random testing of Haskell programs. InProceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming.ICFP '0.Association for Computing Machinery:New York NY USA;2000 p.268\u2013279.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/351240.351266","DOI":"10.1145\/351240.351266"},{"key":"e_1_2_11_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-33693-0_25"},{"key":"e_1_2_11_48_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1471068419000012"},{"key":"e_1_2_11_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2947361"},{"key":"e_1_2_11_50_1","unstructured":"Souza NetoJB.Transformation mutation for Spark programs testing.Ph.D. Thesis Federal University of Rio Grande do Norte (UFRN) Natal\u2010RN Brazil;2020. 
(In Portuguese)."},{"key":"e_1_2_11_51_1","volume-title":"SBT in action: the simple Scala build tool","author":"Suereth J","year":"2015"},{"key":"e_1_2_11_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-63882-5_7"},{"key":"e_1_2_11_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2010.79"},{"key":"e_1_2_11_54_1","doi-asserted-by":"crossref","unstructured":"OffuttAJ RothermelG ZapfC.An experimental evaluation of selective mutation. InProceedings of 1993 15th International Conference on Software Engineering 1993 p.100\u2013107.","DOI":"10.1109\/ICSE.1993.346062"},{"key":"e_1_2_11_55_1","doi-asserted-by":"crossref","unstructured":"UntchRH OffuttAJ HarroldMJ.Mutation analysis using mutant schemata. InProceedings of the 1993 ACM SIGSOFT International Symposium on Software Testing and Analysis.ISSTA '93.Association for Computing Machinery:New York NY USA;1993 p.139\u2013148.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/154183.154265","DOI":"10.1145\/154183.154265"},{"key":"e_1_2_11_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-5939-6_7"},{"key":"e_1_2_11_57_1","doi-asserted-by":"crossref","unstructured":"ChoiBJ DeMilloRA KrauserEW MartinRJ MathurAP OffuttAJ PanH SpaffordEH.The Mothra tool set (software testing). In[1989] Proceedings of the Twenty\u2010Second Annual Hawaii International Conference on System Sciences. Volume II: Software Track vol. 
2 1989;275\u2013284 vol.2.","DOI":"10.1109\/HICSS.1989.48002"},{"key":"e_1_2_11_58_1","volume-title":"Programming in Scala: updated for Scala 2.12","author":"Odersky M","year":"2016"},{"key":"e_1_2_11_59_1","doi-asserted-by":"crossref","unstructured":"INFO SUPPORT.Stryker Mutator 2020.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/stryker-mutator.io","DOI":"10.5465\/AMBPP.2020.12672abstract"},{"key":"e_1_2_11_60_1","doi-asserted-by":"crossref","unstructured":"ColesH LaurentT HenardC PapadakisM VentresqueA.Pit: a practical mutation testing tool for java (demo). InProceedings of the 25th International Symposium on Software Testing and Analysis ISSTA 2016.Association for Computing Machinery:New York NY USA;2016 p.449\u2013452.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/2931037.2948707","DOI":"10.1145\/2931037.2948707"},{"key":"e_1_2_11_61_1","doi-asserted-by":"crossref","unstructured":"SarwarB KarypisG KonstanJ RiedlJ.Item\u2010based collaborative filtering recommendation algorithms. InProceedings of the 10th International Conference on World Wide Web WWW '01.ACM:New York NY USA;2001 p.285\u2013295.","DOI":"10.1145\/371920.372071"},{"key":"e_1_2_11_62_1","unstructured":"AMPLab.Big data benchmark 2019.https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/amplab.cs.berkeley.edu\/benchmark\/"},{"issue":"4","key":"e_1_2_11_63_1","first-page":"19:1","article-title":"The MovieLens datasets: history and context","volume":"5","author":"Harper FM","year":"2015","journal-title":"ACM Trans Interact Intel Syst"},{"key":"e_1_2_11_64_1","doi-asserted-by":"crossref","unstructured":"Z\u00f6llerM\u2010A HuberMF.Benchmark and survey of automated machine learning frameworks 2021. Journal of Artificial Intelligence Research.","DOI":"10.1613\/jair.1.11854"},{"key":"e_1_2_11_65_1","doi-asserted-by":"crossref","unstructured":"FerrariFC MaldonadoJC RashidA.Mutation testing for aspect\u2010oriented programs. 
In2008 1st International Conference on Software Testing Verification and Validation 2008 p.52\u201361.","DOI":"10.1109\/ICST.2008.37"}],"container-title":["Software Testing, Verification and Reliability"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/stvr.1809","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/stvr.1809","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/stvr.1809","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,18]],"date-time":"2024-09-18T08:43:42Z","timestamp":1726649022000},"score":1,"resource":{"primary":{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/10.1002\/stvr.1809"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,10]]},"references-count":64,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["10.1002\/stvr.1809"],"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1002\/stvr.1809","archive":["Portico"],"relation":{},"ISSN":["0960-0833","1099-1689"],"issn-type":[{"value":"0960-0833","type":"print"},{"value":"1099-1689","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,10]]},"assertion":[{"value":"2021-05-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-02-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e1809"}}