{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T05:10:45Z","timestamp":1755839445568,"version":"3.40.5"},"reference-count":39,"publisher":"Wiley","issue":"3","license":[{"start":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T00:00:00Z","timestamp":1668729600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/http\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Softw Pract Exp"],"published-print":{"date-parts":[[2023,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Knowledge data has been widely applied to artificial intelligence applications for interpretable and complex reasoning. Modern knowledge bases are constructed via automatic knowledge extraction from open\u2010accessible sources. Thus the sizes of KBs are continuously growing, heavily burdening the maintenance and application of the knowledge data. Besides the grammatical redundancies, semantically repeated information also frequently appears in knowledge bases but is still under\u2010explored. Existing semantic compressors fail to efficiently discover expressive patterns and thus perform unsatisfyingly on knowledge data. This article proposes <jats:sc>SInC<\/jats:sc>, a semantic inductive compressor, to efficiently induce first\u2010order Horn rules and semantically compress knowledge bases. <jats:sc>SInC<\/jats:sc>\u00a0improves the scalability of top\u2010down rule mining by batching correlated records in the cache and further optimizes the pruning of duplication and specialization via an identifier structure of Horn rules. <jats:sc>SInC<\/jats:sc>\u00a0was evaluated on real\u2010world and synthetic datasets and compared against the state\u2010of\u2010the\u2010art. The results show that the batched caching speed up the rule mining procedure by more than two orders while consuming fewer than three times memory space. The identifier technique speeds up the duplication and specialization pruning by orders of magnitude with less than 5\u2030 and 15% error rates, respectively. <jats:sc>SInC<\/jats:sc>\u00a0outperforms the state\u2010of\u2010the\u2010art from the perspective of overall compression on both scalability and compression effect.<\/jats:p>","DOI":"10.1002\/spe.3165","type":"journal-article","created":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T11:21:34Z","timestamp":1668770494000},"page":"682-703","update-policy":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Horn rule discovery with batched caching and rule identifier for proficient compressor of knowledge data"],"prefix":"10.1002","volume":"53","author":[{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-8278-3269","authenticated-orcid":false,"given":"Ruoyu","family":"Wang","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering University of New South Wales  New South Wales Sydney Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0003-2342-7421","authenticated-orcid":false,"given":"Daniel","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering University of New South Wales  New South Wales Sydney Australia"},{"name":"Enhitech LLC.  Shanghai China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-9814-6029","authenticated-orcid":false,"given":"Raymond","family":"Wong","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering University of New South Wales  New South Wales Sydney Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-6610-1328","authenticated-orcid":false,"given":"Rajiv","family":"Ranjan","sequence":"additional","affiliation":[{"name":"Newcastle University  Newcastle upon Tyne UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2022,11,18]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_2_12_2_1","DOI":"10.1007\/s10115-017-1100-y"},{"doi-asserted-by":"publisher","key":"e_1_2_12_3_1","DOI":"10.1609\/aaai.v33i01.33017346"},{"doi-asserted-by":"publisher","key":"e_1_2_12_4_1","DOI":"10.1007\/978-3-030-49461-2_34"},{"doi-asserted-by":"publisher","key":"e_1_2_12_5_1","DOI":"10.3233\/SW-140134"},{"doi-asserted-by":"crossref","unstructured":"SachanM.Knowledge graph embedding compression. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics;2020:2681\u20102691.","key":"e_1_2_12_6_1","DOI":"10.18653\/v1\/2020.acl-main.238"},{"key":"e_1_2_12_7_1","first-page":"1761","volume-title":"Adaptive Low\u2010Level Storage of Very Large Knowledge Graphs","author":"Urbani J","year":"2020"},{"doi-asserted-by":"publisher","key":"e_1_2_12_8_1","DOI":"10.1016\/j.websem.2013.01.002"},{"unstructured":"ChklovskiT PantelP.Verbocean: mining the web for fine\u2010grained semantic verb relations. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing;2004:33\u201040.","key":"e_1_2_12_9_1"},{"doi-asserted-by":"publisher","key":"e_1_2_12_10_1","DOI":"10.1007\/978-3-642-38288-8_12"},{"key":"e_1_2_12_11_1","first-page":"1115","volume-title":"What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization","author":"Belth C","year":"2020"},{"doi-asserted-by":"publisher","key":"e_1_2_12_12_1","DOI":"10.1109\/TKDE.2017.2766634"},{"doi-asserted-by":"publisher","key":"e_1_2_12_13_1","DOI":"10.1007\/s00778-015-0394-1"},{"doi-asserted-by":"publisher","key":"e_1_2_12_14_1","DOI":"10.1007\/s10994-008-5094-2"},{"doi-asserted-by":"publisher","key":"e_1_2_12_15_1","DOI":"10.1007\/s10994-011-5245-8"},{"doi-asserted-by":"publisher","key":"e_1_2_12_16_1","DOI":"10.1007\/BF00117105"},{"doi-asserted-by":"publisher","key":"e_1_2_12_17_1","DOI":"10.14778\/2735508.2735510"},{"doi-asserted-by":"publisher","key":"e_1_2_12_18_1","DOI":"10.1007\/11615576_9"},{"unstructured":"JagadishH MadarJ NgRT.Semantic compression and pattern extraction with fascicles. VLDB; Vol.99 1999:186\u201097.","key":"e_1_2_12_19_1"},{"doi-asserted-by":"publisher","key":"e_1_2_12_20_1","DOI":"10.1145\/376284.375693"},{"unstructured":"JagadishH NgR OoiBC TungA.It compress: an iterative semantic compression algorithm. Proceedings. 20th International Conference on Data Engineering;2004:646\u2010657.","key":"e_1_2_12_21_1"},{"doi-asserted-by":"publisher","key":"e_1_2_12_22_1","DOI":"10.1007\/978-3-540-88737-9_8"},{"unstructured":"ZneikaM LuccheseC VodislavD KotzinosD.Summarizing linked data RDF graphs using approximate graph pattern mining. Proceedings of the 19th International Conference on Extending Database Technology;2016; Bordeaux France.","key":"e_1_2_12_23_1"},{"doi-asserted-by":"crossref","unstructured":"IlkhechiA CrottyA GalakatosA et al.DeepSqueeze: deep semantic compression for tabular data. Proceedings of the SIGMOD'20. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery;2020:1733\u20101746.","key":"e_1_2_12_24_1","DOI":"10.1145\/3318464.3389734"},{"doi-asserted-by":"publisher","key":"e_1_2_12_25_1","DOI":"10.1145\/335191.335372"},{"issue":"1","key":"e_1_2_12_26_1","first-page":"6","article-title":"RDF primer","volume":"10","author":"Manola F","year":"2004","journal-title":"W3C Recommend"},{"doi-asserted-by":"publisher","key":"e_1_2_12_27_1","DOI":"10.1002\/widm.1207"},{"doi-asserted-by":"crossref","unstructured":"Fournier\u2010VigerP Chun\u2010WeiLJ Truong\u2010ChiT NkambouR.A survey of high utility itemset mining;2019:1\u201045; Springer.","key":"e_1_2_12_28_1","DOI":"10.1007\/978-3-030-04921-8_1"},{"doi-asserted-by":"publisher","key":"e_1_2_12_29_1","DOI":"10.1002\/widm.1242"},{"doi-asserted-by":"publisher","key":"e_1_2_12_30_1","DOI":"10.1145\/3314107"},{"doi-asserted-by":"publisher","key":"e_1_2_12_31_1","DOI":"10.1109\/TKDE.2019.2942594"},{"doi-asserted-by":"crossref","unstructured":"CropperA Duman\u010di\u0107S MuggletonSH.Turning 30: new ideas in inductive logic programming. Proceedings of the Twenty\u2010Ninth International Joint Conference on Artificial Intelligence IJCAI'20;2021.","key":"e_1_2_12_32_1","DOI":"10.24963\/ijcai.2020\/673"},{"doi-asserted-by":"publisher","key":"e_1_2_12_33_1","DOI":"10.1007\/978-3-642-13840-9_13"},{"doi-asserted-by":"publisher","key":"e_1_2_12_34_1","DOI":"10.1007\/s10994-013-5358-3"},{"issue":"5","key":"e_1_2_12_35_1","first-page":"4303","article-title":"Learning accurate and interpretable decision rule sets from neural networks","volume":"35","author":"Qiao L","year":"2021","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"e_1_2_12_36_1","first-page":"239","volume-title":"Graph pattern based RDF data compression","author":"Pan JZ","year":"2014"},{"key":"e_1_2_12_37_1","series-title":"Lecture Notes in Computer Science","volume-title":"RDF Knowledge Base Summarization by Inducing First\u2010order Horn Rules","author":"Wang R","year":"2022"},{"doi-asserted-by":"publisher","key":"e_1_2_12_38_1","DOI":"10.1007\/978-3-642-55481-0"},{"doi-asserted-by":"crossref","unstructured":"GuptaN SinghH SinglaJ.Fuzzy logic\u2010based systems for medical diagnosis\u2013A review;2022:1058\u20101062; IEEE.","key":"e_1_2_12_39_1","DOI":"10.1109\/ICESC54411.2022.9885338"},{"doi-asserted-by":"crossref","unstructured":"ChenJ LiuY LuS O'sullivanB RazgonI.A fixed\u2010parameter algorithm for the directed feedback vertex set problem. Proceedings of the 14th Annual ACM Symposium on Theory of Computing;2008:177\u2013186.","key":"e_1_2_12_40_1","DOI":"10.1145\/1374376.1374404"}],"container-title":["Software: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/spe.3165","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/spe.3165","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/spe.3165","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,19]],"date-time":"2023-08-19T11:48:03Z","timestamp":1692445683000},"score":1,"resource":{"primary":{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/onlinelibrary.wiley.com\/doi\/10.1002\/spe.3165"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,18]]},"references-count":39,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10.1002\/spe.3165"],"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1002\/spe.3165","archive":["Portico"],"relation":{},"ISSN":["0038-0644","1097-024X"],"issn-type":[{"type":"print","value":"0038-0644"},{"type":"electronic","value":"1097-024X"}],"subject":[],"published":{"date-parts":[[2022,11,18]]},"assertion":[{"value":"2022-05-27","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-16","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}