{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T02:50:49Z","timestamp":1770778249261,"version":"3.50.0"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61802053, 62372387, 62402402, 62001400, and 52441801"],"award-info":[{"award-number":["61802053, 62372387, 62402402, 62001400, and 52441801"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100018542","name":"Natural Science Foundation of Sichuan Province","doi-asserted-by":"crossref","award":["2024NSFSC0508"],"award-info":[{"award-number":["2024NSFSC0508"]}],"id":[{"id":"10.13039\/501100018542","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Sichuan Science and Technology Program","award":["2024NSFSC0494"],"award-info":[{"award-number":["2024NSFSC0494"]}]},{"DOI":"10.13039\/100020593","name":"Fundamental Research Funds for the Central Universities of Beijing University of Chemical Technology","doi-asserted-by":"crossref","award":["2682024ZTPY044 and 2682025ZD004"],"award-info":[{"award-number":["2682024ZTPY044 and 2682025ZD004"]}],"id":[{"id":"10.13039\/100020593","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"crossref","award":["2021M702713"],"award-info":[{"award-number":["2021M702713"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Special Research Funding under Yibin Municipal-University Dual Agreement","award":["YBSCXY2024010012 and YBSCXY2024010006"],"award-info":[{"award-number":["YBSCXY2024010012 and YBSCXY2024010006"]}]},{"name":"Fund of National Laboratory on Adaptive Optics, 
China","award":["FNLAO-24-ZD-O02"],"award-info":[{"award-number":["FNLAO-24-ZD-O02"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>Class incremental learning (CIL) requires models to acquire knowledge from sequential tasks containing non-overlapping classes while avoiding catastrophic forgetting. While vision-language foundation models like CLIP demonstrate remarkable potential for CIL through their pre-trained cross-modal alignment capabilities, existing CLIP-based approaches critically overlook <jats:italic toggle=\"yes\">the progressive degradation of visual representations in incremental scenarios<\/jats:italic>. Through feature space analysis, we identify <jats:italic toggle=\"yes\">a crucial dichotomy<\/jats:italic>: textual embeddings maintain stable discriminative power across sequential tasks, whereas visual features exhibit progressive deterioration manifested by <jats:italic toggle=\"yes\">intra-task confusion<\/jats:italic> (ambiguous decision boundaries between co-occurring classes) and <jats:italic toggle=\"yes\">inter-task interference<\/jats:italic> (semantic collision between historical and novel categories). To address these dual challenges, we propose task-guided hierarchical multi-modal alignment (THMM-CLIP), a framework that establishes persistent visual-textual coherence through hierarchical multi-modal alignment (HMA) and robust prompt selection (RPS). 
HMA adapts lightweight task-specific prompt vectors to dynamically recalibrate the CLIP image encoder, thereby achieving: (i) intra-task alignment, (ii) inter-task discriminability alignment, and (iii) global structural alignment with textual features. RPS incorporates a dual-level task identifier that integrates class-level and task-level representative features to ensure precise prompt retrieval during inference. Ablation studies validate all components\u2019 contributions, while t-SNE visualizations, confusion matrices, and Grad-CAM analyses confirm strengthened cross-modal alignment.\n                  <\/jats:p>","DOI":"10.1145\/3785477","type":"journal-article","created":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T16:01:57Z","timestamp":1766073717000},"page":"1-18","update-policy":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["THMM-CLIP: Task-Guided Hierarchical Multi-Modal Alignment for Rehearsal-Free Class Incremental Learning"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0009-0004-8390-1193","authenticated-orcid":false,"given":"Yuankang","family":"Pan","sequence":"first","affiliation":[{"name":"School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China   and Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-4083-5155","authenticated-orcid":false,"given":"Zhaoquan","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China, Engineering Research Center of Sustainable Urban Intelligent Transportation, 
Ministry of Education, Chengdu, China   and Manufacturing Industry Chain Collaboration Industrial Software Key Laboratory of Sichuan Province, Chengdu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-8322-8558","authenticated-orcid":false,"given":"Xiao","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0002-5341-5985","authenticated-orcid":false,"given":"Zechao","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/orcid.org\/0000-0001-8343-9665","authenticated-orcid":false,"given":"Changsheng","family":"Xu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,2,9]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_9"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3564786"},{"key":"e_1_3_1_4_2","first-page":"15920","volume-title":"Advances in Neural Information Processing Systems","author":"Buzzega Pietro","year":"2020","unstructured":"Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. 2020. Dark experience for general continual learning: A strong, simple baseline. In Advances in Neural Information Processing Systems. H. 
Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, Curran Associates, Inc., 15920\u201315930."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00938"},{"key":"e_1_3_1_6_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR \u201919)","author":"Chaudhry Arslan","year":"2019","unstructured":"Arslan Chaudhry, Marc\u2019Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. 2019. Efficient lifelong learning with A-GEM. In Proceedings of the 7th International Conference on Learning Representations (ICLR \u201919). OpenReview.net. Retrieved from https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/openreview.net\/forum?id=Hkf2_sC5FX"},{"key":"e_1_3_1_7_2","unstructured":"Arslan Chaudhry Marcus Rohrbach Mohamed Elhoseiny Thalaiyasingam Ajanthan Puneet K. Dokania Philip H. S. Torr and Marc\u2019Aurelio Ranzato. 2019. On tiny episodic memories in continual learning. arXiv:1902.10486. Retrieved from https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/arxiv.org\/abs\/1902.10486"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00528"},{"key":"e_1_3_1_9_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR \u201921)","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the 9th International Conference on Learning Representations (ICLR \u201921). 
OpenReview.net."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01055"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02689"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793982"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02843"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00823"},{"key":"e_1_3_1_15_2","first-page":"11463","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Ali Khan Muhammad Gul Zain","year":"2023","unstructured":"Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc Van Gool, Didier Stricker, Federico Tombari, and Muhammad Zeshan Afzal. 2023. Introducing language guidance in prompt-based continual learning. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 11463\u201311473."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_1_17_2","unstructured":"Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Master\u2019s thesis, Department of Computer Science, University of Toronto."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2016.05.004"},{"issue":"7","key":"e_1_3_1_19_2","first-page":"3","article-title":"Tiny imagenet visual recognition challenge","volume":"7","author":"Le Ya","year":"2015","unstructured":"Ya Le and Xuan Yang. 2015. Tiny imagenet visual recognition challenge. 
CS 231n 7, 7 (2015), 3.","journal-title":"CS 231n"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02844"},{"key":"e_1_3_1_21_2","first-page":"3925","volume-title":"Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research","volume":"97","author":"Li Xilai","year":"2019","unstructured":"Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, and Caiming Xiong. 2019. Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting. In Proceedings of the 36th International Conference on Machine Learning. Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, PMLR, 3925\u20133934."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2773081"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3576045"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00810"},{"key":"e_1_3_1_25_2","first-page":"109","volume-title":"Psychology of Learning and Motivation","author":"McCloskey Michael","year":"1989","unstructured":"Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, Vol. 24, Elsevier, 109\u2013165."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3696409.3700182"},{"key":"e_1_3_1_27_2","first-page":"16131","volume-title":"Advances in Neural Information Processing Systems","author":"Pham Quang","year":"2021","unstructured":"Quang Pham, Chenghao Liu, and Steven Hoi. 2021. DualNet: Continual learning, fast and slow. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 
34, Curran Associates, Inc., 16131\u201316144."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_31"},{"key":"e_1_3_1_29_2","first-page":"8748","volume-title":"Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research","volume":"139","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning. Marina Meila and Tong Zhang (Eds.), Proceedings of Machine Learning Research, Vol. 139, PMLR, 8748\u20138763."},{"key":"e_1_3_1_30_2","volume-title":"Advances in Neural Information Processing Systems","author":"Rajasegaran Jathushan","year":"2019","unstructured":"Jathushan Rajasegaran, Munawar Hayat, Salman H. Khan, Fahad Shahbaz Khan, and Ling Shao. 2019. Random path selection for continual learning. In Advances in Neural Information Processing Systems. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02229"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00542"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_1_34_2","first-page":"4548","volume-title":"Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research","volume":"80","author":"Serra Joan","year":"2018","unstructured":"Joan Serra, Didac Suris, Marius Miron, and Alexandros Karatzoglou. 2018. Overcoming catastrophic forgetting with hard attention to the task. 
In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 80, PMLR, 4548\u20134557."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813687"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01146"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02316"},{"key":"e_1_3_1_38_2","unstructured":"Vishal Thengane Salman Khan Munawar Hayat and Fahad Khan. 2022. Clip model is an efficient continual learner. arXiv:2210.03114. Retrieved from https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/arxiv.org\/abs\/2210.03114"},{"issue":"86","key":"e_1_3_1_39_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten Laurens","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_40_2","first-page":"13865","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Kumar Verma Vinay","year":"2021","unstructured":"Vinay Kumar Verma, Kevin J. Liang, Nikhil Mehta, Piyush Rai, and Lawrence Carin. 2021. Efficient feature transformations for discriminative and generative continual learning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13865\u201313875."},{"key":"e_1_3_1_41_2","first-page":"22379","volume-title":"Advances in Neural Information Processing Systems","author":"Wang Liyuan","year":"2021","unstructured":"Liyuan Wang, Mingtian Zhang, Zhongfan Jia, Qian Li, Chenglong Bao, Kaisheng Ma, Jun Zhu, and Yi Zhong. 2021. AFEC: Active forgetting of negative transfer in continual learning. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. 
Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34, Curran Associates, Inc., 22379\u201322391."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00356"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19809-0_36"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00024"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00046"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00303"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3661312"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3573202"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3614434"},{"key":"e_1_3_1_50_2","first-page":"3987","volume-title":"Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research","volume":"70","author":"Zenke Friedemann","year":"2017","unstructured":"Friedemann Zenke, Ben Poole, and Surya Ganguli. 2017. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning. Doina Precup and Yee Whye Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, PMLR, 3987\u20133995."},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01754"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02389"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3492328"},{"key":"e_1_3_1_54_2","volume-title":"Proceedings of the 33rd International Joint Conference on Artificial Intelligence","author":"Zhou Da-Wei","year":"2024","unstructured":"Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, and De-Chuan Zhan. 2024. Continual learning with pre-trained models: A survey. 
In Proceedings of the 33rd International Joint Conference on Artificial Intelligence."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02223"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-022-01653-1"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/dl.acm.org\/doi\/pdf\/10.1145\/3785477","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T12:14:11Z","timestamp":1770725651000},"score":1,"resource":{"primary":{"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/dl.acm.org\/doi\/10.1145\/3785477"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,9]]},"references-count":55,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3785477"],"URL":"https:\/\/summer-heart-0930.chufeiyun1688.workers.dev:443\/https\/doi.org\/10.1145\/3785477","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,9]]},"assertion":[{"value":"2025-05-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}