{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,17]],"date-time":"2026-05-17T10:22:55Z","timestamp":1779013375089,"version":"3.51.4"},"reference-count":168,"publisher":"Association for Computing Machinery (ACM)","issue":"7","funder":[{"name":"Beijing Natural Science Foundation","award":["JQ24021"],"award-info":[{"award-number":["JQ24021"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62125207, 62472411"],"award-info":[{"award-number":["62125207, 62472411"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,7,31]]},"abstract":"<jats:p>Food-centered research has received growing attention in the multimedia community because of its profound impact on our survival, nutrition and health, pleasure, and enjoyment. Our experience of food is typically multi-sensory: We see food objects, smell its odors, taste its flavors, feel its texture, and hear sounds when chewing. Multimodal food learning, which aims to relate information from multiple food modalities, is therefore vital to food-centered research: it supports various multimedia tasks, including recognition, retrieval, generation, recommendation, and interaction, and enables applications in fields such as healthcare and agriculture. However, to our knowledge, there is no survey on this topic. To fill this gap, this article formalizes multimodal food learning and comprehensively surveys its typical tasks, technical achievements, existing datasets, and applications to provide a blueprint for researchers and practitioners. 
Based on the current state of the art, we identify both open research issues and promising research directions, such as multimodal food learning benchmark construction, multimodal food foundation model construction, and multimodal diet estimation. We also point out that closer cooperation between researchers in multimedia and food science can address some existing challenges while opening up new opportunities to accelerate the development of multimodal food learning. This is the first comprehensive survey on this topic, and we anticipate that the roughly 170 research articles reviewed here will benefit academia and industry in this community and beyond.<\/jats:p>","DOI":"10.1145\/3715143","type":"journal-article","created":{"date-parts":[[2025,3,20]],"date-time":"2025-03-20T10:34:18Z","timestamp":1742466858000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Multimodal Food Learning"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6668-9208","authenticated-orcid":false,"given":"Weiqing","family":"Min","sequence":"first","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6693-823X","authenticated-orcid":false,"given":"Xingjian","family":"Hong","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8148-9735","authenticated-orcid":false,"given":"Yuxin","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4092-1273","authenticated-orcid":false,"given":"Mingyu","family":"Huang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5738-8572","authenticated-orcid":false,"given":"Ying","family":"Jin","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6395-8708","authenticated-orcid":false,"given":"Pengfei","family":"Zhou","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8835-8477","authenticated-orcid":false,"given":"Leyi","family":"Xu","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8478-9914","authenticated-orcid":false,"given":"Yilin","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1596-4326","authenticated-orcid":false,"given":"Shuqiang","family":"Jiang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9142-5914","authenticated-orcid":false,"given":"Yong","family":"Rui","sequence":"additional","affiliation":[{"name":"Lenovo Group Ltd, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7,19]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2831627"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1038\/srep00196"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2015.39"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-70742-6_46"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1038\/486S4a"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2798607"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2022.981294"},{"key":"e_1_3_1_9_2","article-title":"Learning to taste: A multimodal wine dataset","volume":"36","author":"Bender Thoranna","year":"2024","unstructured":"Thoranna Bender, Simon S\u00f8rensen, Alireza Kashani, Kristjan Eldjarn Hjorleifsson, Grethe Hyldig, S\u00f8ren 
Hauberg, Serge Belongie, and Frederik Warburg. 2024. Learning to taste: A multimodal wine dataset. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2015.83"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351067"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210036"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123428"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964315"},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1007\/978-3-319-51811-4_48","volume-title":"MultiMedia Modeling","author":"Chen Jingjing","year":"2017","unstructured":"Jingjing Chen, Lei Pang, and Chong-Wah Ngo. 2017. Cross-modal recipe retrieval: How to cook this dish? In MultiMedia Modeling. 
Laurent Amsaleg, Gylfi \u00de\u00f3r Gu\u00f0mundsson, Cathal Gurrin, Bj\u00f6rn \u00de\u00f3r J\u00f3nsson, and Shin\u2019ichi Satoh (Eds.), Springer, 588\u2013600."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3045639"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441816"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00800"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i2.19092"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2021.04.022"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.foodqual.2014.05.009"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411763.3441312"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-17829-0_13"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3126686.3126742"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1587\/transinf.2018CEP0004"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/2792838.2799665"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-0716-2197-4_23"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080826"},{"key":"e_1_3_1_29_2","volume-title":"Proceedings of the AAAI Fall Symposium: Artificial Intelligence for Gerontechnology","author":"Eskin Yulia","year":"2012","unstructured":"Yulia Eskin and Alex Mihailidis. 2012. An intelligent nutritional assessment system. 
In Proceedings of the AAAI Fall Symposium: Artificial Intelligence for Gerontechnology."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.3390\/nu11040877"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-08-101107-2.00012-9"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01458"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2792838.2796554"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10095762"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1002\/adma.202004805"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30542-2_81"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.3390\/foods12234293"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3607828.3617796"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.foodcont.2022.109507"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-30796-7_10"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2614861"},{"key":"e_1_3_1_42_2","unstructured":"Luis Herranz, Weiqing Min, and Shuqiang Jiang. 2018. Food recognition and recipe analysis: Integrating visual content, context and external knowledge. arXiv:1801.07239. Retrieved from https:\/\/arxiv.org\/abs\/1801.07239"},{"issue":"10","key":"e_1_3_1_43_2","first-page":"1297","article-title":"Recent advances in muscle food safety evaluation: Hyperspectral imaging analyses and applications","volume":"63","author":"Pu Hongbin","year":"2023","unstructured":"Hongbin Pu, Qingyi Wei, and Da-Wen Sun. 2023. Recent advances in muscle food safety evaluation: Hyperspectral imaging analyses and applications. 
Critical Reviews in Food Science and Nutrition 63, 10 (2023), 1297\u20131313.","journal-title":"Critical Reviews in Food Science and Nutrition"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612193"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiolchem.2024.108116"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2968537"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ifset.2020.102527"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tifs.2023.07.012"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654970"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-019-1212-9"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aal2014"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/1459359.1459548"},{"key":"e_1_3_1_53_2","first-page":"1","volume-title":"IEEE Transactions on Multimedia","author":"Lan Xing","year":"2023","unstructured":"Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, and Jian Xue. 2023. FoodSAM: Any food segmentation. 
IEEE Transactions on Multimedia (2023), 1\u201314."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.ade4401"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2019.2938627"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.3390\/foods12173145"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341162.3343836"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1038\/s44222-023-00126-5"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11848"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.3390\/s20030772"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2024.3374211"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3554738"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2024.3417280"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9412339"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/EMBC.2019.8856889"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2993948"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.3390\/s20154283"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.foodres.2021.110437"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2022.3210439"},{"key":"e_1_3_1_70_2","doi-asserted-by":"crossref","unstructured":"Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nick Johnston, Andrew Rabinovich, and Kevin Murphy. 2015. What\u2019s cookin\u2019? Interpreting cooking videos using text, speech, and vision. arXiv:1503.01558. 
Retrieved from https:\/\/arxiv.org\/abs\/1503.01558","DOI":"10.3115\/v1\/N15-1015"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.3390\/s23177362"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2927476"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2759499"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2958761"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3329168"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2639382"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123272"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2022.100484"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414031"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3237871"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/2971648.2971677"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45113-7_18"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISM.2011.66"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/2816454"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.146"},{"key":"e_1_3_1_86_2","unstructured":"Saeejith Nair, Chi-en Amy Tai, Yuhao Chen, and Alexander Wong. 2023. NutritionVerse-Synth: An open access synthetically generated 2D food scene dataset for dietary intake estimation. arXiv:2312.06192. 
Retrieved from https:\/\/arxiv.org\/abs\/2312.06192"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3167020.3167057"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649137"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1007\/s41870-019-00277-y"},{"key":"e_1_3_1_90_2","unstructured":"National Institutes of Health (NIH) Nutrition Research Task Force. 2020. 2020\u20132030 Strategic Plan for NIH Nutrition Research. Retrieved from https:\/\/dpcpsi.nih.gov\/sites\/default\/files\/2020NutritionStrategicPlan_508.pdf"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1145\/2986035.2986040"},{"key":"e_1_3_1_92_2","unstructured":"David Amat Ol\u00f3ndriz, Pon\u00e7 Palau Puigdevall, and Adri\u00e0 Salvador Palau. 2022. FooDI-ML: A large multi-language dataset of food, drinks and groceries images and descriptions. arXiv:2110.02035. Retrieved from https:\/\/arxiv.org\/abs\/2110.02035"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413636"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01606"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00819"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-68821-9_46"},{"key":"e_1_3_1_97_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW61823.2024.00008"},{"key":"e_1_3_1_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/3063592"},{"key":"e_1_3_1_99_2","volume-title":"Proceedings of the International Conference on Multimedia Assisted Dietary Management","volume":"1","author":"Pouladzadeh Parisa","year":"2015","unstructured":"Parisa Pouladzadeh, Abdulsalam Yassine, and Shervin Shirmohammadi. 2015. FooDD: An image-based food detection dataset for calorie measurement. 
In Proceedings of the International Conference on Multimedia Assisted Dietary Management, Vol. 1."},{"key":"e_1_3_1_100_2","unstructured":"Joan Peracaula Prat. 2020. A Multimodal Deep Learning Approach for Food Tray Recognition. Retrieved from http:\/\/hdl.handle.net\/2445\/173728"},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2023.3243999"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.3390\/s23020560"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1145\/2996462"},{"key":"e_1_3_1_104_2","doi-asserted-by":"publisher","DOI":"10.1145\/2660579.2660586"},{"key":"e_1_3_1_105_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123440"},{"issue":"3","key":"e_1_3_1_106_2","first-page":"51","article-title":"A personalized diet recommendation system using fuzzy ontology","volume":"7","author":"Raut Madhu","year":"2018","unstructured":"Madhu Raut, Keyur Prabhu, Rachita Fatehpuria, Shubham Bangar, and Sunita Sahu. 2018. A personalized diet recommendation system using fuzzy ontology. International Journal of Engineering and Science Invention 7, 3 (2018), 51\u201355.","journal-title":"International Journal of Engineering and Science Invention"},{"key":"e_1_3_1_107_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-021-01584-2"},{"key":"e_1_3_1_108_2","doi-asserted-by":"publisher","DOI":"10.1609\/icwsm.v12i1.15034"},{"key":"e_1_3_1_109_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-024-19161-4"},{"key":"e_1_3_1_110_2","unstructured":"Ali Rostami. 2024. An Integrated Framework for Contextual Personalized LLM-Based Food Recommendation. Ph.D. Dissertation. 
UC Irvine."},{"key":"e_1_3_1_111_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414691"},{"key":"e_1_3_1_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9412839"},{"key":"e_1_3_1_113_2","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-015-0908-2"},{"key":"e_1_3_1_114_2","doi-asserted-by":"publisher","DOI":"10.1145\/3041021.3055137"},{"key":"e_1_3_1_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01070"},{"key":"e_1_3_1_116_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01522"},{"key":"e_1_3_1_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.327"},{"key":"e_1_3_1_118_2","doi-asserted-by":"publisher","DOI":"10.1021\/acssensors.1c00553"},{"key":"e_1_3_1_119_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.foodchem.2023.136309"},{"key":"e_1_3_1_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW56347.2022.00503"},{"key":"e_1_3_1_121_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2024.104071"},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-022-01888-5"},{"key":"e_1_3_1_123_2","doi-asserted-by":"publisher","DOI":"10.1080\/08839514.2019.1602318"},{"key":"e_1_3_1_124_2","doi-asserted-by":"publisher","DOI":"10.1145\/3600095"},{"key":"e_1_3_1_125_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475422"},{"key":"e_1_3_1_126_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2013.2263125"},{"key":"e_1_3_1_127_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-21404-z"},{"key":"e_1_3_1_128_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compag.2017.09.019"},{"key":"e_1_3_1_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00879"},{"key":"e_1_3_1_130_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2929413"},{"key":"e_1_3_1_131_2","first-page":"653","volume-title":"Collaborative Recommendations: Algorithms, Practical Challenges and Applications","author":"Trattner 
Christoph","year":"2019","unstructured":"Christoph Trattner and David Elsweiler. 2019. Food recommendations. In Collaborative Recommendations: Algorithms, Practical Challenges and Applications. World Scientific, 653\u2013685."},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.brainresrev.2006.09.002"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neubiorev.2005.11.003"},{"key":"e_1_3_1_134_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW63382.2024.00378"},{"key":"e_1_3_1_135_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3313638"},{"key":"e_1_3_1_136_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58583-9_22"},{"key":"e_1_3_1_137_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475226"},{"issue":"3","key":"e_1_3_1_138_2","first-page":"3363","article-title":"Learning structural representations for recipe generation and food retrieval","volume":"45","author":"Wang Hao","year":"2022","unstructured":"Hao Wang, Guosheng Lin, Steven C. H. Hoi, and Chunyan Miao. 2022. Learning structural representations for recipe generation and food retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 3 (2022), 3363\u20133377.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_139_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01184"},{"key":"e_1_3_1_140_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3083109"},{"key":"e_1_3_1_141_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2024.124720"},{"key":"e_1_3_1_142_2","first-page":"1","volume-title":"IEEE Transactions on Multimedia","author":"Wang Lanjun","year":"2024","unstructured":"Lanjun Wang, Chenyu Zhang, An-An Liu, Bo Yang, Mingwang Hu, Xinran Qiao, Lei Wang, Jianlin He, and Qiang Liu. 2024. Toward Chinese food understanding: A cross-modal ingredient-level benchmark. 
IEEE Transactions on Multimedia (2024), 1\u201315."},{"key":"e_1_3_1_143_2","doi-asserted-by":"publisher","DOI":"10.1145\/3418211"},{"key":"e_1_3_1_144_2","first-page":"1","volume-title":"Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. IEEE","author":"Wang Xin","year":"2015","unstructured":"Xin Wang, Devinder Kumar, Nicolas Thome, Matthieu Cord, and Frederic Precioso. 2015. Recipe recognition with large multimodal food dataset. In Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. IEEE, 1\u20136."},{"key":"e_1_3_1_145_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3193763"},{"key":"e_1_3_1_146_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2023.3247099"},{"key":"e_1_3_1_147_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2023.3333935"},{"key":"e_1_3_1_148_2","doi-asserted-by":"publisher","DOI":"10.1145\/3512527.3531388"},{"key":"e_1_3_1_149_2","first-page":"1210","volume-title":"Proceedings of the IEEE International Conference on Multimedia and Expo","author":"Wu Wen","year":"2009","unstructured":"Wen Wu and Jie Yang. 2009. Fast food recognition from videos of eating for calorie estimation. In Proceedings of the IEEE International Conference on Multimedia and Expo, 1210\u20131213."},{"key":"e_1_3_1_150_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00397"},{"key":"e_1_3_1_151_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612016"},{"key":"e_1_3_1_152_2","first-page":"1","article-title":"Learning text-image joint embedding for efficient cross-modal retrieval with deep feature engineering","volume":"40","author":"Xie Zhongwei","year":"2021","unstructured":"Zhongwei Xie, Ling Liu, Yanzhao Wu, Luo Zhong, and Lin Li. 2021. Learning text-image joint embedding for efficient cross-modal retrieval with deep feature engineering. 
ACM Transactions on Information Systems 40, 4 (2021), 1\u201327.","journal-title":"ACM Transactions on Information Systems"},{"key":"e_1_3_1_153_2","first-page":"1","volume-title":"IEEE Transactions on Multimedia","author":"Xu Mengling","year":"2024","unstructured":"Mengling Xu, Jie Wang, Ming Tao, Bing-Kun Bao, and Changsheng Xu. 2024. CookGALIP: Recipe controllable generative adversarial CLIPs with sequential ingredient prompts for food image generation. IEEE Transactions on Multimedia (2024), 1\u201311."},{"key":"e_1_3_1_154_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2438717"},{"key":"e_1_3_1_155_2","doi-asserted-by":"crossref","unstructured":"Semih Yagcioglu, Aykut Erdem, Erkut Erdem, and Nazli Ikizler-Cinbis. 2018. RecipeQA: A challenge dataset for multimodal comprehension of cooking recipes. arXiv:1809.00812. Retrieved from https:\/\/arxiv.org\/abs\/1809.00812","DOI":"10.18653\/v1\/D18-1166"},{"key":"e_1_3_1_156_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3549203"},{"key":"e_1_3_1_157_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.smhl.2024.100465"},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3050090"},{"key":"e_1_3_1_159_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compind.2017.09.001"},{"key":"e_1_3_1_160_2","doi-asserted-by":"crossref","unstructured":"Yixin Zhang, Xin Zhou, Qianwen Meng, Fanglin Zhu, Yonghui Xu, Zhiqi Shen, and Lizhen Cui. 2024. Multi-modal food recommendation using clustering and self-supervised learning. arXiv:2406.18962. 
Retrieved from https:\/\/arxiv.org\/abs\/2406.18962","DOI":"10.1007\/978-981-96-0116-5_22"},{"key":"e_1_3_1_161_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compag.2020.105959"},{"key":"e_1_3_1_162_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12342"},{"key":"e_1_3_1_163_2","doi-asserted-by":"publisher","DOI":"10.1111\/1541-4337.12492"},{"key":"e_1_3_1_164_2","doi-asserted-by":"crossref","unstructured":"Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying Jin, Mingyu Huang, Xiangyang Li, Shuhuan Mei, and Shuqiang Jiang. 2024. FoodSky: A food-oriented large language model that passes the chef and dietetic examination. arXiv:2406.10261. Retrieved from https:\/\/arxiv.org\/abs\/2406.10261","DOI":"10.2139\/ssrn.4972042"},{"key":"e_1_3_1_165_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2024.3360899"},{"key":"e_1_3_1_166_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612661"},{"key":"e_1_3_1_167_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611943"},{"key":"e_1_3_1_168_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00556"},{"key":"e_1_3_1_169_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01174"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715143","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T19:58:47Z","timestamp":1772222327000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715143"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,19]]},"references-count":168,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,31]]}},"alternative-id":["10.1145\/3715143"],"URL":"https:\/\/doi.org\/10.1145\/3715143","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,19]]},"assertion":[{"value":"2024-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}