MM-SAP

MM-SAP is a benchmark that systematically evaluates the self-awareness of multimodal large language models (MLLMs) in perception, that is, their ability to recognize what they can and cannot know from a given image.

📝 Paper 📊 Data

News 🔥🔥🔥

[2024.01.15] This repo is now public!

Overview

Leaderboard

Rank	Models	BasicVisQA	KnowVisQA	BeyondVisQA	Total
1	GPT-4V	61.60 ± 2.33	83.43 ± 1.40	81.96 ± 0.70	75.13 ± 1.30
2	Qwen-VL-Chat-7b	70.60 ± 0.68	72.11 ± 1.50	30.33 ± 0.41	57.82 ± 0.63
3	InfMLLM-7b	70.95 ± 1.09	55.54 ± 0.50	41.03 ± 1.30	56.28 ± 0.75
4	ShareGPT-4V-7b	69.70 ± 0.91	56.69 ± 1.58	41.03 ± 1.36	56.19 ± 1.08
5	CogVLM-17b	69.70 ± 0.58	66.00 ± 1.26	30.49 ± 1.38	55.64 ± 0.42
6	LLaVA-13b	68.25 ± 1.56	58.11 ± 1.43	34.02 ± 0.90	53.81 ± 0.39
7	ShareGPT-4V-13b	68.00 ± 1.74	60.29 ± 1.11	30.43 ± 0.54	53.22 ± 0.60
8	LLaVA-7b	62.00 ± 0.85	54.23 ± 1.92	30.27 ± 0.85	49.12 ± 0.60
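
For readers who want to work with these numbers programmatically, the following is a minimal sketch (not part of this repository) that parses the tab-separated leaderboard rows above and finds the leading model in each column. The names `Score`, `parse_score`, `parse_leaderboard`, and `best_model` are hypothetical helpers introduced for illustration. The README does not state what the value after "±" denotes, so the sketch stores it as a generic `spread` and compares models only on the value before "±".

```python
from typing import Dict, NamedTuple


class Score(NamedTuple):
    value: float   # number before "±"
    spread: float  # number after "±"; its exact meaning is an assumption left open here


COLUMNS = ("BasicVisQA", "KnowVisQA", "BeyondVisQA", "Total")

# Rows copied from the leaderboard table: rank, model, then one cell per column.
LEADERBOARD = """\
1\tGPT-4V\t61.60 ± 2.33\t83.43 ± 1.40\t81.96 ± 0.70\t75.13 ± 1.30
2\tQwen-VL-Chat-7b\t70.60 ± 0.68\t72.11 ± 1.50\t30.33 ± 0.41\t57.82 ± 0.63
3\tInfMLLM-7b\t70.95 ± 1.09\t55.54 ± 0.50\t41.03 ± 1.30\t56.28 ± 0.75
4\tShareGPT-4V-7b\t69.70 ± 0.91\t56.69 ± 1.58\t41.03 ± 1.36\t56.19 ± 1.08
5\tCogVLM-17b\t69.70 ± 0.58\t66.00 ± 1.26\t30.49 ± 1.38\t55.64 ± 0.42
6\tLLaVA-13b\t68.25 ± 1.56\t58.11 ± 1.43\t34.02 ± 0.90\t53.81 ± 0.39
7\tShareGPT-4V-13b\t68.00 ± 1.74\t60.29 ± 1.11\t30.43 ± 0.54\t53.22 ± 0.60
8\tLLaVA-7b\t62.00 ± 0.85\t54.23 ± 1.92\t30.27 ± 0.85\t49.12 ± 0.60
"""


def parse_score(cell: str) -> Score:
    """Split a cell such as '75.13 ± 1.30' into its two numbers."""
    value, spread = cell.split("±")
    return Score(float(value), float(spread))


def parse_leaderboard(text: str) -> Dict[str, Dict[str, Score]]:
    """Map each model name to its per-column scores."""
    table: Dict[str, Dict[str, Score]] = {}
    for line in text.strip().splitlines():
        _rank, model, *cells = line.split("\t")
        table[model] = {col: parse_score(cell) for col, cell in zip(COLUMNS, cells)}
    return table


def best_model(table: Dict[str, Dict[str, Score]], column: str) -> str:
    """Return the model with the highest value in the given column."""
    return max(table, key=lambda model: table[model][column].value)


table = parse_leaderboard(LEADERBOARD)
for column in COLUMNS:
    winner = best_model(table, column)
    print(f"{column}: {winner} ({table[winner][column].value:.2f})")
```

Running the sketch shows that the ranking by Total does not imply the top score in every column: GPT-4V leads KnowVisQA, BeyondVisQA, and Total, while InfMLLM-7b has the highest BasicVisQA value.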

Data Examples

Citation

@article{wang2024mm,
  title={MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception},
  author={Wang, Yuhao and Liao, Yusheng and Liu, Heyang and Liu, Hongcheng and Wang, Yu and Wang, Yanfeng},
  journal={arXiv preprint arXiv:2401.07529},
  year={2024}
}

