简体中文

ImageEval-prompt

ImageEval-prompt is a set of prompts for evaluating text-to-image (T2I) models at a fine-grained level along three dimensions: entity, style, and detail. Such fine-grained evaluation helps researchers understand the strengths and limitations of T2I models and improve their performance.

Construction Method

We constructed two datasets, one in English and one in Chinese. The English dataset consists of 1,624 prompts from PartiPrompts, and the Chinese dataset comprises 339 prompts generated by an automated tool. For each prompt, we annotated three dimensions: entity, style, and detail. The entity dimension has five sub-dimensions: object, state (status), color, quantity (number), and position (location). The style dimension has two: painting style (paint) and cultural style (culture). The detail dimension has four: hands, facial features, gender, and illogical knowledge. The manual annotation results show that the entity and style dimensions are relatively simple, while the detail dimension, except for the gender sub-dimension, is complex. The table below lists each sub-dimension and the number of prompts that involve it.

| Dimension | Sub-dimension | Number of prompts (English) | Number of prompts (Chinese) |
| --- | --- | --- | --- |
| Entity | object | 1624 | 339 |
| | status | 524 | 52 |
| | color | 337 | 72 |
| | number | 1423 | 134 |
| | location | 803 | 131 |
| Style | paint | 181 | 113 |
| | culture | 104 | 98 |
| Detail | hand | 46 | 20 |
| | face | 198 | 68 |
| | gender | 238 | 49 |
| | illogical | 109 | 36 |

Annotation followed a "double-blind annotation & third-party arbitration" approach. The table below shows example annotation results from the English and Chinese datasets. The object column lists the subject objects contained in the prompt (one or more). In the other columns, 0 indicates that the sub-dimension does not appear, 1 indicates that the sub-dimension is tested in a simple form, and 2 indicates that it is tested in a complex form.

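| Prompt | object | status | color | number | location | paint | culture | hand | face | gender | illogical |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| a basketball game between a team of four cats and a team of three dogs | 'basketball game', 'cats', 'dogs' | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 穿着华丽的衣服的女士坐在椅子上,素描 | '女士', '椅子', '衣服' | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 |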
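
As a minimal sketch of how one annotated row could be held in code, the snippet below stores the object list plus a 0/1/2 level for each remaining sub-dimension. The class name `Annotation`, its field names, and the helper `count_prompts` are hypothetical and are not part of the released data format. Counting a prompt toward a sub-dimension whenever its level is non-zero is also an assumption about how the per-sub-dimension prompt counts were obtained.

```python
from dataclasses import dataclass
from typing import Dict, List

# Sub-dimension columns that carry a 0/1/2 level (object is a list instead).
SUB_DIMENSIONS = (
    "status", "color", "number", "location",
    "paint", "culture",
    "hand", "face", "gender", "illogical",
)


@dataclass
class Annotation:
    """One annotated prompt (hypothetical layout, for illustration only)."""
    prompt: str
    objects: List[str]
    levels: Dict[str, int]  # sub-dimension name -> 0, 1, or 2

    def __post_init__(self) -> None:
        for name in SUB_DIMENSIONS:
            # Missing sub-dimensions default to 0 (did not appear).
            level = self.levels.setdefault(name, 0)
            if level not in (0, 1, 2):
                raise ValueError(f"{name} must be 0, 1, or 2, got {level}")


def count_prompts(annotations: List[Annotation], sub_dimension: str) -> int:
    """Count prompts in which the sub-dimension appears (level > 0)."""
    return sum(1 for a in annotations if a.levels[sub_dimension] > 0)


# The English example row from the table above.
example = Annotation(
    prompt="a basketball game between a team of four cats and a team of three dogs",
    objects=["basketball game", "cats", "dogs"],
    levels={"number": 1, "illogical": 2},
)

print(count_prompts([example], "number"))  # 1
print(count_prompts([example], "color"))   # 0
```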

We verified the annotation results and found that the consistency rate exceeded 95% for every sub-dimension except entity objects. Because a prompt can contain multiple entity objects, we used the intersection-over-union (IoU) method to calculate the consistency of this sub-dimension. Its consistency rate is lower (though still above 80%) mainly because of unavoidable subjectivity: for a prompt that mentions a basketball, for example, some annotators record "basketball" as the entity while others record "ball".

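| Dimension | Sub-dimension | English: Consistent Amount | English: Consistent Ratio | Chinese: Consistent Amount | Chinese: Consistent Ratio |
| --- | --- | --- | --- | --- | --- |
| Entity | object | 1377 | 0.841 | 308 | 0.894 |
| | status | 1609 | 0.991 | 337 | 0.994 |
| | color | 1617 | 0.996 | 336 | 0.991 |
| | number | 1614 | 0.994 | 339 | 1.000 |
| | location | 1606 | 0.989 | 335 | 0.988 |
| Style | paint | 1616 | 0.995 | 336 | 0.991 |
| | culture | 1610 | 0.991 | 334 | 0.985 |
| Detail | hand | 1617 | 0.996 | 338 | 0.997 |
| | face | 1611 | 0.992 | 338 | 0.997 |
| | gender | 1622 | 0.999 | 338 | 0.997 |
| | illogical | 1604 | 0.988 | 338 | 0.997 |
| All | | 1313 | 0.808 | 296 | 0.873 |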
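
The two kinds of consistency measure in the table can be sketched as follows. For the 0/1/2 sub-dimensions, the ratio matches the consistent amount divided by the dataset size (for example, 1609 / 1624 ≈ 0.991). For the object row, the ratio (0.841) differs from 1377 / 1624 ≈ 0.848, which fits an IoU-based score, but the exact aggregation is not described here, so treat the functions below as an illustration rather than the project's actual procedure. The function names and the two annotator lists are hypothetical.

```python
from typing import Iterable, Sequence


def object_iou(objects_a: Iterable[str], objects_b: Iterable[str]) -> float:
    """Intersection-over-union of two annotators' object lists."""
    set_a, set_b = set(objects_a), set(objects_b)
    union = set_a | set_b
    if not union:
        # Both annotators listed no objects: treat as full agreement.
        return 1.0
    return len(set_a & set_b) / len(union)


def agreement_ratio(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Fraction of prompts on which two annotators gave the same level."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same prompts")
    if not labels_a:
        raise ValueError("no labels to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical disagreement of the kind described above:
# one annotator records "basketball", the other records "ball".
annotator_1 = ["basketball", "cats", "dogs"]
annotator_2 = ["ball", "cats", "dogs"]
print(object_iou(annotator_1, annotator_2))  # 0.5 (2 shared / 4 distinct)

# Exact-match agreement on a 0/1/2 sub-dimension for four prompts.
print(agreement_ratio([0, 1, 2, 0], [0, 1, 1, 0]))  # 0.75

# Reproducing one ratio from the table: status, English dataset.
print(round(1609 / 1624, 3))  # 0.991
```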

Annotation Dimension Description

| Dimension | Sub-dimension | Description | Example |
| --- | --- | --- | --- |
| Entity | object | Subject object, which may be one or more. | The objects in "a bookshelf with ten books stacked vertically" are book and bookshelf. |
| | status | Action description. | The status described in "three violins lying on the floor" is lying. |
| | color | Color description. | The colors described in "a white background with a large blue square" are white and blue. |
| | number | Quantity description. | The quantity described in "300 movie titles" is 300. |
| | location | Spatial relation description. | The spatial relation described in "a large yellow triangle above a green square and red rectangle" is "above". |
| Style | paint | Description of the painting style, such as sketch or traditional Chinese painting, the style of a particular painter or studio, such as Monet or Van Gogh, or a style popular on Pixiv, ArtStation, etc. | The painting style described in "The Oriental Pearl in sketch style" is sketch style. |
| | culture | Description of cultural style, such as Chinese style, Baroque, etc. | The cultural style described in "an Egyptian statue" is Egyptian culture. |
| Detail | hand | Description of the hands of humanoid entities. | The hand action described in "a man banging on a door" is banging. |
| | face | Description of the facial features of humanoid entities. | The facial feature described in "a painting of the Mona Lisa with a frown" is a frown. |
| | gender | Description of the gender of humanoid entities. | The gender described in "a man standing under a tree" is male. |
| | illogical | Description of prompts that are clearly illogical. | The description "a cat reading a book" is clearly illogical. |

Acknowledgement

We are very grateful for the contributions of existing works such as the PartiPrompts benchmark to this project.

License

ImageEval-prompt is licensed under the Apache 2.0 license.