Evaluation-code-for-text-to-3d

Evaluation code for text-to-3D using Hugging Face model weights (because my remote server can only download weights from the Hugging Face Hub) 😆

Note: images rendered by methods in threestudio are usually concatenated with a depth map and an alpha image.

Background

Inception Variety

The formulation of the IV score is

$$IV(\theta) = H\left[\mathbb{E}_{c}\left[p_{\mathrm{cls}}(y \mid g(\theta, c))\right]\right]$$

where $g(\theta, c)$ renders the 3D representation $\theta$ from camera pose $c$, $p_{\mathrm{cls}}$ is the Inception classifier's predicted label distribution, and $H$ is the Shannon entropy. A higher IV signifies that each rendered view is likely to receive a distinct label prediction, meaning the 3D creation has higher view diversity and is presumably less affected by the Janus problem.

The problem with the IV score is that if a rendered image is of low quality, the Inception backbone cannot label it confidently, so the entropy is high even without genuine view diversity. Combining the IV score with the IQ (Inception Quality) score gives the Information Gain: $IG = (IV - IQ)/IQ$.
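For concreteness, here is a minimal sketch of computing IV, IQ, and IG from Inception predictions over rendered views. It assumes IQ is the mean per-view prediction entropy (the conditional-entropy counterpart of IV, mirroring the Inception Score decomposition); the README does not define IQ explicitly, so treat that as an assumption.

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy along the last axis
    return -np.sum(p * np.log(p + eps), axis=-1)

def iv_iq_ig(probs):
    """probs: [N, C] Inception softmax outputs for N rendered views."""
    # IV: entropy of the label distribution marginalized over camera poses c
    iv = entropy(probs.mean(axis=0))
    # IQ (assumed): mean per-view entropy; low-quality renders get
    # uncertain predictions, which inflates this term
    iq = entropy(probs).mean()
    # IG normalizes view diversity by per-view uncertainty
    ig = (iv - iq) / iq
    return iv, iq, ig
```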

How to use

CLIP Score

We can compute the CLIP score between a text prompt and the images rendered by threestudio.

python main.py --metric "clip" --generate-image-dir <image dir> --text <text prompt> --batch-size 32 --device "cuda"
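Under the hood, the CLIP score is the average cosine similarity between the text embedding and each rendered view's image embedding. A minimal sketch using the Hugging Face transformers CLIP API (the checkpoint name is an illustrative assumption, and the depth/alpha portion of threestudio renders should be cropped off first):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(text, image_paths, device="cuda"):
    model.to(device).eval()
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[text], images=images,
                       return_tensors="pt", padding=True).to(device)
    out = model(**inputs)
    # Cosine similarity between L2-normalized embeddings, averaged over views
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).mean().item()
```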

3D-FID Score

We can compute the 3D-FID score between two image folders: the first contains images rendered by threestudio, and the second contains images generated by a 2D diffusion model.

python main.py --metric "fid" --generate-image-dir <first dir> --real-image-dir <second dir> --batch-size 32 --device "cuda"
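FID itself is the Fréchet distance between two Gaussians fitted to backbone features of the two folders: $\lVert\mu_1-\mu_2\rVert^2 + \mathrm{Tr}(\Sigma_1 + \Sigma_2 - 2(\Sigma_1\Sigma_2)^{1/2})$. A minimal sketch of that final step, assuming the feature means and covariances have already been extracted with the Inception (or CLIP) backbone:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """mu*: [D] feature means, sigma*: [D, D] feature covariances."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; numerical error can
    # produce a small imaginary component, so keep only the real part
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

The distance is the same quantity regardless of backbone; only the feature extractor changes between the Inception and CLIP variants listed in the TODOs.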

TODO

  • Add CLIP score
  • Add 3D-FID score (Inception Backbone)
  • Save .npz cache
  • Add 3D-FID (CLIP backbone)
  • Add Inception Variety and Inception Gain to reflect the Janus problem
