Evaluation code for text-to-3D using Hugging Face model weights (because my remote server can only download weights from the Hugging Face Hub) 😆
Note: images rendered by methods in threestudio are usually concatenated with a depth map and an alpha image.
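If a metric expects a plain RGB image, the RGB panel may need to be cropped out first. A minimal sketch, assuming the panels (RGB, depth, alpha) are concatenated horizontally with equal widths; verify the layout against your actual threestudio output:

```python
from PIL import Image

def crop_rgb_panel(path: str, n_panels: int = 3) -> Image.Image:
    """Keep only the leftmost panel (assumed to be RGB) of a
    horizontally concatenated threestudio render."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    return img.crop((0, 0, w // n_panels, h))
```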
The formulation of the IV (Inception Variety) score, where $g(\theta, c)$ renders the 3D representation $\theta$ from camera $c$, $p_{\mathrm{cls}}$ is the classifier's label distribution, and $H$ is the Shannon entropy:

$$ IV(\theta) = H\left[\mathbb{E}_c\left[p_{\mathrm{cls}}(y \mid g(\theta, c))\right]\right] $$

A higher IV signifies that each rendered view is likely to receive a distinct label prediction, meaning the 3D creation has higher view diversity and is therefore less prone to the Janus problem (the failure mode where the same canonical view, e.g. a face, is repeated around the object).
The problem with the IV score is that if a rendered image is low quality, the Inception backbone cannot label it confidently, so the entropy is high regardless of actual view diversity. Combining the IV and IQ scores, we obtain IG (Information Gain).
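The text does not spell out IQ and IG, but one reading consistent with the definitions above (and with the classic Inception Score decomposition) is that IQ penalizes the expected per-view prediction entropy, which IG subtracts from IV:

$$ IG(\theta) = IV(\theta) - \mathbb{E}_c\left[H\left[p_{\mathrm{cls}}(y \mid g(\theta, c))\right]\right] $$

so blurry or ambiguous renders, which inflate both terms, cancel out, and IG rewards only genuine disagreement between views. A minimal sketch of both scores under that assumption, taking pre-computed softmax outputs as input:

```python
import numpy as np

def entropy(p: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy along `axis` (natural log)."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

def iv_ig(probs: np.ndarray) -> tuple[float, float]:
    """probs: (num_views, num_classes) classifier softmax outputs on the
    rendered views. Returns (IV, IG); the IG formula is an assumption,
    see the note above."""
    iv = entropy(probs.mean(axis=0))        # H[E_c p]: entropy of the marginal
    mean_h = entropy(probs, axis=1).mean()  # E_c[H[p]]: mean per-view entropy
    return float(iv), float(iv - mean_h)
```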
We can compute the CLIP score between a text prompt and images rendered by threestudio.
python main.py --metric "clip" --generate-image-dir <image dir> --text <text prompt> --batch-size 32 --device "cuda"
We can compute the 3D-FID score between two image folders: the first rendered by threestudio, the second generated by a diffusion model.
python main.py --metric "fid" --generate-image-dir <first dir> --real-image-dir <second dir> --batch-size 32 --device "cuda"
- Add CLIP score
- Add 3D-FID score (Inception backbone)
- Save `.npz` cache (see the sketch after this list)
- Add 3D-FID (CLIP backbone)
- Add Inception Variety and Inception Gain to reflect the Janus problem
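A minimal sketch of the kind of `.npz` cache meant above, assuming it stores the per-folder feature mean and covariance so FID statistics are not recomputed on every run (the file layout is illustrative, not necessarily what `main.py` writes):

```python
import os
import numpy as np

def load_or_compute_stats(cache_path: str, features: np.ndarray):
    """Load cached (mu, sigma) if present, otherwise compute and save them.
    `features`: (num_images, feature_dim) backbone activations for one folder."""
    if os.path.exists(cache_path):
        with np.load(cache_path) as f:
            return f["mu"], f["sigma"]
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    np.savez(cache_path, mu=mu, sigma=sigma)
    return mu, sigma
```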