LVDiffusor distills functional arrangement knowledge from large models into a diffusion model to generate well-organized and compatible layouts from everyday cluttered scenes.
To install LVDiffusor, follow these steps:

- Clone the repository:

  ```shell
  git clone https://github.com/JZENGYM/LVDiffusor.git
  ```

- Navigate to the project directory:

  ```shell
  cd LVDiffusor
  ```

- Install the required dependencies:

  ```shell
  pip install -r requirements.txt
  ```
The detection module in our work depends on Grounded-SAM. Please check the original repo, download the checkpoints for Grounding DINO and SAM, and complete the setup for local deployment.
If you want to build a local GPU environment for Grounded-SAM, set the environment variables manually as follows:
```shell
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
```

Install Segment Anything:

```shell
python -m pip install -e segment_anything
```

Install Grounding DINO:

```shell
pip install --no-build-isolation -e GroundingDINO
```

Install diffusers:

```shell
pip install --upgrade diffusers[torch]
```

Install OSX:

```shell
git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh
```

Install RAM & Tag2Text:

```shell
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
```

Download the pretrained weights:

```shell
cd Grounded-Segment-Anything
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```

We use GPT-4 as the large language model, so please prepare an OpenAI API key for requesting the OpenAI API. The config in data_gen.py can be edited to adjust the prompts for the VLM and LLM.
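The key is typically supplied through an environment variable; assuming the scripts read the standard `OPENAI_API_KEY` variable (verify the exact name and loading mechanism in data_gen.py), a minimal setup looks like:

```shell
# Set the OpenAI API key before running data_gen.py.
# The variable name is an assumption; check data_gen.py for how the key is loaded.
export OPENAI_API_KEY="sk-..."
```

An environment variable keeps the key out of version control, which is preferable to hard-coding it in the config.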
Generate the layout dataset:

```shell
python data_gen.py --num_samples 100
```

Train the diffusion model:

```shell
python training.py --rotation False --scene_name dinning_table
```

To use LVDiffusor, download the checkpoints and run the following command:

```shell
python eval_model.py --input_rgb ./assets/rgb.png --scene_name dinning_table --rotation False
```

This project is licensed under the MIT License. See the LICENSE file for more information.
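To evaluate several scenes in one go, the eval_model.py invocation can be wrapped in a small driver script. The sketch below is hypothetical: it reuses the flags shown above and assumes your input images live under `./assets`.

```python
import subprocess
from pathlib import Path

def eval_command(rgb_path, scene="dinning_table", rotation=False):
    """Build the eval_model.py command line for one image (flags as in the README)."""
    return [
        "python", "eval_model.py",
        "--input_rgb", str(rgb_path),
        "--scene_name", scene,
        "--rotation", str(rotation),
    ]

if __name__ == "__main__":
    # Run the model once per PNG in ./assets (directory layout is an assumption).
    for rgb in sorted(Path("./assets").glob("*.png")):
        subprocess.run(eval_command(rgb), check=True)
```

Each image is processed sequentially; `check=True` stops the loop if any run fails.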
Please cite our paper using the following BibTeX entry if you find it helpful:

```bibtex
@ARTICLE{lvdiffusor2024,
  author={Zeng, Yiming and Wu, Mingdong and Yang, Long and Zhang, Jiyao and Ding, Hao and Cheng, Hui and Dong, Hao},
  journal={IEEE Robotics and Automation Letters},
  title={LVDiffusor: Distilling Functional Rearrangement Priors From Large Models Into Diffusor},
  year={2024},
  volume={9},
  number={10},
  pages={8258-8265}
}
```

For any questions or inquiries, please contact:

- Yiming Zeng: zengym27@mail2.sysu.edu.cn
- Mingdong Wu: wmingd@pku.edu.cn
