LVDiffusor distills functional arrangement knowledge from large models into a diffusion model to generate well-organized and compatible layouts from everyday cluttered scenes.
To install LVDiffusor, follow these steps:

- Clone the repository:

  ```shell
  git clone https://github.com/JZENGYM/LVDiffusor.git
  ```

- Navigate to the project directory:

  ```shell
  cd LVDiffusor
  ```

- Install the required dependencies:

  ```shell
  pip install -r requirements.txt
  ```
The detection module in our work depends on Grounded-SAM. Please check the original repo, download the checkpoints for Grounding DINO and SAM, and complete the setup for local deployment.
If you want to build a local GPU environment for Grounded-SAM, set the environment variables manually as follows:
```shell
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
```

Install Segment Anything:

```shell
python -m pip install -e segment_anything
```

Install Grounding DINO:

```shell
pip install --no-build-isolation -e GroundingDINO
```

Install diffusers:

```shell
pip install --upgrade diffusers[torch]
```

Install OSX:

```shell
git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh
```

Install RAM & Tag2Text:

```shell
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
```

Download the pretrained weights:

```shell
cd Grounded-Segment-Anything
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```

We use GPT-4 as the large language model, so please prepare an OpenAI API key for requesting the OpenAI API. The config in data_gen.py can be edited to adjust the prompts for the VLM and LLM.
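The key is typically supplied through an environment variable; assuming the scripts read the standard `OPENAI_API_KEY` variable (verify the exact name and loading mechanism in data_gen.py), a minimal setup looks like:

```shell
# Set the OpenAI API key before running data_gen.py.
# The variable name is an assumption; check data_gen.py for how the key is loaded.
export OPENAI_API_KEY="sk-..."
```

An environment variable keeps the key out of version control, which is preferable to hard-coding it in the config.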
Generate the layout dataset:

```shell
python data_gen.py --num_samples 100
```

Train the diffusion model:

```shell
python training.py --rotation False --scene_name dinning_table
```

To use LVDiffusor, download the checkpoints and run the following command:

```shell
python eval_model.py --input_rgb ./assets/rgb.png --scene_name dinning_table --rotation False
```

This project is licensed under the MIT License. See the LICENSE file for more information.
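To evaluate several scenes in one go, the eval_model.py invocation can be wrapped in a small driver script. The sketch below is hypothetical: it reuses the flags shown above and assumes your input images live under `./assets`.

```python
import subprocess
from pathlib import Path

def eval_command(rgb_path, scene="dinning_table", rotation=False):
    """Build the eval_model.py command line for one image (flags as in the README)."""
    return [
        "python", "eval_model.py",
        "--input_rgb", str(rgb_path),
        "--scene_name", scene,
        "--rotation", str(rotation),
    ]

if __name__ == "__main__":
    # Run the model once per PNG in ./assets (directory layout is an assumption).
    for rgb in sorted(Path("./assets").glob("*.png")):
        subprocess.run(eval_command(rgb), check=True)
```

Each image is processed sequentially; `check=True` stops the loop if any run fails.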
Please cite our paper using the following BibTeX entry if you find it helpful:

```bibtex
@ARTICLE{lvdiffusor2024,
  author={Zeng, Yiming and Wu, Mingdong and Yang, Long and Zhang, Jiyao and Ding, Hao and Cheng, Hui and Dong, Hao},
  journal={IEEE Robotics and Automation Letters},
  title={LVDiffusor: Distilling Functional Rearrangement Priors From Large Models Into Diffusor},
  year={2024},
  volume={9},
  number={10},
  pages={8258-8265}
}
```

For any questions or inquiries, please contact:

- Yiming Zeng: zengym27@mail2.sysu.edu.cn
- Mingdong Wu: wmingd@pku.edu.cn
