add multitask policy training benchmark scripts
Lirui committed Nov 7, 2023
1 parent ecaf111 commit 06b10df
Showing 6 changed files with 130 additions and 3 deletions.
10 changes: 8 additions & 2 deletions README.md
@@ -3,9 +3,9 @@

### Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang

[Project Page](https://liruiw.github.io/gensim) | [Arxiv](https://arxiv.org/abs/2310.01361) | [Gradio Demo](https://huggingface.co/spaces/Gen-Sim/Gen-Sim) | [Huggingface Dataset](https://huggingface.co/datasets/Gen-Sim/Gen-Sim)
[Project Page](https://liruiw.github.io/gensim) | [Arxiv](https://arxiv.org/abs/2310.01361) | [Gradio Demo](https://huggingface.co/spaces/Gen-Sim/Gen-Sim) | [Huggingface Dataset](https://huggingface.co/datasets/Gen-Sim/Gen-Sim) | [Finetuned Code-LLama Model](https://huggingface.co/Gen-Sim/Gen-Sim)

This repo explores the use of an LLM code generation pipeline to write simulation environments and expert goals to augment diverse simulation tasks.
This repo explores the use of an LLM code generation pipeline to write simulation environments and expert goals to augment diverse simulation tasks. We strongly recommend also checking out the [Gradio Demo](https://huggingface.co/spaces/Gen-Sim/Gen-Sim).


![](media/gensim_teaser_v1.gif)
@@ -65,6 +65,12 @@ python gensim/run_simulation.py disp=True prompt_folder=topdown_chain_of_thoug

8. offline eval: `python -m gensim.evaluate_finetune_model_offline model_output_dir=after_finetune_CodeLlama-13b-Instruct-hf_fewshot_False_epoch_10_0`

## 🤖 Policy Training Benchmark
0. Note that the 100+ tasks generated by GenSim can be used to benchmark algorithms in multitask policy training. See `prompts/policy_training_list.json` for the list of training tasks.
1. Generate multitask demonstrations. For example, run `bash scripts/generate_datasets.sh data 'align-box-corner assembling-kits block-insertion'`
2. Multi-task training: `sh scripts/train_test_multi_task.sh data "[align-rope,align-box-corner]"`
3. Single-task training: `sh scripts/train_test_single_task.sh data align-box-corner`
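The bracketed multi-task syntax above can be built from the task-list JSON. A minimal Python sketch, with an inline list standing in for the contents of `prompts/policy_training_list.json`:

```python
import json

# Inline stand-in for the contents of prompts/policy_training_list.json
task_list_json = '["align-box-corner", "assembling-kits", "block-insertion"]'
tasks = json.loads(task_list_json)

# Build the "[a,b,c]" argument expected by train_test_multi_task.sh
multitask_arg = "[" + ",".join(tasks) + "]"
print(multitask_arg)  # [align-box-corner,assembling-kits,block-insertion]
```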

## ✅ Note
0. Temperature `0.5-0.8` is a good range for diversity; `0.0-0.2` is better for stable results.
1. The generation pipeline will print out statistics regarding compilation, runtime, task design, and diversity scores. Note that these metrics depend on the complexity of the tasks that the LLM tries to generate.
2 changes: 1 addition & 1 deletion cliport/cfg/config.yaml
@@ -9,7 +9,7 @@ max_env_run_cnt: 3 # maximum number of runs for each environment
trials: 10 # how many times of spawning each environment generated
output_folder: 'output/output_stats'
model_output_dir: '' # to be filled in with date
gpt_model: "gpt-4-0613" # which openai gpt model to use
gpt_model: "gpt-4-1106-preview" # which openai gpt model to use
openai_key: ${oc.env:OPENAI_KEY}

# Advanced options
1 change: 1 addition & 0 deletions prompts/policy_training_list.json
@@ -0,0 +1 @@
["color-coordinated-box-ball-matching", "ball-on-box-on-container", "insert-sphere-into-container", "sphere-container-color-match", "colored-balls-sorting-in-corner", "color-cued-ball-corner-sorting", "place-ball-in-elevated-bowl", "stack-blocks-in-container", "sorting-blocks-into-pallets", "ball-sorting-with-blocks-barrier", "color-coordinated-zone-arrangement", "color-specific-container-fill", "align-balls-in-colored-boxes", "color-ordered-container-arrangement", "cylinder-balancing-and-placement", "color-coordinated-cylinder-ball-match", "color-coordinated-zone-stacking", "color-coordinated-block-shifting", "kit-in-bowl-in-zone", "put-kit-in-bowl", "ball-in-bowl-obstacle-course-new", "color-ordered-blocks-on-pallet", "color-coordinated-insertion", "sort-insert-color-coordinated-blocks", "color-coordinated-sphere-insertion", "insert-blocks-lineup", "mixed-color-block-barrier-insertion", "block-on-cylinder-on-pallet", "vertical-insertion-blocks", "color-sorted-block-race", "color-coordinated-sphere-on-pallet-pyramid", "color-blocks-in-cylinder-maze", "color-coordinated-block-tower", "sort-and-stack-clr-blocks", "create-pyramid-blocks-and-container", "colorful-block-tower-on-cylinder-base", "sweep-and-sort-blocks", "color-coordinated-ball-stacking", "stack-color-coordinated-blocks", "color-coordinated-cylinder-pyramid", "block-pyramid-with-limited-space", "multicolor-block-bridge", "construct-colorful-arch", "color-coordinated-block-bridge", "multi-level-pyramid-construction", "create-pyramid-with-color-coded-ells", "color-coordinated-cylinder-tower", "color-coordinated-arch-construction", "cylinder-ring-stack", "color-sorted-container-stack", "color-structured-block-tower", "rainbow-stack", "sort-and-assemble-block-castle", "symmetric-block-bridge-construction", "colored-cylinder-in-square", "build-cylinder-structure", "stack-three-layer-red-wall", "insert-ell-along-square-path", "guided-block-path", "color-ordered-insertion-new", "construct-corner-blocks", 
"build-car", "build-house", "put-blocks-between-zones", "corner-sort-cylinders", "place-blue-on-line-ends", "align-pair-colored-blocks-along-line", "align-rope-along-line", "sphere-align-stand", "align-rope-cross-zone"]
18 changes: 18 additions & 0 deletions scripts/generate_datasets.sh
@@ -0,0 +1,18 @@
#!/bin/bash

DATA_DIR=$1
LANG_TASKS=$2
DISP=False

echo "Generating dataset... Folder: $DATA_DIR"
trap "kill 0" SIGINT

for task in $LANG_TASKS
do
python cliport/demos.py n=200 task=$task mode=train data_dir=$DATA_DIR disp=$DISP record.save_video=False +regenerate_data=True &
python cliport/demos.py n=50 task=$task mode=val data_dir=$DATA_DIR disp=$DISP record.save_video=False +regenerate_data=True &
python cliport/demos.py n=100 task=$task mode=test data_dir=$DATA_DIR disp=$DISP record.save_video=False +regenerate_data=True &
done
wait

echo "Finished Language Tasks."
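The script generates 200 train, 50 val, and 100 test demonstrations per task; a quick sanity check of the per-task total, with split sizes copied from the `cliport/demos.py` calls above:

```python
# Episode counts per split, as passed to cliport/demos.py in the script
splits = {"train": 200, "val": 50, "test": 100}
total_per_task = sum(splits.values())
print(total_per_task)  # 350 demonstrations generated per task
```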
56 changes: 56 additions & 0 deletions scripts/train_test_multi_task.sh
@@ -0,0 +1,56 @@
#!/bin/bash

DATA_DIR=$1
TRAINTASK=${2-'[rainbow-stack,bowl-ball-placement]'}
TESTTASK=${3-'[rainbow-stack,bowl-ball-placement]'}
TASKNAME=${4-'mix-two'}
STEPS=${5-'10000'}

DISP=False

echo "Training multi-task dataset... Folder: $DATA_DIR Task $TRAINTASK"
trap "kill 0" SIGINT

python cliport/train.py train.task=$TRAINTASK \
train.agent=cliport \
train.model_task=$TASKNAME \
train.attn_stream_fusion_type=add \
train.trans_stream_fusion_type=conv \
train.lang_fusion_type=mult \
train.n_demos=50 \
train.n_steps=${STEPS} \
dataset.cache=True \
train.exp_folder=exps/exp-$TASKNAME \
dataset.type=multi \
train.load_from_last_ckpt=False


# Expand the bracketed task list "[a,b]" into a space-separated string
bash_array=$(python3 -c "import sys; print(' '.join((sys.argv[1])[1:-1].split(',')))" "$TESTTASK")

echo "Testing multi-task dataset... Folder: $DATA_DIR Task $TESTTASK"


for task in $bash_array
do
echo "Testing $task"
# TEST
# bash scripts/generate_gpt_datasets.sh data $task

python cliport/eval.py model_task=$TASKNAME \
eval_task=$task \
agent=cliport \
mode=test \
n_demos=100 \
train_demos=50 \
checkpoint_type=test_best \
type=single \
exp_folder=exps/exp-$TASKNAME \
update_results=True &
done
wait

python notebooks/print_results.py -r=exps/exp-$TASKNAME

echo "Finished Training."
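The inline `python3 -c` one-liner that expands the bracketed task list can be checked in isolation; a standalone sketch of the same logic:

```python
def expand_task_list(bracketed: str) -> str:
    # "[a,b]" -> "a b", mirroring the inline python3 -c call in the script
    return " ".join(bracketed[1:-1].split(","))

print(expand_task_list("[rainbow-stack,bowl-ball-placement]"))
# rainbow-stack bowl-ball-placement
```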
46 changes: 46 additions & 0 deletions scripts/train_test_single_task.sh
@@ -0,0 +1,46 @@
#!/bin/bash

DATA_DIR=$1
TASK=$2
DISP=False

echo "Training dataset... Folder: $DATA_DIR Task $TASK"

# You can parallelize these runs depending on how many resources you have

#############################
## Language-Conditioned Tasks
trap "kill 0" SIGINT
LANG_TASKS=$2


for task in $LANG_TASKS
do
# Generate data
bash scripts/regenerate_gpt_datasets.sh data $task

# TRAIN
python cliport/train.py train.task=$task \
train.agent=cliport \
train.attn_stream_fusion_type=add \
train.trans_stream_fusion_type=conv \
train.lang_fusion_type=mult \
train.n_demos=100 \
train.n_steps=5000 \
train.exp_folder=exps/exps-singletask \
dataset.cache=True

# TEST
python cliport/eval.py eval_task=$task \
agent=cliport \
mode=test \
n_demos=100 \
train_demos=200 \
checkpoint_type=test_best \
exp_folder=exps/exps-singletask \
update_results=True
done

python notebooks/print_results.py -r=exps/exps-singletask

echo "Finished Training."
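The per-task train commands above follow a fixed template of Hydra-style overrides; a sketch that only builds the command tokens (`build_train_cmd` is a hypothetical helper, not part of the repo):

```python
import shlex

def build_train_cmd(task: str, n_demos: int = 100, n_steps: int = 5000) -> list[str]:
    # Mirrors the hydra-style overrides passed to cliport/train.py above
    cmd = (
        f"python cliport/train.py train.task={task} train.agent=cliport "
        f"train.n_demos={n_demos} train.n_steps={n_steps} "
        f"train.exp_folder=exps/exps-singletask dataset.cache=True"
    )
    return shlex.split(cmd)

print(build_train_cmd("align-box-corner")[2])  # train.task=align-box-corner
```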
