Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HITL - Data collection #1967

Merged
merged 7 commits into from
May 20, 2024
Merged

HITL - Data collection #1967

merged 7 commits into from
May 20, 2024

Conversation

0mdc
Copy link
Contributor

@0mdc 0mdc commented May 20, 2024

Motivation and Context

This changeset enables the rearrange_v2 app to record data for HITL experiments.

How it works

This adds a session recorder, which has its lifetime tied to a single HITL session.

When reaching the End Session state (see this PR), the session is uploaded to S3 before being destroy.

Configuration:

The following configuration will do the following:

  • Create output file output/session.json.gz
  • Upload it to S3 at Placeholder/[completed/incomplete]/session.json.gz.

The bucket is defined via the environment variable S3_BUCKET.

rearrange_v2:
  data_collection:
    s3_path: "Placeholder/"
    output_file_name: "session"

Notes

  • S3 is currently the only supported data provider.
  • Files are available for inspection locally. They are deleted before uploading the next session.

Depends on:

How Has This Been Tested

Tested on EC2 instances running single and multi user applications.

Types of changes

  • [Development]

Checklist

  • My code follows the code style of this project.
  • I have updated the documentation if required.
  • I have read the CONTRIBUTING document.
  • I have completed my CLA (see CONTRIBUTING)
  • I have added tests to cover my changes if required.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label May 20, 2024
Copy link
Contributor

@zephirefaith zephirefaith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification questions mostly.

I skipped the files which largely looked the same as #1965. I couldn't tell the diff, LMK if you want me to look again at the diff, if any, once the previous one is merged.

examples/hitl/rearrange_v2/util.py Outdated Show resolved Hide resolved
"t": elapsed_time,
"users": [],
"object_states": self.get_objects_state(),
"agent_states": self.get_agents_state(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a "world-graph" API here if you want to save those object-to-furniture/agent/receptacle relations here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be perfect.

"task_completed": u.episode_finished,
"task_succeeded": u.episode_success,
"camera_transform": u.cam_transform,
"held_object": u.ui._held_object_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this already but this ID if passed to grasp_mgr, in a way understood by sim, would solve what habitat-llm needs.

Comment on lines +173 to +178
# Register UI callbacks
self.ui.on_pick.registerCallback(self._on_pick)
self.ui.on_place.registerCallback(self._on_place)
self.ui.on_open.registerCallback(self._on_open)
self.ui.on_close.registerCallback(self._on_close)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use these, would I need to pass on ui object to LLMController?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably just do it from rearrange_v2 initialization code, which has access to both the UI and LLMController.

Something like this would work:

self._user_data[0].ui.on_pick.registerCallback(llm_controller.on_pick())

If we ever want to scale this to N users, we would just pass the user index in the event data. For now, this does the job.

Comment on lines +309 to +326
def _on_open(self, e: UI.OpenEventData):
self.ui_events.append(
{
"type": "open",
"obj_handle": e.object_handle,
"obj_id": e.object_id,
}
)

def _on_close(self, e: UI.CloseEventData):
self.ui_events.append(
{
"type": "close",
"obj_handle": e.object_handle,
"obj_id": e.object_id,
}
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is irrelevant to PR, but do we need open/close for single-learn data collection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do it right now. That will most likely change with the inclusion of object states.

def on_exit(self):
super().on_exit()

episode_success = all(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

episode_success here means "user thinks episode was done/success", right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, success is the only outcome.

In a following PR, I'll be adding a way for users to report either success or failure via a form.

@@ -486,9 +589,12 @@ def sim_update(self, dt: float, post_sim_update_dict):

# Collect data.
self._elapsed_time += dt
# TODO: Always record with non-human agent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In single-user and multi-user modes, we can skip recording of frames when there's no user input.

With the LLM agent, we'll most likely have to record every frame.

Comment on lines +76 to +85
def record_frame(
self,
frame_data: Dict[str, Any],
):
self.data["end_timestamp"] = timestamp()
self.data["frame_count"] += 1

self.data["episodes"][-1]["end_timestamp"] = timestamp()
self.data["episodes"][-1]["frame_count"] += 1
self.data["episodes"][-1]["frames"].append(frame_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this frame different from the frame you already have the FrameRecorder for? Why twice?

Copy link
Contributor

@zephirefaith zephirefaith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust you've run this before so this works :)

Code LGTM, a couple nits.

examples/hitl/rearrange_v2/app_state_end_session.py Outdated Show resolved Hide resolved
Comment on lines +98 to +99
if os.path.exists(output_folder):
shutil.rmtree(output_folder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure this will not delete useful data?

Copy link
Contributor Author

@0mdc 0mdc May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only contains the .json.gz file. The directory is expected to contain more data in the future (e.g. replay file, screenshots, etc).

You could however change the path in the config to any directory 🤔

@0mdc 0mdc merged commit 409d0c3 into main May 20, 2024
3 of 4 checks passed
@0mdc 0mdc deleted the 0mdc/hitl_data_collection branch May 20, 2024 23:04
0mdc added a commit that referenced this pull request May 23, 2024
* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.
dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request Jun 26, 2024
* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.
dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request Jul 8, 2024
* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.
dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request Jul 8, 2024
* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants