HITL - Data collection #1967

Merged

0mdc merged 7 commits into main from 0mdc/hitl_data_collection

May 20, 2024

Contributor

0mdc commented May 20, 2024

Motivation and Context

This changeset enables the rearrange_v2 app to record data for HITL experiments.

How it works

This adds a session recorder, which has its lifetime tied to a single HITL session.

When the app reaches the End Session state (see this PR), the session is uploaded to S3 before being destroyed.

Configuration:

The configuration below does the following:

Create output file output/session.json.gz
Upload it to S3 at Placeholder/[completed/incomplete]/session.json.gz.

The bucket is defined via the environment variable S3_BUCKET.

rearrange_v2:
  data_collection:
    s3_path: "Placeholder/"
    output_file_name: "session"

Notes

S3 is currently the only supported data provider.
Files are available for inspection locally. They are deleted before uploading the next session.

Depends on:

HITL - Rearrange session handling #1965

How Has This Been Tested

Tested on EC2 instances running single and multi user applications.

Types of changes

[Development]

Checklist

My code follows the code style of this project.
I have updated the documentation if required.
I have read the CONTRIBUTING document.
I have completed my CLA (see CONTRIBUTING)
I have added tests to cover my changes if required.

0mdc added 5 commits

May 19, 2024 09:46


          Add session management.

22e51de


          Formatting changes.

fd126c3


          Add clarifications to episode resolution.

0ec3939


          Document temporary hack to check for client-side loading status.

ab29747


          Add session recorder, ui events and data upload.

061295c

0mdc requested review from jturner65, aclegg3, zephirefaith and Ram81

May 20, 2024 16:30

facebook-github-bot added the CLA Signed label

zephirefaith approved these changes

Contributor

zephirefaith left a comment

Clarification questions mostly.

I skipped the files which largely looked the same as #1965. I couldn't tell the diff, LMK if you want me to look again at the diff, if any, once the previous one is merged.

examples/hitl/rearrange_v2/util.py (outdated, resolved)

examples/hitl/rearrange_v2/rearrange_v2.py

+                          "t": elapsed_time,
+                          "users": [],
+                          "object_states": self.get_objects_state(),
+                          "agent_states": self.get_agents_state(),

Contributor

zephirefaith May 20, 2024

I can add a "world-graph" API here if you want to save those object-to-furniture/agent/receptacle relations here.

Contributor Author

0mdc May 20, 2024

That would be perfect.

examples/hitl/rearrange_v2/rearrange_v2.py

+                              "task_completed": u.episode_finished,
+                              "task_succeeded": u.episode_success,
+                              "camera_transform": u.cam_transform,
+                              "held_object": u.ui._held_object_id,

Contributor

zephirefaith May 20, 2024

We've discussed this already but this ID if passed to grasp_mgr, in a way understood by sim, would solve what habitat-llm needs.

examples/hitl/rearrange_v2/rearrange_v2.py

Comment on lines +173 to +178

+                      # Register UI callbacks
+                      self.ui.on_pick.registerCallback(self._on_pick)
+                      self.ui.on_place.registerCallback(self._on_place)
+                      self.ui.on_open.registerCallback(self._on_open)
+                      self.ui.on_close.registerCallback(self._on_close)

Contributor

zephirefaith May 20, 2024

<3

Contributor

zephirefaith May 20, 2024

To use these, would I need to pass on ui object to LLMController?

Contributor Author

0mdc May 20, 2024

You can probably just do it from rearrange_v2 initialization code, which has access to both the UI and LLMController.

Something like this would work:

self._user_data[0].ui.on_pick.registerCallback(llm_controller.on_pick)

If we ever want to scale this to N users, we would just pass the user index in the event data. For now, this does the job.
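
To make the registration pattern in this thread concrete, here is a small self-contained sketch. `Event` is a stand-in with the same `registerCallback` name used in the diff. Its `invoke` method, the `PickEventData` fields (mirroring the open/close event data shown in this PR), and `LLMControllerStub` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PickEventData:
    # Assumed fields, mirroring the open/close event data in this PR.
    object_handle: str
    object_id: int


class Event:
    """Stand-in event type: stores callbacks and calls each one on invoke."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[PickEventData], None]] = []

    def registerCallback(self, callback: Callable[[PickEventData], None]) -> None:
        self._callbacks.append(callback)

    def invoke(self, data: PickEventData) -> None:
        for callback in self._callbacks:
            callback(data)


class LLMControllerStub:
    """Hypothetical consumer that remembers which objects were picked."""

    def __init__(self) -> None:
        self.picked: List[str] = []

    def on_pick(self, e: PickEventData) -> None:
        self.picked.append(e.object_handle)


on_pick = Event()
controller = LLMControllerStub()
# Register the bound method itself; calling it here would register its return value.
on_pick.registerCallback(controller.on_pick)
on_pick.invoke(PickEventData(object_handle="cup_0", object_id=12))
```

Scaling to N users would then amount to adding a user index field to the event data, as suggested above.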

examples/hitl/rearrange_v2/rearrange_v2.py

Comment on lines +309 to +326

+                  def _on_open(self, e: UI.OpenEventData):
+                      self.ui_events.append(
+                          {
+                              "type": "open",
+                              "obj_handle": e.object_handle,
+                              "obj_id": e.object_id,
+                          }
+                      )
+                  def _on_close(self, e: UI.CloseEventData):
+                      self.ui_events.append(
+                          {
+                              "type": "close",
+                              "obj_handle": e.object_handle,
+                              "obj_id": e.object_id,
+                          }
+                      )

Contributor

zephirefaith May 20, 2024

This is irrelevant to PR, but do we need open/close for single-learn data collection?

Contributor Author

0mdc May 20, 2024

We don't need to do it right now. That will most likely change with the inclusion of object states.

examples/hitl/rearrange_v2/rearrange_v2.py

                   def on_exit(self):
                       super().on_exit()
+                      episode_success = all(

Contributor

zephirefaith May 20, 2024

episode_success here means "user thinks episode was done/success", right?

Contributor Author

0mdc May 20, 2024

In the current state, success is the only outcome.

In a following PR, I'll be adding a way for users to report either success or failure via a form.

examples/hitl/rearrange_v2/rearrange_v2.py

@@ -486,9 +589,12 @@ def sim_update(self, dt: float, post_sim_update_dict):
                       #  Collect data.
                       self._elapsed_time += dt
+                      # TODO: Always record with non-human agent.

Contributor

zephirefaith May 20, 2024

What does this mean?

Contributor Author

0mdc May 20, 2024

In single-user and multi-user modes, we can skip recording of frames when there's no user input.

With the LLM agent, we'll most likely have to record every frame.
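
The rule described here can be sketched as a single predicate. The function and parameter names are hypothetical and do not appear in the PR.

```python
def should_record_frame(has_user_input: bool, has_non_human_agent: bool) -> bool:
    """Record every frame when a non-human (e.g. LLM) agent is present;
    otherwise record only frames that contain user input."""
    return has_non_human_agent or has_user_input
```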

examples/hitl/rearrange_v2/session_recorder.py

Comment on lines +76 to +85

+                  def record_frame(
+                      self,
+                      frame_data: Dict[str, Any],
+                  ):
+                      self.data["end_timestamp"] = timestamp()
+                      self.data["frame_count"] += 1
+                      self.data["episodes"][-1]["end_timestamp"] = timestamp()
+                      self.data["episodes"][-1]["frame_count"] += 1
+                      self.data["episodes"][-1]["frames"].append(frame_data)

Contributor

zephirefaith May 20, 2024

Is this frame different from the frame you already have the FrameRecorder for? Why twice?
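
For context on the `record_frame` hunk above, this is a minimal sketch of the nested layout it implies: session-level counters plus a list of episodes, each with its own counters and frames. `SessionRecorderSketch`, `start_episode`, the `episode_id` key, and the integer `timestamp()` helper are assumptions. Only the keys touched in the hunk come from the PR.

```python
import time
from typing import Any, Dict


def timestamp() -> int:
    # Assumed helper: seconds since the epoch.
    return int(time.time())


class SessionRecorderSketch:
    def __init__(self) -> None:
        now = timestamp()
        self.data: Dict[str, Any] = {
            "start_timestamp": now,
            "end_timestamp": now,
            "frame_count": 0,
            "episodes": [],
        }

    def start_episode(self, episode_id: str) -> None:
        # Hypothetical: opens a new episode that record_frame() appends to.
        now = timestamp()
        self.data["episodes"].append(
            {
                "episode_id": episode_id,
                "start_timestamp": now,
                "end_timestamp": now,
                "frame_count": 0,
                "frames": [],
            }
        )

    def record_frame(self, frame_data: Dict[str, Any]) -> None:
        # Update both the session totals and the current (last) episode.
        now = timestamp()
        episode = self.data["episodes"][-1]
        self.data["end_timestamp"] = now
        self.data["frame_count"] += 1
        episode["end_timestamp"] = now
        episode["frame_count"] += 1
        episode["frames"].append(frame_data)
```

A frame here is an arbitrary dictionary of application state (for example the `t`, `users`, `object_states` and `agent_states` keys shown earlier in this review).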


          Merge remote-tracking branch 'origin/main' into 0mdc/hitl_data_collection

4199eea

zephirefaith approved these changes

Contributor

zephirefaith left a comment

I trust you've run this before so this works :)

Code LGTM, a couple nits.

examples/hitl/rearrange_v2/app_state_end_session.py (outdated, resolved)

examples/hitl/rearrange_v2/app_state_end_session.py

Comment on lines +98 to +99

        if os.path.exists(output_folder):
            shutil.rmtree(output_folder)

Contributor

zephirefaith May 20, 2024

Just making sure this will not delete useful data?

Contributor Author

0mdc May 20, 2024

This only contains the .json.gz file. The directory is expected to contain more data in the future (e.g. replay file, screenshots, etc).

You could however change the path in the config to any directory 🤔
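
One possible way to address the concern about a misconfigured path is to refuse deletion unless the folder sits strictly inside an expected root. This is a hedged sketch, not what the PR does. `remove_output_folder` and `allowed_root` are hypothetical names.

```python
import os
import shutil


def remove_output_folder(output_folder: str, allowed_root: str) -> bool:
    """Delete output_folder only if it lies strictly inside allowed_root.

    Returns False, and deletes nothing, when the folder is the root itself
    or lies outside it.
    """
    folder = os.path.realpath(output_folder)
    root = os.path.realpath(allowed_root)
    if folder == root or os.path.commonpath([folder, root]) != root:
        return False
    if os.path.isdir(folder):
        shutil.rmtree(folder)
    return True
```

The check resolves symlinks first, so a configured path that points outside the root through a link is also rejected.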

0mdc mentioned this pull request

HITL - Add end episode form and error reporting #1968

Merged

5 tasks


          Change path handling in session upload code.

8738d8d

0mdc merged commit 409d0c3 into main

3 of 4 checks passed

0mdc deleted the 0mdc/hitl_data_collection branch

May 20, 2024 23:04

0mdc added a commit that referenced this pull request


          HITL - Data collection (#1967)

a9a2912

* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.

dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request


          HITL - Data collection (facebookresearch#1967)

ef1aff8

* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.

dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request


          HITL - Data collection (facebookresearch#1967)

27d2109

* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.

dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request


          HITL - Data collection (facebookresearch#1967)

966a082

* Add session management.

* Formatting changes.

* Add clarifications to episode resolution.

* Document temporary hack to check for client-side loading status.

* Add session recorder, ui events and data upload.

* Change path handling in session upload code.
