
[Proposal] Observation Histories #1208

Closed
1 of 3 tasks
KyleM73 opened this issue Oct 10, 2024 · 15 comments · Fixed by #1439
Assignees
Labels
enhancement New feature or request

Comments

@KyleM73
Contributor

KyleM73 commented Oct 10, 2024

Proposal

Many methods use observation histories (e.g. RMA/student-teacher methods, Async ActorCritic, etc.), but histories are not currently supported by default. It is easy to write a wrapper class that stores histories at the per-observation-term level, but there doesn't appear to be a way to store histories at the observation group level. Sometimes, e.g. for images, it may be desirable to stack the history per individual observation term, but other times it may be preferable to stack the history at the group level (which would require a change to the manager). We had an implementation of this at BDAI. Is this something that is already being worked on? If not, I would be happy to take it on, but I didn't want to start on a PR if someone else is already doing so.

Alternatives

  • Using the Observation Term class functionality to implement histories on a per-term basis, e.g. the minimal implementation below, which stacks obs terms A B C as AAABBBCCC for history=3 (a usage sketch follows after this list).
import torch

# assumed import paths (omni.isaac.lab namespace at the time of writing)
from omni.isaac.lab.envs import ManagerBasedRLEnvCfg
from omni.isaac.lab.managers import ManagerTermBase, ObservationTermCfg


class HistoryObsWrapper(ManagerTermBase):
    def __init__(self, env: ManagerBasedRLEnvCfg, cfg: ObservationTermCfg) -> None:
        super().__init__(cfg, env)
        self.func = self.cfg.params["func"]
        if isinstance(self.func, ManagerTermBase):
            self.func = self.func(env, cfg)
        self.obs_len = self.func(env, **self.cfg.params["func_params"]).size(-1)   # TODO generalize this to multiD obs
        self.history_len = self.cfg.params["history_len"]
        # history buffer: [num_envs, history_len, obs_len]
        self.data = torch.zeros(self.num_envs, self.history_len, self.obs_len, device=self.device)
        self.scale = self.cfg.params["scale"]

    def reset(self, env_ids: torch.Tensor | None = None) -> None:
        if isinstance(self.func, ManagerTermBase):
            self.func.reset(env_ids)
        # fill the reset envs' history with the first observation of the new episode
        self.data[env_ids] = self.func(self._env, **self.cfg.params["func_params"])[env_ids].view(-1, 1, self.obs_len).expand(-1, self.history_len, -1)

    def __call__(self, env, func, history_len, func_params, scale) -> torch.Tensor:
        # newest observation goes into slot 0; output is flattened to [num_envs, history_len * obs_len]
        self.data = self.data.roll(1, dims=1)
        self.data[:, 0] = self.func(self._env, **self.cfg.params["func_params"])
        return self.scale * self.data.clone().flatten(start_dim=1)
  • Using the Observation Group, perhaps with a post_init flag, to stack whole observations as a history. This stacks obs terms A B C as ABCABCABC for history=3.
  • Histories could also be handled at the RL framework/policy level. It is my belief this would be better handled by Isaac Lab directly, but it is an option to just require frameworks to deal with it themselves, e.g. the storage buffer in rsl_rl.
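As promised above, here is a sketch of how the HistoryObsWrapper from the first alternative might be wired into an observation group. The group/term names, the choice of mdp.joint_vel_rel, and the parameter values are illustrative assumptions, not a prescribed API; the params dict simply mirrors what the wrapper reads from cfg.params.

from omni.isaac.lab.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.lab.managers import ObservationTermCfg as ObsTerm
from omni.isaac.lab.utils import configclass
import omni.isaac.lab.envs.mdp as mdp


@configclass
class PolicyObsCfg(ObsGroup):
    # HistoryObsWrapper is the class defined above
    joint_vel_history = ObsTerm(
        func=HistoryObsWrapper,
        params={
            "func": mdp.joint_vel_rel,
            "history_len": 3,
            "func_params": {},
            "scale": 1.0,
        },
    )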

Checklist

  • I have checked that there is no similar issue in the repo

Acceptance Criteria


  • Option to stack histories at a per-term level
  • Option to stack histories at a per-group level
@mpgussert
Collaborator

mpgussert commented Oct 10, 2024

Hello @KyleM73

To my knowledge our sensor base class is designed to support recording history https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/sensor_base.py

Check out the contact sensor, for example.

Does this meet your needs? If not, can it be built on the current sensor history API?

@KyleM73
Contributor Author

KyleM73 commented Oct 10, 2024

For observations that just read sensor data, that would satisfy the per-term history, but not the per-group history. In addition, not all observations come from sensors, for example the last action (which currently only supports a single step) or generated commands. Maybe this is a larger question of "where should histories be stored", but unless the design decision is to implement a corresponding sensor for each observation type, it seems to me that there should be a more general way to track observation histories, i.e. separate from the sensor history API. If the desired solution is to always do it at a per-term level, then I can PR a cleaned-up version of the code I provided above to the core isaac.lab MDP. But I do think there is value in supporting histories at a per-group level. I would love to hear feedback either way, for per-term and/or per-group histories.

@KyleM73
Contributor Author

KyleM73 commented Oct 10, 2024

One additional consideration for dealing with this at the manager level, as opposed to the observation term level, is adding noise for domain randomization. Using the code I provided above, independent (and therefore different) noise will be added to each step of the history every time the manager queries the term. This is not realistic: the noise observed at time step k should be the same noise seen again when step k is read back as history at step k+1. This doesn't make a huge difference in practice, as the policy learns to be robust to the noise anyway, but it is wrong.

@Mayankm96
Contributor

Hi @KyleM73,

I agree. This is a feature we should have. There are many different ways you can do this (depending on the end goal):

  • Making an environment wrapper that does observation stacking, similar to FrameStack in the Gymnasium API (see the sketch after this list)
  • Adding history storage as a modifier functionality. This records the history per term (which may not be the most efficient, but will be the most generic). I noticed that, right now, the modifiers act "before" any clipping/noisifying. However, we can change this ordering, since doing it afterward makes more sense.
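For the first option, a minimal frame-stacking environment wrapper could look roughly like the sketch below. This is not Isaac Lab code: it is written against the generic gymnasium.ObservationWrapper interface, assumes a flat Box observation space, and the HistoryWrapper name is illustrative.

from collections import deque

import gymnasium as gym
import numpy as np


class HistoryWrapper(gym.ObservationWrapper):
    """Stack the last `history_len` observations along the last dimension."""

    def __init__(self, env: gym.Env, history_len: int = 3):
        super().__init__(env)
        self.history_len = history_len
        self._frames: deque = deque(maxlen=history_len)
        # widen the observation space to hold the stacked history
        low = np.concatenate([env.observation_space.low] * history_len, axis=-1)
        high = np.concatenate([env.observation_space.high] * history_len, axis=-1)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=env.observation_space.dtype)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # pre-fill the buffer with the first observation of the episode
        self._frames.clear()
        for _ in range(self.history_len):
            self._frames.append(obs)
        return np.concatenate(list(self._frames), axis=-1), info

    def observation(self, obs):
        self._frames.append(obs)
        return np.concatenate(list(self._frames), axis=-1)

Gymnasium's built-in frame-stacking wrapper does essentially the same thing; the sketch is only meant to show where the buffering and the observation-space change would live in the env-wrapper option.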

@jtigue-bdai will be able to provide better guidance here

@Mayankm96 added the enhancement (New feature or request) label on Oct 11, 2024
@KyleM73
Contributor Author

KyleM73 commented Oct 11, 2024

Thrilled to work with @jtigue-bdai again :)

Some additional thoughts: the best-case scenario allows both obs terms and obs groups to be configured. For example, maybe a history of joint velocities is desired (e.g. to estimate acceleration if it isn't available on a given robot), but only the most recent command. In that case, the implementation would need to work at the per-term level. Alternatively, for methods like RMA or concurrent estimation, the policy and estimator, respectively, receive obs group histories, and handling that at the group level would be more efficient than doing it per term. I personally think both per-term and per-group should be supported. We could in theory handle this as an env wrapper with a config specifying how many time steps to record for each obs term and group separately (roughly as sketched below). This would be the most efficient, but it also feels somewhat hacky and not in line with the design of the rest of the manager-based framework, although I'm happy to implement it however. Looking forward to thoughts!
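To make the previous paragraph concrete, a combined per-term/per-group history specification might look something like the following. This is purely illustrative: the group and term names and every key here are hypothetical and do not correspond to an existing Isaac Lab option.

# hypothetical history specification (illustrative names only)
history_cfg = {
    "policy": {
        "group_history_len": 1,   # keep only the most recent full group
        "terms": {
            "joint_vel": 5,       # but keep 5 steps of joint velocities (per-term)
            "commands": 1,        # and only the latest command
        },
    },
    "estimator": {
        "group_history_len": 3,   # stack the whole group 3 times (ABCABC... ordering)
        "terms": {},
    },
}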

@diracdelta7

(Quoting @Mayankm96's comment above on the environment-wrapper and modifier options.)

When using the modifier to record observation history, ObservationManager encounters difficulty in determining the correct dimensions for the associated observation term.

In ObservationManager, the following code attempts to call the observation function once to establish the dimensions for each term:

 # call function the first time to fill up dimensions
obs_dims = tuple(term_cfg.func(self._env, **term_cfg.params).shape)
self._group_obs_term_dim[group_name].append(obs_dims[1:])

This approach works when observations maintain fixed dimensions throughout processing. However, the modifier changes the dimensions by appending historical data, which the ObservationManager is not aware of. This results in a dimension mismatch.

Would this discrepancy pose any issues during training or evaluation? From my experience, it appears to only produce incorrect [INFO] Observation Manager dimension messages when starting training with rsl_rl. However, the neural network dimensions seem correct, and training proceeds without any errors.

@KyleM73
Contributor Author

KyleM73 commented Nov 2, 2024

When you say the history-tracking modifier, what are you referring to? I do not believe there is currently a modifier for tracking history, although we could use that interface to do so. In the example code I provided above, the issue you mention is avoided because the observation buffer already starts out at the correct size; at the start of each episode the buffer is either mostly filled with zeros or set uniformly to the first encountered value until enough steps have elapsed to overwrite every element. As long as the output is always of fixed size (either [num_envs, obs_length*H] or [num_envs, H, obs_length]), the manager should have no trouble picking it up during init. Is this what you're referring to, or have I misunderstood?

@diracdelta7

Thank you for the clarification; I understand now where my approach went wrong.

In my initial approach, I appended a modifier after the func member. This modifier collects historical data into its own buffer, then outputs this buffer as a new observation to other modifiers or noise handlers, and finally as the observation itself. Here’s the code I used:

from collections.abc import Sequence

import torch

from omni.isaac.lab.utils.modifiers import ModifierBase  # assumed import path
# modifier_cfg is a custom module; HistoryBufferCfg is a custom config with a `history_length` field


class HistoryBuffer(ModifierBase):
    def __init__(self, cfg: modifier_cfg.HistoryBufferCfg, data_dim: tuple[int, ...], device: str) -> None:
        super().__init__(cfg, data_dim, device)
        # flat buffer: the last dimension holds `history_length` copies of the observation
        self.history_buffer = torch.zeros((*data_dim[:-1], data_dim[-1] * self._cfg.history_length), device=self._device)

    def reset(self, env_ids: Sequence[int] | None = None):
        if env_ids is None:
            env_ids = slice(None)
        self.history_buffer[env_ids] = 0.0

    def __call__(self, data: torch.Tensor) -> torch.Tensor:
        # shift the buffer left by one observation and append the newest data at the end
        D = self._data_dim[-1]
        self.history_buffer = self.history_buffer.roll(shifts=-D, dims=-1)
        self.history_buffer[..., -D:] = data
        return self.history_buffer

Then, I set up an observation term like this:

joint_vel_history = ObsTerm(
    # This mdp.joint_vel_rel is from IsaacLab,
    # and ObservationManager uses it to determine the observation dimension.
    # However, this creates issues since the modifier changes the dimension.
    func=mdp.joint_vel_rel, 
    modifiers=[modifier_cfg.HistoryBufferCfg(history_length=2)],
    ...
)

Correct Approach Based on Your Code

The issue here is that ObservationManager calculates the observation dimension based on the original mdp.joint_vel_rel function, which doesn’t account for the modified dimensions introduced by HistoryBuffer.

The correct approach is to modify the function itself to handle historical data, avoiding the need for an additional modifier:

class HistoryObsWrapper(ManagerTermBase):
    # Here, should ManagerBasedRLEnvCfg be ManagerBasedEnv?
    # As in ManagerTermBase, it defines:
    # def __init__(self, cfg: ManagerTermBaseCfg, env: ManagerBasedEnv):
    def __init__(self, env: ManagerBasedRLEnvCfg, cfg: ObservationTermCfg) -> None:
        super().__init__(cfg, env)
        self.func = self.cfg.params["func"]  # For this, we can use mdp.joint_vel_rel from IsaacLab
    # Rest of your code

joint_vel_history = ObsTerm(
    # Use the HistoryObsWrapper as the `func` member to handle historical data,
    # allowing ObservationManager to determine the correct dimensions.
    func=HistoryObsWrapper,
    ...
)

Is this understanding correct?

@jtigue-bdai
Collaborator

Hey sorry I have been silent on here. Been preoccupied.

Modifiers do not allow you to change the output size of an observation, so a Modifier cannot (in its current formulation) be used to return history. It can, however, be used to store history internally for use in delays, filters, etc. (see the sketch below).
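For example, a delay modifier along these lines stores a short history internally while still returning an output with the same shape as its input. This is a minimal sketch assuming the ModifierBase interface used earlier in the thread; DelayModifier, its cfg.delay field, and the assumed data_dim layout of (num_envs, obs_dim) are all illustrative, not an existing Isaac Lab modifier.

import torch

from omni.isaac.lab.utils.modifiers import ModifierBase  # assumed import path


class DelayModifier(ModifierBase):
    """Return the observation as it looked `delay` control steps ago."""

    def __init__(self, cfg, data_dim: tuple[int, ...], device: str) -> None:
        super().__init__(cfg, data_dim, device)
        self._delay = cfg.delay  # hypothetical config field
        # ring buffer of the last `delay + 1` raw observations;
        # data_dim is assumed to be (num_envs, obs_dim), matching the usage above
        self._buffer = torch.zeros((self._delay + 1, *data_dim), device=device)

    def reset(self, env_ids=None):
        if env_ids is None:
            env_ids = slice(None)
        self._buffer[:, env_ids] = 0.0

    def __call__(self, data: torch.Tensor) -> torch.Tensor:
        # newest observation goes into slot 0, the oldest one falls off the end
        self._buffer = self._buffer.roll(shifts=1, dims=0)
        self._buffer[0] = data
        # output has the same shape as the input, just `delay` steps old
        return self._buffer[-1].clone()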

In order to consistently handle history there are a few things to consider:

  1. Utilize a History Observation Callable class implementation. (@diracdelta7 you are on the right track but see is_terminated_term in envs/mdp/rewards.py)
  2. Add history in the ObservationManager, potentially separated by ObservationGroup, for managing history across many observations.
  3. Observations run at the control/decimation rate, so history can only be recorded at that rate. If you need a higher rate of history, this can be done in the sensors (see contact_sensor and the sketch below).
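For completeness, recording history at the sensor rate looks roughly like the sketch below, assuming the current ContactSensorCfg options; the prim path and the sensor name are illustrative.

from omni.isaac.lab.sensors import ContactSensorCfg

# sensor-level history: buffered at the sensor update rate rather than the policy rate
contact_forces = ContactSensorCfg(
    prim_path="{ENV_REGEX_NS}/Robot/.*_FOOT",  # illustrative prim path
    history_length=6,        # keep the last 6 sensor updates
    update_period=0.0,       # update every physics step
)

# An observation term can then read the buffered data, e.g. via a custom function
# that returns env.scene.sensors["contact_forces"].data.net_forces_w_history.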

@KyleM73
Contributor Author

KyleM73 commented Nov 4, 2024

No worries, thanks for looking at it James!

As far as a PR would go, I think for sensor-based histories nothing needs to change: people can implement custom sensors if they need history tracked at greater than the policy rate. Similarly, for tracking at the observation-term level, the wrapper class I provided above (with some cleanup perhaps) could be added to isaac.lab.envs.mdp for people to wrap their obs in. The main new code needed would be for group-level obs histories, which would need to be handled by the manager, as you mentioned. I can start work on that soon. I would additionally be interested in writing documentation on how to use all three.

For the next 3 weeks I'm somewhat out of capacity due to a paper deadline, but post-Thanksgiving this is a high priority for me, if you'd be willing to advise it. After Thanksgiving I'll open a draft PR to work from. Does that all sound reasonable?

Thanks again!

@jtigue-bdai
Collaborator

@KyleM73 that sounds great, happy to advise and help get things going. Let me know if you need anything and good luck with the paper deadline.

@jtigue-bdai self-assigned this on Nov 4, 2024
@amrmousa144
Contributor

Any update on this topic?

@jtigue-bdai
Collaborator

Hey @KyleM73, a user here at the Institute has implemented an Observation History functionality in parallel to what we have discussed. It's a bit of a blend of the observation-term wrapper class and the observation-manager-level implementations. In an effort to reduce duplication, I think it would be good to move it up to a PR on the open-source repo. Then we can make sure it satisfies all use cases.

@KyleM73
Contributor Author

KyleM73 commented Nov 18, 2024

Totally. I know one of the other HALO interns had a partial implementation as well; do you want to send the PR? For any missing functionality I'm happy to add it / write documentation / etc., but no need to wait for me if there's already a working implementation. As long as both use cases (per-term and per-group) are supported, I'm happy. Thanks!

@jtigue-bdai
Collaborator

@KyleM73 I have linked the PR.
