
[Proposal] Observation Histories #1208

Closed
1 of 3 tasks
KyleM73 opened this issue Oct 10, 2024 · 15 comments · Fixed by #1439
Assignees
Labels
enhancement New feature or request

Comments

@KyleM73
Contributor

KyleM73 commented Oct 10, 2024

Proposal

Many methods use observation histories (e.g. RMA/student-teacher methods, Async ActorCritic, etc.), but histories are not currently supported by default. It is easy to write a wrapper class that stores histories at the per-observation-term level, but there doesn't appear to be a way to store histories at the observation group level. Sometimes, e.g. for images, it may be desirable to stack the history per individual observation term, but other times it may be preferable to stack the history at the group level (which would require a change to the manager). We had an implementation of this at BDAI. Is this something that is already being worked on? If not, I would be happy to take it on, but I didn't want to start on a PR if someone else is already doing so.

Alternatives

  • Using the Observation Term class functionality to implement histories on a per-term basis, e.g. the minimal implementation below, which stacks obs terms A B C as AAABBBCCC for history=3 (a usage sketch follows after this list).
import torch

# assumed import paths (omni.isaac.lab namespace at the time of writing)
from omni.isaac.lab.envs import ManagerBasedRLEnvCfg
from omni.isaac.lab.managers import ManagerTermBase, ObservationTermCfg


class HistoryObsWrapper(ManagerTermBase):
    def __init__(self, env: ManagerBasedRLEnvCfg, cfg: ObservationTermCfg) -> None:
        super().__init__(cfg, env)
        self.func = self.cfg.params["func"]
        if isinstance(self.func, ManagerTermBase):
            self.func = self.func(env, cfg)
        self.obs_len = self.func(env, **self.cfg.params["func_params"]).size(-1)   # TODO generalize this to multiD obs
        self.history_len = self.cfg.params["history_len"]
        # history buffer: [num_envs, history_len, obs_len]
        self.data = torch.zeros(self.num_envs, self.history_len, self.obs_len, device=self.device)
        self.scale = self.cfg.params["scale"]

    def reset(self, env_ids: torch.Tensor | None = None) -> None:
        if isinstance(self.func, ManagerTermBase):
            self.func.reset(env_ids)
        # fill the reset envs' history with the first observation of the new episode
        self.data[env_ids] = self.func(self._env, **self.cfg.params["func_params"])[env_ids].view(-1, 1, self.obs_len).expand(-1, self.history_len, -1)

    def __call__(self, env, func, history_len, func_params, scale) -> torch.Tensor:
        # newest observation goes into slot 0; output is flattened to [num_envs, history_len * obs_len]
        self.data = self.data.roll(1, dims=1)
        self.data[:, 0] = self.func(self._env, **self.cfg.params["func_params"])
        return self.scale * self.data.clone().flatten(start_dim=1)
  • Using the Observation Group, perhaps with a post_init flag, to stack whole observations as a history. This stacks obs terms A B C as ABCABCABC for history=3.
  • Histories could also be handled at the RL framework/policy level. It is my belief this would be better handled by Isaac Lab directly, but it is an option to just require frameworks to deal with it themselves, e.g. the storage buffer in rsl_rl.
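As promised above, here is a sketch of how the HistoryObsWrapper from the first alternative might be wired into an observation group. The group/term names, the choice of mdp.joint_vel_rel, and the parameter values are illustrative assumptions, not a prescribed API; the params dict simply mirrors what the wrapper reads from cfg.params.

from omni.isaac.lab.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.lab.managers import ObservationTermCfg as ObsTerm
from omni.isaac.lab.utils import configclass
import omni.isaac.lab.envs.mdp as mdp


@configclass
class PolicyObsCfg(ObsGroup):
    # HistoryObsWrapper is the class defined above
    joint_vel_history = ObsTerm(
        func=HistoryObsWrapper,
        params={
            "func": mdp.joint_vel_rel,
            "history_len": 3,
            "func_params": {},
            "scale": 1.0,
        },
    )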

Checklist

  • I have checked that there is no similar issue in the repo

Acceptance Criteria


  • Option to stack histories at a per-term level
  • Option to stack histories at a per-group level
@mpgussert
Collaborator

mpgussert commented Oct 10, 2024

Hello @KyleM73

To my knowledge our sensor base class is designed to support recording history https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/sensor_base.py

Check out the contact sensor, for example.

Does this meet your needs? If not, can it be built on the current sensor history API?

@KyleM73
Contributor Author

KyleM73 commented Oct 10, 2024

For observations that just read sensor data, that would satisfy the per-term history, but not the per-group history. In addition, not all observations come from sensors, for example the last action (which currently only supports a single step) or generated commands. Maybe this is a larger question of "where should histories be stored", but unless the design decision is to implement a corresponding sensor for each observation type, it seems to me that there should be a more general way to track observation histories, i.e. separate from the sensor history API. If the desired solution is to always do it at a per-term level, then I can PR a cleaned-up version of the code I provided above to the core isaac.lab MDP. But I do think there is value in supporting histories at a per-group level. I would love to hear feedback either way, for per-term and/or per-group histories.

@KyleM73
Contributor Author

KyleM73 commented Oct 10, 2024

One additional consideration for dealing with this at the manager level, as opposed to the observation term level, is adding noise for domain randomization. Using the code I provided above, independent (and therefore different) noise will be added to each step of the history every time the manager queries the term. This is not realistic: the noise observed at time step k should be the same noise seen again when step k is read back as history at step k+1. This doesn't make a huge difference in practice, as the policy learns to be robust to the noise anyway, but it is wrong.

@Mayankm96
Contributor

Hi @KyleM73,

I agree. This is a feature we should have. There are many different ways you can do this (depending on the end goal):

  • Making an environment wrapper that does observation stacking, similar to FrameStack in the Gymnasium API (see the sketch after this list)
  • Adding history storage as a modifier functionality. This records the history per term (which may not be the most efficient, but will be the most generic). I noticed that, right now, the modifiers act "before" any clipping/noisifying. However, we can change this ordering, since doing it afterward makes more sense.
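For the first option, a minimal frame-stacking environment wrapper could look roughly like the sketch below. This is not Isaac Lab code: it is written against the generic gymnasium.ObservationWrapper interface, assumes a flat Box observation space, and the HistoryWrapper name is illustrative.

from collections import deque

import gymnasium as gym
import numpy as np


class HistoryWrapper(gym.ObservationWrapper):
    """Stack the last `history_len` observations along the last dimension."""

    def __init__(self, env: gym.Env, history_len: int = 3):
        super().__init__(env)
        self.history_len = history_len
        self._frames: deque = deque(maxlen=history_len)
        # widen the observation space to hold the stacked history
        low = np.concatenate([env.observation_space.low] * history_len, axis=-1)
        high = np.concatenate([env.observation_space.high] * history_len, axis=-1)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=env.observation_space.dtype)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # pre-fill the buffer with the first observation of the episode
        self._frames.clear()
        for _ in range(self.history_len):
            self._frames.append(obs)
        return np.concatenate(list(self._frames), axis=-1), info

    def observation(self, obs):
        self._frames.append(obs)
        return np.concatenate(list(self._frames), axis=-1)

Gymnasium's built-in frame-stacking wrapper does essentially the same thing; the sketch is only meant to show where the buffering and the observation-space change would live in the env-wrapper option.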

@jtigue-bdai will be able to provide better guidance here

@Mayankm96 added the enhancement (New feature or request) label on Oct 11, 2024
@KyleM73
Contributor Author

KyleM73 commented Oct 11, 2024

Thrilled to work with @jtigue-bdai again :)

Some additional thoughts: the best-case scenario allows both obs terms and obs groups to be configured. For example, maybe a history of joint velocities is desired (e.g. to estimate acceleration if it isn't available on a given robot), but only the most recent command. In that case, the implementation would need to work at the per-term level. Alternatively, for methods like RMA or concurrent estimation, the policy and estimator, respectively, receive obs group histories, and handling that at the group level would be more efficient than doing it per term. I personally think both per-term and per-group should be supported. We could in theory handle this as an env wrapper with a config specifying how many time steps to record for each obs term and group separately (roughly as sketched below). This would be the most efficient, but it also feels somewhat hacky and not in line with the design of the rest of the manager-based framework, although I'm happy to implement it however. Looking forward to thoughts!
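To make the previous paragraph concrete, a combined per-term/per-group history specification might look something like the following. This is purely illustrative: the group and term names and every key here are hypothetical and do not correspond to an existing Isaac Lab option.

# hypothetical history specification (illustrative names only)
history_cfg = {
    "policy": {
        "group_history_len": 1,   # keep only the most recent full group
        "terms": {
            "joint_vel": 5,       # but keep 5 steps of joint velocities (per-term)
            "commands": 1,        # and only the latest command
        },
    },
    "estimator": {
        "group_history_len": 3,   # stack the whole group 3 times (ABCABC... ordering)
        "terms": {},
    },
}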

@diracdelta7

(Quoting @Mayankm96's comment above on the environment-wrapper and modifier options.)

When using the modifier to record observation history, ObservationManager encounters difficulty in determining the correct dimensions for the associated observation term.

In ObservationManager, the following code attempts to call the observation function once to establish the dimensions for each term:

 # call function the first time to fill up dimensions
obs_dims = tuple(term_cfg.func(self._env, **term_cfg.params).shape)
self._group_obs_term_dim[group_name].append(obs_dims[1:])

This approach works when observations maintain fixed dimensions throughout processing. However, the modifier changes the dimensions by appending historical data, which the ObservationManager is not aware of. This results in a dimension mismatch.

Would this discrepancy pose any issues during training or evaluation? From my experience, it appears to only produce incorrect [INFO] Observation Manager dimension messages when starting training with rsl_rl. However, the neural network dimensions seem correct, and training proceeds without any errors.

@KyleM73
Contributor Author

KyleM73 commented Nov 2, 2024

When you say the history-tracking modifier, what are you referring to? I do not believe there is currently a modifier for tracking history, although we could use that interface to do so. In the example code I provided above, the issue you mention is avoided because the observation buffer already starts out at the correct size; at the start of each episode the buffer is either mostly filled with zeros or set uniformly to the first encountered value until enough steps have elapsed to overwrite every element. As long as the output is always of fixed size (either [num_envs, obs_length*H] or [num_envs, H, obs_length]), the manager should have no trouble picking it up during init. Is this what you're referring to, or have I misunderstood?

@diracdelta7

Thank you for the clarification; I understand now where my approach went wrong.

In my initial approach, I appended a modifier after the func member. This modifier collects historical data into its own buffer, then outputs this buffer as a new observation to other modifiers or noise handlers, and finally as the observation itself. Here’s the code I used:

from collections.abc import Sequence

import torch

from omni.isaac.lab.utils.modifiers import ModifierBase  # assumed import path
# modifier_cfg is a custom module; HistoryBufferCfg is a custom config with a `history_length` field


class HistoryBuffer(ModifierBase):
    def __init__(self, cfg: modifier_cfg.HistoryBufferCfg, data_dim: tuple[int, ...], device: str) -> None:
        super().__init__(cfg, data_dim, device)
        # flat buffer: the last dimension holds `history_length` copies of the observation
        self.history_buffer = torch.zeros((*data_dim[:-1], data_dim[-1] * self._cfg.history_length), device=self._device)

    def reset(self, env_ids: Sequence[int] | None = None):
        if env_ids is None:
            env_ids = slice(None)
        self.history_buffer[env_ids] = 0.0

    def __call__(self, data: torch.Tensor) -> torch.Tensor:
        # shift the buffer left by one observation and append the newest data at the end
        D = self._data_dim[-1]
        self.history_buffer = self.history_buffer.roll(shifts=-D, dims=-1)
        self.history_buffer[..., -D:] = data
        return self.history_buffer

Then, I set up an observation term like this:

joint_vel_history = ObsTerm(
    # This mdp.joint_vel_rel is from IsaacLab,
    # and ObservationManager uses it to determine the observation dimension.
    # However, this creates issues since the modifier changes the dimension.
    func=mdp.joint_vel_rel, 
    modifiers=[modifier_cfg.HistoryBufferCfg(history_length=2)],
    ...
)

Correct Approach Based on Your Code

The issue here is that ObservationManager calculates the observation dimension based on the original mdp.joint_vel_rel function, which doesn’t account for the modified dimensions introduced by HistoryBuffer.

The correct approach is to modify the function itself to handle historical data, avoiding the need for an additional modifier:

class HistoryObsWrapper(ManagerTermBase):
    # Here, should ManagerBasedRLEnvCfg be ManagerBasedEnv?
    # As in ManagerTermBase, it defines:
    # def __init__(self, cfg: ManagerTermBaseCfg, env: ManagerBasedEnv):
    def __init__(self, env: ManagerBasedRLEnvCfg, cfg: ObservationTermCfg) -> None:
        super().__init__(cfg, env)
        self.func = self.cfg.params["func"]  # For this, we can use mdp.joint_vel_rel from IsaacLab
    # Rest of your code

joint_vel_history = ObsTerm(
    # Use the HistoryObsWrapper as the `func` member to handle historical data,
    # allowing ObservationManager to determine the correct dimensions.
    func=HistoryObsWrapper,
    ...
)

Is this understanding correct?

@jtigue-bdai
Collaborator

Hey sorry I have been silent on here. Been preoccupied.

Modifiers do not allow you to change the output size of an observation, so a Modifier cannot (in its current formulation) be used to return history. It can, however, be used to store history internally for use in delays, filters, etc. (see the sketch below).
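For example, a delay modifier along these lines stores a short history internally while still returning an output with the same shape as its input. This is a minimal sketch assuming the ModifierBase interface used earlier in the thread; DelayModifier, its cfg.delay field, and the assumed data_dim layout of (num_envs, obs_dim) are all illustrative, not an existing Isaac Lab modifier.

import torch

from omni.isaac.lab.utils.modifiers import ModifierBase  # assumed import path


class DelayModifier(ModifierBase):
    """Return the observation as it looked `delay` control steps ago."""

    def __init__(self, cfg, data_dim: tuple[int, ...], device: str) -> None:
        super().__init__(cfg, data_dim, device)
        self._delay = cfg.delay  # hypothetical config field
        # ring buffer of the last `delay + 1` raw observations;
        # data_dim is assumed to be (num_envs, obs_dim), matching the usage above
        self._buffer = torch.zeros((self._delay + 1, *data_dim), device=device)

    def reset(self, env_ids=None):
        if env_ids is None:
            env_ids = slice(None)
        self._buffer[:, env_ids] = 0.0

    def __call__(self, data: torch.Tensor) -> torch.Tensor:
        # newest observation goes into slot 0, the oldest one falls off the end
        self._buffer = self._buffer.roll(shifts=1, dims=0)
        self._buffer[0] = data
        # output has the same shape as the input, just `delay` steps old
        return self._buffer[-1].clone()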

In order to consistently handle history there are a few things to consider:

  1. Utilize a History Observation Callable class implementation. (@diracdelta7 you are on the right track but see is_terminated_term in envs/mdp/rewards.py)
  2. Add history in the ObservationManager, potentially separated by ObservationGroup, for managing history across many observations.
  3. Observations run at the control/decimation rate, so history can only be recorded at that rate. If you need a higher rate of history, this can be done in the sensors (see contact_sensor and the sketch below).
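For completeness, recording history at the sensor rate looks roughly like the sketch below, assuming the current ContactSensorCfg options; the prim path and the sensor name are illustrative.

from omni.isaac.lab.sensors import ContactSensorCfg

# sensor-level history: buffered at the sensor update rate rather than the policy rate
contact_forces = ContactSensorCfg(
    prim_path="{ENV_REGEX_NS}/Robot/.*_FOOT",  # illustrative prim path
    history_length=6,        # keep the last 6 sensor updates
    update_period=0.0,       # update every physics step
)

# An observation term can then read the buffered data, e.g. via a custom function
# that returns env.scene.sensors["contact_forces"].data.net_forces_w_history.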

@KyleM73
Contributor Author

KyleM73 commented Nov 4, 2024

No worries, thanks for looking at it James!

As far as a PR would go, I think for sensor-based histories nothing needs to change: people can implement custom sensors if they need history tracked at greater than the policy rate. Similarly, for tracking at the observation-term level, the wrapper class I provided above (with some cleanup perhaps) could be added to isaac.lab.envs.mdp for people to wrap their obs in. The main new code needed would be for group-level obs histories, which would need to be handled by the manager, as you mentioned. I can start work on that soon. I would additionally be interested in writing documentation on how to use all three.

For the next 3 weeks I'm somewhat out of capacity due to a paper deadline, but post-Thanksgiving this is a high priority for me, if you'd be willing to advise it. After Thanksgiving I'll open a draft PR to work from. Does that all sound reasonable?

Thanks again!

@jtigue-bdai
Collaborator

@KyleM73 that sounds great, happy to advise and help get things going. Let me know if you need anything and good luck with the paper deadline.

@jtigue-bdai self-assigned this on Nov 4, 2024
@amrmousa144
Contributor

Any update on this topic?

@jtigue-bdai
Collaborator

Hey @KyleM73, a user here at the Institute has implemented an Observation History functionality in parallel to what we have discussed. It's a bit of a blend of the observation-term wrapper class and the observation-manager-level implementations. In an effort to reduce duplication, I think it would be good to move it up to a PR on the open-source repo. Then we can make sure it satisfies all use cases.

@KyleM73
Contributor Author

KyleM73 commented Nov 18, 2024

Totally. I know one of the other HALO interns had a partial implementation as well; do you want to send the PR? For any missing functionality I'm happy to add it / write documentation / etc., but no need to wait for me if there's already a working implementation. As long as both use cases (per-term and per-group) are supported, I'm happy. Thanks!

@jtigue-bdai
Collaborator

@KyleM73 I have linked the PR.
