-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Proposal] Observation Histories #1208
Comments
Hello @KyleM73 To my knowledge our sensor base class is designed to support recording history https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/sensor_base.py checkout the contact sensor, for example IsaacLab/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/contact_sensor/contact_sensor.py Line 291 in cb9fee6
Does this meet your needs? If not, can it be built on the current sensor history API? |
For observations that are just reading sensors that would satisfy the per-term history, but not the per-group history. In addition, not all observations come from reading sensor data, for example last action (currently only supports a single step) or generated commands. Maybe this is a larger question of "where should histories be stored", but unless the design decision is to implement a corresponding sensor for each observation type, it seems to me like there should be a way to track observation histories more generally i.e. separate from the sensor history API. If the desired solution is to always do it at a per-term level then I can PR a cleaned-up version of the code I provided above to the core isaac.lab MDP. But I do think there is value in supporting histories at a per-group level. I would love to hear feedback either way, for per-term and/or per-group histories. |
One additional consideration for dealing with this at the manager level as opposed to the observation term level is when adding noise for DR. Using the code I provided above, independent and DIFFERENT noise will be added to the history at each time step by the manager, but this is not realistic, as the noise added at time step k should be the same noise observed for the previous time step when the current step is k+1. This doesn't make a huge difference in practice as the policy learns to be robust to the noise anyways, but it is wrong. |
Hi @KyleM73, I agree. This is a feature we should have. There are many different ways you can do this (depending on the end goal):
@jtigue-bdai will be able to provide better guidance here |
Thrilled to work with @jtigue-bdai again :) Some additional thoughts: The best case scenario allows for both obs terms and obs groups to be configured. For example, maybe a history of joint velocities is desired (eg to estimate acceleration if it isn't available on a given robot), but only the most recent command. In that case, the implementation would need to function at the per-term level. Alternatively, for methods like RMA or concurrent estimation, the policy and estimator, respectively, receive obs group histories, and doing so would be more efficient than handling it at the per-term level. I personally think both per-term and per-group should be supported. We could in theory handle this as an env wrapper with a config specifying how many time steps to record for each obs term and group separately. This would be the most efficient but also feels somewhat hacky and not in line with the design of the rest of the manager-based framework, although I'm happy to implement it however. Looking forward to thoughts! |
When using the modifier to record observation history, In ObservationManager, the following code attempts to call the observation function once to establish the dimensions for each term: # call function the first time to fill up dimensions
obs_dims = tuple(term_cfg.func(self._env, **term_cfg.params).shape)
self._group_obs_term_dim[group_name].append(obs_dims[1:]) This approach works when observations maintain fixed dimensions throughout the processing. However, the modifier changes the dimensions by appending additional historical data, which Would this discrepancy pose any issues during training or evaluation? From my experience, it appears to only produce incorrect [INFO] Observation Manager dimension messages when starting training with rsl_rl. However, the neural network dimensions seem correct, and training proceeds without any errors. |
When you say the history tracking modifier, what are you referring to? I do not believe there is currently a modifier for tracking history, although we could use that interface to do so. In the example code I provided above, the issue you mention is avoided because the observation buffer already starts out at the correct size, but at the start of each episode the buffer is either mostly full of zeros or set to the uniform value of the first encountered value until enough steps have elapsed to overwrite each element in the buffer. So long as the output is always of fixed size (either [num_envs, obs_length*H] or [num_envs, H, obs_length]) then the manager should have no trouble picking it up during the init. Is this what you're referring to or have I misunderstood? |
Thank you for the clarification; I understand now where my approach went wrong. In my initial approach, I appended a modifier after the class HistoryBuffer(ModifierBase):
def __init__(self, cfg: modifier_cfg.HistoryBufferCfg, data_dim: tuple[int, ...], device: str) -> None:
super().__init__(cfg, data_dim, device)
self.history_buffer = torch.zeros((*data_dim[:-1], data_dim[-1] * self._cfg.history_length), device=self._device)
def reset(self, env_ids: Sequence[int] | None = None):
if env_ids is None:
env_ids = slice(None)
self.history_buffer[env_ids] = 0.0
def __call__(self, data: torch.Tensor) -> torch.Tensor:
D = self._data_dim[-1]
self.history_buffer = self.history_buffer.roll(shifts=-D, dims=-1)
self.history_buffer[..., -D:] = data
return self.history_buffer Then, I set up an observation term like this: joint_vel_history = ObsTerm(
# This mdp.joint_vel_rel is from IsaacLab,
# and ObservationManager uses it to determine the observation dimension.
# However, this creates issues since the modifier changes the dimension.
func=mdp.joint_vel_rel,
modifiers=[modifier_cfg.HistoryBufferCfg(history_length=2)],
...
) Correct Approach Based on Your CodeThe issue here is that The correct approach is to modify the function itself to handle historical data, avoiding the need for an additional modifier: class HistoryObsWrapper(ManagerTermBase):
# Here, should ManagerBasedRLEnvCfg be ManagerBasedEnv?
# As in ManagerTermBase, it defines:
# def __init__(self, cfg: ManagerTermBaseCfg, env: ManagerBasedEnv):
def __init__(self, env: ManagerBasedRLEnvCfg, cfg: ObservationTermCfg) -> None:
super().__init__(cfg, env)
self.func = self.cfg.params["func"] # For this, we can use mdp.joint_vel_rel from IsaacLab
# Rest of your code
joint_vel_history = ObsTerm(
# Use the HistoryObsWrapper as the `func` member to handle historical data,
# allowing ObservationManager to determine the correct dimensions.
func=HistoryObsWrapper,
...
) Is this understanding correct? |
Hey sorry I have been silent on here. Been preoccupied. So Modifiers does not allow you to change the output size of the observation so a Modifier cannot (in its current formulation) be used to return history. It can be used to store history for use in delays, filters, etc. In order to consistently handle history there are a few things to consider:
|
No worries, thanks for looking at it James! As far as a PR would go, I think for sensor-based histories nothing needs to change, people can implement custom sensors to track histories if needed at greater than the policy rate. Similarly, for tracking at the observation term level the wrapper class I provided above (with some clean up perhaps) could be added as an observation to the isaac.lab.envs.mdp for people to wrap their obs in. The main new code needed would be for group-level obs histories, which would need to be handled by the manager, like you mentioned. I can start work on that soon. I would additionally be interested in writing documentation on how to use all three. For the next 3 weeks I'm somewhat out of capacity due to a paper deadline, but post-thanksgiving this is a high priority for me, if you'd be willing to advise it. After thanksgiving I'll make a draft PR to work on this from. Does that all sound reasonable? Thanks again! |
@KyleM73 that sounds great, happy to advise and help get things going. Let me know if you need anything and good luck with the paper deadline. |
Any update on this topic? |
Hey @KyleM73 a user here at the Institute has implemented a Observation History functionality in parallel to what we have discussed. Its a bit of a blend to of the observation term wrapper class and the observation manager level implementations. In an effort to reduce duplication I think it would be good to move it up to a PR on the open source repo. Then we can make sure it satisfies all use cases. |
Totally- I know one of the other HALO interns had a partial implementation as well, do you want to send the PR? For any missing functionality I'm happy to add/write documentation/etc., but no need to wait for me if there's already a working implementation. As long as both use cases (per-term and per-group) are supported, I'm happy. Thanks! |
@KyleM73 i have linked the PR |
Proposal
Many methods use observation histories (e.g. RMA/Student-Teacher Methods, Async ActorCritic, etc.) but histories are not currently supported by default. It is easy to make a wrapper class observation that stores histories on a per-observation term level, but there doesn't appear to be a way to store histories at the observation group level. Sometimes, e.g. for images it may be desirable to stack the history in terms of the individual observation term, but other times it may be more desirable to stack the history at the group level (which would require a change to the manager). This was something we had an implementation for at BDAI. Is this something that is currently already being worked on? If not I would be happy to take it on, but wasn't going to start working on it for a PR if someone else is already doing so.
Alternatives
Checklist
Acceptance Criteria
Add the criteria for which this task is considered done. If not known at issue creation time, you can add this once the issue is assigned.
The text was updated successfully, but these errors were encountered: