
[Question] How can I check the range of action values in PPO with rsl-rl? #1486

Open
H-Hisamichi opened this issue Dec 1, 2024 · 2 comments
Labels: question (Further information is requested)

Comments

@H-Hisamichi

Hello everyone,

I am working on training a control policy for a hexapod using PPO.
My robot has joints with very different ranges of motion, so I am trying to remap the actions from the policy.
However, the action values output by the PPO policy in rsl-rl seem to fall outside the range [-1, 1].

Where can I check the range of the action values from the policy?

Thank you!
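For reference, a minimal sketch of one place to log the raw values, assuming the direct workflow's _pre_physics_step hook (the _raw_actions attribute is only illustrative):

import torch

# Minimal sketch (assumes the Isaac Lab direct-workflow hook _pre_physics_step):
# log the raw policy output before any scaling or remapping is applied.
def _pre_physics_step(self, actions: torch.Tensor):
    # actions: (num_envs, num_actions) tensor straight from the rsl-rl policy
    self._raw_actions = actions.clone()
    print(f"raw action range: [{actions.min().item():.3f}, {actions.max().item():.3f}]")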

@H-Hisamichi H-Hisamichi changed the title [Question] What is the range of action values in RSL-RL PPO algorithm? [Question] How can I check the range of action values in PPO with rsl-rl? Dec 1, 2024
@RandomOakForest
Collaborator

It seems a normalization step may be missing. Could you share how you are setting up your rsl-rl wrapper? Thanks!
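For comparison, the usual rsl-rl environment setup in the Isaac Lab training script looks roughly like this (a sketch only; the task id and env_cfg are placeholders):

import gymnasium as gym

from omni.isaac.lab_tasks.utils.wrappers.rsl_rl import RslRlVecEnvWrapper

# Rough sketch of the usual rsl-rl environment setup
# ("Isaac-Velocity-Flat-Anymal-C-Direct-v0" and env_cfg are placeholders).
env = gym.make("Isaac-Velocity-Flat-Anymal-C-Direct-v0", cfg=env_cfg)
env = RslRlVecEnvWrapper(env)  # exposes the vectorized interface rsl-rl expects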

@RandomOakForest RandomOakForest added the question Further information is requested label Dec 6, 2024
@H-Hisamichi
Author

Hello @RandomOakForest,

I'm using the direct-workflow ANYmal-C demo config rsl_rl_ppo_cfg.py essentially as is (only the class and experiment names are changed):

# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

from omni.isaac.lab.utils import configclass

from omni.isaac.lab_tasks.utils.wrappers.rsl_rl import (
    RslRlOnPolicyRunnerCfg,
    RslRlPpoActorCriticCfg,
    RslRlPpoAlgorithmCfg,
)


@configclass
#class AnymalCFlatPPORunnerCfg(RslRlOnPolicyRunnerCfg):
class AT3RFlatPPORunnerCfg(RslRlOnPolicyRunnerCfg):
    num_steps_per_env = 24
    max_iterations = 500
    save_interval = 50
    experiment_name = "AT3R_flat_direct"
    empirical_normalization = False
    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[128, 128, 128],
        critic_hidden_dims=[128, 128, 128],
        activation="elu",
    )
    algorithm = RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        clip_param=0.2,
        entropy_coef=0.005,
        num_learning_epochs=5,
        num_mini_batches=4,
        learning_rate=1.0e-3,
        schedule="adaptive",
        gamma=0.99,
        lam=0.95,
        desired_kl=0.01,
        max_grad_norm=1.0,
    )


@configclass
#class AnymalCRoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
class AT3RRoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
    num_steps_per_env = 24
    max_iterations = 10000
    save_interval = 50
    experiment_name = "AT3R_rough_direct" # def: anymal_c_rough_direct
    empirical_normalization = False
    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )
    algorithm = RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        clip_param=0.2,
        entropy_coef=0.005,
        num_learning_epochs=5,
        num_mini_batches=4,
        learning_rate=1.0e-3,
        schedule="adaptive",
        gamma=0.99,
        lam=0.95,
        desired_kl=0.01,
        max_grad_norm=1.0,
    )
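As far as I can tell, nothing in this runner config bounds the policy output: the rsl-rl actor-critic samples actions from a Gaussian (with the standard deviation initialized by init_noise_std), so the raw values are not restricted to [-1, 1]. If that is the cause, the remapping would have to happen on the environment side. A minimal, illustrative sketch (function and argument names are placeholders, not Isaac Lab API):

import torch

# Illustrative sketch of environment-side remapping: squash the unbounded
# Gaussian actions into (-1, 1), then map them to each joint's range of motion.
def remap_actions(actions: torch.Tensor,
                  joint_lower: torch.Tensor,
                  joint_upper: torch.Tensor) -> torch.Tensor:
    squashed = torch.tanh(actions)                  # now in (-1, 1)
    mid = 0.5 * (joint_upper + joint_lower)         # per-joint midpoint
    half_range = 0.5 * (joint_upper - joint_lower)  # per-joint half range
    return mid + squashed * half_range              # per-joint targets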
