NVIDIA Intro to Robotics Course
Key areas of application
1. Robot Policy Learning – Diffusion policies, language-conditioned imitation
learning, and reinforcement learning improve data efficiency and contextual
understanding. Robots learn optimal actions through language-driven
rewards.
2. Value Learning – Instead of learning policies directly, robots learn value
functions and choose the action with the lowest cost.
3. High-Level Task Planning – Foundation models enable cognitive-level
planning for complex tasks. By leveraging these models, humanoid robots
can understand high-level instructions like picking up a specific box.
4. LLM-Based Code Generation – LLMs generate code for robot training and
execution, as shown by GenSim, which can create task simulations for
various applications (see the sketch below).
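To make the GenSim-style pipeline concrete, here is a minimal sketch of LLM-driven task generation. The prompt format and the `query_llm` helper are hypothetical placeholders, not GenSim's actual interface.

```python
# Minimal sketch of GenSim-style task generation. The prompt format and
# `query_llm` are hypothetical placeholders, not GenSim's real interface.

PROMPT_TEMPLATE = """You are a robot simulation task designer.
Write a Python class for a new tabletop manipulation task with:
  - a reset(env) method that spawns the objects
  - a reward(env) method returning a float in [0, 1]
Task idea: {idea}
Return only code."""

def query_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion API call here.
    return ("class StackCubes:\n"
            "    def reset(self, env): ...\n"
            "    def reward(self, env): return 0.0\n")

def generate_task(idea: str) -> str:
    code = query_llm(PROMPT_TEMPLATE.format(idea=idea))
    # A real pipeline would validate the generated code by running it
    # in simulation before adding the task to the training set.
    return code

print(generate_task("stack three cubes by descending size"))
```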
Advanced Techniques
1. Open-Vocabulary Object Detection and 3D Classification: Models like
DINOv2 enhance encoding capabilities for diverse objects without predefined
categories, which is crucial for dynamic environments (see the sketch after
this list).
2. Semantic Segmentation: Open-vocabulary segmentation allows robots to
recognize and interact with previously unseen objects, expanding their
operational scope.
3. 3D Representations and Affordances: Foundation models offer open-vocabulary
3D representations, enabling robots to infer object affordances, i.e., to
decide how to interact with objects based on their properties.
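As a concrete example of open-vocabulary detection, here is a minimal sketch using the publicly available OWL-ViT model through Hugging Face transformers. OWL-ViT is not named in the course (which cites DINOv2 as an encoder); it stands in here as one open-vocabulary detector that accepts free-form text queries.

```python
# Open-vocabulary detection with OWL-ViT via Hugging Face transformers.
# OWL-ViT is an illustrative substitute, not a model from the course.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.new("RGB", (640, 480))              # stand-in for a camera frame
queries = [["a cardboard box", "a screwdriver"]]  # free-form text, no fixed classes

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

target_sizes = torch.tensor([image.size[::-1]])   # (height, width)
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.2, target_sizes=target_sizes)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{queries[0][int(label)]}: {score:.2f} at {box.tolist()}")
```

Because the queries are plain text, the same detector handles new object categories without retraining, which is the property the course highlights for dynamic environments.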
PHYSICAL AI AND SIMULATION
Physical/embodied AI systems like Voyager and MineDojo simulate tasks to train
robots in both virtual and real-world settings. They use foundation models to
improve execution and adaptability.
Foundation models = more intelligent and versatile systems.
NVIDIA COSMOS
A foundation model platform designed to accelerate the development of physical
AI systems, including robots and autonomous vehicles. Cosmos is a family of
world foundation models (WFMs) built for generating physics-aware videos and
world states.
WFMs are neural networks that can predict and generate physics-aware virtual
environments. They are trained on vast amounts of data. Cosmos WFMs offer
developers an efficient way to generate photoreal, physics-based synthetic
data, reducing the time and cost associated with training and evaluating
physical AI models.
MOBILITY FOUNDATION MODEL
The MFM acts as a central brain for robotic systems, processing robust vision
states and generating world models and action policies. It is designed to work
across different robots, including AMRs.
Key Features:
1. Sim-Based Training – The model is trained using simulation data only.
2. Zero-Shot Deployment – Deployed in the Hubble lab without additional
training.
3. Cross-Embodiment Capability – Policies developed for AMRs are
transferable to other robots such as humanoids, quadrupeds, and forklifts.
VLM, LLM, AND RAG
These models transform robotics by enhancing perception, allowing robots to
build and use memory to navigate and interact with their environment
effectively.
Key Features:
1. Memory Building – Robots use these models to remember events, locations,
and times, enabling them to return to specific places.
2. Demo Example
3. Task Execution – Performs tasks like fetching items based on memory,
e.g., locating chips in a kitchen by remembering where they were previously
seen.
Technology Components
1. Video Language Models (VLM): Capture and caption video data every few
seconds and store the info in a vector database for long-term memory (see
the sketch after this list).
2. Speech Processing: Uses Whisper for speech-to-text.
3. Open-Vocabulary Learning: Allows the robot to understand and respond to
new and diverse queries without predefined categories.
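A minimal sketch of the caption-and-retrieve memory loop described above. `caption_frame` and `embed` are hypothetical stand-ins for the video language model and a text-embedding model, and the vector "database" is a plain Python list; a real system would use actual models and a proper vector store.

```python
# Conceptual sketch of the memory loop: caption frames, embed the captions,
# store them with timestamps, and retrieve by similarity (RAG-style).
import time
import numpy as np

memory: list[tuple[float, str, np.ndarray]] = []  # (timestamp, caption, embedding)

def caption_frame(frame) -> str:
    # Placeholder: a video language model would describe the frame here.
    return "a bag of chips on the kitchen counter"

def embed(text: str) -> np.ndarray:
    # Placeholder: a real text-embedding model; here a pseudo-random stand-in.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def remember(frame) -> None:
    caption = caption_frame(frame)
    memory.append((time.time(), caption, embed(caption)))

def recall(query: str) -> tuple[float, str]:
    q = embed(query)
    # Cosine similarity search over stored caption embeddings.
    ts, caption, _ = max(memory, key=lambda m: float(q @ m[2]))
    return ts, caption

remember(frame=None)  # in the demo this runs every few seconds
ts, caption = recall("where are the chips?")
print(f"seen at t={ts:.0f}: {caption}")
```

The key design point is that perception is written once into long-term memory and then queried later, so the robot can answer "where are the chips?" long after the chips left its field of view.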
OPEN CHALLENGES IN ROBOTICS
1. Data Scarcity: Large, diverse datasets are needed to train models, but
robotics data is scarce.
2. Real-Time Performance: Deploying foundation models so they run in real
time is crucial.
3. General Purpose Perception: Achieving a true general perception system
that can handle diverse tasks and environments is challenging
4. Unified Representation: Integrating multi-modal inputs into a cohesive
understanding is complex but essential for robust robot behavior.
5. Robustness and Safety: Use tools like NVIDIA OSMO to evaluate these
properties.
OPPORTUNITIES
1. On-Edge Deployment: Running models directly on devices enhances
performance and reduces cloud dependency, as demonstrated by NVIDIA's
VLM demo.
2. Benchmark Development: Creating benchmarks helps compare
algorithms across scenarios, improving reproducibility and performance
assessment.
3. Human-Robot Interaction: Enhancing social navigation and
understanding human expectations improves collaboration between
humans and robots.
Course 2: INTRO TO AI-BASED ROBOT DEVELOPMENT WITH ISAAC ROS
The NVIDIA Isaac AI robot development platform consists of CUDA-accelerated
libraries, application frameworks, and AI models that accelerate the
development of AI robots such as AMRs, manipulators, and humanoids.
Isaac is part of a three-computer solution that includes NVIDIA DGX, NVIDIA
OVX, and NVIDIA AGX.
DGX – systems for AI model training
OVX – simulation
AGX – edge deployment (Jetson AGX)
ISAAC ROS
Provides a toolkit of packages that integrate hardware-acceleration
capabilities with the ROS 2 framework.
They cover:
- Localization and mapping (cuVSLAM)
- 3D scene reconstruction (nvblox)
- Trajectory planning (cuMotion)
- Pose estimation and tracking (FoundationPose)
Isaac ROS is compliant with ROS 2 and serves as a reference implementation.
NVIDIA also provides workflows such as Isaac Manipulator and Isaac Perceptor.
Isaac ROS uses TensorRT, Libargus, and other NVIDIA libraries.
NITROS
NVIDIA Isaac Transport for ROS (NITROS) is the implementation of two
hardware-acceleration features introduced with ROS 2 Humble: type adaptation
and type negotiation.
Type adaptation enables ROS nodes to work in a data format optimized for
specific hardware accelerators. The adapted type is used by processing graphs
to eliminate memory copies between the CPU and the accelerator.
Through type negotiation, different ROS nodes in a processing graph can
advertise their supported types, and the ROS framework can choose data
formats that yield optimal performance.
Most middleware frameworks, including ROS, typically work with CPU-based
messages. This becomes problematic when working with hardware accelerators
like GPUs, VICs, or PVAs. Without NITROS, data processed on these accelerators
would need to be copied back to CPU memory before being sent to the next
node, causing inefficiencies.
NITROS addresses a significant limitation in traditional ROS communication. Let’s
talk about how.
How NITROS Works
NITROS introduces a more efficient way of handling data. It allows sending
handles between nodes instead of full data copies.
- If the receiving node understands NITROS, it can use the GPU handle
directly, eliminating memory copies.
- For nodes that don't support NITROS, the framework automatically
converts the message to a standard ROS format.
This approach makes NITROS both optimal when possible and compatible when
necessary.
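A conceptual sketch of that decision logic follows. Every name in it is hypothetical: real NITROS is implemented in C++ inside Isaac ROS, so this only illustrates the negotiate-then-send idea, not the actual API.

```python
# Conceptual sketch of NITROS-style handle passing; all names hypothetical.
from dataclasses import dataclass

@dataclass
class GpuBufferHandle:
    """Stand-in for a reference to data that stays in GPU memory."""
    device_ptr: int
    size_bytes: int

@dataclass
class RosMessage:
    """Stand-in for a standard CPU-side ROS 2 message."""
    data: bytes

def copy_to_cpu(handle: GpuBufferHandle) -> RosMessage:
    # Placeholder for the expensive device-to-host copy that NITROS avoids.
    return RosMessage(data=bytes(handle.size_bytes))

def publish(handle: GpuBufferHandle, subscriber_supports_nitros: bool):
    if subscriber_supports_nitros:
        # Negotiated path: hand over the GPU handle; the data never moves.
        return handle
    # Fallback path: convert to a standard ROS message for plain nodes.
    return copy_to_cpu(handle)

frame = GpuBufferHandle(device_ptr=0xDEADBEEF, size_bytes=4 * 3840 * 2160)
print(type(publish(frame, True)).__name__)   # GpuBufferHandle
print(type(publish(frame, False)).__name__)  # RosMessage
```

The branch structure is the whole point: the fast path moves only a small handle, and the slow copy is paid only when a non-NITROS subscriber forces it.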
NITROS Technologies
NVIDIA offers three NITROS-related technologies: CUDA with NITROS, NITROS
Bridge, and PyNITROS.
CUDA with NITROS
- Allows developers to write their own ROS nodes that can send and receive
GPU buffers
- Compatible with other NITROS nodes developed by NVIDIA
NITROS Bridge
- Designed for inter-process communication
- Initially introduced to support ROS 1 users who hadn't migrated to ROS 2
- Uses a form of CUDA IPC (inter-process communication)
PyNITROS
- Supports Python users
- Allows sending raw PyTorch buffers to C++ nodes without memory copies
Benefits of NITROS
NITROS represents a significant improvement in ROS 2 performance for NVIDIA
hardware, allowing developers to leverage GPU acceleration more efficiently in
their robotics applications. By leveraging NITROS, you’ll be able to:
- Reduce unnecessary memory copies between the CPU and GPU
- Improve GPU utilization by avoiding synchronization bottlenecks
- Enable faster processing of large data structures like point clouds or 4K
images
- Maintain compatibility with existing ROS tools and nodes
https://github.com/NVIDIA-ISAAC-ROS
https://nvidia-isaac-ros.github.io/