NVIDIA NeMo Retriever
NVIDIA NeMo™ Retriever is a collection of microservices for building multimodal extraction, reranking, and embedding pipelines with high accuracy and maximum data privacy. It delivers quick, context-aware responses for AI applications like advanced retrieval-augmented generation (RAG) and agentic AI workflows. As part of the NVIDIA NeMo platform and built with NVIDIA NIM™, NeMo Retriever allows developers to flexibly leverage these microservices to connect AI applications to large enterprise datasets wherever they reside and fine-tune them to align with specific use cases.
Announced at NVIDIA GTC 2025, new NeMo Retriever microservices provide 15x faster multimodal PDF extraction and 50% fewer incorrect answers when compared to open-source alternatives.
Documentation
Build world-class information retrieval pipelines and AI query engines with scalable data extraction and high-accuracy embedding and reranking.
Ingestion
Rapidly ingest massive volumes of data and extract text, graphs, charts, and tables at the same time for highly accurate retrieval.
Embedding
Boost text question-and-answer retrieval performance, providing high-quality embeddings for many downstream natural language processing (NLP) tasks.
Reranking
Enhance retrieval performance further with a fine-tuned reranking model, finding the most relevant passages to provide as context when querying a large language model (LLM).
How NVIDIA NeMo Retriever Works
NeMo Retriever provides components for building data extraction and information retrieval pipelines. The pipeline extracts structured and unstructured data (e.g., text, charts, and tables), converts it to text, and filters out duplicates. A NeMo Retriever embedding NIM converts the chunks into embeddings and stores them in a vector database, accelerated by NVIDIA cuVS, for faster indexing and search.
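Conceptually, the ingestion stage is a chunk, deduplicate, embed, and index loop. The sketch below shows that flow locally; the hashing-based embedder and in-memory list are toy stand-ins for the NeMo Retriever embedding NIM and a cuVS-accelerated vector database, not their actual APIs.

```python
import hashlib
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy embedder standing in for a NeMo Retriever embedding NIM:
    hashes character trigrams into a fixed-size vector, then L2-normalizes."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        bucket = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def ingest(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Deduplicate extracted chunks, embed each, and build an index.
    A real pipeline would write to a cuVS-accelerated vector database."""
    index, seen = [], set()
    for chunk in chunks:
        key = chunk.strip().lower()
        if key in seen:  # filter out duplicate chunks
            continue
        seen.add(key)
        index.append((chunk, embed(chunk)))
    return index

index = ingest([
    "Table 3: quarterly revenue by region",
    "Table 3: quarterly revenue by region",  # duplicate, dropped
    "Chart: GPU utilization over time",
])
print(len(index))  # 2
```

Because the embeddings are unit-normalized at ingestion time, the later similarity search reduces to a plain dot product.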
NeMo Retriever parse, a VLM-based OCR microservice for text and table extraction, preserves semantic structure, transcribes document images into text in reading order, classifies content types, and outputs structured markdown to retain spatial layout and formatting.
When a query is submitted, the system retrieves relevant information using vector similarity search, and then a NeMo Retriever reranking NIM reranks the results for accuracy. With the most pertinent information, an LLM NIM generates a response that’s informed, accurate, and contextually relevant. You can use various LLM NIM microservices from the NVIDIA API catalog to enable additional capabilities, such as synthetic data generation.
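The query path described above, retrieve by vector similarity and then rerank, can also be sketched locally. The tiny hand-written index, dot-product search, and term-overlap scorer below are assumptions made for illustration; in a real deployment these roles are played by the embedding NIM, a cuVS-accelerated vector store, and the reranking NIM's cross-encoder scoring, with the surviving passages passed as context to an LLM NIM.

```python
# Toy index: (passage, pre-normalized embedding) pairs.
INDEX = [
    ("Q3 revenue grew 12% year over year", [0.9, 0.1, 0.0]),
    ("GPU utilization peaked at 87%",      [0.1, 0.9, 0.1]),
    ("Office relocation planned for 2026", [0.0, 0.2, 0.9]),
]

def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    """Stage 1: vector similarity search (dot product over unit vectors)."""
    scored = sorted(
        INDEX,
        key=lambda item: sum(q * v for q, v in zip(query_vec, item[1])),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]

def rerank(query: str, passages: list[str], top_n: int = 1) -> list[str]:
    """Stage 2: rescore candidates against the query text. Term overlap
    stands in for the relevance scoring a reranking NIM performs."""
    terms = set(query.lower().split())
    return sorted(
        passages,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )[:top_n]

query = "how much did revenue grow"
candidates = retrieve([0.8, 0.2, 0.1])  # toy embedding of the query
context = rerank(query, candidates)
print(context[0])  # passage handed to the LLM NIM as context
```

The two-stage design is the point: the first stage is cheap and recall-oriented over the whole index, while the second stage spends more compute per candidate to put the most relevant passage first.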

Introductory Resources
Learn more about building efficient information-retrieval pipelines with NeMo Retriever.
Introductory Blog
Understand the function of embedding and reranking models in information retrieval pipelines, top considerations, and more.
Read Blog
Introductory Webinar
Improve the accuracy and scalability of text retrieval for production-ready generative AI pipelines and deploy at scale.
Watch Now
AI Blueprint for RAG
Learn best practices for connecting AI apps to enterprise data using industry-leading embedding and reranking models.
Blueprint
Introductory GTC Session
Learn about the latest models, tools, and techniques for creating agentic and RAG pipelines for multimodal data ingestion, extraction, and retrieval.
Watch Session
World-Class Information-Retrieval Performance
NeMo Retriever microservices accelerate multimodal document extraction and real-time retrieval with lower RAG costs and higher accuracy. They support reliable, multilingual, and cross-lingual retrieval, and they optimize storage, performance, and adaptability for data platforms, enabling efficient vector database expansion.
50% Fewer Incorrect Answers
NeMo Retriever multimodal extraction Recall@5 accuracy vs. an open-source alternative (hardware: 1x H100)
3X Higher Embedding Throughput
NeMo Retriever Llama 3.2 Multilingual Text Embedding
15X Higher Multimodal Data Extraction Throughput
NeMo Retriever Extraction NIM Microservices
35X Improved Data Storage Efficiency
Multilingual, Long-Context, Text Embedding NIM Microservice
Ways to Get Started With NVIDIA NeMo Retriever
Use the right tools and technologies to build and deploy generative AI applications that require secure and accurate information retrieval to generate real-time business insights for organizations across every industry.
Try
Experience NeMo Retriever NIM microservices through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free through the NVIDIA API catalog and deployable anywhere.

Build
Jump-start building your AI solutions with NVIDIA Blueprints, customizable reference applications, available on the NVIDIA API catalog.
Deploy
Get a free license to try NVIDIA AI Enterprise in production for 90 days using your existing infrastructure.
License
Starter Kits
Start building information retrieval pipelines and generative AI applications for multimodal data ingestion, embedding, reranking, retrieval-augmented generation, and agentic workflows by accessing NVIDIA Blueprints, tutorials, notebooks, blogs, forums, reference code, comprehensive documentation, and more.
AI Assistant for Customer Service
Build enhanced AI assistants that are more personalized and secure by leveraging RAG, NeMo Retriever, NIM, and the latest AI agent-building methodologies.
Retail Shopping Assistant
Develop a multimodal RAG application powered by LLMs that enables more personalized shopping experiences.
Digital Human for Customer Service
Bring applications to life with an AI-powered digital avatar that can transform customer service experiences.
Visual Agent for Video Search and Summarization
Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A.
NVIDIA NeMo Retriever Learning Library
More Resources
Ethical AI
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.