NVIDIA NeMo Retriever

NVIDIA NeMo™ Retriever is a collection of microservices for building multimodal extraction, reranking, and embedding pipelines with high accuracy and maximum data privacy. It delivers quick, context-aware responses for AI applications like advanced retrieval-augmented generation (RAG) and agentic AI workflows. As part of the NVIDIA NeMo platform and built with NVIDIA NIM™, NeMo Retriever allows developers to flexibly leverage these microservices to connect AI applications to large enterprise datasets wherever they reside and fine-tune them to align with specific use cases.

Announced at NVIDIA GTC 2025, new NeMo Retriever microservices provide 15x faster multimodal PDF extraction and 50% fewer incorrect answers when compared to open-source alternatives.


Build world-class information retrieval pipelines and AI query engines with scalable data extraction and high-accuracy embedding and reranking.

Ingestion

Rapidly ingest massive volumes of data and extract text, graphs, charts, and tables at the same time for highly accurate retrieval.

Embedding

Boost text question-and-answer retrieval performance, providing high-quality embeddings for many downstream natural language processing (NLP) tasks.

Reranking

Enhance retrieval performance further with a fine-tuned reranking model, finding the most relevant passages to provide as context when querying a large language model (LLM).


How NVIDIA NeMo Retriever Works

NeMo Retriever provides components for building data extraction and information retrieval pipelines. The pipeline extracts structured and unstructured data (e.g., text, charts, tables), converts it to text, and filters out duplicates. A NeMo Retriever embedding NIM converts the chunks into embeddings and stores them in a vector database, accelerated by NVIDIA cuVS, for enhanced performance and speed of indexing and search.
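As a sketch of the embedding step, the snippet below builds a request body for a NeMo Retriever embedding NIM, which exposes an OpenAI-compatible /v1/embeddings endpoint. The model identifier and the `input_type` parameter reflect how the model appears in the NVIDIA API catalog, but treat both as assumptions and verify them against the documentation for the NIM you deploy.

```python
import json

# Assumed hosted model id from the NVIDIA API catalog; verify before use.
EMBED_MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"

def build_embedding_request(chunks, input_type="passage"):
    """Return the JSON body for embedding a batch of text chunks.

    input_type is "passage" for document chunks at index time and
    "query" for user questions at search time, since question-answering
    embedding models are asymmetric.
    """
    return {
        "model": EMBED_MODEL,
        "input": list(chunks),
        "input_type": input_type,
        "encoding_format": "float",
    }

body = build_embedding_request(["NeMo Retriever extracts text and tables."])
print(json.dumps(body, indent=2))
```

In a real pipeline this body would be POSTed to the embedding NIM endpoint, and the returned vectors written to the cuVS-accelerated vector database.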

NeMo Retriever parse, a VLM-based OCR microservice for text and table extraction, preserves semantic structure, transcribes document images into text in reading order, classifies content types, and outputs structured markdown to retain spatial layout and formatting.
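Because parse emits structured markdown in reading order, a downstream chunker can split on headings so each chunk stays semantically coherent. The function below is a minimal, hypothetical sketch of that step; it is not part of NeMo Retriever's API.

```python
def chunk_markdown(md_text, max_chars=800):
    """Split markdown (e.g., NeMo Retriever parse output) into chunks,
    starting a new chunk at each heading so sections stay intact and
    flushing when a chunk would exceed max_chars."""
    chunks, current = [], []
    for line in md_text.splitlines():
        size = sum(len(l) + 1 for l in current)
        if current and (line.startswith("#") or size + len(line) > max_chars):
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

sections = chunk_markdown("# Intro\nSome text.\n# Tables\n| a | b |")
print(len(sections))  # → 2
```

Heading-aware chunking like this keeps a table or section together with its title, which tends to improve retrieval relevance over fixed-size splits.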

When a query is submitted, the system retrieves relevant information using vector similarity search, and then a NeMo Retriever reranking NIM reranks the results for accuracy. With the most pertinent information, an LLM NIM generates a response that’s informed, accurate, and contextually relevant. You can use various LLM NIM microservices from the NVIDIA API catalog to enable additional capabilities, such as synthetic data generation.
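The two-stage flow described above can be sketched locally. The code below uses plain cosine similarity for the first stage and a stand-in scoring function where the reranking NIM would be called; the function names and toy vectors are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_then_rerank(query_vec, index, rerank_fn, top_k=3, final_k=1):
    """Two-stage retrieval: fast vector search narrows candidates,
    then a reranker rescores them to pick the final LLM context.

    index is a list of (doc_id, embedding) pairs; rerank_fn stands in
    for a call to a NeMo Retriever reranking NIM.
    """
    # Stage 1: vector similarity search.
    candidates = sorted(
        index, key=lambda d: cosine(query_vec, d[1]), reverse=True
    )[:top_k]
    # Stage 2: rerank the shortlist for accuracy.
    return sorted(candidates, key=rerank_fn, reverse=True)[:final_k]
```

The top `final_k` passages would then be inserted into the LLM NIM's prompt as grounding context.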

Figure: how NVIDIA NeMo Retriever works, from data ingestion to information retrieval.
The NVIDIA NeMo Retriever collection of NIM microservices is used to build optimized ingestion and retrieval pipelines for highly accurate information retrieval at scale.

Introductory Resources

Learn more about building efficient information-retrieval pipelines with NeMo Retriever.

Introductory Blog

Understand the function of embedding and reranking models in information retrieval pipelines, top considerations, and more.

Read Blog

Introductory Webinar

Improve the accuracy and scalability of text retrieval for production-ready generative AI pipelines and deploy at scale.

Watch Now

AI Blueprint for RAG

Learn best practices for connecting AI apps to enterprise data using industry-leading embedding and reranking models.

Try the Blueprint

Introductory GTC Session

Learn about the latest models, tools, and techniques for creating agentic and RAG pipelines for multimodal data ingestion, extraction, and retrieval.

Watch Session

World-Class Information-Retrieval Performance

NeMo Retriever microservices accelerate multimodal document extraction and real-time retrieval with lower RAG costs and higher accuracy. They support reliable, multilingual, and cross-lingual retrieval, and optimize storage, performance, and adaptability for data platforms, enabling efficient vector database expansion.

50% Fewer Incorrect Answers

NeMo Retriever Multimodal Extraction Recall@5 Accuracy

Figure: recall@5 accuracy of NeMo Retriever multimodal extraction versus an open-source alternative.
Evaluated on a publicly available dataset of PDFs consisting of text, charts, tables, and infographics. NeMo Retriever extraction on: nemoretriever-page-elements-v2, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, PaddleOCR; compared with NeMo Retriever extraction off (open-source alternative). Hardware: 1x H100.

3X Higher Embedding Throughput

NeMo Retriever Llama 3.2 Multilingual Text Embedding

Figure: embedding throughput of the NeMo Retriever Llama 3.2 multilingual text embedding NIM versus an open-source alternative.
Test configuration: 1x H100 SXM; passage token length: 512; batch size: 64; concurrent client requests: 5. Open-source alternative (FP16) compared with the NeMo Retriever llama-3.2-nv-embedqa-1b-v2 NIM (FP8).

15X Higher Multimodal Data Extraction Throughput

NeMo Retriever Extraction NIM Microservices

Figure: multimodal data extraction throughput (pages per second) with NeMo Retriever extraction NIM microservices versus an open-source alternative.
Evaluated on a publicly available dataset of PDFs consisting of text, charts, and tables, with the NeMo Retriever extraction NIM microservices nv-yolox-structured-image-v1, nemoretriever-page-elements-v1, nemoretriever-graphic-elements-v1, nemoretriever-table-structure-v1, PaddleOCR, and nv-llama3.2-embedqa-1b-v2, compared to an open-source alternative. Hardware: 1x H100.

35X Improved Data Storage Efficiency

Multilingual, Long-Context, Text Embedding NIM Microservice

Figure: vector storage volume with the NeMo Retriever embedding model llama-3.2-nv-embedqa-1b-v2 (DIM = embedding dimensions).
Tested with the latest NeMo Retriever embedding model, llama-3.2-nv-embedqa-1b-v2. The chart shows the impact of long-context support, dynamic embeddings, and efficient storage on vector storage volume for high-performance, scalable data processing.
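Raw vector storage scales as the number of chunks times the embedding dimension times the bytes per value, so longer-context chunking (fewer vectors) and smaller dynamic embedding dimensions multiply together. The numbers below are illustrative assumptions, not NVIDIA's benchmark configuration, and do not reproduce the 35x figure exactly.

```python
def index_size_bytes(num_chunks, dim, bytes_per_value):
    """Raw vector storage for an index: one embedding per chunk."""
    return num_chunks * dim * bytes_per_value

# Illustrative only: 1,000,000 short chunks at a 1024-dim FP32 baseline,
# versus ~125,000 chunks with long-context chunking and 384-dim
# dynamic embeddings (both hypothetical configurations).
baseline = index_size_bytes(1_000_000, 1024, 4)
optimized = index_size_bytes(125_000, 384, 4)
print(round(baseline / optimized, 1))  # → 21.3
```

Quantizing stored vectors (e.g., FP32 to FP8) would shrink the index further still, which is how combined techniques can reach savings of the magnitude claimed above.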

Ways to Get Started With NVIDIA NeMo Retriever

Use the right tools and technologies to build and deploy generative AI applications that require secure and accurate information retrieval to generate real-time business insights for organizations across every industry. 


Try

Experience NeMo Retriever NIM microservices through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free through the NVIDIA API catalog, and deploy them anywhere.

Try NeMo Retriever Microservices

Build

Jump-start building your AI solutions with NVIDIA Blueprints, customizable reference applications, available on the NVIDIA API catalog.

Try the Blueprint

Deploy

Get a free license to try NVIDIA AI Enterprise in production for 90 days using your existing infrastructure.

Request a 90-Day License

Starter Kits

Start building information retrieval pipelines and generative AI applications for multimodal data ingestion, embedding, reranking, retrieval-augmented generation, and agentic workflows by accessing NVIDIA Blueprints, tutorials, notebooks, blogs, forums, reference code, comprehensive documentation, and more.

AI Assistant for Customer Service

Build enhanced AI assistants that are more personalized and secure by leveraging RAG, NeMo Retriever, NIM, and the latest AI agent-building methodologies.

Retail Shopping Assistant

Develop a multimodal RAG application powered by LLMs that enables more personalized shopping experiences.

Digital Human for Customer Service

Bring applications to life with an AI-powered digital avatar that can transform customer service experiences.

Visual Agent for Video Search and Summarization

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A.


NVIDIA NeMo Retriever Learning Library


More Resources

Decorative image representing forums

Explore the Community

Get Training and Certification

Accelerate Your Startup


Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.

Get Started With NeMo Retriever Today

Try Now