Retrieval-Augmented Generation

When combined with accelerated computing, retrieval-augmented generation (RAG) enables large language models to process large quantities of data faster and generate highly accurate responses.

Retrieval-Augmented Generation Pipeline

How Retrieval-Augmented Generation Works

RAG enhances large language models (LLMs) by retrieving the most relevant and current information from external knowledge sources. Before a RAG pipeline can answer user queries, data must be ingested into its knowledge base.

  1. Data Extraction: Multimodal, structured, and unstructured data is extracted from various formats and converted to text so it can be filtered, chunked, and fed into the retrieval pipeline.

  2. Data Retrieval: Extracted data is passed to an embedding model to create knowledge embeddings, which are stored in a vector database. When a user submits a query, the system embeds the query, retrieves relevant data from the vector database, reranks the results, and passes the top results to the LLM as context so it can generate accurate, context-aware responses.
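The ingestion and query steps above can be sketched end to end in plain Python. This is a toy under stated assumptions: the text is taken as already extracted, the hashed bag-of-words `embed` stands in for a real embedding model, a Python list stands in for the vector database, and the term-overlap `rerank` stands in for a reranking model. Every function name here is hypothetical and not part of any NVIDIA API. The LLM call itself is omitted; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
import zlib

DIM = 64  # toy embedding size (hypothetical); real embedding models output far larger vectors

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split extracted text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for tok in tokens(text):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def ingest(docs: list[str]) -> list[tuple[str, list[float]]]:
    """Chunk and embed every document into an in-memory stand-in 'vector database'."""
    return [(c, embed(c)) for doc in docs for c in chunk(doc)]

def retrieve(index, query: str, k: int = 3) -> list[str]:
    """Embed the query and return the k chunks with the highest cosine similarity."""
    q = embed(query)
    # Dot product of unit-length vectors equals cosine similarity.
    ranked = sorted(index, key=lambda item: sum(x * y for x, y in zip(q, item[1])), reverse=True)
    return [c for c, _ in ranked[:k]]

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Stand-in reranker: order candidates by the number of shared query terms."""
    q_terms = set(tokens(query))
    return sorted(candidates, key=lambda c: len(q_terms & set(tokens(c))), reverse=True)[:top_n]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved context and the question into one LLM prompt."""
    bullets = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{bullets}\n\nQuestion: {query}"

docs = [
    "GPUs accelerate vector search over large embedding collections.",
    "The cafeteria opens at nine and serves breakfast until noon.",
]
index = ingest(docs)
question = "How do GPUs speed up vector search?"
context = rerank(question, retrieve(index, question))
prompt = build_prompt(question, context)
print(prompt)
```

Keeping retrieval and reranking as separate functions mirrors the two-stage design described above: a cheap similarity search narrows the candidates, and a more selective scorer orders the few that reach the LLM.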

Evaluating RAG pipelines is crucial because these systems involve multiple interacting components, and mistakes or biases in the individual components can propagate through the system, leading to compounded errors in the generated output.
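One practical consequence is to score each stage on its own, so a problem can be traced to the component that caused it. A minimal sketch, assuming hand-labeled relevant document IDs for each query (the IDs and result lists below are made up for illustration): recall@k measures whether the retriever surfaced the relevant documents at all, and mean reciprocal rank (MRR) measures how high the first relevant one lands.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none was returned."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 3) -> dict[str, float]:
    """Average recall@k and MRR over (retrieved_ids, relevant_ids) pairs."""
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }

# The same two hypothetical queries, scored after the retriever and again after the reranker.
after_retriever = [(["d3", "d1", "d7"], {"d1"}), (["d2", "d9", "d4"], {"d4", "d5"})]
after_reranker = [(["d1", "d3", "d7"], {"d1"}), (["d4", "d2", "d9"], {"d4", "d5"})]
print(evaluate(after_retriever))
print(evaluate(after_reranker))
```

In this toy data the reranker lifts MRR from about 0.42 to 1.0, but recall@3 stays at 0.75 because `d5` was never retrieved: a retrieval miss propagates, and no downstream stage can recover it.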

Explore RAG Technology

NVIDIA NeMo Retriever

NVIDIA NeMo™ Retriever is a collection of generative AI microservices for extraction, embedding, and reranking that enable developers to build pipelines that generate business insights in real time with high accuracy and maximum data privacy.

AI-Q NVIDIA Blueprint

AI-Q is an NVIDIA Blueprint for connecting, evaluating, and accelerating teams of AI agents. It includes the open-source NVIDIA AgentIQ toolkit to profile and enhance the accuracy of agentic systems.

NVIDIA cuVS

NVIDIA cuVS is an open-source library for GPU-accelerated vector search and data clustering. It enables higher throughput, lower latency, and faster index build times, and improves the efficiency of semantic search within pipelines and applications such as information retrieval or RAG.
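Data clustering and vector search are linked in inverted-file (IVF) indexing, one common family of approximate nearest-neighbor indexes: k-means groups the vectors at build time, and a query then scans only the few clusters nearest to it instead of every vector. The sketch below is a small CPU-only illustration of that general idea in pure Python; it is not the cuVS API, and names such as `build_ivf`, `search_ivf`, and `n_probe` are hypothetical. GPU libraries exist to make both the build and the search steps fast at far larger scale.

```python
import random

def dist2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))

def kmeans(points, n_clusters: int, iters: int = 10, seed: int = 0):
    """Plain Lloyd's k-means; this clustering is the index-build step."""
    centroids = [list(p) for p in random.Random(seed).sample(points, n_clusters)]
    for _ in range(iters):
        members = [[] for _ in range(n_clusters)]
        for p in points:
            members[nearest(p, centroids)].append(p)
        for c, group in enumerate(members):
            if group:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(group) for dim in zip(*group)]
    return centroids

def build_ivf(points, n_clusters: int):
    """Cluster the vectors, then file each vector's ID under its cluster (the inverted lists)."""
    centroids = kmeans(points, n_clusters)
    lists = [[] for _ in range(n_clusters)]
    for i, p in enumerate(points):
        lists[nearest(p, centroids)].append(i)
    return centroids, lists

def search_ivf(query, points, centroids, lists, k: int = 3, n_probe: int = 1) -> list[int]:
    """Scan only the n_probe clusters whose centroids are closest to the query."""
    probed = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))[:n_probe]
    candidates = [i for c in probed for i in lists[c]]
    return sorted(candidates, key=lambda i: dist2(query, points[i]))[:k]

def search_exact(query, points, k: int = 3) -> list[int]:
    """Brute-force baseline: compare the query against every vector."""
    return sorted(range(len(points)), key=lambda i: dist2(query, points[i]))[:k]

# Toy data: 100 two-dimensional vectors in four tight groups.
rng = random.Random(42)
centers = [(0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)]
points = [[rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)] for cx, cy in centers for _ in range(25)]
centroids, lists = build_ivf(points, n_clusters=4)
query = [4.9, 5.1]
approx = search_ivf(query, points, centroids, lists, k=3, n_probe=1)
exact = search_exact(query, points, k=3)
recall = len(set(approx) & set(exact)) / len(exact)
print(approx, exact, recall)
```

With tight, well-separated groups like these, probing one cluster usually recovers the exact neighbors; on real embeddings the clusters overlap, so `n_probe` becomes the knob that trades recall for speed. Probing every cluster degenerates to the exact search.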

NVIDIA NeMo Curator

NVIDIA NeMo Curator provides prebuilt pipelines for generating synthetic data to customize and evaluate embedding models to improve the performance of RAG systems.

NVIDIA NeMo Customizer

NVIDIA NeMo Customizer is a high-performance, scalable microservice that simplifies fine-tuning and alignment of generative AI models, including embedding models for domain-specific use cases, making it easier to adopt generative AI across industries.

NVIDIA NeMo Evaluator

NVIDIA NeMo Evaluator is a microservice for assessing generative AI models and RAG pipelines across academic and custom benchmarks on any platform.

Explore NVIDIA NIM Microservices for RAG

Retrieval NIM Microservices

NeMo Retriever microservices, built with NVIDIA NIM™, include extraction, embedding, and reranking models that connect LLMs to enterprise data and retrieve the most relevant information for accurate responses.

Mistral AI NIM Microservices

Mistral AI NIM microservices provide LLMs for state-of-the-art reasoning, knowledge, and code capabilities, delivering superior accuracy for agentic applications, multilingual tasks, GPU-accelerated generation of text embeddings, and more.

Llama 3.1 NIM Microservices

Llama 3.1 NIM microservices leverage customizable LLMs to improve the helpfulness of generated responses, refine retrieval results over multiple sources and languages, understand regional nuances, and more.

Explore NVIDIA AI Blueprints That Use RAG

NVIDIA Blueprints are reference workflows for generative AI use cases built with NVIDIA NIM microservices. With these blueprints, developers can build production-ready AI applications that empower employees with real-time insights, connecting them to AI query engines to enable transformational efficiency and productivity gains.

Build an Enterprise RAG Pipeline

Connect AI applications to enterprise data using industry-leading embedding and reranking models for information retrieval at scale.

AI Assistants for Customer Service

Develop secure, context-aware virtual assistants that meet the unique needs of your business and enhance customer service operations.

Digital Humans for Customer Service

Bring applications to life with an AI-powered digital avatar that can transform customer service experiences.
