
FALL 2024 COS597R:

DEEP DIVE INTO LARGE LANGUAGE MODELS


Danqi Chen, Sanjeev Arora

Lecture 18: Retrieval-augmented language models

https://princeton-cos597r.github.io/
Retrieval-augmented LMs (RALMs)

Input → LM → Output

Query → DATASTORE → top-k “documents”

• Datastore:
• What should be stored in the datastore?
• How to index the datastore (e.g., granularity)?
• How to search top-k documents efficiently?
• How to integrate the retrieved outputs with LMs?
• You can also search the datastore in multiple rounds, and you can search the datastore using the “output”! (A minimal single-round loop is sketched below.)
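As a minimal illustration of one retrieve-then-generate round (not code from the lecture), here is a sketch in which `embed`, `generate`, and the datastore arrays are hypothetical stand-ins for an embedding model, an LM API, and a vector index:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=5):
    # Cosine similarity between the query and every document vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    top = np.argsort(-sims)[:k]          # indices of the top-k documents
    return [docs[i] for i in top]

def rag_answer(question, embed, doc_vecs, docs, generate, k=5):
    """One retrieve-then-generate round: search the datastore with the
    question, then condition the LM on the top-k documents in context."""
    hits = retrieve(embed(question), doc_vecs, docs, k)
    prompt = "\n\n".join(hits) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```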

2
Retrieval-augmented generation (RAG)

(Lewis et al., NeurIPS’20)

Douwe Kiela (Stanford CS25; 2023/12)

• An encoder-decoder model (BART)


• Retriever: DPR (for question answering)
• Fine-tuning on individual tasks (question answering, question generation, fact verification)

3
Retrieval-augmented generation (RAG)

Default RAG
framework:

https://www.datasciencecentral.com/rag-and-its-evolution/ 4
Retrieval-augmented LMs: two diverging paths
Path #1: Build a language model that has a built-in retrieval component
• You need to consider how to build the datastore/index as part of the model
• The optimal model architecture is still an ongoing exploration
• An alternative of scaling today’s parametric (Transformer-based) LMs
• Lots of interesting technical challenges, not as successful as we hoped for 🥹

5
Retrieval-augmented LMs: two diverging paths
Path #2: view retrieval as one of the “tools” that LMs learn how to use
• Assuming you already have a very powerful LM
• Retrieval can be viewed as an API or “black box” e.g., a search engine

• Research questions: when to call the retriever? how to incorporate retrieved results in context?
• Very popular in developer community (“frozen RAG”)

Toolformer (Schick et al., NeurIPS’23)
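In the spirit of Toolformer’s API calls, one hedged sketch of “retrieval as a tool” is to let the LM emit an inline search call, execute it, and splice the result back into the context; the `[Search(...)]` syntax and the `generate`/`search` callables below are illustrative assumptions, not the paper’s exact interface.

```python
import re

CALL = re.compile(r"\[Search\((.+?)\)\]")  # assumed inline tool-call syntax

def generate_with_retrieval(prompt, generate, search, max_rounds=3):
    """Let the LM decide when to retrieve by emitting [Search(query)] markers."""
    text = prompt
    for _ in range(max_rounds):
        out = generate(text)               # hypothetical LM call
        m = CALL.search(out)
        if m is None:                      # no tool call: we are done
            return out
        result = search(m.group(1))        # hypothetical search-engine call
        # Keep the text up to the call, append its result, and let the LM continue.
        text = text + out[:m.end()] + f" -> {result}\n"
    return generate(text)
```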

6
Long-context LMs and RALMs
• Today’s long-context LLMs support up to millions of tokens in their context window - Do we still need RAG?

• Long-context LMs support better RAG (more documents, more tokens)

• It puts less demand on the retriever, but it can’t really replace RAG (since the datastore is still much larger)

• Though there are still a lot of questions about whether long-context LMs can really make use of their full contexts

(Liu et al., TACL 2023) “Lost in the Middle”

7
Recommended materials

(2023/7)
https://acl2023-retrieval-lm.github.io/

• Model architectures
• Training methods
• Applications and extensions (multi-lingual and multi-modal)

8
Why retrieval-augmented LMs?

9
Retrieval-augmented LMs

• It is a language model P(x_n | x_1, x_2, ⋯, x_{n−1})


The capital city of Ontario is ___

• It retrieves from an external datastore (at least during inference time)

(Also referred to as semiparametric and non-parametric models)

10
Can you cram all knowledge into parameters?

List 5 important papers authored by Geoffrey Hinton

Geoffrey Hinton is a renowned computer scientist …
Here are five important papers authored by him:
1. "Learning Internal Representations by Error Propagation" (with D. E. Rumelhart and R. J. Williams) - This paper, published in 1986, ..
2. "Deep Boltzmann Machines" (with R. Salakhutdinov) - Published in 2009, ..

4. "Deep Learning" (with Y. Bengio and A. Courville) - Published as a book in 2016, …
5. "Attention Is All You Need" (with V. Vaswani, N. Shazeer, et al.) - Published in 2017, this paper introduced the Transformer model, …

(From 2023/7)

What is Kathy Saltzman’s occupation?
GPT-3 davinci-003: 20%-30% accuracy (Mallen et al., 2023)
11
Can you cram all knowledge into parameters?

List 5 research papers authored by Danqi Chen

Here are five notable papers from her research:

1. ”A Fast and Accurate Dependency Parser using Neural Networks" (2014)


2. “Reading Wikipedia to Answer Open-Domain Questions" (2017)
3. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018)
4. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (2019)
5. ”Stanford’s Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task" (2017)

“Although primarily led by researchers at Google, Danqi contributed to this landmark paper during her time as a visiting scientist at Google.”

(From 2024/9)

12
Can you cram all knowledge into parameters?

(From 2024/11)
13
LLMs’ knowledge is easily outdated

Who is the CEO of Twitter?

As of my knowledge cutoff in
September 2021, the CEO of
Twitter is Jack Dorsey….

• Existing knowledge editing methods are still NOT scalable (active research!)

• The datastore can be easily updated and expanded - even without retraining! New!

14
LLMs’ knowledge is easily outdated

15
LLM outputs are hard to interpret and verify

Generating text with citations

Can trace knowledge sources from retrieval results - better interpretability & control

(Nakano et al., 2021; Menick et al., 2022; Gao et al., 2023)

16
LLM outputs are hard to interpret and verify

17
Private and copyrighted data for training LMs

SILO (Min et al., ICLR’24)

• Machine unlearning is a challenging problem too (active research!)


18
Can we scale LMs with (test-time) datastore?

A (large) parametric LM vs. a (smaller) LM paired with a datastore.

Long-term goal: can we possibly reduce the training and inference costs, and scale down the size of LLMs?
e.g., RETRO (Borgeaud et al., 2021): “obtains
comparable performance to GPT-3 on the Pile,
despite using 25x fewer parameters”

19
Can we scale LMs with (test-time) datastore?

20
RALMs: model architectures and training methods
(path #1)

21
Roadmap

Input → LM → Output

Query → DATASTORE → top-k “documents”

ARCHITECTURE
• How is the retriever represented? Granularity of datastore?
• How are retrieved contexts integrated with LMs?
• Sizes of LMs vs datastore?

LEARNING
• How are the LM and retriever trained together?
• Training from scratch? Fine-tuning?
• Pre-training or instruction tuning?

22
How is retrieval implemented?
• Sim: a similarity score between two pieces of text

Sparse retrieval

Dense retrieval

• An entire field of study on how to define or learn these similarity functions better

• There are efficient data structures/infrastructure for supporting fast and accurate search from a large datastore
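A toy contrast of the two flavors of Sim above (not the tutorial’s code): sparse retrieval scores term overlap between bag-of-words representations, while dense retrieval scores an inner product of learned embeddings; `encode` is a placeholder for any trained text encoder.

```python
from collections import Counter
import math
import numpy as np

def sparse_sim(query, doc):
    """Toy sparse score: count of overlapping terms, normalized by lengths
    (real systems use TF-IDF / BM25 weighting)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum((q & d).values())
    return overlap / math.sqrt(sum(q.values()) * sum(d.values()))

def dense_sim(query, doc, encode):
    """Dense score: inner product of learned embeddings (DPR-style);
    `encode` is a hypothetical text -> np.ndarray encoder."""
    q_vec, d_vec = encode(query), encode(doc)
    return float(np.dot(q_vec, d_vec))
```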

23
How is retrieval implemented?

24
Example: RETRO

ARCHITECTURE
• How is the retriever represented? Granularity of datastore?
• Sizes of LMs vs datastore?
• How are retrieved contexts integrated with LMs?

• Granularity: chunks of 64 tokens


• Representation: frozen BERT encoders (pre-trained, but for a different task than retrieval)
• LMs: 150M-7B parameters; datastore: up to 2T tokens (600B training tokens)
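A rough sketch of building such a datastore at the granularity described in the bullets above (placeholder whitespace tokenizer and `frozen_encoder`, not RETRO’s actual BERT pipeline): split the corpus into 64-token chunks, embed each chunk with a frozen encoder, and search by nearest neighbors in embedding space.

```python
import numpy as np

CHUNK_LEN = 64  # RETRO retrieves at the granularity of 64-token chunks

def build_datastore(corpus, frozen_encoder):
    """Chunk the corpus and embed each chunk with a frozen (not trained) encoder."""
    chunks, keys = [], []
    for doc in corpus:
        tokens = doc.split()                       # placeholder tokenizer
        for i in range(0, len(tokens), CHUNK_LEN):
            chunk = " ".join(tokens[i:i + CHUNK_LEN])
            chunks.append(chunk)
            keys.append(frozen_encoder(chunk))     # hypothetical text -> vector
    return chunks, np.stack(keys)

def nearest_chunks(query_chunk, chunks, keys, frozen_encoder, k=2):
    """Retrieve the k nearest chunks by L2 distance in embedding space."""
    q = frozen_encoder(query_chunk)
    dists = np.linalg.norm(keys - q, axis=1)
    return [chunks[i] for i in np.argsort(dists)[:k]]
```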

25
Example: RETRO
ARCHITECTURE

• How are retrieved contexts integrated with LMs?

The information is integrated into the intermediate layers of Transformers (with an extra encoder)
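RETRO’s real mechanism is chunked cross-attention; the snippet below is a much-simplified stand-in that only shows the general shape: decoder hidden states attend over encoded neighbor chunks, and the result is added back through a residual connection. Module choices and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RetrievalCrossAttention(nn.Module):
    """Simplified stand-in for RETRO's chunked cross-attention:
    decoder states (queries) attend over encoded retrieved neighbors."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden, neighbor_enc):
        # hidden:       (batch, seq_len, d_model)          decoder hidden states
        # neighbor_enc: (batch, n_neighbor_tokens, d_model) encoded retrieved chunks
        attended, _ = self.attn(query=hidden, key=neighbor_enc, value=neighbor_enc)
        return self.norm(hidden + attended)   # residual connection back into the LM
```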
26
Example: RETRO
LEARNING
• How are the LM and retriever trained together?
• Training from scratch? Fine-tuning?
• Pre-training or instruction tuning?

• The retriever (“encoder”) is pre-trained and kept frozen (not updated further)

• The LM is either trained from scratch or fine-tuned (“retrofit”)

27
RETRO: experiments

Perplexity

In retrospect, too many works focused on improving perplexity instead of downstream tasks at that time.
28
RETRO: experiments
Question answering

It is not better than specialized QA models

No evals on any of the in-context learning tasks from GPT-3


“With a 2 trillion token database, our Retrieval-Enhanced
Transformer (Retro) obtains comparable performance to
GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer
parameters.”

RETRO is not open-sourced… :( :(

29
RETRO++ (NVIDIA)

RETRO++ (Wang et al., EMNLP’23) 30


Other architectures for RALMs

31
kNN-LMs
(Retrieval can be added at the output layer)

ARCHITECTURE
• Token-level datastore
• Representation = input to last FFN layer
• Integration only at the output layer

p_kNN(w | c_t) ∝ Σ_{(c, x) ∈ D} I(x = w) · exp(−‖f_θ(c) − f_θ(c_t)‖²)    (only keep top-K after NN search)

P(w | c_t) = λ · P_LM(w | c_t) + (1 − λ) · P_kNN(w | c_t)    (linear interpolation)
kNN-LM (Khandelwal et al., ICLR’20) 32
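A small numpy sketch of the interpolation above (not the authors’ code): squared distances from f_θ(c_t) to the stored keys induce a distribution over the stored next tokens, which is then linearly interpolated with the parametric LM distribution; the datastore arrays and `lam` are illustrative.

```python
import numpy as np

def knn_lm_probs(query_vec, keys, values, p_lm, vocab_size, k=16, lam=0.25):
    """keys:   (N, d) array of f_theta(c) for every datastore context
       values: (N,)  array of next-token ids x paired with each context
       p_lm:   (V,)  parametric LM distribution P_LM(w | c_t)
       Returns the interpolated distribution lam * P_LM + (1 - lam) * P_kNN."""
    dists = np.linalg.norm(keys - query_vec, axis=1) ** 2       # squared L2 distances
    top = np.argsort(dists)[:k]                                 # only keep top-K after NN search
    weights = np.exp(-(dists[top] - dists[top].min()))          # shifted for numerical stability
    weights /= weights.sum()                                    # normalize over the neighbors
    p_knn = np.zeros(vocab_size)
    for w, v in zip(weights, values[top]):
        p_knn[v] += w                                           # aggregate mass per next token
    return lam * p_lm + (1 - lam) * p_knn
```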


REPLUG
(Retrieval can be added at the input layer)

REPLUG (Shi et al., NAACL’24) 33
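REPLUG’s recipe, paraphrased with a hypothetical `lm_next_token_probs` API: prepend each retrieved document to the input separately, run the frozen LM on each, and ensemble the per-document output distributions weighted by softmax-normalized retrieval scores.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    x = x - x.max()                      # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum()

def replug_ensemble(question, docs, scores, lm_next_token_probs):
    """docs/scores: retrieved documents and their retrieval scores.
    lm_next_token_probs(prompt) -> (V,) next-token distribution (hypothetical).
    Each document is prepended separately; the per-document distributions
    are averaged with weights given by softmax over the retrieval scores."""
    weights = softmax(scores)
    dists = [lm_next_token_probs(d + "\n\n" + question) for d in docs]
    return sum(w * p for w, p in zip(weights, dists))
```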


Why is training so hard for RALMs?

34
Why is training so hard for RALMs?

35
Different training methods

36
Training with asynchronous index updates

Examples: REALM (Guu et al., ICML’20), ATLAS (Izacard et al., JMLR’23)
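The gist of asynchronous index updates, heavily simplified and with hypothetical helpers: keep updating the retriever and LM every step, but re-embed the datastore and rebuild the nearest-neighbor index only every `refresh_every` steps, accepting a slightly stale index in between.

```python
def train_with_async_index(steps, batches, retriever, lm, build_index, train_step,
                           refresh_every=500):
    """Stale-index training loop: the index is rebuilt only periodically,
    so retrieval between refreshes uses slightly outdated embeddings."""
    index = build_index(retriever)               # embed the datastore once up front
    for step in range(steps):
        batch = next(batches)
        train_step(lm, retriever, index, batch)  # retrieve with the (possibly stale) index
        if (step + 1) % refresh_every == 0:
            index = build_index(retriever)       # asynchronous / periodic refresh
    return lm, retriever
```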

37
Training with in-batch approximations

38
Training with in-batch approximations

TRIME (Zhong et al., EMNLP’22)

NPM (Min et al., ACL’23 Findings)
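A rough sketch of the in-batch idea (not TRIME’s exact objective): instead of searching the full index during training, treat the other contexts in the current batch as candidate memories and reward high similarity to in-batch contexts whose next token matches the target.

```python
import torch
import torch.nn.functional as F

def in_batch_retrieval_loss(ctx_reps, next_tokens, temperature=1.0):
    """ctx_reps:    (B, d) context representations from the current batch
       next_tokens: (B,)   gold next-token ids for each context
       Positives are other in-batch contexts that share the same next token."""
    sims = ctx_reps @ ctx_reps.t() / temperature                 # (B, B) similarities
    sims.fill_diagonal_(float("-inf"))                           # don't retrieve yourself
    pos = next_tokens.unsqueeze(0) == next_tokens.unsqueeze(1)   # (B, B) positive mask
    pos.fill_diagonal_(False)
    log_p = F.log_softmax(sims, dim=-1)
    # Negative log of the probability mass assigned to positive in-batch memories.
    loss = -(log_p.masked_fill(~pos, float("-inf")).logsumexp(dim=-1))
    return loss[pos.any(dim=-1)].mean()                          # skip rows with no positive
```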

39
Advanced “frozen” RAG frameworks (path #2)

40
Key challenges

#1. Can LMs discern relevant and irrelevant passages?


#2. Can LMs leverage all the contexts effectively?
#3. Can LMs synthesize information from different passages just in context?

41
Active Retrieval Augmented Generation (FLARE)

FLARE (Jiang et al., EMNLP’23) 42
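FLARE’s active-retrieval loop, heavily condensed and with hypothetical helper functions: generate a tentative next sentence; if any of its tokens falls below a confidence threshold, use that sentence as a query, retrieve, and regenerate the sentence with the retrieved passages in context.

```python
def flare_generate(question, generate_sentence, retrieve, max_sentences=8,
                   threshold=0.6):
    """generate_sentence(prompt) -> (sentence, per-token probabilities)
       retrieve(query, k)        -> list of passages   (both hypothetical)."""
    answer = ""
    for _ in range(max_sentences):
        prompt = question + "\n" + answer
        sentence, probs = generate_sentence(prompt)
        if sentence == "":                        # model finished generating
            break
        if min(probs) < threshold:                # low confidence: retrieve actively
            passages = retrieve(sentence, k=3)
            prompt = "\n".join(passages) + "\n\n" + prompt
            sentence, _ = generate_sentence(prompt)
        answer += sentence + " "
    return answer.strip()
```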


Self-Reflective Retrieval-Augmented Generation (Self-RAG)

Self-RAG (Asai et al., ICLR’24) 43

