
FALL 2024 COS597R:

DEEP DIVE INTO LARGE LANGUAGE MODELS


Danqi Chen, Sanjeev Arora

Lecture 18: Retrieval-augmented language models

https://princeton-cos597r.github.io/
Retrieval-augmented LMs (RALMs)

Input → LM → Output

Query → DATASTORE → top-k “documents”

• Datastore:
• What should be stored in the datastore?
• How to index the datastore (e.g., granularity)?
• How to search top-k documents efficiently?
• How to integrate the retrieved outputs with LMs?
• You can also search the datastore in multiple rounds, and you can search the datastore using the “output”! (A minimal single-round loop is sketched below.)
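As a minimal illustration of one retrieve-then-generate round (not code from the lecture), here is a sketch in which `embed`, `generate`, and the datastore arrays are hypothetical stand-ins for an embedding model, an LM API, and a vector index:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=5):
    # Cosine similarity between the query and every document vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    top = np.argsort(-sims)[:k]          # indices of the top-k documents
    return [docs[i] for i in top]

def rag_answer(question, embed, doc_vecs, docs, generate, k=5):
    """One retrieve-then-generate round: search the datastore with the
    question, then condition the LM on the top-k documents in context."""
    hits = retrieve(embed(question), doc_vecs, docs, k)
    prompt = "\n\n".join(hits) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```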

2
Retrieval-augmented generation (RAG)

(Lewis et al., NeurIPS’20)

Douwe Kiela (Stanford CS25; 2023/12)

• An encoder-decoder model (BART)


• Retriever: DPR (for question answering)
• Fine-tuning on individual tasks (question answering, question generation, fact verification)

3
Retrieval-augmented generation (RAG)

Default RAG
framework:

https://www.datasciencecentral.com/rag-and-its-evolution/ 4
Retrieval-augmented LMs: two diverging paths
Path #1: Build a language model that has a built-in retrieval component
• You need to consider how to build the datastore/index as part of the model
• The optimal model architecture is still an ongoing exploration
• An alternative of scaling today’s parametric (Transformer-based) LMs
• Lots of interesting technical challenges, not as successful as we hoped for 🥹

5
Retrieval-augmented LMs: two diverging paths
Path #2: view retrieval as one of the “tools” that LMs learn how to use
• Assuming you already have a very powerful LM
• Retrieval can be viewed as an API or “black box” e.g., a search engine

• Research questions: when to call the retriever? how to incorporate retrieved results in context?
• Very popular in developer community (“frozen RAG”)

Toolformer (Schick et al., NeurIPS’23)
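In the spirit of Toolformer’s API calls, one hedged sketch of “retrieval as a tool” is to let the LM emit an inline search call, execute it, and splice the result back into the context; the `[Search(...)]` syntax and the `generate`/`search` callables below are illustrative assumptions, not the paper’s exact interface.

```python
import re

CALL = re.compile(r"\[Search\((.+?)\)\]")  # assumed inline tool-call syntax

def generate_with_retrieval(prompt, generate, search, max_rounds=3):
    """Let the LM decide when to retrieve by emitting [Search(query)] markers."""
    text = prompt
    for _ in range(max_rounds):
        out = generate(text)               # hypothetical LM call
        m = CALL.search(out)
        if m is None:                      # no tool call: we are done
            return out
        result = search(m.group(1))        # hypothetical search-engine call
        # Keep the text up to the call, append its result, and let the LM continue.
        text = text + out[:m.end()] + f" -> {result}\n"
    return generate(text)
```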

6
Long-context LMs and RALMs
• Today’s long-context LLMs support up to millions of tokens in their context window - Do we still need RAG?

• Long-context LMs support better RAG (more documents, more tokens)

• It puts less demand on the retriever, but it can’t really replace RAG (since the datastore is still much larger)

• Though there are still a lot of questions about whether long-context LMs can really make use of their full contexts

(Liu et al., TACL 2023) “Lost in the Middle”

7
Recommended materials

(2023/7)
https://acl2023-retrieval-lm.github.io/

• Model architectures
• Training methods
• Applications and extensions (multi-lingual and multi-modal)

8
Why retrieval-augmented LMs?

9
Retrieval-augmented LMs

• It is a language model P(x_n | x_1, x_2, ⋯, x_{n−1})


The capital city of Ontario is ___

• It retrieves from an external datastore (at least during inference time)

(Also referred to as semiparametric and non-parametric models)

10
Can you cram all knowledge into parameters?

List 5 important papers authored by Geoffrey Hinton

Geoffrey Hinton is a renowned computer scientist …
Here are five important papers authored by him:
1. "Learning Internal Representations by Error Propagation" (with D. E. Rumelhart and R. J. Williams) - This paper, published in 1986, ..
2. "Deep Boltzmann Machines" (with R. Salakhutdinov) - Published in 2009, ..

4. "Deep Learning" (with Y. Bengio and A. Courville) - Published as a book in 2016, …
5. "Attention Is All You Need" (with V. Vaswani, N. Shazeer, et al.) - Published in 2017, this paper introduced the Transformer model, …

(From 2023/7)

What is Kathy Saltzman’s occupation?
GPT-3 davinci-003: 20%-30% accuracy (Mallen et al., 2023)
11
Can you cram all knowledge into parameters?

List 5 research papers authored by Danqi Chen

Here are five notable papers from her research:

1. ”A Fast and Accurate Dependency Parser using Neural Networks" (2014)


2. “Reading Wikipedia to Answer Open-Domain Questions" (2017)
3. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018)
4. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (2019)
5. ”Stanford’s Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task" (2017)

“Although primarily led by researchers at Google, Danqi contributed to this landmark paper during her time as a visiting scientist at Google.”

(From 2024/9)

12
Can you cram all knowledge into parameters?

(From 2024/11)
13
LLMs’ knowledge is easily outdated

Who is the CEO of Twitter?

As of my knowledge cutoff in
September 2021, the CEO of
Twitter is Jack Dorsey….

• Existing knowledge editing methods are still NOT scalable (active research!)

• The datastore can be easily updated and expanded - even without retraining! New!

14
LLMs’ knowledge is easily outdated

15
LLM outputs are hard to interpret and verify

Generating text with citations

Can trace knowledge sources from retrieval results - better interpretability & control

(Nakano et al., 2021; Menick et al., 2022; Gao et al., 2023)

16
LLM outputs are hard to interpret and verify

17
Private and copyrighted data for training LMs

SILO (Min et al., ICLR’24)

• Machine unlearning is a challenging problem too (active research!)


18
Can we scale LMs with (test-time) datastore?

A (large) parametric LM vs. a (smaller) LM paired with a datastore.

Long-term goal: can we possibly reduce the training and inference costs, and scale down the size of LLMs?
e.g., RETRO (Borgeaud et al., 2021): “obtains
comparable performance to GPT-3 on the Pile,
despite using 25x fewer parameters”

19
Can we scale LMs with (test-time) datastore?

20
RALMs: model architectures and training methods
(path #1)

21
Roadmap

Input → LM → Output

Query → DATASTORE → top-k “documents”

ARCHITECTURE
• How is the retriever represented? Granularity of datastore?
• How are retrieved contexts integrated with LMs?
• Sizes of LMs vs datastore?

LEARNING
• How are the LM and retriever trained together?
• Training from scratch? Fine-tuning?
• Pre-training or instruction tuning?

22
How is retrieval implemented?
• Sim: a similarity score between two pieces of text

Sparse retrieval

Dense retrieval

• An entire field of study on how to define or learn these similarity functions better

• There are efficient data structures/infrastructure for supporting fast and accurate search from a large datastore
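A toy contrast of the two flavors of Sim above (not the tutorial’s code): sparse retrieval scores term overlap between bag-of-words representations, while dense retrieval scores an inner product of learned embeddings; `encode` is a placeholder for any trained text encoder.

```python
from collections import Counter
import math
import numpy as np

def sparse_sim(query, doc):
    """Toy sparse score: count of overlapping terms, normalized by lengths
    (real systems use TF-IDF / BM25 weighting)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum((q & d).values())
    return overlap / math.sqrt(sum(q.values()) * sum(d.values()))

def dense_sim(query, doc, encode):
    """Dense score: inner product of learned embeddings (DPR-style);
    `encode` is a hypothetical text -> np.ndarray encoder."""
    q_vec, d_vec = encode(query), encode(doc)
    return float(np.dot(q_vec, d_vec))
```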

23
How is retrieval implemented?

24
Example: RETRO

ARCHITECTURE
• How is the retriever represented? Granularity of datastore?
• Sizes of LMs vs datastore?
• How are retrieved contexts integrated with LMs?

• Granularity: chunks of 64 tokens


• Representation: frozen BERT encoders (pre-trained, but for a different task than retrieval)
• LMs: 150M-7B parameters; datastore: up to 2T tokens (600B training tokens)
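A rough sketch of building such a datastore at the granularity described in the bullets above (placeholder whitespace tokenizer and `frozen_encoder`, not RETRO’s actual BERT pipeline): split the corpus into 64-token chunks, embed each chunk with a frozen encoder, and search by nearest neighbors in embedding space.

```python
import numpy as np

CHUNK_LEN = 64  # RETRO retrieves at the granularity of 64-token chunks

def build_datastore(corpus, frozen_encoder):
    """Chunk the corpus and embed each chunk with a frozen (not trained) encoder."""
    chunks, keys = [], []
    for doc in corpus:
        tokens = doc.split()                       # placeholder tokenizer
        for i in range(0, len(tokens), CHUNK_LEN):
            chunk = " ".join(tokens[i:i + CHUNK_LEN])
            chunks.append(chunk)
            keys.append(frozen_encoder(chunk))     # hypothetical text -> vector
    return chunks, np.stack(keys)

def nearest_chunks(query_chunk, chunks, keys, frozen_encoder, k=2):
    """Retrieve the k nearest chunks by L2 distance in embedding space."""
    q = frozen_encoder(query_chunk)
    dists = np.linalg.norm(keys - q, axis=1)
    return [chunks[i] for i in np.argsort(dists)[:k]]
```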

25
Example: RETRO
ARCHITECTURE

• How are retrieved contexts integrated with LMs?

The information is integrated into the intermediate layers of Transformers (with an extra encoder)
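RETRO’s real mechanism is chunked cross-attention; the snippet below is a much-simplified stand-in that only shows the general shape: decoder hidden states attend over encoded neighbor chunks, and the result is added back through a residual connection. Module choices and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RetrievalCrossAttention(nn.Module):
    """Simplified stand-in for RETRO's chunked cross-attention:
    decoder states (queries) attend over encoded retrieved neighbors."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden, neighbor_enc):
        # hidden:       (batch, seq_len, d_model)          decoder hidden states
        # neighbor_enc: (batch, n_neighbor_tokens, d_model) encoded retrieved chunks
        attended, _ = self.attn(query=hidden, key=neighbor_enc, value=neighbor_enc)
        return self.norm(hidden + attended)   # residual connection back into the LM
```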
26
Example: RETRO
LEARNING
• How are the LM and retriever trained together?
• Training from scratch? Fine-tuning?
• Pre-training or instruction tuning?

• The retriever (“encoder”) is pre-trained and kept frozen (not updated further)

• The LM is either trained from scratch or fine-tuned (“retrofit”)

27
RETRO: experiments

Perplexity

In retrospect, too many works focused on improving perplexity instead of downstream tasks at that time.
28
RETRO: experiments
Question answering

It is not better than specialized QA models

No evals on any of the in-context learning tasks from GPT-3


“With a 2 trillion token database, our Retrieval-Enhanced
Transformer (Retro) obtains comparable performance to
GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer
parameters.”

RETRO is not open-sourced… :( :(

29
RETRO++ (NVIDIA)

RETRO++ (Wang et al., EMNLP’23) 30


Other architectures for RALMs

31
kNN-LMs
(Retrieval can be added at the output layer)

ARCHITECTURE
• Token-level datastore
• Representation = input to last FFN layer
• Integration only at the output layer

p_kNN(w | c_t) ∝ Σ_{(c, x) ∈ D} I(x = w) · exp(−‖f_θ(c) − f_θ(c_t)‖²)    (only keep top-K after NN search)

P(w | c_t) = λ · P_LM(w | c_t) + (1 − λ) · P_kNN(w | c_t)    (linear interpolation)
kNN-LM (Khandelwal et al., ICLR’20) 32
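A small numpy sketch of the interpolation above (not the authors’ code): squared distances from f_θ(c_t) to the stored keys induce a distribution over the stored next tokens, which is then linearly interpolated with the parametric LM distribution; the datastore arrays and `lam` are illustrative.

```python
import numpy as np

def knn_lm_probs(query_vec, keys, values, p_lm, vocab_size, k=16, lam=0.25):
    """keys:   (N, d) array of f_theta(c) for every datastore context
       values: (N,)  array of next-token ids x paired with each context
       p_lm:   (V,)  parametric LM distribution P_LM(w | c_t)
       Returns the interpolated distribution lam * P_LM + (1 - lam) * P_kNN."""
    dists = np.linalg.norm(keys - query_vec, axis=1) ** 2       # squared L2 distances
    top = np.argsort(dists)[:k]                                 # only keep top-K after NN search
    weights = np.exp(-(dists[top] - dists[top].min()))          # shifted for numerical stability
    weights /= weights.sum()                                    # normalize over the neighbors
    p_knn = np.zeros(vocab_size)
    for w, v in zip(weights, values[top]):
        p_knn[v] += w                                           # aggregate mass per next token
    return lam * p_lm + (1 - lam) * p_knn
```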


REPLUG
(Retrieval can be added at the input layer)

REPLUG (Shi et al., NAACL’24) 33
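REPLUG’s recipe, paraphrased with a hypothetical `lm_next_token_probs` API: prepend each retrieved document to the input separately, run the frozen LM on each, and ensemble the per-document output distributions weighted by softmax-normalized retrieval scores.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    x = x - x.max()                      # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum()

def replug_ensemble(question, docs, scores, lm_next_token_probs):
    """docs/scores: retrieved documents and their retrieval scores.
    lm_next_token_probs(prompt) -> (V,) next-token distribution (hypothetical).
    Each document is prepended separately; the per-document distributions
    are averaged with weights given by softmax over the retrieval scores."""
    weights = softmax(scores)
    dists = [lm_next_token_probs(d + "\n\n" + question) for d in docs]
    return sum(w * p for w, p in zip(weights, dists))
```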


Why is training so hard for RALMs?

34
Why is training so hard for RALMs?

35
Different training methods

36
Training with asynchronous index updates

Examples: REALM (Guu et al., ICML’20), ATLAS (Izacard et al., JMLR’23)
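The gist of asynchronous index updates, heavily simplified and with hypothetical helpers: keep updating the retriever and LM every step, but re-embed the datastore and rebuild the nearest-neighbor index only every `refresh_every` steps, accepting a slightly stale index in between.

```python
def train_with_async_index(steps, batches, retriever, lm, build_index, train_step,
                           refresh_every=500):
    """Stale-index training loop: the index is rebuilt only periodically,
    so retrieval between refreshes uses slightly outdated embeddings."""
    index = build_index(retriever)               # embed the datastore once up front
    for step in range(steps):
        batch = next(batches)
        train_step(lm, retriever, index, batch)  # retrieve with the (possibly stale) index
        if (step + 1) % refresh_every == 0:
            index = build_index(retriever)       # asynchronous / periodic refresh
    return lm, retriever
```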

37
Training with in-batch approximations

38
Training with in-batch approximations

TRIME (Zhong et al., EMNLP’22)

NPM (Min et al., ACL’23 Findings)
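A rough sketch of the in-batch idea (not TRIME’s exact objective): instead of searching the full index during training, treat the other contexts in the current batch as candidate memories and reward high similarity to in-batch contexts whose next token matches the target.

```python
import torch
import torch.nn.functional as F

def in_batch_retrieval_loss(ctx_reps, next_tokens, temperature=1.0):
    """ctx_reps:    (B, d) context representations from the current batch
       next_tokens: (B,)   gold next-token ids for each context
       Positives are other in-batch contexts that share the same next token."""
    sims = ctx_reps @ ctx_reps.t() / temperature                 # (B, B) similarities
    sims.fill_diagonal_(float("-inf"))                           # don't retrieve yourself
    pos = next_tokens.unsqueeze(0) == next_tokens.unsqueeze(1)   # (B, B) positive mask
    pos.fill_diagonal_(False)
    log_p = F.log_softmax(sims, dim=-1)
    # Negative log of the probability mass assigned to positive in-batch memories.
    loss = -(log_p.masked_fill(~pos, float("-inf")).logsumexp(dim=-1))
    return loss[pos.any(dim=-1)].mean()                          # skip rows with no positive
```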

39
Advanced “frozen” RAG frameworks (path #2)

40
Key challenges

#1. Can LMs discern relevant and irrelevant passages?


#2. Can LMs leverage all the contexts effectively?
#3. Can LMs synthesize information from different passages just in context?

41
Active Retrieval Augmented Generation (FLARE)

FLARE (Jiang et al., EMNLP’23) 42
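FLARE’s active-retrieval loop, heavily condensed and with hypothetical helper functions: generate a tentative next sentence; if any of its tokens falls below a confidence threshold, use that sentence as a query, retrieve, and regenerate the sentence with the retrieved passages in context.

```python
def flare_generate(question, generate_sentence, retrieve, max_sentences=8,
                   threshold=0.6):
    """generate_sentence(prompt) -> (sentence, per-token probabilities)
       retrieve(query, k)        -> list of passages   (both hypothetical)."""
    answer = ""
    for _ in range(max_sentences):
        prompt = question + "\n" + answer
        sentence, probs = generate_sentence(prompt)
        if sentence == "":                        # model finished generating
            break
        if min(probs) < threshold:                # low confidence: retrieve actively
            passages = retrieve(sentence, k=3)
            prompt = "\n".join(passages) + "\n\n" + prompt
            sentence, _ = generate_sentence(prompt)
        answer += sentence + " "
    return answer.strip()
```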


Self-Reflective Retrieval-Augmented Generation (Self-RAG)

Self-RAG (Asai et al., ICLR’24) 43

