Retrieval Augmented Generation
Week 5 - L1
Agenda
1. Recap on Large Language Models
2. Retrieval Augmented Generation (RAG)
3. RAG in conversational setting
Recap on Large Language Models
Language modeling
A language model (LM) is a statistical distribution over sequences of words. A common way to learn this distribution is to predict the next word given the previous context.
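Concretely, next-word prediction corresponds to factorizing the probability of a sequence into per-token conditionals and maximizing the likelihood of each word given its prefix:
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})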
Large Language Models
Recently, large language models have taken this training paradigm to the next level by scaling both model size and training data. For example, GPT-3 (OpenAI) has 175 billion parameters and was trained on 45 terabytes of data.
Are LLMs memory stores?
It has been pointed out that LLMs can act as knowledge bases
However
The size of the model is much smaller than the size of the training set. Viewing a large language model as a compression of its training corpus is a common analogy [1].
Will increasing the size of an LLM also improve its capacity to memorize?
Figure from [2]
How to make LLMs memorize more knowledge
Why is this necessary: there are use cases that require the model to memorize customized (internal) data
● Personal assistant (healthcare, scheduling, or just chit-chat)
● Sales AI
● Concept definition generation
Two approaches:
● Finetuning
● Retrieval Augmented Generation
Pros and cons of finetuning
Pros
● The quality of the finetuned model is (likely) guaranteed
● We can change things such as the tone of the LLM (may be important for some use cases)
Cons
● Cost of building an instruction dataset
● Computational cost of (continued) pretraining and finetuning
In most cases, the cons outweigh the pros
Retrieval Augmented Generation
A RAG pipeline
Figure from retrieval-augmented-generation-notes
A basic retrieval component for RAG
It includes only a vector DB containing the documents and their vectorized representations. Retrieve the top-K documents most related to the query by measuring cosine similarity.
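A minimal sketch of this component, assuming the documents have already been embedded into a NumPy matrix (the in-memory arrays stand in for a real vector DB):

import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=5):
    # Normalize so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                     # one cosine score per document
    top = np.argsort(-scores)[:k]      # indices of the K most similar documents
    return top, scores[top]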
Text embedding - which model to choose?
There are a lot of models!!
MTEB leaderboard [3]
Text embedding - Sentence BERT [4]
More recent? Got it!
INSTRUCTOR [5] - hkunlp/instructor-large
E5 [6] - intfloat/e5-large-v2
Want a cheaper model? Got it!
Sent2Vec [7] is a CBOW-style sentence embedding model that can run comfortably on a CPU
GitHub
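As an illustration, the models above can be loaded through the sentence-transformers library; this sketch uses E5, whose convention is to prefix inputs with "query:" and "passage:" (adapt the prefixes if you pick a different model):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large-v2")
# E5 expects role prefixes on its inputs
query_emb = model.encode("query: who proposed general relativity?", normalize_embeddings=True)
doc_embs = model.encode(
    ["passage: Albert Einstein published the theory of general relativity in 1915."],
    normalize_embeddings=True,
)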
Populating the vector DB
Steps
1. Collect the texts
2. Chunk the texts to your desired size
3. Vectorize the chunks
4. Index them
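A sketch of these four steps using naive fixed-size chunking and a FAISS inner-product index as one possible backend; the chunk size and embedding model are illustrative choices, not requirements:

import numpy as np
import faiss  # pip install faiss-cpu
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large-v2")

def chunk(text, size=500):
    # 2. Chunk: fixed-size character windows
    return [text[i:i + size] for i in range(0, len(text), size)]

texts = ["... your collected documents ..."]          # 1. Collect the texts
chunks = [c for t in texts for c in chunk(t)]         # 2. Chunk them
vecs = model.encode(["passage: " + c for c in chunks],
                    normalize_embeddings=True)        # 3. Vectorize the chunks
index = faiss.IndexFlatIP(vecs.shape[1])              # 4. Index (inner product on
index.add(np.asarray(vecs, dtype="float32"))          #    normalized vectors = cosine)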
Cascaded Text Retrieval
Contains two stages
1. Candidate retrieval: narrow down the scope of the search by choosing the top-N candidates, thereby removing totally unrelated documents
2. Candidate re-ranking: rank the candidates and choose the top-k with k much smaller than N (see the sketch below)
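A sketch of the second stage using a cross-encoder re-ranker from sentence-transformers; the model name is one common public checkpoint, not a requirement of the pipeline:

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, k=5):
    # Score each (query, candidate) pair jointly, then keep the top-k
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return ranked[:k]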
Why is an additional step a good idea?
There are several reasons
1. A re-ranking step may offer better precision in ranking
2. Several post-processing techniques may be applied, such as diversification of results
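Diversification is often implemented with Maximal Marginal Relevance (MMR); a minimal sketch, assuming L2-normalized embeddings so dot products are cosine similarities (MMR is one common choice, not necessarily what a given system uses):

import numpy as np

def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    # Trade off relevance to the query against similarity to already-selected docs
    rel = doc_vecs @ query_vec
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        if not selected:
            best = remaining[int(np.argmax(rel[remaining]))]
        else:
            sim_to_picked = (doc_vecs[remaining] @ doc_vecs[selected].T).max(axis=1)
            score = lam * rel[remaining] - (1 - lam) * sim_to_picked
            best = remaining[int(np.argmax(score))]
        selected.append(best)
        remaining.remove(best)
    return selected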
RAG in conversational setting
Why is the conversational setting more difficult?
Because human conversations are heavily contextualized, i.e. dependent on previous conversation turns [8, 9]
[User]: What is the name of the scientist who proposed general relativity?
[Assistant]: It is Albert Einstein
[User]: When was he born, and how did that theory impact physics research at the time?
Query formulation
Rewriting the user query based on the context provided by the conversation history
Before: [User]: When was he born, and how did that theory impact physics research at the time?
After: [User]: When was Albert Einstein born, and how did the theory of general relativity impact physics research at the time?
Query formulation - A simple approach
Just prompt the LLM to do it for you:
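A sketch of what such a prompt might look like, here wired to the OpenAI chat API; the model name and prompt wording are illustrative, not the original slide's prompt:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REWRITE_PROMPT = (
    "Rewrite the user's last question as a standalone search query, "
    "resolving pronouns and references using the conversation history.\n\n"
    "History:\n{history}\n\nLast question: {question}\n\nStandalone query:"
)

def rewrite_query(history, question, model="gpt-4o-mini"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": REWRITE_PROMPT.format(history=history, question=question)}],
    )
    return resp.choices[0].message.content.strip()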
Query formulation - pros and cons
Pros
● No further fine-tuning of any component in RAG
Cons
● Additional cost in LLM inference
Open discussion: what happens when the conversation history gets too long?
References
[1] Language Models as Knowledge Bases?
[2] PaLM: Scaling Language Modeling with Pathways
[3] MTEB: Massive Text Embedding Benchmark
[4] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
[5] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
[6] Text Embeddings by Weakly-Supervised Contrastive Pre-training
[7] Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
[8] Contextualized Query Embeddings for Conversational Search
[9] Few-Shot Conversational Dense Retrieval