
LangChain Talk

LLM application development with LangChain
Lance Martin
Software Engineer, LangChain
@RLanceMartin
LangChain makes it as easy as possible to develop LLM-powered applications.
Platform (LangSmith) and Library (LangChain)
Use-cases: Web-Langchain, Chat-Langchain, more …
Template apps
Chains: LangChain Expression Language composes the building blocks
Building Blocks
Open source: Document Loaders, Document Transformers, Storage + Embeddings, Prompt, LLM
Platform: Observability, Data Management, Eval (LangSmith + Hub)
LangSmith Blog
Central concepts
Two ways pre-trained LLMs learn:
- Weight updates via fine-tuning: like cramming before a test. Bad for factual recall, good for tasks; teaches form (e.g., extraction, text-to-SQL).
- Prompting (e.g., via retrieval): like an open-book exam. Good for factual recall; supplies facts (e.g., QA).
OpenAI, Fine-tuning and hallucinations, Anyscale
A spectrum: search engines (retrieval only) → Retrieval Augmented Generation (add task-related documents to the LLM context window / working memory) → LLM alone (memory only)
Building Blocks
Document Loaders: > 140 Integrations
Document Loaders ingest structured and unstructured data, public and proprietary: datastores, private / company data, and files (.pdf, .txt, .json, .md, …)
Integrations hub
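For instance, loading a web page and a PDF each take one line. A minimal sketch, assuming the bs4 and pypdf dependencies are installed ("my_paper.pdf" is a hypothetical file):

```python
# Minimal sketch: load a web page and a PDF into LangChain Documents.
from langchain.document_loaders import PyPDFLoader, WebBaseLoader

web_docs = WebBaseLoader("https://python.langchain.com/docs/").load()
pdf_docs = PyPDFLoader("my_paper.pdf").load()  # hypothetical file

# Every loader returns a list of Documents with page_content + metadata.
print(web_docs[0].metadata)
```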
Text splitters
Document Transformers (text splitter Playground)
Beyond basic splitting: Context-aware splitting, Function calling (Doctran)
Pipeline: Document Loaders → Document Transformers → Embeddings + Storage
Code:
    def foo(...):
        for section in sections:
            sect = section.find("head")
Markdown:
    # Introduction
    Notion templates are effective . . .
PDF:
    Abstract
    The dominant sequence transduction models . . .
Context-aware splitting, Doctran
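As one example of context-aware splitting, a minimal sketch using LangChain's Markdown header splitter, which keeps each chunk tagged with the headers it appeared under (the sample text comes from the slide above):

```python
# Minimal sketch: context-aware splitting on Markdown headers.
from langchain.text_splitter import MarkdownHeaderTextSplitter

md = "# Introduction\n\nNotion templates are effective ..."
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")]
)
splits = splitter.split_text(md)
# Each split is a Document whose metadata records its headers,
# e.g. {"Header 1": "Introduction"}.
```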
> 40 vectorstore integrations, > 30 embeddings
Pipeline: Document Loaders (> 140 integrations) → Document Transformers (e.g., text splitters, OAI functions) → Embeddings (> 30 integrations) → Vector Storage (> 40 integrations)
Hosted or private vectorstore + embeddings:
- Hosted: hosted embeddings + vector storage
- Private (on device): local embeddings + vector storage (e.g., GPT4All embeddings)
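A minimal sketch of the fully private option, assuming the gpt4all and chromadb packages are installed and `splits` holds the Documents produced by the splitting step:

```python
# Minimal sketch: on-device embeddings + local vectorstore (no hosted services).
from langchain.embeddings import GPT4AllEmbeddings
from langchain.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())
retriever = vectorstore.as_retriever()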
> 60 LLM integrations
Pipeline: Document Loaders (> 140 integrations) → Document Transformers (e.g., text splitters, OAI functions) → Embeddings (> 30 integrations) → Vector Storage (> 40 integrations) → LLMs (> 60 integrations)
LLM landscape
                    OpenAI                      Anthropic                          Llama-2 (SOTA OSS)
Context window      4k - 32k tokens             100k tokens                        4k tokens
Performance         GPT-4 SOTA (best overall)   Claude-2 getting closer to GPT-4   70b on par w/ GPT-3.5-turbo*
Cost                $0.06 / 1k tokens (input)   4-5x cheaper than GPT-4-32K        Free

*Llama2-70b on par w/ GPT-3.5-turbo on language, but lags on coding (benchmarks: math, language, code)
Open Source LLMs
[Chart: open-source LLM landscape. Base models plotted by training tokens (GPT-J, GPT-NeoX-20b, OPT, BLOOM, MPT, StableLM, Falcon, LLaMA, LLaMA-2; roughly 180B to 2T tokens), with instruction fine-tuned variants plotted by number of fine-tuning instructions (Dolly, Alpaca, Koala, Vicuna, Nous, GPT4All, MPT-Instruct, LLaMA-2-Chat; roughly 15k to 1.5M instructions). LLaMA-2-Chat is SOTA.]
OSS models can run on device (private)
LLM
Llama2-13b running at ~50 tok/sec (Mac M2 Max, 32 GB)
Ollama, Llama.cpp
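A minimal sketch of calling a local model through LangChain's Ollama integration, assuming the Ollama app is running and the llama2:13b model has been pulled:

```python
# Minimal sketch: run Llama 2 locally via Ollama.
from langchain.llms import Ollama

llm = Ollama(model="llama2:13b")
print(llm("Explain retrieval augmented generation in one sentence."))
```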
Integrations hub
Use Cases
RAG: Load working memory w/ retrieved information relevant to a task
Pipeline: Document Loading (URLs, PDFs → Documents) → Splitting (Splits) → Storage (Database) → Retrieval (Query → Relevant Splits) → Output (Question + Relevant Splits → Prompt → LLM → Answer)
Use case documentation
Pick desired level of abstraction
From most to least abstraction (see the sketch below):
- VectorstoreIndexCreator → Answer
- RetrievalQA → Answer
- load_qa_chain (pass the relevant splits yourself) → Answer
Use case documentation
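A minimal sketch of all three levels, assuming a `loader`, a `vectorstore`, and a list of `relevant_splits` from the earlier building blocks:

```python
# Minimal sketch: the same RAG question at three abstraction levels.
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.indexes import VectorstoreIndexCreator

llm = ChatOpenAI(temperature=0)
question = "When can top K RAG fail?"

# Highest abstraction: one-liner index over a loader.
index = VectorstoreIndexCreator().from_loaders([loader])
answer = index.query(question, llm=llm)

# Middle: RetrievalQA handles retrieval + prompting for you.
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
answer = qa_chain({"query": question})

# Lowest: supply the relevant splits yourself.
chain = load_qa_chain(llm, chain_type="stuff")
answer = chain({"input_documents": relevant_splits, "question": question})
```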
Or, use runnables
RAG
Use case documentation
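A minimal sketch of the same flow composed with runnables (LangChain Expression Language), reusing the `retriever` from above:

```python
# Minimal sketch: RAG as a runnable pipeline (LCEL).
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on this context:\n{context}\n\nQuestion: {question}"
)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)
chain.invoke("When can top K RAG fail?")
```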
LangSmith trace for RetrievalQA chain
The trace shows the prompt, the retrieved docs, the question, and the response.
LangSmith trace
Distilling useful ideas / tricks to improve RAG
Idea — Example — Sources:
- Base case RAG: Top-K retrieval on embedded document chunks; return doc chunks for the LLM context window. (Pinecone docs (here, here, here); supported by many vectorstores and LLM frameworks.)
- Condensed content embedding: Top-K retrieval on embedded document summaries, but return the full doc for the LLM context window. (LangChain Multi Vector Retriever; Llama-Index Node References.)
- Expanded context: Top-K retrieval on embedded chunks or sentences, but return an expanded window or the full doc. (LangChain Parent Document Retriever; Llama-Index Sentence Window.)
- Fine-tune RAG embeddings: Fine-tune the embedding model on your data; glean insights from fine-tuning. (LangChain fine-tuning guide; Llama-Index embeddings fine-tuning.)
- 2-stage RAG: First-stage keyword search followed by second-stage semantic Top-K retrieval. (Cohere re-rank.)
- Agents: May benefit more complex RAG use-cases. (LangChain agents; Llama-Index multi-document agents.)
Useful ideas / tricks
RAG
Example: Multi Vector Retriever — store documents with a condensed content embedding. For each document (e.g., "Top K RAG can fail when we do not …"), generate a summary and embed it. At query time, embed the question ("When can top K RAG fail?"), match it against the summary embeddings, retrieve the full documents, and pass them to the LLM to produce the answer.
LangChain Multi Vector Retriever
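A minimal sketch of this pattern with the Multi Vector Retriever, assuming parallel lists `docs` (full Documents) and `summaries` (their condensed content, e.g., LLM-generated summaries):

```python
# Minimal sketch: embed summaries, return full documents.
import uuid

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema import Document
from langchain.storage import InMemoryStore
from langchain.vectorstores import Chroma

doc_ids = [str(uuid.uuid4()) for _ in docs]
summary_docs = [
    Document(page_content=s, metadata={"doc_id": doc_ids[i]})
    for i, s in enumerate(summaries)
]

retriever = MultiVectorRetriever(
    vectorstore=Chroma(collection_name="summaries",
                       embedding_function=OpenAIEmbeddings()),
    docstore=InMemoryStore(),
    id_key="doc_id",
)
retriever.vectorstore.add_documents(summary_docs)   # what gets embedded
retriever.docstore.mset(list(zip(doc_ids, docs)))   # what gets returned

full_docs = retriever.get_relevant_documents("When can top K RAG fail?")
```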
Chat: Persist conversation history
Pipeline: Question (+ optional retrieval: Storage → Retrieved Chunks) → Prompt → LLM → Answer, with chat history persisted in Memory
Use case documentation
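A minimal sketch of a chat chain with buffer memory; each call sees the accumulated history:

```python
# Minimal sketch: persist conversation history across turns.
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    memory=ConversationBufferMemory(),
)
conversation.predict(input="Hi, I'm building a chatbot with LangChain.")
conversation.predict(input="What did I say I was building?")  # answered from memory
```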
LangSmith trace for LLMChain w/ chat model + memory
The trace shows the prompt, the chat history, and the response.
LangSmith trace; also works with retrieval
Summarization: Summarize a corpus of text
If the corpus fits in the LLM context window: stuff the documents into the context window and prompt the LLM to extract the final summary from the input list → Final Summary.

If the corpus does not fit, two strategies:
- Embed-and-cluster: embed and cluster the documents, sample from each cluster, and summarize each cluster ("summarize themes in the group of docs"); then distill the cluster summaries into a final summary ("extract final summary from input list").
- Map-reduce: summarize each chunk (map), then distill the chunk summaries into a final summary (reduce).
Use case documentation
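A minimal sketch of the map-reduce strategy (use chain_type="stuff" when the corpus fits in the context window), assuming `docs` holds the loaded and split Documents:

```python
# Minimal sketch: map-reduce summarization over a list of Documents.
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI

chain = load_summarize_chain(ChatOpenAI(temperature=0), chain_type="map_reduce")
summary = chain.run(docs)
```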
Case-study: Apply to thousands of questions asked about LangChain docs
Summarized themes from thousands of user questions about the LangChain docs, using different methods and LLMs.
Blog Post
Extraction: Getting structured output from LLMs
Input:
    Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.
Schema (tell the LLM the schema we want):
    schema = {
        "properties": {
            "name": {"type": "string"},
            "height": {"type": "integer"},
            "hair_color": {"type": "string"},
        },
    }
Function (tell the LLM the function):
    "name": "information_extraction",
    "description": "Extracts information from the passage.",
    "parameters": {schema}
Function call output:
    {'name': 'Alex', 'height': 5, 'hair_color': 'blonde'}
    {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}
Use case documentation
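create_extraction_chain wraps the schema in exactly this kind of information_extraction function call; a minimal sketch:

```python
# Minimal sketch: schema-guided extraction via an OpenAI function call.
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI

schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
}
chain = create_extraction_chain(schema, ChatOpenAI(temperature=0))
chain.run(
    "Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps "
    "higher than him. Claudia is a brunette and Alex is blonde."
)
# -> [{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},
#     {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]
```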
LangSmith trace for LLMChain w/ function call + output parsing
The trace shows the prompt, the response (output from the function call), and the JSON parser.
LangSmith Trace
Text-to-SQL
Pipeline: Question → LLM → SQL query → run against the database → LLM → Answer. Optional: SQL Agent.
Use Case Documentation
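A minimal sketch over a local SQLite database ("chinook.db" is a hypothetical path; in recent versions SQLDatabaseChain lives in langchain_experimental):

```python
# Minimal sketch: question -> SQL -> answer.
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain

db = SQLDatabase.from_uri("sqlite:///chinook.db")  # hypothetical database
chain = SQLDatabaseChain.from_llm(ChatOpenAI(temperature=0), db, verbose=True)
chain.run("How many employees are there?")
```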
LangSmith trace for text-to-sql
The prompt includes a CREATE TABLE description for each table and three example rows in a SELECT statement.
LangSmith trace, Paper
Choosing a chain:
- Access to memory? No → basic LLM or API / function chains (e.g., APIChain). Yes → chat chains (e.g., ConversationalRetrievalChain).
- Access to tools? Yes → agents.
Large agent ecosystem (will focus on ReAct as one example)
Agent components:
- LLMs (> 60 integrations) to plan
- Tools (> 60 tools + toolkits) to act
- Memory: short-term (buffers) and long-term (> 40 vectorstores)
- > 15 agent types (autonomous, simulation, …)
Agents
Yes
Say-Can
ReAct
No
Action-Observation (Tool Use)
Agents
Standard Prompting
Chain-of-thought*
No
Yes
Multi-Step Reasoning
*Condition LLM to show its work
LangSmith trace for SQL ReAct agent
The trace shows the prompt, the (chain-of-thought) reasoning, each tool / action and its observation, the tool used at the next step, and the response.
LangSmith trace
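A minimal sketch of such an agent, reusing the `db` from the text-to-SQL sketch; verbose=True prints the thought / action / observation loop seen in the trace:

```python
# Minimal sketch: ReAct-style agent over SQL tools.
from langchain.agents import AgentType
from langchain.agents.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)
agent = create_sql_agent(
    llm=llm,
    toolkit=SQLDatabaseToolkit(db=db, llm=llm),
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("Which artist has the most albums?")
```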
Case-study on reliability: Web researcher started as an agent; a retriever was better
Pipeline: Research question → LLM generates search queries (Query 1 … Query N) → HTML pages → Document Loader + Document Transformation → Vector Storage → Retrieved Chunks → LLM → Answer
Blog Post
Hosted Streamlit app
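A minimal sketch of the retriever version of this flow, assuming Google Search API credentials (GOOGLE_API_KEY, GOOGLE_CSE_ID) are set in the environment:

```python
# Minimal sketch: LLM-generated queries -> web pages -> vectorstore -> QA.
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.vectorstores import Chroma

llm = ChatOpenAI(temperature=0)
retriever = WebResearchRetriever.from_llm(
    vectorstore=Chroma(embedding_function=OpenAIEmbeddings()),
    llm=llm,
    search=GoogleSearchAPIWrapper(),
)
qa = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=retriever)
qa({"question": "How do LLM powered autonomous agents work?"})
```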
Tooling
LangSmith Case study: Fine-tuning for extraction
Recall the two ways LLMs learn: weight updates via fine-tuning (cramming before a test: bad for factual recall, good for form, e.g., extraction) vs. prompting via retrieval (open-book exam: good for facts, e.g., QA).
OpenAI, Fine-tuning and hallucinations, Anyscale
LangSmith Case study: Fine-tuning for extraction of knowledge graph triples
Streamlit app
LangSmith Case study: Fine-tuning
Workflow: collect app generations into a dataset → data cleaning (+ synthetic data) → train / test split → fine-tune the LLM → eval
Blog
LangSmith evaluation: fine-tuning vs few-shot prompting for triple extraction
- Base LLaMA-7b-chat gives informal answers with hallucinations
- Fine-tuned LLaMA-7b-chat is closer to the reference
CoLab for LLaMA fine-tuning, CoLab for GPT-3.5 fine-tuning
Case-study lessons
● LangSmith can help address pain points in the fine-tuning workflow: data collection, evaluation, and inspection of results.
● RAG or few-shot prompting should be considered first! Few-shot prompting GPT-4 performed best.
● Fine-tuning small open-source models can outperform much larger generalist models: fine-tuned LLaMA2-chat-7B beat GPT-3.5-turbo.
Blog
Questions