LLM Application Development with LangChain
Lance Martin, Software Engineer, LangChain (@RLanceMartin)

LangChain makes it as easy as possible to develop LLM-powered applications.

Platform (LangSmith) and library (LangChain)
- Use cases: Web-Langchain, Chat-Langchain, and more; template apps
- Chains: the LangChain Expression Language composes the building blocks
- Building blocks (open source): document loaders, document transformers, storage + embeddings, prompts, LLMs
- Platform: observability, data management, evaluation (LangSmith + Hub)
(Links: LangSmith, blog)

Central concepts: two ways pre-trained LLMs learn
- Weight updates via fine-tuning: like cramming before a test. Bad for factual recall, good for tasks that teach form (e.g., extraction, text-to-SQL).
- Prompting (e.g., via retrieval): like an open-book exam. Good for factual recall (e.g., QA).
(Sources: OpenAI; Fine-tuning and hallucinations; Anyscale)

These sit on a spectrum: search engines (retrieval only) → retrieval augmented generation (add task-relevant documents to the LLM's context window / working memory) → the LLM alone (parametric memory only).

Building blocks

Document loaders (>140 integrations)
- Structured and unstructured sources; public and proprietary (private / company data)
- .pdf, .txt, .json, .md, ...
- See the integrations hub

Document transformers
- Text splitters (with a playground for experimentation)
- Beyond basic splitting: context-aware splitting and function-calling transformers (Doctran)
- Context-aware splitting keeps logical units intact (a minimal sketch appears below), e.g.:
  - Code: split on function definitions, so `def foo(...):` stays with its body
  - Markdown: split on headers, so `# Introduction` stays with the text beneath it ("Notion templates are effective ...")
  - PDF: split on sections, so a paper's Abstract ("The dominant sequence transduction models ...") stays intact

Embeddings + storage
- >40 vectorstore integrations, >30 embedding integrations
- Hosted vectorstore + embeddings, or fully private (on-device) storage and embeddings (e.g., GPT4All embeddings)

LLMs (>60 integrations)

LLM landscape

|  | OpenAI | Anthropic | Llama-2 (SOTA open source) |
|---|---|---|---|
| Context window (tokens) | 4k - 32k | 100k | 4k |
| Performance | GPT-4 is SOTA (best overall) | Claude-2 is closing in on GPT-4 | 70b on par with GPT-3.5-turbo* |
| Cost | $0.06 / 1k tokens (input) | 4-5x cheaper than GPT-4-32k | Free |

*Llama2-70b is on par with GPT-3.5-turbo on language tasks, but lags on coding and math.

Open-source LLMs
[Figure: open-source base models plotted by training tokens (OPT, GPT-J, GPT-NeoX-20b, BLOOM, LLaMA, StableLM, MPT, Falcon, LLaMA-2; roughly 180B to 2T tokens) against instruction-tuned variants plotted by fine-tuning dataset size (Dolly, Alpaca, Koala, Vicuna, Nous, GPT4All, MPT-Instruct, LLaMA-2-Chat; roughly 15k to 1.5M instructions). LLaMA-2-Chat is the open-source SOTA.]

Open-source models can run on device (private): Llama2-13b runs at ~50 tokens/sec on a Mac M2 Max with 32 GB RAM (via Ollama or llama.cpp). See the integrations hub for the full list of loaders, stores, embeddings, and LLMs.

Use cases

RAG: load the LLM's working memory with retrieved information relevant to a task.
Pipeline: document loading (URLs, PDFs, databases) → splitting → storage → retrieval of splits relevant to a question → prompt + LLM → answer. (See the use-case documentation; an end-to-end sketch follows below.)

Pick your desired level of abstraction for RAG, from most abstracted to most hands-on:
- VectorstoreIndexCreator: question in, answer out
- RetrievalQA: retrieval and QA over relevant splits in one chain
- load_qa_chain: you supply the documents, the chain produces the answer
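Before moving on, the context-aware splitting sketch promised in the document-transformers section above. This is a minimal example using LangChain's MarkdownHeaderTextSplitter; the sample markdown extends the slide's Notion snippet with a hypothetical subheading:

```python
from langchain.text_splitter import MarkdownHeaderTextSplitter

md = """# Introduction

Notion templates are effective ...

## Why templates?

..."""

# Split on headers so each chunk keeps its section context
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")]
)
splits = splitter.split_text(md)
# Each split carries its header path as metadata,
# e.g. {"Header 1": "Introduction", "Header 2": "Why templates?"}
```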
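And a minimal end-to-end sketch of the RAG pipeline at the RetrievalQA level of abstraction. The URL, chunk sizes, and model choice are illustrative, not prescribed by the deck:

```python
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Load: pull a web page into Documents (placeholder URL)
docs = WebBaseLoader("https://docs.langchain.com/").load()

# Split: chunk the documents for embedding
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# Store: embed the chunks into a local vectorstore
vectorstore = Chroma.from_documents(splits, OpenAIEmbeddings())

# Retrieve + generate: top-K relevant splits go into the prompt
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    retriever=vectorstore.as_retriever(),
)
qa.run("What is LangChain?")
```

VectorstoreIndexCreator would collapse the load/split/store steps into a single call, while load_qa_chain skips retrieval and takes documents directly.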
Or, compose the same pipeline from runnables (LangChain Expression Language); see the use-case documentation.

LangSmith trace for a RetrievalQA chain: the trace surfaces the prompt, the retrieved docs, the question, and the response (see the LangSmith trace).

Distilling useful ideas / tricks to improve RAG

| Idea | Example | Sources |
|---|---|---|
| Base case RAG | Top-K retrieval on embedded document chunks; return the chunks for the LLM context window | Pinecone docs (here, here, here); supported by many vectorstores and LLM frameworks |
| Condensed content embedding | Top-K retrieval on embedded document summaries, but return the full document for the LLM context window | LangChain Multi Vector Retriever; LlamaIndex node references |
| Expanded context | Top-K retrieval on embedded chunks or sentences, but return an expanded window or the full document | LangChain Parent Document Retriever; LlamaIndex Sentence Window |
| Fine-tune RAG embeddings | Fine-tune the embedding model on your data | Glean insights from fine-tuning; LangChain fine-tuning guide; LlamaIndex embeddings fine-tuning |
| 2-stage RAG | First-stage keyword search followed by second-stage semantic top-K retrieval | Cohere re-rank |
| Agents | Can benefit more complex RAG use cases | LangChain agents; LlamaIndex multi-document agents |

Condensed content embedding in practice
[Diagram: each document is summarized by an LLM; the summaries (condensed content) are embedded and stored. At query time the question is embedded, top-K retrieval runs against the summary embeddings, and the corresponding full documents are returned to the LLM. Example: documents answering "When can top-K RAG fail?"]
See the LangChain Multi Vector Retriever (a sketch follows at the end of this part).

Chat: persist conversation history
Chatbots combine (optional) retrieval over storage with memory: retrieved chunks, the question, and the chat history flow into the prompt, the LLM produces the answer, and memory persists the exchange. See the use-case documentation.
LangSmith trace for an LLMChain with a chat model + memory: the trace shows the prompt with the chat history and the response (see the LangSmith trace; this also works with retrieval). A memory sketch follows below.

Summarization: summarize a corpus of text
- If the corpus fits in the LLM context window: stuff the documents into the context and prompt the model to extract a final summary from the input list.
- If it does not fit, two strategies:
  - Embed-and-cluster: embed the documents, cluster them, sample from each cluster, and prompt the model to summarize the themes in each group of docs.
  - Map-reduce (sketched below): summarize chunks in parallel (map: "summarize the themes in this group of docs"), then distill the chunk summaries into a final summary (reduce: "extract a final summary from the input list").
Both paths end by reducing the intermediate (cluster or chunk) summaries into the final summary. See the use-case documentation.

Case study: applied to thousands of user questions asked about the LangChain docs, summarizing themes with different methods and LLMs (blog post).
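As promised above, a minimal sketch of condensed content embedding with the LangChain Multi Vector Retriever. It assumes `docs` (full Documents) and `summaries` (one LLM-written summary string per document) already exist; collection and key names are illustrative:

```python
import uuid

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema import Document
from langchain.storage import InMemoryStore
from langchain.vectorstores import Chroma

# `docs` (full Documents) and `summaries` (one string per doc)
# are assumed to be in scope.
id_key = "doc_id"
doc_ids = [str(uuid.uuid4()) for _ in docs]

retriever = MultiVectorRetriever(
    vectorstore=Chroma(collection_name="summaries",
                       embedding_function=OpenAIEmbeddings()),  # holds summary embeddings
    docstore=InMemoryStore(),  # holds the full documents
    id_key=id_key,
)

# Embed the condensed content (summaries), keyed back to the full docs
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

# Top-K search runs over the summaries, but full documents come back
retriever.get_relevant_documents("When can top-K RAG fail?")
```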
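A minimal sketch of chat with persisted history, using ConversationChain with a buffer memory (the model choice is illustrative):

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Memory persists the conversation and injects it into each prompt
chat = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    memory=ConversationBufferMemory(),
)

chat.predict(input="Hi, I'm building a LangChain app.")
chat.predict(input="What did I say I was building?")  # answered from memory
```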
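And a minimal map-reduce summarization sketch; `split_docs` is assumed to be a list of Documents too large to stuff into the context window:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI

# Map: summarize each chunk independently.
# Reduce: distill the chunk summaries into one final summary.
chain = load_summarize_chain(ChatOpenAI(temperature=0), chain_type="map_reduce")
summary = chain.run(split_docs)  # `split_docs` assumed in scope
```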
Extraction: getting structured output from LLMs

Input: "Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde."

Schema (tell the LLM the shape of output we want):

```python
schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
}
```

Function (tell the LLM the function to call): name "information_extraction", description "Extracts information from the passage.", parameters: the schema above.

LLM function-call output:
{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'}
{'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}

See the use-case documentation; a sketch follows at the end of the deck.

LangSmith trace for an LLMChain with a function call + output parsing: the trace shows the prompt, the response, the output from the function call, and the JSON parser (see the LangSmith trace).

Text-to-SQL
Question → LLM generates a SQL query → the query is executed → LLM turns the result into an answer. Optional: use a SQL agent. See the use-case documentation (a sketch follows at the end of the deck).
LangSmith trace for text-to-SQL: the prompt includes a CREATE TABLE description for each table and three example rows in a SELECT statement (see the LangSmith trace and the paper on text-to-SQL prompting).

Chains vs. agents
Start from a basic LLM API / function call and add what the task needs:
- Need access to memory? Use chat chains (e.g., ConversationalRetrievalChain).
- Need access to tools? Use agents; otherwise chains (e.g., APIChain) suffice.

Agents
There is a large agent ecosystem (we focus on ReAct as one example). Components:
- Memory: short-term (buffers) and long-term (>40 vectorstores)
- Tools: >60 tools + toolkits
- Agents: >15 agent types (plan, act, observe; autonomous and simulation agents)
- LLMs: >60 integrations

Where ReAct sits:

|  | No multi-step reasoning | Multi-step reasoning |
|---|---|---|
| No tool use | Standard prompting | Chain-of-thought* |
| Action-observation (tool use) | Say-Can | ReAct |

*Chain-of-thought conditions the LLM to show its work.

LangSmith trace for a SQL ReAct agent: the trace shows the prompt, a tool/action, the observation, the agent using a tool at the next step, and the response with (chain-of-thought) reasoning (see the LangSmith trace; a sketch follows at the end of the deck).

Case study on reliability: a web researcher that started as an agent, but a retriever proved more reliable.
Pipeline: the LLM expands a research question into queries 1..N; HTML pages are loaded (document loading), transformed, and embedded into vector storage; retrieved chunks plus the question go to the LLM for the answer. See the blog post and the hosted Streamlit app.

Tooling: LangSmith

LangSmith case study: fine-tuning for extraction
Recall the central concept: fine-tuning (cramming before a test) is bad for factual recall but good for form (e.g., extraction); prompting via retrieval (open-book exam) is good for facts (e.g., QA). (Sources: OpenAI; Fine-tuning and hallucinations; Anyscale)

Task: fine-tuning for extraction of knowledge-graph triples (Streamlit app).
Workflow: collect app generations into a dataset → data cleaning → add LLM-generated synthetic data → fine-tune → evaluate on train/test splits. See the blog.

LangSmith evaluation: fine-tuning vs. few-shot prompting for triple extraction (CoLab notebooks for LLaMA fine-tuning and GPT-3.5 fine-tuning):
- Base LLaMA-7b-chat gives informal answers with hallucinations.
- Fine-tuned LLaMA-7b-chat is much closer to the reference outputs.

Case-study lessons
- LangSmith can help address pain points in the fine-tuning workflow: data collection, evaluation, and inspection of results.
- RAG or few-shot prompting should be considered first: few-shot-prompted GPT-4 performed best overall.
- Fine-tuning small open-source models can outperform much larger generalist models: fine-tuned LLaMA2-chat-7B beat GPT-3.5-turbo.
(Blog)

Questions
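Appendix: minimal code sketches for the extraction, text-to-SQL, and agent examples above. First, extraction via create_extraction_chain, which wraps the slide's schema in an "information_extraction" function call; the model choice is an assumption:

```python
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI

schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
}

# The chain attaches the schema as an "information_extraction"
# function the LLM is instructed to call.
chain = create_extraction_chain(
    schema, ChatOpenAI(model="gpt-3.5-turbo-0613", temperature=0)
)
chain.run(
    "Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps "
    "higher than him. Claudia is a brunette and Alex is blonde."
)
# -> [{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},
#     {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]
```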
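Next, text-to-SQL with create_sql_query_chain; the SQLite URI is illustrative. Under the hood, the prompt contains the CREATE TABLE descriptions and example rows noted in the trace above:

```python
from langchain.chains import create_sql_query_chain
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase

# Assumes a local SQLite database (illustrative URI)
db = SQLDatabase.from_uri("sqlite:///Chinook.db")

# Builds a prompt with table schemas + example rows, then writes SQL
chain = create_sql_query_chain(ChatOpenAI(temperature=0), db)
sql = chain.invoke({"question": "How many employees are there?"})
db.run(sql)  # execute the generated query
```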
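Finally, a ReAct-style SQL agent via create_sql_agent, corresponding to the SQL agent trace above; the database URI is again illustrative:

```python
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")  # illustrative URI
llm = ChatOpenAI(temperature=0)

# The agent reasons step by step (ReAct), picking SQL tools
# (schema inspection, query checking, query execution) as actions
# and feeding each observation into the next step.
agent = create_sql_agent(
    llm=llm,
    toolkit=SQLDatabaseToolkit(db=db, llm=llm),
    verbose=True,
)
agent.run("Which customer spent the most in total?")
```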