AI Engineer Prep

Session 5: LangChain, LangGraph & Agent Frameworks

You're about to learn the frameworks that power most production AI systems—and when they actually help versus when they're just ceremony. Here's the thing: every Senior AI Engineer interview will probe whether you understand why you'd pick LangGraph over CrewAI, or LCEL over raw API calls. Memorizing docs isn't enough. You need to articulate trade-offs from real experience.

This session gives you exactly that: technical depth on LangChain, LangGraph, and the major agent frameworks, plus the opinionated takes that separate "I've used it" from "I know when and why to use it." We'll cover composition patterns, stateful agent loops, tool calling, checkpointing, and the messy reality of framework choice. No fluff. No "this session provides a comprehensive overview." Just the stuff that matters when you're building—or being grilled about—production AI systems.


1. LangChain Core & LCEL

Interview Insight: Interviewers want to know you understand the abstraction, not just the syntax. They're testing: Can you explain what LCEL gives you that a for-loop doesn't? Can you justify when to use LangChain vs. hitting the API directly?

Think of LangChain like a universal remote for LLMs. You've got OpenAI, Anthropic, Google, a dozen others—each with different quirks. LangChain gives you a single interface: Chat Models, Prompts, Output Parsers. Swap providers without rewriting your orchestration. The real magic is LCEL (LangChain Expression Language): it's like Unix pipes for LLM workflows. Instead of result1 = step1(x); result2 = step2(result1); result3 = step3(result2), you write prompt | model | parser. Declarative, composable, streamable out of the box.

Core abstractions: Chat Models (ChatOpenAI, ChatAnthropic) wrap provider APIs behind a common interface. Prompts (ChatPromptTemplate) structure input with placeholders—multi-message, system/human/ai. Output Parsers (StrOutputParser, JsonOutputParser, PydanticOutputParser) turn raw LLM output into structured data. StrOutputParser extracts the string. JsonOutputParser parses JSON and can inject format instructions. PydanticOutputParser enforces a schema and tells the LLM exactly what structure to produce.

LCEL in practice: Every LCEL component is a Runnable. invoke(input) for single execution, stream(input) for incremental output, batch(inputs) for parallel processing. The pipe | connects them: output of one becomes input of the next. RunnablePassthrough forwards input unchanged or merges with extra keys. RunnableLambda wraps a Python function. RunnableParallel runs multiple Runnables on the same input and merges outputs—handy when you need several retrievers or tools in parallel.
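To see why the pipe composes, here's a toy model of the Runnable protocol. This is deliberately not LangChain's internals, just the composition semantics (invoke, batch, and |) in a few lines:

```python
# Toy model of LCEL-style composition. NOT LangChain's implementation --
# just the shape of the Runnable protocol: invoke, batch, and | chaining.
class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def batch(self, xs):
        return [self.invoke(x) for x in xs]

    def __or__(self, other):
        # a | b: a new Runnable that feeds a's output into b
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda d: f"Q: {d['input']}")
fake_model = Runnable(str.upper)   # stands in for the LLM call
parser = Runnable(str.strip)

chain = prompt | fake_model | parser
result = chain.invoke({"input": "hi"})                   # "Q: HI"
batched = chain.batch([{"input": "a"}, {"input": "b"}])  # ["Q: A", "Q: B"]
```

The real library adds streaming, retries, and tracing on top, but the composition model is exactly this: every component speaks the same interface, so any output can feed any input.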

On the data side: Document Loaders (files, URLs), Text Splitters (chunking with overlap), Retrievers, Vector Stores. Load → split → embed → store → retrieve. That's your RAG pipeline. LangChain abstracts the retrieval interface so you can swap Chroma for Pinecone without touching chain logic.
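The split step is the only subtle one. Here's a minimal sketch of fixed-size chunking with overlap; real splitters like RecursiveCharacterTextSplitter also respect separators, but this shows why overlap preserves context at chunk boundaries:

```python
# Minimal sketch of the "split" step: fixed-size chunks with overlap.
# Overlap means a sentence cut at a boundary still appears whole in
# at least one chunk, which keeps retrieval from losing context.
def split_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    chunks = []
    step = chunk_size - overlap          # how far the window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                        # last window reached the end
    return chunks

chunks = split_text("abcdefghij", chunk_size=4, overlap=2)
# ['abcd', 'cdef', 'efgh', 'ghij'] -- each chunk shares 2 chars with the next
```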

Why This Matters in Production: In an enterprise platform like yours, you need provider abstraction for cost control, fallbacks, and A/B testing. LCEL's batch and stream map directly to high-throughput and real-time UX. RunnableParallel lets you fan out to multiple tools or retrievers without blocking—critical when latency matters.

Aha Moment: LCEL chains are DAGs—directed acyclic graphs. No cycles. That's why agent loops (agent → tools → agent) aren't native. When you need loops, you step up to LangGraph.

flowchart LR
    subgraph lcelFlow["LCEL Pipeline"]
        prompt[ChatPromptTemplate] --> model[ChatOpenAI]
        model --> parser[StrOutputParser]
    end
    input["input dict"] --> prompt
    parser --> output["structured output"]

2. LangGraph: When Your Agent Needs to Loop

Interview Insight: They're checking if you know when LangGraph is the right tool. "We use LangGraph" without justification is a red flag. "We use LangGraph because we needed cycles, checkpointing, and human-in-the-loop" is the answer they want.

Think of LangGraph like a subway map for agent workflows. LangChain chains are linear: A → B → C. LangGraph adds cycles: the agent can go to tools, get results, and loop back to itself. That's the agent loop. Add state that persists across nodes, checkpointing so you can resume after a crash, and interrupts for human approval—you've got a production-grade agent runtime.

Core concepts: You define a StateGraph with a typed state schema (TypedDict or Pydantic). State is a shared dict every node reads and writes. Each node receives state, does work, and returns a state update—a partial dict that gets merged in. Reducers control the merge: default replaces; for messages, you use an append reducer so new messages add to the history.
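A toy illustration of that merge semantics (not the library's code): a node returns a partial dict, and a per-key reducer decides replace versus append. The apply_update and add_messages names here are illustrative, not LangGraph APIs:

```python
# Toy sketch of LangGraph-style state merging -- illustrates reducer
# semantics only, not the library's implementation.
def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value          # default reducer: replace
    return merged

def add_messages(old: list, new: list) -> list:
    return old + new                     # append reducer for message history

state = {"messages": ["hi"], "turn": 1}
update = {"messages": ["hello!"], "turn": 2}   # a node's partial return value
state = apply_update(state, update, reducers={"messages": add_messages})
# {"messages": ["hi", "hello!"], "turn": 2} -- messages appended, turn replaced
```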

Nodes are functions: def node(state: State) -> dict. Pure with respect to state—receive, compute, return. Edges connect them. Normal edges: add_edge("agent", "tools") means always go to tools after agent. Conditional edges: a router inspects state and returns the next node name. add_conditional_edges("agent", should_continue, {"tools": "tools", "finish": END})—if the agent has tool_calls, go to tools; else go to END. That's how you branch.

START and END are special. add_edge(START, "agent") is entry. add_edge("tools", "agent") creates the cycle. graph.compile(checkpointer=...) produces the runnable. Invoke with initial state and config = {"configurable": {"thread_id": "user-123"}} for persistence.

Why This Matters in Production: Your email booking agent loops—extract intent, maybe call RAG, maybe call a tool, maybe ask for clarification. That's cycles. Your platform needs resumable conversations (checkpointing) and human approval before sensitive actions (interrupts). LangGraph is built for exactly that.

Aha Moment: LangGraph state is immutable from the node's perspective. You never mutate in place. You return updates. That makes the flow deterministic and debuggable—you can log state at every step or replay from a checkpoint.

flowchart TB
    start([START]) --> agentNode[Agent Node]
    agentNode --> check{should_continue}
    check -->|"tool_calls present"| toolNode[ToolNode]
    check -->|"no tool_calls"| finish([END])
    toolNode --> agentNode

3. Tool Calling: From @tool to the Agent Loop

Interview Insight: They want a step-by-step mental model. "The model returns AIMessage with tool_calls, ToolNode executes them, results go back to the agent" — that's the answer. Bonus points for mentioning parallel tool execution and how tool descriptions drive model behavior.

Define tools with @tool. The docstring is the description the LLM sees—be clear and specific. Parameters come from the function signature; Pydantic validates. Bind tools to the model: model.bind_tools(tools). The model can now return AIMessage with a tool_calls field—name, args, tool_call_id. ToolNode takes a list of tools; given state with messages, it finds the last AIMessage with tool_calls, runs each tool, and returns ToolMessage objects. ToolNode runs tools in parallel when possible.

The loop in full: (1) User sends message. (2) Agent node: model with tools bound gets messages, returns AIMessage with or without tool_calls. (3) Conditional edge: if tool_calls, route to tools; else END. (4) Tools node: ToolNode executes, appends ToolMessages. (5) Edge back to agent. (6) Agent runs again with full history—model sees tool results, decides to call more tools or give final answer. Repeat until no tool_calls.

Why This Matters in Production: Your platform exposes tools (MCP, APIs, retrievers). The agent needs to discover, call, and chain them. Tool descriptions drive accuracy—vague descriptions mean wrong tool choice. Parallel execution in ToolNode cuts latency when multiple tools are needed.

Aha Moment: The model decides when to call tools. You don't hardcode "if user asks X, call tool Y." You describe tools well and let the model reason. That's the agentic shift—from deterministic pipelines to reasoning + tool use.

sequenceDiagram
    participant User
    participant Agent as Agent Node
    participant Router as should_continue
    participant Tools as ToolNode
    User->>Agent: HumanMessage
    Agent->>Agent: LLM with tools bound
    Agent->>Router: AIMessage with tool_calls
    Router->>Tools: route to tools
    Tools->>Tools: Execute get_weather etc
    Tools->>Agent: ToolMessage results
    Agent->>Agent: LLM reasons on results
    Agent->>Router: AIMessage no tool_calls
    Router->>User: route to END

4. Checkpointing: Making Agents Survive Restarts

Interview Insight: Persistence is table stakes for production. They'll ask how you handle "user comes back tomorrow" or "we need human approval before running this tool." Checkpointing + interrupts is the answer. Know MemorySaver vs PostgresSaver and when each is right.

Without checkpointing, when your process dies or the user disconnects, the agent forgets everything. Checkpointing saves state after every node. You can resume days later, implement human-in-the-loop (pause, show UI, resume with approval), and debug by replaying from earlier states.

Implementations: MemorySaver—in-memory, dev only, lost on restart. SqliteSaver—local file, small deployments. PostgresSaver—production: concurrent access, scales, multi-instance. Any server can resume any thread.

How it works: Read-execute-write. Before each node, load the latest checkpoint for the thread_id, initialize state from it, run the node, write a new checkpoint. Checkpoints are versioned—you keep history, not an overwrite. Thread management: pass thread_id in config. Each thread is one conversation. Resuming means invoking again with the same thread_id.
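That read-execute-write loop can be sketched in a few lines. This is a toy illustration of the mechanics described above, not LangGraph's actual checkpointer, and run_node is a hypothetical helper:

```python
# Toy sketch of read-execute-write with per-thread, versioned checkpoints.
checkpoints: dict[str, list[dict]] = {}   # thread_id -> checkpoint history

def run_node(thread_id: str, node, initial: dict) -> dict:
    history = checkpoints.setdefault(thread_id, [])
    state = history[-1] if history else initial   # read latest checkpoint
    new_state = {**state, **node(state)}          # execute node, merge update
    history.append(new_state)                     # write: append, never overwrite
    return new_state

run_node("user-123", lambda s: {"step": s.get("step", 0) + 1}, {})
run_node("user-123", lambda s: {"step": s["step"] + 1}, {})
# checkpoints["user-123"] now holds both versions: the process can die
# between calls and resume, and old versions enable time-travel debugging.
```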

Human-in-the-loop: Use interrupt_before or interrupt_after when compiling. Graph runs until that node, saves state, returns control with interrupt payload. Your app shows a UI, collects human input, then graph.invoke(Command(resume=...), config=config). Graph loads checkpoint, applies resume, continues. Interrupts require a checkpointer—no persistence, no resume.

flowchart TB
    generate[generate node] --> interrupt["interrupt_before human_review"]
    interrupt --> pause["Save state, return to caller"]
    pause --> ui["Show UI, human approves"]
    ui --> resume["Command resume"]
    resume --> humanReview[human_review node]
    humanReview --> finish([END])

Why This Matters in Production: Your email booking agent has human-in-the-loop—someone approves before a booking is created. Your platform needs multi-tenant conversation isolation (thread_id per user/session). PostgresSaver is the default for anything beyond a prototype.

Aha Moment: Checkpoints are append-only. You keep history. That enables time-travel debugging: "replay from step 3" or "what did state look like when we interrupted?"—invaluable for debugging production agents.


5. Subgraphs: Composing Agent Workflows

Interview Insight: Subgraphs matter when you have modular agents—research vs coding vs support. They want to know you can compose graphs, handle state mapping, and understand the checkpointing limitation with multiple subgraphs in one node.

A subgraph is a graph used as a node inside another graph. Nest them: parent has a node that invokes a compiled LangGraph. Each subgraph has its own state schema. The parent node must map parent state → subgraph input, call subgraph.invoke(...), then map output → parent state. Or design shared keys—subgraph reads/writes same keys as parent—simpler when schemas align.
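The parent-node mapping looks like this in outline. The subgraph is represented here as a plain callable (a compiled LangGraph is invokable the same way); research_subgraph and research_node are hypothetical names for illustration:

```python
# Sketch of a parent node that wraps a subgraph with state mapping.
def research_subgraph(sub_input: dict) -> dict:
    # stand-in for compiled_graph.invoke(sub_input)
    return {"findings": f"results for {sub_input['topic']}"}

def research_node(parent_state: dict) -> dict:
    sub_input = {"topic": parent_state["query"]}   # parent state -> subgraph input
    sub_output = research_subgraph(sub_input)      # invoke the subgraph
    return {"research": sub_output["findings"]}    # subgraph output -> parent update

update = research_node({"query": "LCEL vs raw API", "messages": []})
# {"research": "results for LCEL vs raw API"}
```

With shared keys you skip the two mapping lines entirely; the explicit mapping version is what you need when team-owned subgraphs have their own schemas.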

Use cases: (1) Team distribution—different teams own different subgraphs (research agent, code agent). Parent routes. (2) Reuse—shared "retrieve and summarize" subgraph in multiple parents. (3) Multi-agent—each agent is a subgraph; parent coordinates handoffs. (4) Streaming—stream(subgraphs=True) emits updates from nested nodes with namespacing.

Limitation: As of 2025, LangGraph doesn't support multiple subgraphs in one node with checkpointing enabled (namespacing issue). Workarounds: Send API or disable checkpointing for that subgraph.

Why This Matters in Production: Your platform likely has multiple agent types—booking, support, internal tools. Subgraphs let you modularize: each domain is its own graph, with a parent orchestrator. Clean separation, independent evolution.

Aha Moment: Subgraphs are just Runnables. graph.compile() returns something you can invoke or stream. Composing graphs is the same as composing any LCEL component—it's turtles all the way down.


6. LlamaIndex: When Your Complexity Is in the Data

Interview Insight: LangChain vs LlamaIndex is a classic question. The right answer: "Different focus. LangChain for orchestration, LlamaIndex for data-heavy RAG. Many teams use both."

LlamaIndex is data-first. It specializes in connecting LLMs to data: document loaders, node extraction, indexing, retrieval. Documents are raw units (PDF, web page). Nodes are chunks or structured pieces (text, table rows, entities). Indexes organize nodes: VectorStoreIndex (embeddings), SummaryIndex (sequential + summarization), KnowledgeGraphIndex (entities + relationships). Query Engines take a query and return an answer using the index and optionally an LLM. Retrievers fetch relevant nodes.

LlamaIndex excels at complex ingestion—PDFs with tables, hierarchical docs, multi-modal. More index types, robust node parsers. For "get the right data into the right shape," LlamaIndex is often faster to production. LangChain is stronger for agents, chains, tool orchestration. Many teams use both: LlamaIndex for ingestion/retrieval, LangChain/LangGraph for the agent that uses a retriever as a tool. Wrap a LlamaIndex query engine in a LangChain tool—best of both.
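The "wrap a query engine as a tool" pattern is simple in outline. Here PolicyQueryEngine is a hypothetical stand-in for a LlamaIndex query engine (so the sketch runs without either library); the point is the shape of the boundary between the data layer and the agent layer:

```python
# Sketch: expose a data-layer query engine to the agent layer as a
# plain tool function. PolicyQueryEngine is a hypothetical stand-in
# for a LlamaIndex query engine (index.as_query_engine()).
class PolicyQueryEngine:
    def query(self, q: str) -> str:
        return f"policy answer for: {q}"

engine = PolicyQueryEngine()

def search_policies(query: str) -> str:
    """Search booking policy documents. Use for policy questions."""
    # In LangChain you'd decorate this with @tool and bind_tools it;
    # the agent never touches index internals, only this interface.
    return str(engine.query(query))

answer = search_policies("premium refund policy")
```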

Why This Matters in Production: Your RAG for the booking agent—policy docs, rate tables—might use LlamaIndex for parsing complex PDFs or building a knowledge graph. The agent layer stays in LangGraph. Separation of concerns.

Aha Moment: The choice isn't either/or. It's "where does your complexity live?" Data pipeline complexity → LlamaIndex. Agent logic complexity → LangGraph.

flowchart TB
    subgraph layers["Application Layers"]
        ui[User Interface]
        orch[Orchestration]
        data[Data Retrieval]
        llm[LLM Providers]
    end
    subgraph frameworks["Framework Strength"]
        lg[LangGraph]
        lc[LangChain]
        li[LlamaIndex]
    end
    lg -.->|"cycles state checkpointing"| orch
    lc -.->|"chains tools RAG"| orch
    lc -.->|"moderate"| data
    li -.->|"indexes ingestion"| data

7. CrewAI, Agno, and the Framework Landscape

Interview Insight: They're probing breadth. "Have you looked at alternatives?" CrewAI and Agno come up. Know the mental model: role-based vs graph-based, rapid prototyping vs production control.

CrewAI is role-based multi-agent. Agents have role, goal, backstory—injected into prompts for consistent behavior. Tasks have description, expected_output, and an assigned agent. Crew has agents and tasks. crew.kickoff(inputs) runs it. Process: sequential (Task 1 → 2 → 3) or hierarchical (manager delegates to workers). Simpler than LangGraph for fixed pipelines. Trade-off: less control over state, cycles, and persistence. Use when you want Researcher → Writer → Editor with minimal code.

Agno is built for rapid, multi-modal prototyping, with built-in memory, knowledge, and tools. Agent teams in "coordinate" mode—agents collaborate and hand off. Supports text, images, audio, video. Good for demos and MVPs. Lighter than LangGraph. For complex control flow or production persistence, LangGraph wins.

Why This Matters in Production: Your platform might support multiple frameworks—teams pick based on use case. CrewAI for content pipelines; LangGraph for agents that need cycles and human-in-the-loop. Abstraction over framework choice is a design challenge.

Aha Moment: Frameworks exist on a spectrum: raw API → LangChain → LangGraph (more control, more complexity). CrewAI and Agno sit between—faster to stand up, less flexible. Match the framework to the problem's complexity.


8. Framework Decision Matrix

Interview Insight: "Which framework would you use for X?"—they want a decision framework, not a memorized answer. Your reasoning matters more than the specific choice.

Use case → framework, and why:

Agent loops, cycles, resumability → LangGraph. State, checkpointing, interrupts. Built for it.
Role-based delegation, fixed pipeline → CrewAI. Define agents and tasks; the framework orchestrates.
Complex data ingestion, multiple indexes → LlamaIndex. Data-first. Node parsers, graph indexes.
Rapid prototype, multi-modal → Agno. Batteries included, lightweight.
Simple: one LLM call, basic RAG → Raw API. No framework overhead. Understand primitives first.

Anthropic's advice: Start without frameworks. One API call, one retrieval step—do it raw. When you add tools, memory, multi-step reasoning, persistence, then frameworks pay off. Frameworks add abstraction. For simple flows, that obscures what's happening. For complex flows, it saves you from reimplementing the wheel.
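Concretely, "no framework" is one request payload and one response field. This sketch builds an OpenAI-style chat payload without making a network call; with the official SDK it would be client.chat.completions.create(**payload) and the answer is response.choices[0].message.content:

```python
# What "raw API" looks like: the entire request is one inspectable dict.
# No chain, no parser, no abstraction between you and the provider.
import json

payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0,
}
body = json.dumps(payload)   # exactly what goes over the wire
# Two lines of SDK code on top of this, and you're done. Reach for LCEL
# only when you need composition, streaming, or provider abstraction.
```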

Why This Matters in Production: At Maersk, you've seen both: agents that needed full LangGraph (email booking with loops, human approval) and simpler flows that might have stayed closer to raw APIs. The platform abstracts LLM access—framework choice is per-agent.

Aha Moment: The worst choice is using a heavyweight framework for a trivial problem. Start simple. Add abstraction when complexity justifies it.


Code Examples

LCEL: prompt | model | parser

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, PydanticOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
from typing import List
 
# Basic chain — declarative composition
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Be concise."),
    ("human", "{input}"),
])
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = StrOutputParser()
chain = prompt | model | parser
response = chain.invoke({"input": "What is the capital of France?"})
# "Paris"
 
# Structured output with Pydantic — schema enforcement
class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    steps: List[str]
 
json_parser = PydanticOutputParser(pydantic_object=Recipe)
json_prompt = ChatPromptTemplate.from_messages([
    ("system", "Output valid JSON. {format_instructions}"),
    ("human", "Give me a simple recipe for {dish}"),
])
json_chain = (
    json_prompt.partial(format_instructions=json_parser.get_format_instructions())
    | model
    | json_parser
)
recipe = json_chain.invoke({"dish": "pasta"})
 
# RunnableParallel — fan out, merge results
from langchain_core.runnables import RunnableParallel
parallel_chain = RunnableParallel(
    summary=lambda x: chain.invoke({"input": f"Summarize: {x['input']}"}),
    keywords=lambda x: chain.invoke({"input": f"Extract keywords: {x['input']}"}),
)
result = parallel_chain.invoke({"input": "LangChain is a framework for LLM apps."})
# {"summary": "...", "keywords": "..."}

LangGraph Agent with ToolNode

from typing import Annotated, TypedDict, Literal
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from langgraph.graph.message import add_messages
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
 
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city. Use for weather-related queries."""
    return f"The weather in {city} is 72°F and sunny."
 
@tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base. Use for company policy or product questions."""
    return f"Search results for '{query}': [Relevant excerpts...]"
 
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
 
def create_agent_graph():
    tools = [get_weather, search_knowledge_base]
    model = ChatOpenAI(model="gpt-4o-mini", temperature=0).bind_tools(tools)
    tool_node = ToolNode(tools)
 
    def agent_node(state: AgentState) -> dict:
        response = model.invoke(state["messages"])
        return {"messages": [response]}
 
    def should_continue(state: AgentState) -> Literal["tools", "__end__"]:
        last = state["messages"][-1]
        if hasattr(last, "tool_calls") and last.tool_calls:
            return "tools"
        return "__end__"
 
    builder = StateGraph(AgentState)
    builder.add_node("agent", agent_node)
    builder.add_node("tools", tool_node)
    builder.add_edge(START, "agent")
    builder.add_conditional_edges("agent", should_continue, {"tools": "tools", "__end__": END})
    builder.add_edge("tools", "agent")
    return builder.compile()
 
agent = create_agent_graph()
result = agent.invoke({"messages": [HumanMessage(content="What's the weather in San Francisco?")]})

Human-in-the-Loop with interrupt_before

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
 
class ApprovalState(TypedDict):
    query: str
    draft_response: str
    approved: bool
    final_response: str
 
def generate_draft(state: ApprovalState) -> dict:
    draft = f"Draft for: {state['query']}. [Sensitive content needing approval.]"
    return {"draft_response": draft}
 
def human_review(state: ApprovalState) -> dict:
    # In production: pause here, show UI, resume with Command(resume=...)
    return {"approved": True, "final_response": state["draft_response"]}
 
def route_after_review(state: ApprovalState) -> str:
    return "publish" if state.get("approved") else "reject"
 
builder = StateGraph(ApprovalState)
builder.add_node("generate", generate_draft)
builder.add_node("human_review", human_review)
builder.add_node("publish", lambda s: s)
builder.add_edge(START, "generate")
builder.add_edge("generate", "human_review")
builder.add_conditional_edges("human_review", route_after_review, {"publish": "publish", "reject": END})
builder.add_edge("publish", END)
 
memory = MemorySaver()
graph = builder.compile(checkpointer=memory, interrupt_before=["human_review"])
config = {"configurable": {"thread_id": "approval-1"}}
result = graph.invoke({"query": "Refund policy for premium customers?"}, config=config)
# To resume after approval:
#   from langgraph.types import Command
#   graph.invoke(Command(resume={"approved": True}), config=config)

CrewAI: Sequential Crew

from crewai import Agent, Task, Crew, Process
 
researcher = Agent(
    role="Senior Researcher",
    goal="Find accurate, relevant information on the topic",
    backstory="Expert researcher with 20 years of experience.",
    verbose=True,
)
writer = Agent(
    role="Technical Writer",
    goal="Write clear content based on research",
    backstory="Skilled writer who translates complex ideas.",
    verbose=True,
)
 
research_task = Task(
    description="Research {topic} and summarize key findings.",
    expected_output="Summary of key findings with sources.",
    agent=researcher,
)
writing_task = Task(
    description="Write a 300-word article based on the research findings.",
    expected_output="Well-structured article.",
    agent=writer,
    context=[research_task],  # research output is injected as context automatically
)
 
crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task], process=Process.sequential)
result = crew.kickoff(inputs={"topic": "AI impact on software development"})

Conversational Interview Q&A

"Why would you pick LangGraph over CrewAI for agent orchestration?"

Weak answer: "LangGraph is more powerful. CrewAI is simpler but LangGraph has more features."
Vague. Doesn't tie to concrete requirements.

Strong answer: "It depends on the control flow. For our email booking agent at Maersk, we needed cycles—the agent extracts intent, might call RAG for policy lookup, might call a booking tool, and could loop back for clarification. LangGraph's StateGraph models that natively with conditional edges. We also needed human-in-the-loop: pause before executing a booking, show the draft for approval, resume. LangGraph has interrupt_before and checkpointing built in. CrewAI is great for fixed pipelines—Researcher → Writer → Editor—where you don't need cycles or custom state. For our use case, LangGraph was the right abstraction."


"Walk through what happens when the LLM decides to call a tool in LangGraph."

Weak answer: "The model returns tool calls, then we execute them and pass the results back."
Missing the state flow, conditional routing, and loop.

Strong answer: "The agent node invokes the model with tools bound via bind_tools. The model returns an AIMessage with tool_calls—name and args. The agent node returns {'messages': [that AIMessage]}. The conditional edge should_continue inspects the last message: if tool_calls is non-empty, we route to the ToolNode. ToolNode finds the tool_calls, executes each tool—in parallel when possible—and returns ToolMessage objects with results. We add an edge from tools back to the agent. The agent runs again with the full message history including tool results. The model can call more tools or produce a final answer. The loop continues until the model responds without tool_calls, at which point should_continue routes to END."


"How does LangGraph state differ from what you pass through a LangChain chain?"

Weak answer: "LangGraph has state, chains don't."
True but underspecified.

Strong answer: "LangGraph state is a first-class shared object. Every node receives the full state, returns a partial update, and reducers merge it—e.g. add_messages for message history so we append, not replace. The state is immutable from the node's perspective: you return updates, you don't mutate. LangChain chains pass output linearly: A's output → B's input. There's no shared accumulator. If you need message history in a chain, you structure your I/O to carry it through—RunnablePassthrough, dict merging—but it's not a first-class concept. LangGraph's state + reducers + conditional edges are what enable the agent loop. In a chain, you'd have to wrap the loop in a custom Runnable, which isn't idiomatic."


"When would you use raw API calls instead of LangChain?"

Weak answer: "When you want more control or less dependencies."
Correct but generic.

Strong answer: "For simple flows—one LLM call, or a basic RAG with a single retrieval step—raw API calls are clearer and easier to debug. You see exactly what goes in and out. Frameworks add abstraction; for trivial cases that obscures what's happening. I'd start raw, then add LangChain when I need composition—parallel branches, streaming, batch—or provider abstraction. Add LangGraph when I need cycles, state, or checkpointing. Our platform at Maersk abstracts LLM access centrally—guardrails, evals, MLflow—so individual agents can use frameworks or raw calls. The key is: don't reach for a framework before the complexity justifies it."


"How do you implement human-in-the-loop with LangGraph?"

Weak answer: "You use interrupts and checkpointing."
Right idea, no detail.

Strong answer: "You need a checkpointer—MemorySaver for dev, PostgresSaver for production—so state can be saved and resumed. Use interrupt_before=["tools"] or interrupt_after=["generate"] when compiling. When execution hits that point, the graph saves state and returns to the caller with an interrupt payload—e.g. the proposed tool calls or draft response. Your application shows a UI, the human approves or edits, then you invoke again with Command(resume={'approved': True}) and the same thread_id. The graph loads the checkpoint, applies the resume value, and continues. For our booking agent, we interrupt before the final booking creation—human sees the draft, approves, we resume. Dynamic interrupts are possible too: call interrupt(value) from inside a node when you need conditional pauses, e.g. only for high-risk tools."


"Compare LangChain and LlamaIndex. When do you use each?"

Weak answer: "LangChain is for agents, LlamaIndex is for RAG."
Oversimplified. LangChain does RAG too.

Strong answer: "LangChain is orchestration-first—chains, agents, tools, memory. It does RAG, but the data pipeline is one of many capabilities. LlamaIndex is data-first: document ingestion, node extraction, multiple index types—vector, summary, knowledge graph—and robust parsing for complex docs. When the hard part is 'get the right data into the right shape'—PDFs with tables, hierarchical structure—LlamaIndex often gets you to production faster. When the hard part is agent logic—tool orchestration, cycles, state—LangChain/LangGraph wins. Many teams use both: LlamaIndex for ingestion and retrieval, LangGraph for the agent that uses a retriever as a tool. Our RAG for the booking agent could use LlamaIndex for policy doc parsing; the agent layer stays in LangGraph."


From Your Experience

Your email booking agent loops: extract → RAG/tools → maybe clarify → create booking. How would you map that to a LangGraph? What nodes would you define? Where would you place interrupt_before for human approval, and how would that integrate with your existing platform's guardrails and evaluations?

Your platform centralizes LLM access, guardrails, and observability. If individual agents can use LangGraph, CrewAI, or custom code, how do you abstract over framework differences? What's your interface for tools, state persistence, and tracing? How does MLflow (or equivalent) receive traces from LangGraph runs?

You've built an AI agent platform with MCP tools, evaluations, and prompt management. How would a LangGraph agent discover and call platform-registered tools? Would you wrap them as LangChain tools, or is there a higher-level abstraction? How do evaluations run against a stateful, multi-step agent—by checkpoint replay, or by treating each run as a single trace?


Quick Fire Round

Q: What does LCEL's | operator do?
A: Composes Runnables—output of one becomes input of the next. Like Unix pipes.

Q: Why can't you express agent loops natively in LCEL?
A: LCEL chains are DAGs. No cycles. The pipe flows forward only.

Q: What's a reducer in LangGraph?
A: Function that merges state updates. Default: replace. For messages: append (e.g. add_messages).

Q: What does model.bind_tools(tools) do?
A: Creates a model that knows tool defs. It can return AIMessage with tool_calls when it decides to use them.

Q: What's the role of ToolNode?
A: Finds tool_calls in the last AIMessage, executes each tool, returns ToolMessages. Runs tools in parallel when possible.

Q: MemorySaver vs PostgresSaver?
A: MemorySaver = in-memory, dev only. PostgresSaver = persistent, concurrent, production.

Q: What's thread_id for?
A: Identifies a conversation. Each thread has its own checkpoint history. Resume = same thread_id.

Q: What do interrupt_before and interrupt_after do?
A: Pause execution at that node. Save state, return to caller. Resume with Command(resume=...). Requires checkpointer.

Q: When would you use CrewAI over LangGraph?
A: Fixed role-based pipeline (Researcher → Writer). No cycles, minimal custom state. Faster setup.

Q: LlamaIndex's strength vs LangChain?
A: Data-first: complex ingestion, multiple index types, node parsers. LangChain stronger for orchestration.

Q: What's a subgraph?
A: A graph used as a node in another graph. Enables modular agents (research subgraph, code subgraph).

Q: Why does Anthropic recommend starting without frameworks?
A: Understand primitives first. Frameworks add abstraction—helpful at scale, obscuring for simple flows. Add when complexity justifies.

Q: What's RunnableParallel?
A: Runs multiple Runnables on same input in parallel, merges outputs into a dict.


Key Takeaways (Cheat Sheet)

LangChain: Provider-agnostic LLM framework. Chat models, prompts, parsers. LCEL for composition.
LCEL: Pipe | composes Runnables. Passthrough, Lambda, Parallel. DAGs—no cycles.
LangGraph: Stateful, cyclic agent workflows. StateGraph, nodes, edges, conditional routing.
State: TypedDict/Pydantic. Nodes return updates; reducers merge. Immutable from node POV.
Tool calling: @tool, model.bind_tools(), ToolNode. Agent → should_continue → tools → agent loop.
Checkpointing: MemorySaver / SqliteSaver / PostgresSaver. thread_id in config. Resume, time-travel.
Interrupts: interrupt_before, interrupt_after, or interrupt() in node. Resume with Command(resume=...).
Subgraphs: Graph as node. Own state or shared keys. Checkpointing limits multi-subgraph in one node.
LlamaIndex: Data-first. Documents, nodes, indexes (Vector, Summary, KG). Strong for RAG data.
CrewAI: Role-based. Agent (role, goal, backstory), Task, Crew. Sequential or hierarchical.
Agno: Multi-modal, rapid prototyping. Built-in memory, knowledge. Good for demos.
When LangGraph: Cycles, state, persistence, human-in-the-loop, custom control flow.
When CrewAI: Fixed pipeline, role delegation, minimal setup.
When LlamaIndex: Complex ingestion, multiple index types, data-heavy RAG.
When raw API: Simple flows. Start here. Add frameworks when complexity grows.

Further Reading