AI Engineer Prep

Session 7: Agent Memory, State & Planning

Ever had a conversation with a chatbot that forgot your name 30 seconds after you told it? Or worse—you told it your booking reference, it confirmed it, then asked for it again two messages later? That's what happens without memory. Stateless agents are like goldfish with API keys: useful for one-off queries, useless for anything that resembles a real relationship with a user.

Here's the uncomfortable truth: your LLM has a context window, and it fills up fast. A support conversation with tool calls, RAG chunks, and verbose responses can hit 50K tokens in no time. When that happens, something has to go—usually the oldest stuff, which is often the most important (like "my order is #12345"). Memory systems exist to solve this. They're not a nice-to-have; they're the difference between a demo that impresses in a 5-minute conversation and a production system that users actually trust.

This session covers the real stuff: why memory architecture matters, the different flavors of memory (and when each one bites you), how LangGraph keeps state from falling through the cracks, and the planning patterns that separate "works in a notebook" from "works when the user asks something weird." No hand-waving. No academic fluff. Just what you need to architect agents that remember.


1. Why Memory Matters for Agents

Interview Insight: Interviewers want to know you understand the why, not just the how. "Because context windows are limited" is a start, but they're listening for: user expectations of continuity, cost of lost context in real workflows, and trade-offs between different memory strategies.

Without memory, agents are stateless—every turn starts fresh, every tool call exists in isolation. Users don't think that way. When a customer support agent asks "What was your order number?" and the user provides it, the agent should remember that number for the rest of the conversation. When a user says "I prefer summaries in bullet points," the agent should apply that preference next time. When you're extracting booking info from an email chain spanning five messages, you need context from earlier messages to build a complete picture.

Real-world analogy: Agent memory is like human memory—you don't remember every conversation word-for-word, but you remember the gist and the important details. You remember that your colleague prefers morning meetings. You remember the project reference from last week's email. Memory systems give agents that same kind of selective, contextual recall.

LLM context windows are finite. Even with 128K or 200K tokens, long conversations, multiple tool results, and rich system prompts chew through the limit. Memory systems bridge this gap: they persist information beyond the immediate context, retrieve relevant past info when needed, and maintain continuity across sessions that may span days or weeks. Modern architectures (2025–2026) treat memory as multi-tier: short-term working memory, long-term persistent storage, episodic records of past interactions, and semantic knowledge bases—each with different retrieval and update strategies.

Why This Matters in Production: A booking automation system that forgets the vessel name between extraction and confirmation is broken. Users will lose trust. Memory isn't optional for any agent that handles multi-turn or multi-session workflows.

Aha Moment: Memory is not one mechanism—it's a stack. Working memory (context window), short-term (buffer/summary), long-term (vector DB, database), episodic (past interactions), semantic (RAG). You pick from this menu based on what you need to remember and for how long.


2. Memory Types (In Depth with Examples)

Interview Insight: They'll ask you to compare memory types and explain trade-offs. Know when to use buffer vs summary vs vector store—and when each one fails.

Short-term / Working Memory

This is the current conversation context living directly in the LLM's context window. Current messages, recent tool results, immediate workflow state. It's the "active" memory the model attends to on every forward pass. Limited by context window size—typically 4K to 200K tokens. Ephemeral: once the context is truncated or the session ends, it's gone unless you persist it elsewhere.

Analogy: Working memory is like your desk—what's in front of you right now. You can see it, use it, but it disappears when you clear the desk.

Example: User asks "Which vessels are arriving at Rotterdam next week?" Agent calls a port lookup tool and gets five vessels. User then asks "What's the capacity of the third one?" The agent resolves "the third one" from the list in working memory—no other storage has it.

Conversation Buffer Memory

Stores the full conversation history verbatim. Every user and assistant message appended to a buffer, passed to the LLM on each request. Simplest approach—no summarization, no loss of detail. The problem: unbounded growth. A 50-turn conversation with tool calls can exceed 50K tokens. Eventually you truncate (losing oldest messages) or fail. LangChain's ConversationBufferMemory implements this. Use for short conversations or when you have a huge context window.

Example: A support chatbot keeps every exchange. After 30 turns, the next request sends all 60+ messages. At 32K token limit, you're dropping old messages.
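The buffer pattern can be sketched in plain Python. This is an illustrative sketch, not LangChain's implementation; the token limit and the 4-chars-per-token estimate are assumptions.

```python
MAX_TOKENS = 32_000  # illustrative limit

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

class BufferMemory:
    """Keeps the full history verbatim -- growth is unbounded."""
    def __init__(self):
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def to_prompt(self):
        # The entire history is sent on every request.
        if estimate_tokens(self.messages) > MAX_TOKENS:
            raise OverflowError("context limit exceeded: truncate or summarize")
        return list(self.messages)
```

The failure mode is visible in the code: nothing ever shrinks `self.messages`, so eventually `to_prompt` fails and you are forced into truncation or summarization.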

Conversation Summary Memory

Uses an LLM to summarize older messages into a compact summary. Keeps the summary plus the most recent N messages. When the buffer grows beyond a threshold, oldest messages get summarized ("User asked about shipping costs. Agent provided rates for Rotterdam and Hamburg. User preferred Rotterdam."), and the summary replaces them. Trades accuracy for space—the summary loses nuance, specific numbers, exact phrasing. LangChain's ConversationSummaryMemory. Useful for long conversations when you need to stay within limits, but the model may "forget" summarized details.

Example: After 20 turns, the buffer is 15K tokens. System summarizes first 15 turns into 500 tokens. Context now: 500-token summary + last 5 turns. Model continues but may not recall the exact vessel name from turn 3.
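A minimal sketch of the summarization step. The `fake_summarize` stub stands in for an LLM call, and the thresholds are illustrative assumptions:

```python
KEEP_RECENT = 5  # how many verbatim messages survive compression

def fake_summarize(messages):
    # In practice an LLM produces a compact summary of these messages.
    return f"[summary of {len(messages)} earlier messages]"

class SummaryMemory:
    def __init__(self, max_messages=10):
        self.max_messages = max_messages
        self.summary = ""
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            old = self.messages[:-KEEP_RECENT]
            self.messages = self.messages[-KEEP_RECENT:]
            # A real system would fold the previous summary into the new one.
            self.summary = fake_summarize(old)

    def to_prompt(self):
        prefix = [{"role": "system", "content": self.summary}] if self.summary else []
        return prefix + self.messages
```

Note what `fake_summarize` throws away: the exact vessel name from turn 3 becomes "[summary of N earlier messages]" unless you extract it somewhere else first.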

Conversation Buffer Window Memory

Keeps only the last K messages. Simple sliding window—new message arrives, oldest drops. No summarization—just truncation. Straightforward and predictable, but loses old context entirely. If the user mentioned something important in message 1 and refers to it in message 20, and K is 10, that reference is lost. LangChain's ConversationBufferWindowMemory. Use when recent context is sufficient and you want to avoid summarization cost.

Example: K=10. User says "I'm John from Acme Corp" in message 1. By message 15, it's outside the window. When user says "What's the status of my company's booking?" the agent may not know "my company" = Acme Corp.
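The sliding window is nearly a one-liner with `collections.deque`; K is an illustrative choice:

```python
from collections import deque

class WindowMemory:
    """Keeps only the last K messages; the oldest drop automatically."""
    def __init__(self, k=10):
        self.messages = deque(maxlen=k)

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def to_prompt(self):
        return list(self.messages)
```

With `maxlen` set, appending the (K+1)th message silently evicts the first one, which is exactly how "I'm John from Acme Corp" disappears.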

Long-term Memory

Persists beyond sessions using external storage. Not in the context window by default—retrieved when relevant.

  • Vector store–based: Each memory (e.g., "User prefers bullet points") is embedded and stored. At query time, embed the current query, retrieve top-k similar memories, inject into the prompt. Enables semantic retrieval.
  • Database-based: Structured facts in a relational or document DB. User preferences, profile fields. Retrieved by key (user_id) or query. Good for "User's timezone: Europe/Copenhagen."
  • Knowledge graph–based: Entities and relationships. "User John works at Acme Corp. Acme Corp has shipments to Rotterdam." Enables multi-hop reasoning.

Example: User says "Remember that I always want summaries in bullet points." Stored in vector store with {user_id: "u123", type: "preference"}. A week later, user asks "Summarize this document." Agent embeds query, retrieves "User prefers bullet points," injects into prompt.

Episodic Memory

Records specific past interactions and their outcomes. "Last time the user asked about vessel X, they wanted the ETA." "Last extraction missed the port—we had to retry." Useful for personalization and learning from experience. Stored with timestamps and context. Research (ReAcTree, MIRIX) shows episodic memory improves long-horizon task performance—agents that remember past failures avoid repeating them.

Example: User previously asked about Rotterdam, then Hamburg. Stored: "User tends to ask about multiple ports in sequence." Next time, agent proactively includes nearby ports.

Semantic Memory

General knowledge and facts—the RAG knowledge base. Company policies, product docs, shipping schedules, port info. Stored in vector store or knowledge graph, retrieved on demand. Unlike episodic (specific past events), semantic is enduring facts.

Example: Agent has RAG index of shipping policies. User asks "What's the cancellation policy?" Agent retrieves and answers. Knowledge exists regardless of prior questions.

Why This Matters in Production: In an email booking system, you need episodic memory for "last time this sender's format was tricky" and semantic memory for port codes and vessel specs. Mix and match.

Aha Moment: Long-term memory isn't "everything"—it's curated. Store preferences, key facts, episodic highlights. Not every utterance. Retention policies (e.g., 90 days) prevent bloat.


3. State Management in LangGraph

Interview Insight: LangGraph state is central. They want to hear: TypedDict vs Pydantic, what reducers do, and how you'd design state for a real workflow (e.g., extraction + review).

LangGraph flows a shared state object through every node. Each node reads from state, does computation (LLM, tools), and returns updates that are merged in. The state schema defines what the workflow can hold.

Analogy: State is like a shared whiteboard passing through a room. Each person (node) reads what's there, adds or changes something, and passes it on. Reducers decide how conflicting updates get merged.

TypedDict state: Simplest. Define a TypedDict with type hints for each key. Each key is a "channel." No validation beyond types—flexible but less strict.

Pydantic BaseModel state: State as a Pydantic model. Validated on update, supports defaults, clear schema docs. Use when you want runtime validation.

Reducers: Define how updates are merged when multiple nodes write to the same channel. Default: replace (new overwrites old). For accumulating channels (e.g., message history), use operator.add or a custom reducer. Node A returns {"messages": [msg1]}, Node B returns {"messages": [msg2]}—add reducer produces {"messages": [msg1, msg2]}. Custom reducers can implement merge logic (e.g., "merge extracted fields, prefer higher confidence").
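A custom "prefer higher confidence" reducer might look like this sketch; the `(value, confidence)` field shape is an illustrative assumption:

```python
from typing import Annotated, TypedDict

def merge_by_confidence(current: dict, update: dict) -> dict:
    # Keep whichever (value, confidence) pair has the higher confidence.
    merged = dict(current)
    for field, (value, conf) in update.items():
        if field not in merged or conf > merged[field][1]:
            merged[field] = (value, conf)
    return merged

class ExtractionState(TypedDict):
    # LangGraph calls merge_by_confidence(old, new) when nodes write here.
    extracted_data: Annotated[dict, merge_by_confidence]
```

The reducer is just a pure function of (current value, update); attaching it via `Annotated` tells LangGraph to use it instead of the default replace.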

Channels: Each state key is a channel. messages typically uses add (append). extracted_data, current_step typically use replace.

Example—email extraction state: messages (conversation history, add), extracted_data (vessel, port, ETA, quantities—replace), confidence_scores (replace), review_status (replace), current_email (replace). Extraction node reads current_email and messages, writes to extracted_data and confidence_scores. Review node reads extracted_data, writes to review_status.

Why This Matters in Production: In a booking extraction pipeline, you need extracted_data with replace (each step can refine), messages with add (full audit trail), and confidence_scores so the review node knows what to flag.

Aha Moment: Reducers are why multi-agent workflows don't overwrite each other. Without add on messages, the last node would erase the whole conversation.


4. Checkpointing in LangGraph

Interview Insight: Checkpointing enables persistence and human-in-the-loop. Know: thread_id, MemorySaver vs PostgresSaver, interrupt_before vs interrupt(), and how resume works.

Checkpointing saves the complete graph state at each step. Enables persistence, resumption, replay, and human-in-the-loop.

Analogy: Checkpointing is like save points in a video game. You can quit, come back later, and pick up exactly where you left off—or rewind to an earlier point to try a different path.

What it is: After each "super-step," the checkpointer serializes state and writes to storage. Each checkpoint has a thread_id—unique identifier for the conversation or workflow instance. With thread_id, the checkpointer loads the latest checkpoint and the graph continues from there.

Checkpointers:

  • MemorySaver: In-memory. Fast, no config, lost on restart. Dev only.
  • SqliteSaver: SQLite. Good for single-process or local.
  • PostgresSaver: PostgreSQL with JSONB. Production-ready, concurrent access, durable.

Thread IDs: Each conversation gets a unique thread_id. Pass in config: {"configurable": {"thread_id": "my-thread"}}. Resume continues from last checkpoint. New thread_id = fresh start.

Time-travel debugging: Checkpoints persist (not overwritten). Invoke with {"configurable": {"thread_id": "1", "checkpoint_id": "abc-123"}} to replay from that checkpoint. Debug: "What was the state when extraction failed?"

Human-in-the-loop: Use interrupt_before or interrupt_after to pause at a node. Graph saves state and returns. Human reviews (e.g., extracted booking data), optionally edits, resumes. interrupt() inside a node pauses at that exact line. When resumed, Command(resume=human_input)—that value becomes the return value of interrupt(). Requires checkpointer and thread_id.

Configuration: Always pass config={"configurable": {"thread_id": "..."}} when using checkpointing. Without it, nothing gets saved.

Why This Matters in Production: Human-in-the-loop booking approval requires: pause after extraction, show data to human, capture approval/edit, resume with that input. Checkpointing makes this possible without losing state.

Aha Moment: You can resume with modified state. Human edits the extracted port code? Pass it in Command(resume=edited_data). The graph continues with the corrected data.


5. Planning Patterns (In Depth)

Interview Insight: ReAct vs Plan-and-Execute is a classic. They want trade-offs, when to use each, and awareness of newer patterns (ReAcTree, RP-ReAct).

ReAct (Reason + Act)

Interleaves reasoning and action: Think (Thought) → Act → Observe → repeat until done. Implemented as: model node → tool node → conditional edge (continue or end).

Flow: User: "What's the weather in Copenhagen and when does the next flight leave?" Thought: "Need weather and flight info. Weather first." Action: get_weather("Copenhagen"). Observation: "15°C, partly cloudy." Thought: "Now flight info." Action: search_flights("Copenhagen"). Observation: "Next flight 14:30." Thought: "Have both." Answer.

Strengths: Flexible, handles diverse tasks, adapts to tool results. Weaknesses: Can loop. Sensitive to tool descriptions. No upfront plan validation.

When to use: General-purpose agents, tool-heavy workflows, path unknown in advance.
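The loop itself is simple. Here is a plain-Python sketch where `fake_model` and the tool table stand in for an LLM and real tools; all names and canned responses are illustrative:

```python
TOOLS = {
    "get_weather": lambda arg: "15C, partly cloudy",
    "search_flights": lambda arg: "Next flight 14:30",
}

def fake_model(history):
    # An LLM would choose the next action from the history; scripted here.
    if not any(h.startswith("get_weather") for h in history):
        return ("act", "get_weather", "Copenhagen")
    if not any(h.startswith("search_flights") for h in history):
        return ("act", "search_flights", "Copenhagen")
    return ("answer", "15C and partly cloudy; next flight at 14:30", None)

def react(question, max_steps=10):
    history = [question]
    for _ in range(max_steps):              # step cap guards against infinite loops
        kind, payload, arg = fake_model(history)
        if kind == "answer":
            return payload                  # reasoning concluded: final answer
        observation = TOOLS[payload](arg)   # Act, then Observe
        history.append(f"{payload} -> {observation}")
    return "step limit reached"
```

The `max_steps` cap is the practical answer to the looping weakness: without it, a confused model can bounce between the same tools forever.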

Plan-and-Execute

Creates full plan first, then executes sequentially. Separate planner and executor. Planner: "1. Search vessel info. 2. Get port schedule. 3. Synthesize."

Flow: User: "Compare ETA of vessels A and B at Rotterdam." Planner: Step 1: ETA vessel A. Step 2: ETA vessel B. Step 3: Compare. Executor runs 1, 2, 3 in order.

Strengths: Better for complex multi-step tasks. Plan validated before execution. Easier to debug. Weaknesses: Plan may need revision mid-execution. Upfront planning cost. Inflexible if plan is wrong.

When to use: Complex, well-structured tasks; when reproducibility matters.
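Stripped to its skeleton, the pattern separates a planner from an executor. In this sketch the plan and tool results are hard-coded stand-ins for LLM and tool calls:

```python
def planner(task):
    # An LLM would draft this plan; fixed here for illustration.
    return ["eta_vessel_a", "eta_vessel_b", "compare"]

def run_step(step, ctx):
    if step == "eta_vessel_a":
        ctx["a"] = "2025-03-01"          # pretend tool result
    elif step == "eta_vessel_b":
        ctx["b"] = "2025-03-03"
    elif step == "compare":
        ctx["answer"] = f"Vessel A arrives first ({ctx['a']} vs {ctx['b']})"
    return ctx

def plan_and_execute(task):
    ctx = {}
    plan = planner(task)   # full plan exists up front: log it, validate it
    for step in plan:
        ctx = run_step(step, ctx)
    return ctx
```

Because the whole plan exists before any step runs, you can validate or log it up front, which is precisely what ReAct cannot give you.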

2025 evolution: Reason-Plan-ReAct (RP-ReAct) decouples strategic planning from execution. Reasoner-Planner handles strategy; executors use ReAct for tools. Addresses trajectory instability in enterprise tasks.

Tree-of-Thought (ToT)

Explores multiple reasoning paths. At each step, generate several candidates, evaluate each, expand the best. Like BFS/DFS over reasoning.

Strengths: Good for math, strategy, creative tasks. Avoids local optima. Weaknesses: Expensive (many LLM calls). Needs good evaluation function.

2025 evolution: ReAcTree combines hierarchical decomposition with tree exploration. ~61% success on long-horizon benchmarks vs ~31% for ReAct.

Reflection

Generate output → review for issues → regenerate improved version. Self-improvement loop.

When to use: High-quality outputs (documents, code, reports) when cost is acceptable.

Monte Carlo Tree Search (MCTS)

Applies Monte Carlo Tree Search over agent actions: simulate candidate actions, evaluate outcomes, backtrack, and expand promising branches. Good for large action spaces where exhaustive search is infeasible.

Why This Matters in Production: Email booking extraction: Plan-and-Execute ("parse email → extract fields → validate → create booking") with ReAct-style flexibility when a field is ambiguous. Hybrid beats pure ReAct or pure Plan.

Aha Moment: ReAct is great until it loops. Plan-and-Execute is great until the plan is wrong. Production systems often combine: high-level plan, ReAct at step level.


6. Context Window Management

Interview Insight: "What happens when context fills up?" is a common question. Prioritization, summarization, and extraction-before-summary are key.

When context gets too long, you prioritize, summarize, or drop.

Analogy: It's like packing a suitcase. You keep essentials (passport, tickets), compress bulky items (summarize old messages), and leave behind what you can buy there (drop low-value history).

Priority ordering:

  1. System prompt — Always keep.
  2. Current task context — Immediate request and critical context.
  3. Relevant memories — Retrieved long-term or episodic.
  4. Recent history — Last N messages.
  5. Older history — Summarize or drop.

Triggers: When token count exceeds threshold (e.g., 80% of window), summarize oldest messages. Or sliding window: keep last K only.
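The threshold trigger can be sketched like this; the 80% figure, window size, and 4-chars-per-token estimate are illustrative assumptions:

```python
WINDOW = 8_000
THRESHOLD = int(WINDOW * 0.8)   # start compressing at 80% of the window
KEEP_RECENT = 4

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def maybe_compress(messages, summarize):
    if estimate_tokens(messages) <= THRESHOLD:
        return messages                              # still under budget
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [{"role": "system", "content": summarize(old)}] + recent
```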

The challenge: Summarization loses detail. "My order is #12345" might become "User discussed an order." Mitigation: extract key facts (order IDs, preferences) into structured memory before summarizing. Retrieve when building context.
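The mitigation in code: extract facts first, then compress. The order-ID regex and fact shape are illustrative assumptions:

```python
import re

ORDER_ID = re.compile(r"#\d{4,}")   # e.g. "#12345"

def extract_facts(messages, fact_store):
    # Pull critical identifiers into structured memory before compression.
    for m in messages:
        for order_id in ORDER_ID.findall(m["content"]):
            fact_store.append({"type": "order_id", "value": order_id})

def compress_safely(messages, fact_store, summarize):
    extract_facts(messages, fact_store)   # facts survive the squeeze
    return [{"role": "system", "content": summarize(messages)}]
```

In a real system the extraction step would likely be an LLM call with a structured-output schema rather than a regex, but the ordering is the point: facts out first, summary second.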

2025 research: ACON uses compression guidelines (26–54% reduction). ReSum enables indefinite exploration via periodic summarization. AgentFold does proactive context management at multiple scales.

Aha Moment: Don't summarize first and hope. Extract critical facts into long-term memory, then summarize. That way you don't lose "order #12345" when you compress "User discussed their order."


7. Persistent Memory Architectures

Interview Insight: Vector store retrieval is standard. MemGPT/Letta's hierarchical approach and newer architectures (CMA, MAGMA, MIRIX) show you're tracking the field.

Vector store retrieval: Each memory as text, embedded, stored with metadata. At query time, embed context/query, retrieve top-k, inject. Standard pattern.

MemGPT/Letta approach: Hierarchical memory inspired by OS virtual memory. LLM has limited "main context." Additional data in "archival storage" (slow, large). "Recall storage" holds recently retrieved items. LLM manages memory via tools: search archival, recall into main context, edit/delete. Enables unlimited data and long-term interactions beyond context window.
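A toy sketch of the hierarchy. Capacities and the keyword search are illustrative; the real MemGPT/Letta implementation uses LLM tool calls and embedding-based search:

```python
MAIN_CAPACITY = 3   # what fits in "main context"

class HierarchicalMemory:
    def __init__(self):
        self.main = []       # small, visible to the LLM every turn
        self.archival = []   # large, slow store outside the context window

    def remember(self, text):
        self.archival.append(text)

    def search_archival(self, keyword):
        # A tool the agent calls to page matching memories into main context.
        hits = [t for t in self.archival if keyword in t]
        for h in hits:
            if h not in self.main:
                self.main.append(h)
        self.main = self.main[-MAIN_CAPACITY:]   # evict oldest on overflow
        return hits
```

The key idea survives even in the toy: the model's visible context is a managed cache over a much larger store, and the model itself decides what to page in.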

2025–2026: CMA (persistent storage, selective retention, associative routing, temporal chaining). MAGMA (multi-graph: semantic, temporal, causal, entity). MIRIX (six memory types, multimodal). FluxMem (context-adaptive structure, probabilistic gating).


Mermaid Diagrams

Memory tiers:

flowchart TB
    subgraph contextWindow["Context Window (Working Memory)"]
        currentMessages[Current Messages]
        toolResults[Recent Tool Results]
        immediateState[Immediate State]
    end
 
    subgraph shortTerm["Short-term Memory"]
        bufferWindow[Buffer / Window / Summary]
    end
 
    subgraph longTerm["Long-term Memory"]
        vectorStore[Vector Store]
        database[Database]
        knowledgeGraph[Knowledge Graph]
    end
 
    subgraph memoryTypes["Memory Types"]
        episodic[Episodic: Past Interactions]
        semantic[Semantic: General Knowledge]
    end
 
    currentMessages --> bufferWindow
    bufferWindow --> currentMessages
    vectorStore --> currentMessages
    database --> currentMessages
    knowledgeGraph --> currentMessages
    episodic --> vectorStore
    semantic --> vectorStore

ReAct loop:

flowchart LR
    userQuery[User Query] --> thought[Thought]
    thought --> action[Action]
    action --> toolExec[Tool Execution]
    toolExec --> observation[Observation]
    observation --> moreToDo{More to do?}
    moreToDo -->|Yes| thought
    moreToDo -->|No| finalAnswer["Final Answer"]

Plan-and-Execute:

flowchart TB
    userTask[User Task] --> planner[Planner]
    planner --> planSteps["Plan: Step 1, 2, 3..."]
    planSteps --> executor[Executor]
    executor --> exec1[Execute Step 1]
    exec1 --> exec2[Execute Step 2]
    exec2 --> exec3[Execute Step 3]
    exec3 --> synthesize[Synthesize Result]

Checkpointing and human-in-the-loop:

flowchart TB
    invoke[Invoke Graph] --> node1[Node 1]
    node1 --> node2[Node 2]
    node2 --> checkpoint[Checkpoint Saved]
    checkpoint --> interruptCheck{interrupt_before?}
    interruptCheck -->|Yes| pause[Pause - Return State]
    pause --> humanReview[Human Reviews]
    humanReview --> resume[Resume with Command]
    resume --> continueNode[Continue from Checkpoint]
    interruptCheck -->|No| continueNode

Extraction with confidence routing:

flowchart TB
    input[Input] --> extraction[Extraction Node]
    extraction --> conditional{Confidence OK?}
    conditional -->|Yes| createBooking[Create Booking]
    conditional -->|No| reviewNode[Review Node]
    reviewNode --> humanApproval[Human Approval]
    humanApproval --> createBooking

Code Examples

State schema with reducers:

from typing import Annotated
from typing_extensions import TypedDict
from operator import add
 
class AgentState(TypedDict):
    messages: Annotated[list, add]  # Append new messages
    extracted_data: dict            # Replace on update
    current_step: str               # Replace on update
 
# In StateGraph (node functions stubbed for illustration)
from langgraph.graph import StateGraph, START, END

def extraction_node(state: AgentState):
    return {"extracted_data": {"vessel": "..."}, "messages": []}

def review_node(state: AgentState):
    return {"current_step": "review_complete"}
 
workflow = StateGraph(AgentState)
workflow.add_node("extract", extraction_node)
workflow.add_node("review", review_node)
workflow.add_edge(START, "extract")
workflow.add_edge("extract", "review")
workflow.add_edge("review", END)
 
graph = workflow.compile()

Checkpointing and resuming with a thread_id:

from langgraph.checkpoint.memory import MemorySaver
 
checkpointer = MemorySaver()
graph = workflow.compile(checkpointer=checkpointer)
 
config = {"configurable": {"thread_id": "user-123-session-1"}}
result = graph.invoke({"messages": [{"role": "user", "content": "Extract booking from..."}]}, config)
 
# Later, resume the same conversation
result2 = graph.invoke({"messages": [{"role": "user", "content": "Add port Copenhagen"}]}, config)

Human-in-the-loop with interrupt():

from langgraph.types import interrupt
 
def review_node(state: AgentState):
    # Pause and ask for human approval
    approved = interrupt({
        "action": "review",
        "data": state["extracted_data"],
        "message": "Please review and approve the extracted booking data."
    })
    return {"review_status": "approved" if approved else "rejected"}
 
# Alternative: pause at the node boundary with interrupt_before.
# Use either interrupt() inside the node OR interrupt_before -- combining
# both pauses twice (once before the node, once at the interrupt() call).
graph = workflow.compile(
    checkpointer=checkpointer,
    interrupt_before=["review"]
)
 
# First invoke - pauses at review
result = graph.invoke(input_data, config)
 
# Resume with human decision
from langgraph.types import Command
graph.invoke(Command(resume=True), config)  # or resume=False to reject

Prebuilt ReAct agent:

from langgraph.prebuilt import create_react_agent
 
agent = create_react_agent(model, tools, checkpointer=checkpointer)
 
config = {"configurable": {"thread_id": "thread-1"}}
result = agent.invoke({"messages": [{"role": "user", "content": "What's the weather in Copenhagen?"}]}, config)

Vector store memory for preferences:

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Instantiate the store (collection name is illustrative)
vectorstore = Chroma(collection_name="agent_memory", embedding_function=OpenAIEmbeddings())
 
# Store a memory
memory_text = "User prefers bullet-point summaries"
vectorstore.add_texts(
    [memory_text],
    metadatas=[{"user_id": "u123", "type": "preference", "timestamp": "2025-01-15"}]
)
 
# Retrieve relevant memories at query time
query = "Summarize this document"
relevant = vectorstore.similarity_search(query, k=5, filter={"user_id": "u123"})
memories = "\n".join([doc.page_content for doc in relevant])
# Inject: f"User preferences: {memories}\n\nTask: {query}"

Conversational Interview Q&A

1. "How do you implement persistent memory for an agent across sessions?"

Weak answer: "We use a vector store to store memories and retrieve them when needed."

Strong answer: "Layered approach. Short-term: last N messages in context so the agent has immediate continuity. Long-term: vector store for facts and preferences—embed and store with user_id, timestamp, type. At session start or when building context, embed the current query, retrieve top-k for that user, inject into the prompt. For structured data like timezone or output format, we use a database keyed by user_id—faster for exact lookups. The key is deciding what to persist: preferences, key facts, episodic highlights—not every utterance. At Maersk, our agent platform abstracts this: agents declare memory as a capability, orchestration handles retrieval and injection. We also have retention policies—drop memories older than 90 days—to avoid bloat and stale data."

2. "Compare ReAct vs Plan-and-Execute. When would you use each?"

Weak answer: "ReAct is more flexible, Plan-and-Execute is more structured."

Strong answer: "ReAct interleaves thinking and acting—think, act, observe, repeat. Flexible, adapts to tool results in real time. Great for open-ended tasks where the path isn't known. Plan-and-Execute creates a full plan upfront, then executes sequentially. Better when the task is complex but structured, when you want to validate the plan before execution, or when reproducibility matters. ReAct can loop; Plan-and-Execute can be inflexible if the plan's wrong. In practice we use hybrids. For our email booking automation at Maersk, the high-level flow is Plan-and-Execute—parse email, extract fields, validate, create booking. But within extraction, when a field is ambiguous or we need to cross-reference multiple emails, we use ReAct-style tool calls. Reason-Plan-ReAct formalizes this: planner for strategy, ReAct executors for execution."

3. "What happens when an agent's context window fills up? How do you manage it?"

Weak answer: "We truncate or summarize the conversation."

Strong answer: "Priority order: system prompt first—non-negotiable. Then current task context, then relevant retrieved memories, then recent history, then older. When we hit a threshold—say 80% of the window—we trigger summarization of the oldest messages. LLM produces a compact summary, we replace those messages. But here's the trap: summarization loses detail. 'My order is #12345' can become 'User discussed an order.' So before summarizing, we extract critical facts—order IDs, preferences, references—and store them in long-term memory. When building context, we retrieve those. At Maersk, in our email booking flow, we extract vessel, port, ETA into state before any summarization. That way we never lose the booking reference when we compress the email thread."

4. "How do you implement human-in-the-loop with checkpointing?"

Weak answer: "We use LangGraph's interrupt feature."

Strong answer: "Two things: a checkpointer and interrupts. We use PostgresSaver for production—SQLite or MemorySaver for dev. Every invocation gets a thread_id in config. We use interrupt_before on the review node—when the graph hits that node, it saves state and returns. Our app gets the extracted data, displays it in a UI, human approves or edits. To resume, we invoke with Command(resume=approved_or_edited_data) and the same thread_id. The checkpointer loads state, graph continues. The human's input becomes part of the flow. At Maersk, our booking system works exactly like this: extract → pause → human reviews extracted vessel, port, ETA → approve/edit → resume → create booking. State is never lost. We can also use interrupt() inside a node for more granular control—pause at a specific line, not just node boundaries."

5. "Explain LangGraph state management. How do reducers work?"

Weak answer: "State is a dictionary that gets passed between nodes. Reducers merge updates."

Strong answer: "State is a shared dict flowing through every node. Each key is a channel. Nodes read from state, return partial updates. LangGraph merges those updates—reducers define how. Default is replace: new overwrites old. For channels that accumulate, like messages, we use Annotated[list, add]—operator.add appends. So node A returns {'messages': [msg1]}, node B returns {'messages': [msg2]}, merged state has [msg1, msg2]. For our extraction workflow, messages uses add—we want full audit trail. extracted_data uses replace—each step can refine. We could use a custom reducer for confidence_scores—merge preferring higher confidence. Reducers are why multi-agent workflows don't overwrite each other. Without add on messages, the last node would erase the conversation."

6. "An agent needs to remember preferences across 100+ conversations. How do you architect this?"

Weak answer: "We'd use a vector store and retrieve on each request."

Strong answer: "Hybrid. Structured preferences in a database—summary_format, timezone, etc.—keyed by user_id. Fast lookups, easy updates. Vector store for implicit preferences: when the user says 'I prefer X' or we infer from behavior, embed and store. At query time, retrieve top-k. For 100+ conversations, we can't keep everything. We use recency and importance—keep latest preferences. Optionally run a consolidation job: if user said 'bullet points' 10 times and 'paragraphs' once, update structured preference. When building context for each request, fetch structured prefs + top-k vector memories, inject into system prompt. The agent always sees 'User prefers bullet points.' Our Maersk platform has this: structured store for format/timezone, vector store for inferred preferences. Agents get both without having to implement it themselves."


From Your Experience

1. Your platform mentions "memory" for agents. How is it implemented?

Tailored prompt: "Walk me through how memory works on the enterprise AI Agent Platform you built at Maersk—orchestration layer, long-term storage, and how agents consume it."

Reference points: LangGraph checkpointing, thread_id, vector store for facts/preferences, structured preferences store, orchestration abstraction, memory quality evaluation.

2. How does the email booking agent maintain context across a multi-step extraction process?

Tailored prompt: "The email booking system extracts from multi-message chains. How does it keep context—state schema, RAG usage, handling of follow-up emails?"

Reference points: LangGraph state (messages, extracted_data, confidence_scores, sources), add vs replace reducers, RAG for vessel/port lookup, human review routing.

3. How does human-in-the-loop work in your booking system? How is state preserved for human review?

Tailored prompt: "When extraction confidence is low, a human reviews. Describe the interrupt flow, checkpointing, and how the human's edit gets back into the pipeline."

Reference points: interrupt_before on review node, PostgresSaver, thread_id, Command(resume=edited_data), display in UI, booking creation after approval.


Quick Fire Round

Q: What's the default reducer in LangGraph?
A: Replace—new value overwrites old.

Q: When do you use add reducer?
A: For channels that accumulate—e.g., messages. Append, don't overwrite.

Q: MemorySaver vs PostgresSaver?
A: MemorySaver = in-memory, dev only. PostgresSaver = persistent, production.

Q: What's thread_id for?
A: Unique identifier for a conversation/workflow instance. Enables resume and human-in-the-loop.

Q: interrupt_before vs interrupt()?
A: interrupt_before pauses at node boundary; interrupt() pauses at a specific line inside a node.

Q: Buffer memory vs summary memory?
A: Buffer = full history verbatim, grows unbounded. Summary = LLM compresses old messages, saves space, loses detail.

Q: Episodic vs semantic memory?
A: Episodic = past interactions and outcomes. Semantic = general knowledge (RAG).

Q: When to use ReAct?
A: Open-ended tasks, path unknown, tool-heavy. General-purpose agents.

Q: When to use Plan-and-Execute?
A: Complex but structured tasks, want plan validation, reproducibility matters.

Q: What do you do before summarizing old messages?
A: Extract critical facts (order IDs, preferences) into long-term memory so they're not lost.

Q: What's a channel in LangGraph?
A: Each key in the state schema. Defines what data the workflow holds.

Q: How does Command(resume=...) work?
A: Pass human input when resuming after interrupt. That value becomes the return of interrupt() or flows into the next node.

Q: Vector store for memory—how does retrieval work?
A: Embed current query/context, similarity_search with user filter, inject top-k into prompt.

Q: Why not keep everything in working memory?
A: Context window limit. Long conversations + tool results exceed it. Need tiered memory.


Key Takeaways (Cheat Sheet)

Topic: Key point
Why memory matters: Agents need continuity. Context windows are finite; memory bridges the gap.
Short-term memory: Current context in the LLM window. Ephemeral.
Buffer memory: Full history verbatim. Simple but unbounded.
Summary memory: LLM summarizes old messages. Trades accuracy for space.
Buffer window: Last K messages. Loses old context.
Long-term memory: Vector store, database, or knowledge graph. Retrieved when relevant.
Episodic memory: Past interactions and outcomes. Personalization.
Semantic memory: General knowledge. RAG.
LangGraph state: TypedDict or Pydantic. Channels. Reducers define merge.
Reducers: Replace (default). Add (append). Custom (complex merge).
Checkpointing: Saves state per step. thread_id for resume. MemorySaver (dev), PostgresSaver (prod).
Human-in-the-loop: interrupt_before/after or interrupt(). Resume with Command(resume=...).
ReAct: Think → Act → Observe → repeat. Flexible. Can loop.
Plan-and-Execute: Plan upfront, execute sequentially. Good for structured tasks.
Tree-of-Thought: Explore multiple paths. Evaluate, select best.
Reflection: Generate → review → regenerate.
Context management: Prioritize: system > task > memories > recent > older. Extract before summarize.

Further Reading

  • LangGraph Persistence — Official docs on checkpointing, threads, and state management. The source of truth.
  • LangGraph Human-in-the-Loop — Interrupts, Command, resume patterns. Read this before implementing review flows.
  • Lilian Weng — LLM Powered Autonomous Agents — The foundational blog on memory, planning, tool use. Still relevant.
  • MemGPT / Letta — Hierarchical memory, virtual context. If you're tired of context limits, this is the rabbit hole.
  • ReAct Paper (Yao et al.) — Original Reason + Act formulation. Know the pedigree.
  • ReAcTree (2025) — Hierarchical agent trees with episodic memory. ~61% on long-horizon vs 31% ReAct.
  • Reason-Plan-ReAct (2025) — Decoupled planner and executor. Enterprise-friendly evolution.

Session 7 of 16. Next: Session 8 — Tool Integration, Function Calling & MCP.