Phase 02 — Agents
Module 04 of 12 · 9 min read · Free

Module 4: Memory Systems

By default, every agent wakes up with amnesia. Here's how to fix that — and why most courses skip this entirely.

This is Module 4 of a 12-part curriculum: Build Software Products with AI — From First Principles to Production Pipeline.


Here’s the fundamental problem with LLM-based agents: they are stateless by default.

Every API call starts fresh. The model has no memory of previous conversations, previous decisions, or previous actions unless you explicitly provide that history. In a world of one-off queries, this doesn’t matter. In a world of persistent agents that need to operate continuously over days, weeks, and months — it’s a foundational problem.

This module is about how to solve it. Most agent courses skip memory entirely or treat it as a footnote. That’s a mistake. Memory architecture is one of the most important design decisions you’ll make.


Three Types of Memory

Borrowing from cognitive science, it’s useful to think about agent memory in three categories:

1. Short-Term Memory (Working Memory)

This is the context window. Everything currently in the model’s input: the system prompt, the conversation history, injected documents, tool results.

Short-term memory is fast and immediately accessible — but finite. When the context window fills up, older content drops out. You cannot add to it indefinitely.

For most interactions, short-term memory is all you need. For persistent agents, it’s just the starting point.

2. Long-Term Memory (Persistent Storage)

Information that needs to survive beyond a single session. This is typically stored in files or a database and explicitly loaded into context when needed.

The most practical implementation is plain files. A MEMORY.md file that the agent reads at the start of every session. A USER.md file with persistent facts about the person it’s helping. A decisions.md with key architectural choices that shouldn’t be re-litigated.

Plain files have a lot going for them: they’re inspectable, version-controllable, editable by the human, and easy to reason about. Don’t over-engineer this into a vector database unless you genuinely have retrieval problems at scale.
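A file-based long-term memory can be as simple as concatenating whichever memory files exist into the context at session start. A minimal sketch, assuming the file names from above (`MEMORY.md`, `USER.md`, `decisions.md` are illustrative, not a fixed convention):

```python
from pathlib import Path

def load_memory_context(memory_dir: str, filenames: list[str]) -> str:
    """Concatenate whichever memory files exist into one context string.

    Missing files are silently skipped, so the same loader works
    before and after the agent has accumulated memory.
    """
    sections = []
    for name in filenames:
        path = Path(memory_dir) / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text().strip()}")
    return "\n\n".join(sections)

# Typical call at session start:
# context = load_memory_context("agent/", ["MEMORY.md", "USER.md", "decisions.md"])
```

The result is injected as part of the system prompt. Because the files are plain text, you can open, edit, and diff exactly what the agent will see.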

3. Episodic Memory (Session Logs)

Detailed records of what happened in previous sessions. Not a summary — the raw log of actions taken, decisions made, things tried and discarded.

Episodic memory is useful when you need to reconstruct context (“what did we try last time?”), debug failures (“where did the agent go wrong?”), or audit behaviour (“did the agent do what we expected?”).

In practice: daily journal files. One file per day. Raw notes of what happened. The agent writes to them during a session and reads them at the start of the next.

🧠 MEMORY TYPES — three layers, three timescales
                  Short-term            Long-term            Episodic
                  ──────────            ─────────            ────────
What it is        context window        files / vector DB    session logs
Capacity          finite (~200k tok)    unlimited            unlimited
Lifespan          one session           persists forever     persists forever
Written by        implicit              agent explicitly     agent during session
Read by           always visible        explicitly loaded    queried on demand
Example           conversation          MEMORY.md            memory/2026-05-11.md
                  injected docs         USER.md              what happened today
                  tool results          decisions.md         the agent’s diary

Start with short-term + a few plain files. Add vector search only if you have a genuine retrieval problem at scale. Most setups never need it.


The Stateless Problem in Practice

Let me show you why this matters.

Imagine you’re building a coding agent to work on a long-running project. Session 1: you establish the architecture, make some key decisions, write the first few modules. Session 2 (next day): you ask the agent to continue. With no memory system, the agent has no idea what was decided in Session 1. It might suggest a completely different architecture. It might re-ask questions you already answered. It might duplicate code that already exists.

This isn’t a model failure. It’s an infrastructure failure. The agent was never given the context it needed.

With a proper memory system: at the start of Session 2, the agent reads MEMORY.md (long-term context), reads yesterday’s memory/2026-05-08.md (episodic context), and reads relevant files from the codebase (working context). Now it has continuity. It knows what was decided, where things were left, and what to do next.


A Real Memory Architecture

Here’s the system I actually run. Not theoretical — production.

SOUL.md — The agent’s identity. Who it is, how it thinks, what its principles are. Loaded at the start of every session. Defines the persistent character of the agent.

USER.md — Facts about me that never change or change slowly. Name, timezone, professional context, priorities, working style. The agent reads this once and it shapes every interaction.

MEMORY.md — Long-term curated memory. Significant decisions, important context, things the agent needs to remember indefinitely. Actively maintained — I review it periodically and prune what’s no longer relevant.

memory/YYYY-MM-DD.md — Daily episodic notes. The agent writes to this during sessions: what happened, what was decided, what was shipped, what to follow up on. The agent reads today’s and yesterday’s at session start.

HEARTBEAT.md — A checklist of periodic tasks. Checked on a regular cadence (hourly or similar). Things like “check for new meeting transcripts,” “sync the backlog,” “check for urgent emails.”

This architecture gives the agent:

  • Persistent identity (SOUL.md)
  • Persistent context about the human (USER.md)
  • Curated long-term knowledge (MEMORY.md)
  • Recent session context (daily notes)
  • Ongoing task awareness (HEARTBEAT.md)

The total token cost at session start is low — a few thousand tokens. The gain in continuity and context is enormous.
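You can sanity-check that startup cost yourself. A rough sketch using the common four-characters-per-token heuristic (real tokenizers vary, so treat the number as an estimate, not a measurement):

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenization differs by model

def startup_token_estimate(root_dir: str, filenames: list[str]) -> int:
    """Estimate the token cost of the memory files loaded at session start."""
    total_chars = 0
    for name in filenames:
        path = Path(root_dir) / name
        if path.exists():
            total_chars += len(path.read_text())
    return total_chars // CHARS_PER_TOKEN

# e.g. startup_token_estimate("agent/", ["SOUL.md", "USER.md", "MEMORY.md"])
```

If the estimate creeps past a few thousand tokens, that is the signal to prune MEMORY.md, not to abandon the architecture.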


Vector Databases: When You Actually Need Them

You’ll hear a lot about vector databases in the context of agent memory. Pinecone, Weaviate, Chroma, pgvector. They’re real tools with real use cases. They’re also frequently over-applied.

A vector database stores embeddings — numerical representations of text — and allows you to search by semantic similarity. Instead of exact keyword matching, you can find documents that are conceptually related to a query.
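The core operation every vector database performs is similarity search over embeddings. Stripped of the database, it is just cosine similarity; a toy sketch with hand-made vectors (in practice the embeddings come from an embedding model, not hand-tuning):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_match(query_vec: list[float], docs: list[tuple[str, list[float]]]) -> str:
    """Return the document text whose embedding is most similar to the query."""
    return max(docs, key=lambda d: cosine_similarity(query_vec, d[1]))[0]
```

A vector database adds indexing so this search stays fast over millions of vectors; for a handful of memory files, a linear scan like the one above is already overkill.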

When you need this:

  • You have a large corpus of documents (thousands+) that can’t all fit in context
  • You need semantic search — “find everything related to pricing decisions” across a year of notes
  • You’re building a product with a retrieval step (RAG — Retrieval Augmented Generation)

When you don’t:

  • You have a small, well-organised set of memory files
  • You can afford to load the relevant files directly
  • You’re building a personal agent rather than a product serving many users

For most personal agent setups and early-stage product agents, start with files. Add a vector database when file-based retrieval becomes genuinely insufficient. Don’t add complexity before you need it.


Active Memory vs Passive Memory

One distinction matters here: memory the agent passively reads versus memory the agent actively writes.

Passive memory is context you pre-populate and inject at session start. USER.md, MEMORY.md, project docs. The agent reads it; it doesn’t write it (or only writes it when instructed).

Active memory is memory the agent writes to autonomously during operation. Daily notes, task logs, observations from tool use. The agent maintains this itself as it works.

Active memory is what makes an agent genuinely autonomous over time. An agent that only reads memory is an agent you have to manually update. An agent that writes its own memory improves its own context without you doing anything.

Design your agents to write actively. After completing a task: log what was done. After making a decision: record it and the reasoning. After a session ends: write a brief summary. These small writes compound over time into rich, useful context.


Memory Hygiene

A few practical rules:

Distinguish raw notes from curated knowledge. Daily notes are raw. MEMORY.md is curated. Don’t let raw episodic logs pollute your long-term memory — they’ll bloat the context and dilute what’s important.

Review and prune periodically. Memory that’s no longer relevant is noise. Set a cadence — weekly or monthly — to review long-term memory and remove what’s outdated.

Make memory human-readable. You should be able to open MEMORY.md and understand what the agent knows. If memory is in a format only the agent can parse, you lose the ability to audit and correct it.

Version control your memory files. Put them in git. This gives you a history of what the agent knew and when, and lets you roll back if something goes wrong.


Memory and Trust

There’s a less-discussed dimension of memory: trust and security.

Be careful about what you put in long-term memory for agents that interact with multiple people. Memory that’s useful in a private session with one person might be inappropriate to surface in a group chat. Design your memory architecture with access control in mind.

For personal agents: MEMORY.md contains personal context and shouldn’t be loaded in group or multi-user sessions. This isn’t paranoia — it’s basic hygiene. The agent should know where it is and what it’s appropriate to share.
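One way to enforce this is to gate memory loading on session type before any file reaches the context. A minimal sketch; the file names and the private/group split are assumptions about your setup, not a fixed scheme:

```python
# Hypothetical policy: files holding personal context never load outside
# a private, single-user session.
PRIVATE_FILES = {"MEMORY.md", "USER.md"}

def files_for_session(all_files: list[str], session_type: str) -> list[str]:
    """Filter memory files by session type: private files only in private sessions."""
    if session_type == "private":
        return list(all_files)
    return [f for f in all_files if f not in PRIVATE_FILES]
```

The point is that the filter runs at load time, so a group session never has the private context to leak in the first place, rather than relying on the model to withhold it.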


What’s Next

You now have a memory architecture. Your agent wakes up knowing who it is, who it’s helping, what’s happened before, and what it needs to do. In Module 5, we give it the tools to act on that knowledge — skills and the tool-calling system that makes agents genuinely capable.

