Module 3: What is an Agent?
A chatbot answers questions. An agent gets things done. Here's the difference, and why it matters.
This is Module 3 of a 12-part curriculum: Build Software Products with AI — From First Principles to Production Pipeline.
The word “agent” is everywhere right now and mostly meaningless in common usage. People call anything from a chatbot with a slightly longer prompt to a fully autonomous system an “agent.” That ambiguity makes it hard to reason about what you’re actually building.
Let’s fix the definition, then build up from there.
The Core Distinction
A chatbot processes a message and returns a response. One input, one output. The loop begins and ends in a single exchange. The user does the work of chaining multiple interactions together.
An agent does something fundamentally different. It receives a goal, breaks it into steps, takes actions to complete those steps (including using tools and external systems), observes the results, and continues until the goal is achieved — or until it decides it can’t proceed.
The difference is autonomy and action. An agent isn’t just generating text in response to your input. It’s executing a plan.
The Agent Loop
Every agent — regardless of the framework or platform — runs on some version of this loop:
1. Perceive — receive input (user message, event, scheduled trigger)
2. Think — reason about what to do next
3. Act — call a tool, write a file, send a message, spawn a sub-agent
4. Observe — receive the result of the action
5. Repeat — use the observation to inform the next cycle
6. Terminate — when the goal is achieved or can't be achieved
This is sometimes called the ReAct loop (Reason + Act). It’s the underlying structure of every serious agent implementation.
What makes this powerful: the agent can run through this loop many times in a single “turn.” Ask an agent to research a topic and write a report, and it might: search the web (act), read the results (observe), identify gaps (think), search again with refined queries (act), synthesise what it found (think), write the draft (act), check it against the original brief (think), revise (act) — all before returning anything to you.
The user experience is: you gave it a task, it completed the task. The underlying reality is: many cycles of reasoning and action.
👁 PERCEIVE — input arrives: message / event / scheduled trigger
↓
🧠 THINK — what is the best next action?
↓
⚡ ACT — call tool / write file / spawn sub-agent / send message
↓
👁 OBSERVE — what did the action return?
↓
↺ REPEAT? — goal achieved? → no → back to THINK; → yes → respond to user

This loop may run dozens of times before anything is returned to you. The user sees one response. The agent ran many cycles to produce it.
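The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any particular framework's API: `call_model` and `execute_tool` are hypothetical stand-ins for your LLM client and tool runner, and the message shapes are illustrative.

```python
def run_agent(goal, call_model, execute_tool, max_iterations=20):
    """Minimal perceive-think-act-observe loop.

    call_model(messages) -> {"type": "tool_call", "name": ..., "args": ...}
                         or {"type": "final", "text": ...}
    execute_tool(name, args) -> str
    """
    messages = [{"role": "user", "content": goal}]        # perceive
    for _ in range(max_iterations):                       # repeat
        decision = call_model(messages)                   # think
        if decision["type"] == "final":                   # terminate
            return decision["text"]
        result = execute_tool(decision["name"], decision["args"])  # act
        messages.append({"role": "tool", "content": result})       # observe
    return "Stopped: iteration limit reached without completing the goal."
```

Note the two exits: a deliberate termination when the model says it's done, and a hard iteration cap so the loop can't run forever.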
Tools and Function Calling
Tools are what give an agent the ability to act in the world beyond generating text.
A tool is a function the model can choose to call. You define the function, describe what it does in natural language, and specify its parameters. The model decides when and how to call it based on the task at hand.
A simple tool definition (in JSON schema format):
{
  "name": "read_file",
  "description": "Read the contents of a file at the given path",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {
        "type": "string",
        "description": "The absolute path to the file"
      }
    },
    "required": ["path"]
  }
}
When the model decides to call this tool, it returns a structured call with the parameter values filled in. Your code receives that call, executes the actual file read, and feeds the result back into the context window. The model then continues reasoning with the file contents in context.
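That dispatch step — receive the structured call, execute it, feed the result back into context — looks roughly like this. A sketch only: the tool registry, error handling, and message shapes here are assumptions, not any specific vendor's API.

```python
from pathlib import Path

# Hypothetical tool registry mapping tool names to implementations.
# "read_file" matches the schema defined above.
TOOLS = {
    "read_file": lambda args: Path(args["path"]).read_text(),
}

def handle_tool_call(tool_call, messages):
    """Execute a model-proposed tool call and append the result to context."""
    name = tool_call["name"]
    args = tool_call["arguments"]
    try:
        result = TOOLS[name](args)        # execute the actual file read
    except Exception as e:
        result = f"Tool error: {e}"       # feed errors back to the model too
    messages.append({"role": "tool", "name": name, "content": result})
    return result
```

Feeding errors back as observations, rather than crashing, is what lets the model recover mid-loop — it can reason about the failure and try a different action.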
Common categories of tools:
- File I/O — read, write, search files
- Shell execution — run commands, scripts, tests
- Web access — fetch URLs, search the web
- APIs — interact with external services (Slack, GitHub, Stripe, databases)
- Spawning — create sub-agents or spawn new sessions
The tool set defines what the agent can do. An agent with no tools is just a chatbot with more complex prompting. An agent with a rich tool set can interact with virtually any system.
Memory and Goals
Two more things distinguish a real agent from a chatbot: memory and goals.
Memory is the mechanism by which an agent maintains context across actions and — critically — across sessions. We’ll go deep on this in Module 4, but the core idea is: without memory, every agent invocation starts from scratch. With memory, an agent can maintain continuity, learn from previous runs, and build up long-term context.
Goals give the agent direction. Rather than responding to a single message, a properly designed agent has a clear objective it’s working toward. That objective shapes every decision in the loop: what action to take, what information to seek, when the task is complete.
In practice, goals are encoded in the system prompt and refined through the conversation. A well-specified goal is the difference between an agent that meanders and one that executes.
A Concrete Example
Let me make this concrete with an example from the setup I run daily.
I have an agent called Ultron — my main assistant agent. Here’s a simplified version of what happens when I tell it: “Review the open PRs in the whorang repo and summarise anything that needs my attention.”
Goal: Summarise open PRs needing attention
Cycle 1:
- Think: I need to list the open PRs in the whorang repo
- Act: call exec("gh pr list --repo nikovijay/whorang --state open")
- Observe: returns list of 3 open PRs with titles and ages
Cycle 2:
- Think: PR #47 is 4 days old — I should check the review status and diff
- Act: call exec("gh pr view 47 --json reviews,files,body")
- Observe: returns PR details — no reviews, 3 files changed
Cycle 3:
- Think: PR #51 has a failing CI check — I should check the log
- Act: call exec("gh run list --branch feat/filter-fix | head -5")
- Observe: returns failed test run
Cycle 4:
- Think: I have enough context to summarise
- Act: return response to user with summary of what needs attention
From my perspective: I asked a question. It came back with a useful summary. Underneath: four cycles of reasoning and tool use, each building on the last.
This is what separates an agent from a chatbot. The chatbot would have told me how to check PRs. The agent just did it.
Agentic Failure Modes
Building agents well requires understanding how they fail. The main failure modes:
Goal drift. The agent completes a plausible-looking task that isn’t actually what you asked for. Mitigation: precise goal specification, output validation.
Runaway loops. The agent gets stuck in a loop — taking actions, observing results that don’t satisfy the goal, taking more actions, not knowing when to stop. Mitigation: explicit termination conditions, maximum iteration limits.
Tool misuse. The agent calls the wrong tool, or calls the right tool with incorrect parameters, or calls a destructive tool when it shouldn’t. Mitigation: clear tool descriptions, approval gates for irreversible actions.
Context window exhaustion. Long agent runs fill up the context window, causing the agent to lose track of earlier context. Mitigation: periodic summarisation, context management strategies.
Hallucinated tool calls. The model invents tool calls that don’t exist, or invents parameter values. Mitigation: strict tool schemas, validation before execution.
None of these are showstoppers. All of them are manageable with good design. But you have to build with them in mind.
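Two of those mitigations — strict schemas and validation before execution — can be sketched concretely. This is an illustrative validator, assuming the schema format from the `read_file` example earlier; a production system would use a real JSON Schema library.

```python
# Hypothetical schema registry, mirroring the read_file definition above.
SCHEMAS = {
    "read_file": {
        "required": ["path"],
        "properties": {"path": {"type": "string"}},
    },
}

def validate_tool_call(name, args):
    """Return an error string if the call is invalid, or None if safe to execute."""
    schema = SCHEMAS.get(name)
    if schema is None:
        return f"Unknown tool: {name}"              # hallucinated tool name
    for param in schema["required"]:
        if param not in args:
            return f"Missing required parameter: {param}"
    for param, value in args.items():
        spec = schema["properties"].get(param)
        if spec is None:
            return f"Unexpected parameter: {param}"  # invented parameter
        if spec["type"] == "string" and not isinstance(value, str):
            return f"Parameter {param} must be a string"
    return None
```

Rejected calls get fed back to the model as observations, so it can correct itself rather than silently executing something malformed.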
Agents Are Infrastructure, Not Magic
The mental model I want you to have: an agent is a piece of software infrastructure that has an LLM as its reasoning core.
The LLM decides what to do. The surrounding infrastructure — tool execution, memory management, session handling, error recovery — makes it reliable and useful in production.
Most “AI agent” failures are infrastructure failures, not model failures. The model reasoned correctly; the surrounding system didn’t execute the action properly, didn’t feed back the result, didn’t handle the error case, didn’t persist the state. Build the infrastructure right and the model performs remarkably well. Skimp on it and you’ll have a fragile system that works in demos and breaks in production.
What’s Next
You now understand what an agent is: a reasoning loop that uses tools to take actions in pursuit of a goal. The next question is: how does it maintain continuity over time? How does it remember what happened yesterday, last week, across hundreds of sessions?
That’s the memory problem. Module 4 covers it in full.
Further Reading
- [Paper] ReAct: Synergizing Reasoning and Acting — Yao et al., 2022 — The paper that formalised the Reason+Act loop. The concrete examples are the most useful part.
- [Blog] LLM Powered Autonomous Agents — Lilian Weng — The single best overview of agent architecture. Covers planning, memory, tools, and action in one thorough post.
- [Docs] Anthropic Tool Use Documentation — The practical reference for implementing tool calling with Claude. Clear examples and schema definitions.
- [Docs] Model Context Protocol (MCP) — Anthropic's open standard for tool and context integration. Increasingly the lingua franca for agent tools.
- [Paper] A Survey on LLM-based Autonomous Agents — Wang et al., 2023 — Comprehensive academic overview of the agent landscape. Useful for understanding the full design space.
- [Course] Berkeley CS294 - LLM Agents — The most directly relevant university course to this curriculum. The entire syllabus is built around LLM-based agent systems. Watch alongside Modules 3–7.
- [Course] CMU 11-667 - Large Language Models — CMU's dedicated LLM course. Covers pretraining, fine-tuning, alignment, and agents. Technically rigorous without requiring a PhD.
- [Course] CMU 11-711 - Advanced NLP — Graham Neubig's course, strong on modern LLM capabilities and limitations. Good for building accurate mental models of what these systems can and can't do.
- [Course] Hugging Face Agents Course — Hands-on agents course with working code. Covers the same conceptual ground as this module but through implementation. Build alongside reading.