Module 2: Thinking in Prompts
The system prompt is code. Most people treat it like a search box. That's why their results are mediocre.
This is Module 2 of a 12-part curriculum: Build Software Products with AI — From First Principles to Production Pipeline.
If Module 1 was about understanding what an LLM is, this module is about learning to communicate with it properly.
Most people write prompts the same way they’d write a Google search: a few words, a rough intent, and an expectation that the system figures out the rest. That works for casual queries. It fails badly when you’re building production systems that need to behave consistently.
Prompting at a professional level is software engineering. It has structure, logic, and discipline. It produces reliable, repeatable outputs. Let’s cover how it actually works.
The System Prompt is Your Specification
Every serious LLM application has a system prompt. It’s the first thing the model reads before any user input. Think of it as the contract between you and the model — it defines who the model is, what it’s doing, what constraints it operates under, and how it should behave.
A weak system prompt:
You are a helpful assistant.
A strong system prompt:
You are a senior backend engineer reviewing pull requests for a TypeScript codebase.
Your job is to:
- Identify logic errors, edge cases, and security vulnerabilities
- Flag anything that violates the project's coding standards (defined below)
- Suggest concrete improvements with code examples
- Ignore cosmetic issues like variable naming unless they create ambiguity
Coding standards:
- All async functions must have explicit error handling
- No raw SQL — use the ORM
- Functions longer than 40 lines should be decomposed
Tone: direct and specific. No praise. No filler. Just the issues and the fixes.
Notice what the strong version does: it defines the role, the task, the specific behaviors, the constraints, and the tone. The model has a precise specification. It will behave more consistently and produce more useful output.
The rule: anything you’d tell a new hire on their first day belongs in the system prompt.
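To make this concrete, here is a minimal sketch of wiring that specification into an API call using Anthropic's TypeScript SDK. The model name and the shape of the prompt constant are illustrative assumptions, not the only way to do it.

import Anthropic from "@anthropic-ai/sdk";

// The specification lives in code, not in someone's chat history.
const SYSTEM_PROMPT = `You are a senior backend engineer reviewing pull requests
for a TypeScript codebase.
[...the full specification from above...]`;

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function reviewDiff(diff: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514", // illustrative; pin whichever model you target
    max_tokens: 1024,
    system: SYSTEM_PROMPT, // the contract goes here, before any user input
    messages: [{ role: "user", content: `Review this PR diff:\n\n${diff}` }],
  });
  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}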
Context Injection
The model only knows what’s in its context window. This means that any information it needs to do the job — documents, data, previous decisions, project context — has to be explicitly inserted.
This is context injection. You’re not asking the model to recall something. You’re placing the relevant information in front of it.
Common patterns:
Document injection:
Here is the relevant section of the codebase:
<file path="src/api/users.ts">
[file contents]
</file>
Now review this PR diff:
[diff]
State injection:
Current project state:
- Sprint: 14
- Open blockers: 2 (see below)
- Last deployment: 2026-05-07
[blocker details]
Given this context, what should we prioritize this week?
History injection:
Previous decisions:
- 2026-04-20: Decided to use Postgres over MongoDB for relational consistency
- 2026-04-28: Deferred auth to Clerk, not building in-house
Given these constraints, evaluate the following architecture proposal:
Well-structured context injection is what separates mediocre AI output from genuinely useful output. The model isn’t magic — it needs the right information.
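Mechanically, context injection is nothing exotic: it is deliberate string assembly. A minimal sketch in TypeScript, where the tag convention and helper names are my own rather than any standard:

// Wrap each injected document in explicit tags so the model can tell
// where one source ends and the next begins.
function injectFile(path: string, contents: string): string {
  return `<file path="${path}">\n${contents}\n</file>`;
}

function buildReviewPrompt(
  files: Array<{ path: string; contents: string }>,
  diff: string
): string {
  const context = files.map((f) => injectFile(f.path, f.contents)).join("\n\n");
  return [
    "Here is the relevant section of the codebase:",
    context,
    "Now review this PR diff:",
    diff,
  ].join("\n\n");
}

The tags are for the model's benefit: explicit boundaries stop it from blurring one document into the next.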
Few-Shot Patterns
One of the most reliable ways to improve output quality is to show the model examples of what you want. This is called few-shot prompting.
Instead of describing the desired output format, you demonstrate it.
Zero-shot (no examples):
Extract the action items from this meeting transcript.
Few-shot (with examples):
Extract the action items from this meeting transcript.
Here is an example of the format I want:
Input: "Sam to follow up with the client by Friday. The team will review the proposal next Tuesday."
Output:
- [ ] Sam: Follow up with client — due Friday
- [ ] Team: Review proposal — due next Tuesday
Now extract from this transcript:
[transcript]
The few-shot version produces structured, consistent output because the model has a concrete template to follow. For anything where format matters — reports, structured data, code generation — always provide examples.
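Few-shot examples can live inline in the prompt, or as prior turns in the message array, which many chat APIs accept. A sketch of the message-array style, assuming a generic chat-message shape:

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Each example is a user/assistant pair: the model sees a worked
// demonstration of the exact input-to-output mapping before the real input.
const fewShot: ChatMessage[] = [
  {
    role: "user",
    content:
      'Extract action items:\n"Sam to follow up with the client by Friday. The team will review the proposal next Tuesday."',
  },
  {
    role: "assistant",
    content:
      "- [ ] Sam: Follow up with client — due Friday\n- [ ] Team: Review proposal — due next Tuesday",
  },
];

function buildMessages(transcript: string): ChatMessage[] {
  return [
    { role: "system", content: "You extract action items from meeting transcripts." },
    ...fewShot,
    { role: "user", content: `Extract action items:\n${transcript}` },
  ];
}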
Chain-of-Thought
For reasoning tasks — analysis, diagnosis, multi-step problem solving — instructing the model to reason step by step before arriving at an answer dramatically improves accuracy.
Without chain-of-thought:
Is this business model viable? [description]
With chain-of-thought:
Evaluate whether this business model is viable.
Think through this step by step:
1. What is the core value proposition?
2. Who is the customer and what is their willingness to pay?
3. What are the unit economics at scale?
4. What are the three biggest risks?
5. Based on the above, what is your overall assessment?
Why does this work? Because the model’s output at each step becomes part of the context for the next step. You’re effectively forcing the model to build a chain of intermediate reasoning before reaching a conclusion, rather than pattern-matching directly to a plausible-sounding answer.
For complex analytical tasks, explicit reasoning steps are not optional — they’re the difference between a thoughtful analysis and a confident-sounding guess.
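One practical addition: when you force intermediate reasoning, give the final answer a fixed, parseable location. The <assessment> tag below is my own convention, not a model feature:

// Ask the model to reason in the open, then close with a tagged verdict.
const COT_SUFFIX = `
Work through steps 1-4 in prose. Then give your overall assessment
inside <assessment>...</assessment> tags, in one sentence.`;

// Pull the verdict out of the response; the reasoning stays available for audit.
function extractAssessment(response: string): string | null {
  const match = response.match(/<assessment>([\s\S]*?)<\/assessment>/);
  return match ? match[1].trim() : null;
}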
ReAct: Reason + Act
ReAct is a prompting pattern designed for agents that need to use tools. It alternates between reasoning (what should I do next and why?) and acting (call a tool, process the result).
The structure:
Thought: I need to check the current state of the database before deciding whether to run the migration.
Action: query_database(SELECT COUNT(*) FROM users)
Observation: 847,203 rows
Thought: The table is large enough that a blocking migration would be risky. I should suggest an online migration approach.
Action: generate_migration_plan(online=true)
...
This pattern is important for agent builders because it makes the agent’s reasoning legible. You can see what it decided to do and why. That makes debugging tractable — you can identify exactly where the reasoning went wrong and fix the prompt or the tool description.
Most modern agent frameworks implement ReAct implicitly through tool calling, but understanding the underlying structure helps you design better agents and diagnose failures.
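Hand-rolling the loop is instructive even if you never ship it. Everything below is illustrative: callModel and runQuery stand in for your LLM client and database layer, and the tool registry is an assumption, not any particular framework's API.

// Hypothetical stand-ins for your LLM client and database layer.
declare function callModel(transcript: string): Promise<string>;
declare function runQuery(sql: string): Promise<string>;

// Tool registry: name -> implementation. The agent may only call these.
const tools: Record<string, (input: string) => Promise<string>> = {
  query_database: async (sql) => runQuery(sql),
};

async function reactLoop(task: string, maxSteps = 10): Promise<string> {
  let transcript = `Task: ${task}\n`;
  for (let step = 0; step < maxSteps; step++) {
    const turn = await callModel(transcript); // model emits Thought/Action text
    transcript += turn + "\n";

    // A final answer ends the loop.
    const final = turn.match(/Final Answer:\s*([\s\S]*)/);
    if (final) return final[1].trim();

    // Otherwise, execute the requested action and feed back an Observation.
    const action = turn.match(/Action:\s*(\w+)\((.*)\)/);
    if (action) {
      const [, name, input] = action;
      const result = tools[name] ? await tools[name](input) : `Unknown tool: ${name}`;
      // The observation re-enters the context; this is what makes the
      // next Thought grounded in real data rather than speculation.
      transcript += `Observation: ${result}\n`;
    }
  }
  throw new Error("Agent exceeded max steps without a final answer");
}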
Structured Output
Production systems almost always need predictable, machine-readable output. Free-form text is fine for human consumption. It’s a nightmare for downstream processing.
The solution is to specify output format explicitly — and to enforce it where your model and framework support it.
Instruct format directly:
Return your response as a JSON object with this exact structure:
{
"severity": "low" | "medium" | "high" | "critical",
"summary": string (one sentence),
"affected_components": string[],
"recommended_action": string
}
Return only the JSON object. No preamble, no explanation.
Use JSON mode or structured outputs where available. Anthropic, OpenAI, and most providers support constrained generation — where the model is forced to produce valid JSON matching a schema. Use this in production instead of parsing free text and hoping.
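As a sketch, JSON mode with OpenAI's Node SDK looks roughly like this. The model name is illustrative, and the runtime check matters: JSON mode guarantees syntax, not your schema.

import OpenAI from "openai";

interface IncidentReport {
  severity: "low" | "medium" | "high" | "critical";
  summary: string;
  affected_components: string[];
  recommended_action: string;
}

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function triage(incident: string): Promise<IncidentReport> {
  const response = await client.chat.completions.create({
    model: "gpt-4o", // illustrative
    // JSON mode: the model is constrained to emit syntactically valid JSON.
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Return a JSON object with keys: severity, summary, affected_components, recommended_action. Severity must be one of low|medium|high|critical.",
      },
      { role: "user", content: incident },
    ],
  });
  const parsed = JSON.parse(response.choices[0].message.content ?? "{}");
  // Valid JSON is not the same as your schema: validate before trusting it.
  if (!["low", "medium", "high", "critical"].includes(parsed.severity)) {
    throw new Error(`Schema violation: severity=${parsed.severity}`);
  }
  return parsed as IncidentReport;
}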
Structured output is the bridge between LLM reasoning and reliable software. Get comfortable designing schemas.
Common Prompt Failure Modes
A few patterns I see regularly that degrade output quality:
Ambiguous instructions. “Write a summary” — of what length? For what audience? Including what information? The model will guess. Specify.
Competing instructions. “Be concise but thorough” is a contradiction. Pick one or define the tradeoff explicitly: “Be thorough on technical details; skip background context.”
No persona. The model defaults to a generic helpful assistant. Give it a role that carries the relevant expertise and constraints.
Asking for everything in one shot. Complex tasks benefit from decomposition. Don’t ask for a complete architecture, implementation plan, risk assessment, and cost estimate in a single prompt. Break it into steps where earlier output informs later steps.
Ignoring the system prompt in favour of user messages. In multi-turn applications, the system prompt persists. User messages extend the conversation. Don’t repeat the full context in every user turn — that’s what the system prompt is for.
Prompt Engineering is Software Engineering
The mental model shift that changes everything: a prompt is a program.
It has inputs (the user’s request, injected context), processing logic (the instructions you’ve given the model), and outputs (the model’s response). Like any program, it can be tested, versioned, iterated, and optimised.
This means:
- Store your prompts in version control, not in someone’s head
- Test prompt changes before deploying them (a changed prompt is a changed program; see the sketch after this list)
- Log inputs and outputs so you can diagnose failures
- Treat prompt engineering with the same rigour as code review
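Concretely, the first two points can be as lightweight as this. The file layout and test runner (Vitest) are assumptions, not a prescription:

// prompts/review.ts: the prompt is a versioned, reviewable artifact.
export const REVIEW_PROMPT_VERSION = "2.3.0";
export const REVIEW_PROMPT = `You are a senior backend engineer reviewing pull requests...`;

// prompts/review.test.ts: assert invariants that must survive any edit.
import { describe, it, expect } from "vitest";
import { REVIEW_PROMPT } from "./review";

describe("review prompt", () => {
  it("still pins the output contract", () => {
    // If someone deletes the constraints, CI fails before production does.
    expect(REVIEW_PROMPT).toContain("error handling");
    expect(REVIEW_PROMPT).toContain("No raw SQL");
  });
});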
The teams that build the best AI products treat prompts as first-class engineering artifacts. The teams that don’t, ship products that work sometimes and fail mysteriously.
What’s Next
You can now write prompts that behave like software — structured, intentional, testable. In Module 3, we go up another level: what happens when you give a model the ability to take actions? That’s an agent. And it changes everything about how you think about these systems.
Further Reading
- [Paper] Chain-of-Thought Prompting Elicits Reasoning in LLMs — Wei et al., 2022 — The paper that established chain-of-thought as a reliable technique. Read the examples section.
- [Paper] ReAct: Synergizing Reasoning and Acting in LLMs — Yao et al., 2022 — The ReAct framework formalised here is the foundation of modern agent design. Essential reading.
- [Blog] Prompt Engineering Guide — Lilian Weng — The most comprehensive single resource on prompting techniques. Covers everything in this module and more.
- [Docs] Anthropic Prompt Engineering Guide — Claude-specific prompting best practices from the team that built the model. Practical and well-organised.
- [Blog] What I've learned about prompt engineering — Simon Willison — Sceptical, evidence-based takes from someone who has stress-tested these techniques in production.
- [Course] Stanford CS224N: NLP with Deep Learning — The definitive university NLP course. Covers attention, transformers, and LLMs from first principles. If you want to understand why prompt structure matters at a model level, this is where to go.
- [Course] Hugging Face NLP Course — Practical, code-heavy, free. The fastest path from zero to using transformer models in real projects. Pairs well with the concepts in this module.