Interactive Lesson
Agentic AI Theory
This document explains the core concepts behind agentic AI systems. No prior experience with LLMs is required, but familiarity with Python and basic APIs will help.
What Is an Agent?
A regular LLM interaction is stateless and one-shot: you send a prompt, the model replies, and the conversation ends. An AI agent changes this dynamic. It gives the language model a loop — the ability to reason, act, observe the result of that action, and then reason again.
| Dimension | Plain LLM | AI Agent |
|---|---|---|
| State | Stateless (one-shot) | Stateful (conversation history) |
| Tools | None | Filesystem, code execution, search… |
| Iteration | Single response | Loop until done or limit reached |
| Output | Text only | Text + side effects (files written, code run) |
| Risk | Low | Higher — actions have real consequences |
The Reasoning-Acting Loop (ReAct)
The dominant pattern in modern agentic systems is ReAct (Reason + Act), introduced in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models".
The loop looks like this:
- Thought — The model reasons about what it needs to do next.
- Action — The model calls a tool (e.g.,
read_file,run_python_file). - Observation — The tool returns a result, which is fed back into the context.
- The model reads the observation and restarts the cycle.
This project implements a clean version of this loop in main.py and providers/gemini.py. The loop stops after MAX_ITERS iterations (set in config.py) to prevent runaway execution.
A Visual Walk-Through
Without an agent
The model gets a question and immediately generates a final answer — sometimes correctly, but with no ability to verify.
With an agent (ReAct)
The model can look at real files, run real code, and check real output before committing to an answer. The answer is grounded in evidence.
Why Tools Matter
Without tools, a model can only work with what is already in its context window. Tools break this constraint by letting the agent:
- Inspect the environment — read files, list directories, search for text.
- Take action — write or modify files, run code.
- Verify results — run a Python script and see whether it produces the expected output.
Tool calls are defined as structured schemas (JSON dictionaries describing the function name, parameters, and types). The LLM reads these schemas and decides which tool to call and with what arguments. The runtime then executes the actual Python function and returns the result.
The model never runs code directly. It only requests that the runtime does — which is where sandboxing and safety constraints come in. See Safety & Sandbox for details.
Concurrent Tool Execution
A naive agent calls tools one at a time, even when multiple tools could run in parallel. For example, if the model wants to read three files, there is no reason to wait for the first read to finish before starting the second.
This project uses Python's ThreadPoolExecutor to run independent tool calls concurrently within a single reasoning step, reducing latency significantly when the model issues multiple tool calls at once.
with ThreadPoolExecutor(max_workers=len(tool_calls)) as executor:
tool_results = list(executor.map(call_function, tool_calls))
System Prompts and Agent Personality
The system prompt defines the agent's identity, constraints, and operating style. It is included at the start of every conversation and shapes every decision the model makes.
A concise system prompt is better than a long one. Verbose system prompts consume tokens that could be used for reasoning and can dilute important instructions with noise. This project keeps the system prompt tightly scoped in prompts.py.
Token Budget and Iteration Limits
Agents running in a loop have two key resource constraints:
- Token budget — Every message, tool response, and model reply is appended to the conversation history. Context windows are finite.
- Iteration limit — A misconfigured agent can loop forever.
MAX_ITERSis a hard ceiling.
Keeping tool output concise (truncating long file reads, capping search results) is critical for keeping the context window healthy across many iterations.
If you increase MAX_ITERS significantly, monitor token usage carefully. Long agent runs on large codebases can exhaust the context window and cause the model to "forget" earlier reasoning steps.