Interactive Lesson

Agentic AI Theory

Note

This document explains the core concepts behind agentic AI systems. No prior experience with LLMs is required, but familiarity with Python and basic APIs will help.

What Is an Agent?

A regular LLM interaction is stateless and one-shot: you send a prompt, the model replies, and the conversation ends. An AI agent changes this dynamic. It gives the language model a loop — the ability to reason, act, observe the result of that action, and then reason again.

Dimension	Plain LLM	AI Agent
State	Stateless (one-shot)	Stateful (conversation history)
Tools	None	Filesystem, code execution, search…
Iteration	Single response	Loop until done or limit reached
Output	Text only	Text + side effects (files written, code run)
Risk	Low	Higher — actions have real consequences

The Reasoning-Acting Loop (ReAct)

The dominant pattern in modern agentic systems is ReAct (Reason + Act), introduced in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models".

The loop looks like this:

LaTeX

\text{Thought} \to \text{Action} \to \text{Observation} \to \text{Thought} \to \cdots \to \text{Final Answer}

Thought — The model reasons about what it needs to do next.
Action — The model calls a tool (e.g., read_file, run_python_file).
Observation — The tool returns a result, which is fed back into the context.
The model reads the observation and restarts the cycle.

Tip

This project implements a clean version of this loop in main.py and providers/gemini.py. The loop stops after MAX_ITERS iterations (set in config.py) to prevent runaway execution.

A Visual Walk-Through

Columns Compare

Without an agent

The model gets a question and immediately generates a final answer — sometimes correctly, but with no ability to verify.

With an agent (ReAct)

The model can look at real files, run real code, and check real output before committing to an answer. The answer is grounded in evidence.

Why Tools Matter

Without tools, a model can only work with what is already in its context window. Tools break this constraint by letting the agent:

Inspect the environment — read files, list directories, search for text.
Take action — write or modify files, run code.
Verify results — run a Python script and see whether it produces the expected output.

Tool calls are defined as structured schemas (JSON dictionaries describing the function name, parameters, and types). The LLM reads these schemas and decides which tool to call and with what arguments. The runtime then executes the actual Python function and returns the result.

Important

The model never runs code directly. It only requests that the runtime does — which is where sandboxing and safety constraints come in. See Safety & Sandbox for details.

Concurrent Tool Execution

A naive agent calls tools one at a time, even when multiple tools could run in parallel. For example, if the model wants to read three files, there is no reason to wait for the first read to finish before starting the second.

This project uses Python's ThreadPoolExecutor to run independent tool calls concurrently within a single reasoning step, reducing latency significantly when the model issues multiple tool calls at once.

python

with ThreadPoolExecutor(max_workers=len(tool_calls)) as executor:
    tool_results = list(executor.map(call_function, tool_calls))

System Prompts and Agent Personality

The system prompt defines the agent's identity, constraints, and operating style. It is included at the start of every conversation and shapes every decision the model makes.

Tip

A concise system prompt is better than a long one. Verbose system prompts consume tokens that could be used for reasoning and can dilute important instructions with noise. This project keeps the system prompt tightly scoped in prompts.py.

Token Budget and Iteration Limits

Agents running in a loop have two key resource constraints:

Token budget — Every message, tool response, and model reply is appended to the conversation history. Context windows are finite.
Iteration limit — A misconfigured agent can loop forever. MAX_ITERS is a hard ceiling.

Keeping tool output concise (truncating long file reads, capping search results) is critical for keeping the context window healthy across many iterations.

Warning

If you increase MAX_ITERS significantly, monitor token usage carefully. Long agent runs on large codebases can exhaust the context window and cause the model to "forget" earlier reasoning steps.