
Safety & Sandbox

Warning

This project deliberately gives an LLM access to filesystem and Python execution tools. That is powerful and inherently risky. Read this page before running the agent on anything you care about.

The Sandbox Boundary

The agent's tools are scoped to the calculator/ directory. Every tool function receives the working directory as an injected argument and validates that each file path the model requests resolves to a location inside it. The model is never told what the working directory is, and relative-path tricks such as ../ are caught by that check.

| Capability       | Inside sandbox (calculator/) | Outside sandbox  |
|------------------|------------------------------|------------------|
| List files       | ✅ Allowed                   | ❌ Blocked       |
| Read files       | ✅ Allowed                   | ❌ Blocked       |
| Write files      | ✅ Allowed                   | ❌ Blocked       |
| Run Python files | ✅ Allowed                   | ❌ Blocked       |
| Network access   | ❌ Not available             | ❌ Not available |
| Shell commands   | ❌ Not available             | ❌ Not available |
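
The path check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name resolve_in_sandbox is hypothetical, and it assumes the check is done by resolving the requested path and comparing it against the working directory.

```python
import os


def resolve_in_sandbox(working_dir: str, requested_path: str) -> str:
    """Return the absolute path for requested_path, or raise if it leaves working_dir.

    Hypothetical helper for illustration; the real tools may validate differently.
    """
    root = os.path.realpath(working_dir)
    # Join first, then resolve symlinks and ".." segments before comparing.
    target = os.path.realpath(os.path.join(root, requested_path))
    # commonpath compares whole path components, so "calculator2/" does not
    # pass as being inside "calculator/".
    if os.path.commonpath([root, target]) != root:
        raise PermissionError(f"{requested_path!r} is outside the working directory")
    return target
```

Resolving before comparing is the important part: a plain string-prefix check on the unresolved path would let ../ segments or symlinks slip through.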

Risks to Be Aware Of

Arbitrary Python Execution

The run_python_file tool runs Python files using the current interpreter. This means any valid Python code in calculator/ can be executed.

Warning

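If the agent writes malicious or buggy code to a file and then runs it, that code will execute. The path checks constrain the tool's arguments, not what the executed code itself does, so unless you add OS-level isolation it runs with the privileges of the user who started the agent. The subprocess.run call has a 30-second timeout, but that is a hang-prevention mechanism, not a security boundary.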
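
To make the timeout's role concrete, here is a rough sketch of what a tool like run_python_file might look like. The signature and the output format are assumptions for illustration, and path validation is omitted for brevity; only the use of subprocess.run, the current interpreter, and the 30-second timeout come from the description above.

```python
import subprocess
import sys


def run_python_file(working_dir: str, file_path: str, timeout: float = 30.0) -> str:
    """Run a Python file with the current interpreter and return its output.

    Illustrative sketch only: the signature and output format are assumed.
    """
    try:
        completed = subprocess.run(
            [sys.executable, file_path],
            cwd=working_dir,        # relative file_path resolves inside working_dir
            capture_output=True,
            text=True,
            timeout=timeout,        # stops hangs; does not restrict what the code does
        )
    except subprocess.TimeoutExpired:
        return f"Error: execution timed out after {timeout} seconds"
    output = f"STDOUT:\n{completed.stdout}\nSTDERR:\n{completed.stderr}"
    if completed.returncode != 0:
        output += f"\nProcess exited with code {completed.returncode}"
    return output
```

Nothing in this call limits file, network, or process access for the child. A script has the full timeout window to do whatever the invoking user could do.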

Unbounded File Writes

The write_file tool overwrites files completely. If the agent makes a mistake, the original content is lost.

Important

Always commit or stash your current state in calculator/ (git commit or git stash) before asking the agent to make large changes. This is your most important safety practice. Note that git stash skips untracked files unless you pass -u (--include-untracked).
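
If you want a second safety net on top of Git, one option is to keep a copy of the previous version whenever a file is overwritten. The sketch below is a hypothetical variant, not part of the project: the name write_file_with_backup and the .bak suffix are assumptions chosen for illustration.

```python
import os
import shutil
from typing import Optional


def write_file_with_backup(path: str, content: str) -> Optional[str]:
    """Overwrite path with content, first copying any existing file to path + '.bak'.

    Hypothetical helper; returns the backup path, or None if there was nothing to back up.
    """
    backup_path = None
    if os.path.isfile(path):
        backup_path = path + ".bak"
        shutil.copy2(path, backup_path)  # keeps the previous version recoverable
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return backup_path
```

This keeps only one previous version per file, so it complements a commit or stash rather than replacing it.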

Context Injection

The system prompt and tool schemas are trusted inputs. Avoid passing untrusted user content directly into the prompt without sanitisation, especially if you later extend the agent to accept input from external sources.


Operational Best Practices

  1. Start small. Give the agent narrow, observable tasks. "Fix the bug on line 12 of main.py" is better than "rewrite the whole calculator."
  2. Use --verbose. The verbose flag logs every tool call and its result. Always use it when debugging agent behaviour.
  3. Review before trusting. Read generated file edits before accepting them. The agent can be confidently wrong.
  4. Commit often. Before any agent session involving file writes, run git commit or git stash.
  5. Constrain iteration count. Keep MAX_ITERS in config.py set to a reasonable value.

Tip

Running python tests.py after an agent session is a good way to quickly verify that the sandbox files are still in a valid state.
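
The iteration cap from item 5 can be pictured as a bounded loop around the model call. This is a simplified sketch under assumptions: step stands in for one round of "call the model and run any requested tools", and the value 20 is a placeholder, not the project's actual MAX_ITERS.

```python
from typing import Callable, Optional

MAX_ITERS = 20  # placeholder value; the real setting lives in config.py


def run_agent_loop(step: Callable[[int], Optional[str]], max_iters: int = MAX_ITERS) -> str:
    """Call step() until it returns a final answer or the iteration budget runs out.

    step is a hypothetical stand-in for one model call plus its tool executions.
    """
    for i in range(max_iters):
        final = step(i)
        if final is not None:
            return final
    # Failing loudly makes a runaway session visible instead of silently truncated.
    raise RuntimeError(f"no final response after {max_iters} iterations")
```

A lower cap bounds how many file writes and script runs a single confused session can perform before a human looks at the result.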


Expanding Tool Access

If you add new tools (e.g., Git inspection, shell commands, network access), treat each new capability as a new attack surface:

  1. Scope it as narrowly as possible.
  2. Add explicit path or command validation.
  3. Test it in isolation via tests.py before exposing it to the agent loop.
  4. Document what the tool can and cannot do in its schema description.

Note

The guiding principle: make every new capability observable, constrainable, and reversible.
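
As an example of steps 1 and 2, a Git inspection tool could accept only a fixed set of read-only subcommands and flags. Everything in this sketch is hypothetical (the function name, the allowlists, and the decision to reject positional arguments); it only illustrates the shape of explicit command validation.

```python
from typing import List

# Read-only inspection subcommands; anything that mutates the repo is excluded.
ALLOWED_SUBCOMMANDS = {"status", "log", "diff"}
# One shared flag allowlist for brevity; a real tool might keep one per subcommand.
ALLOWED_FLAGS = {"--short", "--oneline", "--stat"}


def build_git_command(args: List[str]) -> List[str]:
    """Return an argv list for git, or raise ValueError if args are not allowlisted.

    Hypothetical validator for illustration only.
    """
    if not args or args[0] not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"git subcommand not allowed: {args[:1]}")
    for extra in args[1:]:
        # Rejects unknown flags (e.g. --output=...) and all positional arguments,
        # so the model cannot point git at paths outside the sandbox.
        if extra not in ALLOWED_FLAGS:
            raise ValueError(f"git argument not allowed: {extra!r}")
    return ["git", *args]
```

The returned list is meant to be passed to subprocess.run as an argument list without shell=True, so nothing in it is interpreted by a shell.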


Test Your Understanding

Question 1

What is the sandbox root directory for this agent?

Question 2

Which command should you always run before asking the agent to make large file changes?

Question 3

Which capabilities does the agent NOT have, even inside the sandbox?