Agents Are Just LLMs Running Tools in a Loop. That's Why They Finally Work

Lots of people say they are building agents, but they're not. Here's how to know for sure, because we now have engineering patterns that help us understand agent systems.

Sep 2025 · Dhuv Tandon · 6 mins

Agents

AI Engineering

Practical Insights



“I’ve been trying to create an Agent for that.”

Heard from a customer during a call.

At first I felt incredibly impressed: wow, you've been building an agent, that's awesome! We've been working on this for months, and man, is it hard.

But then it usually unravels into a description of a project that could be anything from a chatbot to a node-based workflow.

I'm not usually a person to dwell on definitions, but I also know they can be useful, especially when there is a lot to digest in the AI world. Some common standards help.

The API standard helped create a marvelous ecosystem of applications integrating with each other to drive value across use cases: making it easier to do business on the internet (Stripe), connect applications together (Zapier), or add telephony (Twilio).

The agent ecosystem was missing this, because the best-accepted definition seemed to be "An agent is an entity that can act independently to achieve objectives, often on behalf of someone or something else". And OpenAI's talk of superintelligence and AGI didn't really help either.

The first time I learned about the term “Agent” was from builders I respected in the space in 2023. 

Parcha's blog, The Hitchhiker's Guide to AI, talked about this here: https://resources.parcha.com/building-ai-agents-in-production/. This is what got me excited about building agents. It is dated Dec 2023:

“An agent is fundamentally software that uses Large Language Models to achieve a goal by constructing a plan and guiding its execution. In the context of Parcha, these agents are designed with certain components in mind.”

But given this definition and where the products were at, it felt oddly unsatisfying. Even the Parcha folks eventually conceded as much in their June 2024 post, "Agents aren't all you need":

“While the concept of agentic behavior was promising, building reliable agentic behavior with large language models (LLMs) was a massive endeavor. Creating general-purpose "autonomous agents" could have taken us years. During that time, we wouldn’t solve the problems our customers cared most about with a product that directly addressed their needs. Our customers required accuracy, reliability, seamless integrations, and a user-friendly product experience—areas where our early versions fell short. They would much rather have a solution that was very accurate and reliable for a subset of tasks than a fully autonomous solution that could automate a workflow end-to-end but worked only 80% of the time. We needed to choose between building the agent or building the product.”

I bring this up not to throw shade at them but to acknowledge that agents weren't fulfilling the promise. There were early teams I tracked, like Reworkd, which launched AgentGPT (a sandbox for viewing an agent's thoughts in a loop), and Gumloop (initially called AgentHub), that moved away from the concept of agents towards something narrower. Decisional did this too, pivoting to a vertical AI agent when the initial goal was an AI-native workflow automation platform.

There were many platforms selling "agents" that were essentially system-prompt tweaks, with very flaky reliability in executing workflows. But now that the models have gotten better, especially with Claude Sonnet 4 and GPT-5, we finally have a framework for what an agent is:

“An LLM agent runs tools in a loop to achieve a goal.”

https://simonwillison.net/2025/Sep/18/agents/

 

At Decisional we started seeing this when we moved from RAG to agentic RAG, where you run an LLM in a loop until it decides it has answered the question. The performance felt mind-blowing: letting the agent do runtime reasoning took our system to sub-1% hallucination rates.
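The "tools in a loop" definition above is small enough to sketch in code. The snippet below is a minimal, illustrative skeleton, not any vendor's API: `fake_llm` stands in for a real model call, and `search_docs` is a toy retrieval tool with made-up content. The shape is what matters: the model either requests a tool or emits a final answer, and tool results are fed back in as observations.

```python
def search_docs(query: str) -> str:
    """Toy retrieval tool (stands in for a real vector/keyword search)."""
    corpus = {"pricing": "Plans start at $20/month.", "refunds": "Refunds within 30 days."}
    return corpus.get(query, "no results")

TOOLS = {"search_docs": search_docs}

def fake_llm(goal: str, observations: list[str]) -> dict:
    """Stand-in for a model call: gathers evidence once, then answers."""
    if not observations:
        return {"action": "tool", "name": "search_docs", "args": {"query": "pricing"}}
    return {"action": "final", "answer": f"Based on the docs: {observations[-1]}"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        step = fake_llm(goal, observations)
        if step["action"] == "final":        # the model decides it's done
            return step["answer"]
        result = TOOLS[step["name"]](**step["args"])
        observations.append(result)          # feed tool output back into the loop
    return "gave up after max_steps"

print(run_agent("What does the product cost?"))
```

Swap `fake_llm` for a real model API and `TOOLS` for real integrations and this is, structurally, an agentic RAG loop: the `max_steps` cap is the usual guard against the loop never terminating.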

But if you run an LLM in a loop, and most LLMs are similar, then what makes different agents have different capabilities?

If you need to simplify this for a non-technical person, I think it boils down to an analogy with a human employee. Your agent employee can do different things based on the following:

The Brain

This is the AI model under the hood; it determines which skills the agent has and which equipment it can use. Some are general-purpose (ChatGPT) and some are more specialised (Claude for code).

The Memory

The agent has access to two types of memory: a filing cabinet (long-term) and a notepad (short-term). The filing cabinet is where larger amounts of documents and information are stored; the notepad helps with the task at hand.

The Tools (What they can do)

Like giving different employees access to different software or equipment.

Most agents are set up with tools depending on the goal they are meant to achieve: you give an employee a stapler if you expect them to staple a lot of documents. Some agents can search the internet. Others can edit documents, run code, or analyze data.

The Workspace

Agents get a workspace, like a virtual computer or a browser, in which to complete the steps towards their responsibilities. The difference between a tool and a workspace is that a workspace is typically not very agent-friendly.

Think about a task like reading a website's technical documentation. For an agent, using a browser is probably not the most efficient way to go about the job. But if you just pass the content of the website to the agent as text, it can work a lot faster.

The difference between a tool and a workspace can be confusing, but it is one of the single biggest unlocks for agent systems.
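The "pass the page as text" idea can be made concrete with nothing but the standard library: instead of having the agent click through a rendered page (workspace), strip the HTML down to plain visible text and hand that to the model (tool). This is a simplified sketch; the sample HTML is made up, and production systems usually reach for heavier-duty extraction.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks: list[str] = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def page_to_text(html: str) -> str:
    """Flatten a page into agent-friendly plain text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

sample_html = ("<html><head><script>var x=1;</script></head>"
               "<body><h1>API Docs</h1><p>POST /v1/run starts a job.</p></body></html>")
print(page_to_text(sample_html))
```

The output is a couple of short text lines the model can read in one pass, versus a browser session's worth of navigation steps: that's the tool-vs-workspace gap in miniature.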

How You Work With Them

Some agents work independently after you give them a goal. Others need you to approve each step they take; it's the difference between delegating to someone and working alongside them. The ones you work alongside are copilots.

[Diagram: an agent's Tools, Brain, Memory, Workspace, and Interaction Style]
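The five axes just described (brain, memory, tools, workspace, interaction style) can be written down as a small spec, which makes comparing agents side by side easy. The field names and example values below are illustrative, not any product's real configuration.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """One row of the brain/memory/tools/workspace/interaction framework."""
    brain: str                    # the underlying model
    long_term_memory: str         # the "filing cabinet"
    short_term_memory: str        # the "notepad"
    tools: list[str] = field(default_factory=list)
    workspace: str = "none"
    interaction: str = "copilot"  # "copilot" or "autonomous"

# Hypothetical spec for an IDE coding agent, loosely modeled on the
# Cursor example discussed below.
coding_agent = AgentSpec(
    brain="Claude Sonnet 4",
    long_term_memory="codebase + agent instructions",
    short_term_memory="open files / current diff",
    tools=["codebase_search", "terminal_run", "grep", "read_file"],
    workspace="IDE",
    interaction="copilot",
)
print(coding_agent.interaction)
```

Filling this struct in for any product you're evaluating is a quick test of whether it's actually an agent or just a system prompt.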

Let's take a look at a few popular agents to make this clearer.

Agent Examples 

Cursor, the IDE-native coding agent

Cursor is a coding agent native to a developer IDE; it sits inside your repo and terminal.

  • Brain: Has a model picker, so you choose. Generally used with Claude Sonnet 4, GPT-5 or Gemini 2.5 Pro

  • Memory: Codebase, agent instructions, files in your IDE

  • Tools: Codebase Search, Terminal Run, Delete File, Grep, Read, List files

  • Workspace: The IDE is the workspace. This is unique because code is structured text, and LLMs are much better at understanding and writing structured text than at operating visual interfaces.

  • How you work with it: Primarily copilot-style (you review/edit changes), but with bursts of autonomy (multi-file refactors, test/runs). This keeps developers in the loop where accuracy is paramount.

Granola, the notes-first meeting agent

Granola is a notepad that listens to your meeting conversations, then captures and merges your notes with the transcript to create a summary.

  • Brain: Generalist LLMs tuned for summarisation of voice data

  • Memory: File Cabinet - searchable transcripts, and notes tied to meetings

  • Tools: Transcription, Search Notes

  • Workspace: A notes-and-transcript canvas (an Apple Notes-like notepad)

  • How you work with it: You take notes alongside it and remain the editor of record

ChatGPT Agent, the unified, mixed-workspace agent

A generalist agent that can do research and use a terminal and browser (based on Operator), all permission-gated.

  • Brain: Uses a version of the o3 series reasoning models.

  • Memory: Creates and edits files within the session

  • Tools: Virtual Browser, Web Search, Terminal Editor

  • Workspace: Chooses between a browser GUI, a text browser, a shell terminal, etc.

  • How you work with it: Initiate via chat and supervise through chat-based interventions

The lesson is that we now have technical patterns that help us identify and differentiate agent systems. It's a new world, and it's going to create a whole new automotive-like industry where different parts combine into different fit-for-purpose vehicles. For example, the BMW B58 engine is used in the 3 Series, 5 Series, X5 SUV, and Z4 Roadster, as well as the new Toyota Supra.

Agent building is moving towards more engineering than vibes and that’s a great thing.


FAQs

What’s an AI Agent?

An LLM running tools in a loop to complete a specific goal.

Why did everyone get this wrong at first?

They were thinking too big. In 2023 everyone wanted to build Jarvis, and they would demo systems that worked 80% of the time, which wasn't good enough.

So what’s changed?

Models got better and were trained to be great at calling tools in a loop, specifically OpenAI's o3 and Claude Sonnet 4.

What makes one Agent different from another?

Same thing that makes one employee better than another. Some are generally smarter (using a better model), some have better tools, some remember things better, and some know the workflow they are supposed to perform.

What’s the difference between a tool and a workspace?

A tool is built for the agent to use and operates on raw text or standard image formats. A workspace makes the agent click through a browser like a human.

What's going to happen next?

Same thing that happened with APIs. This will become a domain of engineering rather than some metaphysical computer god.


San Francisco HQ

6th Street
CA 94103


London
Regents Park

NW1 4SA

Copyright © 2024. All rights reserved
