The 5 different levels of AI agents

What are the 5 different levels of AI agents?

AI agents are defined as artificial entities capable of perceiving their surroundings, making decisions and taking action based on the tools available to them.

Considerations on the domain

There has been a lot of hype, scaremongering and speculation about Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI), and about what organisations are preparing for. What is most interesting, however, is how to harness the power of LLMs and autonomous AI agents for implementations in specific domains within organisations.

The main commercial drivers of conversational user interfaces are banks, financial services companies and other commercial enterprises, which build AI-based user interfaces so that users can interact with their products and services.

Any entity capable of perceiving its surroundings and performing actions can be considered an agent.

Where are we now?

Considering limited-domain implementations, we are currently between levels two and three, most likely around level 2.5.

LangChain has led the way in creating frameworks for agent development, alongside DSPy for LLM programming and LlamaIndex with its agentic RAG approach.

Agents at these levels perform on par with between 50% and 90% of skilled adults and are able to automate strategic tasks. Given user input, an agent can break down the user’s request, plan subtasks, and execute those subtasks in order to reach a conclusion.

These agents are able to iterate on intermediate subtasks until they reach a conclusive answer.

Practical example

Consider the following question: Who is considered the father of the iPhone, and what is the square root of his year of birth?

This is a rather ambiguous and complex question that requires a series of steps to answer. It involves a mathematical task, but it also requires knowledge retrieval.

For this practical example, the agent has the following at its disposal:

LLM Math, a tool for mathematical calculations.
SerpApi, which allows you to extract data from search engine results.
GPT-4 (gpt-4-0314), the underlying LLM.

Let us now consider the output of this LangChain-based agent and observe how the agent moves sequentially from thought to action to observation until it reaches a final answer and the chain ends.
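The thought, action, observation cycle can be sketched in plain Python without LangChain. This is a minimal illustration, not the agent's actual implementation: the `search` and `calculator` functions are hypothetical stand-ins for SerpApi and LLM Math, and `scripted_llm` replays canned decisions in place of real GPT-4 calls.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# One recorded step: (thought, action, action_input, observation)
Step = Tuple[str, str, str, str]


def search(query: str) -> str:
    """Hypothetical stand-in for a web search tool; returns a canned result."""
    return "Steve Jobs, born 1955"


def calculator(expression: str) -> str:
    """Hypothetical stand-in for a math tool; only understands 'sqrt(<number>)'."""
    if not (expression.startswith("sqrt(") and expression.endswith(")")):
        raise ValueError("unsupported expression: " + expression)
    number = float(expression[len("sqrt("):-1])
    return f"{math.sqrt(number):.2f}"


# Canned decisions that a real LLM would produce one at a time (assumption).
SCRIPT = [
    ("I need to find out who is called the father of the iPhone.",
     "search", "father of the iPhone year of birth"),
    ("I need the square root of 1955.", "calculator", "sqrt(1955)"),
    ("I now know the final answer.", "finish",
     "Steve Jobs; the square root of 1955 is about 44.22"),
]


def scripted_llm(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Return the next (thought, action, action_input) from the canned script."""
    return SCRIPT[len(history)]


def run_agent(question: str,
              llm: Callable[[str, List[Step]], Tuple[str, str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> Tuple[Optional[str], List[Step]]:
    """Loop thought -> action -> observation until the model says 'finish'."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, action_input = llm(question, history)
        if action == "finish":
            return action_input, history
        observation = tools[action](action_input)
        history.append((thought, action, action_input, observation))
    return None, history  # step budget spent without a final answer


answer, trace = run_agent(
    "Who is considered the father of the iPhone, and what is the square "
    "root of his year of birth?",
    scripted_llm,
    {"search": search, "calculator": calculator},
)
```

Each entry in `trace` holds one thought, the tool that was chosen, the tool input and the resulting observation, which mirrors the sequence the agent moves through before the chain ends.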

In the table showing the five levels of agents, you will notice that level one agents are rule-based… Rule-based agents can have a certain degree of autonomy, but in practice they consist of predefined steps that are executed according to predefined rules.

Basic structure of restricted domain agents

Agents are underpinned by a large language model (LLM). Agents also have access to a range of tools. Tools can have specific functionalities, such as web search, specific APIs, RAG, mathematics, and more.

The tools are described in natural language so that the agent knows which tool to use at a specific stage of the process. The number and capabilities of the tools determine the power of the agent.
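One way to picture this is a small tool registry whose descriptions are rendered into the prompt sent to the LLM. The sketch below is an assumption about how such a registry could look; the `Tool` class, `render_tool_prompt` function and the tool descriptions are hypothetical and do not come from any specific framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str  # natural-language text the LLM reads to pick a tool
    func: Callable[[str], str]


def render_tool_prompt(tools: List[Tool]) -> str:
    """Build the part of the prompt that tells the LLM which tools exist."""
    lines = ["You can use the following tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


# Placeholder tools; the lambdas stand in for real implementations.
tools = [
    Tool("search", "Look up current facts on the web. Input: a search query.",
         lambda query: "..."),
    Tool("calculator", "Evaluate arithmetic. Input: a math expression.",
         lambda expression: "..."),
]

prompt = render_tool_prompt(tools)
```

Because the LLM chooses tools from these descriptions alone, vague or overlapping descriptions are a likely source of wrong tool choices.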

Practical considerations

If we return to considering the implementation of agents in limited domains, we need to take certain practical considerations into account.

Sensory

Most current agents are virtual and accessed via voice or text. These agents are capable of reasoning and reaching conclusions, and in turn responding with voice or text. Multimodal elements can be added where agents can receive images or videos as input, or generate images or videos as output.

However, agents generally do not have other sensory capabilities such as sight, touch, movement, etc. With all the advances made in the field of robotics, the combination of agents with sensory/physical capabilities will mark the beginning of a new era.

LLM Backbone

As I mentioned earlier, the agent has an LLM as its backbone, or more precisely, an LLM API that it calls. Agents go through multiple iterations and API calls, which makes the LLM a single point of dependency that needs to be managed. For any production agent implementation, I would say that redundancy needs to be built into the agent’s backbone.

Self-hosted LLMs or local inference servers are the best way to ensure operational readiness.
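Redundancy in the backbone can be as simple as trying a list of LLM backends in order. The following is a minimal sketch under stated assumptions: `hosted_api` and `local_server` are hypothetical functions that simulate a failing commercial API and a working local inference server, and no real client library is used.

```python
from typing import Callable, List, Tuple


class AllBackendsFailed(RuntimeError):
    """Raised when every configured LLM backend has failed."""


def complete_with_fallback(
    prompt: str,
    backends: List[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, completion)."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


def hosted_api(prompt: str) -> str:
    """Hypothetical commercial API that is currently unavailable."""
    raise TimeoutError("hosted API timed out")


def local_server(prompt: str) -> str:
    """Hypothetical self-hosted model that answers locally."""
    return f"local answer to: {prompt}"


used, reply = complete_with_fallback(
    "ping", [("hosted", hosted_api), ("local", local_server)]
)
```

Since an agent makes several LLM calls per question, the fallback would apply to every call in the loop, not only to the first one.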

Cost

Using commercial LLM APIs can become very expensive, because the LLM is queried multiple times for each question put to the agent.

Scaling to thousands of users only exacerbates the cost issue.
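A rough back-of-the-envelope estimate shows how the multiplier works. All numbers below are placeholders chosen for illustration (calls per question, token counts, prices and usage volumes are assumptions, not figures from any provider's price list).

```python
def cost_per_question(llm_calls: int,
                      input_tokens_per_call: int,
                      output_tokens_per_call: int,
                      usd_per_1k_input: float,
                      usd_per_1k_output: float) -> float:
    """Estimate the LLM cost of one agent question in US dollars."""
    per_call = (input_tokens_per_call / 1000 * usd_per_1k_input
                + output_tokens_per_call / 1000 * usd_per_1k_output)
    return llm_calls * per_call


# Placeholder assumptions: 4 LLM calls per question, 1,500 input tokens and
# 200 output tokens per call, $0.03 / $0.06 per 1,000 input / output tokens.
per_question = cost_per_question(4, 1500, 200, 0.03, 0.06)

# Placeholder usage: 1,000 users asking 10 questions a day for 30 days.
questions_per_month = 1000 * 10 * 30
monthly = per_question * questions_per_month
```

With these placeholder figures a single question costs roughly $0.23, and 300,000 questions a month come to roughly $68,400. The cost grows linearly with both the number of LLM calls per question and the number of users.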

Latency

Conversational systems require responses within seconds; any complex system, such as agents that must perform multiple internal steps for each dialogue turn, increases the total latency perceived by the user.

This can become a challenge to overcome.
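The effect can be estimated by adding up the sequential steps within one dialogue turn. The timings and the five-second budget below are illustrative assumptions, not measurements.

```python
from typing import List, Tuple


def turn_latency(steps: List[Tuple[float, float]]) -> float:
    """Sum (llm_seconds, tool_seconds) over steps that run one after another."""
    return sum(llm_seconds + tool_seconds for llm_seconds, tool_seconds in steps)


# Placeholder timings for a three-step turn: each step is one LLM call
# followed by an optional tool call.
steps = [(1.5, 0.5), (1.5, 0.1), (1.5, 0.0)]
total_seconds = turn_latency(steps)

# Hypothetical budget for what still feels responsive in a conversation.
BUDGET_SECONDS = 5.0
within_budget = total_seconds <= BUDGET_SECONDS
```

Here three modest steps already add up to a little over five seconds, so every additional reasoning step has to be weighed against the response time the user experiences.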

Failure to reach a conclusion

It is important to note that there are currently cases where the agent does not reach a conclusion or reaches a premature conclusion. If the user can access and view the agent’s reasoning steps, the user’s request may be satisfied through intermediate steps in the agent’s reasoning. In this case, the user can interrupt the agent and inform it that sufficient information has been provided.
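One possible way to support this is to expose the agent's reasoning as a stream of intermediate steps with a hard step limit, so the caller can stop consuming it at any time. The sketch below assumes a hypothetical `canned_step` function in place of a real reasoning step, and simulates the user's interruption with a simple counter.

```python
from typing import Callable, Iterator, List, Optional


def iterate_steps(step_fn: Callable[[int], Optional[str]],
                  max_steps: int) -> Iterator[str]:
    """Yield intermediate findings until step_fn returns None or the
    step budget is spent, whichever comes first."""
    for index in range(max_steps):
        finding = step_fn(index)
        if finding is None:  # the agent reached its own conclusion
            return
        yield finding


def canned_step(index: int) -> Optional[str]:
    """Hypothetical reasoning step that never concludes on its own."""
    return f"intermediate finding {index}"


seen: List[str] = []
for finding in iterate_steps(canned_step, max_steps=10):
    seen.append(finding)
    if len(seen) == 3:  # stand-in for the user saying "that is enough"
        break
```

The step limit guards against an agent that never concludes, while the stream of findings lets the user interrupt as soon as an intermediate step answers the request.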

The term agent

As AI advances, the term agent is used to describe entities that demonstrate intelligent behaviour and possess capabilities such as:

autonomy,
reactivity,
proactivity and
social interaction.

In the 1950s, Alan Turing introduced the iconic Turing Test, a fundamental concept in AI designed to investigate whether machines can exhibit intelligent behaviour similar to that of humans. These AI entities are often called agents and form the fundamental components of AI systems.

Tools and costs

Agents must have access to the tools necessary to perform their tasks. There could be an entire marketplace where tools are created and shared, so that creators do not have to build tools from scratch but can select an existing one.

These tools may be free or paid, and the tools themselves may access paid APIs.

Transfer learning

Transfer learning involves taking the knowledge acquired in one task and applying it to another.

Foundation models tend to adhere to this approach, in which a model is initially trained on a related task and subsequently refined for the specific task of interest.

Transfer learning is a powerful concept and increases the versatility of models, which can perform tasks never seen before based on previous learning.

Conclusion

The fact that autonomous AI agents represent a fundamental advance in technology is being overlooked.

Agents equipped with artificial intelligence have the ability to:

Operate independently,
Make decisions and
Act without constant human intervention.

In the future, autonomous AI agents are set to revolutionise sectors ranging from healthcare to finance, manufacturing to transport.

However, there are considerations relating to accountability, transparency, ethics, responsibility and impartiality in decision-making.

Despite these challenges, the future of autonomous AI agents is very promising. As technology evolves, these agents will become increasingly integrated into our daily lives.