The 5 different levels of AI agents

AI agents are defined as artificial entities that can perceive their environment, make decisions, and take actions based on the available tools.

Considerations on the domain

There has been a lot of uproar, alarmism, and speculation about Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI), and about what organizations are building. The more interesting question, though, is how to leverage the power of LLMs and autonomous AI agents for domain-specific implementations within organizations.

The major commercial drivers of conversational user interfaces are companies in banking, retail, financial services, and similar sectors that build AI-based interfaces through which users interact with their products and services.

Any entity capable of perceiving its environment and executing actions can be considered an agent.

Where are we currently?

Considering narrow-domain implementations, we are currently between levels two and three, most likely around level 2.5.

LangChain has led the way in frameworks for agent development, alongside DSPy for programming LLMs and LlamaIndex with its agentic RAG approach.

These agents perform on par with 50% to 90% of skilled adults and can automate strategic tasks. Given user input, an agent can break down the user's request, plan subtasks, and execute those subtasks in sequence to reach a conclusion.

These agents are capable of iterating over intermediate subtasks until they arrive at a conclusive answer.

Practical Example

We consider the following question: Who is known as the father of the iPhone and what is the square root of the year of his birth?

This is quite an ambiguous and complex question to answer, which requires following a series of steps to arrive at a solution. There is a mathematical task at hand, but there is also a need to retrieve knowledge to answer the question.

For this practical example, the agent has a few actions available:

  1. LLM Math,

  2. SerpApi, which allows extraction of data from search engine results,

  3. GPT-4 (gpt-4-0314).

Next, consider the output of this LangChain-based agent and observe how the agent moves from thinking to action to sequential observation until it arrives at a final answer and the chain ends.
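
As a minimal sketch of that thought, action, observation cycle (not the LangChain implementation), the loop below uses a scripted stand-in for the LLM and stubbed tools. The names search, calculator, scripted_policy, and run_agent are hypothetical, and the search result is a canned string rather than a real SerpApi response.

```python
import math
import re

QUESTION = ("Who is known as the father of the iPhone and what is "
            "the square root of the year of his birth?")

def search(query: str) -> str:
    # Stand-in for a search tool such as SerpApi; returns a canned snippet.
    return "Steve Jobs, often called the father of the iPhone, was born in 1955."

def calculator(expression: str) -> str:
    # Stand-in for a math tool; only understands "sqrt(<number>)".
    match = re.fullmatch(r"sqrt\((\d+(?:\.\d+)?)\)", expression.strip())
    if not match:
        raise ValueError(f"unsupported expression: {expression}")
    return f"{math.sqrt(float(match.group(1))):.4f}"

TOOLS = {"search": search, "calculator": calculator}

def scripted_policy(question, history):
    """Hypothetical stand-in for the LLM: returns (thought, action, input)."""
    if not history:
        return ("I need to find who is called the father of the iPhone.",
                "search", "father of the iPhone birth year")
    if len(history) == 1:
        # Pull a four-digit year out of the search observation.
        year = re.search(r"\b(1[89]\d\d|20\d\d)\b",
                         history[0]["observation"]).group(1)
        return (f"He was born in {year}; now I need the square root.",
                "calculator", f"sqrt({year})")
    return ("I have everything I need.", "final",
            f"Steve Jobs; the square root of his birth year is about "
            f"{history[1]['observation']}.")

def run_agent(question, policy, tools, max_steps=5):
    """Alternate thought -> action -> observation until a final answer."""
    history = []
    for _ in range(max_steps):
        thought, action, action_input = policy(question, history)
        if action == "final":
            return action_input, history
        observation = tools[action](action_input)
        history.append({"thought": thought, "action": action,
                        "input": action_input, "observation": observation})
    return None, history  # no conclusion within the step budget

if __name__ == "__main__":
    answer, steps = run_agent(QUESTION, scripted_policy, TOOLS)
    for step in steps:
        print("Thought:", step["thought"])
        print("Action:", step["action"], "| Input:", step["input"])
        print("Observation:", step["observation"])
    print("Final answer:", answer)
```

In a real agent the policy would be an LLM call that sees the question plus the accumulated history; the scripted version only makes the control flow visible.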

In the table showing the five levels of agents, you will notice that level one agents are rule-based… Rule-based agents may have some autonomy, but in practice, they consist of predefined steps that are executed based on set protocols.

Basic Structure of Narrow Domain Agents

Agents have a Large Language Model (LLM) as their backbone. Agents also have access to a variety of tools. The tools may have specific capabilities, such as web search, specific APIs, RAG, mathematics, and more.

The tools are described in natural language so that the agent knows which tool to use at a specific stage of the process. The number of tools and the capabilities of the tools determine how powerful the agent is.
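
One way to picture this is a small tool registry whose descriptions are rendered into the prompt the LLM sees. The sketch below is an assumption about how such a registry could look, not the API of any particular framework; Tool, build_tool_prompt, and dispatch are hypothetical names, and the search tool is a stub.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Tool:
    name: str
    description: str              # natural-language text the LLM reads
    run: Callable[[str], str]     # the actual capability

def build_tool_prompt(tools: List[Tool]) -> str:
    """Render the tool descriptions into text that is prepended to the prompt."""
    lines = ["You can use the following tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)

def dispatch(tools: List[Tool], tool_name: str, tool_input: str) -> str:
    """Run the tool the LLM selected by name."""
    by_name = {tool.name: tool for tool in tools}
    if tool_name not in by_name:
        raise KeyError(f"unknown tool: {tool_name}")
    return by_name[tool_name].run(tool_input)

TOOLS = [
    Tool("calculator",
         "Useful for math. Input: a number whose square root is needed.",
         lambda text: str(math.sqrt(float(text)))),
    Tool("search",
         "Useful for looking up facts about people and events. Input: a query.",
         lambda query: f"(stub) results for: {query}"),
]

if __name__ == "__main__":
    print(build_tool_prompt(TOOLS))
    print(dispatch(TOOLS, "calculator", "16"))
```

Adding a capability then means adding one entry with a clear description; the quality of that description largely decides whether the agent picks the tool at the right moment.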

Practical Considerations

If we reconsider the implementations of agents in limited domains, there are some practical considerations to take into account.

Sensory

Most current agents are virtual and are accessed via voice or text. These agents can reason and reach conclusions and, in turn, respond with voice or text. Multimodal elements can be added where agents can receive images or video as input, or generate images or video as output.

However, agents generally do not have other sensory capabilities such as vision, touch, or movement. With ongoing progress in robotics, the combination of agents with sensory and physical capabilities will mark the beginning of a new era.

LLM Backbone

As I mentioned earlier, the agent has an LLM as its backbone, or more specifically, an LLM API that is called. Agents go through multiple iterations and API calls. This makes the LLM API a single point of failure, so any production agent implementation will need redundancy built into the agent's backbone.

Self-hosted LLMs or local inference servers are one way to reduce this dependency and improve uptime.
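
A minimal sketch of such redundancy, assuming each backend is just a function that takes a prompt and returns text: try the backends in order and fall through to the next one on failure. The names call_with_fallback, flaky_hosted_api, and local_model are hypothetical stubs, not real client libraries.

```python
from typing import Callable, Sequence, Tuple

class AllBackendsFailed(RuntimeError):
    """Raised when every configured LLM backend has failed."""

Backend = Tuple[str, Callable[[str], str]]

def call_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, completion)."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))

def flaky_hosted_api(prompt: str) -> str:
    # Stub that simulates an outage of a commercial LLM API.
    raise TimeoutError("hosted API timed out")

def local_model(prompt: str) -> str:
    # Stub that simulates a self-hosted model or local inference server.
    return f"local answer to: {prompt}"

if __name__ == "__main__":
    used, text = call_with_fallback(
        "hello", [("hosted", flaky_hosted_api), ("local", local_model)]
    )
    print(used, "->", text)
```

Because an agent makes several LLM calls per question, the fallback has to wrap every call, not just the first one.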

Cost

Using commercial LLM APIs can be very costly, because the LLM is called multiple times for every question posed to the agent.

Serving thousands of users only exacerbates the cost issue.
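
A back-of-the-envelope estimate shows how the multiplication works. All numbers below (calls per question, tokens per call, price per 1,000 tokens, user counts) are made-up assumptions for illustration, not real pricing.

```python
def cost_per_question(llm_calls: int, tokens_per_call: int,
                      price_per_1k_tokens: float) -> float:
    """Cost of one agent run: every internal step is a billed LLM call."""
    return llm_calls * tokens_per_call / 1000 * price_per_1k_tokens

def monthly_cost(users: int, questions_per_user: int,
                 llm_calls: int, tokens_per_call: int,
                 price_per_1k_tokens: float) -> float:
    """Scale the per-question cost by usage."""
    per_question = cost_per_question(llm_calls, tokens_per_call,
                                     price_per_1k_tokens)
    return users * questions_per_user * per_question

if __name__ == "__main__":
    # Hypothetical figures: 5 LLM calls of 2,000 tokens each at 0.01 per 1k tokens.
    single = cost_per_question(5, 2000, 0.01)
    total = monthly_cost(10_000, 20, 5, 2000, 0.01)
    print(f"per question: {single:.2f}")
    print(f"per month:    {total:,.2f}")
```

Under these assumed figures a single question costs about 0.10, and 10,000 users asking 20 questions each come to roughly 20,000 per month, so the number of internal LLM calls is the first lever to look at.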

Latency

Conversational systems demand sub-second responses; any complex system, like agents that need to perform multiple internal steps for each dialogue turn, adds to the total latency experienced by the user.

This can become a challenge to overcome.
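
To make the effect concrete, the sketch below adds up the latency of sequential agent steps and compares the total with a one-second budget. The step names and durations are illustrative assumptions, not measurements.

```python
from typing import List, Tuple

# Hypothetical per-step latencies in seconds for a single dialogue turn.
STEP_LATENCY_S: List[Tuple[str, float]] = [
    ("llm: plan", 0.8),
    ("tool: search", 0.5),
    ("llm: read search result", 0.8),
    ("tool: calculator", 0.05),
    ("llm: final answer", 0.8),
]

def total_latency(steps: List[Tuple[str, float]]) -> float:
    """Sequential steps add up; the user waits for the sum."""
    return sum(seconds for _, seconds in steps)

def over_budget(steps: List[Tuple[str, float]], budget_s: float = 1.0) -> bool:
    """True when the turn exceeds the conversational latency budget."""
    return total_latency(steps) > budget_s

if __name__ == "__main__":
    print(f"total: {total_latency(STEP_LATENCY_S):.2f}s")
    print("over budget:", over_budget(STEP_LATENCY_S))
```

Even with each step well under a second, the sum of these assumed steps lands near three seconds, which is why fewer iterations, faster models, or streaming intermediate output matter for conversational use.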

Failure to Reach a Conclusion

It is important to note that there are currently cases where the agent does not reach a conclusion, or reaches a conclusion prematurely. If the user can access and see the agent's reasoning steps, the user's query could be satisfied through intermediate steps in the agent's reasoning. In this case, the user can stop the agent and inform it that sufficient information has been provided.
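
A common safeguard is to cap the number of iterations and expose the intermediate steps so the user can stop early. The sketch below is one possible shape for that, with hypothetical names (run_with_limits, step_fn, should_stop) and a stubbed step function that never finishes.

```python
from typing import Callable, Dict, List, Optional

def run_with_limits(
    step_fn: Callable[[int, List[str]], Dict],
    max_iterations: int = 5,
    should_stop: Optional[Callable[[List[str]], bool]] = None,
) -> Dict:
    """Run agent steps until a final answer, a user stop, or the iteration cap."""
    steps: List[str] = []
    for i in range(max_iterations):
        result = step_fn(i, steps)
        steps.append(result["note"])          # visible to the user
        if result.get("final") is not None:
            return {"status": "finished", "answer": result["final"], "steps": steps}
        if should_stop is not None and should_stop(steps):
            return {"status": "stopped_by_user", "answer": None, "steps": steps}
    return {"status": "max_iterations", "answer": None, "steps": steps}

def looping_step(i: int, steps: List[str]) -> Dict:
    # Stub for an agent that keeps searching and never concludes.
    return {"note": f"step {i}: still searching", "final": None}

if __name__ == "__main__":
    print(run_with_limits(looping_step, max_iterations=3))
    # The user decides two intermediate steps already answer the question.
    print(run_with_limits(looping_step, should_stop=lambda s: len(s) >= 2))
```

Returning the steps alongside the status means a run that hits the cap still hands the user whatever partial reasoning was produced.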

Tools and Costs

Agents need access to tools to perform their tasks. A whole marketplace may emerge where tools are created collaboratively, so that agent builders do not need to create tools from scratch but can select an existing one.

These tools can be free or paid; the tools may access paid APIs.

The Term Agents

As AI has progressed, the term agent has come to describe entities that demonstrate intelligent behavior and have capabilities such as:

  • autonomy,

  • reactivity,

  • proactivity, and

  • social interactions.

In 1950, Alan Turing introduced the iconic Turing Test, a fundamental concept in AI designed to investigate whether machines can exhibit intelligent behavior similar to that of humans. These AI entities are often referred to as agents and constitute the fundamental building blocks of AI systems.

Transfer Learning

Transfer learning involves leveraging knowledge gained in one task and applying it to another.

Foundation models often adhere to this approach, where a model is initially trained on a related task and then fine-tuned for the specific downstream task of interest.

Transfer learning is a powerful concept that enhances the versatility of models, enabling them to perform unseen tasks based on prior learning.

Conclusion

It is often overlooked that autonomous AI agents represent a fundamental advancement in technology.

The agents, equipped with artificial intelligence, have the ability to:

  • Operate independently,

  • Make decisions, and

  • Act without constant human intervention.

In the future, autonomous AI agents are set to revolutionize sectors ranging from healthcare and finance to manufacturing and transportation.

However, there are considerations regarding accountability, transparency, ethics, responsibility, and bias in decision-making.

Despite these challenges, the future of autonomous AI agents is very promising. As technology continues to evolve, these agents will become increasingly integrated into our daily lives.