AI agents explained: Why OpenAI, Google and Microsoft are building smarter AI agents

By Jayesh Shinde | Updated on 14-Nov-2024

If 2022 was the birth of AI chatbots as we know them, thanks to OpenAI’s ChatGPT, then by all indications 2025 will see a lot of AI agents emerge from their current secretive research bubble – and no, I’m not talking about the garden-variety agents from The Matrix! How they will change our world is anyone’s guess at this point, but the dawn of AI agents certainly promises to inject some excitement into an AI landscape that’s becoming – dare I say – more drab by the day.

In the last two years, the world has seen breakneck advancement in the Generative AI space, spanning text-to-text, text-to-image and text-to-video capabilities. All of that has been a stepping stone to the next big AI breakthrough – AI agents. According to Bloomberg, OpenAI is preparing to launch its first autonomous AI agent, codenamed ‘Operator’, as soon as January 2025.

Apparently, this OpenAI agent – or Operator, as it’s codenamed – is designed to perform complex tasks independently. By understanding user commands through voice or text, this AI agent will seemingly be able to control different applications on the computer, send emails, book flights, and no doubt do other cool things. Stuff that ChatGPT, Copilot, Google Gemini or any other LLM-based chatbot just can’t do on its own. Knowing full well that I’m getting way ahead of myself, are you ready for J.A.R.V.I.S., Tony Stark’s intelligent AI assistant from Iron Man, or Samantha from Her, an AI operating system significantly more advanced than anything we’ve experienced till now?

What are AI agents

Simply put, an AI agent is a more advanced AI program that can autonomously perform tasks that go beyond its own base program. ChatGPT or Gemini can write code for you if you ask for it, but it can’t go and turn that code into a website that’s live with a domain name or an app that’s published on the app store. An AI agent will be able to do these things – I’m not saying these exact tasks, but AI agents will have the ability to not just show what needs to be done but also go ahead and do some of that work.

According to Amazon’s official AWS blog, humans will set broad goals for any given task, and the AI agent will independently choose the best actions it needs to perform to achieve those goals. Amazon further explains how, in a customer service scenario, a future AI agent will automatically try to resolve a calling customer’s query by looking up internal information, asking the customer different questions, taking stock of the situation and responding with a solution to the customer’s problem. In this scenario, the AI agent handles the call on its own, without passing it to a human customer support expert. In fact, whether or not to transfer the call to a human expert is itself determined automatically by the AI agent.
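To make the customer service scenario a little more concrete, here is a minimal Python sketch of that decision loop. Everything in it is an assumption made for illustration: the KNOWLEDGE_BASE contents, the handle_query function and the ask_customer callback are hypothetical names, and simple keyword matching stands in for whatever reasoning a real agent would use.

```python
# Hypothetical internal information the agent can look up.
KNOWLEDGE_BASE = {
    "reset password": "Use the 'Forgot password' link on the sign-in page.",
    "refund status": "Refunds are processed within 5 business days.",
}


def handle_query(query, ask_customer=None, max_clarifications=1):
    """Return (action, message), where action is 'answer' or 'escalate'.

    ask_customer is an optional callback that puts a follow-up question
    to the customer and returns their reply as a string.
    """
    text = query.lower()
    asked = 0
    while True:
        # Step 1: look up internal information.
        for topic, answer in KNOWLEDGE_BASE.items():
            if topic in text:
                return "answer", answer
        # Step 2: ask the customer a clarifying question, if allowed.
        if ask_customer is not None and asked < max_clarifications:
            asked += 1
            text = ask_customer("Could you describe the problem in other words?").lower()
            continue
        # Step 3: the agent itself decides to hand over to a human.
        return "escalate", "Transferring you to a human support expert."
```

Calling handle_query("How do I reset password?") returns an answer straight from the lookup, while a query the sketch cannot match, with no way to ask a follow-up question, ends in an escalation.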

AI agents will be superior to simple AI chatbots thanks to their advanced reasoning capabilities, suggests IBM’s blog post on AI agents. Unlike traditional AI chatbots like ChatGPT or Gemini, which give highly scripted responses to user queries, AI agents will have the ability to plan, think through and adapt to new information, enabling them to handle much more complex tasks with minimal human intervention or supervision.

Difference between AI agents and AI chatbots

There’s a lot of sophistication baked into AI agents, which AI chatbots simply don’t have. One way to look at AI chatbots is that they’re knowledgeable in all the theories of various subjects, whereas AI agents not only have the knowledge but also the expertise to apply their learnings in different applications. Given below are three key differences between AI agents and chatbots…

Being autonomous is the name of the game here. AI agents are self-directed, capable of making their own decisions based on human instructions and carrying out tasks like scheduling meetings or managing emails without the need for constant human intervention. This is in stark contrast to AI chatbots like ChatGPT or Gemini, which rely on constant user prompts to generate responses and lack the ability to initiate actions on their own.

Their ability to break down complex tasks and execute them is another key differentiator. AI agents are equipped to tackle complex, multi-faceted tasks by drawing on information from diverse sources and making informed decisions. On the other hand, AI chatbots are generally restricted to providing information or answering queries based on their pre-existing trained knowledge base.

According to experts, another key point of difference is the following: AI agents have the ability to learn from experience and adapt their behaviour over time to match a set of assigned tasks, enhancing their performance in ever-changing conditions. AI chatbots, however, typically lack this level of adaptability and are unable to learn anything beyond what’s already in their existing knowledge base. These are some of the top differences between AI agents and chatbots as we know them.

Different types of AI agents

Just like different AI chatbots have varying levels of competency across different tasks, AI agents come in all shapes and sizes – in a manner of speaking, of course. An AI agent can be as simple or as complex as its programming and the number of tasks it’s expected to execute demand. Given this scope, here’s how AI agents are being classified into three main types.

Firstly, there are so-called goal-based agents, which are designed to achieve specific objectives by evaluating various action sequences and selecting the most effective path to reach their goals. Unlike simple agents, goal-based agents carefully consider future outcomes and plan their actions accordingly. An example of a goal-based AI agent is a navigation system that identifies the fastest route to a destination by analysing multiple pathways and selecting the one that minimises travel time.
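The navigation example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ROADS map and its travel times are made up, fastest_route is a hypothetical name, and Dijkstra's shortest-path algorithm is used here as one common way of finding the fastest route, not as a description of how any real navigation agent works.

```python
import heapq

# Hypothetical road map: travel time in minutes between places.
ROADS = {
    "home": {"highway": 10, "old_town": 4},
    "highway": {"office": 12},
    "old_town": {"bridge": 6},
    "bridge": {"office": 5},
}


def fastest_route(graph, start, goal):
    """Return (total_minutes, path) using Dijkstra's algorithm.

    Returns (inf, []) when the goal cannot be reached.
    """
    queue = [(0, start, [start])]  # (minutes so far, place, path taken)
    settled = set()
    while queue:
        minutes, place, path = heapq.heappop(queue)
        if place == goal:
            return minutes, path
        if place in settled:
            continue
        settled.add(place)
        for neighbour, cost in graph.get(place, {}).items():
            heapq.heappush(queue, (minutes + cost, neighbour, path + [neighbour]))
    return float("inf"), []
```

With this made-up map, fastest_route(ROADS, "home", "office") picks the 15-minute route through old_town and bridge over the 22-minute highway route.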

Next come utility-based agents, which extend the functionality of goal-based agents by not only aiming to achieve a goal but also optimising the quality of the final outcome. These AI agents use a utility function to assign a value to each potential outcome, and then choose the actions that maximise overall satisfaction or performance for a given task. This approach is especially useful when multiple paths can lead to the same goal, allowing the AI agent to select the most advantageous one based on predefined criteria. Imagine a travel booking system that recommends flights not only based on reaching the destination but also by considering factors like ticket price, travel time and layovers to provide the most cost-effective and convenient option – this is what a utility-based AI agent will be able to do.
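Here is a rough Python sketch of the flight example. The Flight class, the sample flights and the weights are all hypothetical, chosen only to show the idea: every option reaches the destination, and a utility function scores each one so the agent can pick the highest-scoring flight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    code: str
    price: float   # ticket price, in any single currency
    hours: float   # total travel time
    layovers: int


def utility(flight, price_weight=1.0, hour_weight=20.0, layover_weight=50.0):
    """Higher is better: each factor is a cost, so the weighted sum is negated."""
    return -(price_weight * flight.price
             + hour_weight * flight.hours
             + layover_weight * flight.layovers)


def best_flight(flights, **weights):
    """Pick the flight with the highest utility under the given weights."""
    return max(flights, key=lambda f: utility(f, **weights))


# Made-up options that all reach the same destination.
FLIGHTS = [
    Flight("A1", price=300, hours=9, layovers=2),
    Flight("B2", price=380, hours=5, layovers=0),
    Flight("C3", price=340, hours=7, layovers=1),
]
```

With the default weights, best_flight(FLIGHTS) chooses B2, the pricier but faster non-stop option. Setting hour_weight=0 and layover_weight=0 makes the agent care about price alone, and it switches to A1. The predefined criteria live entirely in those weights.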

Finally, there are learning agents. These AI agents possess the ability to improve their performance over time by learning from experience. By continuously interacting with their environment and incorporating feedback, learning agents adapt to new situations and refine their decision-making processes, making them suitable for dynamic and complex domains. There are also hierarchical agents, which are nothing but a group of AI agents arranged in multiple tiers. In such a hierarchical structure, higher-level agents break down complex tasks and assign them to individual lower-level AI agents. These lower-level agents run their tasks independently and hand their results back to the higher-level agents up the hierarchy.
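The hierarchical arrangement can be sketched roughly as follows. This is a toy illustration, not how any shipping product is built: manager, book_flight and book_hotel are hypothetical names, and plain Python functions stand in for what would be full AI agents.

```python
from concurrent.futures import ThreadPoolExecutor


# Lower-level "agents": each one handles a single kind of subtask.
def book_flight(city):
    return f"flight to {city} booked"


def book_hotel(city):
    return f"hotel in {city} booked"


WORKERS = {"flight": book_flight, "hotel": book_hotel}


def manager(city):
    """Higher-level agent: split the goal into subtasks, delegate, collect results."""
    subtasks = [("flight", city), ("hotel", city)]
    # Workers run independently of one another; map() keeps results in subtask order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda task: WORKERS[task[0]](task[1]), subtasks))
    return {kind: result for (kind, _), result in zip(subtasks, results)}
```

Calling manager("Goa") hands one subtask to each worker and returns their results keyed by subtask, which is the "hand results back up" step in miniature.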

Who is developing AI agents

All the big movers and shakers of the AI industry are planning to release their versions of AI agents to the public in 2025, if they haven’t already done so by late 2024.

As I mentioned earlier, OpenAI is reportedly working on getting its AI agent, codenamed Operator, out into the open for everyone to check out by January 2025. The AI agent is expected to be capable of autonomously carrying out certain tasks on your computer, like booking flight tickets and writing code, among other things. Google seems to be working on several AI agent projects, one of which is known as Project Jarvis, which recently leaked on the Chrome Web Store. This AI agent will supposedly reside within the Google Chrome browser, with the ability to not only automate and execute tasks within the browser but also operate other apps on the host computer. There’s no set release date for Project Jarvis yet. However, Google’s Gemini 2.0 AI model, which is expected to be released later in 2024, is expected to have AI agents built in to offer enhanced capabilities.

Microsoft is also working aggressively on AI agents, something it announced earlier in 2024. According to its official blog, new capabilities in Copilot Studio will allow Microsoft customers to create powerful autonomous AI agents. Some of these are in public preview at the moment, where AI agents can draw upon work or business data from different Microsoft Office 365 apps to undertake a variety of assistive tasks, like IT help desk support, employee onboarding, coordinating sales and service, and more.

Anthropic, an AI startup competing with OpenAI, has already released its AI agent for people to try. According to TechCrunch, Anthropic has made significant upgrades to its Claude 3.5 Sonnet AI model, which can now use the host computer – yes, it can interact with computers in a way that mimics humans. It can move the cursor around the screen, click on apps and buttons, and potentially interact with other software and programs installed on your PC to autonomously execute various tasks. How scary and cool is that?!

If the world wasn’t prepared for Generative AI back in 2022, then let me tell you it’s certainly not prepared for AI agents and all the various ways they can impact our lives – for better or worse. Let’s hope these AI agents don’t turn out to be the dystopian versions depicted in an iconic movie 25 years ago, for your sake and mine, eh?

Jayesh Shinde

Executive Editor at Digit. Technology journalist since Jan 2008, with stints at Indiatimes.com and PCWorld.in. Enthusiastic dad, reluctant traveler, weekend gamer, LOTR nerd, pseudo bon vivant.