Agent

What is an agent?

An agent is an entity capable of perceiving its environment and acting upon it. In other words, an agent is characterized by the environment it operates in and the tasks it is assigned to perform.

Building on this concept, an AI agent is an agent that uses the tools it has access to in order to perform a given task, and sometimes applies machine learning to gradually improve its precision.

For an AI agent, AI is the brain that reasons about the assigned task and generates the steps needed to achieve it. Because the procedure is multi-step, AI agents often require more powerful models: precision tends to decrease as the number of steps increases.

Giving an AI agent access to tools, and combining multiple agents to perform tasks, is far more cost-effective than training a single model to perform every task.

Planning and Failure

info

Mostly digested from Chip Huyen's post.

Planning

Planning is a crucial step for AI agents to determine which actions to take. However, poorly planned steps can lead to unwanted results, so it is important to evaluate a plan and make sure the steps it takes are safe.

To avoid this, planning should be decoupled from execution. To evaluate whether a plan is valid, an intent-classification model can be used to support planning. Once the plan is considered valid, it is sent to an executor to carry it out.

If we treat each of the components mentioned above as an independent agent, we have now formed a multi-agent system.
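The planner–validator–executor split above can be sketched as follows. This is a minimal toy, not a real system: `generate_plan` and `is_valid_plan` are hypothetical stand-ins for calls to a planning model and an intent-classification model.

```python
# Sketch of decoupling planning from execution (all helpers are hypothetical).

def generate_plan(task: str) -> list[str]:
    # Stand-in for a planning model that returns an ordered list of tool calls.
    return [f"look_up({task!r})", f"summarize({task!r})"]

def is_valid_plan(plan: list[str], allowed_tools: set[str]) -> bool:
    # Stand-in for a validation / intent-classification model: here we only
    # check that every step invokes a tool the agent is allowed to use.
    return all(step.split("(")[0] in allowed_tools for step in plan)

def execute(plan: list[str]) -> list[str]:
    # The executor only ever sees plans that passed validation.
    return [f"executed {step}" for step in plan]

allowed = {"look_up", "summarize"}
plan = generate_plan("quarterly report")
results = execute(plan) if is_valid_plan(plan, allowed) else []
```

Treating each of these three functions as a separate agent is exactly the multi-agent decomposition described above.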

Prompt engineering is a common approach to plan generation: through carefully crafted prompts, agents are given their accessible tools and examples of the expected result. However, hallucinations might cause the agent to carry out the wrong plan or to invoke tools with incorrect parameters. Techniques that improve a model's performance in general can also improve its planning capabilities, including

  • Write a better system prompt with more examples.
  • Give better descriptions of the tools and their parameters so that the model understands them better.
  • Rewrite the functions themselves to make them simpler, such as refactoring a complex function into two simpler functions.
  • Use a stronger model.
  • Finetune a model for plan generation.
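To illustrate the second tip, here is a before/after tool description. The schema shape mirrors common function-calling APIs, but the exact format depends on your model provider, and the `search` tool itself is made up for the example.

```python
# A vague tool description gives the model little to plan with.
vague_tool = {
    "name": "search",
    "description": "Searches stuff.",
}

# A better description says what the tool searches, when to use it,
# and what each parameter means, including value ranges and examples.
better_tool = {
    "name": "search",
    "description": (
        "Search the product catalog by keyword. "
        "Use this before answering any question about product availability."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keywords to match against product titles, e.g. 'wireless mouse'.",
            },
            "max_results": {
                "type": "integer",
                "description": "Number of results to return, between 1 and 20.",
            },
        },
        "required": ["query"],
    },
}
```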

Even with these tips, there is still a planning granularity tradeoff: high-level plans are easy to generate but hard to execute, while detailed plans are easy to execute but hard to generate. The way to deal with this problem is hierarchical planning.

First generate a high-level plan, then gradually break each step into a more detailed plan. This circumvents the greater chance of wrong planning in a one-off planning approach.
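A sketch of hierarchical planning under these assumptions: `plan_high_level` and `expand_step` stand in for two calls to a planning model, with hard-coded toy outputs.

```python
# Hierarchical planning: coarse plan first, then refine each step.

def plan_high_level(task: str) -> list[str]:
    # Hypothetical model call returning a coarse plan for the task.
    return ["gather data", "analyze data", "write report"]

def expand_step(step: str) -> list[str]:
    # Hypothetical model call refining one high-level step into sub-steps.
    refinements = {
        "gather data": ["query sales database", "download web metrics"],
        "analyze data": ["compute monthly totals", "flag anomalies"],
        "write report": ["draft summary", "attach charts"],
    }
    return refinements.get(step, [step])

def hierarchical_plan(task: str) -> list[str]:
    detailed: list[str] = []
    for step in plan_high_level(task):
        detailed.extend(expand_step(step))  # refine one level at a time
    return detailed

steps = hierarchical_plan("quarterly report")
```

In practice each refinement is another model call, and one could recurse further until the steps map directly onto tool invocations.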

Even with all these efforts to ensure the plan is feasible, reflecting on the result is an optional but crucial way to improve the agent's performance. Reflection can happen at different points throughout the plan-and-execute process, such as:

  • Evaluate the result of user query analysis
  • Evaluate the result of the plan
  • Evaluate after each step is executed
  • Evaluate the overall result

This approach of including reflection in planning is known as the ReAct framework, which interleaves reasoning (including both planning and reflection) and action.

One can implement reflection in a multi-agent setting: one agent plans and takes actions and another agent evaluates the outcome after each step or after a number of steps.
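The actor/evaluator pairing can be sketched like this. Both `run_step` and `looks_correct` are hypothetical stand-ins for model calls; the "timeout" on the first attempt is fabricated to show the retry path.

```python
# Reflection in a multi-agent setting: one agent acts, another evaluates
# after each step, and failed steps are retried a bounded number of times.

def run_step(step: str, attempt: int) -> str:
    # Actor agent (hypothetical): pretend the first attempt of "fetch" fails.
    if step == "fetch" and attempt == 0:
        return "error: timeout"
    return f"ok: {step}"

def looks_correct(outcome: str) -> bool:
    # Evaluator agent (hypothetical): a real system would ask a second
    # model to judge the outcome instead of string-matching.
    return outcome.startswith("ok")

def execute_with_reflection(plan: list[str], max_retries: int = 2) -> list[str]:
    outcomes = []
    for step in plan:
        for attempt in range(max_retries + 1):
            outcome = run_step(step, attempt)
            if looks_correct(outcome):  # reflect after each step
                break
        outcomes.append(outcome)
    return outcomes

log = execute_with_reflection(["fetch", "summarize"])
```

Evaluating every step maximizes the chance of catching errors early, but each evaluation is an extra model call, which is exactly the latency/cost downside noted below.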

Although this approach improves results, its main downsides are latency and cost.

Failure

The more complex a task is, the more likely an agent is to fail. Reasons include invalid tools, invalid parameters, task-planning failures, false reflection results, time limits, tool failures, etc.

To evaluate an agent for planning failures, one option is to create a planning dataset where each example is a tuple (task, tool inventory). For each task, use the agent to generate K plans, then compute the following metrics:

  • Out of all generated plans, how many are valid?
  • For a given task, how many plans does the agent have to generate to get a valid plan?
  • Out of all tool calls, how many are valid?
  • How often are invalid tools called?
  • How often are valid tools called with invalid parameters?
  • How often are valid tools called with incorrect parameter values?

Then look for patterns in the metrics and finetune the agent based on the observations. Ask questions such as:

  • What types of tasks does the agent fail more on?
  • What tools does the model frequently make mistakes with?

Even when an agent can properly execute a plan, it still isn't helpful if it is inefficient. Evaluate its efficiency with benchmarks such as:

  • How many steps does the agent need, on average, to complete a task?
  • How much does the agent cost, on average, to complete a task?
  • How long does each action typically take? Are there any actions that are especially time-consuming or expensive?
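These benchmarks reduce to simple aggregates over a run log. A toy sketch, assuming a hypothetical log where each task records its actions with per-action latency and dollar cost:

```python
# Efficiency metrics over a (made-up) run log:
# one entry per completed task, each a list of (action, seconds, dollars).
runs = [
    [("plan", 1.2, 0.002), ("search", 0.8, 0.001), ("answer", 2.0, 0.004)],
    [("plan", 1.0, 0.002), ("answer", 1.5, 0.003)],
]

# Average number of steps per completed task.
avg_steps = sum(len(r) for r in runs) / len(runs)

# Average dollar cost per completed task.
avg_cost = sum(d for r in runs for _, _, d in r) / len(runs)

# The single most time-consuming action across all runs.
slowest_action = max((a for r in runs for a in r), key=lambda a: a[1])
```

Grouping the same log by action name, rather than by task, answers the last question: which actions are consistently slow or expensive.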

Agentic Design Patterns

  • Reflection
  • Tool Use
  • Planning
  • Multi-Agent Collaboration

References

  • Chip Huyen, "Agents" (huyenchip.com)