AI Tools

The Economy of AI Multi-Agent Systems and how to Save Money on Token Usage

Wondering how to manage costs while using multiple AI agents? This blog explores critical strategies for tracking token consumption in AutoGen interactions. From understanding token accumulation in two-agent and group chats to implementing monitoring middleware, learn key practices for optimizing...

Have you ever wondered how multiple AI agents can interact seamlessly while keeping costs under control? When working with multiple AI agents in AutoGen, understanding and managing token consumption is critical to both cost optimisation and system performance. Let's look at how tokens are consumed in different conversation patterns and explore some strategies for efficient token usage.

Token flow in different conversation patterns

Two-Agent Conversation Pattern

How do tokens add up in a simple interaction between you and an assistant?

Token accumulation:

First exchange: 250 tokens (100 input + 150 output)
Second exchange: 200 tokens (50 input + 100 output)
Total: 450 tokens

Isn't it fascinating how each interaction builds on the previous one?

Group chat pattern

What happens when multiple agents join the conversation? Let's have a look:

Token accumulation:

Initial task: 100 tokens
Agent1 processing: 220 tokens (100 input + 120 output)
Agent2 processing: 370 tokens (220 input + 150 output)
Agent3 processing: 470 tokens (370 input + 100 output)
Final result: 550 tokens (470 input + 80 output)

Can you see how quickly tokens can accumulate in a group chat scenario?

Token tracking implementation

But How do you keep track of all these tokens?

AutoGen provides a middleware-based approach to tracking token usage. Let's explore a token counter middleware implementation:

public class TokenCounterMiddleware : IMiddleware
{
    private read-only List<ChatCompletionResponse> responses = new List<ChatCompletionResponse>()
    
    public async Task<IMessage> InvokeAsync(MiddlewareContext context, IAgent agent, 
        cancellationToken = default)
    {
        var reply = await agent.GenerateReplyAsync(context.Messages, context, 
            cancellation_token);

        if (reply is IMessage<ChatCompletionResponse> message)
        {
            Responses.Add(message.Content)
        }

        Return the reply;
    }

    public int GetCompletionTokenCount() => responses.Sum(r => r.Usage.CompletionTokens);
    public int GetPromptTokenCount() => responses.Sum(r => r.Usage.PromptTokens);
    public int GetTotalTokenCount() => responses.Sum(r => r.Usage.TotalTokens);
}

Token usage example

Let's see how this works in a real multi-agent scenario:

// Create agents with token tracking
var assistant = new AssistantAgent("assistant", modelClient)
var tokenCounter = new TokenCounterMiddleware();
assistant.RegisterMiddleware(tokenCounter)

// Track token usage throughout the conversation
await Assistant.SendAsync("Explain quantum computing")
Console.WriteLine($"Total tokens used: {tokenCounter.GetTotalTokenCount()}");
Console.WriteLine($"Estimated cost: ${tokenCounter.GetTotalTokenCount() * 0.002 / 1000:F4}");

Token usage patterns

Two-Agent Conversations

How are tokens counted in a two-agent conversation?

Enter tokens:
- Assistant system message
- User message
- Previous conversation history
Output tokens:
- Assistant's reply
Accumulation:
- Each subsequent message includes all previous messages in the prompt tokens.
- Tokens usage grows linearly with conversation length

Group chat scenarios

Group chats introduce more complexity. How do tokens accumulate?

System context:
- Every agent has its own system message
- Agent descriptions are included in the context
Message Routing:
- Messages between agents count as input and output tokens
- Each agent sees full conversation history
Token multiplication:
- Total tokens grow faster with multiple agents
- Each agent's response adds to everyone's context

Can you see why managing tokens in group chats requires a more strategic approach?

Best practices for token management

Wondering how to optimise your token usage? Here are some best practices:

Use middleware for monitoring
- Add token counting middleware to each agent
- Track both inbound and outbound tokens
- Monitor costs in real time
Implement conversation history management
- Delete conversation history when appropriate
- Use summarisation for long conversations
- Implement message pruning when token count exceeds thresholds
Optimise Agent System Messages
- Keep system prompts concise
- Remove unnecessary context
- Use role-specific instructions only when necessary
Monitor and log token usage
- Track token usage per agent
- Monitor call costs
- Set up alerts for high token usage

Conclusion

Understanding and managing token usage is essential when working with multiple agents. Token counting patterns vary significantly between two-agent conversations and group chats, each with its own considerations for optimisation. By implementing proper monitoring through middleware and following best practices, you can optimise costs and maintain efficient conversations between agents.

Remember that token usage has a direct impact on both performance and cost:

Two agent conversations have linear token growth.
Group chats have exponential token growth due to multiple participants
System messages and conversation history contribute significantly to token usage
Regular monitoring and management of token usage is critical for cost control.

The Economy of AI Multi-Agent Systems and how to Save Money on Token Usage

Token flow in different conversation patterns

Two-Agent Conversation Pattern

Group chat pattern

Token tracking implementation

Token usage example

Token usage patterns

Two-Agent Conversations

Group chat scenarios

Best practices for token management

Conclusion

Read next

What is AI Thinking when it's Lying to your Face?

The AI That Never Forgets: Understanding Infinite Conversations in Cline and other Chatbots

Empower Any AI Workflow: A Practical Blueprint for LLM Observability with Langfuse & LiteLLM

Comments ()

Token flow in different conversation patterns

Two-Agent Conversation Pattern

Group chat pattern

Token tracking implementation

Token usage example

Token usage patterns

Two-Agent Conversations

Group chat scenarios

Best practices for token management

Conclusion

Read next

Comments ( )

Comments ()