With the recent addition of A2A (Agent-to-Agent) protocol support in CleverChatty, it's now possible to build powerful, intelligent applications without writing any custom logic. In this blog post, we'll walk through how to build an Agentic RAG (Retrieval-Augmented Generation) system using CleverChatty.
What is Agentic RAG?
The term agentic refers to an agent's ability to reason, make decisions, use tools, and interact with other agents or humans intelligently.
In the context of RAG, an Agentic RAG system doesn't just retrieve documents based on a user's prompt. Instead, it:
- Preprocesses the user's query,
- Executes a more contextually refined search,
- Postprocesses the results, summarizing and formatting them,
- And only then returns the final answer to the user.
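Conceptually, that loop can be sketched in a few lines of Python. The sketch below is illustrative only, not CleverChatty's internals; the llm and search callables stand in for the configured model and the knowledge-base search tool.

from typing import Callable

def agentic_rag(
    user_prompt: str,
    llm: Callable[[str], str],     # stands in for the agent's LLM
    search: Callable[[str], str],  # stands in for the knowledge-base search tool
) -> str:
    # 1. Preprocess: rewrite the raw prompt into a focused search query.
    query = llm(f"Rewrite this as a concise search query: {user_prompt}")
    # 2. Retrieve: forward the refined query to the search tool.
    results = search(query)
    # 3. Postprocess: summarize and format the results before replying.
    return llm(f"Summarize these search results as an answer:\n{results}")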
This kind of intelligent behavior is made possible by using a Large Language Model (LLM) as the core reasoning component.
The goal of a RAG system is to enrich the user's query with external context, especially when the required information is not available within the LLM itself. This typically involves accessing an organization's knowledge base, structured or unstructured, and providing relevant data to the LLM to enhance its responses.
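In its simplest form, that enrichment step is just prompt construction: retrieved snippets are spliced into the prompt before it reaches the LLM. A minimal illustration (the function name and prompt wording here are ours, not part of CleverChatty):

def build_augmented_prompt(user_query: str, snippets: list[str]) -> str:
    """Splice retrieved knowledge-base snippets into the prompt."""
    context = "\n\n".join(snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )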
Basic RAG vs Agentic RAG
In a previous post, we demonstrated how to build a basic RAG setup using CleverChatty and the Model Context Protocol (MCP). That setup connects a user-facing chatbot to a knowledge base via an MCP server.
This works well if your company or app already has an indexed knowledge base (e.g., via ElasticSearch or Pinecone). The MCP server acts as a bridge, letting the agent retrieve relevant information and answer queries.
However, this basic setup relies heavily on the user-facing chatbot to formulate accurate search queries and interpret results.
With Agentic RAG, we introduce a specialized agent that handles these tasks more intelligently. It uses an LLM to enhance both the search input and the final response.
This intelligent agent receives the user's prompt from the main chatbot via the A2A protocol, processes the query through an LLM, retrieves data from the knowledge base using MCP, and returns a polished response.
How to Build an Agentic RAG System with CleverChatty
CleverChatty makes this entire architecture remarkably easy to set up. All you need is:
- A user-facing AI chat server
- An Agentic RAG server
- An MCP server (the knowledge base bridge)
Let's walk through each component.
Step 1: Install CleverChatty
You'll need Go installed on your system. Then install the CleverChatty server and CLI tools:
go install github.com/gelembjuk/cleverchatty/cleverchatty-server@latest
go install github.com/gelembjuk/cleverchatty/cleverchatty-cli@latest
You now have two commands:
- cleverchatty-server: runs an agent server
- cleverchatty-cli: runs a client to communicate with agents
Step 2: Create the Agentic RAG Server
Create a working directory (e.g., agentic_rag_server/) and add a cleverchatty_config.json file:
{
"agent_id": "agentic_rag",
"log_file_path": "log_file.txt",
"debug_mode": false,
"model": "ollama:qwen2.5:3b",
"system_instruction": "For any prompt you receive, first preprocess it to generate a more relevant search query. The query must always be forwarded to the search tool. Then, after receiving the search results, postprocess them to summarize and format the response before returning it to the user.",
"tools_servers": {
"knowledge_base_server": {
"command": "uv",
"args": ["run", "knowledge_base_server.py"],
"env": {
"API_KEY": "YOUR_API_KEY"
}
}
},
"a2a_settings": {
"enabled": true,
"agent_id_required": true,
"url": "http://localhost:8080/",
"listen_host": "0.0.0.0:8080",
"title": "Knowledge Base Agentic RAG"
}
}
Key Configuration Notes
- system_instruction: Directs the LLM to enhance the search query and summarize the results.
- model: You can replace ollama:qwen2.5:3b with any other LLM provider (e.g., OpenAI, Anthropic).
- tools_servers: Defines an external tool (MCP server) that the RAG agent will use to fetch knowledge.
- a2a_settings: Enables A2A communication so other agents can call this one as a tool.
Step 3: Create the MCP Knowledge Base Server
The MCP server accepts a search query and returns relevant content. Here's a minimal example using fastmcp:
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Demo")

@mcp.tool()
def search_in_database(query: str) -> str:
    """Search the knowledge base and return matching content."""
    results = "..."  # implement your search logic here
    return results

if __name__ == "__main__":
    mcp.run()  # serve over stdio so the agent can launch this as a subprocess
This server should be callable by the Agentic RAG agent via the "knowledge_base_server" entry in the tools_servers config.
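If your knowledge base is already indexed in ElasticSearch, as in the basic setup mentioned earlier, the search logic might look like the sketch below. The index name knowledge_base, the content field, and the local URL are illustrative assumptions, not anything CleverChatty prescribes:

# Hypothetical implementation backed by an Elasticsearch index.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_in_database(query: str) -> str:
    """Full-text search over an assumed 'knowledge_base' index."""
    resp = es.search(index="knowledge_base", query={"match": {"content": query}}, size=5)
    snippets = [hit["_source"]["content"] for hit in resp["hits"]["hits"]]
    return "\n\n".join(snippets) or "No results found."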
Step 4: Run the Agentic RAG Server
cleverchatty-server start --directory /path/to/agentic_rag_server
The server will listen on http://localhost:8080.
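As a quick sanity check, you can try fetching the agent's A2A agent card, the discovery document the A2A protocol publishes at /.well-known/agent.json (this assumes CleverChatty follows that standard A2A convention):

import json
import urllib.request

# Fetch the A2A discovery document (agent card) from the running server.
with urllib.request.urlopen("http://localhost:8080/.well-known/agent.json") as resp:
    card = json.load(resp)

print(card.get("name"), "-", card.get("description"))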
Step 5: Configure the Main Chat Server
Now create another working directory (e.g., common_ai_chat/) and add this config file:
{
"agent_id": "common_ai_chat",
"log_file_path": "log_file.txt",
"debug_mode": false,
"model": "ollama:llama3.1:latest",
"system_instruction": "You are a helpful AI assistant. You can answer questions and provide information.",
"tools_servers": {
"agentic_rag_server": {
"endpoint": "http://localhost:8080/",
"interface": "rag"
}
},
"a2a_settings": {
"enabled": true,
"agent_id_required": true,
"url": "http://localhost:8081/",
"listen_host": "0.0.0.0:8081",
"title": "Common AI Chat"
}
}
This setup tells the chat server to forward RAG-related queries to the Agentic RAG server via A2A.
Step 6: Run the Chat Server
cleverchatty-server start --directory /path/to/common_ai_chat
The chat server listens on http://localhost:8081.
Step 7: Test the Agentic RAG System
Now open a new terminal and start the client:
cleverchatty-cli --server http://localhost:8081/ --agent client_id
Try asking a question! Behind the scenes:
- Your prompt goes to the main chat server.
- It forwards the query to the Agentic RAG server.
- The RAG agent uses its LLM to improve the query.
- It fetches knowledge via the MCP server.
- It summarizes the results and returns them.
- The main chat server includes that context when generating a final response.
Benefits of This Architecture
- Cost Optimization: By assigning different LLMs to different agents, you can significantly reduce token usage costs. For example, the main AI chat server may use a high-end model like OpenAI's GPT-4.1 to deliver a premium user experience. Meanwhile, the Agentic RAG server can operate on a more cost-effective model such as GPT-3.5 or a local LLM to handle query preprocessing and summarization, tasks that often consume more tokens but require less nuance. This way, expensive models are only used for final user-facing responses.
- Improved Data Privacy and Access Control: Many businesses are cautious about sending sensitive data to external LLM providers. This architecture allows you to isolate internal data access to local agents. For instance, the Agentic RAG server can run a local model via Ollama, ensuring that raw knowledge base content stays within your infrastructure. Only the processed summary is passed to the external-facing AI chat server, keeping sensitive information secure.
- Modular and Extensible Design: Each component in this architecture is independently configurable and replaceable. Want to switch to a different LLM provider or connect a new MCP server? You can do that by updating a single config file, with no need to modify the main AI chat server or other agents. This modularity supports rapid development, easy scaling, and seamless upgrades over time.
Summary
With CleverChatty, you can now:
- Combine RAG with LLM-powered agents
- Enable smart query refinement and summarization
- Orchestrate all of this via A2A + MCP, with no manual code required
This modular architecture makes it easy to extend your assistant's capabilities by simply connecting new agents and tools.
Next Steps
Want help building your own Agentic RAG pipeline? Drop a comment or open an issue on GitHub!