Working with files in AI agents that use MCP servers looks straightforward at first. In reality, it’s one of those areas where everything almost works… until you try to do something real.
I ran into this while building and testing my AI agent tool, CleverChatty. The task was trivial on paper: “Take an email attachment and upload it to my file storage.” No reasoning, no creativity, just move a file from point A to point B.
And yet, this turned out to be surprisingly painful.
The root of the problem is how most AI agent workflows are designed. Typically, every MCP tool response is passed through the LLM, which then decides what to do next. This makes sense for text, metadata, and structured responses. But it completely falls apart once files enter the picture.
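To make that default loop concrete, here is a minimal Go sketch. The `Message` and `ToolCall` types and the stubbed `callLLM`/`callMCPTool` functions are hypothetical placeholders, not CleverChatty's actual code; the point is only the shape of the loop:

```go
package main

import "fmt"

// Hypothetical types standing in for a real LLM client and MCP client.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

type ToolCall struct {
	Name string
	Args map[string]any
}

// Stubs for the real API calls.
func callLLM(history []Message) (reply string, calls []ToolCall) {
	return "done", nil
}

func callMCPTool(c ToolCall) string {
	return "" // imagine a base64-encoded file here
}

// runAgent is the "default" loop: every tool result, whatever it
// contains, is appended to the conversation and sent back to the LLM.
func runAgent(userPrompt string) string {
	history := []Message{{Role: "user", Content: userPrompt}}
	for {
		reply, calls := callLLM(history)
		if len(calls) == 0 {
			return reply // no more tool calls; the model is done
		}
		for _, c := range calls {
			result := callMCPTool(c)
			// If result is a multi-megabyte base64 blob, it still goes
			// straight into the context window on the next iteration.
			history = append(history, Message{Role: "tool", Content: result})
		}
	}
}

func main() {
	fmt.Println(runAgent("Take an email attachment and upload it to my file storage."))
}
```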
If an MCP server returns a file, the “default” approach is to pass that file through the LLM as well. At that point, things get ugly. Large files burn tokens at an alarming rate, costs explode, latency grows, and you end up shoving binary or base64 data through a system that was never meant to handle it. This is a known issue with large MCP responses, but oddly enough, I couldn’t find any clear guidance or best practices on how to deal with it.
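Some back-of-the-envelope arithmetic shows how fast this goes wrong. Base64 inflates a file by roughly a third, and base64 text tokenizes poorly; the ~2 characters per token used below is a loose, tokenizer-dependent assumption, not a measured figure:

```go
package main

import "fmt"

func main() {
	fileBytes := 5 * 1024 * 1024     // a modest 5 MB email attachment
	base64Chars := fileBytes * 4 / 3 // base64 inflates size by ~33%
	// Assumption: ~2 characters per token for base64 text. Real
	// tokenizers vary, but base64 always tokenizes inefficiently.
	tokens := base64Chars / 2
	fmt.Printf("%d bytes -> %d base64 chars -> ~%d tokens\n",
		fileBytes, base64Chars, tokens)
	// Output: 5242880 bytes -> 6990506 base64 chars -> ~3495253 tokens
}
```

Millions of tokens for a single attachment, well past typical context windows, before cost and latency even enter the picture.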