SESTEK Lead Product Owner Mert Çıkan shares why AI agents that look flawless in demos fail in production, and how SESTEK overcomes these barriers with hybrid autonomy. Discover the challenges encountered in real contact center environments and the lessons learned from the field.
Projects Failing to Move from Pilot to Production
Contact centers have long been viewed as a cost line item on the balance sheet. Yet the most emotional and decisive contact between the customer and the company often happens here. Today, this field is under pressure from three directions:
- Customers expect 24/7, uninterrupted, and instant solutions
- Senior management positions AI investments as a strategic priority
- Consumers are more open to AI-powered services than ever before
With demand, pressure, and acceptance all present simultaneously, one would expect brilliant results. However, the vast majority of Generative AI initiatives are still struggling to move past the pilot stage. According to Gartner's June 2025 report, more than 40% of Agentic AI projects will be abandoned before reaching production by the end of 2027.
Behind these failures lies a recurring pattern: starting without clear goal definitions, going live without backend system integration, expecting optimization without collecting performance data, and ignoring the human factor.
This picture shows that production deployment discipline is more decisive than the potential of technology.
SESTEK's Approach: Hybrid Autonomy
In late 2024, a distinct threshold was crossed in the industry. A paradigm shift began from rule-based virtual assistants toward goal-oriented autonomous agents capable of reasoning and taking action. At SESTEK, we read this signal early and launched our Agentic AI platform in early 2025; our agents quickly began working with real customers in production environments.
The platform is built on a multi-agent architecture where specialized agents, each with their own knowledge sources and toolsets, share tasks under supervisor coordination. The cognitive load piled onto a single giant agent creates technical risks such as context capacity overflow, conflicting instructions, and tool routing ambiguity. As prompts grow, not only does hallucination risk increase, but latency and token costs rise as well. Adding a new scenario requires retesting the entire system; maintenance cost grows exponentially, not linearly. The specialized multi-agent architecture distributes cognitive load, localizes changes, and makes testing and scaling sustainable.
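As a rough illustration of this supervisor pattern, the sketch below routes each utterance to the specialist whose domain matches best. The agent names, keyword-overlap routing, and `Supervisor` class are hypothetical simplifications for this post, not the platform's actual API.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    keywords: set       # topics this specialist handles
    instructions: str   # short, focused prompt instead of one giant one

class Supervisor:
    """Coordinates specialists; each change stays local to one agent."""
    def __init__(self, agents):
        self.agents = agents

    def route(self, utterance: str) -> Agent:
        # Pick the specialist whose domain overlaps the utterance most.
        words = set(utterance.lower().split())
        return max(self.agents, key=lambda a: len(a.keywords & words))

billing = Agent("billing", {"invoice", "payment", "refund"},
                "Handle billing questions only.")
shipping = Agent("shipping", {"delivery", "package", "courier"},
                 "Handle shipping questions only.")
supervisor = Supervisor([billing, shipping])

print(supervisor.route("where is my package and courier info").name)
```

A real router would use the model itself (or embeddings) rather than keyword overlap; the structural point is that adding a scenario means adding one small agent, not retesting a monolith.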
However, most enterprise customers have spent years building NLU (Natural Language Understanding)-based intent models and carefully crafted deterministic flows that work. Telling them to "throw everything away and switch to Agentic AI" is neither realistic nor responsible. A platform that ignores this reality gets rejected before it ever gets adopted.
This is where one of our platform's most critical features comes into play: the ability to run different levels of autonomy together on a single platform. The hybrid autonomy model runs rule-based scenarios, RAG (Retrieval Augmented Generation)-supported knowledge base queries, and autonomous agent decision mechanisms together within the same session. For example, the AI Agent automatically switches to a pre-validated workflow for sensitive steps like payment confirmation or identity verification; once the transaction is complete, it returns to autonomous mode while preserving context. This allows customers to gradually transfer scenarios to the AI Agent as they become ready, while protecting their existing investments.
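The mode-switching idea can be sketched as follows. The intent names, workflow steps, and `Session` class are illustrative assumptions; the point is only that sensitive intents run fixed, pre-validated steps while the conversation context survives the switch.

```python
# Assumed, simplified registry of sensitive intents and their fixed steps.
SENSITIVE_WORKFLOWS = {
    "payment_confirmation": ["read_back_amount", "ask_explicit_yes", "execute_payment"],
    "identity_verification": ["ask_id_number", "check_against_crm"],
}

class Session:
    def __init__(self):
        self.mode = "autonomous"
        self.context = {}   # preserved across mode switches
        self.trace = []     # what actually ran, for auditing

    def handle(self, intent, **slots):
        self.context.update(slots)
        if intent in SENSITIVE_WORKFLOWS:
            self.mode = "deterministic"
            for step in SENSITIVE_WORKFLOWS[intent]:
                self.trace.append(step)   # each step is a fixed, validated action
            self.mode = "autonomous"      # return to autonomy, context intact
        else:
            self.trace.append(f"llm:{intent}")
        return self.mode

s = Session()
s.handle("ask_balance")
s.handle("payment_confirmation", amount="2500 TL")
```

After the deterministic detour, `s.context` still holds everything gathered earlier, which is what lets the conversation resume without re-asking the customer.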
Different Sectors, Common Contact Point
Each sector has its own regulations, infrastructure needs, and dynamics. Deploying Agentic AI at enterprise scale necessitates addressing these differences.
In highly regulated sectors like banking and healthcare, data sovereignty is non-negotiable. The solution must run entirely on customer infrastructure, large language models must process on the customer's own GPUs, and data must never leave the organization. E-commerce and retail can benefit from powerful LLMs in cloud-based architectures; however, dynamic scaling capacity to respond to unpredictable traffic spikes during campaign periods is essential. Sectors change, but the most critical point where you touch the customer doesn't: the contact center.
This is exactly why "Voice Agent" solutions have recently generated such a wave of excitement: companies are racing to transform the most emotional contact point between brand and customer with artificial intelligence. But when projects that work flawlessly on screen hit the real-world barrier of voice, things change completely.
Voice's Harsh Test
Voice agent projects may look flawless in demo environments. But in the real world, the voice channel is the true stress test of architecture. Users expect natural, real-time conversation with no perceptible latency. A two-second wait tolerated in chat can create a "system not working" perception on the phone.
The voice experience consists of a three-layer latency chain, each requiring separate optimization: SR (Speech Recognition), agent reasoning, and TTS (Text-to-Speech). The weakest link in this chain determines the entire experience.
And the most critical link in this chain is speech recognition. A system that mishears the customer cannot reach the correct result, no matter what you put behind it. Factors like codec losses on phone lines, background noise, different accents, and jitter constantly threaten accuracy rates.
Challenges don't end with accuracy. In voice channels, customers are much more inclined to list multiple requests one after another in a single sentence. Expressions like time, day, and date can be ambiguous; capturing structured data like phone numbers or full addresses over voice is an engineering challenge in itself. In chat, users see the correct format and fix it themselves; on the phone, there's no such visual feedback.
There are also voice-channel-specific output issues: numbers, special characters, and long sentences in agent responses cause no problems in writing but can become meaningless when read aloud. "2,500.00 TL" is visually clear; how to pronounce it when spoken is a question in itself.
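One common mitigation is a pre-TTS normalization guardrail that rewrites such amounts into a speakable phrase before synthesis. The sketch below is a deliberately simplified, English-language illustration; a real deployment would use a full Turkish number-to-words library (or SSML `say-as` hints where the TTS engine supports them).

```python
import re

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def small_number_to_words(n: int) -> str:
    # Simplified: handles 0-9999, leaves tens/units as digits to stay short.
    parts = []
    if n >= 1000:
        parts.append(ONES[n // 1000] + " thousand")
        n %= 1000
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n:
        parts.append(str(n))
    return " ".join(parts) if parts else "zero"

def normalize_currency(text: str) -> str:
    """Rewrite amounts like '2,500.00 TL' into a speakable phrase."""
    def repl(m):
        amount = int(m.group(1).replace(",", ""))
        return small_number_to_words(amount) + " lira"
    return re.sub(r"([\d,]+)\.00\s*TL", repl, text)

print(normalize_currency("Your balance is 2,500.00 TL."))
# -> Your balance is two thousand five hundred lira.
```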
Latency accumulates at every layer. SR processing time, agent external service calls, response generation, and TTS synthesis add up to seconds. These delays create an experience gap that goes unnoticed in demo environments but leads to customer loss in real calls.
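As a back-of-the-envelope illustration with assumed (not measured) per-stage numbers, the serial chain blows past chat-level tolerance even when each stage looks individually acceptable:

```python
# Illustrative latency budget in milliseconds; real figures vary by deployment.
budget_ms = {
    "speech_recognition": 300,   # endpointing + final transcript
    "agent_reasoning": 700,      # LLM first token + tool selection
    "external_services": 900,    # CRM / backend API calls
    "tts_synthesis": 250,        # first audio chunk
}

total_ms = sum(budget_ms.values())
print(f"end-to-end: {total_ms} ms")  # over two seconds before any audio plays
```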
Lessons from the Field
Each of the challenges above emerged in real customer interactions, and we developed concrete solutions that work in the field for each one.
Architectural Decisions
Build specialist teams instead of a single giant agent.
As a single know-it-all agent's prompt grows, loss of focus and hallucination become inevitable. When we switched to a multi-agent structure, each specialist worked with clear, concise instructions, and the error rate dropped visibly. When problems arise, now only the relevant agent is updated; the need to test the entire system is eliminated.
Build an intelligent search layer instead of dumping data into the model.
Loading all neighborhood and street names in Turkey into the model's context window to capture an address, or a catalog of thousands of lines to select a product, seems like the first solution that comes to mind. But in practice, it multiplies token costs and the model begins making wrong matches in this large pool. The solution is to put an intelligent search and verification layer between the model and the data source: when the customer says "Kayışdağı Street in Ataşehir," instead of the model scanning the entire address database, the search layer filters a few candidates, and the model only selects from this narrowed list. Our most critical rule: the agent can never make up data, it only chooses from a verified list.
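A minimal sketch of this search-and-verify pattern, with an illustrative three-row street database standing in for the real one and a top-hit pick standing in for the model's choice:

```python
STREET_DB = [
    "Kayışdağı Caddesi, Ataşehir",
    "Kayışdağı Sokak, Maltepe",
    "Bağdat Caddesi, Kadıköy",
    # ...thousands more rows in a real deployment
]

def search_candidates(query: str, db, limit=5):
    """Cheap lexical filter; production would use fuzzy or vector search."""
    terms = [t.lower() for t in query.split()]
    hits = [row for row in db if all(t in row.lower() for t in terms)]
    return hits[:limit]

def choose_address(query: str, db):
    candidates = search_candidates(query, db)
    if not candidates:
        return None   # the agent must re-ask, never invent an address
    # The LLM would select from `candidates`; here we just take the top hit.
    return candidates[0]

print(choose_address("Kayışdağı Ataşehir", STREET_DB))
```

The invariant this enforces is the rule from the text: the model only ever picks from a verified candidate list, so a hallucinated street name cannot reach the backend.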
Trust the model, but always verify.
Language models can sometimes say "done" without actually completing the transaction. Deterministic validation layers that confirm each critical step has really been completed are essential. In failed attempts, the customer is routed to a representative without being caught in an endless loop, transfer reasons are analyzed, and the system is continuously improved.
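The principle can be sketched as a deterministic check against the system of record, with a capped retry count before handoff. The function names and the simulated ledger are assumptions for illustration:

```python
MAX_ATTEMPTS = 2

def backend_confirms(transaction_id, ledger):
    """Check the system of record, not the model's word."""
    return transaction_id in ledger

def confirm_or_handoff(model_said_done, transaction_id, ledger, attempts):
    if model_said_done and backend_confirms(transaction_id, ledger):
        return "confirmed"
    if attempts + 1 >= MAX_ATTEMPTS:
        return "transfer_to_agent"   # no endless loop for the customer
    return "retry"

ledger = {"TX-1001"}
print(confirm_or_handoff(True, "TX-1001", ledger, 0))  # confirmed
print(confirm_or_handoff(True, "TX-9999", ledger, 0))  # model said done, backend disagrees
```

The transfer reasons that accumulate from the second branch are exactly the data the text describes analyzing for continuous improvement.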
Voice Channel Solutions
Nothing works without accurate SR.
No matter how powerful an agent you put behind a system that mishears the customer, the result will be wrong. In production, under real contact center conditions (different accents, line noise, and codec losses), we achieved an accuracy rate exceeding 97%.
Adapt to the customer, don't impose a format.
Capturing structured data is one of the biggest challenges in voice channels. Our solution is a gradual capture strategy: for example, we first try to get the phone number all at once; if that fails, we switch to piecemeal mode; if that fails too, we transfer to a representative. The same logic applies to dates, times, and addresses. In addition, voice-channel-specific guardrails come into play: numbers are written to be read clearly, special characters are avoided, response length is controlled to be suitable for conversation.
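A minimal sketch of the gradual capture strategy, with simplified ten-digit validation and a simulated caller; in this sketch, `ask(mode)` stands in for a full dialog turn that prompts the customer and returns the transcript:

```python
import re

def valid_phone(digits: str) -> bool:
    # Simplified rule: exactly ten digits.
    return bool(re.fullmatch(r"\d{10}", digits))

def capture_phone(ask):
    """Try capture modes in order of decreasing ambition, then hand off."""
    for mode in ("all_at_once", "piecemeal"):
        digits = re.sub(r"\D", "", ask(mode))  # keep only the digits heard
        if valid_phone(digits):
            return mode, digits
    return "transfer_to_agent", None

# Simulated caller: the full read-out is misheard, the chunked retry succeeds.
heard = {"all_at_once": "five three two 12345", "piecemeal": "532 123 45 67"}
print(capture_phone(heard.get))
# -> ('piecemeal', '5321234567')
```

The same escalation ladder (ambitious attempt, constrained fallback, human handoff) applies to dates, times, and addresses.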
Manage latency through design.
Latency is solved not at a single point, but throughout the chain. We stream agent responses token by token, sending them to the TTS engine without waiting for the complete response. When external service integrations create unavoidable waits, we prevent the customer's "did the line drop?" perception with brief informational announcements or subtle background sounds. As soon as a call begins, the customer's open records and past interactions are pulled in the background, so the agent has grasped the context before the customer even explains the issue.
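The streaming idea can be sketched at sentence granularity: flush text to synthesis at each sentence boundary rather than waiting for the full response. The token generator and the queue below are stand-ins for the real LLM stream and TTS engine.

```python
def fake_llm_tokens():
    # Stand-in for a streaming LLM response.
    for tok in ["Your ", "invoice ", "is paid. ", "Anything ", "else? "]:
        yield tok

def stream_to_tts(token_iter, boundary_chars=".?!"):
    tts_queue, buffer = [], ""
    for tok in token_iter:
        buffer += tok
        if buffer.rstrip() and buffer.rstrip()[-1] in boundary_chars:
            tts_queue.append(buffer.strip())  # synthesis can start here
            buffer = ""
    if buffer.strip():                        # flush any trailing fragment
        tts_queue.append(buffer.strip())
    return tts_queue

print(stream_to_tts(fake_llm_tokens()))
# -> ['Your invoice is paid.', 'Anything else?']
```

The customer hears the first sentence while the rest is still being generated, which is what collapses the perceived wait even when total generation time is unchanged.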
Dialog Design
Millimetric differences in wording produce big changes in outcomes.
While a generic closing question often falls flat, a context-specific, concrete question gets the customer to act. For example, asking "Is there a note you'd like to pass along?" instead of "Can I help you with anything else?" is far more effective. Empathy works the same way: small touches like the agent expressing condolences in difficult situations are directly reflected in satisfaction scores.
Conclusion
The transformation promised by Agentic AI is real.
But what wins in production is not the organization that uses the biggest model; it's the one that systematizes the speed of learning.
Architectural courage is important, but what makes the sustainable difference is the operational discipline that measures every call, traces every error to its root cause, and consciously recalibrates the system every day.
The voice channel takes AI out of a polished innovation narrative and confronts it with pure engineering reality. Latency becomes visible. Errors magnify. Ambiguity is not tolerated. From this point on, the issue is not model intelligence; it's system resilience.
Real success is not the moment you give the right answer; it's the moment you can recover safely when you misunderstand. Because trust is built as much on recovery capacity as it is on accuracy.
When properly designed, humans and AI together create not a faster system, but a more reliable decision mechanism.
Transformation happens not through promises, but through repeated improvements.
Author: Mert Çıkan, Lead Product Owner
Mert is a Lead Product Owner with over 6 years of experience in AI-powered conversational technologies and product management. At SESTEK, he leads the end-to-end design and development of the Agentic AI platform, transforming next-generation AI trends into scalable enterprise value.


