Case Study · AI & Machine Learning

AI support chatbot that reduced customer queries to human agents by 60%

How we implemented an LLM-powered chatbot with retrieval-augmented generation that handles customer guidance, resolves routine queries, and escalates intelligently to human agents.

IndustryFinancial Services
RegionPakistan
Timeline11 weeks
StackPython, FastAPI, OpenAI API, LangChain, PostgreSQL, pgvector, Vue 3

The situation

A financial services company's support team was spending 70–80% of their time answering the same 40–50 questions repeatedly: policy details, account procedures, document requirements, and process timelines. These were questions with clear, documented answers — but customers had no way to access that documentation themselves in a useful format. The company's website had a static FAQ page that customers rarely found.

Support tickets peaked on Monday mornings and after policy update announcements. During these spikes, response time degraded to 4–6 hours. Customer satisfaction scores were below target primarily due to wait time, not the quality of answers once they arrived. Hiring more agents was the default response — but the root problem was that most queries didn't require a human.

What we built

RAG-based knowledge system

The company's policy documents, FAQs, procedure manuals, and product guides were chunked, embedded using OpenAI's embedding models, and stored in PostgreSQL with the pgvector extension. When a customer sends a message, the system retrieves the most relevant document chunks and passes them as context to the LLM — ensuring answers are grounded in actual company policy, not hallucinated.

LLM conversation engine

A FastAPI service wrapping GPT-4o with a system prompt that defines the assistant's role, tone, and escalation rules. The LLM is instructed to answer only from retrieved context, to say "I'll connect you with a specialist" when the context doesn't cover the question, and to never speculate on regulatory or legal matters.

Intent classification and escalation

A lightweight classifier runs before the LLM on each message to identify escalation-required intents: complaints, refund requests, account suspension queries, and regulatory questions. These are routed directly to a human queue without attempting an LLM response. The escalation path is transparent to the customer — they're told immediately rather than receiving a non-answer.

Embeddable Vue 3 widget

A chat widget that embeds in the company's web platform with zero-dependency deployment (a single script tag). Maintains conversation history within a session. Handles the handoff to a human agent with full conversation transcript so the agent has context without the customer repeating themselves.

Admin knowledge base management

A dashboard where support managers upload new documents, mark outdated content for removal, and review flagged conversations (ones the chatbot couldn't answer confidently). The re-embedding pipeline runs automatically on document updates.

Results

  • ~60% reduction in customer support queries requiring human agents
  • ~55% improvement in query resolution time (instant vs. 4–6 hour queue)
  • ~40% reduction in dependency on human agents for routine queries
  • Support team capacity freed to focus on complex and high-value customer interactions
  • Customer satisfaction scores improved, driven primarily by resolution speed on routine queries

Handle the routine. Reserve your team for what matters.

Book Your Strategy Call