Sales-quote assistant with RAG over internal catalog

The problem

The sales team took hours to build quotes for B2B clients. Catalog data lived in a legacy ERP, pricing in a separate module, and per-client discounts in local spreadsheets. Each quote required opening three different systems and manually verifying availability.

“What killed us wasn’t building the quote — it was the back-and-forth on stock and pricing.” — Sales lead, first technical session

Why the previous attempt failed

A prior pilot wired a chatbot directly to SAP via API. It failed on two counts: answers were slow because every query hit the ERP live, and the model couldn’t reason about contextual discounts because it had no access to order history.

What we built

An agent with RAG over three sources:

Product catalog vectorized with pgvector and incremental ingestion every 15 minutes.
Order history indexed per client to infer discounts and recurring volumes.
Pricing policy authored in MDX and embedded with a dedicated loader.
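
To illustrate the incremental ingestion mentioned for the catalog, here is a minimal sketch of a watermark-based pass that re-embeds only rows changed since the previous run. Everything named here is an assumption for illustration: the `CatalogRow` fields, the `updated_at` column, the `embed` callable, and the in-memory `index` dict standing in for a pgvector table. It is not the project's actual loader.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class CatalogRow:
    sku: str
    description: str
    updated_at: datetime  # hypothetical "last modified" column in the catalog


def incremental_ingest(
    rows: Iterable[CatalogRow],
    watermark: datetime,
    embed: Callable[[str], List[float]],
    index: Dict[str, List[float]],
) -> datetime:
    """Embed only rows modified after `watermark`; return the new watermark."""
    newest = watermark
    for row in rows:
        if row.updated_at <= watermark:
            continue  # unchanged since the previous run, skip re-embedding
        # Upsert keyed by SKU, so a changed product overwrites its old vector.
        index[row.sku] = embed(row.description)
        newest = max(newest, row.updated_at)
    return newest
```

A scheduler would call this every 15 minutes and persist the returned watermark between runs. Keying the upsert on SKU keeps one vector per product, so a price or description change never leaves a stale duplicate in the index.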

The agent drafts a quote in seconds. A human always validates it before it is sent: human-in-the-loop review is mandatory on critical transactions, not optional.
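
One way to make the human validation step impossible to skip is to enforce it in the quote's state transitions rather than in the UI. The sketch below assumes a hypothetical `Quote` record with three states; the names `Status`, `approve`, and `send` are illustrative and not taken from the project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    SENT = "sent"


@dataclass
class Quote:
    client_id: str
    total: float
    status: Status = Status.DRAFT
    approved_by: Optional[str] = None


def approve(quote: Quote, reviewer: str) -> None:
    """Record the human reviewer who validated the agent's draft."""
    if not reviewer:
        raise ValueError("an approval needs a named reviewer")
    if quote.status is not Status.DRAFT:
        raise ValueError("only drafts can be approved")
    quote.status = Status.APPROVED
    quote.approved_by = reviewer


def send(quote: Quote) -> None:
    """Refuse to send anything a human has not approved."""
    if quote.status is not Status.APPROVED:
        raise PermissionError("quote must be approved by a human before sending")
    quote.status = Status.SENT
```

Because `send` checks the state itself, the agent cannot send a draft even if an upstream step misbehaves.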

Stack

Model: Llama 3.1 70B on on-prem GPU
Vector DB: Postgres + pgvector (same cluster as the ERP)
Orchestration: LangGraph with guardrails on numeric figures
Observability: OpenTelemetry → Grafana
Privacy: data never leaves the client’s datacenter
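
A guardrail on figures can be as simple as a deterministic check that runs after the model drafts a quote and before a human sees it. The sketch below is an assumption about what such a check might look like: the line-item shape, the `price_list` mapping, and the `max_discount` bound are all hypothetical, and the function is plain Python that an orchestration step could call, not LangGraph API code.

```python
from decimal import Decimal
from typing import Dict, List


def check_figures(
    lines: List[dict],
    price_list: Dict[str, Decimal],
    max_discount: Decimal,
) -> List[str]:
    """Return a list of violations; an empty list means the draft passes."""
    problems = []
    for line in lines:
        sku = line["sku"]
        if sku not in price_list:
            problems.append(f"{sku}: unknown SKU")
            continue
        list_price = price_list[sku]
        # Lowest unit price allowed under the assumed discount policy.
        floor = list_price * (Decimal("1") - max_discount)
        unit = line["unit_price"]
        if unit > list_price or unit < floor:
            problems.append(
                f"{sku}: unit price {unit} outside [{floor}, {list_price}]"
            )
    return problems
```

Using `Decimal` rather than `float` avoids rounding drift on money, and returning every violation at once gives the reviewer a complete list instead of one error per retry.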

How we shipped it

Weeks 1-2: ingestion and QA against 500 historical quotes.
Weeks 3-4: pilot with 5 reps; prompt and confidence-threshold tuning.
Weeks 5-8: gradual rollout to the full sales force (~80 people) with adoption tracking.
Week 9+: steady-state operation under a 99.5% SLA, with a bi-weekly release cadence.
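
The confidence-threshold tuning from the pilot phase can be sketched as a small search over labeled pilot drafts. This assumes each draft carries a confidence score and a reviewer verdict, and that drafts below the threshold get flagged for closer review; the function name `pick_threshold` and the data shape are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    samples: List[Tuple[float, bool]],
    target_precision: float,
) -> Optional[float]:
    """Return the lowest confidence threshold whose accepted drafts meet
    the target precision, or None if no threshold does.

    samples: (confidence, reviewer_marked_correct) pairs from the pilot.
    """
    for threshold in sorted({conf for conf, _ in samples}):
        accepted = [ok for conf, ok in samples if conf >= threshold]
        # `accepted` is never empty: the threshold is itself a sample score.
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            return threshold
    return None
```

Picking the lowest threshold that qualifies keeps as many drafts as possible on the fast path while still meeting the chosen quality bar.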

Outcomes

Illustrative figures based on comparable projects: quote time cut from hours to ~4 minutes, 97% accuracy versus the human benchmark, and 80% adoption across the sales force by the end of rollout.

Technical demo · capability showcase, not an NDA project