The problem
The sales team took hours to build quotes for B2B clients. Catalog data lived in a legacy ERP, pricing in a separate module, and per-client discounts in local spreadsheets. Each quote required opening three different systems and manually verifying availability.
“What killed us wasn’t building the quote — it was the back-and-forth on stock and pricing.” — Sales lead, first technical session
Why the previous attempt failed
A prior pilot wired a chatbot directly to SAP through its API. It failed on two counts: answers were slow because every query hit the ERP live, and the model could not reason about contextual (per-client) discounts because it had no access to order history.
What we built
An agent using retrieval-augmented generation (RAG) over three sources:
- Product catalog, embedded into pgvector and kept current by incremental ingestion every 15 minutes.
- Order history indexed per client to infer discounts and recurring volumes.
- Pricing policy authored in MDX and embedded with a dedicated loader.
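The 15-minute incremental ingestion can be sketched as a timestamp watermark plus a content hash. This is a minimal in-memory sketch, not the production pipeline: `CatalogRow`, `VectorIndex`, `fake_embed`, and `incremental_sync` are hypothetical names, the dict stands in for a pgvector table, the list filter stands in for a `WHERE updated_at >= ...` query against the ERP, and the embedding function is a deterministic placeholder rather than a real model.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CatalogRow:
    sku: str
    description: str
    updated_at: datetime  # last-modified timestamp from the ERP (assumed to exist)


def fake_embed(text: str) -> list[float]:
    # Placeholder for the real embedding model: deterministic but NOT semantic.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:8]]


class VectorIndex:
    """In-memory stand-in for a pgvector table keyed by SKU."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}
        self.hashes: dict[str, str] = {}

    def upsert(self, sku: str, text: str) -> bool:
        """Embed and store `text` unless it is unchanged; return True if embedded."""
        content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(sku) == content_hash:
            return False  # same content as last run: skip the embedding call
        self.vectors[sku] = fake_embed(text)
        self.hashes[sku] = content_hash
        return True


def incremental_sync(
    rows: list[CatalogRow], index: VectorIndex, watermark: datetime
) -> tuple[datetime, int]:
    """Process rows modified at or after `watermark`.

    Returns the new watermark and the number of rows actually re-embedded.
    """
    # `>=` re-reads rows stamped exactly at the watermark so late commits are
    # not missed; the content hash keeps that overlap from re-embedding anything.
    changed = [r for r in rows if r.updated_at >= watermark]
    embedded = sum(index.upsert(r.sku, r.description) for r in changed)
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return new_watermark, embedded
```

A scheduler would call `incremental_sync` every 15 minutes and persist the returned watermark between runs; only rows whose text actually changed cost an embedding call.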
The agent drafts a quote in seconds. A human always validates it before it is sent: human-in-the-loop is mandatory on critical transactions.
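The human-in-the-loop rule is easiest to enforce structurally, so that the delivery path refuses anything a person has not approved. A minimal sketch, assuming a simple status field on each quote; `Quote`, `QuoteStatus`, `approve`, `send`, and `ApprovalRequired` are hypothetical, and the `outbox` list stands in for whatever channel actually delivers quotes to clients.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class QuoteStatus(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    SENT = "sent"


class ApprovalRequired(Exception):
    """Raised when something tries to send a quote no human has approved."""


@dataclass
class Quote:
    client_id: str
    # (sku, quantity, unit_price) tuples produced by the agent
    lines: list[tuple[str, int, float]] = field(default_factory=list)
    status: QuoteStatus = QuoteStatus.DRAFT
    reviewer: Optional[str] = None


def approve(quote: Quote, reviewer: str) -> None:
    """A sales rep signs off on an agent-drafted quote."""
    if quote.status is not QuoteStatus.DRAFT:
        raise ValueError(f"cannot approve a quote in state {quote.status.value!r}")
    quote.status = QuoteStatus.APPROVED
    quote.reviewer = reviewer


def send(quote: Quote, outbox: list[Quote]) -> None:
    """Deliver a quote: the agent can draft, but only approved quotes leave."""
    if quote.status is not QuoteStatus.APPROVED:
        raise ApprovalRequired(f"quote for {quote.client_id} has no human approval")
    outbox.append(quote)
    quote.status = QuoteStatus.SENT
```

Putting the check inside `send`, rather than relying on instructions in the agent's prompt, means no model output can skip review.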
Stack
- Model: Llama 3.1 70B on on-prem GPU hardware
- Vector DB: Postgres + pgvector (same cluster as the ERP)
- Orchestration: LangGraph, with guardrails on numeric figures
- Observability: OpenTelemetry → Grafana
- Privacy: data never leaves the client’s datacenter
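The write-up does not detail the figure guardrails. One plausible reading, sketched here purely as an assumption, is a deterministic check that every number in a draft agrees with the pricing data before a rep sees it. `DraftLine`, `check_figures`, the price-list dict, and the `max_discount` cap are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class DraftLine:
    sku: str
    quantity: int
    unit_price: Decimal


def check_figures(
    lines: list[DraftLine],
    list_prices: dict[str, Decimal],
    max_discount: Decimal,
    stated_total: Decimal,
) -> list[str]:
    """Return human-readable violations; an empty list means the figures pass."""
    problems: list[str] = []
    computed_total = Decimal("0")
    for line in lines:
        base = list_prices.get(line.sku)
        if base is None:
            # The model must not invent SKUs or prices absent from the catalog.
            problems.append(f"{line.sku}: not in the price list")
            continue
        if line.quantity <= 0:
            problems.append(f"{line.sku}: non-positive quantity {line.quantity}")
        floor = base * (1 - max_discount)  # lowest unit price the cap allows
        if line.unit_price > base:
            problems.append(f"{line.sku}: unit price {line.unit_price} above list {base}")
        elif line.unit_price < floor:
            problems.append(f"{line.sku}: unit price {line.unit_price} below floor {floor}")
        computed_total += line.unit_price * line.quantity
    # Recompute the total instead of trusting the figure the model wrote.
    if computed_total != stated_total:
        problems.append(f"total mismatch: stated {stated_total}, computed {computed_total}")
    return problems
```

`Decimal` avoids float rounding drift in money arithmetic. A non-empty result could send the draft back to the agent or flag it for the reviewing rep.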
How we shipped it
- Weeks 1-2: ingestion, plus quality assurance against 500 historical quotes.
- Weeks 3-4: pilot with 5 reps; prompt and confidence-threshold tuning.
- Weeks 5-8: gradual rollout to the full sales force (~80 people) with adoption tracking.
- Week 9+: production operation under a 99.5% SLA, with a bi-weekly release cadence.
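The QA against historical quotes and the confidence-threshold tuning fit together: replay past requests, compare the agent's totals with the human-built quotes, and pick the lowest threshold that keeps accuracy above a target. This is a sketch under assumptions: `EvalCase`, `evaluate`, `pick_threshold`, the 1% relative tolerance, and the premise that the agent emits a confidence score are illustrative, not the project's actual metric.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EvalCase:
    quote_id: str
    human_total: float  # total on the historical, human-built quote
    agent_total: float  # total the agent drafted for the same request
    confidence: float   # agent-reported confidence in [0, 1] (assumed to exist)


def evaluate(
    cases: Sequence[EvalCase], threshold: float, rel_tol: float = 0.01
) -> tuple[float, float]:
    """Return (coverage, accuracy) for drafts with confidence >= threshold.

    Coverage is the share of cases drafted at or above the threshold; accuracy is
    the share of those whose total is within `rel_tol` of the human total.
    """
    kept = [c for c in cases if c.confidence >= threshold]
    if not kept:
        return 0.0, math.nan
    correct = sum(
        abs(c.agent_total - c.human_total) <= rel_tol * abs(c.human_total) for c in kept
    )
    return len(kept) / len(cases), correct / len(kept)


def pick_threshold(
    cases: Sequence[EvalCase], thresholds: Sequence[float], min_accuracy: float
) -> Optional[float]:
    """Lowest threshold meeting `min_accuracy`, i.e. the one with the most coverage."""
    for t in sorted(thresholds):
        coverage, accuracy = evaluate(cases, t)
        if coverage > 0 and accuracy >= min_accuracy:
            return t
    return None  # no candidate threshold meets the target
```

Because coverage can only fall as the threshold rises, the first passing threshold in ascending order is also the one that flags the fewest drafts. A real harness would likely compare quotes line by line rather than on totals alone.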
Outcomes
Illustrative figures based on comparable projects: quote time from hours to ~4 minutes, 97% accuracy versus the human benchmark, and 80% adoption across the sales force by the end of rollout.
Technical demo · capability showcase, not an NDA project