Ask anything about a store — its reviews, orders, and tickets — and get an answer that either cites the exact source or shows the SQL behind the number.
Impact
An agentic store copilot. A router sends each question down RAG (cited answers from reviews/tickets/FAQ), text-to-SQL (analytics over orders/products), or a hybrid path that detects a pattern with SQL and explains it with cited RAG. Hybrid retrieval + reranker, citation-or-decline, and dual offline evals gated in CI.
The problem
Store teams ask questions that don’t fit one tool. “What do customers say about product X?” lives in reviews; “top 5 products by revenue?” is SQL; “why are returns spiking on X?” needs both — a measured pattern and a grounded explanation. Plain vector search hallucinates and can’t show its work, and nothing stops a bad change from shipping.
The approach
A thin router classifies each question (rag / sql / hybrid). RAG fuses BM25 + dense vectors, reranks with a cross-encoder, and grounds every claim in retrieved chunks — declining when support is missing. Text-to-SQL is schema-aware and executed read-only with validate-and-retry. The hybrid path runs both: SQL detects the spike, RAG explains the cause with citations. Quality is made measurable — a Ragas faithfulness golden set and a SQL execution-accuracy harness both gate every PR in CI.
Architecture
IN
User question
natural language
↓
PROCESS
Router
LLM classify → rag · sql · hybrid
↓
PROCESS
BM25 lexical
Postgres full-text
PROCESS
Dense vector
pgvector · MiniLM 384d
PROCESS
text-to-SQL
schema-aware · read-only · validate+retry
↓
MODEL
RRF fuse + rerank
cross-encoder ms-marco MiniLM
↓
MODEL
Grounded generation
Anthropic · cite-or-decline
↓
OUT
Answer + citations / SQL rows
every claim traces to a chunk
cross-cutting
GATE
Dual evals in CI
Ragas 0.939 · SQL 24/24 · GitHub Actions gate
STORE
Langfuse tracing
spans + token cost, every request
DEPLOY
Deployed
FastAPI→Modal · pgvector→Supabase · UI→Vercel
- ·Thin router, two heavy tools: the hybrid path runs SQL and RAG together — SQL detects the pattern, RAG explains it with citations.
- ·Two retrieval signals fuse via Reciprocal Rank Fusion, then a cross-encoder reranker decides what the model sees; an answer either grounds in a retrieved chunk or the system declines.
- ·SQL is schema-aware, read-only, and validated-then-retried — no write path ever reaches the DB.
- ·Versioned prompts + Ragas-faithfulness (0.939) and SQL-accuracy (24/24) gates block quality drift on every PR.
- ·Deploy is stateless FastAPI on Modal (scale-to-zero GPU-free), pgvector on Supabase, UI on Vercel — backend URL stays server-side behind a Next proxy.
How it was built
Phase 1
Walking skeleton
- ✓Seed products / reviews / tickets / FAQ + orders with deliberate patterns
- ✓Chunk → embed into pgvector; naive top-k retrieval that cites the source
- ✓Basic text-to-SQL (schema-in-prompt, read-only) and a POST /ask that routes
Phase 2
Production retrieval + hybrid
- ✓Hybrid retrieval: BM25 + vector fused with Reciprocal Rank Fusion
- ✓Cross-encoder reranker + near-duplicate dedup; citation-or-decline gate
- ✓SQL hardening (validate + retry) and trend/time-series support
- ✓Hybrid path: SQL detects the pattern, RAG explains it — one fused answer
Phase 3
Evals + observability + CI
- ✓RAG faithfulness golden set (50 verified Q&A) scored with Ragas — mean 0.939
- ✓SQL execution-accuracy harness — 24/24 on the golden set
- ✓Langfuse tracing on every step; both evals wired into GitHub Actions as merge gates
Stack
FastAPIPostgres + pgvectorsentence-transformersBM25cross-encoder rerankAnthropicRagasLangfuseGitHub ActionsNext.js