Agent orchestration + production RAG · running in your browser

Ask My Store

Ask the running copilot. A router sends each question to RAG, SQL, or a hybrid path — and it cites its sources or shows the SQL.

Ask a store's data anything. Every answer comes back with a citation, or the SQL it ran.

Impact

RAG faithfulness: 0.939
SQL accuracy: 24/24
eval-gated

A copilot over a store's own data. A router reads each question and picks a path: RAG for reviews, tickets, and FAQ; text-to-SQL for orders and products; or both for a question like 'why are returns spiking on this product?' Qualitative answers cite their sources or decline; analytical ones show the query.

The problem

Questions about a store don't split cleanly by tool. 'What do people say about this product?' lives in the reviews. 'Top 5 by revenue?' is a SQL query. 'Why are returns climbing?' needs both: find the trend, then explain it. An LLM over naive vector search can invent answers and can't point to a source, and without evals nothing keeps a quality regression from shipping.

The approach

The router labels each question rag, sql, or hybrid. RAG fuses BM25 and vector search with RRF, reranks with a cross-encoder, and answers only from the retrieved passages; with nothing to support an answer, it declines instead of guessing. SQL is schema-aware, read-only, and retried when a query comes back invalid. Hybrid runs both and combines them: SQL finds the pattern, RAG explains it with citations. Two eval suites, Ragas faithfulness and SQL execution accuracy, run on every PR.
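The fusion step can be sketched in a few lines. This is an illustration of Reciprocal Rank Fusion, not the project's code: the function name `rrf_fuse`, the chunk ids, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk ids from the lexical and the vector retriever.
bm25_hits = ["review-12", "faq-3", "ticket-7"]
vector_hits = ["faq-3", "review-40", "review-12"]
fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

RRF uses only ranks, so the lexical and vector scores never need to be put on a common scale. A chunk that both retrievers rank well rises above one that only a single retriever found.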

Architecture

User question: natural language
Router (process): LLM classify → rag · sql · hybrid
BM25 lexical (process): Postgres full-text
Dense vector (process): pgvector · MiniLM 384d
text-to-SQL (process): schema-aware · read-only · validate+retry
RRF fuse + rerank (model): cross-encoder ms-marco MiniLM
Grounded generation (model): Anthropic · cite-or-decline
Answer + citations / SQL rows (out): every claim traces to a chunk

Cross-cutting

Dual evals in CI (gate): Ragas 0.939 · SQL 24/24 · GitHub Actions gate
Langfuse tracing (store): spans + token cost, every request
Deployed (deploy): FastAPI→Modal · pgvector→Supabase · UI→Vercel

· Thin router, two heavy tools: the hybrid path runs SQL and RAG together, where SQL detects the pattern and RAG explains it with citations.
· Two retrieval signals fuse via Reciprocal Rank Fusion, then a cross-encoder reranker decides what the model sees; an answer either grounds in a retrieved chunk or the system declines.
· SQL is schema-aware, read-only, and validated-then-retried, so no write path ever reaches the DB.
· Versioned prompts plus Ragas-faithfulness (0.939) and SQL-accuracy (24/24) gates block quality drift on every PR.
· Deploy is stateless FastAPI on Modal (scale-to-zero, no GPU), pgvector on Supabase, and the UI on Vercel; the backend URL stays server-side behind a Next proxy.
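A rough sketch of the read-only check and the validate-then-retry loop, under stated assumptions. `validate_read_only`, `run_with_retry`, the keyword list, and the stub generator are hypothetical names for illustration. SQLite stands in for Postgres so the example runs on its own, and a keyword filter like this would normally sit alongside a read-only database role rather than replace one.

```python
import re
import sqlite3

# Keywords that signal a write or DDL statement (illustrative, not exhaustive).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def validate_read_only(sql):
    """Return an error message, or None if this looks like one read-only query."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        return "multiple statements are not allowed"
    if not re.match(r"(select|with)\b", stripped, re.IGNORECASE):
        return "only SELECT queries are allowed"
    if FORBIDDEN.search(stripped):
        return "write or DDL keyword found"
    return None


def run_with_retry(conn, generate_sql, question, max_attempts=3):
    """Ask the generator for SQL, feeding back the last error until a query runs."""
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        error = validate_read_only(sql)
        if error is None:
            try:
                return sql, conn.execute(sql).fetchall()
            except sqlite3.Error as exc:
                error = str(exc)  # e.g. unknown column; passed to the next attempt
    raise ValueError(f"no valid query after {max_attempts} attempts: {error}")


# Demo data and a stub that plays the role of the text-to-SQL model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("mug", 30.0), ("lamp", 80.0)])

attempts = iter([
    "DELETE FROM orders",                                        # rejected: not a SELECT
    "SELECT product FROM orders ORDER BY revenu DESC",           # fails: bad column
    "SELECT product FROM orders ORDER BY revenue DESC LIMIT 1",  # runs
])


def stub_generator(question, error):
    return next(attempts)


sql, rows = run_with_retry(conn, stub_generator, "Top product by revenue?")
```

The error string from a failed attempt goes back to the generator, which is what lets a second attempt correct a wrong column name instead of repeating it.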

How it was built

Phase 1

Walking skeleton

✓ Seed products / reviews / tickets / FAQ + orders with deliberate patterns
✓ Chunk → embed into pgvector; naive top-k retrieval that cites the source
✓ Basic text-to-SQL (schema-in-prompt, read-only) and a POST /ask that routes
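The routing behind a handler like POST /ask might look like the sketch below. It is framework-free so it runs as-is. The `Answer` shape, the fallback to the hybrid path, and the keyword classifier (a stand-in for the LLM call) are all assumptions, not the project's code.

```python
from dataclasses import dataclass, field
from typing import List

ROUTES = ("rag", "sql", "hybrid")


@dataclass
class Answer:
    route: str
    text: str
    citations: List[str] = field(default_factory=list)
    sql: str = ""


def ask(question, classify, handlers):
    """Label the question, then hand it to the matching handler."""
    route = classify(question).strip().lower()
    if route not in ROUTES:
        # Assumed fallback: an unrecognized label takes the hybrid path.
        route = "hybrid"
    return handlers[route](question)


def keyword_classifier(question):
    """Stand-in for the LLM classifier, which would prompt a model instead."""
    q = question.lower()
    if q.startswith("why"):
        return "hybrid"
    if any(word in q for word in ("top", "revenue", "how many")):
        return "sql"
    return "rag"


# Stub handlers; real ones would call retrieval and text-to-SQL.
handlers = {
    "rag": lambda q: Answer("rag", "stub", citations=["review-12"]),
    "sql": lambda q: Answer("sql", "stub", sql="SELECT 1"),
    "hybrid": lambda q: Answer("hybrid", "stub", citations=["ticket-7"], sql="SELECT 1"),
}

result = ask("Top 5 by revenue?", keyword_classifier, handlers)
```

Keeping the router to a single label keeps it thin: every other decision lives in the handler for that path.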

Phase 2

Production retrieval + hybrid

✓ Hybrid retrieval: BM25 + vector fused with Reciprocal Rank Fusion
✓ Cross-encoder reranker + near-duplicate dedup; citation-or-decline gate
✓ SQL hardening (validate + retry) and trend/time-series support
✓ Hybrid path: SQL detects the pattern, RAG explains it, in one fused answer
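One way the dedup and citation-or-decline steps could fit together, as a sketch. The source does not say how either is implemented, so the word-level Jaccard measure, both thresholds, and the chunk data here are assumptions. The `score` field stands in for a cross-encoder relevance score.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def dedup(chunks, threshold=0.8):
    """Drop chunks that nearly duplicate an earlier (higher-ranked) one."""
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk["text"], k["text"]) < threshold for k in kept):
            kept.append(chunk)
    return kept


def select_context(reranked, min_score=0.3, top_n=3):
    """Return the chunks the model may cite, or None to signal a decline."""
    supported = [c for c in reranked if c["score"] >= min_score]
    context = dedup(supported)[:top_n]
    return context or None


# Hypothetical reranker output, best first.
reranked = [
    {"id": "review-12", "score": 0.91, "text": "The zipper broke after two weeks"},
    {"id": "review-40", "score": 0.88, "text": "the zipper broke after two weeks"},
    {"id": "ticket-7", "score": 0.52, "text": "Customer reports a stuck zipper on arrival"},
    {"id": "faq-3", "score": 0.05, "text": "Shipping takes three to five days"},
]
context = select_context(reranked)
cited = [c["id"] for c in context] if context else []
```

When `select_context` returns None, the caller skips generation and declines, so the model is never asked to answer without a passage to cite.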

Phase 3

Evals + observability + CI

✓ RAG faithfulness golden set (50 verified Q&A) scored with Ragas: mean 0.939
✓ SQL execution-accuracy harness: 24/24 on the golden set
✓ Langfuse tracing on every step; both evals wired into GitHub Actions as merge gates
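A merge gate over the two suites can be a small script that returns a nonzero exit code on regression. The sketch below assumes the per-question scores are already computed (for example by Ragas and the SQL harness). The thresholds and input values are placeholders, not the project's cut-offs or results, which the source does not state.

```python
# Assumed thresholds; the source reports scores but not the cut-offs.
MIN_FAITHFULNESS = 0.90
MIN_SQL_ACCURACY = 1.0


def gate(faithfulness_scores, sql_results):
    """Return failure messages; an empty list means the gate passes.

    Both inputs are assumed non-empty: a list of floats in [0, 1] and a
    list of booleans (True when the generated query returned the expected rows).
    """
    failures = []
    mean_faith = sum(faithfulness_scores) / len(faithfulness_scores)
    if mean_faith < MIN_FAITHFULNESS:
        failures.append(f"faithfulness {mean_faith:.3f} < {MIN_FAITHFULNESS}")
    passed, total = sum(sql_results), len(sql_results)
    if passed / total < MIN_SQL_ACCURACY:
        failures.append(f"SQL accuracy {passed}/{total} below target")
    return failures


def main(faithfulness_scores, sql_results):
    """Print any failures and return the process exit code."""
    failures = gate(faithfulness_scores, sql_results)
    for message in failures:
        print("FAIL:", message)
    return 1 if failures else 0


# Placeholder inputs for illustration only.
exit_code = main([0.95, 0.93], [True] * 24)
```

In CI the script would end with `sys.exit(main(...))`; GitHub Actions fails a step on a nonzero exit code, which is what blocks the merge.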

Stack

FastAPI · Postgres + pgvector · sentence-transformers · BM25 · cross-encoder rerank · Anthropic · Ragas · Langfuse · GitHub Actions · Next.js

Source

GitHub ↗ · Eval report

miskelvilaly@gmail.com