
Milestone 3.4 — LLM orchestration + RAG

Status: Preview shipped. Keyword RAG runs on all tiers; the optional OpenAI planner runs on Team+ when EHX_OPENAI_API_KEY is set; hybrid (keyword + pgvector) RAG is used once the embedding index has been built.
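The difference between keyword-only and hybrid RAG can be illustrated with a small retriever. This page does not say how rag_retriever.py scores or merges results, so the sketch below is only an illustration. It assumes a weighted blend of a token-overlap keyword score and cosine similarity, and it falls back to the keyword score alone when no embeddings are available. The names Doc, retrieve and alpha are hypothetical.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Doc:
    doc_id: str
    text: str
    # None when the vector index has not been built (keyword RAG only).
    embedding: Optional[List[float]] = None


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query: str, doc: Doc) -> float:
    """Fraction of query tokens that also appear in the document."""
    q = tokenize(query)
    if not q:
        return 0.0
    return len(q & tokenize(doc.text)) / len(q)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: List[Doc],
             query_embedding: Optional[List[float]] = None,
             k: int = 3, alpha: float = 0.5) -> List[str]:
    """Hypothetical hybrid retrieval: blend keyword and vector scores when
    both embeddings exist, otherwise use the keyword score alone."""
    results = []
    for doc in docs:
        kw = keyword_score(query, doc)
        if query_embedding is not None and doc.embedding is not None:
            score = alpha * kw + (1 - alpha) * cosine(query_embedding, doc.embedding)
        else:
            score = kw
        results.append((score, doc.doc_id))
    # Highest score first; ties broken by doc_id for deterministic output.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for score, doc_id in results[:k] if score > 0]
```

In keyword-only mode, a document that shares no terms with the query is never returned. Once embeddings exist, the vector term can surface such a document when it is semantically close to the query.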

Shipped (preview)

| Surface | Notes |
| --- | --- |
| rag_retriever.py | Keyword retrieval, with optional keyword + vector hybrid retrieval |
| rag_catalog.py | Shared catalog document index |
| rag_embeddings_store.py | pgvector storage in PostgreSQL |
| rag_indexer.py | Reindexes at startup when the catalog fingerprint changes |
| embeddings_client.py | OpenAI embeddings (text-embedding-3-small) |
| llm_orchestrator.py | Optional OpenAI JSON planner (gpt-4o-mini by default) |
| POST /chat/message | Returns RAG citations and rule-based intent; uses the LLM on Team+ when an API key is configured |
| GET /chat/capabilities | Reports rag_mode and embeddings_indexed |
| GET /account/usage-summary | Quota snapshot for the chat usage banner |
| /chat usage banner | Shows AI generations used/limit; warning at 80%, alert at 100% |
| scripts/reindex_rag.py | Forces a rebuild of the embedding index |
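The usage banner's thresholds amount to a small pure function. The sketch below assumes that the quota snapshot from GET /account/usage-summary provides a "used" count and a "limit". Its actual field names are not documented here. It also treats a missing or non-positive limit as "no quota", which is an assumption. The function names are hypothetical.

```python
from typing import Optional

WARN_PERCENT = 80    # banner shows a warning at 80% of the quota
ALERT_PERCENT = 100  # banner shows an alert at 100% of the quota


def banner_level(used: int, limit: Optional[int]) -> str:
    """Return "none", "warning" or "alert" for the chat usage banner."""
    if limit is None or limit <= 0:
        return "none"  # assumption: no limit means no quota banner
    # Integer comparison avoids floating-point rounding right at the threshold.
    if used * 100 >= limit * ALERT_PERCENT:
        return "alert"
    if used * 100 >= limit * WARN_PERCENT:
        return "warning"
    return "none"


def banner_text(used: int, limit: Optional[int]) -> str:
    """Render the used/limit counter shown in the banner."""
    shown = "unlimited" if limit is None or limit <= 0 else str(limit)
    return f"AI generations used: {used}/{shown}"
```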

Configuration

| Variable | Purpose |
| --- | --- |
| EHX_OPENAI_API_KEY | Enables the LLM planner and embeddings (optional) |
| EHX_OPENAI_MODEL | OpenAI chat model; default gpt-4o-mini |
| EHX_LLM_ENABLED | Set to 0 to disable the LLM even when a key is set |
| EHX_EMBEDDINGS_ENABLED | Set to 0 to disable the vector index (keyword RAG only) |
| EHX_OPENAI_EMBEDDING_MODEL | Embedding model; default text-embedding-3-small |
| EHX_EMBEDDING_DIMENSIONS | Embedding vector size; default 1536 |
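The variables above combine with the plan tier and the index state to decide which features are active. The sketch below shows one way to derive a capabilities snapshot. It is not the service's actual code. It assumes the following:

- Flags are on by default, and only the literal value "0" disables them.
- Hybrid retrieval needs an API key, because query embeddings must be computed at request time.
- The caller already knows whether the account is Team or higher.

Only rag_mode and embeddings_indexed appear in the documented GET /chat/capabilities response. The rag_mode values and the remaining keys are hypothetical.

```python
from typing import Any, Dict, Mapping


def flag_enabled(env: Mapping[str, str], name: str) -> bool:
    """Assumption: flags default to on; only the literal "0" turns them off."""
    return env.get(name, "1").strip() != "0"


def capabilities(env: Mapping[str, str], team_plus: bool,
                 embeddings_indexed: bool) -> Dict[str, Any]:
    has_key = bool(env.get("EHX_OPENAI_API_KEY", "").strip())
    # LLM planner: Team+ only, needs a key, and can be switched off explicitly.
    llm_enabled = has_key and team_plus and flag_enabled(env, "EHX_LLM_ENABLED")
    # Hybrid RAG: needs a key (for query embeddings), the flag, and a built index.
    use_vectors = (has_key and flag_enabled(env, "EHX_EMBEDDINGS_ENABLED")
                   and embeddings_indexed)
    return {
        "rag_mode": "hybrid" if use_vectors else "keyword",  # hypothetical values
        "embeddings_indexed": embeddings_indexed,
        "llm_enabled": llm_enabled,  # hypothetical key
        "chat_model": env.get("EHX_OPENAI_MODEL", "gpt-4o-mini"),
        "embedding_model": env.get("EHX_OPENAI_EMBEDDING_MODEL",
                                   "text-embedding-3-small"),
        "embedding_dimensions": int(env.get("EHX_EMBEDDING_DIMENSIONS", "1536")),
    }
```

In a running service the first argument would typically be os.environ. Passing a plain mapping keeps the function easy to test.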

Postgres: the pgvector extension is required; the Docker Compose setup uses the pgvector/pgvector:pg16 image.

Without an API key, keyword RAG still runs and no embedding costs are incurred. With a key, the embedding index is rebuilt at startup whenever the catalog changes.
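The startup rebuild depends on detecting catalog changes with a fingerprint. This page does not say how rag_indexer.py computes that fingerprint. The sketch below assumes a SHA-256 hash over the catalog serialized as key-sorted JSON. It models the embedding call as an injected function, so no OpenAI client is needed. The force parameter mirrors the manual rebuild that scripts/reindex_rag.py performs. All function names are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def catalog_fingerprint(docs: Dict[str, str]) -> str:
    """Hash the catalog so any added, removed or edited document changes it."""
    # sort_keys makes the fingerprint independent of dict insertion order.
    payload = json.dumps(docs, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reindex_if_changed(docs: Dict[str, str],
                       stored_fingerprint: Optional[str],
                       embed: Callable[[str], Vector],
                       force: bool = False) -> Tuple[str, Optional[Dict[str, Vector]]]:
    """Return (fingerprint, new vectors), or (fingerprint, None) if unchanged."""
    current = catalog_fingerprint(docs)
    if not force and current == stored_fingerprint:
        return current, None  # catalog unchanged: skip embedding cost
    vectors = {doc_id: embed(text) for doc_id, text in sorted(docs.items())}
    return current, vectors
```

Storing the returned fingerprint alongside the vectors lets the next startup skip the rebuild, and therefore the embedding cost, when nothing in the catalog has changed.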

Not yet shipped

  • Embedding refresh webhook on pack sync (for now, run scripts/reindex_rag.py manually)
  • Full dashboard usage history (M3.3)