Milestone 3.4 — LLM orchestration + RAG
Status: Preview shipped. Keyword RAG runs on all tiers; the optional OpenAI planner runs on Team+ when EHX_OPENAI_API_KEY is set; hybrid pgvector RAG is used once the embedding index is built.
Shipped (preview)
| Surface | Notes |
|---|---|
| `rag_retriever.py` | Keyword + optional vector hybrid retrieval |
| `rag_catalog.py` | Shared catalog document index |
| `rag_embeddings_store.py` | pgvector embedding storage in PostgreSQL |
| `rag_indexer.py` | Reindexes at startup when the catalog fingerprint changes |
| `embeddings_client.py` | OpenAI embeddings (`text-embedding-3-small`) |
| `llm_orchestrator.py` | Optional OpenAI JSON planner (`gpt-4o-mini` by default) |
| `POST /chat/message` | RAG citations + rule-based intent; LLM planner when the account is Team+ and an API key is configured |
| `GET /chat/capabilities` | Reports `rag_mode` and `embeddings_indexed` |
| `GET /account/usage-summary` | Quota snapshot for the chat banner |
| `/chat` usage banner | Shows AI generations used/limit; warning at 80%, alert at 100% |
| `scripts/reindex_rag.py` | Forces a rebuild of the embedding index |
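`rag_retriever.py` is described as keyword retrieval with optional vector hybrid scoring. The sketch below illustrates that idea under stated assumptions: the `Doc` shape, the term-overlap keyword score, cosine similarity over embeddings, and the fixed blend weight `alpha` are all hypothetical, and the shipped scoring may differ. When either the query or a document has no embedding, the sketch falls back to the keyword score alone, matching the "keyword RAG only" mode.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical document shape; the real catalog index may store more fields.
@dataclass
class Doc:
    doc_id: str
    text: str
    embedding: Optional[List[float]] = None  # None when no vector is indexed


def tokenize(text: str) -> set:
    """Lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query: str, doc: Doc) -> float:
    """Fraction of query terms that appear in the document (0.0 to 1.0)."""
    q = tokenize(query)
    if not q:
        return 0.0
    return len(q & tokenize(doc.text)) / len(q)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query: str, query_embedding: Optional[List[float]],
                  docs: List[Doc], k: int = 3, alpha: float = 0.5) -> List[str]:
    """Blend keyword and vector scores; use keyword-only when vectors are missing."""
    scored = []
    for doc in docs:
        kw = keyword_score(query, doc)
        if query_embedding is not None and doc.embedding is not None:
            score = alpha * kw + (1 - alpha) * cosine(query_embedding, doc.embedding)
        else:
            score = kw
        scored.append((score, doc.doc_id))
    # Highest score first; ties broken by doc_id for deterministic citations.
    scored.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]
```

For example, `hybrid_search("account billing", None, docs, k=2)` ranks purely by term overlap, while passing a query embedding lets semantically close documents move up even when their wording differs.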
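The `/chat` usage banner switches state at 80% and 100% of the AI generation limit. A minimal sketch of that threshold logic, assuming hypothetical `used` and `limit` integers taken from the `GET /account/usage-summary` snapshot (the real response fields and banner state names are not specified here). Integer comparisons avoid floating-point edge cases right at the thresholds.

```python
def banner_level(used: int, limit: int) -> str:
    """Map AI generation usage to a banner state: warning at 80%, alert at 100%.

    The state names "ok", "warning", and "alert" are illustrative only.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if used >= limit:
        return "alert"    # at or over 100% of the quota
    if used * 100 >= limit * 80:
        return "warning"  # at or over 80% of the quota
    return "ok"
```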
Configuration
| Variable | Purpose |
|---|---|
| `EHX_OPENAI_API_KEY` | Enables the LLM planner and embeddings (optional) |
| `EHX_OPENAI_MODEL` | Planner model; default `gpt-4o-mini` |
| `EHX_LLM_ENABLED` | Set to `0` to disable the LLM planner even when a key is set |
| `EHX_EMBEDDINGS_ENABLED` | Set to `0` to disable the vector index (keyword RAG only) |
| `EHX_OPENAI_EMBEDDING_MODEL` | Embedding model; default `text-embedding-3-small` |
| `EHX_EMBEDDING_DIMENSIONS` | Embedding vector size; default `1536` |
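The variables above combine with the account tier to decide which features run. A minimal sketch of that gating, assuming: an empty key counts as unset, any value other than `0` leaves a feature enabled, the tier names in `TIER_RANK` are hypothetical (the notes only say "Team+"), and the `rag_mode` values `"hybrid"` and `"keyword"` are illustrative rather than the actual `GET /chat/capabilities` payload.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical tier ordering; only "Team+" is stated in the milestone notes.
TIER_RANK = {"free": 0, "pro": 1, "team": 2, "enterprise": 3}


@dataclass(frozen=True)
class AIConfig:
    api_key: Optional[str]
    model: str
    llm_enabled: bool
    embeddings_enabled: bool
    embedding_model: str
    embedding_dimensions: int


def load_config(env: Mapping[str, str] = os.environ) -> AIConfig:
    """Read the EHX_* variables, applying the documented defaults."""
    return AIConfig(
        api_key=env.get("EHX_OPENAI_API_KEY") or None,  # empty string treated as unset
        model=env.get("EHX_OPENAI_MODEL", "gpt-4o-mini"),
        # Assumption: "0" disables a feature; unset or any other value leaves it on.
        llm_enabled=env.get("EHX_LLM_ENABLED", "1") != "0",
        embeddings_enabled=env.get("EHX_EMBEDDINGS_ENABLED", "1") != "0",
        embedding_model=env.get("EHX_OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
        embedding_dimensions=int(env.get("EHX_EMBEDDING_DIMENSIONS", "1536")),
    )


def use_llm_planner(cfg: AIConfig, tier: str) -> bool:
    """LLM planner runs only with a key, when not disabled, and on Team+ tiers."""
    team_or_above = TIER_RANK.get(tier, 0) >= TIER_RANK["team"]
    return cfg.api_key is not None and cfg.llm_enabled and team_or_above


def rag_mode(cfg: AIConfig, embeddings_indexed: bool) -> str:
    """Hybrid retrieval needs a key, embeddings enabled, and a built index."""
    if cfg.api_key is not None and cfg.embeddings_enabled and embeddings_indexed:
        return "hybrid"
    return "keyword"
```

Passing a plain `dict` as `env` makes the gating easy to unit-test without touching the process environment.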
Postgres: requires the pgvector extension; the Compose file uses the `pgvector/pgvector:pg16` image.
Without an API key, keyword RAG still runs (no embedding cost). With a key, the embedding index is rebuilt at startup whenever the catalog fingerprint changes.
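The startup check can be sketched as a content hash compared against the fingerprint stored with the last index build. This is an illustration, not the `rag_indexer.py` implementation: hashing `(doc_id, text)` pairs together with the embedding model and dimensions is an assumption (changing either setting would invalidate stored vectors), and the storage of the previous fingerprint is left out.

```python
import hashlib
import json
from typing import Iterable, Optional, Tuple


def catalog_fingerprint(docs: Iterable[Tuple[str, str]],
                        embedding_model: str, dimensions: int) -> str:
    """Order-independent SHA-256 over catalog documents plus embedding settings."""
    h = hashlib.sha256()
    settings = json.dumps({"model": embedding_model, "dims": dimensions}, sort_keys=True)
    h.update(settings.encode("utf-8"))
    for doc_id, text in sorted(docs):
        # JSON-encoding escapes embedded newlines, so the "\n" separator
        # keeps document boundaries unambiguous.
        h.update(json.dumps([doc_id, text]).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def needs_reindex(current: str, stored: Optional[str],
                  key_present: bool, embeddings_enabled: bool) -> bool:
    """Rebuild only when embeddings can run and the fingerprint differs."""
    if not (key_present and embeddings_enabled):
        return False  # keyword-only mode: nothing to embed
    return stored != current
```

Sorting the documents first means the fingerprint does not change when the catalog is merely loaded in a different order, so restarts do not trigger needless (and billable) re-embedding.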
Not yet shipped
- Embedding refresh webhook on pack sync (manual `reindex_rag.py` for now)
- Full dashboard usage history (M3.3)
Related
- M3.4a quotas: milestone-3-4a-abuse-quota-enforcement.md
- Architecture: milestone-3-0-catalog-grounded-ai-architecture.md