Build Hybrid Search with Postgres
Combine BM25 full-text search and pgvector embeddings for enterprise workplace search. Replace Elasticsearch with Postgres for roughly 30% cost savings. A production implementation guide.
I recently built a self-hosted hybrid search system that consolidates data from Google Drive, Slack, Confluence, and GitHub—without touching Elasticsearch, Pinecone, or any dedicated vector database. The entire stack runs on Postgres.
This wasn’t a compromise. Hybrid search combining BM25 full-text search with pgvector embeddings delivered better results than either approach alone, while keeping infrastructure dead simple: one database, one Docker container, zero specialized services.
Here’s what I learned building production hybrid search on Postgres.
Why Hybrid Search Beats Traditional Search
Vector search alone misses exact keyword matches. Full-text search struggles with semantic similarity. Most workplace search needs both.
When someone searches “kubernetes pod restart errors”, they want:
- Exact matches for “pod restart” (BM25)
- Semantically similar content about container failures (vector)
- Ranked results combining both signals
Traditional approaches split this across multiple systems. Postgres can handle both in a single query.
The Stack: ParadeDB and pgvector
Two Postgres extensions enable hybrid search:
ParadeDB provides BM25 full-text search using Tantivy (the Rust search library powering Quickwit). It’s Elasticsearch-quality full-text search inside Postgres.
pgvector handles vector similarity with HNSW indexes for fast approximate nearest neighbor search.
Both run as Postgres extensions. No external services, no data synchronization headaches.
-- Install extensions (ParadeDB's BM25 search ships as the pg_search extension)
CREATE EXTENSION pg_search;
CREATE EXTENSION vector;
-- Create search table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
source TEXT NOT NULL, -- 'slack', 'gdrive', etc
title TEXT,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI ada-002 dimensions
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Building the BM25 Index
ParadeDB’s BM25 index creation is straightforward:
-- Create BM25 index for full-text search
CALL paradedb.create_bm25(
index_name => 'search_idx',
table_name => 'documents',
key_field => 'id',
text_fields => '{
"title": {"tokenizer": "en_stem"},
"content": {"tokenizer": "en_stem"}
}'::jsonb
);
The en_stem tokenizer handles English language stemming, so searches for “running” match “run” and “runs”.
BM25 scoring accounts for term frequency and document length, giving better relevance than basic tsvector full-text search.
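Once the index is in place, a BM25-only query uses the @@@ operator and paradedb.score(), the same syntax the hybrid query later in this post relies on:
-- BM25-only search, ranked by relevance
SELECT id, title, paradedb.score(id) AS bm25_score
FROM documents
WHERE documents @@@ paradedb.parse('pod restart')
ORDER BY bm25_score DESC
LIMIT 10;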
Creating Vector Embeddings
I use OpenAI’s text-embedding-ada-002 model, but you can swap in any embedding provider. The key is consistency—all documents and queries must use the same model and dimensions.
import openai  # pre-1.0 openai client; the Embedding API below was removed in openai>=1.0
from typing import List

def generate_embedding(text: str) -> List[float]:
    """Generate embedding for text using OpenAI."""
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response['data'][0]['embedding']

def index_document(conn, source: str, title: str, content: str):
    """Index document with both text and vector."""
    # Combine title and content for richer embeddings
    text = f"{title}\n\n{content}"
    embedding = generate_embedding(text)
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO documents (source, title, content, embedding)
            VALUES (%s, %s, %s, %s::vector)
            """,
            # psycopg2 sends the list as a float array; the explicit ::vector
            # cast converts it (pgvector 0.5+ ships array-to-vector casts)
            (source, title, content, embedding)
        )
    conn.commit()  # psycopg2 does not autocommit by default
I batch embedding generation to reduce API calls—process 100 documents at once instead of one-by-one. OpenAI’s API supports batch requests up to 2048 inputs.
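Here's a minimal sketch of that batching, using the same pre-1.0 openai client as the code above (embed_batch, embed_all, and BATCH_SIZE are my names; the Embedding endpoint accepts a list of inputs):
import openai
from typing import List

BATCH_SIZE = 100  # documents per API call

def embed_batch(texts: List[str]) -> List[List[float]]:
    """Embed a batch of texts in a single API call."""
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=texts  # the endpoint accepts up to 2048 inputs per request
    )
    # Each result carries an 'index' field; sort to restore input order
    data = sorted(response['data'], key=lambda d: d['index'])
    return [d['embedding'] for d in data]

def embed_all(texts: List[str]) -> List[List[float]]:
    """Embed an entire corpus in chunks of BATCH_SIZE."""
    embeddings: List[List[float]] = []
    for i in range(0, len(texts), BATCH_SIZE):
        embeddings.extend(embed_batch(texts[i:i + BATCH_SIZE]))
    return embeddings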
Vector Index Optimization
pgvector’s HNSW (Hierarchical Navigable Small World) index provides fast approximate nearest neighbor search:
-- Create HNSW index for vector similarity
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
The parameters matter:
- m: Max connections per node (default 16). Higher = better recall, more memory
- ef_construction: Size of dynamic candidate list during construction (default 64). Higher = better index quality, slower builds
I tested various values on 100k documents. m=16, ef_construction=64 hit the sweet spot—recall above 0.95 with sub-100ms query times.
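pgvector also exposes a query-time knob I didn't tune in those tests: hnsw.ef_search (default 40) sets the candidate list size at search time, trading latency for recall without rebuilding the index:
-- Higher = better recall, slower queries; session-scoped setting
SET hnsw.ef_search = 100;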
For smaller datasets (<10k docs), you can skip the index entirely and let Postgres do a brute-force sequential scan. It's fast enough, and it returns exact rather than approximate neighbors.
Hybrid Search Implementation
The magic happens when combining BM25 and vector search in a single query. Postgres CTEs make this elegant:
-- Hybrid search combining BM25 and vector similarity
WITH bm25_results AS (
SELECT
id,
paradedb.score(id) AS bm25_score
FROM documents
WHERE documents @@@ paradedb.parse('kubernetes pod restart')
ORDER BY bm25_score DESC
LIMIT 20
),
vector_results AS (
SELECT
id,
1 - (embedding <=> $1::vector) AS vector_score
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 20
)
SELECT
d.id,
d.title,
d.content,
d.source,
COALESCE(b.bm25_score, 0) * 0.5 +
COALESCE(v.vector_score, 0) * 0.5 AS combined_score
FROM documents d
LEFT JOIN bm25_results b ON d.id = b.id
LEFT JOIN vector_results v ON d.id = v.id
WHERE b.id IS NOT NULL OR v.id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 10;
The scoring weights (0.5 each) are starting points. One caveat: raw BM25 and cosine scores sit on different scales, so normalize them before weighting (see lesson 3 under Lessons from Production). I tuned the weights based on user feedback, and a parameterized sketch follows this list:
- BM25-heavy (0.7/0.3): Better for technical queries with specific terms
- Vector-heavy (0.3/0.7): Better for conceptual “how do I…” questions
- Balanced (0.5/0.5): Good default for mixed workplace search
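Here's a hedged sketch of wrapping the CTE query above with tunable weights (hybrid_search and its argument names are mine; assumes psycopg2 and a pre-computed query embedding):
from typing import List

def hybrid_search(conn, query_text: str, query_embedding: List[float],
                  bm25_weight: float = 0.5, vector_weight: float = 0.5,
                  limit: int = 10):
    """Run the hybrid CTE query above with tunable weights."""
    with conn.cursor() as cur:
        cur.execute(
            """
            WITH bm25_results AS (
                SELECT id, paradedb.score(id) AS bm25_score
                FROM documents
                WHERE documents @@@ paradedb.parse(%s)
                ORDER BY bm25_score DESC
                LIMIT 20
            ),
            vector_results AS (
                SELECT id, 1 - (embedding <=> %s::vector) AS vector_score
                FROM documents
                ORDER BY embedding <=> %s::vector
                LIMIT 20
            )
            SELECT d.id, d.title, d.source,
                   COALESCE(b.bm25_score, 0) * %s +
                   COALESCE(v.vector_score, 0) * %s AS combined_score
            FROM documents d
            LEFT JOIN bm25_results b ON d.id = b.id
            LEFT JOIN vector_results v ON d.id = v.id
            WHERE b.id IS NOT NULL OR v.id IS NOT NULL
            ORDER BY combined_score DESC
            LIMIT %s
            """,
            (query_text, query_embedding, query_embedding,
             bm25_weight, vector_weight, limit)
        )
        return cur.fetchall()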
Performance at Scale
My test dataset: 250,000 documents from a mid-size engineering team (Slack, Confluence, GitHub, Google Drive).
Index build times:
- BM25 index: 45 seconds
- Vector index (HNSW): 12 minutes
- Total storage: 18GB (14GB vectors, 4GB text + indexes)
Query performance (p95):
- BM25 only: 15ms
- Vector only: 35ms
- Hybrid (combined): 55ms
All queries ran on a single Postgres instance: 8 vCPU, 32GB RAM, NVMe SSD. No query took over 100ms.
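To check that those latencies come from index scans rather than sequential scans, EXPLAIN ANALYZE each leg (bind a real query embedding to $1, or inline a vector literal):
-- The plan should show a scan on the HNSW index, not a Seq Scan
EXPLAIN ANALYZE
SELECT id FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 20;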
Compare this to Elasticsearch + Pinecone:
- Two separate systems to maintain
- Data sync between systems
- Network latency between services
- 2x the operational complexity
Postgres consolidates everything.
Real-World Deployment
I deployed this as a Docker container with FastAPI, wrapping the hybrid search query in a simple REST endpoint. The entire service runs in 512MB RAM and handles 50+ concurrent requests easily.
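A minimal sketch of that wrapper, assuming the generate_embedding and hybrid_search helpers from earlier plus a DATABASE_URL environment variable (the endpoint shape and names are mine):
import os
import psycopg2
from fastapi import FastAPI

# generate_embedding and hybrid_search as defined earlier in this post

app = FastAPI()
conn = psycopg2.connect(os.environ["DATABASE_URL"])

@app.get("/search")
def search(q: str, limit: int = 10):
    """Embed the query, then run the hybrid BM25 + vector search."""
    query_embedding = generate_embedding(q)
    rows = hybrid_search(conn, q, query_embedding, limit=limit)
    return [
        {"id": r[0], "title": r[1], "source": r[2], "score": float(r[3])}
        for r in rows
    ]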
Lessons from Production
1. Embedding quality trumps index optimization
I spent days tuning HNSW parameters. Then I switched from text-embedding-ada-002 to OpenAI’s newer text-embedding-3-small model—instant 15% improvement in result relevance. Better embeddings matter more than perfect indexes.
2. Batch everything
Generating embeddings one document at a time destroyed performance. Batching 100 documents per API call reduced indexing time from 4 hours to 25 minutes for 250k docs.
3. Normalize scores before combining
BM25 scores range from 0-15+. Vector cosine similarity is 0-1. If you don’t normalize them to the same scale, one dominates. I normalize both to 0-1 range before applying weights.
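The normalization itself is a few lines of min-max scaling applied to each result set before the weighted sum (the helper name is mine):
from typing import Dict

def normalize(scores: Dict[int, float]) -> Dict[int, float]:
    """Min-max scale a {doc_id: raw_score} map into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

# combined = 0.5 * normalize(bm25).get(doc_id, 0.0) + 0.5 * normalize(vec).get(doc_id, 0.0)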
4. Postgres handles more than you think
I assumed I’d need dedicated search infrastructure. Turns out Postgres with the right extensions outperforms specialized tools for datasets under 1M documents. And it’s one less system to manage.
Cost Comparison
Running this on AWS RDS Postgres (db.r6g.2xlarge) plus OpenAI embeddings costs ~$510/month. An equivalent Elasticsearch + Pinecone stack runs ~$750/month, making the Postgres setup roughly 30% cheaper, with half the operational complexity.
When Not to Use This Approach
Postgres hybrid search works great up to ~1M documents. Beyond that, you hit limits:
- Index build times become painful (hours instead of minutes)
- Memory requirements for vector indexes grow significantly
- Query performance degrades without aggressive query optimization
At 5M+ documents, specialized systems like Elasticsearch or purpose-built vector databases make more sense. They’re designed for massive scale.
But for most teams? You don’t have 5 million documents. Start with Postgres. Scale when you need to, not before.
Next Steps
If you want to try this:
- Spin up ParadeDB: docker run paradedb/paradedb (the image includes pgvector)
- Create tables and indexes using the SQL above
- Generate embeddings for your documents
- Test hybrid queries with real searches
The entire setup takes under an hour. You’ll have production-quality hybrid search without touching a single specialized service.
Postgres is better at search than most teams realize. Sometimes the boring technology choice is the right one.