Build Hybrid Search with Postgres
Combine BM25 full-text search and pgvector embeddings for enterprise workplace search. Replace Elasticsearch with Postgres for roughly 30% cost savings. A production implementation guide.
I recently built a self-hosted hybrid search system that consolidates data from Google Drive, Slack, Confluence, and GitHub—without touching Elasticsearch, Pinecone, or any dedicated vector database. The entire stack runs on Postgres.
This wasn’t a compromise. Hybrid search combining BM25 full-text search with pgvector embeddings delivered better results than either approach alone, while keeping infrastructure dead simple: one database, one Docker container, zero specialized services.
Here’s what I learned building production hybrid search on Postgres.
Why Hybrid Search Beats Traditional Search
Vector search alone misses exact keyword matches. Full-text search struggles with semantic similarity. Most workplace search needs both.
When someone searches “kubernetes pod restart errors”, they want:
- Exact matches for “pod restart” (BM25)
- Semantically similar content about container failures (vector)
- Ranked results combining both signals
Traditional approaches split this across multiple systems. Postgres can handle both in a single query.
The Stack: ParadeDB and pgvector
Two Postgres extensions enable hybrid search:
ParadeDB provides BM25 full-text search using Tantivy (the Rust search library powering Quickwit). It’s Elasticsearch-quality full-text search inside Postgres.
pgvector handles vector similarity with HNSW indexes for fast approximate nearest neighbor search.
Both run as Postgres extensions. No external services, no data synchronization headaches.
-- Install extensions (ParadeDB's BM25 search ships as the pg_search extension)
CREATE EXTENSION pg_search;
CREATE EXTENSION vector;
-- Create search table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
source TEXT NOT NULL, -- 'slack', 'gdrive', etc
title TEXT,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI ada-002 dimensions
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Building the BM25 Index
ParadeDB’s BM25 index creation is straightforward:
-- Create BM25 index for full-text search
CALL paradedb.create_bm25(
index_name => 'search_idx',
table_name => 'documents',
key_field => 'id',
text_fields => '{
"title": {"tokenizer": "en_stem"},
"content": {"tokenizer": "en_stem"}
}'::jsonb
);
The en_stem tokenizer handles English language stemming, so searches for “running” match “run” and “runs”.
BM25 scoring accounts for term frequency and document length, giving better relevance than basic tsvector full-text search.
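Once the index is in place, a BM25-only query uses the @@@ operator and paradedb.score(), the same syntax the hybrid query later in this post relies on:
-- BM25-only search, ranked by relevance
SELECT id, title, paradedb.score(id) AS bm25_score
FROM documents
WHERE documents @@@ paradedb.parse('pod restart')
ORDER BY bm25_score DESC
LIMIT 10;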
Creating Vector Embeddings
I use OpenAI’s text-embedding-ada-002 model, but you can swap in any embedding provider. The key is consistency—all documents and queries must use the same model and dimensions.
import openai  # pre-1.0 openai client; the Embedding API below was removed in openai>=1.0
from typing import List

def generate_embedding(text: str) -> List[float]:
    """Generate embedding for text using OpenAI."""
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response['data'][0]['embedding']

def index_document(conn, source: str, title: str, content: str):
    """Index document with both text and vector."""
    # Combine title and content for richer embeddings
    text = f"{title}\n\n{content}"
    embedding = generate_embedding(text)
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO documents (source, title, content, embedding)
            VALUES (%s, %s, %s, %s::vector)
            """,
            # psycopg2 sends the list as a float array; the explicit ::vector
            # cast converts it (pgvector 0.5+ ships array-to-vector casts)
            (source, title, content, embedding)
        )
    conn.commit()  # psycopg2 does not autocommit by default
I batch embedding generation to reduce API calls—process 100 documents at once instead of one-by-one. OpenAI’s API supports batch requests up to 2048 inputs.
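Here's a minimal sketch of that batching, using the same pre-1.0 openai client as the code above (embed_batch, embed_all, and BATCH_SIZE are my names; the Embedding endpoint accepts a list of inputs):
import openai
from typing import List

BATCH_SIZE = 100  # documents per API call

def embed_batch(texts: List[str]) -> List[List[float]]:
    """Embed a batch of texts in a single API call."""
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=texts  # the endpoint accepts up to 2048 inputs per request
    )
    # Each result carries an 'index' field; sort to restore input order
    data = sorted(response['data'], key=lambda d: d['index'])
    return [d['embedding'] for d in data]

def embed_all(texts: List[str]) -> List[List[float]]:
    """Embed an entire corpus in chunks of BATCH_SIZE."""
    embeddings: List[List[float]] = []
    for i in range(0, len(texts), BATCH_SIZE):
        embeddings.extend(embed_batch(texts[i:i + BATCH_SIZE]))
    return embeddings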
Vector Index Optimization
pgvector’s HNSW (Hierarchical Navigable Small World) index provides fast approximate nearest neighbor search:
-- Create HNSW index for vector similarity
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
The parameters matter:
- m: Max connections per node (default 16). Higher = better recall, more memory
- ef_construction: Size of dynamic candidate list during construction (default 64). Higher = better index quality, slower builds
I tested various values on 100k documents. m=16, ef_construction=64 hit the sweet spot—recall above 0.95 with sub-100ms query times.
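pgvector also exposes a query-time knob I didn't tune in those tests: hnsw.ef_search (default 40) sets the candidate list size at search time, trading latency for recall without rebuilding the index:
-- Higher = better recall, slower queries; session-scoped setting
SET hnsw.ef_search = 100;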
For smaller datasets (<10k docs), you can skip the index entirely and let Postgres do a brute-force sequential scan. It's fast enough, and it returns exact rather than approximate neighbors.
Hybrid Search Implementation
The magic happens when combining BM25 and vector search in a single query. Postgres CTEs make this elegant:
-- Hybrid search combining BM25 and vector similarity
WITH bm25_results AS (
SELECT
id,
paradedb.score(id) AS bm25_score
FROM documents
WHERE documents @@@ paradedb.parse('kubernetes pod restart')
ORDER BY bm25_score DESC
LIMIT 20
),
vector_results AS (
SELECT
id,
1 - (embedding <=> $1::vector) AS vector_score
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 20
)
SELECT
d.id,
d.title,
d.content,
d.source,
COALESCE(b.bm25_score, 0) * 0.5 +
COALESCE(v.vector_score, 0) * 0.5 AS combined_score
FROM documents d
LEFT JOIN bm25_results b ON d.id = b.id
LEFT JOIN vector_results v ON d.id = v.id
WHERE b.id IS NOT NULL OR v.id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 10;
The scoring weights (0.5 each) are starting points. One caveat: raw BM25 and cosine scores sit on different scales, so normalize them before weighting (see lesson 3 under Lessons from Production). I tuned the weights based on user feedback, and a parameterized sketch follows this list:
- BM25-heavy (0.7/0.3): Better for technical queries with specific terms
- Vector-heavy (0.3/0.7): Better for conceptual “how do I…” questions
- Balanced (0.5/0.5): Good default for mixed workplace search
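Here's a hedged sketch of wrapping the CTE query above with tunable weights (hybrid_search and its argument names are mine; assumes psycopg2 and a pre-computed query embedding):
from typing import List

def hybrid_search(conn, query_text: str, query_embedding: List[float],
                  bm25_weight: float = 0.5, vector_weight: float = 0.5,
                  limit: int = 10):
    """Run the hybrid CTE query above with tunable weights."""
    with conn.cursor() as cur:
        cur.execute(
            """
            WITH bm25_results AS (
                SELECT id, paradedb.score(id) AS bm25_score
                FROM documents
                WHERE documents @@@ paradedb.parse(%s)
                ORDER BY bm25_score DESC
                LIMIT 20
            ),
            vector_results AS (
                SELECT id, 1 - (embedding <=> %s::vector) AS vector_score
                FROM documents
                ORDER BY embedding <=> %s::vector
                LIMIT 20
            )
            SELECT d.id, d.title, d.source,
                   COALESCE(b.bm25_score, 0) * %s +
                   COALESCE(v.vector_score, 0) * %s AS combined_score
            FROM documents d
            LEFT JOIN bm25_results b ON d.id = b.id
            LEFT JOIN vector_results v ON d.id = v.id
            WHERE b.id IS NOT NULL OR v.id IS NOT NULL
            ORDER BY combined_score DESC
            LIMIT %s
            """,
            (query_text, query_embedding, query_embedding,
             bm25_weight, vector_weight, limit)
        )
        return cur.fetchall()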
Performance at Scale
My test dataset: 250,000 documents from a mid-size engineering team (Slack, Confluence, GitHub, Google Drive).
Index build times:
- BM25 index: 45 seconds
- Vector index (HNSW): 12 minutes
- Total storage: 18GB (14GB vectors, 4GB text + indexes)
Query performance (p95):
- BM25 only: 15ms
- Vector only: 35ms
- Hybrid (combined): 55ms
All queries ran on a single Postgres instance: 8 vCPU, 32GB RAM, NVMe SSD. No query took over 100ms.
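To check that those latencies come from index scans rather than sequential scans, EXPLAIN ANALYZE each leg (bind a real query embedding to $1, or inline a vector literal):
-- The plan should show a scan on the HNSW index, not a Seq Scan
EXPLAIN ANALYZE
SELECT id FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 20;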
Compare this to Elasticsearch + Pinecone:
- Two separate systems to maintain
- Data sync between systems
- Network latency between services
- 2x the operational complexity
Postgres consolidates everything.
Real-World Deployment
I deployed this as a Docker container with FastAPI, wrapping the hybrid search query in a simple REST endpoint. The entire service runs in 512MB RAM and handles 50+ concurrent requests easily.
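A minimal sketch of that wrapper, assuming the generate_embedding and hybrid_search helpers from earlier plus a DATABASE_URL environment variable (the endpoint shape and names are mine):
import os
import psycopg2
from fastapi import FastAPI

# generate_embedding and hybrid_search as defined earlier in this post

app = FastAPI()
conn = psycopg2.connect(os.environ["DATABASE_URL"])

@app.get("/search")
def search(q: str, limit: int = 10):
    """Embed the query, then run the hybrid BM25 + vector search."""
    query_embedding = generate_embedding(q)
    rows = hybrid_search(conn, q, query_embedding, limit=limit)
    return [
        {"id": r[0], "title": r[1], "source": r[2], "score": float(r[3])}
        for r in rows
    ]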
Lessons from Production
1. Embedding quality trumps index optimization
I spent days tuning HNSW parameters. Then I switched from text-embedding-ada-002 to OpenAI’s newer text-embedding-3-small model—instant 15% improvement in result relevance. Better embeddings matter more than perfect indexes.
2. Batch everything
Generating embeddings one document at a time destroyed performance. Batching 100 documents per API call reduced indexing time from 4 hours to 25 minutes for 250k docs.
3. Normalize scores before combining
BM25 scores range from 0-15+. Vector cosine similarity is 0-1. If you don’t normalize them to the same scale, one dominates. I normalize both to 0-1 range before applying weights.
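The normalization itself is a few lines of min-max scaling applied to each result set before the weighted sum (the helper name is mine):
from typing import Dict

def normalize(scores: Dict[int, float]) -> Dict[int, float]:
    """Min-max scale a {doc_id: raw_score} map into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

# combined = 0.5 * normalize(bm25).get(doc_id, 0.0) + 0.5 * normalize(vec).get(doc_id, 0.0)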
4. Postgres handles more than you think
I assumed I’d need dedicated search infrastructure. Turns out Postgres with the right extensions outperforms specialized tools for datasets under 1M documents. And it’s one less system to manage.
Cost Comparison
Running this on AWS RDS Postgres (db.r6g.2xlarge) plus OpenAI embeddings costs ~$510/month. An equivalent Elasticsearch + Pinecone stack runs ~$750/month, making the Postgres setup roughly 30% cheaper, with half the operational complexity.
When Not to Use This Approach
Postgres hybrid search works great up to ~1M documents. Beyond that, you hit limits:
- Index build times become painful (hours instead of minutes)
- Memory requirements for vector indexes grow significantly
- Query performance degrades without aggressive query optimization
At 5M+ documents, specialized systems like Elasticsearch or purpose-built vector databases make more sense. They’re designed for massive scale.
But for most teams? You don’t have 5 million documents. Start with Postgres. Scale when you need to, not before.
Next Steps
If you want to try this:
- Spin up ParadeDB: docker run paradedb/paradedb (the image includes pgvector)
- Create tables and indexes using the SQL above
- Generate embeddings for your documents
- Test hybrid queries with real searches
The entire setup takes under an hour. You’ll have production-quality hybrid search without touching a single specialized service.
Postgres is better at search than most teams realize. Sometimes the boring technology choice is the right one.