12 min read
Dillon Browne

Rethinking I/O Performance Infrastructure

I/O bottlenecks shaped infrastructure for decades. Modern NVMe and cloud storage changed the game—here's what that means for your architecture today.


For years, the phrase “I/O is the bottleneck” was gospel in infrastructure engineering. We designed entire systems around minimizing disk access. We cached aggressively. We denormalized databases. We threw memory at problems that storage could have solved cheaper.

I built my career making those same trade-offs. But in 2026, that conventional wisdom is obsolete, and the teams that recognize the shift early gain a real architectural advantage: simpler infrastructure that is also faster.

Understanding Traditional I/O Performance Bottlenecks

When I started in DevOps, the performance hierarchy was clear:

  • L1 Cache: ~1 ns
  • L2 Cache: ~4 ns
  • RAM: ~100 ns
  • SSD: ~100 μs (1000x slower than RAM)
  • HDD: ~10 ms (100,000x slower than RAM)

This enormous gap drove architectural decisions at every level. In my early cloud infrastructure work, I saw teams:

  • Design elaborate in-memory caching layers to avoid database reads
  • Pre-compute and store aggregations because real-time queries were “too expensive”
  • Choose NoSQL databases primarily to avoid JOIN operations that required disk seeks
  • Build complex application-level sharding schemes to reduce per-node I/O

These weren’t premature optimizations—they were survival strategies. I/O was genuinely the bottleneck.
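Some rough arithmetic shows why. Here's a minimal sketch using the approximate latencies above (serial reads, no queueing, no caching; purely illustrative):

# Back-of-envelope: serving 1 million random reads, one at a time,
# at the approximate latencies above. Illustrative only.
LATENCY_NS = {
    "RAM": 100,           # ~100 ns
    "NVMe": 20_000,       # ~20 us
    "SATA SSD": 100_000,  # ~100 us
    "HDD": 10_000_000,    # ~10 ms
}

READS = 1_000_000

for medium, ns in LATENCY_NS.items():
    total_s = READS * ns / 1e9
    print(f"{medium:>8}: {total_s:>10,.1f} s")

# RAM:        0.1 s
# NVMe:      20.0 s
# SATA SSD: 100.0 s
# HDD:   10,000.0 s  -> this is the world those caching layers were built for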

Optimizing Infrastructure: NVMe and Cloud Storage Evolution

Three technology shifts fundamentally altered the I/O landscape:

1. Deploy NVMe for 10x Performance Gains

NVMe drives aren’t just faster SSDs—they represent a paradigm shift. When I migrated production workloads from SATA SSDs to NVMe in 2021, I saw:

# SATA SSD Performance
$ fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread \
  --bs=4k --direct=1 --size=4G --numjobs=4 --runtime=60 --group_reporting

READ: bw=400MiB/s, iops=102400, runt=10240msec

# NVMe Performance (Same Test)
$ fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread \
  --bs=4k --direct=1 --size=4G --numjobs=4 --runtime=60 --group_reporting

READ: bw=3200MiB/s, iops=819200, runt=1280msec

8x improvement in real-world random reads. But the latency gains mattered more:

  • SATA SSD: ~100 μs
  • NVMe: ~20 μs (5x reduction)
  • NVMe over PCIe 4.0: ~10 μs (10x reduction)

Suddenly, the gap between RAM and storage shrank from 1000x to 100x—and continues narrowing.

2. Leverage Fast Cloud Storage Solutions

AWS EBS gp3 volumes deliver:

  • 3,000 IOPS and 125 MB/s baseline, independent of volume size
  • Up to 16,000 provisioned IOPS and 1,000 MB/s throughput
  • Single-digit millisecond latency

More importantly, cloud providers abstracted away traditional storage failure modes. In my Kubernetes infrastructure, I’ve replaced elaborate local storage management with simple EBS volumes—and applications run faster while being more reliable.
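The knobs involved are minimal. Here's a sketch of provisioning a gp3 volume at its IOPS and throughput maximums with boto3; the region, availability zone, size, and tags are illustrative placeholders:

# Provision an EBS gp3 volume with explicit IOPS and throughput.
# All values below are illustrative; adjust for your workload and region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,            # GiB
    VolumeType="gp3",
    Iops=16000,          # provisioned IOPS (baseline is 3,000)
    Throughput=1000,     # MB/s (baseline is 125)
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "app", "Value": "postgres-primary"}],
    }],
)
print(volume["VolumeId"])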

3. Recognize the CPU-Storage Performance Shift

While storage improved 10-100x, CPU single-thread performance plateaued. My production workloads increasingly show:

# Profiling a typical API endpoint
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

# Run production request handler
response = handle_api_request(request)

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)

# Results that surprised me:
# 67% - JSON serialization (CPU-bound)
# 18% - Business logic (CPU-bound) 
# 8%  - Database query (I/O)
# 7%  - Network transmission (I/O)

The database query was no longer the bottleneck. The CPU time spent processing the result dominated.
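A quick way to sanity-check that split without a full profiler is to time the two halves separately. A minimal sketch, assuming a psycopg2 connection and a hypothetical orders query (the DSN, table, and row count are placeholders):

# Rough split of query time vs. serialization time for one endpoint.
# Connection string, table, and limit are hypothetical placeholders.
import json
import time

import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
cur = conn.cursor()

t0 = time.perf_counter()
cur.execute("SELECT * FROM orders ORDER BY created_at DESC LIMIT 10000")
rows = cur.fetchall()
t1 = time.perf_counter()

payload = json.dumps(rows, default=str)  # serialize the full result set
t2 = time.perf_counter()

print(f"query + fetch: {(t1 - t0) * 1000:.1f} ms")
print(f"serialization: {(t2 - t1) * 1000:.1f} ms")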

Redesigning Infrastructure Architecture for Modern Performance

The performance flip has profound implications for how we build systems:

1. Optimize Database Schema with Normalization

I used to religiously denormalize data to avoid JOINs. Now, with modern storage performance, normalized schemas often win:

-- Old approach: Denormalized for "performance"
CREATE TABLE orders_denormalized (
    order_id BIGINT PRIMARY KEY,
    customer_name VARCHAR(255),
    customer_email VARCHAR(255),
    customer_address TEXT,
    product_name VARCHAR(255),
    product_price DECIMAL(10,2),
    -- ... 30 more duplicated columns
);

-- Modern approach: Normalized, JOIN is fast
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer_id BIGINT REFERENCES customers(id),
    product_id BIGINT REFERENCES products(id),
    created_at TIMESTAMP
);

-- This query is now competitive with denormalized versions
SELECT o.order_id, c.name AS customer_name, p.name AS product_name, p.price
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at > NOW() - INTERVAL '7 days';

The normalized schema is:

  • Easier to maintain
  • More accurate (no stale denormalized data)
  • Often faster (better cache utilization, smaller indexes)

In production PostgreSQL instances with NVMe backing, I’ve seen JOIN-heavy queries outperform denormalized equivalents when indexes are properly tuned.
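By "properly tuned" I mean, at minimum, an index covering the time-window filter plus the primary keys the JOINs resolve against. A minimal way to confirm the planner actually uses them, sketched with psycopg2 (the DSN is a placeholder):

# Verify the JOIN-heavy query uses indexes before reaching for denormalization.
# Connection details are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# The index that matters for the 7-day window filter.
cur.execute("CREATE INDEX IF NOT EXISTS idx_orders_created_at ON orders (created_at)")
conn.commit()

cur.execute("""
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT o.order_id, c.name AS customer_name, p.name AS product_name, p.price
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    JOIN products p ON o.product_id = p.id
    WHERE o.created_at > NOW() - INTERVAL '7 days'
""")
for (line,) in cur.fetchall():
    print(line)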

2. Eliminate Unnecessary Caching Layers

I used to add Redis automatically. Now I question every caching layer:

Bad reason: “Database queries are slow”
Good reason: “This computation is expensive AND requested frequently”

In a recent microservices refactor, I removed Redis from three services. Response times improved because:

  • Eliminated cache invalidation bugs
  • Removed network hop to Redis
  • Database was already fast with proper indexes
  • Simplified failure modes

This Go code illustrates the pattern I now prefer:

// Old: Always cache
func GetUser(ctx context.Context, userID int64) (*User, error) {
    // Check cache first
    if cached, err := redis.Get(ctx, fmt.Sprintf("user:%d", userID)); err == nil {
        return deserialize(cached), nil
    }
    
    // Cache miss, hit database
    user, err := db.QueryUser(ctx, userID)
    if err != nil {
        return nil, err
    }
    
    // Populate cache (with all the complexity that entails)
    _ = redis.Set(ctx, fmt.Sprintf("user:%d", userID), serialize(user), 5*time.Minute)
    return user, nil
}

// New: Database-first, cache only when justified
func GetUser(ctx context.Context, userID int64) (*User, error) {
    return db.QueryUser(ctx, userID)
}

With properly indexed queries and modern storage, the second version is often faster and always more reliable.

3. Implement Storage-Heavy Patterns Confidently

Modern infrastructure makes previously “expensive” patterns viable:

Full-text and similarity search in PostgreSQL: tsvector queries (and, for embeddings, pgvector) that I would have offloaded to Elasticsearch now run directly in Postgres:

-- This query was "too slow" in 2018, perfectly fine in 2026
CREATE INDEX idx_products_search ON products 
USING GIN(to_tsvector('english', name || ' ' || description));

SELECT * FROM products 
WHERE to_tsvector('english', name || ' ' || description) 
  @@ plainto_tsquery('english', 'kubernetes monitoring tools')
ORDER BY ts_rank(to_tsvector('english', name || ' ' || description),
                 plainto_tsquery('english', 'kubernetes monitoring tools')) DESC
LIMIT 50;

Time-series data retention: With cheap, fast storage, I keep detailed metrics longer. A Prometheus instance with NVMe can retain high-cardinality metrics for months—eliminating the need for separate aggregation pipelines.
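The storage math behind that claim is simple. A rough sketch; the ingestion rate is an assumption, and Prometheus's TSDB typically compresses samples to roughly one to two bytes each:

# Rough estimate of Prometheus TSDB disk usage for long retention.
# Ingestion rate and bytes/sample are assumptions, not measurements.
SAMPLES_PER_SEC = 500_000   # assumed high-cardinality ingestion rate
BYTES_PER_SAMPLE = 1.7      # typical post-compression figure for the TSDB
RETENTION_DAYS = 90

total_bytes = SAMPLES_PER_SEC * 86_400 * RETENTION_DAYS * BYTES_PER_SAMPLE
print(f"~{total_bytes / 1e12:.1f} TB for {RETENTION_DAYS} days")
# ~6.6 TB: a single large NVMe volume, not a separate aggregation pipeline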

Event sourcing: The write amplification of event sourcing (every state change = new event) was prohibitive with slow storage. NVMe makes it practical for more use cases.
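A tiny benchmark makes that shift concrete: durable appends with one fsync per event are the worst case for an event log, and the medium underneath dictates throughput. This sketch uses a placeholder log path and payload:

# Measure durable append throughput: one fsync per event, worst case.
# On an HDD this lands around ~100 events/s; NVMe is orders of magnitude faster.
import json
import os
import time

EVENTS = 1_000
payload = json.dumps({"type": "OrderPlaced", "order_id": 42}).encode() + b"\n"

fd = os.open("/tmp/event_log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
start = time.perf_counter()
for _ in range(EVENTS):
    os.write(fd, payload)
    os.fsync(fd)           # force each event to durable storage
elapsed = time.perf_counter() - start
os.close(fd)

print(f"{EVENTS / elapsed:,.0f} durable events/s")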

Simplify Infrastructure Architecture for Better Performance

The biggest lesson from modern I/O performance is this: The best optimization is often simplification.

I’ve spent 2025 removing complexity from systems I built in the 2010s:

  • Removed 3 Redis clusters (just use Postgres with good indexes)
  • Eliminated a Kafka pipeline for aggregations (materialized views are fine)
  • Deleted a complex cache invalidation system (don’t cache at all)
  • Simplified a sharded MongoDB setup to a single Postgres instance

Every removal made systems:

  • Faster (fewer network hops)
  • More reliable (fewer failure modes)
  • Cheaper (fewer services to run)
  • Easier to operate (less mental overhead)

Identify When I/O Performance Still Matters

To be clear: storage performance isn’t infinite. I/O remains the bottleneck when:

  1. You’re actually doing a lot of I/O: Analytics workloads scanning terabytes of data still need optimization
  2. You’re on constrained hardware: Lambda cold starts, edge computing, and budget VMs haven’t caught up
  3. Your data doesn’t fit modern patterns: Append-heavy workloads on log-structured storage can thrash

But for typical web applications, API servers, and microservices? The old rules no longer apply.

Actionable Takeaways

Here’s how I approach infrastructure decisions in 2026:

  1. Profile first, optimize second: Don’t assume I/O is the bottleneck. Measure.
  2. Question caching: If you can’t articulate the specific performance problem caching solves, don’t add it.
  3. Prefer simpler schemas: Denormalization should be the exception, not the default.
  4. Invest in database tuning: Learn proper indexing, query optimization, and PostgreSQL’s modern features.
  5. Use managed storage: Cloud providers have solved the hard problems. Let them.

The infrastructure world spent decades working around slow I/O performance. That era is ending. The engineers who recognize this I/O performance infrastructure shift first will build simpler, faster systems—and ship products while others are still optimizing for problems that no longer exist.

What performance assumptions are you ready to question?
