PostgreSQL in Production: 40% Faster Queries (Case Notes)
Working on a structural monitoring system processing 50,000+ sensor readings daily, I learned PostgreSQL optimization the hard way. These techniques reduced our query response times by 40% and improved system reliability.
The Problem
Our sensor data table grew rapidly. What started as snappy queries became progressively slower:
- Dashboard loading time: 8+ seconds
- Historical data queries timing out
- Aggregate reports taking minutes
- Database CPU usage spiking during peak hours
1. Understanding Query Plans with EXPLAIN ANALYZE
Before optimizing, measure. EXPLAIN ANALYZE shows exactly what PostgreSQL is doing:
-- Problematic query
EXPLAIN ANALYZE
SELECT
sensor_id,
AVG(value) as avg_value,
MAX(value) as max_value
FROM sensor_readings
WHERE timestamp >= NOW() - INTERVAL '7 days'
GROUP BY sensor_id;
-- Output showed:
-- Seq Scan on sensor_readings (cost=0.00..2500000.00)
-- Planning Time: 2.3ms
-- Execution Time: 8421.5ms
Key insight: Sequential scan through 2M+ rows. We needed indexes.
2. Strategic Indexing
Indexes are critical but come with trade-offs. Here's what worked:
-- B-tree index for timestamp range queries
CREATE INDEX idx_readings_timestamp
ON sensor_readings(timestamp DESC);
-- Composite index for common query patterns
CREATE INDEX idx_readings_sensor_time
ON sensor_readings(sensor_id, timestamp DESC);
-- Partial index for recent (hot) data
-- Note: index predicates must be immutable, so NOW() is not allowed here;
-- use a fixed cutoff and recreate the index periodically instead
CREATE INDEX idx_readings_recent
ON sensor_readings(timestamp)
WHERE timestamp >= '2024-11-01';
-- After indexes, same query:
-- Index Scan using idx_readings_timestamp
-- Execution Time: 421.3ms (95% improvement)
Index selection guidelines:
- Create indexes on WHERE, JOIN, and ORDER BY columns
- Composite indexes for multi-column queries (most selective column first)
- Partial indexes for frequently queried subsets
- Monitor index usage with pg_stat_user_indexes, as shown below
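For example, a quick check of how heavily the sensor_readings indexes are actually used (a sketch against the standard pg_stat_user_indexes view; the table name matches the examples above):
SELECT indexrelname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'sensor_readings'
ORDER BY idx_scan;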
3. Table Partitioning for Time-Series Data
Our sensor readings were perfect for range partitioning by timestamp:
-- Create partitioned table
CREATE TABLE sensor_readings (
id BIGSERIAL,
sensor_id INTEGER NOT NULL,
value DECIMAL(10, 2),
timestamp TIMESTAMP NOT NULL
) PARTITION BY RANGE (timestamp);
-- Create monthly partitions
CREATE TABLE sensor_readings_2024_11
PARTITION OF sensor_readings
FOR VALUES FROM ('2024-11-01') TO ('2024-12-01');
CREATE TABLE sensor_readings_2024_12
PARTITION OF sensor_readings
FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');
-- Automatic partition creation with pg_cron or application logic
CREATE OR REPLACE FUNCTION create_monthly_partition()
RETURNS void AS $$
DECLARE
start_date DATE := DATE_TRUNC('month', NOW() + INTERVAL '1 month');
end_date DATE := start_date + INTERVAL '1 month';
partition_name TEXT := 'sensor_readings_' || TO_CHAR(start_date, 'YYYY_MM');
BEGIN
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF sensor_readings
FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
END;
$$ LANGUAGE plpgsql;
Benefits:
- Query pruning - scans only relevant partitions
- Faster maintenance (VACUUM, ANALYZE)
- Easy archival - drop old partitions
- Better index performance on smaller partitions
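To keep partitions created ahead of incoming data, the function above can be scheduled with pg_cron; a minimal sketch, assuming the pg_cron extension is installed (the job name and schedule are illustrative):
-- Create next month's partition ahead of time, at midnight on the 25th
SELECT cron.schedule(
  'create-monthly-partition',
  '0 0 25 * *',
  'SELECT create_monthly_partition()'
);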
4. Materialized Views for Aggregates
Real-time aggregation on millions of rows is expensive. Pre-compute common aggregates:
-- Create materialized view for hourly averages
CREATE MATERIALIZED VIEW sensor_hourly_avg AS
SELECT
sensor_id,
DATE_TRUNC('hour', timestamp) as hour,
AVG(value) as avg_value,
MIN(value) as min_value,
MAX(value) as max_value,
COUNT(*) as reading_count
FROM sensor_readings
GROUP BY sensor_id, DATE_TRUNC('hour', timestamp);
-- Unique index for fast lookups (REFRESH ... CONCURRENTLY requires one)
CREATE UNIQUE INDEX idx_hourly_avg_sensor_hour
ON sensor_hourly_avg(sensor_id, hour DESC);
-- Refresh strategy (scheduled via pg_cron, e.g. hourly)
-- Note: REFRESH recomputes the whole view; CONCURRENTLY keeps it readable while refreshing
REFRESH MATERIALIZED VIEW CONCURRENTLY sensor_hourly_avg;
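-- A pg_cron job could run the refresh hourly; a sketch, assuming the
-- pg_cron extension is installed (the job name below is illustrative):
SELECT cron.schedule(
  'refresh-sensor-hourly-avg',
  '5 * * * *', -- five minutes past every hour
  'REFRESH MATERIALIZED VIEW CONCURRENTLY sensor_hourly_avg'
);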
-- Query becomes instant:
SELECT * FROM sensor_hourly_avg
WHERE sensor_id = 42
AND hour >= NOW() - INTERVAL '7 days'
ORDER BY hour DESC;
-- Execution Time: 12ms (instead of 4000ms)
5. Query Optimization Techniques
Use CTEs for Readability, JOINs for Performance
(Before PostgreSQL 12, non-recursive CTEs were always materialized and acted as optimization fences; since 12, single-reference CTEs are usually inlined automatically, so this difference mainly matters on older versions or with MATERIALIZED CTEs.)
-- Before: Slow CTE
WITH recent_readings AS (
SELECT * FROM sensor_readings
WHERE timestamp >= NOW() - INTERVAL '1 day'
)
SELECT sensor_id, AVG(value)
FROM recent_readings
GROUP BY sensor_id;
-- After: Optimized single query
SELECT sensor_id, AVG(value)
FROM sensor_readings
WHERE timestamp >= NOW() - INTERVAL '1 day'
GROUP BY sensor_id;
LIMIT with Indexes
-- Slow: Large offset
SELECT * FROM sensor_readings
ORDER BY timestamp DESC
OFFSET 10000 LIMIT 20;
-- Fast: Cursor-based pagination
SELECT * FROM sensor_readings
WHERE timestamp < '2024-11-01 12:00:00'
ORDER BY timestamp DESC
LIMIT 20;
Batch Inserts
// Slow: Individual inserts
for (const reading of readings) {
await db.query(
'INSERT INTO sensor_readings (sensor_id, value, timestamp) VALUES ($1, $2, $3)',
[reading.sensorId, reading.value, reading.timestamp]
)
}
// Fast: Batch insert
const values = readings.map(r =>
`(${r.sensorId}, ${r.value}, '${r.timestamp}')`
).join(',')
await db.query(
`INSERT INTO sensor_readings (sensor_id, value, timestamp) VALUES ${values}`
)
// 10x faster for large batches
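The interpolated VALUES string above assumes trusted, pre-validated readings. A parameterized alternative (a sketch, not from our original setup) passes the columns as arrays and unnests them server-side, so the driver handles escaping:
-- Single parameterized statement: the driver sends three arrays as $1..$3
INSERT INTO sensor_readings (sensor_id, value, timestamp)
SELECT * FROM unnest($1::int[], $2::numeric[], $3::timestamp[]);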
6. Connection Pooling
Connection overhead adds up. Use pgBouncer or application-level pooling:
// Node.js with pg library
import { Pool } from 'pg'
const pool = new Pool({
host: 'localhost',
database: 'sensors',
max: 20, // Maximum connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
})
// Reuse connections
const result = await pool.query('SELECT ...')
// vs creating new connection each time (slow)
const client = new Client()
await client.connect()
await client.query('SELECT ...')
await client.end()
7. Monitoring and Maintenance
Optimization is ongoing. Monitor these metrics:
-- Slow queries
SELECT
calls,
mean_exec_time::numeric(10,2) as avg_time_ms,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Index usage
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Table sizes (large or fast-growing tables are the first candidates for bloat checks)
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Regular maintenance tasks:
- VACUUM ANALYZE - reclaim space and update statistics (see the commands below)
- REINDEX - rebuild bloated indexes
- Monitor with the pg_stat_statements extension
- Set up alerts for slow queries (>1s)
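A minimal sketch of those commands, using the table and index names from the examples above (REINDEX ... CONCURRENTLY needs PostgreSQL 12+, and pg_stat_statements must be added to shared_preload_libraries before the extension can be created):
-- Reclaim dead tuples and refresh planner statistics
VACUUM ANALYZE sensor_readings;
-- Rebuild a bloated index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_hourly_avg_sensor_hour;
-- Collect per-query statistics (requires a shared_preload_libraries entry and a restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;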
Results
Performance Improvements
- 📊 Dashboard load time: 8s → 1.2s (85% improvement)
- ⚡ Average query response: 2.5s → 0.6s (76% improvement)
- 💾 Database CPU usage: 75% → 35% average
- 📈 Throughput: 2x increase in concurrent queries
Key Takeaways
- Measure first - Use EXPLAIN ANALYZE to identify bottlenecks
- Index strategically - Focus on query patterns, not every column
- Partition large tables - Especially for time-series data
- Pre-compute aggregates - Materialized views for expensive calculations
- Optimize at multiple levels - Queries, indexes, schema design, and application code
- Monitor continuously - Performance degrades over time without maintenance
Need help optimizing your database? Let's talk.